PDF2Data – From Documents to Clean Datasets
Turning unstructured PDFs into analysis-ready data pipelines.
Roadmap
- Phase 1 – Structured Extraction (MVP)
• Upload PDFs → extract predefined fields / tables.
• Support invoices, contracts, inspection reports.
• Export CSV / JSON; basic preview UI.
- Phase 2 – Data Cleansing Pipeline
• Rule-based + LLM-assisted deduplication, type casting, currency/date normalisation.
• Interactive cleaning UI with bulk actions & undo.
- Phase 3 – Intelligent Analytics
• Merge multi-file datasets; compute KPIs and detect anomalies.
• Natural-language querying (Chat over data).
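
To make the rule-based half of the Phase 2 cleansing step more concrete, here is a minimal Python sketch of type casting, date and currency normalisation, and deduplication on extracted invoice rows. Everything in it is an assumption for illustration, not part of PDF2Data: the field names (`invoice_id`, `date`, `amount`), the list of accepted date formats, the day-first reading of slash-separated dates, and the treatment of commas as thousands separators. The LLM-assisted part is not covered.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical set of accepted date formats; a real pipeline would
# configure these per document source. Day-first order is assumed.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalise_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalise_amount(raw):
    """Strip a leading currency symbol and thousands separators.

    Assumes "," is a thousands separator and "." the decimal point.
    Returns a Decimal, or None if the value cannot be parsed.
    """
    cleaned = raw.strip().lstrip("$€£").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def clean_rows(rows):
    """Cast and normalise each row, then drop rows that are identical
    after normalisation (first occurrence wins)."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {
            "invoice_id": row["invoice_id"].strip().upper(),
            "date": normalise_date(row["date"]),
            "amount": normalise_amount(row["amount"]),
        }
        key = (record["invoice_id"], record["date"], record["amount"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned
```

Deduplicating after normalisation rather than before is a deliberate choice in this sketch: two rows that differ only in formatting (for example "$1,200.50" and "1200.50") collapse into one record. Values that cannot be parsed become `None` instead of raising, so they could be surfaced for manual review in the cleaning UI.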