PDF2Data – From Documents to Clean Datasets

Turning unstructured PDFs into analysis-ready data pipelines.

Roadmap

Phase 1 – Structured Extraction (MVP)
• Upload PDFs → extract predefined fields / tables.
• Support invoices, contracts, inspection reports.
• Export CSV / JSON; basic preview UI.
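
As a rough illustration of the Phase 1 flow, the sketch below pulls predefined fields out of text that has already been extracted from a PDF and exports the result as JSON and CSV. It assumes a separate step has produced the plain text; the field names, regex patterns, and helper functions (INVOICE_FIELDS, extract_fields, to_json, to_csv) are hypothetical and not part of any existing PDF2Data code.

```python
import csv
import io
import json
import re

# Hypothetical field definitions for an invoice; the patterns are illustrative only.
INVOICE_FIELDS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.IGNORECASE),
    "invoice_date": re.compile(r"Date\s*:\s*(\S+)", re.IGNORECASE),
    "total": re.compile(r"Total\s*:\s*([\d.,]+)", re.IGNORECASE),
}


def extract_fields(text, fields=INVOICE_FIELDS):
    """Return one record for a document; fields with no match are None."""
    record = {}
    for name, pattern in fields.items():
        match = pattern.search(text)
        record[name] = match.group(1) if match else None
    return record


def to_json(records):
    """Serialise a list of records as a JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialise a list of records as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(INVOICE_FIELDS))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Stand-in for text extracted from an uploaded PDF.
sample_text = "Invoice No: INV-001\nDate: 2024-03-05\nTotal: 1,250.00\n"
record = extract_fields(sample_text)
print(to_json([record]))
print(to_csv([record]))
```

Keeping the field definitions in a single mapping means a new document type (contracts, inspection reports) could be supported by adding another mapping rather than new extraction logic, though real documents will likely need more than regexes.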
Phase 2 – Data Cleansing Pipeline
• Rule-based + LLM-assisted deduplication, type casting, currency/date normalisation.
• Interactive cleaning UI with bulk actions & undo.
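
The rule-based half of the Phase 2 pipeline might look something like the sketch below: type casting, date and currency normalisation, then deduplication on a key. The accepted date formats, the currency symbol table, the assumption that "," is a thousands separator, and all function names are illustrative choices rather than decided behaviour. The LLM-assisted part is not shown.

```python
from datetime import datetime
from decimal import Decimal

# Assumed input formats; a real pipeline would make these configurable.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}


def normalise_date(value):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalise_amount(value):
    """Split e.g. '$1,250.00' into (Decimal('1250.00'), 'USD').

    Assumes ',' is a thousands separator and '.' the decimal point.
    """
    value = value.strip()
    currency = None
    for symbol, code in CURRENCY_SYMBOLS.items():
        if value.startswith(symbol):
            currency = code
            value = value[len(symbol):]
            break
    return Decimal(value.replace(",", "").strip()), currency


def clean_record(raw):
    """Cast and normalise one raw (all-string) record."""
    amount, currency = normalise_amount(raw["total"])
    return {
        "invoice_number": raw["invoice_number"].strip().upper(),
        "invoice_date": normalise_date(raw["invoice_date"]),
        "total": amount,
        "currency": currency,
    }


def deduplicate(records, key="invoice_number"):
    """Keep the first record seen for each key value."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


raw_rows = [
    {"invoice_number": "inv-001", "invoice_date": "05/03/2024", "total": "$1,250.00"},
    {"invoice_number": "INV-001 ", "invoice_date": "2024-03-05", "total": "$1,250.00"},
    {"invoice_number": "INV-002", "invoice_date": "07.03.2024", "total": "€80.50"},
]
cleaned = deduplicate([clean_record(row) for row in raw_rows])
print(cleaned)
```

Normalising before deduplicating matters here: the first two rows only collapse into one once case and whitespace differences in the invoice number have been removed.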
Phase 3 – Intelligent Analytics
• Merge datasets across multiple files; compute KPIs and detect anomalies.
• Natural-language querying (chat over data).
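
One possible shape for the merge, KPI, and anomaly-detection steps of Phase 3 is sketched below. The roadmap does not name a detection method; the modified z-score based on the median absolute deviation (MAD) used here is an assumption chosen because it stays robust when the outlier itself is in the sample. The record layout, file names, and function names are hypothetical. Natural-language querying is not covered.

```python
from statistics import median


def merge_datasets(*datasets):
    """Combine (source_name, rows) pairs into one list, tagging each row with its source."""
    merged = []
    for source, rows in datasets:
        for row in rows:
            merged.append({**row, "source": source})
    return merged


def total_by_vendor(rows):
    """Example KPI: total amount per vendor."""
    totals = {}
    for row in rows:
        totals[row["vendor"]] = totals.get(row["vendor"], 0.0) + row["total"]
    return totals


def flag_anomalies(rows, threshold=3.5):
    """Return rows whose modified z-score (MAD-based) exceeds the threshold."""
    values = [row["total"] for row in rows]
    med = median(values)
    mad = median([abs(value - med) for value in values])
    if mad == 0:
        return []  # no spread to measure against
    return [
        row for row in rows
        if 0.6745 * abs(row["total"] - med) / mad > threshold
    ]


# Hypothetical cleaned outputs from two uploaded files.
march = [
    {"vendor": "Acme", "total": 100.0},
    {"vendor": "Bolt", "total": 110.0},
    {"vendor": "Acme", "total": 95.0},
]
april = [
    {"vendor": "Acme", "total": 105.0},
    {"vendor": "Bolt", "total": 5000.0},
]

merged = merge_datasets(("march.pdf", march), ("april.pdf", april))
totals = total_by_vendor(merged)
anomalies = flag_anomalies(merged)
print(totals)
print(anomalies)
```

Tagging each row with its source file keeps a flagged value traceable back to the PDF it came from, which would matter for any review workflow built on top.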