unstract
LLM-Driven Extraction of Unstructured Data — Built for API Deployments & ETL Pipeline Workflows
About unstract
Unstract uses LLMs to extract structured JSON from documents — PDFs, images, scans, you name it. Define what you want to extract using natural language prompts, and deploy as an API or ETL pipeline.
Back up the encryption key: copy the value of ENCRYPTION_KEY from backend/.env or platform-service/.env to a secure location.
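As a rough illustration of that backup step, the sketch below reads ENCRYPTION_KEY from either of the two .env files named above. It assumes the files use plain KEY=VALUE lines, which is the usual .env convention but is not confirmed here. The helper names `read_env_value` and `find_encryption_key` are hypothetical and are not part of unstract.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: Path, key: str) -> Optional[str]:
    """Return the value assigned to `key` in a KEY=VALUE style file, or None."""
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            # Drop optional surrounding quotes.
            return value.strip().strip("'\"")
    return None


def find_encryption_key(repo_root: Path) -> Optional[str]:
    """Check both .env locations (assumed relative to the checkout root)."""
    for relative in ("backend/.env", "platform-service/.env"):
        candidate = repo_root / relative
        if candidate.is_file():
            value = read_env_value(candidate, "ENCRYPTION_KEY")
            if value:
                return value
    return None
```

Calling `find_encryption_key(Path("."))` from the root of a checkout returns the key if either file defines it, and `None` otherwise. Store the result in a password manager or secrets vault rather than printing it to a shared terminal or log.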
Destinations: Snowflake, Amazon Redshift, Google BigQuery, PostgreSQL, MySQL, MariaDB, SQL Server, Oracle
unstract is an open-source project written primarily in Python, with 6.7k stars on GitHub. It was last updated in July 2026.
unstract vs. the alternatives
| Agent | Stars | Language | License | Pricing |
|---|---|---|---|---|
| unstract | 6.7k | Python | AGPL-3.0 | Open source |
| firecrawl | 143k | TypeScript | AGPL-3.0 | Open source |
| Scrapling | 68k | Python | BSD-3-Clause | Open source |
| TrendRadar | 60k | Python | GPL-3.0 | Open source |
| BettaFish | 42k | Python | GPL-2.0 | Open source |
| khoj | 35k | Python | AGPL-3.0 | Open source |
