documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
- Converts PDFs to images for detailed AI processing.
- Uses OpenAI’s API to extract and organize information.
- Allows users to define extraction schemas for various document formats.
- Designed for flexible deployment on local or cloud environments.
A demo of the documind hosted version will be available soon for you to try out! The hosted version provides a seamless experience with fully managed APIs, so you can skip the setup and start extracting data right away.
For full access to the hosted service, please request access and we’ll get you set up.
Before using documind, ensure the following software dependencies are installed:
- Ghostscript: documind relies on Ghostscript for handling certain PDF operations.
- GraphicsMagick: Required for image processing within document conversions.
Install both on your system before proceeding:
# On macOS
brew install ghostscript graphicsmagick
# On Debian/Ubuntu
sudo apt-get update
sudo apt-get install -y ghostscript graphicsmagick
Ensure Node.js (v18+) and npm are installed on your system.
You can install documind via npm:
npm install documind
documind needs an .env file to store sensitive information like API keys and Supabase configurations.
Create an .env file in your project directory and add the following:
OPENAI_API_KEY=your_openai_api_key
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_key
SUPABASE_BUCKET=your_supabase_bucket_name
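A missing key often shows up only later as an authentication error, so it can help to check the environment at startup. The following is a minimal sketch and not part of documind: `missingEnvVars` is a hypothetical helper, and it assumes that all four variables listed above are required and have already been loaded into `process.env` (for example by a loader such as dotenv).

```javascript
// Hypothetical helper (not part of documind): reports which of the
// variables listed above are absent or blank in a given environment object.
const REQUIRED_VARS = [
  "OPENAI_API_KEY",
  "SUPABASE_URL",
  "SUPABASE_KEY",
  "SUPABASE_BUCKET",
];

function missingEnvVars(env) {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Usage: check the real environment once at startup.
const missing = missingEnvVars(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```

Taking the environment as an argument, rather than reading `process.env` inside the function, keeps the check easy to exercise with a plain object.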
First, import documind and define your schema. The schema describes what information documind should look for in each document. Here’s a quick setup to get started.
The schema is an array of objects where each object defines:
- name: Field name to extract.
- type: Data type (e.g., "string", "number", "array", "object").
- description: Description of the field.
- children (optional): For arrays and objects, define nested fields.
Example schema for a bank statement:
const schema = [
  {
    name: "accountNumber",
    type: "string",
    description: "The account number of the bank statement."
  },
  {
    name: "openingBalance",
    type: "number",
    description: "The opening balance of the account."
  },
  {
    name: "transactions",
    type: "array",
    description: "List of transactions in the account.",
    children: [
      {
        name: "date",
        type: "string",
        description: "Transaction date."
      },
      {
        name: "creditAmount",
        type: "number",
        description: "Credit amount of the transaction."
      },
      {
        name: "debitAmount",
        type: "number",
        description: "Debit amount of the transaction."
      },
      {
        name: "description",
        type: "string",
        description: "Transaction description."
      }
    ]
  },
  {
    name: "closingBalance",
    type: "number",
    description: "The closing balance of the account."
  }
];
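Since a malformed schema would likely lead to a confusing extraction result, a quick structural check before use can be worthwhile. The sketch below is not part of documind: `validateSchema` is a hypothetical helper, and it assumes that the four types named above are the only accepted ones and that name, type, and description are all mandatory, none of which the source confirms.

```javascript
// Hypothetical validator (not part of documind): checks the field rules
// described above and returns a list of human-readable problems.
const ALLOWED_TYPES = ["string", "number", "array", "object"];

function validateSchema(fields, path = "schema") {
  if (!Array.isArray(fields)) {
    return [`${path} must be an array of field objects`];
  }
  const problems = [];
  fields.forEach((field, index) => {
    const where = `${path}[${index}]`;
    if (field === null || typeof field !== "object") {
      problems.push(`${where} must be an object`);
      return;
    }
    if (typeof field.name !== "string" || field.name === "") {
      problems.push(`${where}.name must be a non-empty string`);
    }
    if (!ALLOWED_TYPES.includes(field.type)) {
      problems.push(`${where}.type must be one of ${ALLOWED_TYPES.join(", ")}`);
    }
    if (typeof field.description !== "string") {
      problems.push(`${where}.description must be a string`);
    }
    if (field.children !== undefined) {
      if (field.type !== "array" && field.type !== "object") {
        problems.push(`${where}.children is only meaningful for arrays and objects`);
      }
      // Recurse so nested fields are held to the same rules.
      problems.push(...validateSchema(field.children, `${where}.children`));
    }
  });
  return problems;
}
```

An empty array from `validateSchema(schema)` means no structural problems were found; each entry otherwise names the offending field by its position, such as `schema[2].children[0].name`.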
Use documind to process a PDF by passing the file URL and the schema.
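As a rough illustration of that call, the sketch below wraps the extraction step in a small function. It rests on assumptions: the extraction function is taken to be named `extract` and to accept an object with `file` and `schema` properties, and the result is taken to carry the `success` and `data` fields seen in the sample output. Check these against the installed version of the package. The function is passed in as a parameter so the assumed call shape stays visible and the wrapper can be tried with a stand-in.

```javascript
// Sketch only. `extract` is passed in rather than imported so that the
// assumed call shape, extract({ file, schema }), stays explicit; with the
// package installed it would presumably come from the documind module.
async function runExtraction(extract, fileUrl, schema) {
  const result = await extract({ file: fileUrl, schema });
  if (!result || result.success !== true) {
    throw new Error(`Extraction failed for ${fileUrl}`);
  }
  // Return only the extracted fields, keyed by the names in the schema.
  return result.data;
}
```

A call might then look like `runExtraction(extract, "https://example.com/bank_statement.pdf", schema)`, where the URL is a placeholder rather than a real file.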
Here’s an example of what the extracted result might look like:
{
  "success": true,
  "pages": 1,
  "data": {
    "accountNumber": "100002345",
    "openingBalance": 3200,
    "transactions": [
      {
        "date": "2021-05-12",
        "creditAmount": null,
        "debitAmount": 100,
        "description": "transfer to Tom"
      },
      {
        "date": "2021-05-12",
        "creditAmount": 50,
        "debitAmount": null,
        "description": "For lunch the other day"
      },
      {
        "date": "2021-05-13",
        "creditAmount": 20,
        "debitAmount": null,
        "description": "Refund for voucher"
      },
      {
        "date": "2021-05-13",
        "creditAmount": null,
        "debitAmount": 750,
        "description": "May's rent"
      }
    ],
    "closingBalance": 2420
  },
  "fileName": "bank_statement.pdf"
}
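Because the values are extracted by a model, a cheap arithmetic check on the result can catch misread figures. The sketch below is not part of documind: `reconcile` is a hypothetical helper that assumes the fields are named `openingBalance`, `creditAmount`, `debitAmount`, and `closingBalance` as in the bank statement schema, and that a `null` amount means zero.

```javascript
// Hypothetical sanity check (not part of documind): verifies that
// opening balance + credits - debits equals the closing balance.
function reconcile(data, tolerance = 0.005) {
  const net = data.transactions.reduce(
    (sum, tx) => sum + (tx.creditAmount ?? 0) - (tx.debitAmount ?? 0),
    0
  );
  const expectedClosing = data.openingBalance + net;
  return {
    expectedClosing,
    matches: Math.abs(expectedClosing - data.closingBalance) <= tolerance,
  };
}

// Figures taken from the sample result above.
const sample = {
  openingBalance: 3200,
  transactions: [
    { creditAmount: null, debitAmount: 100 },
    { creditAmount: 50, debitAmount: null },
    { creditAmount: 20, debitAmount: null },
    { creditAmount: null, debitAmount: 750 },
  ],
  closingBalance: 2420,
};

console.log(reconcile(sample)); // { expectedClosing: 2420, matches: true }
```

The tolerance is there for statements with fractional amounts, where floating-point sums may differ from the printed total by a tiny margin.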
Contributions are welcome! Please submit a pull request with any improvements or features.
This project is licensed under the AGPL v3.0 License.