Extractor

This document describes how to use the DIMARC API to interact with the /extractor and /extractor/async endpoints. These endpoints allow you to extract structured data from documents of any type according to models defined in the Dimarc interface.

POST /v2/extractor/<agent_id>
POST /v2/extractor/async/<agent_id>

Prerequisites

  • An active DIMARC account.
  • Administrator status in your organization.
  • A configured “Extractor” type agent.
  • Your authentication token, sent in the x-api-key header (see Retrieving Your Authentication Token).

Retrieving an agent’s ID

Each agent has a unique ID. To retrieve the ID of the Extractor agent, go to your dashboard:

  1. Click your profile icon in the top right corner, then open the Organization > API section (or click here).
  2. In the “References of your agents” section, copy the ID of the Extractor agent you want to use.

Extraction Modes

The Extractor API offers two operating modes:

  1. Synchronous: The request is processed immediately and the response is returned over the same HTTP connection.
  2. Asynchronous: The request is queued and the results are sent to a webhook you specify once the extraction is complete.

Where to configure extraction models

Extraction models are configured in the Dimarc interface. To add a new model, open your dashboard at https://app.dimarc.ai, go to your Extractor agent's configuration, and add a model by defining the fields to extract.

Synchronous Extraction

Synchronous extraction is best suited to quick processing, or to cases where your application waits for the result before continuing.

Synchronous Request

curl --location 'https://api.dimarc.ai/v2/extractor/<agent_id>' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <your_api_key>' \
--data '{
"filename": "L100.pdf",
"file": "<base64_encoded_file>"
}'
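The same request can be sketched in Python with only the standard library. The helper name `build_sync_request` and the constant `API_BASE` are illustrative and not part of the DIMARC API; the URL, headers, and body fields simply mirror the curl command above.

```python
import base64
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://api.dimarc.ai/v2/extractor"


def build_sync_request(agent_id: str, api_key: str,
                       filename: str, content: bytes) -> urllib.request.Request:
    """Build (but do not send) a synchronous extraction request."""
    payload = {
        "filename": filename,
        # The API expects the file content as a base64 string.
        "file": base64.b64encode(content).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{API_BASE}/{agent_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen(...)` and read the response body. Serializing with `json.dumps` also avoids hand-written JSON mistakes such as a trailing comma.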

Synchronous Request Parameters

Parameter   Type     Description
filename    string   Original filename (with extension)
file        string   File content encoded in base64

Synchronous Response Format

{
  "status": "success",
  "data": { .... }
}
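A minimal sketch of reading this response in Python follows. The `ExtractionError` class and `parse_sync_response` function are hypothetical helpers. Only the success shape is documented here, so treating any other `status` value as a failure is an assumption.

```python
import json


class ExtractionError(Exception):
    """Raised when a response does not report success (hypothetical helper)."""


def parse_sync_response(body: bytes) -> dict:
    """Return the extracted fields from a synchronous response body."""
    doc = json.loads(body)
    # Assumption: anything other than "success" is treated as a failure,
    # because the documentation only shows the success shape.
    if doc.get("status") != "success":
        raise ExtractionError(f"unexpected status: {doc.get('status')!r}")
    return doc.get("data", {})
```

The keys inside `data` depend on the fields defined in the extraction model, so callers should not assume a fixed set of keys.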

Asynchronous Extraction

Asynchronous extraction is recommended for:

  • Large documents
  • Complex extractions requiring more time
  • Systems that cannot wait for an immediate response

Asynchronous Request

curl --location 'https://api.dimarc.ai/v2/extractor/async/<agent_id>' \
--header 'Content-Type: application/json' \
--header 'x-api-key: <your_api_key>' \
--data '{
"filename": "L100.pdf",
"file": "<base64_encoded_file>",
"callback": "https://your-callback-endpoint.com/webhook"
}'

Asynchronous Request Parameters

Parameter   Type     Description
filename    string   Original filename (with extension)
file        string   File content encoded in base64
callback    string   URL of the webhook that will receive the results
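As an illustration, the three parameters can be assembled in Python as shown below. The function name `build_async_payload` is hypothetical, and the URL check is a local sanity check only: the documentation does not state which callback schemes the API accepts.

```python
import base64
import json
from urllib.parse import urlparse


def build_async_payload(filename: str, content: bytes, callback: str) -> str:
    """Return the JSON body for an asynchronous extraction request."""
    parsed = urlparse(callback)
    # Local sanity check (an assumption, not a documented API rule):
    # require an absolute http(s) URL before submitting the job.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"callback must be an absolute http(s) URL: {callback!r}")
    return json.dumps({
        "filename": filename,
        "file": base64.b64encode(content).decode("ascii"),
        "callback": callback,
    })
```

Rejecting a malformed callback before submission avoids queuing a job whose results could never be delivered.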

Immediate Response (asynchronous)

{
  "message": "Extraction started",
  "data": {
    "extraction_id": "uid_extraction"
  }
}
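The identifier in this acknowledgement is what lets a client match the later webhook delivery to the job it submitted. A small sketch of extracting it follows; the function name `extraction_id_from_ack` is illustrative.

```python
import json


def extraction_id_from_ack(body: bytes) -> str:
    """Return data.extraction_id from the immediate asynchronous response."""
    doc = json.loads(body)
    try:
        return doc["data"]["extraction_id"]
    except (KeyError, TypeError) as exc:
        # KeyError: a key is missing. TypeError: "data" is not an object.
        raise ValueError("response does not contain data.extraction_id") from exc
```

Storing the returned ID alongside the submitted filename makes it possible to tell which document a later result belongs to.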

Webhook Response Format

{
  "extract_id": "uid_extraction",
  "data": { .... }
}
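A receiving endpoint could be sketched with Python's built-in `http.server`, as below. Several points are assumptions rather than documented behavior: that the webhook arrives as a POST with a JSON body, and that a 204 reply is acceptable to the sender. The in-memory `RESULTS` store and the names `record_result`, `WebhookHandler`, and `serve` are illustrative. Note that the sample payload uses the key `extract_id`, whereas the immediate response uses `extraction_id`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RESULTS: dict = {}  # in-memory store, for illustration only


def record_result(body: bytes) -> str:
    """Store a webhook payload keyed by its extract_id and return that ID."""
    doc = json.loads(body)
    extract_id = doc["extract_id"]  # key name as shown in the sample payload
    RESULTS[extract_id] = doc.get("data", {})
    return extract_id


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Assumption: results are delivered as a POST with a JSON body.
        length = int(self.headers.get("Content-Length", 0))
        try:
            record_result(self.rfile.read(length))
        except (ValueError, KeyError):
            self.send_response(400)  # malformed JSON or missing extract_id
        else:
            self.send_response(204)  # assumption: any 2xx acknowledges receipt
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. A production receiver would persist results and restrict who may call the endpoint.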

Structured Data Extraction

For complex data like lists (tables, series of items, etc.), use the “list” format and define sub-elements in the “items” array:

{
  "name": "products",
  "format": "list",
  "description": "List of products in the invoice",
  "items": [
    {
      "name": "product_name",
      "format": "text",
      "description": "Product name",
      "items": []
    },
    {
      "name": "quantity",
      "format": "text",
      "description": "Ordered quantity",
      "items": []
    },
    {
      "name": "unit_price",
      "format": "text",
      "description": "Unit price before tax",
      "items": []
    }
  ]
}
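When many such definitions have to be prepared, a small builder keeps the nesting consistent. The sketch below reproduces the structure above. The `field` helper is hypothetical, and only the “text” and “list” formats shown in this document are used.

```python
import json


def field(name: str, description: str, fmt: str = "text", items=None) -> dict:
    """Return one field definition in the shape shown above."""
    return {
        "name": name,
        "format": fmt,
        "description": description,
        # Every field carries an "items" array, empty for plain text fields.
        "items": list(items or []),
    }


products = field(
    "products",
    "List of products in the invoice",
    fmt="list",
    items=[
        field("product_name", "Product name"),
        field("quantity", "Ordered quantity"),
        field("unit_price", "Unit price before tax"),
    ],
)

print(json.dumps(products, indent=2))
```

Printing the result yields the same JSON as the example, which can then be compared against the model defined in the Dimarc interface.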

Limitations and considerations

  • Maximum file size: 36 MB
  • Supported formats: PDF, PNG, JPG, JPEG, DOCX, XLSX
  • Average extraction time: 5 to 10 minutes for standard documents that are not image-heavy
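These limits can be checked on the client before uploading. The sketch below is a local pre-flight check, not something the API provides. It assumes that “36 MB” means 36 × 1024 × 1024 bytes and that the limit applies to the raw file rather than to its base64 encoding, which is roughly one third larger. Neither point is stated in this document.

```python
from pathlib import Path

# Assumption: "36 MB" is read as 36 MiB of raw (pre-base64) file content.
MAX_BYTES = 36 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".docx", ".xlsx"}


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (limit {MAX_BYTES})")
    return problems
```

For example, `check_upload("L100.pdf", 1024)` returns an empty list, while a `.txt` file or an oversized PDF yields one message each.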

Support and assistance

For any questions regarding the Extractor API, contact our support team at support@dimarc.fr