Parsing Documents

Getting Started

vlm-1 can extract structured markdown from long documents and reports. Here’s a rough breakdown of the steps involved in parsing a document:

Upload Document

Use the /v1/files endpoint to upload the document you want to parse.

from pathlib import Path

from vlmrun.client import VLMRun
from vlmrun.client.types import FileResponse

# Initialize the client
client = VLMRun(api_key="<your-api-key>")

# Upload the file
response: FileResponse = client.files.upload(
    file=Path("<path/to/test.pdf>")
)
print(f"Uploaded file:\n {response.model_dump()}")

You should see a response like this:

Uploaded file:
{
  'id': '1e76cfd9-ba99-49b2-a8fe-2c8efaad2649',
  'filename': 'file-20240815-7UvOUQ-earnings_single_table.pdf',
  'bytes': 62430,
  'purpose': 'assistants',
  'created_at': '2024-08-15T02:22:06.716130',
  'object': 'file'
}

Submit the Document AI Job

Submit the uploaded file (via its file_id) to the /v1/document/generate endpoint to start the document parsing job. For long documents, you should set batch=True to submit the job to a queue for processing.

from vlmrun.client.types import PredictionResponse

# Submit the document for parsing
# Note: In this case, we are using the `document.markdown` domain
# which is optimized for extracting structured markdown from documents
response: PredictionResponse = client.document.generate(
    file=response.id,
    domain="document.markdown",
    batch=True,
)
print(f"Document submitted [id={response.id}]")
print(response.model_dump_json(indent=2))

You should see a response like this:

Document submitted [id=052cf2a8-2b84-45f5-a385-ccac2aae13bb]
{
  "id": "052cf2a8-2b84-45f5-a385-ccac2aae13bb",
  "created_at": "2024-08-15T02:22:09.157788",
  "response": null,
  "status": "pending"
}
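
A batch job starts out as "pending", so the caller has to check back until it reports "completed". The following is a minimal sketch of that polling pattern, not part of the VLM Run SDK: `wait_until_completed` and its `fetch` argument are hypothetical names, and `fetch` stands for any callable you supply that returns the current job as a dict with a `status` key. Only the two status values shown on this page are assumed; anything other than "completed" is treated as not finished yet.

```python
import time
from typing import Any, Callable, Dict


def wait_until_completed(
    fetch: Callable[[], Dict[str, Any]],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll `fetch` until the job dict reports status == "completed".

    `fetch` is a hypothetical callable provided by the caller; it should
    return the latest job state as a dict (for example, a model dump of
    the prediction response).
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if job.get("status") == "completed":
            return job
        # Give up once the time budget is spent.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job not completed after {timeout} seconds")
        # Wait before asking again so the API is not hammered.
        sleep(interval)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.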

Fetch the Results

Use the /v1/document/{request_id} endpoint to fetch the results of the document parsing job. The results of the extraction job will be in JSON format under the response field.

# Fetch the results
# `response` is the PredictionResponse returned by the submit step above
response: PredictionResponse = client.predictions.wait(response.id)
print(f"Document parsing job results:\n {response.model_dump()}")

You should see a response like this:

{
  "id": "052cf2a8-2b84-45f5-a385-ccac2aae13bb",
  "created_at": "2024-08-15T02:22:09.157788",
  "status": "completed",
  "response": {
    "pages": [
      {                             // page 0
        "content": "<Figure id=\"fg-0\"/>\n\n# Fine-tuning\nTechnique\n\n---\n\nFebruary 2024",
        "markdown_content": "<Figure id=\"fg-0\"/>\n\nOpenAI logo\n\n# Fine-tuning\nTechnique\n\n---\n\nFebruary 2024",
        "tables": null,
        "figures": [
          {
            "id": 0,
            "title": null,
            "caption": null,
            "content": "OpenAI logo"
          }
        ]
      },
      {                             // page 1
        "content": "# Overview\n\nFine-tuning involves adjusting the parameters of pre-trained models on a specific dataset or task. This process enhances the model's ability to generate more accurate and relevant responses for the given context by adapting it to the nuances and specific requirements of the task at hand.\n\n**Example use cases**\n-   Generate output in a consistent format\n-   Process input by following specific instructions\n\n## What we'll cover\n\n*   When to fine-tune\n*   Preparing the dataset\n*   Best practices\n*   Hyperparameters\n*   Fine-tuning advances\n*   Resources\n\n---\n3",
        "markdown_content": "# Overview\n\nFine-tuning involves adjusting the parameters of pre-trained models on a specific dataset or task. This process enhances the model's ability to generate more accurate and relevant responses for the given context by adapting it to the nuances and specific requirements of the task at hand.\n\n**Example use cases**\n-   Generate output in a consistent format\n-   Process input by following specific instructions\n\n## What we'll cover\n\n*   When to fine-tune\n*   Preparing the dataset\n*   Best practices\n*   Hyperparameters\n*   Fine-tuning advances\n*   Resources\n\n---\n3",
        "tables": null,
        "figures": null
      },
      ...
    ]
  }
}
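
Since the result is returned page by page, a common next step is to stitch the pages back into one markdown document. Here is a small sketch that assumes the result has already been converted to a plain dict with the shape shown above. The helper name `pages_to_markdown` is hypothetical, and preferring `markdown_content` over `content` is an illustrative choice, not documented behavior.

```python
from typing import Any, Dict, List


def pages_to_markdown(result: Dict[str, Any], separator: str = "\n\n") -> str:
    """Join the per-page markdown of a parsing result into one string."""
    # `response` is null while the job is pending, so fall back to an empty dict.
    response = result.get("response") or {}
    pages: List[Dict[str, Any]] = response.get("pages") or []

    parts: List[str] = []
    for page in pages:
        # Prefer markdown_content; fall back to content if it is missing.
        text = page.get("markdown_content") or page.get("content") or ""
        if text.strip():
            parts.append(text.strip())
    return separator.join(parts)
```

For a completed job, `pages_to_markdown(response.model_dump())` would then give a single string that can be written to a `.md` file.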

To learn more about the document.markdown domain, see the MarkdownPage Schema guide.
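
For downstream code, it can help to load each page into a typed structure. The sketch below is inferred only from the sample response on this page and is not the official MarkdownPage schema: the `Figure` and `Page` classes and the `parse_pages` helper are hypothetical names, and `tables` is kept as raw data because the sample only ever shows it as null.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Figure:
    id: int
    title: Optional[str]
    caption: Optional[str]
    content: Optional[str]


@dataclass
class Page:
    content: str
    markdown_content: str
    tables: Optional[Any] = None  # shape unknown from the sample, kept as-is
    figures: List[Figure] = field(default_factory=list)


def parse_pages(result: Dict[str, Any]) -> List[Page]:
    """Convert the `response.pages` list of a result dict into Page objects."""
    response = result.get("response") or {}
    pages: List[Page] = []
    for raw in response.get("pages") or []:
        figures = [
            Figure(
                id=f["id"],
                title=f.get("title"),
                caption=f.get("caption"),
                content=f.get("content"),
            )
            for f in (raw.get("figures") or [])  # figures may be null
        ]
        pages.append(
            Page(
                content=raw.get("content") or "",
                markdown_content=raw.get("markdown_content") or "",
                tables=raw.get("tables"),
                figures=figures,
            )
        )
    return pages
```

With this in place, listing every figure description in a document is a matter of iterating over `page.figures` for each parsed page.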

Try our Document -> JSON API today

Head over to our Document -> JSON API to start building your own document processing pipeline with VLM Run. Sign up for access on our platform.
