While traditional document processing systems often rely on template-based approaches or simple keyword matching, vlm-1 can intelligently classify documents based on their content, layout, and visual characteristics. This enables robust classification of documents like invoices, bank statements, utility bills, and other document types, even when they come in different formats or layouts.

For example, below is a diagram showing how a document is classified into different types, and how each type can have its own custom post-processing logic.

Classifying Financial Documents

Let’s look at a financial document classification example to see how vlm-1 can be used to automatically categorize different types of documents. In this example, we’ll use vlm-1 to classify documents into categories like invoices, bank statements, utility bills, and other financial documents. This classification can then be used to route documents to the appropriate processing pipeline or storage system.

Example of different types of financial documents that need classification.

Define a custom schema for document classification

In the sections below, we’ll showcase how to use the API for document classification. vlm-1 can automatically classify documents based on their content and visual characteristics, providing both a classification and a rationale for its decision. First, let’s create a custom schema that will be used to classify the documents.

from typing import Literal
from pydantic import BaseModel, Field

class DocumentClassification(BaseModel):
    rationale: str = Field(
        ...,
        description=(
            "A rationale for the classification, based on the content and visual "
            "features of the document. Keep it short and concise, yet detailed "
            "enough to justify the classification."
        ),
    )
    document_type: Literal["invoice", "bank-statement", "utility-bill", "other"] = Field(
        ..., description="The type of document being processed"
    )
    confidence: Literal["hi", "med", "lo"] = Field(
        ...,
        description=(
            "Confidence score for the classification, based on the rationale "
            "provided and the visual features of the document. For ambiguous "
            "documents, the confidence score should be `lo`."
        ),
    )
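
Because the schema constrains document_type and confidence with Literal types, any response outside the allowed vocabulary fails validation. Here's a quick sketch of this behavior using an illustrative payload:

from pydantic import ValidationError

# A well-formed payload validates cleanly
sample = {
    "rationale": "Clear 'INVOICE' header with itemized line items and a total due.",
    "document_type": "invoice",
    "confidence": "hi",
}
doc = DocumentClassification.model_validate(sample)
print(doc.document_type)  # invoice

# An out-of-vocabulary label is rejected by the Literal constraint
try:
    DocumentClassification.model_validate({**sample, "document_type": "receipt"})
except ValidationError as exc:
    print(exc)  # document_type: Input should be 'invoice', 'bank-statement', ...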

Classify documents

Once you have defined your custom schema, you can use vlm-1 to classify documents according to this schema. The classification will be validated against the schema you defined, ensuring that it conforms to the expected structure and types. First, let’s look at an example of how to classify a single document.

from pathlib import Path
from vlmrun.client import VLMRun
from vlmrun.client.types import PredictionResponse, GenerationConfig

# Initialize the client
client = VLMRun(api_key="<VLMRUN_API_KEY>")

# Classify a single document
path = Path("path/to/document.pdf")
prediction: PredictionResponse = client.document.generate(
    file=path,
    domain="document.classification",
    config=GenerationConfig(response_model=DocumentClassification)
)
response_dict = prediction.response.model_dump()
print(response_dict)

Sample Document Classification

Let’s take a look at the sample output for a typical invoice document.

{
  "rationale": "The document contains a clear 'INVOICE' header, itemized list of products/services, and total amount due. The layout matches typical invoice formats with company details at the top and payment terms at the bottom.",
  "document_type": "invoice",
  "confidence": "hi"
}

Let’s break down the output into its respective components:

  • rationale: A detailed explanation of why the model classified the document as an invoice, based on both content and visual features. This allows the developer or user to introspect the classification and make any necessary adjustments downstream.
  • document_type: The predicted document type, in this case an invoice.
  • confidence: A qualitative confidence level of “hi”, indicating strong certainty in the classification based on the clear presence of invoice-specific features. Lower-confidence results can be gated for human review, as sketched below.
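
The confidence field is a natural gate for downstream automation: auto-route high-confidence results and queue the rest for human review. Below is a minimal sketch of this pattern; route_to_pipeline and review_queue are hypothetical stand-ins for your own routing and review infrastructure, and we assume prediction.response is the validated DocumentClassification instance from the single-document example above.

from queue import Queue

review_queue: Queue = Queue()  # hypothetical review queue

def route_to_pipeline(document_type: str, path: Path) -> None:
    # Hypothetical dispatcher: plug in your invoice/statement/bill pipelines here
    print(f"Routing {path} to the {document_type} pipeline")

# `prediction` and `path` come from the single-document example above
result: DocumentClassification = prediction.response
if result.confidence == "hi":
    route_to_pipeline(result.document_type, path)
else:
    # "med"/"lo" confidence: escalate to human review with the model's rationale
    review_queue.put({"path": str(path), "rationale": result.rationale})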

Processing larger document collections with batch=True

Once you have validated the classification for a single document, you can scale this process to classify larger collections of documents. The code example below shows how to process several documents in a directory. The rationale-based approach is particularly useful when dealing with ambiguous documents or when you need to understand why a document was classified in a certain way.

# Same imports as before, plus `time` for polling
import time

# Classify the documents
requests = {}
for path in Path("path/to/documents").glob("*.pdf"):
    prediction: PredictionResponse = client.document.generate(
        file=path,
        domain="document.classification",
        config=GenerationConfig(response_model=DocumentClassification),
        batch=True
    )
    requests[prediction.id] = {
        "path": path,
        "id": prediction.id
    }

# Wait for all predictions to complete
predictions = {}

start_time = time.time()
while time.time() - start_time < 180:
    # fetch the prediction result if it's completed
    for id, request in requests.items():
        # skip predictions that have already been collected
        if id in predictions:
            continue
        # `client.predictions.wait()` will block until the prediction id is completed
        prediction: PredictionResponse = client.predictions.wait(id=id, timeout=10)
        predictions[id] = {**request, "response": prediction.response}

    # wait for 1 second before checking again
    time.sleep(1)

    # break if all predictions are completed
    if len(predictions) == len(requests):
        break

# Get the results
for p in predictions.values():
    print(p)
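
Once the batch completes, it’s straightforward to tally or triage the results by predicted type. Here’s a small sketch, assuming each stored response is a validated DocumentClassification as above:

from collections import Counter

# Tally documents per predicted type and flag low-confidence cases for review
counts: Counter = Counter()
needs_review = []
for p in predictions.values():
    result: DocumentClassification = p["response"]
    counts[result.document_type] += 1
    if result.confidence == "lo":
        needs_review.append(p["path"])

print(counts)        # e.g. Counter({'invoice': 12, 'bank-statement': 5, ...})
print(needs_review)  # ambiguous documents worth a second look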

Fine-tuning Document Classification

This feature is currently only available for our enterprise-tier customers. If you are interested in using this feature, please contact us.

For enterprise use cases where you need to fine-tune the model for custom document types and improved accuracy, our fine-tuning guides walk you through tailoring the model to your performance and scalability needs. This can include fine-tuning on your own document collections, customizing the classification schema, or adding new document types to the classification system. Fine-tuning improves accuracy on your specific document types, and lightweight fine-tuned models optimized for your use case can handle larger document volumes more efficiently. Contact us at support@vlm.run to learn more about how we can help with your fine-tuning needs.

Try our Document -> JSON API today

Head over to our Document -> JSON API to start building your own document processing pipeline with VLM Run. Sign up for access on our platform.