Best Practices
- Keep schemas focused: Define schemas that extract only the information you need.
- Use validation rules: Leverage Pydantic’s validation capabilities to ensure data integrity.
- Create reusable components: Break down complex schemas into smaller, reusable models.
- Document your fields: Use the `Field` class with descriptive titles to improve extraction quality.
- Test with diverse inputs: Validate your schemas against a variety of visual inputs to ensure robustness.
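As a rough illustration of the practices above, the sketch below defines a small, focused schema built from a reusable nested model, with `Field` titles and simple validation constraints. The `Invoice` and `LineItem` models and all of their field names are hypothetical and chosen only for this example; they are not part of any particular API.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class LineItem(BaseModel):
    """Reusable component (hypothetical): a single row of an invoice."""

    description: str = Field(..., title="Item description")
    # ge=1 rejects zero or negative quantities at validation time.
    quantity: int = Field(..., ge=1, title="Quantity purchased")
    unit_price: float = Field(..., ge=0, title="Price per unit")


class Invoice(BaseModel):
    """Focused schema (hypothetical): only the fields we actually need."""

    invoice_number: str = Field(..., min_length=1, title="Invoice number")
    items: List[LineItem] = Field(default_factory=list, title="Line items")


# Nested dictionaries are validated against the reusable LineItem model.
invoice = Invoice(
    invoice_number="INV-001",
    items=[{"description": "Widget", "quantity": 2, "unit_price": 9.5}],
)

# Validation rules catch malformed data instead of passing it downstream.
rejected = False
try:
    LineItem(description="Widget", quantity=0, unit_price=9.5)
except ValidationError:
    rejected = True
```

Keeping `LineItem` separate means it can be reused in other schemas (for example, a receipt model) without duplicating its validation rules.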
Mapping Schemas to Task Primitives
- Classification: Use `Literal` to constrain your field values to a set of possible categories.
- Captioning: Use `str` to extract a textual description of the image. Provide additional details about the desired style and context, along with a rough estimate of the caption length (in number of words).
- Date Parsing: Use `datetime.date` to extract a date from the image, and specify the expected date format (e.g. `YYYY-MM-DD`). One caveat: do not use `date` as the field name in your Pydantic `BaseModel`, since the name can clash with the `date` type. Use `datetime.datetime` instead if you need to extract additional time information (e.g. `YYYY-MM-DD HH:MM:SS`). Otherwise, always use `datetime.date` for date parsing.
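The mapping above could look roughly like the following sketch. The `DocumentSummary` model, its field names, and the category values are hypothetical. Placing the style, length, and format hints in the `Field` description is an assumption about where such details would be supplied.

```python
import datetime
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class DocumentSummary(BaseModel):
    """Hypothetical schema combining the three task primitives."""

    # Classification: Literal restricts the value to a fixed set of categories.
    category: Literal["receipt", "invoice", "other"] = Field(
        ..., description="Type of document shown in the image"
    )
    # Captioning: a plain str, with style and length hints in the description.
    caption: str = Field(
        ..., description="Neutral, factual description of the image, about 20 words"
    )
    # Date parsing: the field is named issue_date rather than date.
    issue_date: datetime.date = Field(
        ..., description="Date printed on the document, formatted as YYYY-MM-DD"
    )
    # Use datetime.datetime only when time information is also needed.
    captured_at: Optional[datetime.datetime] = Field(
        None, description="Timestamp on the document, if present"
    )


summary = DocumentSummary(
    category="receipt",
    caption="A printed grocery store receipt lying on a wooden table.",
    issue_date="2024-03-15",
    captured_at="2024-03-15T10:30:00",
)

# Values outside the Literal set are rejected during validation.
invalid_category_rejected = False
try:
    DocumentSummary(category="poster", caption="A poster.", issue_date="2024-03-15")
except ValidationError:
    invalid_category_rejected = True
```

Pydantic converts the ISO-formatted strings into `datetime.date` and `datetime.datetime` objects, so downstream code can work with typed values rather than raw text.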