Best Practices
Best practices for designing schemas for visual inputs.
Defining a clear and effective schema is crucial for getting the most accurate and useful information from visual inputs. A well-designed schema acts as a precise instruction set, guiding the model to extract exactly what you need in a structured format. Following best practices ensures your schemas are robust, maintainable, and yield high-quality results.
This guide outlines key principles and techniques for crafting robust, maintainable schemas using Pydantic. If you are interested in contributing to this guide (especially around the usage of Zod), please reach out to us on Discord or by email.
Best Practices
- Keep schemas focused: Define schemas that extract only the information you need.
- Use validation rules: Leverage Pydantic’s validation capabilities to ensure data integrity.
- Create reusable components: Break down complex schemas into smaller, reusable models.
- Document your fields: Use the `Field` class with descriptive titles to improve extraction quality.
- Test with diverse inputs: Validate your schemas against a variety of visual inputs to ensure robustness.
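The first four practices can be combined in a single schema. The sketch below assumes Pydantic v2 (`Field`, `field_validator`, `model_validate`). The `Receipt` and `Address` models and their fields are hypothetical, chosen only to illustrate a focused schema, built-in and custom validation, a reusable nested component, and documented fields.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError, field_validator


# Reusable component: a small model that can be nested in other schemas.
class Address(BaseModel):
    street: str = Field(..., title="Street", description="Street name and number.")
    city: str = Field(..., title="City")
    postal_code: str = Field(..., title="Postal Code", min_length=3, max_length=10)


# Focused schema: only the fields we need from a (hypothetical) receipt image.
class Receipt(BaseModel):
    merchant_name: str = Field(
        ..., title="Merchant Name", description="Store name as printed on the receipt."
    )
    merchant_address: Address = Field(..., title="Merchant Address")
    # Built-in validation: the total can never be negative.
    total_amount: float = Field(
        ..., title="Total Amount", ge=0, description="Grand total, including tax."
    )
    line_items: List[str] = Field(default_factory=list, title="Line Items")

    # Custom validation: normalize whitespace and reject empty names.
    @field_validator("merchant_name")
    @classmethod
    def strip_name(cls, value: str) -> str:
        value = value.strip()
        if not value:
            raise ValueError("merchant_name must not be empty")
        return value


payload = {
    "merchant_name": "  Corner Cafe ",
    "merchant_address": {"street": "1 Main St", "city": "Springfield", "postal_code": "12345"},
    "total_amount": 12.5,
    "line_items": ["Coffee", "Bagel"],
}
receipt = Receipt.model_validate(payload)

# A negative total violates the `ge=0` constraint and is rejected.
try:
    Receipt.model_validate({**payload, "total_amount": -5.0})
    negative_total_rejected = False
except ValidationError:
    negative_total_rejected = True
```

Because `Address` is its own model, the same component can be reused in other schemas (for example, a shipping label) without duplicating its fields or constraints.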
Mapping Schemas to Task Primitives
- Classification: Use `Literal` to constrain your field values to a fixed set of possible categories.
- Captioning: Use `str` to extract a textual description of the image. Provide additional details (for example, in the field's description) about the desired style and context, along with a rough estimate of the caption's length (in number of words).
- Date Parsing: Use `datetime.date` to extract a date from the image, and specify the expected date format (e.g. `YYYY-MM-DD`). Avoid naming a field `date` in your Pydantic `BaseModel`: the name shadows the `date` type inside the class body and can cause Pydantic errors, so prefer a more descriptive name such as `issue_date`. Use `datetime.datetime` instead if you also need to extract time information (e.g. `YYYY-MM-DD HH:MM:SS`); otherwise, use `datetime.date` for date parsing.
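The three task primitives above can live side by side in one schema. This is a sketch assuming Pydantic v2; the `ImageAnnotation` model, its category values, and the caption wording are hypothetical. Importing the `datetime` module (rather than `from datetime import date`) and naming the field `issue_date` sidesteps the `date` name clash described above.

```python
import datetime
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class ImageAnnotation(BaseModel):
    # Classification: Literal restricts the value to a fixed set of categories.
    document_type: Literal["invoice", "receipt", "id_card", "other"] = Field(
        ..., description="The type of document shown in the image."
    )
    # Captioning: plain str, with style, context and target length in the description.
    caption: str = Field(
        ...,
        description=(
            "A neutral, factual description of the image in roughly 20-30 words, "
            "mentioning the main objects and the setting."
        ),
    )
    # Date parsing: datetime.date with the expected format spelled out.
    # The field is deliberately not named `date`.
    issue_date: datetime.date = Field(
        ..., description="Date printed on the document, in YYYY-MM-DD format."
    )
    # datetime.datetime only when a time component is also needed.
    captured_at: Optional[datetime.datetime] = Field(
        None, description="Timestamp shown in the image, if any, in YYYY-MM-DD HH:MM:SS format."
    )


annotation = ImageAnnotation.model_validate(
    {
        "document_type": "receipt",
        "caption": "A crumpled paper receipt from a coffee shop lying on a wooden table.",
        "issue_date": "2024-03-15",
        "captured_at": "2024-03-15T14:30:00",
    }
)

# A category outside the Literal set fails validation.
try:
    ImageAnnotation.model_validate(
        {"document_type": "photo", "caption": "A cat.", "issue_date": "2024-03-15"}
    )
    invalid_category_rejected = False
except ValidationError:
    invalid_category_rejected = True
```

Pydantic parses the ISO date string into a `datetime.date` object, so downstream code receives a typed value rather than free-form text.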