Visual web scraping goes beyond traditional text-based scraping by using a page’s visual elements, such as images and layout, to extract meaningful information from websites. VLM Run’s visual AI capabilities are well suited to this task, offering more robust and comprehensive data extraction than traditional LLM-powered web scraping methods.

Visual Scraping of Real Estate Listings

Let’s explore how VLM Run can be used to visually scrape a real estate listing from websites like Zillow. This approach is particularly effective for real estate sites, where much of the valuable information is presented visually through images and layout.

Preview of the Zillow property details page.

Using the Web Generate API

VLM Run provides a /web/generate API endpoint specifically designed for visual web scraping tasks. Here’s how you can use it:

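import json

import requests

# Replace with your own API key.
VLMRUN_API_KEY = "<YOUR_API_KEY>"
API_URL = "https://api.vlm.run/v1/web/generate"

headers = {
    "Authorization": f"Bearer {VLMRUN_API_KEY}",
    "Content-Type": "application/json"
}

# Request body: the model, the extraction domain, the page to scrape, and the mode.
payload = {
    "model": "vlm-1",
    "domain": "web.zillow-property-details",
    "url": "https://www.zillow.com/homedetails/594-S-Mapleton-Dr-Los-Angeles-CA-90024/20524417_zpid/",
    "mode": "fast"
}

# Set a timeout so the call cannot hang indefinitely; 120 seconds is an arbitrary choice.
response = requests.post(API_URL, headers=headers, json=payload, timeout=120)
# Raise on HTTP errors instead of trying to parse an error body as a result.
response.raise_for_status()
result = response.json()

print(json.dumps(result, indent=2))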

Structured Output

The API returns structured data extracted from the visual elements of the webpage. For a Zillow property listing, this might include:

{
  "id": "request_id",
  "created_at": "2024-06-25T10:30:00Z",
  "completed_at": "2024-06-25T10:30:05Z",
  "response":  {
        "zillow_id": "20524417",
        "zillow_listing_url": "https://www.zillow.com/homedetails/594-S-Mapleton-Dr-Los-Angeles-CA-90024/20524417_zpid/",
        "property_type": "Single Family",
        "status": "For Sale",
        "days_on_zillow": 613,
        "address": {
            "street": "594 S Mapleton Dr",
            "city": "Los Angeles",
            "state": "CA",
            "zip_code": "90024"
        },
        "financial_details": {
            "price": 137500000.0,
            "estimated_mortgage": 849058.0,
            "hoa_fee": null,
            "property_tax": null,
            "price_history": [
            {
                "date": "2024-04-12",
                "price": 137500000.0,
                "event": "Price change"
            },
            {
                "date": "2023-01-14",
                "price": 155000000.0,
                "event": "Listed for sale"
            },
            {
                "date": "2022-02-11",
                "price": 165000000.0,
                "event": "Listed for sale"
            },
            {
                "date": "2019-07-02",
                "price": 119750000.0,
                "event": "Sold"
            }
            ]
        },
        "features": {
            "bedrooms": 14.0,
            "bathrooms": 27.0,
            "square_feet": 56500,
            "lot_size": 4.6,
            "year_built": 1990,
            "parking": "Oversized, Gated, Garage - 4+ Car, Tandem, Guest, Driveway",
            "heating": "Central",
            "cooling": "Central Air",
            "appliances": [
            "Freezer",
            "Dishwasher",
            "Range/Oven",
            "Refrigerator"
            ]
        },
        "description": "The Manor - An unparalleled offering, an unrivaled setting, a showplace of the highest caliber. The Manor is undoubtedly one of the finest estates in the World. Majestically sited on 4.68 acres in the heart of Holmby Hills, The Manor offers complete privacy bordering the Los Angeles Country Club. Entirely clad in limestone and comprised of over 56,000sqft, The Manor offers every amenity imaginable. From bowling alleys to beauty salons, rolling lawns to rose gardens, a legendary library to professional screening room: the options are vast and endless. A rare opportunity to acquire one of the most important estates ever created.",
        "schools": [
            {
            "name": "Warner Avenue Elementary School",
            "type": "Elementary",
            "rating": 8,
            "distance": 0.4
            },
            {
            "name": "Emerson Community Charter School",
            "type": "Middle",
            "rating": 6,
            "distance": 1.4
            },
            {
            "name": "University Senior High School Charter",
            "type": "High",
            "rating": 6,
            "distance": 2.7
            }
        ],
        "neighborhood": {
            "walk_score": 23,
            "transit_score": 59,
            "bike_score": 27,
            "nearby_amenities": null
        },
        "zestimate": null,
        "image_urls": [
            "https://photos.zillowstatic.com/fp/843ffbb5e7f29f22b131cdadfc70e180-cc_ft_960.jpg",
            "https://photos.zillowstatic.com/fp/57ccdb7e9daa78a873e8c5ae5a3e37f9-cc_ft_576.jpg",
            "https://photos.zillowstatic.com/fp/fc75ee41b90f1be6da87334039157ef6-cc_ft_576.jpg",
            "https://photos.zillowstatic.com/fp/de5859dc78996aed92c709de2f4f0448-cc_ft_576.jpg",
            "https://photos.zillowstatic.com/fp/332e6d23f734b8ad972dad4ad0bf8fc3-cc_ft_576.jpg"
        ],
        "virtual_tour_url": null,
        "last_updated": "2024-06-25",
        "listing_agent": "Drew Fenton",
        "listing_office": "Carolwood Estates",
        "contact_phone": "310-623-3622",
        "contact_email": null,
        "open_houses": null,
        "neighborhood_features": null,
        "visual_features": {
            "main_image_description": null,
            "key_features": [
                {
                    "name": "quality",
                    "description": "Ensure that the images are high-quality, well-lit, and showcase the property in the best possible way. Potential buyers will be more attracted to listings with clear and well-presented images."
                },
                {
                    "name": "variety",
                    "description": "Include a variety of images that showcase different aspects of the property, such as exterior shots, interior rooms, backyard, and any unique features."
                },
                {
                    "name": "order",
                    "description": "Arrange the images in a logical sequence that tells a story of the property. Start with a compelling exterior shot to grab attention, followed by key interior rooms and features."
                },
                {
                    "name": "highlights",
                    "description": "Highlight any special features or upgrades in the images, such as a renovated kitchen, a spacious backyard, or a stunning view. These can set your listing apart from others."
                },
                {
                    "name": "quantity",
                    "description": "While it’s important to have a variety of images, avoid overwhelming the listing with too many photos. Aim for around 25-30 high-quality images that provide a comprehensive view of the property."
                }
            ]
        }
    },
    "status": "completed"
}
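
Once parsed, the result can be consumed like any other JSON document. The following is a minimal sketch, assuming the response has the shape of the sample output above; the helper name `summarize_listing` and the two derived metrics are hypothetical and only for illustration, and the sample dict is a trimmed copy of the fields shown above.

```python
from typing import Any, Optional


def summarize_listing(result: dict[str, Any]) -> dict[str, Optional[float]]:
    """Derive a few metrics from a result shaped like the sample output."""
    if result.get("status") != "completed":
        raise ValueError(f"unexpected status: {result.get('status')!r}")

    listing = result["response"]
    financials = listing["financial_details"]
    price = financials["price"]
    sqft = listing["features"]["square_feet"]

    # Sort by ISO date so the earliest event comes first, whatever the input order.
    history = sorted(financials["price_history"], key=lambda e: e["date"])
    first_listed = next(
        (e["price"] for e in history if e["event"] == "Listed for sale"), None
    )

    price_per_sqft = round(price / sqft, 2) if price and sqft else None
    change = (
        price - first_listed
        if price is not None and first_listed is not None
        else None
    )
    return {
        "price_per_sqft": price_per_sqft,
        "change_since_first_listing": change,
    }


# Trimmed copy of the sample output, keeping only the fields used here.
sample = {
    "status": "completed",
    "response": {
        "financial_details": {
            "price": 137500000.0,
            "price_history": [
                {"date": "2024-04-12", "price": 137500000.0, "event": "Price change"},
                {"date": "2023-01-14", "price": 155000000.0, "event": "Listed for sale"},
                {"date": "2022-02-11", "price": 165000000.0, "event": "Listed for sale"},
                {"date": "2019-07-02", "price": 119750000.0, "event": "Sold"},
            ],
        },
        "features": {"square_feet": 56500},
    },
}

summary = summarize_listing(sample)
print(summary)
```

Fields such as `hoa_fee` or `zestimate` can be null, as in the sample, so real code should guard every optional field the same way the price and square footage are guarded here.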

Advantages of Visual Web Scraping

  1. Comprehensive Data Extraction: VLM Run can interpret visual elements like floor plans, property images, and layout to extract information that might not be explicitly stated in the text.

  2. Robust to Layout Changes: Unlike traditional scrapers that rely on specific HTML structures, visual scraping is more resilient to website layout changes.

  3. Context-Aware Extraction: The model can understand the context of visual information, leading to more accurate and meaningful data extraction.

  4. Handling Dynamic Content: Visual scraping can capture information from dynamically loaded content or interactive elements that might be challenging for traditional scrapers.

  5. Image Analysis: VLM Run can describe and categorize property images, providing valuable insights not available through text-only scraping.

Ethical Considerations

When using visual web scraping, it’s crucial to:

  1. Respect website terms of service and robots.txt files.
  2. Implement rate limiting to avoid overloading servers.
  3. Use the data responsibly and in compliance with applicable laws and regulations.
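
The first two points can be sketched with the Python standard library alone. This is an illustrative sketch, not part of the VLM Run API: the `is_allowed` helper, the `RateLimiter` class, the `my-scraper` user agent, the example.com URLs, and the robots.txt content are all hypothetical. In practice the robots.txt text would be fetched from the target site, and the interval would be chosen to suit that site.

```python
import time
from typing import Optional
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval_s - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


# Hypothetical robots.txt content, inlined so the example runs offline.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

urls = [
    "https://example.com/listings/1",
    "https://example.com/private/2",
]

limiter = RateLimiter(min_interval_s=0.1)
allowed = []
for url in urls:
    if not is_allowed(EXAMPLE_ROBOTS, "my-scraper", url):
        continue  # Skip pages the site asks crawlers not to fetch.
    limiter.wait()  # Space out requests before each submission.
    allowed.append(url)  # In real use, submit the URL to the scraping API here.

print(allowed)
```

Checking robots.txt before the rate limiter means disallowed URLs never consume a slot in the request budget.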

Conclusion

Visual web scraping with VLM Run offers a powerful way to extract structured data from visually rich websites like Zillow. By leveraging visual AI capabilities, you can obtain more comprehensive and accurate information compared to traditional text-based scraping methods, opening up new possibilities for real estate data analysis and applications.

Get Started with our Web -> JSON API

Head over to our Web -> JSON API to start building your own web processing pipeline with VLM-1. Sign up for access to our API here.