Use this tutorial when you want to turn noisy HTML into clean structured data.

Best fit

  • product listings
  • company or job pages
  • invoice-like HTML
  • scraped web content that needs a typed output
  • model: Schematron
  • output control: structured outputs
  • scale path: batch

Workflow

  1. isolate the most relevant HTML if you can
  2. define the JSON schema you want back
  3. run Schematron against the HTML
  4. validate the response shape
  5. move to batch when the workload grows
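
The workflow above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the official client: `PRODUCT_SCHEMA` and its fields are made up for the example, `call_model` is a hypothetical callable standing in for whatever request you send to Schematron, and `validate_shape` checks only flat object schemas rather than the full JSON Schema specification.

```python
import json
from html import escape
from html.parser import HTMLParser

# Step 2: a hypothetical schema for a product listing (field names are assumptions).
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


class FragmentIsolator(HTMLParser):
    """Step 1: keep only the first element whose id equals target_id.

    Assumes reasonably well-nested markup; comments are dropped.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the target; 0 means outside
        self.done = False   # set once the target element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0 and dict(attrs).get("id") != self.target_id:
            return
        self.parts.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.done or self.depth == 0 or tag in VOID_TAGS:
            return
        self.parts.append(f"</{tag}>")
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        # The parser hands over unescaped text, so re-escape it on output.
        if self.depth > 0 and not self.done:
            self.parts.append(escape(data, quote=False))


def isolate(html, target_id):
    """Return the markup of the target element, or the full page if not found."""
    parser = FragmentIsolator(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) or html


def validate_shape(record, schema):
    """Step 4: return a list of problems; an empty list means the shape is valid."""
    if not isinstance(record, dict):
        return ["response is not a JSON object"]
    problems = [f"missing required field: {key}"
                for key in schema.get("required", []) if key not in record]
    for key, rule in schema["properties"].items():
        if key not in record:
            continue
        value = record[key]
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        is_bool_as_number = rule["type"] == "number" and isinstance(value, bool)
        if not isinstance(value, JSON_TYPES[rule["type"]]) or is_bool_as_number:
            problems.append(f"{key}: expected {rule['type']}")
    return problems


def extract(html, schema, target_id, call_model):
    """Run steps 1, 3, and 4; call_model is a hypothetical stand-in for the model request."""
    fragment = isolate(html, target_id)
    raw = call_model(fragment, schema)  # step 3: must return a JSON string
    record = json.loads(raw)
    problems = validate_shape(record, schema)
    if problems:
        raise ValueError("; ".join(problems))
    return record
```

Passing the model call in as a function keeps the isolation and validation steps testable without network access, and the same `extract` function can later be mapped over many pages when the workload moves to batch.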