[Audio] Hi everyone! I'm presenting Data Forge — an AI-powered synthetic data generator for Databricks. Let me show you how we're generating test data using just natural language.
[Audio] Data Forge takes a source table, understands your requirements in plain English, and generates realistic synthetic data that matches your schema. It works across SQL Editor, Notebooks, and Workflows — and includes a global toggle to switch between production and synthetic data for stress testing.
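[Code sketch] A minimal illustration of how a global production-vs-synthetic toggle could be wired up in a notebook. This is a hedged sketch, not the actual Data Forge implementation; the conf key and table names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical conf key: flip it once to point all downstream reads at synthetic data.
USE_SYNTHETIC = spark.conf.get("dataforge.useSynthetic", "false") == "true"

def resolve_table(prod_table: str, synthetic_table: str) -> str:
    """Return the synthetic table when the global toggle is on."""
    return synthetic_table if USE_SYNTHETIC else prod_table

# Example: stress-test against synthetic tickets without changing any query code.
tickets = spark.table(
    resolve_table("main.support.tickets", "main.sandbox.tickets_synth")
)
```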
[Audio] Under the hood, we use foundation models like Claude or GPT-4, with adaptive batching and automatic retry logic. The function integrates with Unity Catalog for full metadata awareness.
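[Code sketch] Roughly what adaptive batching with automatic retries can look like. This is a hedged sketch under assumed names: call_model stands in for the foundation-model call, and the batch sizes and retry counts are placeholders rather than the real Data Forge internals.

```python
import time

def generate_in_batches(records, call_model, batch_size=50, max_retries=3):
    """Adaptive batching with exponential-backoff retries (hypothetical sketch)."""
    results, i = [], 0
    while i < len(records):
        batch = records[i : i + batch_size]
        for attempt in range(max_retries):
            try:
                results.extend(call_model(batch))  # foundation-model call (e.g. Claude, GPT-4)
                i += len(batch)
                break
            except Exception:
                time.sleep(2 ** attempt)           # back off before retrying the same batch
        else:
            # Every retry failed: shrink the batch and try again, or give up at size 1.
            if batch_size == 1:
                raise RuntimeError("model call failed repeatedly for a single record")
            batch_size = max(1, batch_size // 2)
    return results
```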
[Audio] This slide shows the architecture: Unity Catalog provides the table metadata, the user's request is processed through a Databricks Job, and the generated data is written to the specified target table.
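[Code sketch] One hypothetical way the generation job could be triggered with the Databricks SDK; the job ID and parameter names below are placeholders, not the demo's actual job definition.

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Kick off the (hypothetical) generation job: Unity Catalog supplies the schema
# of the source table, and the job writes the result to the target table.
run = w.jobs.run_now(
    job_id=123,  # placeholder job ID
    notebook_params={
        "requirements": "customer support conversations about Databricks AI features",
        "source_table": "main.support.tickets",
        "num_rows": "500",
        "target_table": "main.sandbox.tickets_synth",
    },
)
```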
[Audio] Four simple parameters: my requirements in plain English, the source table, the number of rows, and the output location. I'm asking for customer support conversations about Databricks AI features. Look at the output: realistic support tickets about MLflow errors, Model Serving latency, and AI/BI Genie issues. Each record is unique and contextually accurate.
[Audio] We've also embedded this directly into Unity Catalog. From any table, click to generate synthetic data, specify your requirements, and trigger a job, all without leaving the catalog.
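[Code sketch] Assuming a Python wrapper around the function, the four-parameter call from the demo might look like the following; the package, function, and table names are hypothetical.

```python
from dataforge import generate_synthetic_data  # hypothetical package and function name

synthetic_df = generate_synthetic_data(
    requirements=(
        "Customer support conversations about Databricks AI features: "
        "MLflow errors, Model Serving latency, AI/BI Genie issues"
    ),
    source_table="main.support.tickets",        # table whose schema the data should match
    num_rows=500,                               # how many synthetic rows to generate
    output_table="main.sandbox.tickets_synth",  # where the generated data is written
)
```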
[Audio] The job completes in about 3 minutes with full lineage tracking. Your synthetic data is governed and production-ready.
[Audio] Thank you!