[Audio] Hi everyone. Data Forge: Synthetic Data Generators.
[Audio] I'm excited to introduce Data Forge, a tool that lets our customers use synthetic data to rapidly generate realistic datasets and accelerate unit testing, stress testing, scalability testing, and innovation. Data Forge is available in two forms: as a built-in Unity Catalog synthetic data feature and as a scalable AI function. It leverages Unity Catalog metadata, rules, and statistical data to generate high-fidelity datasets. It works across the SQL Editor, Notebooks, Jobs, and Workflows, and includes a Global Synthetic Mode that lets customers run synthetic data alongside production across all tables, scaling workloads 2×, 5×, or 10× to stay future-ready.
[Audio] This architecture slide shows how Data Forge uses Unity Catalog (UC) to understand the table metadata, processes the user request through Databricks Jobs, and returns the synthetic data, all as a built-in feature.
[Audio] Additionally, we are exposing it as a scalable AI function that leverages foundation models like Claude or GPT-4, with adaptive batching and automatic retry logic.
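[Editor's sketch] The talk does not show how adaptive batching and automatic retries are implemented. The Python below is a minimal illustration of one common way to combine them, not Data Forge's actual code: the batch is halved after a failed call, doubled after a successful one, and each retry waits with exponential backoff. Every name here (`generate_in_batches`, `fake_model`, the limit of 8 rows per call) is hypothetical and stands in for a real foundation-model call.

```python
import time
from typing import Callable, Dict, List

Row = Dict[str, str]


def generate_in_batches(
    generate_fn: Callable[[int], List[Row]],
    total_rows: int,
    initial_batch: int = 32,
    max_batch: int = 64,
    max_retries: int = 3,
    base_delay_s: float = 0.0,
) -> List[Row]:
    """Collect total_rows rows, shrinking the batch on failure and growing it on success."""
    rows: List[Row] = []
    batch = initial_batch
    while len(rows) < total_rows:
        size = min(batch, total_rows - len(rows))
        for attempt in range(max_retries + 1):
            try:
                new_rows = generate_fn(size)[:size]
                if not new_rows:
                    raise RuntimeError("model returned an empty batch")
                rows.extend(new_rows)
                # Success: try a larger batch on the next call.
                batch = min(max_batch, size * 2)
                break
            except RuntimeError:
                if attempt == max_retries:
                    raise  # retries exhausted, surface the error
                # Back off exponentially, then retry with half the batch.
                time.sleep(base_delay_s * (2 ** attempt))
                size = max(1, size // 2)
    return rows


def fake_model(n: int) -> List[Row]:
    """Hypothetical stand-in for a model call that rejects batches over 8 rows."""
    if n > 8:
        raise RuntimeError("batch too large")
    return [{"id": str(i)} for i in range(n)]


rows = generate_in_batches(fake_model, total_rows=20)
```

In this sketch the caller only supplies a function that returns rows for a requested batch size, so the same loop could wrap any model endpoint. A production version would likely distinguish rate-limit errors from malformed output and set a nonzero `base_delay_s`.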
[Audio] Here's the AI function in action. It takes four simple parameters: my requirements in plain English, the source table, the number of rows, and the output location. I'm asking for customer support conversations about Databricks AI features. Look at the output: realistic support tickets about MLflow errors, Model Serving latency, and AI/BI Genie issues. Each record is unique and contextually accurate.
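[Editor's sketch] The demo names four inputs but the transcript does not show the function's signature. As a rough illustration only, the Python below models those four inputs as a request object and turns them into a prompt. The class name, field names, `build_prompt`, and the table and column names are all hypothetical and are not the Data Forge API. The only fixed fact assumed is that Unity Catalog tables use a three-level `catalog.schema.table` name.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SyntheticDataRequest:
    """Hypothetical container for the four inputs named in the demo."""
    requirements: str      # plain-English description of the data wanted
    source_table: str      # table whose schema guides generation
    num_rows: int          # number of synthetic rows to produce
    output_location: str   # where the generated rows are written

    def __post_init__(self) -> None:
        if self.num_rows <= 0:
            raise ValueError("num_rows must be positive")
        # Unity Catalog uses three-level names: catalog.schema.table
        if len(self.source_table.split(".")) != 3:
            raise ValueError("source_table must look like catalog.schema.table")


def build_prompt(request: SyntheticDataRequest, columns: List[str]) -> str:
    """Combine the request and the table's column names into one model prompt."""
    column_list = ", ".join(columns)
    return (
        f"Generate {request.num_rows} synthetic rows modeled on table "
        f"{request.source_table} with columns: {column_list}.\n"
        f"Requirements: {request.requirements}"
    )


# Example values are made up for illustration.
request = SyntheticDataRequest(
    requirements="Customer support conversations about Databricks AI features",
    source_table="main.support.tickets",
    num_rows=100,
    output_location="main.support.tickets_synthetic",
)
prompt = build_prompt(request, ["ticket_id", "product_area", "message"])
```

Validating the inputs before any model call keeps obviously bad requests (a zero row count, an unqualified table name) from consuming model capacity.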
[Audio] We've also embedded this directly into Unity Catalog through a Synthetic Data button. From any table, click to generate synthetic data, specify your requirements, and trigger a job, all without leaving the catalog.
[Audio] Generate New Data Option. UC Synthetic Data Generator UI (Dropdown - Generate New Data).
[Audio] Submit Form. UC Synthetic Data Generator UI (Kick Off Data Generation).
[Audio] Job Triggered. Synthetic Data Generator from UC UI.
[Audio] Job Generation Insights. UC Synthetic Data Generator UI (Backend Job).
[Audio] Finally, the results are back. Databricks' Lakehouse-native Data Forge lets you generate realistic, PII-safe synthetic data directly in your catalog. It's fast, governed, and cost-efficient, supporting unit tests, QA, load testing, schema validation, issue reproduction, and cross-team collaboration. You can scale synthetic workloads 2×, 5×, or 10× alongside production with full observability.
[Audio] That's Data Forge: realistic test data, where your data lives. Thank you!