news · Databricks · September 21, 2021

Productionizing Unstructured Data for AI and Analytics

Description

A large Delta Lake frequently includes a mix of structured and unstructured data. Data teams use Apache Spark™ to analyze structured data, but often struggle to apply the same analysis to unstructured, unlabeled data (e.g., images, video). Teams are forced to use expensive, manual processes to transform unstructured data into something more useful: they pay a third party to label their data, buy a labeled dataset, or narrow the scope of their project to leverage public datasets. If data teams had faster and more cost-effective ways to convert unstructured data into structured data, they could support more advanced use cases built around their companies’ unique, unstructured datasets. In this talk, we demonstrate how teams can easily prepare unstructured data for AI and analytics in Databricks. We leverage the LabelSpark library (a connector between Databricks and Labelbox) to connect an unstructured dataset to Labelbox, programmatically set up an ontology for labeling, and return the labeled dataset in a Spark DataFrame. Labeling can be done by humans, AI models in Databricks, or a combination of both. We will also show a model-assisted labeling workflow that allows huma

Description from YouTube. Full content on the video page.
