News · Databricks · August 30, 2021

How Databricks Leverages Auto Loader to Ingest Millions of Files an Hour

Description

Continuously and incrementally ingesting data as it arrives in cloud storage has become a common workflow in our customers’ ETL pipelines. However, managing this workflow is rife with challenges, such as scalable and efficient file discovery, schema inference and evolution, and fault tolerance with exactly-once guarantees. Auto Loader is a new Structured Streaming source in Databricks, built as our all-in-one solution to these challenges. In this talk, we’ll discuss how Auto Loader:

- Discovers files efficiently through file notifications or incremental file listing
- Scales to handling billions of files as metadata while still providing exactly-once processing guarantees
- Infers the schema of the data and detects schema drift over time
- Evolves the schema of the data being processed
- Is used within Databricks to efficiently ingest the millions of files that are uploaded every hour
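The capabilities listed in the description correspond to reader options on the Auto Loader source, which is selected with the `cloudFiles` format in Structured Streaming. The following is a minimal sketch, not the configuration used in the talk: the helper names `auto_loader_options` and `start_ingest`, the JSON input format, and the `state_root` directory layout are all assumptions made for illustration. The option keys themselves are documented Auto Loader options.

```python
def auto_loader_options(file_format, schema_location, use_notifications=False):
    """Build reader options for the Auto Loader ("cloudFiles") source.

    Hypothetical helper for illustration; only the option keys come from
    the Auto Loader documentation.
    """
    return {
        # Format of the incoming files, e.g. "json", "csv", "parquet".
        "cloudFiles.format": file_format,
        # Where the inferred schema and its later versions are persisted.
        "cloudFiles.schemaLocation": schema_location,
        # "true": discover files via cloud file notifications;
        # "false": discover files via directory listing.
        "cloudFiles.useNotifications": str(use_notifications).lower(),
        # Add newly seen columns to the tracked schema (schema evolution).
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


def start_ingest(spark, source_path, target_table, state_root):
    """Start an Auto Loader stream. Requires a Databricks runtime.

    `spark` is an active SparkSession; the paths under `state_root`
    are an assumed layout, not a required one.
    """
    options = auto_loader_options("json", f"{state_root}/schema")
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
    )
    return (
        stream.writeStream
        # The checkpoint records which files were already ingested, which
        # is what the exactly-once guarantee relies on across restarts.
        .option("checkpointLocation", f"{state_root}/checkpoint")
        # Let the target Delta table accept columns added by evolution.
        .option("mergeSchema", "true")
        .toTable(target_table)
    )
```

Only `auto_loader_options` runs outside Databricks; `start_ingest` needs a cluster where the `cloudFiles` source is available. With `addNewColumns`, the stream is expected to stop when it first sees a new column and to pick up the evolved schema on restart, so a job that restarts automatically on failure is the usual companion to this setting.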

Description from YouTube. Full content on the video page.
