Tutorials · Databricks · August 30, 2021

Guaranteeing Data Quality SLAs with Deequ & Databand

Description

As data becomes more directly tied to business value, data engineering teams are increasingly adopting service level agreements (SLAs) for how they deliver data, covering factors such as freshness, completeness, and accuracy. In this session we’ll discuss how to use Deequ, a data quality library purpose-built for Spark, to develop a data monitoring and QA system that helps you meet the SLAs you guarantee to analytics users, scientists, and other business stakeholders. We’ll cover how to use Deequ to create quality checks that report metrics and enforce rules on data arrivals, schemas, distributions, and custom metrics, and how to visualize, trend, and alert on those metrics using pipeline observability tools. We’ll also discuss common challenges that teams face when setting up data quality logging infrastructure, along with best practices for adoption. Examples include machine learning, data transformation, and replication pipelines (for instance, moving data from S3 to Delta Lake). With these tools, you’ll be able to create more stable, reliable pipelines that your business can depend on.
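To make the idea of an SLA on completeness and freshness concrete, here is a minimal, library-free Python sketch. It does not use Deequ's API and is not taken from the session. The column names (`user_id`, `event_time`), the thresholds, and the function names are all hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta

def completeness(rows, column):
    """Fraction of rows whose `column` value is present (not None)."""
    if not rows:
        return 0.0
    present = sum(1 for row in rows if row.get(column) is not None)
    return present / len(rows)

def freshness(rows, ts_column, now):
    """Age of the newest record, or None when no timestamps are present."""
    stamps = [row[ts_column] for row in rows if row.get(ts_column) is not None]
    return now - max(stamps) if stamps else None

def check_sla(rows, now, min_completeness=0.95, max_age=timedelta(hours=1)):
    """Compute metrics and list which (hypothetical) SLA rules are violated."""
    ratio = completeness(rows, "user_id")      # assumed key column
    age = freshness(rows, "event_time", now)   # assumed timestamp column
    metrics = {"completeness": ratio, "age": age}
    violations = []
    if ratio < min_completeness:
        violations.append("completeness")
    # A batch with no timestamps at all is treated as stale.
    if age is None or age > max_age:
        violations.append("freshness")
    return metrics, violations
```

In a Spark pipeline, a library such as Deequ would compute comparable metrics over a DataFrame at scale. The sketch only shows the shape of the rule: compute a metric, compare it with a threshold, and report violations that an alerting tool can pick up.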

Description from YouTube. Full content on the video page.
