Introduction to Scaling Analytics Using DuckDB with Python
Description
DuckDB is a free, open-source, high-performance, single-machine OLAP (analytical) database that supports the full SQL language. Its Python API is extensive, can work with data that does NOT fit into memory, and is often many times faster than pandas for analytical queries. I predict DuckDB will dominate the landscape soon! It is a game-changer!

Supporting Links
Support Me on Patreon: https://www.patreon.com/bePatron?u=63260756
My Playlists: https://www.youtube.com/@BryanCafferky/playlists
Slides: https://github.com/bcafferky/shared/blob/master/DuckDB/IntroToDuckDB/IntroToDuckDB.pdf
Code: https://github.com/bcafferky/shared/blob/master/DuckDB/IntroToDuckDB/duckdb_notebook_basic.ipynb
Data: Yellow Taxi Data 2024-01 (Yellow Taxi Trip Records Parquet): https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
My Video on Scaling Python Pandas: https://youtu.be/bTQten5T53g
DuckDB Python API Documentation: https://duckdb.org/docs/api/python/overview.html
My Video About Using Python Virtual Environments: https://youtu.be/bjUjNSotYgA
DBeaver Documentation: https://dbeaver.io/
Description from YouTube. Full content on the video page.


