Trending GitHub projects.
Repos tagged topic:databricks — open-source tools, integrations, and accelerators built on or around Databricks. Excludes the official repos already covered by Releases.
redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
cube
📊 Cube Core is an open-source semantic layer for AI, BI and embedded analytics
APIJSON
🏆 Real-time, no-code, powerful and secure ORM 🚀 that provides APIs and docs without any backend coding, while the frontend (client) can customize the data and structure of the returned JSON 🏆
sqlglot
Python SQL Parser and Transpiler
growthbook
Open Source Feature Flags, Experimentation, and Product Analytics
SynapseML
Simple and Distributed Machine Learning
spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
multiwoven
🔥🔥🔥 Open source Reverse ETL - alternative to Hightouch and Census.
ai-dev-kit
Databricks Toolkit for Coding Agents provided by Field Engineering
zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
nao
👾 nao is an open source analytics agent. (1) Create context with nao-core cli, (2) deploy nao chat interface for everyone
mlops-stacks
This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.
altimate-code
Open-source agentic data engineering harness for dbt, SQL, and cloud warehouses. 100+ tools, 10 warehouses, AI-powered.
dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can generate large simulated or synthetic data sets for tests, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.
dbx
🧱 Databricks CLI eXtensions (aka dbx) is a CLI tool for development and advanced Databricks workflow management.
spark
Drop-in replacement for Apache Spark UI
Lynkr
Streamline your workflow with Lynkr, a CLI tool that acts as an HTTP proxy for efficient code interactions using Claude Code CLI.
dqx
Databricks framework to validate the data quality of PySpark DataFrames and tables
Dataflare
Fast. Simple. Database Manager.
terraform-databricks-examples
Examples of using Terraform to deploy Databricks resources
databricks_bootcamp_2026
End-to-end Data Lakehouse project built on Databricks, following the Medallion Architecture (Bronze, Silver, Gold). Covers real-world data engineering and analytics workflows using Spark, PySpark, SQL, Delta Lake, and Unity Catalog. Designed for learning, portfolio building, and job interviews.
lakehouse-engine
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
dlt-meta
Metadata driven Spark Declarative Pipelines framework for bronze/silver pipelines
databricks-sql-python
Databricks SQL Connector for Python
owox-data-marts
Open-Source Self-Service Analytics Platform
analytics-toolbox-core
A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities
universql
Pushdown compute from Snowflake to DuckDB running on your infrastructure
stowage
Bloat-free, no BS cloud storage SDK.
databricks-code-practice
Practice Databricks coding skills with hands-on exercises. Import into Databricks Free Edition, write code, run assertions, check pass/fail. Covers Delta Lake, Spark SQL, PySpark, Auto Loader, medallion architecture, window functions, and more.
scalable-data-science
Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical and computational foundations using SageMath.
databricks-apps-cookbook
Ready-to-use code snippets for building interactive Databricks Apps.
VariantSpark
Machine learning for genomic variants