Projects

Trending GitHub projects.

Repos tagged topic:databricks — open-source tools, integrations, and accelerators built on or around Databricks. Excludes the official repos already covered by Releases.

dbeaver/dbeaver · 51.2k stars

Free universal database tool and SQL client

Topics: ai, database, databricks, db2

Java · 4.3k forks · pushed yesterday

getredash/redash · 28.7k stars

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Topics: analytics, athena, bi, bigquery

Python · 4.6k forks · pushed 3d ago

cube-js/cube · 20.5k stars

📊 Cube Core is an open-source semantic layer for AI, BI and embedded analytics

Topics: agentic-analytics, agents, ai, analytics

Rust · 2.1k forks · pushed yesterday

Tencent/APIJSON · 18.4k stars

🏆 Real-time, no-code, full-featured and secure ORM library 🚀 that provides backend APIs and docs without coding and lets the frontend (client) customize the data and structure of the returned JSON

Topics: baas, clickhouse, crud, databricks

Java · 2.3k forks · pushed yesterday

tobymao/sqlglot · 9.5k stars

Python SQL Parser and Transpiler

Topics: bigquery, clickhouse, databricks, duckdb

Python · 1.2k forks · pushed yesterday

growthbook/growthbook · 8.1k stars

Open Source Feature Flags, Experimentation, and Product Analytics

Topics: ab-testing, abtest, abtesting, analytics

TypeScript · 808 forks · pushed yesterday

microsoft/SynapseML · 5.2k stars

Simple and Distributed Machine Learning

Topics: ai, apache-spark, azure, big-data

Scala · 865 forks · pushed 2d ago

dotnet/spark · 2.1k stars

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Topics: analytics, apache-spark, azure, bigdata

C# · 333 forks · pushed 2mo ago

databricks-solutions/ai-dev-kit · 1.8k stars

Databricks Toolkit for Coding Agents provided by Field Engineering

Topics: agents, claude, cursor, databricks

Python · 403 forks · pushed yesterday

Multiwoven/multiwoven · 1.7k stars

🔥🔥🔥 Open-source reverse ETL, an alternative to Hightouch and Census.

Topics: bigquery, cdp, customer-data-platform, data-activation

Ruby · 92 forks · pushed yesterday

getnao/nao · 1.5k stars

👾 nao is an open-source analytics agent. (1) Create context with the nao-core CLI, (2) deploy the nao chat interface for everyone.

Topics: agentic-analytics, analytics, analytics-engineering, bigquery

TypeScript · 215 forks · pushed yesterday

zinggAI/zingg · 1.2k stars

Scalable master data management, identity resolution, entity resolution, and deduplication using ML

Topics: cdp, customer-data-platform, data-science, databricks

Java · 174 forks · pushed 2d ago

AltimateAI/altimate-code · 791 stars

Open-source agentic data engineering harness for dbt, SQL, and cloud warehouses. 100+ tools, 10 warehouses, AI-powered.

Topics: agent, agentic-data-engineering, ai, analytics-engineering

TypeScript · 138 forks · pushed yesterday

databricks/mlops-stacks · 706 stars

This repo provides a customizable stack for starting new ML projects on Databricks that follow production best practices out of the box.

Topics: databricks, machine-learning, mlops

Go Template · 262 forks · pushed 6d ago

sparklabx/drawio-ai-kit · 618 stars

Teach your AI to draw correct, beautiful draw.io diagrams: declarative layout engine, ground-truth stencils, structural validator, vision self-check. AWS · Azure · GCP · Databricks · BPMN. Zero dependencies.

Topics: agent-skills, ai-agents, architecture-diagrams, aws

JavaScript · 106 forks · pushed 1w ago

DataflareApp/dataflare · 585 stars

Simple, easy-to-use database manager

Topics: bigquery, clickhouse, cloudflare-d1, cloudflare-r2

TypeScript · 37 forks · pushed yesterday

Fast-Editor/Lynkr · 538 stars

Streamline your workflow with Lynkr, a CLI tool that acts as an HTTP proxy for efficient code interactions using Claude Code CLI.

Topics: agents, ai, ai-gateway, claude

JavaScript · 57 forks · pushed 6d ago

databrickslabs/dbldatagen · 485 stars

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.

Topics: data-generation, databricks, datagen, datageneration

Python · 102 forks · pushed yesterday

dataflint/spark · 481 stars

Drop-in replacement for Apache Spark UI

Topics: apache-spark, big-data, data-pipeline, data-pipelines

TypeScript · 57 forks · pushed 3w ago

databrickslabs/dbx · 463 stars

🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.

Topics: ci, cicd, databricks, databricks-api

Python · 128 forks · pushed 4mo ago

databrickslabs/dqx · 439 stars

Databricks framework to validate Data Quality of PySpark DataFrames and Tables

Topics: data-profiling, data-quality, data-quality-monitoring, databricks

Python · 131 forks · pushed yesterday

DataflareApp/Dataflare · 383 stars

Fast. Simple. Database Manager.

Topics: bigquery, clickhouse, cloudflare-d1, cloudflare-r2

17 forks · pushed 3mo ago

DataWithBaraa/databricks_bootcamp_2026 · 375 stars

End-to-end Data Lakehouse project built on Databricks, following the Medallion Architecture (Bronze, Silver, Gold). Covers real-world data engineering and analytics workflows using Spark, PySpark, SQL, Delta Lake, and Unity Catalog. Designed for learning, portfolio building, and job interviews.

Topics: ai, apache-spark, data-analytics, data-engineering

Jupyter Notebook · 177 forks · pushed 6mo ago

databricks/terraform-databricks-examples · 336 stars

Examples of using Terraform to deploy Databricks resources

Topics: aws, azure, databricks, databricks-module

HCL · 224 forks · pushed 2w ago

adidas/lakehouse-engine · 293 stars

The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

Topics: big-data, configuration-driven, data-engineering, data-quality

Python · 51 forks · pushed 1mo ago

rocky-data/rocky · 291 stars

A SQL transformation engine that type-checks your whole pipeline and catches breaking changes before they run: branches, replay, column-level lineage, compile-time contracts, per-model cost. Adapters: Databricks, Snowflake, BigQuery, DuckDB. Single static Rust binary. Apache 2.0.

Topics: bigquery, column-lineage, dagster, data-contracts

Rust · 16 forks · pushed yesterday

databrickslabs/dlt-meta · 268 stars

Metadata-driven Spark Declarative Pipelines framework for bronze/silver pipelines

Topics: databricks, databricks-cli-installable, dlt, lakeflow-declarative-pipelines

Python · 129 forks · pushed yesterday

jrlasak/databricks-code-practice · 236 stars

Practice Databricks coding skills with hands-on exercises. Import into Databricks Free Edition, write code, run assertions, check pass/fail. Covers Delta Lake, Spark SQL, PySpark, Auto Loader, medallion architecture, window functions, and more.

Topics: auto-loader, coding-practice, data-engineering, databricks

Python · 128 forks · pushed 2d ago

databricks/databricks-sql-python · 233 stars

Databricks SQL Connector for Python

Topics: databricks, dwh, python3, sql

Python · 148 forks · pushed 3d ago

OWOX/owox-data-marts · 222 stars

Open-Source Self-Service Analytics Platform

Topics: analytics, athena, bigquery, dashboard

TypeScript · 33 forks · pushed yesterday

CartoDB/analytics-toolbox-core · 210 stars

A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities

Topics: analytics-toolbox, bigquery, carto, databricks

JavaScript · 43 forks · pushed 1w ago

buremba/universql · 206 stars

Push down compute from Snowflake to DuckDB running on your infrastructure

Topics: databricks, dbt, duckdb, proxy-server

Jupyter Notebook · 7 forks · pushed 9mo ago

aloneguid/stowage · 192 stars

Bloat-free, no BS cloud storage SDK.

Topics: aws-s3, azure-storage, databricks, gcp-storage

C# · 23 forks · pushed 4mo ago

databricks-solutions/databricks-apps-cookbook · 178 stars

Ready-to-use code snippets for building interactive Databricks Apps.

Topics: databricks, databricks-apps, web-application

Python · 120 forks · pushed 2mo ago

lamastex/scalable-data-science · 166 stars

Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical and computational foundations using SageMath.

Topics: apache-spark, data-science, databricks, scala

HTML · 93 forks · pushed 11mo ago

databrickslabs/lakebridge · 148 stars

Accelerates migrations to Databricks by automating key migration activities

Topics: code-analysis, code-converter, data-validation, databricks

Python · 108 forks · pushed 1mo ago

aehrc/VariantSpark · 147 stars

Machine learning for genomic variants

Topics: association-studies, aws, bioinformatics, databricks

JavaScript · 48 forks · pushed 1mo ago