brickster.ai

Vector Search

Recent items mentioning Vector Search across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.

14 recent items · 4 releases · 1 news · 5 videos · 4 community threads

What's happening in Vector Search
AI synthesis · updated 22h ago

Databricks has significantly enhanced its Vector Search capabilities, introducing AI Prep Search for document chunking and vector database preparation, alongside SQL vector functions for embedding mathematics [2]. New API methods and fields for Vector Search are now available across Databricks SDKs for Java and Go, including endpoint scaling options, permission management, and support for various ingestion sources like Confluence and Jira [3][4]. The Databricks CLI also gained support for Vector Search Endpoints and a `--limit` flag for paginated list commands [9].

Generated daily from the 9 most recent items mentioning Vector Search. Click any [N] to jump to the source.

Reddit · Help

Knowledge bases in medallion architecture

Would you put knowledge bases in the bronze/silver/gold layer? The raw documents definitely reside in the bronze layer. But if I create AI Agents atop a volume storing the raw documents, then the knowledge base remains in bronze. However, if I create vector embeddings/do chunking/create a vector search index, then these tables should be in the silver layer. Am I on the right track?

28 · RazzmatazzLiving1323 · today
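The chunking step the poster mentions can be pictured with a minimal sketch. It assumes raw document text has already been read from the bronze volume, and it produces chunk records that could be written to a derived table. The `Chunk` class, the `chunk_document` function, and the size and overlap values are hypothetical illustrations, not a Databricks API, and character-based splitting is only one of several possible strategies.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One row of a hypothetical derived chunk table."""
    doc_id: str
    chunk_index: int
    text: str


def chunk_document(doc_id: str, text: str, chunk_size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split raw text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be positive and overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(doc_id, index, text[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each record keeps `doc_id`, so a chunk can be traced back to the raw document it came from, which is the lineage the bronze/silver split is meant to preserve.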
Reddit · News

Vector Search in DABS

More and more resources are available under DABS. The newest addition is the Vector Search Endpoint. #databricks [https://medium.com/@databrickster/databricks-news-watermark-based-incremental-ingestion-mcp-in-ai-gateway-void-bba5021b29de](https://medium.com/@databrickster/databricks-news-watermark-based-incremental-ingestion-mcp-in-ai-gateway-void-bba5021b29de)

20 · hubert-dudek · 1w ago
Reddit · General

Marimo on Databricks

My workflow for a long time involved switching back and forth between VS Code and the browser/Databricks UI. I like to write my "production code" in normal Python, but notebooks are great for exploration, spikes, visualization, triage, etc. I could write a small dissertation, but for various reasons I don't really like Jupyter, and Databricks notebooks have their own problems with commented magic commands etc.

This led me to check out [marimo](https://marimo.io/), and wow, these are so cool. Code that runs in normal Python, merges cleanly, has visualizations and widgets, the app runs locally and doesn't glitch out, and even the VS Code extension works nicely.

The problem was, the Databricks support wasn't great. It just felt a bit dated. It required a warehouse for SQL, didn't seem to really support serverless, and there were just so many opportunities to plug Databricks into marimo.

This led me to create [marimo-databricks-connect](https://github.com/brookpatten/marimo-databricks-connect) ([pypi](https://pypi.org/project/marimo-databricks-connect/)). I tried to plug "all the things" Databricks into the place where they go in marimo. I'm pretty happy with the result.

- Connect to Databricks using databricks-connect & Spark (not a SQL warehouse)
- Authenticate/configure Spark using the default databricks-connect process (env vars, .databrickscfg, etc.), no additional auth config
- Execution of both Python & SQL cells
- Autocomplete of catalog/schema/table/column names
- Browsing of catalogs/schemas/tables/columns in the marimo data sources view
- Browsing of external locations, volumes, DBFS, and workspace in the marimo storage browser
- Notebook widgets to monitor and control specific instances of Databricks capabilities (clusters, workflows, vector search, apps, etc.)
- Widgets to browse & explore Databricks capabilities (compute, workflows, Unity Catalog)
- Works in local marimo (`marimo edit notebook.py`) and in the VS Code extension
- Deploy as a Databricks app to provide an alternative web-based marimo UI

I'm working on adding serving endpoints as AI providers to the notebooks too. In particular, what I like to use this for is creating "command center" notebooks for given processes that can include some normal PySpark/SQL code to query/triage, widgets to monitor/control various Databricks resources, visualizations to monitor DQ, etc.

I just wanted to share and see what the community thinks. Would you use it? Contributions are welcome.

Throwaway account because I'm doxing myself via the GH repo.

2017 · yes_my_name_is_brook · 2w ago
Reddit · Tutorial

I built a 54-minute hands-on RAG tutorial on Databricks — from PDF loading to retrieval and LLM answers

Hi everyone, I recently published a hands-on tutorial where I build a basic **RAG pipeline on Databricks** from scratch. The goal of the video is not just to use a high-level RAG framework, but to show what actually happens behind the scenes.

In the video, I cover:

* Loading PDF files inside Databricks
* Extracting text from PDF pages
* Splitting documents into chunks
* Creating embeddings using Databricks embedding endpoints
* Building a simple manual retrieval system using vector similarity
* Creating prompts from retrieved chunks
* Generating grounded answers using Databricks LLM endpoints
* Using `databricks-langchain` for embeddings and chat models

I intentionally kept the implementation simple so that beginners can understand the core mechanics of RAG before moving to more production-level tools like Vector Search, Unity Catalog, MLflow, etc.

Here is the video: [https://youtu.be/7QY1iXPLgRg](https://youtu.be/7QY1iXPLgRg)

Would love to hear feedback from people working with Databricks, RAG, LangChain, or enterprise GenAI systems.

Also curious: for production RAG on Databricks, would you prefer starting with a simple manual implementation like this first, or directly using Mosaic AI Vector Search / Databricks Vector Search from the beginning?

93 · Remarkable_Nothing65 · 3w ago
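The "manual retrieval using vector similarity" and "prompts from retrieved chunks" steps listed in the tutorial post can be sketched in plain Python. This is not the code from the video. It assumes embeddings are already available as lists of floats, and the hand-written toy vectors below stand in for the output of an embedding endpoint. The function names `cosine_similarity`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: list[float], indexed_chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a grounded prompt from the retrieved chunk texts."""
    context = "\n\n".join(chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy data: (chunk text, embedding) pairs with made-up 3-dimensional vectors.
index = [
    ("Vector indexes store embeddings.", [1.0, 0.0, 0.0]),
    ("Invoices are due in 30 days.", [0.0, 1.0, 0.0]),
    ("Embeddings map text to vectors.", [0.9, 0.1, 0.0]),
]
top_chunks = retrieve([1.0, 0.0, 0.0], index, k=2)
prompt = build_prompt("What does a vector index store?", top_chunks)
```

A brute-force scan like this compares the query against every chunk, which is fine for a handful of documents. A managed index such as Vector Search replaces the `retrieve` step once the corpus grows.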