MLflow
Recent items mentioning MLflow across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.
MLflow is rapidly evolving as a core platform for AI agent development, with new features for building, evaluating, and deploying high-quality agents [5]. Recent updates include multimodal tracing to store and render PDFs, audio, and images in tracing spans, new guardrail capabilities for gateway endpoints, and tracing support for Codex, Gemini, and Qwen coding agents [9]. MemAlign, an open-source MLflow framework, improves the evaluation of traditional machine learning in Genie Code by reducing LLM judge error by 74-89% on key dimensions [6].
Generated daily from the 10 most recent items mentioning MLflow. Click any [N] to jump to the source.
Events | Databricks News: Lakeflow Designer, UV package manager, DABs templates, Genie scheduled tasks
Databricks introduces Lakeflow Designer for visual data preparation, though its generated code is messy; a workaround uses Genie to convert the visual workflow into clean PySpark/SQL notebooks. The UV package manager significantly speeds up package installations on Databricks serverless runtimes, and DABs templates allow for standardized, customizable Databricks Asset Bundles.
How much of this "GenAI workflow" is handled within Genie Spaces?
This slide was taken from the "Implementing GenAI on Databricks" video on the Generative AI Fundamentals Accreditation page: [https://www.databricks.com/learn/training/generative-ai-fundamentals-accreditation](https://www.databricks.com/learn/training/generative-ai-fundamentals-accreditation) I know AI development is moving at the speed of light and this course is copyrighted 2025, so I feel like all of these functionalities may already be distilled into a much simpler version in Genie Spaces. Are Genie Spaces I create just "managed" GenAI workflows that already have vector indices, RAG, and grounding handled? The initial table selection combined with the ability to add "instructions", joins, and SQL expressions and queries leads me to believe this is the case. Can I (or is it necessary to) use MLflow with a Genie Space? It looks like the "inference tables" are already available on the monitoring tab with "ratings" (AKA human review) included. I guess my main question is... Where, along this graph, do the capabilities of Genie Spaces end so I know where I need to continue learning or plan on extending this Genie Space for a set of customers?
Tutorials | How to Build an AI Security Governance Hub with Agent Bricks
Databricks Agent Bricks enables building an AI Security Governance Hub by transforming static security playbooks into adaptive multi-agent systems. The video demonstrates combining a knowledge assistant for unstructured documents and a Genie space for structured data into a supervisor agent, then details how to tune and monitor these agents for improved performance and data privacy.
Events | Building Trustworthy, High-Quality AI Agents with MLflow
MLflow provides a comprehensive platform for building, evaluating, and deploying high-quality AI agents, offering tools for observability, automated evaluation, prompt optimization, and production monitoring. It enables developers to streamline the agent development lifecycle, from prototyping and testing with human and AI judges to fixing issues and ensuring reliable, governed deployment.
Using MemAlign to Improve Evaluation of Traditional Machine Learning in Genie Code
MemAlign, an open-source MLflow framework, significantly improved the evaluation of traditional machine learning in Genie Code by reducing LLM judge error by 74-89% on key dimensions. This alignment was achieved with ~50 labeled examples, demonstrating the importance of both semantic and episodic memory for closing the gap between LLM judges and human experts.
MLflow tracking from Azure Container Instance
From Black Box to Observability: Tracing OpenClaw with MLflow
MLflow Tracing now provides full observability for OpenClaw agents, moving them from black box to transparent. Learn how to quickly set up tracing to understand why your agent makes specific decisions, rather than just seeing the output.
MLflow 3.12.0 introduces multimodal tracing, allowing storage and rich rendering of PDFs, audio, and images as artifact attachments in tracing spans. It also adds AI Gateway guardrails to prevent unsafe model inputs/outputs and extends coding agent tracing support to Codex, Gemini, and Qwen.
News | Databricks News: watermark-based incremental ingestion, MCP in AI gateway, Genie, Vector Search
Databricks now offers watermark-based incremental ingestion from SQL databases without change data feed, allowing for efficient data updates and soft deletion handling. The AI Gateway supports custom MCP servers, enabling integration with external APIs like GitHub for enhanced AI application development.
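The watermark idea in the item above can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3`, not the Databricks feature itself: the `customers` table, the integer `updated_at` watermark column, and the `is_deleted` soft-delete flag are all assumptions made for the example.

```python
import sqlite3

def fetch_increment(conn, last_watermark):
    """Return rows changed after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, name, is_deleted, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # If nothing changed, the watermark stays where it was.
    new_watermark = rows[-1][3] if rows else last_watermark
    return rows, new_watermark

def apply_increment(target, rows):
    """Upsert live rows into the target; drop rows flagged as soft-deleted."""
    for row_id, name, is_deleted, _ in rows:
        if is_deleted:
            target.pop(row_id, None)
        else:
            target[row_id] = name

# Hypothetical source table (assumption: the source maintains updated_at).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT, is_deleted INTEGER, updated_at INTEGER)"
)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Ada", 0, 100), (2, "Grace", 0, 110)],
)

target = {}      # stands in for the destination table
watermark = 0    # nothing ingested yet

# Initial load: every row is newer than the starting watermark.
rows, watermark = fetch_increment(source, watermark)
apply_increment(target, rows)

# Later changes in the source: one update and one soft delete.
source.execute("UPDATE customers SET name = 'Ada L.', updated_at = 120 WHERE id = 1")
source.execute("UPDATE customers SET is_deleted = 1, updated_at = 130 WHERE id = 2")

# Incremental run: only rows past the stored watermark are read.
rows, watermark = fetch_increment(source, watermark)
apply_increment(target, rows)
```

Only rows whose `updated_at` exceeds the stored watermark are read on each run, which is why no change data feed is needed on the source side in this sketch.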
MLflow 3.12.0rc0 introduces enhanced AI agent development features, including automatic tracing for more AI coding assistants and OpenClaw, along with new AI Gateway Guardrails for safety checks. It also adds multimodal trace attachments for images and audio in the UI, and a new mlflow.diffusers flavor for saving and serving diffusion models.
I built a 54-minute hands-on RAG tutorial on Databricks — from PDF loading to retrieval and LLM answers
Hi Everyone I recently published a hands-on tutorial where I build a basic **RAG pipeline on Databricks** from scratch. The goal of the video is not just to use a high-level RAG framework, but to show what actually happens behind the scenes. In the video, I cover: * Loading PDF files inside Databricks * Extracting text from PDF pages * Splitting documents into chunks * Creating embeddings using Databricks embedding endpoints * Building a simple manual retrieval system using vector similarity * Creating prompts from retrieved chunks * Generating grounded answers using Databricks LLM endpoints * Using `databricks-langchain` for embeddings and chat models I intentionally kept the implementation simple so that beginners can understand the core mechanics of RAG before moving to more production-level tools like Vector Search, Unity Catalog, MLflow, etc. Here is the video: [https://youtu.be/7QY1iXPLgRg](https://youtu.be/7QY1iXPLgRg) Would love to hear feedback from people working with Databricks, RAG, LangChain, or enterprise GenAI systems. Also curious: for production RAG on Databricks, would you prefer starting with a simple manual implementation like this first, or directly using Mosaic AI Vector Search / Databricks Vector Search from the beginning?
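The core mechanics listed in the post above (chunking, embedding, similarity retrieval, prompt construction) can be sketched without any external service. This is an illustrative toy, not the code from the video: a bag-of-words counter stands in for the Databricks embedding endpoint, the sentence-based chunker and the sample document are assumptions, and no LLM is called.

```python
import math
import re
from collections import Counter

def chunk_text(text):
    """Split a document into sentence-sized chunks (a deliberately simple chunker)."""
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]

def embed(text):
    """Toy embedding: token counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=1):
    """Rank chunks by similarity to the question and keep the top k."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Assemble a grounded prompt from the retrieved chunks."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical document text, standing in for text extracted from PDF pages.
document = (
    "MLflow records parameters and metrics for every training run. "
    "Vector Search indexes embeddings so similar chunks can be found quickly. "
    "Unity Catalog governs access to tables and models."
)

chunks = chunk_text(document)
question = "How are similar chunks found?"
top_chunks = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top_chunks)
```

In a production setup the `embed` function and the final generation step would be replaced by model endpoints, and the linear scan in `retrieve` by a vector index.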
AI observability for production: Seeing Inside Your Multi-Agent System with MLflow
MLflow now offers enhanced AI observability for multi-agent systems, providing crucial visibility into their internal workings. This helps practitioners prevent unintended actions like data purges or sensitive information leaks in production.
Community | From Notebook to Production: MLOps Quickstart
The video demonstrates how to apply MLOps best practices on Databricks using a quickstart repository, covering data ingestion, feature preprocessing, model training, deployment, and inference. It showcases Databricks tools like MLflow and Unity Catalog for managing the ML lifecycle, including version control, experiment tracking, model governance, and automated deployment across development and production environments.
Structuring AI Evaluation and Observability with MLflow: From Development to Production
MLflow now offers enhanced tools for structuring AI evaluation and observability, including new APIs and UI features for logging LLM calls, prompts, responses, and metrics. This enables practitioners to systematically track, compare, and analyze model performance and behavior across development and production, facilitating iterative improvement and robust monitoring.
Enforce Content Policies at the Gateway with AI Gateway Guardrails
MLflow AI Gateway now supports configurable guardrails, using LLM judges to block or sanitize harmful content, PII, and custom policy violations. Enforce content policies at the gateway before requests reach your users or models.
News | Databricks Apps vs Model Serving: Authentication, Cost, and Performance Compared
Databricks Apps are now the recommended first choice for deploying agents due to their flexibility in handling full-stack applications with multiple components, offering faster iteration and local testing compared to Model Serving. Model Serving remains suitable for use cases prioritizing high QPS, governance features like AI Gateway, inference tables, and guardrails, or when scaling to zero is acceptable for cost optimization.
TypeScript SDK 0.2.0 RC1
Release candidate for `@mlflow/vercel` TypeScript package with version 0.2.0: https://github.com/mlflow/mlflow/pull/22105
News | Databricks News: AUTO CDC, Workspace skills, Ask Genie, and Type widening
Databricks introduces Auto CDC for efficient change data feed processing, notebook and governed tags for better organization, and workspace skills for Ask Genie to customize its responses. Databricks also adds type widening for streaming tables, allowing data types to automatically adjust to larger incoming values.
How to Prevent Runaway Agent Costs with MLflow AI Gateway
MLflow AI Gateway now helps prevent runaway agent costs by providing visibility into which part of your agent is driving up costs. This allows you to identify and address cost drivers before investing in the wrong optimizations.
MLflow 3.11.1 introduces AI-powered issue detection for agent traces, budget alerts and limits for AI Gateway spending, and a new interactive graph view for visualizing trace hierarchies. It also enhances security with pickle-free model serialization and improves dependency management with native UV support.
News | Databricks News: Excel add-in, Metrics Views UI, and Quality Monitoring
Databricks announced Lakewatch for cybersecurity, new dynamic dropdown filters in the SQL editor, and improved quality monitoring with null value scanning and automated alerts. The video also demonstrates a new UI for defining metric views, an Excel add-in for data preview and import, and the ability to publish dashboards as public web pages.
Harness Your OpenHands Agent with AI Observability and Governance
MLflow now supports tracing, evaluating, and governing OpenHands agents, capturing every step of their autonomous operations. This enables practitioners to monitor agent actions, assess output quality, and manage LLM costs effectively.
News | Databricks News: Free Tier, Multi-statement transactions, Declarative Automation Bundles, Genie Code
Databricks now offers a free tier for Lakeflow Connect, providing 100 DBUs per day per workspace, and has introduced multi-statement transactions in Unity Catalog that ensure atomicity with rollback capabilities. The platform also announced a Databricks One mobile app, a new AI runtime with pre-installed tools for GPU use cases, and enhanced Genie Code that understands project structure for automated development tasks. Additionally, Databricks Asset Bundles are now called Declarative Automation Bundles and use a faster direct engine, and a new 5X-Large SQL warehouse is available for processing terabytes of data.
Testing and Refining Claude Code Skills with MLflow
MLflow tracing and LLM judges can now test Claude Code skills. This enables a self-improvement loop where Claude Code refines its own abilities.
Tracking and Debugging AI Safety Evaluations with Inspect AI and MLflow
Inspect AI evaluations now integrate with MLflow for experiment tracking and execution tracing via the inspect-mlflow package. This enables practitioners to track and debug AI safety evaluations using familiar MLflow tools.
MLflow Workspaces: Shared Deployment Without Separate Servers
MLflow Workspaces are now available, enabling shared MLflow deployments across multiple teams by adding a logical organization and permission layer. This allows teams to scope experiments, models, traces, prompts, AI Gateway resources, and artifacts within their own workspace.
Your Agents Need an AI Platform
MLflow 2.12 ships with new features for building and managing AI agents, including enhanced logging for agent traces, evaluation tools, and versioning capabilities. Leverage MLflow as your unified platform for developing, deploying, and governing reliable AI agents in production.
Control LLM Spend with AI Gateway Budget Alerts and Limits
AI Gateway now supports budget policies to control LLM spend with alerts and request limits. Set spending thresholds, receive webhook alerts, and automatically reject requests when budgets are exceeded.
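The alert-then-reject behaviour described above can be illustrated with a small, self-contained sketch. This is not the AI Gateway implementation or its configuration format: the `BudgetPolicy` class, its field names, the 80% alert ratio, and the use of integer cents are all assumptions chosen for the example, and the "webhook" is just a list that collects alert messages.

```python
from dataclasses import dataclass, field

@dataclass
class BudgetPolicy:
    """Hypothetical budget policy: alert near the limit, reject beyond it."""
    limit_cents: int
    alert_ratio: float = 0.8           # alert once 80% of the budget is used
    spent_cents: int = 0
    alerts: list = field(default_factory=list)

    def record(self, cost_cents):
        """Admit a request if it fits the remaining budget; return True if admitted."""
        if self.spent_cents + cost_cents > self.limit_cents:
            return False               # request limit: reject, spend unchanged
        self.spent_cents += cost_cents
        crossed = self.spent_cents >= self.alert_ratio * self.limit_cents
        if crossed and not self.alerts:
            # A real gateway would call a webhook here; this sketch stores the message.
            self.alerts.append(
                f"spend reached {self.spent_cents} of {self.limit_cents} cents"
            )
        return True

policy = BudgetPolicy(limit_cents=100)
# Four requests costing 50, 35, 25, and 10 cents, in that order.
results = [policy.record(cost) for cost in (50, 35, 25, 10)]
```

The third request is rejected because it would push spend past the limit, while the cheaper fourth request still fits. Working in integer cents avoids floating-point rounding at the threshold.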
This release introduces AI-powered issue identification for agent traces, budget alerts for AI Gateway spending, and a new interactive graph view for visualizing trace hierarchies. It also includes pickle-free model serialization for enhanced security and native OpenTelemetry GenAI convention support for trace export.
News | Databricks News: unit testing, OneLake federation, scoped access tokens
Databricks now allows creating Unity Catalog domains for business users, running JAR tasks on serverless compute, and federating OneLake data directly into Databricks. The platform also introduces in-workspace Python unit testing, new data connectors like HubSpot and TikTok Ads, and scoped personal access tokens for enhanced security.
This release adds a "try-it" page for Gateway usage examples and filters gateway experiments from the experiment list in the UI. It also fixes numerous UI issues, artifact download problems, and tracing errors, including issues with model copying across workspaces and artifact access permissions.
Agent Trace Evaluation with TruLens Scorers in MLflow
Evaluate agent traces with TruLens GPA framework through mlflow.genai.evaluate(). Score agent plans, tool calls, and reasoning directly within MLflow.
Benchmark Your Way to Better RAG and Agents: Tuning Vector Search with MLflow
High-level summary: problems, approaches, and takeaways for better RAG with MLflow
News | Databricks News: Catalog and External locations in DABS, Schema Evolution, File Events, Queries Tags
Databricks Runtime 18.1 introduces schema evolution for inserts, managed file events for Autoloader, and a simplified `TABLE` syntax for querying. The video also demonstrates new features like the AI Gateway for LLM governance, query tags for tracking, and the GA release of the supervisor agent.
Ship LLM Agents Faster with Coding Assistants and MLflow Skills
MLflow now provides coding assistants with the required feedback loop to build better LLM agents. Trace, analyze, fix, validate, and repeat to ship LLM agents faster.
Deterministic Safety Checks in MLflow with Guardrails AI
MLflow evaluation pipelines now support fast, deterministic safety validation with Guardrails AI scorers. This enables adding safety checks without requiring an LLM.
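To make "deterministic, no LLM required" concrete, here is a minimal rule-based check. It is a sketch only and does not use the Guardrails AI or MLflow scorer APIs: the `pii_check` function, the two regular expressions, and the result dictionary shape are assumptions for illustration.

```python
import re

# Hypothetical rule set: each rule is a name plus a compiled pattern.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def pii_check(text):
    """Return a pass/fail verdict plus the names of the rules that matched."""
    findings = sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
    return {"passed": not findings, "findings": findings}

# Two sample model outputs to validate.
outputs = [
    "Your ticket has been closed.",
    "Contact jane.doe@example.com, SSN 123-45-6789.",
]
results = [pii_check(text) for text in outputs]
```

Because the verdict depends only on the patterns, the same input always yields the same result, and the check adds no model latency or cost.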
Enterprise-Scale MLflow Operations and Security Practices at LY Corporation
How LY Corporation Uses MLflow: An Overview
Deploy MLflow Models to Serverless GPUs with Modal
MLflow models can now be deployed to Modal's serverless GPU infrastructure. This enables auto-scaling and streaming predictions for your MLflow models.
Multi-turn Evaluation & Simulation: Enhancing AI Observability with MLflow for Chatbots
MLflow 3.10 now supports multi-turn evaluation and conversation simulation, enabling scoring of full conversations and reproducible testing of agent changes. This helps catch failures that only emerge across multiple turns, improving chatbot observability.
Introducing MLflow AI Gateway: Governed, Observable Access to LLMs
MLflow AI Gateway provides a single, secure endpoint for all LLM providers, complete with usage tracking and native tracing. This new feature offers governed, observable access to LLMs for Databricks practitioners.
MLflow 3.10.0 introduces multi-workspace support for organizing experiments and models, alongside new GenAI features like multi-turn evaluation, LLM cost tracking, and AI Gateway usage analytics. The UI has been redesigned for improved navigation, and in-UI trace evaluation is now supported.
MLflow now supports multi-workspace environments for organizing experiments and resources, alongside a new top-level navigation split for GenAI and Classical ML workflows. Key new features include multi-turn conversation simulation, automatic LLM trace cost tracking, AI Gateway usage analytics, and a CLI command to generate a demo environment.
News | Databricks Breaking News: 2026 Week 6: 2 February 2026 to 8 February 2026
Databricks introduces agentic data quality monitoring with anomaly detection, LLM judge UI builder for MLflow, and new SQL warehouse features including a default option and activity details. The platform also enhances its assistant to connect with MCP servers, improves Google Sheets integration with pivot table functionality, and adds direct Git deployment and tagging for Databricks apps.
5 Tips to Get More Out of Your Claude Code with MLflow
MLflow now offers an MCP server, CLIs, and Skills to extend Claude Code, enabling you to trace tokens and monitor tool usage. These five tips will help you transform your Claude coding agent into a transparent and controllable workflow.
News | Databricks Breaking News: 2026 Week 5: 26 January 2026 to 1 February 2026
Databricks now allows triggering materialized views or streaming tables on update, automatically detecting source changes and refreshing the pipeline. MLflow traces can now be stored in Unity Catalog using OpenTelemetry, providing a centralized logging system for experiment data.
v.3.9.0
MLflow 3.9.0 introduces an in-product MLflow Assistant chatbot and a Trace Overview Dashboard for GenAI experiments, enhancing debugging and performance insights. The AI Gateway is revamped for direct tracking server integration, alongside new LLM judge features for online monitoring and custom prompt building.
Introducing DeepEval, RAGAS, and Phoenix Judges in MLflow
Improve your agents using MLflow's extensive, industry-leading suite of high-quality LLM judges.
News | Databricks Breaking News: 2026 Week 4: 19 January 2026 to 25 January 2026
Databricks introduces temporary tables that are Unity Catalog managed, materialized, and allow DML operations, automatically cleaning up after a session or seven days. Materialized views now support refresh policies like incremental strict, which verifies if a view can be incrementally refreshed before deployment.
Introducing MLflow Agents Dashboard
Monitor your Agents with comprehensive analytics and visualizations in MLflow's new Overview tab
News | Databricks Breaking News: 2026 Week 3: 12 January 2026 to 18 January 2026
Databricks Runtime 18 is now Generally Available, offering Spark 4.1 and improved identifier/parameter marker availability. New features include Lakeflow Connect for row filtering during ingestion, Codex models (GPT Codex Max and Mini) for code development, and Databricks One improvements like favorites and data preview in Gen Rooms.
MLflow 3.9.0rc0 introduces an in-product AI Assistant for debugging and a new Trace Overview Dashboard for GenAI experiments. The AI Gateway is now integrated into the tracking server, and users can configure LLM judges for online monitoring and build custom judges directly in the UI.
News | Databricks Breaking News: Week 2026 02: 5 January 2026 to 11 January 2026 #databricks news
Databricks now allows changing catalog and schema during dashboard deployments, addressing a previous issue with environment-specific configurations. The Databricks CLI has a breaking change with plan version 2, altering the structure of deployment plans.
News | Databricks Breaking News: Week 2026 01: 29 December 2025 to 4 January 2026 #databricks news
Databricks now supports deploying asset bundles from a generated plan, enabling CI/CD integration for review and approval. Unity Catalog introduces new secret grants, and Runtime 18 brings "everywhere" implementations for literal string coalescing, parameter markers, and identifiers, along with window functions in metric views and general availability for SQL scripting.
Releases | Databricks Breaking News: Week 52: 22 December 2025 to 28 December 2025 #databricks news
Databricks introduces a direct mode for asset bundles, offering faster deployments without Terraform, and the Databricks Assistant agent mode is now in public preview, capable of multi-step notebook editing and data analysis. Other updates include single-use refresh tokens for enhanced security, partition columns now included in Parquet files for improved compatibility, and new dashboard features like custom labels, flexible sorting, and Microsoft Teams integration for scheduled reports.
News | Databricks Breaking News: Week 51: 15 December 2025 to 21 December 2025 #databricks news
Databricks introduces new Lakeflow Connect features, including custom logic for declarative pipelines and new connectors for incremental data import from sources like Confluence, PostgreSQL, and MySQL. The platform also announces the deprecation of legacy features like Hive Metastore and DBFS for new accounts, alongside updates to Lakehouse ACLs, job scheduling from notebooks, flexible node types for cluster deployment, and expanded resource assignment in Databricks apps.
News | Databricks Breaking News: Week 50: 8 December 2025 to 14 December 2025 #databricks news
Databricks now supports native reading and writing of Excel files in PySpark, SQL, and Auto Loader, including features like sheet listing and range targeting. Additionally, Databricks Runtime 18 is available in beta, introducing improvements for streaming queries and new system columns for job tables, alongside a new Lakebase experience with project and branching capabilities for transactional databases.
News | Databricks: What’s new in October 2025 #databricks news
Databricks introduces Databricks One, a new business-focused experience with consumer access for dashboards and Genie, alongside updates to Genie for defining relations and extended API endpoints. The platform also adds features like easy conversion of external to managed tables, enhanced Databricks Asset Bundles with policy integration and script execution, and new system tables for MLflow tracking and data classification results.
News | Databricks: What’s new in September 2025? #databricks
Databricks now supports geospatial data types (geography and geometry) with new functions for visualization and spatial operations, and introduces serverless GPU clusters for distributed GPU code execution. The platform also offers enhanced notebook features like side-by-side editing and a notebook-specific search, along with new options for managing serverless environments, SQL warehouses, and access requests in Unity Catalog.
UnityCatalog 0.3.0
Unity Catalog now supports Spark 4.0 and Delta Lake 4.0, enhancing compatibility with the latest Databricks runtime components. New API surfaces for credentials and external locations provide more flexible handling of external storage services.