newsDatabricks·July 19, 2022

UIMeta: A 10X Faster Cloud-Native Apache Spark History Server

Open on YouTube More from Databricks

Description

Spark history server is an essential tool for monitoring, analyzing and optimizing spark jobs. The original history server is based on Spark event log mechanism. A running Spark job will produce many kinds of events that describe the job's status continuously. All the events are serialized into JSON and appended to a file —— event log. The history server has to replay the event log and rebuild the memory store needed for UI. In a cluster, the history server also needs to periodically scan the event log directory and cache all the files' metadata in memory. Actually, an event log contains too much redundant info for a history server. A long-running application can bring a huge event log which may cost a lot to maintain and require a long time to replay. In large-scale production, the number of jobs can be large and leads to a heavy burden on history servers. It needs additional development to build a scalable history server service. In this talk, we want to introduce a new history server based on UIMeta. UIMeta is a wrapper of the KVStore objects needed by a Spark UI. A job will bring a UIMeta log by stagedly serializing UIMeta. An UIMeta log is approximately 10x smaller in si…

Description from YouTube. Full content on the video page.

More from Databricks

Apache Iceberg V3 on Databricks: From Ingestion to Analytics

The video demonstrates Apache Iceberg v3 on Databricks, showcasing how its new variant column type natively handles semi-structured data and how row-level concurrency enables simultaneous data ingestion and corrections. It also highlights cross-platform data accessibility from open-source Spark via the Iceberg REST catalog, ensuring no vendor lock-in.

Databrickstoday

Databricks Genie for Marketing

Databricks' AI BI Genie allows non-technical marketers to converse with their Customer 360 data using natural language, enabling quick insights into marketing performance and campaign optimization. It helps identify issues like audience saturation and recommends budget reallocation by analyzing data and providing reasoning for its suggestions.

Databricksyesterday

Govern MCP servers in Databricks #databricks #mcp #aigovernance

Databricks Unity AI Gateway now governs MCP servers, centralizing their management alongside built-in foundation models and LLMs. This integration allows for easier governance and orchestration of various AI components and agents within Databricks.

Databricks5d ago

How Suntory Turns Data into Faster Decisions with Databricks

Suntory uses Databricks to integrate diverse datasets, including internal sales, macroeconomic factors, and consumer behavior, into "Project Brain" for faster decision-making and product launches. The company also implements an all-employee upskilling program, "Manabi no Michi," to empower its workforce to leverage AI for improved performance and efficiency.

Databricks5d ago

AIA Group x Databricks: Turning Regulated Data into Real-Time Intelligence

AIA Group leverages Databricks to manage regulated data across 18 markets, addressing challenges like data residency and varying tech maturity with features like Unity Catalog for governance. The platform enables real-time intelligence for investment decisions, fraud detection, and personalized agent coaching, with future plans for conversational analytics and autonomous AI.

Databricks5d ago

Connect Google Sheets to Databricks

The Databricks Google Sheets add-in allows users to explore, import, and refresh governed data from the Databricks Lakehouse directly within Google Sheets. It demonstrates how to browse Unity Catalog, select tables or metric views, apply filters, schedule data refreshes, and use direct SQL queries with parameters.

Databricks5d ago