Tutorials · Databricks · July 26, 2023

How to Train Your Own Large Language Models

Description

Given the success of OpenAI’s GPT-4 and Google’s PaLM, every company is now assessing its own use cases for Large Language Models (LLMs). Many companies will ultimately decide to train their own LLMs for a variety of reasons, ranging from data privacy to increased control over updates and improvements. One of the most common reasons will be to make use of proprietary internal data.

In this session, we’ll go over how to train your own LLMs, from raw data to deployment in a user-facing production environment. We’ll discuss the engineering challenges, and the vendors that make up the modern LLM stack: Databricks, Hugging Face, and MosaicML. We’ll also break down what it means to train an LLM using your own data, including the various approaches and their associated tradeoffs.

Topics covered in this session:
- How Replit trained a state-of-the-art LLM from scratch
- The different approaches to using LLMs with your internal data
- The differences between fine-tuning, instruction tuning, and RLHF

Talk by: Reza Shabani

Here’s more to explore:
LLM Compact Guide: https://dbricks.co/43WuQyb
Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us:
Website: https://databricks.com
Tw…

Description from YouTube. Full content on the video page.

Topics

LLM Fine-tuning
