Michael Goin's contributions
Article
How we optimized vLLM for DeepSeek-R1
Michael Goin
Explore inference performance improvements that help vLLM serve DeepSeek AI models more efficiently in this technical deep dive.
Article
vLLM V1: Accelerating multimodal inference for large language models
Michael Goin
Explore how vLLM's new multimodal AI inference capabilities enhance performance, scalability, and flexibility across diverse hardware platforms.
Article
Distributed inference with vLLM
Michael Goin
Explore how distributed inference works within vLLM in this recap of Neural Magic's vLLM Office Hours with Michael Goin and Murali Andoorveedu, a vLLM committer from CentML.
Article
vLLM brings FP8 inference to the open source community
Michael Goin
Explore the integration of FP8 in vLLM. Learn how to achieve up to a 2x reduction in latency on NVIDIA GPUs with minimal accuracy degradation.
Article
How Marlin pushes the boundaries of mixed-precision LLM inference
Michael Goin
Learn about Marlin, a mixed-precision matrix multiplication kernel that delivers a 4x speedup with FP16xINT4 computations at batch sizes up to 32.
Article
Sparse fine-tuning for accelerating large language models with DeepSparse
Robert Shaw
Sparse fine-tuning, combined with sparsity-aware inference software such as DeepSparse, unlocks ubiquitous CPU hardware as a deployment target for LLM inference.
Article
SparseGPT: Remove 100 billion parameters for free
Robert Shaw
Compress large language models (LLMs) with SparseGPT to make your machine learning inference fast and efficient. Prune in one shot with minimal accuracy loss.
