Mark Kurtz

Mark Kurtz's contributions

Enable 3.5 times faster vision language models with quantization
Learn how quantized vision-language models enable faster inference, lower costs, and scalable AI deployment without compromising capability.

Deployment-ready reasoning with quantized DeepSeek-R1 models
Explore new open source quantized reasoning models based on the DeepSeek-R1-Distill suite that deliver near-perfect accuracy recovery and faster inference.

2:4 Sparse Llama: Smaller models for efficient GPU inference
Discover Sparse Llama: a 50% pruned, GPU-optimized Llama 3.1 model with 2:4 sparsity, enabling faster, cost-effective inference without sacrificing accuracy.

Multimodal model quantization support through LLM Compressor
Explore multimodal model quantization in LLM Compressor, a unified library for optimizing models for deployment with vLLM.

Compressed Granite 3.1: Powerful performance in a small package
Open-sourced on Hugging Face, deployment-ready with vLLM, and extensible using LLM Compressor.

2:4 Sparse Llama FP8: SOTA performance for NVIDIA Hopper GPUs
Advancing AI efficiency is more critical than ever, and sparsity has proven to be a cornerstone in this pursuit.

We ran over half a million evaluations on quantized LLMs—here's what we found
Across more than 500,000 evaluations, quantized LLMs achieve near-full accuracy with minimal trade-offs, providing efficient, high-performance options for AI model deployment.

LLM Compressor is here: Faster inference with vLLM
Discover LLM Compressor, a unified library for creating accurate compressed models for cheaper and faster inference with vLLM.
