Shubhra Pandit

Shubhra Pandit's contributions

Learn how quantized vision-language models enable faster inference, lower costs, and scalable AI deployment without compromising capability.

Discover Sparse Llama: a 50% pruned, GPU-optimized Llama 3.1 model with 2:4 sparsity that enables faster, more cost-effective inference without sacrificing accuracy.

Explore multimodal model quantization in LLM Compressor, a unified library for optimizing models for deployment with vLLM.

Open-sourced on Hugging Face, deployment-ready with vLLM, and extensible using LLM Compressor.

Advancing AI efficiency is more critical than ever, and sparsity has proven to be a cornerstone in this pursuit.
