Shubhra Pandit
Shubhra Pandit's contributions
Article
Enable 3.5 times faster vision language models with quantization
Shubhra Pandit and 4 others
Learn how quantized vision-language models enable faster inference, lower costs, and scalable AI deployment without compromising capability.
Article
2:4 Sparse Llama: Smaller models for efficient GPU inference
Eldar Kurtić and 4 others
Discover Sparse Llama: A 50% pruned, GPU-optimized Llama 3.1 model with 2:4 sparsity, enabling faster, cost-effective inference without sacrificing accuracy.
Article
Multimodal model quantization support through LLM Compressor
Kyle Sayers and 3 others
Explore multimodal model quantization in LLM Compressor, a unified library for optimizing models for deployment with vLLM.
Article
Compressed Granite 3.1: Powerful performance in a small package
Shubhra Pandit and 2 others
Open-sourced on Hugging Face, deployment-ready with vLLM, and extensible using LLM Compressor.
Article
2:4 Sparse Llama FP8: SOTA performance for NVIDIA Hopper GPUs
Alexandre Marques and 5 others
Advancing AI efficiency is more critical than ever, and sparsity has proven to be a cornerstone in this pursuit.