Open Source / Projects · 2026-06-02 · Tuesday, June 2, 2026

GLQ: LLM quantization using the E8 lattice

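GLQ is a post-training LLM weight quantization library. It uses an E8 lattice codebook to encode each group of 8 weights as a 16-bit index, and combines RHT, LDLQ error feedback, and fused CUDA/Triton kernels that run matmul directly on the compressed indices. The README claims support for 2-8 bpw, better results than GPTQ on 10 of 12 metrics for SmolLM3-3B at 4.5 bpw, and roughly 94% of bf16 throughput at 3.5 bpw in vLLM. It also offers E8 compression of the KV cache, which can reduce the fp16 footprint to about 25%.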
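To make the "8 weights per lattice point" idea concrete, here is a minimal sketch of rounding an 8-dimensional vector to its nearest E8 lattice point, using the textbook decomposition of E8 into D8 and D8 shifted by 1/2. It is not GLQ's code: the function names `nearest_d8`, `nearest_e8`, and `bits_per_weight` are hypothetical, and the sketch leaves out RHT, LDLQ error feedback, scaling, and the mapping from a lattice point to a 16-bit index (which requires restricting the lattice to at most 65,536 codebook entries).

```python
from typing import List, Sequence


def nearest_d8(x: Sequence[float]) -> List[float]:
    """Nearest point of D8: integer vectors whose coordinates sum to an even number."""
    rounded = [float(round(v)) for v in x]
    if int(sum(rounded)) % 2 == 0:
        return rounded
    # Odd sum: take the coordinate with the largest rounding error
    # and round it the other way, which restores even parity.
    i = max(range(len(x)), key=lambda k: abs(x[k] - rounded[k]))
    rounded[i] += 1.0 if x[i] >= rounded[i] else -1.0
    return rounded


def nearest_e8(x: Sequence[float]) -> List[float]:
    """Nearest point of E8, treated as the union of D8 and D8 + 1/2."""
    if len(x) != 8:
        raise ValueError("E8 points have exactly 8 coordinates")
    integer_candidate = nearest_d8(x)
    half_candidate = [v + 0.5 for v in nearest_d8([v - 0.5 for v in x])]

    def sq_dist(p: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, p))

    if sq_dist(integer_candidate) <= sq_dist(half_candidate):
        return integer_candidate
    return half_candidate


def bits_per_weight(index_bits: int, group_size: int) -> float:
    """Storage cost of one codebook index spread over its group of weights."""
    return index_bits / group_size


if __name__ == "__main__":
    weights = [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
    print(nearest_e8(weights))
    # One 16-bit index per 8 weights is 2.0 bits per weight.
    print(bits_per_weight(16, 8))
```

One 16-bit index per group of 8 weights works out to 2 bpw, which matches the low end of the range the README advertises. How GLQ reaches the higher rates is not described in this summary, so the sketch does not attempt it.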


