Post-training Quantization for Deep Neural Networks with Provable Guarantees
Quantization is a compression technique that reduces the computation cost, memory footprint, and power consumption of deep neural networks (DNNs). In this talk, we will focus on GPFQ, a post-training quantization algorithm based on a deterministic greedy path-following mechanism, and on its stochastic variant, SGPFQ. In both cases, we rigorously analyze the associated quantization error bounds and show that, for quantizing a single-layer network, the relative squared error essentially decays linearly in the number of weights, i.e., in the level of over-parametrization. To empirically evaluate the method, we quantize several common DNN architectures with a few bits per weight and test them on ImageNet, observing only a minor loss of accuracy compared to the unquantized models. This is joint work with Rayan Saab and Yixuan Zhou.
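For readers who want a concrete picture of the greedy path-following mechanism, here is a minimal numpy sketch of GPFQ-style quantization of a single neuron's weights: each weight is rounded to an alphabet so as to keep the running residual between the analog and quantized forward passes small. The alphabet construction, function names, and the stochastic-rounding rule (a stand-in for the SGPFQ-style variant) are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def round_to_alphabet(value, alphabet, stochastic=False, rng=None):
    """Round `value` onto the (sorted) alphabet: nearest element for the
    deterministic variant, or randomly between the two bracketing elements
    for a stochastic-rounding variant (an assumed stand-in for SGPFQ)."""
    if not stochastic:
        return alphabet[np.argmin(np.abs(alphabet - value))]
    rng = rng or np.random.default_rng()
    value = np.clip(value, alphabet[0], alphabet[-1])
    hi = min(max(np.searchsorted(alphabet, value), 1), len(alphabet) - 1)
    lo = hi - 1
    p = (value - alphabet[lo]) / (alphabet[hi] - alphabet[lo])
    return alphabet[hi] if rng.random() < p else alphabet[lo]

def gpfq_single_neuron(X, w, alphabet, stochastic=False, rng=None):
    """Greedy path-following quantization of a weight vector w (length N),
    given an m x N data matrix X. Maintains the residual u = X w_{1:t} - X q_{1:t}
    and picks each q_t to (greedily) minimize ||u + (w_t - q) X_t||."""
    m, N = X.shape
    u = np.zeros(m)   # running residual between analog and quantized paths
    q = np.zeros(N)
    for t in range(N):
        Xt = X[:, t]
        norm2 = Xt @ Xt
        # Real-valued minimizer of ||u + (w_t - q) X_t||^2, then rounded.
        target = w[t] + (Xt @ u) / norm2 if norm2 > 0 else w[t]
        q[t] = round_to_alphabet(target, alphabet, stochastic, rng)
        u += (w[t] - q[t]) * Xt   # update the residual path
    return q, u

# Example: relative squared error ||Xw - Xq||^2 / ||Xw||^2 on random data.
rng = np.random.default_rng(0)
m, N, bits = 512, 256, 3
X = rng.standard_normal((m, N))
w = rng.standard_normal(N) / np.sqrt(N)
levels = 2 ** (bits - 1)
delta = np.max(np.abs(w)) / levels
alphabet = delta * np.arange(-levels, levels + 1)  # midtread alphabet (an assumption)
q, u = gpfq_single_neuron(X, w, alphabet)
print("relative squared error:", (u @ u) / np.sum((X @ w) ** 2))
```

By construction the final residual satisfies u = X(w - q), so the printed ratio is exactly the relative squared error whose decay in N the talk analyzes.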
Tuesday, November 29, 2022
11:00 AM, AP&M 2402 and via Zoom (ID: 986 1678 1113)
Center for Computational Mathematics, 9500 Gilman Dr. #0112, La Jolla, CA 92093-0112. Tel: (858) 534-9056