CCoM Events Abstract


Directors:
Randolph E. Bank
Philip E. Gill
Michael Holst


Jinjie Zhang
UCSD

Abstract:

Quantization is a compression technique that reduces the computation cost, memory footprint, and power consumption of deep neural networks (DNNs). In this talk, we will focus on a post-training quantization algorithm, GPFQ, which is based on a deterministic greedy path-following mechanism, and on its stochastic variant, SGPFQ. In both cases, we rigorously analyze the associated quantization error bounds and show that, when quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights, i.e., in the level of over-parametrization. To evaluate the method empirically, we quantize several common DNN architectures with few bits per weight and test them on ImageNet, showing only a minor loss of accuracy compared to the unquantized models. This is joint work with Rayan Saab and Yixuan Zhou.
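
For readers unfamiliar with the idea, the following is a minimal sketch of what a deterministic greedy path-following quantizer for a single neuron might look like. The abstract does not state the update rule, so everything here is an assumption made for illustration: the function name `greedy_quantize`, the representation of the data as a list of feature columns, and the specific rule of picking each quantized weight so that the running error vector `u` stays small. It should not be read as the speakers' implementation.

```python
def greedy_quantize(columns, w, alphabet):
    """Quantize the weights w of one neuron, one weight at a time.

    columns  : list of N feature columns X_t, each a list of m floats
               (hypothetical data layout, chosen for this sketch)
    w        : list of N real-valued weights
    alphabet : list of allowed quantized values, e.g. [-1, 0, 1]

    Returns (q, u), where q holds the quantized weights and
    u = sum_t (w_t - q_t) * X_t is the accumulated output error.
    """
    m = len(columns[0])
    u = [0.0] * m  # running error between analog and quantized outputs
    q = []
    for x_t, w_t in zip(columns, w):
        norm_sq = sum(x * x for x in x_t)
        if norm_sq == 0.0:
            # A zero column does not affect the output; round w_t directly.
            target = w_t
        else:
            # Projection of (u + w_t * X_t) onto X_t: the real number that
            # would cancel as much of the error as possible along X_t.
            target = sum(x * (r + w_t * x) for x, r in zip(x_t, u)) / norm_sq
        # Greedy step: nearest alphabet element to the ideal value.
        q_t = min(alphabet, key=lambda a: abs(a - target))
        q.append(q_t)
        # Update the running error with the chosen quantized weight.
        u = [r + (w_t - q_t) * x for r, x in zip(u, x_t)]
    return q, u
```

With orthonormal columns the rule reduces to rounding each weight to its nearest alphabet element; with correlated columns, later choices can compensate for earlier rounding errors, which is the intuition behind error bounds that improve as the number of weights grows. A stochastic variant would presumably replace the nearest-element step with randomized rounding, but that detail is likewise an assumption here.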

Tuesday, November 29, 2022
11:00 AM, AP&M 2402 and Zoom ID 986 1678 1113

Center for Computational Mathematics 9500 Gilman Dr. #0112 La Jolla, CA 92093-0112 Tel: (858)534-9056 Fax: (858)534-5273