Deep neural networks (DNNs), commonly employed for complex tasks such as image and language processing, are increasingly sought for deployment on Internet of Things (IoT) devices. These devices operate under tight resource constraints: limited computational power, little memory, slow processors, and restricted energy budgets. Optimizing DNN models to minimize memory usage and inference time is therefore crucial. Traditional optimization methods, however, require skilled professionals to fine-tune hyperparameters manually in order to balance efficiency against accuracy. This paper introduces a solution for identifying optimal hyperparameters automatically, focusing on the application of pruning, clustering, and quantization. Initial empirical analyses were conducted to understand the relationships between model size, accuracy, pruning rate, and the number of clusters. Building on these findings, we developed a framework based on two algorithms: one for discovering the optimal pruning rate and one for determining the optimal number of clusters. By combining these algorithms with the best quantization configuration, our tool integrates an optimization procedure that reduces both model size and inference time. The optimized models achieve results comparable to, and in some cases better than, those of more complex state-of-the-art approaches. The framework optimized ResNet50, reducing the model size by 6.35x with a speedup of 2.91x, while sacrificing only 0.87% of the original accuracy.
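The abstract does not describe how the pruning-rate search works, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes magnitude-based pruning, a binary search over the pruning rate, and accuracy that does not increase as the rate grows. The names `magnitude_prune`, `find_pruning_rate`, `evaluate`, and `max_drop` are hypothetical and chosen for illustration.

```python
import numpy as np


def magnitude_prune(weights, rate):
    """Return a copy of `weights` with the smallest-magnitude fraction `rate` zeroed.

    Assumption: plain global magnitude pruning. If several weights tie at the
    threshold, all of them are zeroed, so slightly more than `rate` may be pruned.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    magnitudes = np.abs(weights).ravel()
    k = int(rate * magnitudes.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude, found without a full sort
    threshold = np.partition(magnitudes, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned


def find_pruning_rate(weights, evaluate, max_drop, steps=20):
    """Binary-search the largest pruning rate whose accuracy drop is <= max_drop.

    `evaluate` is a hypothetical callable mapping a weight array to an accuracy
    in [0, 1]. The search assumes accuracy is non-increasing in the rate.
    """
    baseline = evaluate(weights)
    low, high = 0.0, 0.99
    for _ in range(steps):
        mid = (low + high) / 2
        drop = baseline - evaluate(magnitude_prune(weights, mid))
        if drop <= max_drop:
            low = mid   # tolerated: try pruning more
        else:
            high = mid  # too much accuracy lost: prune less
    return low


if __name__ == "__main__":
    # Toy stand-in for a real validation run: "accuracy" is the share of
    # weights that survive pruning. A real setup would evaluate the model.
    toy_weights = np.arange(1, 101, dtype=float)
    toy_accuracy = lambda w: np.count_nonzero(w) / w.size
    rate = find_pruning_rate(toy_weights, toy_accuracy, max_drop=0.25)
    print(f"selected pruning rate: {rate:.3f}")
```

In practice `evaluate` would run the pruned model on a validation set, which makes each search step expensive; a binary search keeps the number of evaluations logarithmic in the desired resolution. A comparable search over the number of clusters could follow the same pattern, under the same monotonicity assumption.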
Publication details
2024, Advanced Information Networking and Applications. AINA 2024, Pages 57-68 (volume: 203)
Targeted and Automatic Deep Neural Networks Optimization for Edge Computing (04b Conference paper in volume)
Giovannesi Luca, Proietti Mattia Gabriele, Beraldi Roberto
ISBN: 9783031579301; 9783031579318