Towards Edge Large Language Models via Model Compression
In this project, we aim to develop novel and efficient model compression algorithms to enable the deployment and inference of large language models on resource-constrained hardware.
Keywords: TinyML, Large Language Model, Machine Learning
Large language models (LLMs) have become foundational models for a wide range of applications, including natural language processing, computer vision, and multimodal tasks. These models can process both visual and textual data, holding significant potential for edge applications such as real-time language translation, smart surveillance, and human-computer interaction. However, deploying LLMs on edge devices presents major challenges due to the limited computational power, memory, and energy capacity of such hardware.
Model compression techniques, including quantization, pruning, distillation, and lightweight architecture design, aim to reduce storage and computation costs, enabling model deployment on edge hardware. However, state-of-the-art compression methods for LLMs still face substantial accuracy degradation, especially when handling multimodal tasks.
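As a minimal sketch of the kind of technique involved, the snippet below applies post-training dynamic quantization to a small open causal language model with PyTorch, converting the weights of linear layers to int8. The model name "facebook/opt-125m" is an illustrative stand-in only; the project is not tied to a particular model, toolkit, or quantization scheme.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative small model; any causal LM with nn.Linear layers works similarly.
model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Dynamic quantization: weights of linear layers are stored in int8,
# activations are quantized on the fly during CPU inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# The quantized model is used through the same generation API as before.
inputs = tokenizer("Edge deployment of language models", return_tensors="pt")
with torch.no_grad():
    out = quantized.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Such off-the-shelf post-training quantization reduces weight storage roughly fourfold for the quantized layers, but it is exactly the setting where the accuracy degradation mentioned above tends to appear, which motivates the more careful compression algorithms targeted in this project.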
In this project, we aim to develop novel and efficient model compression algorithms to enable the deployment and inference of LLMs on resource-constrained hardware. Our goal is to minimize the accuracy loss typically associated with LLM compression while maximizing inference speed and reducing memory usage, with performance evaluated on real-world hardware platforms.
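A rough sense of the inference-speed side of this evaluation can be given by a simple wall-clock benchmark, sketched below. The helper `benchmark_generate` is hypothetical and assumes the `model`/`tokenizer` pair from the earlier snippet; a real evaluation on edge hardware would additionally measure memory footprint and energy with platform-specific profiling tools.

```python
import time
import torch

def benchmark_generate(model, tokenizer, prompt: str, new_tokens: int = 32, runs: int = 5):
    """Rough CPU latency benchmark; results depend heavily on the host hardware."""
    inputs = tokenizer(prompt, return_tensors="pt")
    # Warm-up run so one-off allocation costs do not skew the timing.
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=new_tokens)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        with torch.no_grad():
            model.generate(**inputs, max_new_tokens=new_tokens)
        times.append(time.perf_counter() - start)
    avg = sum(times) / len(times)
    print(f"avg latency: {avg:.2f}s  ({new_tokens / avg:.1f} tokens/s)")

# Example usage, comparing the original and quantized models from the sketch above:
# benchmark_generate(model, tokenizer, "Edge deployment of language models")
# benchmark_generate(quantized, tokenizer, "Edge deployment of language models")
```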