LLM Projects

Dynamic Activation Function for Efficient Inference of LLMs

This project focuses on developing a dynamic activation function for efficient inference, providing hands-on experience in optimizing language models. Beyond activation functions, the research will also cover methods such as quantization to further improve model efficiency. The proposed activation function is a dynamic linear combination of ReLU and GELU that evolves during fine-tuning: the weight assigned to ReLU is gradually increased relative to GELU until the function converges to pure ReLU.
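
As a rough illustration of the idea, the sketch below blends ReLU and GELU with a mixing weight that is pushed toward ReLU over the course of fine-tuning. The class name, the `total_steps` parameter, and the linear annealing schedule are assumptions made for illustration, not the project's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicReluGelu(nn.Module):
    """Convex combination of ReLU and GELU whose mixing weight is annealed
    toward pure ReLU during fine-tuning (illustrative sketch; the linear
    schedule and names here are assumptions)."""

    def __init__(self, total_steps: int):
        super().__init__()
        self.total_steps = total_steps
        # alpha = 0 -> pure GELU, alpha = 1 -> pure ReLU. Stored as a buffer
        # so it is saved with the model but not updated by the optimizer.
        self.register_buffer("alpha", torch.tensor(0.0))

    def step(self, current_step: int) -> None:
        # Assumed linear annealing schedule; other schedules would work too.
        self.alpha.fill_(min(current_step / self.total_steps, 1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Weighted blend of the two activations; at alpha = 1 this is exactly
        # ReLU, which is cheaper at inference time.
        return self.alpha * F.relu(x) + (1.0 - self.alpha) * F.gelu(x)
```

At inference time, once the weight has converged to ReLU, the GELU branch can be dropped entirely, which is where the efficiency gain comes from.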

Pre-Activations Research for Hysteresis Activation Function

This project offers an opportunity to take part in research on HeLU, a novel activation function that could serve as an efficient alternative to GELU. Through hands-on research, students will collect real-time statistics during neural network training, focusing on key quantities such as pre-activation and gradient distributions. The project will investigate how these statistics relate to the optimal behavior of HeLU, with the goal of refining its implementation for both language and vision tasks. By the end of the project, students will gain valuable experience in evaluating activation functions and optimizing them based on real-time training data. Building on previous findings, this project aims to extend those results, with the potential for a conference submission upon successful completion.
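
As one possible starting point for the statistics collection, the sketch below uses PyTorch forward hooks to record the mean and standard deviation of pre-activations during training. The helper name `attach_preactivation_hooks` and the choice of statistics are assumptions; gradient distributions could be gathered analogously with backward hooks (e.g. `register_full_backward_hook`).

```python
import torch.nn as nn
from collections import defaultdict


def attach_preactivation_hooks(model: nn.Module, stats: dict) -> list:
    """Attach forward hooks that record mean/std of the inputs to each
    activation module (i.e. the pre-activations). Illustrative sketch only."""
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, (nn.ReLU, nn.GELU)):
            def hook(mod, inputs, output, name=name):
                # inputs[0] is the tensor fed into the activation,
                # i.e. the pre-activation values.
                pre = inputs[0].detach()
                stats[name].append((pre.mean().item(), pre.std().item()))
            handles.append(module.register_forward_hook(hook))
    return handles


# Usage sketch (the model and training loop are assumed to exist elsewhere):
# stats = defaultdict(list)
# handles = attach_preactivation_hooks(model, stats)
# ...run training steps, then inspect `stats`...
# for h in handles:
#     h.remove()
```

Statistics collected this way can then be compared across layers and training stages to see how the pre-activation distributions relate to HeLU's behavior.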