Sumit Kumar Mandal
Eminent Speaker
Short CV: Sumit K. Mandal is currently an Assistant Professor at the Indian Institute of Science, Bangalore. He received his PhD from the University of Wisconsin-Madison. He received Best Paper Awards from ACM TODAES in 2020 and ESWEEK in 2022. His research interests include energy-efficient communication architectures for machine learning applications with emerging technologies.
Title of Talk 1: New Computing Paradigms for Large Language Models (LLMs)
Synopsis: Large Language Models (LLMs) are used to perform various tasks, especially in the domain of natural language processing (NLP). State-of-the-art LLMs consist of a large number of parameters that necessitate a high volume of computations. Currently, GPUs are the preferred hardware platform for LLM inference. However, monolithic GPU-based systems executing large LLMs pose significant drawbacks in terms of fabrication cost and energy efficiency. In this talk, we will present a heterogeneous 2.5D chiplet-based architecture for accelerating LLM inference. Thorough experimental evaluations with a wide variety of LLMs show that the proposed 2.5D system provides up to 972× improvement in latency and 1600× improvement in energy consumption with respect to state-of-the-art GPU-equipped edge devices.
Title of Talk 2: Energy-efficient 2.5D Architecture for Machine Learning Applications
Synopsis: Processing-in-memory (PIM) is a promising technique to accelerate deep learning (DL) workloads. Emerging DL workloads (e.g., ResNet with 152 layers) consist of millions of parameters, which increase the area and fabrication cost of monolithic PIM accelerators. The fabrication cost challenge can be addressed by 2.5D systems that integrate multiple PIM chiplets connected through a network-on-package (NoP). However, server-scale scenarios execute multiple compute-heavy DL workloads simultaneously, leading to significant inter-chiplet data volume. State-of-the-art NoP architectures proposed in the literature do not consider the nature of DL workloads. In this talk, we will discuss a novel server-scale 2.5D manycore architecture that accounts for the traffic characteristics of DL applications. Comprehensive experimental evaluations with different system sizes as well as diverse emerging DL workloads demonstrate that the architecture achieves significant performance and energy-consumption improvements with much lower fabrication cost than state-of-the-art NoP topologies.
Title of Talk 3: Assistive Technology with Transformer Algorithms
Synopsis: Assistive technology for visually impaired individuals is extremely useful because it enables them to perform day-to-day chores independently and instills confidence in them. One important aspect of assistive technology is outdoor navigation for visually impaired people. While several techniques for outdoor navigation exist in the literature, they are mainly limited to obstacle detection. However, guiding a visually impaired person along the sidewalk while walking outdoors is important too. Moreover, assistive technology should ensure low-energy operation to extend the battery life of the device. Therefore, in this talk, we will present an end-to-end technology deployed on an edge device to assist visually impaired people. Specifically, we propose a novel pruning technique for a transformer model that detects sidewalks. The pruning technique ensures low execution latency and low energy consumption when the pruned transformer model is deployed on the edge device. Extensive experimental evaluation shows that the proposed technology provides up to 32.49% improvement in accuracy and 1.4 hours of extension in battery life with respect to a baseline technique.
Sumit Kumar Mandal
Qualifications: Ph.D.
Title: Assistant Professor
Affiliation: Indian Institute of Science, Bangalore
Contact Details:
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/sumitkmandal/
Twitter/X:
Facebook:
Instagram: