Biography

Hi there! I am Hanmin Li, a PhD candidate in Computer Science at KAUST, advised by Prof. Peter Richtárik. My research lies at the intersection of optimization and large language models (LLMs), with a focus on training efficiency and scalability. I am also interested in distributed training and the theoretical foundations of learning from decentralized data.

More broadly, my interests extend to the theory of modern machine learning, including first-order methods, convex and non-convex optimization, and operator theory, as well as applied areas such as deep learning and language modeling.

Before starting my PhD, I earned my master's degree in Computer Science, also at KAUST, after completing a B.S. in Computer Science and Technology at the School of the Gifted Young, University of Science and Technology of China (USTC).

Currently, I am working on:

  • Distributed training of large language models (LLMs), including hands-on experience with large-scale GPU clusters and PyTorch Distributed Data Parallel (DDP); a minimal sketch of such a setup follows this list.
  • Efficient optimizer design for large-scale training, with a focus on advancing the Muon optimizer and its variants to achieve faster convergence and improved scalability.
  • Designing efficient algorithms for large language models (LLMs), with a focus on both theoretical analysis and empirical validation.
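
Below is a minimal, illustrative sketch of a distributed data-parallel training loop with PyTorch DDP, as referenced in the first item above. The toy model, synthetic data, hyperparameters, and the script name train_ddp.py are placeholders for illustration, not taken from any of my projects.

```python
# train_ddp.py -- minimal PyTorch DDP sketch (illustrative placeholders only).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model; in practice this would be a transformer language model.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for step in range(10):
        # Synthetic batch standing in for a real data loader with a DistributedSampler.
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()   # DDP all-reduces (averages) gradients across ranks here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with, for example, torchrun --nproc_per_node=4 train_ddp.py; gradient averaging across ranks happens automatically inside backward(), so the per-rank code reads like ordinary single-GPU training.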

For any inquiries, feel free to contact me at hanmin.li@kaust.edu.sa.

I am currently open to internship opportunities in related areas.

Recent News

  • Attending NeurIPS 2024 — Dec 16, 2024

    This year, I will be attending NeurIPS in Vancouver, Canada.

  • Attending EUROPT 2024 — Jun 06, 2024

    I was invited to give a talk at the 21st Conference on Advances in Continuous Optimization (EUROPT 2024) in Lund, Sweden.

  • Attending ICLR 2024 — May 07, 2024

    This year, I will be attending ICLR in Vienna, Austria.


Papers

  • The Ball-Proximal (="Broximal") Point Method: a New Algorithm, Convergence Theory, and Applications
    Kaja Gruntkowska, Hanmin Li, Aadi Rane, Peter Richtárik, arXiv preprint. • [paper][BibTex]

  • The Power of Extrapolation in Federated Learning
    Hanmin Li, Kirill Acharya, Peter Richtárik, NeurIPS 2024. • [paper][BibTex]

  • On the Convergence of FedProx with Extrapolation and Inexact Prox
    Hanmin Li, Peter Richtárik, NeurIPS 2024 OPT-ML Workshop (poster). • [paper][BibTex]

  • Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization
    Hanmin Li, Avetik Karagulyan, Peter Richtárik, ICLR 2024. • [paper][BibTex]

  • Variance reduced distributed non-convex optimization using matrix stepsizes
    Hanmin Li, Avetik Karagulyan, Peter Richtárik, NeurIPS 2023 FL@FM Workshop. • [paper][BibTex]

  • SD2: spatially resolved transcriptomics deconvolution through integration of dropout and spatial information
    Haoyang Li, Hanmin Li, Juexiao Zhou, Xin Gao, Bioinformatics. • [paper][BibTex]

→ full list


Talks

  • Poster Presentation of On the Convergence of FedProx with Extrapolation and Inexact Prox
    Dec 15, 2024 — NeurIPS 2024 OPT-ML Workshop, Vancouver, Canada

  • Poster Presentation of The Power of Extrapolation in Federated Learning
    Dec 11, 2024 — NeurIPS 2024, Vancouver, Canada

  • Talk on Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization
    Jun 27, 2024 — EUROPT 2024, Lund, Sweden

  • Poster Presentation of Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization
    May 07, 2024 — ICLR 2024, Vienna, Austria

→ full list


Reviewer Services