Biography
Hi there! I am Hanmin Li, a PhD candidate in Computer Science at KAUST under the supervision of Prof. Peter Richtárik. My research lies at the intersection of optimization and large language models (LLMs), with a focus on training efficiency and scalability. I am also interested in distributed training and the theoretical foundations of learning from decentralized data.
More broadly, my interests extend to the theory of modern machine learning, including first-order methods, convex and non-convex optimization, and operator theory, as well as applied areas like deep learning and language modeling.
Before starting my PhD, I earned my master's degree in Computer Science, also at KAUST, after completing a B.S. in Computer Science and Technology at the School of the Gifted Young at the University of Science and Technology of China (USTC).
Currently, I am working on:
- Distributed training of LLMs on large-scale GPU clusters, including training with PyTorch Distributed Data Parallel (DDP).
- Efficient optimizer design for large-scale training, with a focus on advancing the Muon optimizer and its variants to achieve faster convergence and improved scalability.
- Designing efficient algorithms for LLMs, with a focus on both theoretical analysis and empirical validation.
For any inquiries, feel free to contact me at hanmin.li@kaust.edu.sa.
Work Experience
- Applied Scientist Intern, Microsoft AI, June 2026 – September 2026.
Recent News
-
Starting an Applied Scientist Internship at Microsoft AI
— Jun 01, 2026
I started a new position as an Applied Scientist Intern at Microsoft AI, running from June to September 2026.
-
Talk at the ELLIIT Focus Period in Lund
— May 12, 2026
I gave a talk on Stabilizing Proximal Updates: Trust Regions, Linear Descent, and Connections to Modern ML Optimizers at the ELLIIT Focus Period Optimization for Learning in Lund.
-
Attending NeurIPS 2024
— Dec 16, 2024
This year, I will be attending NeurIPS in Vancouver, Canada.
Papers
-
Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle
, arXiv preprint. • [paper]
-
Broximal Alignment for Global Non-Convex Optimization
, arXiv preprint. • [paper]
-
Stabilized Proximal Point Method via Trust Region Control
, arXiv preprint. • [paper]
-
The Ball-Proximal (="Broximal") Point Method: a New Algorithm, Convergence Theory, and Applications
, arXiv preprint. • [paper] • [BibTex]
-
The Power of Extrapolation in Federated Learning
, NeurIPS 2024. • [paper] • [BibTex]
-
On the Convergence of FedProx with Extrapolation and Inexact Prox
, NeurIPS 2024 OPT-ML Workshop Poster. • [paper] • [BibTex]
-
Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization
, ICLR 2024. • [paper] • [BibTex]
-
Variance reduced distributed non-convex optimization using matrix stepsizes
, NeurIPS 2023 FL@FM Workshop. • [paper] • [BibTex]
-
SD2: spatially resolved transcriptomics deconvolution through integration of dropout and spatial information
, Bioinformatics. • [paper] • [BibTex]
Talks
-
Stabilizing Proximal Updates: Trust Regions, Linear Descent, and Connections to Modern ML Optimizers
May 12, 2026 — ELLIIT Focus Period Optimization for Learning, Lund, Sweden
• [Event]
-
Poster Presentation of On the Convergence of FedProx with Extrapolation and Inexact Prox
Dec 15, 2024 — NeurIPS 2024 OPT-ML Workshop, Vancouver, Canada
-
Poster Presentation of The Power of Extrapolation in Federated Learning
Dec 11, 2024 — NeurIPS 2024, Vancouver, Canada
-
Talk of Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization
Jun 27, 2024 — EUROPT 2024, Lund, Sweden
-
Poster Presentation of Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization
May 07, 2024 — ICLR 2024, Vienna, Austria
Reviewer Services
- NeurIPS ’24, ’25
- NeurIPS OPT-ML ’24
- ICLR ’25
- ICML ’25
- JMLR
- IEEE TNNLS
- IEEE TSP
- Optimization Methods and Software