Biography
Hi there! I am Hanmin Li, a PhD candidate in Computer Science at KAUST under the supervision of Prof. Peter Richtárik. My research lies at the intersection of optimization and large language models (LLMs), with a focus on training efficiency and scalability. I am also interested in distributed training and the theoretical foundations of learning from decentralized data.
More broadly, my interests extend to the theory of modern machine learning, including first-order methods, convex and non-convex optimization, and operator theory, as well as applied areas such as deep learning and language modeling.
Before starting my PhD, I earned my master's degree in Computer Science, also at KAUST, after completing a B.S. in Computer Science and Technology at the School of the Gifted Young at the University of Science and Technology of China (USTC).
Currently, I am working on:
- Distributed training of large language models (LLMs), including experience with large-scale GPU clusters and training with PyTorch Distributed Data Parallel (DDP); a minimal DDP sketch follows this list.
- Efficient optimizer design for large-scale training, with a focus on advancing the Muon optimizer and its variants to achieve faster convergence and improved scalability.
- Designing efficient algorithms for LLMs, combining theoretical analysis with empirical validation.
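For illustration, here is a minimal PyTorch DDP training sketch of the kind of setup referred to above. It is a generic example, assuming a `torchrun` launch, with a placeholder linear model and synthetic data rather than code from my actual projects.

```python
# Minimal PyTorch DDP training sketch (illustrative only).
# Assumes launch via: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
# The model and data below are placeholders, not from my actual projects.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model wrapped in DDP, plus synthetic data.
    model = torch.nn.Linear(128, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    data = TensorDataset(torch.randn(1024, 128), torch.randn(1024, 1))
    sampler = DistributedSampler(data)  # shards the data across ranks
    loader = DataLoader(data, batch_size=32, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # gradients are all-reduced across ranks here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```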
For any inquiries, feel free to contact me at hanmin.li@kaust.edu.sa.
I am currently open to internship opportunities in related areas.
Recent News
- Attending NeurIPS 2024 — Dec 16, 2024: This year, I will be attending NeurIPS in Vancouver, Canada.
- Attending EUROPT 2024 — Jun 06, 2024: I have been invited to give a talk at the 21st Conference on Advances in Continuous Optimization (EUROPT 2024) in Lund, Sweden.
- Attending ICLR 2024 — May 07, 2024: This year, I will be attending ICLR in Vienna, Austria.
Papers
- The Ball-Proximal (=“Broximal”) Point Method: a New Algorithm, Convergence Theory, and Applications, arXiv preprint. • [paper] • [BibTex]
- The Power of Extrapolation in Federated Learning, NeurIPS 2024. • [paper] • [BibTex]
- On the Convergence of FedProx with Extrapolation and Inexact Prox, NeurIPS 2024 OPT-ML Workshop Poster. • [paper] • [BibTex]
- Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization, ICLR 2024. • [paper] • [BibTex]
- Variance reduced distributed non-convex optimization using matrix stepsizes, NeurIPS 2023 FL@FM Workshop. • [paper] • [BibTex]
- SD2: spatially resolved transcriptomics deconvolution through integration of dropout and spatial information, Bioinformatics. • [paper] • [BibTex]
Talks
- Poster presentation of On the Convergence of FedProx with Extrapolation and Inexact Prox — Dec 15, 2024, NeurIPS 2024 OPT-ML Workshop, Vancouver, Canada
- Poster presentation of The Power of Extrapolation in Federated Learning — Dec 11, 2024, NeurIPS 2024, Vancouver, Canada
- Talk on Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization — Jun 27, 2024, EUROPT 2024, Lund, Sweden
- Poster presentation of Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization — May 07, 2024, ICLR 2024, Vienna, Austria
Reviewer Services
- NeurIPS '24, '25
- NeurIPS OPT-ML Workshop '24
- ICLR '25
- ICML '25
- JMLR
- IEEE TNNLS
- IEEE TSP
- Optimization Methods and Software