Hanmin Li

Biography

Hi there! I am Hanmin Li, a Ph.D. candidate in Computer Science at KAUST in the Center of Excellence for Generative AI, under the supervision of Prof. Peter Richtárik. My research brings together optimization foundations and scalable learning systems, with an emphasis on the efficient training of large language models (LLMs).

My theoretical work spans convex and non-convex optimization, proximal methods, trust-region methods, linear minimization oracles, Frank-Wolfe methods, and matrix-valued optimization. On the systems side, I work on distributed training and inference. I also work on long-horizon tool-using agents, RLHF and RLVR, LLM evaluation, model compression, activation steering, and MoE pruning.

I am currently an Applied Scientist Intern at Microsoft AI, working on long-horizon LLM agents, MoE model pruning, activation steering, and post-training.

Before starting my Ph.D., I earned my master's degree in Computer Science, also at KAUST, after completing a B.S. in Computer Science and Technology at the School of the Gifted Young at the University of Science and Technology of China (USTC).

Currently, I am working on:

Optimization methods for scalable learning, including proximal and trust-region methods, local linear minimization oracles, matrix stepsizes, and geometry-aware optimizer design.
Distributed training and inference for dense and MoE models on large-scale GPU clusters, using data, tensor, pipeline, expert, and context parallelism.
Long-horizon agent systems, post-training, and model efficiency, including custom evaluation harnesses, LLM-as-a-judge evaluation, model compression, activation steering, and MoE pruning.

For any inquiries, feel free to contact me at hanmin.li@kaust.edu.sa.

Work Experience

Applied Scientist Intern at Microsoft AI, June 2026 – Present.

Papers

Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle
Peter Richtárik, Kaja Gruntkowska, Hanmin Li, arXiv preprint. • [paper]

Broximal Alignment for Global Non-Convex Optimization
Kaja Gruntkowska, Hanmin Li, Xun Qian, Peter Richtárik, arXiv preprint. • [paper]

Stabilized Proximal Point Method via Trust Region Control
Hanmin Li, Kaja Gruntkowska, Peter Richtárik, arXiv preprint. • [paper]

The Ball-Proximal (="Broximal") Point Method: a New Algorithm, Convergence Theory, and Applications
Kaja Gruntkowska, Hanmin Li, Aadi Rane, Peter Richtárik, arXiv preprint. • [paper] • [BibTex]

The Power of Extrapolation in Federated Learning
Hanmin Li, Kirill Acharya, Peter Richtárik, NeurIPS 2024. • [paper] • [BibTex]

On the Convergence of FedProx with Extrapolation and Inexact Prox
Hanmin Li, Peter Richtárik, NeurIPS 2024 OPT-ML Workshop Poster. • [paper] • [BibTex]

Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization
Hanmin Li, Avetik Karagulyan, Peter Richtárik, ICLR 2024. • [paper] • [BibTex]

Variance reduced distributed non-convex optimization using matrix stepsizes
Hanmin Li, Avetik Karagulyan, Peter Richtárik, NeurIPS 2023 FL@FM Workshop. • [paper] • [BibTex]

SD2: spatially resolved transcriptomics deconvolution through integration of dropout and spatial information
Haoyang Li, Hanmin Li, Juexiao Zhou, Xin Gao, Bioinformatics. • [paper] • [BibTex]

→ full list

Talks

Stabilizing Proximal Updates: Trust Regions, Linear Descent, and Connections to Modern ML Optimizers
May 12, 2026 — ELLIIT Focus Period Optimization for Learning, Lund, Sweden

Poster presentation: On the Convergence of FedProx with Extrapolation and Inexact Prox
Dec 15, 2024 — NeurIPS 2024 OPT-ML Workshop, Vancouver, Canada

Poster presentation: The Power of Extrapolation in Federated Learning
Dec 11, 2024 — NeurIPS 2024, Vancouver, Canada

Talk: Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization
Jun 27, 2024 — EUROPT 2024, Lund, Sweden

Poster presentation: Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization
May 07, 2024 — ICLR 2024, Vienna, Austria

→ full list

Reviewer Services

NeurIPS ’24, ’25
NeurIPS OPT-ML ’24
ICLR ’25
ICML ’25
JMLR
IEEE TNNLS
IEEE TSP
Optimization Methods and Software

Hanmin Li (李瀚民)
