COMP6211J 2025 Fall

Advanced Large-Scale Machine Learning for Foundation Models

Overview

In recent years, foundation models have fundamentally changed the state of the art in artificial intelligence. As a result, the computation involved in the training, inference, and RL alignment of foundation models is among the most important workflows running on modern computer systems. This course examines how to deploy such workflows efficiently from a systems perspective. Specifically, we will: i) explain how a modern machine learning system (e.g., PyTorch) works; ii) analyze the performance bottlenecks of machine learning computation on modern hardware (e.g., Nvidia GPUs); iii) discuss the main parallel strategies in foundation model training (data-, pipeline-, tensor model-, optimizer-, sequence-, and MoE-parallelism); iv) cover the real-world deployment of foundation models, including various inference-time optimizations; and v) study the adaptation and systematic evaluation of LLMs.

Syllabus

Date	Topic
W1 - 09/02, 09/04	- Introduction and Logistics [Slides] - Stochastic Gradient Descent [Slides]
W2 - 09/09, 09/11	- Auto-Differentiation [Slides] - Nvidia GPU Computation and Communication [Slides]
W3 - 09/16, 09/18	- LLM Pretraining - Data-, Pipeline-, Optimizer- Parallelism
W4 - 09/23, 09/25	- Tensor Model-, Sequence-, MoE- Parallelism - Generative Inference Introduction
W5 - 09/30, 10/02	- Generative Inference Optimization - Prompt Engineering and Inference Scaling
W6 - 10/09	- RAG and LLM Agent
W7 - 10/14, 10/16	- PEFT and RL Alignment - LLM Evaluation
W8 - 10/21, 10/23	- Presentation Session-1 - Presentation Session-2
W9 - 10/28, 10/30	- Presentation Session-3 - Presentation Session-4
W10 - 11/04, 11/06	- Presentation Session-5 - Presentation Session-6
W11 - 11/11, 11/13	- Presentation Session-7 - Presentation Session-8
W12 - 11/18, 11/20	- Presentation Session-9 - Presentation Session-10
W13 - 11/25, 11/27	- Presentation Session-11 - Final Review

Grading

In-class Presentation (30%), covering one target paper only:
- Clearly organize the material and present the problem definition, motivation, methodology, and evaluation appropriately. (20%)
- Answer questions from the lecturers and other students appropriately. (5%)
- Submit short feedback for all the other presentation sessions under the same category. (5%)
- (Feedback from other students determines 70% of the grade for this part.)
Course Report (70%):
- Literature review (50%):
  - Cover the relevant techniques exhaustively. (10%)
  - Understand the relevant techniques correctly. (15%)
  - Organize the techniques using good categorization. (15%)
  - Write the report in professional academic English. (10%)
  - Page limit: 4 pages in the NeurIPS template (excluding references).
- Research plan (20%):
  - The proposed research plan is executable. (10%)
  - The proposed research plan includes novelty and a concrete design. (10%)
  - Page limit: 4 pages in the NeurIPS template (excluding references).

Topics for Literature Review:

[Presentation Paper List]

Relaxed-System-Lab/HKUST-COMP6211J-2025fall