Fall 2023 - CMPT 419 D200

Special Topics in Artificial Intelligence (3)

Reinforcement Learning Theory

Class Number: 7817

Delivery Method: In Person


  • Course Times + Location:

    Sep 6 – Dec 5, 2023: Fri, 2:30–5:20 p.m.



Current topics in artificial intelligence depending on faculty and student interest.


Modern applications in machine learning require collecting data in an online fashion and reasoning about the decisions used to gather it. Sequential decision-making under uncertainty focuses on problems that involve interacting with the world, collecting data, and reasoning about it, all with incomplete information about the world.  Reinforcement Learning (RL) is a general framework for studying interactive learning and has been used to develop algorithms with applications to clinical trials in medicine, monitoring industrial plants, robotics, games such as Atari and Go, and computational marketing. 

**Students taking the course are expected to have an understanding of basic probability, the basics of concentration inequalities, linear algebra, and convex optimization.**

This course (Theoretical Foundations of Reinforcement Learning) introduces the foundational concepts of bandits and reinforcement learning (RL). It will give the students experience in
1. Proving theoretical guarantees for reinforcement learning algorithms
2. Mapping problems in practical applications (e.g. recommender systems, social networks) to the RL framework
3. Developing and analyzing new bandit and RL algorithms

* Note that this course is cross-listed as a special topics course in both Artificial Intelligence and Theory, and will count towards the breadth requirement in either area.

- Basics: Concentration inequalities, (Online) Convex Optimization
- Bandits: Multi-armed/Contextual bandits framework and the exploration-exploitation trade-off
- Bandits: Algorithms: Epsilon-greedy, Upper-confidence Bound, Thompson sampling
- Markov Decision Processes: Structural properties, Bellman equation, Linear programming view of MDPs
- MDPs: Algorithms in the tabular setting: Policy Evaluation, Temporal Difference Learning, Value Iteration, Policy Iteration
- MDPs: Algorithms with function approximation: Approximate policy iteration, Politex, Policy gradients
- MDPs: Sample-complexity of model-based learning of MDPs under a generative model
- MDPS: Regret minimization in the online setting using UCRL, LSVI-UCB
- Constrained MDPs: Primal-dual algorithms for planning



There will be a couple of assignments with the major evaluation components being a final project. The details will be discussed in the first week of classes.




  • Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto
  • Bandit Algorithms. Tor Lattimore and Csaba Szepesvari
  • Markov Decision Processes: Discrete Stochastic Dynamic Programming. Martin L. Puterman
  • Reinforcement Learning: Theory and Algorithms. Alekh Agarwal,  Nan Jiang, Sham M. Kakade and Wen Sun.


Your personalized Course Material list, including digital and physical textbooks, are available through the SFU Bookstore website by simply entering your Computing ID at: shop.sfu.ca/course-materials/my-personalized-course-materials.

Registrar Notes:


SFU’s Academic Integrity website http://www.sfu.ca/students/academicintegrity.html is filled with information on what is meant by academic dishonesty, where you can find resources to help with your studies and the consequences of cheating. Check out the site for more information and videos that help explain the issues in plain English.

Each student is responsible for his or her conduct as it affects the university community. Academic dishonesty, in whatever form, is ultimately destructive of the values of the university. Furthermore, it is unfair and discouraging to the majority of students who pursue their studies honestly. Scholarly integrity is required of all members of the university. http://www.sfu.ca/policies/gazette/student/s10-01.html


Students with a faith background who may need accommodations during the semester are encouraged to assess their needs as soon as possible and review the Multifaith religious accommodations website. The page outlines ways they begin working toward an accommodation and ensure solutions can be reached in a timely fashion.