About Me


I am a PhD student in the Machine Learning Group at the University of Toronto and the Vector Institute. I am primarily interested in neural networks, optimization, Bayesian inference, generative models, and reinforcement learning. I am also interested in applying insights about how the brain works to improve neural networks.

I obtained my Master's degree in Computer Science at Simon Fraser University. My MSc research focused on belief change, which is an area of knowledge representation concerned with updating knowledge bases in light of new information. I developed a Python package called Equibel to make it easier for researchers to experiment with belief change in multi-agent systems.

I received my BSc in computer science from Simon Fraser University in Winter 2014, and my Master's in 2016.


As a Canadian, I enjoy salmon, maple syrup, and their combination (salmon candy)! I also like sports based on water in different physical phases: swimming, skating, and skiing. As a music-lover, I support the Vancouver Symphony Orchestra and the Vancouver Opera. I myself play the saxophone and the guitar.

A collection of my favorite books can be found on my bookshelf.


March 13, 2019 Presented Self-Tuning Networks at the Vector Endless Summer School
February 22, 2019 Presented Self-Tuning Networks at the Vector Research Symposium
December 20, 2018 Self-Tuning Networks accepted to ICLR 2019
September 4, 2018 Reversible Recurrent Neural Networks accepted to NIPS 2018
August 29, 2018 Attended the Vector Science Communications Workshop
July 25 - August 3, 2018 Attended Deep Learning and Reinforcement Learning Summer School
June 27, 2018 Adversarial Distillation of Bayesian Neural Network Posteriors up on arXiv
June 22, 2018 MovieGraphs featured in a UofT news article
February 19, 2018 MovieGraphs accepted to CVPR 2018
January 29, 2018 Flipout accepted to ICLR 2018
December 19, 2017 MovieGraphs preprint up on arXiv
September, 2016 Started PhD at the University of Toronto
June 22, 2016 One paper accepted to ECAI 2016
June 6, 2015 One paper accepted to LPNMR 2015
May 7, 2015 Awarded the Governor General's Silver Medal for highest graduating GPA
December 20, 2014 Finished my BSc in Computer Science at Simon Fraser University


Self-Tuning Network Architecture

Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions

Matthew MacKay*, Paul Vicol*, Jonathan Lorraine, David Duvenaud, Roger Grosse

Abstract: Hyperparameter optimization can be formulated as a bilevel optimization problem, where the optimal parameters on the training set depend on the hyperparameters. We aim to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases. We show how to construct scalable best-response approximations for neural networks by modeling the best-response as a single network whose hidden units are gated conditionally on the regularizer. We justify this approximation by showing the exact best-response for a shallow linear network with L2-regularized Jacobian can be represented by a similar gating mechanism. We fit this model using a gradient-based hyperparameter optimization algorithm which alternates between approximating the best-response around the current hyperparameters and optimizing the hyperparameters using the approximate best-response function. Unlike other gradient-based approaches, we do not require differentiating the training loss with respect to the hyperparameters, allowing us to tune discrete hyperparameters, data augmentation hyperparameters, and dropout probabilities. Because the hyperparameters are adapted online, our approach discovers hyperparameter schedules that can outperform fixed hyperparameter values. Empirically, our approach outperforms competing hyperparameter optimization methods on large-scale deep learning problems. We call our networks, which update their own hyperparameters online during training, Self-Tuning Networks (STNs).

Reversible RNN Attention Mechanism

Reversible Recurrent Neural Networks

Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse

Abstract: Recurrent neural networks (RNNs) provide state-of-the-art performance in processing sequential data but are memory intensive to train, limiting the flexibility of RNN models which can be trained. Reversible RNNs---RNNs for which the hidden-to-hidden transition can be reversed---offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation. We first show that perfectly reversible RNNs, which require no storage of the hidden activations, are fundamentally limited because they cannot forget information from their hidden state. We then provide a scheme for storing a small number of bits in order to allow perfect reversal with forgetting. Our method achieves comparable performance to traditional models while reducing the activation memory cost by a factor of 10--15. We extend our technique to attention-based sequence-to-sequence models, where it maintains performance while reducing activation memory cost by a factor of 5--10 in the encoder, and a factor of 10--15 in the decoder.

Adversarial Posterior Distillation Framework

Adversarial Distillation of Bayesian Neural Network Posteriors

Kuan-Chieh Wang, Paul Vicol, James Lucas, Li Gu, Roger Grosse, Richard Zemel

Abstract: Bayesian neural networks (BNNs) allow us to reason about uncertainty in a principled way. Stochastic Gradient Langevin Dynamics (SGLD) enables efficient BNN learning by drawing samples from the BNN posterior using mini-batches. However, SGLD and its extensions require storage of many copies of the model parameters, a potentially prohibitive cost, especially for large neural networks. We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN). At test-time, samples are generated by the GAN. We show that this distillation framework incurs no loss in performance on recent BNN applications including anomaly detection, active learning, and defense against adversarial attacks. By construction, our framework distills not only the Bayesian predictive distribution, but the posterior itself. This allows one to compute quantities such as the approximate model variance, which is useful in downstream tasks. To our knowledge, these are the first results applying MCMC-based BNNs to the aforementioned applications.

Flipout LSTM Variance Reduction

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse

Abstract: Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies. Unfortunately, due to the large number of weights, all the examples in a mini-batch typically share the same weight perturbation, thereby limiting the variance reduction effect of large mini-batches. We introduce flipout, an efficient method for decorrelating the gradients within a mini-batch by implicitly sampling pseudo-independent weight perturbations for each example. Empirically, flipout achieves the ideal linear variance reduction for fully connected networks, convolutional networks, and RNNs. We find significant speedups in training neural networks with multiplicative Gaussian perturbations. We show that flipout is effective at regularizing LSTMs, and outperforms previous methods. Flipout also enables us to vectorize evolution strategies: in our experiments, a single GPU with flipout can handle the same throughput as at least 40 CPU cores using existing methods, equivalent to a factor-of-4 cost reduction on Amazon Web Services.


MovieGraphs: Towards Understanding Human-Centric Situations from Videos

Paul Vicol, Makarand Tapaswi, Lluis Castrejon, Sanja Fidler

Abstract: There is growing interest in artificial intelligence to build socially intelligent robots. This requires machines to have the ability to "read" people's emotions, motivations, and other factors that affect behavior. Towards this goal, we introduce a novel dataset called MovieGraphs which provides detailed graph-based annotations of social situations de- picted in movie clips. Each graph consists of several types of nodes, to capture who is present in the clip, their emotional and physical attributes, their relationships (i.e., parent/child), and the interactions between them. Most interactions are associated with topics that provide additional details, and reasons that give motivations for actions. In addition, most interactions and many attributes are grounded in the video with time stamps. We provide a thorough analysis of our dataset, showing interesting common-sense correlations between different social aspects of scenes, as well as across scenes over time. We propose a method for querying videos and text with graphs, and show that: 1) our graphs contain rich and sufficient information to summarize and localize each scene; and 2) subgraphs allow us to describe situations at an abstract level and retrieve multiple semantically relevant situations. We also propose methods for interaction understanding via ordering, and reasoning about the social scene. MovieGraphs is the first benchmark to focus on inferred properties of human-centric situations, and opens up an exciting avenue towards socially-intelligent AI agents.

An example model graph

A Minimization-Based Approach to Iterated Multi-Agent Belief Change

Paul Vicol, James Delgrande, Torsten Schaub

Abstract: We investigate minimization-based approaches to iterated belief change in multi-agent systems. A network of agents is represented by an undirected graph, where propositional formulas are associated with vertices. Information is shared between vertices via a procedure where each vertex minimizes disagreement with other vertices in the graph. Each iterative approach takes into account the proximity between vertices, with the underlying assumption that information from nearby sources is given higher priority than information from more distant sources. We have identified two main approaches to iteration: in the first approach, a vertex takes into account the information at its immediate neighbours only, and information from more distant vertices is propagated via iteration; in the second approach, a vertex first takes into account information from distance-1 neighbours, then from distance-2 neighbours, and so on, in a prioritized fashion. There prove to be three distinct ways to define the second approach, so in total we have four types of iteration. We define these types formally, find relationships between them, and investigate their basic logical properties. We also implemented the approaches in a software system called Equibel.

Equibel System Architecture

An Implementation of Consistency-Based Multi-Agent Belief Change using ASP

Paul Vicol, James Delgrande, Torsten Schaub

Abstract: This paper presents an implementation of a general framework for consistency-based belief change using Answer Set Programming (ASP). We describe Equibel, a software system for working with belief change operations on arbitrary graph topologies. The system has an ASP component that performs a core maximization procedure, and a Python component that performs additional processing on the output of the ASP solver. The Python component also provides an interactive interface that allows users to create a graph, set formulas at nodes, perform belief change operations, and query the resulting graph.


Equibel Logo


A Python toolkit for equivalence-based belief change.

Examples flower images from all near/far classes

The Delicate Art of Flower Classification

Abstract: In this paper, we study the feasibility of identifying flowers in real-world Flickr photos. Established datasets for flower recognition contain only close-up, centered images of flowers, which are not representative of the large variety found on Flickr. We introduce a new dataset of close-up images that has greater variation in the orientation, shape, colour, and lighting than existing datasets. We also introduce a new dataset that contains both close-up and far-away images of certain species of flowers. We show that it is possible to identify fields of flowers, as well as individual flowers, in images downloaded from Flickr. This could provide a way to mine Flickr for information about nature, that could impact our understanding of the consequences of climate change.


A solution to a graph coloring problem

An Introduction to Answer Set Programming

Paul Vicol

Guest Lecture for Artificial Intelligence Survey (CMPT 310)

Abstract: Answer Set Programming (ASP) is a declarative logic programming paradigm tailored to solve NP-hard combinatorial search problems. ASP separates problem specification from solving by combining a rich modeling language with a general-purpose, high-performance solver. This allows it to solve a wide variety of search problems in a uniform way. This talk introduces ASP with a focus on practical problem solving. By looking at classic problems like graph colouring and n-queens, we see how to model problems in the language of ASP and use off-the-shelf tools to obtain solutions. Answer Set Programming is a powerful paradigm that has been used in applications ranging from robotics to music composition and genomics. This talk shows you how to get started with ASP, so that you can apply it to your own problems!

Latest from the Blog

A Glimpse of My Research in Multi-Agent Belief Change

A look at my recent work in comparing different forms of iterated belief change in multi-agent systems.

Read more!

Recommended from the Bookshelf


by Michael Sandel
Illustrates philosophical concepts with real examples from today's world. Makes it fun to learn the differences between categorical and consequentialist moral reasoning. One of my favorite books ever! You can also watch Sandel's Harvard lectures online!