Reading List – Averyl Xu

Agent AI: Surveying the Horizons of Multimodal Interaction. Zane Durante, et al. [ArXiv] [pdf]
The Annotated Transformer. Sasha Rush, et al. [Blog] [Code]
The First Law of Complexodynamics. Scott Aaronson. [Blog]
The Unreasonable Effectiveness of Recurrent Neural Networks. Andrej Karpathy. [Blog] [Code]
Understanding LSTM Networks. Christopher Olah. [Blog]
Recurrent Neural Network Regularization. Wojciech Zaremba, et al. [ArXiv] [pdf] [Code]
Keeping Neural Networks Simple by Minimizing the Description Length of the Weights. Geoffrey E. Hinton and Drew van Camp. [Paper] [pdf]
Pointer Networks. Oriol Vinyals, et al. [Paper] [pdf]
ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, et al. [Paper] [pdf]
Order Matters: Sequence to sequence for sets. Oriol Vinyals, et al. [ArXiv] [pdf]
GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism. Yanping Huang, et al. [ArXiv] [pdf]
Deep Residual Learning for Image Recognition. Kaiming He, et al. [ArXiv] [pdf]
Multi-Scale Context Aggregation by Dilated Convolutions. Fisher Yu and Vladlen Koltun. [ArXiv] [pdf]
Neural Message Passing for Quantum Chemistry. Justin Gilmer, et al. [ArXiv] [pdf]
Attention Is All You Need. Ashish Vaswani, et al. [ArXiv] [pdf]
Neural Machine Translation by Jointly Learning to Align and Translate. Dzmitry Bahdanau, et al. [ArXiv] [pdf]
Identity Mappings in Deep Residual Networks. Kaiming He, et al. [ArXiv] [pdf]
A simple neural network module for relational reasoning. Adam Santoro, et al. [ArXiv] [pdf]
Variational Lossy Autoencoder. Xi Chen, et al. [ArXiv] [pdf]
Relational recurrent neural networks. Adam Santoro, et al. [ArXiv] [pdf]
Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton. Scott Aaronson, et al. [ArXiv] [pdf]
Neural Turing Machines. Alex Graves, et al. [ArXiv] [pdf]
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. Dario Amodei, et al. [ArXiv] [pdf]
Scaling Laws for Neural Language Models. Jared Kaplan, et al. [ArXiv] [pdf]
A Tutorial Introduction to the Minimum Description Length Principle. Peter Grunwald. [ArXiv] [pdf]
Machine Super Intelligence. Shane Legg. [Blog] [Presentation] [pdf]
CS231n: Convolutional Neural Networks for Visual Recognition. [Course] [GitHub]