- Agent AI: Surveying the Horizons of Multimodal Interaction. Zane Durante, et al. [ArXiv] [pdf]
- The Annotated Transformer. Sasha Rush, et al. [Blog] [Code]
- The First Law of Complexodynamics. Scott Aaronson. [Blog]
- The Unreasonable Effectiveness of Recurrent Neural Networks. Andrej Karpathy. [Blog] [Code]
- Understanding LSTM Networks. Christopher Olah. [Blog]
- Recurrent Neural Network Regularization. Wojciech Zaremba, et al. [ArXiv] [pdf] [Code]
- Keeping Neural Networks Simple by Minimizing the Description Length of the Weights. Geoffrey E. Hinton and Drew van Camp. [Paper] [pdf]
- Pointer Networks. Oriol Vinyals, et al. [Paper] [pdf]
- ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, et al. [Paper] [pdf]
- Order Matters: Sequence to sequence for sets. Oriol Vinyals, et al. [ArXiv] [pdf]
- GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism. Yanping Huang, et al. [ArXiv] [pdf]
- Deep Residual Learning for Image Recognition. Kaiming He, et al. [ArXiv] [pdf]
- Multi-Scale Context Aggregation by Dilated Convolutions. Fisher Yu and Vladlen Koltun. [ArXiv] [pdf]
- Neural Message Passing for Quantum Chemistry. Justin Gilmer, et al. [ArXiv] [pdf]
- Attention Is All You Need. Ashish Vaswani, et al. [ArXiv] [pdf]
- Neural Machine Translation by Jointly Learning to Align and Translate. Dzmitry Bahdanau, et al. [ArXiv] [pdf]
- Identity Mappings in Deep Residual Networks. Kaiming He, et al. [ArXiv] [pdf]
- A simple neural network module for relational reasoning. Adam Santoro, et al. [ArXiv] [pdf]
- Variational Lossy Autoencoder. Xi Chen, et al. [ArXiv] [pdf]
- Relational recurrent neural networks. Adam Santoro, et al. [ArXiv] [pdf]
- Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton. Scott Aaronson, et al. [ArXiv] [pdf]
- Neural Turing Machines. Alex Graves, et al. [ArXiv] [pdf]
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. Dario Amodei, et al. [ArXiv] [pdf]
- Scaling Laws for Neural Language Models. Jared Kaplan, et al. [ArXiv] [pdf]
- A Tutorial Introduction to the Minimum Description Length Principle. Peter Grünwald. [ArXiv] [pdf]
- Machine Super Intelligence. Shane Legg. [Blog] [Presentation] [pdf]
- CS231n: Convolutional Neural Networks for Visual Recognition. [Course] [GitHub]