Transformer Language Modeling
Team: Emmanuel Rajapandian
Resources: [Code]
Summary: In this project, I implemented a Transformer language model from scratch in Python and PyTorch, focusing on the mechanics of neural models for language tasks. The primary dataset was the text8 collection, derived from Wikipedia and split into fixed-length character sequences for training. In Part 1, I built a simplified Transformer encoder to predict character occurrences within a sequence, implementing self-attention myself rather than relying on pre-built components such as nn.TransformerEncoder. This involved creating query, key, and value matrices to compute attention scores and adding positional encodings to improve accuracy. In Part 2, I extended this work into a Transformer-based language model that predicts the next character in a sequence, which required causal masking to prevent information from future positions leaking into each prediction. The network processes fixed-size chunks of characters, and after optimization it reached a perplexity of 6.63 against the benchmark of 7. Throughout the project I focused on training stability and accuracy, ensuring the model produced valid probability distributions over the character vocabulary. The assignment gave me deep insight into Transformer architectures and their application in natural language processing, along with practical skills for building advanced machine learning systems. Minimal sketches of the self-attention layer and the causal-masking and perplexity pieces appear below.
My contribution: Sole contributor.
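
The core of Part 1 is single-head scaled dot-product self-attention over character embeddings combined with positional encodings. The sketch below is a minimal illustration of that idea rather than the exact implementation; names such as SingleHeadSelfAttention, CharEncoder, d_model, and max_len are illustrative, and learned positional embeddings are assumed.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadSelfAttention(nn.Module):
    """One self-attention head: project inputs to queries, keys, and values,
    then combine the values weighted by softmaxed dot-product scores."""
    def __init__(self, d_model):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (seq_len, d_model) for a single sequence
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (seq_len, seq_len)
        attn = F.softmax(scores, dim=-1)
        return attn @ v  # (seq_len, d_model)

class CharEncoder(nn.Module):
    """Character embeddings plus learned positional embeddings feeding one attention layer."""
    def __init__(self, vocab_size, d_model, num_classes, max_len):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.attn = SingleHeadSelfAttention(d_model)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, idx):
        # idx: (seq_len,) integer character indices
        positions = torch.arange(idx.size(0), device=idx.device)
        x = self.char_emb(idx) + self.pos_emb(positions)
        x = self.attn(x)
        return F.log_softmax(self.classifier(x), dim=-1)  # per-position log-probabilities
```

Part 2 adds causal masking so each position attends only to earlier characters, and the model is evaluated by perplexity over its next-character log-probabilities. The helpers below are a hedged sketch of those two pieces under the same illustrative naming; the mask would be added to the raw attention scores before the softmax (e.g. scores = scores + causal_mask(seq_len) inside the attention layer above).

```python
import math
import torch

def causal_mask(seq_len):
    # Upper-triangular additive mask: position i may only attend to positions <= i.
    # Entries above the diagonal are -inf, so softmax assigns them zero weight.
    mask = torch.full((seq_len, seq_len), float('-inf'))
    return torch.triu(mask, diagonal=1)

def perplexity(log_probs, targets):
    # log_probs: (seq_len, vocab_size) log-probabilities from the language model
    # targets:   (seq_len,) gold next-character indices
    nll = -log_probs[torch.arange(targets.size(0)), targets].mean()
    return math.exp(nll.item())
```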