These past couple of weeks I've been working on a contribution to the 🤗Hugging Face Transformers repo. Transformers is a fast-moving library built around the transformer deep learning architecture, covering NLP tasks such as translation, named entity recognition, and summarization. I also used the PyTorch Lightning framework, which simplifies and organizes training and inference code.
This blog post covers my experience building a system and turning it into a contribution to Transformers.
Open Source Contribution
My goal was to use the BART model for summarization. I had access to a pre-trained model, but I wanted to go through the process of training and evaluating it on the full dataset. The basic idea: given a text article, output a short summary of it.
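As a concrete sketch of that article-in, summary-out idea, here is what inference looks like with a pre-trained checkpoint via the Transformers `pipeline` API. The `facebook/bart-large-cnn` checkpoint and the article text are my own stand-ins for illustration, not necessarily what I used in the contribution.

```python
from transformers import pipeline

# Placeholder article text purely for illustration.
article = (
    "The city council voted on Tuesday to approve funding for a new "
    "public library downtown. Construction is expected to begin next "
    "spring and finish within two years. Officials said the project "
    "will also add a community meeting space and a computer lab."
)

# Assumes the pre-trained facebook/bart-large-cnn checkpoint.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer(article, max_length=40, min_length=5, do_sample=False)[0]["summary_text"]
print(summary)
```

Under the hood this tokenizes the article, runs BART's encoder-decoder with beam search, and decodes the generated tokens back into text.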
There were several issues I ran into along the way. Debugging these sorts of issues while learning a library was really time consuming; I spent hours on simple problems where the solution was only one line of code. Understanding the library was very important, so I wasn't afraid to dive into the source code and step through simple examples. Once I finally got results on smaller examples, I started experimenting with the hyperparameters. I knew that intuition about hyperparameters is a trick of the trade, and that's a little frustrating coming from an engineering background. Like I've heard before, it is more of an art than a science. My solution was to run many experiments and see what happens; unfortunately, I haven't heard many better alternatives.
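"Run many experiments and see what happens" can be as simple as a grid search over a few candidate values. Below is a minimal sketch; `train_and_evaluate` is a hypothetical stand-in that in practice would launch a real training run and return a validation metric (e.g. ROUGE for summarization), and the deterministic scoring formula exists only so the example runs.

```python
import itertools

def train_and_evaluate(learning_rate, batch_size):
    # Hypothetical stand-in for a full training run; the formula below
    # is fabricated just to make the sweep deterministic here.
    return 1.0 / (abs(learning_rate - 3e-5) + 1e-9) + batch_size * 0.01

learning_rates = [1e-5, 3e-5, 5e-5]
batch_sizes = [4, 8]

best_score, best_config = float("-inf"), None
for lr, bs in itertools.product(learning_rates, batch_sizes):
    score = train_and_evaluate(lr, bs)
    if score > best_score:
        best_score, best_config = score, (lr, bs)

print(best_config)
```

It's crude, but logging every (config, score) pair this way is exactly how you start building the intuition that the experienced practitioners already have.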
After days of tinkering, I got results I was pretty happy with.
Given a CNN article the model generated the following summary:
CNN's Soledad O'Brien takes a tour of the "forgotten floor," where mentally ill inmates are housed in Miami before trial. An inmate housed on the "forgotten floor.
Pretty cool huh?
So why did I decide to spend some of my time contributing to an open source project?
The first motivation was that about a year ago, I talked to the owner of a local AI company. He said that applicants stand out when they've contributed to deep learning projects. It shows self-motivation, and that you can learn and contribute to an existing codebase.
The second motivation was, "Why not?" I actually didn't start my project by thinking about how I could contribute to Hugging Face. I started it with a desire to understand how to train a language model for summarization. Then, once I realized I had built something that wasn't available to other people, I cleaned up the code and turned it into a pull request.
The rest of the pull request was pretty straightforward: I just had to make some small requested changes, and it ended up being a very constructive experience. Although this contribution took time and I struggled through it at times, I came out with a deeper understanding of the Transformers and PyTorch APIs. I also got invited to some Slack channels to discuss the implementation, which turned out to be a great way to build on the community I talked about in my previous post.
In my next post I'll be talking about the differences I've noticed between deep learning and engineering.