A few things have coincided this summer: I had some time on my hands, I wanted to play around with the programming language Go, and Euro 2024 has been happening in Germany (I’m not a big football fan, but I do enjoy following the big tournaments every 2 years).
Long story short, I’ve worked through the (excellent!) book “Let’s Go” and have built a very rough web app just in time for the kickoff match in June.
The project is called “Go Tipp” (after the German word “Tipp” for a bet or guess), and the source is now on GitHub.
Sixteen of my friends have joined the game since, and we have been actively guessing and (more often) mis-guessing the outcomes of the Euro 2024 matches.
The features I’ve built
Here’s a quick rundown of the main project features:
Sign-up using an invite code that puts you into a private group
Enter guesses for the outcomes of all matches
Once a match is finished, points are given for all correct guesses
Custom scoring rules per phase of the event: during the knockout phase, correct guesses yield more points than during the group phase
Leaderboard to show the ranking of best guessers in the group
Profile page for each user that shows their match history compared against your own
Live update of match scores while a match is running, using the free API from https://openligadb.de/
A simple tech stack
The tech stack is simple:
Backend: Go
Database: MySQL
Frontend: No JS framework and mostly custom CSS (with some base styling using Pure)
Hosting: Uberspace, simple deployment via ssh and supervisord
A rewarding side project
So, I “learned” Go, but really more as a side effect of building a game that 17 people have been using actively almost daily for about 6 weeks.
It feels good when people use the thing you’ve built.
MVP gone right
I took a very deliberate approach when deciding which features to build first and which to leave out. I think it’s the first time I went ultra “MVP” with a project of mine and it worked out really well.
For an embarrassingly long time, the site didn’t work properly on mobile, yet it was still used daily.
Even now, you can’t change your password, there’s no admin interface for me to edit match data (I do it in the database by hand) and there’s no “proper” frontend build pipeline.
My co-developer: AI
For this project, I’ve made heavy use of Copilot, GPT-4, and Claude. It’s like directing a skilled developer who can code at 10x my own speed.
Find the code online
It’s a plain monolith, yet the code base is structured enough that making the code public doesn’t feel too embarrassing.
It’s not meant to be used out of the box, because many things have been hardcoded for Euro 2024. Re-using it for another tournament should be possible with some adjustments.
Over the end-of-year slowdown and the holidays, I’ve started to learn something new: React Native (and TypeScript along with it). It’s refreshing to approach a technology I haven’t actively used with a beginner’s mindset. Plus, it’s fun to build stuff.
A new tech stack for me: React Native and TypeScript
React Native is a framework to build mobile apps for both iOS and Android using the same codebase (which is either JavaScript or TypeScript).
You can do much more with React Native, but this is what it’s mostly used for.
Why React Native?
First, professional relevance: I work as an AI and Machine Learning Engineer, so I usually work in the Python ecosystem. However, ML software doesn’t live in isolation, and we often build web or mobile applications, either as internal tools or to integrate the machine learning systems into a product. To be able to build web and mobile applications, better knowledge of React and its ecosystem makes a lot of sense to me. In fact, my whole team has recently decided to up-skill in this direction.
Second, personal interest: Since I stopped working as a web developer in 2017, I haven’t really followed the changes in the web and JS space. I’ve remained curious about web technology and have always wanted to be able to build mobile apps for my personal use and potential side projects. React Native offers both, plus a lot of the knowledge will transfer easily to vanilla React for the web.
How I am learning
I like reading traditional paper books when learning something new because I can focus better when I look at printed paper rather than a digital screen.
Book 1: Professional React Native by Alexander Kuttig. A compact overview of the important elements of React Native projects and a collection of best practices. The book is not comprehensive in listing the available API methods, but I like this style: it’s a fast-paced guide that I can use to start building my own projects, with many pointers to important packages. There are some mistakes in the code listings and the code formatting is sometimes broken, so the whole thing feels a little rushed. Still, I’d recommend it if you have previous programming experience.
Book 2: Learning TypeScript by Josh Goldberg. A compact, but detailed look at the TypeScript language. I have only covered the basics of the language to get me started on my own projects, but I will continue reading this book because I want to make use of the full power of TypeScript in my projects. It’s very well explained and has clearly gone through a better editing process than Book 1 (which is what I would expect from an O’Reilly publication). Clear recommendation.
Learning by doing: As I am working through these books (and googling anything I don’t know), I am building my first project, see below.
My first project: A Mastodon client
Having looked at the Mastodon API in a previous (Python) project, I decided to build a Mastodon mobile app for my personal use – or rather my learning experience.
I have worked on the project for a few days now, and it is almost at MVP-level, meaning it provides some value to the user (i.e. to me).
What I’ve implemented and learned so far:
Project setup of a React Native app
This took longer than expected because I needed to update the Node and Ruby versions on my Mac. It reminded me of the frustration I felt as a web developer 5+ years ago, when every few weeks the community moved to a new build tool and all dependencies had to remain compatible. The setup took me around two hours, but I’m happy I came out on the other side: since then, the dev experience with React Native and hot reloading of the app in the phone simulator has been pleasant.
Fetching the personal home timeline
I decided not to use any Mastodon API wrappers but to call the REST API directly – it helps me learn what’s actually going on. This is straightforward using fetch() and casting the result to a matching type definition in TypeScript. Reading the home timeline requires authentication; I haven’t built a UI-based login flow yet, so for now I simply pass an access token associated with my Mastodon account.
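For reference, the underlying REST call is simple enough to sketch in a few lines of Python (the app itself does the same thing with fetch() in TypeScript; the instance URL and token below are placeholders):

```python
import requests

INSTANCE = "https://sigmoid.social"   # placeholder: your Mastodon instance
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"    # placeholder: a token from your account's settings

# GET /api/v1/timelines/home returns the authenticated user's home timeline
response = requests.get(
    f"{INSTANCE}/api/v1/timelines/home",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={"limit": 20},
)
response.raise_for_status()

for status in response.json():
    # each status carries the author's handle, a timestamp and the post content as HTML
    print(status["created_at"], status["account"]["acct"])
```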
Display of the home timeline
This is the only real feature I’ve implemented, but it helped me to learn quite a bit:
How to build and structure React components
How to use React hooks
How to style React Native views
How to render the HTML content of the posts as native views
How to implement pull-to-refresh and infinite scrolling of a list view
What is still missing
For a full-fledged Mastodon client, I’ve maybe implemented 2% and the remaining 98% is still missing. Even for an MVP “read-only” app, I am still missing some crucial pieces:
Login flow
Display attachments (images, videos, …)
Detail view of a toot (replies, like count, …)
I need to learn a few more core concepts to be able to implement these features, most notably navigation of multiple views and storing data on the device.
My plan is to build out this MVP version to continue learning the core concepts.
Afterwards, I will probably look for another project idea, one that is uniquely “my project”.
Ambitious ideas for this project
If I do end up working on the Mastodon app longer term, there are some ideas that would be fun to implement. In particular, I’d love to bring some of my Data Science / ML experience over to a mobile app. How about these ideas:
Detect the language of posts and split your timeline into localized versions
Detect the sentiment of posts and tell the app whether you want to filter out clickbaity posts today
Summarize today’s posts in a short text (possible GPT-3/ChatGPT integration)
Cluster posts into topics (like “news”, “meme”, “personal” or “cat content”) so that you can decide if you’re in the mood to explore or simply want to focus on what’s relevant today
Include tools to explore your Mastodon instance or the whole fediverse: Find accounts you would like, and find accounts that are popular outside your own circles. Some inspiration is in my previous post on exploring the Fediverse.
Follow along
If you want to follow along, you can find my current project progress on GitHub. Remember that this isn’t meant as an actual Mastodon client, but as an educational exercise for myself. Use at your own risk.
I’ve picked the Mastodon instance sigmoid.social, an AI-related instance that is only 3 months old but already has close to 7000 users.
Machines talking to each other
Each Mastodon instance has a public API so it’s straightforward to fetch some basic statistics even without any authentication. I wrote some simple Python scripts to fetch basic info about my home instance.
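To give an idea, one of these scripts is essentially just this (a minimal sketch using the standard instance endpoint):

```python
import requests

INSTANCE = "https://sigmoid.social"

# Public metadata about the instance, no authentication required
info = requests.get(f"{INSTANCE}/api/v1/instance").json()

print(info["title"])
print(info["stats"])  # user_count, status_count, domain_count
```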
I wondered: Who are the other users on sigmoid.social? To gain an overview, I fetched the profiles of all user accounts that are discoverable (which at the time of writing means 1300 accounts out of 6700).
Most profiles have a personal description text, typically a short bio. I plotted these as an old-fashioned word cloud.
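The discoverable profiles come from the public directory endpoint. A rough sketch of the fetching and plotting, using the wordcloud package (the pagination values and the regex-based HTML stripping are simplifications on my part):

```python
import re
import requests
from wordcloud import WordCloud

INSTANCE = "https://sigmoid.social"

# Page through the public profile directory (only discoverable accounts are listed)
bios, offset = [], 0
while True:
    accounts = requests.get(
        f"{INSTANCE}/api/v1/directory",
        params={"local": "true", "limit": 80, "offset": offset},
    ).json()
    if not accounts:
        break
    # the bio ("note" field) is HTML, so strip the tags before collecting the text
    bios += [re.sub(r"<[^>]+>", " ", account["note"]) for account in accounts]
    offset += len(accounts)

# Plot all bios as one old-fashioned word cloud
WordCloud(width=1200, height=600).generate(" ".join(bios)).to_file("bios.png")
```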
The insight isn’t that surprising: The place is swarming with ML researchers and research scientists, both from universities and commercial research labs.
A stroll through the neighborhood
You don’t want to have an account surrounded by AI folk? No problem, there are more than 12,000 instances to choose from (according to a recent number I found). And they can all talk to each other.
I wanted to see how connected the instance sigmoid.social is and plotted its neighborhood.
This is the method I used to generate the neighborhood graph:
Fetch the 1000 most recent posts present on the instance (which can originate from any other Mastodon instance).
Identify all instances that occur among these posts, and fetch their respective recent posts.
With all these posts of a few hundred instances, create a graph: Each instance becomes a node. Two nodes are connected by an edge if at least five of the recent posts connect the two instances.
My method is naive, but it works sufficiently well to create a simple undirected graph; a rough sketch of the procedure in Python follows below.
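A simplified sketch of that procedure (no error handling, the neighbour instances get a smaller page budget than the home instance, and the threshold of five posts is the one mentioned above):

```python
from collections import Counter

import networkx as nx
import requests

HOME = "sigmoid.social"

def recent_posts(instance, n=1000):
    """Fetch up to n recent public posts from an instance (the API serves at most 40 per page)."""
    posts, max_id = [], None
    while len(posts) < n:
        params = {"limit": 40}
        if max_id:
            params["max_id"] = max_id
        page = requests.get(f"https://{instance}/api/v1/timelines/public", params=params).json()
        if not page:
            break
        posts += page
        max_id = page[-1]["id"]
    return posts

def origin(post, local_instance):
    """The instance a post originated from, parsed from the author's handle."""
    acct = post["account"]["acct"]
    return acct.split("@")[1] if "@" in acct else local_instance

# Count how many recent posts connect each pair of instances
home_posts = recent_posts(HOME)
neighbours = {origin(p, HOME) for p in home_posts} - {HOME}

edge_counts = Counter()
for post in home_posts:
    edge_counts[frozenset((HOME, origin(post, HOME)))] += 1
for neighbour in neighbours:
    for post in recent_posts(neighbour, n=200):
        edge_counts[frozenset((neighbour, origin(post, neighbour)))] += 1

# Build the undirected graph: an edge for every pair connected by at least five posts
graph = nx.Graph()
graph.add_edges_from(tuple(pair) for pair, count in edge_counts.items() if len(pair) == 2 and count >= 5)
print(graph.number_of_nodes(), "instances,", graph.number_of_edges(), "edges")
```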
The graph yields another unsurprising insight: All roads lead to mastodon.social, the largest and most well-known instance (as far as I know).
Join us on Mastodon?
I may or may not become more active as a poster myself. In any case, feel free to come over and say Hi: https://sigmoid.social/@florian
If you’re one of the few who still use RSS feeds, make sure to update your feed URL to https://casualcoding.com/feed. The old URLs will redirect automatically, but better safe than sorry.
For a few years now, the team from fast.ai has been providing free education about deep learning on their website. Their video course promises a hands-on approach that aims to de-mystify the technologies of modern deep learning. With the book “Deep Learning for Coders with fastai and PyTorch”, they bring these educational principles to the written format, either as a printed book from O’Reilly or for free on GitHub.
Before I talk about the book, some context: fast.ai is the name of a website with a video course of the same name. The course is taught using a Python library (called fastai, no dot) which is built on top of PyTorch, the popular deep learning framework. Nomenclature can be confusing. I’ll try to be specific and reference “the fast.ai team” or “the fastai library” in this review of the book.
Teaching structure: Top-down, then bottom-up
The authors are very vocal about their teaching principles: The goal is to “teach the whole game” while skipping the often demotivating mathematical principles at the beginning.
Instead, the first example gives you all instructions needed to train a state-of-the-art image classification model from scratch.
Then, the book progresses deeper into the technical and mathematical foundations, which they use to build up a (simple) version of their fastai library from scratch.
I’m torn: while this structure lowers the barrier to entry, it also makes for a repetitive experience – you encounter the same example many times, just at different levels of abstraction.
What’s in the book
The book covers a wide range of deep learning topics at different levels of depth.
You see practical examples across the main application areas of deep learning: computer vision, natural language processing, tabular modeling, and collaborative filtering
All examples are presented with full code listings and everything in the book invites you to go and try things out yourself.
The book presents a wide collection of deep learning techniques that help get training runs working properly in practice.
Popular deep learning architectures are explained, including ResNet, LSTMs, and U-Nets. With a mix of code, visualization, and (some) maths, the authors do a good job of conveying the core ideas of important architectures.
The authors don’t stop at the technical explanations but stress that it’s important to think further. Deep learning is a powerful tool, and one that should be used responsibly. Yes, the technical implementer does in fact have a responsibility to consider fairness criteria and to ask the question “should we even do this at all?”
What I liked
The book is packed with code examples. I personally learn best when implementing something by hand and seeing how an abstract idea translates to actual source code, so this really matched my learning style.
The language of the text is also very easy to digest. You can tell that the fast.ai team wants to teach a little differently and is genuinely excited about the topic. The text is mixed with personal anecdotes and examples of Twitter conversations to create a sense of community around the otherwise technical topic.
The collection of the latest deep learning techniques and condensed experience is immensely valuable: you learn what a proper training process looks like, which techniques you can use to improve training, and how to investigate when training is not behaving nicely.
What I didn’t like
My biggest gripe with the fast.ai material is their Python coding style: Everything has to be an abbreviation, apparently. I don’t know why you call a parameter ni when it could just as well be called num_inputs. If the goal is to “reduce jargon”, using explicit naming in the code would be part of that, if you ask me.
Secondly, the teaching principle of “top-down, then bottom-up” has its quirks: You repeat the same example over and over again, just on different levels of abstraction. When I want to look up “the chapter on convolutional neural networks”, it’s not one chapter I have to browse, but 4 or 5. This may make for a good didactic progression but feels quite repetitive at times.
Who should read the book
The name and subtitle of the book capture it quite well: the code-centric approach of learning (and trying out) deep learning lends itself to people who self-identify as “coders” and not so much to academic scholars who want to have the theory laid out first.
Still, it shows that this book originated in a course. The material will stick if you really follow along and try things for yourself. If you don’t, and you’re completely new to deep learning, it will be hard to keep track of which level of abstraction each chapter is situated at.
I actually found the book very helpful for myself, because it helped me understand how to use recent deep learning techniques such as the learning rate finder, 1-cycle training, label smoothing, and mixup augmentation. Having worked with deep learning for a while, I still learned quite a few new methods and gained a deeper understanding of concepts I had known before.
Summary
Overall, I really liked the book. The authors did a great job of covering a wide range of deep learning applications while showing both easy-to-use black-box examples and the deepest insides of that black box. This helps to de-mystify the AI hype and teaches helpful hands-on skills.
They share a lot of expert advice on how to set up training procedures properly, and I actually agree with their claim: those who really complete this material have a great starting point for working in the field of deep learning.
The didactic style may not be for everyone and I personally hope the fastai coding style doesn’t stick, but I am grateful for the fast.ai team’s contribution: Making deep learning accessible for anyone who is interested.
“Meow” — I’m sorry? “Meow!” — Oh, right! Here you go.
What if I could understand exactly what my cat is trying to tell me? We live in 2021, which is basically the future. How hard can it be?
A dataset of meows
A group of dedicated researchers from northern Italy has recently released a public dataset of cat vocalizations (let’s call them “meows”). 21 cats from two different breeds were exposed to three different situations while a microphone was listening:
Brushing: The owner brushed the cat in a familiar environment.
Isolation: The cat was placed in an unfamiliar environment for a few minutes.
Food: The cat was waiting for food.
In total, the dataset comprises 440 audio files.
Dataset statistics
The dataset is not evenly split between those three situations.
Neither is it evenly split between cat breeds or the sex of the cat.
In fact, some cats occur way more often in the recordings than others. I don’t know why. Maybe “CAN01” is just very talkative whereas “NIG01” prefers to keep to himself?
Looking at these distributions is important. When we train a neural network to classify a given voice recording, we want to make sure it performs better than simply guessing the most frequent label.
For example, always guessing “female” when asked for the cat’s sex would be correct in 78.4% of cases, because there are 345 female voice recordings and only 95 recordings of male cats.
Any classifier that is supposed to be useful has to surpass this baseline of “informed” guessing.
| Feature | Most frequent label | Absolute count | Relative count = baseline accuracy |
| --- | --- | --- | --- |
| Situation (3 classes) | isolation | 221 of 440 recordings | 50.2 % |
| Sex (2 classes) | female | 345 of 440 recordings | 78.4 % |
| Breed (2 classes) | european_shorthair | 225 of 440 recordings | 51.1 % |

Table listing the most frequent label per feature. The numbers highlight which baseline accuracy a model has to achieve to be better than guessing.
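The arithmetic behind the baseline column is nothing more than the relative frequency of the most frequent label:

```python
# Baseline accuracy = share of the most frequent label among all 440 recordings
TOTAL = 440
for feature, most_frequent, count in [
    ("situation", "isolation", 221),
    ("sex", "female", 345),
    ("breed", "european_shorthair", 225),
]:
    print(f"{feature}: always guessing '{most_frequent}' is correct in {count / TOTAL:.1%} of cases")
```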
Now we have an idea of what our data distributions look like. In total, there are three interesting tasks we can have a model learn from the data: (1) What situation was the cat in, (2) what is the sex of the cat, and (3) what is the breed of the cat. It will be interesting to see if these tasks can be learned from the data at all. Let’s start preparing our data to train a model.
Turning audio into images
There are many ways to encode an audio signal before passing it into a neural network. For my project, I am choosing a visual approach: We plot the spectrogram of the audio recordings as an image.
This allows us to use well-established neural networks from the field of computer vision. Also, spectrograms look nice.
A spectrogram is a plot in which the position in the image represents a given frequency at a given point in time of the audio file, and the brightness of a pixel represents the intensity of the audio signal.
The following example shows one of the recordings as a spectrogram. The time axis runs vertically from top (zero) to bottom; the x-axis denotes the frequencies.
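In essence, turning a recording into such an image takes only a few lines, for example with librosa and matplotlib (a sketch with illustrative parameters, not necessarily the exact settings of my final images):

```python
import librosa
import matplotlib.pyplot as plt
import numpy as np

def save_spectrogram(wav_path, png_path, n_mels=81):
    # Load the recording and compute a mel spectrogram on a log (dB) scale
    signal, sample_rate = librosa.load(wav_path, sr=None)
    mel = librosa.feature.melspectrogram(y=signal, sr=sample_rate, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)

    # Transpose so that time runs from top to bottom and frequency along the x-axis
    plt.imsave(png_path, mel_db.T, cmap="magma")

save_spectrogram("meow.wav", "meow.png")  # "meow.wav" is a placeholder file name
```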
Image classification using a pretrained ResNet
Having turned our audio classification task into an image classification task, we can start with our model training. We are going to train three models for three different tasks:
Given a spectrogram image, classify the situation the cat was in.
Given a spectrogram image, classify the sex of the cat.
Given a spectrogram image, classify the breed of the cat.
I have been playing around with the fast.ai library in the past few weeks which provides convenient wrappers around the PyTorch framework, so I decided to use fast.ai for this project.
As in most deep learning frameworks, it is easy to re-use popular computer vision architectures in fast.ai. With one(-ish) line of Python, you have a capable neural network for image classification in your hands. It comes pre-trained, so you need fewer images for your task at hand.
ResNets are a popular neural network architecture from 2015 that introduced residual connections – a mechanism that improves training behavior and allows the training of (very) deep networks.
The CatMeows dataset is quite small, so I was satisfied with the smallest ResNet flavor (called ResNet-18). It has “only” 18 layers and is still oversized for my 440 images.
The ResNet implementation wants square images as its input, so I took random square crops from the spectrograms during training. The crops were 81 x 81 pixels in size and could come from different points in time of the recording, but always contained the full frequency range.
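Put together, the training setup is only a handful of fastai calls. The sketch below assumes the spectrograms are PNG files in one folder with the situation label encoded in the file name; that naming scheme and the hyperparameters are illustrative, not the exact values I used:

```python
from fastai.vision.all import *

path = Path("spectrograms")  # placeholder folder of spectrogram PNGs

def label_from_filename(file_path):
    # Hypothetical naming scheme: "isolation_CAT01_003.png" -> "isolation"
    return file_path.name.split("_")[0]

dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=label_from_filename,
    splitter=RandomSplitter(valid_pct=0.15, seed=42),  # replaced by a per-cat split below
    item_tfms=RandomCrop(81),                          # random 81 x 81 crops
).dataloaders(path, bs=32)

learn = cnn_learner(dls, resnet18, metrics=accuracy)   # ImageNet-pretrained ResNet-18
learn.fine_tune(10)
```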
Splitting the data for training and validation
When training a classifier it is important not to show all of your data to the model during training. You want to hold out some samples for validating the classifier during the training process. That way you get an idea if the model learns the training data by heart or if it actually learns something useful.
Sometimes it is fine to take a random percentage of the dataset as the validation set. In this case, I wanted to separate the cats across the train and validation splits so that the model can’t cheat by memorizing the characteristics of an individual cat.
I took 4 individual cats out of the training data. Their recordings combined made up 66 samples of the dataset, which means 15% of the data was reserved for validation and only the remaining 85% were used for training.
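With fastai, holding out specific cats is a matter of swapping the splitter in the DataBlock sketch above for a FuncSplitter (the cat IDs below are placeholders; the real ones come from the dataset’s file names):

```python
from fastai.data.transforms import FuncSplitter

# Cats reserved for validation (placeholder IDs)
VALIDATION_CATS = {"CAT01", "CAT02", "CAT03", "CAT04"}

def is_validation_file(file_path):
    # True if the file belongs to one of the held-out cats (cat ID assumed to be in the file name)
    return any(cat_id in file_path.name for cat_id in VALIDATION_CATS)

# Drop-in replacement for RandomSplitter in the DataBlock above
splitter = FuncSplitter(is_validation_file)
```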
The results
For the three different tasks, the three models I trained achieved the following accuracy scores.
| Task | Classification accuracy | Guessing baseline (see above) |
| --- | --- | --- |
| Situation | 63.6 % | 50.2 % |
| Sex | 90.9 % | 78.4 % |
| Breed | 93.9 % | 51.1 % |
Results: The accuracy scores of the three task-specific models. For easy comparison, I also list the guessing baseline as described above.
Across all three tasks, the models performed well above the guessing baselines we determined earlier.
Let’s also take a look at the confusion matrix for each task. A confusion matrix tallies each sample of the validation set and shows how many were classified correctly and which errors were made.
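With fastai, plotting it is a two-liner on the trained learner (sketched here for the learner from the training snippet above):

```python
from fastai.interpret import ClassificationInterpretation

# Compute predictions on the validation set and plot the confusion matrix
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
```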
What to make of this
First of all, these are quick results. We haven’t built a super AI that understands every single cat in the world. (Yet.)
What these results mostly show are interesting aspects of the dataset: Most of all, I was surprised how well the sex and breed can be told apart by the model. As I made sure to separate individual cats across train and validation data, I do have some confidence that the model didn’t cheat. There may still be some information leakage that I’m not aware of, of course.
What to improve
This is a small dataset. ResNet-18 is a big network. This mix can cause problems.
In my case, I am using a pre-trained version of ResNet, so the convolutional features don’t have to be learned from scratch. Still, I found myself re-running the training multiple times with varying success. I think with so little data it is still easy for the model to run into a local optimum and overfit on the training data.
Ideas for improvement:
Try freezing different layers and sets of layers of the network. With such a tiny amount of data, we wouldn’t want to destroy the pre-trained features by accident. At the same time, spectrograms are not natural images, so fine-tuning probably makes sense.
Some additional data augmentation would surely help to enrich the training data. As these are not natural images but visualizations of an audio signal, I think some augmentation operations make sense (cropping at different points in time, jittering contrast and brightness to simulate volume fluctuations), while others are more questionable (perspective transformations, cropping different frequency bands). I haven’t tried them so far, but they could very well improve the results.
To learn more about the data, it would be interesting to extract quantitative audio characteristics and train a logistic regression or random forest on them. These models are easier to interpret and could help to understand whether the models look at something meaningful in the data or whether there is some data leakage that allows them to cheat (a small sketch of this idea follows below).
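A sketch of that last idea, using mean MFCCs as the hand-crafted features (the dataset path and the file-name-based labels are placeholders, and for a fair comparison the cross-validation should again split by cat rather than randomly):

```python
from pathlib import Path

import librosa
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def audio_features(wav_path):
    # Summarize a recording with a handful of classic audio statistics (here: mean MFCCs)
    signal, sample_rate = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13)
    return mfcc.mean(axis=1)

wav_files = sorted(Path("catmeows").glob("*.wav"))    # placeholder dataset folder
labels = [f.name.split("_")[0] for f in wav_files]    # placeholder: label encoded in the file name

X = np.stack([audio_features(f) for f in wav_files])
scores = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5)
print(f"mean accuracy: {scores.mean():.1%}")
```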
Conclusion
Playing with public datasets is fun! You should try it.
I may continue with this pet project (pet! get it?) or start something fresh with the next dataset that looks interesting.
If you’ve found an issue in my data or training setup, please let me know.
You can find the complete project code in a messy Jupyter notebook on GitHub.