If you are anything like me, “machine learning” to you means working with algorithms that adapt their function from data. And while this is true, it’s not the complete story when actually working in the field of machine learning.
Yes, picking the right algorithm and creating the appropriate model is important. But data cleaning, optimizing for production, and setting up scalable infrastructure is just as much part of the day-to-day work on the job.
Along comes Machine Learning Design Patterns, a book that looks at common challenges in practical machine learning. It leaves out model architectures on purpose and instead promises to collect best practices to move machine learning to production.
Each chapter: A (machine learning) design pattern
The book’s title plays on the famous 1994 book by the Gang of Four that popularised the concept of design patterns in software engineering. It comes as no surprise that in Machine Learning Design Patterns, the authors attempt something very similar and identify reoccurring problems from their field.
Each chapter is structured the same way:
- What is a common problem in machine learning?
- What is the recommended solution?
- What are trade-offs and alternatives?
What’s in the book
Written by three engineers from the Google Cloud AI team, the book covers a breadth of topics:
- Data Representation
- Problem Representation
- Model Training
- Resilient Serving
- Reproducibility
- Responsible AI
The early chapters cover topics frequently occurring in the “data science” section of machine learning: What is an embedding, how to work with imbalanced datasets, how to create proper checkpoints during training.
In the later parts, the authors focus more and more on inference and challenges of automation, repeatability and scalability. I now have an idea what a feature store is and how to bridge data schemas when mixing old and new data sources.
Examples: Python and SQL code
The book is full of examples and they use different technologies: Tensorflow examples in Python, a BigQuery listing, a Google cloud SDK API call.
From the preference of technologies, you notice that the book has been written by Googlers. This doesn’t matter in my mind, because the concepts are always explained clearly, so that porting to other platforms or products should be straight forward.
What I liked
The book covers a wide range of topics and helped extend my knowledge of machine learning to areas I am not an expert in: Resilient Serving, Reproducibility and MLOps in general.
The structure of design patterns lends itself to keep this book on the shelf for future reference. Chapters have a clear motivation and are written to the point, so that I can see myself looking up a design pattern in the future.
Aside from technical topics, the authors also include three chapters about responsible AI and a (brilliant) section about the ML Life Cycle and the AI Readiness of organisations.
What I didn’t like
I have a few complaints, though.
In places it becomes clear that the idea of extracting design patterns from machine learning approaches works well for some topics, but becomes a bit of a stretch for others. I personally didn’t mind this too much, but it’s not as elegant the title of the book suggests.
What I did mind was the fact that this first edition is quite riddled with errors: From figures containing incorrect numbers that don’t align with the text (just annoying) to an explanation of convolution layers that confused convolution with pooling, I think (potentially misleading).
And a final nitpick: The printed copy I ordered was a a monochrome version with low contrast. Many figures were completely indecipherable. A bit disappointing for an O’Reilly book upwards of 40€.
Conclusion: Great overview to bring ML to production
In conclusion, Machine Learning Design Patterns gives a great overview over common problems you encounter when designing, building and deploying machine learning algorithms.
It will offer valuable content for many in the industry: Data scientists who have never deployed a cloud pipeline, ops experts who are curious about “MLOps” and the product person who wants to understand the constraints and possibilities of modern machine learning development.
My favourite chapter was actually a non-technical one: How to move a team and a whole company from running first ML experiments to becoming an ML-first organisation. This idea ties a lot of the technical and human topics together and it is a topic that excites me personally.
I’ve enjoyed working through this book (together with my data science study group) and it will find a valued place in my bookshelf – to be referenced whenever I encounter one of the problems in the wild again and need a foundational perspective.