It's been a while since my last post so I thought I'd write about something I've been working on recently.
Note as well: this post is the first to use an AI-generated header photo, which felt fitting. Judge accordingly.
I've been (trying to) work a lot with HuggingFace and their transformers library. Lots of people want to integrate machine learning into their projects, and in my opinion, HuggingFace is the best place to start. It's a great platform - see my previous post on them for more information.
One task that I think integrates easily into a lot of projects, and that current models are pretty good at, is Image Classification.
It's a pretty powerful tool, however, it can be a bit daunting to get started with.
One of the scariest parts about it is the data and the training - although that may just be the software engineer in me. I'm not a data scientist, so while I can do the whole 'clean, preprocess, train, test, etc.' it's not super fast for me.
This is where Zero Shot Image Classification comes in.
Zero Shot Image Classification is simply a way to classify images without having to train a model on a specific dataset.
You just give the model an image and a list of labels and it will tell you which label it thinks is most appropriate.
You can steer the model a bit as well, and it will do its absolute best to classify the image based on the labels you give it. It can also 'kind of' pick up on extra meaning from the labels you give it, but not nearly as well as a model you train explicitly yourself.
There's a lot more to it than that, but that's the applicable part for most people just trying to work with it.
For a more detailed look at Zero Shot, check out the HuggingFace docs and/or the 'Resources' section below.
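Under the hood, CLIP-style zero-shot models work by comparing your image against a short text prompt built from each label. The Transformers pipeline calls this prompt the hypothesis_template (the default is along the lines of "This is a photo of {}."). A tiny sketch of just the prompt-building step, so the label list feels a little less magical:

```python
# CLIP-style zero-shot classifiers score an image against a text prompt
# built from each candidate label. The Transformers pipeline exposes this
# as 'hypothesis_template' (default is along the lines of
# "This is a photo of {}.").
template = "This is a photo of {}."
candidate_labels = ["owl", "bird", "cat"]

# One prompt per label; the model picks the prompt that best matches the image.
prompts = [template.format(label) for label in candidate_labels]
print(prompts)
```

As I understand it, you can pass your own `hypothesis_template=` when calling the pipeline to nudge the model toward your domain.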
So for this post and my personal testing, I've created a 'playground' of sorts for Zero Shot Image Classification. Feel free to view it here
Worth noting as well the HuggingFace docs have a pretty great tutorial I followed to get started with this.
To get started, you'll generally want to:

```shell
pipenv install pillow transformers torch
```
* PyTorch is what I personally use, but you can use TensorFlow as well.
* The question of PyTorch vs. TensorFlow is a fairly big one and I'm not going to get into it here - but definitely worth reading into a bit!
* See "Resources" below for more information on PyTorch vs. TensorFlow
* You may need to install PyTorch as well - see "Resources" below for more information on that
View basic.py in the repo for an example, but a minimal version could be as little as:

```python
from transformers import pipeline
from PIL import Image
import requests

# A CLIP model from OpenAI, hosted on the HuggingFace Hub
model_name = "openai/clip-vit-large-patch14-336"
classifier = pipeline("zero-shot-image-classification", model=model_name)

# Download an image to classify (an owl photo from Unsplash)
url = "https://unsplash.com/photos/g8oS8-82DxI/download?ixid=MnwxMjA3fDB8MXx0b3BpY3x8SnBnNktpZGwtSGt8fHx8fDJ8fDE2NzgxMDYwODc&force=true&w=640"
image_to_classify = Image.open(requests.get(url, stream=True).raw)

# The model scores the image against each of these candidate labels
labels_for_classification = ["owl", "bird", "cat", "dog", "car"]
scores = classifier(image_to_classify, candidate_labels=labels_for_classification)

for obj in scores:
    print(f"{obj['label']}: {obj['score']}")
```
If you run this, it should give you something like:

```
owl: 0.9953024387359619
bird: 0.0046501727774739265
car: 2.145067810488399e-05
cat: 1.9642637198558077e-05
dog: 6.136761385278078e-06
```
Incredible! You've just done Zero Shot Image Classification!
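Since the classifier returns a plain list of label/score dicts, it's easy to layer small helpers on top. Here's a sketch of a hypothetical best_label helper (the 0.5 confidence threshold is an arbitrary choice of mine, not anything from the library):

```python
# Scores in the shape the pipeline returns: a list of label/score dicts,
# sorted from most to least likely (values copied from the run above).
scores = [
    {"label": "owl", "score": 0.9953024387359619},
    {"label": "bird", "score": 0.0046501727774739265},
    {"label": "car", "score": 2.145067810488399e-05},
]

def best_label(scores, threshold=0.5):
    """Return the top label, or None if the model isn't confident enough."""
    top = max(scores, key=lambda s: s["score"])
    return top["label"] if top["score"] >= threshold else None

print(best_label(scores))  # owl
```

Falling back to None (or an "unknown" bucket) when nothing scores well is a cheap way to avoid acting on low-confidence guesses.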
Pretty cool right? It raises some questions though:
- How do you know what labels to give it?
- How many labels should you give it?
- What if you give it the wrong labels?
The list goes on. Labels are a big part of your success here.
Before we get into the nitty gritty of labeling performance and strategies, I think it's useful to briefly talk about where you can get labels from.
This is a pretty big question depending on how you're using this tool. I played around a lot with different labeling strategies and found it really useful for building an intuitive understanding of how the model works. These aren't formal strategies or methodologies, just a few ways I thought about labeling images:
- A single labels.txt with all of your labels in it - you can even ask ChatGPT for some appropriate labels to help fill it in.
- Category files - animals.txt, cars.txt, food.txt, etc. - where you put labels in each file that are relevant to that category.

View main.py in the repo for the actual code on how I implemented these strategies in a little Python text program. Note that it's not really split into methods like this; I've tried to draw some analogies to how I thought about it, but the script was more a progression of ideas and methods than a formal methodology.
So that all sounds great, but how do you know if you're doing it right?
How do these strategies compare? How do you know if you're getting better at labeling images?
Well, I wrote a test script to help with that - also in the repo, as test.py - and I'll let you play with it yourself.
It prints general statistics, for a set of hardcoded images, on how the different labeling strategies compare. It's not perfect, but it's a good start and will give you an idea of how you can experiment with this.
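The heart of a comparison like that boils down to top-1 accuracy per strategy. A rough sketch (not the actual test.py, and with made-up predictions):

```python
def accuracy(predictions, truths):
    """Fraction of images whose top predicted label matches the ground truth."""
    correct = sum(1 for pred, truth in zip(predictions, truths) if pred == truth)
    return correct / len(truths)

def compare_strategies(results, truths):
    """results maps strategy name -> list of top predictions, one per image."""
    return {name: accuracy(preds, truths) for name, preds in results.items()}

# Hypothetical top-1 predictions from two labeling strategies:
truths = ["owl", "cat", "car"]
results = {
    "single_file": ["owl", "cat", "truck"],      # 2 of 3 correct
    "category_files": ["owl", "cat", "car"],     # 3 of 3 correct
}
print(compare_strategies(results, truths))
```

With real data you'd swap the hardcoded predictions for actual pipeline output on a fixed image set.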
Some takeaways I learned from this testing were:
Again, the above is going to depend a lot on your images, labels, model, etc., but it's a good starting point for how to think about labeling images for Zero Shot Image Classification.
To be clear, Zero Shot is pretty useful, though you'll probably find its limitations fairly quickly. Still, it's a great way to get started and see if image classification in general is worth it for you and your project.
I think Zero Shot is great for MVPs and for some projects entirely. Over time, if you're building a business off of your ML model or how it performs, you'll probably want to train it yourself.
However, for a lot of projects, Zero Shot is a great way to at least get started.
Worth noting as well: Zero Shot generally performs worse the more labels you give it, and especially as those labels get more specific. If you're trying to classify between 'cat' and 'dog', you're probably fine. If you're trying to classify between 'german shepherd', 'golden retriever', 'labrador', etc., you might have a bad time.
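One intuition for why extra labels hurt: as I understand it, the pipeline softmaxes the image-text similarity scores over your candidate labels, so every label you add competes for probability mass. A toy illustration with made-up logits:

```python
import math

def softmax(logits):
    """Convert raw similarity scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Same top logit for the correct label, but more (and more similar)
# competing labels pull the winning probability down.
two_labels = softmax([5.0, 2.0])                  # e.g. 'dog' vs. 'cat'
five_labels = softmax([5.0, 4.5, 4.4, 4.2, 2.0])  # e.g. similar dog breeds
print(two_labels[0], five_labels[0])
```

The numbers are invented, but the effect is the mechanical one: a confident two-label answer can become a much murkier five-label one without the model getting any "worse" at seeing the image.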
That's it!
Zero Shot Image Classification is a great way to get started with image classification and machine learning in general.
It's a great tool to have in your toolbox and can be a great way to get started with a project. It's also a bit of a gateway drug to wanting to train your own models and get more into the nitty gritty of machine learning.
Thanks for reading, please leave any comments or questions below!
Zero Shot Playground: A playground for experimenting with zero-shot classification.
HuggingFace Tutorial: A tutorial on zero-shot image classification using the Transformers library.
Install PyTorch:
PyTorch vs. TensorFlow: