True AI on a Raspberry Pi, with no extra hardware

Building a camera-based person counter using Xnor’s AI2GO platform

Matt Welsh
6 min read · May 16, 2019

My new employer, Xnor.ai, develops deep learning AI models that run efficiently on low-power CPUs and microcontrollers, including the Cortex ARM processor on the Raspberry Pi. No GPU or TPU needed!

A Raspberry Pi 3B+ with a camera and Unicorn Hat HD display makes for a standalone person detector using Xnor’s embedded AI2GO library.

In this post, I’m going to walk through how to use Xnor’s AI2GO platform to build a Raspberry Pi-based person counter: an app that periodically counts people in images from the Pi camera, and displays counts and statistics on a beautiful LED matrix display. In this demo, there’s no use of the cloud (it runs completely offline), no special AI hardware — all you need is a Raspberry Pi, a camera module, and a display.

Deep learning inference models are computationally heavyweight, and running them on a standard CPU — especially a low-power, embedded processor — isn’t usually viable. Hardware accelerators like the Coral TPU and Intel Neural Compute Stick speed things up, but cost upwards of $100 a pop and can consume a nontrivial amount of energy. With Xnor’s software solution, you can run the same algorithms — object detection, face and person detection, segmentation, and more — directly on the Raspberry Pi’s CPU.

Xnor’s AI2GO platform consists of two pieces: (1) A simple SDK, which provides C and Python bindings and sample code; and (2) a collection of hundreds of highly-optimized AI models covering use cases like object, animal, and person detection, classification, facial expression recognition, and more. Each model is provided as a library which you link into your app. You feed the model an image and it spits out inferences — object detections, facial expressions, etc. depending on the specific model being used.

Hardware and software

For this demo, I’m using a Raspberry Pi 3B+, the Pi Camera Module v2, and the beautiful Pimoroni Unicorn Hat HD matrix LED display. (I also 3D printed a camera stand, which you can find on Thingiverse here).

All of the source code for the demo in this post is here:
https://github.com/mdwelsh/teamsidney/tree/master/pi/personcounter

Download AI2GO SDK and pick a model bundle

The AI2GO website is pretty self-explanatory. You start by selecting your hardware platform, then choose a model from a wide range of use cases: home, automotive, photography, and so forth. For this demo, I’m using the person detector model, which is found under the photography category.

Next, you pick the specific model bundle that you want for your application. A model bundle is just a binary containing the pre-trained AI model which you can link into your application (C and Python are supported). Different model bundles are provided with latency, memory footprint, and accuracy tradeoffs. You can just download the recommended one, or pick a different one from the list. For this demo I’m using the default person-detector-mediumlarge-300.rpi3 bundle.

Make sure you also download the AI2GO SDK from the website — you need both the SDK and a model bundle.

Software setup

On the Raspberry Pi, make sure you have Python3 installed. You also need the Picamera and Pillow libraries installed for this demo, so do this:

pip3 install picamera pillow

For the Unicorn Hat HD, you need to install the drivers using the instructions at https://github.com/pimoroni/unicorn-hat-hd.

The README in the AI2GO SDK is pretty self-explanatory. Just unzip the SDK and drop it somewhere on your Raspberry Pi. To install the model bundle, you run the following commands:

% unzip person-detector-mediumlarge-300.rpi3.zip
% cd lib/rpi3/person-detector
% python3 -m pip install xnornet*.whl

and that’s it! You’re now ready to use the Xnor model simply by calling import xnornet from your Python code.

Reading from the camera

Reading from the Raspberry Pi camera in Python is pretty standard stuff, but it takes a few lines of code to get everything working. The following code initializes the Pi camera and starts recording frames to a circular IO buffer.

camera = picamera.PiCamera()
res = camera.resolution
# A YUV420 frame is 1.5 bytes per pixel: a full-size Y (luma) plane plus
# quarter-size U and V (chroma) planes. These constants are used later on.
SINGLE_FRAME_SIZE_YUV = res[0] * res[1] * 3 // 2
YUV420P_Y_PLANE_SIZE = res[0] * res[1]
YUV420P_U_PLANE_SIZE = YUV420P_Y_PLANE_SIZE // 4
YUV420P_V_PLANE_SIZE = YUV420P_U_PLANE_SIZE
# Record into a circular buffer that holds exactly one frame.
stream = picamera.PiCameraCircularIO(camera,
                                     size=SINGLE_FRAME_SIZE_YUV)
camera.framerate = 8
camera.brightness = 60
camera.start_recording(stream, format='yuv')
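One detail the snippet above skips: it’s worth releasing the camera when the program exits, so it isn’t left locked by a dead process. A minimal cleanup sketch (main_loop is just a placeholder for whatever does the capture and inference work):

try:
    main_loop()  # placeholder for the capture/inference loop shown below
finally:
    # Stop recording into the circular buffer and release the camera.
    camera.stop_recording()
    camera.close()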

Loading the Xnor model

The following code imports the xnornet library — which is simply the Python-wrapped model bundle downloaded previously — and instantiates the model in the program:

import xnornet
model = xnornet.Model.load_built_in()

Note that when you run this, you’ll see the following on stdout:

You're using an Xnor.ai model for evaluation purposes only.
This evaluation version has a limit of 13500 inferences per
startup, after which an error will be returned. Commercial
Xnor.ai models do not contain this limit or this message.
Please contact Xnor.ai for commercial licensing options.

Well, there ain’t no such thing as a free lunch, right? But this only means that each time the program starts, it will only do 13,500 calls to the Xnor model before returning an error — if you sample at one frame every 10 seconds, that will last for about a day and a half before it needs to be restarted. (Of course, a commercial license will not come with this limit!)
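As a quick back-of-the-envelope check of that figure:

# 13,500 inferences at one frame every 10 seconds:
13500 * 10 / 3600.0   # = 37.5 hours, i.e. roughly a day and a half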

Feeding images to the model

Okay, now here’s where the magic happens: we read an image from the camera and pass it to the Xnor model to detect people:

while True:
    cam_output = stream.getvalue()
    # This can happen if no frame has been read yet.
    if len(cam_output) != SINGLE_FRAME_SIZE_YUV:
        continue
    # Split the YUV420 frame into its Y, U, and V planes.
    y_plane = cam_output[0:YUV420P_Y_PLANE_SIZE]
    u_plane = cam_output[YUV420P_Y_PLANE_SIZE:
                         YUV420P_Y_PLANE_SIZE + YUV420P_U_PLANE_SIZE]
    v_plane = cam_output[YUV420P_Y_PLANE_SIZE + YUV420P_U_PLANE_SIZE:
                         SINGLE_FRAME_SIZE_YUV]
    # res is the camera resolution from the setup snippet above.
    model_input = xnornet.Input.yuv420p_image(
        res, y_plane, u_plane, v_plane)

    # Pass the frame to the model.
    results = model.evaluate(model_input)
    print("Detected {} people in image".format(len(results)))

That’s it! The results that are returned from the model also include bounding boxes for each person detected. If you print out the results list you will see something like:

[BoundingBox(class_label=ClassLabel(class_id=191210132, label='person'),
             rectangle=Rectangle(x=0.802842378616333, y=0.20286493003368378,
                                 width=0.15309381484985352, height=0.26512062549591064)),
 BoundingBox(class_label=ClassLabel(class_id=191210132, label='person'),
             rectangle=Rectangle(x=0.3909456729888916, y=0.10030053555965424,
                                 width=0.1306557059288025, height=0.19471324980258942)),
 BoundingBox(class_label=ClassLabel(class_id=191210132, label='person'),
             rectangle=Rectangle(x=0.08110178261995316, y=0.09425853192806244,
                                 width=0.07710828632116318, height=0.20227928459644318))]

If you draw these bounding boxes on top of the camera image, you’ll get something like:

While I was posing for this image, one of my coworkers ran by and gave me a high five. Which I totally deserved.

The model I’m using doesn’t detect people who aren’t close to the camera, which is why the folks in the background aren’t picked up. I also need to focus the camera for my next glamorous photoshoot.
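If you want to draw boxes like these yourself, here is a rough sketch using Pillow. It assumes the Rectangle coordinates are normalized fractions of the image size (which is what the values above suggest) and that you have a captured still saved to a file; the file names are placeholders:

from PIL import Image, ImageDraw

def draw_boxes(image_path, results, out_path):
    # Draw each detection rectangle on top of a captured image.
    im = Image.open(image_path)
    draw = ImageDraw.Draw(im)
    w, h = im.size
    for box in results:
        r = box.rectangle
        # Scale the normalized (x, y, width, height) values to pixels.
        left, top = r.x * w, r.y * h
        right, bottom = (r.x + r.width) * w, (r.y + r.height) * h
        draw.rectangle([left, top, right, bottom], outline=(255, 0, 0), width=3)
    im.save(out_path)

# e.g.: draw_boxes('frame.jpg', results, 'frame_with_boxes.jpg')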

Displaying the results on the Unicorn Hat

For this demo, I hacked together some (pretty ugly) Python code to do a scrolling display on the Unicorn Hat HD. Please don’t judge me by this code (it’s pretty terrible), but it hopefully shows what you can do with this cool little LED matrix.

The Plotter class does the heavy lifting here. It has two methods: update() which takes a new person count value, and draw() which updates the display. update() simply records the timestamp and person count into a circular buffer. draw() cycles through several display modes: showing the current count, the max over the last 10 minutes, the max over the last hour, as well as a bar graph, a clock, and a banner image. Fun stuff!
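Here is a rough skeleton of that structure. The method names update() and draw() come from the real code, but the bodies below are a simplified sketch rather than the actual implementation from the repo:

import collections
import datetime

class Plotter:
    NUM_MODES = 6  # hypothetical number of display modes

    def __init__(self, maxlen=1000):
        # Circular buffer of (timestamp, person_count) pairs.
        self.values = collections.deque(maxlen=maxlen)
        self.mode = 0

    def update(self, count):
        # Record the latest person count with its timestamp.
        self.values.append((datetime.datetime.now(), count))

    def draw(self):
        # Cycle through display modes: current count, 10-minute max,
        # hourly max, bar graph, clock, banner image.
        if self.mode == 0:
            self.drawRecent()
        # ... other modes elided ...
        self.mode = (self.mode + 1) % self.NUM_MODES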

For example, to show the max count of people detected over the last ten minutes, I do this:

def drawRecent(self):
    unicornhathd.clear()
    now = datetime.datetime.now()
    # Keep only the readings from the last ten minutes.
    vals = [val for (dt, val) in self.values
            if now - dt <= datetime.timedelta(seconds=600)]
    # Guard against an empty list right after startup.
    maxval = max(vals) if vals else 0
    im = self.blueFont.drawString("{:02d}".format(maxval))
    showImage(im, 0, 8)
    unicornhathd.show()
    im = self.redFont.drawString("LAST TEN MINUTES ")
    scrollImage(im, 0, 0, WIDTH+1, -im.size[0], 0.02)

where utility functions like showImage() and scrollImage() are defined elsewhere in the code. I also wrote some hacky code to render bitmapped fonts, many of which can be downloaded at https://opengameart.org/.
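showImage() isn’t reproduced here, but the idea is to copy pixels from a small Pillow image onto the 16×16 LED matrix. A sketch of what such a helper might look like (this is my approximation, not the version from the repo):

import unicornhathd

def show_image(im, x_offset, y_offset):
    # Copy an RGB Pillow image onto the Unicorn HAT HD, pixel by pixel.
    rgb = im.convert('RGB')
    for x in range(rgb.size[0]):
        for y in range(rgb.size[1]):
            px, py = x + x_offset, y + y_offset
            if 0 <= px < 16 and 0 <= py < 16:  # the HAT is a 16x16 matrix
                r, g, b = rgb.getpixel((x, y))
                unicornhathd.set_pixel(px, py, r, g, b)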

The full demo code in all of its glory is here:
https://github.com/mdwelsh/teamsidney/tree/master/pi/personcounter

Going beyond

The AI2GO SDK has a bunch of other demo programs in the samples directory, including one that displays live bounding boxes on the camera image feed, another that runs object detection on a single image in a file, and so forth. You can also play around with the hundreds of other models available for AI2GO, including a food classifier, a pet detector, a face detector, and more.

Let me know in the comments if you have any questions, and happy hacking!
