My friend Sam DuBois has enlisted my help in building Pumpkin Man for Halloween 2020. Pumpkin Man is set to be a larger-than-life being with a 🎃 for a head who interacts with passersby. The project is reuniting Sam, Kai, and me, the team behind 2019’s Avenue Adventure. My primary focus is on developing the computer control systems to bring Pumpkin Man to life.
Pumpkin Man will be a roboticized creation that is operated remotely, so that he can interact with people without creating a COVID-unsafe situation. He will have a number of different controllable motions, including body motions and jaw movements. We’ll be performing live dialogue as Pumpkin Man, and we want his mouth to approximately sync up. Today I took a first stab at this feature. This post covers what I did and what I learned.
Controlling relays via GPIO
I’m using a Raspberry Pi to control an 8-channel relay board. A relay, as I learned last week, is basically a switch for heavy-duty electronics that can be controlled by wimpier electronics. My Pi is the wimpy computer here, and Sam’s mighty pistons are the heavy-duty devices. A relay board contains some number of relays (ours has eight).
The relay board is hooked up to the Raspberry Pi via the Pi’s GPIO (General Purpose Input/Output) pins. This video was a very helpful guide that I followed step-for-step, keeping the Raspberry Pi GPIO pinout1 handy:
The creator of that video even provides helpful testing scripts you can run to make sure your relay board is working and properly connected. Since the board has LEDs below each relay to indicate its state, it’s easy to make sure they’re working as intended.
Once I verified everything was in working order, I wrote a thin wrapper around the RPi.GPIO library (the Python library used for controlling the GPIO pins), discovering in the process that setting a pin’s voltage high disables its corresponding relay, counter to what I’d expect2.
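For illustration, here is a minimal sketch of a wrapper with that inverted (active-low) logic. It is not the project's actual code: the `Relay` class, the `FakeGPIO` stand-in, and pin 17 are hypothetical, chosen so the example runs without a Raspberry Pi. On the Pi, the real `RPi.GPIO` module would be passed in instead of the fake.

```python
class FakeGPIO:
    """Hypothetical stand-in for RPi.GPIO that just records pin levels."""
    BCM, OUT, HIGH, LOW = "BCM", "OUT", 1, 0

    def __init__(self):
        self.levels = {}

    def setmode(self, mode):
        pass  # the real library selects the pin-numbering scheme here

    def setup(self, pin, direction, initial=None):
        self.levels[pin] = initial

    def output(self, pin, level):
        self.levels[pin] = level


class Relay:
    """One channel of an active-low relay board: LOW = on, HIGH = off."""

    def __init__(self, pin, gpio):
        self.pin = pin
        self.gpio = gpio
        gpio.setmode(gpio.BCM)
        # Start HIGH so the relay stays off until asked otherwise.
        gpio.setup(pin, gpio.OUT, initial=gpio.HIGH)

    def set(self, on):
        # Inverted logic: driving the pin LOW energizes the relay.
        self.gpio.output(self.pin, self.gpio.LOW if on else self.gpio.HIGH)


# Off the Pi, use the fake. On the Pi (assumption):
#   import RPi.GPIO as GPIO; mouth = Relay(17, GPIO)
gpio = FakeGPIO()
mouth = Relay(17, gpio)  # pin 17 is an arbitrary example
mouth.set(True)          # open the mouth: pin goes LOW
```

Hiding the inversion inside `set` means the rest of the code can think in terms of "on" and "off" rather than voltage levels.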
Because the Pumpkin Man robot needs to be controlled remotely, I created a webpage with a button to open or close the mouth relay. I was surprised that the latency between pressing the button and the relay actuating was imperceptible, despite being mediated by a wifi connection.
Translating sound to movement
The next step was to figure out how to turn microphone input into relay state. I’m only working with a binary mouth state (either it’s open, or it isn’t), so I took a fairly simple approach based on the amplitude (volume) of the sound input. When the sound is loud enough, the relay triggers, opening the mouth. Otherwise, it closes.
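The mapping from volume to mouth state is small enough to sketch in a few lines. This is only an illustration, written in Python for consistency with the server side even though the real processing runs in the browser. The function name and the -25 dB threshold are assumptions, not values from the project.

```python
def mouth_is_open(amplitude_db, threshold_db=-25.0):
    """Return True when the sound is loud enough to open the mouth.

    threshold_db is an assumed value; it would need tuning against
    the real microphone and room noise.
    """
    return amplitude_db >= threshold_db


# Hypothetical amplitude readings in dB: quiet, loud, loud, quiet.
readings = [-40.0, -12.0, -8.0, -33.0]
states = [mouth_is_open(db) for db in readings]
```

With these sample readings, the mouth would be closed for the first and last readings and open for the two loud ones in between.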
I chose to do all of the audio processing on the client side (my laptop or phone) rather than on the Raspberry Pi because I was worried that throwing more work at the Raspberry Pi would result in increased latency, especially considering that my web server was running in Python to begin with.
MDN has a great guide to recording sound in the browser3, but unfortunately I couldn’t use it: it relies on the MediaRecorder interface, which is unsupported in Safari, so I wouldn’t have been able to control Pumpkin Man from my phone.
Instead, I stumbled upon this4 vanilla JS example of audio recording that works even in Safari. It even had an amplitude meter built in, which was perfect because it meant I wouldn’t have to figure out how to determine amplitude myself. The project was under the MIT open-source license, which let me copy the code and adapt it for my purposes!
I took the JavaScript code nearly wholesale and added a callback hook on the amplitude meter so that I could get the amplitude periodically (about 22 times per second). There were some hurdles along the way567, but when I was done I had a relay that responded to my voice’s amplitude with low latency.
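One of the hurdles described in the footnotes was a rate limit of one update per 100 ms, applied to readings that arrive roughly every 45 ms. The sketch below simulates that interaction to show why it roughly tripled the delay. It assumes the limiter simply dropped any reading that arrived too soon after the last one sent, which may not match the original code exactly. The function name is hypothetical.

```python
def throttled_send_times(period_ms, min_gap_ms, count):
    """Return the timestamps (ms) of readings that pass the rate limit.

    Readings arrive every period_ms. A reading is sent only if at least
    min_gap_ms has passed since the last one sent; others are dropped.
    """
    sent = []
    last_sent = None
    for i in range(count):
        t = i * period_ms
        if last_sent is None or t - last_sent >= min_gap_ms:
            sent.append(t)
            last_sent = t
    return sent


# Ten readings at 45 ms intervals with a 100 ms limit.
times = throttled_send_times(period_ms=45, min_gap_ms=100, count=10)
gaps = [b - a for a, b in zip(times, times[1:])]
```

Under these assumptions only every third reading gets through, so the relay sees an update every 135 ms instead of every 45 ms.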
SSL on the LAN
For security reasons, browsers forbid non-HTTPS sites from requesting access to your microphone. This presented a bit of an issue, because I wasn’t originally planning on running my Pumpkin Man control site over HTTPS, and I wasn’t sure how I could achieve this for a website that would only be available on a local network.
First, I tried a sketchy approach that technically worked8, but it was cumbersome and it required creating potential security vulnerabilities on a device before it could access the site. As it turns out, Let’s Encrypt (a project that gives away SSL certificates for free) supports alternate verification methods that allow them to issue a certificate for a domain name even if the site is not available over the internet. Based on this blog post9, I ran the following commands to get a genuine SSL certificate for my site:
sudo apt-get update && sudo apt-get install certbot
certbot --manual --preferred-challenges dns certonly \
-d localraspberrypi.reeshill.net
The process involves creating a custom TXT-type DNS entry containing a secret generated by Let’s Encrypt, which proves that I indeed control the domain. In normal circumstances, certbot takes care of automatically renewing SSL certificates, but it won’t be able to in this case since the verification type I chose requires manual intervention. That’s fine for me, since this certificate will last 3 months, long past Halloween.
The domain localraspberrypi.reeshill.net resolves to an IP address on my local network10. Since it now has a valid SSL certificate, the site can access my microphone and it just works for anyone on the local network.
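As a rough sketch of how a Python server could pick up that certificate, the example below wraps a standard-library HTTP server in TLS. It is a stand-in, not the project's server: the helper names are hypothetical, and it assumes certbot wrote the files to its default location under `/etc/letsencrypt/live/`.

```python
import ssl
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path

# certbot's default output directory (assumed unchanged).
LIVE_DIR = Path("/etc/letsencrypt/live")


def cert_paths(domain):
    """Return (certificate chain, private key) paths for a domain."""
    domain_dir = LIVE_DIR / domain
    return domain_dir / "fullchain.pem", domain_dir / "privkey.pem"


def make_https_server(domain, port=443):
    """Build an HTTPS server that serves the current directory."""
    cert, key = cert_paths(domain)
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=str(cert), keyfile=str(key))
    server = HTTPServer(("0.0.0.0", port), SimpleHTTPRequestHandler)
    server.socket = context.wrap_socket(server.socket, server_side=True)
    return server


# Usage on the Pi (needs read access to the key and permission to bind the port):
#   make_https_server("localraspberrypi.reeshill.net").serve_forever()
```

Reading the key file and binding to port 443 both typically require elevated privileges, so a higher port may be more convenient for a setup like this.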
If you want to check out the code, it’s open source here. When reading through it, please keep in mind that this is a purpose-built solution that prioritizes fast iteration over code quality or maintainability, since I won’t have to touch this code after October. 😁
-
which I would have discovered if I had looked briefly at the testing script. ↩︎
-
For some reason, in Safari on my Mac (but not on my phone), the amplitude did not dip below -3 dB or so (where normally a quiet room would be around -40 dB or so), and the recordings were extremely prone to clipping. If this behavior were just in the amplitude graph, I would have assumed that the fault lay there, but since it appeared in the recordings as well, I assume that there was a fault either in Safari’s handling of audio or the JavaScript that encoded the audio to WAV format. ↩︎
-
For a while, the voice feature had much more latency than the plain button. I figured this was because the audio processing was computationally expensive, but this was actually the fault of an “optimization” I had made to rate-limit state updates to no more often than once every 100 ms. With updates coming in about every 45 ms, this meant I was essentially tripling my latency. Once I removed this brain-dead optimization, the experience was far more responsive, and I called Sam to share my genius fix. ↩︎
-
I also switched to WebSockets in an effort to reduce latency by eliminating the overhead of making a new request for each relay state update. I’m not sure that I noticed any benefit from it, but I don’t think it hurt, either. ↩︎
-
I didn’t think I could get a legitimate SSL certificate for my site, since it wasn’t available on the internet. The first thing I tried was creating a root Certificate Authority and installing it on my devices. Normally, when you visit a site like https://duckduckgo.com and it presents its SSL certificate to your browser, your browser knows that the certificate is legit because it’s been cryptographically “signed” by a trusted third party, called a Certificate Authority. By creating my own CA and trusting it on my devices, I was able to load my site using HTTPS, though it required acknowledging many (rightfully) scary messages. When you trust a Certificate Authority, you’re trusting it not to sign any phony certificates, or else you could end up sending sensitive information to an imposter. ↩︎
-
Let’s Encrypt Server Certificate via DNS Challenge (archived webpage) ↩︎
-
though it might be on yours, too ;) ↩︎