Building a personal assistant PWA: Part 3

Offline capable doesn’t mean offline exclusive

Knowing that we can run offline and fulfill the basic goal we set for our chatbot doesn’t mean we should be limiting it to the assumption that it will only run offline. It’s a feature and a very important part of our app (and most other apps, IMO), but we can always teach our bot new tricks if we allow it to have access to online stuff–assuming we have a working connection.

One cool bit that we can add to a personal assistant is an additional mode of input. If one had a real-life personal assistant, they would usually communicate with them by way of speech. That’s exactly what we’ll do for our chatbot next–teach it how to understand speech.

To achieve this, we need to be online (unfortunately), as no browser vendor ships an offline-capable speech recognition engine with their browser just yet. Plus, there is the additional complication of some browsers having it, some browsers kinda having it, and some others downright not supporting the API. Fret not, however: we’ll do our best to make sense of it all.

The “wild west” that is in-browser speech recognition

There are ways to make a browser do speech recognition offline (e.g.: https://picovoice.ai/blog/offline-voice-ai-in-a-web-browser/), but we’re trying to keep things light and use as much of the standard browser APIs as possible, so we’ll be going the Web Speech API route.

A quick glance at the caniuse table leaves us with a somewhat optimistic, but also not-that-accurate picture. Chrome, Safari, and Edge support the API (with the limitation that you need to be online for it to work), but other Chromium derivatives don’t support the API. They do have it defined, but it simply doesn’t do anything.

Note: In theory, offline speech recognition via the standard API can be a thing. In practice, nobody seems to want to support that yet. Hopefully, that improves over the next few years as we make advancements towards big-and-open speech recognition solutions.

The Web Speech API

Since almost all of the Web Speech API pieces are vendor-prefixed at the time of writing (with moz and webkit), we could handle that the manual way, or we can just npm install a library that already does it. Luckily, there are plenty out there, so I picked vocal, the reason being that I like the name. Of course, I had a look at the code beforehand; it amounts to a single class that wraps up all of the gnarly parts that one wouldn’t want to deal with in order to reliably use an API that happens to be prefixed.
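For reference, the manual way mentioned above might look roughly like this. It is a minimal sketch and not what vocal does internally: the function name resolveSpeechRecognition is made up for illustration, and it only checks the unprefixed and webkit-prefixed constructor names.

```typescript
// Illustrative type: we only care that the value can be called with `new`.
type RecognitionConstructor = new () => unknown;

// Looks up the speech recognition constructor on a window-like object,
// preferring the unprefixed name and falling back to the webkit prefix.
// Returns null when neither name holds a constructor.
function resolveSpeechRecognition(
  scope: Record<string, unknown>
): RecognitionConstructor | null {
  const candidate =
    scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof candidate === "function"
    ? (candidate as RecognitionConstructor)
    : null;
}

// In a browser, this would be used along these lines:
//   const Recognition = resolveSpeechRecognition(
//     window as unknown as Record<string, unknown>
//   );
//   if (Recognition) {
//     const recognition = new Recognition();
//   }
```

Keep in mind that finding a constructor only proves the API is defined. As noted earlier, some Chromium derivatives define it without it actually doing anything.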

So, to kick things off:

npm i -S @untemps/vocal

It doesn’t cover all our bases, unfortunately. If you happen to be using Brave, Vivaldi, or a similar Chromium-based browser, this won’t work and has to be polyfilled. While there are a bunch of those available, none of them (at least to the extent I was able to find) cover the Web Speech API verbatim. Quite frankly, it is a huge hassle to properly cover ourselves, given the need for API keys for various online services, different API shapes, stream wrangling…it’s quite a long list that really starts to make this whole undertaking lose its appeal. So, we’ll stick to the standards (such as they are), and again, I am accepting PRs on the GitHub repo, if you happen to be in the mood for doing a polyfill. 😉

With that out of the way, let’s dive back into code. We’ll need to make a few fairly small adjustments to make our bot understand speech using vocal:

1. Add a button and a new callback for that button to ChatInput.tsx

2. Tweak app.tsx to initialize, listen to changes, and mutate state based on data we get from vocal
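As a rough illustration of the second step, here is a minimal sketch written against the shape of the standard Web Speech API rather than vocal’s own wrapper, so treat it as an approximation and not the exact code used in the app. The names RecognitionLike and listenOnce are hypothetical, and the default language is an assumption.

```typescript
// Minimal slice of the standard SpeechRecognition interface used below.
interface RecognitionLike {
  lang: string;
  interimResults: boolean;
  onresult:
    | ((event: {
        results: ArrayLike<ArrayLike<{ transcript: string }>>;
      }) => void)
    | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
}

// Starts a single recognition pass and resolves with the recognized text.
// Rejects if the engine reports an error (for example, no connection).
function listenOnce(
  recognition: RecognitionLike,
  lang = "en-US"
): Promise<string> {
  return new Promise((resolve, reject) => {
    recognition.lang = lang;
    recognition.interimResults = false; // we only want the final result
    recognition.onresult = (event) => {
      // Take the first alternative of the first result.
      resolve(event.results[0][0].transcript.trim());
    };
    recognition.onerror = (event) => {
      reject(new Error(`Speech recognition failed: ${event.error}`));
    };
    recognition.start();
  });
}
```

The button’s callback in ChatInput.tsx would then call something like listenOnce and hand the resulting string to the same handler that typed messages already go through. Depending on your DOM typings, a real recognition instance may need a cast to the minimal interface above.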

With those two changes in place, our bot can now respond to voice input. Mind you, this isn’t always 100% accurate, but it works pretty well for this use case. All of the existing skills for our bot already work with this new input mode, as we just introduced a new fancy way of inputting strings, meaning that all of the code we had in place works as expected while not knowing or caring whether this particular input came from a keyboard or was spoken. Neat, huh?

Bonus round: Giving our chatbot a voice

Since our chatbot can now understand speech, it’s time we gave it a voice of its own, no?

Again, to make the whole ritual of init and detection easier and simpler, we’ll install a small package from NPM called easy-speech. As the case was with vocal, easy-speech wraps the Speech Synthesis part of the Web Speech API and normalizes the cross-browser differences, allowing for easier initiation.

We’ll get started by installing it:

npm i -S easy-speech

This set of changes is even simpler than our previous one. All of the changes needed are in app.tsx:
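As a rough illustration (not the original listing, and deliberately not using easy-speech’s own API), this is what speaking a reply could look like against the standard Speech Synthesis interface. The names SynthLike, UtteranceLike, and speakReply are hypothetical, and cancelling an in-progress utterance is just one possible design choice.

```typescript
// Minimal slices of SpeechSynthesisUtterance and SpeechSynthesis.
interface UtteranceLike {
  text: string;
  lang: string;
}

interface SynthLike {
  speaking: boolean;
  cancel(): void;
  speak(utterance: UtteranceLike): void;
}

// Speaks a bot reply out loud. Returns false when there is nothing to say.
function speakReply(
  text: string,
  synth: SynthLike,
  createUtterance: (text: string) => UtteranceLike,
  lang = "en-US"
): boolean {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    return false;
  }
  // Assumption: a new reply should interrupt the previous one.
  if (synth.speaking) {
    synth.cancel();
  }
  const utterance = createUtterance(trimmed);
  utterance.lang = lang;
  synth.speak(utterance);
  return true;
}

// In a browser, this would be called along these lines:
//   speakReply(reply, window.speechSynthesis,
//     (t) => new SpeechSynthesisUtterance(t));
```

Passing the synthesizer in as a parameter keeps the function easy to exercise without a browser, and makes it straightforward to skip speech entirely when synthesis isn’t available.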

Note: You may get very different voice results than I’m getting. Truth is, these are dependent on at least the browser and operating system, so some variation is expected.

Packaging and deployment

With that done, we have a personal assistant that’s offline-capable and can understand speech, if we happen to be online. Plus, it can talk back, if everything works right.

Next up, we may want to package and deploy it…and maybe more. Let’s have a look at all of the available options.

Packaging

…is probably the simplest part here, thanks to the Vite PWA plugin we added in Part 1. The only thing we need to do here is run

npm run build

and we’re good to go. We get our entire packaged application inside the ./dist directory.

Deploying online

Since we’re essentially dealing with a static site, the only thing we need to do is upload the ./dist directory somewhere and point an SSL-secured domain to the deployment IP address. There are plenty of options for this approach out there: Netlify, Vercel, S3, DigitalOcean…perhaps your own server? Trust me, it’s not as hard as it sounds.

Packaging for App Stores

One neat trick with PWAs is that nowadays, we can put them on app stores, too (to varying degrees, depending on the platform). We’ll go over each possibility briefly:

Android (Play Store)

The Android deployment story is probably the best one available currently. Besides PWAs being supported in most major browsers via the regular install pop-up, you also have the option of promoting yours to a Trusted Web Activity (TWA for short): https://developer.chrome.com/docs/android/trusted-web-activity/.

In order for your PWA to get on the Play Store, it has to cover a few more things, detailed here: https://blog.pwabuilder.com/docs/testing-and-publishing-your-android-pwa-to-the-google-play-store.

Microsoft Store

The Microsoft Store is a close second here. If you have a properly working PWA, there are three additional steps you have to take in order to get it on the store (detailed here).

It is otherwise treated as a first-class citizen on the Windows desktop. The UI / UX story is, of course, dependent on how much you want or need to be close to the actual Windows UI. Luckily, there are plenty of options out there for that.

Apple (App Store)

Things aren’t great. Like I already mentioned in Part 1, I’m really not looking to get into that particular mess. In short, some parts of PWAs are supported on Apple’s platforms, some are not, and the criteria to get a PWA or a packaged PWA (see below) into the App Store are confusing at best. More details here.

Desktop platforms (other than Microsoft)

I’m putting these under a single umbrella, since they are largely driven by your choice of browser. Things are generally good here, assuming you’re using a Chromium-derived browser. Outside of that, again, things are not great. Firefox used to say that they were going to support installation, but then changed their minds. Safari on the desktop simply doesn’t support any kind of installation of these.

There is a more detailed resource, with all the gotchas, here.

Packaging the app inside a dedicated browser wrapper

Finally, there is the option of packaging the app inside a Chromium (or similar) wrapper, which allows the app access to more native APIs (if needed) and may provide a more desktop-like (or mobile-like) experience. There are plenty of choices here too, depending on your needs and preferences (in no particular order):

Capacitor – probably one of the oldest on the market that’s still maintained. In many ways, a successor to the now-deprecated Cordova and PhoneGap. Offers a bunch of native platform bridges and supports practically all modern platforms. Can produce builds for Android, iOS, and desktop OSes (via Electron). It probably has the widest platform support of all listed here.
Tauri – newer on the market, less (current) platform support, but they are keeping one of the fastest development cycles I’ve seen in a project so far. It uses the built-in WebView of an OS, which makes it very light. In less than 3 years, they managed to become a viable alternative to Electron while planning to support mobile platforms, too. Personally, I’m very excited for this one. Built in Rust, blazing fast, and with a big focus on security.
Electron – Good ol’ Electron. Love it or hate it, it’s been around for a while now, and it gets the job done. Realistically, a lot of its bad rep is due to bad practices and assumptions that memory and CPU time are infinite resources (looking at you, Slack). Still, it’s a fine choice if you’re only looking to render a PWA without complicating things too much and only need to support desktop platforms. Honorable mention to NW.js – they did the same before it was cool. 🙂
Neutralino – Built in C++, does a very similar thing to Tauri, but uses WebkitGTK+ (where Tauri uses WRY). Lighter footprint, and also a solid choice if you happen to be more into the design choices they have made.

So, which one do I pick?

As much as I’d like a simple answer to this, there isn’t one. It very much depends on who your audience is and what platforms you intend to support. My go-to mantra for this is YAGNI (you ain’t gonna need it). Keep it as simple as possible. More often than not, it’s enough to do a proper PWA with good offline support–assuming that your installation path would be a pop-up for regular users. That limits discovery, of course. If you aim to get discovered via an app store and have to support mobile platforms, TWA wouldn’t hurt. If you have to be on the App Store, you don’t have a lot of choice, unfortunately, and you’d have to go the Capacitor route.

The takeaway would be: Start with the simplest possible common denominator, and build up as needed. Doing it the other way around is the path to much frustration (ask me how I know).

Takeaway

Browsers are pretty powerful these days, and show no sign of stopping (even though there is some “political” opposition to that). They make for decent app platforms, even though they aren’t good fits for everything. Next time you build something, consider a PWA as a serious contender. It’s a simple choice that can get you pretty far.

In this three-part series, we managed to build an offline-capable personal assistant chatbot which understands human input, can give meaningful replies, and can even understand speech and talk back–all inside the browser! You can freely use this one as a base for your own (it’s fairly extensible), and maybe even publish it via the methods listed above? I’d love to see someone do that.

Thanks for taking this three-part journey with me!

Interested in working with Darko, or one of our other amazing devs on Gun.io? We specialize in helping engineers hire (and get hired by) the best minds in software development.
