I’ve put this off for a few days while I worked on several different aspects of the code. I wanted to integrate some pieces into one page and make a number of other changes, but I ended up reverting most of it and sticking with the plain approach for now rather than complicating things.
So, this is about the architecture.
The code is currently installable via the shell script on a Debian 9 machine. I haven’t tried to install it on anything else at all; I may try a VirtualBox VM or my Pi 3 later.
The first hurdle that comes with the architecture is that I may not want the STT listening all of the time, so I’ll need a way to switch between modes: off, on, push-to-talk, and continuous listening.
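To make that concrete, here’s a minimal sketch of what I mean by modes. None of this exists in the code yet; ListenMode, startRecognition, and stopRecognition are stand-ins for whatever the STT layer ends up exposing:

```javascript
// Sketch only: mode names and the recognition hooks are placeholders.
const ListenMode = {
  OFF: 'off',
  ON: 'on',
  PUSH_TO_TALK: 'push-to-talk',
  CONTINUOUS: 'continuous'
};

// Stand-ins for whatever the STT layer actually exposes.
function startRecognition() { console.log('mic on'); }
function stopRecognition()  { console.log('mic off'); }

let mode = ListenMode.OFF;

function setListenMode(next) {
  mode = next;
  // In this sketch ON and CONTINUOUS both keep the mic hot;
  // OFF and PUSH_TO_TALK only open the mic on demand.
  if (mode === ListenMode.ON || mode === ListenMode.CONTINUOUS) {
    startRecognition();
  } else {
    stopRecognition();
  }
}

// Push-to-talk: hold a key to listen, release to stop.
document.addEventListener('keydown', (e) => {
  if (mode === ListenMode.PUSH_TO_TALK && e.code === 'Space' && !e.repeat) startRecognition();
});
document.addEventListener('keyup', (e) => {
  if (mode === ListenMode.PUSH_TO_TALK && e.code === 'Space') stopRecognition();
});
```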
As of this moment, Serinda will open a menu and follow some links that I’ve wired into a static command processor. I want to extend the command processor and interpreter so that they’re much better. String formats, maybe. Grammar files, maybe. I’m not sure yet.
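One direction the extension could take is a table of patterns instead of the static checks. This is just a sketch; the patterns and the openMenu/followLink handlers are hypothetical:

```javascript
// Hypothetical handlers standing in for what Serinda already does.
function openMenu()       { console.log('opening menu'); }
function followLink(name) { console.log('following link:', name); }

// Each entry maps a spoken-phrase pattern to a handler.
const commands = [
  { pattern: /^open (the )?menu$/i, run: () => openMenu() },
  { pattern: /^go to (.+)$/i,       run: (m) => followLink(m[1]) }
];

function interpret(text) {
  for (const cmd of commands) {
    const match = cmd.pattern.exec(text.trim());
    if (match) return cmd.run(match);
  }
  console.log('no command matched:', text);
}

interpret('open the menu');  // -> opening menu
interpret('go to settings'); // -> following link: settings
```

New phrases then become new table entries rather than new code paths, which is roughly what I’m after.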
Stepping back, these are the jobs the architecture has to cover (there’s a rough sketch of how they might funnel together right after this list):
- Listening for speech input
- Waiting for user input through clicks
- Watching for input through video recognition
- Taking input from the host system through whatever it exposes (weather, battery, etc.)
- Output to the GUI
- Processing input
- Processing display
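Here’s that sketch: one idea is to funnel every input source into a single event bus so the processing and display code only has one place to look. Serinda doesn’t have this yet; interpret and updateGui are the same kind of hypothetical stand-ins as above:

```javascript
const EventEmitter = require('events');
const bus = new EventEmitter();

// Hypothetical stand-ins, same spirit as the earlier sketches.
function interpret(text) { console.log('interpreting:', text); }
function updateGui(evt)  { console.log('gui update:', evt); }

// One handler routes every tagged event to processing or display.
bus.on('input', (evt) => {
  if (evt.source === 'stt') interpret(evt.text);
  else updateGui(evt);
});

// Any source (speech, clicks, video, host stats) just emits:
bus.emit('input', { source: 'stt', text: 'open the menu' });
bus.emit('input', { source: 'host', battery: 87 });
```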
Current Implementation (as described in the last post)
- The user’s speech is converted to text on the page.
- The text is sent to a route in node.js, which checks it against the conditions that map to commands.
- A command is generated for the page.
- That command is sent back to the page.
- In the success portion of the ajax call, the command is handed to a command processor that runs our custom event.
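Roughly, that round trip looks something like the sketch below. The route path, field names, and the serinda-command event are placeholders, and I’m assuming Express with body-parser on the server and jQuery on the page:

```javascript
// --- node.js side: check the text, answer with a command ---
const express = require('express');
const bodyParser = require('body-parser');
const app = express();
app.use(bodyParser.urlencoded({ extended: false }));

app.post('/api/command', (req, res) => {
  const text = req.body.text || '';
  // stand-in for the static condition checks
  const command = /menu/i.test(text) ? 'OPEN_MENU' : null;
  res.json({ command });
});
app.listen(3000);

// --- page side: send the recognized text, fire the custom event ---
var recognizedText = 'open the menu'; // pretend this came from the STT

$.ajax({
  url: '/api/command',
  type: 'POST',
  data: { text: recognizedText },
  success: function (resp) {
    if (resp.command) {
      // hand the command to the command processor via the custom event
      document.dispatchEvent(new CustomEvent('serinda-command', { detail: resp.command }));
    }
  }
});
```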
This isn’t ideally what should happen, but it’s what’s happening for the time being.
The architecture in the future may actually use a separate server instance to run pocketsphinx, or even a clustered Pi (or a whole other SBC).
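If that happens, the STT box could be as simple as a small script that runs pocketsphinx_continuous against the mic and forwards each hypothesis to the existing node.js route. This is only a sketch: the host name and route are the same hypothetical ones as above, and pocketsphinx’s console output varies by version, so the filtering here is approximate:

```javascript
const { spawn } = require('child_process');
const http = require('http');

// pocketsphinx_continuous prints hypotheses to stdout; the exact output
// format varies by version, so this status-line filter is approximate.
const ps = spawn('pocketsphinx_continuous', ['-inmic', 'yes']);

ps.stdout.on('data', (chunk) => {
  chunk.toString().split('\n').forEach((raw) => {
    const line = raw.trim();
    if (!line || /^(READY|Listening|INFO)/.test(line)) return;
    // Forward the hypothesis to the same (hypothetical) route as before.
    const req = http.request({
      host: 'serinda-host', // wherever the node.js app actually lives
      port: 3000,
      path: '/api/command',
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' }
    });
    req.end('text=' + encodeURIComponent(line));
  });
});
```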