ᎣᏏᏲ. ᏙᎯᏧ? Hey, welcome back!
Since my last update two weeks ago, I’ve doubled down and put quite a bit of work into the project. I’ve made enough improvements that I can finally start on the first enhancement I wanted to do; after that, I’ll go back through and revise the code.
Let’s go over some of the features I have working (rough sketches of how several of these pieces look follow the list):
– FastAPI – I refactored away from Flask so I could use multithreading
– STT – speak and you can interact. It needs work, but it’s a great start. I have older code I can pull from to fix some of this, e.g. menu interaction, and a bunch of recognizers available that I haven’t fully tested yet
– TTS – currently works with gTTS; getting other libraries working is a task for the future
– OpenCV – currently doing gesture detection and frame manipulation in Python. The manipulation may move to Rust for speed; for now, Python makes testing faster. I also created a CameraPool to manage multiple cameras
– MediaPipe – I’m using this for gesture detection right now; more may be added later. On the backend it’s working very well. I still need to rework the gestures so they can be user-defined instead of limited to the ones added by default
– commands – I have a CommandProcessor that can take verbal or typed commands (via an input box on the page) and run them through filters to perform a task
– config – there are two flavors of config: the first is a properties file for pretty much the whole of the software; the second is
– C code – I have included code in the startup that will compile the C code in the src directory and produce a library (.so or .dll). If you’re so inclined to write some C code and access the resulting library from Python, you may do so. There is also an example of using such a library in Python in the tests directory under “cFuncs”
– Rust code – I have included code in the startup that will compile the Rust code in the src directory and produce a library (.so or .dll). If you’re so inclined to write some Rust code and access the resulting library from Python, you may do so. There is also an example of using such a library in Python in the tests directory under “rust”
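Here’s roughly what the FastAPI side looks like in miniature. The /status route and background_worker below are hypothetical placeholders, not Serinda’s actual endpoints; they just show the pattern of serving requests while background threads do their own work.

```python
# Minimal sketch: a FastAPI app plus a background worker thread.
# The /status route and background_worker are illustrative, not Serinda's API.
import threading

import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/status")
def status():
    # Report how many threads are currently running.
    return {"ok": True, "threads": threading.active_count()}

def background_worker():
    ...  # e.g. poll a camera or microphone in its own thread

if __name__ == "__main__":
    threading.Thread(target=background_worker, daemon=True).start()
    uvicorn.run(app, host="127.0.0.1", port=8000)
```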
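For STT, the SpeechRecognition library is the usual suspect in Python. This is a generic sketch rather than Serinda’s exact wiring, and recognize_google is only one of the several recognizers the library exposes.

```python
# Generic STT sketch using the SpeechRecognition library.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
    audio = recognizer.listen(source)

try:
    # recognize_google is just one of several available recognizers.
    print("Heard:", recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand audio")
```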
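The gTTS piece is about this simple. This sketch only synthesizes speech to a file and leaves playback out, since I can’t speak to how the project actually plays the audio.

```python
# Minimal gTTS sketch: synthesize speech to an MP3 file.
from gtts import gTTS

def say(text: str, outfile: str = "speech.mp3") -> None:
    gTTS(text=text, lang="en").save(outfile)

say("Welcome back")
```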
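The OpenCV/MediaPipe loop looks roughly like this. I’m using the classic Hands solution here as a stand-in, since the post’s CameraPool and the default-gesture classifier aren’t shown.

```python
# Sketch: MediaPipe hand tracking over an OpenCV capture.
# Stands in for the gesture path; CameraPool and gesture classification are omitted.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)  # see the to-do about not always requesting camera 0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB; OpenCV delivers BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            mp.solutions.drawing_utils.draw_landmarks(
                frame, hand, mp.solutions.hands.HAND_CONNECTIONS)
    cv2.imshow("gesture", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```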
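The actual CommandProcessor isn’t shown in this post, so here’s my guess at its general shape: a command arrives as speech or typed text and runs through registered filters until one claims it.

```python
# Hypothetical sketch of a filter-chain command processor.
from typing import Callable, Optional

Handler = Callable[[str], Optional[str]]

class CommandProcessor:
    def __init__(self) -> None:
        self.filters: list[Handler] = []

    def register(self, handler: Handler) -> None:
        self.filters.append(handler)

    def process(self, command: str) -> str:
        # Try each filter in order; the first non-None result wins.
        for handler in self.filters:
            result = handler(command)
            if result is not None:
                return result
        return "unrecognized command"

def scroll_filter(cmd: str) -> Optional[str]:
    return "scrolling" if "scroll" in cmd.lower() else None

proc = CommandProcessor()
proc.register(scroll_filter)
print(proc.process("scroll down"))  # -> scrolling
```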
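And once the startup has compiled the C or Rust source into a .so/.dll, calling into it from Python is a short ctypes exercise. The library name and the add symbol below are illustrative, not what ships in the tests directory.

```python
# Sketch: loading a compiled library (.so/.dll) via ctypes.
# "libmyfuncs" and "add" are hypothetical names for illustration.
import ctypes
import platform

ext = ".dll" if platform.system() == "Windows" else ".so"
lib = ctypes.CDLL(f"./libmyfuncs{ext}")

# Declare the C/Rust function's signature before calling it.
lib.add.argtypes = [ctypes.c_int, ctypes.c_int]
lib.add.restype = ctypes.c_int
print(lib.add(2, 3))  # -> 5
```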
Still on the to-do list:
– fix intents, either by working from the MIT version of Snips to produce my own version, or by using other libraries like pyjsgf, rhasspynlu, pocketsphinx, etc. The Python version of SnipsNLU is limited, as the Apache-licensed version hasn’t been updated in 8 years
– fix the code so it doesn’t always request ‘camera 0’ in certain places
– tune the STT so it does a better job
Note that these are just for the V3.1 release; there are many more features coming. The original goal of this release was winnowed down to a base framework: pretty much the full function of the release is to be able to wear a headset, interact with the entire screen using gestures, STT, and TTS, and read a PDF, manipulating scrolling, page moves, etc. via STT and gestures. The idea is that this proof of concept completes the backbone of this 7-year coding project and marks a major milestone in the nearly 40-year idea I’ve had since Blue Thunder, Airwolf, and more.
There are optimizations I still want to make, but I wanted 3.1 marked as the base version because the actual goals I had for the release are complete.
You can check out everything related to the project at http://github.com/cdrchops/serinda, including the official zip release, the milestones, and what’s done and what’s coming up.
Now that the basic functionality I wanted is done, I’ve got to review all of the pieces and:
– make sure they’re architected how I want
– clean up the code
– optimize some elements for speed
– document the code
– so much more
I’m in a groove right now, so I’ll probably just keep working on functionality while I can.
Until next time. Dodadagohvi. ᏙᏓᏓᎪᎲᎢ.