Osiyo. Dohiju? Hey. Welcome back.
I worked really hard on trying to get OpenCV to do what I wanted in web applications. Really hard. I was able to do some work with opencv.js and opencv4nodejs. The issue has always been that I know the Python side of OpenCV far better, and neither opencv.js nor opencv4nodejs implements everything that is available in the Python or C++ versions. I know HTML, CSS, and JavaScript very well. What I wanted was a way to use the Python OpenCV libraries behind an HTML front end. I hadn't looked at running Python as some kind of web service before. Then comes Flask.
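To make the idea concrete, here is a minimal sketch of what that could look like: a Flask route that grabs a webcam frame, processes it with the Python OpenCV bindings, and hands the result to the browser as a JPEG. The route name, the camera index, and the Canny step are my own placeholder choices for illustration, not necessarily what SERINDA will use.

```python
# Minimal sketch: serve an OpenCV-processed webcam frame from Flask.
# Route name, camera index 0, and the Canny step are illustrative assumptions.
import cv2
from flask import Flask, Response

app = Flask(__name__)
camera = cv2.VideoCapture(0)  # default webcam; swap for a Pi camera, video file, etc.

@app.route("/frame")
def frame():
    ok, img = camera.read()
    if not ok:
        return Response(status=503)  # camera not ready
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # any OpenCV processing goes here
    edges = cv2.Canny(gray, 100, 200)
    ok, jpg = cv2.imencode(".jpg", edges)
    return Response(jpg.tobytes(), mimetype="image/jpeg")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

An HTML page can then point an img tag at /frame, so the front end stays plain HTML, CSS, and JavaScript while all the vision work lives in Python.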
If you want to know more about Flask you can do the research on it. In addition to using the Python OpenCV libraries behind HTML, I also wanted to be able to control aspects of my OS (Raspbian, Debian, maybe even Windows) using Python. The idea is that if I want to open, say, Spotify and load a playlist, I say "Serinda. Open Spotify playlist 80s" and it does just that. Even if I had to say "Serinda, open Spotify. Serinda, open 80s playlist," that's a big win. I'm fairly sure I can do that with some of the automation frameworks out there for Python. I have a few to experiment with, but overall I'm more confident in this approach than in any of the others I've started and abandoned because they didn't accomplish what I needed.
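As a rough sketch of how that could hang together, the snippet below maps a recognized phrase to an OS action. The phrases, the wake-word handling, and the Spotify commands are all illustrative assumptions, not working SERINDA commands.

```python
# Rough sketch of mapping a spoken phrase to an OS action.
# The phrases, wake word handling, and Spotify commands are assumptions for illustration.
import subprocess

WAKE_WORD = "serinda"
COMMANDS = {
    "open spotify": ["spotify"],  # launch the desktop client, if it's on PATH (assumption)
    "open spotify playlist 80s": ["xdg-open", "spotify:playlist:REPLACE_WITH_ID"],  # placeholder URI
}

def handle(phrase: str) -> bool:
    """Strip the wake word, then run the OS command registered for the phrase, if any."""
    phrase = phrase.lower().strip()
    if phrase.startswith(WAKE_WORD):
        phrase = phrase[len(WAKE_WORD):].strip(" .,")
    action = COMMANDS.get(phrase)
    if action is None:
        return False
    subprocess.Popen(action)
    return True

handle("Serinda. Open Spotify playlist 80s")
```

The real command set would live in plugins rather than a hard-coded dictionary, but the lookup-and-launch shape is the piece that matters.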
In past blogs I've covered the features of SERINDA. I'm going to give a brief overview of the main ones. Remember, SERINDA is an Intelligent Personal Assistant (IPA). The goal is to use speech-to-text (STT) to run commands. Those commands could be anything: loading PDFs, Google searches, opening and interfacing with applications and the OS, etc. There is also text-to-speech (TTS) that will talk back to you with prompts. Tesseract handles OCR in one of its many languages. OpenCV handles object recognition (including faces, people, and gestures). Hand gestures will be used for interaction in AR/MX environments and for selecting regions to photograph, crop, or pull out of documents to save and maybe even recompile into a new document. An example would be reading a PDF in AR/MX and then using your hands, à la Minority Report, to surround some text, which is then moved in AR/MX space and can be put into a new document with a citation of where it came from. Another function is using neural networks (NN) for object recognition, with the ability to take photos or video and train an NN on new objects. Even translate text in real time. Eventually, I'd like to add eye tracking so the camera can outline approximately where my pupils are looking, and with gestures I can designate a specific object to track.
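To give a feel for the listen-then-respond loop, here is a minimal sketch using the speech_recognition and pyttsx3 packages. Those are just two common choices; the STT/TTS stack SERINDA ends up with may be different.

```python
# Minimal STT/TTS loop sketch using the speech_recognition and pyttsx3 packages.
# This only shows the listen -> act -> speak shape; the real assistant loop would be richer.
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
voice = pyttsx3.init()

def listen_once() -> str:
    """Capture one utterance from the default microphone and return the transcript."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)  # online recognizer; offline engines also work

def speak(text: str) -> None:
    """Read a prompt back to the user."""
    voice.say(text)
    voice.runAndWait()

if __name__ == "__main__":
    heard = listen_once()
    speak(f"You said: {heard}")
```

The transcript from listen_once() is what would get fed into a command handler like the one sketched earlier.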
These are lofty goals. However, I have already written an OpenCV app that will detect lanes in real time and when a road sign comes by on the left it will read off what the sign says. I’ve written much more in OpenCV. The point is that I’m pretty confident in my abilities to make these other items work well.
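That lane detector isn't part of the public code yet, but the general shape of such a pipeline is well worn: Canny edges, a region-of-interest mask, and a Hough transform. Here is a stripped-down sketch of that shape, not the real-time app itself.

```python
# Stripped-down sketch of a classic lane-detection pass (Canny edges + Hough lines).
# This shows the general shape of such a pipeline, not the real-time app described above.
import cv2
import numpy as np

def detect_lanes(frame):
    """Return the frame with candidate lane lines drawn on its lower half."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blur, 50, 150)

    # Keep only the lower half of the image, where the road usually is.
    mask = np.zeros_like(edges)
    h, w = edges.shape
    mask[h // 2:, :] = 255
    edges = cv2.bitwise_and(edges, mask)

    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=20)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    return frame
```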
I've already started with a Flask service. I'll add speech recognition and then build out more OpenCV plugins to perform specific tasks. The hardest part has always been gestures. I've created gloves with different colors of duct tape on the fingers, and that worked OK in the past. I'd rather not have to wear special gloves, but for now I'll start with them. My goal is to have TensorFlow 2, Keras, OpenCV, Flask, Redis, Tesseract, speech recognition (TTS and STT), and the ability to detect objects, if not identify them, by the 15th. I don't think this is unreasonable. After that, I'll make sure I can install everything from a fresh image and that it runs correctly. At that point, I'll open up the project on Bitbucket for anyone to use.
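The colored-tape glove trick boils down to color segmentation. Here is a small sketch of that idea, assuming bright green tape on one fingertip; the HSV range is a guess and would need tuning for real lighting and real tape.

```python
# Small sketch of the colored-tape glove idea: segment one tape color in HSV and
# use the centroid of the largest blob as a fingertip estimate.
# The HSV range is a rough guess for bright green tape and would need tuning.
import cv2
import numpy as np

LOWER = np.array([40, 80, 80])    # assumed HSV lower bound for green tape
UPPER = np.array([80, 255, 255])  # assumed HSV upper bound

def find_marker(frame):
    """Return the (x, y) centroid of the largest green blob, or None if nothing is found."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
```

One marker per tape color gives a handful of trackable points, which is enough to experiment with the AR/MX selection gestures before moving to a glove-free approach.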
I like that my 30-year-old dream is taking better shape than it ever has. I've made a lot of strides over the last 20 years, all building up to this version. I might be a little excited.
Until next time. Dodadagohvi.