Osiyo. Dohiju? (Hello. How are you?) Hey, welcome back.
Let’s start with a little about me. I have been a software developer for 24 years or so. I have worked with C, C++, Java, Python, Groovy, OpenGL, game engines in C/C++ and Java, and much more. I have worked with CMU’s Sphinx project and many other Text-to-Speech (TTS) and Speech-to-Text (STT) projects. I have worked for big-name companies on many web projects. I have studied 26 written and spoken languages (human languages, not programming languages). My biggest language work has been the online Cherokee-English Dictionary (http://cherokeedictionary.net). I am an Operation Iraqi Freedom veteran. I have a bunch of kids and a fantastic life partner.
A little about the SERINDA project. Since the late ’80s I have dreamed of developing a Head Mounted Display, or HMD, that could interact with the world in ways that were only partially imagined at the time. Movies like Blue Thunder, Airwolf, The Last Starfighter, and The Abyss were my inspiration. My first patent filing was for a project called “monohelm,” because I couldn’t think of a cool name at that age. What I didn’t realize was that the technology to do what I wanted didn’t exist yet. Research and development alone was going to cost $400k, with no guarantee of a working prototype, and components like cameras and displays were rudimentary compared to what was available even 10 years ago.
Fast forward to about 10 years ago: I wrote the first version of this project in Java using Sphinx, and later added Sikuli. It worked well enough for some things, but it could only interact with the computer. It wasn’t wearable at all, it required a lot of manipulation with Sikuli, and I couldn’t use it while I worked: I’d say a command and then have to avoid typing or touching the mouse until Sphinx and Sikuli finished. By then I had already been doing OpenCV work in Python. Around Thanksgiving 2016, I realized that much of the technology I needed was finally available. Projects like Jasper existed for home automation, but the problem with many of them was setup: I spent more time trying to configure a project than working on the code I actually cared about. I moved from Jasper to a NodeJS project using OpenCV4NodeJS, a fantastic port by Vincent, thinking I could use NodeJS for the display and GPIO for the other portions. Again, I spent more time debugging what I had done wrong with OpenCV4NodeJS than writing code. Don’t get me wrong: OpenCV4NodeJS is fantastic; it just wasn’t designed to be used the way I was using it.
It was around this time that I was joking with my SO about Serinda Swan, since we’d just watched Graceland and Breakout Kings, and she suggested I name the project after the actress. From then on I continued my OpenCV work in Python, working on Debian, Raspbian, Mac, and Kali for various tasks. Then I asked myself, “Why should I port all of my code to another language and flavor when it works now? Maybe there’s a Python webserver.” So I switched to Flask, and development has been moving at a rapid pace ever since.
What is the intention of SERINDA? What is its purpose? SERINDA has been conceived and designed as a Mixed Reality Intelligent Personal Assistant. So many projects start with “I want EDITH glasses” and end up making something similar instead of the thing they set out to build. They are limited by their perception of what the technology is, not by what they could do with the technology today. For example, I was limited because I thought I needed a transparent OLED (TOLED) so I could see through the screen. I have since found that I just need some way to display the image, and simple prototyping options exist, like the Vufine AR Kit, the HoloKit, and some Chinese AR headsets. I can still build my display without a $3,500 HoloLens. Others limit their work to Arduinos, Raspberry Pis, and monochrome displays. That’s fine for them; they’re good starting points. The limitation (the same one I had for years) is believing that you must use whatever display you can drive directly from the hardware, when in reality you can use nearly anything for the display.
Let’s talk about the display and the differences between VR, AR, and MR. SERINDA, as I said, is a Mixed Reality (MR) environment. So what does that mean compared to VR and AR environments? I have covered this in previous posts, but here is a very quick summary. Virtual Reality (VR) is all-encompassing: everything you see, the landscape, the interactions, all of it, is generated. Augmented Reality (AR) adds a display layer over the real world. You see the world as it is, and when a marker is detected (an ArUco marker, for instance, or a recognized image), additional information appears in your field of view, like a 3D popup. Mixed Reality (MR) combines AR with portions of VR. You see the real world as you do with AR, with augmented (added) visual or audio items placed in it, with or without markers. And you can interact with the world around you just as you can in VR. So you get a real world you interact with, plus enhancements to that world that you can also interact with. That is the purpose of SERINDA: enhancing the real world with immersive technology.
What do I hope to accomplish with this project? An affordable, easy-to-use mixed reality setup that enhances people’s lives. It’s that simple. My original goal was a system that could enhance my life, run on the smallest possible pocket computer, and work offline: translations, TTS, STT, OCR, all of it done locally with no need to connect to the internet at all. That would still be a fantastic achievement. Instead, I’ve allowed some features to use the internet; translations, for example, go to Google for now. Pretty much everything else should live on the local computer. Searches, weather, and anything else where data must be pulled from an external source are also allowed to connect to the internet. Likewise, some pre-trained neural network models will be downloaded as needed instead of during initial setup. Within these self-imposed restrictions, as it were, I would really like all of this to run on minimal hardware, such as a Raspberry Pi, a 5″ display, a couple of USB cameras, Bluetooth or wired earphones with a microphone, and whatever viewer makes the most sense. IMO, the total cost should be under $100 or so for something fairly decent.
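The idea of downloading pre-trained models only when a feature first needs them can be sketched as a small lazy cache. This is a minimal, hypothetical sketch rather than SERINDA’s actual code: the `ModelCache` class, the cache directory, and the name-to-URL mapping are assumptions for illustration, and the default fetcher simply uses `urllib.request.urlretrieve` from the Python standard library.

```python
import urllib.request
from pathlib import Path
from typing import Callable, Dict, Optional

# A fetcher takes a URL and a destination path and writes the file there.
Fetcher = Callable[[str, Path], None]


def download(url: str, dest: Path) -> None:
    """Default fetcher: download to a .part file, then rename it into place.

    Renaming only after the download finishes means an interrupted download
    never leaves a half-written file that looks like a finished model.
    """
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, tmp)
    tmp.replace(dest)


class ModelCache:
    """Hypothetical lazy model store: nothing is fetched at install time."""

    def __init__(self, cache_dir: Path, sources: Dict[str, str],
                 fetch: Optional[Fetcher] = None) -> None:
        self.cache_dir = Path(cache_dir)
        self.sources = sources          # model file name -> download URL (assumed)
        self.fetch = fetch or download  # injectable so it can be tested offline
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def path_for(self, name: str) -> Path:
        """Return the local path of a model, downloading it on first use only."""
        target = self.cache_dir / name
        if not target.exists():
            if name not in self.sources:
                raise KeyError(f"no download source configured for {name!r}")
            # The only point where this feature touches the network.
            self.fetch(self.sources[name], target)
        return target
```

Once a model has been fetched, later calls to `path_for` return the cached file without going online, so everything after the first use stays offline, which fits the goal of keeping network access to a minimum.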
What are my hardware choices? I have been testing with a Raspberry Pi 3B v2, a LattePanda Alpha, Mac, Windows 10, and Kali. The display choices are the HoloKit, the Vufine AR kit, and this other one. Interaction happens through OpenCV-based gestures and STT, and I’m working on Leap Motion integration. Some of these features haven’t been integrated into the Bitbucket project yet.
Let’s take a quick peek through the project source to give developers an idea of what’s in there and how it all works. The installers directory contains the install scripts. These haven’t been cleaned up for different systems yet, so depending on which system you’re installing on, you may need to read through the commented code in python3Install.sh and some of the other scripts to install the specific packages your machine needs. I will be working on an installer that detects your system and runs the appropriate commands. The README contains basic startup information, a list of all the items I have added or am adding to this project, and a todo list of the things I need to do more immediately. Finally, the plugins directory is where you make your own plugins or add filters to OpenCV; all of the templates from the plugin subdirectories are added to the display.
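Collecting every plugin’s templates for the display can be sketched with `pathlib`. This is a hypothetical illustration, not the project’s loader: the `collect_plugin_templates` function is made up for this example, and it assumes each plugin lives in its own subdirectory of `plugins/` and keeps its HTML templates in a `templates/` folder, which may not match the actual layout.

```python
from pathlib import Path
from typing import List


def collect_plugin_templates(plugins_dir: Path) -> List[Path]:
    """Return every *.html template found under plugins/<plugin>/templates/.

    Plugins are visited in alphabetical order, and templates within a plugin
    are sorted too, so the display is assembled in a stable, predictable order.
    Loose files directly inside plugins/ are ignored.
    """
    templates: List[Path] = []
    plugin_dirs = sorted(p for p in Path(plugins_dir).iterdir() if p.is_dir())
    for plugin in plugin_dirs:
        template_dir = plugin / "templates"
        if template_dir.is_dir():  # plugins without templates are skipped
            templates.extend(sorted(template_dir.glob("*.html")))
    return templates
```

The returned paths could then be handed to whatever builds the main display page. Sorting is a design choice here: it keeps the order of plugin widgets the same between runs instead of depending on filesystem order.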
I think that’s probably enough for now. The next thing I’ll work on is a simple display rig so I can test MR more easily than by holding my hands up in the air. It will also let me test the gestures better.
NOTE: As it turns out, some of the audio didn’t record during the screen captures, so I’m recording the audio separately. I’ll post the whole video when that’s done.
Until next time. Dodadagohvi.