I don’t know what else to name it. I’ve been calling it SERINDA (after Serinda Swan) for so long I’m just going to keep it.
Serinda has been a dream project of mine for about 30 years. It started with the sci-fi of 80s TV shows like Knight Rider, Airwolf, Blue Thunder, GI Joe, and MASK.
I had a dream to build an interactive helmet (because back then I could only imagine something as small as a helicopter helmet, not something as small as glasses) that could do the myriad of specialty things you saw in TV shows. Track objects, record video, take pictures, listen on microphones, place trackers, zoom in cameras, act as an interactive computer (keyboard and voice of course), alert you to threats like a missile detection HUD, and a host of things that any teenager would think was awesome.
Many years ago I wrote a voice-activated program in Java using CMU Sphinx to play songs, among other things. It was fledgling and the grammar files were intense. A few years after that I rewrote it as a screenshot-based Java program that I could interact with. I’d say a command and it would use screenshots to interact with my computer.
Then two years ago I started work on a Python-based project formed from the home automation project Jasper. I started transitioning from Jasper to a Groovy/Java-based solution for CMU Sphinx because Python wasn’t working as well as I wanted on the Raspberry Pi. Then I stopped, because it seemed like I was spending a ton of time on installing files and nothing else.
So, I took time away from installations and the Raspberry Pi work and focused on the tools. Tesseract, OpenCV, NLTK, TensorFlow, Keras, and many, many more. I worked in a Debian 8 virtual machine in Parallels, knowing I could learn there but that one day I’d have to come back and work on the installers.
The installations alone were in the 3GB range, and that was without an OS. I was having issues with everything. However, I switched my main server from Python to Node.js and Express, because I like that the server can handle more than a blocking loop in Python or Java. I also like that with Node.js I can manipulate my new GUI, the HTML page, because that’s what I’ve done for 22 years. *whispers* Netscape Navigator Gold.
So I began porting my work over, and it’s easier than ever. Some ports are wrappers, like the Node Tesseract OCR implementation (https://www.npmjs.com/package/node-tesseract-ocr). Others are new implementations, like Tesseract.js (http://tesseract.projectnaptha.com/). What you have to look for is what you want and how you want to interact with it. In this case, both are incredible. However, Tesseract.js does not support Cherokee, and I use that OCR a lot, so while I may install it, I may not use it. I’ll probably leave a note in the install for a future person that they could use it.
So now that I’ve said all of that, and 30 years after I had the vision, I’m going to build my dream.
The good stuff
I’m using Docker to build my base image. It starts from the opencv4nodejs base image, and I added all of my libraries on top of that to produce an image entitled serindaproject/serindabase. My code is then an app running on that image.
Express, webkit, speech, opencv4nodejs
Inside that app, Node.js with Express runs as the main app, currently on port 3000. There is a mic in the browser, but I’ll end up supplementing it later on. For now, webkit and speech are coming from the browser. opencv4nodejs is supplying streaming video and image manipulation.
These are for processing images and turning text into something readable. There are ways to use this in the Node.js framework so I don’t need to go to Python.
This is the initial setup. And this is how it’ll be set up.
The webpage will be a super dark grey or black for the AR goggles I have coming. The goggles are about 70 bucks, but just in case, I’ve also ordered two different sets of near-view screens to test. The dark grey or black background will keep the field in front (the normal field of vision) available for full view. I will use the background of the browser (sans bookmark bar, menu, address bar, etc.) as my GUI canvas.
Because Node.js can interact directly with the OS by running commands and getting back data, I don’t need to worry about what I use where. I can install native utilities that I can interact with later. So if I want to know the battery level, I’m sure there’s some utility that I can get that information from and access it in Node.js.
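As a rough sketch of what I mean by asking a native utility for data, something like this should work with Node’s built-in child_process module. The helper name runUtility and the timeout value are my own placeholders, and which utility actually reports the battery level depends on the OS, so that part is left open.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Run a native utility without blocking the server and resolve with
// its trimmed stdout. `runUtility` is a hypothetical helper name; the
// command and arguments are whatever tool the OS provides.
async function runUtility(command, args = [], timeoutMs = 2000) {
  const { stdout } = await execFileAsync(command, args, { timeout: timeoutMs });
  return stdout.trim();
}

// Example: use the Node binary itself as a stand-in for a real utility.
runUtility(process.execPath, ['-e', 'console.log("87")'])
  .then((level) => console.log(`utility reported: ${level}`))
  .catch((err) => console.error('utility failed:', err.message));
```

Using execFile rather than a shell string keeps the arguments from being interpreted by a shell, which matters once voice input starts feeding into commands.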
I have multiple super small USB cameras coming. They will be mounted on a helmet rig with one of the AR systems that I’ll decide on. I haven’t decided about IR LEDs yet.
The display will be based on the AR system. One system will be two small screens split by HDMI. The other will be one screen that I will have to manipulate to split. I’ll deal with that when I decide which one I like better. In either case, the full browser should be visible without any issues. Display and readability are going to be a concern, but may not be an issue.
Next, the assembled HMD will run off of one device that I’ve yet to decide on. Initially it’ll be a larger computer, then I’d like to get it down to something smaller. Again, I’m working on getting it running the way I want and then I’ll worry about the other parts.
Alright, so the GUI is a Chrome browser with a black or dark grey background. The elements on the display will be tracked, so if the browser refreshes or something, the elements will be back where they were.
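One way the element tracking could work, as a minimal sketch: keep each element’s position in an object and write it to the browser’s storage as JSON whenever something moves, then read it back on load. The key name, the layout shape, and the function names here are all made up for illustration. The in-memory storage stands in for window.localStorage so the same code can be tried in Node.

```javascript
// Hypothetical storage key; anything unique to the app would do.
const LAYOUT_KEY = 'serinda.layout';

// `storage` is anything with getItem/setItem, such as window.localStorage.
function saveLayout(storage, layout) {
  storage.setItem(LAYOUT_KEY, JSON.stringify(layout));
}

// Return the saved layout merged over the defaults. A missing or
// corrupted entry falls back to the defaults alone.
function loadLayout(storage, defaults = {}) {
  const raw = storage.getItem(LAYOUT_KEY);
  if (raw === null) return { ...defaults };
  try {
    return { ...defaults, ...JSON.parse(raw) };
  } catch {
    return { ...defaults };
  }
}

// Minimal in-memory stand-in for localStorage, for trying this outside a browser.
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => { items.set(key, String(value)); },
  };
}

const storage = createMemoryStorage();
saveLayout(storage, { camera1: { x: 40, y: 120 } });
console.log(loadLayout(storage, { clock: { x: 0, y: 0 } }));
```

In the browser, passing window.localStorage instead of the in-memory object would make the positions survive a refresh.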
The voice commands will control the display of the views: whether a camera feed is shown or not, whether the feed is captured or not, and whether the feed is passive, meaning it’s just on, or active, meaning it’s also running algorithms looking for whatever we want it to find (say, a specific object we’ve lost around the house, or a kid we can’t find in the store).
The voice commands will also control Tesseract and the speech read back. Or even “open a pdf and read it to me”. Eventually the active gaze will make the reading part easier. This will be much easier now with the components available.
I know, it seems like a lot of stuff… but I’ve spent the last 20 years working on game code, OpenGL libraries, CMU Sphinx work, Tesseract, NLTK, Java, and much more, and while this may take more than a weekend to perfect, it’s easier than all of that. What would have cost me hundreds of thousands of dollars 30 years ago I can now do with about one thousand dollars of equipment total, including all of the cameras and the computer.
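To make the voice command idea concrete, here is a small sketch of turning a recognized transcript into an action the GUI could apply. The phrases, the action shapes, and the name parseCommand are all invented for illustration, and it assumes the recognizer hands back digits ("camera 2") rather than words ("camera two").

```javascript
// Hypothetical command grammar: each entry pairs a pattern with a
// function that builds an action object from the regex match.
const COMMANDS = [
  {
    pattern: /^(show|hide) camera (\d+)$/,
    build: (m) => ({ type: 'setVisible', camera: Number(m[2]), visible: m[1] === 'show' }),
  },
  {
    pattern: /^(start|stop) recording camera (\d+)$/,
    build: (m) => ({ type: 'setCapture', camera: Number(m[2]), capture: m[1] === 'start' }),
  },
  {
    pattern: /^set camera (\d+) (active|passive)$/,
    build: (m) => ({ type: 'setMode', camera: Number(m[1]), mode: m[2] }),
  },
];

// Normalize the transcript (lowercase, strip punctuation, collapse
// whitespace) and return the first matching action, or null.
function parseCommand(transcript) {
  const text = transcript
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  for (const { pattern, build } of COMMANDS) {
    const match = text.match(pattern);
    if (match) return build(match);
  }
  return null;
}

console.log(parseCommand('Show camera 2.'));
console.log(parseCommand('set camera 1 ACTIVE'));
console.log(parseCommand('make me a sandwich'));
```

Keeping the grammar as a plain table means new commands, like the Tesseract and read-back ones, are just more rows.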
There’s a second part to all of this. Since Serinda was meant to be a personal assistant and pure awesome… the rest hasn’t even been discussed.
After the initial part there’s more:
Natural Language Processing: do I process through SyntaxNet, TensorFlow, NLTK, or what? Exactly how do I want to process the speech to get from what someone said to my commands? My goal is for this to work offline, so everything I do is restricted to non-net tools. No Google use here.
Next there are utilities like some sort of active gaze that allows you to read without working some sort of joystick or making a gesture.
There are gestures. Being able to use gestures to run commands is an integral part of this. Smart glasses have gesture controls on the side. I want something natural while you’re wearing this giant helmet from Spaceballs.
Finally, I’ve had a dream of a 3D AR IDE for coding, where I can edit while standing and not be tied to a computer. Where I can move through space like Minority Report and find code, then use gestures to select code, copy it, and paste it where I need to. Maybe audibly code, maybe air type, who knows. But I have a dream.
And while all of this comes to 1000 bucks for everything (that’s my hope), I’m going to release my work for free. I always release open source under the MIT license. I hope others find it useful.
This is just the first blog post of the project. As I port my original code and tests over, build the HMD and AR displays, and test stuff out, I’ll post vlogs and more blog posts here. I’ll also post links to the latest git repos, branches, and Docker files.
There’s a whole lot that I’ve not said about this project. This is just the introduction.