ᎣᏏᏲ. ᏙᎯᏧ? Hey, welcome back!
There has been so much going on with this project that I’ve been too excited to sit down and write a blog post about it. In my last update, I was having issues with the server getting bogged down, among other things. Since then, I’ve broken through those issues and made a lot of progress.
The primary and original goal for the first release is an HMD with a HUD on which I can read a PDF and navigate it with voice commands and gestures. That’s it!
There is so much more to do; however, this is all I want for the first release.
What is the status? The dual display works and can be adjusted by adding or subtracting a few pixels to match what looks best for you. Gestures work. OpenCV and BabylonJS work over WebRTC and in Python. I’ve written a PDF plugin that needs to be integrated. TTS and STT are complete and also need to be integrated. And yesterday, I switched from Flask to FastAPI for its concurrency support, as I believe that was the primary reason Flask slowed down.
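The intuition behind that switch can be sketched without FastAPI itself: when a single-threaded server calls blocking code (like a slow OpenCV operation) directly, every request waits in line, whereas offloading the blocking work to a thread pool — which is what FastAPI does for plain `def` routes — lets requests proceed concurrently. The function names below are illustrative stand-ins, not SERINDA’s actual code:

```python
import asyncio
import time

def slow_opencv_work() -> str:
    time.sleep(0.2)  # stand-in for a blocking OpenCV call
    return "frame"

async def serve_blocking() -> float:
    """Call the blocking function directly: each request stalls the loop."""
    start = time.monotonic()
    for _ in range(3):
        slow_opencv_work()
    return time.monotonic() - start

async def serve_threaded() -> float:
    """Offload to the default thread pool so the three calls overlap."""
    start = time.monotonic()
    loop = asyncio.get_running_loop()
    await asyncio.gather(*(loop.run_in_executor(None, slow_opencv_work)
                           for _ in range(3)))
    return time.monotonic() - start

blocking = asyncio.run(serve_blocking())   # roughly 3 x 0.2 s, sequential
threaded = asyncio.run(serve_threaded())   # roughly 0.2 s, concurrent
```

Three overlapping “requests” finish in about the time of one, which is the behavior I’m hoping FastAPI gives the video and command endpoints.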
I am so very close, after nearly 40 years of thought, to having a working prototype. What’s left to do for a V3.1 release? Testing FastAPI, integrating TTS and STT, removing the frontend WebRTC code, and integrating the PDF plugin. I think that’s it.
I’ll update my plugin framework in V3.2 and surprisingly, the issues for V3.3 are already complete, so I’ll move those into this V3.1 release and add items to V3.3 from the backlog.
For example, I will need to calibrate the dual cameras so I can reliably use some version of SLAM/VIS in this new “world” that’s being created. Calibrated cameras also mean more reliable gestures and better guesstimation on the application’s part.
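To show why calibration matters here: once a stereo calibration (e.g. OpenCV’s `cv2.stereoCalibrate`) has recovered the focal length in pixels and the baseline between the two cameras, depth falls out of pixel disparity via the pinhole relation Z = f · B / d — which is what makes gesture distance estimates trustworthy. The numbers below are made up for illustration, not values from my actual rig:

```python
def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Pinhole stereo depth: Z = f * B / d.

    f_px         -- focal length in pixels (from calibration)
    baseline_m   -- distance between the two camera centers, in meters
    disparity_px -- horizontal pixel shift of the same point between views
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px

# A hand seen 40 px apart between cameras, with f = 800 px and a 6 cm baseline:
z = depth_from_disparity(800.0, 0.06, 40.0)  # -> 1.2 meters
```

The same relation also explains the failure mode: an uncalibrated focal length or baseline scales every depth estimate by the same wrong factor, which is exactly the kind of unreliable guesstimation calibration fixes.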
I came across this article, and if you look through it you’ll see how much of the work is very similar to what I’ve done. Note: I’m not saying they copied me; I’m saying our approaches are very similar. The difference is that they’re using Unity. I’m going to go over those differences.
As you know, I’ve considered Unity a few times, even as a way to create intricate billboards. I’ve ultimately decided, for now, that BabylonJS is where I want to go. In the future it may be nice to use Unity for some of the graphics; I’m not sure. I’ve looked at Unity but never used it, and never dug in too deep. I know a lot of AR/XR/MR projects use Unity, and it’s very popular. If I could use it for display while still managing the SERINDA project’s work through a webapp, then I’ll see whether I can integrate it further.
The point, though, is that it’s nice to see other works parallel your own AND have novel and inspired approaches that you hadn’t thought about. It gives me something to think about for the future if I want to delve into that more.
It’ll be a few days before I can get back into this section of the project, but I hope to complete my refactoring of the WebRTC code and the Python backend by around the 15th, as I had first planned. We’ll see. I’m very excited about these next steps and how close I am to my first fully integrated, working prototype. I cannot wait to test it; it’s been a long road, and it’s well worth it.
Until next time. Dodadagohvi. ᏙᏓᏓᎪᎲᎢ.