I’ve spent the last two days (well, actually it’s been about a week) feeling quite overwhelmed. I’ve managed to do what I usually do: I look at the task list I create, I overwhelm myself, and then I can’t focus on the tasks. I used to think I was alone in this aspect of programming. I don’t think I am.
So, I need to project-manage myself and do what I would normally do. It would help if I had another member or two on this project to bounce ideas off of (a programmer’s rubber duck, sometimes), especially if someone had experience with a lot of these technologies I’m trying to catch up on. So let’s look at the list from this post.
- Migration from AngularJS to Angular 6 for my MEAN stack
- Migration to SQLite from MongoDB
- Decide how I want the AR aspects (Three.js, AR.js, AFrame, even Unity or Vuforia) to interact with the user and vice versa
- Leap Motion and LeapC (JavaScript – SDK 3) – Python
- TensorFlow
- NLP
- Tesseract vs PyTesseract vs TesseractJS**
- Porting all of my OpenCV code and examples to this project.****
** Tesseract, when tested, natively gives me a read on a specific image I test with. PyTesseract and TesseractJS do not.
**** All of these examples, whether they’re in Python, C++, or opencv4nodejs (99.9% Python), are geared towards running once, not chained filter use. So these will need to be slightly modified to have an entry point, filters that can be turned on and off, and the resultant feed or image sent to the browser (see the sketch below).
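To make the “entry point plus toggleable filters” idea concrete, here’s a minimal sketch in Python with OpenCV. The filter names and the chain structure are mine for illustration, not anything SERINDA already has:

```python
# Minimal sketch of a chainable filter pipeline (assumes OpenCV is installed).
# Filter names and the enabled list are hypothetical, for illustration only.
import cv2

# Each filter takes a frame and returns a frame, so they can be chained.
FILTERS = {
    "gray": lambda f: cv2.cvtColor(f, cv2.COLOR_BGR2GRAY),
    "blur": lambda f: cv2.GaussianBlur(f, (5, 5), 0),
    "canny": lambda f: cv2.Canny(f, 100, 200),
}

# Which filters are currently switched on, in the order they should run.
enabled = ["gray", "canny"]

def process(frame):
    """Single entry point: run every enabled filter over the frame in order."""
    for name in enabled:
        frame = FILTERS[name](frame)
    return frame

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)            # default webcam
    ok, frame = cap.read()
    if ok:
        result = process(frame)
        cv2.imwrite("filtered.png", result)  # stand-in for "send to the browser"
    cap.release()
```

Turning a filter on or off then becomes nothing more than adding or removing its name from the enabled list, which is exactly the kind of small change the existing run-once examples would need.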
I don’t think I’m alone as a developer/programmer in the things I do or the way I’m doing them. That is part of the reason I’m documenting this process. My brain runs so fast all of the time. In one moment I’m writing code that uses OpenCV to make my life easier and, in turn, releasing it to the world in hopes that someone might be able to use it to make their life easier. Then I watch the movie First Man and am reminded of visiting Cape Canaveral, where I met the guy who came up with the idea of painting the black lines on the rockets so they could figure out how fast they were going and spinning. That made me think of the barcodes, stickers, and cards used for Vuforia and AFrame, which I thought you could stick or put on furniture (with safe stickers, of course). Those stickers could then be used with an AR app or website to learn a new language: you point your app/camera at the code and it pops up the new-language word for that item and maybe lets you click to hear what it sounds like. Now, if you thought that was exhausting to read, that happens all day, every day, even while I sleep.
I said that to say this: there are a great many details in software that are givens for me, such as how I see the code in my head, or how some of it just makes sense, or how I might code subconsciously and not remember what I just worked on. One of my pitfalls is that I shoehorn myself into a predicament I shouldn’t have and then don’t look for a way out because I’ve already put blinders on. For example, I originally wanted this to work on a Raspberry Pi and worked for a long time to get features running on it, then looked to slightly more expensive hardware to run it. Eventually, I was doing more research on hardware than I was coding. Likewise, I knew I wanted this to be a MEAN application (minus the MongoDB, so more a SEAN with SQLite), but instead of asking “what can I do with Python that I’ve already got code for?” I asked “what OpenCV is out there for Node.js?” And I can do it all with Node and minimal Python installs on a PC – probably even a Raspberry Pi.
I think my first step should be taking the Angular 6 stack that I’ve already got working with the base SERINDA and getting streaming video to work from Python back to the browser. This is an intelligent personal assistant plus-plus; however, a lot of the functionality is in video and image processing – the camera(s) need to be able to do work passively and actively.
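For the Python-to-browser part, the pattern I have in mind is an MJPEG stream. This is a rough sketch assuming Flask on the Python side – that’s my assumption for the example, not necessarily what the final stack uses; any Python web framework that can send a multipart response would work the same way:

```python
# Rough sketch: push an OpenCV camera feed to the browser as an MJPEG stream.
# Flask is an assumption here; the route name is made up for illustration.
import cv2
from flask import Flask, Response

app = Flask(__name__)
camera = cv2.VideoCapture(0)

def frames():
    """Yield JPEG-encoded frames in multipart format, one after another."""
    while True:
        ok, frame = camera.read()
        if not ok:
            break
        ok, jpg = cv2.imencode(".jpg", frame)
        if not ok:
            continue
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + jpg.tobytes() + b"\r\n")

@app.route("/video_feed")
def video_feed():
    # The Angular side only needs an <img> tag pointed at /video_feed.
    return Response(frames(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

On the Angular side, an `<img>` whose `src` points at that endpoint is enough to render the live feed, which keeps the browser side dead simple while all the image processing stays in Python.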
Next, since the /say/ portion is working, for now I’ll leave that alone and focus on image processing. I can do image processing in opencv4nodejs – however, if I’m already making Python calls, I might as well think about having all of my image processing done in Python. So I’ll consider this as well, especially since most of the best libraries are already in Python or C++ and not JavaScript (yet).
SQLite swap from MongoDB. This should probably be done sooner rather than later. The biggest reason for this change is that I don’t want another live service running on the computer. The second biggest reason is that I don’t want to use MongoDB. I’ve heard good things; that doesn’t mean I want to use it. If I make the switch now, or at least set up a swappable configuration, then I can switch it out however I want. So, while this may seem like something that should be number one, it’s number three because I don’t want to do the work. I’m lazy, that’s the truth. I don’t want to get in and do the work on this because I think it’s going to suck and I’m not ready to embrace the suck. However, if I have video running first with image processing, then while I’m working on this I can be switching between another filter and swapping this out. Which, in all honesty, is probably like 8 lines of code.
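“Swappable configuration” here is really just hiding the data store behind one tiny class. A rough sketch, with table and field names made up for illustration:

```python
# Rough sketch of a swappable store: the rest of the app only ever talks to
# save()/all(), so swapping MongoDB in or out later is a config choice, not a
# rewrite. Table and file names are hypothetical.
import sqlite3

class SQLiteStore:
    def __init__(self, path="serinda.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
        )

    def save(self, body):
        self.conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
        self.conn.commit()

    def all(self):
        return self.conn.execute("SELECT id, body FROM notes").fetchall()

# A MongoDB-backed class with the same save()/all() methods could sit behind
# the same config flag, which is why the swap really is only a few lines.
store = SQLiteStore()
store.save("first note")
print(store.all())
```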
TensorFlow, Deep Learning – Natural Language Processing, Neural Networks, and Deep Learning are really in the same boat. I’m going to need neural networks for images, video, and language; all of it is intertwined. There are a bunch of other pieces here that will have to be created one at a time, such as image training catalogs to be able to take stills and use those to train on someone in real time, or to scrub through video, take stills where directed, and train on that person to find them in footage or in real time. That kind of thing. These are ideas, still just considerations to think about. I write all of the ideas down. Then when I start building I know what to accommodate for, even if it’s not realistic at this moment.
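The training-catalog idea, at its simplest, is just scrubbing through footage and saving stills into a per-person folder that a model can train on later. A rough sketch, where the folder layout and frame interval are assumptions of mine:

```python
# Rough sketch of an image training catalog: save a still every N frames of a
# video into catalog/<person>/ for later training. Paths are hypothetical.
import os
import cv2

def collect_stills(video_path, person, every_n=30, out_root="catalog"):
    out_dir = os.path.join(out_root, person)
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    i = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"{saved:05d}.png"), frame)
            saved += 1
        i += 1
    cap.release()
    return saved

# e.g. collect_stills("footage.mp4", "alice") -> catalog/alice/00000.png, ...
```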
Natural Language Processing is really for the speech-to-text side: knowing what the user says and turning it into something that is actionable by the computer.
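Long before any real NLP model is in place, the “make it actionable” step can be faked with simple pattern matching on whatever string the speech-to-text engine hands back. A minimal sketch, with the command patterns invented for illustration:

```python
# Minimal sketch: map a recognized utterance to a hypothetical intent name.
# The patterns and intent names here are placeholders, not SERINDA commands.
import re

INTENTS = [
    (re.compile(r"turn (on|off) the (\w+) filter"), "toggle_filter"),
    (re.compile(r"take a (picture|photo)"), "capture_still"),
]

def parse(utterance):
    """Return (intent_name, captured_groups) or (None, ()) if nothing matched."""
    text = utterance.lower().strip()
    for pattern, intent in INTENTS:
        m = pattern.search(text)
        if m:
            return intent, m.groups()
    return None, ()

print(parse("Turn on the canny filter"))   # ('toggle_filter', ('on', 'canny'))
print(parse("take a picture"))             # ('capture_still', ('picture',))
```

A rule-based pass like this is obviously not deep learning, but it gives the rest of the system something actionable to consume while the real NLP pieces get built.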
Tesseract is also important. I want to be able to use Tesseract’s text recognition to take a camera capture and get the text back so I can use it in one of my future AR works. However, it’s not at the top of the list of items I need yet. If I can get it to work easily in 1-2 days (so like <6 hours total) then I’ll take it. Otherwise I’ll save it for somewhere down the road, probably just before Leap Motion.
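Since the native tesseract binary gave me a good read where PyTesseract and TesseractJS didn’t (per the footnote above), one stopgap is to shell out to it directly on a captured frame. Treat this as a sketch, not the final plumbing:

```python
# Sketch: OCR a webcam frame by calling the stock tesseract CLI.
# "stdout" as the output base tells tesseract to print the text instead of
# writing a file. The temp path is a placeholder.
import subprocess
import cv2

def ocr_frame(frame, tmp_path="capture.png"):
    cv2.imwrite(tmp_path, frame)
    result = subprocess.run(
        ["tesseract", tmp_path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
if ok:
    print(ocr_frame(frame))
```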
AR, MR – Augmented Reality with Unity, Vuforia or web-based AR.js, AFrame, etc.
Leap Motion I’m saving for the very last. This is the hardest one to do because I’m going to have to use the big heavy laptop with the giant fan that’s 6 years old to do testing with. I want everything neat on the LattePanda… but no, alas, that’s not going to happen yet. So, for now, I’m saving this for the end. Maybe, by the time I get to this point, I’ll have an a-ha moment where I can take on me or some other lyric in the song that will help me resolve speed, and I can integrate gestures sooner rather than later.
In the meantime, that covers all of the items, so it becomes a practical matter of reading about the next item on the list. Fortunately, I know a lot about OpenCV, SQLite, and Tesseract, so I can skip those for the moment. I’ve done a lot of reading about Angular 6 (and 7); I’m not hands-on practiced with many aspects, but I can get around. So I’ll go through my many, many books on Deep Learning, NLP, and Unity and give those a read. I own 96 courses and books on OpenCV, Deep Learning, TensorFlow, Unity, Vuforia, Augmented Reality, Angular, and React Native, so I have a few to choose from.
After all of these are done, I’ll look into OpenFace, OpenPose, active gaze, lidar (for the next project), and another project that expands on this SEAN stack and will use Python and Node together on a group of Raspberry Pis.