SERINDA update

I had some early success with Docker. Then, and I have no idea what happened, it all went to crap on both my Windows 10 machine and my Mac.

So I fired up my base Debian 9 VM in Parallels, took the Docker files I’d written, and with very little work turned them into an install file. I ran that in the VM and I was done. I had everything I needed and was up and running.

I took the first project for opencv4nodejs that I liked and used it as my base. I could have used my cherokeedictionary app, but decided not to for now. I might later, but this works.

I chose this code because it’s the first example code I’ve ever seen that was truly written in an object-oriented style. Usually the developer throws some code together and is done. In this case it wasn’t needed, but I appreciate the effort.

Here is the article and source. This code provides streaming video to the page. I’d done this before, but it was clunky. As a matter of fact, there’s another tutorial out there that uses sockets and the results are plain shitty. The explanation is OK, but when you run it the result is so jittery it’s not worth the time to make it happen. This implementation is super smooth. Well done! There’s some face detection code in the video portion, as well as a second page that auto-loads an image and then detects the faces of the people in that image.

For my next step, I took some file upload code by opencv4nodejs author Vincent Mühler from his opencv-express sample code, which you can find here. Again, it’s well written and easy to follow. It’s a simple example of how to upload an image, then detect ORB, SURF, and SIFT features and find faces. I integrated this into the first code because a future step is to integrate the DNN work that Mühler talks about into this app.

Vincent’s articles are here: https://medium.com/@muehler.v

Alright… what else? Well, as I’d said in my previous article here, I did a lot of work on backend items for this project that were all experiments and tests for learning about different frameworks and technologies. That also means I wrote a lot of code, much of it in Python. I have OpenCV, NLTK, Tesseract, TextBlob, TensorFlow, PocketSphinx, and much, much more code for various applications, including interacting with a Raspberry Pi’s GPIO and various sensors for another project.

So, the first item I attempted to interact with was npm-tesseract-ocr. This one seems to be doing well, except that I’m having an issue with getting the image to the backend and decoding it. I shouldn’t be, but I am. Once I fix that, I’ll be fine. I also interacted on the client side with Tesseract.js. Now this was much more problematic. My sample image contains a grey background and slightly skewed text. Native Tesseract OCR converts it mostly reliably. Tesseract.js did not come close. To verify that Tesseract.js was working, I used their sample image and the text was correctly converted. So I’ll have to look at both of these and see what the limitations are. I always have the option of saving the image to the hard drive, calling Tesseract natively, and then sending the text back to the browser, which is pretty much what the node version does.
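The fallback mentioned above (save the image to disk, call Tesseract natively, return the text) could look roughly like the sketch below. This is not the project’s code: `buildTesseractArgs` and `ocrNative` are hypothetical helper names, and the sketch assumes the `tesseract` command-line tool is installed and on the PATH, and that the uploaded image arrives as raw bytes. The temp file name `upload.png` is also just a placeholder.

```typescript
import { execFile } from "child_process";
import { promises as fs } from "fs";
import * as os from "os";
import * as path from "path";
import { promisify } from "util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: build the argument list for the tesseract CLI.
// Passing "stdout" as the output base makes tesseract print the recognized
// text instead of writing it to a .txt file.
function buildTesseractArgs(imagePath: string, lang: string = "eng"): string[] {
  return [imagePath, "stdout", "-l", lang];
}

// Hypothetical helper: write the image bytes to a temp file, run the native
// tesseract binary on it, and return the recognized text.
async function ocrNative(image: Uint8Array, lang: string = "eng"): Promise<string> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "ocr-"));
  const imagePath = path.join(dir, "upload.png");
  try {
    await fs.writeFile(imagePath, image);
    const { stdout } = await execFileAsync("tesseract", buildTesseractArgs(imagePath, lang));
    return stdout.trim();
  } finally {
    // Clean up the temp directory whether or not OCR succeeded.
    await fs.rm(dir, { recursive: true, force: true });
  }
}
```

A route handler would call `ocrNative` with the uploaded bytes and send the returned string back to the browser. Using `execFile` rather than building a shell command string keeps the file path from being interpreted by a shell.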

I’m going to fix this one upload issue with multer because uploads will be a big deal, as will screenshots from the camera.

Also, as I go, if I find useful code or hit an error somewhere, I like to document that in the code. So if you see a comment with a link to Stack Overflow or to some code, then I used it for something. I try to document why, because I like to know later on. My code isn’t always pretty, but I like to use it as a reference for future work.

After I fix this one issue with the upload and display for the two Tesseract options and allow the user to switch between them, I’ll bring in a lot of the test code and convert it over to Node.js. I’ll eventually write some better architecture for the extensible portions, since the base is well done already.

What I mean by that is, I have an idea for the Command architecture: giving a command will return some information in a JSON response that the page will interpret and use to manipulate the GUI (the browser). I may need to use certificates, as one article did, but we’ll see.
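To make the idea concrete, here is a minimal sketch of one way a command could turn into a JSON response that the page then acts on. Everything in it is an assumption for illustration: the `CommandResponse` shapes, the `CommandRegistry` class, the `Gui` interface, and the `say` and `open` commands are hypothetical names, not a design the project has settled on.

```typescript
// Hypothetical JSON shapes the server could return; the field names are assumptions.
type CommandResponse =
  | { action: "showText"; target: string; text: string }
  | { action: "navigate"; url: string }
  | { action: "error"; message: string };

type CommandHandler = (args: string[]) => CommandResponse;

// Server side: maps a command name to the handler that builds the response.
class CommandRegistry {
  private handlers = new Map<string, CommandHandler>();

  register(name: string, handler: CommandHandler): void {
    this.handlers.set(name.toLowerCase(), handler);
  }

  // Split the input into a command name and its arguments, then dispatch.
  execute(input: string): CommandResponse {
    const parts = input.trim().split(/\s+/);
    const name = parts[0] ?? "";
    const handler = this.handlers.get(name.toLowerCase());
    if (!handler) {
      return { action: "error", message: `Unknown command: ${name}` };
    }
    return handler(parts.slice(1));
  }
}

// Browser side: a thin interface standing in for real DOM calls.
interface Gui {
  setText(selector: string, text: string): void;
  goTo(url: string): void;
}

// Interpret a response and apply it to the GUI.
function applyResponse(res: CommandResponse, gui: Gui): void {
  switch (res.action) {
    case "showText": gui.setText(res.target, res.text); break;
    case "navigate": gui.goTo(res.url); break;
    case "error": gui.setText("#status", res.message); break;
  }
}

const registry = new CommandRegistry();
registry.register("say", (args) => ({ action: "showText", target: "#output", text: args.join(" ") }));
registry.register("open", (args) => ({ action: "navigate", url: args[0] ?? "/" }));

// The string that would travel to the browser for the command "say hello world".
const wire = JSON.stringify(registry.execute("say hello world"));
```

The point of the tagged `action` field is that the page only needs one small interpreter, and adding a new command on the server does not require new client code unless it introduces a new kind of action.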

I have a lot of code and a lot of thoughts. The first step is getting those ideas to work in this new architecture, then architecting the commands, NLP, etc.

My next article will be on the overall architecture: what I plan to use for hardware initially, the goal for hardware, and more.

 
