Possible methods of solving and understanding this captcha - python

Keep in mind that this is a challenge, I am not expecting a full answer, just a bit of help.
On the archived image you can see a complex captcha that I am attempting to solve with code.
The captcha has one column with a # number and another column containing a random string.
The task is to identify which string row contains text in a different color; once you have identified it, you input the # of that row.
In the image below, a small portion of the string in row #4 is colored differently, so #4 is the correct answer.
How could I solve this programmatically?
I thought of using AI but I think that is overkill, I should be able to do this with OpenCV.
Note: The background is also a random image, it can be anything.
Fortunately, the foreground is very noticeable and clear, which makes it easy to ignore the background in code.
I am mostly researching possible methods of solving this captcha. I think I'm on the right path with OpenCV, but I'm not sure which of its functions I should use for this.
The challenge requires the app to return an integer: the # of the 'messed up' string row.
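Here is a rough sketch of the kind of approach I have in mind (the file name, the HSV thresholds, and the assumption that rows can be separated by a horizontal projection are all my own placeholders, not part of the challenge):
import cv2
import numpy as np

img = cv2.imread("captcha.png")  # hypothetical input file
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Isolate the foreground text from the random background, e.g. with
# saturation/value thresholds; the exact ranges depend on how the text is drawn.
normal = cv2.inRange(hsv, (0, 0, 200), (180, 60, 255))    # bright, unsaturated "normal" text
colored = cv2.inRange(hsv, (0, 80, 80), (180, 255, 255))  # saturated, "odd-coloured" text
# Note: the random background will also contain saturated pixels, so in practice
# `colored` would need to be restricted to areas near the text (e.g. by
# intersecting it with a dilated copy of `normal`).

# Split the image into text rows by finding horizontal bands that contain text pixels.
row_profile = (normal | colored).sum(axis=1)
in_text = row_profile > 0
bands, start = [], None
for y, flag in enumerate(in_text):
    if flag and start is None:
        start = y
    elif not flag and start is not None:
        bands.append((start, y))
        start = None
if start is not None:
    bands.append((start, len(in_text)))

# The band with the most "odd colour" pixels is the answer (1-based row number).
scores = [colored[y0:y1].sum() for y0, y1 in bands]
print(int(np.argmax(scores)) + 1)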

Related

How to find an exact match of an image in hashed data with openCV

For my school project, I need to find images in a large dataset. I'm working with Python and OpenCV. So far I've managed to find an exact match of an image in the dataset, but it takes a long time even though I only had 20 images for the test code. So I searched a few pages of Google and tried the code on these pages:
image hashing
building an image hashing search engine
feature matching
Also, I've been thinking of searching through the hashed dataset, saving the matching paths, and then finding the best feature-matching image among them. But most of the time, the narrowed-down working set is very different from my query image.
The image hashing is really great and looks like what I need, but there is a problem: I need to find an exact match, not similar photos. So if you have any suggestions, or a piece of code that might help or improve the reference code I've linked, can you share it with me? I'd be really happy to try or research whatever you send or suggest.
OpenCV is probably the wrong tool for this. Its algorithms are geared towards finding similar matches, not exact ones. The general idea is to use machine learning to teach the code what a car looks like so it can detect cars in videos, even when the color or form changes (driving in the shadow, a different make, etc.).
I've found two approaches work well when trying to build an image database.
Use a normal hash algorithm like SHA-256 plus maybe some metadata (file or image size) to find matches
Resize the image down to 4x4 or even 2x2. Use the pixel RGB values as "hash".
The first approach reduces the image to a number. You can then put that number in a lookup table. When searching for an image, apply the same hashing algorithm to the image you're looking for and use the resulting number to look it up in the table. If it's there, you have a match.
Note: In all cases, hashing can produce the same number for different pictures. So you have to compare all the pixels of two pictures to make sure it's really an exact match. That's why it sometimes helps to add information like the picture size (in pixels, not file size in bytes).
The second approach lets you find pictures which look the same to the eye but are in fact slightly different. Imagine cropping off a single pixel column on the left or tilting the image by 0.01°. To you, the images will look the same, but to a computer they will be totally different. The second approach tries to average out such small changes. The cost is that you will get more collisions, especially for B&W pictures.
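A minimal sketch of both ideas in Python (the file names are hypothetical, and the thumbnail size is just a starting point):
import hashlib
import cv2

def exact_key(path):
    # Approach 1: SHA-256 of the file bytes plus the image dimensions.
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    h, w = cv2.imread(path).shape[:2]
    return (digest, w, h)

def thumb_key(path, size=4):
    # Approach 2: a tiny size x size thumbnail; the raw pixel values act as the "hash".
    img = cv2.imread(path)
    small = cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA)
    return small.tobytes()

# Build lookup tables over the dataset.
dataset = ["img_001.png", "img_002.png"]   # hypothetical file names
exact_index = {exact_key(p): p for p in dataset}
thumb_index = {thumb_key(p): p for p in dataset}

query = "query.png"                        # hypothetical query image
match = exact_index.get(exact_key(query))  # None if there is no exact match
if match is None:
    # May produce collisions, so verify pixel-by-pixel before trusting it.
    match = thumb_index.get(thumb_key(query))
print(match)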
Finding exact image matches using hash functions can be done with the undouble library (disclaimer: I am also the author). It works in a multi-step process of pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and grouping the images based on a threshold value.

Opencv remove ghosting distortion

I have a problem when handling images taken with cell phones.
Image sample:
I get ghosting, especially in the question number area.
I think the reason is a slight shake when pressing the shutter.
Is there any way to remove the ghosting so that the question number area looks clearer?
There is another worse one:
I found some image denoising functions like cv2.fastNlMeansDenoisingColored(), and it does work well on some images.
Unfortunately, it doesn't work for the two images above.
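For reference, the typical call looks roughly like this (the file name and parameter values below are placeholders, not the exact ones I used):
import cv2

img = cv2.imread("photo.jpg")  # hypothetical input file
# Arguments: src, dst, h, hColor, templateWindowSize, searchWindowSize.
# Larger h/hColor values remove more noise but also smear detail, which is
# why this kind of denoising may not help with motion ghosting.
denoised = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)
cv2.imwrite("denoised.jpg", denoised)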
Env: Python 3.6.5, OpenCV 3.4.0
Thanks.
Wesley

Draw an image into another one in imagemagick?

I would like to draw an image into another image with Wand (an ImageMagick binding for Python). The source image should totally replace the destination image (at given position).
I can use:
destinationImage.composite_channel(channel='all_channels', image=sourceImage, operator='replace', left=leftPosition, top=topPosition)
But I was wondering if there is a simple or faster solution.
Not really. In the scope of wand, this would be one of the fastest methods. For simplicity, you're already doing everything in one line of code. Perhaps you can shorten this with Image.composite.
destinationImage.composite(sourceImage, leftPosition, topPosition)
But you're now compromising the readability of your current solution. Having the full command with the channel='all_channels' and operator='replace' kwargs will help you in the long run. Think about revisiting the code in a year.
destinationImage.composite(sourceImage, leftPosition, topPosition)
# versus
destinationImage.composite_channel(channel='all_channels',
                                   image=sourceImage,
                                   operator='replace',
                                   left=leftPosition,
                                   top=topPosition)
Right away, without hitting the API docs, you know the second option is replacing destination with a source image across all channels. Those facts are hidden, or assumed, in the first variation.

Python Pygtk making colored tags in Text View dynamically

I have seen some ways of making colored text in a TextView in Python PyGTK. The issue seems to be that they just print all the text in that colour, or make the whole line that colour, rather than colouring only certain items.
I want it so that when I type "" it is coloured blue, or if there is a "string" in the text view it is orange,
and if there is an '#comment' then it is italicized and grey.
Not sure if it helps, but I have a part where, as I am typing, it writes the text to a page. Is it possible to keep this kind of syntax colouring active?
I hope this makes sense.
any help is much appreciated! Thank you!
Use GtkSourceView for syntax highlighting. Don't reinvent the wheel.
In general, what you are looking for, I'd say, is to use regular expressions (the re module; there are plenty of questions on this here, probably some for the exact patterns you need) to find the patterns you mention in your TextBuffer. That means you need to connect a signal to the buffer so you can see what the user types. Then you'll need a set of TextTags (one tag per formatting rule/pattern) to apply to the regions of the buffer where the regular expressions match the patterns you've described. Finally, you apply the tags to the buffer; those TextTags can reformat the text display in the TextView in a range of ways (as the documentation says).
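A minimal sketch of that idea, assuming PyGTK 2 and hypothetical tag names and patterns (adapt the regular expressions to the exact keywords you want):
import re
import gtk
import pango

textview = gtk.TextView()
buf = textview.get_buffer()

# One TextTag per formatting rule.
buf.create_tag("string", foreground="orange")
buf.create_tag("comment", foreground="grey", style=pango.STYLE_ITALIC)

PATTERNS = [
    ("string", re.compile(r'"[^"\n]*"')),
    ("comment", re.compile(r"#[^\n]*")),
]

def rehighlight(buffer):
    # Clear old tags, then re-apply them wherever the patterns match.
    start, end = buffer.get_bounds()
    buffer.remove_all_tags(start, end)
    text = buffer.get_text(start, end)
    for tag_name, pattern in PATTERNS:
        for m in pattern.finditer(text):
            s = buffer.get_iter_at_offset(m.start())
            e = buffer.get_iter_at_offset(m.end())
            buffer.apply_tag_by_name(tag_name, s, e)

# Re-run the highlighting whenever the user types.
buf.connect("changed", rehighlight)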
Without any supplied code, it's hard to be precise on where you might be having a problem.
Hope it points you in the right direction...
Mind, though, that if you override the GTK theme, another user could have a theme with, e.g., an orange background in their TextViews, so you should be careful to make sure it works visually regardless of which theme is in use.

Designing an open source OCR engine specifically for rendered text (screenshots)

So my current personal project is to be able to automatically grab screenshots out of a game, OCR the text, and count the number of occurrences of given words.
Having spent all evening looking around at different OCR solutions, I've come to realize that the majority of OCR packages out there are designed for scanned text. If there are any packages that can read screen text reliably, they're well outside this hobbyist's budget.
I've been reading through some other questions, and the closest I found was OCR engines designed for screen-reading.
It seems to me that reading rendered text should be much easier than printed and scanned text. Lines are always straight, and any given letter will always appear with the exact same pixel representation (mostly, anyways). Also, why not use the actual font file (if you have it) as a cheat sheet to recognizing characters? We might actually reach 100% accuracy with a system like this.
Assuming you have the font file for a cheat sheet and your source image is perfectly square and has no noise, how would you go about recognizing characters from the screen?
(Problems I can foresee are UI lines and images that could confuse any crude attempt at pixel-guessing.)
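Something along these lines is what I imagine for the cheat-sheet idea (the font path, size, character set, and fixed canvas size are all assumptions; trimming each glyph tightly would match more cleanly):
import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont

font = ImageFont.truetype("game_font.ttf", 16)   # hypothetical font file and size
templates = {}
for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789":
    # Render each glyph black-on-white onto a fixed-size canvas.
    canvas = Image.new("L", (24, 24), 255)
    ImageDraw.Draw(canvas).text((0, 0), ch, font=font, fill=0)
    templates[ch] = np.array(canvas)

screen = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
for ch, tpl in templates.items():
    # Near-exact matching only, since rendered text should be pixel-identical.
    res = cv2.matchTemplate(screen, tpl, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(res > 0.95)
    for x, y in zip(xs, ys):
        print("found %r at (%d, %d)" % (ch, x, y))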
If you already know of a free/open-source OCR package designed for screen-reading, please let me know. I kind of doubt that's going to show up though, as no other askers seem to have gotten a lead either.
A Python interface is preferred, but beggars can't be choosers.
EDIT:
To clarify, I'm looking for design suggestions for an OCR solution that is specifically designed to read text from screenshots. Popular tools like tesseract (mentioned in the question I linked) are hard to use at best because they are not designed for this kind of source file.
So I've been thinking about it and I feel that the best approach will be to count the number of pixels in each blob/glyph/character. This should really cut down on the number of tests I need to do to differentiate between glyphs.
Regretfully, I'll have to be very specific about fonts. The software will only be able to recognize fonts at the right DPI, with the right font face and weight, etc.
It isn't ideal, and I'd still like to see someone who knows more about this stuff design OCR for rendered text; but it will work for my limited case.
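A rough sketch of the pixel-counting idea, assuming a clean screenshot where the text is darker than the background (the threshold direction and the file name are assumptions):
import cv2

img = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Each connected component is (roughly) one glyph; its pixel area becomes the
# "fingerprint" to compare against counts precomputed from the font's glyphs.
n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
for i in range(1, n):  # label 0 is the background
    x, y, w, h, area = stats[i]
    print("glyph at (%d, %d): %d pixels" % (x, y, area))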
If your goal is to count occurrences of certain events in a game, OCR is really not the right way to go about it. That said, if you are determined to use OCR, then tesseract-OCR is a well-known open-source package for performing optical character recognition. I'm not really sure what you are getting at with respect to scanned vs. rendered text, but tesseract will probably do as good a job as any open-source package that is available. OCR is still a tricky art, so I wouldn't expect 100% accuracy.
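If you do go the tesseract route, the pytesseract wrapper is the usual entry point (it assumes the tesseract binary is installed separately; the file name and the word counted below are just examples):
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("screenshot.png"))
words = text.lower().split()
print(words.count("headshot"))   # hypothetical word to count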
This isn't exactly what you want, but you may want to look at Sikuli.
