Captcha Image with Python Mechanize - python

My question is very similar to this one.
The difference is that the captcha is generated by the server at request time with some kind of authentication (which must come from a JavaScript request) or something similar. I know this because if I follow the src attribute of the captcha image, the server gives me a big 404 right in my face.
If I open the page with mechanize, it will probably "see" this image (which would mean the server has already generated it). Since I can't generate the captcha manually, I'll have to look for the image inside what mechanize gives me.
So, if I could get the binary response of what mechanize is "seeing", I could decode the image and rebuild it with base64, right? (I don't know if that's right; I saw a PDF-to-HTML converter convert an image to a base64 representation.)
Is there such a way? Or a similar approach?
Thank you VERY much in advance!
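On the base64 idea: base64 is only a text encoding of bytes, so it can carry an image inside HTML but does not help with fetching it. A minimal sketch, assuming the raw image bytes have already been obtained from a response (the function names and the default MIME type are illustrative, not part of mechanize):

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data: URI that an <img> tag can embed."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def from_base64(text: str) -> bytes:
    """Decode a base64 string back into the original bytes."""
    return base64.b64decode(text)
```

Decoding the base64 text gives back exactly the bytes that were encoded, so the image can be written to disk unchanged afterwards.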

Related

Empty image when scraping using urllib in Python

import urllib.request

url = 'https://cdn.discordapp.com/avatars/305196810048110603/f31411d41b42b65a0b6eca686dd67b08.webp?size=1024.jpg'
# Fake a browser User-Agent, since Discord doesn't like scripted downloads.
request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
with urllib.request.urlopen(request) as response, open('abc.jpg', 'wb') as pic:
    pic.write(response.read())
So, I've browsed some questions on here, and this is my current code, which fakes a browser since Discord doesn't like people downloading avatar pics.
The problem I'm having is that the image seems to be blank, which suggests that there is something wrong with my usage of urllib. I would appreciate any help I can get.
Try opening the image in Chrome; it works fine there. The file is not actually a JPEG: its header reads RIFF ... WEBPVP8, which means it is a WebP image (WebP data is stored in a RIFF container). I haven't encountered that before.
Here is a page explaining how to convert WebP files to JPG/PNG:
http://www.freewaregenius.com/convert-webp-image-format-jpg-png-format/
I know there is a python-imagemagick library, which might be useful for automating the conversion in code.
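A quick way to confirm what was actually downloaded is to look at the first bytes of the file rather than its extension. The sketch below checks the well-known signatures for WebP, JPEG and PNG; the function name is made up for illustration.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess an image format from its leading magic bytes."""
    # WebP: "RIFF", a 4-byte size field, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # JPEG files start with the SOI marker FF D8 FF.
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    # PNG files start with a fixed 8-byte signature.
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    return "unknown"
```

Calling it on the bytes returned by read() before writing them out would show that the file saved as abc.jpg is WebP, which explains why a JPEG viewer renders it blank.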

Image Uploading Script in Python

I was trying to make an image uploading script for Postimage.org.
I tried searching for an API, but there doesn't seem to be one available. Can anyone help me make this script? I don't have any idea how to do the uploading process. I think one thing I should do is open the image file in binary read mode ("rb").
Anyway, I am waiting for your suggestions and ideas.
Firstly you should think about what happens when you press the upload button on the website. What your script could do is mimic this functionality, because essentially all it's triggering is a POST request to the web server with the specified information in the form and the image file data. You can initiate HTTP requests (e.g. GET, POST, etc.) using a library such as Requests (http://docs.python-requests.org/en/latest/index.html).
However, as this seems to have been discussed before, I will instead point you in the right direction: Send file using POST from a Python script
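To make the "POST request with the form fields and the image file data" concrete, here is a minimal standard-library sketch of the multipart/form-data body a browser would send. The field names and the helper name are hypothetical; the real ones have to be read from the site's upload form. With Requests, passing the open file through the files= argument of requests.post builds an equivalent body for you.

```python
import uuid


def build_multipart(fields, file_field, filename, file_bytes,
                    content_type="application/octet-stream"):
    """Return (body, content_type_header) for a multipart/form-data POST."""
    boundary = uuid.uuid4().hex  # random separator between the parts
    lines = []
    # Ordinary form fields, one part each.
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(str(value).encode())
    # The file part carries a filename and its own Content-Type.
    lines.append(f"--{boundary}".encode())
    lines.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode()
    )
    lines.append(f"Content-Type: {content_type}".encode())
    lines.append(b"")
    lines.append(file_bytes)  # raw bytes, e.g. read from open(path, "rb")
    # Closing boundary, followed by a final CRLF.
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    body = b"\r\n".join(lines)
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned header value goes into the request's Content-Type header and the body into the POST data.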

Python/Mechanize: Preventing images from being downloaded/opened

When I open a page, I'm assuming all images are also downloaded with Mechanize.
Is there a way for me to just retrieve the source code?
If mechanize doesn't allow for this, is there an alternative that does?

Grabbing a .jsp generated PNG in Python

I am trying to grab a PNG image which is being dynamically generated with JSP in a web service.
I have tried visiting the web page it is contained in and grabbing the image src attribute; but the link leads to a .jsp file. Reading the response with urllib2 just shows a lot of gibberish.
I also need to do this while logged into the web service in question, using mechanize. This seems to exclude the option of grabbing a screenshot with webkit2png or similar.
Thanks for any suggestions.
If you use urllib correctly (for example, making sure your User-Agent resembles a browser etc), the "gibberish" you get back is the actual file, so you just need to write it out to disk (open the file with "wb" for writing in binary mode) and re-read it with some image-manipulation library if you need to play with it. Or you can use urlretrieve to save it directly on the filesystem.
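As a small sketch of the "write it out in binary mode" step, assuming the response body has already been read into a bytes object (the helper name is illustrative):

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_image(data: bytes, path: str) -> bool:
    """Write the raw bytes to disk; report whether they look like a PNG."""
    # "wb" keeps the bytes untouched; text mode would corrupt the image.
    with open(path, "wb") as fh:
        fh.write(data)
    return data.startswith(PNG_SIGNATURE)
```

A False return value is a hint that the server sent something other than the image, for example an HTML login or error page.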
If that's a JSP, chances are that it takes parameters, which the browser might append via JavaScript before the request is made; you should look at the real request your browser makes before trying to reproduce it. You can do that with the Chrome Developer Tools, Firefox LiveHTTPHeaders, etc.
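Once the real request is known, reproducing it mostly means sending the same query parameters and headers. A minimal sketch with the standard library; the URL and parameter names used with it are placeholders, to be replaced with whatever the browser tools show:

```python
import urllib.parse
import urllib.request


def build_image_request(base_url, params, user_agent="Mozilla/5.0"):
    """Build (but do not send) a GET request carrying the observed parameters."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url}?{query}" if query else base_url
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

The resulting object can be passed to urllib.request.urlopen, or the same URL and headers can be used with a logged-in mechanize browser so that the session cookies are sent along.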
I do hope you're not trying to break a captcha.

Is there a way to save a captcha image and view it later in python?

I am scripting in Python for some web automation. I know I cannot automate captchas, but here is what I want to do:
I want to automate everything I can up to the captcha. When I open the page (using urllib2) and parse it to find that it contains a captcha, I want to open the captcha using Tkinter. Now, I know that I will have to save the image to my hard drive first and then open it, but there is an issue before that. The captcha image that is on screen is not directly in the source anywhere. There is a variable in the source, inside some JavaScript, that points to another page that has the link to the image, BUT if you load that intermediate page, the captcha picture for that link changes, so the image associated with that JavaScript variable is no longer valid. It may be impossible to get the image using this method, so please enlighten me if you have any ideas on this.
Now, if I use Firebug to load the page, there is a "GET" that is a direct link to the current captcha image that I am seeing, and I'm wondering if there is any way to make Python or urllib2 see the "GET"s that are going on when a page is loaded, because if that were possible, this would be simple.
Please let me know if you have any suggestions.
Of course the captcha's served by a page which will serve a new one each time (if it was repeated, then once it was solved for one fake userid, a spammer could automatically make a million!). I think you need some "screenshot" functionality to capture the image you want -- there is no cross-platform way to invoke such functionality, but each platform (or desktop manager in the case of Linux, BSD, etc) tends to have one. Or, you could automate the browser (e.g. via SeleniumRC) to "screenshot" (e.g. "print to PDF") things at the right time. (I believe what you're seeing in firebug may be misleading you because it is "showing a snapshot"... just at the html source or DOM level rather than at a screen/bitmap level).
