Empty image when scraping using urllib in Python


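import urllib.request

url = 'https://cdn.discordapp.com/avatars/305196810048110603/f31411d41b42b65a0b6eca686dd67b08.webp?size=1024.jpg'
# Send a browser-like User-Agent with the request
request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
# The with statement closes the response and the file automatically
with urllib.request.urlopen(request) as response, open('abc.jpg', 'wb') as pic:
    pic.write(response.read())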
I've browsed some questions on here, and this is my current code. It fakes a browser, since Discord doesn't like people downloading avatar pictures.
The problem is that the saved image appears to be blank, so I assume something is wrong with my usage of urllib. I would appreciate any help I can get.

Try opening the image in Chrome; it works fine there. The file is not actually a JPEG but a WebP image (its header reads RIFF ... WEBPVP8), a format I hadn't encountered before.
Here is a page explaining how to convert WebP files to JPG/PNG:
http://www.freewaregenius.com/convert-webp-image-format-jpg-png-format/
I know there is a python-imagemagick library that might be useful for automating the conversion in code.
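A quick way to confirm what was actually downloaded is to look at the first bytes of the file rather than at its extension. The following is a minimal sketch, not a definitive check: the helper name `sniff_image_format` and the sample bytes are made up for illustration, and it only recognizes the WebP, JPEG, and PNG signatures.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess an image format from its leading magic bytes."""
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # JPEG files start with the bytes FF D8 FF.
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    # PNG files start with a fixed 8-byte signature.
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    return "unknown"


# Illustrative header bytes only, not a complete image.
sample = b"RIFF\x24\x00\x00\x00WEBPVP8 "
print(sniff_image_format(sample))
```

Passing the first 12 bytes of abc.jpg to this function would show whether the download succeeded and only the file extension is misleading. If so, an image library with WebP support (Pillow, for instance, in builds that include it) should be able to open the file and save it as JPEG or PNG.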

Related

Python weasyprint convert page to pdf problem!【from china】

I am trying to convert cnn.com pages to PDF with WeasyPrint. It works, but not cleanly: the PDF headers are always covered by a black block, which is annoying because it hides the content. Does anybody know how to remove it, for example by defining a CSS stylesheet? I would sincerely appreciate it!
You can reproduce the problem with any article from cnn.com.
Alternatively, can you recommend a better conversion tool? I tried pdfkit, but it cannot capture the full page: the 'read more' button is always displayed, even when a User-Agent has been added to the HTTP headers.
These tools behave differently across websites, which is weird.

import weasyprint
url = 'https://edition.cnn.com/2021/07/23/tech/taiwan-china-cybersecurity-intl-hnk/index.html'
weasyprint.HTML(url).write_pdf('1.pdf')
That is my code.

Trouble submitting image URL from Reddit using requests.get()

I am trying to submit an image URL from Reddit.com to a vision API using requests.get() in Python but I am running into difficulties in what could be a simple error on my part. The requests.get() request is successful when the link points to an explicit *.jpg, e.g., https://upload.wikimedia.org/wikipedia/commons/thumb/2/2b/Beef_fillet_steak_with_mushrooms.jpg/800px-Beef_fillet_steak_with_mushrooms.jpg, but unsuccessful when the link points to what I perceive to be a soft link, e.g., https://preview.redd.it/9xu97c5snpr51.jpg?width=640&crop=smart&auto=webp&s=e68c02166f6fd21a47a957b187b98b92608f54a9. Note that when pasted into a browser, both links work fine.
Does anyone have a suggestion for how I might preprocess the second link so it is handled like the first link? I would like to eventually have this code run remotely, so avoiding having to download the file locally is preferred.

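From the Requests quickstart documentation (https://requests.readthedocs.io/en/master/user/quickstart/):
You can access the response body as bytes for non-text requests, so the image can be opened in memory without saving it locally:
import requests
from io import BytesIO
from PIL import Image
# url is the image link from the question
r = requests.get(url)
i = Image.open(BytesIO(r.content))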

How can I get someone's profile pic on Discord to edit it using PIL?

I'm trying to write some Python code to edit someone's profile picture, but all I've got so far is this:
image = ctx.message.author.avatar_url
background = Image.open(image)
Apparently that just gets the URL, but I need the image itself in order to edit it with PIL. Any insight on how to get it?

import requests
# Download the avatar bytes from the URL
with requests.get(ctx.message.author.avatar_url) as r:
    img_data = r.content
# Save the bytes to a local file
with open('image_name.jpg', 'wb') as handler:
    handler.write(img_data)
I played around with this link a bit:
https://cdn.discordapp.com/avatars/190434822328418305/6a56d4edf2a82409ffc8253f3afda455.png
I was able to save my own avatar image (the one I use for my accounts everywhere), and then open the file normally with the image viewer in PyCharm.
After that, it is simply a case of opening the saved file with PIL or Pillow instead of trying to open something directly from a website, if that makes sense.
Keep in mind that this saves a file onto the server running your Discord bot, so the approach is quite crude: a malformed or maliciously crafted image file could lead to some sort of remote vulnerability.
Further to your comment, if you want the downloaded image to be larger, for example, use the amended link below:
https://cdn.discordapp.com/avatars/190434822328418305/6a56d4edf2a82409ffc8253f3afda455.png?size=<Number from list [16,32,64,128,256,512,1024,2048]>
Hope this helps :)
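The size parameter described above can be set programmatically. This is a rough sketch using only the standard library; the helper name `avatar_url_with_size` is hypothetical, and the list of allowed sizes is simply the one quoted in this answer rather than something checked against Discord.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Sizes listed in the answer above (assumed to be the accepted values).
ALLOWED_SIZES = (16, 32, 64, 128, 256, 512, 1024, 2048)


def avatar_url_with_size(url: str, size: int) -> str:
    """Return url with its query string replaced by ?size=<size>."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {ALLOWED_SIZES}, got {size}")
    parts = urlsplit(url)
    # Drop any existing query and fragment, then set the size parameter.
    query = urlencode({"size": size})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(avatar_url_with_size(
    "https://cdn.discordapp.com/avatars/190434822328418305/"
    "6a56d4edf2a82409ffc8253f3afda455.png",
    256,
))
```

The resulting URL can be passed to requests.get() in the same way as the original avatar URL.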

Captcha Image with Python Mechanize

My question is very similar to this one.
The difference is that the captcha is generated by the server at request time with some kind of authentication (which must come from a JavaScript request) or something similar. I know this because if I follow the src attribute of the captcha image, the server gives me a big 404 right in my face.
If I open the page with mechanize, it will probably "see" this image (which means the server has already generated it). Since I can't generate the captcha manually, I'll have to look for the image inside what mechanize gives me.
So, if I could get a binary response of what mechanize is "seeing", I could decode the image and rebuild it with base64 (I don't know if that's right; I saw a PDF-to-HTML converter turn an image into a base64 representation), right?
Is there such a way, or a similar approach?
Thank you very much in advance!
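The base64 idea in the question can be sketched independently of mechanize: once the raw image bytes are available, the standard library can turn them into a base64 string (here a data URI) and back again. This is only an illustration. The helper names are made up, `image_bytes` is placeholder data, and how to obtain the real bytes from mechanize is not shown.

```python
import base64


def to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def from_data_uri(uri: str) -> bytes:
    """Decode a base64 data URI back into the original bytes."""
    header, _, payload = uri.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(payload)


# Placeholder bytes (just the PNG signature), not a real captcha image.
image_bytes = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(image_bytes)
print(uri)
# The round trip returns exactly the bytes that went in.
print(from_data_uri(uri) == image_bytes)
```

Base64 is only a text encoding of the same bytes, so it does not help with fetching the image; it only matters once the binary response is already in hand.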

Windows Python How to download dropbox popup from a website?

The issue is that the website I want to download a file from doesn't host the actual file; it uses Dropbox to host it. As soon as you click download, you're redirected to a blank page where Dropbox pops up in a small window that lets you download the file. Note that there is no login, so I can point Python directly at the link where Dropbox pops up, but it won't download the file.
import urllib.request
url = 'https://thewebsitedownload.com'
filename = 'filetobedownloaded.exe'
urllib.request.urlretrieve(url, filename)
That's the code I used to use, and it worked like a charm for direct downloads. Now, when I try it on the site with the Dropbox popup download, it just downloads the HTML of the page (from what I can tell) and not the actual file.
I am still relatively new to Python and coding in general, but I am loving it so far. This is just the first brick wall I have hit for which I couldn't find any similar solutions.
Thanks in advance! Sample code helps a lot; that's how I have been learning so far.

Use BeautifulSoup to parse the HTML you get. You can then extract the href link to the file. There are a lot of BeautifulSoup tutorials on the web, so I think you'll find it fairly easy to work out how to get the link in your specific situation.
First, download the HTML with the code you already have, but without the filename:
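import re
import urllib.request
from bs4 import BeautifulSoup
url = 'https://thewebsitedownload.com'
# Read the page HTML into memory instead of saving it to a file
text = urllib.request.urlopen(url).read()
soup = BeautifulSoup(text, 'html.parser')
# Take the first element whose href mentions dropbox
link = soup.find_all(href=re.compile("dropbox"))[0]['href']
print(link)
filename = 'filetobedownloaded.exe'
urllib.request.urlretrieve(link, filename)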
I put this together from the docs and haven't tested it, but I think you get the idea.
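The link-extraction step can be tried offline before pointing it at the real site. The sketch below swaps in the standard library's html.parser in place of BeautifulSoup so that it runs without extra packages; it is not the approach from this answer's code, only the same idea. The class and function names, the sample HTML, and the Dropbox URL in it are all made up for illustration.

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href attribute values that match a regular expression."""

    def __init__(self, pattern: str):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "href" and value and self.pattern.search(value):
                self.links.append(value)


def find_links(html: str, pattern: str = "dropbox") -> list:
    """Return every href in html whose value matches pattern."""
    collector = HrefCollector(pattern)
    collector.feed(html)
    return collector.links


# Made-up page content standing in for the downloaded HTML.
sample_html = """
<html><body>
  <a href="/about">About</a>
  <a href="https://www.dropbox.com/s/example/filetobedownloaded.exe?dl=0">Download</a>
</body></html>
"""
print(find_links(sample_html))
```

If the returned list is empty for the real page, the Dropbox link is probably added by JavaScript after the page loads, in which case parsing the static HTML will not find it.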
