Scraping data uri image [duplicate] - python

This question already has an answer here:
Downloading Image Data URIs from Webpages via BeautifulSoup
(1 answer)
Closed 7 years ago.
I would like to scrape images from a webpage. The problem is that the images are embedded in the source code as data URIs. How do I save them to a file?
(I only need the images from specific data URIs that I have already scraped.)

The image/string is in base64 encoding (even stated in the URI itself!). All you have to do is decode it, then write it to a file.
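import base64
# Only the payload after "base64," in the data URI (shortened here; use the full string from the page)
imageContents = "/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxQSEhUUE"
# Write in binary mode ("wb") so the decoded bytes are stored unchanged
with open("image.jpg", "wb") as myfile:
    myfile.write(base64.b64decode(imageContents))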
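If you have the whole data URI rather than just the payload, the header has to be split off before decoding. Here is a minimal sketch, assuming Python 3 and only the standard library. The function name decode_data_uri is made up for illustration and is not part of any library:

```python
import base64
from urllib.parse import unquote_to_bytes


def decode_data_uri(uri):
    """Split a data: URI into (mime_type, raw_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header, the rest is the payload.
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no comma separator")
    parts = header.split(";")
    # An empty media type defaults to text/plain.
    mime_type = parts[0] or "text/plain"
    if "base64" in parts[1:]:
        data = base64.b64decode(payload)
    else:
        # Payloads that are not base64 are percent-encoded.
        data = unquote_to_bytes(payload)
    return mime_type, data
```

To save an image, take the src attribute of the scraped img tag, call decode_data_uri on it, and write the returned bytes to a file opened with mode "wb". The returned media type (for example image/jpeg) tells you which file extension to use.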

Related

Is it possible to save an HTML page as PDF using Python? [duplicate]

This question already has answers here:
How to convert webpage into PDF by using Python
(10 answers)
Closed 4 years ago.
I'm trying to create a button which saves an HTML page in PDF format using Python. Are there any libraries which do that? If so, how would you write it up?
The HTML page I'm building contains school information such as name, url, city, state, zip, number of students, etc.
Have you tried pdfkit? It is a wrapper around the wkhtmltopdf command-line tool, which must be installed separately.
It is easy to use as well:
import pdfkit
pdfkit.from_file('test.html', 'out.pdf')
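If the page is built in Python rather than stored on disk, pdfkit.from_string can convert the HTML directly. The following is only a sketch: the helper names and the dictionary of school fields are assumptions made for illustration, and the pdfkit import is deferred so that the HTML builder can be used without pdfkit installed.

```python
from html import escape


def build_school_html(school):
    """Render a dict of school fields as a small HTML table."""
    rows = "".join(
        "<tr><th>{}</th><td>{}</td></tr>".format(escape(str(key)), escape(str(value)))
        for key, value in school.items()
    )
    return "<html><body><table>{}</table></body></html>".format(rows)


def save_school_pdf(school, output_path):
    """Convert the rendered HTML to a PDF file."""
    # Third-party dependency; it also needs the wkhtmltopdf binary on the system.
    import pdfkit

    pdfkit.from_string(build_school_html(school), output_path)
```

A button handler in your web framework would then call something like save_school_pdf({"name": "Example School", "city": "Springfield"}, "out.pdf"). The field values here are placeholders.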

Download images from google drive [duplicate]

This question already has an answer here:
Downloading Images from Google Drive
(1 answer)
Closed 6 years ago.
I have multiple Google Drive image URLs in a text file, and I want to download each image from its URL. The catch is that I want to save each image under its original name.
Here is the reference
Can anyone help me with a solution?
Alternate Solution:
I have found one way to download the images.
Original URL:-
https://drive.google.com/open?id=0BwJzkr_gZEA0d1h2dTN6MndvdkE
Convert it to:-
https://drive.google.com/uc?export=download&id=0BwJzkr_gZEA0d1h2dTN6MndvdkE
After that, add this URL to IDM and you will be able to download the image with its original name.
Hope that will help.
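The URL conversion above can be scripted for every line of the text file. This is a minimal sketch, assuming the links all have the open?id=... form shown above. The function name to_direct_download_url is hypothetical:

```python
from urllib.parse import parse_qs, urlencode, urlparse


def to_direct_download_url(share_url):
    """Turn a drive.google.com/open?id=... link into a uc?export=download link."""
    query = parse_qs(urlparse(share_url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("no id parameter in URL: " + share_url)
    return "https://drive.google.com/uc?" + urlencode(
        {"export": "download", "id": ids[0]}
    )
```

Reading the text file line by line and passing each line through this function gives a list of download URLs that can be handed to a download manager.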
Have you tried doing it through the Google Drive API?
See here to create a simple script and here for the endpoints.

how to use python download available image from http://..../*.jpg? [duplicate]

This question already has an answer here:
How to use urllib to download image from web
(1 answer)
Closed 7 years ago.
I use the following code to download the image 14112758275517_800X533.jpg.
The problem is that I cannot open the file saved as G:\\image.jpg.
Windows Photo Viewer reports that it cannot open the picture because the file may be corrupted, damaged, or too large.
import urllib
imageurl="http://img.vogue.com.cn/userfiles/201409/14112758275517_800X533.jpg"
pic_name = "G:\\image.jpg"
urllib.urlretrieve(imageurl, pic_name)
How can I download the image so that it is readable?
I think you cannot. It is likely a problem with the website, not with the code you posted.
I get a 403 Forbidden response for the URL you gave, even when I open it in a browser.
As an example, the following works, and only the URL has changed:
import urllib
imageurl="https://www.python.org/static/img/python-logo.png"
pic_name = "./image.png"
urllib.urlretrieve(imageurl, pic_name)
However, you might want to check other topics on the subject, as there are more advanced techniques for downloading images from the web, such as https://stackoverflow.com/a/8389368/2549230
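One plausible explanation for the unreadable file is that the server's error page was saved under a .jpg name. A way to guard against that is to check that the downloaded bytes start with a known image signature before writing them. The sketch below is written for Python 3 (where the download functions live in urllib.request). The function names and the User-Agent value are assumptions, and sending a browser-like User-Agent is not guaranteed to get past a 403:

```python
import urllib.request

# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data):
    """Return 'jpeg', 'png' or 'gif' if data starts with a known signature, else None."""
    for signature, name in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def download_image(url, path, user_agent="Mozilla/5.0"):
    """Download url to path, refusing to save data that is not an image."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # urlopen raises urllib.error.HTTPError for responses such as 403.
    with urllib.request.urlopen(request, timeout=30) as response:
        data = response.read()
    if sniff_image_type(data) is None:
        raise ValueError("response does not look like a JPEG, PNG or GIF image")
    with open(path, "wb") as out:
        out.write(data)
    return len(data)
```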

Google Search by Image Script for Local Images [duplicate]

This question already has answers here:
How to compose the URL for a reverse Google image search?
(3 answers)
Closed 3 years ago.
I'm looking for a script that finds images similar to my local images. I have searched for similar topics on Stack Overflow, but I could not find a solution or any clue for my problem.
The topic at the following link is similar to my problem, but it searches using text:
python search with image google images
I think I must pass my local images in the HTTP request as raw bytes, but I could not find out how to do that.
Finally, I tried uploading my local images to the web and searching by URL, but this time I ran into the following problem:
When I search for an image by its URL, Google generates this URL:
https://www.google.com/search?tbs=sbi:AMhZZivZoXHOHzWl5_1BGnG05Bm1LpdXCjewepYnpAH4Xi-s7fVU0S86XG4MFlP7hYlGUpioWaZSjwBBIRDOXrGL8uum9wurfEZowKDUl_1GMPE8JHOO5vEb_1iMSbkmvqx-sWxbPqeHeW1eeJPDgtjio_1l7sJcvSbIquQOoacs3x1mDiF7OLw0mNA3WdR59dFDZAwlpU9A2cXbk_1RrqcilNOEcf0osSDx6TDtXN9ndN3ZSFF8NQhHVDPRrjqRpETbXpVHtyJiIxTzLeAiSC-POpwwN1I3tutScJISO72ZhLCUMAZ-gAuuaTHiHQq-vJBcAgq_1zfzwrDxncCVaKBlqb-zDHclm_1tc9qAMlIIsuKvGXnOSY9flVL4Nqk6Js8Un7_1P_1MbkgVCOcWRmbKG0E_1Sl_145Xe-las_18k4e0N0Ar9eKWGd5gvO33ai967E1tj8uiBqfjZTDYUC_1UARgU-IedUIU4uTmpLgK2xMBTXbSgLU8LdW5ZmB1p_1Tm7tpyIczoN23B2AJz9tFp1wnVOeCi_1jOcegCMPxw_1pULXDVWmgd_1f1OMX_1OrLl7wq5VZbBnH3ME62tdKCScZySq7_11Rx7zvzf2JTKQ_16jt_1HJ2Nf6mYb77n58TSMOSbxNvlCnT6afbPHN_101-Xrb2o0QnkESNBMKNwhLg2ZDDgRSgO0gvyzn86FAIR4Eif77PMV0IlEXtaizdveGwCN3upch2XZQpzljgMOUD0ZEfpe_1GxysMuetPZe_12MsYFp2EVW_19oFqTiavEtn2LIcBI1jhow5zWCkwmcNv8Dz80qYTLCRcAaj5l5w2DsdJd8IiufYP0qxKb5pwXbdM0k3-jEQVaWBo_1wK4dohn3UierX63up9YZWNfKNciTjecJ2q69b9xkhtXp_1LWt9Sdi8-xt25FS1XkW6VdVuqhX9-OexZ9G8bV1SgOEHx5GOuCkdsBjqBZ_1Df9wDGLKDX4V9BVvpX_13TLn6YNFtkHR70z_1zaG66rHPun-fWygzsO_1uSmJH5BtcQODEOSJ7jCs_1iSJf--RB339DBzLenbJB_1HUVPiC7Tj0BvbnWtLnY9sElHi5jPprOlqfVa9uQe21eymwXZROi4aWwhByeODCsCfZjjUNoi0M_1pCTva4KW6mlmrWshh9h_1_1kl3Wx7sKpHGBqIY7VJ8pG3kcp7x0YtbPmfxF6J2iKoMzKHyutTx3cn5PJY9kZhOYs5RCs9ejC0Vmw42qdQaivEUB1aQazxRYH-knaGcbANS0p2OacI32X1SrwWoOdodj733y5_1jJi2soZi4COkUjG_18_1c028sLlBkdVkedcq8DXbUEcQB5jIQPx1115aZqdn8SzSLGxLhowIlVxq6kLuyXuLJy72kArT91Rol2v5jHFxapFjrNuDgwdirVQQIsbx_1jXzgTVPdhYV08eFdpnVnsVu3OaUNZPZO8gsSs9A
I expected a URL like google.com/search?url={image_url}, but that is not what I got. Hence, I cannot write a script for searching with my local images.
How can I solve my problem? Thanks for your help.
Use the following url
https://www.google.com/searchbyimage?&image_url=
and concatenate your image URL to it.
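The concatenation can be sketched as follows. The prefix is taken verbatim from the answer above, and whether Google still serves that endpoint is not something this sketch verifies. Percent-encoding the image URL is an addition made here so that characters such as & or spaces in it do not break the query string. The function name is hypothetical:

```python
from urllib.parse import quote

# Prefix copied from the answer above.
SEARCH_BY_IMAGE_PREFIX = "https://www.google.com/searchbyimage?&image_url="


def reverse_search_url(image_url):
    """Build a search-by-image URL for a publicly reachable image URL."""
    # safe="" also encodes ':' and '/', so the image URL stays one parameter value.
    return SEARCH_BY_IMAGE_PREFIX + quote(image_url, safe="")
```

The image still has to be uploaded somewhere publicly reachable first, as described in the question.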

HTML parsing to obtain what I want [duplicate]

This question already has answers here:
Parsing HTML using Python
(7 answers)
Closed 2 years ago.
I'm trying to do a little bit of HTML parsing in Python, which I'm horrible at, to be quite honest. I've been googling ways to do this but can't get anything to work. Here is my situation: I have a web page that has a BUNCH of links to downloads. What I want to do is specify a search string and, if the string I am searching for is there, download the file. But it needs to get the entire file name. For example, if I am searching for game-1 and the name of the actual game is game-1-something-else, I want it to download game-1-something-else. I have already used the following code to obtain the source of the page:
import urllib2
file = urllib2.urlopen('http://www.example.com/my/example/dir')
dload = file.read()
This grabs the entire source code of the web page, which is just a directory listing. For example, I have tons of tags: <a href> tags, <td> tags, etc. I want to strip the tags so that all I have is a list of the files in the directory of the web page. Then I want to use a regular expression or something similar to search for what I am looking for, take the entire file name, and download it.
Once you have the HTML data, parse it and then you can make selections of nodes within the page:
import lxml.html
tree = lxml.html.fromstring(dload)
# Print the href attribute of every <a> element (None if the attribute is missing)
for node in tree.xpath('//a'):
    print(node.get('href'))
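Once the links are extracted, picking the ones that match the search string is a simple filter. The sketch below uses the standard library's html.parser instead of lxml so that it runs without extra packages, and it assumes Python 3. The class and function names and the sample HTML are made up for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_matching_links(html, search, base_url):
    """Return absolute URLs of links whose file name contains search."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        urljoin(base_url, href)
        for href in collector.hrefs
        # Compare against the last path component, i.e. the full file name.
        if search in href.rsplit("/", 1)[-1]
    ]
```

Each returned URL contains the full file name (for example game-1-something-else rather than just game-1), so it can be passed straight to whatever download call you are using.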
