Get image path from src="cid:image.png..." resource - python

I'm trying to parse an e-mail, and the links in the img tags have an unusual format. I'm not strong with regular expressions. I would be glad to hear your suggestions on how to get a normal link from this:
src="cid:image006.png@01D4225D.4CE86AB0"

You can't get an image link from that tag, because a cid: reference does not point to a URL. It refers to an image that was attached to the email you received, identified by the attachment's Content-ID header.
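
A minimal sketch of how that lookup could work with the standard library's email package: pull the cid: value out of the src attribute with a regular expression, then walk the message for the part whose Content-ID header matches. The function names and the demo message, with its made-up Content-ID demo.png@example, are assumptions for illustration and stand in for the email you actually received.

```python
import re
from email.message import EmailMessage

# Matches src="cid:..." or src='cid:...' and captures the value after "cid:".
CID_SRC = re.compile(r"""src\s*=\s*["']cid:([^"']+)["']""", re.IGNORECASE)


def cid_references(html):
    """Return every cid value referenced by a src attribute in the HTML."""
    return CID_SRC.findall(html)


def find_part_by_cid(msg, cid):
    """Return the MIME part whose Content-ID equals cid, or None."""
    for part in msg.walk():
        content_id = str(part.get("Content-ID", "")).strip()
        # The header value is wrapped in angle brackets: <cid>
        if content_id.strip("<>") == cid:
            return part
    return None


# Hypothetical message standing in for the received email.
demo_html = '<img src="cid:demo.png@example">'
demo = EmailMessage()
demo.set_content(demo_html, subtype="html")
demo.add_related(b"fake image bytes", maintype="image", subtype="png",
                 cid="<demo.png@example>")

cid = cid_references(demo_html)[0]
image_part = find_part_by_cid(demo, cid)
image_bytes = image_part.get_payload(decode=True)  # write these to a file
```

For a real message you would parse the raw email first (for example with email.message_from_bytes) and pass the result to find_part_by_cid.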

Related

Get original image url from base64 Image String using Python?

I'm wondering if someone can point me in the right direction on how to convert a base64 image string to its original image URL.
My code scrapes the top 5 news items from Google based on my search string.
The images come as one massive base64 string. They display fine in my Outlook email (my code extracts the news and sends it out as an email in Outlook), but when I forward that email to a different email account, no image is shown, only the message "The linked image cannot be displayed. The file may have been moved, renamed or deleted." To check this, I copied the image from my Outlook email and tried to paste it into a Word document; all I see is an empty box, no image.
Any advice, please?
You can't get a URL from those. Those base64-encoded strings are fully embedded images. You could base64-decode them and save the result to a file, or just take the base64-encoded string and attach it to another image tag, as in the incoming email.
If you have some specific code I could be of more help.
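
As a rough sketch of the decode-and-save option, assuming the scraped src values are data URIs of the form data:image/<type>;base64,<data> (the function names and the output file name are made up for illustration):

```python
import base64
import re
from pathlib import Path

# Splits a data URI into its image subtype and its base64 payload.
DATA_URI = re.compile(
    r"data:image/(?P<ext>[\w.+-]+);base64,(?P<data>[A-Za-z0-9+/=\s]+)"
)


def decode_data_uri(uri):
    """Return (extension, raw bytes) for a base64 image data URI."""
    match = DATA_URI.fullmatch(uri.strip())
    if match is None:
        raise ValueError("not a base64 image data URI")
    return match["ext"], base64.b64decode(match["data"])


def save_data_uri(uri, stem="image"):
    """Write the decoded image next to the script and return its path."""
    ext, data = decode_data_uri(uri)
    path = Path(f"{stem}.{ext}")
    path.write_bytes(data)
    return path
```

The saved file can then be attached to the outgoing email or hosted somewhere and linked by URL, which avoids depending on the mail client to render embedded base64 images.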

Understanding google's HTML

First-time poster here.
I am just getting into Python and coding in general, and I am looking into the requests and BeautifulSoup libraries. I am trying to grab image URLs from Google Images. When inspecting the site in Chrome, I can find the "div" and the correct img src URL. But when I open the HTML that "requests" gives me, I can find the same "div", but the img src URL is something completely different and only leads to a black page if used.
Img of the HTML requests get
Img of the HTML found in chrome's inspect tool
What I wonder, and want to understand, is:
Why are these two versions of the HTML different?
How do I get the img src that is found with the inspect tool using requests?
Hope the question makes sense and thank you in advance for any help!
The differences between the response HTML and the code in Chrome's inspector probably come from JavaScript updating the page after it loads. For example, when a script sets a div element's innerHTML, the new markup is added to the DOM and therefore shows up in the inspector, but it has no influence on the response you received.
You could search the response for strings that begin with http:// and end with .png, .jpg, or any other image extension.
Simply put, your code retrieves a single HTML page, and lets you access it, as it was retrieved. The browser, on the other hand, retrieves that HTML, but then lets the scripts embedded in (or linked from) it run, and these scripts often make significant modifications to the HTML (also known as DOM - Document Object Model). The browser's inspector inspects the fully modified DOM.
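
The search suggested above, for strings that start with http:// and end with an image extension, could be sketched like this. It scans the raw text returned by requests, including inline scripts, so it can find URLs that never appear in an img tag of the unmodified HTML. The pattern, the extension list, and the function name are assumptions for illustration, and URLs that are escaped inside script data would need unescaping first.

```python
import re

# An http(s) URL that ends in a common image extension. The lookahead stops
# the match from ending in the middle of a longer word such as ".pngs".
IMAGE_URL = re.compile(
    r"https?://[^\s\"'<>\\]+?\.(?:png|jpe?g|gif|webp)(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def find_image_urls(raw_html):
    """Return the distinct image URLs found anywhere in the raw text."""
    urls = []
    for url in IMAGE_URL.findall(raw_html):
        if url not in urls:
            urls.append(url)
    return urls
```

With requests, the input would be the text attribute of the response, for example find_image_urls(requests.get(url).text).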

embedding images in multipart html email - defining the linked images correctly

I am having difficulty embedding images in a multipart email.
I am trying to send out an HTML file with plenty of embedded images as an email. However, the images do not appear inline and are currently just sent as attachments.
I assume I am not linking the images to the HTML code correctly.
Here is part of the HTML code for the first image
</v:shapetype><v:shape id="Picture_x0020_5" o:spid="_x0000_i1029" type="#_x0000_t75"
alt="cid:image001.png@01D58F16.6A9DB2F0" style='width:441.45pt;height:183.85pt;
visibility:visible;mso-wrap-style:square'>
<v:imagedata src="somefolder-data/image001.png"
o:title="image001.png@01D58F16"/>
The images are not placed in the working directory itself.
I believe the issue lies in defining the image's ID as referenced above in the HTML, which looks different from online examples. I have tried a few versions, but haven't had success.
I assumed only the first part following cid would be relevant (so image001.png), but it might be the whole value, "image001.png@01D58F16.6A9DB2F0".
Can someone help me create the right connection here?
msgImage = MIMEImage(fp.read())
fp.close()
msgImage.add_header('Content-Disposition', 'inline', filename='image001.png')
# Attach part into message container.
msg.attach(msgImage)
Thanks in advance
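
One common way to make that connection, sketched under assumptions: a cid: URL in the HTML is matched against the Content-ID header of an attached part, so each image part needs a Content-ID header whose value, without the angle brackets, equals everything after cid: in the corresponding src attribute. The sketch assumes the HTML is changed so that the image src itself is a cid: URL rather than a file path; the helper name, the Content-ID value, and the image bytes are made up for illustration.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(html, images):
    """Build a multipart/related message.

    images is a list of (content_id, filename, raw_bytes) tuples, where
    content_id is the text that follows "cid:" in the HTML.
    """
    msg = MIMEMultipart("related")
    msg.attach(MIMEText(html, "html"))
    for content_id, filename, data in images:
        part = MIMEImage(data, _subtype="png")
        # This header is what a src="cid:..." reference is resolved against.
        part.add_header("Content-ID", f"<{content_id}>")
        part.add_header("Content-Disposition", "inline", filename=filename)
        msg.attach(part)
    return msg


# Hypothetical example values.
html = '<html><body><img src="cid:image001.png@demo"></body></html>'
message = build_message(html, [("image001.png@demo", "image001.png", b"fake")])
```

Using the "related" multipart subtype tells mail clients that the image parts belong to the HTML body rather than being separate attachments.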

How to scrape image url off of html?

I'm having trouble getting the url of an image on a website and I was wondering if I could get some help.
I want to get the image url of the card on the website, but using xpath only gives me the image url of the website logo.
scrapy shell https://db.ygoprodeck.com/card/?search=7%20Colored%20Fish
response.xpath('//img')
Out[2]: [<Selector xpath='//img' data='<img src="https://db.ygoprodeck.com/sear'>]
There should be another img link to the card picture but it is not showing up
There is some logic to how the images are served. Each card has an ID, and that ID is the file name of its image. The ID is not shown in the visible part of the page.
Much of this information is exposed through the meta tags at the top of the page. Often the data that the JS works with is placed near the top of the document, in script or meta tags. This is particularly true of Shopify stores.
If you ever have trouble finding something, as with this image, take the image name and search the rest of the document for references to that keyword. You will often be able to track down the information, or at least figure out how it is loaded. This is also useful when websites require a "token": often they supply the token somewhere on the previous page.
# with css
In [6]: response.css('meta[property="og:image"]::attr(content)').extract_first()
Out[6]: 'https://ygoprodeck.com/pics/23771716.jpg'
# with xpath
In [8]: response.xpath('//meta[@property="og:image"]/@content').extract_first()
Out[8]: 'https://ygoprodeck.com/pics/23771716.jpg'
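
For comparison, the same og:image lookup can be sketched without scrapy, using only the standard library's html.parser. The class and function names and the sample HTML are assumptions for illustration; the image URL in the sample is the one shown in the output above.

```python
from html.parser import HTMLParser


class MetaImageParser(HTMLParser):
    """Remembers the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.image_url is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:image":
            self.image_url = attributes.get("content")


def og_image(html):
    """Return the og:image URL from the HTML, or None if there is none."""
    parser = MetaImageParser()
    parser.feed(html)
    return parser.image_url


# Made-up page with the same shape as the problem: the only <img> is a logo.
sample = (
    '<head><meta property="og:image" '
    'content="https://ygoprodeck.com/pics/23771716.jpg"></head>'
    '<body><img src="/logo.png"></body>'
)
card_image = og_image(sample)
```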

Getting links to wikipedia images

I'm trying to extract the links to all the images on Wikipedia, without losing the image names and, if possible, the alt text. I learned from
How do I get link to an image on wikipedia from the infobox?
that I could get it by querying: http://en.wikipedia.org/wiki/File:filename.jpg
However, to do this I need to get all the filenames of the images.
Any clues?
Thanks!
This query returns the URLs and file names of the images on a page:
http://en.wikipedia.org/w/api.php?action=query&titles=World&generator=images&gimlimit=10&prop=imageinfo&iiprop=url|dimensions|mime&format=json
Change the titles= parameter to the page you want. Note that gimlimit=10 limits the result to 10 images.
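
A small sketch of building that query and reading the reply from Python, using only the standard library. The function names and the User-Agent string are assumptions for illustration, and the sketch uses https instead of http. It assumes the reply has the shape that format=json returns for this query, with a "pages" object under "query" in which each page carries an "imageinfo" list.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def images_query_url(title, limit=10):
    """Build the API URL that lists the images used on one page."""
    params = {
        "action": "query",
        "titles": title,
        "generator": "images",
        "gimlimit": limit,
        "prop": "imageinfo",
        "iiprop": "url|dimensions|mime",
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def image_urls(response):
    """Map each file title to its URL in a decoded API response."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]: page["imageinfo"][0]["url"]
        for page in pages.values()
        if page.get("imageinfo")
    }


def fetch_image_urls(title, limit=10):
    """Query the API over the network and return {file title: URL}."""
    # Hypothetical User-Agent; replace it with one that identifies your script.
    request = Request(images_query_url(title, limit),
                      headers={"User-Agent": "image-list-example/0.1"})
    with urlopen(request) as reply:
        return image_urls(json.load(reply))
```

Calling fetch_image_urls("World") would then return a dictionary keyed by titles of the form File:..., so the file names are kept alongside the URLs.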
