Getting links to wikipedia images - python

I'm trying to extract the links to all the images on Wikipedia, without losing the image names and, ideally, the alt text. I learned from
How do I get link to an image on wikipedia from the infobox?
that I could get it by querying: http://en.wikipedia.org/wiki/File:filename.jpg
However, to do this I need to get all the filenames of the images.
Any clues?
Thanks!

This returns the URLs and file names of all images on a page:
http://en.wikipedia.org/w/api.php?action=query&titles=World&generator=images&gimlimit=10&prop=imageinfo&iiprop=url|dimensions|mime&format=json
Change the titles= parameter to query a different page, and raise gimlimit if the page has more than 10 images.
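The query above could be built and read from Python roughly as follows. This is a minimal sketch using only the standard library; the helper names (build_image_query, extract_images, fetch_images) are made up for illustration, and the response layout it expects (a "query" object containing "pages" keyed by page id, each with an "imageinfo" list) is an assumption based on the default JSON format of this API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_image_query(title, limit=10):
    """Build the same query as the URL above for an arbitrary page title."""
    params = {
        "action": "query",
        "titles": title,
        "generator": "images",
        "gimlimit": limit,
        "prop": "imageinfo",
        "iiprop": "url|dimensions|mime",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_images(response):
    """Return (file name, image URL) pairs from a decoded JSON response.

    Assumes the layout {"query": {"pages": {id: {"title", "imageinfo"}}}}.
    """
    pages = response.get("query", {}).get("pages", {})
    results = []
    for page in pages.values():
        for info in page.get("imageinfo", []):
            results.append((page.get("title"), info.get("url")))
    return results


def fetch_images(title, limit=10):
    """Perform the request; the User-Agent value is a placeholder."""
    request = Request(
        build_image_query(title, limit),
        headers={"User-Agent": "image-link-example/0.1"},
    )
    with urlopen(request) as reply:
        return extract_images(json.load(reply))
```

Calling fetch_images("World") should then give a list of (file name, URL) pairs for that page, limited to the first gimlimit images.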

Related

How do I get the list of all images on a page?

In Firefox, I can get a list of all images from the "Media" tab of the Page Info window:
How can I obtain such a list using Python Selenium? In addition to getting such a list of image URLs, I would also like to be able to get each image's data (i.e. the image itself) without needing to make additional network requests.
Please DO NOT suggest that I parse the HTML to look for <img ... /> tags. That is clearly not what I'm looking for. I am looking for image responses. Not all image responses are present in the DOM. Example: some image responses from AJAX requests.

How to scrape image url off of html?

I'm having trouble getting the url of an image on a website and I was wondering if I could get some help.
I want to get the image url of the card on the website, but using xpath only gives me the image url of the website logo.
scrapy shell https://db.ygoprodeck.com/card/?search=7%20Colored%20Fish
response.xpath('//img')
Out[2]: [<Selector xpath='//img' data='<img src="https://db.ygoprodeck.com/sear'>]
There should be another img link to the card picture, but it is not showing up.
There is some logic to how the images are served. Each card has an ID listed on the page, and that ID is also the file name of the image, although it is not exposed in an img tag.
Much of this information is loaded through the meta tags at the top of the page. Sites often put the data their JavaScript needs into script or meta tags in the head; this is particularly true of Shopify stores.
If you ever have trouble finding something, as with this image, take the image name and search the rest of the document for references to that keyword. You will often be able to track down the information, or at least figure out how it is loaded. The same approach helps when a website requires a "token": it is often supplied somewhere on the previous page.
# with css
In [6]: response.css('meta[property="og:image"]::attr(content)').extract_first()
Out[6]: 'https://ygoprodeck.com/pics/23771716.jpg'
# with xpath
In [8]: response.xpath('//meta[@property="og:image"]/@content').extract_first()
Out[8]: 'https://ygoprodeck.com/pics/23771716.jpg'
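Outside of a Scrapy shell, the same og:image lookup can be sketched with the standard library's html.parser. This is only an illustration: the class and function names are hypothetical, and it assumes the page HTML has already been downloaded into a string.

```python
from html.parser import HTMLParser


class OgImageParser(HTMLParser):
    """Remember the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except meta tags, and stop after the first match.
        if tag != "meta" or self.og_image is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:image":
            self.og_image = attr_map.get("content")


def find_og_image(html):
    """Return the og:image URL found in the HTML text, or None."""
    parser = OgImageParser()
    parser.feed(html)
    return parser.og_image
```

For example, find_og_image(page_source) returns None when the page declares no og:image, which is a quick way to tell whether this trick applies to a given site.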

Webscraping: book page images inside reader?

The image below shows source code I came across while web scraping. The link to the website is URL. It is basically an online book reader that shows rendered images of book pages. What's weird is that I am unable to find any "img" tags, "src" attributes, or any URL for these images. Any hints on how to scrape this?

Get image path from src="cid:image.png..." resource

I'm trying to parse an e-mail, and the links in the img tags have an unusual format. I'm not strong with regular expressions. I would be glad to hear your suggestions on how to get a normal link from this:
src="cid:image006.png@01D4225D.4CE86AB0"
You can't get a web link from that tag, because a cid: reference points to a part of the email itself, identified by its Content-ID header, rather than to a location on the web. The image is attached to the email you received.
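Since the image lives inside the message, it can be looked up by Content-ID instead of being downloaded. The following is a minimal sketch with Python's standard email package; find_cid_part is a hypothetical helper name, and it assumes the message has already been parsed (for example with email.message_from_bytes and policy=email.policy.default).

```python
from email.message import EmailMessage


def find_cid_part(message, cid_src):
    """Return the MIME part referenced by a src="cid:..." value, or None.

    The Content-ID header is normally wrapped in angle brackets, while the
    cid: reference in the HTML is not, so both are normalised before comparing.
    """
    cid = cid_src[4:] if cid_src.lower().startswith("cid:") else cid_src
    for part in message.walk():
        content_id = part.get("Content-ID")
        if content_id and str(content_id).strip().strip("<>") == cid:
            return part
    return None


def save_cid_image(message, cid_src, path):
    """Write the referenced attachment to disk; returns False if not found."""
    part = find_cid_part(message, cid_src)
    if part is None:
        return False
    with open(path, "wb") as handle:
        handle.write(part.get_payload(decode=True))
    return True
```

No regular expression is needed beyond stripping the cid: prefix; once the part is found, get_payload(decode=True) yields the raw image bytes.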

how to extract youtube thumbnail from youtube link in python

I want to extract and display YouTube search results for a query to the user.
So far I have fetched the YouTube link and extracted the title from it.
I also want to display the thumbnail for that link, the same one shown in YouTube's suggestions section.
For a question like this, I'd recommend running a Google Images search restricted to site:youtube.com and looking at the URLs of one or two thumbnails. I believe the pattern below should work in all cases, though you'd need to test it on different types of videos.
If the video URL is https://www.youtube.com/watch?v=xxxxxxxxxxxx
The thumbnail URL is https://i.ytimg.com/vi/xxxxxxxxxxxx/maxresdefault.jpg
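The URL pattern above can be applied in Python with urllib.parse. This is a minimal sketch with hypothetical function names; it only handles watch?v= style links as in the example above. The quality parameter is an assumption: maxresdefault is not necessarily available for every video, and hqdefault is a commonly used alternative file name, so treat both as values to test.

```python
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url):
    """Return the value of the v= query parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get("v")
    return values[0] if values else None


def thumbnail_url(url, quality="maxresdefault"):
    """Build the i.ytimg.com thumbnail URL for a watch?v= link."""
    video_id = youtube_video_id(url)
    if video_id is None:
        return None
    return f"https://i.ytimg.com/vi/{video_id}/{quality}.jpg"
```

A caller could request the maxresdefault URL first and retry with quality="hqdefault" if the server responds with an error.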
