Python newspaper module - get all the images from an article

Using Python's newspaper module, I can get the top image from an article in the following way:
from newspaper import Article
first_article = Article(url="http://www.lemonde.fr/...", language='fr')
first_article.download()
first_article.parse()
print(first_article.top_image)
But I need to get all the images in the article. The GitHub documentation lists 'All image extraction from html' as a feature, but I can't figure out how to use it. I also do not want to manually download the HTML files, save them to disk, and then feed them to the module to get the images.
How can I achieve that?

You have likely solved this already, but you can obtain the image URLs with Newspaper by reading the article.images attribute after parsing.
from newspaper import Article
article = Article(url="http://www.lemonde.fr/", language='fr')
article.download()
article.parse()
top_image = article.top_image
all_images = article.images
for image in all_images:
    print(image)
Example output:
https://img.lemde.fr/2020/09/22/0/3/4485/2990/220/146/30/0/a79897c_115736902-000-8pt8nc.jpg
https://img.lemde.fr/2020/09/22/0/0/5315/3543/192/0/75/0/7b90c88_645792534-pns-3418491.jpg
https://img.lemde.fr/2020/09/09/200/0/1500/999/180/0/95/0/d8099d2_51464-3185927.jpg
https://img.lemde.fr/2020/09/22/0/4/4248/2832/664/442/60/0/557e6ee_5375150-01-06.jpg

Related

Python: how to get all images that are not part of the template

I'm looking for a way to extract all the main images of a web page. The easy way is to do it with lxml:
import lxml.html
import requests
html = requests.get('https://fr.wikipedia.org/wiki/Image').text
tree = lxml.html.fromstring(html)
img = tree.xpath('//img[@src]')
This way we get all the images, including logos, icons, pictograms, CSS sprites, etc. What I would like to get is only the real images that are in the content. Any ideas?
Thanks
Use this:
//div[@id="mw-content-text"]//img[@src]
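To see how scoping the XPath to the content container filters out template images, here is a minimal sketch. It assumes a small, well-formed, made-up HTML snippet and uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without extra packages; on a real page you would keep lxml.html, which tolerates malformed HTML. The header div, the file names, and the function name content_image_sources are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page: one template image in the header and
# three <img> elements in the content, one of them without a src attribute.
SAMPLE_PAGE = """
<html><body>
  <div id="header"><img src="logo.png"/></div>
  <div id="mw-content-text">
    <p><img src="photo1.jpg"/></p>
    <img src="photo2.jpg"/>
    <img alt="no source"/>
  </div>
</body></html>
"""


def content_image_sources(page, container_id="mw-content-text"):
    """Return the src of every <img> with a src attribute inside the container."""
    root = ET.fromstring(page)
    # ElementTree only accepts relative paths, hence the leading ".".
    path = f".//div[@id='{container_id}']//img[@src]"
    return [img.get("src") for img in root.findall(path)]


print(content_image_sources(SAMPLE_PAGE))
```

The logo in the header div is skipped because it is not a descendant of the content container, and the image without a src attribute is skipped by the [@src] predicate.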

Saving image from API Endpoint with no filetype, in python

I'm trying to save images from the Spotify API
I get album art in the form of a link:
https://i.scdn.co/image/ab67616d00004851c96f7c7b077c224975b4c5ce
I think it's a jpg file.
I run into errors when trying to display or save it in Python.
I'm not even sure how I'm meant to format the link.
Do I need str around it?
str(https://i.scdn.co/image/ab67616d00004851c96f7c7b077c224975b4c5ce)
Or should I create a new variable e.g.
image_path = 'https://i.scdn.co/image/ab67616d00004851c96f7c7b077c224975b4c5ce'
And then:
im1 = im1.save(image_path)
Your second suggestion should work once you actually download the image, for example with urllib.request:
import urllib.request
image_path = 'https://i.scdn.co/image/ab67616d00004851c96f7c7b077c224975b4c5ce'
urllib.request.urlretrieve(image_path, "image.jpg")
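Because the URL carries no file extension, the "image.jpg" name is a guess. One way to check it is to compare the first bytes of the downloaded data against well-known file signatures. The sketch below is only an illustration: the function name guess_image_extension and the sample byte strings are made up, and only JPEG, PNG, and GIF are covered.

```python
# Leading "magic" bytes of a few common image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
]


def guess_image_extension(data, default=".bin"):
    """Return a file extension based on the first bytes of the data."""
    for magic, extension in SIGNATURES:
        if data.startswith(magic):
            return extension
    return default


# Made-up byte strings standing in for downloaded content.
print(guess_image_extension(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
print(guess_image_extension(b"\x89PNG\r\n\x1a\n\x00\x00"))
```

With real data, the bytes could be read with urllib.request.urlopen(image_path).read() and passed to the function before choosing the file name.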

How to get an image when its link is generated dynamically by a website

I want to get the full-resolution image displayed on this website:
http://oiswww.eumetsat.org/IPPS/html/MSG/PRODUCTS/MPE/FULLRESOLUTION/index.htm
The image gets a new dynamic link every time it is updated, which causes problems if we want to download it each time.
Do you have any tricks in Python to systematically download the full-resolution image?
Thanks all.
You can use BeautifulSoup, lxml, or a Python regular expression to parse the HTML and get the correct link; there should be an XPath to it.
From the HTML source of the page:
array_nom_imagen[0]="wwCzemwbmWlTk"
array_nom_imagen[1]="CtXqGo6wG8hVz"
array_nom_imagen[2]="8UFuyfrkbcd0b"
...
...
array_nom_imagen[138]="fFoSqmGjl6zhJ"
array_nom_imagen[139]="S5QefAKEdpWQf"
array_nom_imagen[140]="vCcabHqeoVgdv"
and
function loadimages(i_image) {
    array_imagen[i_image] = new Image()
    array_imagen[i_image].src = "IMAGESDisplay/"+array_nom_imagen[i_image]
    imageurl[i_image]="IMAGESDisplay/"+array_nom_imagen[i_image]
    loaded_images[i_image]="TRUE"
}
So only 141 pictures (indices 0 to 140) are available.
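The regular-expression route could look roughly like the sketch below. It works on a short excerpt of the page source quoted above instead of the live page, and it assumes the image names keep the array_nom_imagen[i]="..." form and are served under the IMAGESDisplay/ directory relative to the page, as the loadimages function suggests. The names NAME_PATTERN and image_urls are made up for illustration.

```python
import re
from urllib.parse import urljoin

PAGE_URL = "http://oiswww.eumetsat.org/IPPS/html/MSG/PRODUCTS/MPE/FULLRESOLUTION/index.htm"

# Excerpt of the page source quoted above; the live page would be fetched instead.
SAMPLE_SOURCE = '''
array_nom_imagen[0]="wwCzemwbmWlTk"
array_nom_imagen[1]="CtXqGo6wG8hVz"
array_nom_imagen[2]="8UFuyfrkbcd0b"
'''

# Captures the array index and the quoted image name on each assignment line.
NAME_PATTERN = re.compile(r'array_nom_imagen\[(\d+)\]\s*=\s*"([^"]+)"')


def image_urls(source, page_url=PAGE_URL):
    """Map each array index to the absolute URL of its image."""
    return {
        int(index): urljoin(page_url, "IMAGESDisplay/" + name)
        for index, name in NAME_PATTERN.findall(source)
    }


urls = image_urls(SAMPLE_SOURCE)
print(urls[0])
```

Since the names change whenever the page is updated, the page source has to be fetched and parsed again before each download.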

Display or save a List of URL images

Python is known to be an easy and powerful language. I have a list, literally a Python list, of image URLs:
>>> for i in images: print(i)
http://upload.wikimedia.org/wikipedia/commons/8/86/Influenza_virus_research.jpg
http://upload.wikimedia.org/wikipedia/commons/f/f8/Wiktionary-logo-en.svg
http://upload.wikimedia.org/wikipedia/en/e/e7/Cscr-featured.svg
http://upload.wikimedia.org/wikipedia/commons/f/fa/Wikiquote-logo.svg
http://upload.wikimedia.org/wikipedia/commons/4/4c/Wikisource-logo.svg
http://upload.wikimedia.org/wikipedia/commons/1/1b/Wikiversity-logo-en.svg
http://upload.wikimedia.org/wikipedia/commons/1/1b/Wikiversity-logo-en.svg
I wonder if there's some library (or snippet of code) in Python to easily display a list of image URLs in a browser, or maybe save the images in a folder.
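To save an image to disk, use urlretrieve (it lives in urllib.request in Python 3):
import urllib.request
urllib.request.urlretrieve("http://8020.photos.jpgmag.com/3670771_314453_2ee7120da5_m.jpg", "my.jpg")
The second argument, "my.jpg", is the path where the file is saved. It can also be a full path such as "/home/user/pics/my.jpg".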
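For a whole list, two small helpers cover both halves of the question: deriving a local file name for each URL, and building a bare HTML page that shows the images in a browser. This is only a sketch; the function names unique, local_name, and gallery_html are made up, and the sample list reuses a few URLs from the question.

```python
import html
import posixpath
from urllib.parse import urlparse


def unique(urls):
    """Drop duplicate URLs while keeping the original order."""
    return list(dict.fromkeys(urls))


def local_name(url):
    """Use the last path component of the URL as the local file name."""
    return posixpath.basename(urlparse(url).path)


def gallery_html(urls):
    """Build a bare HTML page with one <img> tag per distinct URL."""
    tags = "\n".join(
        f'<img src="{html.escape(u, quote=True)}">' for u in unique(urls)
    )
    return f"<!DOCTYPE html>\n<html><body>\n{tags}\n</body></html>\n"


images = [
    "http://upload.wikimedia.org/wikipedia/commons/8/86/Influenza_virus_research.jpg",
    "http://upload.wikimedia.org/wikipedia/commons/1/1b/Wikiversity-logo-en.svg",
    "http://upload.wikimedia.org/wikipedia/commons/1/1b/Wikiversity-logo-en.svg",
]

print([local_name(u) for u in unique(images)])
print(gallery_html(images))
```

Writing the returned string to a file and passing its path to webbrowser.open displays the gallery; to save the images instead, each URL can be handed to urlretrieve with local_name(url) as the target path.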

Automate downloading images off Google

I'm very new to Python and I'm trying to create a tool that automates downloading images off Google.
So far, I have the following code:
import urllib
def google_image(x):
    search = x.split()
    search = '%20'.join(map(str, search))
    url = 'http://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=%s&safe=off' %
But I'm not sure where to continue or if I'm even on the right track. Can someone please help?
See the Scrapy documentation for the images pipeline. In current Scrapy versions it is enabled in the project settings with:
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
