I am trying to download some images from NASS Case Viewer. An example of a case is
https://www-nass.nhtsa.dot.gov/nass/cds/CaseForm.aspx?xsl=main.xsl&CaseID=149006692
The link to the image viewer for this case is
https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?ImageView&ImageID=497001669&Desc=FRONT&Title=Vehicle+1+-+Front&Version=1&Extend=jpg
which may not be viewable, I assume because of the https. However, this is simply the second image for this case (the Front view).
The actual link to the image is (or should be?)
https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1
This will simply download aspx binaries.
My problem is that I do not know how to store these binaries to proper jpg files.
Example of code I've tried is
import requests
test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1"
pull_image = requests.get(test_image)
with open("test_image.jpg", "wb+") as myfile:
    myfile.write(str.encode(pull_image.text))
But this does not result in a proper jpg file. I've also inspected pull_image.raw.read() and saw that it's empty.
What could be the issue here? Are my URLs improper? I've used BeautifulSoup to put these URLs together and reviewed them by inspecting the HTML of a few pages.
Am I saving the binaries incorrectly?
.text decodes the response content to a string, so your image file will be corrupted.
Instead you should use .content, which holds the binary response content.
import requests
test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1"
pull_image = requests.get(test_image)
with open("test_image.jpg", "wb+") as myfile:
    myfile.write(pull_image.content)
.raw.read() also returns bytes, but in order to use it you must set the stream parameter to True.
pull_image = requests.get(test_image, stream=True)
with open("test_image.jpg", "wb+") as myfile:
    myfile.write(pull_image.raw.read())
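For larger downloads you may prefer to stream the body in chunks rather than reading it all at once; here is a minimal sketch using requests' iter_content (the chunk size is an arbitrary choice):

import requests

test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1"
pull_image = requests.get(test_image, stream=True)
with open("test_image.jpg", "wb") as myfile:
    # write the response body in 8 KB chunks instead of loading it all into memory
    for chunk in pull_image.iter_content(chunk_size=8192):
        myfile.write(chunk)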
I wanted to follow up on @t.m.adam's answer to provide a complete example for anyone who is interested in using this data for their own projects.
Here is my code to pull all images for a sample of Case IDs. It's fairly rough code, but I think it gives you what you need to get started.
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

CaseIDs = [149006673, 149006651, 149006672, 149006673, 149006692, 149006693]

url_part1 = 'https://www-nass.nhtsa.dot.gov/nass/cds/'

data = []
with requests.Session() as sesh:
    for caseid in tqdm(CaseIDs):
        url_full = f"https://www-nass.nhtsa.dot.gov/nass/cds/CaseForm.aspx?ViewText&CaseID={caseid}&xsl=textonly.xsl&websrc=true"
        #print(url_full)
        source = sesh.get(url_full).text
        soup = BeautifulSoup(source, 'lxml')
        tr_tags = soup.find_all('tr', style="page-break-after: always")
        for tag in tr_tags:
            #print(tag)
            """
            try:
                vehicle = [x for x in tag.text.split('\n') if 'Vehicle' in x][0]  ## return the first element
            except IndexError:
                vehicle = [x for x in tag.text.split('\n') if 'Scene' in x][0]  ## return the first element
            """
            tag_list = tag.find_all('tr', class_='label')
            test = [x.find('td').text for x in tag_list]
            #print(test)
            img_id, img_type, part_name = test
            img_id = img_id.replace(":", "")
            img = tag.find('img')
            #part_name = img.get('alt').replace(":", "").replace("/", "")
            part_name = part_name.replace(":", "").replace("/", "")
            image_name = " ".join([img_type, part_name, img_id]) + ".jpg"
            url_src = img.get('src')
            img_url = url_part1 + url_src
            print(img_url)
            pull_image = sesh.get(img_url, stream=True)
            with open(image_name, "wb+") as myfile:
                myfile.write(pull_image.content)
I am writing this code to get information about the top movies and also to download each movie's image. Some of the images download as 0-byte files (even though they show a size on disk); when I click the image link it opens fine, so there is no problem with the link itself.
For example, this is one of the image links:
https://static.stacker.com/s3fs-public/styles/slide_desktop/s3/00000116_4_0.png
import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "https://stacker.com/stories/1587/100-best-movies-all-time"
count = 0
local_description = ""
movie_data = []

data = requests.get(URL).text
soap = BeautifulSoup(data, "html.parser")
titles = soap.find_all(name="h2", class_="ct-slideshow__slide__text-container__caption")[1:]
description = soap.find_all(name="div", class_="ct-slideshow__slide__text-container__description")[1:]
images = soap.find_all(name="img", typeof="foaf:Image")[6:106]

for num in range(100):
    movie_name = titles[num].getText().replace("\n", "")
    local_des = description[num].find_all(name="p")[1:]
    for s in local_des:
        local_description = s.getText().replace("  ", "")
    local_data = {"title": movie_name, "description": local_description}
    movie_data.append(local_data)

    movie_image_link = images[num].get("src")
    response = requests.get(movie_image_link)
    with open(f"images/{movie_name}.png", 'wb') as f:
        f.write(response.content)
    count += 1
    print(count)

data_collected = pd.DataFrame(movie_data)
data_collected.to_csv("Data/100_movie.csv", index=False)
I found my problem: some movie names contained ":", and as you know you can't use ":" in file names. I fixed the code with .replace() (note that .replace returns a new string, so it has to be assigned back):
movie_name = movie_name.replace(":", "")
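If you want to handle this more generally, here is a small sketch with a hypothetical sanitize_filename helper (the character set it strips is my assumption about what Windows rejects, not something from the original code):

import re

def sanitize_filename(name):
    # drop characters that are not allowed in Windows file names (hypothetical helper)
    return re.sub(r'[\\/:*?"<>|]', "", name)

movie_name = sanitize_filename(movie_name)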
Once you get a response, check whether it's empty before writing it to disk. You might need to retry, or the link may simply be bad.
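For example, here is a rough sketch of that check with a simple retry (the retry count, the pause, and the fetch_image helper name are my own assumptions):

import time
import requests

def fetch_image(url, retries=3):
    # retry a few times if the response is an error or comes back empty
    for attempt in range(retries):
        response = requests.get(url)
        if response.ok and len(response.content) > 0:
            return response.content
        time.sleep(1)  # short pause before trying again
    return None

img_bytes = fetch_image(movie_image_link)
if img_bytes:
    with open(f"images/{movie_name}.png", "wb") as f:
        f.write(img_bytes)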
I am making an image scraper and want to take some of the photos from this link and save them in a folder named dribblephotos:
https://dribbble.com/search/shots/popular/illustration?q=sneaker%20
Here are the links I've retrieved:
https://static.dribbble.com/users/458522/screenshots/6040912/nike_air_huarache_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/105681/screenshots/3944640/hype_1x.png
https://static.dribbble.com/users/105681/avatars/mini/avatar-01-01.png?1377980605
https://static.dribbble.com/users/923409/screenshots/7179093/basketball_marly_gallardo_1x.jpg
https://static.dribbble.com/users/923409/avatars/mini/bc17b2db165c31804e1cbb1d4159462a.jpg?1596192494
https://static.dribbble.com/users/458522/screenshots/6034458/nike_air_jordan_i_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/1237425/screenshots/5071294/customize_air_jordan_web_2x.png
https://static.dribbble.com/users/1237425/avatars/mini/87ae45ac7a07dd69fe59985dc51c7f0f.jpeg?1524130139
https://static.dribbble.com/users/1174720/screenshots/6187664/adidas_2x.png
https://static.dribbble.com/users/1174720/avatars/mini/9de08da40078e869f1a680d2e43cdb73.png?1588733495
https://static.dribbble.com/users/179617/screenshots/4426819/ultraboost_1x.png
https://static.dribbble.com/users/179617/avatars/mini/2d545dc6c0dffc930a2b20ca3be88802.jpg?1596735027
https://static.dribbble.com/users/458522/screenshots/6126041/nike_air_max_270_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/60266/screenshots/6698826/nike_shoe_2x.jpg
https://static.dribbble.com/users/60266/avatars/mini/64826d925db1d4178258d17d8826842b.png?1549028805
https://static.dribbble.com/users/78464/screenshots/4950025/8x600_1x.jpg
https://static.dribbble.com/users/78464/avatars/mini/a9ae6a559ab479d179e8bd22591e4028.jpg?1465908886
https://static.dribbble.com/users/458522/screenshots/6118702/adidas_nmd_r1_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/458522/screenshots/6098953/nike_lebron_10_je_icon_qs_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/879147/screenshots/7152093/img_0966_2x.png
https://static.dribbble.com/users/879147/avatars/mini/e095f3837f221bb2ef652dcc966b99f7.jpg?1568473177
https://static.dribbble.com/users/458522/screenshots/6128979/nerd_x_adidas_pharrell_hu_nmd_trail_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/879147/screenshots/11064235/26fa4a2d-9033-4953-b48f-4c0e8a93fc9d_2x.png
https://static.dribbble.com/users/879147/avatars/mini/e095f3837f221bb2ef652dcc966b99f7.jpg?1568473177
https://static.dribbble.com/users/458522/screenshots/6132938/nike_moon_racer_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/1823684/screenshots/5973495/jordannn1_2x.png
https://static.dribbble.com/users/1823684/avatars/mini/f6041c082aec67302d4b78b8d203f02b.png?1509719582
https://static.dribbble.com/users/552027/screenshots/4666241/airmax270_1x.jpg
https://static.dribbble.com/users/552027/avatars/mini/35bb0dcb5a6619f68816290898bff6cc.jpg?1535884243
https://static.dribbble.com/users/458522/screenshots/6044426/adidas_pharrell_hu_nmd_trail_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/220914/screenshots/11295053/woman_shoe_tree_floating2_2x.png
https://static.dribbble.com/users/220914/avatars/mini/d364a9c166edb6d96cc059a836219a7d.jpg?1590773568
https://static.dribbble.com/users/4040486/screenshots/7079508/___2x.png
https://static.dribbble.com/users/4040486/avatars/mini/f31e9b50df877df815177e2015135ff7.png?1582521697
https://static.dribbble.com/users/57602/screenshots/12909636/d2_2x.png
https://static.dribbble.com/users/57602/avatars/mini/b4c27f3be2c61d82fbc821433d058b04.jpg?1575089000
https://static.dribbble.com/users/458522/screenshots/6049522/nike_x_john_elliott_lebron_10_soldier_1x.jpg
https://static.dribbble.com/users/458522/avatars/mini/0e524c2621e12569378282793e1ce72b.png?1580329767
https://static.dribbble.com/users/1025917/screenshots/9738550/vans-2020-pixelwolfie-dribbble_2x.png
https://static.dribbble.com/users/1025917/avatars/mini/87fdcb145eab0b47eda29fc873f25f8c.png?1594466719
https://static.dribbble.com/assets/icon-backtotop-1b04df73090f6b0f3192a3b71874ca3b3cc19dff16adc6cf365cd0c75897f6c0.png
https://static.dribbble.com/assets/dribbble-ball-icon-e94956d5f010d19607348176b0ae90def55d61871a43cb4bcb6d771d8d235471.svg
https://static.dribbble.com/assets/icon-shot-x-light-40c073cd65443c99d4ac129b69bf578c8cf97d69b78990c00c4f8c5873b0d601.png
https://static.dribbble.com/assets/icon-shot-prev-light-ca583c76838d54eca11832ebbcaba09ba8b2bf347de2335341d244ecb9734593.png
https://static.dribbble.com/assets/icon-shot-next-light-871a18220c4c5a0325d1353f8e4cc204c3b49beacc63500644556faf25ded617.png
https://static.dribbble.com/assets/dribbble-square-c8c7a278e96146ee5a9b60c3fa9eeba58d2e5063793e2fc5d32366e1b34559d3.png
https://static.dribbble.com/assets/dribbble-ball-192-ec064e49e6f63d9a5fa911518781bee0c90688d052a038f8876ef0824f65eaf2.png
https://static.dribbble.com/assets/icon-overlay-x-2x-b7df2526b4c26d4e8410a7c437c433908be0c7c8c3c3402c3e578af5c50cf5a5.png
However, I only want to grab the URLs that contain the string "screenshots". So I tried making a function to grab only the images whose URL contains "screenshots", for example:
https://static.dribbble.com/users/923409/screenshots/7179093/basketball_marly_gallardo_1x.jpg
At first, to see if it even worked, I made a function to print just the specific links I wanted. However, it didn't work. Here is my function code:
def art_links():
    images = []
    for img in x:
        images.append(img['src'])
    images = soup2.find_all("screenshots")
    print(images)
Here is my full code:
from bs4 import BeautifulSoup
import requests as rq
import os

r2 = rq.get("https://dribbble.com/search/shots/popular/illustration?q=sneaker%20")
soup2 = BeautifulSoup(r2.text, "html.parser")
links = []
x = soup2.select('img[src^="https://static.dribbble.com"]')
for img in x:
    links.append(img['src'])

def art_links():
    images = []
    for img in x:
        images.append(img['src'])
    images = soup2.find_all("screenshots")
    print(images)

os.mkdir('dribblephotos')
for index, img_link in enumerate(links):
    if "screenshots" in images:
        img_data = r.get(img_link).content
        with open("dribblephotos/" + str(index + 1) + '.jpg', 'wb+') as f:
            f.write(img_data)
    else:
        break

art_links()
I'm noticing a small issue with the indentation of your code around the if statement at the end, so I reformatted it a bit to get it to what you wanted. I think what's happening is that you break out of the final for loop inside an else branch, so as soon as one entry doesn't have "screenshots" in its link, the loop stops entirely instead of continuing. While there is a 'continue' keyword you could use, it is sufficient to simply drop the else branch. You are also checking for "screenshots" in images, but the link you actually want to check is declared as img_link in your for loop. Try this for your final for loop and see what you get:
for index, img_link in enumerate(links):
    if "screenshots" in img_link:
        img_data = rq.get(img_link).content
        with open("dribblephotos/" + str(index + 1) + '.jpg', 'wb+') as f:
            f.write(img_data)
If you still require the links rather than the file download, you should be able to retrieve them as you loop through the images in the for loop and store them in a new list if it was a screenshot link.
UPDATE:
This newest version works for me. I removed the function that filtered the URLs after they had already been collected in a loop, since iterating over everything twice is unnecessary. The first for loop is all you need: I check each URL the first time through and only save it to the links list when it is actually a screenshot link.
from bs4 import BeautifulSoup
import requests as rq
import os

r2 = rq.get("https://dribbble.com/search/shots/popular/illustration?q=sneaker%20")
soup2 = BeautifulSoup(r2.text, "html.parser")
links = []
x = soup2.select('img[src^="https://static.dribbble.com"]')
os.mkdir('dribblephotos')

# Only one for loop required; no need to iterate twice
for index, img in enumerate(x):
    # Store the current url from the image result
    url = img["src"]
    # Check the url for screenshot before putting it in the links
    if "screenshot" in url:
        links.append(img['src'])
        # Download the image
        img_data = rq.get(url).content
        # Put the image into the file
        with open("dribblephotos/" + str(index + 1) + '.jpg', 'wb+') as f:
            f.write(img_data)

print(links)
Okay, so I am working on a manga (Japanese comics) downloader. Japanese comics are available online, but you can only read them; if you wish to download them, you have to save the image files by right-clicking, and so on.
So I was working on an alternative manga downloader that will download all the chapters of the manga you specify and then convert them to a PDF.
I have completed the code for downloading the images and it works quite well, but the problem is in the PDF-conversion part.
Here's my code:
import requests
import urllib
import glob
from bs4 import BeautifulSoup
import os
from fpdf import FPDF

def download_image(url, path):
    r = requests.get(url, stream=True)
    if r.status_code == 200:
        with open(path, 'wb') as f:
            for chunk in r:
                f.write(chunk)

start_chapter = int(input("Enter Starting Chapter: "))
end_chapter = int(input("Enter Ending Chapter: "))
chapters = range(start_chapter, end_chapter + 1)

chapter_list = []
for chapter in chapters:
    chapter_list.append("https://manganelo.com/chapter/read_one_piece_manga_online_free4/chapter_" + str(chapter))

for URL in chapter_list:
    r = requests.get(URL)
    soup = BeautifulSoup(r.text, 'html.parser')
    images = soup.findAll('img')
    for i in images:
        url = i.attrs["src"]
        os.makedirs(url.split('/')[-2], exist_ok=True)
        download_image(url, os.path.join(url.split('/')[-2], url.split('/')[-1]))

pdf = FPDF()
imageList = glob.glob("*")
for image in imageList:
    pdf.add_page()
    pdf.image(image, 10, 10, 200, 300)
pdf.output("One Piece Chapter", "F")
So, any suggestions on how I can fix this error?
raise RuntimeError('FPDF error: '+msg)
RuntimeError: FPDF error: Unsupported image type: chapter_1_romance_dawn
First of all, this is a very nice idea.
The error occurs because the image list path is wrong.
You are storing the jpgs in a folder named after the chapter.
All you have to do is give FPDF the correct path.
I created a set to avoid duplicates.
Then I removed the "images" and "icons" folders (maybe you will use them?).
cchapter = set()
for URL in chapter_list:
    r = requests.get(URL)
    soup = BeautifulSoup(r.text, 'html.parser')
    images = soup.findAll('img')
    for i in images:
        url = i.attrs["src"]
        cchapter.add(url.split('/')[-2])
        os.makedirs(url.split('/')[-2], exist_ok=True)
        download_image(url, os.path.join(url.split('/')[-2], url.split('/')[-1]))

cchapter.remove('images')
cchapter.remove('icons')
chapterlist = list(cchapter)
print(chapterlist[0])

def sortKeyFunc(s):
    return int(os.path.basename(s)[:-4])

for chap in chapterlist:
    pdf = FPDF()
    imageList = glob.glob(chap + "/*.jpg")
    imageList.sort(key=sortKeyFunc)
    for image in imageList:
        pdf.add_page()
        pdf.image(image, 10, 10, 200, 300)
    pdf.output(chap + ".pdf", "F")
Finally, I added a loop to create a PDF for each folder, naming each PDF after its chapter.
You were also missing the extension (".pdf") in your output.
This will work. :)
EDIT:
glob.glob does not return the file list in any guaranteed order.
Reference: here
It is probably not sorted at all and uses the order at which entries appear in the filesystem, i.e. the one you get when using ls -U. (At least on my machine this produces the same order as listing glob matches.)
Therefore you can use the filename (in our case a number) as a sort key.
def sortKeyFunc(s):
    return int(os.path.basename(s)[:-4])
Then add imageList.sort(key=sortKeyFunc) in the loop.
NOTE: Code is updated.
So I wanted to get all of the pictures on this page (of the NBA teams).
http://www.cbssports.com/nba/draft/mock-draft
However, my code gives a bit more than that. It gives me,
<img src="http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png" alt="Orlando Magic" width="30" height="30" border="0" />
How can I shorten it to only give me http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png?
My code:
import urllib2
from BeautifulSoup import BeautifulSoup
# or if you're using BeautifulSoup4:
# from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read())
rows = soup.findAll("table", attrs={'class': 'data borderTop'})[0].tbody.findAll("tr")[2:]
for row in rows:
    fields = row.findAll("td")
    if len(fields) >= 3:
        anchor = row.findAll("td")[1].find("a")
        if anchor:
            print anchor
I know this can be "traumatic", but for those automatically generated pages where you just want to grab the images and never come back, a quick-and-dirty regular expression that matches the desired pattern tends to be my choice (not depending on BeautifulSoup is a great advantage):
import urllib, re

source = urllib.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read()

## every image name is an abbreviation composed by capital letters, so...
for link in re.findall('http://sports.cbsimg.net/images/nba/logos/30x30/[A-Z]*.png', source):
    print link
    ## the code above just prints the link;
    ## if you want to actually download, set the flag below to True
    actually_download = False
    if actually_download:
        filename = link.split('/')[-1]
        urllib.urlretrieve(link, filename)
Hope this helps!
To save all the images on http://www.cbssports.com/nba/draft/mock-draft,
import urllib2
import os
from BeautifulSoup import BeautifulSoup

URL = "http://www.cbssports.com/nba/draft/mock-draft"
default_dir = os.path.join(os.path.expanduser("~"), "Pictures")

opener = urllib2.build_opener()
urllib2.install_opener(opener)

soup = BeautifulSoup(urllib2.urlopen(URL).read())
imgs = soup.findAll("img", {"alt": True, "src": True})

for img in imgs:
    img_url = img["src"]
    filename = os.path.join(default_dir, img_url.split("/")[-1])
    img_data = opener.open(img_url)
    f = open(filename, "wb")
    f.write(img_data.read())
    f.close()
To save any particular image on http://www.cbssports.com/nba/draft/mock-draft,
use
soup.find("img",{"src":"image_name_from_source"})
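For example, to grab one particular logo by its exact src (the URL below is just the ORL thumbnail from the question, and the snippet reuses soup, opener and default_dir from the code above):

img = soup.find("img", {"src": "http://sports.cbsimg.net/images/nba/logos/30x30/ORL.png"})
if img:
    img_url = img["src"]
    filename = os.path.join(default_dir, img_url.split("/")[-1])
    with open(filename, "wb") as f:
        f.write(opener.open(img_url).read())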
You can use these functions to get the list of all image URLs from a page URL.
import re
import requests

#
# get_url_images_in_text()
#
# @param html - the html to extract image urls from.
# @param protocol - the protocol of the website, to prepend to urls that do not start with a protocol.
#
# @return list of image urls.
#
def get_url_images_in_text(html, protocol):
    urls = []
    all_urls = re.findall(r'((http\:|https\:)?\/\/[^"\' ]*?\.(png|jpg))', html, flags=re.IGNORECASE | re.MULTILINE | re.UNICODE)
    for url in all_urls:
        if not url[0].startswith("http"):
            urls.append(protocol + url[0])
        else:
            urls.append(url[0])
    return urls

#
# get_images_from_url()
#
# @param url - the url to extract image urls from.
#
# @return list of image urls.
#
def get_images_from_url(url):
    protocol = url.split('/')[0]
    resp = requests.get(url)
    return get_url_images_in_text(resp.text, protocol)
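A quick usage sketch, pointing it at the mock-draft page from the question:

image_urls = get_images_from_url("http://www.cbssports.com/nba/draft/mock-draft")
for image_url in image_urls:
    print(image_url)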
This is my code:
import urllib2
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read())
rows = soup.findAll("table", attrs={'class': 'data borderTop'})[0].tbody.findAll("tr")[2:]
for row in rows:
    fields = row.findAll("td")
    if len(fields) >= 3:
        anchor = row.findAll("td")[1].find("a")
        if anchor:
            print anchor
Instead of printing out an image, it gives me where the image is in the page source. Any reasons as to why?
According to the BeautifulSoup documentation, soup.findAll returns a list of Tag or NavigableString objects, so printing anchor prints the whole tag.
You have to navigate into the tag and pull out the piece you want, for example via .contents or attribute access.
See http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html under the "Navigating the Parse Tree" heading to find what you need in this case.
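For this particular case, a minimal sketch of what that navigation might look like is to dig the img tag out of each anchor and print only its src attribute (assuming, as the posted HTML suggests, that the logo <img> sits inside the anchor):

for row in rows:
    fields = row.findAll("td")
    if len(fields) >= 3:
        anchor = row.findAll("td")[1].find("a")
        if anchor:
            img = anchor.find("img")   # the logo <img> nested inside the anchor
            if img:
                print img["src"]       # just the URL, not the whole tag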
It looks like you want the team logo thumbnails?
import urllib2
import BeautifulSoup

url = 'http://www.cbssports.com/nba/draft/mock-draft'
txt = urllib2.urlopen(url).read()
bs = BeautifulSoup.BeautifulSoup(txt)

# get the main table
t = bs.findAll('table', attrs={'class': 'data borderTop'})[0]

# get the thumbnail urls
imgs = [im["src"] for im in t.findAll('img') if "logos" in im["src"]]
imgs now looks like
[u'http://sports.cbsimg.net/images/nba/logos/30x30/NO.png',
u'http://sports.cbsimg.net/images/nba/logos/30x30/CHA.png',
u'http://sports.cbsimg.net/images/nba/logos/30x30/WAS.png',
u'http://sports.cbsimg.net/images/nba/logos/30x30/CLE.png',
etc. These are the file locations for each logo, which is all the HTML actually contains; if you want the actual pictures, you have to get each one separately.
The list contains duplicate references to each logo; the quickest way to remove duplicates is
imgs = list(set(imgs))
Alternatively, the list does not include every team; if you had a full list of team name contractions, you could build the logo-url list directly.
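For example, with a (partial, purely illustrative) list of team abbreviations you could build the URLs directly:

# partial list for illustration only; extend it with the rest of the abbreviations
teams = ['BOS', 'CHA', 'CLE', 'DAL', 'ORL', 'WAS']
imgs = ['http://sports.cbsimg.net/images/nba/logos/30x30/%s.png' % t for t in teams]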
Also, looking at the site, each 30x30 logo has a corresponding 90x90 logo which you might prefer - much larger and clearer. If so,
imgs = [im.replace('30x30', '90x90') for im in imgs]
imgs now looks like
[u'http://sports.cbsimg.net/images/nba/logos/90x90/BOS.png',
u'http://sports.cbsimg.net/images/nba/logos/90x90/CHA.png',
u'http://sports.cbsimg.net/images/nba/logos/90x90/CLE.png',
u'http://sports.cbsimg.net/images/nba/logos/90x90/DAL.png',
etc.
Now, for each url, we download the image and save it:
import os
savedir = 'c:\\my documents\\logos'  # assumes this dir actually exists!
for im in imgs:
    fname = im.rsplit('/', 1)[1]
    fname = os.path.join(savedir, fname)
    with open(fname, 'wb') as outf:
        outf.write(urllib2.urlopen(im).read())
and you have your logos.