Python json get specific story and image - python

I am very new to Python. I have a JSON news feed from which I need to get a selected 'title' and image 'src'.
So far I can print all the titles, and only the image 'src' values whose layout is "1024 Landscape".
How can I print, for example, just the second title? How do I address that particular one?
The feed is: http://www.stuff.co.nz/_json/ipad-big-picture
for story in data.get('stories', []):
    print 'Title:', story['title']
    for img in story.get('images', []):
        for var in img.get('variants', []):
            if var.get('layout') == "1024 Landscape":
                print ' img:', (var.get('src')).split('/')[-1], ' layout:', var.get('layout')
Thanks

First just get your stories object (list of dicts):
stories = data.get('stories', [])
Once you have this list you can just access by index:
if len(stories) >= 2:
    print stories[1]['title']
Or try first and catch the exception:
i = 1
try:
    print stories[i]['title']
except IndexError:
    print "Story does not exist at index %d" % i
So, when trying to get all 1024 Landscape images for a specific story, it might look like this:
imgs = set()
for img in stories[1].get('images', []):
    for variant in img.get('variants', []):
        if variant.get('layout') == '1024 Landscape':
            imgs.add(variant['src'])
print imgs
set([u'http://static.stuff.co.nz/1341147692/827/7202827.jpg'])
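For readers on Python 3, the same lookups can be wrapped in two small helpers. This is only a sketch: the function names get_title and get_image_srcs and the sample dict are made up for illustration, and the feed is assumed to have the stories / images / variants shape used above.

```python
def get_title(data, index):
    """Return the title of the story at `index`, or None if there is no such story."""
    stories = data.get('stories', [])
    try:
        return stories[index]['title']
    except IndexError:
        return None


def get_image_srcs(data, index, layout='1024 Landscape'):
    """Collect the 'src' of every image variant with the given layout for one story."""
    stories = data.get('stories', [])
    if not 0 <= index < len(stories):
        return set()
    return {
        variant['src']
        for img in stories[index].get('images', [])
        for variant in img.get('variants', [])
        if variant.get('layout') == layout and 'src' in variant
    }


# Hypothetical sample that mimics the assumed feed structure
sample = {
    'stories': [
        {'title': 'First', 'images': []},
        {'title': 'Second', 'images': [
            {'variants': [
                {'layout': '1024 Landscape', 'src': 'http://example.com/b.jpg'},
                {'layout': 'Thumbnail', 'src': 'http://example.com/b_small.jpg'},
            ]},
        ]},
    ]
}

print(get_title(sample, 1))
print(get_image_srcs(sample, 1))
```

Index 1 is the second story because Python lists are zero-based.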

Related

how to get desired output in python from following output

I am getting the output pasted below.
[{'accel-world-infinite-burst-2016': 'https://yts.mx/torrent/download/92E58C7C69D015DA528D8D7F22844BF49D702DFC'}, {'accel-world-infinite-burst-2016': 'https://yts.mx/torrent/download/3086E306E7CB623F377B6F99261F82CC8BB57115'}, {'accel-world-infinite-burst-2016': 'https://yifysubtitles.org/movie-imdb/tt5923132'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/torrent/download/E92B664EE87663D7E5EC8E9FEED574C586A95A62'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/torrent/download/4F6F194996AC29924DB7596FB646C368C4E4224B'}, {'anna-to-the-infinite-power-1983': 'https://yts.mx/movies/anna-to-the-infinite-power-1983/request-subtitle'}, {'infinite-2021': 'https://yts.mx/torrent/download/304DB2FEC8901E996B066B74E5D5C010D2F818B4'}, {'infinite-2021': 'https://yts.mx/torrent/download/1320D6D3B332399B2F4865F36823731ABD1444C0'}, {'infinite-2021': 'https://yts.mx/torrent/download/45821E5B2E339382E7EAEFB2D89967BB2C9835F6'}, {'infinite-2021': 'https://yifysubtitles.org/movie-imdb/tt6654210'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/torrent/download/47EB04FBC7DC37358F86A5BFC115A0361F019B5B'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/torrent/download/88223BEAA09D0A3D8FB7EEA62BA9C5EB5FDE9282'}, {'infinite-potential-the-life-ideas-of-david-bohm-2020': 'https://yts.mx/movies/infinite-potential-the-life-ideas-of-david-bohm-2020/request-subtitle'}, {'the-infinite-man-2014': 'https://yts.mx/torrent/download/0E2ACFF422AF4F62877F59EAE4EF93C0B3623828'}, {'the-infinite-man-2014': 'https://yts.mx/torrent/download/52437F80F6BDB6FD326A179FC8A63003832F5896'}, {'the-infinite-man-2014': 'https://yifysubtitles.org/movie-imdb/tt2553424'}, {'nick-and-norahs-infinite-playlist-2008': 'https://yts.mx/torrent/download/DA101D139EE3668EEC9EC5B855B446A39C6C5681'}, {'nick-and-norahs-infinite-playlist-2008': 'https://yts.mx/torrent/download/8759CD554E8BB6CFFCFCE529230252AC3A22D4D4'}, {'nick-and-norahs-infinite-playlist-2008': 
'https://yifysubtitles.org/movie-imdb/tt0981227'}]
As you can see, each movie has multiple links, and the movie name is repeated for every link. I want all links related to the same movie to appear in a single object, e.g.
[{accel-world-infinite-burst-2016:{link1,link2,link3,link4},........]
for item in li:
    # print(item.partition("movies/")[2])
    movieName["Movies"].append(item.partition("movies/")[2])
    req=requests.get(item)
    s=soup(req.text,"html.parser")
    m=s.find_all("p",{"class":"hidden-xs hidden-sm"})
    # print(m[0])
    for a in m[0].find_all('a', href=True):
        # movieName['Movies'][item.partition("movies/")[2]]=(a['href'])
        downloadLinks.append ( {item.partition("movies/")[2]:a['href'] })
You can try this:
# links = your list of dicts (renamed from input, which shadows the built-in)
otp_dict = {}
for l in links:
    for key, value in l.items():
        if key not in otp_dict:
            otp_dict[key] = [value]
        else:
            otp_dict[key].append(value)
print(otp_dict)
Output: {'accel-world-infinite-burst-2016': [link1, link2], ...}
The output is a dict containing a list of links per movie. If you want sets, as in your desired output, try this:
otp_dict = {}  # start from an empty dict again
for l in links:
    for key, value in l.items():
        if key not in otp_dict:
            otp_dict[key] = {value}
        else:
            otp_dict[key].add(value)
Output: {'accel-world-infinite-burst-2016': {link1, link2}, ...}
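The same grouping can also be written with collections.defaultdict, which removes the "is the key already there?" check. The sketch below is an alternative, not the answerer's code; the function name group_links and the example.com URLs are placeholders.

```python
from collections import defaultdict


def group_links(pairs):
    """Merge a list of single-entry {name: link} dicts into {name: [links]}."""
    grouped = defaultdict(list)
    for pair in pairs:
        for name, link in pair.items():
            # skip exact duplicates but keep the order links were first seen in
            if link not in grouped[name]:
                grouped[name].append(link)
    return dict(grouped)


# Placeholder data in the same shape as the scraped list
download_links = [
    {'movie-a': 'https://example.com/1'},
    {'movie-a': 'https://example.com/2'},
    {'movie-b': 'https://example.com/3'},
    {'movie-a': 'https://example.com/1'},
]

print(group_links(download_links))
```

Lists keep the original link order; switching to defaultdict(set) and .add() would give the set-based variant instead.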

Once I hit an exception, can I ignore all lines below and go to another item in for loop?

I am trying to use two Google API calls to get a restaurant's price_level and phone number.
First, looping through
for restaurant in name:
    find_place_url = "https://maps.googleapis.com/maps/api/place/findplacefromtext/json?"
    # use separate parameter dictionary b.c. findplace and findplacedetail have diff field.
    find_place_param ={}
    find_place_param["input"] = restaurant
    find_place_param["inputtype"] = "textquery"
    find_place_param["key"] = google_key
    # get place_id then use it to get phone number
    a = requests.get(find_place_url, parameters).json()
This first findplace API call grabs the place_id for the given restaurant. The response looks like:
{'candidates': [{'place_id': 'ChIJdTDCTdT4cUgRqxush2XhgnQ'}], 'status': 'OK'}
if the restaurant has a proper place_id; otherwise it gives:
{'candidates': [], 'status': 'ZERO_RESULTS'}
Here is all of my code. I grab the place_id inside a try/except because, as stated above, the status is either ZERO_RESULTS or OK. But even after the except branch runs, the code goes on to the find_place_detail API call, which requires a place_id, and so it fails. How can I skip the last block of code if I do not receive a place_id?
price_level2 = []
phone_number = []
for restaurant in name:
    find_place_url = "https://maps.googleapis.com/maps/api/place/findplacefromtext/json?"
    # use separate parameter dictionary b.c. findplace and findplacedetail have diff field.
    find_place_param ={}
    find_place_param["input"] = restaurant
    find_place_param["inputtype"] = "textquery"
    find_place_param["key"] = google_key
    # get place_id then use it to get phone number
    a = requests.get(find_place_url, parameters).json()
    print(a)
    # adding it to original parameter. since only this and findplace parameter has to be different.
    try:
        parameters["place_id"] = a["candidates"][0]["place_id"]
    except:
        print("Phone number not available")
        phone_number.append(None)
    # passing in fields of our interest
    parameters["fields"] = "name,price_level,formatted_phone_number"
    find_place_detail_url ="https://maps.googleapis.com/maps/api/place/details/json?"
    b = requests.get(find_place_detail_url, parameters).json()
    phone_number.append(b["result"]["formatted_phone_number"])
    price_level2.append(b["result"]['price_level'])
You can use an else clause:
try:
    parameters["place_id"] = a["candidates"][0]["place_id"]
except (IndexError, KeyError):
    print("Phone number not available")
    phone_number.append(None)
else:
    parameters["fields"] = "name,price_level,formatted_phone_number"
    find_place_detail_url ="https://maps.googleapis.com/maps/api/place/details/json?"
    b = requests.get(find_place_detail_url, parameters).json()
    ...
Also, your except clause should be more specific. With the ZERO_RESULTS response shown above, candidates is an empty list, so a["candidates"][0] raises an IndexError; a KeyError would only occur if one of the keys were missing altogether. For more information on exception handling in Python, see the documentation.
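Since the question asks about moving on to the next item of the for loop, continue inside the except branch is another option next to else. The sketch below runs without any network access: it feeds the two sample responses from the question into a hypothetical helper named extract_place_ids, and the place where the details request would go is only marked by a comment.

```python
def extract_place_ids(responses):
    """Return one place_id (or None) per (name, response) pair."""
    place_ids = []
    for name, response in responses:
        try:
            place_id = response["candidates"][0]["place_id"]
        except (IndexError, KeyError):
            # no candidate returned: record the gap and jump to the next restaurant
            print("No place_id for %s" % name)
            place_ids.append(None)
            continue
        # This point is only reached when a place_id was found,
        # so the place-details request would be made here.
        place_ids.append(place_id)
    return place_ids


# The two response shapes shown in the question; the restaurant names are made up
responses = [
    ("Found Cafe", {'candidates': [{'place_id': 'ChIJdTDCTdT4cUgRqxush2XhgnQ'}], 'status': 'OK'}),
    ("Missing Cafe", {'candidates': [], 'status': 'ZERO_RESULTS'}),
]

print(extract_place_ids(responses))
```

Both forms skip the details call when no place_id is available; continue keeps the rest of the loop body at one indentation level, while else keeps the success path visually tied to the try.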

Getting wrong result from JSON - Python 3

I'm working on a small project that retrieves information about books from the Google Books API using Python 3. For this I make a call to the API, read out the variables and store them in a list. For a search like "linkedin" this works perfectly. However, when I enter "Google", it reads the second title from the JSON input. How can this happen?
Please find my code below (Google_Results is the class I use to initialize the variables):
import requests
def Book_Search(search_term):
    parms = {"q": search_term, "maxResults": 3}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    print(r.url)
    results = r.json()
    i = 0
    for result in results["items"]:
        try:
            isbn13 = str(result["volumeInfo"]["industryIdentifiers"][0]["identifier"])
            isbn10 = str(result["volumeInfo"]["industryIdentifiers"][1]["identifier"])
            title = str(result["volumeInfo"]["title"])
            author = str(result["volumeInfo"]["authors"])[2:-2]
            publisher = str(result["volumeInfo"]["publisher"])
            published_date = str(result["volumeInfo"]["publishedDate"])
            description = str(result["volumeInfo"]["description"])
            pages = str(result["volumeInfo"]["pageCount"])
            genre = str(result["volumeInfo"]["categories"])[2:-2]
            language = str(result["volumeInfo"]["language"])
            image_link = str(result["volumeInfo"]["imageLinks"]["thumbnail"])
            dict = Google_Results(isbn13, isbn10, title, author, publisher, published_date, description, pages, genre,
                                  language, image_link)
            gr.append(dict)
            print(gr[i].title)
            i += 1
        except:
            pass
    return
gr = []
Book_Search("Linkedin")
I am a beginner to Python, so any help would be appreciated!
It does so because there is no publisher entry in volumeInfo of the first entry, thus it raises a KeyError and your except captures it. If you're going to work with fuzzy data you have to account for the fact that it will not always have the expected structure. For simple cases you can rely on dict.get() and its default argument to return a 'valid' default entry if an entry is missing.
Also, there are a few conceptual problems with your function: it relies on a global gr, which is bad design; it shadows the built-in dict type; and its bare except captures all exceptions, including the KeyboardInterrupt raised on Ctrl+C, so you cannot interrupt the loop from the keyboard. I'd suggest converting it to something a bit more sane:
def book_search(search_term, max_results=3):
    results = []  # a list to store the results
    parms = {"q": search_term, "maxResults": max_results}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    try:  # just in case the server doesn't return valid JSON
        for result in r.json().get("items", []):
            if "volumeInfo" not in result:  # invalid entry - missing volumeInfo
                continue
            result_dict = {}  # a dictionary to store our discovered fields
            result = result["volumeInfo"]  # all the data we're interested in is in volumeInfo
            isbns = result.get("industryIdentifiers", None)  # capture ISBNs
            if isinstance(isbns, list) and isbns:
                for i, t in enumerate(("isbn10", "isbn13")):
                    if len(isbns) > i and isinstance(isbns[i], dict):
                        result_dict[t] = isbns[i].get("identifier", None)
            result_dict["title"] = result.get("title", None)
            authors = result.get("authors", None)  # capture authors
            if isinstance(authors, list) and authors:
                # join the names instead of slicing str(list) as the original did
                result_dict["author"] = ", ".join(authors)
            result_dict["publisher"] = result.get("publisher", None)
            result_dict["published_date"] = result.get("publishedDate", None)
            result_dict["description"] = result.get("description", None)
            result_dict["pages"] = result.get("pageCount", None)
            genres = result.get("categories", None)  # capture genres
            if isinstance(genres, list) and genres:
                result_dict["genre"] = ", ".join(genres)
            result_dict["language"] = result.get("language", None)
            result_dict["image_link"] = result.get("imageLinks", {}).get("thumbnail", None)
            # make sure Google_Results accepts keyword arguments like title, author...
            # and make them optional as they might not be in the returned result
            gr = Google_Results(**result_dict)
            results.append(gr)  # add it to the results list
    except ValueError:
        return None  # invalid response returned, you may raise an error instead
    return results  # return the results
Then you can easily retrieve as much info as possible for a term:
gr = book_search("Google")
And it will be far more tolerant of data omissions, provided that your Google_Results type makes most of the entries optional.
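The dict.get() fallback that this answer relies on is easy to try on a hand-made record. The sketch below does not call the API: the record is invented, the helper name summarise is hypothetical, and only the key names (title, authors, publisher, imageLinks, thumbnail) are taken from the code above.

```python
def summarise(volume_info):
    """Pick a few fields from a volumeInfo-like dict, tolerating missing keys."""
    authors = volume_info.get("authors") or []
    return {
        "title": volume_info.get("title"),                    # None when absent
        "publisher": volume_info.get("publisher", "unknown"),  # explicit default
        "author": ", ".join(authors),
        # chaining .get() with {} as default avoids a KeyError on nested keys
        "image_link": volume_info.get("imageLinks", {}).get("thumbnail"),
    }


# Invented record that lacks both 'publisher' and 'imageLinks'
record = {"title": "Some Book", "authors": ["A. Writer", "B. Editor"]}
summary = summarise(record)
print(summary)
```

With square-bracket access the missing publisher key would raise a KeyError, which is exactly what the bare except in the question was silently swallowing.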
Following @Coldspeed's recommendation, it became clear that missing information in the JSON response caused the except branch to run. Since I only had a "pass" statement there, it skipped the entire result. I will therefore have to adapt the try/except statements so that errors are handled properly.
Thanks for the help guys!

KeyError and TypeError in my python web scraper

So sorry about this vague and confusing title. But there is no really better way for me to summarize my problem in one sentence.
I was trying to get the student and grade information from a French website. The link is this (http://www.bankexam.fr/resultat/2014/BACCALAUREAT/AMIENS?filiere=BACS)
My code is as follows:
import time
import urllib2
from bs4 import BeautifulSoup
regions = {'R\xc3\xa9sultats Bac Amiens 2014':'/resultat/2014/BACCALAUREAT/AMIENS'}
base_url = 'http://www.bankexam.fr'
tests = {'es':'?filiere=BACES','s':'?filiere=BACS','l':'?filiere=BACL'}
for i in regions:
    for x in tests:
        # create the output file
        output_file = open('/Users/student project/'+ i + '_' + x + '.txt','a')
        time.sleep(2) #compassionate scraping
        section_url = base_url + regions[i] + tests[x] #now goes to the x test page of region i
        request = urllib2.Request(section_url)
        response = urllib2.urlopen(request)
        soup = BeautifulSoup(response,'html.parser')
        content = soup.find('div',id='zone_res')
        for row in content.find_all('tr'):
            if row.td:
                student = row.find_all('td')
                name = student[0].strong.string.encode('utf8').strip()
                try:
                    school = student[1].strong.string.encode('utf8')
                except AttributeError:
                    school = 'NA'
                result = student[2].span.string.encode('utf8')
                output_file.write ('%s|%s|%s\n' % (name,school,result))
        # Find the maximum pages to go through
        if soup.find('div','pagination'):
            import re
            page_info = soup.find('div','pagination')
            pages = []
            for i in page_info.find_all('a',re.compile('elt')):
                try:
                    pages.append(int(i.string.encode('utf8')))
                except ValueError:
                    continue
            max_page = max(pages)
            # Now goes through page 2 to max page
            for i in range(1,max_page):
                page_url = '&p='+str(i)+'#anchor'
                section2_url = section_url+page_url
                request = urllib2.Request(section2_url)
                response = urllib2.urlopen(request)
                soup = BeautifulSoup(response,'html.parser')
                content = soup.find('div',id='zone_res')
                for row in content.find_all('tr'):
                    if row.td:
                        student = row.find_all('td')
                        name = student[0].strong.string.encode('utf8').strip()
                        try:
                            school = student[1].strong.string.encode('utf8')
                        except AttributeError:
                            school = 'NA'
                        result = student[2].span.string.encode('utf8')
                        output_file.write ('%s|%s|%s\n' % (name,school,result))
A little more description about the code:
I created a 'regions' dictionary and 'tests' dictionary because there are 30 other regions I need to collect and I just include one here for showcase. And I'm just interested in the test results of three tests (ES, S, L) and so I created this 'tests' dictionary.
Two errors keep showing up,
one is
KeyError: 2
and the error is linked to line 12,
section_url = base_url + regions[i] + tests[x]
The other is
TypeError: cannot concatenate 'str' and 'int' objects
and this is linked to line 10.
I know there is a lot of information here, and I'm probably not listing the details that matter most. Please let me know what I can do to fix this!
Thanks
The issue is that you're using the variable i in more than one place.
Near the top of the file, you do:
for i in regions:
So, in some places i is expected to be a key into the regions dictionary.
The trouble comes when you use it again later. You do so in two places:
for i in page_info.find_all('a',re.compile('elt')):
And:
for i in range(1,max_page):
The second of these is what is causing your exceptions, as the integer values that get assigned to i don't appear in the regions dict (nor can an integer be added to a string).
I suggest renaming some or all of those variables. Give them meaningful names, if possible (i is perhaps acceptable for an "index" variable, but I'd avoid using it for anything else unless you're code golfing).
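The renaming advice can be tried without any scraping. The sketch below only builds the URLs, giving each loop its own variable name (region, test, page) so the inner loops can no longer clobber the dictionary key. The function name build_urls is made up, the regions dict uses a shortened key, and range(1, max_page) simply mirrors the range used in the question.

```python
regions = {'amiens': '/resultat/2014/BACCALAUREAT/AMIENS'}
tests = {'s': '?filiere=BACS', 'l': '?filiere=BACL'}


def build_urls(base_url, max_page):
    """Return the section URL and its paginated URLs for every region/test pair."""
    urls = []
    for region in regions:            # region is always a key of `regions`
        for test in tests:            # test is always a key of `tests`
            section_url = base_url + regions[region] + tests[test]
            urls.append(section_url)
            for page in range(1, max_page):   # page is an int and stays separate
                urls.append(section_url + '&p=' + str(page))
    return urls


urls = build_urls('http://www.bankexam.fr', 3)
for url in urls:
    print(url)
```

Because page never overwrites region, the lookup regions[region] keeps working on the second pass through the tests loop, which is where the original code hit KeyError: 2.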

Python will not pull values from dictionary properly using for loop

I'm trying to use a dictionary to check, for a given number of listed servers, whether a particular SQL backup succeeded or failed. My problem so far is that when I run this code:
for serverChk in srvrDict['Server']:
it returns the server name as single characters on each new line like:
S
E
R
V
E
R
So in my trial I see "Error connecting to T to check OS version", where T is the first character of the server name. I can't seem to put my finger on it, and all the searching I've done has led me to asking here.
Thanks!
class checkstatus:
    #def getServers(self):
    chkbkpstats = csv.reader(file('c://temp//networkerservers.csv'))
    for row in chkbkpstats:
        srvrDict = {}
        srvrDict['Server'] = row[0]
        srvrDict['Instance'] = row[1]
        print srvrDict
    for serverChk in srvrDict['Server']:
        try:
            c = wmi.WMI(server)
            for os in c.Win32_OperatingSystem():
                osVer = os.caption
        except:
            print 'Error connecting to %s to check OS version' % serverChk
        if '2003' in osVer:
            print 'w2k3'
        if '2008' in osVer:
            print 'w2k8'
I suppose you have stored a string in your dictionary. So the line for serverChk in srvrDict['Server'] translates to for serverChk in yourSavedString. This is why you are getting individual characters. To access individual dictionary items you should do for k,v in srvrDict.iteritems() where k is the key and v is the value.
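The character-by-character behaviour is easy to reproduce in isolation. The snippet below is a small Python 3 illustration with a made-up entry; note that iteritems() from the answer is Python 2 only, and items() is the Python 3 equivalent.

```python
# Made-up entry in the same shape as srvrDict in the question
server_entry = {'Server': 'SERVER', 'Instance': 'INST01'}

# Iterating over the stored string walks through it one character at a time
chars = [item for item in server_entry['Server']]

# Wrapping the value in a list gives a single item: the whole name
names = [item for item in [server_entry['Server']]]

# Iterating over the dictionary itself yields key/value pairs
pairs = [(key, value) for key, value in server_entry.items()]

print(chars)
print(names)
print(pairs)
```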
You are overwriting the Server and Instance values in srvrDict each iteration of your loop through chkbkpstats, not actually generating a sequence of data with an entry for each item in your log file as it appears you expect. You need to make that a list containing dictionaries, which you append to each iteration. You are probably looking for something more like:
chkbkpstats = csv.reader(file('c://temp//networkerservers.csv'))
srvrs = []
for row in chkbkpstats:
    srvrs.append({'Name' : row[0], 'Instance' : row[1]})
for srvr in srvrs:
    try:
        c = wmi.WMI(srvr['Instance'])
    except:
        print 'Error connecting to %s to check OS version' % srvr['Name']
    else:
        osVer = c.Win32_OperatingSystem()[0].Caption
        if '2003' in osVer:
            print 'w2k3'
        elif '2008' in osVer:
            print 'w2k8'
There are a few problems with your code.
First, you create a new srvrDict each time you go through the first for loop, overwriting the value that was stored in this variable the last time. I think what you actually intended to do is the following:
srvrDict = {}
for row in chkbkpstats:
    srvrDict[row[0]] = row[1]
Now srvrDict will contain one entry per server name in chkbkpstats, e.g. {'P1RT04': 'THP06ASU'}, mapping each server name to its instance. (The full version below stores a list of instances per server instead.)
Then, in the second loop, use for serverChk in srvrDict: to iterate over all the keys in the dictionary. However, I'm not sure where the variable server in c = wmi.WMI(server) comes from. If it is what was row[1] in the first loop, then you should use srvrDict[serverChk] to retrieve the value from the dictionary.
This way, the whole procedure would look something like this:
chkbkpstats = csv.reader(file('c://temp//networkerservers.csv'))
srvrDict = {}
for row in chkbkpstats:
    name, instance = row[0], row[1]
    if name not in srvrDict:
        srvrDict[name] = []
    srvrDict[name].append(instance)
for server in srvrDict:
    for instance in srvrDict[server]:
        try:
            c = wmi.WMI(instance)
        except:
            print 'Error connecting to %s to check OS version' % server
        else:
            osVer = c.Win32_OperatingSystem()[0].caption
            if '2003' in osVer:
                print 'w2k3'
            elif '2008' in osVer:
                print 'w2k8'
            else:
                print 'unknown OS'
PS.: I'm not sure what's the return value of c.Win32_OperatingSystem(). [...] Update: Thanks to sr2222 for pointing this out. Code fixed.
Update: Edited the code to allow for one server hosting multiple instances.
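The grouping step (one server name, several instances) can be tested separately from the WMI calls. The Python 3 sketch below reads from an in-memory CSV instead of the real file; the function name load_servers and the sample rows are made up, and dict.setdefault replaces the explicit "if name not in srvrDict" check.

```python
import csv
import io


def load_servers(csv_file):
    """Map each server name (column 0) to the list of its instances (column 1)."""
    servers = {}
    for row in csv.reader(csv_file):
        if len(row) < 2:      # skip blank or incomplete lines
            continue
        name, instance = row[0], row[1]
        servers.setdefault(name, []).append(instance)
    return servers


# In-memory stand-in for networkerservers.csv with invented rows
sample_csv = io.StringIO("P1RT04,THP06ASU\nP1RT04,THP07ASU\nP1RT05,THP08ASU\n")
servers = load_servers(sample_csv)

for server, instances in servers.items():
    for instance in instances:
        print(server, instance)
```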
