Parsing data from JSON with Python

I'm just starting out with Python, and here is what I'm trying to do: I want to access Bing's API to get the URL of the picture of the day. I can load the JSON fine, but I can't parse the data to extract the picture's URL.
Here is my python script:
import urllib, json
url = "http://www.bing.com/HPImageArchive.aspx?format=js&idx=0&n=1&mkt=en-US"
response = urllib.urlopen(url)
data = json.loads(response.read())
print data
print data["images"][3]["url"]
I get this error:
Traceback (most recent call last):
  File "/Users/Robin/PycharmProjects/predictit/api.py", line 9, in <module>
    print data["images"][3]["url"]
IndexError: list index out of range
FYI, here is what the JSON file looks like:
http://jsonviewer.stack.hu/#http://www.bing.com/HPImageArchive.aspx?format=js&idx=0&n=1&mkt=en-US

print data["images"][0]["url"]
There is only one object in the "images" array, so the only valid index is 0.

Since there is only one element in the images list, you should have data['images'][0]['url'].
You can also see that under the "Viewer" tab in the "json viewer" that you linked to.
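For anyone on Python 3, here is a minimal sketch of the same lookup with a guard against an empty list. The sample payload is made up to mirror the shape described in the question (an "images" list holding one object); the real response has more fields, and the helper name first_image_url is only illustrative.

```python
import json

# Hypothetical payload shaped like the response in the question:
# an "images" list holding a single object with a "url" key.
SAMPLE = '{"images": [{"url": "/az/hprichbg/rb/Example_1920x1080.jpg"}]}'


def first_image_url(payload):
    """Return the "url" of the first image, or None if there are no images."""
    data = json.loads(payload)
    images = data.get("images", [])
    if not images:
        return None
    return images[0]["url"]


print(first_image_url(SAMPLE))
```

Checking the list before indexing avoids the IndexError from the question when the API returns fewer images than expected.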


Download and save many PDF files with Python

I am trying to download many PDF files from a website and save them.
import requests
url = "https://jawdah.qcc.abudhabi.ae/en/Registration/QCCServices/Services/Registration/Trade%20Licenses/"+id+".pdf"
r = requests.get(url, stream= TRUE)
for id in range(1,125):
    with open(id+'.pdf',"wb") as pdf:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                pdf.write(chunk)
The first URL of the PDFs is https://jawdah.qcc.abudhabi.ae/en/Registration/QCCServices/Services/Registration/Trade%20Licenses/1.pdf
and the last URL is https://jawdah.qcc.abudhabi.ae/en/Registration/QCCServices/Services/Registration/Trade%20Licenses/125.pdf
I want to download all of these files.
When I execute this code, I get this error:
Traceback (most recent call last):
  File "c:\Users\king-\OneDrive\Bureau\pdfs\pdfs.py", line 6, in <module>
    url = "https://jawdah.qcc.abudhabi.ae/en/Registration/QCCServices/Services/Registration/Trade%20Licenses/"+id+".pdf"
TypeError: can only concatenate str (not "builtin_function_or_method") to str
In the second line
url = "https://jawdah.qcc.abudhabi.ae/en/Registration/QCCServices/Services/Registration/Trade%20Licenses/"+id+".pdf"
you concatenate a str with something named id. At that point id is still Python's built-in function (type id in a Python console to see it), which is why you get the TypeError. In line 4
for id in range(1,125):
you rebind id to a number, which is allowed but not recommended because it shadows the built-in.
Apart from that, you make only a single request instead of one for every file. Try this:
import requests
url = "https://jawdah.qcc.abudhabi.ae/en/Registration/QCCServices/Services/Registration/Trade%20Licenses/{}.pdf"
for num in range(1, 126):
    # One request per file; True (not TRUE) is the Python boolean.
    r = requests.get(url.format(num), stream=True)
    with open('{}.pdf'.format(num), "wb") as pdf:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                pdf.write(chunk)
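If you would rather avoid a third-party dependency, something along these lines may also work. It is only a sketch: it swaps requests for the standard library's urllib.request, the helper names pdf_targets and download_all are made up for illustration, and it assumes the server really serves 1.pdf through 125.pdf.

```python
import shutil
import urllib.request

BASE = ("https://jawdah.qcc.abudhabi.ae/en/Registration/QCCServices/"
        "Services/Registration/Trade%20Licenses/{}.pdf")


def pdf_targets(first=1, last=125):
    """Return (url, filename) pairs for first..last inclusive."""
    return [(BASE.format(num), "{}.pdf".format(num))
            for num in range(first, last + 1)]


def download_all(targets):
    """Fetch every URL and stream it to its file; urlopen raises on HTTP error statuses."""
    for url, filename in targets:
        with urllib.request.urlopen(url) as response, open(filename, "wb") as pdf:
            shutil.copyfileobj(response, pdf)


# Not called here, so nothing is downloaded when the snippet is loaded:
# download_all(pdf_targets())
```

Building the list of targets separately from the download makes it easy to check the URLs and file names before hitting the server.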

How do I parse a JSON file using Python

After making a REST API call and storing the result as a JSON file, the contents of the file look as follows:
["x","y","z"]
I need a Python script that iterates through each item and prints it out.
I have the following snippet of code, which raises an error.
with open('%s/staging_area/get_label.json' % cwd) as data_file:
    data = json.load(data_file)
for item in data:
    print data [item]
Error I am getting is as follows:
Traceback (most recent call last):
  File "Untitled 8.py", line 33, in <module>
    print data [item]
TypeError: list indices must be integers, not unicode
What am I missing? Thank you for your help!
In the line
for item in data:
you set item to be an element of data, but then in the line
print data [item]
you use item as an index, which it is not. Hence the error. There is also no need to use an index since item is already an element of data.
What you can do instead is:
for item in data:
    print(item)
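If you do need the position as well as the element, enumerate() gives you both. A small sketch, using an in-memory string with the same content as the file in the question instead of reading get_label.json:

```python
import json

# Same content as the file described in the question.
data = json.loads('["x", "y", "z"]')

# enumerate() yields (index, element) pairs, so data[index] is valid here.
pairs = list(enumerate(data))
for index, item in pairs:
    print(index, item, data[index] == item)
```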

Getting an error while creating multiple files in Python

I'm creating two files with a Python script: the first is a JSON file and the second is an HTML file. The code below creates the JSON file, but I get an error while creating the HTML file. Could someone help me resolve the issue? I'm new to Python, so I would really appreciate any suggestions.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import json
JsonResponse = '[{"status": "active", "due_date": null, "group": "later", "task_id": 73286}]'
def create(JsonResponse):
    print JsonResponse
    print 'creating new file'
    try:
        jsonFile = 'testFile.json'
        file = open(jsonFile, 'w')
        file.write(JsonResponse)
        file.close()
        with open('testFile.json') as json_data:
            infoFromJson = json.load(json_data)
            print infoFromJson
        htmlReportFile = 'Report.html'
        htmlfile = open(htmlReportFile, 'w')
        htmlfile.write(infoFromJson)
        htmlfile.close()
    except:
        print 'error occured'
        sys.exit(0)
create(JsonResponse)
I used the online Python editor below to execute my code:
https://www.tutorialspoint.com/execute_python_online.php
infoFromJson = json.load(json_data)
Here, json.load() expects json_data to contain valid JSON. The json_data you provided was not valid JSON; it was a simple string (Hello World!), so you got this error:
ValueError: No JSON object could be decoded
Update:
With your current code you should get this error:
TypeError: expected a character buffer object
That's because the content you write to a file needs to be a string, but you are passing a list of dictionaries.
There are two ways to solve this. Replace the line:
htmlfile.write(infoFromJson)
with either this, which converts infoFromJson to a string:
htmlfile.write(str(infoFromJson))
or use the dump utility of the json module, passing the output file object:
json.dump(infoFromJson, htmlfile)
If you remove the try...except statement, you will see the error below:
Traceback (most recent call last):
  File "/Volumes/Ithink/wechatProjects/django_wx_joyme/app/test.py", line 26, in <module>
    create(JsonResponse)
  File "/Volumes/Ithink/wechatProjects/django_wx_joyme/app/test.py", line 22, in create
    htmlfile.write(infoFromJson)
TypeError: expected a string or other character buffer object
The error occurs because htmlfile.write needs a string, but infoFromJson is a list.
So changing htmlfile.write(infoFromJson) to htmlfile.write(str(infoFromJson)) will avoid the error.
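One thing worth knowing: str() on a list of dicts writes Python's repr (for example None rather than null), not JSON. If the report should show JSON text, a Python 3 sketch like the following might be closer to what you want. The helper name write_report and the bare <pre> wrapper are just illustrative choices, and the file goes to a temporary directory so the example has no side effects in your project.

```python
import html
import json
import os
import tempfile


def write_report(json_text, path):
    """Parse json_text, then write it as escaped, pretty-printed JSON in a <pre> block."""
    info = json.loads(json_text)            # list of dicts, as in the question
    pretty = json.dumps(info, indent=2)     # back to a str, which write() accepts
    with open(path, "w") as report:
        report.write("<html><body><pre>")
        report.write(html.escape(pretty))   # escape <, >, & and quotes for HTML
        report.write("</pre></body></html>")
    return info


response = '[{"status": "active", "due_date": null, "group": "later", "task_id": 73286}]'
report_path = os.path.join(tempfile.mkdtemp(), "Report.html")
parsed = write_report(response, report_path)
print(parsed[0]["task_id"])
```

Letting exceptions propagate instead of catching them with a bare except also means the real error message is visible straight away.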

Reading a JSON string in Python: Receiving error " TypeError: string indices must be integers"

I am trying to create a program that gets the current weather using the OpenWeatherMap API. I am new to writing code that retrieves data from the internet.
The error I receive is:
Traceback (most recent call last):
  File "/home/pi/Python Codes/Weather/CurrentTest3.py", line 7, in
    temp_k = [record['temp'] for record in url2 ['main']] #this line should take down the temperature information from the .Json file
  File "/home/pi/Python Codes/Weather/CurrentTest3.py", line 7, in
    temp_k = [record['temp'] for record in url2 ['main']] #this line should take down the temperature information from the .Json file
TypeError: string indices must be integers
I do not understand why I am getting this. My code is below.
from dateutil import parser #imports parser
from pprint import pprint #imports pprint
import requests #imports request
url = requests.get('http://api.openweathermap.org/data/2.5/weather?q=london&APPID=APIKEY') #identifies the url address
pprint(url.json()) #prints .JSON information from address
url2 = url.json() #establishes .Json file as variable
temp_k = [record['temp'] for record in url2 ['main']] #this line should take down the temperature information from the .Json file
print(temp_k) #prints the value for temperature
The value for main in your data is a dict, not a list of dicts. So there is no need to iterate through it; just access the temp value directly.
temp_k = url2['main']['temp']
The problem is with the record['temp'] part of the temp_k line.
This is what each record looks like:
for record in url2['main']:
    print(record)
>> pressure
temp_min
temp_max
temp
humidity
Iterating over a dict yields its keys, so each record is a string, and you are trying to index that string as if it were a dictionary; hence the string indices error. Just change the temp_k line to this:
temp_k = [url2['main'].get('temp')]
>> [272.9]
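To see both points without calling the API, here is a small sketch with a made-up, trimmed-down response. Only the key names and the 272.9 temperature come from the output above; the other values and the kelvin_to_celsius helper are assumptions for illustration.

```python
# Hypothetical, trimmed-down response; only the "main" part matters here.
sample = {
    "main": {
        "temp": 272.9,
        "pressure": 1012,
        "humidity": 80,
        "temp_min": 271.15,
        "temp_max": 274.15,
    }
}


def kelvin_to_celsius(kelvin):
    return kelvin - 273.15


# Iterating over a dict yields its keys, which are plain strings.
keys = sorted(sample["main"])

# Direct lookup; .get() returns None instead of raising if a key is missing.
temp_k = sample["main"].get("temp")
temp_c = None if temp_k is None else round(kelvin_to_celsius(temp_k), 2)
print(keys, temp_k, temp_c)
```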

Python 3.4 - reading data from a webpage

I'm currently trying to learn how to read from a webpage, and have tried the following:
>>> import urllib.request
>>> page = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data = None)
>>> contents = page.read()
>>> lines = contents.split('\n')
This gives the following error:
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    lines = contents.split('\n')
TypeError: Type str doesn't support the buffer API
Now I assumed that reading from a URL would be pretty similar to reading from a text file, and that contents would be of type str. Is this not the case?
When I try >>> contents I can see that contents is just the HTML document, so why doesn't .split('\n') work? How can I make it work?
Please note that I'm splitting at the newline characters so I can print the webpage line by line.
Following the same train of thought, I then tried contents.readlines() which gave this error:
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    contents.readlines()
AttributeError: 'bytes' object has no attribute 'readlines'
Is the webpage stored in some object called 'bytes'?
Can someone explain to me what is happening here? And how to read the webpage properly?
You need to wrap the response in an io.TextIOWrapper() object and specify the encoding used to decode it (UTF-8 is a common default; change it to the page's actual encoding if it differs):
import urllib.request
import io
u = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data = None)
f = io.TextIOWrapper(u,encoding='utf-8')
text = f.read()
Decode the bytes object to produce a string:
lines = contents.decode(encoding="UTF-8").split("\n")
The return type of the read() method is of type bytes. You need to properly decode it to a string before you can use a string method like split. Assuming it is UTF-8 you can use:
s = contents.decode('utf-8')
lines = s.split('\n')
As a general solution you should check the character encoding the server provides in the response to your request and use that.
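A rough sketch of that idea, with the decoding pulled into a small helper so it can be tried without a network connection. The name decode_lines is made up, and falling back to UTF-8 when the server sends no charset is an assumption rather than a rule.

```python
def decode_lines(raw, charset=None):
    """Decode raw bytes with the given charset (UTF-8 if none) and split into lines."""
    text = raw.decode(charset or "utf-8", errors="replace")
    return text.splitlines()


# With a live response, the charset could be taken from the headers, e.g.:
#   page = urllib.request.urlopen(some_url)
#   lines = decode_lines(page.read(), page.headers.get_content_charset())
body = "caf\u00e9\nsecond line\n".encode("latin-1")
print(decode_lines(body, "latin-1"))
```

Using splitlines() rather than split('\n') also handles pages that use \r\n line endings.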
