I have data in Json format availaible on this link:
Json Data
What would be the best way to get this done? I know this could be done by Python but not sure how.
Use urllib module to fetch details from the url.
import urllib
url = "http://www.omdbapi.com/?t=UN%20HOMME%20ID%C3%89AL"
res = urllib.urlopen(url)
print res.code
data = res.read()
Parse data to JSON by json module.
import json
data1 = json.loads(data)
Use xlwt module to create xls file.
data = {"Title":"Un homme idéal","Year":"2015","Rated":"N/A",\
"Released":"18 Mar 2015","Runtime":"97 min","Genre":"Thriller",\
"Director":"Yann Gozlan","Writer":"Yann Gozlan, Guillaume Lemans, Grégoire Vigneron",\
"Actors":"Pierre Niney, Ana Girardot, André Marcon, Valeria Cavalli",\
"Plot":"N/A","Language":"French","Country":"France","Awards":"N/A",\
"Poster":"N/A","Metascore":"N/A","imdbRating":"6.3","imdbVotes":"214",\
"imdbID":"tt4058500","Type":"movie","Response":"True"}
import xlwt
book = xlwt.Workbook(encoding="utf-8")
sheet1 = book.add_sheet("AssetsReport0")
colunm_count = 0
for title, value in data.iteritems():
sheet1.write(0, colunm_count, title)
sheet1.write(1, colunm_count, value)
colunm_count += 1
file_name = "test.xls"%()
book.save(file_name)
Get URL from User.
By Command Line Argument:
Use sys.argv to get arguments passed from the command.
Demo:
import sys
print "Arguments:", sys.argv
Output:
vivek:~/workspace/vtestproject/study$ python polydict.py arg1 arg2 arg3
Arguments: ['polydict.py', 'arg1', 'arg2', 'arg3']
By Raw_input() /input() method
Demo:
>>> url = raw_input("Enter url:-")
Enter url:-www.google.com
>>> url
'www.google.com'
>>>
Note:
Use raw_input() for Python 2.x
Use input for Python 3.x
To get data in Python from an URL (and print it):
import requests
r = requests.get('http://www.omdbapi.com/?t=UN%20HOMME%20ID%C3%89AL')
print(r.text)
To parse a json in Python
import requests
import json
r = requests.get('http://www.omdbapi.com/?t=UN%20HOMME%20ID%C3%89AL')
json.loads(r.text)
You will have a JSON object.
To convert from JSON to tsv you may use tablib.
To create a excel document in Python
you may use openpyxl (more tools at python-excel.org).
Related
This question already has answers here:
How can I parse (read) and use JSON?
(5 answers)
What are the differences between the urllib, urllib2, urllib3 and requests module?
(11 answers)
Closed last month.
I want to dynamically query Google Maps through the Google Directions API. As an example, this request calculates the route from Chicago, IL to Los Angeles, CA via two waypoints in Joplin, MO and Oklahoma City, OK:
http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false
It returns a result in the JSON format.
How can I do this in Python? I want to send such a request, receive the result and parse it.
I recommend using the awesome requests library:
import requests
url = 'http://maps.googleapis.com/maps/api/directions/json'
params = dict(
origin='Chicago,IL',
destination='Los+Angeles,CA',
waypoints='Joplin,MO|Oklahoma+City,OK',
sensor='false'
)
resp = requests.get(url=url, params=params)
data = resp.json() # Check the JSON Response Content documentation below
JSON Response Content: https://requests.readthedocs.io/en/master/user/quickstart/#json-response-content
The requests Python module takes care of both retrieving JSON data and decoding it, due to its builtin JSON decoder. Here is an example taken from the module's documentation:
>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...
So there is no use of having to use some separate module for decoding JSON.
requests has built-in .json() method
import requests
requests.get(url).json()
import urllib
import json
url = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'
result = json.load(urllib.urlopen(url))
Use the requests library, pretty print the results so you can better locate the keys/values you want to extract, and then use nested for loops to parse the data. In the example I extract step by step driving directions.
import json, requests, pprint
url = 'http://maps.googleapis.com/maps/api/directions/json?'
params = dict(
origin='Chicago,IL',
destination='Los+Angeles,CA',
waypoints='Joplin,MO|Oklahoma+City,OK',
sensor='false'
)
data = requests.get(url=url, params=params)
binary = data.content
output = json.loads(binary)
# test to see if the request was valid
#print output['status']
# output all of the results
#pprint.pprint(output)
# step-by-step directions
for route in output['routes']:
for leg in route['legs']:
for step in leg['steps']:
print step['html_instructions']
just import requests and use from json() method :
source = requests.get("url").json()
print(source)
OR you can use this :
import json,urllib.request
data = urllib.request.urlopen("url").read()
output = json.loads(data)
print (output)
Try this:
import requests
import json
# Goole Maps API.
link = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'
# Request data from link as 'str'
data = requests.get(link).text
# convert 'str' to Json
data = json.loads(data)
# Now you can access Json
for i in data['routes'][0]['legs'][0]['steps']:
lattitude = i['start_location']['lat']
longitude = i['start_location']['lng']
print('{}, {}'.format(lattitude, longitude))
Also for pretty Json on console:
json.dumps(response.json(), indent=2)
possible to use dumps with indent. (Please import json)
This question already has answers here:
HTTP requests and JSON parsing in Python [duplicate]
(8 answers)
Closed 8 months ago.
How would I parse a json api response with python?
I currently have this:
import urllib.request
import json
url = 'https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty'
def response(url):
with urllib.request.urlopen(url) as response:
return response.read()
res = response(url)
print(json.loads(res))
I'm getting this error:
TypeError: the JSON object must be str, not 'bytes'
What is the pythonic way to deal with json apis?
Version 1: (do a pip install requests before running the script)
import requests
r = requests.get(url='https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty')
print(r.json())
Version 2: (do a pip install wget before running the script)
import wget
fs = wget.download(url='https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty')
with open(fs, 'r') as f:
content = f.read()
print(content)
you can use standard library python3:
import urllib.request
import json
url = 'http://www.reddit.com/r/all/top/.json'
req = urllib.request.Request(url)
##parsing response
r = urllib.request.urlopen(req).read()
cont = json.loads(r.decode('utf-8'))
counter = 0
##parcing json
for item in cont['data']['children']:
counter += 1
print("Title:", item['data']['title'], "\nComments:", item['data']['num_comments'])
print("----")
##print formated
#print (json.dumps(cont, indent=4, sort_keys=True))
print("Number of titles: ", counter)
output will be like this one:
...
Title: Maybe we shouldn't let grandma decide things anymore.
Comments: 2018
----
Title: Carrie Fisher and Her Stunt Double Sunbathing on the Set of Return of The Jedi, 1982
Comments: 880
----
Title: fidget spinner
Comments: 1537
----
Number of titles: 25
I would usually use the requests package with the json package. The following code should be suitable for your needs:
import requests
import json
url = 'https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty'
r = requests.get(url)
print(json.loads(r.content))
Output
[11008076,
11006915,
11008202,
....,
10997668,
10999859,
11001695]
The only thing missing in the original question is a call to the decode method on the response object (and even then, not for every python3 version). It's a shame no one pointed that out and everyone jumped on a third party library.
Using only the standard library, for the simplest of use cases :
import json
from urllib.request import urlopen
def get(url, object_hook=None):
with urlopen(url) as resource: # 'with' is important to close the resource after use
return json.load(resource, object_hook=object_hook)
Simple use case :
data = get('http://url') # '{ "id": 1, "$key": 13213654 }'
print(data['id']) # 1
print(data['$key']) # 13213654
Or if you prefer, but riskier :
from types import SimpleNamespace
data = get('http://url', lambda o: SimpleNamespace(**o)) # '{ "id": 1, "$key": 13213654 }'
print(data.id) # 1
print(data.$key) # invalid syntax
# though you can still do
print(data.__dict__['$key'])
With Python 3
import requests
import json
url = 'http://IP-Address:8088/ws/v1/cluster/scheduler'
r = requests.get(url)
data = json.loads(r.content.decode())
I'm trying to write a simple script in python using urllib2 and json where I print the json to the console.
Currently I'm having trouble getting the auth_signature value. I already have the url variable setup with the appropriate keys except for the auth_signature. How do I go about this?
Here is what I have:
import json
import urllib2
import oauth2
timestamp = oauth2.generate_timestamp
nonce = oauth2.generate_nonce
url = "http://api.yelp.com/v2/search?term=food&location=Seattle&callback=callbackYelpAuth&oauth_consumer_key=XXX&oauth_consumer_secret=XXX&oauth_token=XXX&oauth_signature_method=HMAC-SHA1&oauth_timestamp=" + str(timestamp) + "&oauth_nonce=" + str(nonce) + "&oauth_signature=" + str(????)
json_obj = urllib2.urlopen(url)
data = json.load(json_obj)
print data
Following code is working fine. Hope it will help you.
import urlparse
url = "http://api.yelp.com/v2/search?term=food&location=Seattle&callback=callbackYelpAuth&oauth_consumer_key=XXX&oauth_consumer_secret=XXX&oauth_token=XXX&oauth_signature_method=HMAC-SHA1&oauth_timestamp=temstamp_value&oauth_nonce=nonce_value&oauth_signature=signature_value"
parsed = urlparse.urlparse(url)
params = urlparse.parse_qsl(parsed.query)
for x,y in params:
print "Parameter = "+x,", Value = "+y
I am using JSON library and trying to import a page feed to an CSV file. Tried many a ways to get the result however every time code execute it Gives JSON not serialzable. No Facebook use auth code which I have and used it so connection string will change however if you use a page which has public privacy you will still be able to get the result from below code.
following is the code
import urllib3
import json
import requests
#from pprint import pprint
import csv
from urllib.request import urlopen
page_id = "abcd" # username or id
api_endpoint = "https://graph.facebook.com"
fb_graph_url = api_endpoint+"/"+page_id
try:
#api_request = urllib3.Requests(fb_graph_url)
#http = urllib3.PoolManager()
#api_response = http.request('GET', fb_graph_url)
api_response = requests.get(fb_graph_url)
try:
#print (list.sort(json.loads(api_response.read())))
obj = open('data', 'w')
# write(json_dat)
f = api_response.content
obj.write(json.dumps(f))
obj.close()
except Exception as ee:
print(ee)
except Exception as e:
print( e)
Tried many approach but not successful. hope some one can help
api_response.content is the text content of the API, not a Python object so you won't be able to dump it.
Try either:
f = api_response.content
obj.write(f)
Or
f = api_response.json()
obj.write(json.dumps(f))
requests.get(fb_graph_url).content
is probably a string. Using json.dumps on it won't work. This function expects a list or a dictionary as the argument.
If the request already returns JSON, just write it to the file.
I have a URL:
http://somewhere.com/relatedqueries?limit=2&query=seedterm
where modifying the inputs, limit and query, will generate wanted data. Limit is the max number of term possible and query is the seed term.
The URL provides text result formatted in this way:
oo.visualization.Query.setResponse({version:'0.5',reqId:'0',status:'ok',sig:'1303596067112929220',table:{cols:[{id:'score',label:'Score',type:'number',pattern:'#,##0.###'},{id:'query',label:'Query',type:'string',pattern:''}],rows:[{c:[{v:0.9894380670262618,f:'0.99'},{v:'newterm1'}]},{c:[{v:0.9894380670262618,f:'0.99'},{v:'newterm2'}]}],p:{'totalResultsCount':'7727'}}});
I'd like to write a python script that takes two arguments (limit number and the query seed), go fetch the data online, parse the result and return a list with the new terms ['newterm1','newterm2'] in this case.
I'd love some help, especially with the URL fetching since I have never done this before.
It sounds like you can break this problem up into several subproblems.
Subproblems
There are a handful of problems that need to be solved before composing the completed script:
Forming the request URL: Creating a configured request URL from a template
Retrieving data: Actually making the request
Unwrapping JSONP: The returned data appears to be JSON wrapped in a JavaScript function call
Traversing the object graph: Navigating through the result to find the desired bits of information
Forming the request URL
This is just simple string formatting.
url_template = 'http://somewhere.com/relatedqueries?limit={limit}&query={seedterm}'
url = url_template.format(limit=2, seedterm='seedterm')
Python 2 Note
You will need to use the string formatting operator (%) here.
url_template = 'http://somewhere.com/relatedqueries?limit=%(limit)d&query=%(seedterm)s'
url = url_template % dict(limit=2, seedterm='seedterm')
Retrieving data
You can use the built-in urllib.request module for this.
import urllib.request
data = urllib.request.urlopen(url) # url from previous section
This returns a file-like object called data. You can also use a with-statement here:
with urllib.request.urlopen(url) as data:
# do processing here
Python 2 Note
Import urllib2 instead of urllib.request.
Unwrapping JSONP
The result you pasted looks like JSONP. Given that the wrapping function that is called (oo.visualization.Query.setResponse) doesn't change, we can simply strip this method call out.
result = data.read()
prefix = 'oo.visualization.Query.setResponse('
suffix = ');'
if result.startswith(prefix) and result.endswith(suffix):
result = result[len(prefix):-len(suffix)]
Parsing JSON
The resulting result string is just JSON data. Parse it with the built-in json module.
import json
result_object = json.loads(result)
Traversing the object graph
Now, you have a result_object that represents the JSON response. The object itself be a dict with keys like version, reqId, and so on. Based on your question, here is what you would need to do to create your list.
# Get the rows in the table, then get the second column's value for
# each row
terms = [row['c'][2]['v'] for row in result_object['table']['rows']]
Putting it all together
#!/usr/bin/env python3
"""A script for retrieving and parsing results from requests to
somewhere.com.
This script works as either a standalone script or as a library. To use
it as a standalone script, run it as `python3 scriptname.py`. To use it
as a library, use the `retrieve_terms` function."""
import urllib.request
import json
import sys
E_OPERATION_ERROR = 1
E_INVALID_PARAMS = 2
def parse_result(result):
"""Parse a JSONP result string and return a list of terms"""
prefix = 'oo.visualization.Query.setResponse('
suffix = ');'
# Strip JSONP function wrapper
if result.startswith(prefix) and result.endswith(suffix):
result = result[len(prefix):-len(suffix)]
# Deserialize JSON to Python objects
result_object = json.loads(result)
# Get the rows in the table, then get the second column's value
# for each row
return [row['c'][2]['v'] for row in result_object['table']['rows']]
def retrieve_terms(limit, seedterm):
"""Retrieves and parses data and returns a list of terms"""
url_template = 'http://somewhere.com/relatedqueries?limit={limit}&query={seedterm}'
url = url_template.format(limit=limit, seedterm=seedterm)
try:
with urllib.request.urlopen(url) as data:
data = perform_request(limit, seedterm)
result = data.read()
except:
print('Could not request data from server', file=sys.stderr)
exit(E_OPERATION_ERROR)
terms = parse_result(result)
print(terms)
def main(limit, seedterm):
"""Retrieves and parses data and prints each term to standard output"""
terms = retrieve_terms(limit, seedterm)
for term in terms:
print(term)
if __name__ == '__main__'
try:
limit = int(sys.argv[1])
seedterm = sys.argv[2]
except:
error_message = '''{} limit seedterm
limit must be an integer'''.format(sys.argv[0])
print(error_message, file=sys.stderr)
exit(2)
exit(main(limit, seedterm))
Python 2.7 version
#!/usr/bin/env python2.7
"""A script for retrieving and parsing results from requests to
somewhere.com.
This script works as either a standalone script or as a library. To use
it as a standalone script, run it as `python2.7 scriptname.py`. To use it
as a library, use the `retrieve_terms` function."""
import urllib2
import json
import sys
E_OPERATION_ERROR = 1
E_INVALID_PARAMS = 2
def parse_result(result):
"""Parse a JSONP result string and return a list of terms"""
prefix = 'oo.visualization.Query.setResponse('
suffix = ');'
# Strip JSONP function wrapper
if result.startswith(prefix) and result.endswith(suffix):
result = result[len(prefix):-len(suffix)]
# Deserialize JSON to Python objects
result_object = json.loads(result)
# Get the rows in the table, then get the second column's value
# for each row
return [row['c'][2]['v'] for row in result_object['table']['rows']]
def retrieve_terms(limit, seedterm):
"""Retrieves and parses data and returns a list of terms"""
url_template = 'http://somewhere.com/relatedqueries?limit=%(limit)d&query=%(seedterm)s'
url = url_template % dict(limit=2, seedterm='seedterm')
try:
with urllib2.urlopen(url) as data:
data = perform_request(limit, seedterm)
result = data.read()
except:
sys.stderr.write('%s\n' % 'Could not request data from server')
exit(E_OPERATION_ERROR)
terms = parse_result(result)
print terms
def main(limit, seedterm):
"""Retrieves and parses data and prints each term to standard output"""
terms = retrieve_terms(limit, seedterm)
for term in terms:
print term
if __name__ == '__main__'
try:
limit = int(sys.argv[1])
seedterm = sys.argv[2]
except:
error_message = '''{} limit seedterm
limit must be an integer'''.format(sys.argv[0])
sys.stderr.write('%s\n' % error_message)
exit(2)
exit(main(limit, seedterm))
i didn't understand well your problem because from your code there it seem to me that you use Visualization API (it's the first time that i hear about it by the way).
But well if you are just searching for a way to fetch data from a web page you could use urllib2 this is just for getting data, and if you want to parse the retrieved data you will have to use a more appropriate library like BeautifulSoop
if you are dealing with another web service (RSS, Atom, RPC) rather than web pages you can find a bunch of python library that you can use and that deal with each service perfectly.
import urllib2
from BeautifulSoup import BeautifulSoup
result = urllib2.urlopen('http://somewhere.com/relatedqueries?limit=%s&query=%s' % (2, 'seedterm'))
htmletxt = resul.read()
result.close()
soup = BeautifulSoup(htmltext, convertEntities="html" )
# you can parse your data now check BeautifulSoup API.