I have the following exercise:
Use the json module. First use urllib2 to download this file, then load the JSON as a Python object and use pprint to make it look good when written to the terminal.
Until now I've only worked with the Python basics (the Codecademy course, things such as lists).
What I understand is that I have to import urllib2 and json, and then somehow use pprint on the result...???
This is what I have done so far, but I'm not sure if I got it right:
import urllib2
response = urllib2.urlopen('https://dl.dropboxusercontent.com/u/153071/test.json')
html = response.read()
import json
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(c) #Just printing a list from earlier in the file, not sure what to print...
You don't need to import pprint; you can specify indentation using the json module itself:
import urllib2
import json
response = urllib2.urlopen('https://dl.dropboxusercontent.com/u/153071/test.json')
content_dict = json.loads(response.read())
print json.dumps(content_dict, indent=4)
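If you're on Python 3, note that urllib2 no longer exists; it was folded into urllib.request. A minimal sketch of the same approach (same URL, assuming it still resolves):
import json
import urllib.request

# urllib2.urlopen became urllib.request.urlopen in Python 3.
response = urllib.request.urlopen('https://dl.dropboxusercontent.com/u/153071/test.json')
content_dict = json.loads(response.read())

# indent=4 pretty-prints without needing pprint.
print(json.dumps(content_dict, indent=4))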
I'm given a URL which returns some JSON text. In the JSON there are URLs for CSV files. I'm trying to parse the JSON from the URL and download the CSV files. I am able to print out the JSON from the URL, but I don't know how to grab the CSV files from within it.
import urllib.request

with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
    s = url.read()
    print(s)
The above prints the JSON from the URL (see the printout below). It contains URLs for CSV files that I then need to download using Python.
{"id":68,"name":"Carrier Rates","state":"complete","user_id":166,"data_set_id":7,"bounding_date":{"id":101,"start_date":"2019-01-01T00:00:00.000-05:00","end_date":"2999-12-31T00:00:00.000-05:00","bounding_field_id":322,"related_id":68,"related_type":"Reports::Report"},"results":[{"id":68,"created_at":"2019-07-26T15:29:40.872-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.1dec2e6d-0c36-44b7-ab26-fd43fe710daf.csv"},{"id":67,"created_at":"2019-07-26T15:29:07.112-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.3b02195e-c0a2-4abe-88f7-27d20ac76e07.csv"},{"id":35,"created_at":"2019-06-26T11:01:26.900-04:00","version_name":"06/26/2019 11:01AM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.a488c58d-5e04-4c28-a429-7167e9e8edaa.csv"},{"id":34,"created_at":"2019-06-26T10:57:51.396-04:00","version_name":"06/26/2019 10:57AM","content":"https://cloudtestlogistics-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.bf73db19-5604-4a1d-bc31-da6cf25742cc.csv"}]}
The following code can help you:
import json
import urllib.request

with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
    s = url.read()

# json.loads, not json.load: s is a bytes object, not a file object.
load_json = json.loads(s)
results = load_json["results"]

# Collect the "content" value of every result entry.
csv_links = []
for result in results:
    csv_links.append(result["content"])
Now you have a list of links to CSV files. Download them using urllib.
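For example, a minimal sketch (assuming the S3 links can be fetched without extra authentication):
import os
import urllib.request

for link in csv_links:
    # Name each local file after the last path segment of its URL.
    urllib.request.urlretrieve(link, os.path.basename(link))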
import json
from collections import namedtuple
#This is your "s" -- data = s
data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'
# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))
print(x.name, x.hometown.name, x.hometown.id)
This approach, from How to convert JSON data into a Python object, loads the JSON into an object. You can then access each value via the key it had in the JSON:
print(x.content)
Of course you'll have to adjust the code to get it to work exactly how you want. I'm not really a Python expert and have nothing to test with, but the idea is to load the JSON into a namedtuple and access the values via their keys.
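For the report JSON above, that might look like this (an untested sketch, reusing the object_hook trick on the s you already read):
import json
from collections import namedtuple

# Parse the report into nested namedtuples; dict keys become attribute names.
report = json.loads(s, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))

# Each entry in results exposes its CSV link as the .content attribute.
for result in report.results:
    print(result.content)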
import json
import urllib.request

with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
    s = url.read()
# assuming here you got that json content
s='{"id":68,"name":"Carrier Rates","state":"complete","user_id":166,"data_set_id":7,"bounding_date":{"id":101,"start_date":"2019-01-01T00:00:00.000-05:00","end_date":"2999-12-31T00:00:00.000-05:00","bounding_field_id":322,"related_id":68,"related_type":"Reports::Report"},"results":[{"id":68,"created_at":"2019-07-26T15:29:40.872-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.1dec2e6d-0c36-44b7-ab26-fd43fe710daf.csv"},{"id":67,"created_at":"2019-07-26T15:29:07.112-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.3b02195e-c0a2-4abe-88f7-27d20ac76e07.csv"},{"id":35,"created_at":"2019-06-26T11:01:26.900-04:00","version_name":"06/26/2019 11:01AM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.a488c58d-5e04-4c28-a429-7167e9e8edaa.csv"},{"id":34,"created_at":"2019-06-26T10:57:51.396-04:00","version_name":"06/26/2019 10:57AM","content":"https://cloudtestlogistics-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.bf73db19-5604-4a1d-bc31-da6cf25742cc.csv"}]}'
d = json.loads(s)
for f in d['results']:
    # manage download here
    csv_url = f['content']
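To fill in the download step, one option (a sketch) is to stream each file to disk inside the loop, naming it after the last segment of its URL:
import shutil

# Inside the loop above: stream each CSV straight to a local file.
with urllib.request.urlopen(csv_url) as resp:
    with open(csv_url.rsplit('/', 1)[-1], 'wb') as out:
        shutil.copyfileobj(resp, out)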
I want to send image data via the urllib library in Python 3.6.
I presently have a Python 2.7 implementation that uses the requests library.
Is there a way to replace the requests lib with urllib in this code?
import requests

def read_file_bytestream(image_path):
    # Read the image file as raw bytes.
    data = open(image_path, 'rb').read()
    return data

if __name__ == "__main__":
    data = read_file_bytestream("testimg.png")
    requests.put("http://0.0.0.0:8080", files={'image': data})
Here is one way, pretty much taken from the docs:
import urllib.request

def read_file_bytestream(image_path):
    data = open(image_path, 'rb').read()
    return data

DATA = read_file_bytestream("file.jpg")
req = urllib.request.Request(url='http://httpbin.org/put', data=DATA, method='PUT')
with urllib.request.urlopen(req) as f:
    pass
print(f.status)
print(f.reason)
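One caveat: requests' files= argument builds a multipart form body, while this sends the raw bytes as the request body. If your server cares about the content type, you can set the header yourself (a sketch, assuming a PNG):
req = urllib.request.Request(url='http://httpbin.org/put', data=DATA,
                             method='PUT', headers={'Content-Type': 'image/png'})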
With Python 3, I want to read an XML web page and save it to my local drive.
Also, if the file already exists, it must be overwritten.
I tried a script like:
import urllib.request
xml = urllib.request.urlopen('URL')
data = xml.read()
file = open("file.xml","wb")
file.writelines(data)
file.close()
But I get an error:
TypeError: a bytes-like object is required, not 'int'
First suggestion: do what even the official urllib docs say and don't use urllib; use requests instead.
Your problem is that you use .writelines(), which expects a list of lines, not a bytes object (.writelines() iterates its argument, and iterating a bytes object yields ints, hence the confusing error message). Use .write() instead:
import requests
resp = requests.get('URL')
with open('file.xml', 'wb') as foutput:
    foutput.write(resp.content)
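If you'd rather stay with urllib, the same .write() fix applies (a minimal sketch):
import urllib.request

# Read the whole response as bytes and write them out in one call.
data = urllib.request.urlopen('URL').read()
with open('file.xml', 'wb') as f:
    f.write(data)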
I found a solution:
from urllib.request import urlopen

# "w" creates the file if needed and truncates it if it already exists.
xml = open("import.xml", "w")
xml.write(urlopen('URL').read().decode('utf-8'))
xml.close()
Thanks for your help.
I've been going through the Q&A on this site for an answer to my question. However, I'm a beginner and I find it difficult to understand some of the solutions. I need a very basic solution.
Could someone please explain a simple solution to 'downloading a file through HTTP' and 'saving it to disk, in Windows'?
I'm not sure how to use the shutil and os modules, either.
The file I want to download is under 500 MB and is a .gz archive file. If someone can also explain how to extract the archive and use the files in it, that would be great!
Here's a partial solution, that I wrote from various answers combined:
import requests
import os
import shutil

global dump

def download_file():
    global dump
    url = "http://randomsite.com/file.gz"
    file = requests.get(url, stream=True)
    dump = file.raw

def save_file():
    global dump
    location = os.path.abspath("D:\folder\file.gz")
    with open("file.gz", 'wb') as location:
        shutil.copyfileobj(dump, location)
    del dump
Could someone point out errors (beginner level) and explain any easier methods to do this?
A clean way to download a file is:
import urllib
testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")
This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.
This example uses the urllib library, and it will directly retrieve the file from a source.
For Python 3+, URLopener is deprecated, and using it gives an error like this:
url_opener = urllib.URLopener()
AttributeError: module 'urllib' has no attribute 'URLopener'
So, try:
import urllib.request
urllib.request.urlretrieve(url, filename)
As mentioned here:
import urllib
# Python 2 only; on Python 3, use urllib.request.urlretrieve instead.
urllib.urlretrieve("http://randomsite.com/file.gz", "file.gz")
EDIT: If you still want to use requests, take a look at this question or this one.
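The question also asked about extracting the archive. A minimal sketch using only the standard library, assuming the .gz is a compressed tarball (for a plain gzipped file, the gzip module alone would do):
import tarfile

# Open the downloaded archive and extract everything into the current directory.
with tarfile.open("file.gz", "r:gz") as tar:
    tar.extractall()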
Four methods using wget, urllib and requests.
#!/usr/bin/python
# Python 2 benchmark: StringIO and urllib.URLopener do not exist in Python 3.
import requests
from StringIO import StringIO
from PIL import Image
import profile
import urllib
import wget

url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')
testRequest - 4469882 function calls (4469842 primitive calls) in 20.236 seconds
testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds
testUrllib - 3810 function calls (3775 primitive calls) in 0.036 seconds
testwget - 3489 function calls in 0.020 seconds
I use wget.
It's a simple and good library. Here is an example:
import wget
file_url = 'http://johndoe.com/download.zip'
file_name = wget.download(file_url)
The wget module supports both Python 2 and Python 3.
Exotic Windows Solution
import subprocess
subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)
import urllib.request
urllib.request.urlretrieve("https://raw.githubusercontent.com/dnishimoto/python-deep-learning/master/list%20iterators%20and%20generators.ipynb", "test.ipynb")
This downloads a single raw Jupyter notebook to a file.
For text files, you can use:
import requests
url = 'https://WEBSITE.com'
req = requests.get(url)
path = "C:\\YOUR\\FILE.html"
with open(path, 'wb') as f:
    f.write(req.content)
I started down this path because ESXi's wget is not compiled with SSL and I wanted to download an OVA from a vendor's website directly onto an ESXi host on the other side of the world.
I had to disable the firewall (lazy) / enable HTTPS out by editing the rules (the proper way).
Then I created this Python script:
import ssl
import shutil
import urllib.request

# Skip certificate verification; fine for a one-off pull, not for production.
context = ssl._create_unverified_context()

dlurl = 'https://somesite/path/whatever'
with urllib.request.urlopen(dlurl, context=context) as response:
    with open("file.ova", 'wb') as tmp_file:
        shutil.copyfileobj(response, tmp_file)
ESXi's libraries are kind of pared down, but the open source weasel installer seemed to use urllib for HTTPS... so it inspired me to go down this path.
Another clean way to save the file is this:
import urllib

# Python 2: urlretrieve, not retrieve; on Python 3 it's urllib.request.urlretrieve.
urllib.urlretrieve("your url goes here", "output.csv")
I want each tweet to be on its own line.
Currently, this breaks at each response (I listed response_1 here; I am using these through response_10).
Any ideas?
#!/usr/bin/env python
import urllib
import json
response_1 = urllib.urlopen("http://search.twitter.com/search.json?q=microsoft&page=1")
for i in response_1:
    print (i, "\n")
You have to parse the JSON into a Python object first; only then can you iterate over it.
#!/usr/bin/env python
import urllib
import json
response_1 = json.loads(urllib.urlopen("http://search.twitter.com/search.json?q=microsoft&page=1").read())
for i in response_1['results']:
    print (i, "\n")
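If you want only the tweet text on each line rather than the whole result object, the old search API (assuming its schema here) keeps it under the 'text' key:
for tweet in response_1['results']:
    print(tweet['text'])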