I have this script, which extracts JSON objects from a web page. The JSON objects are converted into dictionaries. Now I need to write those dictionaries to a file. Here's my code:
#!/usr/bin/python
import requests
r = requests.get('https://github.com/timeline.json')
for item in r.json() or []:
    print(item['repository']['name'])
There are ten lines in the file. I need to write the dictionaries to that file, which consists of ten lines. How do I do that? Thanks.
To address the original question, something like:
with open("pathtomyfile", "w") as f:
    for item in r.json() or []:
        try:
            f.write(item['repository']['name'] + "\n")
        except KeyError:  # you might have to adjust what you are writing accordingly
            pass  # or something else
Note that not every item will be a repository; there are also gist events (and possibly others).
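A minimal sketch of that loop, run on made-up sample data instead of a live request. The `events` list and the helper name `write_repo_names` are hypothetical and only serve to show how items without a repository are skipped:

```python
import io

def write_repo_names(events, out):
    """Write one repository name per line, skipping events without one."""
    written = 0
    for item in events or []:
        try:
            name = item["repository"]["name"]
        except (KeyError, TypeError):
            continue  # e.g. gist events carry no "repository" entry
        out.write(name + "\n")
        written += 1
    return written

# Made-up sample events, not real timeline data.
events = [
    {"type": "PushEvent", "repository": {"name": "demo-repo"}},
    {"type": "GistEvent"},
    {"type": "WatchEvent", "repository": {"name": "other-repo"}},
]

buffer = io.StringIO()  # stands in for an open text file
count = write_repo_names(events, buffer)
```

Any object with a `write` method works as `out`, so the same function can be handed a file opened with `open(path, "w")`.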
Better would be to just save the JSON to a file:
#!/usr/bin/python
import json
import requests
r = requests.get('https://github.com/timeline.json')
with open("yourfilepath.json", "w") as f:
    f.write(json.dumps(r.json()))
Then you can open it:
with open("yourfilepath.json", "r") as f:
    obj = json.loads(f.read())
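The save-and-reload round trip can also be sketched with `json.dump` and `json.load`, which work on the open file directly. The payload and the temporary file path below are made up so the example runs without a network request:

```python
import json
import os
import tempfile

# Made-up payload standing in for the parsed response.
payload = [{"repository": {"name": "demo-repo"}}, {"type": "GistEvent"}]

# Hypothetical location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "timeline.json")

with open(path, "w", encoding="utf-8") as f:
    json.dump(payload, f, indent=2)  # serialize straight into the file

with open(path, "r", encoding="utf-8") as f:
    restored = json.load(f)  # parse straight from the file
```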
The goal is to open a JSON file or website so that I can view earthquake data. I wrote a function that uses a dictionary and a list, but in the terminal an error appears reporting an invalid argument. What is the best way to open a JSON file using Python?
import requests
def earthquake_daily_summary():
    req = requests.get("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson")
    data = req.json() # The .json() function will convert the json data from the server to a dictionary
    # Open json file
    f = open('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson')
    # returns JSON object as a dictionary
    data = json.load(f)
    # Iterating through the json
    # list
    for i in data['emp_details']:
        print(i)
    f.close()
print("\n=========== PROBLEM 5 TESTS ===========")
earthquake_daily_summary()
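You can convert the response to JSON directly and read the data you need. There is no need to call open() on the URL; open() only works on local file paths, which is why it fails here.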
I didn't find the 'emp_details' key, so I replaced it with 'features'.
import requests
def earthquake_daily_summary():
    data = requests.get("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson").json()
    for row in data['features']:
        print(row)
print("\n=========== PROBLEM 5 TESTS ===========")
earthquake_daily_summary()
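Once the response is a dictionary, pulling out individual fields is plain dictionary access. The sketch below runs on a hand-written sample rather than the live feed; the `mag` and `place` keys under `properties` are an assumption about the feed's layout, and `summarize_features` is a hypothetical helper name:

```python
def summarize_features(data):
    """Return (magnitude, place) pairs from a GeoJSON-style dictionary."""
    rows = []
    for feature in data.get("features", []):
        props = feature.get("properties") or {}
        # .get() returns None instead of raising if a key is missing
        rows.append((props.get("mag"), props.get("place")))
    return rows

# Hand-written sample shaped like a GeoJSON FeatureCollection (not real data).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"mag": 1.2, "place": "Somewhere A"}},
        {"type": "Feature", "properties": {"mag": 4.5, "place": "Somewhere B"}},
    ],
}

rows = summarize_features(sample)
```

In real use, `sample` would be replaced by the dictionary returned from `requests.get(url).json()`.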
I am trying to read a JSON file (BioRelEx dataset: https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7) in Python. The JSON file is a list of objects, one per sentence.
This is how I try to do it:
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        for line in data_file.readlines():
            if not line:
                continue
            items = json.loads(line)
            text = items["text"]
            label = items.get("label")
My code is failing on items = json.loads(line). It looks like the data is not formatted as the code expects it to be, but how can I change it?
Thanks in advance for your time!
Best,
Julia
With json.load() you don't need to read each line, you can do either of these:
import json
def open_json(path):
    with open(path, 'r') as file:
        return json.load(file)
data = open_json('./1.0alpha7.dev.json')
Or, even cooler, you can fetch the JSON from GitHub with a GET request:
import json
import requests
url = 'https://github.com/YerevaNN/BioRelEx/releases/download/1.0alpha7/1.0alpha7.dev.json'
response = requests.get(url)
data = response.json()
These will both give the same output. The data variable will be a list of dictionaries that you can iterate over in a for loop for further processing.
Your code is reading one line at a time and parsing each line individually as JSON. Unless the creator of the file created the file in this format (which given it has a .json extension is unlikely) then that won't work, as JSON does not use line breaks to indicate end of an object.
Load the whole file content as JSON instead, then process the resulting items in the array.
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
    for item in data:
        text = item["text"]
        # label appears to be buried in item["interaction"]
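The difference between one JSON array and one JSON object per line can be sketched as follows. This is an illustration under the assumption that the input is either of those two shapes; `load_records` is a hypothetical helper, not part of the dataset's tooling:

```python
import json

def load_records(text):
    """Parse text as one JSON document, falling back to one object per line."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: each non-blank line is its own document.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A single top-level object is wrapped so callers always get a list.
    return data if isinstance(data, list) else [data]

array_text = '[{"text": "a"}, {"text": "b"}]'       # one JSON array
lines_text = '{"text": "a"}\n{"text": "b"}\n'       # one object per line

from_array = load_records(array_text)
from_lines = load_records(lines_text)
```

Both calls yield the same list of dictionaries, so downstream code does not need to care which layout the file used.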
I have a group of .jsonl.gz files.
I can read them using the script:
import json
import gzip
with gzip.open(filepath, "r") as read_file: # file path ends with .jsonl.gz
    try:
        # read gzip file which contains a list of json files (json lines)
        # each json file is a dictionary of nested dictionaries
        json_list = list(read_file)
    except:
        print("fail to read thezip ")
Then I do some processing and get some .json files and store them in a list.
for num, json_file in enumerate(json_list):
    try:
        j_file = json.loads(json_file)
        (...some code...)
    except:
        print("fail")
My question is: what is the right way to write them back into .jsonl.gz?
This is my attempt
jsonfilename = 'valid_' + str(num) + '.jsonl.gz'
with gzip.open(jsonfilename, 'wb') as f:
    for dict in list_of_nested_dictionaries:
        content.append(json.dumps(dict).encode('utf-8'))
    f.write(content)
But I got this error:
TypeError: memoryview: a bytes-like object is required, not 'list'
Then I tried just to gzip the list of dictionaries as is:
jsonfilename = 'valid_' + str(num) + '.jsonl.gz'
with gzip.open(jsonfilename, 'wb') as f:
    f.write(json.dumps(list_of_nested_dictionaries).encode('utf-8'))
But the problem here is that it gzips the whole list as one block, and when I read it back I get one element, which is the whole stored list, rather than a list of JSON documents as I got in the first step.
This is the code that I use for reading:
with gzip.open('valid_3.jsonl.gz', "r") as read_file:
    try:
        json_list = list(read_file) # read zip file
        print(len(json_list)) # I got 1 here
    except:
        print("fail")
json_list[0].decode('utf-8')
f.write(content) takes a byte-string, but you're passing it a list of byte-strings.
f.writelines(content) will iterate over and write each byte-string from the list.
Edit: by the way, gzip is meant for compressing a single file. If you need to compress multiple files into one, I suggest packing them together in a tarball first and then gzipping that.
The solution is simply this:
content = []
with gzip.open(jsonfilename, 'wb') as f:
    for record in list_of_nested_dictionaries:
        # one JSON document per line, encoded to bytes for the binary-mode file
        content.append((json.dumps(record) + '\n').encode('utf-8'))
    f.writelines(content)
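A sketch of the full round trip, using gzip's text modes (`"wt"` and `"rt"`) so that no manual encoding or decoding is needed. The helper names, the sample records, and the temporary path are made up for illustration:

```python
import gzip
import json
import os
import tempfile

def write_jsonl_gz(path, records):
    """Write one JSON document per line into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def read_jsonl_gz(path):
    """Read the file back into a list of dictionaries."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Made-up nested dictionaries standing in for the processed data.
records = [{"id": 1, "meta": {"ok": True}}, {"id": 2, "meta": {"ok": False}}]

path = os.path.join(tempfile.mkdtemp(), "valid_0.jsonl.gz")
write_jsonl_gz(path, records)
restored = read_jsonl_gz(path)
```

Writing line by line also avoids building the whole `content` list in memory first, which may matter for large files.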
I am trying to read a JSON file with Python. This file is described by the authors as not strict JSON. In order to convert it to strict JSON, they suggest this approach:
import gzip
import json
def parse(path):
    g = gzip.open(path, 'r')
    for l in g:
        yield json.dumps(eval(l))
However, not being familiar with Python, I am able to execute the script but not to produce an output file with the new, clean JSON. How should I modify the script in order to produce a new JSON file? I have tried this:
import json
class Amazon():
    def parse(self, inpath, outpath):
        g = open(inpath, 'r')
        out = open(outpath, 'w')
        for l in g:
            yield json.dumps(eval(l), out)
amazon = Amazon()
amazon.parse("original.json", "cleaned.json")
but the output is an empty file. Any help is more than welcome.
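import json
class Amazon():
    def parse(self, inpath, outpath):
        # No yield here: with yield, parse() becomes a generator and its body
        # never runs unless the result is iterated, which left the file empty.
        with open(inpath, 'r') as g, open(outpath, 'w') as fout:
            for l in g:
                # eval() executes arbitrary code; only use it on trusted input
                fout.write(json.dumps(eval(l)) + '\n')
amazon = Amazon()
amazon.parse("original.json", "cleaned.json")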
Another, shorter way of doing this:
import json
class Amazon():
    def parse(self, readpath, writepath):
        with open(readpath) as g, open(writepath, 'w') as fout:
            for l in g:
                json.dump(eval(l), fout)
                fout.write('\n')  # keep one JSON document per line
amazon = Amazon()
amazon.parse("original.json", "cleaned.json")
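The conversions above rely on `eval`, which runs whatever code the input contains. A possible alternative, assuming every line is a plain Python literal such as a dictionary, is `ast.literal_eval`, which only accepts literals and raises an error on anything else. This swaps in a different technique from the one the dataset authors suggest; `convert_lines` and the sample lines are made up for illustration:

```python
import ast
import io
import json

def convert_lines(lines, out):
    """Write each Python-literal line as one strict JSON line."""
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = ast.literal_eval(line.strip())  # literals only, no code execution
        out.write(json.dumps(record) + "\n")
        count += 1
    return count

# Made-up "not strict JSON" input: single-quoted Python dictionaries.
raw = io.StringIO("{'asin': 'X1', 'price': 3.5}\n{'asin': 'X2', 'tags': ['a', 'b']}\n")
cleaned = io.StringIO()
converted = convert_lines(raw, cleaned)
```

With real files, `raw` and `cleaned` would be the objects returned by `open(inpath)` and `open(outpath, "w")`.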
When handling JSON data, it is better to use the json module: json.dump(obj, output_file) to write JSON to a file and json.load(input_file) to read it back. Both take an open file object, not a path. This way the data stays valid JSON when you save and read it.
For very large amounts of data (say, 1k+ records), consider using the pandas module.
I am running Python 3.x. I have been working on some code that fetches currency names from around the world from a currency website. The code is as follows:
def _fetch_currencies():
    import urllib.request
    import json
    f = urllib.request.urlopen('http://openexchangerates.org/api/currencies.json')
    charset = f.info().get_param('charset', 'utf8')
    data = f.read()
    decoded = json.loads(data.decode(charset))
    dumps = json.dumps(decoded, indent=4)
    return dumps
I then need to save it as a file locally, but I am having some issues and can't see where.
Here is the code for saving the currencies:
def save_currencies(_fetch_currencies, filename):
    sorted_currencies = sorted(decoded.items())
    with open(filename, 'w') as my_csv:
        csv_writer = csv.writer(my_csv, delimiter=',')
        csv_writer.writerows(sorted_currencies)
They just don't seem to work together unless I remove the line 'dumps = json.dumps(decoded, indent=4)', but I need that line to be able to print the data as text. How do I keep this line and still be able to both save and print? How do I also pick where it saves?
Any help will be great. Thank you very much to anyone and everyone who answers or reads this.
I may be mistaken, but your "decoded" variable should be declared as global in both functions.
I would actually have _fetch_currencies() return a dictionary, and then I would pass that dictionary on to save_currencies(currencies_decoded, filename). For example:
import csv
def _fetch_currencies():
    import urllib.request
    import json
    f = urllib.request.urlopen('http://openexchangerates.org/api/currencies.json')
    charset = f.info().get_param('charset', 'utf8')
    data = f.read()
    decoded = json.loads(data.decode(charset))
    return decoded
def save_currencies(currencies_decoded, filename):
    sorted_currencies = sorted(currencies_decoded.items())
    # newline='' lets the csv module control line endings itself
    with open(filename, 'w', newline='') as my_csv:
        csv_writer = csv.writer(my_csv, delimiter=',')
        csv_writer.writerows(sorted_currencies)
my_currencies_decoded = _fetch_currencies()
save_currencies(my_currencies_decoded, "filename.csv")
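Returning the dictionary still leaves room for the pretty-printed text: the string for printing can be produced from the same dictionary whenever it is needed. A minimal sketch on made-up sample data (no network request), where `write_currency_rows` is a hypothetical helper and an in-memory buffer stands in for the CSV file:

```python
import csv
import io
import json

# Made-up sample standing in for the decoded API response.
currencies = {"USD": "United States Dollar", "EUR": "Euro", "AMD": "Armenian Dram"}

# Text form for printing, built from the dictionary on demand.
pretty = json.dumps(currencies, indent=4, sort_keys=True)

def write_currency_rows(mapping, out):
    """Write sorted (code, name) pairs as CSV rows."""
    writer = csv.writer(out, delimiter=",", lineterminator="\n")
    writer.writerows(sorted(mapping.items()))

buffer = io.StringIO()  # stands in for a file opened with newline=''
write_currency_rows(currencies, buffer)
```

Keeping the dictionary as the single source and deriving both outputs from it avoids having to parse the pretty-printed string back into data.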
Furthermore, if you would like to save your CSV file to a certain location in your filesystem, you can import os and use the os.path.join() function to build the path to the file, ending with the file name. For example, to save your .csv file in a directory "Documents/Location/Here" below the current working directory, you can do:
import csv
import os
def save_currencies(currencies_decoded, filename):
    sorted_currencies = sorted(currencies_decoded.items())
    # the joined path must end with the file name, not just the directory
    with open(os.path.join("Documents", "Location", "Here", filename), 'w') as my_csv:
        csv_writer = csv.writer(my_csv, delimiter=',')
        csv_writer.writerows(sorted_currencies)
You can also use a shorter relative path: if you're already in the directory "Documents" and you'd like to save a file in "Documents/Location/Here", you can instead just say:
with open(os.path.join("Location", "Here", filename), 'w') as my_csv:
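The path-building step can be sketched on its own. Note that `open()` does not create missing directories, so the sketch creates them first with `os.makedirs`. The helper name `build_output_path`, the file name, and the temporary base directory are assumptions made for illustration:

```python
import os
import tempfile

def build_output_path(base_dir, filename, *subdirs):
    """Join base_dir, optional subdirectories, and filename; create the directories."""
    directory = os.path.join(base_dir, *subdirs)
    os.makedirs(directory, exist_ok=True)  # no error if it already exists
    return os.path.join(directory, filename)

base = tempfile.mkdtemp()  # stands in for a real base directory
target = build_output_path(base, "currencies.csv", "Location", "Here")

with open(target, "w", newline="") as my_csv:
    my_csv.write("USD,United States Dollar\n")
```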