I am trying to change the format in which a file is written. The file is currently in a generic text format. I am adding headers and rewriting it so that another program can later read it directly into a pandas DataFrame or an xarray DataArray. To do this, I am trying to make the columns in the file more clearly separated than they are now. I have the following code:
def cvrtpa(fle,separation_character,(labels)):
    import numpy as np
    import pandas as pd
    labels[-1]=labels[-1]+'\n'
    flabels=labels
    finlabel = np.array([flabel + separation_character for flabel in flabels])
    infile=open(fle,'r')
    templines=infile.readlines()
    vardict = {} #dict version
    for i in finlabel:
        for j in range(len(templines)):
            split=templines[j]
            x = split.split()
            vardict.setdefault(finlabel[i],[]).append(x)
    infile.close()
    outfile=open(fle, 'w')
    outfile.write(finlabel)
    outfile.write(temp)
    outfile.close()
    mfile=open(fle,'r')
    data=mfile.readlines()
    return data
fl='.../Summer 2016/Data/Test/Test'
label=['Year','Month','Day','Hour','Minute','Precipitation']
xx=cvrtpa(fl,'',label)
I am not overly familiar with dictionaries and so it has been a bit difficult to come up with the code I have. I know there may be inconsistencies/errors.
import json
# Create some random data structure
animals = zip(['dogs', 'cats', 'mice'], [124156, 858532, 812885])
data = {k:{v: {k: [v, v, v, k, v, {k: [k, k, k, k]}]}} for k, v in animals}
# Create a new file, dump the data to it
with open('filename.ext', 'w+') as file:
    json.dump(data, file, indent=4, sort_keys=False)
# Open the same file, load it back as a new variable
with open('filename.ext') as file:
    new_dictionary = json.load(file)
# Make some changes to the dict
new_dictionary['new_key'] = 'hello python'
# Open the file back up again and rewrite the new data
with open('filename.ext', 'w+') as file:
    json.dump(new_dictionary, file, indent=4)
I need help with improving my script's execution time.
It does what it is supposed to do:
Reads a file line by line
Matches the line with the content of json file
Writes both the matching lines with the corresponding information from json file into a new txt file
The problem is with execution time, the file has more than 500,000 lines and the json file contains much more.
How can I optimize this script?
import json
import time
start = time.time()
print(start)
JsonFile=open('categories.json')
data = json.load(JsonFile)
Annotated_Data={}
FileList = [line.rstrip('\n') for line in open("FilesNamesID.txt")]
for File in FileList:
    for key, value in data.items():
        if File == key:
            Annotated_Data[key]=(value)
with open('Annotated_Files.txt', 'w') as outfile:
    json.dump(Annotated_Data, outfile, indent=4)
end = time.time()
print(end - start)
There is no need for the nested for loop to look up the File in data. You could replace it with the following code:
for File in FileList:
    if File in data:
        Annotated_Data[File]=data[File]
or with a comprehension:
Annotated_Data = {File: data[File] for File in FileList if File in data}
You can also avoid copying the contents of the whole FilesNamesID.txt to the new list - you are consuming it line by line anyway - but it would be a relatively minor improvement.
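The point about not building an intermediate list can be sketched roughly as follows. This is only an illustration: the helper name annotate and the in-memory stand-ins for categories.json and FilesNamesID.txt are assumptions, not part of the original script.

```python
import io
import json


def annotate(lines, data):
    """Return {name: data[name]} for every line whose text is a key of data."""
    annotated = {}
    for line in lines:  # works on an open file object or any iterable of strings
        name = line.rstrip('\n')
        if name in data:  # dict membership is a hash lookup, not a scan
            annotated[name] = data[name]
    return annotated


# Hypothetical stand-ins for categories.json and FilesNamesID.txt
data = json.loads('{"file1": "data1", "file2": "data2", "file3": "data3"}')
names = io.StringIO("file1\nfile3\nfile9\n")

result = annotate(names, data)
print(json.dumps(result, sort_keys=True))
```

With real files, the same function could be called as annotate(open("FilesNamesID.txt"), data), so the names are consumed one line at a time.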
I don't know the exact format of your data, but you could try to speed up your script by using set():
json_data = '''
{
"file1": "data1",
"file2": "data2",
"file3": "data3"
}
'''
filenames_id_txt = '''
file1
file3
'''
import json
data = json.loads(json_data)
lines = [l.strip() for l in filenames_id_txt.splitlines() if l.strip()]
s = set(data.keys())
Annotated_Data = {k: data[k] for k in s.intersection(lines)}
print(json.dumps(Annotated_Data))
Prints:
{"file3": "data3", "file1": "data1"}
EDIT: If I understand your question correctly, you want to find the "intersection" between your JSON data and the lines in your TXT file.
I chose set() to store the JSON keys (a set is a collection of unique elements). Sets have fast membership-based methods; one of them is intersection(), which accepts other iterables (e.g. the lines from the TXT file) and returns a new set containing the common elements.
I use this new set to construct a new dictionary and output it as JSON.
I am trying to execute the code below from the command line, but it throws an error, whereas the same code runs in a Jupyter notebook. I am not sure what is going wrong. The Python version is 2 on both platforms. The code takes a JSON file as input, picks the 'Data' key and writes all the values under it to a CSV file.
command line :
Python version : 2.6.6
$ python Parser.py /data/csdb/stage/fundapiresponse.json /data/csp53/csdb/stage/fundresponse.csv
File "Parser.py", line 27
flat_data = [{k:v for j in i for k, v in j.items()} for i in zip(*[element['Data'] for key in element])]
^
SyntaxError: invalid syntax
Code:
#################################################
# importing libraries
#################################################
import csv
import json
import collections
import sys
#################################################
# Reading input and output file from command line
#################################################
infile = sys.argv[1]
outfile = sys.argv[2]
print infile
print outfile
#################################################
# Read JSON and build CSV layout
#################################################
with open(infile,'r') as f:
    data= json.load(f)
with open(outfile, 'w') as f:
    for element in data:
        flat_data = [{k:v for j in i for k, v in j.items()} for i in zip(*[element['Data'] for key in element])]
        csvwriter = DictWriter(f,flat_data[0].keys(),lineterminator='\n')
        csvwriter.writerows(flat_data)
Jupyter notebook :
Python 2
#################################################
# importing libraries
#################################################
import csv
import json
import collections
import sys
#################################################
# Reading input and output file from command line
#################################################
infile = 'fundapiresponse.json'
outfile = 'fundresponse.csv'
print infile
print outfile
#################################################
# Read JSON and build CSV layout
#################################################
with open(infile,'r') as f:
    data= json.load(f)
with open(outfile, 'w') as f:
    for element in data:
        flat_data = [{k:v for j in i for k, v in j.items()} for i in zip(*[element['Data'] for key in element])]
        csvwriter = DictWriter(f,flat_data[0].keys(),lineterminator='\n')
        csvwriter.writerows(flat_data)
Dict comprehensions (PEP 274), with curly brackets and colon notation, were only introduced in Python 2.7. Before that, you had to use the dict constructor with an appropriate list or generator of pairs:
dict((k, v) for j in i for k, v in j.iteritems()) # items works, too
See also Alternative to dict comprehension prior to Python 2.7.
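To make the equivalence concrete, here is a small comparison of the two spellings. The sample rows value is invented for illustration; the block itself is meant to be run on Python 2.7 or later so that both forms parse, while only the dict() form would be accepted by Python 2.6.

```python
# Hypothetical sample: each element of rows is a tuple of small dicts
rows = [({'a': 1}, {'b': 2}), ({'a': 3}, {'b': 4})]

# Dict comprehension (Python 2.7+ and 3.x only)
flat_new = [{k: v for j in i for k, v in j.items()} for i in rows]

# Same result with dict() and a generator of (key, value) pairs;
# this spelling is also valid syntax on Python 2.6
flat_old = [dict((k, v) for j in i for k, v in j.items()) for i in rows]

print(flat_old == flat_new)
```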
I need a solution to sort my file like the following:
Super:1,4,6
Superboy:2,4,9
My file at the moment looks like this:
Super:1
Super:4
Super:6
I need help keeping track of the scores that each member of the class obtains in the quiz. There are three classes in the school, and the data needs to be kept separately for each class.
My code is below:
className = className +(".txt")#This adds .txt to the end of the file so the user is able to create a file under the name of their chosen name.
file = open(className , 'a') #opens the file in 'append' mode so you don't delete all the information
name = (name)
file.write(str(name + " : " )) #writes the information to the file
file.write(str(score))
file.write('\n')
file.close() #safely closes the file to save the information
You can use a dict to group the data, in particular a collections.OrderedDict to keep the order the names are seen in the original file:
from collections import OrderedDict
with open("class.txt") as f:
    od = OrderedDict()
    for line in f:
        # n = name, s = score
        n,s = line.rstrip().split(":")
        # if n in dict append score to list
        # or create key/value pairing and append
        od.setdefault(n, []).append(s)
It is just a matter of writing the dict keys and values to a file to get the output you want using the csv module to give you nice comma separated output.
from collections import OrderedDict
import csv
with open("class.txt") as f, open("whatever.txt","w") as out:
    od = OrderedDict()
    for line in f:
        n,s = line.rstrip().split(":")
        od.setdefault(n, []).append(s)
    wr = csv.writer(out)
    wr.writerows([k]+v for k,v in od.items())
If you want to update the original files, you can write to a tempfile.NamedTemporaryFile and replace the original with the updated using shutil.move:
from collections import OrderedDict
import csv
from tempfile import NamedTemporaryFile
from shutil import move
with open("class.txt") as f, NamedTemporaryFile("w",dir=".",delete=False) as out:
    od = OrderedDict()
    for line in f:
        n, s = line.rstrip().split(":")
        od.setdefault(n, []).append(s)
    wr = csv.writer(out)
    wr.writerows([k]+v for k,v in od.items())
# replace original file
move(out.name,"class.txt")
If you have more than one class just use a loop:
classes = ["foocls","barcls","foobarcls"]
for cls in classes:
    with open("{}.txt".format(cls)) as f, NamedTemporaryFile("w",dir=".",delete=False) as out:
        od = OrderedDict()
        for line in f:
            n, s = line.rstrip().split(":")
            od.setdefault(n, []).append(s)
        wr = csv.writer(out)
        wr.writerows([k]+v for k,v in od.items())
    move(out.name,"{}.txt".format(cls))
I'll provide some pseudocode to help you out.
First your data structure should look like this:
data = {'name': [score1, score2, score3]}
Then the logic you should follow should be something like this:
Read the file line-by-line
    if name is already in dict:
        append score to list. example: data[name].append(score)
    if name is not in dict:
        create new dict entry. example: data[name] = [score]
Iterate over dictionary and write each line to file
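The pseudocode above could be turned into runnable Python along these lines. This is a sketch under assumptions: the function names group_scores and format_scores are invented, the input is simulated with an in-memory file, and it relies on dicts preserving insertion order (Python 3.7+).

```python
import io


def group_scores(lines):
    """Collect scores per name from 'name:score' lines."""
    data = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, score = line.split(':')
        name, score = name.strip(), score.strip()
        if name in data:
            data[name].append(score)   # name already seen: extend its list
        else:
            data[name] = [score]       # first score for this name
    return data


def format_scores(data):
    """Build one 'name:score1,score2,...' line per name."""
    return ['{}:{}'.format(name, ','.join(scores)) for name, scores in data.items()]


# Hypothetical file contents, one score per line
source = io.StringIO("Super:1\nSuper:4\nSuperboy:2\nSuper:6\nSuperboy:4\nSuperboy:9\n")
grouped = group_scores(source)
for row in format_scores(grouped):
    print(row)
```

To write the result back, each formatted row would be written to the output file followed by a newline.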
I have a problem with changing a dict value and saving the dict to a text file (the format must stay the same). I only want to change the member_phone field.
My text file is the following format:
memberID:member_name:member_email:member_phone
and I split the text file with:
mdict={}
for line in file:
    x=line.split(':')
    a=x[0]
    b=x[1]
    c=x[2]
    d=x[3]
    e=b+':'+c+':'+d
    mdict[a]=e
When I try to change the member_phone stored in d, the new value does not end up associated with the right key:
def change(mdict,b,c,d,e):
    a=input('ID')
    if a in mdict:
        d= str(input('phone'))
        mdict[a]=b+':'+c+':'+d
    else:
        print('not')
Also, how can I save the dict to a text file in the same format?
Python has the pickle module just for this kind of thing.
These functions are all that you need for saving and loading almost any object:
import pickle
with open('saved_dictionary.pkl', 'wb') as f:
    pickle.dump(dictionary, f)
with open('saved_dictionary.pkl', 'rb') as f:
    loaded_dict = pickle.load(f)
To save collections of Python objects, there is also the shelve module.
Pickle is probably the best option, but in case anyone wonders how to save and load a dictionary to a file using NumPy:
import numpy as np
# Save
dictionary = {'hello':'world'}
np.save('my_file.npy', dictionary)
# Load
read_dictionary = np.load('my_file.npy', allow_pickle=True).item()
print(read_dictionary['hello']) # displays "world"
FYI: NPY file viewer
We can also use the json module in the case when dictionaries or some other data can be easily mapped to JSON format.
import json
# Serialize data into file:
json.dump( data, open( "file_name.json", 'w' ) )
# Read data from file:
data = json.load( open( "file_name.json" ) )
This solution brings several benefits: it works unchanged on Python 2.x and Python 3.x, and data saved in JSON format can be easily transferred between many different platforms or programs. The data is also human-readable.
Save and load dict to file:
def save_dict_to_file(dic):
    f = open('dict.txt','w')
    f.write(str(dic))
    f.close()
def load_dict_from_file():
    f = open('dict.txt','r')
    data=f.read()
    f.close()
    return eval(data)
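Since eval() will execute whatever expression the file contains, a more cautious variant of the same idea is to parse the text with ast.literal_eval from the standard library, which only accepts Python literals. The sketch below is an adaptation, not the answer's own code: the path parameter and the temporary file location are assumptions added for illustration.

```python
import ast
import os
import tempfile


def save_dict_to_file(dic, path):
    with open(path, 'w') as f:
        f.write(repr(dic))


def load_dict_from_file(path):
    with open(path, 'r') as f:
        # literal_eval only accepts literals (dicts, lists, strings, numbers, ...)
        # and raises an exception on anything else instead of executing it
        return ast.literal_eval(f.read())


# Hypothetical round trip through a temporary file
path = os.path.join(tempfile.mkdtemp(), 'dict.txt')
original = {'key1': 'keyinfo', 'numbers': [1, 2, 3]}
save_dict_to_file(original, path)
restored = load_dict_from_file(path)
print(restored == original)
```

This only works for dictionaries whose keys and values are themselves literals; arbitrary objects would still need pickle or a custom format.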
As Pickle has some security concerns and is slow (source), I would go for JSON, as it is fast, built-in, human-readable, and interchangeable:
import json
data = {'another_dict': {'a': 0, 'b': 1}, 'a_list': [0, 1, 2, 3]}
# e.g. file = './data.json'
with open(file, 'w') as f:
    json.dump(data, f)
Reading is similarly easy:
with open(file, 'r') as f:
    data = json.load(f)
This is similar to this answer, but implements the file handling correctly.
If the performance improvement is still not enough, I highly recommend orjson, a fast, correct JSON library for Python built on Rust.
I'm not sure what your first question is, but if you want to save a dictionary to a file you should use the json library. Look up the documentation of the dump and load functions (and their string counterparts, dumps and loads).
I would suggest saving your data in JSON format instead of pickle format: JSON files are human-readable, which makes debugging easier since your data is small, and they can also be read and written by other programs.
The json module is part of Python's standard library, so there is nothing to install; just import it:
import json
# To save the dictionary into a file:
json.dump( data, open( "myfile.json", 'w' ) )
This creates a JSON file named myfile.json.
# To read data from file:
data = json.load( open( "myfile.json" ) )
This reads and stores the myfile.json data in a data object.
For a dictionary of strings such as the one you're dealing with, it could be done using only Python's built-in text processing capabilities.
(Note this wouldn't work if the values are something else.)
with open('members.txt') as file:
    mdict={}
    for line in file:
        a, b, c, d = line.strip().split(':')
        mdict[a] = b + ':' + c + ':' + d
a = input('ID: ')
if a not in mdict:
    print('ID {} not found'.format(a))
else:
    b, c, d = mdict[a].split(':')
    d = input('phone: ')
    mdict[a] = b + ':' + c + ':' + d # update entry
    with open('members.txt', 'w') as file: # rewrite file
        for id, values in mdict.items():
            file.write(':'.join([id] + values.split(':')) + '\n')
I like using the pretty print module to store the dict in a very user-friendly readable form:
import pprint
def store_dict(fname, dic):
    with open(fname, "w") as f:
        f.write(pprint.pformat(dic, indent=4, sort_dicts=False))
        # note some of the defaults are: indent=1, sort_dicts=True
Then, when recovering, read in the text file and eval() it to turn the string back into a dict:
def load_file(fname):
    try:
        with open(fname, "r") as f:
            dic = eval(f.read())
    except:
        dic = {}
    return dic
Unless you really want to keep the dictionary, I think the best solution is to use the csv Python module to read the file.
Then, you get rows of data and you can change member_phone or whatever you want ;
finally, you can use the csv module again to save the file in the same format
as you opened it.
Code for reading:
import csv
with open("my_input_file.txt", "r") as f:
    reader = csv.reader(f, delimiter=":")
    lines = list(reader)
Code for writing:
with open("my_output_file.txt", "w") as f:
    writer = csv.writer(f, delimiter=":")
    writer.writerows(lines)
Of course, you need to adapt your change() function:
def change(lines):
    a = input('ID')
    for line in lines:
        if line[0] == a:
            d=str(input("phone"))
            line[3]=d
            break
    else:
        # for/else: runs only if the loop finished without a break (ID not found)
        print("not")
I haven't timed it but I bet h5 is faster than pickle; the filesize with compression is almost certainly smaller.
import deepdish as dd
dd.io.save(filename, {'dict1': dict1, 'dict2': dict2}, compression=('blosc', 9))
file_name = open("data.json", "w")
json.dump(test_response, file_name)
file_name.close()
or use context manager, which is better:
with open("data.json", "w") as file_name:
    json.dump(test_response, file_name)
I'm used to bringing data in and out of Python using CSV files, but there are obvious challenges to this. Are there simple ways to store a dictionary (or sets of dictionaries) in a JSON or pickle file?
For example:
data = {}
data ['key1'] = "keyinfo"
data ['key2'] = "keyinfo2"
I would like to know both how to save this, and then how to load it back in.
Pickle save:
try:
    import cPickle as pickle
except ImportError: # Python 3.x
    import pickle
with open('data.p', 'wb') as fp:
    pickle.dump(data, fp, protocol=pickle.HIGHEST_PROTOCOL)
See the pickle module documentation for additional information regarding the protocol argument.
Pickle load:
with open('data.p', 'rb') as fp:
    data = pickle.load(fp)
JSON save:
import json
with open('data.json', 'w') as fp:
    json.dump(data, fp)
Supply extra arguments, like sort_keys or indent, to get a pretty result. The argument sort_keys will sort the keys alphabetically and indent will indent your data structure with indent=N spaces.
json.dump(data, fp, sort_keys=True, indent=4)
JSON load:
with open('data.json', 'r') as fp:
    data = json.load(fp)
Minimal example, writing directly to a file:
import json
json.dump(data, open(filename, 'w'))  # text mode: json.dump writes str, so 'wb' fails on Python 3
data = json.load(open(filename))
or safely opening / closing:
import json
with open(filename, 'w') as outfile:
    json.dump(data, outfile)
with open(filename) as infile:
    data = json.load(infile)
If you want to save it in a string instead of a file:
import json
json_str = json.dumps(data)
data = json.loads(json_str)
Also see the faster ujson package:
import ujson
with open('data.json', 'w') as fp:
    ujson.dump(data, fp)
To write to a file:
import json
myfile.write(json.dumps(mydict))
To read from a file:
import json
mydict = json.loads(myfile.read())
myfile is the file object for the file that you stored the dict in.
If you want an alternative to pickle or json, you can use klepto.
>>> init = {'y': 2, 'x': 1, 'z': 3}
>>> import klepto
>>> cache = klepto.archives.file_archive('memo', init, serialized=False)
>>> cache
{'y': 2, 'x': 1, 'z': 3}
>>>
>>> # dump dictionary to the file 'memo.py'
>>> cache.dump()
>>>
>>> # import from 'memo.py'
>>> from memo import memo
>>> print memo
{'y': 2, 'x': 1, 'z': 3}
With klepto, if you had used serialized=True, the dictionary would have been written to memo.pkl as a pickled dictionary instead of with clear text.
You can get klepto here: https://github.com/uqfoundation/klepto
dill is probably a better choice for pickling than pickle itself, as dill can serialize almost anything in Python. klepto can also use dill.
You can get dill here: https://github.com/uqfoundation/dill
The additional mumbo-jumbo on the first few lines are because klepto can be configured to store dictionaries to a file, to a directory context, or to a SQL database. The API is the same for whatever you choose as the backend archive. It gives you an "archivable" dictionary with which you can use load and dump to interact with the archive.
If you're after serialization, but won't need the data in other programs, I strongly recommend the shelve module. Think of it as a persistent dictionary.
import shelve
myData = shelve.open('/path/to/file')
# Check for values.
keyVar in myData
# Set values
myData[anotherKey] = someValue
# Save the data for future use.
myData.close()
For completeness, we should include ConfigParser and configparser which are part of the standard library in Python 2 and 3, respectively. This module reads and writes to a config/ini file and (at least in Python 3) behaves in a lot of ways like a dictionary. It has the added benefit that you can store multiple dictionaries into separate sections of your config/ini file and recall them. Sweet!
Python 2.7.x example.
import ConfigParser
config = ConfigParser.ConfigParser()
dict1 = {'key1':'keyinfo', 'key2':'keyinfo2'}
dict2 = {'k1':'hot', 'k2':'cross', 'k3':'buns'}
dict3 = {'x':1, 'y':2, 'z':3}
# Make each dictionary a separate section in the configuration
config.add_section('dict1')
for key in dict1.keys():
    config.set('dict1', key, dict1[key])
config.add_section('dict2')
for key in dict2.keys():
    config.set('dict2', key, dict2[key])
config.add_section('dict3')
for key in dict3.keys():
    config.set('dict3', key, dict3[key])
# Save the configuration to a file
f = open('config.ini', 'w')
config.write(f)
f.close()
# Read the configuration from a file
config2 = ConfigParser.ConfigParser()
config2.read('config.ini')
dictA = {}
for item in config2.items('dict1'):
    dictA[item[0]] = item[1]
dictB = {}
for item in config2.items('dict2'):
    dictB[item[0]] = item[1]
dictC = {}
for item in config2.items('dict3'):
    dictC[item[0]] = item[1]
print(dictA)
print(dictB)
print(dictC)
Python 3.X example.
import configparser
config = configparser.ConfigParser()
dict1 = {'key1':'keyinfo', 'key2':'keyinfo2'}
dict2 = {'k1':'hot', 'k2':'cross', 'k3':'buns'}
dict3 = {'x':1, 'y':2, 'z':3}
# Make each dictionary a separate section in the configuration
config['dict1'] = dict1
config['dict2'] = dict2
config['dict3'] = dict3
# Save the configuration to a file
f = open('config.ini', 'w')
config.write(f)
f.close()
# Read the configuration from a file
config2 = configparser.ConfigParser()
config2.read('config.ini')
# ConfigParser objects are a lot like dictionaries, but if you really
# want a dictionary you can ask it to convert a section to a dictionary
dictA = dict(config2['dict1'] )
dictB = dict(config2['dict2'] )
dictC = dict(config2['dict3'])
print(dictA)
print(dictB)
print(dictC)
Console output
{'key2': 'keyinfo2', 'key1': 'keyinfo'}
{'k1': 'hot', 'k2': 'cross', 'k3': 'buns'}
{'z': '3', 'y': '2', 'x': '1'}
Contents of config.ini
[dict1]
key2 = keyinfo2
key1 = keyinfo
[dict2]
k1 = hot
k2 = cross
k3 = buns
[dict3]
z = 3
y = 2
x = 1
If you want to save to a JSON file, a simple way of doing this is:
import json
with open("file.json", "wb") as f:
    # my_dict is the dictionary to save (avoid naming it dict, which shadows the built-in)
    f.write(json.dumps(my_dict).encode("utf-8"))
My use case was to save multiple JSON objects to a file and marty's answer helped me somewhat. But to serve my use case, the answer was not complete as it would overwrite the old data every time a new entry was saved.
To save multiple entries in a file, one must check for the old content (i.e., read before write). A typical file holding JSON data will either have a list or an object as root. So I considered that my JSON file always has a list of objects and every time I add data to it, I simply load the list first, append my new data in it, and dump it back to a writable-only instance of file (w):
import json
def saveJson(url,sc): # This function writes the two values to the file
    newdata = {'url':url,'sc':sc}
    json_path = "db/file.json"
    old_list= []
    with open(json_path) as myfile: # Read the contents first
        old_list = json.load(myfile)
    old_list.append(newdata)
    with open(json_path,"w") as myfile: # Overwrite the whole content
        json.dump(old_list, myfile, sort_keys=True, indent=4)
    return "success"
The new JSON file will look something like this:
[
{
"sc": "a11",
"url": "www.google.com"
},
{
"sc": "a12",
"url": "www.google.com"
},
{
"sc": "a13",
"url": "www.google.com"
}
]
NOTE: For this approach to work, a file named file.json must already exist with [] as its initial content.
PS: not related to original question, but this approach could also be further improved by first checking if our entry already exists (based on one or multiple keys) and only then append and save the data.
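The improvement mentioned in the PS might look something like the sketch below. It is an assumption-laden variant rather than the answer's own code: the function name save_json_entry, the json_path parameter, the whole-entry duplicate test and the handling of a missing file (starting from an empty list) are all choices made here for illustration.

```python
import json
import os
import tempfile


def save_json_entry(json_path, url, sc):
    """Append {'url': url, 'sc': sc} to the list in json_path unless already present."""
    new_entry = {'url': url, 'sc': sc}
    if os.path.exists(json_path):
        with open(json_path) as f:       # read the existing list first
            entries = json.load(f)
    else:
        entries = []                     # no file yet: start from an empty list
    if new_entry in entries:             # duplicate check on the whole entry
        return False
    entries.append(new_entry)
    with open(json_path, 'w') as f:      # overwrite the whole content
        json.dump(entries, f, sort_keys=True, indent=4)
    return True


# Hypothetical demo in a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), 'file.json')
first = save_json_entry(demo_path, 'www.google.com', 'a11')
second = save_json_entry(demo_path, 'www.google.com', 'a11')  # duplicate, skipped
third = save_json_entry(demo_path, 'www.google.com', 'a12')
with open(demo_path) as f:
    stored = json.load(f)
print(first, second, third, len(stored))
```

Checking on a subset of keys instead (for example only 'sc') would mean replacing the membership test with a comparison of those keys across the existing entries.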
Shorter code
Saving and loading all types of python variables (incl. dictionaries) with one line of code each.
data = {'key1': 'keyinfo', 'key2': 'keyinfo2'}
saving:
pickle.dump(data, open('path/to/file/data.pickle', 'wb'))
loading:
data_loaded = pickle.load(open('path/to/file/data.pickle', 'rb'))
Maybe it's obvious, but I used the two-row solution in the top answer quite a while before I tried to make it shorter.