storing and retrieving lists from files - python

I have a very big list of lists. One of my programs does this:
power_time_array = [[1,2,3],[1,2,3]] # In a short form
with open(file_name, 'w') as out:
    out.write(str(power_time_array))
Now another independent script needs to read this list of lists back.
How do I do this?
What I have tried:
with open(file_name, 'r') as app_trc_file:
    power_trace_of_application.append(app_trc_file.read())
Note: power_trace_of_application is a list of lists of lists.
This stores the file contents as a single huge string element instead of a nested list.
How does one efficiently store and retrieve big lists or lists of lists from files in Python?

You can serialize your list to JSON and deserialize it back. This hardly changes the representation, since your list is already valid JSON:
import json
power_time_array = [[1,2,3],[1,2,3]] # In a short form
with open(file_name, 'w') as out:
    json.dump(power_time_array, out)
and then just read it back:
with open(file_name, 'r') as app_trc_file:
    power_trace_of_application = json.load(app_trc_file)
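For speed, you can use a JSON library with a C backend (like ujson). This also works with custom objects, provided you supply a default function (or a custom encoder) that converts them to JSON-compatible types.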
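A minimal round-trip sketch of the approach above. The temporary directory and the file name power_trace.json are assumptions for illustration only; any writable path works.

```python
import json
import os
import tempfile

power_time_array = [[1, 2, 3], [1, 2, 3]]  # short stand-in for the real data

# Hypothetical location, chosen only so the example runs anywhere.
file_name = os.path.join(tempfile.mkdtemp(), "power_trace.json")

# Writer script: serialize the nested list as JSON text.
with open(file_name, "w") as out:
    json.dump(power_time_array, out)

# Reader script: parse the text back into real Python lists.
with open(file_name, "r") as app_trc_file:
    restored = json.load(app_trc_file)

# Collect several traces into a list of lists of lists, as in the question.
power_trace_of_application = []
power_trace_of_application.append(restored)
```

Unlike reading the file with read(), json.load returns nested lists, so restored[0][1] is the integer 2 rather than a character of a string.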

Use the json library to read and write structured data (as JSON) in a text file.
To write data to the file, use json.dump(), and
to read the JSON data back from the file, use json.load().

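Writing the repr and reading it back with ast.literal_eval may be faster, though that is worth measuring on your own data: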
from ast import literal_eval
power_time_array = [[1,2,3],[1,2,3]]
with open(file_name, 'w') as out:
    out.write(repr(power_time_array))
with open(file_name, 'r') as app_trc_file:
    power_trace_of_application.append(literal_eval(app_trc_file.read()))
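To make the difference between the two approaches concrete, here is a small sketch that round-trips the same data through repr/literal_eval and through JSON, in memory. The sample data is made up for illustration. Which approach is faster depends on the data and the Python version, so timing both (for example with timeit) is safer than assuming.

```python
import ast
import json

# Made-up sample containing a tuple, which JSON cannot represent.
data = [[1, 2, 3], (4, 5), {"k": None}]

text = repr(data)                      # what would be written to the file
from_literal = ast.literal_eval(text)  # parses literals only, never runs code

# JSON has no tuple type, so the tuple comes back as a list.
from_json = json.loads(json.dumps(data))
```

literal_eval keeps Python-only types such as tuples, while JSON produces a file that other languages can read.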

Related

How to generate *.dict and *.idx file in python?

Basically I have created a dictionary in python, say
dictionary = {'test': [1, 2, 3], 'other': [100]}
but now I want to write a program that generates a dict file (say file1.dict) containing the dictionary and an idx file (say file2.idx) containing its inverted index postings.
I would suggest using the .json format for any objects you want to store. For that, import the json library, then use json.dump to store the object and json.load to get it back.
If you want to adjust the format a bit, you can use json.dumps, which returns a string, change whatever you want in that string, and then write it to a file.
I would not suggest using the file formats you are describing because, as someone else mentioned, they are not standard.
First, create the files with the following code.
Note: as .dict and .idx are not standard extensions, I have used .txt
dict_file = open("sample_dict.txt", "w+")
idx_file = open("sample_idx.txt", "w+")
Now write to them:
dict_file.write(str(dictionary))  # pass the dictionary, converted to a string
idx_file.write(str(inverted_index))  # pass the inverted index (assumed to be built elsewhere)
Finally, close the files:
dict_file.close()
idx_file.close()
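Putting the suggestions in this thread together, here is a rough sketch that builds an inverted index and writes both files as JSON. It assumes "inverted index" means a mapping from each value to the keys whose lists contain it; the question does not define it, so adjust as needed. The temporary output directory is also just for illustration.

```python
import json
import os
import tempfile
from collections import defaultdict

dictionary = {'test': [1, 2, 3], 'other': [100]}

# Assumed meaning of "inverted index": value -> list of keys containing it.
inverted_index = defaultdict(list)
for key, values in dictionary.items():
    for value in values:
        inverted_index[value].append(key)

out_dir = tempfile.mkdtemp()  # hypothetical output location
dict_path = os.path.join(out_dir, "file1.dict")
idx_path = os.path.join(out_dir, "file2.idx")

# "with" closes the files automatically, so no explicit close() is needed.
with open(dict_path, "w") as dict_file:
    json.dump(dictionary, dict_file)
with open(idx_path, "w") as idx_file:
    json.dump(inverted_index, idx_file)  # integer keys are written as strings

with open(idx_path) as idx_file:
    loaded_index = json.load(idx_file)
```

Note that JSON object keys are always strings, so the integer keys of the index come back as "1", "2", and so on.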

How to append a json line to a loaded json file?

I am writing Python code.
I loaded a json file.
with open(r'..\config_4099.json', "r") as fid:
    jaySon = json.load(fid)
It's a flat json structure, so no internal elements to append to. Just need to tack onto the bottom the piece in curlies:
jaySon.append({'pluginInputs': "PluginInputs"})
It's complaining about dictionaries.
What's the best way to do this?
With dicts, use update:
jaySon.update({'pluginInputs': "PluginInputs"})
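For completeness, a small sketch of the whole load, update, and write-back cycle. The file path and the initial contents ({"name": "demo"}) are invented here so the example can run on its own; they are not from the question.

```python
import json
import os
import tempfile

# Hypothetical config file with made-up contents, created for the demo.
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w") as fid:
    json.dump({"name": "demo"}, fid)

with open(config_path, "r") as fid:
    jaySon = json.load(fid)  # a top-level JSON object loads as a dict

# Dicts have no append(); update() (or item assignment) adds the key.
jaySon.update({'pluginInputs': "PluginInputs"})

# Write the modified dict back so the change is persisted.
with open(config_path, "w") as fid:
    json.dump(jaySon, fid, indent=2)

with open(config_path) as fid:
    reloaded = json.load(fid)
```

A JSON file cannot be extended by appending text to the end, so the usual pattern is to rewrite the whole file after modifying the loaded dict.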

Best way to store dictionary in a file and load it partially?

What is the best way to store a big dictionary of strings in a file and load it partially in Python? Here "dictionary of strings" means that each key is a string and each value is a list of strings.
The dictionary is stored by appending: I check whether a key is already present, and only add the entry if it is not. The keys are then used for post-processing.
Usually a dictionary is stored in JSON.
I'll leave here a link:
Convert Python dictionary to JSON array
You could simply write the dictionary to a text file, and then create a new dictionary that only pulls certain keys and values from that text file.
But you're probably best off exploring the json module.
Here's a straightforward way to write a dict called "sample" to a file with the json module:
import json
with open('result.json', 'w') as fp:
    json.dump(sample, fp)
On the loading side, we'd need to know more about how you want to choose which keys to load from the JSON file.
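One possible way to get both appending and selective loading with only the json module is to store one entry per line (often called JSON Lines). This is a swapped-in technique, not something the answers above prescribe. The helper names append_entry and load_keys, the "key"/"values" field names, and the file name are all hypothetical.

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store.jsonl")  # hypothetical file


def append_entry(path, key, values):
    """Append one key and its list of strings as a single JSON line."""
    with open(path, "a") as fp:
        fp.write(json.dumps({"key": key, "values": values}) + "\n")


def load_keys(path, wanted):
    """Scan the file line by line and keep only the requested keys."""
    result = {}
    with open(path) as fp:
        for line in fp:
            entry = json.loads(line)
            # First write wins: later duplicates of a key are ignored.
            if entry["key"] in wanted and entry["key"] not in result:
                result[entry["key"]] = entry["values"]
    return result


append_entry(path, "alpha", ["a1", "a2"])
append_entry(path, "beta", ["b1"])
append_entry(path, "alpha", ["ignored"])

partial = load_keys(path, {"alpha"})
```

This still reads every line from disk, but only one entry is held in memory at a time, and only the wanted keys are kept.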
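The above answers are great, but I dislike using JSON, and I have had pickle corrupt my data before, so I use NumPy's save and load instead. (Note that NumPy pickles a dict internally, which is why allow_pickle=True is needed when loading.)
To save: np.save(filename, my_dict)  (filename should end in .npy, since np.save appends it otherwise)
To load: my_dict = np.load(filename, allow_pickle=True).item()
This is simple and works well. As for loading partially, you could split the dictionary into multiple smaller dictionaries and save them as individual files. It is not a very concrete solution, but it could work.
To split the dictionary, you could do something like this:
import numpy as np
temp_dict = {}
for i, k in enumerate(my_dict, start=1):
    temp_dict[k] = my_dict[k]
    if i % 1000 == 0:
        # a chunk of 1000 keys is full: save it and start a new one
        np.save("records-" + str(i - 1000) + "-" + str(i) + ".npy", temp_dict)
        temp_dict = {}
if temp_dict:
    # save the last, partially filled chunk
    np.save("records-" + str(i - len(temp_dict)) + "-" + str(i) + ".npy", temp_dict)
Then, for loading, do something like:
import glob
my_dict = {}
all_files = glob.glob("records-*.npy")
for f in all_files:
    chunk = np.load(f, allow_pickle=True).item()
    my_dict.update(chunk)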
If this is for some sort of database-type use, save yourself the headache and use TinyDB. It stores data as JSON on disk and lets you query for only the entries you need, which is close to the "partial" loading you're looking for.
I recommend TinyDB only because it seems closest to what you're trying to achieve. If it isn't to your liking, there are plenty of other databases out there.

Deserialize json array directly to a set in python

Is there a way to deserialize a json array directly to a set?
data.json (yes this is just a json array.)
["a","b","c"]
Notice that the json array contains unique elements.
Currently my workflow is the following.
open_file = open(path, 'r')
json_load = json.load(open_file) # this returns a list
return set(json_load) # which I am then converting to a set.
Is there a way to do something like this?
open_file = open(path, 'r')
return json.load(open_file, **arguments) # this returns a set.
Also is there any other way to go about doing it without the json module perhaps? Surely I am not the first one to need a set decoder.
No. To do it yourself, you would have to subclass json.JSONDecoder and override the method that creates the object.
It is also not worth the trouble. JSON arrays really map to lists in Python: they have order and allow duplicates, so a set can't correctly represent a JSON array. Therefore it is not the job of a JSON decoder to provide a set.
Converting is the best you can do. You could create a function and call it when you need:
def json_load_set(f):
    return set(json.load(f))
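A quick usage sketch of that helper. An in-memory io.StringIO stands in for the opened data.json file, and the duplicated "a" is added here only to show that duplicates collapse.

```python
import io
import json


def json_load_set(f):
    """Load a JSON array from a file object and return it as a set."""
    return set(json.load(f))


# StringIO stands in for open(path, 'r'); the content is illustrative.
fake_file = io.StringIO('["a", "b", "c", "a"]')
result = json_load_set(fake_file)
```

This only works when every array element is hashable. A nested array or object inside the JSON array would raise TypeError when the set is built.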

Writing a set to an output file in python

I usually use json for lists, but it doesn't work for sets. Is there a similar function to write a set to an output file, f? Something like this, but for sets:
f=open('kos.txt','w')
json.dump(list, f)
f.close()
json is not a python-specific format. It knows about lists and dictionaries, but not sets or tuples.
But if you want to persist a pure python dataset you could use string conversion.
with open('kos.txt','w') as f:
    f.write(str({1,3,(3,5)})) # set of numbers & a tuple
then read it back again using ast.literal_eval
import ast
with open('kos.txt','r') as f:
    my_set = ast.literal_eval(f.read())
This also works for lists of sets, nested lists with sets inside, and so on, as long as the data can be evaluated literally and no sets are empty (a known limitation of literal_eval before Python 3.9). So (almost) any structure of basic Python objects serialized with str can be parsed back with it.
For the empty set case there's a kludge to apply since set() cannot be parsed back.
import ast
with open('kos.txt','r') as f:
    ser = f.read()
my_set = set() if ser == str(set()) else ast.literal_eval(ser)
You could also have used the pickle module, but it creates binary data, which is more "opaque". There's also a way to use json: see How to JSON serialize sets?. But for your needs, I would stick to str/ast.literal_eval.
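The write and read steps above, including the empty-set workaround, can be wrapped up roughly like this. The helper names save_literal and load_set_literal and the temporary path are made up for the sketch.

```python
import ast
import os
import tempfile


def save_literal(obj, path):
    """Write the repr of a basic Python object to a text file."""
    with open(path, "w") as f:
        f.write(repr(obj))


def load_set_literal(path):
    """Read the object back, special-casing the empty set."""
    with open(path) as f:
        ser = f.read()
    # repr(set()) is "set()", which is not a literal; handle it explicitly.
    return set() if ser == repr(set()) else ast.literal_eval(ser)


path = os.path.join(tempfile.mkdtemp(), "kos.txt")  # hypothetical location

save_literal({1, 3, (3, 5)}, path)
mixed = load_set_literal(path)

save_literal(set(), path)
empty = load_set_literal(path)
```

The explicit check keeps the behaviour the same whether or not the running Python's literal_eval accepts set() on its own.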
Using ast.literal_eval(f.read()) raises ValueError: malformed node or string if an empty set was written to the file. I think pickle would be better to use.
With pickle, an empty set gives no error.
import pickle
s = set()
# To save to a file
with open('kos.txt','wb') as f:
    pickle.dump(s, f)
# To read it back from the file
with open('kos.txt','rb') as f:
    my_set = pickle.load(f)
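If a human-readable file is preferred over pickle's binary format, the JSON route mentioned earlier in this thread can be sketched as follows: convert the set to a list when writing and back to a set when reading. The sample set and the temporary file name are assumptions for illustration, and this only suits sets whose elements JSON can represent (strings, numbers, and the like).

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "kos.json")  # hypothetical location
original = {"b", "a", "c"}

# Sets are not JSON serializable, so write a (sorted) list instead.
with open(path, "w") as f:
    json.dump(sorted(original), f)

# Convert the loaded list back into a set.
with open(path) as f:
    restored = set(json.load(f))

# Read the raw text to see what actually ended up on disk.
with open(path) as f:
    on_disk = f.read()
```

Sorting is optional; it only makes the file contents stable between runs. An empty set works too, since it is written as an empty JSON array.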
