In my script, I'm trying to save a dictionary using cPickle. Everything works fine, except that each key in the loaded dictionary has been modified.
My dictionary looks like: {'a':[45,155856,26,98536], 'b':[88,68,9454,78,4125,52]...}
When I print keys from this dictionary before saving it, it prints correct values: 'a','b'...
But when I save it and then load it using cPickle, each key contains '\r' after the correct character: 'a\r','b\r'...
Here is the code for saving:
def saveSuffixArrayDictA():
    for i in self.creation.dictA.keys():
        print len(i)
    print 'STOP'
    with open('dictA','w+') as f:
        pickle.dump(self.creation.dictA,f)
Which prints: 1,1,1,1,1,1....STOP (with newlines of course)
Then, when I'm trying to load it using this:
@staticmethod
def dictA():
    with open('dictA','rb') as f:
        dict = pickle.load(f)
    for i in dict.keys():
        print len(i)
    print 'STOP'
    return dict
It returns: 2,2,2,2,2,2,2,2...STOP (with newlines of course)
As you can see, the output should be the same but it isn't... where could the problem be?
EDIT: I tried to print values and realized that each item in list (list is value) has added 'L' at the end of this item which is a number.
Per the docs:
Be sure to always open pickle files created with protocols >= 1 in binary
mode. For the old ASCII-based pickle protocol 0 you can use either text mode
or binary mode as long as you stay consistent. (my emphasis)
Therefore, do not write the pickle file in text mode (w+) and then read it in binary mode (rb).
Instead, use binary modes, wb+ and rb, for both.
When you write in text mode (e.g. w+), \n is mapped to the OS-specific end-of-line character(s). On Windows, \n is mapped to \r\n. That appears to be the source of the errant \rs appearing in the keys.
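The fix can be sketched as a small round trip. This is only an illustration, written for Python 3 (where cPickle is simply pickle); the temporary file path is an assumption made so the sketch is self-contained.

```python
import os
import pickle
import tempfile

data = {'a': [45, 155856, 26, 98536], 'b': [88, 68, 9454, 78, 4125, 52]}

# Hypothetical file location, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), 'dictA')

# Write in binary mode so no newline translation is applied
with open(path, 'wb') as f:
    pickle.dump(data, f)

# Read in binary mode as well, matching how the file was written
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(sorted(restored))   # ['a', 'b']
print(restored == data)   # True
```

With both modes binary, the keys come back with length 1 on any platform.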
This is a very strange error and I don't know the reason for it. But here is another way to save and load data structures in Python. Convert your data structure to a string using str() and write it to a file. Later, read the file back into a variable and convert the string back to a data structure using ast.literal_eval. This only works for structures built from Python literals such as strings, numbers, lists and dicts. A demo is given below:
>>> import ast
>>> d={'a':[1,2,3,4],'b':[5,6,7,8],'c':[9,10]}
>>> saveDic=str(d)
>>> saveDic
"{'a': [1, 2, 3, 4], 'c': [9, 10], 'b': [5, 6, 7, 8]}"
# save this string to any file, load it back and convert to dictionary using ast
>>> d=ast.literal_eval(saveDic)
>>> d
{'a': [1, 2, 3, 4], 'c': [9, 10], 'b': [5, 6, 7, 8]}
I'm storing f-strings with function calls in a separate file (with lots of variables).
I am writing a script that has hundreds of variables which are then loaded into an HTML table. Some of the contents in the HTML table require function calls.
This works:
def add_one(a):
    return a + 1
a = 1
s = f"A is {a} and next comes {add_one(a)}"
print(s)
When I store s in a file, I can use **locals() to format it and it works when I store variables in s.txt.
Contents of s.txt:
A is {a}
Contents of script that works:
a = 1
print(open('s.txt').read().format(**locals()))
However, when I try to call functions, it does not work:
Contents of s.txt:
A is {a} and next comes {add_one(a)}
Contents of script that does not work:
def add_one(a):
    return a + 1
a = 1
print(open('s.txt').read().format(**locals()))
What can I do to make it work (given my actual case is hundreds of function calls and not this simple 2 variable example)?
In this example it should result in A is 1 and next comes 2.
If you have a complex HTML table with hundreds of variables, you might want to consider using a templating language such as Jinja2 rather than f-strings.
For simplicity I've stored the a value in a dictionary as this then simplifies passing it to the Jinja2 render and also converting it to JSON for storing it in a file.
Here is your example using Jinja2 templates and storing the data to a json file:
import json
from pathlib import Path
import jinja2
json_file = Path('/tmp/test_store.json')
jinja_env = jinja2.Environment()
# Set variable values
values = {'a': 3}
# Save to json file
json_file.write_text(json.dumps(values))
# Read from json file to dictionary with new variable name
read_values = json.loads(json_file.read_text())
def add_one(a):
    return a + 1
# Add custom filter to jinja environment
jinja_env.filters['add_one'] = add_one
# Define template
template = jinja_env.from_string("A is {{a}} and next comes {{a | add_one}}")
# Print rendered template
print(template.render(read_values))
This gave the output of:
A is 3 and next comes 4
The JSON file is the following:
{"a": 3}
As mentioned in the discussion e.g. here, what you want does not really work in any simple way. There is one obvious workaround: storing an f-string (e.g. f"A is {a} and next comes {add_one(a)}") in your text file and then eval'ing it:
with open('s.txt', 'r') as f:
    print(eval(f.read())) # A is 1 and next comes 2
Of course, all the usual warnings about shooting yourself in the foot apply here, but your problem definition sounds exactly like this use case. You can try sandboxing your functions and whatnot, but it generally does not work well. I would say it is still a viable use case for homebrew automation, but it has a massive potential for backfiring, and the only reason I am suggesting it is because alternative solutions are likely to be about as dangerous.
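A variation on the same idea keeps s.txt as plain template text (no f prefix or quotes in the file) and builds the f-string at load time. This is a rough sketch, not a hardened solution: render_template is a hypothetical helper name, the template is inlined here instead of being read from s.txt, and emptying __builtins__ only narrows what the template can reach; it is not a sandbox.

```python
def add_one(a):
    return a + 1


def render_template(template, namespace):
    """Evaluate `template` as if it were the body of an f-string.

    Hypothetical helper: eval() runs arbitrary code, so only use it on
    template text you wrote yourself.
    """
    # repr() yields a quoted string literal; prefixing "f" turns it into an f-string
    source = "f" + repr(template)
    # Only the names in `namespace` are visible to the template expressions
    return eval(source, {"__builtins__": {}}, dict(namespace))


template = "A is {a} and next comes {add_one(a)}"  # stands in for open('s.txt').read()
result = render_template(template, {"a": 1, "add_one": add_one})
print(result)  # A is 1 and next comes 2
```

Passing an explicit namespace instead of **locals() makes it easier to see exactly which variables and functions a template is allowed to use.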
Use serialization and deserialization to store data
import json
data = {
    "a" : 1,
    "b" : 2,
    "name" : "Jack",
    "bunch_of_numbers" : [1, 2, 3, 5, 6]
}
file_name = "s.txt"
with open(file_name, 'w') as file:
    file.write(json.dumps(data)) #serialization
with open(file_name, 'rb') as file:
    data = json.load(file) # de-serialization
print(data)
Output:
{'a': 1, 'b': 2, 'name': 'Jack', 'bunch_of_numbers': [1, 2, 3, 5, 6]}
In my program, I have certain settings that can be modified by the user, saved on the disk, and then loaded when the application is restarted. Some of these settings are stored as dictionaries. While trying to implement this, I noticed that after a dictionary is restored, its values cannot be used to access values of another dictionary, because it throws a KeyError: 1 exception.
This is a minimal code example that illustrates the issue:
import json
motorRemap = {
    1: 3,
    2: 1,
    3: 6,
    4: 4,
    5: 5,
    6: 2,
}
motorPins = {
    1: 6,
    2: 9,
    3: 10,
    4: 11,
    5: 13,
    6: 22
}
print(motorPins[motorRemap[1]]); #works correctly
with open('motorRemap.json', 'w') as fp:
    json.dump(motorRemap, fp)
with open('motorRemap.json', 'r') as fp:
    motorRemap = json.load(fp)
print(motorPins[motorRemap[1]]); #throws KeyError: 1
You can run this code as it is. First print statement works fine, but after the first dictionary is saved and restored, it doesn't work anymore. Apparently, saving/restoring somehow breaks that dictionary.
I have tried saving and restoring with both the json and pickle libraries, and both produce the same error. I tried printing values of the first dictionary directly after it is restored (print(motorRemap[1])), and it prints out correct values without any added spaces or anything. KeyError usually means that the specified key doesn't exist in the dictionary, but in this instance the print statement shows that it does exist, unless some underlying data types have changed or something. So I am really puzzled as to why this is happening.
Can anyone help me understand what is causing this issue, and how to solve it?
What happens becomes clear when you look at what json.dump wrote into motorRemap.json:
{"1": 3, "2": 1, "3": 6, "4": 4, "5": 5, "6": 2}
Unlike Python, json can only use strings as keys. Python, on the other hand, allows many different types for dictionary keys, including booleans, floats and even tuples:
my_dict = {False: 1,
           3.14: 2,
           (1, 2): 3}
print(my_dict[False], my_dict[3.14], my_dict[(1, 2)])
# Outputs '1 2 3'
The json.dump function automatically converts some of these types to string when you try to save the dictionary to a json file. False becomes "false", 3.14 becomes "3.14" and, in your example, 1 becomes "1". (This doesn't work for the more complex types such as a tuple. You will get a TypeError if you try to json.dump the above dictionary where one of the keys is (1, 2).)
Note how the keys change when you dump and load a dictionary with some of the Python-specific keys:
import json
my_dict = {False: 1,
           3.14: 2}
print(my_dict[False], my_dict[3.14])
with open('my_dict.json', 'w') as fp:
    json.dump(my_dict, fp)
# Writes {"false": 1, "3.14": 2} into the json file
with open('my_dict.json', 'r') as fp:
    my_dict = json.load(fp)
print(my_dict["false"], my_dict["3.14"])
# And not my_dict[False] or my_dict[3.14] which raise a KeyError
Thus, the solution to your issue is to access the values using strings rather than integers after you load the dictionary from the json file.
print(motorPins[motorRemap["1"]]) instead of your last line will fix your code.
From a more general perspective, it might be worth keeping the keys as strings from the beginning if you know you will be saving the dictionary into a json file. You could also convert the keys back to integers after loading as discussed here; however, that can lead to bugs if not all the keys are integers, and it is not a very good idea at a larger scale.
Check out pickle if you want to save the dictionary while keeping the Python types. It is, however, not human-readable unlike json, and it's also Python-specific so it cannot be used to transfer data to other languages, missing virtually all the main benefits of json.
If you want to save and load the dictionary using pickle, this is how you would do it:
# import pickle
...
with open('motorRemap.b', 'wb') as fp:
    pickle.dump(motorRemap, fp)
with open('motorRemap.b', 'rb') as fp:
    motorRemap = pickle.load(fp)
...
Since the integer keys of a dict are written to the json file as strings, we can convert them back when reading the file. A dict comprehension restores the original integer keys:
...
with open('motorRemap.json', 'r') as fp:
    motorRemap = {int(key): value for key, value in json.load(fp).items()}
...
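If some keys are not integers, or the dictionaries are nested, the same conversion can be done while parsing through the object_pairs_hook parameter of json.load / json.loads. The sketch below is only an illustration: restore_int_keys is a hypothetical helper name, and the rule "convert whatever int() accepts" is an assumption that may be too eager for your data (a key such as "007" would become 7).

```python
import json


def restore_int_keys(pairs):
    """Turn keys that parse as integers back into int; leave the rest as strings."""
    result = {}
    for key, value in pairs:
        try:
            result[int(key)] = value
        except ValueError:
            result[key] = value
    return result


motorRemap = {1: 3, 2: 1, 3: 6}
text = json.dumps(motorRemap)  # '{"1": 3, "2": 1, "3": 6}'

# The hook is called for every JSON object, including nested ones
restored = json.loads(text, object_pairs_hook=restore_int_keys)
print(restored)  # {1: 3, 2: 1, 3: 6}
```

Because the hook runs once per JSON object, nested dictionaries get the same treatment without extra code.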
I am trying to write lists from a file and define them to separate values in a dictionary. The text file would look something like this:
[12, 13, 14]
[87, 45, 32]
...
and then the dictionary would look something like this:
{"score_set0": [12, 13, 14], "score_set1": [87, 45, 32]...}
This is the code I have so far, but it just returns an empty dictionary:
def readScoresFile(fileAddr):
    dic = {}
    i = 0
    with open(fileAddr, "r") as f:
        x = len(f.readlines())
        for line in f:
            dic["score_set{}".format(x[i])] = line
            i += 1
    return dic
I am only programming at GCSE level (UK OCR syllabus if that helps) in year 10. Thanks for any help anyone can give
Also I am trying to do this without pickle module
x = len(f.readlines()) consumed your whole file, so your subsequent loop over f is iterating an exhausted file handle, sees no remaining lines, and exits immediately.
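The exhausted-handle behaviour is easy to reproduce. A minimal sketch, using an in-memory io.StringIO as a stand-in for the real scores file (that substitution is an assumption made so the example runs anywhere):

```python
import io

# In-memory stand-in for the scores file
f = io.StringIO("[12, 13, 14]\n[87, 45, 32]\n")

count = len(f.readlines())  # reads everything, leaving the position at the end
remaining = list(f)         # nothing is left to iterate over
print(count, remaining)     # 2 []

f.seek(0)                   # rewinding makes the lines available again
rewound = list(f)
print(len(rewound))         # 2
```

A real file object opened with open() behaves the same way.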
There's zero need to pre-check the length here (and the only use you make of x is trying to index it, which makes no sense; you avoided a TypeError solely because the loop never ran), so just omit that and use enumerate to get the numbers as you go:
def readScoresFile(fileAddr):
    dic = {}
    with open(fileAddr, "r") as f:
        for i, line in enumerate(f):  # Let enumerate manage the numbering for you
            dic["score_set{}".format(i)] = line  # If you're on 3.6+, dic[f'score_set{i}'] = line is nicer
    return dic
Note that this does not actually convert the input lines to lists of int (neither did your original code). If you want to do that, you can change:
dic[f'score_set{i}'] = line
to:
dic[f'score_set{i}'] = ast.literal_eval(line) # Add import ast to top of file
to interpret the line as a Python literal, or:
dic[f'score_set{i}'] = json.loads(line) # Add import json to top of file
to interpret each line as JSON (faster, but supports fewer Python types, and some legal Python literals are not legal JSON).
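To make the difference between the two parsers concrete, here is a small sketch. The sample strings are made up for illustration; a line from the scores file parses identically with both, while a tuple with single-quoted strings is a legal Python literal but not legal JSON.

```python
import ast
import json

line = "[12, 13, 14]\n"  # a line as it would come from the scores file
from_literal = ast.literal_eval(line)
from_json = json.loads(line)
print(from_literal, from_json)  # [12, 13, 14] [12, 13, 14]

python_only = "(1, 2, 'three')"  # tuple and single quotes: Python literal only
as_tuple = ast.literal_eval(python_only)
print(as_tuple)  # (1, 2, 'three')

try:
    json.loads(python_only)
    json_failed = False
except json.JSONDecodeError:
    json_failed = True
print(json_failed)  # True
```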
As a rule, you basically never want to use .readlines(); simply iterating over the file handle will get you the lines live and avoid a memory requirement proportionate to the size of the file. (Frankly, I'd have preferred if they'd gotten rid of it in Py3, since list(f) gets the same result if you really need it, and it doesn't create a visible method that encourages you to do "The Wrong Thing" so often).
By operating line-by-line, you eventually store all the data, but that's better than doubling the overhead by storing both the parsed data in the dict and all the string data it came from in the list.
If you're trying to turn the lines into actual Python lists, I suggest using the json module. (Another option would be ast.literal_eval, since the syntax happens to be the same in this case.)
import json
def read_scores_file(file_path):
    with open(file_path) as f:
        return {
            f"score_set{i}": json.loads(line)
            for i, line in enumerate(f)
        }
I copied and pasted the output of my code into a text file for later use. This output is a dictionary in which some of the values are numpy arrays but these were copied into the text file as e.g. "key": array([0]).
When I copy and paste back into the IPython console I get the following error: NameError: name 'array' is not defined.
I want to recover the entire dictionary with these numpy arrays converted back to numpy objects to keep using the data. There are several layers of dictionaries stored as values of the "parent" dictionary, many dictionaries per layer and many of these arrays in each dictionary.
Is there any way to recover this dictionary? How would you recommend I save objects for another session the next time?
If you need to recover the output of your previous calculation, what you could do is one of the following:
from numpy import array
do a replace-all on your text file, array -> numpy.array (this also needs import numpy)
Then pass the text to eval. (If you are working directly at the command line, or copy/pasting the data from your file, you can skip the eval altogether. The eval is useful if you have your data stored inside a string, e.g., after reading it from the file within Python.)
from numpy import array
a="""
{
'1':array([0]),
'2':'some random text',
'3':123,
'4':{
'4.1':array([1,2,3]),
'4.2':{
'4.2.1':'more nested stuff'
}
}
}
"""
b = eval(a)
print(b)
# {'1': array([0]), '2': 'some random text', '3': 123, '4': {'4.1': array([1, 2, 3]), '4.2': {'4.2.1': 'more nested stuff'}}}
As a side-note, never run eval on outputs from sources other than yourself.
This is literally executing text as python code and is obviously very vulnerable to malicious stuff.
A more secure way would be to use ast.literal_eval from ast. The problem in this case is that, for safety reasons, it only handles Python built-in literals, which do not include numpy arrays.
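A quick way to see this limitation, without needing numpy installed: the sample text below is a shortened, made-up version of the dictionary, and literal_eval rejects it because array([0]) is a function call rather than a literal.

```python
import ast

text = "{'1': array([0]), '2': 'some random text'}"

try:
    ast.literal_eval(text)
    refused = False
except ValueError as exc:
    # literal_eval raises ValueError for anything that is not a plain literal
    refused = True
    print("literal_eval refused the input:", exc)

# The same structure with a plain list instead of array(...) parses fine
plain = "{'1': [0], '2': 'some random text'}"
parsed = ast.literal_eval(plain)
print(parsed)  # {'1': [0], '2': 'some random text'}
```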
Regarding other ways to store your data, as suggested in the comments, pickle might do it for you:
fname = 'output.pickle'
import pickle
# Save data into file
with open(fname, 'wb') as f:
    pickle.dump(b, f)
# Restore data from file
with open(fname, 'rb') as f:
    c = pickle.load(f)
print(c)
# {'1': array([0]), '2': 'some random text', '3': 123, '4': {'4.1': array([1, 2, 3]), '4.2': {'4.2.1': 'more nested stuff'}}}
I have a list of 16 elements, and each element is another 500 elements long. I would like to write this to a txt file so I no longer have to create the list from a simulation. How can I do this, and then access the list again?
Pickle will work, but the shortcoming is that it is a Python-specific binary format. Save as JSON for easy reading and re-use in other applications:
import json
LoL = [ range(5), list("ABCDE"), range(5) ]
with open('Jfile.txt','w') as myfile:
    json.dump(LoL,myfile)
The file now contains:
[[0, 1, 2, 3, 4], ["A", "B", "C", "D", "E"], [0, 1, 2, 3, 4]]
To get it back later:
with open('Jfile.txt','r') as infile:
    newList = json.load(infile)
print newList
To store it:
import cPickle
savefilePath = 'path/to/file'
with open(savefilePath, 'wb') as savefile:  # binary mode, so newlines are not translated
    cPickle.dump(myBigList, savefile)
To get it back:
import cPickle
savefilePath = 'path/to/file'
with open(savefilePath, 'rb') as savefile:  # read in binary mode too
    myBigList = cPickle.load(savefile)
Take a look at pickle (Object serialization). With pickle you can serialize your list and then save it to a file. Later you can 'unpickle' the data from that file. The data will be unpickled to a list and you can use it again in Python. @inspectorG4dget beat me to the answer, so take a look at that one.
While pickle is certainly a good option, for this particular question I would prefer simply saving it into a csv or just plain txt file with 16 columns using numpy.
import numpy as np
# here I use list of 3 lists as an example
nlist = 3
# generating fake data `listoflists`
listoflists = []
for i in xrange(nlist):
    listoflists.append([i]*500)
# save it into a numpy array
outarr = np.vstack(listoflists)
# save it into a file
np.savetxt("test.dat", outarr.T)
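If numpy is not available, the standard-library csv module can do a similar job. This is a sketch under assumptions rather than a drop-in replacement: it is written for Python 3, the data is fake, the file path is a throwaway temporary location, and each list is written as a row (not as a column, as the transposed numpy version above does).

```python
import csv
import os
import tempfile

# Fake data standing in for the simulation output: 16 lists of 500 numbers
list_of_lists = [[float(i)] * 500 for i in range(16)]

path = os.path.join(tempfile.mkdtemp(), "lists.csv")  # hypothetical location

# newline="" lets the csv module control line endings itself
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(list_of_lists)

# csv gives back strings, so convert each cell to float when reading
with open(path, newline="") as f:
    restored = [[float(cell) for cell in row] for row in csv.reader(f)]

print(len(restored), len(restored[0]))  # 16 500
print(restored == list_of_lists)        # True
```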
I do recommend cPickle in this case, but you should take some "extra" steps:
ZLIB the output.
Encode or encrypt it.
By doing this you have these advantages:
ZLIB will reduce its size.
Encrypting may help keep tampered pickles from being loaded.
Yes, pickle is not safe! See this.
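The two steps could look roughly like the sketch below. Note that it signs the data with an HMAC instead of encrypting it: that is a swapped-in technique which detects tampering but does not hide the contents. It is written for Python 3, and SECRET_KEY, pack and unpack are hypothetical names chosen for illustration.

```python
import hashlib
import hmac
import pickle
import zlib

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, for illustration only
TAG_SIZE = 32                               # length of an HMAC-SHA256 digest in bytes


def pack(obj):
    """Pickle and compress obj, then prepend an HMAC-SHA256 tag."""
    payload = zlib.compress(pickle.dumps(obj))
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def unpack(blob):
    """Verify the tag before decompressing and unpickling."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(zlib.decompress(payload))


big_list = [[i] * 500 for i in range(16)]
blob = pack(big_list)
print(unpack(blob) == big_list)                 # True
print(len(blob) < len(pickle.dumps(big_list)))  # True
```

Checking the tag before calling pickle.loads is the important part: a blob that was modified after being written is rejected without ever being unpickled.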