I copied and pasted the output of my code into a text file for later use. The output is a dictionary in which some of the values are numpy arrays, but these were copied into the text file as, e.g., "key": array([0]).
When I copy and paste back into the IPython console I get the following error: NameError: name 'array' is not defined.
I want to recover the entire dictionary, with these numpy arrays converted back to numpy objects, so I can keep using the data. There are several layers of dictionaries stored as values of the "parent" dictionary, many dictionaries per layer, and many of these arrays in each dictionary.
Is there any way to recover this dictionary? How would you recommend I save objects for another session the next time?
If you need to recover the output of your previous calculation, you can do one of the following:
run from numpy import array before pasting, or
do a replace-all on your text file, array -> numpy.array, and import numpy first.
Then pass the text to eval. (If you are typing directly into the command line or copy/pasting the data from your file, you can skip eval altogether; it is useful when you have the data stored inside a string, e.g. after reading it from the file within Python.)
from numpy import array

a = """
{
    '1': array([0]),
    '2': 'some random text',
    '3': 123,
    '4': {
        '4.1': array([1, 2, 3]),
        '4.2': {
            '4.2.1': 'more nested stuff'
        }
    }
}
"""

b = eval(a)
print(b)
# {'1': array([0]), '2': 'some random text', '3': 123, '4': {'4.1': array([1, 2, 3]), '4.2': {'4.2.1': 'more nested stuff'}}}
As a side note, never run eval on output from sources other than yourself.
It literally executes text as Python code and is therefore wide open to malicious input.
A more secure way would be ast.literal_eval from the ast module. The problem in this case is that, for safety reasons, it only accepts Python literals (strings, numbers, tuples, lists, dicts, booleans, None), which does not include numpy's array calls.
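A quick sketch of the difference (the exact ValueError message may vary between Python versions):
import ast

# literal_eval happily parses plain Python literals...
safe = ast.literal_eval("{'2': 'some random text', '3': 123}")
print(safe)  # {'2': 'some random text', '3': 123}

# ...but it refuses anything that is not a literal, such as a numpy array call
try:
    ast.literal_eval("{'1': array([0])}")
except ValueError as err:
    print("literal_eval rejected it:", err)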
Regarding other ways to store your data: as suggested in the comments, pickle might do it for you.
import pickle

fname = 'output.pickle'

# Save data into the file
with open(fname, 'wb') as f:
    pickle.dump(b, f)

# Restore data from the file
with open(fname, 'rb') as f:
    c = pickle.load(f)

print(c)
# {'1': array([0]), '2': 'some random text', '3': 123, '4': {'4.1': array([1, 2, 3]), '4.2': {'4.2.1': 'more nested stuff'}}}
In my program, I have certain settings that can be modified by the user, saved to disk, and then loaded when the application is restarted. Some of these settings are stored as dictionaries. While trying to implement this, I noticed that after a dictionary is restored, its values can no longer be used to access the values of another dictionary; doing so throws a KeyError: 1 exception.
This is a minimal code example that illustrates the issue:
import json

motorRemap = {
    1: 3,
    2: 1,
    3: 6,
    4: 4,
    5: 5,
    6: 2,
}

motorPins = {
    1: 6,
    2: 9,
    3: 10,
    4: 11,
    5: 13,
    6: 22,
}

print(motorPins[motorRemap[1]])  # works correctly

with open('motorRemap.json', 'w') as fp:
    json.dump(motorRemap, fp)

with open('motorRemap.json', 'r') as fp:
    motorRemap = json.load(fp)

print(motorPins[motorRemap[1]])  # throws KeyError: 1
You can run this code as it is. The first print statement works fine, but after the dictionary is saved and restored, the same lookup no longer works. Apparently, saving and restoring somehow breaks that dictionary.
I have tried saving and restoring with both the json and pickle libraries, and both produce the same error. I tried printing the values of the first dictionary directly after it is restored (print(motorRemap[1])), and it prints the correct values without any added spaces or anything. KeyError usually means that the specified key doesn't exist in the dictionary, but in this instance the print statement shows that it does exist - unless some underlying data types have changed or something. So I am really puzzled as to why this is happening.
Can anyone help me understand what is causing this issue, and how to solve it?
What happens becomes clear when you look at what json.dump wrote into motorRemap.json:
{"1": 3, "2": 1, "3": 6, "4": 4, "5": 5, "6": 2}
Unlike Python, json can only use strings as keys. Python, on the other hand, allows many different types as dictionary keys, including booleans, floats and even tuples:
my_dict = {False: 1,
           3.14: 2,
           (1, 2): 3}

print(my_dict[False], my_dict[3.14], my_dict[(1, 2)])
# Outputs '1 2 3'
The json.dump function automatically converts some of these types to string when you try to save the dictionary to a json file. False becomes "false", 3.14 becomes "3.14" and, in your example, 1 becomes "1". (This doesn't work for the more complex types such as a tuple. You will get a TypeError if you try to json.dump the above dictionary where one of the keys is (1, 2).)
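To see that TypeError for yourself, a quick check (the exact message wording may differ between Python versions):
import json

try:
    json.dumps({(1, 2): 3})
except TypeError as err:
    print(err)  # e.g. "keys must be str, int, float, bool or None, not tuple"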
Note how the keys change when you dump and load a dictionary with some of the Python-specific keys:
import json

my_dict = {False: 1,
           3.14: 2}
print(my_dict[False], my_dict[3.14])

with open('my_dict.json', 'w') as fp:
    json.dump(my_dict, fp)
    # Writes {"false": 1, "3.14": 2} into the json file

with open('my_dict.json', 'r') as fp:
    my_dict = json.load(fp)

print(my_dict["false"], my_dict["3.14"])
# And not my_dict[False] or my_dict[3.14], which raise a KeyError
Thus, the solution to your issue is to access the values using strings rather than integers after you load the dictionary from the json file.
print(motorPins[motorRemap["1"]]) instead of your last line will fix your code.
From a more general perspective, it might be worth considering keeping the keys as strings from the beginning if you know you will be saving the dictionary into a json file. You could also convert the values back to integers after loading as discussed here; however, that can lead to bugs if not all the keys are integers and is not a very good idea in bigger scale.
Checkout pickle if you want to save the dictionary keeping the Python format. It is, however, not human-readable unlike json and it's also Python-specific so it cannot be used to transfer data to other languages, missing virtually all the main benefits of json.
If you want to save and load the dictionary using pickle, this is how you would do it:
import pickle

...
with open('motorRemap.b', 'wb') as fp:
    pickle.dump(motorRemap, fp)

with open('motorRemap.b', 'rb') as fp:
    motorRemap = pickle.load(fp)
...
Since the keys (integers) of the dict are written to the json file as strings, we can modify the reading of the json file: a dict comprehension restores the original integer keys:
...
with open('motorRemap.json', 'r') as fp:
    motorRemap = {int(key): value for key, value in json.load(fp).items()}
...
I want to write an array and a dictionary to a file (and possibly more), and then be able to read the file later and recreate the array and dictionary from it. Is there a reasonable way to do this in Python?
I recommend you use shelve (it comes with Python). For example:
import shelve

d = shelve.open('file.txt')  # in this file you will save your variables
d['mylist'] = [1, 2, 'a']    # that's all, but note the key name for later
d['mydict'] = {'a': 1, 'b': 2}
d.close()
To read values:
import shelve
d = shelve.open('file.txt')
my_list = d['mylist'] # the list is read from disk
my_dict = d['mydict'] # the dict is read from disk
If you are going to be saving numpy arrays then I recommend you use joblib which is optimized for this use case.
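A minimal sketch of a joblib round trip, assuming joblib is installed and using a made-up file name:
from joblib import dump, load
import numpy as np

data = {'weights': np.arange(10), 'bias': np.ones(3)}

dump(data, 'data.joblib')        # save to disk
restored = load('data.joblib')   # read it back
print(restored['weights'])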
Pickle would be one way to go about it (it is in the standard library).
import pickle

my_dict = {'a': 1, 'b': 2}

# write to file
with open('./my_dict.pkl', 'wb') as f:
    pickle.dump(my_dict, f)

# load from file
with open('./my_dict.pkl', 'rb') as f:
    my_dict = pickle.load(f)
And for the array you can use the ndarray.dump() method in numpy, which is more efficient for large arrays.
import numpy as np

my_ary = np.array([[1, 2], [3, 4]])
my_ary.dump('./my_ary.pkl')  # ndarray.dump takes a file name and pickles the array to it
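To read the dumped array back (not shown above), np.load can unpickle it; a short sketch assuming the file written by the previous snippet:
import numpy as np

# the file is a plain pickle under the hood, so allow_pickle is required
my_ary = np.load('./my_ary.pkl', allow_pickle=True)
print(my_ary)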
But you can of course also put everything into the same pickle file or use shelve (which uses pickle) like suggested in the other answer.
The pickle format blurs the line between data and code, and I don't like using it except when I'm the only writer and reader of the data in question and when I'm sure that it hasn't been tampered with.
If your data structure consists only of basic types such as numbers and strings, plus dicts and lists, you can serialise it into json using the json module. This is a pure data format which can be read back reliably. It doesn't handle tuples, though, but treats them as lists.
Here's an example.
import json

a = [1, 2, 3, 4]
b = dict(lang="python", author="Guido")

# save both objects into one json file
with open("data.dump", "w") as f:
    json.dump([a, b], f)

# ...and read them back later
with open("data.dump", "r") as f:
    x, y = json.load(f)

print(x)  # => [1, 2, 3, 4]
print(y)  # => {'lang': 'python', 'author': 'Guido'}
The data doesn't come back totally unchanged (tuples become lists, for example), but it's often good enough.
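A one-line demonstration of that round-trip change:
import json

# tuples silently become lists on a json round trip
print(json.loads(json.dumps((1, 2))))  # [1, 2]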
In my script, I'm trying to save a dictionary using cPickle. Everything works fine except that the loaded dictionary has each of its keys modified.
My dictionary looks like: {'a':[45,155856,26,98536], 'b':[88,68,9454,78,4125,52]...}
When I print keys from this dictionary before saving it, it prints correct values: 'a','b'...
But when I save it and then load using cPickle, each key contains '\r' after correct char: 'a\r','b\r'...
Here is the code for saving:
def saveSuffixArrayDictA(self):
    for i in self.creation.dictA.keys():
        print len(i)
    print 'STOP'
    with open('dictA', 'w+') as f:
        pickle.dump(self.creation.dictA, f)
Which prints: 1,1,1,1,1,1....STOP (with newlines of course)
Then, when I'm trying to load it using this:
@staticmethod
def dictA():
    with open('ddictA', 'rb') as f:
        dict = pickle.load(f)
    for i in dict.keys():
        print len(i)
    print 'STOP'
    return dict
It returns: 2,2,2,2,2,2,2,2...STOP (with newlines of course)
As you can see it should be the same but it isn't... where could be the problem please?
EDIT: I tried to print the values and realized that each item in the lists (the lists are the values) has an 'L' appended to the end of the number.
Per the docs:
Be sure to always open pickle files created with protocols >= 1 in binary
mode. For the old ASCII-based pickle protocol 0 you can use either text mode
or binary mode as long as you stay consistent. (my emphasis)
Therefore, do not write the pickle file in the text mode w+ and then read it in the binary mode rb. Instead, use the binary modes, wb+ and rb, for both.
When you write in text mode (e.g. w+), \n is mapped to the OS-specific end-of-line character(s); on Windows, \n is mapped to \r\n. That is the source of the errant \r characters appearing in the keys.
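A minimal sketch of the corrected round trip, using binary modes throughout (the data here is made up):
import cPickle as pickle  # plain `import pickle` on Python 3

data = {'a': [45, 155856, 26, 98536], 'b': [88, 68, 9454]}

with open('dictA', 'wb') as f:  # binary write
    pickle.dump(data, f)

with open('dictA', 'rb') as f:  # binary read
    data2 = pickle.load(f)

print(sorted(data2.keys()))  # ['a', 'b'] -- no stray '\r' in the keys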
This is a very strange error and I don't know its reason, but here is another way of saving and loading data structures in Python. Just convert your data structure to a string using str() and write it to any file. Load the file back, read it into a variable, and convert it back to a data structure using ast.literal_eval. A demo is given below:
>>> import ast
>>> d={'a':[1,2,3,4],'b':[5,6,7,8],'c':[9,10]}
>>> saveDic=str(d)
>>> saveDic
"{'a': [1, 2, 3, 4], 'c': [9, 10], 'b': [5, 6, 7, 8]}"
# save this string to any file, load it back and convert to dictionary using ast
>>> d=ast.literal_eval(saveDic)
>>> d
{'a': [1, 2, 3, 4], 'c': [9, 10], 'b': [5, 6, 7, 8]}
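And a sketch of the full file round trip the demo alludes to (the file name is made up):
import ast

d = {'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8], 'c': [9, 10]}

# write the repr of the dict to a text file
with open('saved_dict.txt', 'w') as f:
    f.write(str(d))

# read it back and parse it safely
with open('saved_dict.txt', 'r') as f:
    d2 = ast.literal_eval(f.read())

print(d2 == d)  # True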
I have a list of 16 elements, and each element is itself a list 500 elements long. I would like to write this to a txt file so I no longer have to recreate the list from a simulation. How can I do this, and then access the list again later?
Pickle will work, but the shortcoming is that it is a Python-specific binary format. Save as JSON for easy reading and re-use in other applications:
import json

# list(range(...)) so the contents are plain, json-serializable lists on Python 3
LoL = [list(range(5)), list("ABCDE"), list(range(5))]

with open('Jfile.txt', 'w') as myfile:
    json.dump(LoL, myfile)
The file now contains:
[[0, 1, 2, 3, 4], ["A", "B", "C", "D", "E"], [0, 1, 2, 3, 4]]
To get it back later:
with open('Jfile.txt', 'r') as infile:
    newList = json.load(infile)
print(newList)
To store it:
import cPickle

savefilePath = 'path/to/file'
with open(savefilePath, 'wb') as savefile:
    cPickle.dump(myBigList, savefile)
To get it back:
import cPickle

savefilePath = 'path/to/file'
with open(savefilePath, 'rb') as savefile:
    myBigList = cPickle.load(savefile)
Take a look at pickle's object serialization. With pickle you can serialize your list and then save it to a text file. Later you can 'unpickle' the data from the text file; it comes back as a list you can use again in Python. @inspectorG4dget beat me to the answer, so take a look at that one as well.
While pickle is certainly a good option, for this particular question I would prefer simply saving it into a csv or just plain txt file with 16 columns using numpy.
import numpy as np

# here I use a list of 3 lists as an example
nlist = 3

# generate fake data `listoflists`
listoflists = []
for i in range(nlist):
    listoflists.append([i] * 500)

# stack it into a 2D numpy array
outarr = np.vstack(listoflists)

# save it into a text file, one inner list per column
np.savetxt("test.dat", outarr.T)
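To get the lists back later (not shown in the answer itself), np.loadtxt reverses the process; a brief sketch:
import numpy as np

# transpose restores one row per original inner list
inarr = np.loadtxt("test.dat").T
listoflists = inarr.tolist()
print(len(listoflists), len(listoflists[0]))  # 3 500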
I do recommend cPickle in this case, but you should take some "extra" steps:
zlib-compress the output.
Encode or encrypt it.
By doing this you gain these advantages:
zlib will reduce the file size.
Encryption may keep pickle-based hijacks at bay.
Yes, pickle is not safe! See this.
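A minimal sketch of the compress-then-store idea (encryption omitted; the data and file name are made up):
import zlib
import pickle  # cPickle on Python 2

data = {'a': [45, 155856, 26, 98536], 'b': [88, 68, 9454]}

# pickle first, then compress the resulting byte string
blob = zlib.compress(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
with open('data.zpkl', 'wb') as f:
    f.write(blob)

# read back: decompress, then unpickle
with open('data.zpkl', 'rb') as f:
    restored = pickle.loads(zlib.decompress(f.read()))

print(restored == data)  # True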
first question here. I'll try to be concise.
I am generating multiple arrays containing feature information for a machine learning application. As the arrays do not have equal dimensions, I store them in a dictionary rather than in an array. There are two different kinds of features, so I am using two different dictionaries.
I also generate labels to go with the features. These labels are stored in arrays. Additionally, there are strings containing the exact parameters used for running the script and a timestamp.
All in all it looks like this:
import numpy as np
feature1 = {}
feature2 = {}
label1 = np.array([])
label2 = np.array([])
docString = 'Commands passed to the script were...'
# features look like this:
feature1 = {'case 1': np.array([1, 2, 3, ...]),
            'case 2': np.array([2, 1, 3, ...]),
            'case 3': np.array([2, 3, 1, ...]),
            and so on... }
Now my goal would be to do this:
np.savez(outputFile,
         saveFeature1=feature1,
         saveFeature2=feature2,
         saveLabel1=label1,
         saveLabel2=label2,
         saveString=docString)
This seemingly works (i.e. such a file is saved with no error thrown and can be loaded again). However, when I try to load for example the feature from the file again:
loadedArchive = np.load(outFile)
loadedFeature1 = loadedArchive['saveFeature1']
loadedString = loadedArchive['saveString']
Then instead of getting a dictionary back, I get a 0-dimensional numpy array whose contents I don't know how to access:
In []: loadedFeature1
Out[]:
array({'case 1': array([1, 2, 3, ...]),
'case 2': array([2, 3, 1, ...]),
..., }, dtype=object)
Also strings become arrays and get a strange datatype:
In []: loadedString.dtype
Out[]: dtype('|S20')
So in short, I am assuming this is not how it is done correctly. However I would prefer not to put all variables into one big dictionary because I will retrieve them in another process and would like to just loop over the dictionary.keys() without worrying about string comparison.
Any ideas are greatly appreciated.
Thanks
As #fraxel has already suggested, using pickle is a much better option in this case. Just save a dict with your items in it.
However, be sure to use pickle with a binary protocol. By default, it less efficient format, which will result in excessive memory usage and huge files if your arrays are large.
import pickle

saved_data = dict(saveFeature1=feature1,
                  saveFeature2=feature2,
                  saveLabel1=label1,
                  saveLabel2=label2,
                  saveString=docString)

with open('test.dat', 'wb') as outfile:
    pickle.dump(saved_data, outfile, protocol=pickle.HIGHEST_PROTOCOL)
That having been said, let's take a look at what's happening in more detail for illustrative purposes.
numpy.savez expects each item to be an array. In fact, it calls np.asarray on everything you pass in.
If you turn a dict into an array, you'll get an object array. E.g.
import numpy as np

test = {'a': np.arange(10), 'b': np.arange(20)}
testarr = np.asarray(test)
Similarly, if you make an array out of a string, you'll get a string array:
In [1]: np.asarray('abc')
Out[1]:
array('abc',
dtype='|S3')
However, because of a quirk in the way object arrays are handled, if you pass in a single object (in your case, your dict) that isn't a tuple, list, or array, you'll get a 0-dimensional object array.
This means that you can't index it directly. In fact, doing testarr[0] will raise an IndexError. The data is still there, but you need to add a dimension first, so you have to do yourdictionary = testarr.reshape(-1)[0].
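Continuing the small test example above, a sketch of that recovery step:
print(testarr.shape)  # () -- zero-dimensional, so testarr[0] raises IndexError

# add a dimension, then index into it
recovered = testarr.reshape(-1)[0]
print(recovered is test)  # True -- the original dict comes back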
If all of this seems clunky, it's because it is. Object arrays are essentially always the wrong answer. (Although asarray should arguably pass in ndmin=1 to array, which would solve this particular problem, but potentially break other things.)
savez is intended to store arrays, rather than arbitrary objects. Because of the way it works, it can store completely arbitrary objects, but it shouldn't be used that way.
If you did want to use it, though, a quick workaround would be to do:
np.savez(outputFile,
         saveFeature1=[feature1],
         saveFeature2=[feature2],
         saveLabel1=[label1],
         saveLabel2=[label2],
         saveString=docString)
And you'd then access things with
loadedArchive = np.load(outFile)
loadedFeature1 = loadedArchive['saveFeature1'][0]
loadedString = str(loadedArchive['saveString'])
However, this is clearly much more clunky than just using pickle. Use numpy.savez when you're just saving arrays. In this case, you're saving nested data structures, not arrays.
If you need to save your data in a structured way, you should consider using the HDF5 file format (http://www.hdfgroup.org/HDF5/). It is very flexible, easy to use, efficient, and other software might already support it (HDFView, Mathematica, Matlab, Origin..). There is a simple python binding called h5py.
You can store datasets in a filesystem like structure and define attributes for each dataset, like a dictionary. For example:
import numpy as np
import h5py
# some data
table1 = np.array([(1,1), (2,2), (3,3)], dtype=[('x', float), ('y', float)])
table2 = np.ones(shape=(3,3))
# save to data to file
h5file = h5py.File("test.h5", "w")
h5file.create_dataset("Table1", data=table1)
h5file.create_dataset("Table2", data=table2, compression=True)
# add attributes
h5file["Table2"].attrs["attribute1"] = "some info"
h5file["Table2"].attrs["attribute2"] = 42
h5file.close()
Reading the data is also simple, and you can even load just a few elements out of a large file if you want:
h5file = h5py.File("test.h5", "r")

# read from file (numpy-like behavior)
print(h5file["Table1"]["x"][:2])

# read everything into memory (real numpy array)
print(np.array(h5file["Table2"]))

# read attributes
print(h5file["Table2"].attrs["attribute1"])
More features and possibilities are found in the documentation and on the websites (the Quick Start Guide might be of interest).
2022 Update
There is a much simpler solution to this question using Numpy's np.load(..., allow_pickle=True).
I first save an npz file as described in the question.
import numpy as np

feature1 = {'case 1': np.arange(2), 'case 2': np.arange(3)}
feature2 = {'case 3': np.arange(4), 'case 4': np.arange(5)}
label1 = np.arange(6)
label2 = np.arange(7)
docstring = 'Commands passed to the script were...'

np.savez('test', feature1=feature1, feature2=feature2,
         label1=label1, label2=label2, docstring=docstring)
Now one can read the file as follows
data = np.load('test.npz', allow_pickle=True)

# feature1 was pickled into a 0-d object array: extract the dict with .item()
feature1 = data["feature1"].item()
print("feature1 =", feature1)

# label1 is a normal array already
label1 = data["label1"]
print("label1 =", label1)
It produces the following
feature1 = {'case 1': array([0, 1]), 'case 2': array([0, 1, 2])}
label1 = [0 1 2 3 4 5]
Put all your variables into an object and then use Pickle. It's a better way to store state information.
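A brief sketch of that idea; the class name and attributes are made up, and note that the class definition must be available again when you unpickle in a later session:
import pickle

class SessionState:
    """Hypothetical container for everything worth keeping between runs."""
    def __init__(self):
        self.features = {}
        self.labels = []

state = SessionState()
state.features['case 1'] = [1, 2, 3]

# save the whole object
with open('state.pkl', 'wb') as f:
    pickle.dump(state, f)

# restore it later
with open('state.pkl', 'rb') as f:
    state = pickle.load(f)

print(state.features)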