Save & Retrieve Numpy Array From String - python

I would like to convert a multi-dimensional Numpy array into a string and, later, convert that string back into an equivalent Numpy array.
I do not want to save the Numpy array to a file (e.g. via the savetxt and loadtxt interface).
Is this possible?

You could use the array's tostring method and np.fromstring (in current NumPy these are spelled tobytes and np.frombuffer):
In [138]: x = np.arange(12).reshape(3,4)
In [139]: x.tostring()
Out[139]: '\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00\n\x00\x00\x00\x0b\x00\x00\x00'
In [140]: np.fromstring(x.tostring(), dtype=x.dtype).reshape(x.shape)
Out[140]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Note that the string returned by tostring saves neither the dtype nor the shape of the original array. You have to re-supply those yourself.
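As a minimal sketch of the same round trip on a current NumPy, assuming tobytes and np.frombuffer as the modern spellings (the int32 dtype is chosen here only to make the byte count predictable):

```python
import numpy as np

x = np.arange(12, dtype=np.int32).reshape(3, 4)

# Raw buffer only: neither dtype nor shape is stored in it.
raw = x.tobytes()

# dtype and shape have to be supplied again when reading back.
# frombuffer returns a read-only view; copy() makes it writable.
restored = np.frombuffer(raw, dtype=x.dtype).reshape(x.shape).copy()
```

Passing the wrong dtype to np.frombuffer does not raise an error as long as the byte count fits; it silently reinterprets the bytes, which is why the dtype has to travel alongside the data.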
Another option is to use np.save, np.savez, or np.savez_compressed to write to an io.BytesIO object (instead of a file):
import numpy as np
import io
x = np.arange(12).reshape(3,4)
output = io.BytesIO()
np.savez(output, x=x)
The string (a bytes object in Python 3) is given by
content = output.getvalue()
And given the string, you can load it back into an array using np.load:
data = np.load(io.BytesIO(content))
x = data['x']
This method stores the dtype and shape as well.
For large arrays, np.savez_compressed will give you the smallest string.
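If a real text string is needed rather than bytes, one possible sketch is to wrap the .npy bytes in base64. The helper names array_to_text and text_to_array are made up for this example and are not part of NumPy:

```python
import base64
import io

import numpy as np


def array_to_text(arr):
    """Serialize an array to an ASCII string (hypothetical helper)."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)  # .npy keeps dtype and shape
    return base64.b64encode(buf.getvalue()).decode("ascii")


def text_to_array(text):
    """Inverse of array_to_text (hypothetical helper)."""
    buf = io.BytesIO(base64.b64decode(text))
    return np.load(buf, allow_pickle=False)


x = np.arange(12).reshape(3, 4)
text = array_to_text(x)
y = text_to_array(text)
```

The base64 step makes the result about a third larger than the raw bytes, so it is only worth it when the destination cannot hold binary data.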
Similarly, you could use np.savetxt and np.loadtxt:
import numpy as np
import io
x = np.arange(12).reshape(3,4)
output = io.BytesIO()
np.savetxt(output, x)
content = output.getvalue()
# '0.000000000000000000e+00 1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00\n4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00 7.000000000000000000e+00\n8.000000000000000000e+00 9.000000000000000000e+00 1.000000000000000000e+01 1.100000000000000000e+01\n'
x = np.loadtxt(io.BytesIO(content))
print(x)
Summary:
tostring gives you the underlying data as a string, with no dtype or shape
save is like tostring except it also saves dtype and shape (.npy format)
savez saves the array in npz format (uncompressed)
savez_compressed saves the array in compressed npz format
savetxt formats the array in a human-readable format

If you want to save the dtype as well, you can also use Python's pickle module.
import pickle
import numpy as np
a = np.ones(4)
string = pickle.dumps(a)
pickle.loads(string)

tostring and np.fromstring do NOT work for this anymore. The replacement is tobytes, but it serializes the array as bytes and not as a string. To get a real string and parse it back, use ast.literal_eval.
If the elements are 2D floats, ast.literal_eval() may not cope with very deeply nested lists of lists when reading the data back.
Therefore, it is better to convert the nested list to a dict and dump that as a string.
When loading a saved dump, ast.literal_eval() handles a dict string better. Convert the string to a dict, and then the dict back to an array.
import ast
import numpy as np
k = np.array([[[0.09898942, 0.22804536],[0.06109612, 0.19022354],[0.93369348, 0.53521671],[0.64630094, 0.28553219]],[[0.94503154, 0.82639528],[0.07503319, 0.80149062],[0.1234832 , 0.44657691],[0.7781163 , 0.63538195]]])
# tolist() yields plain Python floats, whose repr ast.literal_eval can parse
d = dict(enumerate(k.flatten().tolist(), 1))
d = str(d)  # dump as string (pickle and other packages produce bytes)
m = ast.literal_eval(d)  # convert the string back to a dict
m = np.fromiter(m.values(), dtype=float).reshape(k.shape)  # dict to array; the shape must be re-supplied

I use JSON to do that:
1. Encode to JSON
The first step is to encode it to JSON:
import json
import numpy as np
np_array = np.array(
[[[0.2123842 , 0.45560746, 0.23575005, 0.40605248],
[0.98393952, 0.03679023, 0.6192098 , 0.00547201],
[0.13259942, 0.69461942, 0.8781533 , 0.83025555]],
[[0.8398132 , 0.98341709, 0.25296835, 0.84055815],
[0.27619265, 0.55544911, 0.56615598, 0.058845 ],
[0.76205113, 0.18001961, 0.68206229, 0.47252472]]])
json_array = json.dumps(np_array.tolist())
print("converted to: " + str(type(json_array)))
print("looks like:")
print(json_array)
Which results in this:
converted to: <class 'str'>
looks like:
[[[0.2123842, 0.45560746, 0.23575005, 0.40605248], [0.98393952, 0.03679023, 0.6192098, 0.00547201], [0.13259942, 0.69461942, 0.8781533, 0.83025555]], [[0.8398132, 0.98341709, 0.25296835, 0.84055815], [0.27619265, 0.55544911, 0.56615598, 0.058845], [0.76205113, 0.18001961, 0.68206229, 0.47252472]]]
2. Decode back to Numpy
To convert it back to a numpy array you can use:
list_from_json = json.loads(json_array)
np.array(list_from_json)
print("converted to: " + str(type(list_from_json)))
print("converted to: " + str(type(np.array(list_from_json))))
print(np.array(list_from_json))
Which gives you:
converted to: <class 'list'>
converted to: <class 'numpy.ndarray'>
[[[0.2123842 0.45560746 0.23575005 0.40605248]
[0.98393952 0.03679023 0.6192098 0.00547201]
[0.13259942 0.69461942 0.8781533 0.83025555]]
[[0.8398132 0.98341709 0.25296835 0.84055815]
[0.27619265 0.55544911 0.56615598 0.058845 ]
[0.76205113 0.18001961 0.68206229 0.47252472]]]
I like this method because the string is easy to read. Although in this case you did not need to store it in a file, this format works for that as well.
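One limitation of the plain tolist() approach is that the dtype is not recorded, so a float32 array comes back as float64. A possible sketch that stores dtype and shape next to the data follows; the helper names and the "dtype"/"shape"/"data" keys are assumptions made for this example:

```python
import json

import numpy as np


def array_to_json(arr):
    """Encode data plus dtype and shape (hypothetical helper)."""
    payload = {
        "dtype": str(arr.dtype),
        "shape": arr.shape,          # becomes a JSON list
        "data": arr.ravel().tolist(),
    }
    return json.dumps(payload)


def json_to_array(text):
    """Inverse of array_to_json (hypothetical helper)."""
    payload = json.loads(text)
    flat = np.array(payload["data"], dtype=payload["dtype"])
    return flat.reshape(payload["shape"])


original = np.linspace(0.0, 1.0, 24, dtype=np.float32).reshape(2, 3, 4)
encoded = array_to_json(original)
decoded = json_to_array(encoded)
```

Storing the shape separately also keeps empty arrays such as shape (0, 3) intact, which a nested list alone cannot express.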

Related

give a key value to an np array

I have an array that is composed of multiple np arrays. I want to give every array a key and convert it to an HDF5 file.
arr = np.concatenate((Hsp_data, Hsp_rdiff, PosC44_WKS, PosX_WKS, PosY_WKS, PosZ_WKS,
RMS_Acc_HSp, RMS_Acc_Rev, RMS_Schall, Rev_M, Rev_rdiff, X_rdiff, Z_I, Z_rdiff, time), axis=1)
d1 = np.random.random(size=(7501, 15))
hf = h5py.File('data.hdf5', 'w')
hf.create_dataset('arr', data=d1)
hf.close()
hf = h5py.File('data.hdf5', 'r+')
print(hf.key)
This is what I have done so far, and I get this error: AttributeError: 'File' object has no attribute 'key'.
I want the final answer to look like this when printing the keys:
<KeysViewHDF5 ['Hsp_M', 'Hsp_rdiff', 'PosC44_WKS', 'PosX_WKS', 'PosY_WKS', 'PosZ_WKS', 'RMS_Acc_HSp', 'RMS_Acc_Rev', 'RMS_Schall', 'Rev_M', 'Rev_rdiff', 'X_rdiff', 'Z_I', 'Z_rdiff']>
Any ideas?
You/we need a clearer idea of how the original .mat is laid out. In h5py, the file is viewed as a nested set of groups, which are dict-like. Hence the use of keys(). At the ends of that nesting are datasets, which can be loaded as (or saved from) numpy arrays. The datasets/arrays don't have keys; it's the file and groups that have those.
Creating your file:
In [69]: import h5py
In [70]: d1 = np.random.random(size=(7501, 15))
...: hf = h5py.File('data.hdf5', 'w')
...: hf.create_dataset('arr', data=d1)
...: hf.close()
Reading it:
In [71]: hf = h5py.File('data.hdf5', 'r+')
In [72]: hf.keys()
Out[72]: <KeysViewHDF5 ['arr']>
In [73]: hf['arr']
Out[73]: <HDF5 dataset "arr": shape (7501, 15), type "<f8">
In [75]: arr = hf['arr'][:]
In [76]: arr.shape
Out[76]: (7501, 15)
'arr' is the name of the dataset that we created at the start. In this case there's no group; just the one dataset. [75] loads the dataset to an array which I called arr, but that name could be anything (like the original d1).
Arrays and datasets may have a compound dtype, which has named fields. I don't know if MATLAB uses those or not.
Without knowledge of the group and dataset layout in the original .mat, it's hard to help you. And when looking at datasets, pay particular attention to shape and dtype.
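If the goal is simply one key per array, a rough sketch is to create one dataset per name instead of concatenating everything into a single 'arr' dataset. The three names below are taken from the question; the random data and the temporary file path are placeholders, not the asker's real data:

```python
import os
import tempfile

import h5py
import numpy as np

# Placeholder arrays standing in for the real columns.
names = ["Hsp_M", "Hsp_rdiff", "PosC44_WKS"]
columns = {name: np.random.random(size=(7501, 1)) for name in names}

path = os.path.join(tempfile.mkdtemp(), "data.hdf5")

# One dataset per name, so each name shows up as a key of the file.
with h5py.File(path, "w") as hf:
    for name, values in columns.items():
        hf.create_dataset(name, data=values)

# keys() is a method, hence hf.keys() rather than hf.key.
with h5py.File(path, "r") as hf:
    stored_names = list(hf.keys())
    first = hf["Hsp_M"][:]
```

Opening the file in a with block also closes it automatically, which replaces the explicit hf.close() call.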

numpy savetxt: how to save an integer and a float numpy array into the same row of the file

I have a set of integers and a set of numpy arrays. I would like to use np.savetxt to store each integer and its corresponding array in the same row, with rows separated by \n.
In the txt file, each row should look like the following:
12345678 0.282101 -0.343122 -0.19537 2.001613 1.034215 0.774909 0.369273 0.219483 1.526713 -1.637871
The float numbers should be separated by spaces.
I tried to use the following code to solve this:
np.savetxt("a.txt", np.column_stack([ids, a]), newline="\n", delimiter=' ',fmt='%d %.06f')
But somehow I cannot figure out the correct formatting for integers and floats.
Any suggestions?
Please specify what a "set of integers" and "set of numpy arrays" are: from your example it looks as though ids is a list or 1d numpy array, and a is a 2d numpy array, but this is not clear from your question.
If you're trying to combine a list of integers with a 2d array, you should probably avoid np.savetxt and convert to a string first:
import numpy as np
ids = [1, 2, 3, 4, 5]
a = np.random.rand(5, 5)
with open("filename.txt", "w") as f:
    for each_id, row in zip(ids, a):
        line = "%d " % each_id + " ".join(format(x, "0.8f") for x in row) + "\n"
        f.write(line)
Gives the output in filename.txt:
1 0.38325380 0.80964789 0.83787527 0.83794886 0.93933360
2 0.44639702 0.91376799 0.34716179 0.60456704 0.27420285
3 0.59384528 0.12295988 0.28452126 0.23849965 0.08395266
4 0.05507753 0.26166780 0.83171085 0.17840250 0.66409724
5 0.11363045 0.40060894 0.90749637 0.17903019 0.15035594
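For comparison, np.savetxt itself accepts a sequence of formats, one per column, so a sketch along the following lines may also work. The sample values are shortened from the question, and the output goes to an in-memory buffer only to keep the example self-contained:

```python
import io

import numpy as np

ids = np.array([12345678, 23456789, 34567890])
a = np.array([[0.282101, -0.343122],
              [1.034215, 0.774909],
              [0.369273, 0.219483]])

# column_stack turns the ids into floats; "%d" prints them as integers again.
table = np.column_stack([ids, a])

# One format per column: an integer first, then one float format per column of a.
fmt = ["%d"] + ["%.6f"] * a.shape[1]

buf = io.StringIO()
np.savetxt(buf, table, fmt=fmt, delimiter=" ")
text = buf.getvalue()
first_line = text.splitlines()[0]
```

Because the ids pass through float64, this is only safe for integers small enough to be represented exactly (below 2**53); for larger ids the manual loop is the safer choice.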

Python: save csv with numpy and custom format in row's columns

I'm having a problem trying to save an array of arrays as csv using numpy.
The array of arrays:
a = numpy.asarray(output)
print(a)
It outputs (I kept only one item in the array for the example):
[
['0.014897877' '-0.08609378' '-0.17572552' '0.14012611' '-0.034625955'
'-0.008969465' '-0.020985316' '-0.13464274' '-0.0643522' '0.001874223'
'0.093106195' '0.01346092' '0.17929164' '-0.019824918' '0.125187'
'-0.030954123' '0.06412735' '-0.025960516' '-0.14795333' '0.0026848102'
'-0.15260652' '0.033525705' '0.03411285' '-0.1506365' '-0.028831841'
'-0.07956695' '0.15328659' '0.0019746106' '0.0031366237' '0.07046314'
'0.052344024' '0.05874188' '0.09005664' '0.068730354' '-0.08379446'
'0.012004613' '0.10616668' '0.03131739' '-0.07437438' '-0.1299052'
'0.15480998' '0.11262169' '0.032479584' '-0.08733604' '-0.016337194'
'-0.17954016' '0.086337775' '-0.06776995' '0.10646294' '0.10496249'
'0.004988468' '-0.11673802' '-0.0628141' '0.096142575' '0.03181175'
'0.008554184' '-0.123010606' '0.0027755483' '-0.04792862' '-0.11383578'
'-0.0071639013' '-0.012682551' '0.04330155' '0.13239346' '0.06173887'
'0.04698543' '-0.10461798' '0.10343763' '-0.14041597' '0.04108579'
'-0.0041574505' '0.06904513' '0.06497475' '-0.054304164' '0.11304527'
'0.016850471' '0.008820267' '-0.056193784' '-0.021828642'
'-0.016804473' '-0.15866709' '0.14507978' '-0.033435807' '0.1639024'
'0.069541104' '0.01782177' '0.0115199955' '0.016909525' '-0.050565705'
'-0.16228318' '-0.028010793' '0.01789277' '0.09902625' '-0.00567781'
'0.09101357' '-0.01005199' '-0.01796569' '0.13880818' '-0.059297908'
'-0.120813474' '0.05444644' '0.037819214' '-0.029543832' '0.0038782186'
'0.13723414' '0.17920697' '0.036572605' '-0.06763435' '-0.01860031'
'-0.021825034' '-0.025996473' '0.1434528' '-0.08229978'
'-0.00014224448' '-0.058567036' '0.023525652' '-0.18324575'
'-0.11447681' '-0.063667014' '0.07024462' '-0.010663624' '-0.043397065'
'0.24583343' '0.0026523445' '0.006540918' '-0.005765302' '0.008451278'
'0.2030176' '5ca5204047427b2e5b29a5e2']
]
When I try to save using this format (128 float columns and the last as a string):
numpy.savetxt("foo.csv", a, delimiter=",", fmt=''.join(['%1.20f,']*128) + '%s')
the console outputs:
numpy.savetxt("foo.csv", a, delimiter=",", fmt=''.join(['%1.20f,']*128) + '%s')
TypeError: Mismatch between array dtype ('<U32') and format specifier ('%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%1.20f,%s')
Any help?
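The error message points at the cause: the array has dtype '<U32', meaning every cell is a string, so a %f specifier cannot be applied to it. Two possible ways around that are sketched below, using a hypothetical four-column row (three numeric strings plus the ID) in place of the 129-column one:

```python
import io

import numpy as np

# Shortened stand-in for the real row: numeric strings followed by an ID string.
a = np.asarray([["0.014897877", "-0.08609378", "0.2030176",
                 "5ca5204047427b2e5b29a5e2"]])

# Option 1: every cell is already a string, so write it unchanged with %s.
buf = io.StringIO()
np.savetxt(buf, a, delimiter=",", fmt="%s")
as_is = buf.getvalue()

# Option 2: convert the numeric cells to float and format them in plain Python.
lines = [
    ",".join(["%1.20f" % float(v) for v in row[:-1]] + [str(row[-1])])
    for row in a
]
formatted = "\n".join(lines) + "\n"
```

Option 1 keeps the digits exactly as they were in the source strings, while option 2 pads every number to 20 decimal places as in the original format string.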

How to save and load a dictionary of scipy sparse csr matrices?

I have a dict of scipy.sparse.csr_matrix objects as values, with integer keys. How can I save this in a separate file?
If I had a regular ndarray for each entry, then I could serialize it with json, but when I try this with a sparse matrix:
with open('filename.txt', 'w') as f:
f.write(json.dumps(the_matrix))
I get a TypeError:
TypeError: <75x75 sparse matrix of type '<type 'numpy.int64'>'
with 10 stored elements in Compressed Sparse Row format> is not JSON serializable
How can I save my dictionary with keys that are integers and values that are sparse csr matrices?
I faced the same issue trying to save a dictionary whose values are csr_matrix objects. I dumped it to disk using pickle. The file handle should be opened in "wb" mode.
import pickle
with open("csr_dict.pkl", "wb") as f:
    pickle.dump(csr_dict_obj, f)
Load the dict back using:
with open("csr_dict.pkl", "rb") as f:
    csr_dict = pickle.load(f)
Newer scipy versions have a scipy.sparse.save_npz function (and a corresponding load_npz). It saves the attributes of a sparse matrix to a numpy savez zip archive. In the case of a csr, it saves the data, indices and indptr arrays, plus shape.
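Since save_npz stores one matrix per file, a dictionary could be handled with one file per key. The following is only a sketch: the temporary directory and the convention of using the integer key as the file name are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Small stand-in dictionary with integer keys and csr values.
matrices = {
    1: sparse.csr_matrix(np.eye(3, dtype=np.int64)),
    7: sparse.csr_matrix(np.array([[0, 2], [0, 0]])),
}

out_dir = tempfile.mkdtemp()

# Save each matrix to "<key>.npz".
for key, mat in matrices.items():
    sparse.save_npz(os.path.join(out_dir, f"{key}.npz"), mat)

# Rebuild the dictionary from the file names.
restored = {}
for name in os.listdir(out_dir):
    stem, ext = os.path.splitext(name)
    if ext == ".npz":
        restored[int(stem)] = sparse.load_npz(os.path.join(out_dir, name))
```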
scipy.io.savemat can save a sparse matrix in a MATLAB compatible format (csc). There are one or two other scipy.io formats that can handle sparse matrices, but I haven't worked with them.
While a sparse matrix contains numpy arrays it isn't an array subclass, so the numpy functions can't be used directly.
The pickle method for numpy arrays is np.save, and an array that contains objects uses pickle (if possible). So pickling a dictionary of arrays should work.
The sparse dok format is a subclass of dict, so might be pickleable. It might even work with json. But I haven't tried it.
By the way, a plain numpy array can't be jsoned either:
In [427]: json.dumps(np.arange(5))
TypeError: array([0, 1, 2, 3, 4]) is not JSON serializable
In [428]: json.dumps(np.arange(5).tolist())
Out[428]: '[0, 1, 2, 3, 4]'
dok doesn't work either. The keys are tuples of indices,
In [433]: json.dumps(M.todok())
TypeError: keys must be a string
MatrixMarket is a text format that handles sparse:
In [444]: io.mmwrite('test.mm', M)
In [446]: cat test.mm.mtx
%%MatrixMarket matrix coordinate integer general
%
1 5 4
1 2 1
1 3 2
1 4 3
1 5 4
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix, issparse
import re
def save_sparse_csr(filename, **kwargs):
    arg_dict = dict()
    for key, value in kwargs.items():
        if issparse(value):
            value = value.tocsr()
            arg_dict[key+'_data'] = value.data
            arg_dict[key+'_indices'] = value.indices
            arg_dict[key+'_indptr'] = value.indptr
            arg_dict[key+'_shape'] = value.shape
        else:
            arg_dict[key] = value
    np.savez(filename, **arg_dict)

def load_sparse_csr(filename):
    loader = np.load(filename)
    new_d = dict()
    finished_sparse_list = []
    sparse_postfix = ['_data', '_indices', '_indptr', '_shape']
    for key, value in loader.items():
        IS_SPARSE = False
        for postfix in sparse_postfix:
            if key.endswith(postfix):
                IS_SPARSE = True
                key_original = re.match('(.*)'+postfix, key).group(1)
                if key_original not in finished_sparse_list:
                    value_original = csr_matrix((loader[key_original+'_data'], loader[key_original+'_indices'], loader[key_original+'_indptr']),
                                                shape=loader[key_original+'_shape'])
                    new_d[key_original] = value_original.tolil()
                    finished_sparse_list.append(key_original)
                break
        if not IS_SPARSE:
            new_d[key] = value
    return new_d
You can write a wrapper as shown above.

How to set precision of numpy float array when converting from string array?

I am using the following link to convert an array of strings to an array of floats:
Convert String to float array
The data that I am getting is in a weird format:
535. 535. 535. 534.68 534.68 534.68
NumPy is able to convert the string array to float, but something else fails when the data is in the format 535.
Is there a way to convert all 535. to 535.00 in one go?
I am using the following code for the conversion:
import numpy as np
strarray = ["535.","535.","534.68"]
floatarray = np.array(list(filter(None, strarray)), dtype='|S10').astype(float)
print(floatarray)
Convert the strings to float128.
Try this:
import numpy as np
strarray = ["535.","535.","534.68"]
floatarray = np.array(list(filter(None, strarray)), dtype='|S10').astype(np.float128)
print(floatarray)
Output:
[ 535.0 535.0 534.68]
Or use the recommended longdouble:
import numpy as np
strarray = ["535.","535.","534.68"]
floatarray = np.array(list(filter(None, strarray)), dtype='|S10').astype(np.longdouble)
print(floatarray)
Output:
[ 535.0 535.0 534.68]
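Note that the dtype controls how precisely a value is stored, not how many decimals are displayed. If what is needed is text with exactly two decimals (535.00 rather than 535.), formatting the values explicitly is one option. This is a sketch that assumes a fixed two-decimal format is acceptable for the downstream consumer:

```python
import numpy as np

strarray = ["535.", "535.", "534.68"]

# The numeric values themselves; 535. and 535.00 are the same float.
floatarray = np.array(strarray, dtype=float)

# Explicit two-decimal strings, one per element.
texts = ["%.2f" % value for value in floatarray]

# The same formatting applied to the printed form of the whole array.
shown = np.array2string(
    floatarray, formatter={"float_kind": lambda v: "%.2f" % v}
)
```

Here texts is a plain list of strings such as '535.00', which can be passed on to whatever rejects the 535. form.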
