I am developing an application for running simulations and optimization over graphs (for instance the Travelling Salesman Problem and various other problems).
Currently I use a 2D numpy array as the graph representation and store it as a list of lists: on every dump into the DB I call ndarray.tolist(), and after every load I call np.array() on the stored list.
Is there a supported way to store a numpy ndarray in psql? Unfortunately, numpy arrays are not JSON-serializable by default.
I also thought about converting the numpy array into a scipy.sparse matrix, but those are not JSON-serializable either.
json.dumps(np_array.tolist()) is the way to convert a numpy array to JSON. np.array(json.loads(json.dumps(np_array.tolist()))) is how you get it back (there is no fromlist method on numpy arrays).
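A minimal sketch of that round trip, assuming you also want to keep the dtype and shape next to the data so the array comes back unchanged. The `graph` matrix and the `dtype`/`shape`/`data` keys are made up for illustration and are not part of any library.

```python
import json
import numpy as np

# Hypothetical 3-node weighted adjacency matrix
graph = np.array([[0.0, 2.5, 1.0],
                  [2.5, 0.0, 4.0],
                  [1.0, 4.0, 0.0]])

# Dump: plain lists plus the metadata needed to rebuild the array
payload = json.dumps({
    "dtype": str(graph.dtype),
    "shape": graph.shape,      # the tuple is written as a JSON list
    "data": graph.tolist(),
})

# Load: rebuild the ndarray from the decoded JSON
record = json.loads(payload)
restored = np.array(record["data"], dtype=record["dtype"]).reshape(record["shape"])
```

The resulting string can go into a text or JSON column; whether that is the best fit for psql depends on how you need to query it.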
In my project, I have requests consisting of a batch of several multidimensional numpy arrays. I wonder if it's possible to set numpy arrays as environment variables to process the upcoming traffic. If not, are there any other encoding methods to convert numpy arrays to the appropriate format that works particularly well in this case?
Short version
Given a built-in quaternion data type, how can I view a numpy array of quaternions as a numpy array of floats with an extra dimension of size 4 (without copying memory)?
Long version
Numpy has built-in support for floats and complex floats. I need to use quaternions -- which generalize complex numbers, but rather than having two components, they have four. There's already a very nice package that uses the C API to incorporate quaternions directly into numpy, which seems to do all the operations perfectly fast. There are a few more quaternion functions that I need to add to it, but I think I can mostly handle those.
However, I would also like to be able to use these quaternions in other functions that I need to write using the awesome numba package. Unfortunately, numba cannot currently deal with custom types. But I don't need the fancy quaternion functions in those numba-ed functions; I just need the numbers themselves. So I'd like to be able to just re-cast an array of quaternions as an array of floats with one extra dimension (of size 4). In particular, I'd like to just use the data that's already in the array without copying, and view it as a new array. I've found the PyArray_View function, but I don't know how to implement it.
(I'm pretty confident the data are held contiguously in memory, which I assume would be required for having a simple view of them. Specifically, elsize = 8*4 and alignment = 8 in the quaternion package.)
Turns out that was pretty easy. The magic of numpy means it's already possible. While thinking about this, I just tried the following with complex numbers:
import numpy as np
a = np.array([1+2j, 3+4j, 5+6j])
a.view(np.float64).reshape(a.shape[0],2)
And this gave exactly what I was looking for. Somehow the same basic idea works with the quaternion type. I guess the internals just rely on that elsize, divide by sizeof(float) and use that to set the new size in the last dimension???
To answer my own question then, the same idea can be applied to the quaternion module:
import numpy as np, quaternions
a = np.array([np.quaternion(1,2,3,4), np.quaternion(5,6,7,8), np.quaternion(9,0,1,2)])
a.view(np.float64).reshape(a.shape[0],4)
The view transformation and reshaping combined seem to take about 1 microsecond on my laptop, independent of the size of the input array (presumably because there's no memory copying, other than a few members in some basic python object).
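One way to check the no-copy claim, sketched here with the built-in complex type rather than the quaternion type (so it runs without the extra package): np.shares_memory reports whether two arrays overlap, and a write through the float view shows up in the original array.

```python
import numpy as np

a = np.array([1+2j, 3+4j, 5+6j])

# Reinterpret each complex128 as two float64 values, no copy involved
floats = a.view(np.float64).reshape(a.shape[0], 2)

# Both arrays are backed by the same buffer
assert np.shares_memory(a, floats)

# Writing through the view changes the original array
floats[0, 1] = 20.0   # imaginary part of a[0]
print(a[0])
```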
The above is valid for simple 1-d arrays of quaternions. To apply it to general shapes, I just write a function inside the quaternion namespace:
def as_float_array(a):
    "View the quaternion array as an array of floats with one extra dimension of size 4"
    return a.view(np.float64).reshape(a.shape+(4,))
Different shapes don't seem to slow the function down significantly.
Also, it's easy to convert back from a float array to a quaternion array:
def as_quat_array(a):
    "View a float array as an array of quaternions (the last dimension must be a multiple of 4)"
    if a.shape[-1] == 4:
        return a.view(np.quaternion).reshape(a.shape[:-1])
    return a.view(np.quaternion).reshape(a.shape[:-1]+(a.shape[-1]//4,))
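The same pair of helpers can be tried on general shapes using complex128 as a stand-in for the quaternion type, with 2 components instead of 4. This is only an analogy under that assumption; `as_complex_array` is a name invented here, not part of the quaternion package.

```python
import numpy as np

def as_float_array(a):
    "View a complex array as floats with one extra dimension of size 2"
    return a.view(np.float64).reshape(a.shape + (2,))

def as_complex_array(f):
    "View a float array whose last dimension is 2 as a complex array"
    return f.view(np.complex128).reshape(f.shape[:-1])

# A 2x3 complex array: real parts 0..5, imaginary parts 6..11
z = (np.arange(6) + 1j * np.arange(6, 12)).reshape(2, 3)

f = as_float_array(z)        # shape (2, 3, 2)
back = as_complex_array(f)   # shape (2, 3), still the same memory as z
```

Both directions rely on the data being contiguous along the last axis, which matches the contiguity assumption made in the question.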
I generate feature vectors for examples from a large amount of data, and I would like to store them incrementally while I am reading the data. The feature vectors are numpy arrays. I do not know the number of numpy arrays in advance, and I would like to store/retrieve them incrementally.
Looking at pytables, I found two options:
Arrays: They require a predetermined size, and I am not sure how computationally efficient appending is.
Tables: The column types do not support lists or arrays.
If it is a plain numpy array, you should probably use Extendable Arrays (EArray) http://pytables.github.io/usersguide/libref/homogenous_storage.html#the-earray-class
If you have a numpy structured array, you should use a Table.
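A minimal sketch of the EArray route, assuming every feature vector has the same fixed length. The file name, the node name `features` and the value of `n_features` are placeholders; the generated vectors are dummy data.

```python
import os
import tempfile

import numpy as np
import tables

n_features = 4  # hypothetical length of each feature vector
path = os.path.join(tempfile.mkdtemp(), "features.h5")

with tables.open_file(path, mode="w") as h5:
    # The 0 in the shape marks the axis that can grow on append
    store = h5.create_earray(h5.root, "features",
                             atom=tables.Float64Atom(),
                             shape=(0, n_features))
    for i in range(3):
        vector = np.full(n_features, float(i))  # stand-in for a real feature vector
        store.append(vector[np.newaxis, :])     # append expects shape (k, n_features)

# Read everything back as one 2D numpy array
with tables.open_file(path, mode="r") as h5:
    loaded = h5.root.features[:]
```

Appending one row at a time is the simplest form; buffering several vectors and appending them as one block is a common variation if the per-call overhead matters.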
Can't you just store them in a list? Your code is presumably a loop that reads the data and generates one example per iteration. Create a list outside the loop and append each vector to it for storage!
array = []
for row in file:
    # here is your code that creates the vector
    array.append(vector)
Then, after you have gone through the whole file, you have a list with all of your generated vectors! Hopefully that is what you need; the question was a bit unclear, so next time please provide some code.
Oh, and you did say you wanted pytables, but I don't think it's necessary, especially because of the limitations you mentioned
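A runnable version of that idea, assuming all vectors have the same length and everything fits in memory. The `rows` list stands in for the file, and the parsing step is a placeholder for the real feature extraction.

```python
import numpy as np

rows = ["1 2 3", "4 5 6", "7 8 9"]  # hypothetical stand-in for lines of a file

vectors = []
for row in rows:
    # Placeholder feature extraction: parse the numbers on the line
    vector = np.array([float(token) for token in row.split()])
    vectors.append(vector)

# Stack the collected 1D vectors into one 2D array, one row per example
features = np.vstack(vectors)
```

Appending to a Python list and stacking once at the end avoids repeatedly reallocating a numpy array inside the loop.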
I have some NumPy arrays that are pickled and stored in MongoDB using the bson module. For instance, if x is a NumPy array, then I set a field of a MongoDB record to:
bson.binary.Binary(x.dumps())
My question is whether it is possible to recover a subset of the array x without reloading the entire array via np.loads(). So, first, how can I get MongoDB to only give me back a chunk of the binary array, and second, how can I turn that chunk into a NumPy array? I should mention that I also already have all the NumPy metadata for the array, such as its dimensions and datatype.
A concrete example might be that I have a 2-dimensional array of size (100000,10) with datatype np.float64 and I want to retrieve just x[50,10].
I can not say for sure, but checking the api docs of BSON C++ I get the idea that it was not designed for partial retrieval...
If you can at all, consider using pytables, which is designed for large data and inter-operating nicely with numpy. Mongo is great for certain distributed applications, though, while pytables is not.
If you store the array directly inside MongoDB, you can also try using the $slice operator to get a contiguous subset of an array. You could linearize your 2D array into a 1D array, and the $slice operator will get you matrix rows, but if you want to select columns or, more generally, noncontiguous indices, then you're out of luck.
Background on $slice.
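For the second half of the question (turning a chunk of bytes into an array), here is a sketch that assumes the array was stored as raw C-order bytes via x.tobytes() rather than pickled with x.dumps(), since a pickle adds a header that makes the offsets harder to compute. The slice of `blob` below only stands in for whatever partial read the storage layer offers; it is not a MongoDB call, and the array size is reduced for the example.

```python
import numpy as np

# Metadata that is assumed to be known in advance
shape = (1000, 10)
dtype = np.dtype(np.float64)

x = np.arange(shape[0] * shape[1], dtype=dtype).reshape(shape)
blob = x.tobytes()  # raw row-major bytes, no header

# Byte range occupied by row 50
row = 50
row_bytes = shape[1] * dtype.itemsize
start = row * row_bytes
chunk = blob[start:start + row_bytes]  # stand-in for a partial read

# Interpret the chunk using the known dtype
row_values = np.frombuffer(chunk, dtype=dtype)
```

The same offset arithmetic applies to the linearized $slice approach, with element counts instead of byte counts: row i covers elements i*ncols up to (i+1)*ncols.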
I want to create a MATLAB-like cell array in Numpy. How can I accomplish this?
Matlab cell arrays are most similar to Python lists, since they can hold any object, but scipy.io.loadmat imports them as numpy object arrays, i.e. arrays with dtype=object.
To be honest, though, you are just as well off using Python lists. If you are holding general objects you will lose almost all of the advantages of numpy arrays (which are designed to hold a sequence of values that each take the same amount of memory).
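A short sketch of both options side by side, with made-up contents. The object array is what you would get back from loadmat for a cell array; the list is the plain-Python equivalent.

```python
import numpy as np

# Option 1: a plain Python list holding arbitrary objects
cell_list = ["text", np.arange(3), {"key": 1}]

# Option 2: a numpy object array, filled element by element
cell_array = np.empty(3, dtype=object)
cell_array[0] = "text"
cell_array[1] = np.arange(3)
cell_array[2] = {"key": 1}

# Each element keeps its own type; numpy only stores references
kinds = [type(item).__name__ for item in cell_array]
```

Creating the array with np.empty and assigning elements one at a time avoids numpy trying to broadcast nested sequences into extra dimensions.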