In my project, I have requests consisting of a batch of several multidimensional numpy arrays. I wonder if it's possible to set numpy arrays as environment variables to process the upcoming traffic. If not, are there any other encoding methods to convert numpy arrays to the appropriate format that works particularly well in this case?
Related
I am trying to convert Numpy arrays that are 2D grids varying in time in a HDF5 format for several cases so for example the Numpy array has the following aspects: Case Number (0-100), Time (0-200years), X-grid point location (0-100m), y-grid point location (0-20m) plus the actual data point at this location (e.g. Saturation ranging from 0-100%). I am finding a bit difficult to efficiently store in HDF5 format. Its supposed to be used later to train an RNN model.
I tried just assigning a Numpy to an HDF5 format (don't know if it worked as I didn't retrieve it). I was also confused about the different types of storage options for such a case and the best way to store it such that its easily retrievable to train a NN. I need to use HDF5 format as it seems to optimize the use/retrieval of large data as in the current case..I was also trying to find the best way to learn HDF5 format.. Thank you!
import h5py
import numpy as np
# Create a numpy array
arr = np.random.rand(3,3)
# Create a HDF5 file
with h5py.File('mydata.h5', 'w') as f:
# Write the numpy array to the HDF5 file
f.create_dataset('mydata', data=arr)
You can also use h5py library to append the data to existing hdf5 file instead of creating new one.
I develop an application that will be used for running simulation and optimization over graphs (for instance Travelling salesman problem or various other problems).
Currently I use 2d numpy array as graph representation and always store list of lists and after every load/dump from/into DB I use function np.fromlist, np.tolist() functions respectively.
Is there supported way how could I store numpy ndarray into psql? Unfortunately, np arrays are not JSON-serializable by default.
I also thought to convert numpy array into scipy.sparse matrix, but they are not json serializable either
json.dumps(np_array.tolist()) is the way to convert a numpy array to json. np_array.fromlist(json.loads(json.dumps(np_array.tolist()))) is how you get it back.
I wrote a program using normal Python, and I now think it would be a lot better to use numpy instead of standard lists. The problem is there are a number of things where I'm confused how to use numpy, or whether I can use it at all.
In general how do np.arrays work? Are they dynamic in size like a C++ vector or do I have declare their length and type beforehand like a standard C++ array? In my program I've got a lot of cases where I create a list
ex_list = [] and then cycle through something and append to it ex_list.append(some_lst). Can I do something like with a numpy array? What if I knew the size of ex_list, could I declare and empty one and then add to it?
If I can't, let's say I only call this list, would it be worth it to convert it to numpy afterwards, i.e. is calling a numpy list faster?
Can I do more complicated operations for each element using a numpy array (not just adding 5 to each etc), example below.
full_pallete = [(int(1+i*(255/127.5)),0,0) for i in range(0,128)]
full_pallete += [col for col in right_palette if col[1]!=0 or col[2]!=0 or col==(0,0,0)]
In other words, does it make sense to convert to a numpy array and then cycle through it using something other than for loop?
Numpy arrays can be appended to (see http://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html), although in general calling the append function many times in a loop has a heavy performance cost - it is generally better to pre-allocate a large array and then fill it as necessary. This is because the arrays themselves do have fixed size under the hood, but this is hidden from you in python.
Yes, Numpy is well designed for many operations similar to these. In general, however, you don't want to be looping through numpy arrays (or arrays in general in python) if they are very large. By using inbuilt numpy functions, you basically make use of all sorts of compiled speed up benefits. As an example, rather than looping through and checking each element for a condition, you would use numpy.where().
The real reason to use numpy is to benefit from pre-compiled mathematical functions and data processing utilities on large arrays - both those in the core numpy library as well as many other packages that use them.
Suppose I save a numpy array to a file, "arr.npy", using numpy.save() and that I do this using a particular python version, numpy version, and OS.
Can I load, using numpy.load(), arr.npy on a different OS using a different version of python or numpy? Are there any restrictions, such as backwards compatibility?
Yes. The .npy format is documented here:
https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html#module-numpy.lib.format
Note this comment in the source code (emphasis mine):
The .npy format is the standard binary file format in NumPy for
persisting a single arbitrary NumPy array on disk. The format stores
all of the shape and dtype information necessary to reconstruct the
array correctly even on another machine with a different architecture.
I have some NumPy arrays that are are pickled and stored in MongoDB using the bson module. For instance, if x is a NumPy array, then I set a field of a MongoDB record to:
bson.binary.Binary(x.dumps())
My question is whether it is possible to recover a subset of the array x without reloading the entire array via np.loads(). So, first, how can I get MongoDB to only give me back a chunk of the binary array, and then second, how can I turn that chunk into a NumPy array. I should mention here that I also have all the NumPy metadata regarding the array already, such as it's dimensions and datatype.
A concrete example might be that I have a 2-dimensional array of size (100000,10) with datatype np.float64 and I want to retrieve just x[50,10].
I can not say for sure, but checking the api docs of BSON C++ I get the idea that it was not designed for partial retrieval...
If you can at all, consider using pytables, which is designed for large data and inter-operating nicely with numpy. Mongo is great for certain distributed applications, though, while pytables is not.
If you store the array directly inside of MongoDB, you can also try using the $slice operator to get a contiguous subset of an array. You could linearize your 2D array into an 1D array, and the $slice operator will get you matrix rows, but if you want to select columns or generally select noncontiguous indicies, then you're out of luck.
Background on $slice.