Writing functions to work on vectors and matrices - python

As part of a larger function, I'm writing some code to generate a vector/matrix (depending on the input) containing the mean value of each column of the input vector/matrix x. These means are stored in an array of the same shape as the input.
My preliminary solution, meant to work on both 1-D arrays and matrices, is very(!) messy:
# 'x' is of type array and can be a vector or matrix.
import scipy as sp
shp = sp.shape(x)
x_mean = sp.array(sp.zeros(sp.shape(x)))
try:  # if the input is a matrix
    shp_range = range(shp[1])
    for d in shp_range:
        x_mean[:, d] = sp.mean(x[:, d]) * sp.ones((shp[0],))
except IndexError:  # raised when the input is a vector
    x_mean = sp.mean(x) * sp.ones((shp[0],))
Coming from a MATLAB background, this is what it would look like in MATLAB:
[R, C] = size(x);
for d = 1:C,
    xmean(:, d) = zeros(R, 1) + mean(x(:, d));
end
This works on both vectors as well as matrices without errors.
My question is: how can I make my Python code work on input in both vector and matrix format without the (ugly) try/except block?
Thanks!

You don't need to distinguish between vectors and matrices for the mean calculation itself: if you use the axis parameter, NumPy will perform the calculation along the vector (for vectors) or down the columns (for matrices). Then, to construct the output, you can use a good old-fashioned list comprehension, although it might be a bit slow for huge matrices:
import numpy as np
m = np.mean(x,axis=0) # For vector x, calculate the mean. For matrix x, calculate the means of the columns
x_mean = np.array([m for k in x]) # replace elements for vectors or rows for matrices
Creating the output with a list comprehension is slow because it has to allocate memory twice: once for the list and once for the array. Using np.repeat or np.tile would be faster, but acts funny for vector inputs: the output will be a nested matrix with a length-1 vector in each row. If speed matters more than elegance, you can replace the last line with this if statement:
if len(x.shape) == 1:
    x_mean = m * np.ones(len(x))
else:
    x_mean = np.tile(m, (x.shape[0], 1))
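As a further alternative (my addition, not part of the original answer): if your NumPy has np.broadcast_to (1.10+), you can get the repeated-mean result as a zero-copy, read-only view that handles both cases without a branch:
import numpy as np
m = np.mean(x, axis=0)                 # scalar for a vector, per-column means for a matrix
x_mean = np.broadcast_to(m, x.shape)   # read-only view; wrap in np.array(...) if you need to write to it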
By the way, your Matlab code behaves differently for row vectors and column vectors (try running it with x and x').

First, a quick note about broadcasting in NumPy. Broadcasting was kind of confusing to me when I switched over from Matlab to Python, but once I took the time to understand it I realized how useful it could be. To learn more about broadcasting, take a look at http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html.
Because of broadcasting, an (m,) array in NumPy (what you're calling a vector) is essentially equivalent to a (1, m) array or a (1, 1, m) array and so on. It seems like you want an (m,) array to behave like an (m, 1) array. I believe this happens sometimes, especially in the linalg module, but if you're going to do it you should know that you're breaking the NumPy convention.
With that warning, here's the code:
import scipy as sp

def my_mean(x):
    if x.ndim == 1:
        x = x[:, sp.newaxis]
    m = sp.empty(x.shape)
    m[:] = x.mean(0)
    return sp.squeeze(m)
and an example:
In [6]: x = sp.arange(30).reshape(5,6)
In [7]: x
Out[7]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
In [8]: my_mean(x)
Out[8]:
array([[ 12., 13., 14., 15., 16., 17.],
[ 12., 13., 14., 15., 16., 17.],
[ 12., 13., 14., 15., 16., 17.],
[ 12., 13., 14., 15., 16., 17.],
[ 12., 13., 14., 15., 16., 17.]])
In [9]: my_mean(x[0])
Out[9]: array([ 2.5, 2.5, 2.5, 2.5, 2.5, 2.5])
This is faster than using tile; the timing is below:
In [1]: import scipy as sp
In [2]: x = sp.arange(30).reshape(5,6)
In [3]: m = x.mean(0)
In [5]: timeit m_2d = sp.empty(x.shape); m_2d[:] = m
100000 loops, best of 3: 2.58 us per loop
In [6]: timeit m_2d = sp.tile(m, (len(x), 1))
100000 loops, best of 3: 13.3 us per loop

Related

Scipy filter with multi-dimensional (or non-scalar) output

Is there a filter similar to ndimage's generic_filter that supports vector output? I did not manage to make scipy.ndimage.filters.generic_filter return more than a scalar. Uncomment the line in the code below to get the error: TypeError: only length-1 arrays can be converted to Python scalars.
I'm looking for a generic filter that process 2D or 3D arrays and returns a vector at each point. Thus the output would have one added dimension. For the example below I'd expect something like this:
m.shape # (10,10)
res.shape # (10,10,2)
Example Code
import numpy as np
from scipy import ndimage

a = np.ones((10, 10)) * np.arange(10)
footprint = np.array([[1, 1, 1],
                      [1, 0, 1],
                      [1, 1, 1]])

def myfunc(x):
    r = sum(x)
    #r = np.array([1, 1])  # uncomment this
    return r

res = ndimage.generic_filter(a, myfunc, footprint=footprint)
The generic_filter expects myfunc to return a scalar, never a vector.
However, there is nothing that precludes myfunc from also adding information
to, say, a list which is passed to myfunc as an extra argument.
Instead of using the array returned by generic_filter, we can generate our vector-valued array by reshaping this list.
For example,
import numpy as np
from scipy import ndimage

a = np.ones((10, 10)) * np.arange(10)
footprint = np.array([[1, 1, 1],
                      [1, 0, 1],
                      [1, 1, 1]])
ndim = 2

def myfunc(x, out):
    r = np.arange(ndim, dtype='float64')
    out.extend(r)
    return 0

result = []
ndimage.generic_filter(
    a, myfunc, footprint=footprint, extra_arguments=(result,))
result = np.array(result).reshape(a.shape + (ndim,))
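Continuing that example, a quick sanity check of the result (matching the shapes asked for in the question):
result.shape    # (10, 10, 2)
result[0, 0]    # array([ 0.,  1.]) - the vector myfunc appended at that point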
I think I get what you're asking, but I'm not completely sure how ndimage.generic_filter works (how abstruse the source is!).
Here's just a simple wrapper function. It takes an array and all the parameters ndimage.generic_filter needs, and returns an array where each element of the input array is now represented by an array of shape (2,): the original value, with the result of the filter function stored as the second element.
def generic_expand_filter(inarr, func, **kwargs):
    shape = inarr.shape
    res = np.empty(shape + (2,))
    temp = ndimage.generic_filter(inarr, func, **kwargs)
    for row in range(shape[0]):
        for val in range(shape[1]):
            res[row][val][0] = inarr[row][val]
            res[row][val][1] = temp[row][val]
    return res
Output of this function, where res denotes the plain generic_filter result and res2 the generic_expand_filter result:
>>> a.shape #same as res.shape
(10, 10)
>>> res2.shape
(10, 10, 2)
>>> a[0]
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> res[0]
array([ 3., 8., 16., 24., 32., 40., 48., 56., 64., 69.])
>>> print(*res2[0], sep=", ") #this is just to avoid the vertical default output
[ 0. 3.], [ 1. 8.], [ 2. 16.], [ 3. 24.], [ 4. 32.], [ 5. 40.], [ 6. 48.], [ 7. 56.], [ 8. 64.], [ 9. 69.]
>>> a[0][0]
0.0
>>> res[0][0]
3.0
>>> res2[0][0]
array([ 0., 3.])
Of course you probably don't want to save the old array, but instead have both fields be new results. Except I don't know what exactly you had in mind; if the two values you want stored are unrelated, just add a temp2 and func2, call another generic_filter with the same **kwargs, and store that as the first value.
However, if you want an actual vector quantity that is calculated using multiple inarr elements, meaning the two newly created fields aren't independent, you are just going to have to write that kind of function: one that takes in an array plus idx, idy indices, and returns a tuple/list/array value which you can then unpack and assign to the result.
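For the unrelated-values case, a minimal sketch of that two-filter variant (func and func2 are hypothetical scalar-returning callables supplied by the caller):
def generic_expand_filter2(inarr, func, func2, **kwargs):
    res = np.empty(inarr.shape + (2,))
    res[..., 0] = ndimage.generic_filter(inarr, func, **kwargs)
    res[..., 1] = ndimage.generic_filter(inarr, func2, **kwargs)
    return res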

Drawing vector addition

Let's say I have the following arrays, which contain the X and Y values for a bunch of vectors, respectively:
xdat = np.array([3,2,7,4])
ydat = np.array([2,4,4,9])
Let's say that I wanted to draw the sum total of these vectors (a+b+c+d), not only as a single line from the origin, but drawn sequentially as each vector is added head-to-tail to the running sum.
How do I do this?
My idea is to use plt.plot for the values of two new arrays which contain the X and Y coordinates for each start/end point of all the vectors. The specific coordinates would be calculated from xdat and ydat. Assuming this was the most efficient method (without resorting to some easy-to-use function already built into python) how would I code this?
It sounds like you want numpy.cumsum
import numpy as np
xdat = np.array([3,2,7,4])
ydat = np.array([2,4,4,9])
dat = np.vstack((xdat, ydat))
# array([[3, 2, 7, 4],
# [2, 4, 4, 9]])
dat = np.cumsum(dat, axis=1)
# array([[ 3, 5, 12, 16],
# [ 2, 6, 10, 19]], dtype=int32)
# optionally start at 0, 0 (can do this before or after cumsum)
dat = np.hstack([np.zeros((2, 1)), dat])
# array([[ 0., 3., 5., 12., 16.],
# [ 0., 2., 6., 10., 19.]])
I stacked them up for convenience, but you could also run cumsum on the 1-D arrays. The axis argument selects either the whole flattened array (None, the default) or the n-th axis (0 runs down the rows, 1 along each row).
If you want to plot the X-Y coordinates, I'd do so with plt.plot(*dat), which will unpack the X and Y rows as arguments to plot.
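For completeness, here is how the actual drawing might look with matplotlib; a sketch using standard pyplot calls, with styling choices of my own:
import numpy as np
import matplotlib.pyplot as plt

xdat = np.array([3, 2, 7, 4])
ydat = np.array([2, 4, 4, 9])
dat = np.hstack([np.zeros((2, 1)), np.cumsum(np.vstack((xdat, ydat)), axis=1)])

plt.plot(*dat, marker='o')                        # vectors drawn head-to-tail
plt.plot([0, dat[0, -1]], [0, dat[1, -1]], '--')  # the resultant, origin to endpoint
plt.show()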

why can't x[:,0] = x[0] for a single row vector?

I'm relatively new to python but I'm trying to understand something which seems basic.
Create a vector:
x = np.linspace(0,2,3)
Out[38]: array([ 0., 1., 2.])
Now why isn't x[:,0] a valid argument?
IndexError: invalid index
It must be x[0]. I have a function I am calling which calculates:
np.sqrt(x[:,0]**2 + x[:,1]**2 + x[:,2]**2)
Why can't what I have just be true regardless of the input? In many other languages, it is independent of there being other rows in the array. Perhaps I misunderstand something fundamental; sorry if so. I'd like to avoid putting:
if x.ndim == 1:
    norm = np.sqrt(x[0]**2 + x[1]**2 + x[2]**2)
else:
    norm = np.sqrt(x[:,0]**2 + x[:,1]**2 + x[:,2]**2)
everywhere. Surely there is a way around this... thanks.
Edit: An example of it working in another language is Matlab:
>> b = [1,2,3]
b =
1 2 3
>> b(:,1)
ans =
1
>> b(1)
ans =
1
Perhaps you are looking for this:
np.sqrt(x[...,0]**2 + x[...,1]**2 + x[...,2]**2)
There can be any number of dimensions in place of the ellipsis ...
See also What does the Python Ellipsis object do?, and the docs of NumPy basic slicing
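A quick demonstration of why the ellipsis helps here (my own example):
x = np.linspace(0, 2, 3)   # shape (3,)
x[..., 0]                  # 0.0 - works where x[:,0] raised IndexError
y = x.reshape(1, 3)        # shape (1, 3)
y[..., 0]                  # array([ 0.]) - same expression, one more dimension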
It looks like the ellipsis as described by #JanneKarila has answered your question, but I'd like to point out how you might make your code a bit more "numpythonic". It appears you want to handle an n-dimensional array with the shape (d_1, d_2, ..., d_{n-1}, 3), and compute the magnitudes of this collection of three-dimensional vectors, resulting in an (n-1)-dimensional array with shape (d_1, d_2, ..., d_{n-1}). One simple way to do that is to square all the elements, then sum along the last axis, and then take the square root. If x is the array, that calculation can be written np.sqrt(np.sum(x**2, axis=-1)). The following shows a few examples.
x is 1-D, with shape (3,):
In [31]: x = np.array([1.0, 2.0, 3.0])
In [32]: np.sqrt(np.sum(x**2, axis=-1))
Out[32]: 3.7416573867739413
x is 2-D, with shape (2, 3):
In [33]: x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
In [34]: x
Out[34]:
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
In [35]: np.sqrt(np.sum(x**2, axis=-1))
Out[35]: array([ 3.74165739, 8.77496439])
x is 3-D, with shape (2, 2, 3):
In [36]: x = np.arange(1.0, 13.0).reshape(2,2,3)
In [37]: x
Out[37]:
array([[[ 1., 2., 3.],
[ 4., 5., 6.]],
[[ 7., 8., 9.],
[ 10., 11., 12.]]])
In [38]: np.sqrt(np.sum(x**2, axis=-1))
Out[38]:
array([[ 3.74165739, 8.77496439],
[ 13.92838828, 19.10497317]])
I tend to solve this by writing
x = np.atleast_2d(x)
norm = np.sqrt(x[:,0]**2 + x[:,1]**2 + x[:,2]**2)
Matlab doesn't have 1-D arrays, so b = [1 2 3] is still a 2-D array and indexing with two dimensions makes sense. It may be a novel concept for you, but 1-D arrays are quite useful in fact (you can stop worrying about whether you need to multiply by the transpose, or how to insert a row or a column in another array...)
By the way, you could write a fancier, more general norm like this:
x = np.atleast_2d(x)
norm = np.sqrt((x**2).sum(axis=1))
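For what it's worth, newer NumPy versions (1.8+) also let you write this with np.linalg.norm, which takes an axis argument and handles 1-D and 2-D input alike, so you don't even need the atleast_2d call:
norm = np.linalg.norm(x, axis=-1)   # magnitude(s) along the last axis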
The problem is that x[:,0] in Python isn't the same as in Matlab.
If you want to extract the first element in the single row vector you should go with
x[:1]
This is called a "slice". In this example it means that you take everything in the array from the first element to the element with index 1 (not included).
Remember that Python has zero-based numbering.
Another example may be:
x[0:2]
which would return the first and the second element of the array.

Better way to shuffle two numpy arrays in unison

I have two numpy arrays of different shapes, but with the same length (leading dimension). I want to shuffle each of them, such that corresponding elements continue to correspond -- i.e. shuffle them in unison with respect to their leading indices.
This code works, and illustrates my goals:
import numpy

def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
    shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
    permutation = numpy.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b
For example:
>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> shuffle_in_unison(a, b)
(array([[2, 2],
[1, 1],
[3, 3]]), array([2, 1, 3]))
However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays -- I'd rather shuffle them in-place, since they'll be quite large.
Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.
One other thought I had was this:
def shuffle_in_unison_scary(a, b):
    rng_state = numpy.random.get_state()
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)
    numpy.random.shuffle(b)
This works... but it's a little scary, as I see little guarantee it'll continue to work; it doesn't look like the sort of thing that's guaranteed to survive across NumPy versions, for example.
You can use NumPy's array indexing:
def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    p = numpy.random.permutation(len(a))
    return a[p], b[p]
This creates separate, unison-shuffled copies of the arrays.
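For example, with the arrays from the question (one possible output; the permutation is random):
>>> unison_shuffled_copies(a, b)
(array([[3, 3],
        [1, 1],
        [2, 2]]), array([3, 1, 2]))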
X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y, random_state=0)
To learn more, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html
Your "scary" solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.
If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.
Example: Let's assume the arrays a and b look like this:
a = numpy.array([[[  0.,  1.,  2.],
                  [  3.,  4.,  5.]],
                 [[  6.,  7.,  8.],
                  [  9., 10., 11.]],
                 [[ 12., 13., 14.],
                  [ 15., 16., 17.]]])
b = numpy.array([[ 0., 1.],
                 [ 2., 3.],
                 [ 4., 5.]])
We can now construct a single array containing all the data:
c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[ 0., 1., 2., 3., 4., 5., 0., 1.],
# [ 6., 7., 8., 9., 10., 11., 2., 3.],
# [ 12., 13., 14., 15., 16., 17., 4., 5.]])
Now we create views simulating the original a and b:
a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)
The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).
In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.
This solution could be adapted to the case that a and b have different dtypes.
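A minimal sketch of the shuffling step itself, continuing the example above:
numpy.random.shuffle(c)   # shuffles the rows of c in-place
# a2 and b2 are views into c's buffer, so both are now shuffled
# in unison, with no extra copies made
a2[0], b2[0]              # still a matching (a-block, b-row) pair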
Very simple solution:
randomize = np.arange(len(x))
np.random.shuffle(randomize)
x = x[randomize]
y = y[randomize]
The two arrays x, y are now both randomly shuffled in the same way.
James's 2015 sklearn solution is helpful, but he added a random_state variable, which is not needed. In the code below, the random state from NumPy is automatically assumed.
X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
from sklearn.utils import shuffle
X, y = shuffle(X, y)
from numpy.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data #numpy array
y = iris.target #numpy array
# Data is currently unshuffled; we should shuffle
# each X[i] with its corresponding y[i]
perm = permutation(len(X))
X = X[perm]
y = y[perm]
Shuffle any number of arrays together, in-place, using only NumPy.
import numpy as np

def shuffle_arrays(arrays, set_seed=-1):
    """Shuffles arrays in-place, in the same order, along axis=0

    Parameters:
    -----------
    arrays : List of NumPy arrays.
    set_seed : Seed value if int >= 0, else seed is random.
    """
    assert all(len(arr) == len(arrays[0]) for arr in arrays)
    seed = np.random.randint(0, 2**(32 - 1) - 1) if set_seed < 0 else set_seed
    for arr in arrays:
        rstate = np.random.RandomState(seed)
        rstate.shuffle(arr)
It can be used like this:
a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])
shuffle_arrays([a, b, c])
A few things to note:
The assert ensures that all input arrays have the same length along their first dimension.
Arrays are shuffled in-place along their first dimension; nothing is returned.
The random seed stays within the positive int32 range.
If a repeatable shuffle is needed, the seed value can be set.
After the shuffle, the data can be split using np.split or referenced using slices, depending on the application.
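If you're on NumPy 1.17 or later, the same idea can be written with the newer Generator API; a sketch of that variant (my addition, not part of the original answer):
import numpy as np

def shuffle_arrays_rng(arrays, seed=None):
    """Shuffle each array in-place along axis=0 with one shared permutation."""
    assert all(len(arr) == len(arrays[0]) for arr in arrays)
    rng = np.random.default_rng(seed)
    p = rng.permutation(len(arrays[0]))
    for arr in arrays:
        arr[:] = arr[p]   # fancy indexing makes a copy, then we write back into the original buffer
Passing a seed, e.g. shuffle_arrays_rng([a, b, c], seed=42), makes the shuffle repeatable.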
You can make an array like:
s = np.arange(0, len(a), 1)
then shuffle it:
np.random.shuffle(s)
Now use this s to index your arrays. The same shuffled indices return vectors shuffled the same way.
x_data = x_data[s]
x_label = x_label[s]
There is a well-known function that can handle this:
from sklearn.model_selection import train_test_split
X, _, Y, _ = train_test_split(X,Y, test_size=0.0)
Just setting test_size to 0 will avoid splitting and give you shuffled data.
Though it is usually used to split train and test data, it does shuffle them too.
From the documentation:
Split arrays or matrices into random train and test subsets
Quick utility that wraps input validation and
next(ShuffleSplit().split(X, y)) and application to input data into a
single call for splitting (and optionally subsampling) data in a
oneliner.
This seems like a very simple solution:
import numpy as np

def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    c = np.arange(len(a))
    np.random.shuffle(c)
    return a[c], b[c]

a = np.asarray([[1, 1], [2, 2], [3, 3]])
b = np.asarray([11, 22, 33])
shuffle_in_unison(a, b)
Out[94]:
(array([[3, 3],
[2, 2],
[1, 1]]),
array([33, 22, 11]))
One way to do in-place shuffling for connected lists is to use a seed (it could be random) and use numpy.random.shuffle to do the shuffling.
# Set seed to a random number if you want the shuffling to be non-deterministic.
def shuffle(a, b, seed):
    np.random.seed(seed)
    np.random.shuffle(a)
    np.random.seed(seed)
    np.random.shuffle(b)
That's it. This will shuffle both a and b in the exact same way. This is also done in-place, which is always a plus.
EDIT: don't use np.random.seed(), use np.random.RandomState instead:
def shuffle(a, b, seed):
    rand_state = np.random.RandomState(seed)
    rand_state.shuffle(a)
    rand_state.seed(seed)
    rand_state.shuffle(b)
When calling it just pass in any seed to feed the random state:
a = [1,2,3,4]
b = [11, 22, 33, 44]
shuffle(a, b, 12345)
Output:
>>> a
[1, 4, 2, 3]
>>> b
[11, 44, 22, 33]
Edit: Fixed code to re-seed the random state
Say we have two arrays: a and b.
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[9,1,1],[6,6,6],[4,2,0]])
We can first obtain row indices by permuting the first dimension:
indices = np.random.permutation(a.shape[0])
[1 2 0]
Then use advanced indexing.
Here we are using the same indices to shuffle both arrays in unison.
a_shuffled = a[indices[:,np.newaxis], np.arange(a.shape[1])]
b_shuffled = b[indices[:,np.newaxis], np.arange(b.shape[1])]
This is equivalent to
np.take(a, indices, axis=0)
[[4 5 6]
[7 8 9]
[1 2 3]]
np.take(b, indices, axis=0)
[[6 6 6]
[4 2 0]
[9 1 1]]
If you want to avoid copying arrays, then I would suggest that instead of generating a permutation list, you go through every element in the array and randomly swap it to another position in the array:
for old_index in range(len(a)):
    new_index = numpy.random.randint(old_index + 1)
    a[old_index], a[new_index] = a[new_index], a[old_index]
    b[old_index], b[new_index] = b[new_index], b[old_index]
This implements the Knuth-Fisher-Yates shuffle algorithm.
Shortest and easiest way in my opinion: use a seed:
import random

random.seed(seed)
random.shuffle(x_data)
# reset the same seed to get the identical random sequence and shuffle the y
random.seed(seed)
random.shuffle(y_data)
Most of the solutions above work; however, if you have column vectors you have to transpose them first. Here is an example:
def shuffle(self) -> None:
    """
    Shuffles X and Y
    """
    x = self.X.T
    y = self.Y.T
    p = np.random.permutation(len(x))
    self.X = x[p].T
    self.Y = y[p].T
As an example, this is what I'm doing:
from random import shuffle

combo = []
for i in range(60000):
    combo.append((images[i], labels[i]))
shuffle(combo)

im = []
lab = []
for c in combo:
    im.append(c[0])
    lab.append(c[1])
images = np.asarray(im)
labels = np.asarray(lab)
I extended python's random.shuffle() to take a second arg:
import random

def shuffle_together(x, y):
    assert len(x) == len(y)
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random.random() * (i + 1))
        x[i], x[j] = x[j], x[i]
        y[i], y[j] = y[j], y[i]
That way I can be sure that the shuffling happens in-place, and the function is not all too long or complicated.
Just use NumPy.
First merge the two input arrays (the 1-D array is the labels y, the 2-D array is the data x) and shuffle them with NumPy's shuffle method. Finally, split them and return.
import numpy as np

def shuffle_2d(a, b):
    rows = a.shape[0]
    if b.shape != (rows, 1):
        b = b.reshape((rows, 1))
    S = np.hstack((b, a))
    np.random.shuffle(S)
    b, a = S[:, 0], S[:, 1:]
    return a, b

features, samples = 2, 5
x, y = np.random.random((samples, features)), np.arange(samples)
x, y = shuffle_2d(x, y)

In-place type conversion of a NumPy array

Given a NumPy array of int32, how do I convert it to float32 in place? So basically, I would like to do
a = a.astype(numpy.float32)
without copying the array. It is big.
The reason for doing this is that I have two algorithms for the computation of a. One of them returns an array of int32, the other returns an array of float32 (and this is inherent to the two different algorithms). All further computations assume that a is an array of float32.
Currently I do the conversion in a C function called via ctypes. Is there a way to do this in Python?
Update: This function only avoids copy if it can, hence this is not the correct answer for this question. unutbu's answer is the right one.
a = a.astype(numpy.float32, copy=False)
NumPy's astype has a copy flag; why shouldn't we use it?
You can make a view with a different dtype, and then copy in-place into the view:
import numpy as np
x = np.arange(10, dtype='int32')
y = x.view('float32')
y[:] = x
print(y)
yields
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float32)
To show the conversion was in-place, note that copying from x to y altered x:
print(x)
prints
array([ 0, 1065353216, 1073741824, 1077936128, 1082130432,
1084227584, 1086324736, 1088421888, 1090519040, 1091567616])
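Note (my addition): this trick requires the two dtypes to have the same itemsize, so the view has the same shape as the original. A quick check:
np.dtype('int32').itemsize == np.dtype('float32').itemsize   # True - both 4 bytes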
You can change the array type without converting like this:
a.dtype = numpy.float32
but first you have to change all the integers to something that will be interpreted as the corresponding float. A very slow way to do this would be to use Python's struct module like this:
import struct

def toi(i):
    return struct.unpack('i', struct.pack('f', float(i)))[0]
...applied to each member of your array.
But perhaps a faster way would be to utilize numpy's ctypeslib tools (which I am unfamiliar with)
- edit -
Since ctypeslib doesn't seem to work, I would proceed with the conversion using the typical numpy.astype method, but in block sizes that are within your memory limits:
a[0:10000] = a[0:10000].astype('float32').view('int32')
...then change the dtype when done.
Here is a function that accomplishes the task for any compatible dtypes (only works for dtypes with same-sized items) and handles arbitrarily-shaped arrays with user-control over block size:
import numpy

def astype_inplace(a, dtype, blocksize=10000):
    oldtype = a.dtype
    newtype = numpy.dtype(dtype)
    assert oldtype.itemsize == newtype.itemsize
    for idx in range(0, a.size, blocksize):
        a.flat[idx:idx + blocksize] = \
            a.flat[idx:idx + blocksize].astype(newtype).view(oldtype)
    a.dtype = newtype

a = numpy.random.randint(100, size=100).reshape((10, 10))
print(a)
astype_inplace(a, 'float32')
print(a)
Time spent reading data
t1=time.time() ; V=np.load ('udata.npy');t2=time.time()-t1 ; print( t2 )
95.7923333644867
V.dtype
dtype('>f8')
V.shape
(3072, 1024, 4096)
Creating a new array
t1=time.time() ; V64=np.array( V, dtype=np.double); t2=time.time()-t1 ; print( t2 )
1291.669689655304
Simple in-place numpy conversion
t1=time.time() ; V64=np.array( V, dtype=np.double); t2=time.time()-t1 ; print( t2 )
205.64322113990784
Using astype
t1=time.time() ; V = V.astype(np.double) ; t2=time.time()-t1 ; print( t2 )
400.6731758117676
Using view
t1=time.time() ; x=V.view(np.double);V[:,:,:]=x ;t2=time.time()-t1 ; print( t2 )
556.5982494354248
Note that I cleared the variables each time. Thus, simply letting NumPy handle the conversion is the most efficient.
import numpy as np
arr_float = np.arange(10, dtype=np.float32)
arr_int = arr_float.view(np.int32)
Use view() and the dtype parameter to reinterpret the array in place.
Use this:
In [105]: a
Out[105]:
array([[15, 30, 88, 31, 33],
[53, 38, 54, 47, 56],
[67, 2, 74, 10, 16],
[86, 33, 15, 51, 32],
[32, 47, 76, 15, 81]], dtype=int32)
In [106]: float32(a)
Out[106]:
array([[ 15., 30., 88., 31., 33.],
[ 53., 38., 54., 47., 56.],
[ 67., 2., 74., 10., 16.],
[ 86., 33., 15., 51., 32.],
[ 32., 47., 76., 15., 81.]], dtype=float32)
a = np.subtract(a, 0., dtype=np.float32)
