Accessing array in Python/Numba gives weird result

I'm trying to use numpy with numba, but I'm getting weird results while trying to access or set values in a numpy array of floats using a float index converted to an int.
You can check with this basic function:
import numba
import numpy as np

@numba.jit("void(f8[:,::1],f8[:,::1])")
def test(table, index):
    x, y = int(index[0, 0]), int(index[1, 0])
    table[y, x] = 1.0
    print index[0, 0], index[1, 0], x, y
    print table
    print table[y, x]

table = np.zeros((5, 5), dtype=np.float32)
index = np.random.ranf((2, 2)) * 5
test(table, index)
Results:
index[0,0] = 1.34129550525, index[1,0] = 0.0656177324359, x = 1, y = 0
table[0,1] = 1.0
table:
[[ 0.     0.     1.875  0.     0.   ]
 [ 0.     0.     0.     0.     0.   ]
 [ 0.     0.     0.     0.     0.   ]
 [ 0.     0.     0.     0.     0.   ]
 [ 0.     0.     0.     0.     0.   ]]
Why do I get a 1.875 in my table and not a 1.0? This is a basic example, but I'm working with big arrays and it gives me a lot of errors. I know I can convert index to np.int32 and change @numba.jit("void(f8[:,::1],f8[:,::1])") to @numba.jit("void(f8[:,::1],i4[:,::1])"), and that works fine, but I would like to understand why this is not working.
Is it a problem with how the types are passed from Python to C++?
Thanks for your help.

In [198]: np.float64(1.0).view((np.float32,2))
Out[198]: array([ 0. , 1.875], dtype=float32)
So when
table[y,x] = 1.0
writes a np.float64(1.0) into table, table views the data as np.float32 and interprets it as a 0 and a 1.875.
Notice that the 0 shows up at index location [0,1], and 1.875 shows up at index location [0,2], whereas the assignment occurred at [y,x] = [0,1].
You could fix the dtype mismatch by changing
@numba.jit("void(f8[:,::1],f8[:,::1])")
to
@numba.jit("void(f4[:,::1],f8[:,::1])")
These are the 8 bytes in np.float64(1.0):
In [201]: np.float64(1.0).tostring()
Out[201]: '\x00\x00\x00\x00\x00\x00\xf0?'
And when the 4 bytes '\x00\x00\xf0?' are interpreted as a np.float32 you get 1.875:
In [205]: np.fromstring('\x00\x00\xf0?', dtype='float32')
Out[205]: array([ 1.875], dtype=float32)
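To see the fix end to end, here is a minimal sketch (test_fixed is an illustrative name, not from the question) where the declared dtype f4 matches the float32 array actually passed in:
import numba
import numpy as np

@numba.jit("void(f4[:,::1],f8[:,::1])")  # f4 matches the float32 table
def test_fixed(table, index):
    x, y = int(index[0, 0]), int(index[1, 0])
    table[y, x] = 1.0  # 1.0 is now cast to float32 before the store

table = np.zeros((5, 5), dtype=np.float32)
index = np.random.ranf((2, 2)) * 5
test_fixed(table, index)
print(table.max())  # exactly 1.0, at the expected position
Alternatively, omit the signature string entirely and let Numba infer the types from the arguments it receives, which avoids this class of mismatch altogether.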

Related

How to efficiently concatenate Numpy Array based on position conditioning?

The objective is to concatenate Numpy arrays according to a set of positions. I am curious whether the concatenation and the steps shown in the code below can be optimized further, without the need for the for loop and if-else statement?
import itertools
import numpy as np

tot_length = 0.2
steps = 0.1
start_val = 0
repeat_perm = 3

list_no = np.arange(start_val, tot_length, steps)
x, y, z = np.meshgrid(*[list_no for _ in range(3)], sparse=True)
ix = np.array(((x >= y) & (y >= z)).nonzero()).T
final_opt = list_no[ix]
final_opt[:, [0, 1]] = final_opt[:, [1, 0]]

all_result = itertools.product(range(0, ix.shape[1]), repeat=repeat_perm)
for num, num_pair in enumerate(all_result, start=1):
    for num_x, num_pair_x in enumerate(num_pair, start=0):
        if (num == 1) & (num_x == 0):
            cont_arry = final_opt[num_pair_x, :]
        else:
            cont_arry = np.concatenate((cont_arry, final_opt[num_pair_x, :]), axis=0)

final_arr = np.reshape(cont_arry, (-1, 9))
print(final_arr)
The output has shape (27, 9); only part of it is shown below:
[[0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 0. 0. 0.1 0. 0. ]
[0. 0. 0. 0. 0. 0. 0.1 0.1 0. ]
[0. 0. 0. 0.1 0. 0. 0. 0. 0. ]
[0. 0. 0. 0.1 0. 0. 0.1 0. 0. ]
[0. 0. 0. 0.1 0. 0. 0.1 0.1 0. ]
[0.1 0.1 0. 0.1 0.1 0. 0.1 0.1 0. ]]
Just a heads up: cont_arry will later be multiplied (vectorised) with a 1D array of the same length. Knowing this, is there a way to avoid storing the result of the concatenation in memory, to minimise potential memory issues? In the actual application, the worst possible parameter setting is as below:
tot_length=200
steps=0.1
start_val=0
repeat_perm=1200
I think your concatenate loop can be replaced with:
alist = []
for num, num_pair in enumerate(all_result, start=1):
    for num_x, num_pair_x in enumerate(num_pair, start=0):
        alist.append(final_opt[num_pair_x, :])
arr = np.array(alist)
# arr = np.concatenate(alist, axis=0)
# arr = np.vstack(alist)
There may be some details in this that I didn't catch. I haven't tried to test it. List append is much faster than concatenate, especially when done repeatedly.
concatenate is most efficient when given a whole list of arrays to join.
Better yet, don't iterate at all; instead make use of whole-array math and indexing. But I haven't tried to master your code, so won't suggest how to do that.
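For what it's worth, here is an untested sketch of one fully vectorised variant. It mirrors the range(ix.shape[1]) indexing from the question's loop and assumes ix, final_opt and repeat_perm are in scope from the question's code; it builds all index tuples once and replaces the whole loop with a single fancy-indexing call:
import itertools
import numpy as np

# all index tuples, shape (k**repeat_perm, repeat_perm) where k = ix.shape[1]
idx = np.array(list(itertools.product(range(ix.shape[1]), repeat=repeat_perm)))
# one gather instead of repeated concatenate: (27, 3, 3) -> (27, 9) here
final_arr = final_opt[idx].reshape(idx.shape[0], -1)
Note that for the worst-case setting (repeat_perm=1200) the number of combinations is astronomical, so no indexing trick avoids the memory problem; you would have to generate and process the product in chunks.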

Vectorize extracting sub-multidimensional array from multidimensional array with list of indices

I have a multidimensional array of shape (500000, 3, 2, 3); let's call it data. The data is basically 500000 sets of 3 points, each of the 3 points separated into its x and y coordinates (hence the 2). The last 3 in the shape represents different rotations of the 3 points. Now, I've got a 1D array of 500000 numbers between 0 and 2 that tells me which of the rotations I want to keep; let's call it rot_index. I would like to construct a multidimensional array of shape (500000, 3, 2) that only keeps the correctly rotated data points. Any ideas on how to extract the data with the correct index from the original data array? I tried something like this, but it didn't work:
data[:,:,:,rot_index]
Edit:
here is some example data (giving 10 sets of points instead of 500000)
data =
[[[[0.70846822 0.98552876 0.66736535]
[0. 0. 0. ]]
[[0.66736535 0.70846822 0.98552876]
[1.54545219 2.39798549 2.33974762]]
[[0.98552876 0.66736535 0.70846822]
[3.88519982 3.94343768 4.73773311]]]
[[[0.8132551 1.18845796 1.53004225]
[0. 0. 0. ]]
[[1.18845796 1.53004225 0.8132551 ]
[1.43211754 2.58720625 2.26386152]]
[[1.53004225 0.8132551 1.18845796]
[4.01932379 4.85106777 3.69597906]]]
[[[0.66123513 0.93651048 0.83170562]
[0. 0. 0. ]]
[[0.93651048 0.83170562 0.66123513]
[2.09747072 2.38383457 1.80188002]]
[[0.83170562 0.66123513 0.93651048]
[4.48130529 4.18571459 3.89935074]]]
[[[1.31047414 0.67740955 1.42020073]
[0. 0. 0. ]]
[[0.67740955 1.42020073 1.31047414]
[1.66061575 1.97600777 2.64656179]]
[[1.42020073 1.31047414 0.67740955]
[3.63662352 4.62256956 4.30717753]]]
[[[1.4085555 1.64177102 0.27708893]
[0. 0. 0. ]]
[[0.27708893 1.4085555 1.64177102]
[0.62154257 3.04315813 2.61848461]]
[[1.64177102 0.27708893 1.4085555 ]
[3.24002718 3.6647007 5.66164274]]]
[[[0.48080385 0.85910831 0.52342904]
[0. 0. 0. ]]
[[0.52342904 0.48080385 0.85910831]
[1.08970318 2.57102289 2.62245924]]
[[0.85910831 0.52342904 0.48080385]
[3.71216242 3.66072607 5.19348213]]]
[[[1.13610207 1.51237019 0.47256909]
[0. 0. 0. ]]
[[1.51237019 0.47256909 1.13610207]
[2.92304081 2.59328103 0.76686347]]
[[0.47256909 1.13610207 1.51237019]
[5.51632184 3.3601445 3.68990428]]]
[[[1.08397801 1.16506242 0.84703646]
[0. 0. 0. ]]
[[1.16506242 0.84703646 1.08397801]
[2.37250664 2.04419242 1.86648625]]
[[0.84703646 1.08397801 1.16506242]
[4.41669906 3.91067866 4.23899289]]]
[[[0.98734317 1.11177984 0.90283297]
[0. 0. 0. ]]
[[1.11177984 0.90283297 0.98734317]
[2.25981006 2.13666143 1.88671382]]
[[0.90283297 0.98734317 1.11177984]
[4.39647149 4.02337525 4.14652387]]]
[[[1.94118244 1.14738719 1.98251535]
[0. 0. 0. ]]
[[1.14738719 1.98251535 1.94118244]
[1.83291888 1.90183408 2.54843234]]
[[1.98251535 1.94118244 1.14738719]
[3.73475296 4.45026642 4.38135123]]]]
And here is a list of the indices I want to keep:
rot_index = np.array([1, 2, 1, 1, 1, 1, 1, 2, 1, 1])
So just as an example, if you consider
data[0,:,:,0] = [[0.70846822 0.]
[0.66736535 1.54545219]
[0.98552876 3.88519982]]
data[0,:,:,1] = [[0.98552876 0.]
[0.70846822 2.39798549]
[0.66736535 3.94343768]]
data[0,:,:,2] = [[0.66736535 0.]
[0.98552876 2.33974762]
[0.70846822 4.73773311]]
These are 3 different "rotations" of the same sample, and if we look at the first element of rot_index, it is a 1. So I only want to keep
data[0,:,:,1] = [[0.98552876 0.]
[0.70846822 2.39798549]
[0.66736535 3.94343768]]
Using numpy advanced indexing, and under that, the specific subtopic of combining advanced and basic indexing, this should work (where data_array is a numpy ndarray holding your data):
result = data_array[range(500000), ..., rot_index]
For your sample data (10 sets rather than 500000, so range(10)), this produces:
[[[0.98552876 0. ]
[0.70846822 2.39798549]
[0.66736535 3.94343768]]
[[1.53004225 0. ]
[0.8132551 2.26386152]
[1.18845796 3.69597906]]
[[0.93651048 0. ]
[0.83170562 2.38383457]
[0.66123513 4.18571459]]
[[0.67740955 0. ]
[1.42020073 1.97600777]
[1.31047414 4.62256956]]
[[1.64177102 0. ]
[1.4085555 3.04315813]
[0.27708893 3.6647007 ]]
[[0.85910831 0. ]
[0.48080385 2.57102289]
[0.52342904 3.66072607]]
[[1.51237019 0. ]
[0.47256909 2.59328103]
[1.13610207 3.3601445 ]]
[[0.84703646 0. ]
[1.08397801 1.86648625]
[1.16506242 4.23899289]]
[[1.11177984 0. ]
[0.90283297 2.13666143]
[0.98734317 4.02337525]]
[[1.14738719 0. ]
[1.98251535 1.90183408]
[1.94118244 4.45026642]]]
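If you prefer not to hard-code the array length in a range, np.take_along_axis expresses the same selection; a minimal sketch, with data and rot_index as defined in the question:
import numpy as np

# indices must have the same ndim as data; shape (N, 1, 1, 1) broadcasts
# across the middle axes, and the trailing length-1 axis is dropped after
idx = rot_index[:, None, None, None]
result = np.take_along_axis(data, idx, axis=-1)[..., 0]  # shape (N, 3, 2)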

Standardising data of irregular shape (TypeError: only size-1 arrays can be converted to Python scalars)

So I have an array XsN of shape (590,) and I am trying to standardise the data.
This is an example of one of the 590 elements in my array:
print(XsN[:1])
[array([[ 0. , 0.27229556, -1.8033657 , ..., 0. ,
0. , 0. ],
[ 0. , 0.20665401, -1.9340569 , ..., 0. ,
0. , 0. ],
[ 4. , 0. , 0.04352444, ..., 0. ,
0. , 0. ],
...,
[10. , 0. , -0.5655 , ..., 0. ,
0. , 0. ],
[10. , 0. , 0.9150001 , ..., 0. ,
0. , 0. ],
[10. , 0. , 1.0005 , ..., 0. ,
0. , 0. ]], dtype=float32)]
I'm then reshaping it so that it has shape (590,1):
XsN_2 = XsN.reshape(-1,1)
Now when I use StandardScaler:
from sklearn.preprocessing import StandardScaler
standardized_data = StandardScaler().fit_transform(XsN_2)
I get the error that
TypeError: only size-1 arrays can be converted to Python scalars
and
ValueError: setting an array element with a sequence.
I understand that it expects a number but instead finds an ndarray, but I'm not quite sure how to standardise data of shape (590,) where each element is its own ndarray.
Edit 1:
Referring to this csv file: https://gofile.io/?c=YGxCWQ
Here is some code with a sample data:
import pandas as pd
from sklearn.preprocessing import StandardScaler
imp = pd.read_csv('foo.csv', sep=',', header=None)
data = imp.values
print(data)
standardized_data = StandardScaler().fit_transform(data)
The error I get now is:
ValueError: could not convert string to float
Is there any way I can standardise this data?
Without access to your original data in the form of a valid .csv file it is a little difficult to debug this. From the look of what you printed it seems like XsN is a list of arrays, so you may want to loop through each in turn or convert it into an array with expanded dimensions.
Here is an example of standardizing some dummy data which I think resembles the structure of your data. Hope that helps.
import numpy as np

n = 100

# Create feature 1
mean1 = 10
standard_dev1 = 2
col1 = np.random.normal(loc=mean1, scale=standard_dev1, size=[n, 1])

# Create feature 2
mean2 = 20
standard_dev2 = 4
col2 = np.random.normal(loc=mean2, scale=standard_dev2, size=[n, 1])

data = np.concatenate([col1, col2], axis=1)
print(f"means of raw data: {data.mean(axis=0)}")
>>>
means of raw data: [10.15783287 19.82541124]
print(f"standard devations of raw data: {data.std(axis=0)}")
>>>
standard devations of raw data: [2.00049111 3.87277793]
from sklearn.preprocessing import StandardScaler
standardized_data = StandardScaler().fit_transform(data)
print(f"means of standardized data: {standardized_data.mean(axis=0)}")
>>>
means of standardized data: [-6.92779167e-16 -1.78745907e-15]
print(f"standard devations of standardized data: {standardized_data.std(axis=0)}")
>>>
standard devations of standardized data: [1. 1.]
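For the original XsN itself, here is a hedged sketch, assuming every element is a 2-D float array with the same number of columns (as the printed sample suggests): stack all rows, fit a single scaler over them, then transform each element separately so the ragged structure is preserved.
import numpy as np
from sklearn.preprocessing import StandardScaler

stacked = np.vstack(XsN)                 # (total_rows, n_features)
scaler = StandardScaler().fit(stacked)   # per-feature mean/std over all rows
XsN_std = [scaler.transform(x) for x in XsN]  # same ragged structure as XsN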

Insert Numpy Array into Array with extending of the embedding array

First of all, I work with byte arrays of size >= 400x400x1000.
I wrote a small function which can insert a multidimensional array (or a fraction of one) into another by indicating an offset. This works if the embedded array is smaller than the embedding array (case A). Otherwise the embedded array is truncated (case B).
case A) Inserting a 3x3 into a 5x5 matrix with offset 1,1 would look like this.
[[ 0. 0. 0. 0. 0.]
[ 0. 1. 1. 1. 0.]
[ 0. 1. 1. 1. 0.]
[ 0. 1. 1. 1. 0.]
[ 0. 0. 0. 0. 0.]]
case B) If the offsets exceed the dimensions of the embedding matrix, the smaller array is truncated. E.g. a (-1,-1) offset would result in this.
[[ 1. 1. 0. 0. 0.]
[ 1. 1. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
case C) Now, instead of truncating the embedded array, I want to extend the embedding array (with zeroes) if the embedded array is bigger than the embedding array or the offsets force it (e.g. case B). Is there a smart way with numpy or scipy to solve this?
[[ 1. 1. 1. 0. 0. 0.]
[ 1. 1. 1. 0. 0. 0.]
[ 1. 1. 1. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]]
Actually I work with 3D arrays, but for simplicity I wrote an example for 2D arrays. Current source:
import numpy as np

def addAtPos(mat_bigger, mat_smaller, xyz_coor):
    size_sm_x, size_sm_y = np.shape(mat_smaller)
    size_gr_x, size_gr_y = np.shape(mat_bigger)
    start_gr_x, start_gr_y = xyz_coor
    start_sm_x, start_sm_y = 0, 0
    end_x, end_y = (start_gr_x + size_sm_x), (start_gr_y + size_sm_y)
    print(size_sm_x, size_sm_y)
    print(size_gr_x, size_gr_y)
    print(end_x, end_y)
    if start_gr_x < 0:
        start_sm_x = -start_gr_x
        start_gr_x = 0
    if start_gr_y < 0:
        start_sm_y = -start_gr_y
        start_gr_y = 0
    if end_x > size_gr_x:
        size_sm_x = size_sm_x - (end_x - size_gr_x)
        end_x = size_gr_x
    if end_y > size_gr_y:
        size_sm_y = size_sm_y - (end_y - size_gr_y)
        end_y = size_gr_y
    # copy all or a chunk (if the offset is small/big enough) of the smaller matrix into the bigger matrix
    mat_bigger[start_gr_x:end_x, start_gr_y:end_y] = mat_smaller[start_sm_x:size_sm_x, start_sm_y:size_sm_y]
    return mat_bigger

a_gr = np.zeros([5, 5])
a_sm = np.ones([3, 3])
a_res = addAtPos(a_gr, a_sm, [-2, 1])
# print(a_gr)
print(a_res)
Actually there is an easier way to do it.
For your first example of a 3x3 array embedded into a 5x5 one, you can do it with something like:
import numpy as np

A = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
N, M = A.shape
B = np.zeros(shape=(N + 2, M + 2))
B[1:-1, 1:-1] = A
By playing with slicing you can select a subset of A and insert it anywhere within a continuous subset of B.
Hope it helps! ;-)
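For case C specifically, here is a hedged sketch (insert_extend is a hypothetical helper, not from the question) that pads the embedding array with zeros wherever the smaller array would stick out, instead of truncating. Note that np.pad returns a new array, which is fine here since the shape changes anyway:
import numpy as np

def insert_extend(big, small, offset):
    """Insert `small` into `big` at `offset` (may be negative),
    growing `big` with zeros wherever `small` would stick out."""
    offset = np.asarray(offset)
    shape_b = np.array(big.shape)
    shape_s = np.array(small.shape)
    pad_before = np.maximum(-offset, 0)                 # growth at the low end
    pad_after = np.maximum(offset + shape_s - shape_b, 0)  # growth at the high end
    out = np.pad(big, list(zip(pad_before, pad_after)), mode='constant')
    start = offset + pad_before                         # offset within padded array
    region = tuple(slice(s, s + n) for s, n in zip(start, shape_s))
    out[region] = small
    return out

# case C: a (-1, -1) offset now grows the 5x5 array to 6x6
print(insert_extend(np.zeros((5, 5)), np.ones((3, 3)), (-1, -1)))
Since everything is computed per axis, this generalises to 3D arrays unchanged.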

How to fill upper triangle of numpy array with zeros in place?

What is the best way to fill in the lower triangle of a numpy array with zeros in place so that I don't have to do the following:
a=np.random.random((5,5))
a = np.triu(a)
since np.triu returns a copy, not a view. Preferably this would require no list indexing as well, since I am working with large arrays.
Digging into the internals of triu you'll find that it just multiplies the input by the output of tri.
So you can just multiply the array in-place by the output of tri:
>>> a = np.random.random((5, 5))
>>> a *= np.tri(*a.shape)
>>> a
array([[ 0.46026582, 0. , 0. , 0. , 0. ],
[ 0.76234296, 0.5298908 , 0. , 0. , 0. ],
[ 0.08797149, 0.14881991, 0.9302515 , 0. , 0. ],
[ 0.54794779, 0.36896506, 0.92901552, 0.73747726, 0. ],
[ 0.62917827, 0.61674542, 0.44999905, 0.80970863, 0.41860336]])
Like triu, this still creates a second array (the output of tri), but at least it performs the operation itself in-place. The splat is a bit of a shortcut; consider basing your function on the full version of triu for something robust. But note that you can still specify a diagonal:
>>> a = np.random.random((5, 5))
>>> a *= np.tri(*a.shape, k=2)
>>> a
array([[ 0.25473126, 0.70156073, 0.0973933 , 0. , 0. ],
[ 0.32859487, 0.58188318, 0.95288351, 0.85735005, 0. ],
[ 0.52591784, 0.75030515, 0.82458369, 0.55184033, 0.01341398],
[ 0.90862183, 0.33983192, 0.46321589, 0.21080121, 0.31641934],
[ 0.32322392, 0.25091433, 0.03980317, 0.29448128, 0.92288577]])
I now see that the question title and body describe opposite behaviors. Just in case, here's how you can fill the lower triangle with zeros. This requires you to specify the -1 diagonal:
>>> a = np.random.random((5, 5))
>>> a *= 1 - np.tri(*a.shape, k=-1)
>>> a
array([[0.6357091 , 0.33589809, 0.744803 , 0.55254798, 0.38021111],
[0. , 0.87316263, 0.98047459, 0.00881754, 0.44115527],
[0. , 0. , 0.51317289, 0.16630385, 0.1470729 ],
[0. , 0. , 0. , 0.9239731 , 0.11928557],
[0. , 0. , 0. , 0. , 0.1840326 ]])
If speed and memory use are still a limitation and Cython is available, a short Cython function will do what you want.
Here's a working version designed for a C-contiguous array with double precision values.
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef make_lower_triangular(double[:,:] A, int k):
    """ Set all the entries of array A that lie above
    diagonal k to 0. """
    cdef int i, j
    for i in range(min(A.shape[0], A.shape[1] - k)):
        for j in range(max(0, i + k + 1), A.shape[1]):
            A[i, j] = 0.
This should be significantly faster than any version that involves multiplying by a large temporary array.
import numpy as np

n = 3
A = np.zeros((n, n))
for p in range(n):
    A[0, p] = p + 1
    if p > 0:
        A[1, p] = p + 3
    if p > 1:
        A[2, p] = p + 4
This creates an upper triangular matrix with entries counting up from 1.
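As a middle ground between the tri multiplication and Cython, you could also zero the unwanted triangle in place via index arrays; a sketch matching the k=-1 example above:
import numpy as np

a = np.random.random((5, 5))
# zero the strict lower triangle in place; the index arrays hold roughly
# n^2/2 integers rather than a full float mask the size of `a`
rows, cols = np.tril_indices_from(a, k=-1)
a[rows, cols] = 0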
