Numpy modify array in place? - python

I have the following code which is attempting to normalize the values of an m x n array (It will be used as input to a neural network, where m is the number of training examples and n is the number of features).
However, when I inspect the array in the interpreter after the script runs, I see that the values are not normalized; that is, they still have the original values. I guess this is because the assignment to the array variable inside the function is only seen within the function.
How can I do this normalization in place? Or do I have to return a new array from the normalize function?
import numpy
def normalize(array, imin = -1, imax = 1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""
    dmin = array.min()
    dmax = array.max()
    array = imin + (imax - imin)*(array - dmin)/(dmax - dmin)
    print(array[0])
def main():
    array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1)
    for column in array.T:
        normalize(column)
    return array
if __name__ == "__main__":
    a = main()

If you want to apply mathematical operations to a numpy array in-place, you can simply use the standard in-place operators +=, -=, /=, etc. So for example:
>>> def foo(a):
...     a += 10
...
>>> a = numpy.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> foo(a)
>>> a
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
The in-place version of these operations is a tad faster to boot, especially for larger arrays:
>>> def normalize_inplace(array, imin=-1, imax=1):
...     dmin = array.min()
...     dmax = array.max()
...     array -= dmin
...     array *= imax - imin
...     array /= dmax - dmin
...     array += imin
...
>>> def normalize_copy(array, imin=-1, imax=1):
...     dmin = array.min()
...     dmax = array.max()
...     return imin + (imax - imin) * (array - dmin) / (dmax - dmin)
...
>>> a = numpy.arange(10000, dtype='f')
>>> %timeit normalize_inplace(a)
10000 loops, best of 3: 144 us per loop
>>> %timeit normalize_copy(a)
10000 loops, best of 3: 146 us per loop
>>> a = numpy.arange(1000000, dtype='f')
>>> %timeit normalize_inplace(a)
100 loops, best of 3: 12.8 ms per loop
>>> %timeit normalize_copy(a)
100 loops, best of 3: 16.4 ms per loop
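To tie this back to the question's column loop, here is a minimal sketch of the in-place version applied column by column. The small `data` array is a hypothetical stand-in for the contents of test.csv, and the sketch assumes a floating-point array (in-place division would not work on an integer array).

```python
import numpy as np

def normalize_inplace(array, imin=-1, imax=1):
    """Rescale `array` to [imin, imax] in place (float arrays only)."""
    dmin = array.min()
    dmax = array.max()
    array -= dmin
    array *= imax - imin
    array /= dmax - dmin
    array += imin

# Hypothetical stand-in for the data loaded from test.csv in the question.
data = np.array([[1.0, 10.0],
                 [2.0, 30.0],
                 [4.0, 20.0]])

# Each row of data.T is a view onto one column of data, so the in-place
# operations inside normalize_inplace write straight into data.
for column in data.T:
    normalize_inplace(column)

print(data)
```

Because nothing is rebound inside the function, `main()` from the question could keep its loop unchanged and simply call an in-place function like this one.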

This is a trick that is slightly more general than the other useful answers here:
def normalize(array, imin = -1, imax = 1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""
    dmin = array.min()
    dmax = array.max()
    array[...] = imin + (imax - imin)*(array - dmin)/(dmax - dmin)
Here we are assigning values to the view array[...] rather than assigning these values to some new local variable within the scope of the function.
x = np.arange(5, dtype='float')
print(x)
normalize(x)
print(x)
# [0. 1. 2. 3. 4.]
# [-1.  -0.5  0.   0.5  1. ]
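The difference between rebinding a local name and writing into the existing array can be seen in isolation. This is a minimal sketch; `rebind` and `write_through` are hypothetical names used only for illustration.

```python
import numpy as np

def rebind(array):
    # Creates a new array and binds the local name to it;
    # the caller's array is left untouched.
    array = array * 2

def write_through(array):
    # Writes the new values into the buffer the caller also sees.
    array[...] = array * 2

x = np.arange(3, dtype=float)

rebind(x)
after_rebind = x.copy()   # still the original values

write_through(x)
after_write = x.copy()    # values doubled in place

print(after_rebind, after_write)
```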
EDIT:
This approach is slower because the right-hand side allocates a new temporary array. But it may be valuable if you are doing something more complicated where the builtin in-place operations are cumbersome or don't suffice.
def normalize2(array, imin=-1, imax=1):
    dmin = array.min()
    dmax = array.max()
    array -= dmin
    array *= (imax - imin)
    array /= (dmax-dmin)
    array += imin
A = np.random.randn(200**3).reshape([200] * 3)
%timeit -n5 -r5 normalize(A)
%timeit -n5 -r5 normalize2(A)
>> 47.6 ms ± 678 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)
>> 26.1 ms ± 866 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)

def normalize(array, imin = -1, imax = 1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""
    dmin = array.min()
    dmax = array.max()
    array -= dmin
    array *= (imax - imin)
    array /= (dmax-dmin)
    array += imin
    print(array[0])

There is a nice way to do normalization when using numpy. np.vectorize is very useful when combined with a lambda function and applied to an array; note that it returns a new array rather than modifying the input. See the example below:
import numpy as np
def normalizeMe(value,vmin,vmax):
    vnorm = float(value-vmin)/float(vmax-vmin)
    return vnorm
imin = 0
imax = 10
feature = np.random.randint(10, size=10)
# Vectorize your function (only need to do it once)
temp = np.vectorize(lambda val: normalizeMe(val,imin,imax))
normfeature = temp(np.asarray(feature))
print(feature)
print(normfeature)
One can compare the performance with a generator expression; however, there are likely many other ways to do this.
%%timeit
temp = np.vectorize(lambda val: normalizeMe(val,imin,imax))
normfeature1 = temp(np.asarray(feature))
10000 loops, best of 3: 25.1 µs per loop
%%timeit
normfeature2 = [i for i in (normalizeMe(val,imin,imax) for val in feature)]
100000 loops, best of 3: 9.69 µs per loop
%%timeit
normalize(np.asarray(feature))
100000 loops, best of 3: 12.7 µs per loop
So vectorize is definitely not the fastest, but it can be convenient in cases where performance is not as important.
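For a function this simple, the same arithmetic can also be written directly on the array. The following is a minimal sketch, not a benchmark: `normalize_me` mirrors the function above under a hypothetical snake_case name, and a fixed `feature` array replaces the random one so the result is reproducible.

```python
import numpy as np

def normalize_me(value, vmin, vmax):
    return float(value - vmin) / float(vmax - vmin)

imin, imax = 0, 10
feature = np.arange(10)  # deterministic stand-in for the random feature above

# Element-by-element call through np.vectorize; returns a new array.
vectorized = np.vectorize(lambda val: normalize_me(val, imin, imax))
via_vectorize = vectorized(feature)

# Same arithmetic expressed once on the whole array; also a new array.
via_broadcast = (feature - imin) / float(imax - imin)

print(np.allclose(via_vectorize, via_broadcast))
```

In both cases `feature` itself keeps its original integer values, so the result has to be assigned somewhere by the caller.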

Related

Splitting values in an array 'logarithmically' / based on another array

I have a 2d array, where each element is a fourier transform. I'd like to split transform 'logarithmically'. For example, let's take a single one of those arrays and call it a:
a = np.arange(0, 512)
# I want to split a into 'bins' defined by b, below:
b = np.array([0] + [10 * 2**i for i in range(7)]) # [0, 10, 20, 40, 80, 160, 320, 640]
What I'm looking to do is something like using np.split, except I would like to split values into 'bins' based on array b such that all values of a between [0, 10) are in one bin, all values between [10, 20) in another, etc.
I could do this in some sort of convoluted for loop:
split_arr = []
for i in range(1, len(b)):
    fbin = []
    for amp in a:
        if (amp >= b[i-1]) and (amp < b[i]):
            fbin.append(amp)
    split_arr.append(fbin)
I have many arrays to split, and also this is ugly (just my opinion). Is there a better way?
Here is how you can do it, using np.split:
np.split(a, np.searchsorted(a,b))
If your array a is not sorted, sort it before the above command:
a = np.sort(a)
np.searchsorted finds the indices at which the values of b would have to be inserted into the sorted array a to keep it sorted. In other words, it finds the locations where you want to split your array. If you do not want the empty array at the beginning, simply remove 0 from b.
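A small worked example may make the cut points concrete. This sketch assumes the leading 0 has already been dropped from b, as suggested above, and uses six bin edges chosen for illustration.

```python
import numpy as np

a = np.arange(0, 512)
# Bin edges with the leading 0 removed: [10, 20, 40, 80, 160, 320]
b = np.array([10 * 2**i for i in range(6)])

# Index in the sorted array a at which each edge would be inserted.
cut_points = np.searchsorted(a, b)

# np.split cuts a at those indices, giving len(b) + 1 pieces.
bins = np.split(a, cut_points)

print(cut_points)
print([len(part) for part in bins])
```

Everything at or above the last edge ends up in the final piece, so an extra edge is only needed if that tail should be split further.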
First you can reduce the 'ugliness' by using list comprehension:
split_arr = [[amp for amp in a if (amp >= b[i-1]) and (amp < b[i])] for i in range(1, len(b))]
Then you can apply the same logic using NumPy's fast vectorized boolean masking (which has the bonus of looking even cleaner):
split_arr = [a[(a >= b[i-1]) & (a < b[i])] for i in range(1, len(b))]
Comparison:
%timeit [[amp for amp in a if (amp >= b[i-1]) and (amp < b[i])] for i in range(1, len(b))]
1.29 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [a[(a >= b[i-1]) & (a < b[i])] for i in range(1, len(b))]
35.9 µs ± 4.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

printing binomial coefficient using numpy

I want to compute the binomial coefficient for given values of n and k (nCk),
using numpy to multiply the results of a for loop,
but the numpy method returns a memory location rather than the result.
Please provide a better solution in terms of time complexity if possible,
or any other suggestions.
import time
import numpy
def binomialc(n,k):
    return 1 if k==0 or k==n else numpy.prod((n+1-i)/i for i in range(1,k+1))
starttime=time.perf_counter()
print(binomialc(600,298))
print(time.perf_counter()-starttime)
You may want to use: scipy.special.binom()
or, since Python 3.8: math.comb()
EDIT
I am not quite sure why you would not want to use SciPy but you are OK with NumPy, as SciPy is a well-established library from essentially the same folks developing NumPy.
Anyway, here a couple of other methods:
using math.factorial:
import math
def binom(n, k):
    return math.factorial(n) // math.factorial(k) // math.factorial(n - k)
using prod() and math.factorial() (theoretically more efficient, but not in practice):
def prod(items, start=1):
    for item in items:
        start *= item
    return start
def binom_simplified(n, k):
    if k > n - k:
        return prod(range(k + 1, n + 1)) // math.factorial(n - k)
    else:
        return prod(range(n - k + 1, n + 1)) // math.factorial(k)
using numpy.prod():
import numpy as np
def binom_np(n, k):
    return 1 if k == 0 or k == n else np.prod([(n + 1 - i) / i for i in range(1, k + 1)])
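The numpy.prod() variant multiplies floating-point ratios, so its result is approximate. As a minimal sketch, the same multiplicative idea can be kept exact by staying in integer arithmetic; `binom_multiplicative` is a hypothetical helper name, and the comparison against math.comb() assumes Python 3.8 or later.

```python
import math

def binom_multiplicative(n, k):
    """Exact C(n, k) using only integer arithmetic (hypothetical helper)."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # use symmetry to shorten the loop
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n - k + i, i), so the floor
        # division never discards a remainder.
        result = result * (n - k + i) // i
    return result

print(binom_multiplicative(600, 298) == math.comb(600, 298))
```

Multiplying before dividing is what keeps every intermediate value an integer; dividing first would reintroduce fractions.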
Speed-wise, scipy.special.binom() is by far the fastest, but it returns a floating-point approximation; if you also need the exact value for very large numbers, you may prefer binom() (which is, somewhat surprisingly, even faster than math.comb() here).
%timeit scipy.special.binom(600, 298)
# 1000000 loops, best of 3: 1.56 µs per loop
print(scipy.special.binom(600, 298))
# 1.3332140543730587e+179
%timeit math.comb(600, 298)
# 10000 loops, best of 3: 75.6 µs per loop
print(math.comb(600, 298))
# 133321405437268991724586879878020905773601074858558174180536459530557427686938822154484588609548964189291743543415057988154692680263088796451884071926401665548516571367537285901600
%timeit binom(600, 298)
# 10000 loops, best of 3: 36.5 µs per loop
print(binom(600, 298))
# 133321405437268991724586879878020905773601074858558174180536459530557427686938822154484588609548964189291743543415057988154692680263088796451884071926401665548516571367537285901600
%timeit binom_np(600, 298)
# 10000 loops, best of 3: 45.8 µs per loop
print(binom_np(600, 298))
# 1.3332140543726893e+179
%timeit binom_simplified(600, 298)
# 10000 loops, best of 3: 41.9 µs per loop
print(binom_simplified(600, 298))
# 133321405437268991724586879878020905773601074858558174180536459530557427686938822154484588609548964189291743543415057988154692680263088796451884071926401665548516571367537285901600

Improving performance on comparison algorithm np.packbits(A==A[:, None], axis=1)

I am looking to memory optimise np.packbits(A==A[:, None], axis=1), where A is dense array of integers of length n. A==A[:, None] is memory hungry for large n since the resulting Boolean array is stored inefficiently with each Boolean value costing 1 byte.
I wrote the below script to achieve the same result while packing bits one section at a time. It is, however, around 3x slower, so I am looking for ways to speed it up. Or, alternatively, a better algorithm with small memory overhead.
Note: this is a follow-up question to one I asked earlier; Comparing numpy array with itself by element efficiently.
Reproducible code below for benchmarking.
import numpy as np
from numba import jit
@jit(nopython=True)
def bool2int(x):
    y = 0
    for i, j in enumerate(x):
        if j: y += int(j)<<(7-i)
    return y
@jit(nopython=True)
def compare_elementwise(arr, result, section):
    n = len(arr)
    for row in range(n):
        for col in range(n):
            section[col%8] = arr[row] == arr[col]
            if ((col + 1) % 8 == 0) or (col == (n-1)):
                result[row, col // 8] = bool2int(section)
                section[:] = 0
    return result
n = 10000
A = np.random.randint(0, 1000, n)
result_arr = np.zeros((n, n // 8 if n % 8 == 0 else n // 8 + 1)).astype(np.uint8)
selection_arr = np.zeros(8).astype(np.uint8)
# memory efficient version, but slow
packed = compare_elementwise(A, result_arr, selection_arr)
# memory inefficient version, but fast
packed2 = np.packbits(A == A[:, None], axis=1)
assert (packed == packed2).all()
%timeit compare_elementwise(A, result_arr, selection_arr) # 1.6 seconds
%timeit np.packbits(A == A[:, None], axis=1) # 0.460 second
Here is a solution 3 times faster than the numpy one (a.size must be a multiple of 8; see below):
import numba as nb
@nb.njit
def comp(a):
    res = np.zeros((a.size, a.size // 8), np.uint8)
    for i, x in enumerate(a):
        for j, y in enumerate(a):
            if x == y: res[i, j // 8] |= 128 >> (j % 8)
    return res
This works because the array is scanned only once, whereas your version goes over it many times,
and almost all terms are zero.
In [122]: %timeit np.packbits(A == A[:, None], axis=1)
389 ms ± 57.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [123]: %timeit comp(A)
123 ms ± 24.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
If a.size % 8 > 0, recovering the information costs more. The best way to handle this case is to pad the initial array with up to 7 zeros.
For completeness, the padding could be done as so:
if A.size % 8 != 0: A = np.pad(A, (0, 8 - A.size % 8), 'constant', constant_values=0)
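If numba is not an option, the memory problem alone can also be addressed in plain NumPy by building the comparison one block of rows at a time. This is a different technique from the answer above, offered as a minimal sketch; `packed_compare_chunked` and its `chunk` parameter are hypothetical names, and no timing claim is made for it.

```python
import numpy as np

def packed_compare_chunked(a, chunk=1024):
    """Row-blocked equivalent of np.packbits(a == a[:, None], axis=1)."""
    n = a.size
    out = np.empty((n, (n + 7) // 8), dtype=np.uint8)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Only a (stop - start) x n boolean block exists at any one time.
        block = a[start:stop, None] == a[None, :]
        # packbits zero-pads the last byte when n is not a multiple of 8.
        out[start:stop] = np.packbits(block, axis=1)
    return out

rng = np.random.default_rng(0)
A = rng.integers(0, 1000, 2003)  # length deliberately not a multiple of 8
packed = packed_compare_chunked(A, chunk=256)
print(packed.shape)
```

The peak temporary memory is roughly `chunk * n` bytes instead of `n * n`, so `chunk` trades memory against the number of Python-level iterations.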

Vectorize addition into array indexed by another array

I am trying to get a fast vectorized version of the following loop:
for i in range(N1):
    A[y[i]] -= B[i,:]
Here A.shape = (N2, N3), y.shape = (N1,) with y taking values in [0, N2), and B.shape = (N1, N3). You can think of the entries of y as indices into rows of A. Here N1 is large, N2 is pretty small and N3 is smallish.
I thought simply doing
A[y] -= B
would work, but the issue is that there are repeated entries in y and this does not do the right thing (i.e., if y=[1,1] then A[1] is only subtracted from once, not twice). Also, this does not seem to be any faster than the unvectorized for loop.
Is there a better way of doing this?
EDIT: YXD linked to this answer in the comments, which at first seems to fit the bill. It would seem you can do exactly what I want with
np.subtract.at(A, y, B)
and it does work, however when I try to run it it is significantly slower than the unvectorized version. So, the question remains: is there a more performant way of doing this?
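The repeated-index behaviour described above can be reproduced on a tiny case. This is a minimal sketch using the y = [1, 1] example from the question; the array names `A_fancy` and `A_at` are chosen only for illustration.

```python
import numpy as np

y = np.array([1, 1])
B = np.ones((2, 3))

A_fancy = np.zeros((3, 3))
A_fancy[y] -= B              # buffered: row 1 ends up updated only once

A_at = np.zeros((3, 3))
np.subtract.at(A_at, y, B)   # unbuffered: row 1 is updated twice

print(A_fancy[1], A_at[1])
```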
EDIT2: An example, to make things concrete:
n1,n2,n3 = 10000, 10, 500
A = np.random.rand(n2,n3)
y = np.random.randint(n2, size=n1)
B = np.random.rand(n1,n3)
The for loop, when run using %timeit in ipython gives on my machine:
10 loops, best of 3: 19.4 ms per loop
The subtract.at version produces the same value for A in the end, but is much slower:
1 loops, best of 3: 444 ms per loop
The code for the original for-loop based approach would look something like this -
def for_loop(A):
    N1 = B.shape[0]
    for i in range(N1):
        A[y[i]] -= B[i,:]
    return A
Case #1
If n2 >> n3, I would suggest this vectorized approach -
def bincount_vectorized(A):
    n3 = A.shape[1]
    nrows = y.max()+1
    id = y[:,None] + nrows*np.arange(n3)
    A[:nrows] -= np.bincount(id.ravel(),B.ravel()).reshape(n3,nrows).T
    return A
Runtime tests -
In [203]: n1,n2,n3 = 10000, 500, 10
...: A = np.random.rand(n2,n3)
...: y = np.random.randint(n2, size=n1)
...: B = np.random.rand(n1,n3)
...:
...: # Make copies
...: Acopy1 = A.copy()
...: Acopy2 = A.copy()
...:
In [204]: %timeit for_loop(Acopy1)
10 loops, best of 3: 19 ms per loop
In [205]: %timeit bincount_vectorized(Acopy2)
1000 loops, best of 3: 779 µs per loop
Case #2
If n2 << n3, a modified for-loop approach with fewer loop iterations could be suggested -
def for_loop_v2(A):
    n2 = A.shape[0]
    for i in range(n2):
        A[i] -= np.einsum('ij->j',B[y==i]) # OR (B[y==i]).sum(0)
    return A
Runtime tests -
In [206]: n1,n2,n3 = 10000, 10, 500
...: A = np.random.rand(n2,n3)
...: y = np.random.randint(n2, size=n1)
...: B = np.random.rand(n1,n3)
...:
...: # Make copies
...: Acopy1 = A.copy()
...: Acopy2 = A.copy()
...:
In [207]: %timeit for_loop(Acopy1)
10 loops, best of 3: 24.2 ms per loop
In [208]: %timeit for_loop_v2(Acopy2)
10 loops, best of 3: 20.3 ms per loop
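Since the functions above read y and B from the enclosing scope, a quick consistency check is easier with self-contained variants. The sketch below passes y and B explicitly (a hypothetical signature, as are the function names) and confirms that both approaches reproduce the plain loop on small random data.

```python
import numpy as np

def loop_reference(A, y, B):
    # Plain loop: handles repeated indices in y correctly.
    for i in range(B.shape[0]):
        A[y[i]] -= B[i, :]
    return A

def bincount_version(A, y, B):
    n3 = A.shape[1]
    nrows = y.max() + 1
    # One flat bin per (row index, column) pair.
    flat_ids = y[:, None] + nrows * np.arange(n3)
    sums = np.bincount(flat_ids.ravel(), B.ravel()).reshape(n3, nrows).T
    A[:nrows] -= sums
    return A

def grouped_sum_version(A, y, B):
    # One pass per row of A, summing all rows of B mapped to it.
    for i in range(A.shape[0]):
        A[i] -= B[y == i].sum(0)
    return A

rng = np.random.default_rng(0)
n1, n2, n3 = 1000, 10, 50
A = rng.random((n2, n3))
y = rng.integers(n2, size=n1)
B = rng.random((n1, n3))

expected = loop_reference(A.copy(), y, B)
print(np.allclose(bincount_version(A.copy(), y, B), expected))
print(np.allclose(grouped_sum_version(A.copy(), y, B), expected))
```

Each call receives its own copy of A because all three functions modify their first argument in place.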

Optimizing python one-liner

I profiled my program, and more than 80% of the time is spent in this one-line function! How can I optimize it? I am running with PyPy, so I'd rather not use NumPy, but since my program is spending almost all of its time there, I think giving up PyPy for NumPy might be worth it. However, I would prefer to use the CFFI, since that's more compatible with PyPy.
# x, y are lists of 1s and 0s. c_out is a positive int. bit is 1 or 0.
def findCarryIn(x, y, c_out, bit):
    return (2 * c_out +
            bit -
            sum(map(lambda x_bit, y_bit: x_bit & y_bit, x, reversed(y))))  # note this is basically a dot product.
Without using NumPy, and after testing with timeit, the fastest method for the summation you are doing seems to be a simple for loop that sums over the elements. Example -
def findCarryIn(x, y, c_out, bit):
    s = 0
    for i,j in zip(x, reversed(y)):
        s += i & j
    return (2 * c_out + bit - s)
Though this did not increase the performance by a lot (maybe 20% or so).
The results of the timing tests (with different methods; func4 contains the method described above) -
def func1(x,y):
    return sum(map(lambda x_bit, y_bit: x_bit & y_bit, x, reversed(y)))
def func2(x,y):
    return sum([i & j for i,j in zip(x,reversed(y))])
def func3(x,y):
    return sum(x[i] & y[-1-i] for i in range(min(len(x),len(y))))
def func4(x,y):
    s = 0
    for i,j in zip(x, reversed(y)):
        s += i & j
    return s
In [125]: %timeit func1(x,y)
100000 loops, best of 3: 3.02 µs per loop
In [126]: %timeit func2(x,y)
The slowest run took 6.42 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 2.9 µs per loop
In [127]: %timeit func3(x,y)
100000 loops, best of 3: 4.31 µs per loop
In [128]: %timeit func4(x,y)
100000 loops, best of 3: 2.2 µs per loop
This can for sure be sped up a lot using numpy. You could define your function something like this:
def find_carry_numpy(x, y, c_out, bit):
    return 2 * c_out + bit - np.sum(x & y[::-1])
Create some random data:
In [36]: n = 100; c = 15; bit = 1
In [37]: x_arr = np.random.rand(n) > 0.5
In [38]: y_arr = np.random.rand(n) > 0.5
In [39]: x_list = list(x_arr)
In [40]: y_list = list(y_arr)
Check that results are the same:
In [42]: find_carry_numpy(x_arr, y_arr, c, bit)
Out[42]: 10
In [43]: findCarryIn(x_list, y_list, c, bit)
Out[43]: 10
Quick speed test:
In [44]: timeit find_carry_numpy(x_arr, y_arr, c, bit)
10000 loops, best of 3: 19.6 µs per loop
In [45]: timeit findCarryIn(x_list, y_list, c, bit)
1000 loops, best of 3: 409 µs per loop
So you gain a factor of 20 in speed! That is a pretty typical speedup when converting Python code to Numpy.
