I have a function that works something like this:
import random
import numpy as np

def Function(x):
    a = random.random()
    b = random.random()
    c = OtherFunctionThatReturnsAThreeColumnArray()
    results = np.zeros((1, 5))
    results[0, 0] = a
    results[0, 1] = b
    results[0, 2] = c[-1, 0]
    results[0, 3] = c[-1, 1]
    results[0, 4] = c[-1, 2]
    return results
What I'm trying to do is run this function many, many times, appending each returned one-row, five-column result to a running data set. But as I understand it, append and a for-loop are both ruinously inefficient; I'm trying to improve my code, and the number of runs is going to be large enough that that kind of inefficiency isn't doing me any favors.
What's the best way to do the following such that it incurs the least overhead:
1. Create a new numpy array to hold the results
2. Insert the results of N calls of that function into the array from step 1?
You're correct in thinking that numpy.append or numpy.concatenate will be expensive if repeated many times: each call allocates a brand-new array and copies both of the previous arrays into it.
The best suggestion (if you know how much space you're going to need in total) would be to allocate that before you run your routine, and then just put the results in place as they become available.
If you're going to run this nrows times, then
results = np.zeros([nrows, 5])
and then add your results
def function(x, i, results):
    <.. snip ..>
    results[i, 0] = a
    results[i, 1] = b
    results[i, 2] = c[-1, 0]
    results[i, 3] = c[-1, 1]
    results[i, 4] = c[-1, 2]
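For concreteness, a minimal end-to-end sketch of that pattern (the body of function is stubbed with placeholder values here, since OtherFunctionThatReturnsAThreeColumnArray isn't shown):
import numpy as np

def function(x, i, results):
    # stand-in body: fill row i with placeholder values derived from x
    results[i, :2] = np.random.random(2)
    results[i, 2:] = (x, x + 1, x + 2)

nrows = 1000
results = np.zeros((nrows, 5))           # allocate the whole output once
for i, x in enumerate(np.linspace(0.0, 1.0, nrows)):
    function(x, i, results)              # each call writes row i in place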
Of course, if you don't know how many times you're going to be running function, this won't work. In that case, I'd suggest a less elegant approach:
Declare a possibly large results array and fill results[i, :] as above (keeping track of i and the size of results).
When you run out of room in results, then do the numpy.append (or concatenate) with another block of rows. This is much less bad than appending one row at a time and shouldn't destroy performance, but you will have to write some wrapper code; a hypothetical sketch of that wrapper follows.
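One possible shape for that wrapper (a sketch only, assuming 5 columns as above): preallocate a buffer, double it whenever it fills, and trim the unused tail at the end.
import numpy as np

class GrowingResults:
    def __init__(self, ncols=5, capacity=1024):
        self.data = np.zeros((capacity, ncols))
        self.n = 0                      # number of rows actually filled

    def add(self, row):
        if self.n == len(self.data):    # buffer is full: double its size
            self.data = np.concatenate([self.data, np.zeros_like(self.data)])
        self.data[self.n] = row
        self.n += 1

    def finish(self):
        return self.data[:self.n]       # drop the unused tail
Geometric growth keeps the total copying cost proportional to the final size rather than to the number of rows added.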
There are other ideas you could pursue. Off the top of my head you could:
Write the results to disk; depending on the speed of OtherFunctionThatReturnsAThreeColumnArray and the size of your data, this may not be too daft an idea.
Save your results in a list comprehension (forgetting numpy until after the run). If function returned (a, b, c) rather than results:
results = [function(x) for x in my_data]
and then do some shuffling to get results into the form you need; a sketch of that final conversion follows.
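For instance, assuming function returned a flat 5-tuple per call (the function and my_data below are stand-ins, not your real code):
import random
import numpy as np

def function(x):
    # stand-in: return the five values as a flat tuple instead of a (1, 5) array
    return (random.random(), random.random(), x, x + 1, x + 2)

my_data = range(1000)
results = np.array([function(x) for x in my_data])   # shape (1000, 5)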
Related
I'm implementing a KNN classifier and need to traverse the test set quickly, computing and storing the predicted label for each point.
The way I do it now is a list comprehension that I then turn into an ndarray, something like np.array([predict(point) for point in test_set]), but I suspect it costs both time and space, because Python's for loop is relatively slow and it creates an intermediate list that then gets copied. Is there a more efficient way to get such an array?
I know that numpy has an apply_along_axis function, but I've read that it just runs a for loop internally, so it may not improve performance.
EDIT: I learned a possible way to save memory: combine np.fromiter() with a generator, as in np.fromiter((predict(point) for point in test_set), int, test_set.shape[0]), which avoids creating an intermediate list. Unfortunately, in my program it seems to run a little slower than the previous method.
The good old way:
def my_func(test_set):
    i = 0
    test_set_size = len(test_set)
    result = [None] * test_set_size
    while i < test_set_size:
        result[i] = predict(test_set[i])
        i = i + 1
    return np.array(result)
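A variation on the same idea, if the intermediate Python list bothers you: preallocate the numpy output and fill it in place. This is only a sketch; predict below is a stand-in so the example runs.
import numpy as np

def predict(point):
    # stand-in classifier, not the real model
    return int(point.sum() > 0)

def my_func_prealloc(test_set):
    result = np.empty(len(test_set), dtype=int)   # preallocate the label array
    for i, point in enumerate(test_set):
        result[i] = predict(point)                # fill in place, no list copy
    return result

print(my_func_prealloc(np.random.randn(10, 3)))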
def a4():
    p = []
    for i in range(10):
        p.append(random.sample(x, 100))
    r = []
    for i in p:
        for j in i:
            r.append(j)
    return r
OUTPUT:
[0.5202486543583558, 0.5202486543583558, 0.5202486543583558, 0.5202486543583558, 0.5202486543583558]
a1000 = []
for i in range(5):
    a4()
    a1000.append(statistics.mean(a4()))
print(a1000)
I tried calling the function defined above inside the for loop shown, but the function seems to run only once: all the loop results are basically the same. I want the function to run fresh on each pass through the loop. Could someone tell me why it is only running once?
As was pointed out in the comments, the sublists in p in the definition of a4 contain exactly the same elements, exactly the same number of times; only the order of these elements changes.
Therefore the same goes for every new result of a4: these are the same lists up to a permutation of their elements. But the order of elements is irrelevant to the computation of the mean (the sum of permuted elements is always the same), hence you always get the same mean as a result.
However, what you might have wanted to implement is some kind of bootstrapping mechanism. In that case you would want to sample with replacement, which in turn would yield a different result every time. If this is what you want, then replace
p.append(random.sample(x, 100))
with
p.append(random.choices(x, k=100))
Also, I would consider using numpy for these things. Read about numpy.concatenate and ndarray.flatten, and about numpy.random.sample and numpy.random.choice; a sketch of the bootstrap version in numpy follows.
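For example, a minimal numpy sketch of the bootstrap idea (the x below is placeholder data, since your real x isn't shown):
import numpy as np

x = np.random.rand(100)                        # placeholder for your data
# 5 runs; each run draws 10 resamples of 100 values with replacement
samples = np.random.choice(x, size=(5, 10, 100), replace=True)
means = samples.reshape(5, -1).mean(axis=1)    # one mean per run, now different each time
print(means)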
I am trying to break a long sequence into sub-sequences with a smaller window size, using a get_slice function that I defined.
Then I realized that my code is rather clumsy: my raw data is already a numpy array, yet I store the slices in a list inside get_slice, and then, when reading each row of data_matrix, I need another list to store the information again.
The code works fine, yet converting back and forth between numpy arrays and lists seems non-pythonic to me. I wonder if I am doing it right. If not, how can I do it more efficiently and more pythonically?
Here's my code:
import numpy as np
##Artifical Data Generation##
X_row1 = np.linspace(1,60,60,dtype=int)
X_row2 = np.linspace(101,160,60,dtype=int)
X_row3 = np.linspace(1001,1060,60,dtype=int)
data_matrix = np.append(X_row1.reshape(1,-1),X_row2.reshape(1,-1),axis=0)
data_matrix = np.append(data_matrix,X_row3.reshape(1,-1,),axis=0)
##---------End--------------##
##The function for generating time slice for sequence##
def get_slice(X, windows=5, stride=1):
    x_slice = []
    for i in range(int(len(X)/stride)):
        if i*stride < len(X)-windows+1:
            x_slice.append(X[i*stride:i*stride+windows])
    return np.array(x_slice)
##---------End--------------##
x_list = []
for row in data_matrix:
    temp_data = get_slice(row)  # getting the time slices as a numpy array
    x_list.append(temp_data)    # appending the time slices to a list
X = np.array(x_list) #Converting the list back to numpy array
Putting this here as a semi-complete answer to address your two points - making the code more "pythonic" and more "efficient."
There are many ways to write code and there's always a balance to be found between the amount of numpy code and pure python code used.
Most of that comes down to experience with numpy and knowing some of the more advanced features, how fast the code needs to run, and personal preference.
Personal preference is the most important - you need to be able to understand what your code does and modify it.
Don't worry about what is pythonic, or even worse - numpythonic.
Find a coding style that works for you (as you seem to have done), and don't stop learning.
You'll pick up some tricks (like #B.M.'s answer uses), but for the most part these should be saved for rare instances.
Most tricks tend to require extra work, or only apply in some circumstances.
That brings up the second part of your question.
How to make code more efficient.
The first step is to benchmark it.
Really.
I've been surprised at the number of things I thought would speed up code that barely changed it, or even made it run slower.
Python's lists are highly optimized and give good performance for many things (Although many users here on stackoverflow remain convinced that using numpy can magically make any code faster).
To address your specific point, mixing lists and arrays is fine in most cases. Particularly if
You don't know the size of your data beforehand (lists expand much more efficiently)
You are creating a large number of views into an array (a list of arrays is often cheaper than one large array in this case)
You have irregularly shaped data (numpy arrays must be rectangular, with every row the same length)
In your code, case 2 applies. The trick with as_strided would also work, and probably be faster in some cases, but until you've profiled and know what those cases are I would say your code is good enough.
There are very few cases where mixing lists and arrays is necessary. You can build the same data efficiently with array primitives alone:
data_matrix = np.add.outer([0, 100, 1000], np.linspace(1, 60, 60, dtype=int))
s0, s1 = data_matrix.strides
X = np.lib.stride_tricks.as_strided(data_matrix, shape=(3, 56, 5), strides=(s0, s1, s1))
It's just a view. A fresh array can be obtained by X=X.copy().
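As a side note beyond the original answer: on NumPy 1.20 and later, numpy.lib.stride_tricks.sliding_window_view builds the same kind of view without hand-computed strides:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data_matrix = np.add.outer([0, 100, 1000], np.arange(1, 61))
X = sliding_window_view(data_matrix, window_shape=5, axis=1)   # view of shape (3, 56, 5)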
Appending to the list will be slow. Try a list comprehension to make the numpy array.
Something like the code below:
import numpy as np
##Artifical Data Generation##
X_row1 = np.linspace(1,60,60,dtype=int)
X_row2 = np.linspace(101,160,60,dtype=int)
X_row3 = np.linspace(1001,1060,60,dtype=int)
data_matrix = np.append(X_row1.reshape(1,-1),X_row2.reshape(1,-1),axis=0)
data_matrix = np.append(data_matrix,X_row3.reshape(1,-1,),axis=0)
##---------End--------------##
##The function for generating time slice for sequence##
def get_slice(X, windows=5, stride=1):
    return np.array([X[i*stride:i*stride+windows]
                     for i in range(int(len(X)/stride))
                     if i*stride < len(X)-windows+1])
##---------End--------------##
X = np.array([get_slice(row) for row in data_matrix])
print(X)
This may be odd, because you have a numpy array of numpy arrays. If you want a 3 dimensional array this is perfectly fine. If you don't want a 3 dimensional array then you may want to vstack or append the arrays.
# X = np.array([get_slice(row) for row in data_matrix])
X = np.vstack([get_slice(row) for row in data_matrix])
List Comprehension speed
I am running Python 3.4.4 on Windows 10.
import timeit
TEST_RUNS = 1000
LIST_SIZE = 2000000
def make_list():
    li = []
    for i in range(LIST_SIZE):
        li.append(i)
    return li

def make_list_microopt():
    li = []
    append = li.append
    for i in range(LIST_SIZE):
        append(i)
    return li

def make_list_comp():
    li = [i for i in range(LIST_SIZE)]
    return li
print("List Append:", timeit.timeit(make_list, number=TEST_RUNS))
print("List Comprehension:", timeit.timeit(make_list_comp, number=TEST_RUNS))
print("List Append Micro-optimization:", timeit.timeit(make_list_microopt, number=TEST_RUNS))
Output
List Append: 222.00971377954895
List Comprehension: 125.9705268094408
List Append Micro-optimization: 157.25782340883387
I am very surprised with how much the micro-optimization helps. Still, List Comprehensions are a lot faster for large lists on my system.
I'm trying to perform a number of functions to get some results from a set of satellite imagery (in the example case I am computing similarity measures). My plan was to iterate through all the pixels simultaneously, each pixel being a vector of 4 numbers, calculate a value for each pixel pair from those two vectors, and write it to an array, e.g. scipy.spatial.distance.correlation(pixels_0, pixels_1).
The issue is that when I run this loop, I can't get the results saved into a 1000x1000 array with one value per pixel.
array_0 = ...  # some array with dimensions (1000, 1000, 4)
array_1 = ...  # some array with dimensions (1000, 1000, 4)
results_array = []
for rows_0, rows_1 in itertools.izip(array_0, array_1):
    for pixels_0, pixels_1 in itertools.izip(rows_0, rows_1):
        results = some_function(pixels_0, pixels_1)
        print results                    # successfully prints desired results
        results_array.append(results)    # unsuccessful in creating the desired array
I get the results I want printed in the run window, but I don't know how to put them back into an array that I can then manipulate in a similar manner. Are my for loops the issue, or is this a simple problem with how I'm appending back to the array? Any explanation on speeding it up would also be great, as I'm very new to Python and programming altogether.
a = np.random.rand(10, 10, 4)
b = np.random.rand(10, 10, 4)
def dotprod(T0, T1):
    return np.dot(T0, T1)/(np.linalg.norm(T0)*np.linalg.norm(T1))

results = dotprod(a.flatten(), b.flatten())
results = results.reshape(a.shape)
This now causes ValueError: total size of new array must be unchanged,
and when printing the first results value I receive only one number. Is this the fault of my own poorly constructed function or in how I am using numpy?
The best way is to use Numpy for your task. You should think in vectors, and you should write your some_function() to work in a vectorized manner. Here is an example:
array_0 = np.random.rand(1000,1000,4)
array_1 = np.random.rand(1000,1000,4)
results = some_function(array_0.flatten(), array_1.flatten()) ## this will be (1000*1000*4 X 1)
results = results.reshape(array_0.shape) ## reshaping to make it the way you want it.
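One caveat worth spelling out: flattening both arrays and calling np.dot, as in the follow-up above, reduces everything to a single scalar, which is why the reshape then fails. If the measure really is computed per pixel over the 4-value axis (as in the dotprod example), a hedged sketch of a fully vectorized version is:
import numpy as np

array_0 = np.random.rand(1000, 1000, 4)
array_1 = np.random.rand(1000, 1000, 4)

num = np.einsum('ijk,ijk->ij', array_0, array_1)                  # per-pixel dot product
den = np.linalg.norm(array_0, axis=2) * np.linalg.norm(array_1, axis=2)
results = num / den                                               # shape (1000, 1000)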
Before investing any more effort into programming it this way, take a look into the numpy package. It will be many times faster!
About your code: shouldn't your results array also be multidimensional? So in your inner (per-row) loop you should be appending to a row, which you then, in your outer loop, append to your results matrix.
Try it with a small amount of data (e.g. 10 x 10 x 4) to learn from, but after that switch to numpy as soon as you can...
I have a line of code that looks like this:
te_succ_rate = np.mean(np.argmax(test_y, axis=1) == self.predictor(test_x))
where test_y is a numpy array of arrays and self.predictor(test_x) returns a numpy array. The whole line of code returns the fraction of subarrays in test_y whose maximum value's index equals the value in the corresponding position of the array returned by self.predictor(test_x).
The problem is that for large sizes of test_y and test_x, it runs out of memory. It works fine for 10 000, but not 60 000.
Is there a way to avoid this?
I tried this:
tr_res = []
for start, end in zip(range(0, len(train_x), subsize), range(subsize, len(train_x), subsize)):
    tr_res.append(self.predictor(train_x[start:end]))
tr_res = np.asarray(tr_res)
tr_res = tr_res.flatten()
tr_succ_rate = np.mean(np.argmax(train_y, axis=1) == tr_res)
But it does not work as the result is somehow 0 (which is not correct).
Level 1:
Though this isn't an answer for doing it inline, it may still be an answer to your problem:
You sure you're running out of memory from the mean and not the argmax?
Each additional dimension in test_y will be storing an extra N of whatever datatype you're working with. Say you have 5 dimensions in your data; then you'll have to store 5N values (presumably floats). The result of your self.predictor(test_x) takes a 6th N of memory. The temporary array that is the answer to your conditional is a 7th N. I don't actually know what the memory usage of np.mean is, but I assume it's not another N. But for argument's sake, let's say it is. If you inline just np.mean, you'll only save up to one N of memory, while you already need 7N worth.
So alternatively, try pulling your np.argmax(test_y, axis=1) out into an intermediate variable in a previous step and don't reference test_y again after calculating the argmax, so test_y gets garbage collected (or do whatever Python 3 does to force deletion of that variable). That should save you (the number of dimensions of your data minus 1) N of memory usage; you'll be down to around 3N or up to 4N, which is better than you could have achieved by inlining just np.mean.
I made the assumption that running self.predictor(test_x) only takes 1N. If it takes more, then pulling that out into its own intermediate variable in the same way will also help.
Level 2:
If that still isn't enough, still pull out your np.argmax(test_y, axis=1) and the self.predictor(test_x) into their own variables, then iterate across the two arrays yourself and do the conditional and aggregation yourself. Something like:
n_correct = 0.   # running count of matches
n = 0
correct_ans = np.argmax(test_y, axis=1)
returned_ans = self.predictor(test_x)
for c, r in zip(correct_ans, returned_ans):
    if c == r:
        n_correct += 1
    n += 1
avg = n_correct / n
(I'm not sure zip is the best way to do this; np probably has a more efficient way to do the same thing. This is essentially the second thing you tried, but accumulating the aggregates as you go without storing an additional array.)
That way, you'll also save the need to store the temporary list of booleans resulting from your conditional.
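A hedged sketch that combines this streaming aggregation with the chunked prediction you tried (the names follow the question; the chunk size is illustrative):
import numpy as np

def accuracy_in_chunks(predictor, test_x, test_y, chunk=10000):
    correct = 0
    total = len(test_x)
    for start in range(0, total, chunk):              # also covers the final partial chunk
        end = min(start + chunk, total)
        preds = predictor(test_x[start:end])
        truth = np.argmax(test_y[start:end], axis=1)
        correct += int(np.count_nonzero(preds == truth))
    return correct / total
Unlike the zip(range(...), range(...)) version, this does not drop the final partial chunk, and a length mismatch between the predictions and the argmax array is one plausible reason the earlier attempt came out as 0.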
If that still isn't enough, you're going to have to fundamentally change how you're storing your actual and target results, since the issue becomes you not being able to fit just the target and results into memory.