I want to ask something related to this question posted a while ago: Operation on numpy arrays contain rows with different size. The point is that I need to do some operations on numpy arrays whose rows have different sizes.
The standard expression list2*list3*np.exp(list1) doesn't work, since the rows have different sizes; the option that works is using zip. See the code below.
import numpy as np
import time

# The rows have different lengths, so these must be object arrays
# (recent numpy versions require dtype=object for ragged input).
list1 = np.array([[2., 0., 3.5, 3], [3., 4.2, 5., 7.1, 5.]], dtype=object)
list2 = np.array([[2, 3, 3, 0], [3, 8, 5.1, 7.6, 1.2]], dtype=object)
list3 = np.array([[1, 3, 8, 3], [3, 4, 9, 0, 0]], dtype=object)

start_time = time.time()
c = []
for i in range(len(list1)):
    # Fresh loop names so the outer arrays are not shadowed inside
    # the comprehension.
    c.append([b * d * np.exp(a) for a, b, d in zip(list1[i], list2[i], list3[i])])
print("--- %s seconds ---" % (time.time() - start_time))
I want to ask if there exists a much more efficient way to perform these operations, avoiding the loop and doing it in a more numpy-like way. Thanks!
This should do it:
f = np.vectorize(lambda x, y, z: y * z * np.exp(x))
result = [f(*i) for i in np.column_stack((list1, list2, list3))]
result
#[array([ 14.7781122 , 9. , 794.77084701, 0. ]),
# array([ 180.76983231, 2133.96259331, 6812.16400281, 0. , 0. ])]
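Note that np.vectorize is implemented as a Python-level loop, so it mostly buys readability rather than speed. A minimal alternative sketch (my addition, not from the original answer): since the rows are ragged anyway, vectorize within each row and loop only over the rows, keeping the inner arithmetic in compiled numpy code:

import numpy as np

# One numpy-vectorized multiply/exp per row; only the outer loop is Python.
result = [np.asarray(l2) * np.asarray(l3) * np.exp(np.asarray(l1, dtype=float))
          for l1, l2, l3 in zip(list1, list2, list3)]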
The title is probably a bit abstract, but I'm seeking an elegant/optimized solution to the following problem:
I have a 2d numpy array, x:
x = np.array([[1,2,3],
[4,5,6],
[7,8,9],
[10,11,12]])
and the following two 1d arrays:
y = np.array([1,2,3,4])
i = np.array([2,0,2,1])
The goal is to replace the value at index i[n] in each row n with the corresponding y[n],
so x results in something like this:

array([[ 1,  2,  1],
       [ 2,  5,  6],
       [ 7,  8,  3],
       [10,  4, 12]])
This is done easily, if messily, with a for loop:

for n in range(len(x)):
    x[n][i[n]] = y[n]
It works, but I was wondering whether there is a numpy function that would optimize this process when x has much bigger dimensions, e.g. (1000, 3).
I'm trying to optimize this as much as possible, because the function containing this operation is itself called multiple times in a loop.
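A minimal sketch of a loop-free approach (standard numpy integer "fancy" indexing; my addition, not from the original thread): pair each row index with its column index from i and assign all of y in a single vectorized statement:

import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])
y = np.array([1, 2, 3, 4])
i = np.array([2, 0, 2, 1])

# Row indices 0..n-1 paired elementwise with column indices i;
# one assignment replaces all targeted elements at once.
x[np.arange(len(x)), i] = y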
I have an ultra large list of numerical values in numpy.float64 format, and I want to convert each inf value to 0.0 and parse the remaining elements to plain float.
This is my code, which works perfectly:
# Values in numpy.float64 format.
original_values = [np.float64("Inf"), np.float64(0.02345), np.float64(0.2334)]
# Convert them
parsed_values = [0.0 if x == float("inf") else float(x) for x in original_values]
But this is slow. Is there any way to make this code faster, using some magic with map or numpy (I have no experience with these libraries)?
Hey~ you're probably asking how you could do it faster with numpy. The quick answer is to turn the list into a numpy array and do it the numpy way:
import numpy as np
original_values = [np.float64("Inf"), ..., np.float64(0.2334)]
arr = np.array(original_values)
arr[arr == np.inf] = 0
where arr == np.inf returns another array that looks like array([ True, ..., False]) and can be used to select indices in arr in the way I showed.
Hope it helps.
I tested a bit, and it should be fast enough:
import timeit

# Create a huge array (a billion float64 values, about 8 GB)
arr = np.random.random(1000000000)
idx = np.random.randint(0, high=1000000000, size=1000000)
arr[idx] = np.inf

# Time the replacement
def replace_inf_with_0(arr=arr):
    arr[arr == np.inf] = 0

timeit.Timer(replace_inf_with_0).timeit(number=1)
The output says it takes 1.5 seconds to turn all 1,000,000 infs into 0s in a 1,000,000,000-element array.
@Avión used arr.tolist() at the end to convert the array back to a list for MongoDB, which should be the common way. I tried it with the billion-sized array: the conversion took about 30 seconds, while creating the billion-sized array took less than 10. So, feel free to recommend more efficient methods.
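One such alternative (my suggestion, not from the thread): numpy has a helper for exactly this cleanup. np.nan_to_num with the posinf keyword (available in numpy >= 1.17) replaces positive infinities in one call, and .tolist() then yields plain Python floats, as the question asks:

import numpy as np

original_values = [np.float64("inf"), np.float64(0.02345), np.float64(0.2334)]
arr = np.array(original_values)

# posinf=0.0 replaces +inf; the nan= and neginf= keywords handle the other cases.
parsed_values = np.nan_to_num(arr, posinf=0.0).tolist()
# [0.0, 0.02345, 0.2334]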
I have some numpy array objects in a list that I want to combine to a single numpy array. What is an efficient way to do this? The code below does not work since it puts a list into a numpy array...
import numpy as np
C = [np.array([1,2,3]), np.array([4,5,6]), np.array([7,8,9])]
M = np.zeros((1,3*3))
M[0] = C ## THIS THROWS AN ERROR
Use the following code:

print(np.append(C, []))
# [1. 2. 3. 4. 5. 6. 7. 8. 9.]
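As a side note (my addition, not part of the original answer), np.concatenate performs the same flattening join directly and is the more idiomatic spelling, while np.vstack keeps each input array as a row:

import numpy as np

C = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]
M = np.concatenate(C)   # array([1, 2, 3, 4, 5, 6, 7, 8, 9])
S = np.vstack(C)        # 3x3 array with one input array per row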
I have the following code:
import numpy as np

x = range(100)
M = len(x)
sample = np.zeros((M, 41632))
for i in range(M):
    lista = np.load('sample' + str(i) + '.npy')
    for j in range(41632):
        sample[i, j] = np.array(lista[j])
    print(i)
to create an array made of sample_i numpy arrays.
sample0, sample1, sample2, etc. are numpy arrays and my expected output is an Mx41632 array like this:
sample = [[sample0],[sample1],[sample2],...]
How can I make this operation more compact and faster, without the for loop? M can reach 1 million.
Or, how can I append to my sample array if the starting point is, for example, 1000 instead of 0?
Thanks in advance
Initial load
You can make your code a lot faster by avoiding the inner loop and not initialising sample to zeros.
import numpy as np

x = range(100)
M = len(x)
sample = np.empty((M, 41632))
for i in range(M):
    sample[i, :] = np.load('sample' + str(i) + '.npy')
In my tests this took the reading code from 3 seconds to 60 milliseconds!
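A roughly equivalent one-liner (my sketch, under the same file-naming assumption as the question): build the rows in a list comprehension and stack once, so each file is still read only once and no preallocation is needed:

import numpy as np

M = 100
# np.stack joins M arrays of length 41632 into one (M, 41632) array.
sample = np.stack([np.load('sample' + str(i) + '.npy') for i in range(M)])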
Adding rows
In general it is very slow to change the size of a numpy array. You can append a row once you have loaded the data in this way:
sample = np.insert(sample, len(sample), newrow, axis=0)
but this is almost never what you want to do, because it is so slow.
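If rows really do arrive incrementally, a common pattern (my sketch, not from the original answer) is to collect them in a plain Python list, which appends cheaply, and stack just once at the end; this also covers the "starting point 1000" case from the question:

import numpy as np

rows = []
for i in range(1000, 1100):   # e.g. files starting at 1000 instead of 0
    rows.append(np.load('sample' + str(i) + '.npy'))
sample = np.vstack(rows)      # single allocation at the end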
Better storage: HDF5
Also if M is very large you will probably start running out of memory.
I recommend that you have a look at PyTables which will allow you to store your sample results in one HDF5 file and manipulate the data without loading it into memory. This will in general be a lot faster than the .npy files you are using now.
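A minimal PyTables sketch of that idea (the file and node names here are my own, illustrative choices): an extendable EArray lets you append rows to disk without holding everything in memory:

import numpy as np
import tables

with tables.open_file('samples.h5', mode='w') as f:
    # A first dimension of 0 marks the array as extendable along rows.
    earr = f.create_earray(f.root, 'sample', tables.Float64Atom(),
                           shape=(0, 41632))
    for i in range(100):
        row = np.load('sample' + str(i) + '.npy')
        earr.append(row[np.newaxis, :])   # append expects a (1, 41632) block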
It is quite simple with numpy. Consider this example:
import numpy as np
l = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
#create an array with 4 rows and 3 columns
arr = np.zeros([4,3])
arr[:,:] = l
You can also insert rows or columns separately:
#insert the first row
arr[0,:] = l[0]
You just have to ensure that the dimensions are the same.
I have trouble figuring out what would be the most efficient way to do the following:
import numpy as np
M = 10
K = 10
ind = np.array([0,1,0,1,0,0,0,1,0,0])
full = np.random.rand(sum(ind),K)
output = np.zeros((M,K))
output[1,:] = full[0,:]
output[3,:] = full[1,:]
output[7,:] = full[2,:]
I want to build output, a mostly-zero matrix whose nonzero rows are given by the dense matrix full, with their row positions specified through a binary vector.
Ideally, I want to avoid a for-loop. Is that possible? If not, I'm looking for the most efficient way to for-loop this.
I need to perform this operation quite a few times. ind and full will keep changing, hence I've just provided some exemplar values for illustration.
I expect ind to be pretty sparse (at most 10% ones), and both M and K to be large (on the order of 10^2 to 10^3). Ultimately, I might need to perform this operation in pytorch, but a decent procedure for numpy would already get me quite far.
Please also help me find a more appropriate title for the question, if you have one or more appropriate categories for this question.
Many thanks,
Max
output[ind.astype(bool)] = full
By converting the integer values in ind to boolean values, you can do boolean indexing to select the rows in output that you want to populate with values in full.
Example with a 4x4 array:
M = 4
K = 4
ind = np.array([0,1,0,1])
full = np.random.rand(sum(ind),K)
output = np.zeros((M,K))
output[ind.astype(bool)] = full
print(output)
[[ 0. 0. 0. 0. ]
[ 0.32434109 0.11970721 0.57156261 0.35839647]
[ 0. 0. 0. 0. ]
[ 0.66038644 0.00725318 0.68902177 0.77145089]]
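Since the question mentions possibly moving to pytorch: the same boolean-mask assignment carries over almost verbatim (a sketch under that assumption, not from the original answer):

import torch

M, K = 4, 4
ind = torch.tensor([0, 1, 0, 1])
full = torch.rand(int(ind.sum()), K)
output = torch.zeros(M, K)
# Boolean-mask row assignment, the same idiom as the numpy version.
output[ind.bool()] = full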