Numpy: Find the maximum length of arrays inside another array - python

I have a numpy array as the following:
import numpy as np
arr = np.array([np.array([1]), np.array([1, 2]), np.array([1, 2, 3]), np.array([1, 3, 4, 2, 4, 2])], dtype=object)  # dtype=object is required for ragged input in recent NumPy versions
I want a nice NumPy function that gives me the maximum length of the arrays inside arr.
For this example, the function should return 6.
This is possible with iteration, but I am looking for a nicer way, perhaps even without map().
A function from TensorFlow or Keras would also be fine.

We could do:
max(map(len, arr))
#6

Another simple trick is to use the max() function with the key argument to return the longest array, then take its length:
len(max(arr, key = len))
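For reference, here is a minimal self-contained sketch of the same idea that also reports which inner array is the longest. It assumes the outer array is built with dtype=object, and the names lengths, max_len and longest_index are only illustrative. It still iterates over the inner arrays in Python; NumPy has no vectorized len for ragged data.

```python
import numpy as np

# Ragged input: each element is a 1-D array of a different length.
arr = np.array(
    [np.array([1]), np.array([1, 2]), np.array([1, 2, 3]), np.array([1, 3, 4, 2, 4, 2])],
    dtype=object,
)

# Collect the inner lengths into a regular integer array, then reduce it.
lengths = np.fromiter((len(a) for a in arr), dtype=int, count=len(arr))
max_len = int(lengths.max())           # length of the longest inner array
longest_index = int(lengths.argmax())  # position of that array in arr

print(max_len, longest_index)
```

Keeping the lengths array around is handy if you later need to pad or mask the sequences.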

Another way would be to use the keras.preprocessing.sequence.pad_sequences() function.
It pads the sequences to the length of the longest one. In my opinion this uses a lot of memory and, depending on the array, may take some time. However, it is a way to do it without an explicit loop:
len(keras.preprocessing.sequence.pad_sequences(arr)[0])

Related

Piecewise Operation on List of Numpy Arrays

My question is: can I make a function or variable that performs an operation or NumPy method on each np.array element within a list in a more succinct way than what I have below (preferably by just calling one function or variable)?
Generating the list of arrays:
import numpy as np
array_list = [np.random.rand(3,3) for x in range(5)]
array_list
Current Technique of operating on each element:
My current method (as seen below) involves unpacking it and doing something to it:
[arr.std() for arr in array_list]
[arr + 2 for arr in array_list]
Goal:
My hope is to get something that could perform the operations above by simply typing:
x.std()
or
x +2
Yes - use an actual NumPy array and perform your operations over the desired axes, instead of having them stuffed in a list.
actual_array = np.array(array_list)
actual_array.std(axis=(1, 2))
# array([0.15792346, 0.25781021, 0.27554279, 0.2693581 , 0.28742179])
If you generally wanted all axes except the first, this could be something like tuple(range(1, actual_array.ndim)) instead of explicitly specifying the tuple.
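A short sketch of that suggestion, assuming every array in the list has the same shape (otherwise they cannot be combined into one regular array). The seeded generator and the names inner_axes, per_array_std and shifted are illustrative, not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
array_list = [rng.random((3, 3)) for _ in range(5)]

actual_array = np.array(array_list)              # shape (5, 3, 3)
inner_axes = tuple(range(1, actual_array.ndim))  # every axis except the first

per_array_std = actual_array.std(axis=inner_axes)  # one value per original array
shifted = actual_array + 2                         # broadcasting replaces the loop

print(per_array_std.shape, shifted.shape)
```

The results match the list comprehensions from the question, but each operation is a single call on one array.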

Change array output to one without brackets

I changed a sparse dictionary into an array with (np.asarray). Then, I wrote a function that used that array to return the answer of a formula. However, I did that in a way the output includes the double brackets. Let's say the output is now:
[[7.58939191]]
but should be:
7.58939191
Can someone say how I can change this easily? Or do I have to share my function for this?
One way is the item method:
x.item(0)
See the documentation:
Copy an element of an array to a standard Python scalar and return it.
Alternatively, you can turn it into a NumPy array and then squeeze out the length-one dimensions:
import numpy as np
a = np.squeeze(np.asarray(a))
Then you can use a just like a number, for example:
b = a + 1
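Both suggestions side by side, as a small sketch using the value from the question. The variable names are illustrative.

```python
import numpy as np

x = np.asarray([[7.58939191]])   # shape (1, 1), prints with double brackets

as_python_float = x.item()       # plain Python float; only valid for one-element arrays
zero_dim = np.squeeze(x)         # 0-d NumPy array, shape ()

print(as_python_float)
print(zero_dim + 1)
```

item() leaves NumPy entirely, while np.squeeze keeps a NumPy object that still behaves like a number in arithmetic.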

How can I take every combination of vectors for the numpy array in the equation shown

np.dot(nparray[1], nparray[2]) / (np.sum(nparray[1]) * np.sum(nparray[2]))
I want to implement this so that it does it for all of the vectors in my numpy array. How can I go about doing this? I'm assuming that it'll use itertools.combinations but after that I'm lost. In the equation above, I'm using the first and the second vector but I'd like to do that for all the combinations of vectors. Is it possible to have this labelled?
Edit:
If you have a way of implementing this without itertools, that works too; it seems from the comments below that itertools isn't the method I should be using.
You can use a list of indexes as a proxy and itertools as follows
import numpy as np
import itertools
N = 5 # size of your vector
M = 5 # number of vectors
a = np.random.rand(M,N)
index = range(M) # using an index to be a proxy to be able to use itertools
for i, j in itertools.combinations(index, 2):
    print(i, j, np.dot(a[i], a[j]))  # each row of a is one vector; i and j label the pair
Now, instead of printing inside the for loop you call your function (which you should probably define as a proper python function).
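As a sketch of that last step, the formula from the question can be wrapped in a function and applied to every pair, with the index pair serving as the label. The function names normalized_dot and pairwise_scores are hypothetical, and the sketch assumes each row of the array is one vector.

```python
import itertools

import numpy as np


def normalized_dot(u, v):
    """Dot product of u and v divided by the product of their element sums."""
    return np.dot(u, v) / (np.sum(u) * np.sum(v))


def pairwise_scores(vectors):
    """Return {(i, j): score} for every unordered pair of rows in vectors."""
    return {
        (i, j): normalized_dot(vectors[i], vectors[j])
        for i, j in itertools.combinations(range(len(vectors)), 2)
    }


vectors = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
scores = pairwise_scores(vectors)
for pair, score in scores.items():
    print(pair, score)
```

The dictionary keys are the index pairs, so each result stays labelled with the vectors it came from.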

What's the best way to iterate over a multidimensional array while tracking/doing operations on the iteration index

I need to do a lot of operations on multidimensional NumPy arrays, and therefore I am experimenting to find the best approach.
So let's say I have an array like this:
A = np.random.uniform(0, 1, size = 100).reshape(20, 5)
My goal is to get the maximum value (numpy.amax()) of each row together with its index. Suppose A[0] looks like this:
A[0] = [ 0.64570441 0.31781716 0.07268926 0.84183753 0.72194227]
I want to get the maximum and the index of that maximum, e.g. 0.84183753 and [0, 3]. No specific representation of the results is needed; this is just an example. In fact, I only need the horizontal (column) index.
I tried using numpy's nditer object:
A_it = np.nditer(A, flags=['multi_index'], op_flags=['readwrite'])
while not A_it.finished:
    print(np.amax(A_it.value))
    print(A_it.multi_index[1])
    A_it.iternext()
That way I can access every element of the array and its index during the iteration, but I can't work out the syntax for combining numpy.amax() over each row with the index. Can I even do this with an nditer object?
Also, in "Numpy: Beginner nditer" I read that using nditer, or iterating at all in NumPy, usually means I am doing something wrong. But I can't find another convenient way to achieve my goal without iterating. I am a total beginner in NumPy and Python in general, so any search keyword or hint is very much appreciated.
A major problem with nditer is that it iterates over each element, not each row. It's best used as a stepping stone toward a Cython or C rewrite of your code.
If you just want the maximum for each row of your array, a simple iteration or list comprehension will do nicely.
for row in A: print(np.amax(row))
or to turn it back into an array:
np.array([np.amax(row) for row in A])
But you can get the same values by giving amax an axis parameter
np.amax(A,axis=1)
np.argmax identifies the location of the maximum.
np.argmax(A,axis=1)
With the argmax values you could then select the max values as well,
ind=np.argmax(A,axis=1)
A[np.arange(A.shape[0]),ind]
(speed's about the same as repeating the np.amax call).
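Putting the pieces together, here is a small sketch that reports the row, the column of the maximum, and the maximum itself. The first row is the example row from the question; the second row is made up for illustration.

```python
import numpy as np

A = np.array([
    [0.64570441, 0.31781716, 0.07268926, 0.84183753, 0.72194227],
    [0.9, 0.1, 0.2, 0.3, 0.4],  # illustrative second row, not from the question
])

col_index = np.argmax(A, axis=1)               # column of the maximum in each row
row_max = A[np.arange(A.shape[0]), col_index]  # the maxima, picked out by index

for row, (col, value) in enumerate(zip(col_index, row_max)):
    print(row, col, value)
```

No nditer is needed: one argmax call gives the horizontal indices, and fancy indexing recovers the values.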

Is there any way to use the "out" argument of a Numpy function when modifying an array in place?

If I want to get the dot product of two arrays, I can get a performance boost by specifying an array to store the output in instead of creating a new array (if I am performing this operation many times)
import numpy as np
a = np.array([[1.0,2.0],[3.0,4.0]])
b = np.array([[2.0,2.0],[2.0,2.0]])
out = np.empty([2,2])
np.dot(a,b, out = out)
Is there any way I can take advantage of this feature if I need to modify an array in place? For instance, if I want:
out = np.array([[3.0,3.0],[3.0,3.0]])
out *= np.dot(a,b)
Yes, you can use the out argument to modify an array (e.g. array=np.ones(10)) in-place, e.g. np.multiply(array, 3, out=array).
You can even use in-place operator syntax, e.g. array *= 2.
To confirm if the array was updated in-place, you can check the memory address array.ctypes.data before and after the modification.
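Applied to the example in the question, one possible sketch is to write the dot product into a preallocated buffer and then multiply into out in place. The buffer name scratch is an assumption introduced for this illustration; it can be reused across many iterations.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[2.0, 2.0], [2.0, 2.0]])
out = np.array([[3.0, 3.0], [3.0, 3.0]])
scratch = np.empty((2, 2))  # hypothetical reusable buffer for the dot product

address_before = out.ctypes.data

np.dot(a, b, out=scratch)           # matrix product written into scratch
np.multiply(out, scratch, out=out)  # same effect as out *= scratch, no new array

address_after = out.ctypes.data
print(out)
print(address_before == address_after)
```

A separate buffer is still needed for the dot product itself, because its result depends on the original inputs and cannot be written over one of them while it is being computed.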
