At the most basic I have the following dataframe:
a = {'possibility' : np.array([1,2,3])}
b = {'possibility' : np.array([4,5,6])}
df = pd.DataFrame([a,b])
This gives me a dataframe of size 2x1, like so:
row 1: np.array([1,2,3])
row 2: np.array([4,5,6])
I have another vector of length 2. Like so:
[1,2]
These represent the index I want from each row.
So if I have [1,2] I want: from row 1: 2, and from row 2: 6.
Ideally, my output is [2,6] in a vector form, of length 2.
Is this possible? I can easily run through a for loop, but I am looking for FAST approaches, ideally vectorized ones, since the data is already in pandas/numpy.
In my actual use case, this needs to work on roughly 300k-400k rows, and it has to run inside optimization problems (hence the need for speed).
You could convert the column to a 2D numpy array and use np.take_along_axis:
v = np.array([1,2])
a = np.vstack(df['possibility'])
np.take_along_axis(a.T, v[None], axis=0)[0]
output: array([2, 6])
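As an alternative sketch (not part of the answer above), plain integer-array indexing also picks one element per row. It assumes every row holds an array of the same length, otherwise np.vstack fails. The names arrays, stacked, rows and picked are made up for illustration, and arrays stands in for df['possibility'] so the snippet runs without pandas.

```python
import numpy as np

# Stand-in for df['possibility']: one equal-length array per row (assumption).
arrays = [np.array([1, 2, 3]), np.array([4, 5, 6])]
v = np.array([1, 2])                 # position wanted from each row

stacked = np.vstack(arrays)          # shape (2, 3)
rows = np.arange(stacked.shape[0])   # [0, 1]
picked = stacked[rows, v]            # pairs row i with column v[i]
print(picked)
```

On 300k-400k rows the stacking step is likely to dominate, so if the data can be kept as a 2D array from the start, only the last indexing line has to run inside the optimization loop.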
I have a matrix D and get the sorting indices of every row (argsort). I'm trying to set the values of some_matrix at indices 1-5 of np.argsort(D) to 1. What I have below does what I need, but is there a way to do this in one line with numpy arrays?
some_matrix = np.zeros((n,n))
for i in range(n):
    some_matrix[i, np.argsort(D)[i, 1:5]] = 1
Firstly, note that you don't need a full sort, only a partition of elements 1-4 (I assume you need elements 1,2,3,4, because that's what your code does). So let's use that:
#assuming you want indices 1,2,3,4 of the sorted array, in any order
indices = np.argpartition(D, (1, 4), axis=1)[:, 1:5]
Now we've got the indices of the elements of D that sit at sorted positions 1 to 4 in each row, i.e. the second to fifth smallest elements (this is similar to indices = np.argsort(D, 1)[:, 1:5], but will be faster for large arrays). All we need to do is set these elements to 1:
np.put_along_axis(some_matrix, indices, 1, axis=1)
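To check that the two versions agree, here is a small self-contained sketch. The random D, the seed and the name expected are assumptions for the demo. With continuous random values ties are unlikely; if D has ties, the two versions may legitimately pick different columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
D = rng.random((n, n))  # demo data (assumption)

# Loop version from the question.
expected = np.zeros((n, n))
for i in range(n):
    expected[i, np.argsort(D)[i, 1:5]] = 1

# Vectorized version: partition so that sorted positions 1 and 4 are in place,
# which puts sorted positions 1..4 (in some order) into columns 1:5.
some_matrix = np.zeros((n, n))
indices = np.argpartition(D, (1, 4), axis=1)[:, 1:5]
np.put_along_axis(some_matrix, indices, 1, axis=1)

print(np.array_equal(some_matrix, expected))
```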
I have 2 Numpy 2D arrays, both with the same shape. One of the arrays is boolean, the other is an array of numbers. For example, I may have arrays of the following form:
import numpy as np
bool_array = np.array([[False,False,False,False,False],[False,False,True,True,True],[False,True,True,False,False]])
num_array = np.array([[2.5,3.4,9,1.2,7.5],[9.01,4.6,2,4,1],[7.3,4.5,6.3,10,2]])
In any row of bool_array where there are one or more True values next to each other, I want to get the corresponding values in num_array and determine the location of the minimum of these values. In the example I gave, there is a run of True values in row 2, columns 3, 4, and 5 of bool_array and another run in row 3, columns 2 and 3. The corresponding values in num_array are 2, 4, and 1 in row 2 and 4.5 and 6.3 in row 3. The minimum in row 2 is 1, and its location in the array is row 2, column 5. The minimum in row 3 is 4.5, and its location in the array is row 3, column 2. I want to get both of these locations.
What is the most efficient way to do this?
This should be fast, but I don't know for certain whether it is the most efficient way possible. Generally, you try to avoid Python loops when dealing with numpy arrays and instead use numpy's vectorized functions, which perform the loops internally in compiled code that is significantly faster than interpreted Python. Note that the code below finds the single smallest masked value in the whole array, not one minimum per row.
import numpy as np
bool_array = np.array([[False,False,False,False,False],[False,False,True,True,True]])
num_array = np.array([[2.5,3.4,9,1.2,7.5],[9.01,4.6,2,4,1]])
minimum = num_array[bool_array].min()
index = np.argwhere((num_array == minimum) & bool_array)
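The snippet above returns the location of one minimum for the whole array. For one location per row, as the question asks, a minimal sketch could mask the False entries with infinity and take a row-wise argmin. It assumes each row has at most one run of True values and that num_array holds finite floats. The names masked, cols, rows and locations are made up here. The result uses zero-based (row, column) pairs, while the question counts from 1.

```python
import numpy as np

bool_array = np.array([[False, False, False, False, False],
                       [False, False, True, True, True],
                       [False, True, True, False, False]])
num_array = np.array([[2.5, 3.4, 9, 1.2, 7.5],
                      [9.01, 4.6, 2, 4, 1],
                      [7.3, 4.5, 6.3, 10, 2]])

# Entries that are not selected can never win the argmin.
masked = np.where(bool_array, num_array, np.inf)
cols = masked.argmin(axis=1)

# Keep only the rows that contain at least one True value.
rows = np.flatnonzero(bool_array.any(axis=1))
locations = np.column_stack((rows, cols[rows]))
print(locations)
```

If a row can contain several separate runs and each run needs its own minimum, this sketch is not enough, and the runs would have to be labelled first.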
Can someone please help me out? I am trying to get the minimum value of each row and of each column of this matrix
matrix =[[12,34,28,16],
[13,32,36,12],
[15,32,32,14],
[11,33,36,10]]
So for example: I would want my program to print out that 12 is the minimum value of row 1 and so on.
Let's repeat the task statement: "get the minimum value of each row and of each column of this matrix".
Okay, so, if the matrix has n rows, you should get n minimum values, one for each row. Sounds interesting, doesn't it? So, the code'll look like this:
result1 = [<something> for row in matrix]
Well, what do you need to do with each row? Right, find the minimum value, which is super easy:
result1 = [min(row) for row in matrix]
As a result, you'll get a list of n values, just as expected.
Wait, by now we've only found the minimums for each row, but not for each column, so let's do this as well!
Given that you're using Python 3.x, you can do some pretty amazing stuff. For example, you can loop over columns easily:
result2 = [min(column) for column in zip(*matrix)] # notice the asterisk!
The asterisk in zip(*matrix) makes each row of matrix a separate argument of zip's, like this:
zip(matrix[0], matrix[1], matrix[2], matrix[3])
This doesn't look very readable and is dependent on the number of rows in matrix (basically, you'll have to hard-code them), and the asterisk lets you write much cleaner code.
zip returns tuples, and the ith tuple contains the ith values of all the rows, so these tuples are actually the columns of the given matrix.
Now, you may find this code a bit ugly and want to write the same thing in a more concise way. Sure enough, you can use some functional programming magic:
result1 = list(map(min, matrix))
result2 = list(map(min, zip(*matrix)))
The list comprehensions and the map versions produce exactly the same results.
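Putting it together for the matrix from the question, here is a short sketch that also prints the rows the way the question describes. The names row_minimums and column_minimums and the wording of the printed message are my own choices.

```python
matrix = [[12, 34, 28, 16],
          [13, 32, 36, 12],
          [15, 32, 32, 14],
          [11, 33, 36, 10]]

row_minimums = [min(row) for row in matrix]
column_minimums = [min(column) for column in zip(*matrix)]

# Count rows from 1 to match the wording of the question.
for number, value in enumerate(row_minimums, start=1):
    print(f"{value} is the minimum value of row {number}")

# The map-based versions give the same lists.
same_rows = list(map(min, matrix)) == row_minimums
same_columns = list(map(min, zip(*matrix))) == column_minimums
```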
Use numpy.
>>> import numpy as np
>>> matrix =[[12,34,28,16],
... [13,32,36,12],
... [15,32,32,14],
... [11,33,36,10]]
>>> np.min(matrix, axis=1) # computes minimum in each row
array([12, 12, 14, 10])
>>> np.min(matrix, axis=0) # computes minimum in each column
array([11, 32, 28, 10])
def maxvalues():
    for n in range(1,15):
        dummy=[]
        for k in range(len(MotionsAndMoorings)):
            dummy.append(MotionsAndMoorings[k][n])
        max(dummy)
        L = [x + [max(dummy)]] ## to be corrected (adding columns with value max(dummy))
        ## suggest code to add new row to L and for next function call, it should save values here.
I have an array of size (k x n) and I need to pick the max values of the first column in that array. Is there a simpler way than what I tried? My main aim is to append the values to L as columns rather than rows. If I just append, the values are added at the end. I would like this to be done in columns for row 0 of L, because I'll call this function again, add a new row to L, and do the same. Please suggest.
General suggestions for your code
First of all it's not very handy to access globals in a function. It works but it's not considered good style. So instead of using:
def maxvalues():
    do_something_with(MotionsAndMoorings)
you should do it with an argument:
def maxvalues(array):
    do_something_with(array)
MotionsAndMoorings = something
maxvalues(MotionsAndMoorings) # pass it to the function.
The next strange thing is that you seem to exclude the first column of your array (n is the second index in MotionsAndMoorings[k][n]):
for n in range(1,15):
I think that's unintended. The first element of a list has the index 0 and not 1. So I guess you wanted to write:
for n in range(0,15):
or even better, for arbitrary lengths:
for n in range(len(array[0])): # length of the first row, i.e. the number of columns
Alternatives to your iterations
But the explicit loops are not really needed: the max function already accepts a very nice keyword argument (key), so you don't have to collect each column by hand:
import operator
column = 2
max(array, key=operator.itemgetter(column))[column]
This finds the row whose element in the chosen column is maximal. Because max returns the whole row, you then index it with column to extract just that element.
So to get a list of all your maximums for each column you could do:
[max(array, key=operator.itemgetter(column))[column] for column in range(len(array[0]))]
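For comparison, here is a sketch of the same result without key, using zip(*array) to iterate over columns directly. The sample array is the one used further down in this answer, and the names with_key and with_zip are only for illustration. It assumes all rows have the same length.

```python
import operator

array = [[1, 2, 3], [4, 5, 6], [0, 2, 10], [0, 2, 10]]

# Version with key: pick the row that is maximal in a column, then index it.
with_key = [max(array, key=operator.itemgetter(column))[column]
            for column in range(len(array[0]))]

# Version with zip: each tuple produced by zip(*array) is one column.
with_zip = [max(column) for column in zip(*array)]

print(with_key, with_zip)
```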
As for your L, I'm not sure what it is, but you should probably also pass it as an argument to the function:
def maxvalues(array, L): # another argument here
But since I don't know what x and L are supposed to be, I won't go further into that. It looks like you want to turn the columns of MotionsAndMoorings into rows and the rows into columns. If so, you can do it with:
dummy = [[MotionsAndMoorings[j][i] for j in range(len(MotionsAndMoorings))] for i in range(len(MotionsAndMoorings[0]))]
that's a list comprehension that converts a list like:
[[1, 2, 3], [4, 5, 6], [0, 2, 10], [0, 2, 10]]
to a transposed list, with rows and columns swapped:
[[1, 4, 0, 0], [2, 5, 2, 2], [3, 6, 10, 10]]
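A shorter way to get the same transposed list is zip(*...). Below is a rough sketch that also guesses at what L is for: it assumes L is a list of rows and that each call should add one new row holding the column maxima. The function name append_column_maxima and the second sample array are hypothetical, since the question does not say what x and L actually are.

```python
def append_column_maxima(array, L):
    """Append one new row to L holding the maximum of each column of array."""
    L.append([max(column) for column in zip(*array)])
    return L

MotionsAndMoorings = [[1, 2, 3], [4, 5, 6], [0, 2, 10], [0, 2, 10]]

# zip(*...) yields the columns as tuples; list() turns them back into lists.
transposed = [list(column) for column in zip(*MotionsAndMoorings)]

L = []
append_column_maxima(MotionsAndMoorings, L)       # first call adds row 0
append_column_maxima([[7, 1, 1], [0, 8, 2]], L)   # a later call adds row 1
print(transposed)
print(L)
```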
Alternative packages
But as roadrunner66 already said, sometimes it's easiest to use a library like numpy or pandas, which already has advanced, fast functions that do exactly what you want and are very easy to use.
For example, you convert a Python list to a numpy array simply with:
import numpy as np
Motions_numpy = np.array(MotionsAndMoorings)
You get the maximum of each column with:
maximums_columns = np.max(Motions_numpy, axis=0)
You don't even need to convert it to an np.array first to use np.max or to transpose it (turn rows into columns and columns into rows):
transposed = np.transpose(MotionsAndMoorings)
I hope this answer is not too unstructured. Some parts are suggestions for your function and some are alternatives. You should pick the parts that you need, and if you have any trouble with them, just leave a comment or ask another question. :-)
An example with a random input array, showing that you can take the max in either axis easily with one command.
import numpy as np
aa= np.random.random([4,3])
print(aa)
print()
print(np.max(aa, axis=0))
print()
print(np.max(aa, axis=1))
Output:
[[ 0.51972266 0.35930957 0.60381998]
[ 0.34577217 0.27908173 0.52146593]
[ 0.12101346 0.52268843 0.41704152]
[ 0.24181773 0.40747905 0.14980534]]
[ 0.51972266 0.52268843 0.60381998]
[ 0.60381998 0.52146593 0.52268843 0.40747905]
Does anyone know how to combine integer indices in numpy? Specifically, I've got the results of a few np.wheres and I would like to extract the elements that are common between them.
For context, I am trying to populate a large 3d array with the number of elements that are between boundary values of each cell, i.e. I have records of individual events including their time, latitude and longitude. I want to grid this into a 3D frequency matrix, where the dimensions are time, lat and lon.
I could loop over the array cells, doing an np.where(timeCondition & latCondition & lonCondition) for each one and populating the cell with the length of the where result, but I figured this would be very inefficient, as you would have to repeat a lot of the wheres.
Would it be better to just have a list of wheres for each of the cells in each dimension, and then loop through them, logically combining them?
As @ali_m said, using bitwise and should be much faster, but to answer your question:
Call ravel_multi_index() to convert the multi-dimensional index into a 1-D index.
Call intersect1d() to get the indices that satisfy both conditions.
Call unravel_index() to convert the 1-D index back into a multi-dimensional index.
Here is the code:
import numpy as np
a = np.random.rand(10, 20, 30)
idx1 = np.where(a>0.2)
idx2 = np.where(a<0.4)
ridx1 = np.ravel_multi_index(idx1, a.shape)
ridx2 = np.ravel_multi_index(idx2, a.shape)
ridx = np.intersect1d(ridx1, ridx2)
idx = np.unravel_index(ridx, a.shape)
np.allclose(a[idx], a[(a>0.2) & (a<0.4)])
or you can use ridx directly:
a.ravel()[ridx]
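For the gridding problem described in the question, np.histogramdd may avoid the per-cell wheres altogether. This is only a sketch under assumptions: the event arrays time, lat and lon, the bin edges and the seed are invented for the demo, and the real data would replace them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_events = 1000
time = rng.uniform(0, 10, n_events)      # made-up event times
lat = rng.uniform(-90, 90, n_events)     # made-up latitudes
lon = rng.uniform(-180, 180, n_events)   # made-up longitudes

# Cell boundaries along each dimension (assumed regular here).
time_edges = np.linspace(0, 10, 6)
lat_edges = np.linspace(-90, 90, 10)
lon_edges = np.linspace(-180, 180, 19)

# counts[i, j, k] is the number of events falling into cell (i, j, k).
counts, _ = np.histogramdd(np.column_stack((time, lat, lon)),
                           bins=(time_edges, lat_edges, lon_edges))

# Cross-check one cell with the bitwise-and approach mentioned above.
in_cell = ((time >= time_edges[0]) & (time < time_edges[1]) &
           (lat >= lat_edges[0]) & (lat < lat_edges[1]) &
           (lon >= lon_edges[0]) & (lon < lon_edges[1]))
print(counts.shape, counts[0, 0, 0], in_cell.sum())
```

Bins are half-open, except that the last bin along each dimension also includes its right edge, so events lying exactly on the outer boundary are still counted.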