I've seen questions adjacent to this answered a number of times, but I'm really, really new to Python, and can't seem to get those answers to work for me...
I'm trying to access every row in an np array, where both columns have values greater than 1.
So, if x is my original array, and x has 500 rows and 2 columns, I want to know which rows, of those 500, contain 2 values > 1.
I've tried a bunch of solutions, but the following two seem the closest:
Test1 = x[(x[:,0:1] > 1) & (x[:,1:2] > 1)]
# Where the first condition should look for values greater than 1 in the first column, and the second condition should look for values greater than 1 in the second column.
Test2 = np.where(x[:,0:1] > 1 & x[:,1:2] > 1)
Any help would be greatly appreciated! Thanks so much!
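One possible approach, sketched here on a small made-up array rather than the real 500-row x: index each column with a plain integer (x[:, 0], not x[:, 0:1]) so that each comparison yields a 1-D mask with one entry per row, and wrap each comparison in parentheses, because & binds more tightly than >. The sample values and the names mask, row_indices and rows are assumptions for illustration only.

```python
import numpy as np

# Hypothetical stand-in for the 500 x 2 array from the question.
x = np.array([[0.5, 2.0],
              [1.5, 3.0],
              [2.0, 0.2],
              [4.0, 1.1]])

# One boolean per row: True where BOTH columns exceed 1.
# The parentheses matter because & has higher precedence than >.
mask = (x[:, 0] > 1) & (x[:, 1] > 1)

row_indices = np.flatnonzero(mask)  # which rows qualify
rows = x[mask]                      # the qualifying rows themselves

# Equivalent mask that also works for more than two columns.
mask_all = (x > 1).all(axis=1)

print(row_indices)
print(rows)
```

Here mask answers "which rows", and x[mask] returns those rows with both columns intact.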
I have a homework assignment to extract a 2-dimensional numpy array out of another 2-dimensional np array by choosing specific columns by condition (not by range).
So I have an array A with shape (3, 50000). I am trying to get a new array with shape (3, x) for some x < 50000, containing the original columns of A whose third cell z satisfies -0.4 < z < 0.1.
For example if:
A = [[1,2,3],[2,0.5,0],[9,-2,-0.2],[0,0,0.5]]
I wish to have back:
B = [[2,0.5,0],[9,-2,-0.2]]
I have tried to make a rank-1 boolean array that holds True for the columns I want, and to somehow combine the two. The problem is that the output is a rank-1 array, which is not what I am looking for. I also got some ValueErrors.
bool_idx = (-0.4 < x_y_z[2] < 0.1)
This code caused trouble:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I can do it with some loops, but NumPy has so many beautiful functions that I am sure I am missing something here.
In Python, the expression -0.4 < x_y_z[2] < 0.1 is roughly equivalent to -0.4 < x_y_z[2] and x_y_z[2] < 0.1. The and operator decides the truth value of each part of the expression by converting it into a bool. Unlike Python lists and tuples, NumPy arrays with more than one element do not support that conversion, which is what raises the ValueError.
The correct way to specify the condition is with bitwise & (which is unambiguous and non-short-circuiting), rather than the implicit and (which short circuits and is ambiguous in this case):
condition = ((x_y_z[2, :] > - 0.4) & (x_y_z[2, :] < 0.1))
condition is a boolean mask that selects the columns you want. You can apply it along the column axis, while keeping every row, with a simple slice:
selection = x_y_z[:, condition]
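As a quick check, here is a minimal sketch using the four example columns from the question, laid out as a (3, 4) array so that each column is one (x, y, z) triple. That layout is an assumption based on the stated shape (3, 50000).

```python
import numpy as np

# Each COLUMN is one of the question's example triples:
# [1,2,3], [2,0.5,0], [9,-2,-0.2], [0,0,0.5]
x_y_z = np.array([[1.0, 2.0,  9.0, 0.0],
                  [2.0, 0.5, -2.0, 0.0],
                  [3.0, 0.0, -0.2, 0.5]])

# Element-wise test on the third row (the z values).
condition = (x_y_z[2, :] > -0.4) & (x_y_z[2, :] < 0.1)

# Keep all rows, but only the columns where the mask is True.
selection = x_y_z[:, condition]

print(condition)
print(selection.shape)
print(selection)
```

The result stays two-dimensional, with shape (3, k), where k is the number of columns that passed the test.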
I have three DataFrames that are all the same shape ~(1,000, 10,000).
original has ~20-100 non-zero values per row - very sparse
input is a copy of original, with 10 random non-zero values per row changed to zero
output is populated completely with non-zero values
I am now attempting to compare original and output only in the positions where input and output are different (i.e. just in the 10 randomly chosen positions)
Firstly, I create a df of only these elements of original with everything else set to zero:
maskedOriginal = original.where(original != input, other=0)
This is created in seconds. I then attempt to do the same for output:
maskedOutput = output.where(original != input, other=0)
However, since this is now working with 3 DataFrames, it is far too slow - I haven't even got a result after a couple of minutes. Is there any more suitable way to do this?
Use numpy.where with the DataFrame constructor:
arr = original.values
maskedOriginal = pd.DataFrame(np.where(arr != input, arr, 0),
                              index=original.index,
                              columns=original.columns)
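The same idea can be extended to output by computing the mask once on the underlying NumPy arrays and reusing it. The following is only a sketch on tiny made-up frames; the names inp (used instead of input to avoid shadowing the built-in), changed, masked_original and masked_output are hypothetical, and it assumes all three frames share the same shape and ordering.

```python
import numpy as np
import pandas as pd

# Tiny made-up stand-ins for the three large DataFrames.
original = pd.DataFrame([[1, 0, 3], [0, 5, 6]])
inp = original.copy()
inp.iloc[0, 2] = 0   # simulate a value that was zeroed out
inp.iloc[1, 1] = 0
output = pd.DataFrame([[7, 8, 9], [10, 11, 12]])

# Compute the boolean mask once, on raw arrays (no index alignment).
changed = original.to_numpy() != inp.to_numpy()

def masked(df, mask):
    """Keep df's values where mask is True, zero elsewhere."""
    return pd.DataFrame(np.where(mask, df.to_numpy(), 0),
                        index=df.index, columns=df.columns)

masked_original = masked(original, changed)
masked_output = masked(output, changed)

print(masked_original)
print(masked_output)
```

Working on the arrays sidesteps the label alignment that pandas performs when several DataFrames meet in one expression, which may be part of the slowdown described in the question.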
I want to set a column of a NumPy array to zero at different times. In other words, I have a NumPy array M of size 5000x500. When I run the shape command the result is (5000,500), so I think 5000 is the number of rows and 500 the number of columns:
shape(M)
(5000,500)
But the problem appears when I want to access one column, such as the first column:
Mcol=M[:][0]
Then I check the shape again with the new array Mcol:
shape(Mcol)
(500,)
I expected the result to be (5000,), since the first column has 5000 rows. Even when I swapped the order of the indexing operations, the result was the same:
shape(M)
(5000,500)
Mcol=M[0][:]
shape(Mcol)
(500,)
Could someone please explain what happens in my code, and whether the following operation is the right way to set one column to zero?
M[:][0]=0
You're doing this:
M[:][0] = 0
But you should be doing this:
M[:,0] = 0
The first one is wrong because M[:] just gives you the entire array, like M. Then [0] gives you the first row.
Similarly, M[0][:] gives you the first row as well, because again [:] has no effect.
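To see the difference concretely, here is a small sketch on a 4x3 array, a stand-in for the 5000x500 one, chosen only to keep the output readable:

```python
import numpy as np

# Small stand-in for the 5000 x 500 array: 4 rows, 3 columns.
M = np.arange(12).reshape(4, 3)

first_row_a = M[:][0]   # M[:] is the whole array, so this is row 0
first_row_b = M[0][:]   # row 0, then "all of it": still row 0
first_col = M[:, 0]     # one index expression: all rows, column 0

print(first_row_a.shape)  # length equals the number of columns
print(first_col.shape)    # length equals the number of rows

# Zero the first column in place.
M[:, 0] = 0
print(M)
```

The comma form indexes both axes in a single step, whereas chained brackets apply each index to the result of the previous one.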
I am using Numeric Python. Unfortunately, NumPy is not an option. If I have multiple arrays, such as:
a=Numeric.array(([1,2,3],[4,5,6],[7,8,9]))
b=Numeric.array(([9,8,7],[6,5,4],[3,2,1]))
c=Numeric.array(([5,9,1],[5,4,7],[5,2,3]))
How do I return an array that represents the element-wise median of arrays a,b and c?...such as,
array(([5,8,3],[5,5,6],[5,2,3]))
And then looking at a more general situation: Given n number of arrays, how do I find the percentiles of each element? For example, return an array that represents the 30th percentile of 10 arrays. Thank you very much for your help!
Combine your stack of 2-D arrays into one 3-D array, d = Numeric.array([a, b, c]), and then sort along the stacking axis (the first axis of d). Afterwards, the successive 2-D planes will be in rank order, so you can extract planes for the low, high, quartiles, percentiles, or median.
Well, I'm not versed in Numeric, but I'll just start with a naive solution and see if we can make it any better.
To get the 30th percentile of list foo, let x=0.3, sort the list, and pick the element at foo[int(len(foo)*x)].
For your data, you want to put it in a matrix, transpose it, sort each row, and get the median of each row.
A matrix in Numeric (just like numpy) is an array with two dimensions.
I think that bar = Numeric.array([a, b, c]) would make the array you want, and then you could get the nth column with bar[:,n], if Numeric has the same slicing techniques as NumPy.
foo = sorted(bar[:,n])
foo[int(len(foo)*x)]
I hope that helps you.
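Since Numeric itself may not be available to test against, here is a sketch of the same naive rule in plain Python on nested lists. The function name elementwise_percentile is hypothetical, and the index is clamped so that x = 1.0 does not run off the end of the list. No interpolation is attempted; it simply picks the element at int(len(values) * x).

```python
def elementwise_percentile(arrays, x):
    """For each (i, j), sort the values found at that position across
    all input arrays and pick the one at index int(len * x)."""
    n_rows = len(arrays[0])
    n_cols = len(arrays[0][0])
    result = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            values = sorted(arr[i][j] for arr in arrays)
            # Clamp so that x == 1.0 maps to the last element.
            index = min(int(len(values) * x), len(values) - 1)
            row.append(values[index])
        result.append(row)
    return result

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
c = [[5, 9, 1], [5, 4, 7], [5, 2, 3]]

median = elementwise_percentile([a, b, c], 0.5)
p30 = elementwise_percentile([a, b, c], 0.3)
print(median)
print(p30)
```

With only three inputs, x = 0.3 lands on index 0, so the 30th percentile under this rule is just the element-wise minimum; with 10 arrays it would pick the fourth-smallest value at each position.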
Putting Raymond Hettinger's description into Python:
a=Numeric.array(([1,2,3],[4,5,6],[7,8,9]))
b=Numeric.array(([9,8,7],[6,5,4],[3,2,1]))
c=Numeric.array(([5,9,1],[5,4,7],[5,2,3]))
d = Numeric.array([a, b, c])
d.sort(axis=0)
Since there are n=3 input matrices, the median is the middle plane, the one at index 1:
print d[n//2]
[[5 8 3]
[5 5 6]
[5 2 3]]
And if you had 4 input matrices, you would have to take the element-wise mean of d[1] and d[2].
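For readers who can run NumPy, the odd and even cases can be sketched together as below. This uses NumPy purely as a stand-in, since the question rules it out and Numeric's exact sorting API is not confirmed here; the function name elementwise_median is hypothetical.

```python
import numpy as np

def elementwise_median(arrays):
    """Stack the 2-D arrays, sort along the stacking axis, and take
    the middle plane (or the mean of the two middle planes)."""
    d = np.sort(np.array(arrays), axis=0)
    n = d.shape[0]
    if n % 2 == 1:
        return d[n // 2]
    # Even count: average the two central planes element-wise.
    return (d[n // 2 - 1] + d[n // 2]) / 2.0

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])
c = np.array([[5, 9, 1], [5, 4, 7], [5, 2, 3]])

odd_case = elementwise_median([a, b, c])
even_case = elementwise_median([a, b, c, a])
print(odd_case)
print(even_case)
```

In NumPy the same result is available directly as np.median(np.array(arrays), axis=0), which is a convenient cross-check for the hand-rolled version.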