ValueError shape mismatch

ValueError shape mismatch - python

I am learning pandas and numpy on python. I was trying to apply conditional statements to my DataFrame and I encountered a ValueError due to shape mismatch. Please kindly help me to understand why, thank you!
Here is a look of my simple dataset:
I was trying to filter the DataFrame if the following conditions are met:
area > 8 and area < 10
Here is the result that I have received:
The results are fine if I print the condition out individually and I couldn't understand why can't the matrix converge to form a single DataFrame.

The problem is here: brics[brics['area'] > 8] and brics[brics['area'] < 10].
The inner expression in both cases produces a 5-element vector. Both of them have the same shape. The first has 4 trues and 1 false, the second has 3 trues and 2 falses. But when you do brics[xxx], that selects a subset. brics[xxx] where xxx has 4 trues produces a (4,4) matrix. brics[xxx] where xxx has 3 trues produces a (3,3) matrix. You can't combine those.
The KEY is that you want to combine these BEFORE you use them as indexes:
x = brics[ np.logical_and( brics['area'] > 8, brics['area'] < 10 ) ]
And by the way, you made this much harder for us than it should have been because you posted an image instead of code we could cut and paste.

Related

How to include two conditions using np.where, when array two columns?

I've seen questions adjacent to this answered a number of times, but I'm really, really new to Python, and can't seem to get those answers to work for me...
I'm trying to access every row in an np array, where both columns have values greater than 1.
So, if x is my original array, and x has 500 rows and 2 columns, I want to know which rows, of those 500, contain 2 values > 1.
I've tried a bunch of solutions, but the following two seem the closest:
Test1 = x[(x[:,0:1] > 1) & (x[:,1:2] > 1)]
# Where the first condition should look for values greater than 1 in the first column, and the second condition should look for values greater than 1 in the second column.
Test2 = np.where(x[:,0:1] > 1 & x[:,1:2] > 1)
Any help would be greatly appreciated! Thanks so much!

Extracting specific columns in numpy array by condition

I have a homework assignment to extract a 2-dimensional numpy array out of another 2-dimensional np array by choosing specific columns by condition (not by range).
So I have an array A with shape (3, 50000). I am trying to get a new array with shape (3, x) for some x < 50000 with the original columns ofAthat satisfy the third cell in the column is-0.4 < z < 0.1`.
For example if:
A = [[1,2,3],[2,0.5,0],[9,-2,-0.2],[0,0,0.5]]
I wish to have back:
B = [[2,0.5,0],[9,-2,-0.2]
I have tried to make a bool 1 rank array that holds true on the columns I want, and to some how combine between the two. The problem it's output is 1 rank array which is not what I am looking for. And I got some ValueErrors..
bool_idx = (-0.4 < x_y_z[2] < 0.1)
This code made some troubles:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I can do it with some loops but NumPy got so many beautiful function I am sure I am missing something here..

In Python, the expression -0.4 < x_y_z[2] < 0.1 is roughly equivalent to -0.4 < x_y_z[2] and x_y_z[2] < 0.1. The and operator decides the truth value of each part of the expression by converting it into a bool. Unlike Python lists and tuples, numpy arrays do not support the conversion.
The correct way to specify the condition is with bitwise & (which is unambiguous and non-short-circuiting), rather than the implicit and (which short circuits and is ambiguous in this case):
condition = ((x_y_z[2, :] > - 0.4) & (x_y_z[2, :] < 0.1))
condition is a boolean mask that selects the columns you want. You can select the rows with a simple slice:
selection = x_y_z[:, condition]

df.where based on the difference of two other DataFrames

I have three DataFrames that are all the same shape ~(1,000, 10,000).
original has ~20-100 non-zero values per row - very sparse
input is a copy of original, with 10 random non-zero values per row changed to zero
output is populated completely with non-zero values
I am now attempting to compare original and output only in the positions where input and output are different (i.e. just in the 10 randomly chosen positions)
Firstly, I create a df of only these elements of original with everything else set to zero:
maskedOriginal = original.where(original != input, other=0)
This is created in seconds. I then attempt to do the same for output:
maskedOutput = output.where(original != input, other=0)
However, since this is now working with 3 DataFrames, it is far too slow - I haven't even got a result after a couple of minutes. Is there any more suitable way to do this?

Use numpy.where with DataFrame contructor:
arr = original.values
maskedOriginal = pd.DataFrame(np.where(arr != input, arr, 0),
index=original.index,
columns=original.columns)

Set a column in numpy array to zero

I want to set a column in numpy array to zero at different times, in other words, I have numpy array M with size 5000x500. When I enter shape command the result is (5000,500), I think 5000 are rows and 500 are columns
shape(M)
(5000,500)
But the problem when I want to access one column like first column
Mcol=M[:][0]
Then I check by shape again with new matrix Mcol
shape(Mcol)
(500,)
I expected the results will be (5000,) as the first has 5000 rows. Even when changed the operation the result was the same
shape(M)
(5000,500)
Mcol=M[0][:]
shape(Mcol)
(500,)
Any help please in explaining what happens in my code and if the following operation is right to set one column to zero
M[:][0]=0

You're doing this:
M[:][0] = 0
But you should be doing this:
M[:,0] = 0
The first one is wrong because M[:] just gives you the entire array, like M. Then [0] gives you the first row.
Similarly, M[0][:] gives you the first row as well, because again [:] has no effect.

Element-wise median and percentiles of arrays with Numeric Python

I am using Numeric Python. Unfortunately, NumPy is not an option. If I have multiple arrays, such as:
a=Numeric.array(([1,2,3],[4,5,6],[7,8,9]))
b=Numeric.array(([9,8,7],[6,5,4],[3,2,1]))
c=Numeric.array(([5,9,1],[5,4,7],[5,2,3]))
How do I return an array that represents the element-wise median of arrays a,b and c?...such as,
array(([5,8,3],[5,5,6],[5,2,3]))
And then looking at a more general situation: Given n number of arrays, how do I find the percentiles of each element? For example, return an array that represents the 30th percentile of 10 arrays. Thank you very much for your help!

Combine your stack of 2-D arrays into one 3-D array, d = Numeric.array([a, b, c]) and then sort on the third dimension. Afterwards, the successive 2-D planes will be rank order so you can extract planes for the low, high, quartiles, percentiles, or median.

Well, I'm not versed in Numeric, but I'll just start with a naive solution and see if we can make it any better.
To get the 30th percentile of list foo let x=0.3, sort the list, and pick the the element at foo[int(len(foo)*x)]
For your data, you want to put it in a matrix, transpose it, sort each row, and get the median of each row.
A matrix in Numeric (just like numpy) is an array with two dimensions.
I think that bar = Numeric.array(a,b,c) would make Array you want, and then you could get the nth column with 'bar[:,n]' if Numeric has the same slicing techniques as Numpy.
foo = sorted(bar[:,n])
foo[int(len(foo)*x)]
I hope that helps you.

Putting Raymond Hettinger's description into python:
a=Numeric.array(([1,2,3],[4,5,6],[7,8,9]))
b=Numeric.array(([9,8,7],[6,5,4],[3,2,1]))
c=Numeric.array(([5,9,1],[5,4,7],[5,2,3]))
d = Numeric.array([a, b, c])
d.sort(axis=0)
Since there are n=3 input matrii so the median would be that of the middle one, the one indexed by one,
print d[n//2]
[[5 8 3]
[5 5 6]
[5 2 3]]
And if you had 4 input matrii, you would have to get the mean-elements of d[1] and d[2].

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

ValueError shape mismatch - python

Related

How to include two conditions using np.where, when array two columns?

Extracting specific columns in numpy array by condition

df.where based on the difference of two other DataFrames

Set a column in numpy array to zero

Element-wise median and percentiles of arrays with Numeric Python

Categories

Resources