Getting Around "ValueError: operands could not be broadcast together" - python

The code below yields the following ValueError:
ValueError: operands could not be broadcast together with shapes (8,8) (64,)
It first arose when I expanded the "training" data set from 10 images to 100. The interpreter seems to be telling me that I can't perform coordinate-wise operations on these data points because one of the operands is missing a value. I can't argue with that. Unfortunately, my workarounds haven't exactly worked out.
I attempted to insert an if condition followed by a continue statement (i.e., if this specific coordinate comes up, continue from the top of the loop). The interpreter didn't like this idea and muttered something about the truth value of that statement not being as cut and dried as I thought. It suggested I try a.any() or a.all(). I checked out examples of both, and tried placing the problematic coordinate pair inside the parentheses and in place of the "a". Both approaches got me nowhere.
I'm unaware of any Python functions similar to the ones I would use in C to exclude inputs that don't meet specific criteria. Other answers to similar problems recommend changing the math one uses, but I was told that this is how I am to proceed, so I'm looking at it as an error-handling problem.
Does anyone have any insight concerning how one might handle this issue? Any thoughts would be greatly appreciated!
Here's the code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
digits = datasets.load_digits()
#print the 0th image in the image database as an integer matrix
print(digits.images[0])
#plot the 0th image in the database assigning each pixel an intensity of black
plt.figure()
plt.imshow(digits.images[0], cmap = plt.cm.gray_r, interpolation = 'nearest')
plt.show()
#create training subsets of images and targets(labels)
X_train = digits.images[0:1000]
Y_train = digits.target[0:1000]
#pick a test point from images (345)
X_test = digits.images[345]
#view test data point
plt.figure()
plt.imshow(digits.images[345], cmap = plt.cm.gray_r, interpolation = 'nearest')
plt.show()
#distance
def dist(x, y):
    return np.sqrt(np.sum((x - y)**2))
#expand set of test data
num = len(X_train)
no_errors = 0
distance = np.zeros(num)
for j in range(1697, 1797):
    X_test = digits.data[j]
    for i in range(num):
        distance[i] = dist(X_train[i], X_test)
    min_index = np.argmin(distance)
    if Y_train[min_index] != digits.target[j]:
        no_errors += 1
print(no_errors)

You need to show us where the error occurs, and some of the error stack.
Then you need to identify which arrays are causing the problem, and examine their shape. Actually the error tells us that: one operand is an 8x8 2d array, and the other has the same number of elements but a 1d shape. You may have to trace some variables back to your own code.
Just to illustrate the problem:
In [381]: x = np.ones((8,8),int)
In [384]: y = np.arange(64)
In [385]: x*y
...
ValueError: operands could not be broadcast together with shapes (8,8) (64,)
In [386]: x[:] = y
...
ValueError: could not broadcast input array from shape (64) into shape (8,8)
Since the 2 arrays have the same number of elements, a fix likely involves reshaping one or the other:
In [387]: x.ravel() + y
Out[387]:
array([ 1, 2, 3, 4, 5, ... 64])
or x-y.reshape(8,8).
My basic point is, you need to understand what array shapes mean, and how arrays of different shapes can be used together. You don't 'get around' the error; you fix the inputs so they are 'broadcasting' compatible.
I don't think the problem is with the value of a specific element.
The truth value error occurs when you try to test an array in an if context. if expects a simple True or False, not an array of True/False values.
In [389]: if x>0:print('yes')
....
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
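Applied to the question's code, a minimal sketch of one possible fix (not the only one) is to draw the training points from digits.data, the flattened (n, 64) version of digits.images, so that both operands of x - y are 1d:
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Use the flattened (n, 64) representation for BOTH training and test points,
# so every distance computation compares two 1d arrays of length 64.
X_train = digits.data[0:1000]
Y_train = digits.target[0:1000]

def dist(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

no_errors = 0
for j in range(1697, 1797):
    X_test = digits.data[j]
    distance = np.array([dist(x, X_test) for x in X_train])
    min_index = np.argmin(distance)
    if Y_train[min_index] != digits.target[j]:
        no_errors += 1
print(no_errors)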

Related

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 2, the array at index 0 has size 3

I am getting the error ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 2, the array at index 0 has size 3 and the array at index 1 has size 1 while running the code below.
for i in range(6):
    print('current b', current_batch)
    current_pred = model.predict(current_batch)[0]
    print('current pred', current_pred)
    test_predictions.append(current_pred)
    print('current batch', current_batch)
    print('current batch => ', current_batch[:,1:,:])
    current_batch = np.append(current_batch[:,1:,:], [[current_pred]], axis=1)
Can anyone please explain why this is happening?
Thanks.
Basically, Numpy is telling you that the shapes of the concatenated matrices must align. For example, it is possible to concatenate a 3x4 matrix with a 3x5 matrix along axis 1, giving a 3x9 matrix.
The problem here is that Numpy is telling you that the axes don't align. In my example, that would be like trying to concatenate a 3x4 matrix with a 10x10 matrix: this is not possible because the shapes are not aligned.
This usually means that you are trying to concatenate the wrong things. If you are sure, though, try the np.reshape function, which changes the shape of one of the matrices so that they can be concatenated.
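To make that concrete, here is a small sketch of the rule: along the concatenation axis the sizes may differ, but every other axis must match exactly.
import numpy as np

a = np.ones((3, 4))
b = np.ones((3, 5))

# Joining along axis 1: the other axis (3 rows) matches, so this works.
print(np.concatenate([a, b], axis=1).shape)  # (3, 9)

# This would fail: the row counts (3 vs 10) differ outside the join axis.
# np.concatenate([a, np.ones((10, 10))], axis=1)  # ValueError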
As the traceback shows, np.append is actually using np.concatenate. Did you read (study) the docs for either function? Understand what they say about dimensions?
From the display, [[current_pred]] converted to an array will have shape (1,1,1). Do you understand that?
current_batch[:,1:,:] is, as best I can tell from the small image, (1,5,3).
You are asking to join on axis 1, where the sizes are 1 and 5; that's ok. But it's saying that the last dimension, axis 2, doesn't match: 1 does not equal 3. Do you understand that?
List append, as you do with test_predictions.append(current_pred), works well in an iteration.
np.append does not work well. Even when it works, it is slow. And here it doesn't work, because you aren't taking sufficient care to match dimensions.
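A minimal sketch of one dimension-safe variant, assuming (as above) that current_batch has shape (1, 5, 3) and that the model predicts 3 features per step; predict here is a stand-in for the asker's model.predict:
import numpy as np

current_batch = np.zeros((1, 5, 3))      # (batch, timesteps, features)
predict = lambda batch: np.ones((1, 3))  # stand-in for model.predict

test_predictions = []
for i in range(6):
    current_pred = predict(current_batch)[0]  # shape (3,)
    test_predictions.append(current_pred)     # plain list append
    # Reshape to (1, 1, 3) so axis 2 matches before joining on axis 1.
    new_step = current_pred.reshape(1, 1, -1)
    current_batch = np.concatenate([current_batch[:, 1:, :], new_step], axis=1)

print(np.array(test_predictions).shape)  # (6, 3)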

How to stack matrices in a single column table

I am trying to store 20 automatically generated matrices in a single-column matrix, so this last matrix would be a 1x20 matrix.
For this I am using numpy and vstack, but it doesn't work; I keep getting the following error:
ValueError: all the input arrays must have same number of dimensions
even though all the matrices that I'm trying to stack together have the same dimensions (881 x 882).
So I'd like to know what is wrong with this, or if there is any other way to stack all the matrices so that if one of them is needed I can easily access it.
Try changing the dimensions with the expand_dims and squeeze functions:
y = np.expand_dims(x, axis=0) # shape (20,) becomes (1, 20)
y = np.squeeze(x, axis=0) # shape (1, 20) becomes (20,)
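That said, for the stated goal (20 matrices of shape 881 x 882, each individually retrievable) a sketch using np.stack may be closer to what's wanted, since it adds the new axis for you:
import numpy as np

# 20 automatically generated matrices, each 881 x 882
matrices = [np.random.random((881, 882)) for _ in range(20)]

stacked = np.stack(matrices, axis=0)  # shape (20, 881, 882)
print(stacked.shape)

one = stacked[7]   # easy access to any single matrix
print(one.shape)   # (881, 882)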

Seamlessly solve square linear system that could be 1-dimensional in numpy

I am solving a linear system of equations Ax=b.
It is known that A is square and of full rank, but it is the result of a few matrix multiplications, say A = numpy.dot(C,numpy.dot(D,E)) in which the result can be 1x1 depending on the inputs C,D,E. In that case A is a float.
b is ensured to be a vector, even when it is a 1x1 one.
I am currently doing
A = numpy.dot(C,numpy.dot(D,E))
try:
    x = numpy.linalg.solve(A, b)
except:
    x = b[0] / A
I searched numpy's documentation and didn't find other alternatives for solve and dot that would accept scalars for the first or output arrays for the second. Actually numpy.linalg.solve requires dimension at least 2. If we were going to produce an A = numpy.array([5]) it would complain too.
Is there some alternative that I missed?
in which the result can be 1x1 depending on the inputs C,D,E. In that case A is a float.
This is not true; it is a 1x1 matrix, as expected:
x=np.array([[1,2]])
z=x.dot(x.T) # 1x2 matrix times 2x1
print(z.shape) # (1, 1)
which works just fine with linalg.solve
linalg.solve(z, z) # returns [[1]], as expected
While you could expand the dimensions of A:
A = numpy.atleast_2d(A)
it sounds like A never should have been a float in the first place, and you should instead fix whatever is causing it to be one.
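For completeness, a minimal sketch of the expand-the-dimensions route, assuming A may come out as a scalar and b as a length-1 vector:
import numpy as np

A = np.float64(5.0)         # degenerate case: A collapsed to a scalar
b = np.array([10.0])

A2 = np.atleast_2d(A)       # shape (1, 1)
x = np.linalg.solve(A2, b)  # now works for the 1x1 case too
print(x)                    # [2.]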

boolean indexing in xarray

I have some arrays with dims 'time', 'lat', 'lon' and some with just 'lat', 'lon'. I often have to do this in order to mask time-dependent data with a 2d (lat-lon) mask:
x.data[:, mask.data] = np.nan
Of course, computations broadcast as expected. If y is 2d lat-lon data, its values are broadcast to all time coordinates in x:
z = x + y
But indexing doesn't broadcast as I'd expect. I'd like to be able to do this, but it raises ValueError: Buffer has wrong number of dimensions:
x[mask] = np.nan
Lastly, it seems that xr.where does broadcast the values of the mask across time coordinates as expected, but you can't set values this way.
x_masked = x.where(mask)
So, is there something I'm missing here that facilitates setting values using a boolean mask that is missing dimensions (and needs to be broadcast)? Or is the option I provided at the top really the way to do this (in which case, I might as well just be using standard numpy arrays...)?
Edit: this question is still getting upvotes, but it's now much easier - see this answer
Somewhat related question here: Concise way to filter data in xarray
Currently the best approach is a combination of .where and .fillna.
valid = date_by_items.notnull()
positive = date_by_items > 0
positive = positive * 2
result = positive.fillna(0.).where(valid)
result
But changes are coming in xarray that will make this more concise - check out the GitHub repo if you're interested.
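As a self-contained illustration of the .where pattern with a 2d mask broadcast over a 3d array (the names here are invented for the example):
import numpy as np
import xarray as xr

x = xr.DataArray(np.random.random((4, 3, 5)), dims=('time', 'lat', 'lon'))
mask = xr.DataArray(np.random.random((3, 5)) > 0.5, dims=('lat', 'lon'))

# .where broadcasts the 2d mask across 'time': values where the mask is
# False become NaN, which has the same effect as assigning NaN by index.
x_masked = x.where(mask)
print(x_masked.dims, x_masked.shape)  # ('time', 'lat', 'lon') (4, 3, 5)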

numpy einsum with '...'

The code below is meant to conduct a linear coordinate transformation on a set of 3d coordinates. The transformation matrix is A, and the array containing the coordinates is x. The zeroth axis of x runs over the dimensions x, y, z. It can have any arbitrary shape beyond that.
Here's my attempt:
A = np.random.random((3, 3))
x = np.random.random((3, 4, 2))
x_prime = np.einsum('ij,j...->i...', A, x)
The output is:
x_prime = np.einsum('ij,j...->i...', A, x)
ValueError: operand 0 did not have enough dimensions
to match the broadcasting, and couldn't be extended
because einstein sum subscripts were specified at both
the start and end
If I specify the additional subscripts in x explicitly, the error goes away. In other words, the following works:
x_prime = np.einsum('ij,jkl->ikl', A, x)
I'd like x to be able to have any arbitrary number of axes after the zeroth axis, so the workaround I gave above is not optimal. I'm actually not sure why the first einsum example is not working. I'm using numpy 1.6.1. Is this a bug, or am I misunderstanding the documentation?
Yep, it's a bug. It was fixed in this pull request: https://github.com/numpy/numpy/pull/4099
This was only merged a month ago, so it'll be a while before it makes it to a stable release.
EDIT: As @hpaulj mentions in the comments, you can work around this limitation by adding an ellipsis even when all indices are specified:
np.einsum('...ij,j...->i...', A, x)
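A quick sketch to check that the ellipsis workaround matches the fully explicit subscripts for the shapes in the question:
import numpy as np

A = np.random.random((3, 3))
x = np.random.random((3, 4, 2))

explicit = np.einsum('ij,jkl->ikl', A, x)
workaround = np.einsum('...ij,j...->i...', A, x)

print(workaround.shape)                   # (3, 4, 2)
print(np.allclose(explicit, workaround))  # True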
