Unable to combine 2D arrays into another 2D array (Python)

I have two lists, list_a and list_b, whose shapes are [10,50] and [40,50]. I'm trying to combine them into one [50,50] array, starting with the following code (edited for readability):
array_a=np.array(list_a)
array_b=np.array(list_b)
array_c=np.concatenate(array_a,array_b)
But it keeps giving me an error that says
"TypeError: only length-1 arrays can be converted to Python scalars"
What's the issue here, and how can I fix it? This error isn't very helpful...

np.concatenate expects a sequence (e.g. a tuple) of arrays as its first argument, i.e. it should be
array_c=np.concatenate((array_a,array_b))
The first argument is a tuple of arbitrarily many arrays; the second argument (in your call, array_b) tells concatenate which axis to operate along.

The issue here is that np.concatenate expects an iterable sequence of array-like objects as its first argument. As written, your call passes only array_a as that first argument and array_b as the second, which specifies the axis to concatenate along. Since the axis argument needs to be integer-like, numpy attempts to convert array_b to an integer, but fails because it contains more than one item; hence the error message.
To solve it, you need to wrap your two arrays in an iterable such as a tuple, like this:
cc=np.concatenate((array_a,array_b))
This results in both arrays being passed as the first argument to the function. (Edit: Wrapping in a list also works, i.e. concatenate([array_a,array_b]). Haven't tried other forms).
In your example this will work, because the axis argument defaults to 0, which means the arrays may differ in length only along the first dimension (axis 0). For you these lengths are 10 and 40, and the other dimension is 50 for both. If your array dimensions were reversed, so they were [50,10] and [50,40], you would need to set the axis to the second dimension (index 1), like so:
cc=np.concatenate((array_a,array_b),1)
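For reference, here is a minimal runnable sketch of both cases, with zero arrays standing in for your data:

import numpy as np
# Stand-ins for list_a and list_b with the shapes from the question.
array_a = np.zeros((10, 50))
array_b = np.zeros((40, 50))
# Default axis 0: the arrays may differ only along the first dimension.
print(np.concatenate((array_a, array_b)).shape)  # (50, 50)
# If the shapes were (50, 10) and (50, 40), concatenate along axis 1 instead.
print(np.concatenate((np.zeros((50, 10)), np.zeros((50, 40))), axis=1).shape)  # (50, 50)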

Related

Mask certain indices for every entry in a batch, when using torch.max()

I am incrementally sampling a batch of size torch.Size([n, 8]).
I also have a list valid_indices of length n which contains tuples of indices that are valid for each entry in the batch.
For instance, valid_indices[0] may look like this: (0, 1, 3, 4, 5, 7), which means that indices 2 and 6 should be excluded from the first entry in the batch along dim 1.
In particular, I need to exclude these values when I use torch.max(batch, dim=1, keepdim=True).
Indices to be excluded (if any) may differ from entry to entry within the batch.
Any ideas? Thanks in advance.
I assume that you are getting the good old
IndexError: too many indices for tensor of dimension 1
error when you use your tuple indices directly on the tensor.
At least that was the error that I was able to reproduce when I executed the following line
t[0][valid_idx0]
where t is a random tensor of size (10, 8) and valid_idx0 is a tuple with 4 elements.
However, the same line works just fine when you convert the tuple to a list, as follows:
t[0][list(valid_idx0)]
>>> tensor([0.1847, 0.1028, 0.7130, 0.5093])
But when it comes to applying these indices to 2D tensors, things get a bit different, since we need to preserve the structure of our tensor for batch processing.
Therefore, it would be reasonable to convert our indices to mask arrays.
Let's say we have a list of tuples valid_indices at hand. The first step is to convert it to a list of lists.
valid_idx_list = [list(tup) for tup in valid_indices]
The second step is to convert these into mask arrays.
masks = np.zeros(t.size())
for i, indices in enumerate(valid_idx_list):
    masks[i][indices] = 1
Done. Now we can apply our mask (converted back to a tensor) and use torch.max on the masked tensor, along dim 1 as in the question.
torch.max(t * torch.from_numpy(masks).to(t.dtype), dim=1, keepdim=True)
Kindly see the colab notebook that I've used to reproduce the problem.
https://colab.research.google.com/drive/1BhKKgxk3gRwUjM8ilmiqgFvo0sfXMGiK?usp=sharing
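For completeness, here is a self-contained sketch of the approach above. The tensor values and valid_indices are made up, and note that the multiplicative zero-mask assumes the values you care about are non-negative (which holds for torch.rand):

import numpy as np
import torch

t = torch.rand(3, 8)  # stand-in batch of size (n, 8) with n = 3
valid_indices = [(0, 1, 3, 4, 5, 7), (2, 6), (0, 4)]  # hypothetical per-entry valid indices

# Build one 0/1 mask row per batch entry, with 1 at the valid positions.
valid_idx_list = [list(tup) for tup in valid_indices]
masks = np.zeros(t.size())
for i, indices in enumerate(valid_idx_list):
    masks[i][indices] = 1

# Zero out the invalid entries, then take the row-wise max.
masked = t * torch.from_numpy(masks).to(t.dtype)
values, positions = torch.max(masked, dim=1, keepdim=True)
print(values, positions)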

Python: find column size of a 2D list passed in as an argument

I'm trying to write a function to compute the dot product of two 2D lists passed in as arguments; let's call them x and y.
My idea is to first create a 2D list of zeros with the proper dimensions for the result of the dot product. In order to do so, I need to find the column size of y when computing x * y:
dim1 = len(x)
dim2 = len(y[0])
result = [0]*dim1*dim2
The above code was my idea for getting these dimensions; however, it fails on the second line with an error:
dim2 = len(y[0])
TypeError: object of type 'int' has no len()
My Python interpreter doesn't seem to like my assumption that the arguments will be 2D lists; it seems to think y is a 1D list. How can I get the column length of a 2D list? I am assuming the 2D lists passed in will have dimensions NxM, i.e. a clean rectangular list/matrix.
I am not able to use numpy for this case.
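For what it's worth, here is a minimal pure-Python sketch, under the stated assumption that both arguments really are rectangular lists of lists. If len(y[0]) raises the TypeError above, then y is in fact being passed in as a 1D list, and the calling code is what needs fixing:

def dot(x, y):
    # x is n x k, y is k x m; both must be rectangular lists of lists.
    dim1 = len(x)     # rows of the result
    dim2 = len(y[0])  # columns of the result
    # Note: [0]*dim1*dim2 would build one flat list; build a list of rows instead.
    result = [[0] * dim2 for _ in range(dim1)]
    for i in range(dim1):
        for j in range(dim2):
            result[i][j] = sum(x[i][k] * y[k][j] for k in range(len(y)))
    return result

print(dot([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]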

What is the difference between x.shape[0] and x.shape in numpy?

What is the difference between x.shape[0] and x.shape in numpy? When I code without the [0] I get the error "TypeError: arange: scalar arguments expected instead of a tuple.", but when I add the [0], my code runs fine.
And why can't I use x.shape[1] or x.shape[1000]?
Looking forward to receiving answers from everyone, many thanks!!
From your error message:
"TypeError: arange: scalar arguments expected instead of a tuple."
It sounds to me like you are trying to use the shape of an existing array to define the shape of a new array using np.arange.
Your problem is that you don't understand what x.shape is giving you.
For example:
x = np.array([[1,2,3],[4,5,6]])
x.shape
produces (2,3), a tuple. If I try to use just x.shape to define an argument in np.arange like this:
np.arange(x.shape)
I get the following error:
"arange: scalar arguments expected instead of a tuple."
The reason is that np.arange accepts either a single scalar (which creates an array starting at 0 and increasing by 1 up to, but not including, that value) or two or three scalars defining where to start, where to stop, and the step size. You are giving it a tuple instead, which it doesn't like.
So when you do:
np.arange(x.shape[0])
you are giving arange the first scalar in the tuple provided by x.shape, which in my example produces the array [0, 1], because the first element of the tuple is 2.
If I alternatively did
np.arange(x.shape[1])
I would get the array [0, 1, 2], because the second element of the tuple is 3.
If I did any of the following,
np.arange(x.shape[2])
np.arange(x.shape[1000])
np.arange(x.shape[300])
I would get an IndexError, because the tuple created by x.shape has only two elements and so can't be indexed beyond 0 or 1.
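Putting all of that together in one runnable snippet:

import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
print(x.shape)                # (2, 3) -- a tuple, not a scalar
print(np.arange(x.shape[0]))  # [0 1]
print(np.arange(x.shape[1]))  # [0 1 2]
# np.arange(x.shape) raises "TypeError: arange: scalar arguments expected instead of a tuple."
# x.shape[2] raises "IndexError: tuple index out of range"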
Hope that helps!

numpy array concatenation error: 0-d arrays can't be concatenated

I am trying to concatenate two numpy arrays, but I got this error. Could someone give me a clue about what this actually means?
import numpy as np
allValues = np.arange(-1, 1, 0.5)
tmp = np.concatenate(allValues, np.array([30], float))
Then I got
ValueError: 0-d arrays can't be concatenated
If I do
tmp = np.concatenate(allValues, np.array([50], float))
there is no error message, but the tmp variable does not reflect the concatenation either.
You need to put the arrays you want to concatenate into a sequence (usually a tuple or list) in the argument.
tmp = np.concatenate((allValues, np.array([30], float)))
tmp = np.concatenate([allValues, np.array([30], float)])
Check the documentation for np.concatenate. Note that the first argument is a sequence (e.g. list, tuple) of arrays. It does not take them as separate arguments.
As far as I know, this API is shared by all of numpy's concatenation functions: concatenate, hstack, vstack, dstack, and column_stack all take a single main argument that should be some sequence of arrays.
The reason you are getting that particular error is that arrays are sequences as well, so concatenate interprets allValues itself as the sequence of arrays to concatenate. However, each element of allValues is a float rather than an array, so each is treated as a zero-dimensional array, and as the error says, such "arrays" cannot be concatenated.
The second array is taken as the second (optional) argument of concatenate, which is the axis to concatenate on. This only works because there is a single element in the second array, which can be cast to an integer and is therefore a valid axis value. If you had put an array with more elements in the second argument, you would have gotten a different error:
a = np.array([1, 2])
b = np.array([3, 4])
np.concatenate(a, b)
# TypeError: only length-1 arrays can be converted to Python scalars
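To see the original failure and the fix side by side (error wordings may differ slightly across numpy versions):

import numpy as np

allValues = np.arange(-1, 1, 0.5)
# np.concatenate(allValues, np.array([30], float)) fails: allValues is unpacked
# into 0-d float "arrays" and the second array is taken as the axis argument.
tmp = np.concatenate((allValues, np.array([30], float)))  # wrap in a tuple
print(tmp)  # [-1.  -0.5  0.   0.5 30. ]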
Also make sure you are concatenating two numpy arrays. I was concatenating a plain Python list with a numpy array, and it was giving me the same error:
ValueError: 0-d arrays can't be concatenated
It took me some time to figure this out, since all the answers on Stack Overflow assumed you had two numpy arrays.
It's a pretty silly but easily overlooked mistake, so I'm posting it in case it helps someone.
See the numpy documentation on converting an existing Python list with np.asarray, or on creating numpy arrays from scratch, if it helps.
Another way to get this error is to have two numpy objects of different... types?
I get this error when I try np.concatenate([A,B])
and ValueError: all the input arrays must have same number of dimensions when I run np.concatenate([B,A])
Just as @mithunpaul mentioned, my types are off: A is a 44279x204 array and B is a <44279x12 sparse matrix of type '<class 'numpy.float64'>' with 88558 stored elements in Compressed Sparse Row format>.
So that's why the error is happening. Don't know how to solve it yet though.
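If anyone else hits this dense-vs-sparse mix, one possible fix (a sketch assuming scipy is available, with small stand-in matrices) is to keep everything sparse with scipy.sparse.hstack, or to densify the sparse matrix first:

import numpy as np
from scipy import sparse

A = np.random.rand(5, 4)                            # stand-in for the 44279x204 array
B = sparse.random(5, 2, density=0.5, format='csr')  # stand-in for the 44279x12 CSR matrix

C = sparse.hstack([sparse.csr_matrix(A), B])        # result stays sparse
D = np.concatenate([A, B.toarray()], axis=1)        # or densify B instead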

numpy arrays dimension mismatch

I am using numpy and pandas to attempt to concatenate a number of heterogeneous values into a single array.
np.concatenate((tmp, id, freqs))
Here are the exact values:
tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"
The dimensions of tmp, 17232, and freqs are as follows:
[in] tmp.shape
[out] (4,)
[in] np.array(17232).shape
[out] ()
[in] freqs.shape
[out] (1,)
I have also tried casting them all as numpy arrays, to no avail. (Note that the variable freqs will frequently have more than one value.)
However, with both the np.concatenate and np.append functions I get the following error:
*** ValueError: all the input arrays must have same number of dimensions
These all have the same number of columns (0), so why can't I concatenate them with either of the above-described numpy methods?
All I'm looking to obtain is [(tmp), 17232, (freqs)] in one single-dimensional array, which is to be appended onto the end of a pandas dataframe.
Thanks.
Update
It appears I can concatenate the two existing arrays:
np.concatenate([tmp, freqs],axis=0)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 0.022831050228310501], dtype=object)
However, the integer, even when cast to an array, cannot be used in concatenate.
np.concatenate([tmp, np.array(17571)],axis=0)
*** ValueError: all the input arrays must have same number of dimensions
What does work, however, is nesting append and concatenate:
np.concatenate((np.append(tmp, 17571), freqs),)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 17571,
0.022831050228310501], dtype=object)
This is kind of messy, though. Does anyone have a better solution for concatenating a number of heterogeneous arrays?
The problem is that id, and later the integer np.array(17571), are not array_like objects. See here for how numpy decides whether an object can be converted automatically to a numpy array.
The solution is to make id array_like, i.e. an element of a list or tuple, so that numpy understands that id belongs to a 1D array_like structure.
It all boils down to
concatenate((tmp, (id,), freqs))
or
concatenate((tmp, [id], freqs))
To avoid this sort of problem when dealing with input variables in functions using numpy, you can use atleast_1d, as pointed out by @askewchan. See this question/answer about it.
Basically, if you are unsure whether your variable id will be a single str or a list of str in different scenarios, you are better off using
concatenate((tmp, atleast_1d(id), freqs))
because the two options above will fail if id is already a list/tuple of strings.
EDIT: It may not be obvious why np.array(17571) is not an array_like object. This happens because np.array(17571).shape==(), so it is not iterable as it has no dimensions.
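As a quick runnable sketch of the variants above, using the question's values (id shadows a Python builtin; it is kept here only to match the question):

import numpy as np

tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"

print(np.concatenate((tmp, [id], freqs)))               # wrap id in a list
print(np.concatenate((tmp, np.atleast_1d(id), freqs)))  # robust to str or list of str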
