What is the difference between numpy.array([]) and numpy.array([[]])? - python

Why can't I get the transpose of alpha, but I can get it for beta? What do the additional [] do?
alpha = np.array([1,2,3,4])
alpha.shape
alpha.T.shape
beta = np.array([[1,2,3,4]])
beta.shape
beta.T.shape

From the documentation (link):
Transposing a 1-D array returns an unchanged view of the original array.
The array [1,2,3,4] is 1-D while the array [[1,2,3,4]] is a 1x4 2-D array.

The second pair of brackets indicates that it is a 2D array, so for such an array the transpose differs from the original (the transpose swaps the two dimensions). If the array is only 1D, however, the transpose doesn't change anything and the resulting array is equal to the starting one.

alpha is a 1D array, the transpose is itself.
beta is a 2D array, so transposing turns shape (1, n) into (n, 1).
To get the same result from alpha, you need to add a dimension rather than transpose it:
alpha[:, None]

alpha is a 1D array with shape (4,). The transpose is just alpha again, i.e. alpha == alpha.T.
beta is a 2D array with shape (1,4). It's a single row, but it has two dimensions. Its transpose looks like a single column with shape (4,1).
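The shapes described in these answers can be checked directly. This is a minimal sketch that reuses the alpha and beta arrays from the question; the name column is only illustrative.

```python
import numpy as np

alpha = np.array([1, 2, 3, 4])     # 1-D array
beta = np.array([[1, 2, 3, 4]])    # 2-D array with a single row

print(alpha.shape, alpha.T.shape)  # (4,) (4,)
print(beta.shape, beta.T.shape)    # (1, 4) (4, 1)

# Adding an axis turns the 1-D array into a column, which .T alone cannot do.
column = alpha[:, None]
print(column.shape)                # (4, 1)
```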

When I arrived at the programming language world, having come from the "math side of the business" this also seemed strange to me. After giving some thought to it I realized that from a programming perspective they are different. Have a look at the following list:
a = [1,2,3,4,5]
This is a 1D structure, because to get back the values 1, 2, 3, 4 and 5 you only need to supply one index. For instance, a[2] returns 3.
Now take a look at this list:
b = [[ 1,  2,  3,  4,  5],
     [11, 22, 33, 44, 55]]
To get back the 11, for instance, you need two positional numbers: 1 because 11 is located in the 2nd list, and 0 because it is in the first position of that list. In other words, b[1][0] gives you back 11 (with a NumPy array you can also write b[1, 0]).
Now comes the tricky part. Look at this third list:
c = [ [ 100, 200, 300, 400, 500] ]
If you look carefully, each number requires 2 positional numbers to be taken back from the list. 300, for instance, requires 0 because it is located in the first (and only) list, and 2 because it is the third element of that list. c[0][2] gets you back 300 (or c[0, 2] for a NumPy array).
This structure can be transposed, once it is wrapped in np.array (plain Python lists have no .T), because it has two dimensions and transposition swaps the positional arguments. So c.T gives you back an array of shape (5, 1), since c has shape (1, 5).
Go back to list a. There you have only one positional number. Its shape is just (5,), so there's no second positional argument for the transposition to work with. Therefore the shape remains (5,), and if you try a.T on the corresponding NumPy array you get a back.
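As a quick check of the reasoning above, here is a small sketch that assumes the lists a and c are wrapped in np.array so that .ndim, .shape and .T are available:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])                # one index per element
c = np.array([[100, 200, 300, 400, 500]])    # two indices per element

print(a[2])            # 3
print(c[0, 2])         # 300
print(a.ndim, c.ndim)  # 1 2
print(c.T.shape)       # (5, 1)
print(a.T.shape)       # (5,)
```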
Got it?
Best regards,
Gustavo,

Related

I wrote a 3D matrix, but why is it shown as 2D?

I wrote a 3-dimensional matrix and used .ndim to get its number of dimensions,
but it reports that it is 2D.
third_matrix = np.array([[23,45,56,78],[98,76,54,43],[80,79,57,35]])
print("third matrix dimension = ",third_matrix.ndim)
The output is:
third matrix dimension = 2
You have a list of lists, so it is a 2D matrix. To make it 3D, put each number in its own list,
i.e.
[ [[23],[45],[56],[78]], [[98],[76],[54],[43]], [[80],[79],[57],[35]] ]
Also keep in mind that numpy.array() takes a single array-like object as its data, not several; maybe that is where the confusion comes from.
import numpy

# Python identifiers cannot start with a digit, hence list_2d rather than 2_D_list.
list_2d = [[23,45,56,78],
           [98,76,54,43],
           [80,79,57,35]]
numpy.array(list_2d)
The output is exactly the same.
It's a 2D list. There are 3 elements in the parent list, and each of those has 4 ints.
These are the dimensions:
one_dimension = [4, 9, 4, 5]
two_dimension = [[23,45,56,78],[98,76,54,43],[80,79,57,35]]
three_dimension = [[[23],[45],[56],[78]], [[98],[76],[54],[43]], [[80],[79],[57],[35]]]
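The nesting depth can be confirmed with .ndim and .shape. The sketch below reuses the numbers from the question; the names nested and expanded are illustrative, and the trailing-axis indexing is offered as one alternative to retyping the nested lists.

```python
import numpy as np

third_matrix = np.array([[23, 45, 56, 78], [98, 76, 54, 43], [80, 79, 57, 35]])
print(third_matrix.ndim, third_matrix.shape)  # 2 (3, 4)

# Wrapping each number in its own list adds a third level of nesting.
nested = np.array([[[23], [45], [56], [78]],
                   [[98], [76], [54], [43]],
                   [[80], [79], [57], [35]]])
print(nested.ndim, nested.shape)              # 3 (3, 4, 1)

# The same 3-D array, obtained by appending a new axis to the 2-D one.
expanded = third_matrix[:, :, None]
print(expanded.shape)                         # (3, 4, 1)
```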

Removing indices from rows in 3D array

I have a 3D array with the shape (9, 100, 7200). I want to remove the 2nd half of the 7200 values in every row so the new shape will be (9, 100, 3600).
What can I do to slice the array or delete the 2nd half of the indices? I was thinking np.delete(arr, [3601:7200], axis=2), but I get an invalid syntax error when using the colon.
Why not just slicing?
arr = arr[:,:,:3600]
The syntax error occurs because [3601:7200] is not valid Python: the colon slice syntax is only allowed inside an indexing expression such as arr[...]. I assume you are trying to create a sequence of indices to pass as the obj parameter of the delete function. You could do it this way using something like the range function:
np.delete(arr, range(3600,7200), axis=2)
Keep in mind that this will not modify arr; it returns a new array with the elements deleted. Also, notice I have used 3600, not 3601: indices are zero-based, so the second half starts at index 3600.
However, it's often better practice to use slicing in a problem like this:
arr[:,:,:3600]
This gives your required shape. Let me break this down a little. We are slicing a numpy array with 3 dimensions. A bare colon means we take everything in that dimension, and :3600 means we take the first 3600 elements in that dimension. A better way to think about deleting the last half is to think of it as keeping the first half.
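To compare the two approaches, here is a sketch on a much smaller array (shape (2, 3, 8) instead of (9, 100, 7200), chosen only to keep the example quick). It suggests that slicing and np.delete give the same values, with slicing returning a view and np.delete a copy.

```python
import numpy as np

arr = np.arange(2 * 3 * 8).reshape(2, 3, 8)  # small stand-in for the real array
half = arr.shape[2] // 2

sliced = arr[:, :, :half]                                    # keep the first half
deleted = np.delete(arr, range(half, arr.shape[2]), axis=2)  # drop the second half

print(sliced.shape, deleted.shape)       # (2, 3, 4) (2, 3, 4)
print(np.array_equal(sliced, deleted))   # True
print(np.shares_memory(sliced, arr))     # True: slicing returns a view
print(np.shares_memory(deleted, arr))    # False: np.delete returns a copy
```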

Numpy/Pytorch generate mask based on varying index values

I've been trying to do the following as a batch operation in numpy or torch (no looping). Is this possible?
Suppose I have:
indices: [[3],[2]] (2x1)
output: [[0,0,0,0,1], [0,0,0,1,1]] (2xfixed_num) where fixed_num is 5 here
Essentially, for each row I want the positions up to and including that index value to be 0 and the rest to be 1.
Ok, so I actually assume this is some sort of HW assignment - but maybe it's not, either way it was fun to do, here's a solution for your specific example, maybe you can generalize it to any shape array:
import numpy as np

def fill_ones(arr, idxs):
    x = np.where(np.arange(arr.shape[1]) <= idxs[0], 0, 1) # This is the important logic.
    y = np.where(np.arange(arr.shape[1]) <= idxs[1], 0, 1)
    return np.array([x, y])
So where the comment is located, we use a condition to assign 0 to all positions up to and including some index value, and 1 to the positions after it. This actually creates a new array rather than a mask that we could apply to the original array, so maybe it's "dirtier".
Also, I suspect it's possible to generalize to arrays with more than 2 dimensions, but the solution I'm imagining now uses a for-loop. Hope this helps!
Note: arr is just a numpy array of whatever shape you want the output to be, and idxs is a tuple of the indices after which the array elements should become 1's. Hope that is clear.
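For the loop-free batch version the question asks about, one possible sketch relies on broadcasting a row of positions against the column of indices. The names indices and fixed_num come from the question; mask is illustrative. The same comparison should carry over to PyTorch with torch.arange, though that variant is not shown here.

```python
import numpy as np

indices = np.array([[3], [2]])   # shape (2, 1), as in the question
fixed_num = 5

# np.arange(fixed_num) has shape (5,); comparing it with the (2, 1) column
# broadcasts to shape (2, 5). Positions past each row's index become 1.
mask = (np.arange(fixed_num) > indices).astype(int)

print(mask)
# [[0 0 0 0 1]
#  [0 0 0 1 1]]
```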

Applying a probabilistic function over specific element of an array individually with Numpy/Python

I'm fairly new to Python/Numpy. What I have here is a standard array and I have a function which I have vectorized appropriately.
import numpy as np

def f(i):
    return np.random.choice(2,1,p=[0.7,0.3])*9
f = np.vectorize(f)
Defining an example array:
array = np.array([[1,1,0],[0,1,0],[0,0,1]])
With the vectorized function, f, I would like to evaluate f on each cell on the array with a value of 0.
I am trying to leave for loops as a last resort. My arrays will eventually be larger than 100 by 100, so running each cell individually to look and evaluate f might take too long.
I have tried:
print(f(array[array==0]))
Unfortunately, this gives me a row array consisting of 5 elements (the zeroes in my original array).
Alternatively I have tried,
array[array==0] = f(1)
But as expected, this evaluates f only once, so every zero element of array is replaced by the same value: either all 0's or all 9's.
What I'm looking for is somehow to give me my original array with the zero elements replaced individually. Ideally, 30% of my original zero elements will become 9 and the array structure is conserved.
Thanks
The reason your first try doesn't work is that the vectorized function (let's call it f_v to distinguish it from the original f) operates on exactly 5 elements: the 5 values returned by the boolean indexing operation array[array==0]. It returns 5 new values; it doesn't write them back into those 5 positions. Your analysis of why the 2nd form fails is spot-on.
If you wanted to solve it you could combine your second approach with adding the size option to np.random.choice:
array = np.array([[1,1,0],[0,1,0],[0,0,1]])
mask = array==0
array[mask] = np.random.choice([18,9], size=mask.sum(), p=[0.7, 0.3])
# example output:
# array([[ 1,  1,  9],
#        [18,  1,  9],
#        [ 9, 18,  1]])
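There was no need for np.vectorize: the size option takes care of that already. Note that the example draws from [18, 9], presumably so that both outcomes are visible in the output; to reproduce the original f, which returns 0 or 9, pass [0, 9] instead.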
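Below is a sketch of the same idea using NumPy's Generator API, drawing from [0, 9] to match the original f. The seed value and the name original are assumptions added only to make the example repeatable and checkable; the result is random, so no specific output is shown.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for repeatability

array = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])
original = array.copy()
mask = array == 0

# One independent draw per zero cell: 0 with probability 0.7, 9 with 0.3.
array[mask] = rng.choice([0, 9], size=mask.sum(), p=[0.7, 0.3])

print(array)   # same 3x3 structure; only the former zeros may have changed
```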

Numpy array broadcasting rules

I'm having some trouble understanding the rules for array broadcasting in Numpy.
Obviously, if you perform element-wise multiplication on two arrays of the same dimensions and shape, everything is fine. Also, if you multiply a multi-dimensional array by a scalar it works. This I understand.
But if you have two N-dimensional arrays of different shapes, it's unclear to me exactly what the broadcasting rules are. This documentation/tutorial explains that: In order to broadcast, the size of the trailing axes for both arrays in an operation must either be the same size or one of them must be one.
Okay, so I assume by trailing axis they are referring to the N in an M x N array. So, that means if I attempt to multiply two 2D arrays (matrices) with an equal number of columns, it should work? Except it doesn't...
>>> from numpy import *
>>> A = array([[1,2],[3,4]])
>>> B = array([[2,3],[4,6],[6,9],[8,12]])
>>> print(A)
[[1 2]
 [3 4]]
>>> print(B)
[[ 2  3]
 [ 4  6]
 [ 6  9]
 [ 8 12]]
>>>
>>> A * B
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape
Since both A and B have two columns, I would have thought this would work. So, I'm probably misunderstanding something here about the term "trailing axis", and how it applies to N-dimensional arrays.
Can someone explain why my example doesn't work, and what is meant by "trailing axis"?
Well, the meaning of trailing axes is explained on the linked documentation page.
If you have two arrays with a different number of dimensions, say one 1x2x3 and the other 2x3, then you compare only the trailing common dimensions, in this case 2x3. But if both your arrays are two-dimensional, then each pair of corresponding sizes has to be either equal, or one of the two has to be 1. Dimensions along which the array has size 1 are called singular, and the array can be broadcast along them.
In your case you have a 2x2 and a 4x2 array; 4 != 2 and neither 4 nor 2 equals 1, so this doesn't work.
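A short sketch of this, reusing A and B from the question: the (2, 2) and (4, 2) shapes fail, while a single row of shape (1, 2) broadcasts against B. The names failed, row and product are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])                    # shape (2, 2)
B = np.array([[2, 3], [4, 6], [6, 9], [8, 12]])   # shape (4, 2)

try:
    A * B                # leading sizes 2 and 4 differ and neither is 1
    failed = False
except ValueError:
    failed = True
print(failed)            # True

row = A[:1]              # shape (1, 2): the size-1 axis can be broadcast
product = row * B
print(product.shape)     # (4, 2)
```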
From http://cs231n.github.io/python-numpy-tutorial/#numpy-broadcasting:
Broadcasting two arrays together follows these rules:
If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
The arrays can be broadcast together if they are compatible in all dimensions.
After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
If this explanation does not make sense, try reading the explanation from the documentation or this explanation.
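The quoted rules can be written out as a small function. This is only a sketch: broadcast_shape is a hypothetical helper written for illustration, not part of NumPy, and it works on shape tuples rather than arrays.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the quoted rules to two shape tuples."""
    ndim = max(len(shape_a), len(shape_b))
    # Prepend 1s to the shorter shape until both have the same length.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for x, y in zip(a, b):
        # Compatible only if the sizes are equal or one of them is 1.
        if x != y and x != 1 and y != 1:
            raise ValueError(f"incompatible sizes {x} and {y}")
        # The size-1 side is stretched to match the other side.
        result.append(y if x == 1 else x)
    return tuple(result)

print(broadcast_shape((1, 2, 3), (2, 3)))  # (1, 2, 3)
print(broadcast_shape((4, 1), (3,)))       # (4, 3)
```

Calling broadcast_shape((2, 2), (4, 2)) raises ValueError, which mirrors the failure in the question.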
We should consider two points about broadcasting. First: what is possible. Second: how much of what is possible is actually done by numpy.
I know it might look a bit confusing, but I will make it clear with an example.
Let's start from the beginning.
Suppose we have two arrays. The first (named A) has three dimensions and the second (named B) has five. numpy tries to match the last/trailing dimensions, so it does not care about the first two dimensions of B. Then numpy compares the trailing dimensions with each other, pair by pair. If and only if each pair is equal or one of the two sizes is 1, numpy says "O.K., you two match". If these conditions are not satisfied, numpy says "sorry... it's not my job!".
But I know you may say the comparison would be better if it could also handle sizes that are divisible (4 and 2, or 9 and 3). You might say the smaller array could be replicated/broadcast a whole number of times (2 or 3 times in our example). And I agree with you. This is the reason I started my discussion with a distinction between what is possible and what numpy is actually capable of.
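If the "divisible" replication described above is what you want, it has to be requested explicitly. One way, sketched here with np.tile (a swapped-in technique that the answer itself does not mention), repeats A from the question twice along the first axis so that it matches B:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])                    # shape (2, 2)
B = np.array([[2, 3], [4, 6], [6, 9], [8, 12]])   # shape (4, 2)

# Stack A twice along the first axis: (2, 2) -> (4, 2).
A_tiled = np.tile(A, (2, 1))
product = A_tiled * B

print(A_tiled.shape)   # (4, 2)
print(product)
# [[ 2  6]
#  [12 24]
#  [ 6 18]
#  [24 48]]
```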
