a=np.random.dirichlet(np.ones(3),size=1)
I want three numbers that sum to 1. However, I noticed that a[0] will be:
array([0.24414272, 0.01769199, 0.7381653 ])
a single entry that already contains all three elements.
Is there any way to split them into three separate entries, so that a[0], a[1], and a[2] are the three numbers?
The default behavior if you don't pass size is to return a single dimensional array with the specified elements, per the docstring on the function:
size : int or tuple of ints, optional
size : int or tuple of ints, optional
Output shape. If the given shape is, e.g., (m, n), then m * n * k [where k is the length of the input alpha sequence] samples are drawn. Default is None, in which case a vector of length k is returned.
By passing size=1, you explicitly tell it to make a multidimensional array of size samples (so one sample, making the outer dimension 1), whereas not passing size (or passing size=None) still draws just one set of samples, but as a single 1D array.
Short version: If you just drop the ,size=1 from your call, you'll get what you want.
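For example, a quick check (a minimal sketch of the call without size):
import numpy as np

# Without size, a single draw comes back as a plain 1D array of length 3
a = np.random.dirichlet(np.ones(3))
print(a.shape)      # (3,)
x, y, z = a         # three separate values
print(x + y + z)    # 1.0 (up to floating-point rounding)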
If that's the only thing you want, then this should work:
a=np.random.dirichlet(np.ones(3),size=1)[0]
I am incrementally sampling a batch of size torch.Size([n, 8]).
I also have a list valid_indices of length n which contains tuples of indices that are valid for each entry in the batch.
For instance, valid_indices[0] may look like this: (0, 1, 3, 4, 5, 7), which means that indices 2 and 6 should be excluded from the first entry in the batch along dim 1.
In particular, I need to exclude these values when I use torch.max(batch, dim=1, keepdim=True).
Indices to be excluded (if any) may differ from entry to entry within the batch.
Any ideas? Thanks in advance.
I assume that you are getting the good old
IndexError: too many indices for tensor of dimension 1
error when you use your tuple indices directly on the tensor.
At least that was the error that I was able to reproduce when I execute the following line
t[0][valid_idx0]
Where t is a random tensor with size (10,8) and valid_idx0 is a tuple with 4 elements.
However, the same line works just fine when you convert your tuple to a list, as follows:
t[0][list(valid_idx0)]
>>> tensor([0.1847, 0.1028, 0.7130, 0.5093])
But when it comes to applying these indices to 2D tensors, things get a bit different, since we need to preserve the structure of our tensor for batch processing.
Therefore, it would be reasonable to convert our indices to mask tensors.
Let's say we have a list of tuples valid_indices at hand. First thing will be converting it to a list of lists.
valid_idx_list = [list(tup) for tup in valid_indices]
Second thing will be converting them to mask tensors (using torch.zeros keeps everything as torch tensors, so the multiplication below works directly).
masks = torch.zeros(t.size())
for i, indices in enumerate(valid_idx_list):
    masks[i][indices] = 1
Done. Now we can apply our mask and use torch.max on the masked tensor.
torch.max(t * masks, dim=1, keepdim=True)
Kindly see the colab notebook that I've used to reproduce the problem.
https://colab.research.google.com/drive/1BhKKgxk3gRwUjM8ilmiqgFvo0sfXMGiK?usp=sharing
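One caveat with the 0/1 mask: excluded entries become 0, which can still win the max when all valid values in a row are negative. A minimal alternative sketch (the example valid_indices here are made up) sets the excluded positions to -inf with masked_fill instead:
import torch

n = 4
batch = torch.randn(n, 8)
valid_indices = [(0, 1, 3, 4, 5, 7), (0, 2, 3, 4, 5, 6, 7),
                 tuple(range(8)), (1, 2, 6)]

# Boolean mask: True where an entry should be EXCLUDED
exclude = torch.ones(n, 8, dtype=torch.bool)
for i, idx in enumerate(valid_indices):
    exclude[i, list(idx)] = False

# -inf never wins a max, so excluded entries are effectively ignored
masked = batch.masked_fill(exclude, float('-inf'))
values, argmax = torch.max(masked, dim=1, keepdim=True)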
I'm trying to write a function to compute the dot product of two 2D lists passed in as arguments; let's call them x and y.
My idea is to first create a 2D list of zeros with the proper dimensions for the result of the dot product. In order to do so, I need to find the column size of y when computing x * y:
dim1 = len(x)
dim2 = len(y[0])
result = [0]*dim1*dim2
The above code was my idea for getting these dimensions; however, it fails on the second line with an error:
dim2 = len(y[0])
TypeError: object of type 'int' has no len()
My Python interpreter does not seem to like that I am assuming my arguments will be 2D lists; it seems to think one of them is a 1D list. How can I get the column length of the 2D list? I am assuming the 2D lists passed in will have dimensions NxM, so each should be a clean rectangular list-of-lists (matrix).
I am not able to use numpy for this case.
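Assuming both arguments really are rectangular 2D lists (the TypeError suggests that y is actually arriving as a flat list of ints), a minimal pure-Python sketch of the idea would be:
def dot_product(x, y):
    dim1 = len(x)        # rows of the result
    inner = len(y)       # rows of y, must equal the number of columns of x
    dim2 = len(y[0])     # columns of the result
    # Build a genuine 2D list of zeros; [0]*dim1*dim2 would only give a flat 1D list
    result = [[0] * dim2 for _ in range(dim1)]
    for i in range(dim1):
        for j in range(dim2):
            result[i][j] = sum(x[i][k] * y[k][j] for k in range(inner))
    return result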
I am trying to 'expand' an array (generate a new array with proportionally more elements in all dimensions). I have an array with known numbers (let's call it X) and I want to make it j times bigger in each dimension (j is called ratio in the code below).
So far I generated a new array of zeros with more elements, then I used broadcasting to insert the original numbers in the new array (at fixed intervals).
Finally, I used linspace to fill the gaps, but this part is actually not directly relevant to the question.
The code I used (for a 3-dimensional array) is:
import numpy as np
new_shape = (np.array(X.shape) - 1) * ratio + 1
new_array = np.zeros(shape=new_shape)
new_array[::ratio,::ratio,::ratio] = X
My problem is that this is not general; I would have to modify the assignment line based on ndim. Is there a way to use such broadcasting for any number of dimensions in my array?
Edit: to be more precise, the third line would have to be:
new_array[::ratio,::ratio] = X
if ndim=2
or
new_array[::ratio,::ratio,::ratio,::ratio] = X
if ndim=4
etc. etc. I want to avoid having to write code for each case of ndim
p.s. If there is a better tool to do the entire process (such as an 'inner-padding' routine) that I am not aware of, I will be happy to learn about it.
Thank you
array = array[..., np.newaxis] will add another dimension
This article might help
You can use slice notation -
slicer = tuple(slice(None, None, ratio) for _ in range(X.ndim))
new_array[slicer] = X
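Putting the pieces together, a minimal sketch that works for any number of dimensions (assuming ratio is a positive integer):
import numpy as np

def expand(X, ratio):
    # Place X's values at every ratio-th position along each axis,
    # leaving zeros in between, regardless of X.ndim
    new_shape = (np.array(X.shape) - 1) * ratio + 1
    new_array = np.zeros(new_shape)
    slicer = tuple(slice(None, None, ratio) for _ in range(X.ndim))
    new_array[slicer] = X
    return new_array

X = np.arange(8).reshape(2, 2, 2)
print(expand(X, 3).shape)   # (4, 4, 4)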
Build the slicing tuple manually. ::ratio is equivalent to slice(None, None, ratio):
new_array[(slice(None, None, ratio),)*new_array.ndim] = ...
Pretty self-explanatory. Pillow's getcolors() method returns a list of tuples, each with a (1, 3) shape (i.e. (count, (r, g, b))). Unless there is a better way to handle this, how can I create a numpy array with an [n, [1, 3]] shape?
You should rather use an n x 4 numpy array. The first axis lets you choose between the different results of the getcolors method. The second axis contains your data: store the count value in the first entry, followed by the r, g, and b values. Then you can do something like this:
result = np.empty((number, 4))
#get one entry
count, r, g, b = result[n]
You should always keep in mind what you are actually trying to do: each data point you want to store consists of 4 integers, and you expect n such data points. Therefore, your array has to have the shape n x 4.
PS: You are using an unusual notion of shape and dimension, which is causing you a lot of trouble. I suggest using the standard definition of shape and thinking of the dimensions as the axes of a multi-dimensional array.
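For illustration, a minimal sketch going from getcolors() output to such an n x 4 array (the file name is a placeholder and an RGB image is assumed):
import numpy as np
from PIL import Image

img = Image.open("example.png")                      # placeholder image path
colors = img.getcolors(maxcolors=img.width * img.height)

# Flatten each (count, (r, g, b)) tuple into a single row of four integers
result = np.array([(count, r, g, b) for count, (r, g, b) in colors])
print(result.shape)                                  # (n, 4)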
numpy provides three handy routines to turn an array into at least a 1D, 2D, or 3D array, e.g. through numpy.atleast_3d.
I need the equivalent for one more dimension: atleast_4d. I can think of various ways using nested if statements, but I was wondering whether there is a more efficient and faster method of returning the array in question. In your answer, I would be interested to see an estimate of the time complexity (big O) if you can.
The np.array method has an optional ndmin keyword argument that:
Specifies the minimum number of dimensions that the resulting array
should have. Ones will be pre-pended to the shape as needed to meet
this requirement.
If you also set copy=False you should get close to what you are after.
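For example, a quick sketch of that approach:
import numpy as np

x = np.zeros((5, 5))
y = np.array(x, ndmin=4)    # ones are pre-pended to the shape
print(y.shape)              # (1, 1, 5, 5)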
As a do-it-yourself alternative, if you want extra dimensions trailing rather than leading:
arr.shape += (1,) * (4 - arr.ndim)
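For instance, a quick sketch:
import numpy as np

arr = np.ones((2, 3))
arr.shape += (1,) * (4 - arr.ndim)   # append trailing singleton dimensions in place
print(arr.shape)                     # (2, 3, 1, 1)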
Why couldn't it just be something as simple as this:
import numpy as np

def atleast_4d(x):
    if x.ndim < 4:
        y = np.expand_dims(np.atleast_3d(x), axis=3)
    else:
        y = x
    return y
i.e., if the number of dimensions is less than four, call atleast_3d and append an extra dimension on the end; otherwise just return the array unchanged.
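For instance, a quick check of the shapes this produces (note the extra axes end up trailing, unlike the ndmin approach, which prepends them):
a = np.ones(3)
print(atleast_4d(a).shape)          # (1, 3, 1, 1)
b = np.ones((2, 3, 4, 5))
print(atleast_4d(b).shape)          # (2, 3, 4, 5), returned unchanged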