Tuples as input of a neural network - python

I have a dataset with 10 columns, where each entry is a tuple, for example (8,3). The first element of the tuple is the size of my item and the second is its form, so I need both to identify the item.
Is there a way to pass a row of 10 items (that is, 10 tuples) as input to any kind of neural network? Any suggestions?
Thank you
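One possible way to frame this, sketched under the assumption that both tuple elements can be treated as plain numbers: flatten each row of 10 tuples into 20 values, so that it becomes an ordinary fixed-length input vector. The sample rows and the helper name flatten_row are made up for illustration. If the form is really a category rather than a quantity, it would more likely need a one-hot or embedding encoding instead.

```python
# Hypothetical data: 2 rows, each holding 10 (size, form) tuples.
rows = [
    [(8, 3), (2, 1), (5, 5), (7, 2), (1, 4), (9, 3), (4, 4), (6, 1), (3, 2), (8, 5)],
    [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 1), (7, 2), (8, 3), (9, 4), (2, 5)],
]


def flatten_row(row):
    """Turn [(size, form), ...] into [size, form, size, form, ...]."""
    return [value for item in row for value in item]


# One flat 20-element feature vector per row.
features = [flatten_row(row) for row in rows]

print(len(features[0]))  # 20
print(features[0][:4])   # [8, 3, 2, 1]
```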

Related

What does the third number in the tuple argument denote in numpy.zeros((100,100,3))?

numpy.zeros((100,100,3))
What does the number 3 denote in this tuple?
I got the output but didn't totally understand the tuple argument.
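This piece of code creates a 3-dimensional array of zeros with shape (100, 100, 3): 100 rows, 100 columns, and 3 entries along the third axis. The tuple has three numbers, so the array has three dimensions; the 3 is the length of the last one. For an image, that last axis would typically hold the 3 color channels.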
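A quick way to check this is to inspect the array's attributes. This is only a small sketch; the variable name a is arbitrary.

```python
import numpy as np

a = np.zeros((100, 100, 3))

print(a.ndim)         # 3 -> number of dimensions (length of the shape tuple)
print(a.shape)        # (100, 100, 3)
print(a[0, 0].shape)  # (3,) -> each (row, column) position holds 3 values
```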

Mask certain indices for every entry in a batch, when using torch.max()

I am incrementally sampling a batch of size torch.Size([n, 8]).
I also have a list valid_indices of length n which contains tuples of indices that are valid for each entry in the batch.
For instance, valid_indices[0] may look like this: (0,1,3,4,5,7), which means that indices 2 and 6 should be excluded from the first entry in the batch along dim 1.
In particular, I need to exclude these values when I use torch.max(batch, dim=1, keepdim=True).
Indices to be excluded (if any) may differ from entry to entry within the batch.
Any ideas? Thanks in advance.
I assume that you are getting the good old
IndexError: too many indices for tensor of dimension 1
error when you use your tuple indices directly on the tensor.
At least that was the error that I was able to reproduce when I executed the following line
t[0][valid_idx0]
Where t is a random tensor with size (10,8) and valid_idx0 is a tuple with 4 elements.
However, the same line works just fine when you convert your tuple to a list as follows
t[0][list(valid_idx0)]
>>> tensor([0.1847, 0.1028, 0.7130, 0.5093])
But when it comes to applying these indices to 2D tensors, things get a bit different, since we need to preserve the structure of our tensor for batch processing.
Therefore, it would be reasonable to convert our indices to mask arrays.
Let's say we have a list of tuples valid_indices at hand. First thing will be converting it to a list of lists.
valid_idx_list = [list(tup) for tup in valid_indices]
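Second thing will be converting them to a boolean mask tensor that is True at the valid positions.
masks = torch.zeros(t.size(), dtype=torch.bool)
for i, indices in enumerate(valid_idx_list):
    masks[i, indices] = True
Done. Now we can apply the mask and use torch.max on the masked tensor. Multiplying by a 0/1 mask would turn the excluded entries into 0, which wins the max whenever all valid values in a row are negative, so fill the excluded positions with -inf instead and reduce along dim 1.
torch.max(t.masked_fill(~masks, float('-inf')), dim=1, keepdim=True)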
Kindly see the colab notebook that I've used to reproduce the problem.
https://colab.research.google.com/drive/1BhKKgxk3gRwUjM8ilmiqgFvo0sfXMGiK?usp=sharing

Dataframe slicing with more than two dimensions

So I'm going through a machine learning tutorial and I'm met with this line of code:
pred_list = []
batch = train[-n_input:].reshape((1, n_input, n_features))
for i in range(n_input):
    pred_list.append(model.predict(batch)[0])
    batch = np.append(batch[:,1:,:],[[pred_list[i]]],axis=1)
Specifically, what happens inside the for loop. I understand that the first line of code grabs the first value of whatever is predicted, this is only one value. Next it appends the value to the end of batch, this is where I'm confused.
Why is batch in the second line of code batch[:,1:,:]? What does that mean? I'm not too sure about dataframe slicing, can someone explain what the second line of code in the for loop means? It would be very much appreciated. Here's the article in question. Thank you for reading.
Seems to me batch is a numpy array with 3 dimensions of shape (1, n_input, n_features): 1 row, n_input columns, and n_features depths. batch[:,1:,:] is a slice of batch that takes the second through last columns (Python uses 0-based indexing, so 1: skips column 0). I am guessing these columns represent inputs, i.e. all the features of inputs 1 to last.
batch = np.append(batch[:,1:,:],[[pred_list[i]]],axis=1) appends [[pred_list[i]]] to that slice of batch along axis=1, which is columns. So I am guessing it drops the first input from batch, appends the new [[pred_list[i]]] as the last input, and repeats this on every iteration of the loop.
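The shift-and-append step can be tried in isolation. The sketch below uses made-up data and a fixed array fake_pred as a stand-in for model.predict(batch)[0]; those names and values are assumptions, not taken from the tutorial.

```python
import numpy as np

n_input, n_features = 3, 1

# Made-up training data: the values 0..9 as a single feature column.
train = np.arange(10, dtype=float).reshape(-1, n_features)

# Last n_input time steps, shaped (1, n_input, n_features).
batch = train[-n_input:].reshape((1, n_input, n_features))
print(batch[0, :, 0])  # [7. 8. 9.]

# Stand-in for model.predict(batch)[0]: one prediction with n_features values.
fake_pred = np.array([10.0])

# Drop the oldest time step, then append the prediction as the newest one.
shifted = batch[:, 1:, :]  # shape (1, n_input - 1, n_features)
new_batch = np.append(shifted, [[fake_pred]], axis=1)

print(new_batch.shape)     # (1, 3, 1)
print(new_batch[0, :, 0])  # [ 8.  9. 10.]
```

The window length stays at n_input, so the model always sees the same input shape while the window slides forward by one step.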
An ndarray can be indexed in two ways.
arr = np.array([[[1,2,3],
                 [3,4,5],
                 [7,8,9]]])
Either
arr[0][1][2] #row, col, layer
or
arr[0,1,2] #row, col, layer
The first index gives you the row, the second the column, the third the layer, and so on. Here arr has shape (1, 3, 3), so both forms return 5, the element in the 1st row, 2nd column and 3rd layer.
batch[:,1:,:] means you want all the rows, all the columns following the 1st column and all the layers.
P.S
I have used the word layers here, if you know a better word do suggest.

using numpy to store objects of different sizes

I have the following task:
2 lists. 1st list -> item, 2nd list -> meta data (2 floats) for every item.
These lists get changed by various steps of an algorithm, but their length is kept equal. i.e. their length increases, but increases equally. This way the index is a way to identify which item the meta data refers to.
At one (repeated) step of the algorithm I am shortening the list identifying the same items. Respectively, I have to tune the meta data list.
I could implement that using generic lists, but at some point it overloads the memory. So, I tried using np.array, but the issue with them is that their dimensions should be equal for every element. i.e. arr=np.array([1,2, [3, [4]] ],dtype=object) returns arr.ndim=1. What I need though is for it to return arr.ndim=3. I played around with it and discovered that [3,[4]] is of type list and has nothing to do with np.array. With equal dimensions for every element of np.array, it returns elements along every axis of np types, say, np.int32 or np.array
Critical 3rd step. When I am going through the list and collecting meta data for the same items, I am putting them into the meta_list under the same index, i.e. creating (or expanding) a list of lists at that index. Example,
meta_list=[[1,2],[3,4],[5,6],[7,8]]
Then, say, the elements at indices 1 and 3 of the item_list are the same. So I have to combine their meta data. It yields this:
meta_list=[[1,2],[[3,4],[7,8]],[5,6]]
But I cannot wrap my head around how to implement this step using np.array, profiting from its storage efficiency as that [[3,4],[7,8]] element will be of type list.
Would be very grateful for hints.

Keeping Track of Dynamic Programming Steps

I'm teaching myself basic programming principles, and I'm stuck on a dynamic programming problem. Let's take the infamous Knapsack Problem:
Given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
Let's set the weight limit to 10, and let's give two lists: weights = [2,4,7] and values = [8,4,9] (I just made these up). I can write the code to give the maximum value given the constraint--that's not a problem. But what about if I want to know what values I actually ended up using? Not the total value--the individual values. So for this example, the maximum would be the objects with weights 2 and 7, for a total value of 8 + 9 = 17. I don't want my answer to read "17" though--I want an output of a list like: (8, 9). It might be easy for this problem, but the problem I'm working on uses lists that are much bigger and that have repeat numbers (for example, multiple objects have a value of 8).
Let me know if anyone can think of anything. As always, much love and appreciation to the community.
Consider each partial solution a Node. Simply add whatever you use into each of these nodes and whichever node becomes the answer at the end will contain the set of items you used.
So each time you find a new solution you just set the list of items to the list of items of the new optimal solution and repeat for each.
A basic array implementation can help you keep track of which item enabled a new DP state to get its value. For example, if your DP array is w[], then you can have another array p[]. Every time a state is generated for w[i], you set p[i] to the item you used to get to w[i]. Then, to output the list of items used for w[n], output p[n], move to the index n-weightOf(p[n]), and repeat until you reach 0 to output all the items.
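Here is a minimal sketch of the backtracking idea, applied to the example from the question. It assumes each item can be used at most once (which is what the expected answer of 17 implies), and for that reason it keeps a 2D table and walks back through it rather than using the single p[] array described above. The function name knapsack_with_items is made up for illustration.

```python
def knapsack_with_items(weights, values, limit):
    """Return (best_total_value, indices_of_chosen_items) for 0/1 knapsack."""
    n = len(weights)
    # best[i][c] = best value using only the first i items with capacity c.
    best = [[0] * (limit + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(limit + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take it

    # Walk back from the final state: if the value changed when item i-1
    # became available, that item must have been taken.
    chosen = []
    c = limit
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    chosen.reverse()
    return best[n][limit], chosen


weights = [2, 4, 7]
values = [8, 4, 9]
total, chosen = knapsack_with_items(weights, values, 10)
picked_values = tuple(values[i] for i in chosen)

print(total)          # 17
print(picked_values)  # (8, 9)
```

Returning indices rather than values keeps the result unambiguous when several items share the same value, which is the situation described in the question.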
