Difference in "Matrix[::] =" and "Matrix = " in Python3 - python

What is the difference between the two lines below?
I know [::-1] reverses the matrix, but I want to know what [::] on the left-hand side of '=' does: how does the matrix get reversed in place in the first case without iterating over each element?
matrix[::] = matrix[::-1]
matrix = matrix[::-1]

The technique you are looking for is called slicing. It is an advanced way to reference elements in a container: instead of a single index, you use a slice to reference a range of elements.
A slice consists of start, end and step, like this: matrix[start:end:step]. You can omit any part and a default is used. For a positive step the defaults are 0, len(matrix) and 1; for a negative step, start and end default to the last and first elements, so the slice runs backwards over the whole sequence.
Of course, the container must support this technique (protocol).
matrix[::] = # get all elements of the matrix and assign something to them
matrix = # link matrix name with something
matrix[::-1] # get all elements of the matrix in reversed order
So, the first one builds the reversed copy on the right-hand side and then copies its elements into the positions of the same, existing object.
The second one just binds the name matrix to the new object constructed from the slice of matrix.
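The difference is easiest to observe through a second name that refers to the same list. The following is a minimal sketch; the variable names alias and the sample values are only illustrative and do not come from the question:

```python
matrix = [[1, 2], [3, 4], [5, 6]]
alias = matrix  # a second name bound to the same list object

# Slice assignment: the right-hand side builds a new reversed list first,
# then its items replace the contents of the existing list object.
matrix[::] = matrix[::-1]
same_after_slice_assignment = alias is matrix  # still one object
contents_after_slice_assignment = [row[:] for row in alias]

# Plain assignment: the name matrix is rebound to a brand-new list,
# while alias keeps pointing at the old (already reversed) one.
matrix = matrix[::-1]
same_after_rebinding = alias is matrix
```

This matters when a function is expected to modify its argument in place: only the slice assignment is visible to the caller.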

Related

Changing the values of sliced numpy array doesn't change the original data in it

I have a numpy array total_weights which is an IxI array of floats. Each row/column corresponds to one of I items.
During my main loop I acquire another float array weights of size NxM (N, M < I), where each row/column also corresponds to one of the original I items (duplicates may also exist).
I want to add this array to total_weights. However, the sizes and order of the two arrays are not aligned. Therefore, I maintain a position map called pos_df, a pandas Series that maps item IDs to their proper index/position in total_weights.
To perform the addition properly, I do the following inside the loop:
candidate_pos = pos_df.loc[candidate_IDs] # don't worry about how I get these
rated_pos = pos_df.loc[rated_IDs] # ^^
total_weights[candidate_pos, :][:, rated_pos] += weights
Unfortunately, the above operation must be editing a copy of the original total_weights matrix and not a view of it, since after the loop the total_weights array is still full of zeroes. How do I make it change the original data?
Edit:
I want to clarify that candidate_IDs are the N IDs of items and rated_IDs are the M IDs of items in the NxM array called weights. Through pos_df I can get their total order in all of I items.
Also, my guess as to the reason a copy is returned is that candidate_IDs and thus candidate_pos will probably contain duplicates e.g. [0, 1, 3, 1, ...]. So the same rows will sometimes have to be pulled into the new array/view.
Your first problem is in how you are using indexing. As candidate_pos is an array, total_weights[candidate_pos, :] is a fancy indexing operation that returns a new array. When you apply indexing again, i.e. ...[:, rated_pos], you are assigning elements to the newly created array rather than to total_weights.
The second problem, as you have already spotted, is in the actual logic you are trying to apply. If I understand your example correctly, you have an I x I matrix of weights, and you want to update it for a sequence of pairs ((Ix_1, Iy_1), ..., (Ix_N, Iy_N)), with repetitions, in a single line of code. This can't be done with the += operator: for a repeated pair, total_weights[Ix_n, Iy_n] ends up with only the weight from the last time (Ix_n, Iy_n) appears in your sequence added to it. You have to first merge all the repeated elements in your sequence of weight updates, and then update your weights matrix with the new "unique" sequence of updates. Alternatively, you must collect your weights as an I x I matrix and sum it directly into total_weights.
After #rveronese pointed out that it's impossible to do it in one go because of the duplicates in candidate_pos, I believe I have managed to do what I want with a for-loop over them:
candidate_pos = pos_df.loc[candidate_IDs] # don't worry about how I get these
rated_pos = pos_df.loc[rated_IDs] # ^^
for i, c in enumerate(candidate_pos):
    total_weights[c, rated_pos] += weights[i, :]
In this case, the indexing does not create a copy and the assignment should be working as expected...
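As a possible alternative to the loop, NumPy's unbuffered np.add.at can accumulate repeated indices in a single call, and np.ix_ builds the row/column index grid. This is only a sketch with made-up sizes and positions (plain arrays stand in for the pos_df lookups), not the original poster's data:

```python
import numpy as np

total_weights = np.zeros((4, 4))
candidate_pos = np.array([0, 1, 1])  # hypothetical positions, row 1 repeated
rated_pos = np.array([2, 3])         # hypothetical column positions
weights = np.ones((3, 2))

# Chained fancy indexing: the first index creates a temporary copy,
# so the in-place addition never reaches total_weights.
total_weights[candidate_pos, :][:, rated_pos] += weights
unchanged = not total_weights.any()

# Unbuffered addition: every (row, column) pair is accumulated,
# including the contributions of the repeated row 1.
np.add.at(total_weights, np.ix_(candidate_pos, rated_pos), weights)
```

Unlike the loop over candidate_pos, this form also accumulates correctly if rated_pos itself contains duplicates.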

How to remove numbers in an array if they exist in another

Here is my code so far. (Using NumPy for arrays)
avail_nums = np.array([1,2,3,4,5,6,7,8,9]) # initial available numbers
# print(avail_nums.shape[0])
# print(sudoku[spaces[x,1],spaces[x,2]]) # index of missing numbers in sudoku
print('\n')
# print(sudoku[spaces[x,1],:]) # rows of missing numbers
for i in range(sudoku[spaces[x,1],:].shape[0]): # Number of elements in the missing number row
    for j in range(avail_nums.shape[0]): # Number of available numbers
        if(sudoku[spaces[x,1],i] == avail_nums[j]):
            avail_nums= np.delete(avail_nums,[j])
print(avail_nums)
A for loop cycles through all the elements in the 'sudoku row' and nested inside, another loop cycles through avail_nums. Every time there is a match (given by the if statement), that value is to be deleted from the avail_nums array until finally all the numbers in 'sudoku row' aren't in avail_nums.
I'm greeted with this error:
IndexError: index 8 is out of bounds for axis 0 with size 8
pointing to the line with the if statement.
This happens after the first deletion, because avail_nums is shrinking. How can I resolve this issue?
When you delete items from the array, the array gets smaller, but your for loop does not know that: range(avail_nums.shape[0]) was evaluated with the original size of the array. So you get an out-of-bounds error. I would avoid deleting from the array you are iterating over.
My solution is to use a temporary list that collects the allowed elements and then assign it to the original array name:
temporary_array = []
for element in array:
    if element in another_array:  # membership test, you can do this in Python
        continue  # ignore it
    temporary_array.append(element)
array = temporary_array
The resulting array will contain only the elements that do not exist in another_array.
You could also use list comprehension:
temporary_array = [ element for element in array if element not in another_array ]
array = temporary_array
This is the same concept using more compact Python syntax.
Another option is the builtin filter(), which takes a filter function and an iterable. In Python 3 it returns a lazy iterator, so wrap it in list() to get the filtered list. In the following I am using the lambda function notation, which is another nice piece of Python syntax:
array = list(filter(lambda x: x not in another_array, array))
Since you are using numpy, you should look at the numpy.extract() function here: https://numpy.org/doc/stable/reference/generated/numpy.extract.html... For example, using numpy.where(), numpy.in1d() and numpy.extract() we could write:
condition = numpy.where(numpy.in1d(np_array, np_another_array),False,True)
np_array = numpy.extract(condition, np_array)
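The same filtering can also be sketched with a boolean mask from np.isin (the newer counterpart of np.in1d) or with np.setdiff1d. The sudoku_row values below are invented for illustration, with 0 assumed to mark an empty cell:

```python
import numpy as np

avail_nums = np.arange(1, 10)                        # 1..9
sudoku_row = np.array([5, 3, 0, 0, 7, 0, 0, 0, 0])  # hypothetical row, 0 = empty

# Keep only the numbers that do not already appear in the row.
remaining = avail_nums[~np.isin(avail_nums, sudoku_row)]

# np.setdiff1d gives the same values as a sorted array of unique elements.
remaining_sorted = np.setdiff1d(avail_nums, sudoku_row)
```

Neither form modifies an array while looping over it, so the IndexError from the question cannot occur.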

I want to iterate over this structure. It is treating everything inside as one element and not as separate numbers

loss=[[ -137.70171527 -81408.95809899 -94508.84395371 -311.81198933 -294.08711874]]
When I print loss it prints the addition of the numbers and not the individual numbers. I want to change this to a list so I can iterate over each individual number, but I don't know how. Please help.
I have tried:
result = map(tuple,loss)
However, it prints the addition of the numbers inside. When I try to index it, it says there is only 1 element. It works if I put commas in between, but this is a matrix that is output by other code, so I can't change or add to it.
You have a list containing a list of numbers: the outer list contains exactly one element (namely the inner list). The inner list holds the numbers you want to iterate over. Hence, to iterate over the inner list you first have to access it from the outer list by index, for example like this:
for list_item in loss[0]:
    do_something_for_each_element(list_item)
Moreover, I think that you wanted to have separate elements in the inner list and not compute one single number, didn't you? If that is the case, you have to separate the elements with commas.
E.g.:
loss=[[-137.70171527, -81408.95809899, -94508.84395371, -311.81198933, -294.08711874]]
EDIT:
As you clarified in the comments, you want to iterate over a numpy matrix. One way to do so is to convert the matrix into an n-dimensional array (ndarray) and then iterate over that structure. This could, for example, look like this (other options have also been presented in this answer: Iterate over a numpy Matrix rows):
import numpy as np
test_matrix=np.matrix([[1, 2], [3, 4]])
for row in test_matrix.A:
    print(row)
Note that the A attribute of a matrix object is its ndarray representation (https://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html).
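For a matrix with a single row, as in the question, a flat Python list of the individual numbers can be obtained by converting to an ndarray and flattening it. This is a sketch that assumes loss is a 1 x n np.matrix; only the first three values from the question are reused here:

```python
import numpy as np

# Assumed shape of the data: a 1 x n numpy matrix.
loss = np.matrix([[-137.70171527, -81408.95809899, -94508.84395371]])

# Convert to a plain ndarray, flatten to 1-D, then to a Python list of floats.
values = np.asarray(loss).ravel().tolist()

for value in values:
    print(value)  # one number per iteration
```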

understanding array slices in python

I'm working on a machine learning project for university and I'm having trouble understanding some bits of code online. Here's the example:
digits = np.loadtxt(raw_data, delimiter=",")
x_train, y_train = digits[:,:-1], digits[:,-1:].squeeze()
What do the slices in the second line mean? I'm trying to make a slice selecting the first 2/3 of the array, and I've done that before with something like [:2*array_elements // 3], but I don't understand how to do it if there's a delimiter in the middle.
numpy (or anything, but this seems like numpy) can implement __getitem__ to accept tuples, unlike built-in sequences such as list, which (afaik) only accept integers and slice objects.
You want to look at the slice "parts" individually, as separated by the , delimiters. So [:,:-1] is actually : and :-1, which are completely independent.
First slice
: is "all", no slicing along that axis.
:x is all up until (and not including) x and -1 means the last element, so...
:-1 is all up until (and not including) the last.
Second slice
x: is all after (and including) x, and we already know about -1 so...
-1: is all after (and including) the last -- in this case just the last.
There are two mechanisms involved here.
Python's notation for slicing arrays (see: Understanding Python's slice notation)
Basically the syntax is array[x:y], where the resulting slice starts at x (included) and ends at y (excluded).
If the start (resp. end) is omitted, it means "from the first item" (resp. "to the last item"). This is a shortcut.
Negative indices count from the end of the array, so -1 refers to the last element:
array[-1:]
# The elements from the last index up to the end of the array,
# which means a list containing only the last element:
array[-1:] == [array[-1]]
# Note that the notation does not wrap around: array[-1:0] is empty.
numpy's 2-dimensional arrays (assuming np stands for numpy):
Numpy frequently uses arrays of 2 dimensions like a matrix. To access the element in row x and column y you can write matrix[x,y].
Python's slice notation also applies here, to cut a matrix into a sub-matrix of smaller size.
So, back at your problem:
digits[:,:-1]
= digits[start:end , start:-1]
= digits[start:end , start:end-1]
= the sub-matrix where you take all the rows (start:end) and you take all the columns except the last one (start:end-1)
And
digits[:,-1:]
= digits[start:end , -1:end]
= the sub-matrix with all the rows and only the last column
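The slices can be checked on a small array. The following sketch uses an invented 3 x 4 array in place of the loaded data file, with the last column playing the role of the labels, and also shows one way to take the first two thirds of the rows:

```python
import numpy as np

# Stand-in for the data loaded with np.loadtxt: 3 samples, 3 features + 1 label.
digits = np.array([[0.0, 1.0, 2.0, 9.0],
                   [3.0, 4.0, 5.0, 8.0],
                   [6.0, 7.0, 8.0, 7.0]])

x_train = digits[:, :-1]            # all rows, every column except the last
y_train = digits[:, -1:].squeeze()  # all rows, last column, flattened to 1-D

# First 2/3 of the rows: slice only the first axis, all columns are kept.
n_rows = digits.shape[0]
first_two_thirds = digits[: 2 * n_rows // 3]
```

Here digits[:, -1:] has shape (3, 1), and squeeze() drops the axis of length 1, giving shape (3,).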

How to keep track of original row indices in Numpy array when comparing to only a slice?

I'm working with a 2D numpy array A, performing a comparison of a one dimensional array, X, against each row in A. As approximate matches are found, I'm keeping track of their indices in A in a dtype=bool array S. I'd like to use S to shrink the field of match candidates in A to improve efficiency. Here's the basic idea in code:
def compare(nxt):
    S[nxt] = 0 #sets boolean
    T = A[nxt, i:] == A[S, :-i] #T has different dimensions than A
compare() is iterated over and S is progressively populated with False values.
The problem is that the boolean array T is of the same dimensions as the pared down version of A not the original version. I'm hoping to use T to get the indices (in the unsliced A) of the approximate matches for later use.
np.argwhere(T)
This returns a list of indices of the matches, but again in the slice of A.
It seems like there has to be a better way to, at the same time, crop A for more efficient searching and still be able to get the correct index of the matching row.
Any thoughts?
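One possible approach, sketched here under assumptions rather than taken from an answer in the thread: np.flatnonzero(S) lists the original row indices that survive the boolean mask, so a row position found in the sliced array can be translated back by indexing into that list. The array contents and the match condition below are placeholders:

```python
import numpy as np

A = np.arange(12).reshape(4, 3)          # placeholder data
S = np.array([True, False, True, True])  # rows still considered candidates

sub = A[S]                               # pared-down array, 3 rows
original_rows = np.flatnonzero(S)        # original index of each row in sub

# Placeholder match condition, evaluated on the pared-down array only.
T = sub[:, 0] >= 6
local_matches = np.flatnonzero(T)        # row positions within sub

# Translate positions in sub back to row indices in the unsliced A.
matches_in_A = original_rows[local_matches]
```

For a 2-D T, the first column of np.argwhere(T) can be used in place of local_matches.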
