Subtracting a number from an array if condition is met python - python

I am facing a very basic problem with an if condition in Python.
The array current_img_list has shape (500L,1). If the number 82459 is in this array, I want to subtract 1 from it.
index = np.random.randint(0,num_train-1,500)
# shape of current_img_list is (500L,1)
# values in index array is randomized
current_img_list = train_data['img_list'][index]
# shape of current_img_list is (500L,1)
if (any(current_img_list) == 82459):
    i = np.where(current_img_list == 82459)
    final_list = i-1
Explanation of variables: train_data is a dict with 4 elements. img_list is one of them, with shape (215375L,1). The value of num_train is 215375 and the shape of index is (500L,1).
Firstly, I don't know whether this condition is working or not. I tried the all() function and numpy.where(), but without success. Secondly, I can't think of a way to subtract 1 from 82459 directly at the index where it is stored without affecting the rest of the values in the array.
Thanks

Looping over the array in Python will be much slower than letting numpy's vectorized operators do their thing:
import numpy as np
num_train = 90000 # for example
index = np.random.randint(0,num_train-1,500)
index[index == 82459] -= 1

current_img_list = np.array([1,2,3,82459,4,5,6])
i = np.where(current_img_list == 82459)
current_img_list[i] -= 1

I'm a little confused by what you're trying to achieve here, but I'll give it a go:
If you're trying to subtract 1 from every element of your array that is equal to 82459, then you could iterate through the array with a for loop. Each time the element at the current index is equal to 82459, subtract 1 from the element at that index.
If you need more help, please post the rest of the relevant code so I can debug.
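For reference, here is a minimal sketch of the boolean-mask approach applied to a column-shaped array like the one in the question. The sample values are made up for illustration. Two details of the original attempt are worth noting: any(current_img_list) returns a single boolean, so comparing it with 82459 is never true, and np.where returns a tuple, so i-1 raises a TypeError.

```python
import numpy as np

# Made-up stand-in for current_img_list, shape (5, 1) instead of (500, 1)
current_img_list = np.array([[1], [82459], [3], [82459], [5]])
target = 82459

# Element-wise comparison gives a boolean array of the same shape
mask = current_img_list == target

# mask.any() answers "is the value present anywhere?"
if mask.any():
    # Subtract 1 only where the mask is True; other entries are untouched
    current_img_list[mask] -= 1

print(current_img_list.ravel())
```

The if check is optional here: when the mask is all False, the in-place subtraction simply changes nothing.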

Related

Numpy/Pytorch generate mask based on varying index values

I've been trying to do the following as a batch operation in numpy or torch (no looping). Is this possible?
Suppose I have:
indices: [[3],[2]] (2x1)
output: [[0,0,0,0,1], [0,0,0,1,1]] (2xfixed_num) where fixed_num is 5 here
Essentially, for each row, I want the positions up to and including that index value to be 0 and the rest to be 1.
OK, I assume this is some sort of homework assignment, but maybe it's not; either way it was fun to do. Here's a solution for your specific example; maybe you can generalize it to an array of any shape:
def fill_ones(arr, idxs):
    x = np.where(np.arange(arr.shape[1]) <= idxs[0], 0, 1) # This is the important logic.
    y = np.where(np.arange(arr.shape[1]) <= idxs[1], 0, 1)
    return np.array([x, y])
On the commented line, we use a condition to assign 0 to all positions up to and including the given index, and 1 to every position after it. This creates a new array rather than a mask that we can apply to the original array, so maybe it's "dirtier".
Also, I suspect it's possible to generalize this to arrays with more than 2 dimensions, but the solution I'm imagining now uses a for loop. Hope this helps!
Note: arr is just a NumPy array of whatever shape you want the output to be, and idxs is a tuple of the indices past which you want the array elements to turn into 1's. Hope that is clear.
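As a loop-free alternative, the same comparison can be written once with broadcasting. This is a sketch rather than part of the answer above; it assumes indices is an (n, 1) integer array as in the question, and the names fixed_num and mask are chosen here for illustration.

```python
import numpy as np

indices = np.array([[3], [2]])  # shape (2, 1), as in the question
fixed_num = 5                   # number of columns in the output

# np.arange(fixed_num) has shape (5,); comparing it with the (2, 1) array
# broadcasts to shape (2, 5). Positions strictly after each index become 1.
mask = (np.arange(fixed_num) > indices).astype(int)

print(mask)
```

For PyTorch, the analogous expression with torch.arange and a tensor of indices should follow the same broadcasting rules, though that variant is not shown here.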

Trying to convert a MATLAB array to a Python array

I have this MATLAB code that I need to translate to python, however there is an issue in creating a new column in the firings array. In MATLAB, the code creates an n*2 matrix that is initially empty and I want to be able to do the same in python. Using NumPy, I created fired = np.where(v >= 30). However python creates a tuple rather than an array so it throws an error:
TypeError: unsupported operand type(s) for +: 'int' and 'tuple'
This is the code I have in MATLAB that I would like converted into Python
firings=[];
firings=[firings; t+0*fired, fired];
Help is appreciated! Thanks!
np.where returns a tuple of index arrays, one per dimension of the input, so for a 1D array it is a one-element tuple. For the 1D case, you need to take only the first element of the result of np.where:
fired = np.where(v >= 30)[0]
You can then go ahead and concatenate the matrices. Also, as suggested by user @Divakar, you could use np.flatnonzero, which equivalently finds the indices of the non-zero values in a NumPy array and returns them as a flat 1D array, for fewer headaches:
fired = np.flatnonzero(v >= 30)
Take note that the concatenation logic would not work if no matches were found in fired. You will need to take this into account when you write your concatenating logic. The convenient thing with MATLAB is that you're able to concatenate empty matrices, and doing so has no effect.
Also note that there is no concept of a row vector or column vector in NumPy. It is simply a 1D array. If you specifically want to force the array to be a column vector as you have it, you need to introduce a singleton axis in the second dimension. Note that this only works provided that np.where gave you matched results. Afterwards, you can use np.vstack and np.hstack to vertically and horizontally concatenate arrays to do what you ask. What you have to do first is create an empty 2D array, then do what we just covered:
# Create blank 2D array with 0 rows and 2 columns;
# np.array([[]]) has shape (1, 0), which np.vstack cannot combine with 2-column rows
firings = np.empty((0, 2))
# Some code here...
# ...
# ...
# fired = find(v >= 30); % From MATLAB
fired = np.where(v >= 30)[0]
# or you can use...
# fired = np.flatnonzero(v >= 30)
if np.size(fired) != 0:
    fired = fired[:, None] # Introduce singleton axis
    # Update firings with two column vectors
    # firings = [firings; t + 0 * fired, fired]; % From MATLAB
    firings = np.vstack([firings, np.hstack([t + 0*fired, fired])])
Here np.size finds the total number of elements in the NumPy array. If the result of np.where generated no results, the number of elements in fired should be 0. Therefore the if statement only executes if we have found at least one element in v subject to v >= 30.
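To make the accumulation pattern concrete, here is a small self-contained sketch. The helper name append_firings, the threshold parameter and the sample v arrays are assumptions made for illustration; they do not come from the original MATLAB code.

```python
import numpy as np

def append_firings(firings, v, t, threshold=30):
    """Return firings with one (t, index) row appended per element of v >= threshold."""
    fired = np.flatnonzero(v >= threshold)
    if fired.size == 0:
        # Nothing fired at this time step, so leave the array unchanged
        return firings
    # Build an (n, 2) block: first column is t, second column is the index
    rows = np.column_stack([np.full(fired.size, t), fired])
    return np.vstack([firings, rows])

# Start with 0 rows and 2 columns so that vstack shapes always agree
firings = np.empty((0, 2), dtype=int)
firings = append_firings(firings, np.array([10, 35, 20, 40]), t=1)
firings = append_firings(firings, np.array([0, 0, 0, 0]), t=2)   # no matches
firings = append_firings(firings, np.array([31, 0, 0, 0]), t=3)

print(firings)
```

Note that NumPy indices are 0-based, so the second column is one less than what MATLAB's find would report for the same data.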
If you use numpy, you can define an ndarray:
import numpy as np
firings=np.ndarray(shape=(1,2))
firings[0][0:]=(1.,2.)
firings=np.append(firings,[[3.,4.]],axis=0)

How to access values from a multidimensional array with a for loop?

I have created an IF statement tree to label sets of data stored in arrays. The multidimensional array is called featureVectors, and numberOfSides, standardDeviationsPerimeter, standardDeviationsAngles (not used in this section of code) and largestAngles are all arrays included in it. I want to pass all the arrays in featureVectors through the IF statement, but it doesn't loop past the first one, so every dataset gets the label 2. I am not very good at loops with multidimensional arrays. This is my code so far:
for shape in range(0, len(sidesDividedByPerimeter)):
    if numberOfSides[0] == 1:
        labels = 0
    elif numberOfSides[0] > 1 and numberOfSides[0] < 3.5:
        labels = 1
    elif numberOfSides[0] > 3.5:
        if standardDeviationsPerimeter[0] < 0.1458:
            if largestAngles[0] < 104.79:
                labels = 2
            elif largestAngles[0] >= 104.79:
                labels = 3
        elif standardDeviationsPerimeter[0] >= 0.1458:
            labels = 4
    print(featureVectors)
    print(labels)
    #featureVectors[shape].append(labels)
This gives me the output:
I just need it to run through each of the arrays instead of stopping at the first. I know it's because of my [0]'s, but I just don't know what I should do; I'm only learning Python.
Here shape is the loop variable, which increases by one on each iteration, but you are not using it anywhere in the loop body.
I don't quite understand what all your arrays mean, but try replacing every [0] with [shape] and perhaps it will work as you expect.
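A minimal sketch of that suggestion follows, with the decision tree moved into a helper and one label collected per dataset. The helper name label_shape and the sample values are invented for illustration; only the thresholds come from the question.

```python
def label_shape(sides, std_perimeter, largest_angle):
    """Apply the question's decision tree to one dataset."""
    if sides == 1:
        return 0
    elif 1 < sides < 3.5:
        return 1
    elif sides > 3.5:
        if std_perimeter < 0.1458:
            return 2 if largest_angle < 104.79 else 3
        return 4
    return None  # sides < 1 or exactly 3.5 is not covered by the original tree

# Made-up sample data, one entry per dataset
numberOfSides = [1, 3, 4, 4, 5]
standardDeviationsPerimeter = [0.0, 0.05, 0.1, 0.12, 0.3]
largestAngles = [360.0, 60.0, 90.0, 120.0, 108.0]

labels = []
for shape in range(len(numberOfSides)):
    # Index with the loop variable instead of the constant 0
    labels.append(label_shape(numberOfSides[shape],
                              standardDeviationsPerimeter[shape],
                              largestAngles[shape]))

print(labels)
```

Collecting the labels in a list also avoids overwriting a single labels variable on every pass.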

Avoiding an indexing error in Python while looping

Regardless of whether this is the most efficient way to structure this sorting algorithm in Python (it's not), my understandings of indexing requirements/the nature of the built-in 'min' function are failing to account for the following error in the following code:
Error:
builtins.IndexError: list index out of range
Here's the code:
#Create function to sort arrays with numeric entries in increasing order
def selection_sort(arr):
    arruns = arr #pool of unsorted array values, initially the same as 'arr'
    indmin = 0 #initialize arbitrary value for indmin.
    #indmin is the index of the minimum value of the entries in arruns
    for i in range(0,len(arr)):
        if i > 0: #after the first looping cycle
            del arruns[indmin] #remove the entry that has been properly sorted
            #from the pool of unsorted values.
        while arr[i] != min(arruns):
            indmin = arruns.index(min(arruns)) #get index of min value in arruns
            arr[i] = arruns[indmin]
#example case
x = [1,0,5,4] #simple array to be sorted
selection_sort(x)
print(x) #The expectation is: [0,1,4,5]
I've looked at a couple other index error examples and have not been able to attribute my problem to anything occurring while entering/exiting my while loop. I thought that my mapping of the sorting process was sound, but my code even fails on the simple array assigned to x above. Please help if able.
arr and arruns are the same list: arruns = arr only binds a second name to the same object. You are removing items from that list, which shrinks it, while i still runs up to the original length.
Fix:
arruns = [] + arr
This creates a new list for arruns.
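The aliasing can be checked directly, and the sort can be sketched on top of an independent copy. The version below is a simplified variant, not the asker's exact control flow: it recomputes the index of the minimum on every pass instead of using the while loop, and the name unsorted is chosen here for illustration.

```python
a = [1, 0, 5, 4]
alias = a          # same list object under a second name
copy = list(a)     # independent copy (equivalent to [] + a or a[:])
assert alias is a
assert copy is not a

def selection_sort(arr):
    """Sort arr in place by repeatedly moving the smallest remaining value."""
    unsorted = list(arr)  # pool of unsorted values, separate from arr
    for i in range(len(arr)):
        indmin = unsorted.index(min(unsorted))  # index of the current minimum
        arr[i] = unsorted.pop(indmin)           # place it and remove it from the pool
    return arr

x = [1, 0, 5, 4]
selection_sort(x)
print(x)
```

Because unsorted is a separate list, deleting from it no longer changes the length of arr while i is still iterating over the original range.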

python performance tips -- loop involving comparisons

I am new to python and am trying to learn more about how it works by successively optimizing chunks of naive code I have already written.
The following function involves a loop that performs operations on the elements of a list of list of floats only when the values of the data structure satisfy some condition. I was wondering if anyone could comment on (1) ways to improve the performance of this loop and (2) general features of the type of loop I'm describing that make it more or less suitable for different approaches to improving it. Below I've included a minimal version of the loop I'm working with.
Some notes on the variables used below:
#p is a small integer (say, p=10)
#index1 is an integer between 0 and p
#k is an integer between 0 and, say, kmax=100
#mat1 is a list of list of floats whose size is [kmax,p],
# with all values initialized to 0.0.
# mat1 is changed by the loop below
#mat2 is a list of list of floats whose size is [kmax,p]
# with all values initialized to -2e10.
# mat2 is changed by other parts of the program
Also, if it matters, in my code this is all part of a class, so there are "self." statements for the variables. I have read that local variables are handled better by python functions; how does this translate to class constructs?
def greatFunction(index1,k):
    index2 = index1
    for j in range(p):
        if (mat2[k][index2] > -1e10):
            mat1[k][j] = mat1[k][j] + mat2[k][index1]*mat2[k][index2]
        index2 = index2 - 1
        if(index2 < 0):
            index2 = index2 + p
From what I have read I thought this would be a prime candidate for replacing the lists of lists with nparrays (in the class itself, not converting things in the function) and using masks to take care of the boolean conditions. However, the numpy version I wrote turned out to be slower than the vanilla python implementation above. Any help both speeding up the code but more importantly helping me understand why and how such loops can be replaced with a better construction would be much appreciated. Thank you!
It looks like index2 starts at index1, decreases in steps of 1 down to 0, then wraps around to p-1 and continues decreasing. This is basically a modulo operation and can be reproduced with np.mod to give us the index2 value for every iteration as an array of column indices. Then, we index into the k-th row of mat2 with those column indices to get all the elements we need. These elements are compared with the threshold (-1e10 in our case), giving us a mask, which is used to select the valid elements and update the corresponding entries of mat1 with them after scaling by mat2[k][index1].
Since you are working with NumPy arrays, I am assuming you already have mat1 and mat2 converted to NumPy arrays with np.array().
Also, as mentioned in the comments, if mat2 has all its values initialized to -2e10, then mat2[k][index2] > -1e10 would never be true, so in that particular case, mat1 would keep its zeros. Thus, to generically explain how to vectorize such a case, I am assuming mat2 to have random numbers instead. The implementation would look something like this -
# Get column indices corresponding to index2
col_idx = np.mod(np.arange(index1,index1-p,-1),p)
# Get all mat2 values with those column indices at kth row
mat2_val = mat2[k,col_idx]
# Mask of valid elements
mask = mat2_val > -1e10
# Add valid ones from mat2, scaled with mat2[k][index1], to mat1;
# += matches the accumulation mat1[k][j] = mat1[k][j] + ... in the loop
mat1[k][mask] += mat2[k][index1]*mat2_val[mask]
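One way to gain confidence in such a rewrite is to run the loop and the vectorized version on the same random data and compare the results. The sketch below does this under a few assumptions: the function names, the explicit p argument, the random test data and the non-zero initial mat1 are all chosen here for illustration, and the vectorized update accumulates with += to mirror the loop.

```python
import numpy as np

def loop_update(mat1, mat2, index1, k, p):
    """Loop from the question, with mat1, mat2 and p passed in explicitly."""
    index2 = index1
    for j in range(p):
        if mat2[k][index2] > -1e10:
            mat1[k][j] += mat2[k][index1] * mat2[k][index2]
        index2 -= 1
        if index2 < 0:
            index2 += p

def vectorized_update(mat1, mat2, index1, k, p):
    """Same update expressed with modulo indexing and a boolean mask."""
    col_idx = np.mod(np.arange(index1, index1 - p, -1), p)  # index2 for each j
    mat2_val = mat2[k, col_idx]
    mask = mat2_val > -1e10
    mat1[k, mask] += mat2[k, index1] * mat2_val[mask]

rng = np.random.default_rng(0)
kmax, p = 100, 10
mat2 = rng.standard_normal((kmax, p))
mat2[rng.random((kmax, p)) < 0.3] = -2e10  # mark some entries as "unset"

mat1_loop = np.ones((kmax, p))  # non-zero start so accumulation is exercised
mat1_vec = np.ones((kmax, p))
loop_update(mat1_loop, mat2, index1=4, k=7, p=p)
vectorized_update(mat1_vec, mat2, index1=4, k=7, p=p)

print(np.allclose(mat1_loop, mat1_vec))
```

With p as small as 10, the fixed overhead of each NumPy call can outweigh the savings from avoiding the Python loop, which may be why a NumPy version of a loop this short ends up slower; vectorizing over k as well would be the next thing to try.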
