Would like to vectorize while loop for performance - python

I am trying to set values for a window of an array based on the current value of another array.
It should ignore values that the window overrides.
I need to be able to change the size of the window for different runs.
This works but it is very slow.
I thought there would be a vectorized solution somewhere.
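window_size = 3
def signal(self):
    # Start from an all-zero signal aligned with the input series.
    signal = pd.Series(data=0, index=self.arr.index)
    i = 0
    while i < len(self.arr) - 1:
        s = self.arr.iloc[i]
        if s in [-1, 1]:
            # Fill the window, then jump past it so overridden values are ignored.
            j = i + window_size
            signal.iloc[i: j] = s
            i = i + window_size
        else:
            i += 1
    return signal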
arr = [0 0 0 0 1 0 0 0 0 0 0 -1 -1 0 0 0 0 ]
signal = [0 0 0 0 1 1 1 0 0 0 0 -1 -1 -1 0 0 0 ]

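You could use the shift method of pd.Series, adding one shifted copy per extra position in the window (two for window_size = 3):
arr_series = pd.Series(arr)
arr_series + arr_series.shift(periods=1, fill_value=0) + arr_series.shift(periods=2, fill_value=0)
Note that overlapping windows add up: the adjacent -1, -1 pair in the example yields -2, so this matches the loop only when the non-zero values are at least window_size apart.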
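A shifted sum adds overlapping windows together, so it does not ignore values that an earlier window has already covered. The following is a minimal sketch of an alternative that keeps the skip-ahead behaviour of the loop but only visits the non-zero positions. The function name window_signal and the plain NumPy input are assumptions made for illustration; they are not part of the question.

```python
import numpy as np

def window_signal(arr, window_size):
    """Fill a window after each -1/1 trigger, skipping triggers inside a window."""
    arr = np.asarray(arr)
    out = np.zeros_like(arr)
    next_free = 0  # first index not covered by a previous window
    # Only the trigger positions are visited, not every element.
    for i in np.flatnonzero(np.isin(arr, (-1, 1))):
        if i < next_free:
            continue  # this trigger is overridden by an earlier window
        out[i:i + window_size] = arr[i]
        next_free = i + window_size
    return out

arr = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0]
signal = window_signal(arr, 3)
print(signal)
```

The Python-level loop now runs once per trigger, which may help when triggers are sparse. Unlike the original loop, this sketch also handles a trigger in the very last position.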

Related

Integers won't append to my array for grid - python

Recently, for a school project, I've been making a "Treasure hunt" game in Python where the player finds treasure and bandits on a grid. I have a way to build the grid at a fixed size but, for extra points, we are asked to make the size of the grid, the number of chests and the number of bandits configurable.
Here is the code for my grid maker. It builds the "playergrid" array, but it won't build the "grid" array:
def gridmaker(gridsize, debug):
  global grid
  global playergrid
  gridinator = 1
  grid = [[0]]
  playergrid = [[" "]]
  if debug == 1:
    while gridinator <= gridsize:
      grid[gridinator].append(0)
      gridinator = gridinator + 1
    gridinator = 1
  else:
    while gridinator <= gridsize:
      playergrid[0].append(gridinator)
      gridinator = gridinator + 1
    gridinator = 1
  while gridinator <= gridsize:
    if debug == 1:
      grid.append([0])
      for i in range(gridsize):
        grid[gridinator].append(0)
    else:
      playergrid.append([gridinator])
      for i in range(gridsize):
        playergrid[gridinator].append("#")
    gridinator = gridinator+1
  if debug == 1:
    grid[1][1] = 1
  else:
    playergrid[1][1] = "P"
gridmaker(9, 1)
for row in grid:
  print(" ".join(map(str,row)))
Sorry if it is formatted differently as there are 2 space tabs rather than 4, it works best on repl.it
print(grid) should return a grid like this:
0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
Please let me know,
Thanks!
You have to remember that lists are 0-indexed.
Which means that to access the 1st element of the grid list you would use the index 0.
With grid = [[0]] you create a list with one item (you can get that item with grid[0]), which is a list whose 1st item (grid[0][0]) is 0.
But your gridinator's starting value is 1. So when your first append runs:
grid[gridinator].append(0)
it tries to access the 2nd element of grid:
grid[1].append(0)
This raises an IndexError because, as the traceback should tell you*, the list index is out of range.
You can try this yourself:
grid = [[0]]
grid[0]
grid[1]
One possible solution is to start gridinator at 0 and use a strict less-than instead of the less-than-or-equal in gridinator <= gridsize (because grid[8] is already the 9th element of the grid).
*Please remember to include the traceback for errors in the future. They really help both yourself and the people trying to help you.
Let me know if this helps, or if I should find another way to explain it.
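To illustrate the 0-based indexing described above, here is a minimal sketch that builds the square grid with list comprehensions instead of index-by-index appends. The helper name make_grid and its fill parameter are hypothetical; the asker's gridmaker also handles a separate playergrid, which this sketch leaves out.

```python
def make_grid(gridsize, fill=0):
    """Return a gridsize x gridsize list of lists, each row a separate list."""
    return [[fill for _ in range(gridsize)] for _ in range(gridsize)]

grid = make_grid(9)
grid[1][1] = 1  # second row, second column, because indices start at 0

for row in grid:
    print(" ".join(map(str, row)))
```

Because every row is created inside the comprehension, no index is ever read before it exists, so the IndexError cannot occur.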

Series calculation based on shifted values / recursive algorithm

I have the following:
df['PositionLong'] = 0
df['PositionLong'] = np.where(df['Alpha'] == 1, 1, (np.where(np.logical_and(df['PositionLong'].shift(1) == 1, df['Bravo'] == 1), 1, 0)))
This line basically only takes df['Alpha'] into account, not df['PositionLong'].shift(1). It does not seem to recognize the shifted column, and I don't understand why.
It produces this:
df['Alpha'] df['Bravo'] df['PositionLong']
0 0 0
1 1 1
0 1 0
1 1 1
1 1 1
However what I wanted the code to do is this:
df['Alpha'] df['Bravo'] df['PositionLong']
0 0 0
1 1 1
0 1 1
1 1 1
1 1 1
I believe the solution is to loop each row, but this will take very long.
Can you help me please?
You are looking for a recursive algorithm, since each PositionLong value depends on the previous PositionLong value as well as on Alpha and Bravo.
But numpy.where is a regular function: its arguments are evaluated in full before it is called, so df['PositionLong'].shift(1) is computed once from the column you initialised with 0 and never contains a 1.
A manual loop need not be expensive. You can use numba to efficiently implement your recursive algorithm:
import numpy as np
from numba import njit
@njit
def rec_algo(alpha, bravo):
    res = np.empty(alpha.shape)
    # The first row has no previous value, so it depends on alpha only.
    res[0] = 1 if alpha[0] == 1 else 0
    for i in range(1, len(res)):
        if (alpha[i] == 1) or ((res[i-1] == 1) and bravo[i] == 1):
            res[i] = 1
        else:
            res[i] = 0
    return res
df['PositionLong'] = rec_algo(df['Alpha'].values, df['Bravo'].values).astype(int)
Result:
print(df)
Alpha Bravo PositionLong
0 0 0 0
1 1 1 1
2 0 1 1
3 1 1 1
4 1 1 1
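For this particular rule the recursion can also be expressed without an explicit loop, as a forward fill. The sketch below assumes the rule is exactly the one in rec_algo: a row with Alpha == 1 sets the state to 1, a row where neither Alpha nor Bravo is 1 resets it to 0, and any other row carries the previous state forward. The function name position_long is hypothetical.

```python
import numpy as np
import pandas as pd

def position_long(alpha, bravo):
    """Loop-free version of the set / reset / carry rule."""
    alpha = np.asarray(alpha)
    bravo = np.asarray(bravo)
    # 1.0 where alpha sets the state, 0.0 where bravo fails to sustain it,
    # NaN where the previous state should be carried forward.
    state = np.select([alpha == 1, bravo != 1], [1.0, 0.0], default=np.nan)
    # Forward-fill the carried rows; a leading carry has no previous state, so use 0.
    return pd.Series(state).ffill().fillna(0).astype(int).to_numpy()

df = pd.DataFrame({"Alpha": [0, 1, 0, 1, 1], "Bravo": [0, 1, 1, 1, 1]})
df["PositionLong"] = position_long(df["Alpha"], df["Bravo"])
print(df)
```

This only works because the state takes one of two fixed values; a recursion whose next value is computed arithmetically from the previous one would still need a loop such as the numba version.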

Python numpy zeros array being assigned 1 for every value when only one index is updated

The following is my code:
amount_features = X.shape[1]
best_features = np.zeros((amount_features,), dtype=int)
best_accuracy = 0
best_accuracy_index = 0
def find_best_features(best_features, best_accuracy):
    for i in range(amount_features):
        trial_features = best_features
        trial_features[i] = 1
        svc = SVC(C = 10, gamma = .1)
        svc.fit(X_train[:,trial_features==1],y_train)
        y_pred = svc.predict(X_test[:,trial_features==1])
        accuracy = metrics.accuracy_score(y_test,y_pred)
        if (accuracy > best_accuracy):
            best_accuracy = accuracy
            best_accuracy_index = i
    print(best_accuracy_index)
    best_features[best_accuracy_index] = 1
    return best_features, best_accuracy
bf, ba = find_best_features(best_features, best_accuracy)
print(bf, ba)
And this is my output:
25
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1] 0.865853658537
And my expected output:
25
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0] 0.865853658537
I am trying to update the zeros array with the index that gives the highest accuracy. As you see it should be index 25, and I follow that by assigning the 25 index for my array equal to 1. However, when I print the array it shows every index has been updated to 1.
Not sure what is the mishap. Thanks for spending your limited time on Earth to help me.
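Change trial_features = best_features to trial_features = numpy.copy(best_features). Plain assignment does not copy the array; it only binds a second name to the same object, so every trial_features[i] = 1 also writes into best_features. The reasoning behind the change was already given by @Michael Butscher.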
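The difference between binding a second name and making a copy can be seen in a small, self-contained sketch. The array length and the names alias and trial are chosen only for illustration.

```python
import numpy as np

best_features = np.zeros(5, dtype=int)

# Plain assignment: both names refer to the same array object.
alias = best_features
alias[0] = 1
print(best_features)  # the write through `alias` is visible here

# Explicit copy: the trial array is independent of the original.
trial = best_features.copy()
trial[1] = 1
print(best_features)  # unchanged by the write to `trial`
print(trial)
```

Inside the loop from the question, taking a fresh copy at the start of each iteration keeps earlier trials from leaking into later ones.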

How do I add to a grid coordinate in python?

What I'm trying to do is have a 2D array and for every coordinate in the array, ask all the other 8 coordinates around it if they have stored a 1 or a 0. Similar to a minesweeper looking for mines.
I used to have this:
grid = []
for fila in range(10):
    grid.append([])
    for columna in range(10):
        grid[fila].append(0)
#edited
for fila in range(10):
    for columna in range(10):
        neighbour = 0
        for i in range(10):
            for j in range(10):
                if grid[fila + i][columna + j] == 1:
                    neighbour += 1
But something didn't work well. I also added print statements to try to find the error, but I still didn't understand why it only ran half of the for loop. So I changed the second for loop to this:
#edited
for fila in range (10):
    for columna in range (10):
        neighbour = 0
        if grid[fila - 1][columna - 1] == 1:
            neighbour += 1
        if grid[fila - 1][columna] == 1:
            neighbour += 1
        if grid[fila - 1][columna + 1] == 1:
            neighbour += 1
        if grid[fila][columna - 1] == 1:
            neighbour += 1
        if grid[fila][columna + 1] == 1:
            neighbour += 1
        if grid[fila + 1][columna - 1] == 1:
            neighbour += 1
        if grid[fila + 1][columna] == 1:
            neighbour += 1
        if grid[fila + 1][columna + 1] == 1:
            neighbour += 1
And got this error:
if grid[fila - 1][columna + 1] == 1:
IndexError: list index out of range
It seems like I can't add on the grid coordinates but I can subtract. Why is that?
Valid indices in Python run from -len(grid) to len(grid)-1. Positive indices count from the front, negative ones from the rear. Adding gives an IndexError as soon as the index exceeds len(grid)-1, which is what you see. Subtracting does not raise an error unless the index drops below -len(grid). You never check the lower bound, which is 0, but it seems to work because small negative indices silently return values from the rear end. That is a silent error leading to wrong neighbourhood results.
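A short sketch of that asymmetry, using a three-element list chosen purely for illustration:

```python
row = [10, 20, 30]

# Negative indices count from the end, so they succeed silently.
last = row[-1]          # same element as row[len(row) - 1]
first_from_rear = row[-3]

# Indices outside -len(row) .. len(row) - 1 raise IndexError.
errors = []
for bad_index in (3, -4):
    try:
        row[bad_index]
    except IndexError:
        errors.append(bad_index)

print(last, first_from_rear, errors)
```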
If you are computing offsets, you need to make sure your offsets are within the bounds of the lists you have. So if you have 10 elements, don't try to access the 11th element.
import collections
grid_offset = collections.namedtuple('grid_offset', 'dr dc')
Grid = [[0 for c in range(10)] for r in range(10)]
Grid_height = len(Grid)
Grid_width = len(Grid[0])
Neighbors = [
    grid_offset(dr, dc)
    for dr in range(-1, 2)
    for dc in range(-1, 2)
    if not dr == dc == 0
]
def count_neighbors(row, col):
    count = 0
    for nb in Neighbors:
        r = row + nb.dr
        c = col + nb.dc
        if 0 <= r < Grid_height and 0 <= c < Grid_width:
            # Add the value, or just add one?
            count += Grid[r][c]
    return count
Grid[4][6] = 1
Grid[5][4] = 1
Grid[5][5] = 1
for row in range(10):
    for col in range(10):
        print(count_neighbors(row, col), "", end='')
    print()
Prints:
$ python test.py
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 0 0
0 0 0 1 2 3 1 1 0 0
0 0 0 1 1 2 2 1 0 0
0 0 0 1 2 2 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
The error is exactly what it says, you need to check if the coordinates fit within the grid:
0 <= i < 10 and 0 <= j < 10
Otherwise you're trying to access an element that doesn't exist in memory, or an element that's not the one you're actually thinking about - Python handles negative indexes, they're counted from the end.
E.g. a[-1] is the last element, exactly the same as a[len(a) - 1].
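If NumPy is an option, the bounds check can be avoided by padding the grid with a border of zeros and summing the eight shifted views. This is a sketch under that assumption, not what either answer proposes; the function name neighbor_counts is hypothetical.

```python
import numpy as np

def neighbor_counts(grid):
    """Count the 8 neighbours of every cell; cells outside the grid count as 0."""
    grid = np.asarray(grid)
    rows, cols = grid.shape
    # A one-cell border of zeros means no index ever leaves the padded array.
    padded = np.pad(grid, 1, mode="constant", constant_values=0)
    total = np.zeros_like(grid)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            total += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return total

grid = np.zeros((10, 10), dtype=int)
grid[4, 6] = 1
grid[5, 4] = 1
grid[5, 5] = 1
counts = neighbor_counts(grid)
print(counts)
```

The padding plays the role of the 0 <= i < 10 and 0 <= j < 10 check: out-of-grid neighbours exist in the padded array but always contribute 0, so nothing wraps around to the opposite edge.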

Unable to retrieve required indices from multiple NumPy arrays

I have 4 NumPy arrays of the same 2-D shape. I need the indices of the last array (d) where its elements are smaller than 20, restricted to the region where the elements of array a are 1 and the elements of arrays b and c are not 1.
I tried as follows:
mask = (a == 1)|(b != 1)|(c != 1)
answer = d[mask | d < 20]
Now, I have to set those regions of d to 1 and all other regions of d to 0.
d[answer] = 1
d[d!=1] = 0
print(d)
I could not solve this problem. How do you solve it?
import numpy as np
a = np.array([[0,0,0,1,1,1,1,1,0,0,0],
[0,0,0,1,1,1,1,1,0,0,0],
[0,0,0,1,1,1,1,1,0,0,0],
[0,0,0,1,1,1,1,1,0,0,0],
[0,0,0,1,1,1,1,1,0,0,0],
[0,0,0,1,1,1,1,1,0,0,0]])
b = np.array([[0,0,0,1,1,0,0,0,0,0,0],
[0,0,0,0,0,0,1,1,0,0,0],
[0,0,0,1,0,1,0,0,0,0,0],
[0,0,0,1,1,1,0,1,0,0,0],
[0,0,0,0,0,0,1,0,0,0,0],
[0,0,0,0,1,0,1,0,0,0,0]])
c = np.array([[0,0,0,0,0,0,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,1,1,0,0,0],
[0,0,0,0,0,0,1,0,0,0,0],
[0,0,0,0,1,0,0,0,0,0,0],
[0,0,0,0,0,1,0,0,0,0,0]])
d = np.array([[0,56,89,67,12,28,11,12,14,8,240],
[1,57,89,67,18,25,11,12,14,9,230],
[4,51,89,87,19,20,51,92,54,7,210],
[6,46,89,67,51,35,11,12,14,6,200],
[8,36,89,97,43,67,81,42,14,1,220],
[9,16,89,67,49,97,11,12,14,2,255]])
The conditions should be AND-ed together, instead of OR-ed. You can first get the Boolean array / mask representing desired region, and then modify d based on it:
mask = (a == 1) & (b != 1) & (c != 1) & (d < 20)
d[mask] = 1
d[~mask] = 0
print(d)
Output:
[[0 0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0 0]]
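Two details are easy to miss here: the comparison needs its own parentheses, and the positions themselves can be listed with np.argwhere. The sketch below uses small 1-D arrays invented for illustration rather than the arrays from the question.

```python
import numpy as np

a = np.array([1, 1, 0, 1])
b = np.array([0, 1, 0, 0])
c = np.array([0, 0, 0, 0])
d = np.array([5, 7, 3, 40])

region = (a == 1) & (b != 1) & (c != 1)

# `&` and `|` bind tighter than `<`, so without parentheses this is
# parsed as (region & d) < 20 rather than region & (d < 20).
unparenthesized = region & d < 20
mask = region & (d < 20)

indices = np.argwhere(mask)   # one row of coordinates per selected element
result = mask.astype(int)     # 1 inside the selected region, 0 elsewhere

print(unparenthesized, mask, indices, result)
```

Converting the mask with astype(int) gives the same 1/0 array as the two in-place assignments, without modifying d.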
