I am new to pyevolve and GA in Python.
I am trying to make a 2D binary array representing matches. Something like this:
   A  B  C
1  1  0  0
2  0  1  0
3  1  0  0
4  0  0  1
My goal is to have at most one "1" in each row, and the total number of "1"s in the array should equal the number of rows. One number can be matched with only one letter, but a letter can be matched with multiple numbers.
I wrote this code in the evaluation function:
def eval_func(chromosome):
    score = 0.0
    num_of_rows = chromosome.getHeight()
    num_of_cols = chromosome.getWidth()
    # create 2 lists: one with the sums of each row and one
    # with the sums of each column
    row_sums = [sum(chromosome[i]) for i in xrange(num_of_rows)]
    col_sums = [sum(x) for x in zip(*chromosome)]
    # if the sum of "1"s in a row is > 1, then a number (1,2,3,4) is matched with
    # more than one letter. We want to avoid that.
    for row_sum in row_sums:
        if row_sum <= 1:
            score += 0.5
        else:
            score -= 1.0
    # col_sum is actually the total number of "1"s in the array
    col_sum = sum(col_sums)
    # if all the numbers are matched we increase the score
    if col_sum == num_of_rows:
        score += 0.5
    if score < 0:
        score = 0.0
    return score
This seems to work, but when I add some other checks, e.g. "if 1 is in A, 2 cannot be in C", it fails.
How can this be made to work when there are many such checks?
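For concreteness, a sketch of the kind of extra check I mean, written as a separate penalty function (this assumes columns A, B, C map to indices 0, 1, 2, rows 1-4 map to indices 0-3, and that chromosome[i][j] indexing works as in the row sums above):

def pairwise_penalty(chromosome):
    # illustrative only: penalise "1 is matched with A while 2 is matched with C"
    penalty = 0.0
    if chromosome[0][0] == 1 and chromosome[1][2] == 1:
        penalty += 1.0
    # further checks of the same shape could be added here
    return penalty

The result would be subtracted from score inside eval_func before the final clamp to 0.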
Thanks in advance.
Related
I am trying to check whether any numbers are duplicated in a 9x9 array, excluding all 0s, since those are the ones I will solve later. I have a 9x9 array and would like to check for duplicates in the rows and columns, considering only the numbers 1 to 9 and ignoring 0. An example input array would be:
[[1 0 0 7 0 0 0 0 0]
[0 3 2 0 0 0 0 0 0]
[0 0 0 6 0 0 0 0 0]
[0 8 0 0 0 2 0 7 0]
[5 0 7 0 0 1 0 0 0]
[0 0 0 0 0 3 6 1 0]
[7 0 0 0 0 0 2 0 9]
[0 0 0 0 5 0 0 0 0]
[3 0 0 0 0 4 0 0 5]]
Here is where I am currently with my code for this:
# Checking Columns
for c in range(9):
    line = test[:, c]
    print(np.unique(line).shape == line.shape)

# Checking Rows
for r in range(9):
    line = test[r, :]
    print(np.unique(line).shape == line.shape)
Then I would like to do exactly the same for the 3x3 sub-arrays in the 9x9 array. Again I need to somehow exclude the 0s from the check. Here is the code I currently have:
for r0 in range(3, 9, 3):
    for c0 in range(3, 9, 3):
        test1 = test[:r0, :c0]
        for r in range(3):
            line = test1[r, :]
            print(np.unique(line).shape == line.shape)
        for c in range(3):
            line = test1[:, c]
            print(np.unique(line).shape == line.shape)
I would truly appreciate assistance in this regard.
It sure sounds like you're trying to verify the input of a Sudoku board.
You can extract a box as:
for r0 in range(0, 9, 3):
    for c0 in range(0, 9, 3):
        box = test1[r0:r0+3, c0:c0+3]
        # ... test that np.unique(box) has 9 elements ...
Note that this is only about how to extract the elements of the box. You still haven't done anything about removing the zeros, here or on the rows and columns.
Given a box/row/column, you then want something like:
nonzeros = [x for x in box.flatten() if x != 0]
assert len(nonzeros) == len(set(nonzeros))
There may be a more numpy-friendly way to do this, but this should be fast enough.
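For instance, a minimal sketch that applies this check to every row, column, and box (test is assumed to already be the 9x9 NumPy array from the question; the helper name is made up):

def has_no_duplicates(values):
    # ignore zeros, then compare the length with and without duplicates
    nonzeros = [x for x in values.flatten() if x != 0]
    return len(nonzeros) == len(set(nonzeros))

ok = True
for i in range(9):
    ok = ok and has_no_duplicates(test[i, :])   # row i
    ok = ok and has_no_duplicates(test[:, i])   # column i
for r0 in range(0, 9, 3):
    for c0 in range(0, 9, 3):
        ok = ok and has_no_duplicates(test[r0:r0+3, c0:c0+3])
print(ok)   # True when no row, column or box has a duplicate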
Excluding zeros is fairly straightforward by masking the array:
test = np.array(test)
non_zero_mask = (test != 0)
At this point you can either check the whole matrix for uniqueness
np.unique(test[non_zero_mask])
or you can do it for individual rows/columns
non_zero_row_0 = test[0, non_zero_mask[0]]
unique_0 = np.unique(non_zero_row_0)
You can put the logic above into a loop to get the behavior you want.
As for the 3x3 subarrays, you can loop through them as you did in your example.
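For example, a sketch of the full row/column/box check written with the mask (names other than test are illustrative):

non_zero_mask = (test != 0)

def unique_nonzero(values, mask):
    # True when the non-zero entries of `values` are all different
    nonzero_vals = values[mask]
    return np.unique(nonzero_vals).size == nonzero_vals.size

rows_ok = all(unique_nonzero(test[r], non_zero_mask[r]) for r in range(9))
cols_ok = all(unique_nonzero(test[:, c], non_zero_mask[:, c]) for c in range(9))
boxes_ok = all(unique_nonzero(test[r0:r0+3, c0:c0+3], non_zero_mask[r0:r0+3, c0:c0+3])
               for r0 in range(0, 9, 3) for c0 in range(0, 9, 3))
print(rows_ok and cols_ok and boxes_ok)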
When you have a small collection of things (small being <=64 or 128, depending on architecture), you can turn it into a set using bits. So for example:
bits = ((2**board) >> 1).astype(np.uint16)
Notice that you have to use right shift after the fact rather than pre-subtracting 1 from board to cleanly handle zeros.
You can now compute three types of sets. Each set is the bitwise OR of bits in a particular arrangement. For this example you can use sum just the same: when the digits in a group are all distinct, their bits don't overlap, so the sum equals the OR, and any duplicate causes a carry that lowers the bit count:
rows = bits.sum(axis=1)
cols = bits.sum(axis=0)
blocks = bits.reshape(3, 3, 3, 3).sum(axis=(1, 3))
Now all you have to do is compare the bit counts of each number to the number of non-zero elements. They will be equal if and only if there are no duplicates. Duplicates will cause the bit count to be smaller.
There are pretty efficient algorithms for counting bits, especially for something as small as a uint16. Here is an example: How to count the number of set bits in a 32-bit integer?. I've adapted it for the smaller size and numpy here:
def count_bits16(arr):
    # classic bit-twiddling popcount, adapted to 16-bit values and numpy
    arr = arr.astype(np.uint16)
    count = arr - ((arr >> 1) & 0x5555)                  # pairwise bit counts
    count = (count & 0x3333) + ((count >> 2) & 0x3333)   # nibble counts
    count = (count + (count >> 4)) & 0x0F0F              # byte counts
    return (count + (count >> 8)) & 0x1F                 # total per element
This is the count of unique elements for each of the configurations. You need to compare it to the number of non-zero elements. The following boolean will tell you if the board is valid:
(count_bits16(rows) == np.count_nonzero(board, axis=1)).all() and \
(count_bits16(cols) == np.count_nonzero(board, axis=0)).all() and \
(count_bits16(blocks) == np.count_nonzero(board.reshape(3, 3, 3, 3), axis=(1, 3))).all()
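As a quick illustration of why duplicates lower the bit count, using count_bits16 from above (the row values here are made up):

row = np.array([5, 0, 3, 0, 5, 0, 0, 1, 0])     # the digit 5 appears twice
bits = ((2**row) >> 1).astype(np.uint16)        # 5 -> bit 4, 3 -> bit 2, 1 -> bit 0
set_bits = bits.sum()                           # 16 + 4 + 16 + 1 = 37 = 0b100101
print(count_bits16(np.array([set_bits])))       # [3] distinct digits
print(np.count_nonzero(row))                    # 4 non-zero entries -> 3 != 4, invalid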
I'm trying to create a trend streak column from a pandas DataFrame whose win column holds 1, -1, 0 (win/loss/no movement). I'm looking for the streak to increase while it stays positive, reset on 0, and reset and build a negative streak on -1. The desired result would be something like this:
win  streak
  0       0
  1       1
  1       2
  1       3
  1       4
  0       0
  0       0
 -1      -1
 -1      -2
  1       1
Currently I have this, which creates the win column:
dataframe.loc[dataframe['close'] > dataframe['close_1h'].shift(1), 'win'] = 1
dataframe.loc[dataframe['close'] < dataframe['close_1h'].shift(1), 'win'] = -1
dataframe.loc[dataframe['close'] == dataframe['close_1h'].shift(1), 'win'] = 0
dataframe['streak'] = numpy.nan_to_num(dataframe['win'].cumsum())
But that doesn't reset the streaks the way I would like. I've also played around with groupby, using dataframe['streak'] = dataframe.groupby([(dataframe['win'] != dataframe['win'].shift()).cumsum()]), but that gave me "ValueError: Length of values (927) does not match length of index (1631)".
try this:
df['streak'] = df.groupby(df['win'].diff().ne(0).cumsum())['win'].cumsum()
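For reference, a small sketch of what the grouping key does on the example data (df here is just a stand-in for the question's dataframe):

import pandas as pd

df = pd.DataFrame({'win': [0, 1, 1, 1, 1, 0, 0, -1, -1, 1]})

# diff().ne(0).cumsum() starts a new group id every time 'win' changes value,
# so each run of identical values (a streak) gets its own group
group_id = df['win'].diff().ne(0).cumsum()

# the cumulative sum within each run counts up for 1s, down for -1s,
# and stays at 0 for runs of zeros
df['streak'] = df.groupby(group_id)['win'].cumsum()
print(df)   # reproduces the desired win/streak table from the question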
I have a dataframe of 453627 rows like:
number  action
     1      34
    34       2
    45       1
    42       0
    33       3
     3       4
I need to split it into chunks of 2000 rows each, but if the running sum of the action column reaches 5000 first, I split at that point instead.
For example: if the sum of the action column reaches 5000 at row 1200, split the dataframe at that row; if not, split it at row 2000, and so on.
How can I do this?
Also, how can I read multiple CSV files in a folder, each into an individual dataframe?
I cannot imagine a vectorized way, so I would just iterate the action column to produce a Series with a distinct value per slot.
After that, a mere groupby would be enough to split the initial dataframe:
maxlen = 2000
thresh = 5000
cursum = 0
curlen = 0
curval = 0
arr = df['action'].values
cat = np.zeros(len(arr), int)
for i in range(len(arr)):
    curlen += 1
    cursum += arr[i]
    if curlen != 1 and (curlen >= maxlen or cursum >= thresh):
        cursum = 0
        curlen = 0
        curval += 1
    cat[i] = curval
cat = pd.Series(cat, df.index)
dfs = [dg for _, dg in df.groupby(cat)]
dfs contains the list of the split dataframes.
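For illustration, a minimal sketch that packages the same idea as a function and runs it on a tiny frame (the toy limits replace 2000/5000; in this variant the row that hits a limit closes the current chunk rather than starting the next one):

import numpy as np
import pandas as pd

def split_frame(df, maxlen=2000, thresh=5000):
    # give each row a chunk id; start a new chunk once the current chunk
    # holds maxlen rows or its 'action' sum has reached thresh
    cat = np.zeros(len(df), int)
    cursum = curlen = curval = 0
    for i, a in enumerate(df['action'].to_numpy()):
        curlen += 1
        cursum += a
        cat[i] = curval
        if curlen >= maxlen or cursum >= thresh:
            cursum = curlen = 0
            curval += 1
    return [part for _, part in df.groupby(pd.Series(cat, df.index))]

demo = pd.DataFrame({'number': range(8), 'action': [1, 2, 3, 9, 1, 1, 1, 1]})
for part in split_frame(demo, maxlen=3, thresh=10):
    print(len(part), int(part['action'].sum()))   # 3 6 / 2 10 / 3 3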
I need to write code to create a new column B containing the cumulative sum of the first column A.
When the cumulative sum goes below 0, the value in B should be 0.
Then the cumulative sum starts again from there until it next goes below 0.
I searched for similar answers but was not able to find one fitting my case. Thanks for your help.
  A     B
  1     1
  3     4
  5     9
  7    16
 -6    10
 -8     2
-10   *0*
  6     6
-15   *0*
 11    11
Set up a loop over A and keep a running total. If the total is less than 0, set it to 0. Then append the new total to B.
Starting with A = [1, ...], total = 0, B = []:
total = 0
B = []
for i in range(len(A)):
    # process the sum
    total += A[i]
    if total < 0:
        total = 0
    B.append(total)
Here is a non-vectorized answer that iteratively loops through the values in column A and creates column B, never letting the running sum go below 0.
result = []
cur_res = 0
for i in df.A:
    cur_res = max(cur_res + i, 0)
    result.append(cur_res)
df['B'] = result
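A slightly more compact variant of the same running-max idea, using itertools.accumulate (this assumes the column is named A, as in the example):

from itertools import accumulate

import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7, -6, -8, -10, 6, -15, 11]})

# carry the clamped running total forward; initial=0 seeds it so that a
# negative first value is also clamped to 0 (then drop that seed value)
df['B'] = list(accumulate(df['A'], lambda total, x: max(total + x, 0), initial=0))[1:]
print(df)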
I have a data frame that looks like this, but with several hundred thousand rows:
df
   D         x          y
0  y  5.887672   6.284714
1  y  9.038657  10.972742
2  n  2.820448   6.954992
3  y  5.319575  15.475197
4  n  1.647302   7.941926
5  n  5.825357  13.747091
6  n  5.937630   6.435687
7  y  7.789661  11.868023
8  n  2.669362  11.300062
9  y  1.153347  17.625158
I want to know what proportion of values ("D") in each x:y grid space is "n".
I can do it by brute force, by stepping through x and y and calculating the percentage:
zonexy = {}
for x in np.arange(0, 10, 2.5):
    dfx = df[(df['x'] >= x) & (df['x'] < x + 2.5)]
    zonexy[x] = {}
    for y in np.arange(0, 24, 6):
        dfy = dfx[(dfx['y'] >= y) & (dfx['y'] < y + 6)]
        try:
            pctn = len(dfy[dfy['D'] == 'n']) / len(dfy) * 100.0
        except ZeroDivisionError:
            pctn = 0
        zonexy[x][y] = pctn
Output:
pd.DataFrame(zonexy)
     0.0  2.5  5.0  7.5
0      0    0    0    0
6    100  100   50    0
12     0    0   50    0
18     0    0    0    0
But this, and all the variations on this theme that I've tried, is very slow. It seems like there should be a much more efficient way (probably via numpy), but I'm blanking on it.
One way would be to use the 2D histogram function of numpy:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html
Then:
1. Run it once on the data where the criterion is matched (here, "D" is "n").
2. Run it again on all of the data.
3. Divide the first result, element-by-element, by the second result.
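A sketch of those three steps, assuming the frame and column names from the question (df, 'D', 'x', 'y') and bin edges matching the np.arange ranges used in the loop:

import numpy as np
import pandas as pd

x_edges = np.arange(0, 12.5, 2.5)    # 0, 2.5, 5, 7.5, 10
y_edges = np.arange(0, 30, 6)        # 0, 6, 12, 18, 24

is_n = df['D'] == 'n'
n_counts, _, _ = np.histogram2d(df.loc[is_n, 'x'], df.loc[is_n, 'y'],
                                bins=[x_edges, y_edges])
all_counts, _, _ = np.histogram2d(df['x'], df['y'], bins=[x_edges, y_edges])

# percentage of "n" per cell; empty cells become 0 instead of NaN
with np.errstate(invalid='ignore', divide='ignore'):
    pct_n = np.where(all_counts > 0, 100.0 * n_counts / all_counts, 0.0)

# transpose so that rows are y bins and columns are x bins, as in the loop output
result = pd.DataFrame(pct_n.T, index=y_edges[:-1], columns=x_edges[:-1])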