Why won't second for loop execute correctly? - python

I'm trying to write two for loops that will return a score for different inputs, and create a new field with the new score. The first loop works fine but the second loop never returns the correct score.
import pandas as pd
d = {'a':['foo','bar'], 'b':[1,3]}
df = pd.DataFrame(d)
score1 = df.loc[df['a'] == 'foo']
score2 = df.loc[df['a'] == 'bar']
for i in score1['b']:
    if i < 3:
        score1['c'] = 0
    elif i <= 3 and i < 4:
        score1['c'] = 1
    elif i >= 4 and i < 5:
        score1['c'] = 2
    elif i >= 5 and i < 8:
        score1['c'] = 3
    elif i == 8:
        score1['c'] = 4
for j in score2['b']:
    if j < 2:
        score2['c'] = 0
    elif j <= 2 and i < 4:
        score2['c'] = 1
    elif j >= 4 and i < 6:
        score2['c'] = 2
    elif j >= 6 and i < 8:
        score2['c'] = 3
    elif j == 8:
        score2['c'] = 4
print(score1)
print(score2)
When I run the script, it returns the following:
print(score1)
a b c
0 foo 1 0
print(score2)
a b
1 bar 3
Why doesn't score2 create the new field "c" or a score?

Avoid for loops for conditionally updating DataFrame columns: a column is a pandas Series, not a Python list. Use the vectorized methods of pandas and NumPy, such as numpy.select, which scale to millions of rows. Remember that these data science tools compute very differently from general-purpose Python:
import numpy
# LIST OF BOOLEAN CONDITIONS (inclusive="left" needs pandas 1.3 or newer)
conds = [
    score1['b'].lt(3), # EQUIVALENT TO < 3
    score1['b'].between(3, 4, inclusive="left"), # EQUIVALENT TO >= 3 and < 4
    score1['b'].between(4, 5, inclusive="left"), # EQUIVALENT TO >= 4 and < 5
    score1['b'].between(5, 8, inclusive="left"), # EQUIVALENT TO >= 5 and < 8
    score1['b'].eq(8) # EQUIVALENT TO == 8
]
# LIST OF VALUES
vals = [0, 1, 2, 3, 4]
# VECTORIZED ASSIGNMENT
score1['c'] = numpy.select(conds, vals, default=numpy.nan)
# LIST OF BOOLEAN CONDITIONS
conds = [
    score2['b'].lt(2),
    score2['b'].between(2, 4, inclusive="left"),
    score2['b'].between(4, 6, inclusive="left"),
    score2['b'].between(6, 8, inclusive="left"),
    score2['b'].eq(8)
]
# LIST OF VALUES
vals = [0, 1, 2, 3, 4]
# VECTORIZED ASSIGNMENT
score2['c'] = numpy.select(conds, vals, default=numpy.nan)
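The two blocks above differ only in their thresholds, so they can probably be folded into one helper. The following is a minimal sketch, not the answerer's code: the function name band_scores and the edges tuples are made up for illustration, and plain comparisons stand in for between() so it does not depend on the pandas version. The .copy() calls are there to avoid pandas' SettingWithCopyWarning when adding a column to a filtered slice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 3]})

def band_scores(b, edges):
    """Map a numeric Series to scores 0..4 using four thresholds (hypothetical helper)."""
    e0, e1, e2, e3 = edges
    conds = [
        b < e0,
        (b >= e0) & (b < e1),
        (b >= e1) & (b < e2),
        (b >= e2) & (b < e3),
        b == e3,
    ]
    # Rows matching no condition get NaN, as in the answer above.
    return np.select(conds, [0, 1, 2, 3, 4], default=np.nan)

# Copy the filtered rows so that adding a column does not touch a view of df.
score1 = df.loc[df['a'] == 'foo'].copy()
score2 = df.loc[df['a'] == 'bar'].copy()

score1['c'] = band_scores(score1['b'], (3, 4, 5, 8))
score2['c'] = band_scores(score2['b'], (2, 4, 6, 8))

print(score1)
print(score2)
```

With the question's data, 'foo' (b = 1) lands in the first band and 'bar' (b = 3) in the second, so both frames get a 'c' column.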

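On the first iteration of the second for loop, j is 3, so none of your conditions is satisfied and score2['c'] is never assigned. Note also that the second loop's conditions test i, which is left over from the first loop, where j was intended. With the thresholds shifted and j used throughout:
for j in score2['b']:
    if j < 3:
        score2['c'] = 0
    elif j <= 3 and j < 5:
        score2['c'] = 1
    elif j >= 5 and j < 7:
        score2['c'] = 2
    elif j >= 7 and j < 9:
        score2['c'] = 3
    elif j == 9:
        score2['c'] = 4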

Related

Matplotlib and Pandas Plotting amount of numbers in certain range

I have a pandas DataFrame that looks like this:
I want to create this kind of plot for every year [1...10], with a Score range of [1...10].
This means that for every year, the plot will show:
how many values in year 1 fall in [0-1]
how many values in year 1 fall in [2-3]
how many values in year 1 fall in [4-5]
...
how many values in year 10 fall in [6-7]
how many values in year 10 fall in [8-9]
how many values in year 10 fall in [10]
I need some help, thank you!
The following code works perfectly:
import matplotlib.pyplot as plt
import seaborn as sns

def visualize_yearly_score_distribution(ds, year):
    sns.set_theme(style="ticks")
    first_range = 0
    second_range = 0
    third_range = 0
    fourth_range = 0
    fifth_range = 0
    six_range = 0
    seven_range = 0
    eight_range = 0
    nine_range = 0
    last_range = 0
    score_list = []
    # Count the scores of the requested year per range.
    # The strict comparisons skip whole-number scores (0, 1, 2, ...).
    for index, row in ds.iterrows():
        if row['Publish Date'] == year:
            if 0 < row['Score'] < 1:
                first_range += 1
            if 1 < row['Score'] < 2:
                second_range += 1
            if 2 < row['Score'] < 3:
                third_range += 1
            if 3 < row['Score'] < 4:
                fourth_range += 1
            if 4 < row['Score'] < 5:
                fifth_range += 1
            if 5 < row['Score'] < 6:
                six_range += 1
            if 6 < row['Score'] < 7:
                seven_range += 1
            if 7 < row['Score'] < 8:
                eight_range += 1
            if 8 < row['Score'] < 9:
                nine_range += 1
            if 9 < row['Score'] < 10:
                last_range += 1
    score_list.append(first_range)
    score_list.append(second_range)
    score_list.append(third_range)
    score_list.append(fourth_range)
    score_list.append(fifth_range)
    score_list.append(six_range)
    score_list.append(seven_range)
    score_list.append(eight_range)
    score_list.append(nine_range)
    score_list.append(last_range)
    range_list = ['0-1', '1-2', '2-3', '3-4', '4-5', '5-6', '6-7', '7-8', '8-9', '9-10']
    plt.pie([x*100 for x in score_list], labels=[x for x in range_list], autopct='%0.1f', explode=None)
    plt.title(f"Yearly Score Distribution for {str(year)}")
    plt.tight_layout()
    plt.legend()
    plt.show()
Thank you all for the kind comments :)
This case is closed.
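If only the per-range counts for one year are needed, the row-by-row loop can probably be replaced by a boolean filter plus numpy.histogram. This is a minimal sketch, assuming the 'Publish Date' and 'Score' columns from the code above; the function name count_scores_by_range and the demo frame are made up for illustration. Note that numpy.histogram uses half-open bins [0, 1), [1, 2), ... with the last bin closed, [9, 10], so whole-number scores are counted as well.

```python
import numpy as np
import pandas as pd

def count_scores_by_range(ds, year):
    """Return ten counts for the ranges 0-1, 1-2, ..., 9-10 in the given year."""
    # Keep only the scores published in the requested year.
    scores = ds.loc[ds['Publish Date'] == year, 'Score']
    # Bin edges 0, 1, ..., 10 give ten bins.
    counts, _ = np.histogram(scores, bins=np.arange(11))
    return counts.tolist()

# Hypothetical demo data, only to show the call.
demo = pd.DataFrame({
    'Publish Date': [1, 1, 1, 2],
    'Score': [0.5, 1.0, 9.7, 3.2],
})
year1_counts = count_scores_by_range(demo, 1)
print(year1_counts)
```

The returned list can be passed to plt.pie in place of score_list.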

Python: Add a complex conditional column without for loop

I'm trying to add a "conditional" column to my dataframe. I can do it with a for loop but I understand this is not efficient.
Can my code be simplified and made more efficient?
(I've tried masks but I can't get my head around the syntax as I'm a relative newbie to python).
import pandas as pd
path = (r"C:\Users\chris\Documents\UKHR\PythonSand\PY_Scripts\CleanModules\Racecards")
hist_file = r"\x3RC_trnhist.xlsx"
racecard_path = path + hist_file
df = pd.read_excel(racecard_path)
df["Mask"] = df["HxFPos"].copy()
df["Total"] = df["HxFPos"].copy()
cnt = -1
for trn in df["HxRun"]:
    cnt = cnt + 1
    if df.loc[cnt,"HxFPos"] > 6 or df.loc[cnt,"HxTotalBtn"] > 30:
        df.loc[cnt,"Mask"] = 0
    elif df.loc[cnt,"HxFPos"] < 2 and df.loc[cnt,"HxRun"] < 4 and df.loc[cnt,"HxTotalBtn"] < 10:
        df.loc[cnt,"Mask"] = 1
    elif df.loc[cnt,"HxFPos"] < 4 and df.loc[cnt,"HxRun"] < 9 and df.loc[cnt,"HxTotalBtn"] < 10:
        df.loc[cnt,"Mask"] = 1
    elif df.loc[cnt,"HxFPos"] < 5 and df.loc[cnt,"HxRun"] < 20 and df.loc[cnt,"HxTotalBtn"] < 20:
        df.loc[cnt,"Mask"] = 1
    else:
        df.loc[cnt,"Mask"] = 0
    df.loc[cnt,"Total"] = df.loc[cnt,"Mask"] * df.loc[cnt,"HxFPos"]
df.to_excel(r'C:\Users\chris\Documents\UKHR\PythonSand\PY_Scripts\CleanModules\Racecards\cond_col.xlsx', index = False)
Sample data/output:
HxRun HxFPos HxTotalBtn Mask Total
7 5 8 0 0
13 3 2.75 1 3
12 5 3.75 0 0
11 5 5.75 0 0
11 7 9.25 0 0
11 9 14.5 0 0
10 10 26.75 0 0
8 4 19.5 1 4
8 8 67 0 0
Use df.assign() for a complex vectorized expression
Use vectorized pandas operators and methods, where possible; avoid iterating. You can do a complex vectorized expression/assignment like this with:
.loc[]
df.assign()
or alternatively df.query (if you like SQL syntax)
or if you insist on doing it by iteration (you shouldn't), you never need to use an explicit for-loop with .loc[] as you did, you can use:
df.apply(your_function_or_lambda, axis=1)
or df.iterrows() as a fallback
df.assign() (or df.query) will cause less grief when you have long column names (as you do) that get used repeatedly in a complex expression.
Solution with df.assign()
Rewrite your formula for clarity
When we remove all the unneeded .loc[] calls your formula boils down to:
HxFPos > 6 or HxTotalBtn > 30:
    Mask = 0
HxFPos < 2 and HxRun < 4 and HxTotalBtn < 10:
    Mask = 1
HxFPos < 4 and HxRun < 9 and HxTotalBtn < 10:
    Mask = 1
HxFPos < 5 and HxRun < 20 and HxTotalBtn < 20:
    Mask = 1
else:
    Mask = 0
pandas doesn't have a native case-statement/method.
Renaming your variables HxFPos->f, HxRun->r, HxTotalBtn->btn for clarity:
(f > 6) or (btn > 30):
    Mask = 0
(f < 2) and (r < 4) and (btn < 10):
    Mask = 1
(f < 4) and (r < 9) and (btn < 10):
    Mask = 1
(f < 5) and (r < 20) and (btn < 20):
    Mask = 1
else:
    Mask = 0
So really the whole boolean expression for Mask is gated by (f <= 6) and (btn <= 30). (Actually, your clauses imply you can only have Mask=1 when (f < 5) and (r < 20) and (btn < 20), if you want to optimize further.)
Mask = ((f<= 6) & (btn <= 30)) & ... you_do_the_rest
Vectorize your expressions
So, here's a vectorized rewrite of your first line. Note that comparisons > and < are vectorized, that the vectorized boolean operators are | and & (instead of 'and', 'or'), and you need to parenthesize your comparisons to get the operator precedence right:
>>> (df['HxFPos']>6) | (df['HxTotalBtn']>30)
0 False
1 False
2 False
3 False
4 True
5 True
6 True
7 False
8 True
dtype: bool
Now that output is a logical expression (a vector of 9 bools, one per row); you can use it directly in df.loc[logical_expression_for_row, 'Mask'].
Similarly:
((df['HxFPos']<2) & (df['HxRun']<4)) & (df['HxTotalBtn']<10)
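Putting the pieces together, the whole Mask/Total computation might look like the sketch below. It rebuilds the question's sample rows inline (instead of reading the Excel file) so it can be run as-is; the intermediate names blocked and allowed are made up for illustration. It keeps every clause of the original if/elif chain rather than the shortened form, so it is easy to compare against the loop.

```python
import pandas as pd

# Sample rows copied from the question (Mask and Total are recomputed below).
df = pd.DataFrame({
    "HxRun": [7, 13, 12, 11, 11, 11, 10, 8, 8],
    "HxFPos": [5, 3, 5, 5, 7, 9, 10, 4, 8],
    "HxTotalBtn": [8, 2.75, 3.75, 5.75, 9.25, 14.5, 26.75, 19.5, 67],
})

f, r, btn = df["HxFPos"], df["HxRun"], df["HxTotalBtn"]

# First branch of the loop: these rows are forced to Mask = 0.
blocked = (f > 6) | (btn > 30)

# The three elif branches that set Mask = 1.
allowed = (
    ((f < 2) & (r < 4) & (btn < 10))
    | ((f < 4) & (r < 9) & (btn < 10))
    | ((f < 5) & (r < 20) & (btn < 20))
)

df = df.assign(Mask=(~blocked & allowed).astype(int))
df = df.assign(Total=df["Mask"] * df["HxFPos"])
print(df)
```

On these nine rows the result matches the Mask and Total columns shown in the question's sample output.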
Edit - this is where I found an answer: Pandas conditional creation of a series/dataframe column
by @Hossein-Kalbasi
I've just found an answer - please comment if this is not the most efficient.
df.loc[(((df['HxFPos']<3)&(df['HxRun']<5)|(df['HxRun']>4)&(df['HxFPos']<5)&(df['HxRun']<9)|(df['HxRun']>8)&(df['HxFPos']<6)&(df['HxRun']<30))&(df['HxTotalBtn']<30)), 'Mask'] = 1

How can I replace values in a CSV column from a range?

I am attempting to change the values of two columns in my dataset from specific numeric values (2, 10, 25 etc.) to single values (1, 2, 3 or 4) based on the percentile of the specific value within the dataset.
Using the pandas quantile() function I have got the ranges I wish to replace between, but I haven't figured out a working method to do so.
age1 = datasetNB.Age.quantile(0.25)
age2 = datasetNB.Age.quantile(0.5)
age3 = datasetNB.Age.quantile(0.75)
fare1 = datasetNB.Fare.quantile(0.25)
fare2 = datasetNB.Fare.quantile(0.5)
fare3 = datasetNB.Fare.quantile(0.75)
My current solution attempt for this problem is as follows:
for elem in datasetNB['Age']:
    if elem <= age1:
        datasetNB[elem].replace(to_replace = elem, value = 1)
        print("set to 1")
    elif (elem > age1) & (elem <= age2):
        datasetNB[elem].replace(to_replace = elem, value = 2)
        print("set to 2")
    elif (elem > age2) & (elem <= age3):
        datasetNB[elem].replace(to_replace = elem, value = 3)
        print("set to 3")
    elif elem > age3:
        datasetNB[elem].replace(to_replace = elem, value = 4)
        print("set to 4")
    else:
        pass
for elem in datasetNB['Fare']:
    if elem <= fare1:
        datasetNB[elem] = 1
    elif (elem > fare1) & (elem <= fare2):
        datasetNB[elem] = 2
    elif (elem > fare2) & (elem <= fare3):
        datasetNB[elem] = 3
    elif elem > fare3:
        datasetNB[elem] = 4
    else:
        pass
What should I do to get this working?
pandas already has a function that does this: pandas.qcut.
You can simply do
q_list = [0, 0.25, 0.5, 0.75, 1]
labels = range(1, 5)
df['Age'] = pd.qcut(df['Age'], q_list, labels=labels)
df['Fare'] = pd.qcut(df['Fare'], q_list, labels=labels)
Input
import numpy as np
import pandas as pd
# Generate fake data for the sake of example
df = pd.DataFrame({
    'Age': np.random.randint(10, size=6),
    'Fare': np.random.randint(10, size=6)
})
>>> df
Age Fare
0 1 6
1 8 2
2 0 0
3 1 9
4 9 6
5 2 2
Output
DataFrame after running the above code
>>> df
Age Fare
0 1 3
1 4 1
2 1 1
3 1 4
4 4 3
5 3 1
Note that in your specific case, since you want quartiles, you can just assign q_list = 4.
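To make the quartile shortcut concrete, here is a small sketch that passes q=4 to pd.qcut instead of an explicit list. It hard-codes the six example rows shown above rather than generating random ones, so the result is reproducible; the names age_q and fare_q are made up for illustration.

```python
import pandas as pd

# The example rows from the "Input" table above, fixed instead of random.
df = pd.DataFrame({
    'Age': [1, 8, 0, 1, 9, 2],
    'Fare': [6, 2, 0, 9, 6, 2],
})

labels = [1, 2, 3, 4]

# q=4 asks for quartiles, the same as q_list = [0, 0.25, 0.5, 0.75, 1].
age_q = pd.qcut(df['Age'], q=4, labels=labels).astype(int)
fare_q = pd.qcut(df['Fare'], q=4, labels=labels).astype(int)

df['Age'] = age_q
df['Fare'] = fare_q
print(df)
```

One caveat: if a column has many repeated values, two quartile edges can coincide and pd.qcut raises a ValueError unless duplicates='drop' is passed, in which case fewer labels are needed.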

Can't create a Sudoku level with Python

I'm making a Sudoku game that runs in the terminal with Python, and I can't get it to fill the board with numbers. Here is my code with all functions and the main program. I think the error is in the checker for the number.
import random

def createBoard ():
    rows = 9
    columns = 9
    matrix = []
    for r in range(rows):
        matrix.append([]) # add a list
        for c in range(columns):
            matrix[r].append("")
    return matrix
def printBoard (board):
    for i in range (len(board)):
        print (board[i])
def defineSubMatrix (row, column):
    subMatrix = -1
    if row >= 0 and row <= 2:
        if column >= 0 and column <= 2:
            subMatrix = 0
        if column >= 3 and column <= 5:
            subMatrix = 1
        if column >= 6 and column <= 8:
            subMatrix = 2
    if row >= 3 and row <= 5:
        if column >= 0 and column <= 2:
            subMatrix = 3
        if column >= 3 and column <= 5:
            subMatrix = 4
        if column >= 6 and column <= 8:
            subMatrix = 5
    if row >= 3 and row <= 5:
        if column >= 0 and column <= 2:
            subMatrix = 6
        if column >= 3 and column <= 5:
            subMatrix = 7
        if column >= 6 and column <= 8:
            subMatrix = 8
    return subMatrix
def createLevel (board):
    for i in range (0, 8):
        for j in range (0, 8):
            num = random.randint (1, 9)
            check = checker(board, num, i, j)
            while check == False:
                if check == False:
                    num = random.randint (1, 9)
                    check = checker(board, num, i, j)
                board[i][j] = num
            board[i][j] = num
    return board
def checker (board, num, posX, posY):
    ### ok = True when check == 0
    ok = False
    checkT = 0
    checkR = 0
    checkC = 0
    checkSM = 0
    ###Check row right
    i = posX
    while i + 1 <= 8:
        if board[i][posY] == num:
            checkR += 1
        i = i + 1
    ###Check row left
    i = posX
    while i - 1 >= 0:
        if board[i][posY] == num:
            checkR += 1
        i = i - 1
    ###Check column down
    j = posY
    while j + 1 <= 8:
        if board[posX][j] == num:
            checkC += 1
        j = j + 1
    ###Check column up
    j = posY
    while j - 1 >= 0:
        if board[posX][j] == num:
            checkC += 1
        j = j - 1
    ###Check Submatrix
    subMatrix = defineSubMatrix(posX, posY)
    if subMatrix == 0:
        for i in range (0, 2):
            for j in range (0, 2):
                if board[i][j] == num:
                    checkSM += 1
    if subMatrix == 1:
        for i in range (3, 5):
            for j in range (0, 2):
                if board[i][j] == num:
                    checkSM += 1
    if subMatrix == 2:
        for i in range (6, 8):
            for j in range (0, 2):
                if board[i][j] == num:
                    checkSM += 1
    if subMatrix == 3:
        for i in range (0, 2):
            for j in range (3, 5):
                if board[i][j] == num:
                    checkSM += 1
    if subMatrix == 4:
        for i in range (3, 5):
            for j in range (3, 5):
                if board[i][j] == num:
                    checkSM += 1
    if subMatrix == 5:
        for i in range (6, 8):
            for j in range (3, 5):
                if board[i][j] == num:
                    checkSM += 1
    if subMatrix == 6:
        for i in range (0, 2):
            for j in range (6, 8):
                if board[i][j] == num:
                    checkSM += 1
    if subMatrix == 7:
        for i in range (3, 5):
            for j in range (6, 8):
                if board[i][j] == num:
                    checkSM += 1
    if subMatrix == 8:
        for i in range (6, 8):
            for j in range (6, 8):
                if board[i][j] == num:
                    checkSM += 1
    checkT = checkR + checkSM + checkC
    if checkT == 0:
        ok = True
    return ok
def main ():
    board = createBoard()
    subma = defineSubMatrix(0, 6)
    print (subma)
    printBoard(board)
    print ("Board Created")
    level = createLevel(board)
    print ("Level created")
    printBoard(level)
### PROGRAM
main()
You can't create a Sudoku this way, by just making random selections. You quickly get into a situation like this:
1 2 3 4 5 6 7 8 9
4 5 6 1 2 3 . . .
and now there are no possibilities for the next cells.
Many Sudoku generators build grids the same way humans solve them, using complicated heuristics, but a simpler approach is possible. A valid solved grid stays valid under any combination of (a) swapping rows within the same set of 3 rows, (b) swapping columns within the same set of 3 columns, (c) swapping sets of 3 rows, (d) swapping sets of 3 columns, (e) rotating 90 degrees, and (f) mirroring across one of the axes. These moves do not reach every possible Sudoku, but they produce a large number of distinct grids. Given that, you can start with a well-ordered matrix like this:
1 2 3 4 5 6 7 8 9
4 5 6 7 8 9 1 2 3
7 8 9 1 2 3 4 5 6
2 3 4 5 6 7 8 9 1
5 6 7 8 9 1 2 3 4
8 9 1 2 3 4 5 6 7
3 4 5 6 7 8 9 1 2
6 7 8 9 1 2 3 4 5
9 1 2 3 4 5 6 7 8
and then do random swaps, rotations, and mirrors, just like shuffling a deck of cards. See this article:
https://www.algosome.com/articles/create-a-solved-sudoku.html
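As a rough illustration of the shuffling idea (not the code from the linked article), the sketch below builds the well-ordered grid with a formula, permutes rows and columns while keeping each group of three together, optionally mirrors the grid, and then verifies the result. All function names here are made up for illustration, and the output is a fully solved grid; blanking cells to make a playable level is a separate step.

```python
import random

def base_grid():
    """Well-ordered start grid: each row is 1..9 cyclically shifted."""
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
            for r in range(9)]

def shuffled_order(rng):
    """Random order of indices 0..8 that keeps each group of 3 together."""
    groups = [0, 1, 2]
    rng.shuffle(groups)
    order = []
    for g in groups:
        inner = [0, 1, 2]
        rng.shuffle(inner)
        order.extend(3 * g + k for k in inner)
    return order

def random_solved_grid(rng):
    """Shuffle the base grid with moves that keep it a valid Sudoku."""
    grid = base_grid()
    rows, cols = shuffled_order(rng), shuffled_order(rng)
    grid = [[grid[r][c] for c in cols] for r in rows]
    if rng.random() < 0.5:
        grid = [row[::-1] for row in grid]  # mirror left to right
    return grid

def is_valid(grid):
    """Check that every row, column and 3x3 box holds 1..9 exactly once."""
    want = set(range(1, 10))
    rows_ok = all(set(row) == want for row in grid)
    cols_ok = all(set(col) == want for col in zip(*grid))
    boxes_ok = all(
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)} == want
        for br in (0, 3, 6) for bc in (0, 3, 6)
    )
    return rows_ok and cols_ok and boxes_ok

grid = random_solved_grid(random.Random())
for row in grid:
    print(*row)
```

Passing a seeded random.Random(seed) instead makes the generated grid repeatable, which helps when debugging.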

Counter set or rows with the same numbering based on condition

I have a dataset. One column holds True or False values for a certain condition. If a sequence of consecutive rows has the same value, then the counter for those rows should be the same.
To make it clear, below is my code:
c1 = [True,True,False,False,False,True,False,True,True,False,True]
counter = 1
switch = 0 #increase the counter when the vector has switched twice
c2 = np.repeat(None, len(c1))
c2[i]=counter
for i in range(1,len(c1)):
    p = c1[i-1]
    x = c1[i]
    if p==x:
        counter=counter
        c2[i]=counter
    if p!=x :
        switch = switch + 1
        c2[i]=switch
    elif switch == 2:
        counter = counter + 1
        switch = 0 #reset the counter
print(c2)
The actual output is
[None 1 1 1 1 2 3 4 1 5 6]
while the expected one should be
[None, 1,1,1,1,2,2,3,3,3,4]
c1 = [True,True,False,False,False,True,False,True,True,False,True]
res = []
var = 1
cur=c1[0]
flag = 0
res.append(None)
for val in c1[1:]:
    if val==cur and flag == 0:
        res.append(var)
    elif val == cur and flag == 1:
        var+=1
        flag = 0
        res.append(var)
    elif val != cur and flag == 0:
        flag = 1
        res.append(var)
    elif val != cur and flag == 1:
        res.append(var)
    else:
        pass
print(res)
Output: [None, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4]
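The expected output can also be read as "count the switches so far, and start a new number after every second switch". Under that reading, a loop-free NumPy sketch might look like this; the function name pair_run_labels is made up for illustration, and it has only been checked against the example list from the question.

```python
import numpy as np

def pair_run_labels(flags):
    """Label positions 1..n-1 so the label grows after every second switch."""
    arr = np.asarray(flags, dtype=bool)
    # True wherever an element differs from the one before it (a "switch").
    switched = arr[1:] != arr[:-1]
    # Running number of switches; every two switches start a new label.
    labels = np.cumsum(switched) // 2 + 1
    # The first element has no predecessor, so it keeps None as in the question.
    return [None] + labels.tolist()

c1 = [True, True, False, False, False, True, False, True, True, False, True]
res = pair_run_labels(c1)
print(res)
```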
