I have a pandas DataFrame that looks like this:
I want to create this kind of plot for every year [1...10], with the Score range of [1...10].
This means that for every year, the plot will present:
how many values fall in [0-1] in year 1
how many values fall in [2-3] in year 1
how many values fall in [4-5] in year 1
...
how many values fall in [6-7] in year 10
how many values fall in [8-9] in year 10
how many values equal [10] in year 10
I need some help. Thank you!
The following code works perfectly:
import matplotlib.pyplot as plt
import seaborn as sns

def visualize_yearly_score_distribution(ds, year):
    sns.set_theme(style="ticks")
    first_range = 0
    second_range = 0
    third_range = 0
    fourth_range = 0
    fifth_range = 0
    six_range = 0
    seven_range = 0
    eight_range = 0
    nine_range = 0
    last_range = 0
    score_list = []
    for index, row in ds.iterrows():
        if row['Publish Date'] == year:
            # Bins are closed on the left, [n, n+1), so whole-number scores
            # are counted too; the last bin also includes 10.
            if 0 <= row['Score'] < 1:
                first_range += 1
            if 1 <= row['Score'] < 2:
                second_range += 1
            if 2 <= row['Score'] < 3:
                third_range += 1
            if 3 <= row['Score'] < 4:
                fourth_range += 1
            if 4 <= row['Score'] < 5:
                fifth_range += 1
            if 5 <= row['Score'] < 6:
                six_range += 1
            if 6 <= row['Score'] < 7:
                seven_range += 1
            if 7 <= row['Score'] < 8:
                eight_range += 1
            if 8 <= row['Score'] < 9:
                nine_range += 1
            if 9 <= row['Score'] <= 10:
                last_range += 1
    score_list.append(first_range)
    score_list.append(second_range)
    score_list.append(third_range)
    score_list.append(fourth_range)
    score_list.append(fifth_range)
    score_list.append(six_range)
    score_list.append(seven_range)
    score_list.append(eight_range)
    score_list.append(nine_range)
    score_list.append(last_range)
    range_list = ['0-1', '1-2', '2-3', '3-4', '4-5', '5-6', '6-7', '7-8', '8-9', '9-10']
    # One pie slice per score range, labelled with its percentage
    plt.pie(score_list, labels=range_list, autopct='%0.1f', explode=None)
    plt.title(f"Yearly Score Distribution for {year}")
    plt.tight_layout()
    plt.legend()
    plt.show()
Thank you all for the kind comments :)
This case is closed.
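For comparison, the per-year counting can be sketched without a row-by-row loop. This is only a sketch under assumptions: the column names 'Publish Date' and 'Score' are taken from the code above, the helper name yearly_score_counts and the sample data are made up, and scores are assumed to lie between 0 and 10, with a score of exactly 10 folded into the '9-10' bin.

```python
import numpy as np
import pandas as pd

def yearly_score_counts(ds, year):
    """Count the scores of one year in ten unit-wide bins: 0-1, 1-2, ..., 9-10."""
    scores = ds.loc[ds["Publish Date"] == year, "Score"]
    scores = scores[scores.between(0, 10)]          # ignore out-of-range and missing scores
    # floor() maps 3.7 -> 3; clip() folds a score of exactly 10 into the last bin
    bin_index = np.floor(scores).clip(upper=9).astype(int)
    counts = bin_index.value_counts().reindex(range(10), fill_value=0)
    counts.index = [f"{lo}-{lo + 1}" for lo in range(10)]
    return counts

# Made-up sample data, for illustration only
ds = pd.DataFrame({
    "Publish Date": [1, 1, 1, 1, 2],
    "Score": [0.5, 3, 9.9, 10, 4.2],
})
counts = yearly_score_counts(ds, 1)
print(counts.tolist())  # [1, 0, 0, 1, 0, 0, 0, 0, 0, 2]
```

The returned Series could then be passed to plt.pie or plotted as a bar chart, one call per year.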
I'm trying to write two for loops that will return a score for different inputs, and create a new field with the new score. The first loop works fine but the second loop never returns the correct score.
import pandas as pd
d = {'a':['foo','bar'], 'b':[1,3]}
df = pd.DataFrame(d)
score1 = df.loc[df['a'] == 'foo']
score2 = df.loc[df['a'] == 'bar']
for i in score1['b']:
    if i < 3:
        score1['c'] = 0
    elif i <= 3 and i < 4:
        score1['c'] = 1
    elif i >= 4 and i < 5:
        score1['c'] = 2
    elif i >= 5 and i < 8:
        score1['c'] = 3
    elif i == 8:
        score1['c'] = 4
for j in score2['b']:
    if j < 2:
        score2['c'] = 0
    elif j <= 2 and i < 4:
        score2['c'] = 1
    elif j >= 4 and i < 6:
        score2['c'] = 2
    elif j >= 6 and i < 8:
        score2['c'] = 3
    elif j == 8:
        score2['c'] = 4
print(score1)
print(score2)
When I run the script, it returns the following:
print(score1)
     a  b  c
0  foo  1  0
print(score2)
     a  b
1  bar  3
Why doesn't score2 create the new field "c" or a score?
Avoid for loops when conditionally updating DataFrame columns: a column is a pandas Series, not a Python list. Use the vectorized methods of pandas and NumPy instead, such as numpy.select, which scales to millions of rows. Remember that these data science tools compute very differently from general-purpose Python:
import numpy

# Work on copies so that adding a column does not raise SettingWithCopyWarning
score1 = score1.copy()
score2 = score2.copy()

# LIST OF BOOLEAN CONDITIONS
conds = [
    score1['b'].lt(3),                            # EQUIVALENT TO < 3
    score1['b'].between(3, 4, inclusive="left"),  # EQUIVALENT TO >= 3 and < 4
    score1['b'].between(4, 5, inclusive="left"),  # EQUIVALENT TO >= 4 and < 5
    score1['b'].between(5, 8, inclusive="left"),  # EQUIVALENT TO >= 5 and < 8
    score1['b'].eq(8)                             # EQUIVALENT TO == 8
]
# LIST OF VALUES
vals = [0, 1, 2, 3, 4]
# VECTORIZED ASSIGNMENT
score1['c'] = numpy.select(conds, vals, default=numpy.nan)
# LIST OF BOOLEAN CONDITIONS
conds = [
    score2['b'].lt(2),
    score2['b'].between(2, 4, inclusive="left"),
    score2['b'].between(4, 6, inclusive="left"),
    score2['b'].between(6, 8, inclusive="left"),
    score2['b'].eq(8)
]
# LIST OF VALUES
vals = [0, 1, 2, 3, 4]
# VECTORIZED ASSIGNMENT
score2['c'] = numpy.select(conds, vals, default=numpy.nan)
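On the first iteration of the second for loop, j is 3, so none of your conditions is satisfied: j < 2 and j <= 2 are both false, and every remaining branch requires j to be at least 4. Note also that the elif branches test i, which is left over from the first loop, instead of j, and that the lower bounds need >= rather than <=.
for j in score2['b']:
    if j < 3:
        score2['c'] = 0
    elif j >= 3 and j < 5:
        score2['c'] = 1
    elif j >= 5 and j < 7:
        score2['c'] = 2
    elif j >= 7 and j < 9:
        score2['c'] = 3
    elif j == 9:
        score2['c'] = 4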
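As an alternative sketch, the two threshold ladders from the question can be written as bin edges and applied with pandas.cut directly on the original DataFrame. The EDGES dictionary is a hypothetical helper, not part of the original code; its edges are read off the question's thresholds, and unlike the loops it also gives values above 8 the top score.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["foo", "bar"], "b": [1, 3]})

# Hypothetical per-group bin edges taken from the thresholds in the question.
# With right=False each bin is closed on the left: [-inf, 3), [3, 4), ...
EDGES = {
    "foo": [-np.inf, 3, 4, 5, 8, np.inf],
    "bar": [-np.inf, 2, 4, 6, 8, np.inf],
}

df["c"] = np.nan
for name, edges in EDGES.items():
    mask = df["a"] == name
    # labels=False returns the integer bin number (0..4) instead of an interval
    df.loc[mask, "c"] = pd.cut(df.loc[mask, "b"], bins=edges, right=False, labels=False)

print(df)
```

Writing through df.loc on the full DataFrame sidesteps the filtered copies score1 and score2, so no separate frames have to be recombined afterwards.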
I am attempting to change the values of two columns in my dataset from specific numeric values (2, 10, 25 etc.) to single values (1, 2, 3 or 4) based on the percentile of the specific value within the dataset.
Using the pandas quantile() function I have got the ranges I wish to replace between, but I haven't figured out a working method to do so.
age1 = datasetNB.Age.quantile(0.25)
age2 = datasetNB.Age.quantile(0.5)
age3 = datasetNB.Age.quantile(0.75)
fare1 = datasetNB.Fare.quantile(0.25)
fare2 = datasetNB.Fare.quantile(0.5)
fare3 = datasetNB.Fare.quantile(0.75)
My current solution attempt for this problem is as follows:
for elem in datasetNB['Age']:
    if elem <= age1:
        datasetNB[elem].replace(to_replace = elem, value = 1)
        print("set to 1")
    elif (elem > age1) & (elem <= age2):
        datasetNB[elem].replace(to_replace = elem, value = 2)
        print("set to 2")
    elif (elem > age2) & (elem <= age3):
        datasetNB[elem].replace(to_replace = elem, value = 3)
        print("set to 3")
    elif elem > age3:
        datasetNB[elem].replace(to_replace = elem, value = 4)
        print("set to 4")
    else:
        pass
for elem in datasetNB['Fare']:
    if elem <= fare1:
        datasetNB[elem] = 1
    elif (elem > fare1) & (elem <= fare2):
        datasetNB[elem] = 2
    elif (elem > fare2) & (elem <= fare3):
        datasetNB[elem] = 3
    elif elem > fare3:
        datasetNB[elem] = 4
    else:
        pass
What should I do to get this working?
pandas already has a function that does this: pandas.qcut.
You can simply do:
q_list = [0, 0.25, 0.5, 0.75, 1]
labels = range(1, 5)
df['Age'] = pd.qcut(df['Age'], q_list, labels=labels)
df['Fare'] = pd.qcut(df['Fare'], q_list, labels=labels)
Input
import numpy as np
import pandas as pd
# Generate fake data for the sake of example
df = pd.DataFrame({
    'Age': np.random.randint(10, size=6),
    'Fare': np.random.randint(10, size=6)
})
>>> df
   Age  Fare
0    1     6
1    8     2
2    0     0
3    1     9
4    9     6
5    2     2
Output
DataFrame after running the above code
>>> df
  Age Fare
0   1    3
1   4    1
2   1    1
3   1    4
4   4    3
5   3    1
Note that in your specific case, since you want quartiles, you can just assign q_list = 4.
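A minimal sketch of that shortcut, using made-up ages rather than the asker's dataset: passing an integer as q asks qcut for that many equal-sized quantile bins.

```python
import pandas as pd

# Made-up data, for illustration only
ages = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], name="Age")

# q=4 requests quartiles; labels 1..4 replace the interval names
quartile = pd.qcut(ages, q=4, labels=[1, 2, 3, 4]).astype(int)
print(quartile.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4]
```

If the data has many tied values, two quantile edges can coincide and qcut raises an error about non-unique bin edges; its duplicates parameter controls that behavior, but dropping edges also changes how many labels are needed.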
The problem is probably simple, but my brain isn't working as expected.
Imagine you have this pandas Series:
y = pd.Series([5, 5, -5, -10, 7, 7])
z = y * 0
I would like to have this output:
1, 2, -1, -2, 1, 2
My solution is below:
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for i, row in y.items():
    if i == 0 and y[i] > 0:
        z[i] = 1
    elif i == 0:
        z[i] = -1
    elif y[i] >= 0 and y[i-1] >= 0:
        z[i] = 1 + z[i-1]
    elif y[i] < 0 and y[i-1] < 0:
        z[i] = -1 + z[i-1]
    elif y[i] >= 0 and y[i-1] < 0:
        z[i] = 1
    elif y[i] < 0 and y[i-1] >= 0:
        z[i] = -1
I would think there is a more idiomatic Python/pandas solution.
You can use np.sign() to check whether each number is positive or negative and compare it with the previous row using shift(). The cumulative sum of those changes labels each run of same-sign values. Finally, use cumcount() to number the rows within each run:
import numpy as np
import pandas as pd
y = pd.Series([5, 5, -5, -10, 7, 7])
parts = (np.sign(y) != np.sign(y.shift())).cumsum()
print((y.groupby(parts).cumcount() + 1) * np.sign(y))
# or print(y.groupby(parts).cumcount().add(1).mul(np.sign(y)))
Output
0    1
1    2
2   -1
3   -2
4    1
5    2
Sign changes are found by applying np.sign to the values and checking where the difference between consecutive signs is not 0. The cumulative sum of those flags labels consecutive groups of the same sign. Lastly, cumcount numbers the rows within each group, and multiplying by the sign makes the counts negative for negative runs:
signs = np.sign(y)
grouper = signs.diff().ne(0).cumsum()
result = y.groupby(grouper).cumcount().add(1).mul(signs)
where add(1) is needed because cumcount starts at 0, 1, ... but the counts should start at 1.
>>> result
0    1
1    2
2   -1
3   -2
4    1
5    2
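As a cross-check that needs no pandas, the same signed run counting can be sketched with itertools.groupby from the standard library. The function name signed_run_lengths is made up for this example; like the np.sign approach, it yields 0 for zero values.

```python
from itertools import groupby

def signed_run_lengths(values):
    """Number each run of same-sign values 1, 2, ... (negated for negative runs)."""
    def sign(v):
        return (v > 0) - (v < 0)   # 1, 0 or -1

    out = []
    # groupby yields one group per run of consecutive values with the same sign
    for s, run in groupby(values, key=sign):
        out.extend(s * k for k, _ in enumerate(run, start=1))
    return out

print(signed_run_lengths([5, 5, -5, -10, 7, 7]))  # [1, 2, -1, -2, 1, 2]
```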
I am trying to get this code to count by 5s inside a while loop that runs 7 times, so that it loops through and generates a different number on each of the 7 passes. However, when it gets to a number over 10, I want it to start over at 0 and ignore the 10.
This is my code:
while z < 7:
firstpickplusfive = int(firstpickplusfive) + 1
counts = counts + 1
if counts == 1:
if firstpickplusfive > 9:
firstpickplusfive = 0
if counts == 5:
print firstpickplusfive
z = int(z) + 1
The code prints the first number, but freezes on printing any others. Why isn't this working?
Your code is not inside the loop. Python's code blocks are created with indentation:
while z < 7:
    firstpickplusfive = int(firstpickplusfive) + 1
    counts = counts + 1
    if counts == 1:
        if firstpickplusfive > 9:
            firstpickplusfive = 0
    if counts == 5:
        print(firstpickplusfive)
    z = int(z) + 1
Is this the result you were trying to achieve?
import random
x = random.randint(1, 9)
for i in range(1, 8):
    print(x)
    x += 5
    if x >= 10:
        x -= 9
This generates a random number, and adds 5 until it passes 10, then subtracts 9, and it does this seven times. If I understand your question correctly, this is what you were trying to do. Please correct me if I am wrong.
No, this is not what I was trying to do. Here is what I am trying to do:
counts = 0
r = 0
firstpickplusfive = 7
p = firstpickplusfive + 5
while r < 3:
    firstpickplusfive = p + counts
    if firstpickplusfive > 6:
        firstpickplusfive = firstpickplusfive + counts - 10
    if p > 9:
        p = firstpickplusfive + counts
    print firstpickplusfive
    counts = counts + 1
    r = int(r) + 1
It works on its own, but when I add it to the script I am trying to write, it doesn't work. If there is a simpler way to do it, I would appreciate knowing it.
That is:
number = number + 5 + 0
then
number = number + 5 + 1, and so on,
for example 7 + 5 + 0 = 12,
7 + 5 + 1 = 13, ...
If the number is 10 or more, I want it to drop the tens place and keep the ones place,
for example 7 + 5 + 0 = 2,
7 + 5 + 1 = 3
Here is an easier method:
for i in range(1, 4):
    num = 7
    num += 5
    num += (i - 1)
    # Drop the tens place once the sum reaches 10
    if num >= 10:
        num -= 10
    print(num)
Try this in your script. Does it work?
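Dropping the tens place and keeping the ones place is what the modulo operator does, so the same idea can be sketched in one expression. The function name last_digits and its parameters are made up for this example.

```python
def last_digits(start, steps, increment=5):
    """Return the ones digit of start + increment + k for k = 0, 1, ..., steps - 1."""
    # % 10 keeps only the ones place, so 12 -> 2 and 13 -> 3
    return [(start + increment + k) % 10 for k in range(steps)]

print(last_digits(7, 3))  # [2, 3, 4]
```

Unlike subtracting 10 once, the modulo form also handles sums of 20 or more.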
When I was testing a counter, I discovered that it only seems to display the last item to go through it. For example, if something was excellent, it showed up as counted so it would be "1". However regardless of other data, the rest would be 0.
def mealrating(score, review):
    for x in range(0,len(score)):
        mp = 0
        mg = 0
        me = 0
        if score[x] >= 1 and score[x] <= 3:
            review.append("poor")
            mp = mp + 1
        if score[x] >= 4 and score[x] <= 6:
            review.append("good")
            mg = mg + 1
        if score[x] >= 7 and score[x] <= 10:
            review.append("excellent")
            me = me + 1
    print("The customer rated tonight's meal as:")
    print('Poor:' + str(mp))
    print('Good:' + str(mg))
    print('Excellent:' + str(me))
    print("\n")
You are resetting mp, mg, and me in each iteration.
def mealrating(score, review):
    mp = 0
    mg = 0
    me = 0
    for x in range(0,len(score)):
        if score[x] >= 1 and score[x] <= 3:
            review.append("poor")
            mp = mp + 1
        if score[x] >= 4 and score[x] <= 6:
            review.append("good")
            mg = mg + 1
        if score[x] >= 7 and score[x] <= 10:
            review.append("excellent")
            me = me + 1
    print("The customer rated tonight's meal as:")
    print('Poor:' + str(mp))
    print('Good:' + str(mg))
    print('Excellent:' + str(me))
    print("\n")
You must initialize the counters outside the loop:
mp = 0
mg = 0
me = 0
for x in range(0, len(score)):
    # same as before
Otherwise they'll get reset at each iteration! To make your code more Pythonic, take the following tips into consideration:
A condition of the form x >= i and x <= j can be written more concisely as i <= x <= j
The idiomatic way to traverse a list is using iterators, without explicitly using indexes
The conditions are mutually exclusive, so you should use elif
Use += for incrementing a variable
This is what I mean:
mp = mg = me = 0
for s in score:
    if 1 <= s <= 3:
        review.append("poor")
        mp += 1
    elif 4 <= s <= 6:
        # and so on
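Putting the tips together, a complete version might look like the sketch below. It is an illustration rather than the asker's exact function: the name meal_rating is made up, the counters are kept in a dictionary, and the results are returned instead of printed so they are easy to check.

```python
def meal_rating(scores):
    """Label each score and count how many fall into each category."""
    counts = {"poor": 0, "good": 0, "excellent": 0}   # initialized once, before the loop
    reviews = []
    for s in scores:
        if 1 <= s <= 3:
            label = "poor"
        elif 4 <= s <= 6:
            label = "good"
        elif 7 <= s <= 10:
            label = "excellent"
        else:
            continue            # ignore scores outside 1..10
        reviews.append(label)
        counts[label] += 1
    return reviews, counts

reviews, counts = meal_rating([2, 5, 9, 10, 1])
print(counts)  # {'poor': 2, 'good': 1, 'excellent': 2}
```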