Switch color of sns.swarmplot - python

Does anybody know whether it is possible to switch the colors so that I can distinguish every row instead of every column? And how do I add a legend (one color for each player) that shows which player has, for example, which pace?
My code is:
feldspieler = feldspieler["sofifa_id"]
skills = ['pace','shooting','passing','dribbling','defending','physic']
diagramm = plt.figure(figsize=(40,20))
plt.xticks(rotation=90,fontsize=20)
plt.yticks(fontsize=20)
plt.xlabel('Skills', fontsize=30)
plt.ylabel('Skill value', fontsize=30)
plt.title('Spielervergleich', fontsize = 40)
sns.set_palette("pastel")
for i in feldspieler:
    i = fifa_21.loc[fifa_21['sofifa_id'] == i]
    i = pd.DataFrame(i, columns = skills)
    sns.swarmplot(data=i,size=12)

Thanks a lot, @Trevis.
Unfortunately, it still does not work.
Here you can find a screenshot of the dataset and the code that the graphic accesses.
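while True:
    # Prompt (German): "Which team are you looking for?"
    team = input("Welches Team suchen Sie?: ")
    if team in fifa_21.values:
        break
    else:
        # Message (German): "This club does not exist. Please check the spelling."
        print("Dieser Verein existiert nicht. Bitte achten Sie auf eine korrekte Schreibweise.")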
gesuchtes_team = fifa_21.loc[(fifa_21['club_name'] == team)]
spieler_verein = gesuchtes_team[["sofifa_id","short_name","nationality","age","player_positions","overall","value_eur"]]
spieler_verein = pd.DataFrame(spieler_verein)
spieler_verein = spieler_verein.reset_index(drop=True)
spieler_verein
feldspieler = spieler_verein.loc[spieler_verein.player_positions != "GK", :]
feldspieler = feldspieler.reset_index(drop=True)
feldspieler
feldspieler = feldspieler["sofifa_id"]
skills = ['pace','shooting','passing','dribbling','defending','physic']
diagramm = plt.figure(figsize=(40,20))
plt.xticks(rotation=90,fontsize=20)
plt.yticks(fontsize=20)
plt.xlabel('Skills', fontsize=30)
plt.ylabel('Skill value', fontsize=30)
plt.title('Spielervergleich', fontsize = 40)
sns.set_palette("pastel")
for i in feldspieler:
    i = fifa_21.loc[fifa_21['sofifa_id'] == i]
    i = pd.DataFrame(i, columns = skills)
    sns.swarmplot(data=fifa_21, x="skills", y="skill_value", hue="sofifa_id")
    #sns.swarmplot(x =skills, y= pd.DataFrame(i, columns == skills) ,hue= "sofifa_id", data=i,size=12)

Set the hue parameter to the name of the column you want to color by (sofifa_id). You can then pass the whole dataset in a single call instead of looping over the players. The legend will be added automatically.
For this you need a DataFrame in long format, with a 'skills' column holding the skill names that appear on the x-axis here. If necessary, see the documentation for pd.melt, in particular the third example.
Then, assuming the melted values end up in the default column name value, call
sns.swarmplot(data=fifa_21, x="skills", y="value", hue="sofifa_id")
This follows the official swarmplot function documentation.
Edit: So, seeing your data, you should really use pd.melt like this:
(I'm assuming one row per player, with distinct short_name values.)
data = pd.melt(fifa_21, id_vars='short_name', var_name='skill',
               value_vars=['pace', 'shooting', 'passing', 'dribbling',
                           'defending', 'physic'])
sns.swarmplot(x='skill', y='value', hue='short_name', data=data)
melt transforms the skill columns and their values from a wide format

short_name  pace  shooting
a_name      85    92

to a long table format

short_name  skill     value
a_name      pace      85
a_name      shooting  92
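To make the wide-to-long step concrete, here is a minimal sketch on a made-up two-player frame. The names wide and long_df and the sample values are assumptions for illustration only; the real input would be the fifa_21 DataFrame.

```python
import pandas as pd

# Hypothetical two-player sample standing in for the fifa_21 DataFrame.
wide = pd.DataFrame({
    "short_name": ["a_name", "b_name"],
    "pace": [85, 70],
    "shooting": [92, 64],
})

# One row per (player, skill) pair; the numbers land in the default "value" column.
long_df = pd.melt(wide, id_vars="short_name", var_name="skill",
                  value_vars=["pace", "shooting"])
print(long_df)
```

The resulting frame has exactly the three columns that the swarmplot call expects for x, y and hue.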

Related

Python: Creating AreaChart3D in a recursive way

My goal is to create several AreaChart3D plots automatically.
For example, I have the following picture:
This table is output automatically by a tool.
There may be only one graph, maybe 2 graphs, or even 100 graphs (the number does not matter much); what matters is that the data always has this layout, with Location, Speed, and some time values inside.
Now I would like the second sheet (ws2_obj) to contain 4 graphs, or maybe 2 graphs, depending on how many graphs the tool outputs.
With a fixed number of graphs it would have been easier.
Because the number of graphs is not fixed, I have to lay them out across the whole sheet, and I do not know how to do it.
There is also another question: how do I handle Depth (% of base) using Python?
import openpyxl as xl
from openpyxl.chart import (
    AreaChart3D,
    Reference,
)
wb_obj = xl.load_workbook('Plots.xlsx')
ws_obj = wb_obj.active
ws2_obj = wb_obj.create_sheet("Graphs")
c1 = AreaChart3D()
c1.legend = None
c1.style = 15
cats = Reference(ws_obj, min_col=1, min_row=7, max_row=200)
data = Reference(ws_obj, min_col=2, min_row=6, max_col=8, max_row=200)
c1.add_data(data, titles_from_data=True)
c1.set_categories(cats)
ws2_obj.add_chart(c1, "A1")
wb_obj.save("Plots.xlsx")
The code above produces only one graph. How should I proceed to create 2, 4 or 100 graphs?
Later edit 1:
I tried something like this and it is almost working:
for i in range(1, 4):
    c1 = AreaChart3D()
    cats = Reference(ws_obj, min_col=1, min_row=7, max_row=200)
    data = Reference(ws_obj, min_col=2, min_row=6, max_col=i * int(step), max_row=200)
    c1.title = ws_obj.cell(row=1, column=i * int(step)).value
    c1.legend = None
    c1.style = 15
    c1.y_axis.title = 'Fire Time'
    c1.x_axis.title = 'Temperature'
    c1.z_axis.title = "Velocity"
    c1.add_data(data, titles_from_data=True)
    c1.set_categories(cats)
    ws2_obj.add_chart(c1, "A2")
For me, the last line, ws2_obj.add_chart(c1, "A2"), seems to be the problematic one.
Instead of A2 I would like to use something like ws2_obj.add_chart(c1, cell(row=2, column=i)).value, but that does not work.
Later edit 2:
I have observed that to add a chart at a certain cell, you have to use something like: ws2_obj.add_chart(my_chart, "R2")
To use this in the for loop, I tried to find a way to build values like R2.
Please see below:
import re

my_cells = []
for i in range(1, 4):
    my_cell = ws2_obj.cell(row=1, column=i * int(step) - (int(step) - 1))
    my_cells.append(my_cell)
print("My_Cell:", my_cells)
new_cells = []
for i in my_cells:
    # Raw string so that \W, \w and \d reach the regex engine unchanged
    new_cells.append(re.findall(r"\W\w\d", str(i)))
new_new_cells = []
for i in new_cells:
    new_new_cells.append(i[0])
print("new_new_cells:", new_new_cells)
final_list = [re.sub('[^a-zA-Z0-9]+', '', _) for _ in new_new_cells]
print("final list:", final_list)
And the output will be ['A1', 'H1', 'O1']
and then I can output the graph:
for i in range(1, 4):
    c1 = AreaChart3D()
    # my_cell = ws2_obj.cell(row=i, column=i * int(step))
    cats = Reference(ws_obj, min_col=1, min_row=7, max_row=255)
    data = Reference(ws_obj, min_col=2, min_row=6, max_col=i * int(step), max_row=255)
    c1.title = ws_obj.cell(row=1, column=i * int(step)).value
    c1.legend = None
    c1.style = 20
    c1.y_axis.title = 'Time'
    c1.x_axis.title = 'Location'
    c1.z_axis.title = "Velocity"
    c1.add_data(data, titles_from_data=True)
    c1.set_categories(cats)
    c1.x_axis.scaling.max = 75
    c1.y_axis.scaling.max = 50
    c1.z_axis.scaling.max = 25
    ws2_obj.add_chart(c1, str(final_list[i - 1]))
You can create a list of the series data positions (the cell where each data series starts), with one element per series. Iterate over the list, create a chart for each element, and make sure you have some means of placing each chart at a unique position.
Example code with comments below.
import openpyxl as xl
from openpyxl.chart import (
AreaChart3D,
Reference,
)
def create_chart(tl, maxr, hdr, x_ax):
    """
    Creates a standard Area 3D Chart
    """
    cht = AreaChart3D()
    cht.legend = None
    cht.style = 15
    cht.title = hdr + " Chart"
    cht.x_axis.title = x_ax
    cht.y_axis.title = 'Something' # Some text for the y axis
    data = Reference(ws_obj, min_col=tl[0], min_row=tl[1], max_col=tl[0]+1, max_row=maxr-1)
    cht.add_data(data, titles_from_data=True)
    return cht
## Sheet constants
chart_header = 'Speed' # It is assumed this is located in a merged cell
x_axis_header = 'Location'
series_topleft_header = 25
## Load Workbook and Sheet of Excel with data series
wb_obj = xl.load_workbook('Plots.xlsx')
ws_obj = wb_obj.active
## Get the total used rows in the sheet (end of the series table)
maxrows = ws_obj.max_row
speed_row = ''
speed_col_start = ''
speed_col_end = ''
speed_col_letter = ''
## Get a list of Merged cell in the sheet these contain the Headers for position referencing
merge_list = [m.coord for m in ws_obj.merged_cells.ranges]
## Search for the row with Header name 'Speed' to use as reference for series data positioning
for merge_element in ws_obj.merged_cells:
    merge_cell_val = merge_element.start_cell.internal_value
    if merge_cell_val.lower() == chart_header.lower():
        speed_row = merge_element.max_row
        speed_col_start = merge_element.min_col
        speed_col_end = merge_element.max_col
        speed_col_letter = merge_element.start_cell.column_letter
series_header_row = speed_row + 1
series1_start = speed_col_letter + str(series_header_row+1)
"""
Obtain the location of the top left cell where the series data exists
This searches the row below the header (containing the text 'Speed') for the first
series header (i.e. 25 in the example) and adds each position to the series_position_list
"""
series_position_list = []
for row in ws_obj.iter_rows(min_row=series_header_row,
                            max_row=series_header_row,
                            min_col=speed_col_start,
                            max_col=speed_col_end):
    for cell in row:
        if cell.value == series_topleft_header:
            series_position_list.append([cell.column, series_header_row])
## Create the Charts
"""
With the series_position_list indicating the top left cell of the series data
and the number of rows in the series determined by maxrows - 1, this data
can be passed to the create_chart function to create the chart.
Charts are placed below the series data table from Column A with two charts
per row. First row for chart location is 2 rows below the series table.
"""
chart_start_row = maxrows + 2
chart_col = 'A'
"""
The series_position_list is used to create 1 chart per series
The chart creation function takes the top left coordinate and max rows along
with Chart header name and x axis header name
"""
for enum, top_left in enumerate(series_position_list, 1):
    chart_obj = create_chart(top_left,
                             maxrows,
                             chart_header + ' ' + str(enum),
                             x_axis_header)
    ## This sets the position the chart will be placed. Based on standard size
    ## of plot area the charts are 16 rows and 10 columns apart
    if enum == 1:
        pass
    elif enum % 2 == 1:
        chart_col = 'A'
        chart_start_row += 16
    else:
        chart_col = 'J'
    ## Adds chart to the Excel sheet
    print(f"Adding chart {chart_header + ' ' + str(enum)} to Excel:")
    print(f"Series Data Start; Row:{str(top_left[1]+1)} Column:{top_left[0]}")
    ws_obj.add_chart(chart_obj, chart_col + str(chart_start_row))
    print("--------------\n")
wb_obj.save("Plots.xlsx")
-----------------Additional Information--------------
add_chart is a method that accepts two arguments: the chart object and, optionally, an anchor point (i.e. the top left cell where the chart is placed in the sheet). Using .value at the end of
ws2_obj.add_chart(c1, cell(row=2, column=i)).value
is invalid: you are not writing a value into a cell, you are calling the method to add the chart object c1 at the position cell(row=2, column=i). Using cell(row=2, column=i) on its own is also invalid. You may have meant to use ws2_obj.cell(row=2, column=i) as the anchor. The add_chart method would accept this, but saving the workbook would then raise an error when the anchor point is checked, because the anchor is expected to be an "Excel style coordinate", i.e. a string like 'A2', rather than a cell object like ws2_obj.cell(row=2, column=i). Even using (2, 1) would fail the same check.
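If a cell object is already at hand, one way to get the string form is its coordinate attribute. The sketch below uses a throwaway in-memory workbook, and the column numbers 1, 8 and 15 are arbitrary sample values, not taken from the question's data.

```python
from openpyxl import Workbook

wb = Workbook()          # throwaway in-memory workbook, for illustration only
ws2_obj = wb.active

# Turn (row, column) pairs into Excel style coordinate strings.
anchors = [ws2_obj.cell(row=2, column=i).coordinate for i in (1, 8, 15)]
print(anchors)
```

Each string in anchors can then be passed as the second argument of add_chart.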
To set the anchor points, I will show two options: all charts on the same row, or X charts across a row and then the next X charts on the following row, and so on.
Place all charts on the same row;
If you are going to put all charts on the same row, the row coordinate does not change and only the column position needs adjusting for each chart.
You can generate the anchor points as shown below; the example code uses a for loop with 18 elements;
from openpyxl.utils.cell import coordinate_to_tuple
from openpyxl.utils import get_column_letter
anchor = 'A2' # Position of anchor, first anchor point is 'A2'
column_separation = 9 # Number of columns to separate each chart
for i in range(0, 18):
    coord_tuple = coordinate_to_tuple(anchor)
    row = coord_tuple[0]
    col_offset = column_separation if i > 0 else 0
    col_new = get_column_letter(coord_tuple[1] + col_offset)
    anchor = f'{col_new}{row}'
    print(f'Adding chart at Anchor point {anchor}')
    ws2_obj.add_chart(c1, anchor)
This will put the charts at the following anchor points;
A2, J2, S2, AB2, AK2, AT2, BC2, BL2, BU2, CD2, CM2, CV2, DE2, DN2, DW2, EF2, EO2, EX2
Place the charts in a pattern of rows and columns;
Placing the charts in a pattern of rows and columns is similar to the previous code. However, when the number of charts in a row reaches your limit, the 'row' value has to change and the column resets back to 'A'.
The example code again uses a for loop with 18 elements and splits the charts into rows of max_chart_row charts, set to 5 in this case;
from openpyxl.utils.cell import coordinate_to_tuple
from openpyxl.utils import get_column_letter
anchor = 'A2'
column_separation = 9
max_chart_row = 5
for i in range(0, 18):
    coord_tuple = coordinate_to_tuple(anchor)
    row = coord_tuple[0]
    col_offset = column_separation if i > 0 else 0
    # When the number of charts across the row is reached, set the row to 16 more than the current
    # and reset the column offset to 0
    if i % (max_chart_row) == 0 and i != 0:
        row = row + 16
        col_offset = 0
        col_new = get_column_letter(col_offset+1)
    else:
        col_new = get_column_letter(coord_tuple[1] + col_offset)
    anchor = f'{col_new}{row}'
    print(f'Adding chart at Anchor point {anchor}')
    ws2_obj.add_chart(c1, anchor)
This will put the charts at the following anchor points;
A2, J2, S2, AB2, AK2,
A18, J18, S18, AB18, AK18,
A34, J34, S34, AB34, AK34,
A50, J50, S50
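The same grid can also be computed directly from the chart index, without carrying the previous anchor along. This is only a sketch: grid_anchor and column_letter are hypothetical helper names, and column_letter is a plain-Python stand-in for openpyxl.utils.get_column_letter so that the snippet runs without a workbook.

```python
def column_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> 'A', 27 -> 'AA').

    Hypothetical stand-in for openpyxl.utils.get_column_letter.
    """
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def grid_anchor(i, charts_per_row=5, column_separation=9,
                row_separation=16, first_row=2):
    """Return the anchor string for the i-th chart (0-based) in a grid."""
    col = 1 + (i % charts_per_row) * column_separation
    row = first_row + (i // charts_per_row) * row_separation
    return f"{column_letter(col)}{row}"


anchors = [grid_anchor(i) for i in range(18)]
print(anchors)
```

Setting charts_per_row to the total number of charts gives the single-row layout from the first option.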

Create multiple barplots based off groupby conditions

I am trying to create multiple horizontal barplots for a dataset. The data deals with finish times from a running race.
The dataframe has the following columns: Name, Age Group, Finish Time, Finish Place, Hometown, Times Ran The Race. Sample data below.
Name    Age Group  Finish Time  Finish Place  Hometown       Times Ran The Race
John    30-39      15.5         1             New York City  2
Mike    30-39      17.2         2             Denver         1
Travis  40-49      20.4         1             Louisville     3
James   40-49      22.1         2             New York City  1
I would like to create a bar plot similar to what is shown below. There would be 1 bar chart per age group, with the fastest runner at the bottom of the chart, and with each runner's city and number of times they ran the race below their name.
Do I need a for loop, or would a simple groupby work? The number and size of the age groups is not constant; it depends on the dataframe that is used for each race.
I used a loop. Each age group is extracted into a temporary data frame, and the label information (name, hometown, times ran) is collected for each bar. The collected label information is then joined into multi-line strings and stored in a new list. Finally, a horizontal bar graph is drawn and its tick labels on the y-axis (the category axis of a horizontal bar chart) are replaced with those strings.
for ag in df['Age Group'].unique():
    label_all = []
    tmp = df[df['Age Group'] == ag]
    labels = [[x,y,z] for x,y,z in zip(tmp.Name.values, tmp.Hometown.values, tmp['Times Ran The Race'].values)]
    for k in range(len(labels)):
        label_all.append(labels[k])
    l_all = []
    for l in label_all:
        lbl = l[0] + '\n'+ l[1] + '\n' + str(l[2]) + ' Time'
        l_all.append(lbl)
    ax = tmp[['Name', 'Finish Time']].plot(kind='barh', legend=False)
    ax.set_title(ag +' Age Group')
    ax.set_yticklabels([l_all[x] for x in range(len(l_all))])
    ax.grid(axis='x')
    for i in ['top','bottom','left','right']:
        ax.spines[i].set_visible(False)
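On the question of loop versus groupby: the two combine naturally, since iterating over a groupby yields one sub-frame per age group. The following sketch only builds the multi-line labels and leaves the plotting out; the frame is rebuilt from the sample rows in the question, and labels_by_group is a name chosen here for illustration.

```python
import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "Name": ["John", "Mike", "Travis", "James"],
    "Age Group": ["30-39", "30-39", "40-49", "40-49"],
    "Finish Time": [15.5, 17.2, 20.4, 22.1],
    "Hometown": ["New York City", "Denver", "Louisville", "New York City"],
    "Times Ran The Race": [2, 1, 3, 1],
})

# groupby yields one (key, sub-frame) pair per age group, so the number of
# groups never has to be known in advance.
labels_by_group = {}
for age_group, group in df.groupby("Age Group"):
    group = group.sort_values("Finish Time")  # fastest runner first
    labels_by_group[age_group] = [
        f"{name}\n{town}\n{times} Time"
        for name, town, times in zip(group["Name"], group["Hometown"],
                                     group["Times Ran The Race"])
    ]

print(labels_by_group)
```

Inside the loop, each group could be drawn with a barh call in the same way as the per-age-group frame in the answer's code.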
Here's a quite compact solution. The only tricky part is the ordinal number, if you really want to have that. I copied the lambda solution from "Ordinal numbers replacement".
Give this a try and please mark the answer with Up-button if you like it.
import matplotlib.pyplot as plt
# n//10 (integer division) is required in Python 3; n/10 would break 11th, 12th, 13th
ordinal = lambda n: "{}{}".format(n,"tsnrhtdd"[(n//10%10!=1)*(n%10<4)*n%10::4])
for i, a in enumerate(df['Age Group'].unique()):
    plt.figure(i)
    dfa = df.loc[df['Age Group'] == a].copy()
    dfa['Info'] = dfa.Name + '\n' + dfa.Hometown + '\n' + \
        [ordinal(row) for row in dfa['Times Ran The Race']] + ' Time'
    plt.barh(dfa.Info, dfa['Finish Time'])
    plt.title(f'{a} Age Group')
    plt.xlabel("Time (Minutes)")
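The one-line lambda is compact but hard to read. A more explicit version of the same ordinal rule could look like the sketch below; it is an alternative written for illustration, not the code from the linked answer.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # Numbers whose tens digit is 1 (11, 12, 13, 111, ...) always take 'th'.
    if n // 10 % 10 == 1:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


print([ordinal(n) for n in (1, 2, 3, 4, 11, 12, 13, 21, 22, 23, 101, 111)])
```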

Why does matplotlib draw the new graph on top of the old one?

I'm working on a Django project and using the matplotlib library. I have created a filter where you can choose the day and the "node" that you want to graph; with this information a Python script is executed that uses pandas and matplotlib to create a graph.
The values of "node" and "day" arrive correctly in the script, and it generates the graph fine. The only problem is that instead of overwriting the old image (with the previous graph), it draws the new lines on top of it. Below is an image of how it looks.
As you can see, each line corresponds to a different day, because the different tests I have run are all overlapping. Can anyone tell me where I'm going wrong?
The code is attached below.
import os
import pandas as pd
import matplotlib.pyplot as plt

def bateria2(node, day):
    csv_path = os.path.join(os.path.dirname(__file__), '..\\data\\csv\\dataframe.csv')
    df = pd.read_csv(csv_path)
    mes, anyo = 12, 2019
    new_df = df[(df['Dia'] == day) & (df['Mes'] == mes) & (df['Año'] == anyo) & (df['Node name'] == node)]
    if len(new_df) > 0:
        #os.remove('static\\img\\bateria2.png')
        x = new_df['Hora[UTC]'].tolist()
        y = new_df['Bateria'].tolist()
        title = 'Carga/Descarga de la batería día '+str(day)+'/'+str(mes)+'/'+str(anyo)+' de '+str(node)
        plt.title(title)
        plt.xlabel('Hora [UTC]')
        plt.ylabel('Batería')
        #plt.legend((y)(node))
        plt.plot(x,y)
        plt.xticks(x, rotation='vertical')
        plt.savefig('static\\img\\bateria2.png',transparent=True)
        return 1
    else:
        return 0
Basically, what I'm doing is reading the .csv file that contains the info and filtering it down to the data I want. If the resulting dataframe has data, I create the graph and finally save it.
Regards, and thank you very much.
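Try clearing the current figure with plt.clf() after your savefig command. pyplot keeps drawing into the same current figure across calls, so without clearing it every new line is added to the ones already there. Clearing the figure should keep your plots from stacking up on top of each other.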
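An alternative to clearing the shared figure is to create a fresh figure for every call and close it after saving. The sketch below assumes the non-interactive Agg backend; save_line_plot, the sample values and the temporary output path are made up for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt


def save_line_plot(x, y, path):
    """Draw x/y on a brand-new figure, save it, and release the figure."""
    fig, ax = plt.subplots()
    ax.plot(x, y)
    line_count = len(ax.lines)   # lines drawn on this figure only
    fig.savefig(path, transparent=True)
    plt.close(fig)               # free the figure so nothing carries over
    return line_count


out_path = os.path.join(tempfile.gettempdir(), "bateria2_demo.png")
first = save_line_plot([0, 1, 2], [3.9, 3.7, 3.8], out_path)
second = save_line_plot([0, 1, 2], [4.1, 4.0, 3.6], out_path)
print(first, second)
```

Because every call works on its own figure, the second image contains only the second line, and closing the figure also avoids accumulating open figures in a long-running server process.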

How to remove a series from box-plot data?

Hello, I'm trying to solve this exercise with Python and seaborn: Use "seaborn" to create box plots to represent the number of pieces per decade. We will not use the decade of the 40s because it only contains one year.
The decades run from 1940 to 2010, and I would like to know how to remove the first decade (1940) from my boxplot.
Here is what I did:
piecesDecade = sns.boxplot(x = "decade", y ="pieces" , data = lego)
but I don't know how to exclude the first decade!
Here is the output of lego:
You can just filter out the decades:
sns.boxplot(x = "decade", y ="pieces" , data = lego[lego['year'] > 1949])
# or data = lego[lego['decade'] != '1940s']

Looping scatter plot colors

How can scatter plot color be looped?
my code:
col = {'Male':'green','Female':'blue'}
gender = ['Male','Female','Male','Male','Female', …]
Matched_Days = [list of days…]
Marital_Status = [list of statuses…]
for type in gender:
    plt.scatter(Marital_Status, Matched_Days, c=col[type])
I only get one color, blue, because the last gender in the list is 'Female'.
For some reason, I can't get it to loop and register all colors inside the dictionary
You're not using matplotlib correctly. You only need one scatter call, not a loop.
gender = ['Male','Female','Male','Male','Female', …]
gender_color=[]
for elem in gender:
    if elem=="Male":
        gender_color.append("green")
    else:
        gender_color.append("blue")
Matched_Days = [list of days…]
Marital_Status = [list of statuses…]
plt.scatter(Marital_Status, Matched_Days, c=gender_color)
plt.show()
The c argument can take a list of colors. You shouldn't use a for loop unless you want multiple plots.
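The color list can also be built straight from the dictionary in the question, which avoids hard-coding the if/else. In this sketch the five gender values are a hypothetical sample; the real list comes from the data set.

```python
col = {"Male": "green", "Female": "blue"}

# Hypothetical sample values; the real list comes from the data set.
gender = ["Male", "Female", "Male", "Male", "Female"]

# One color per point, looked up in the dictionary from the question.
gender_color = [col[g] for g in gender]
print(gender_color)
```

The resulting list is passed to a single scatter call as c=gender_color, and it must have the same length as the x and y data.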
