How can scatter plot color be looped?
My code:
col = {'Male':'green','Female':'blue'}
gender = ['Male','Female','Male','Male','Female', …]
Matched_Days = [list of days…]
Marital_Status = [list of statuses…]
for type in gender:
    plt.scatter(Marital_Status, Matched_Days, c=col[type])
I only get one color, blue, because the last gender in the list is 'Female'.
For some reason, I can't get the loop to use all the colors in the dictionary.
You're not using matplotlib correctly. Each pass through your for loop redraws every point in a single color, so only the last color stays visible. You only need one scatter call, not a loop.
gender = ['Male','Female','Male','Male','Female', …]
gender_color=[]
for elem in gender:
    if elem=="Male":
        gender_color.append("green")
    else:
        gender_color.append("blue")
Matched_Days = [list of days…]
Marital_Status = [list of statuses…]
plt.scatter(Marital_Status, Matched_Days, c=gender_color)
plt.show()
The c argument can take a list of colors. You shouldn't use a for loop unless you want multiple plots.
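The color list can also be built straight from the dictionary in the question, which avoids the if/else. A minimal sketch, assuming a shortened, hypothetical gender list in place of the elided one; the scatter call is left as a comment because the day and status lists are not shown:

```python
# Map each gender to its color with the dictionary from the question.
col = {"Male": "green", "Female": "blue"}

# Shortened, hypothetical stand-in for the elided list in the question.
gender = ["Male", "Female", "Male", "Male", "Female"]

# One color per point, in the same order as the data.
gender_color = [col[g] for g in gender]
print(gender_color)

# A single call then colors every point individually:
# plt.scatter(Marital_Status, Matched_Days, c=gender_color)
```

A gender that is missing from the dictionary raises a KeyError here, which is usually preferable to silently falling back to one color.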
Related
I am trying to create multiple horizontal bar plots for a dataset. The data deals with race times from a running race.
The dataframe has the following columns: Name, Age Group, Finish Time, Finish Place, Hometown, Times Ran The Race. Sample data below.
Name    Age Group  Finish Time  Finish Place  Hometown       Times Ran The Race
John    30-39      15.5         1             New York City  2
Mike    30-39      17.2         2             Denver         1
Travis  40-49      20.4         1             Louisville     3
James   40-49      22.1         2             New York City  1
I would like to create a bar plot similar to what is shown below. There would be 1 bar chart per age group, fastest runner on bottom of chart, runner name with city and number of times ran the race below their name.
Do I need a for loop or would a simple groupby work? The number and sizing of each age group can be dynamic based off the race so it is not a constant, but would be dependent on the dataframe that is used for each race.
I used a loop. For each age group, I extract that group's rows into a temporary data frame and collect each runner's name, hometown and number of races. These are joined into multi-line strings and stored in a new list. Then I draw a horizontal bar chart and replace the tick labels on the y-axis with those strings.
for ag in df['Age Group'].unique():
    label_all = []
    tmp = df[df['Age Group'] == ag]
    labels = [[x,y,z] for x,y,z in zip(tmp.Name.values, tmp.Hometown.values, tmp['Times Ran The Race'].values)]
    for k in range(len(labels)):
        label_all.append(labels[k])
    l_all = []
    for l in label_all:
        lbl = l[0] + '\n'+ l[1] + '\n' + str(l[2]) + ' Time'
        l_all.append(lbl)
    ax = tmp[['Name', 'Finish Time']].plot(kind='barh', legend=False)
    ax.set_title(ag +' Age Group')
    ax.set_yticklabels([l_all[x] for x in range(len(l_all))])
    ax.grid(axis='x')
    for i in ['top','bottom','left','right']:
        ax.spines[i].set_visible(False)
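On the question of a for loop versus groupby, the two combine naturally: iterating over df.groupby('Age Group') yields one sub-frame per age group, however many groups a given race has. A minimal sketch, assuming the sample data from the question; labels_by_group is a hypothetical name, and the plotting call is left as a comment:

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "Name": ["John", "Mike", "Travis", "James"],
    "Age Group": ["30-39", "30-39", "40-49", "40-49"],
    "Finish Time": [15.5, 17.2, 20.4, 22.1],
    "Finish Place": [1, 2, 1, 2],
    "Hometown": ["New York City", "Denver", "Louisville", "New York City"],
    "Times Ran The Race": [2, 1, 3, 1],
})

labels_by_group = {}  # hypothetical container: age group -> tick labels
for age_group, tmp in df.groupby("Age Group"):
    # Fastest runner first; barh draws the first row at the bottom.
    tmp = tmp.sort_values("Finish Time")
    labels_by_group[age_group] = [
        f"{name}\n{town}\n{times} Time"
        for name, town, times in zip(
            tmp["Name"], tmp["Hometown"], tmp["Times Ran The Race"]
        )
    ]
    # One chart per group could be drawn here, for example:
    # ax = tmp.plot(kind="barh", x="Name", y="Finish Time", legend=False)
    # ax.set_yticklabels(labels_by_group[age_group])

print(labels_by_group["30-39"])
```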
Here's a fairly compact solution. The only tricky part is the ordinal number, if you really want one. I adapted the lambda from the answers to "Ordinal numbers replacement".
import matplotlib.pyplot as plt
# integer division (//) is required: with n/10 the teens would get the wrong suffix
ordinal = lambda n: "{}{}".format(n,"tsnrhtdd"[(n//10%10!=1)*(n%10<4)*n%10::4])
for i, a in enumerate(df['Age Group'].unique()):
    plt.figure(i)
    dfa = df.loc[df['Age Group'] == a].copy()
    dfa['Info'] = dfa.Name + '\n' + dfa.Hometown + '\n' + \
        [ordinal(row) for row in dfa['Times Ran The Race']] + ' Time'
    plt.barh(dfa.Info, dfa['Finish Time'])
    plt.title(f'{a} Age Group')
    plt.xlabel("Time (Minutes)")
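The slicing trick in the lambda is compact but hard to read, and it depends on integer division. A more explicit version, as a sketch: the name ordinal matches the lambda above, but the implementation is an alternative written for illustration, not the one from the linked answer.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (and 111, 112, ...) take 'th' despite their last digit.
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


for count in (1, 2, 3, 4, 11, 21):
    print(ordinal(count) + " Time")
```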
Does anybody know whether it's possible to switch the colors so that I can distinguish every row instead of every column? And how do I add a legend that shows which player (one color per player) has which value, e.g. which pace?
My code is:
feldspieler = feldspieler["sofifa_id"]
skills = ['pace','shooting','passing','dribbling','defending','physic']
diagramm = plt.figure(figsize=(40,20))
plt.xticks(rotation=90,fontsize=20)
plt.yticks(fontsize=20)
plt.xlabel('Skills', fontsize=30)
plt.ylabel('Skill value', fontsize=30)
plt.title('Spielervergleich', fontsize = 40)
sns.set_palette("pastel")
for i in feldspieler:
    i = fifa_21.loc[fifa_21['sofifa_id'] == i]
    i = pd.DataFrame(i, columns = skills)
    sns.swarmplot(data=i,size=12)
Thanks a lot, @Trevis.
Unfortunately, it still does not work.
Here is a screenshot of the dataset, followed by the code that the plot uses.
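while True:
    # prompt: "Which team are you looking for?"
    team = input("Welches Team suchen Sie?: ")
    if team in fifa_21.values:
        break
    else:
        # message: "This club does not exist. Please check the spelling."
        print("Dieser Verein existiert nicht. Bitte achten Sie auf eine korrekte Schreibweise.")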
gesuchtes_team = fifa_21.loc[(fifa_21['club_name'] == team)]
spieler_verein = gesuchtes_team[["sofifa_id","short_name","nationality","age","player_positions","overall","value_eur"]]
spieler_verein = pd.DataFrame(spieler_verein)
spieler_verein = spieler_verein.reset_index(drop=True)
spieler_verein
feldspieler = spieler_verein.loc[spieler_verein.player_positions != "GK", :]
feldspieler = feldspieler.reset_index(drop=True)
feldspieler
feldspieler = feldspieler["sofifa_id"]
skills = ['pace','shooting','passing','dribbling','defending','physic']
diagramm = plt.figure(figsize=(40,20))
plt.xticks(rotation=90,fontsize=20)
plt.yticks(fontsize=20)
plt.xlabel('Skills', fontsize=30)
plt.ylabel('Skill value', fontsize=30)
plt.title('Spielervergleich', fontsize = 40)
sns.set_palette("pastel")
for i in feldspieler:
    i = fifa_21.loc[fifa_21['sofifa_id'] == i]
    i = pd.DataFrame(i, columns = skills)
    sns.swarmplot(data=fifa_21, x="skills", y="skill_value", hue="sofifa_id")
    #sns.swarmplot(x =skills, y= pd.DataFrame(i, columns == skills) ,hue= "sofifa_id", data=i,size=12)
Set the hue parameter to the column you're interested in (sofifa_id). You can then pass the whole dataset at once, and the legend is added automatically.
For this you need a long-format DataFrame with a 'skills' column holding the skill names that appear on the x-axis. If necessary, see the documentation for pd.melt, in particular the third example.
Then, assuming the default column name (value) for the melted values, call
sns.swarmplot(data=fifa_21, x="skills", y="value", hue="sofifa_id")
This follows the official swarmplot documentation.
Edit: So, seeing your data, you should really use pd.melt like this:
(I'm considering one row per player, with distinct short_name values).
data = pd.melt(fifa_21, id_vars='short_name', var_name='skill',
               value_vars=['pace', 'shooting', 'passing', 'dribbling',
                           'defending', 'physic'])
sns.swarmplot(x='skill', y='value', hue='short_name', data=data)
melt reshapes the data from a wide format
short_name  pace  shooting
a_name      85    92
to a long format
short_name  skill     value
a_name      pace      85
a_name      shooting  92
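The reshaping can be checked on a tiny frame before applying it to the real data. A minimal sketch, assuming two players and two skills; the a_name values come from the tables above, while b_name and its numbers are made up for illustration:

```python
import pandas as pd

# Wide format: one row per player, one column per skill.
wide = pd.DataFrame({
    "short_name": ["a_name", "b_name"],  # b_name is hypothetical
    "pace": [85, 70],
    "shooting": [92, 64],
})

# Long format: one row per (player, skill) pair.
long_format = pd.melt(
    wide,
    id_vars="short_name",
    value_vars=["pace", "shooting"],
    var_name="skill",  # the value column keeps its default name, "value"
)
print(long_format)

# The long frame is what swarmplot expects:
# sns.swarmplot(x="skill", y="value", hue="short_name", data=long_format)
```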
EDITED
I really need help from Networkx/graph experts.
Let us say I have the following data frames and I would like to convert these data frames to graphs. Then I would like to map the two graphs with corresponding nodes based on description and priority attributes.
df1
From description To priority
10 Start 20, 50 1
20 Left 40 2
50 Bottom 40 2
40 End - 1
df2
From description To priority
60 Start 70,80 1
70 Left 80, 90 2
80 Left 100 2
90 Bottom 100 2
100 End - 1
I converted the two data frames into graphs (g1 and g2).
I am now trying to match nodes by description and priority, mapping each node only once: for example 10/60, 40/100 and 50/90, but not 20/70, 20/80 or 70/80. Node 20 has several candidate matches, which is not what I want. A node should be mapped only once; where that is not possible, I would like to keep the candidates as single nodes and mark them red to differentiate them.
Mapping a node only once means the following. Node 10 has priority 1 and description Start in the first graph, and the only node with the same priority and description in the second graph is 60, so the two are mapped. Node 20, however, has priority 2 and description Left, and the second graph has two nodes with priority 2 and description Left: 70 and 80. This is ambiguous. I cannot map 20 twice, as 20/70 and 20/80; instead I would like to keep them as single nodes, as shown in the sample graph below.
I am expecting the following result.
To get the above result, I tried it with the following python code.
mapped_list = []
for node_1, data_1 in g1.nodes(data=True):
    for node_2, data_2 in g2.nodes(data=True):
        if (g1.nodes[node_1]['priority'] == g2.nodes[node_2]['priority'] and
                g1.nodes[node_1]['description'] == g2.nodes[node_2]['description']):
            # check whether the nodes already exist in mapped_list
            if (node_1 in mapped_list) and (node_2 in mapped_list):
                pass
            else:
                name = str(node_1) + '/' + str(node_2)
                mapped_list.append((data_1["priority"], data_1["description"], node_1, name))
                mapped_list.append((data_2["priority"], data_2["description"], node_2, name))
Can anyone help me achieve the result shown in the figure? Any help is appreciated.
The way I'd go about this instead is to build a new graph from the nx.union of both graphs, and then "combine" the nodes that share attributes, such as the start and end nodes, using contracted_nodes.
Let's start by creating both graphs from the dataframes:
import networkx as nx
# split the comma-separated 'To' column into one row per edge
df1 = df1.drop(columns='To').join(df1.To.str.replace(' ','').str.split(',').explode())
df2 = df2.drop(columns='To').join(df2.To.str.replace(' ','').str.split(',').explode())
# the last row (End) has no outgoing edge, so it is left out of the edge list
g1 = nx.from_pandas_edgelist(df1.iloc[:-1,[0,3]].astype(int),
                             source='From', target='To', create_using=nx.DiGraph)
g2 = nx.from_pandas_edgelist(df2.iloc[:-1,[0,3]].astype(int),
                             source='From', target='To', create_using=nx.DiGraph)
# attach description, priority and source graph to every node
df1_node_ix = df1.assign(graph='graph1').set_index('From').rename_axis('nodes')
nx.set_node_attributes(g1, values=df1_node_ix.description.to_dict(),
                       name='description')
nx.set_node_attributes(g1, values=df1_node_ix.priority.to_dict(),
                       name='priority')
nx.set_node_attributes(g1, values=df1_node_ix.graph.to_dict(),
                       name='graph')
df2_node_ix = df2.assign(graph='graph2').set_index('From').rename_axis('nodes')
nx.set_node_attributes(g2, values=df2_node_ix.description.to_dict(),
                       name='description')
nx.set_node_attributes(g2, values=df2_node_ix.priority.to_dict(),
                       name='priority')
nx.set_node_attributes(g2, values=df2_node_ix.graph.to_dict(),
                       name='graph')
Now by taking the nx.union of both graphs, we have:
g3 = nx.union(g1,g2)
import matplotlib.pyplot as plt
from networkx.drawing.nx_agraph import graphviz_layout
plt.figure(figsize=(8,5))
pos=graphviz_layout(g3, prog='dot')
nx.draw(g3, pos=pos,
        with_labels=True,
        node_size=1500,
        node_color='red',
        arrowsize=20)
Next we need a data structure that makes it easy to combine the pairs of nodes that share attributes. We can sort the nodes by their description: itertools.groupby only groups consecutive equal items, so sorting first brings matching nodes together. Each matching pair can then be merged with nx.contracted_nodes, overwriting the previous graph, and the merged node relabeled as specified in the question with nx.relabel_nodes:
from itertools import groupby
g3_node_view = g3.nodes(data=True)
sorted_by_descr = sorted(g3_node_view, key=lambda x: x[1]['description'])
node_colors = dict()
colors = {'Bottom':'saddlebrown', 'Start':'lightblue',
          'Left':'green', 'End':'lightblue'}
all_graphs = {'graph1', 'graph2'}
for _, grouped_by_descr in groupby(sorted_by_descr,
                                   key=lambda x: x[1]['description']):
    for _, group in groupby(grouped_by_descr, key=lambda x: x[1]['priority']):
        grouped_nodes = list(group)
        nodes = [i[0] for i in grouped_nodes]
        graphs = {i[1]['graph'] for i in grouped_nodes}
        # check if there are two nodes that share attributes
        # and both belong to different graphs
        if len(nodes)==2 and graphs==all_graphs:
            # contract both nodes and update graph
            g3 = nx.contracted_nodes(g3, *nodes)
            # define new contracted node name and relabel
            new_node = '/'.join(map(str, nodes))
            g3 = nx.relabel_nodes(g3, {nodes[0]:new_node})
            node_colors[new_node] = colors[grouped_nodes[0][1]['description']]
        else:
            for node in nodes:
                node_colors[node] = 'red'
Which would give:
plt.figure(figsize=(10,7))
pos=graphviz_layout(g3, prog='dot')
# pass plain lists; dict views are not accepted everywhere
nx.draw(g3, pos=pos,
        with_labels=True,
        node_size=2500,
        nodelist=list(node_colors.keys()),
        node_color=list(node_colors.values()),
        arrowsize=20)
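The grouping step can be tried without networkx. A minimal sketch using only the standard library, assuming the node attributes from df1 and df2 in the question; attr_key, pairs and unmatched are hypothetical names. It sorts on the combined (description, priority) key, so a single groupby pass is enough and equal keys are always adjacent:

```python
from itertools import groupby

# (node, attributes) pairs, as g3.nodes(data=True) would yield them.
nodes = [
    (10, {"description": "Start", "priority": 1, "graph": "graph1"}),
    (20, {"description": "Left", "priority": 2, "graph": "graph1"}),
    (50, {"description": "Bottom", "priority": 2, "graph": "graph1"}),
    (40, {"description": "End", "priority": 1, "graph": "graph1"}),
    (60, {"description": "Start", "priority": 1, "graph": "graph2"}),
    (70, {"description": "Left", "priority": 2, "graph": "graph2"}),
    (80, {"description": "Left", "priority": 2, "graph": "graph2"}),
    (90, {"description": "Bottom", "priority": 2, "graph": "graph2"}),
    (100, {"description": "End", "priority": 1, "graph": "graph2"}),
]


def attr_key(item):
    """Sort and group key: nodes match when both attributes are equal."""
    _, data = item
    return (data["description"], data["priority"])


pairs, unmatched = [], []
# groupby only merges consecutive items, hence the sort on the same key.
for _, group in groupby(sorted(nodes, key=attr_key), key=attr_key):
    members = list(group)
    graphs = {data["graph"] for _, data in members}
    if len(members) == 2 and graphs == {"graph1", "graph2"}:
        pairs.append("/".join(str(node) for node, _ in members))
    else:
        # ambiguous or unmatched: keep as single (red) nodes
        unmatched.extend(node for node, _ in members)

print(pairs)
print(unmatched)
```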
I am trying to filter a CSV file so that the prices above 1000 for each commodity are stored in a separate list. I get correct values for only one commodity; the lists for the other commodities are just duplicates of the first one.
The CSV file looks like the figure below:
CODE
import matplotlib.pyplot as plt
import csv
import pandas as pd
import numpy as np
# csv file name
filename = "CommodityPrice.csv"
# List gold price above 1000
gold_price_above_1000 = []
palladiun_price_above_1000 = []
gold_futr_price_above_1000 = []
cocoa_future_price_above_1000 = []
df = pd.read_csv(filename)
commodity = df["Commodity"]
price = df['Price']
for gold_price in price:
    if (gold_price <= 1000):
        break
    else:
        for gold in commodity:
            if ('Gold' == gold):
                gold_price_above_1000.append(gold_price)
                break
for palladiun_price in price:
    if (palladiun_price <= 1000):
        break
    else:
        for palladiun in commodity:
            if ('Palladiun' == palladiun):
                palladiun_price_above_1000.append(palladiun_price)
                break
for gold_futr_price in price:
    if (gold_futr_price <= 1000):
        break
    else:
        for gold_futr in commodity:
            if ('Gold Futr' == gold_futr):
                gold_futr_price_above_1000.append(gold_futr_price)
                break
for cocoa_future_price in price:
    if (cocoa_future_price <= 1000):
        break
    else:
        for cocoa_future in commodity:
            if ('Cocoa Future' == cocoa_future):
                cocoa_future_price_above_1000.append(cocoa_future_price)
                break
print(gold_price_above_1000)
print(palladiun_price_above_1000)
print(gold_futr_price_above_1000)
print(cocoa_future_price_above_1000)
plt.ylim(1000, 3000)
plt.plot(gold_price_above_1000)
plt.plot(palladiun_price_above_1000)
plt.plot(gold_futr_price_above_1000)
plt.plot(cocoa_future_price_above_1000)
plt.title('Commodity Price(>=1000)')
y = np.array(gold_price_above_1000)
plt.ylabel("Price")
plt.show()
print("SUCCESS")
Here is my question in detail:
Use pandas and matplotlib to process the data in the CSV and plot the results as charts. The expected output is shown in the following figures.
Figure 1, upper plot: take all products in the CSV with Price >= 1000 and plot their April and May prices as a line chart. Remove the year from the dates in the output. Each line must be labeled and its label displayed. The chart title, x-axis and y-axis must be labeled. The y-axis range is 1000 to 3000, and the line colors are not specified.
Figure 1, lower plot: for the same products (Price >= 1000), plot their Change % in April and May as a line chart with point markers. The markers must use a style other than '.' and 'o', and the lines must use a style other than solid. Remove the year from the dates in the output. Each line must be labeled and its label displayed. The chart title, x-axis and y-axis must be labeled. Add grid lines. The y-axis range is -15 to +15, and the line colors are not specified.
Figure 2, upper and lower plots: the same as Figure 1 but for 1000 > Price >= 500, except that the markers and line styles in the lower plot must differ from those used in Figure 1.
The two plots of Figure 1 must be displayed in the same window, and likewise the two plots of Figure 2.
All of your blocks of code do exactly the same thing. Changing the name of the loop variable doesn't change what the loop does.
for gold_price in price:
for palladiun_price in price:
for gold_futr_price in price:
for cocoa_future_price in price:
Each of these loops goes through exactly the same data. You haven't subsetted for specific commodities.
Using the break statement in that loop doesn't make sense either: it ends the loop at the first price that is not above 1000, so every later price is skipped. It should be a pass.
Basically, for every price above 1000, you iterate through the entire Commodity column and add that price to the list when you see a specific commodity, regardless of which commodity the price belongs to.
Read up on how to index and select data in pandas:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
gold_price_above_1000 = df[(df.Commodity=='Gold') & (df.Price>1000)]
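The same boolean indexing extends to all commodities at once. A minimal sketch, assuming a small made-up frame with the Commodity and Price columns from the question (the rows and prices are illustrative, not taken from CommodityPrice.csv); prices_by_commodity is a hypothetical name:

```python
import pandas as pd

# Made-up sample rows; the real data would come from pd.read_csv(filename).
df = pd.DataFrame({
    "Commodity": ["Gold", "Gold", "Palladiun", "Cocoa Future", "Gold Futr"],
    "Price": [1700.5, 990.0, 2100.0, 2400.0, 1710.0],
})

# Keep only the rows above 1000, then split them by commodity.
above_1000 = df[df["Price"] > 1000]
prices_by_commodity = {
    name: group["Price"].tolist()
    for name, group in above_1000.groupby("Commodity")
}
print(prices_by_commodity)

# Each list can then be plotted with its own label, for example:
# for name, prices in prices_by_commodity.items():
#     plt.plot(prices, label=name)
```

Because the price filter and the commodity split both operate on rows, each price stays attached to its own commodity, which is what the nested loops in the question lose.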
I've got a dataframe with categorical data.
A1 = ["cat",'apple','red',1,2]
A2 = ['dog','grape','blue',3,4]
A3 = ['rat','grape','gray',5,6]
A4 = ['toad','kiwi','yellow',7,8]
df_MD = pd.DataFrame([A1,A2,A3,A4],columns= ["animal","fruit","color","length","weight"])
animal fruit color length weight
0 cat apple red 1 2
1 dog grape blue 3 4
2 rat grape gray 5 6
3 toad kiwi yellow 7 8
I want to use bokeh serve to eventually create interactive plots.
I implemented this suggestion on how to add listeners:
tgtCol1 = 'animal'
catList = list(np.unique(df_MD[tgtCol1]))
def updatexy(tgtCol1,catList,df_MD):
    '''creates x and y values based on whether the entry is found in catlist'''
    mybool = df_MD[tgtCol1]==catList[0]
    for cc in catList:
        mybool = (mybool) | ( df_MD[tgtCol1]== cc)
    df_MD_mybool = df_MD[mybool].copy()
    x = df_MD_mybool['length'].copy()
    y = df_MD_mybool['weight'].copy()
    return(x,y)
x,y = updatexy(tgtCol1,catList,df_MD)
#create dropdown menu for column selection
menu = [x for x in zip(df_MD.columns,df_MD.columns)]
dropdown = Dropdown(label="select column", button_type="success", menu=menu)
def function_to_call(attr, old, new):
    print(dropdown.value)
dropdown.on_change('value', function_to_call)
dropdown.on_click(function_to_call)
#create buttons for category selection
catList = np.unique(df_MD[tgtCol1].dropna())
checkbox = CheckboxGroup(labels=catList)
def function_to_call2(attr, old, new):
    print(checkbox.value)
checkbox.on_change('value', function_to_call2)
checkbox.on_click(function_to_call2)
#slap everything together
layout = column(dropdown,checkbox)
#add it to the layout
curdoc().add_root(layout)
curdoc().title = "Selection Histogram"
This works ok to create the initial set of menus. But when I try to change the column or select different categories I get an error:
TypeError("function_to_call() missing 2 required positional arguments: 'old' and 'new'",)
So I can't even call the "listener" functions.
Once the listener functions do get called, how do I update my list of checkbox values, as well as x and y?
I can update them within the scope of function_to_call and function_to_call2, but the global values of x, y, tgtCol1 and catList are unchanged!
I couldn't really find any guide or documentation on how listeners work, but after some experimentation I found that my listener functions had the wrong signature: a function passed to on_click receives a single argument. The structure should be as follows:
myWidget = <some widget>(<arguments>)
def function_to_call(d):
    <actions, where d is a string or object corresponding to the widget state>
myWidget.on_click(function_to_call) # add an event listener to myWidget
So, my code turns out to be
dropdown = Dropdown(label="select column", button_type="success", menu=menu)
def function_to_call(d): #only one argument, the selection from the dropdown
    catList = list(np.unique(df_MD[d].dropna()))
    checkbox.labels=catList
    checkbox.active = [] #makes it so nothing is checked
dropdown.on_click(function_to_call)
#create buttons for category selection
checkbox = CheckboxGroup(labels=catList)
def function_to_call2(cb):
    tempindex = [x for x in cb] #cb is some bokeh object, we convert it to a list
    #tgtCol1 is a global variable referring to the currently active column
    #not an ideal solution
    catList = np.unique(df_MD[tgtCol1].dropna())
    if len(tempindex) != 0: catList = list(catList[tempindex])
    x,y = updatexy(tgtCol1,catList,df_MD) #gets new data based on desired column, category set, and dataframe
    s.data = dict(x = x,y = y) #'source' object to update a plot I made, unrelated to this answer, so I'm not showing it
checkbox.on_click(function_to_call2)
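On the globals that stay unchanged: assigning to a name inside a function creates a new local variable, so the module-level x, y, tgtCol1 and catList are never rebound. One possible way around this, sketched here in plain Python without Bokeh, is to keep the shared state in a single mutable dict that both callbacks modify in place. The names state, refresh_xy, on_column_selected and on_categories_checked are hypothetical stand-ins for the widget callbacks, and rows mirrors part of df_MD:

```python
# Part of df_MD from the question, as plain dicts (no pandas or Bokeh needed).
rows = [
    {"animal": "cat", "fruit": "apple", "length": 1, "weight": 2},
    {"animal": "dog", "fruit": "grape", "length": 3, "weight": 4},
    {"animal": "rat", "fruit": "grape", "length": 5, "weight": 6},
    {"animal": "toad", "fruit": "kiwi", "length": 7, "weight": 8},
]

# Shared state: mutated in place, so no `global` statement is needed.
state = {"column": "animal", "categories": [], "x": [], "y": []}


def refresh_xy():
    """Recompute x and y from the current column and category selection."""
    selected = [r for r in rows if r[state["column"]] in state["categories"]]
    state["x"] = [r["length"] for r in selected]
    state["y"] = [r["weight"] for r in selected]


def on_column_selected(column):
    """Stand-in for the dropdown callback: one argument, the chosen column."""
    state["column"] = column
    state["categories"] = sorted({r[column] for r in rows})
    refresh_xy()


def on_categories_checked(active):
    """Stand-in for the checkbox callback: indices of the checked labels."""
    all_categories = sorted({r[state["column"]] for r in rows})
    # As in the answer above, an empty selection means "all categories".
    state["categories"] = [all_categories[i] for i in active] or all_categories
    refresh_xy()


on_column_selected("fruit")
on_categories_checked([1])  # check the second label, "grape"
print(state["x"], state["y"])
```

In the real application, the last step of each callback would push state["x"] and state["y"] into the plot's data source rather than print them.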