I want to visualize the number of crimes by state using plotly express.
This is the code:
import plotly.express as px
fig = px.choropleth(grouped, locations="Code",
                    color="Incident",
                    hover_name="Code",
                    animation_frame='Year',
                    scope='usa')
fig.show()
The dataframe itself looks like this:
I only get a blank map:
What is wrong with the code?
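The map has no color because the location mode does not tell Plotly that the codes are US states. By default, px.choropleth treats the values in locations as ISO-3 country codes, so two-letter state codes such as AL match nothing. The graph was produced with locationmode='USA-states' added. You can find an example in the references. The sample data below was created to resemble yours.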
df.head()
Year Code State incident
0 1980 AL Alabama 1445
1 1980 AK Alaska 970
2 1980 AZ Arizona 3092
3 1980 AR Arkansas 1557
4 1980 CA California 1614
import plotly.express as px
# Uses the sample frame shown by df.head() above
fig = px.choropleth(df,
                    locations='Code',
                    locationmode='USA-states',  # interpret 'Code' as two-letter US state codes
                    color='incident',
                    hover_name="Code",
                    animation_frame='Year',
                    scope="usa")
fig.show()
Currently I have built a network using NetworkX from source-target dataframe:
import networkx as nx
G = nx.from_pandas_edgelist(df, source='Person1', target='Person2')
Dataset
Person1 Age Person2 Wedding
0 Adam John 3 Yao Ming Green
1 Mary Abbey 5 Adam Lebron Green
2 Samuel Bradley 24 Mary Lane Orange
3 Lucas Barney 12 Julie Lime Yellow
4 Christopher Rice 0.9 Matt Red Green
I would like to set the size/weights of the links based on the Age column (i.e. age of marriage) and the colour of nodes as in the column Wedding.
I know that, if I wanted to add an edge, I could set it as follows: G.add_edge(Person1,Person2, size = 10); for applying different colours to nodes I should probably use the parameter node_color=color_map, where color_map should be the list of colours in the Wedding column (if I am right).
Could you please explain how to apply these settings to my case?
IIUC:
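import pandas as pd
import networkx as nx
# Copy the table above to the clipboard first; columns are separated by 2+ spaces
df = pd.read_clipboard(sep=r'\s\s+')
# One row per (person, colour): melt Person1 and Person2 into a single 'value' column
collist = df.drop('Age', axis=1).melt('Wedding')
collist
# Keep Age as an edge attribute so it can drive the edge width
G = nx.from_pandas_edgelist(df, source='Person1', target='Person2', edge_attr='Age')
pos=nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, nodelist=collist['value'], node_color=collist['Wedding'])
nx.draw_networkx_edges(G, pos, width = [i['Age'] for i in dict(G.edges).values()])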
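The same idea can be sketched without the clipboard by building the frame inline and collecting the node colours in a plain dictionary. The names node_colour, colours and widths are illustrative, and the sketch assumes each person should simply take the colour of the (last) row they appear in.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    "Person1": ["Adam John", "Mary Abbey", "Samuel Bradley",
                "Lucas Barney", "Christopher Rice"],
    "Age": [3, 5, 24, 12, 0.9],
    "Person2": ["Yao Ming", "Adam Lebron", "Mary Lane", "Julie Lime", "Matt Red"],
    "Wedding": ["Green", "Green", "Orange", "Yellow", "Green"],
})

# Age becomes an attribute on every edge
G = nx.from_pandas_edgelist(df, source="Person1", target="Person2", edge_attr="Age")

# Map every person to the wedding colour of their row (assumption: if a person
# appears in several rows, the last row wins)
node_colour = {}
for _, row in df.iterrows():
    node_colour[row["Person1"]] = row["Wedding"].lower()
    node_colour[row["Person2"]] = row["Wedding"].lower()

# Build both lists in the graph's own node and edge order
colours = [node_colour[n] for n in G.nodes()]
widths = [G[u][v]["Age"] for u, v in G.edges()]

pos = nx.spring_layout(G, seed=0)
nx.draw(G, pos, node_color=colours, width=widths, with_labels=True)
```

Building the lists from G.nodes() and G.edges() keeps colours and widths aligned with the order networkx uses when drawing.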
Output:
Here is some data I will use to demonstrate my question. This is a question branching from one of my old questions found here.
Alright so to start, I am implementing this code:
1) Load in the data and find the date ranges that contain valid (non-missing) data.
import pandas as pd
df = pd.read_excel('Downloads/output.xlsx', index_col='date')
good_ranges = []
for i in df:
    col = df[i]
    gauge_name = col.name
    # A run starts where a value is present but the previous one is missing
    start_mark = (col.notnull() & col.shift().isnull())
    start = col[start_mark].index
    # A run ends where a value is present but the next one is missing
    end_mark = (col.notnull() & col.shift(-1).isnull())
    end = col[end_mark].index
    for s, e in zip(start, end):
        good_ranges.append((gauge_name, s, e))
good_ranges = pd.DataFrame(good_ranges, columns=['gauge', 'start', 'end'])
good_ranges yields:
gauge start end
0 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-02-06 2019-08-27
1 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-08-30 2019-10-01
2 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-10-09 2019-10-19
3 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-10-22 2019-10-22
4 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-10-25 2019-10-25
5 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-10-27 2019-10-31
6 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-11-05 2019-11-29
7 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-12-01 2019-12-02
8 GALISTEO CREEK BELOW GALISTEO DAM, NM 2019-12-04 2019-12-29
9 GALISTEO CREEK BELOW GALISTEO DAM, NM 2020-01-01 2020-01-02
10 GALISTEO CREEK BELOW GALISTEO DAM, NM 2020-01-04 2020-01-17
11 GALISTEO CREEK BELOW GALISTEO DAM, NM 2020-01-19 2020-02-04
12 RIO GRANDE AT OTOWI BRIDGE, NM 2019-02-06 2020-02-04
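To make the start/end detection easier to follow, here is a small sketch on a made-up series (the name "toy gauge" and its values are hypothetical). It applies the same shift-based masks as step 1 to a single column.

```python
import numpy as np
import pandas as pd

# Eight daily readings with two gaps (NaN means missing data)
idx = pd.date_range("2020-01-01", periods=8, freq="D")
col = pd.Series([np.nan, 1.0, 2.0, np.nan, np.nan, 3.0, 4.0, 5.0],
                index=idx, name="toy gauge")

# Start of a run: value present, previous value missing (or no previous value)
start_mark = col.notnull() & col.shift().isnull()
# End of a run: value present, next value missing (or no next value)
end_mark = col.notnull() & col.shift(-1).isnull()

# Pair each start with its matching end
runs = list(zip(col[start_mark].index, col[end_mark].index))
for s, e in runs:
    print(s.date(), "->", e.date())
```

Because shift() fills the first (or last) position with NaN, a series that begins or ends with valid data still gets a start or end mark there.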
2) Create the plot of the data where there is valid data
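import matplotlib.pyplot as plt
import matplotlib.dates as dt
fig, ax = plt.subplots(figsize=(14,8))
ax.xaxis_date()  # returns None, so do not reassign ax
# Draw on the Axes so that ax still refers to it afterwards
ax.hlines(good_ranges['gauge'],
          dt.date2num(good_ranges['start']),
          dt.date2num(good_ranges['end']))
fig.tight_layout()
plt.show()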
3) Find the total number of valid Daily Data Days; specify locations with >350 days of valid data
import numpy as np
good_ranges['Days'] = good_ranges['end'] - good_ranges['start']
good_ranges['Days'] = good_ranges['Days'].dt.days
df_days = good_ranges.filter(['gauge','Days'], axis=1)
df_new = df_days.groupby(df_days['gauge']).sum()
# Flag each gauge: 'NO' for more than 350 days of data, 'YES' otherwise
df_new['Day_Con'] = 'YES'
df_new.loc[df_new['Days'] > 350,'Day_Con'] = 'NO'
# Sort the gauges so that the order lines up with the list of ylabels
df_new = df_new.sort_values("gauge",ascending=False)
print(df_new)
yes_idxs = list(np.where(df_new["Day_Con"] == "YES")[0])
print(yes_idxs)
no_idxs = list(np.where(df_new["Day_Con"] == "NO")[0])
print(no_idxs)
This is what df_new yields after finding gauge sites with >350 days of data:
Days Day_Con
gauge
TESUQUE CREEK ABOVE DIVERSIONS NEAR SANTA FE, NM 310 YES
SANTA FE RIVER NEAR SANTA FE, NM 336 YES
SANTA FE RIVER ABOVE MCCLURE RES, NR SANTA FE, NM 344 YES
SANTA FE RIVER ABOVE COCHITI LAKE, NM 363 NO
SANTA CRUZ RIVER NEAR CUNDIYO, NM 304 YES
RIO TESUQUE BELOW DIVERSIONS NEAR SANTA FE, NM 361 NO
RIO NAMBE BELOW NAMBE FALLS DAM NEAR NAMBE, NM 363 NO
RIO NAMBE ABOVE NAMBE FALLS DAM NEAR NAMBE, NM 267 YES
RIO GRANDE AT OTOWI BRIDGE, NM 363 NO
GALISTEO CREEK BELOW GALISTEO DAM, NM 328 YES
yes_idxs yields:
[0, 1, 2, 4, 7, 9]
no_idxs yields:
[3, 5, 6, 8]
I was thinking I could use yes_idxs and no_idxs to set the ylabel color at the correct label locations. However, the code below only works with a single integer index, not with a list of indices.
ax.get_yticklabels()[1].set_color("red")
In essence, what I want to do is highlight the horizontal lines and/or the ylabel text in red if the values do not meet the criterion of > 350 days. At the moment, I cannot find an easy way to go about this.
Thank you in advance for the help!
Tick labels are Text Artists and have a color property. You can get a list of the labels from the plot's Axes and use your two lists of indices to change their color.
Toy plot with 10 x-axis ticks/labels:
xs = [0,1,2,3,4,5,6,7,8,9]
ys = [n*2 for n in xs]
fig,ax = plt.subplots()
ax.plot(xs,ys)
ax.set_xticks(xs)
ax.set_xticklabels(xs)
Iterate over the x-axis tick labels and set their color based on a condition.
#Assuming the `yes` and `no` lists are indices
yes = [0, 1, 2, 4, 7, 9]
no = [3, 5, 6, 8]
tlabels = ax.get_xmajorticklabels()
for i,tl in enumerate(tlabels):
    if i in yes:
        tl.set_color('r')
    else:
        tl.set_color('b')
#plt.show()
You can also get to the labels through the tick.
# tick.label1 is the primary label; the older tick.label alias is deprecated
for i,tick in enumerate(ax.xaxis.get_major_ticks()):
    if i in yes:
        tick.label1.set_color('r')
    else:
        tick.label1.set_color('b')
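Applied to the y-axis of a horizontal-line plot, the same approach might look like the sketch below. The gauge names, line extents and flags are made up, and it assumes, as in the question, that 'YES' marks the gauges that should be highlighted in red.

```python
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the gauge names and the Day_Con column
gauges = ["Gauge A", "Gauge B", "Gauge C", "Gauge D"]
day_con = ["YES", "NO", "YES", "NO"]
starts = [0, 1, 2, 0]
ends = [3, 4, 5, 6]

fig, ax = plt.subplots()
positions = list(range(len(gauges)))

# Colour each line by its flag, and place one tick per gauge
line_colours = ["red" if flag == "YES" else "black" for flag in day_con]
ax.hlines(positions, starts, ends, colors=line_colours)
ax.set_yticks(positions)
ax.set_yticklabels(gauges)

# Tick labels come back in the same order as the ticks, so zip them with the flags
for label, flag in zip(ax.get_yticklabels(), day_con):
    label.set_color("red" if flag == "YES" else "black")

label_colours = [label.get_color() for label in ax.get_yticklabels()]
```

Zipping the labels with the flag column avoids keeping separate index lists, provided the flags are sorted in the same order as the tick labels.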
Ticks and tick labels
More Ticks and ticklabels
I need to plot a histogram of the data below: the sum of Quantity per country.
Country Quantity
0 United Kingdom 4263829
1 Netherlands 200128
2 EIRE 142637
3 Germany 117448
4 France 110480
5 Australia 83653
6 Sweden 35637
7 Switzerland 30325
8 Spain 26824
9 Japan 25218
So far I have tried this, but I am unable to specify the axes myself:
df.plot(x='Country', y='Quantity', kind='hist', bins=10)
Try a bar plot instead of a histogram:
df.plot.bar(x='Country', y='Quantity')
Try this:
import matplotlib.pyplot as plt
plt.bar(df['Country'],df['Quantity'])
plt.show()
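A histogram bins the values of a single column, whereas here each country already has one total, so a bar chart with one bar per country is the closer fit. A minimal sketch using the first four rows of the data above (the variable name heights is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["United Kingdom", "Netherlands", "EIRE", "Germany"],
    "Quantity": [4263829, 200128, 142637, 117448],
})

# One bar per row: Country on the x-axis, Quantity as the bar height
ax = df.plot.bar(x="Country", y="Quantity", legend=False)
ax.set_ylabel("Quantity")

heights = [patch.get_height() for patch in ax.patches]
```

Because df.plot.bar returns the Matplotlib Axes, labels and limits can be set on ax afterwards.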
I have this data for 2007, with population in millions and GDP in billions; the index column is country.
continent year lifeExpectancy population gdpPerCapita GDP Billions
country
China Asia 2007 72.961 1318.6831 4959.11485 6539.50093
India Asia 2007 64.698 1110.39633 2452.21041 2722.92544
United States Americas 2007 78.242 301.139947 42951.6531 12934.4585
Indonesia Asia 2007 70.65 223.547 3540.65156 791.502035
Brazil Americas 2007 72.39 190.010647 9065.80083 1722.59868
Pakistan Asia 2007 65.483 169.270617 2605.94758 441.110355
Bangladesh Asia 2007 64.062 150.448339 1391.25379 209.311822
Nigeria Africa 2007 46.859 135.031164 2013.97731 271.9497
Japan Asia 2007 82.603 127.467972 31656.0681 4035.1348
Mexico Americas 2007 76.195 108.700891 11977.575 1301.97307
I am trying to plot a histogram like the following:
This was plotted using matplotlib (code below), and I want to get the same result with the df.plot method.
The code for plotting with matplotlib:
ax = data.plot(y=[3],kind = "bar")
data.plot(y = [3,5],kind = "bar",secondary_y = True,ax = ax,style='g:', figsize = (24, 6))
plt.show()
You could use df.plot() with the columns you need on the y axis and pass the second column as the secondary_y argument:
data[['population','gdpPerCapita']].plot(kind='bar', secondary_y='gdpPerCapita')
If you want to set the y labels for each side, then you have to get all the axes of the plot (in this case, two y axes) and set the labels respectively.
ax1, ax2 = plt.gcf().get_axes()
ax1.set_ylabel('Population')
ax2.set_ylabel('GDP')
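A self-contained sketch of the same approach, using the first three rows of the table above. Instead of plt.gcf().get_axes(), it relies on the right_ax attribute that pandas attaches to the returned Axes when secondary_y is used; the label texts are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "population": [1318.6831, 1110.39633, 301.139947],
        "gdpPerCapita": [4959.11485, 2452.21041, 42951.6531],
    },
    index=pd.Index(["China", "India", "United States"], name="country"),
)

# population uses the left y-axis, gdpPerCapita the right one
ax = data[["population", "gdpPerCapita"]].plot(kind="bar", secondary_y="gdpPerCapita")

ax.set_ylabel("Population (millions)")
ax.right_ax.set_ylabel("GDP per capita")
```

Holding on to the returned ax avoids depending on which figure happens to be current.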
Output: