Python Matplotlib X-axis label dual axis with dataframe - python

I've got a dual axis bar and line plot using matplotlib. I read the data in as a dataframe,
[WEEK SIGNUPS APPLICATIONS PRECOURSE_WORK QUALIFIED ENROLLED SPEND
2019-10-07 5674 2938 2220 106 2 77581.67
2019-10-14 4538 2225 2309 567 204 61258.08
2019-10-21 3865 1997 1801 121 39 53700.58
2019-10-28 3559 1886 1641 162 39 53543.28
2019-11-04 3782 1946 1980 190 109 49495.64
2019-11-11 4033 2035 1568 118 109 49952.17
2019-11-18 3999 2009 1537 83 77 58545.72
2019-11-25 6170 3322 1660 110 61 52332.4
2019-12-02 5189 2658 7041 73 30 56727.55
2019-12-09 4631 2497 7904 174 116 60977.49
2019-12-16 4935 2501 3492 108 82 68179.54
2019-12-23 5289 2603 1983 80 38 76956.81
2019-12-30 5843 3037 2150 90 80 76246.14
2020-01-06 4194 1930 1619 74 57 46114.68]
My code works and produces a graph (below)
Here is my code
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
from pylab import rcParams
from matplotlib import style
style.use('seaborn-paper')
#print(plt.style.available)
rcParams['figure.figsize'] = 20, 10
#plt.xticks(df[['WEEK']])
ax = df[['SPEND']].plot(kind='bar', color = 'lightblue')
ax.set_ylabel("Spend",color="blue",fontsize=20)
ax.set_xlabel('Weeks',color="blue",fontsize=20)
ax2 = ax.twinx()
ax2.plot(df[['SIGNUPS','APPLICATIONS','ENROLLED']].values, linestyle='-', marker='o', linewidth=4.0)
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)
When I uncomment the line plt.xticks(df[['WEEK']]) I get the following error
ConversionError Failed to convert value(s) to axis unit.
Can anyone help me out?

plt.xticks expects the tick locations first and, optionally, the labels second. From the docs, the signature is
xticks(ticks, [labels], **kwargs)
So when you do
plt.xticks(df[['WEEK']])
it tries to interpret the dates in the 'WEEK' column as the locations for the ticks. What you want instead is the Axes method set_xticklabels, which expects only the labels, i.e.
ax.set_xticklabels(df['WEEK'])
# or
ax.set_xticklabels(df['WEEK'].values)
You may also need to manually convert the values to strings, depending on how they are defined.
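To make the difference between tick locations and tick labels concrete, here is a minimal sketch on the first three rows of the question's data. It assumes the WEEK column holds plain strings and that the bars sit at positions 0..n-1 (which is how pandas lays out a bar plot); the Agg backend is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd

# First three rows of the question's data (WEEK kept as strings here)
df = pd.DataFrame({
    "WEEK": ["2019-10-07", "2019-10-14", "2019-10-21"],
    "SIGNUPS": [5674, 4538, 3865],
    "SPEND": [77581.67, 61258.08, 53700.58],
})

# Bars on the left axis, line on the twin axis
ax = df[["SPEND"]].plot(kind="bar", color="lightblue")
ax2 = ax.twinx()
ax2.plot(df[["SIGNUPS"]].values, linestyle="-", marker="o")
ax.yaxis.set_major_formatter(mtick.StrMethodFormatter("${x:,.0f}"))

# Locations are the bar positions; labels are the week strings
ax.set_xticks(list(range(len(df))))
ax.set_xticklabels(df["WEEK"].astype(str), rotation=45)

labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

An alternative that may be simpler is to let pandas pick the labels with df.plot(x='WEEK', y='SPEND', kind='bar').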

Related

matplotlib bar chart just appear transparent white

I have the following code. It was working fine a month ago, and suddenly I get a strange-looking, white, transparent bar chart.
fig, ax1 = plt.subplots(figsize=(16,6))
ax1.bar(loanapp.date, loanapp.total.rolling(14).mean(), width = 4.4, color = 'tab:blue', label="Rejected applicants")
Why is this happening???
My df looks like
date total accepted
0 2017-11-08 147 30
1 2017-11-09 402 230
2 2017-11-10 529 350
3 2017-11-11 186 106
4 2017-11-12 222 153
...

How can I draw circle on a map?

Here is my dataframe:
Boston
Zipcode Employees Latitude Longitude
0 02021 174 -71.131057 42.228065
1 02026 193 -71.143038 42.237719
3 02109 45 -71.054027 42.363498
4 02110 14 -71.053642 42.357649
5 02111 30 -71.060280 42.350586
6 02113 77 -71.054618 42.365215
8 02115 116 -71.095106 42.343330
10 02118 318 -71.072103 42.339342
11 02119 804 -71.085268 42.323002
12 02120 168 -71.097569 42.332539
13 02121 781 -71.086649 42.305792
15 02124 1938 -71.066702 42.281721
16 02125 859 -71.053049 42.310813
17 02126 882 -71.090424 42.272444
19 02128 786 -71.016037 42.375254
21 02130 886 -71.114080 42.309087
22 02131 1222 -71.121464 42.285216
23 02132 1348 -71.168150 42.280316
24 02134 230 -71.123323 42.355355
25 02135 584 -71.147046 42.357537
26 02136 1712 -71.125550 42.255064
28 02152 119 -70.960324 42.351129
29 02163 1 -71.120420 42.367263
30 02186 361 -71.113223 42.258883
31 02199 4 -71.082279 42.346991
32 02210 35 -71.044281 42.347148
33 02215 83 -71.103877 42.348709
34 02459 27 -71.187563 42.286356
35 02467 66 -71.157691 42.314277
I want to draw circles on my map. Each circle corresponds to one point, and the size of the circle depends on the number of Employees.
Here is my map code (I tried to use Marker, but I think a circle is better):
boston_map=folium.Map([Boston['Longitude'].mean(), Boston['Latitude'].mean()],zoom_start=12)
incidents2=plugins.MarkerCluster().add_to(boston_map)
for Latitude,Longitude,Employees in zip(Boston.Latitude,Boston.Longitude,Boston.Employees):
    folium.Marker(location=[Latitude,Longitude],icon=None,popup=Employees).add_to(incidents2)
boston_map.add_child(incidents2)
boston_map
Here is my map:
If the number of employees could be shown in the circle, that would be even better! Thank you very much!
To draw circles you can use CircleMarker instead of Marker.
BTW: your column names are swapped. Boston is at lat 42.361145, long -71.057083, but your Longitude column holds values around 42 and your Latitude column holds values around -71.
Because I don't use Jupyter, I save the map to an HTML file and use webbrowser to open it automatically in a web browser.
Using Employees directly as the radius created very big circles, so I divide Employees by 10 to make them smaller. Now some circles are very small, and the MarkerCluster shows the number of circles instead of the circles themselves. Maybe math.log() or another method should be used to make the sizes smaller (normalized).
I use tooltip=str(employees) to display the number when you hover over a circle.
text = '''
Zipcode Employees Longitude Latitude
0 02021 174 -71.131057 42.228065
1 02026 193 -71.143038 42.237719
3 02109 45 -71.054027 42.363498
4 02110 14 -71.053642 42.357649
5 02111 30 -71.060280 42.350586
6 02113 77 -71.054618 42.365215
8 02115 116 -71.095106 42.343330
10 02118 318 -71.072103 42.339342
11 02119 804 -71.085268 42.323002
12 02120 168 -71.097569 42.332539
13 02121 781 -71.086649 42.305792
15 02124 1938 -71.066702 42.281721
16 02125 859 -71.053049 42.310813
17 02126 882 -71.090424 42.272444
19 02128 786 -71.016037 42.375254
21 02130 886 -71.114080 42.309087
22 02131 1222 -71.121464 42.285216
23 02132 1348 -71.168150 42.280316
24 02134 230 -71.123323 42.355355
25 02135 584 -71.147046 42.357537
26 02136 1712 -71.125550 42.255064
28 02152 119 -70.960324 42.351129
29 02163 1 -71.120420 42.367263
30 02186 361 -71.113223 42.258883
31 02199 4 -71.082279 42.346991
32 02210 35 -71.044281 42.347148
33 02215 83 -71.103877 42.348709
34 02459 27 -71.187563 42.286356
35 02467 66 -71.157691 42.314277
'''
import pandas as pd
import io
import folium
import folium.plugins
boston = pd.read_csv(io.StringIO(text), sep=r'\s+')
boston_map = folium.Map([boston.Latitude.mean(), boston.Longitude.mean(), ], zoom_start=12)
incidents2 = folium.plugins.MarkerCluster().add_to(boston_map)
for latitude, longitude, employees in zip(boston.Latitude, boston.Longitude, boston.Employees):
    print(latitude, longitude, employees)
    folium.vector_layers.CircleMarker(
        location=[latitude, longitude],
        tooltip=str(employees),
        radius=employees/10,
        color='#3186cc',
        fill=True,
        fill_color='#3186cc'
    ).add_to(incidents2)
boston_map.add_child(incidents2)
# display in web browser
import webbrowser
boston_map.save('map.html')
webbrowser.open('map.html')
EDIT: the answer to the question "how to add a label on each circle in a folium.circile map python" shows how to use Marker with icon=DivIcon(text) to add text, but it doesn't work as I expected.
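The math.log() idea mentioned above can be sketched as a small helper. The function name scaled_radius and its min_radius and scale parameters are hypothetical, chosen only for illustration; the result would be passed as radius= to CircleMarker in place of employees/10.

```python
import math

def scaled_radius(employees, min_radius=3.0, scale=4.0):
    """Hypothetical helper: map an employee count to a marker radius in pixels."""
    # log1p(x) = log(1 + x), so a count of 0 or 1 still gives a usable radius,
    # while large counts are compressed instead of growing linearly
    return min_radius + scale * math.log1p(employees)

# Smallest, a mid-sized, and the largest count from the data above
for count in (1, 174, 1938):
    print(count, round(scaled_radius(count), 1))
```

With this kind of scaling the smallest circles stay visible and the largest ones no longer dominate the map; the two constants would need tuning for a given zoom level.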

Write a pandas DataFrame mixing integers and floats in a csv file

I'm working with pandas DataFrames full of float numbers, but with integers in one of every three lines (the whole line is made of integers). When I print df, all the values are displayed as floats (the integer values have ".000000" appended), for example:
aromatics charged polar unpolar
Ac_obs_counts 712.000000 1486.000000 2688.000000 2792.000000
Ac_obs_freqs 0.092732 0.193540 0.350091 0.363636
Ac_pvalues 0.524752 0.099010 0.356436 0.495050
Am_obs_counts 10.000000 59.000000 62.000000 50.000000
Am_obs_freqs 0.055249 0.325967 0.342541 0.276243
Am_pvalues 0.495050 0.980198 0.356436 0.009901
Ap_obs_counts 18.000000 34.000000 83.000000 78.000000
Ap_obs_freqs 0.084507 0.159624 0.389671 0.366197
Ap_pvalues 0.524752 0.039604 0.980198 0.663366
When I use df.iloc[range(0, len(df.index), 3)], I see integers displayed :
aromatics charged polar unpolar
Ac_obs_counts 712 1486 2688 2792
Am_obs_counts 10 59 62 50
Ap_obs_counts 18 34 83 78
Pa_obs_counts 47 81 125 144
Pf_obs_counts 31 58 99 109
Pg_obs_counts 27 106 102 108
Ph_obs_counts 7 49 42 36
Pp_obs_counts 15 83 45 65
Ps_obs_counts 57 125 170 216
Pu_obs_counts 14 62 102 84
When I use df.to_csv("mydf.csv", sep=",", encoding="utf-8"), the integers are written as floats. How can I force these lines to be written as integers? Would it be better to split the data into two DataFrames?
Thanks in advance.
Simply cast to object dtype:
df.astype('object')
Out[1517]:
aromatics charged polar unpolar
Ac_obs_counts 712 1486 2688 2792
Ac_obs_freqs 0.092732 0.19354 0.350091 0.363636
Ac_pvalues 0.524752 0.09901 0.356436 0.49505
Am_obs_counts 10 59 62 50
Am_obs_freqs 0.055249 0.325967 0.342541 0.276243
Am_pvalues 0.49505 0.980198 0.356436 0.009901
Ap_obs_counts 18 34 83 78
Ap_obs_freqs 0.084507 0.159624 0.389671 0.366197
Ap_pvalues 0.524752 0.039604 0.980198 0.663366
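Casting to object changes how the values are displayed. If the file written by to_csv still contains values such as 712.0, one possible workaround is to format each row as text before writing. The sketch below assumes, based on the question's data, that the rows whose label ends in "_obs_counts" are the integer rows; format_row is a hypothetical helper name.

```python
import io
import pandas as pd

# First three rows and two columns of the question's data
df = pd.DataFrame(
    {"aromatics": [712.0, 0.092732, 0.524752],
     "charged": [1486.0, 0.193540, 0.099010]},
    index=["Ac_obs_counts", "Ac_obs_freqs", "Ac_pvalues"],
)

def format_row(row):
    # Assumption: only rows labelled "*_obs_counts" hold whole numbers
    if row.name.endswith("_obs_counts"):
        return row.map(lambda v: str(int(v)))
    return row.map(lambda v: str(float(v)))

# Every cell becomes a string, so to_csv writes it exactly as formatted
as_text = df.apply(format_row, axis=1)

buffer = io.StringIO()  # stands in for "mydf.csv"
as_text.to_csv(buffer, sep=",")
csv_text = buffer.getvalue()
```

Splitting the data into two DataFrames, as the question suggests, would keep proper numeric dtypes and may be the cleaner choice if the counts and the frequencies are processed separately later.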

Set Xticks frequency to dataframe index

I currently have a dataframe that has as an index the years from 1990 to 2014 (25 rows). I want my plot to have the X axis with all the years showing. I'm using add_subplot as I plan to have 4 plots in this figure (all of them with the same X axis).
To create the dataframe:
import pandas as pd
import numpy as np
index = np.arange(1990,2015,1)
columns = ['Total Population','Urban Population']
pop_plot = pd.DataFrame(index=index, columns=columns)
pop_plot = pop_plot.fillna(0)
pop_plot['Total Population'] = np.arange(150,175,1)
pop_plot['Urban Population'] = np.arange(50,125,3)
Total Population Urban Population
1990 150 50
1991 151 53
1992 152 56
1993 153 59
1994 154 62
1995 155 65
1996 156 68
1997 157 71
1998 158 74
1999 159 77
2000 160 80
2001 161 83
2002 162 86
2003 163 89
2004 164 92
2005 165 95
2006 166 98
2007 167 101
2008 168 104
2009 169 107
2010 170 110
2011 171 113
2012 172 116
2013 173 119
2014 174 122
The code that I currently have:
fig = plt.figure(figsize=(10,5))
ax1 = fig.add_subplot(2,2,1, xticklabels=pop_plot.index)
plt.subplot(2, 2, 1)
plt.plot(pop_plot)
legend = plt.legend(pop_plot, bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand')
legend.get_frame().set_alpha(0)
ax1.set_xticks(range(len(pop_plot.index)))
This is the plot that I get:
When I comment out the set_xticks call, I get the following plot:
#ax1.set_xticks(range(len(pop_plot.index)))
I've tried a couple of answers that I found here, but I didn't have much success.
It's not clear what ax1.set_xticks(range(len(pop_plot.index))) should be used for. It will set the ticks to the numbers 0,1,2,3 etc. while your plot should range from 1990 to 2014.
Instead, you want to set the ticks to the numbers of your data:
ax1.set_xticks(pop_plot.index)
Complete corrected example:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
index = np.arange(1990,2015,1)
columns = ['Total Population','Urban Population']
pop_plot = pd.DataFrame(index=index, columns=columns)
pop_plot['Total Population'] = np.arange(150,175,1)
pop_plot['Urban Population'] = np.arange(50,125,3)
fig = plt.figure(figsize=(10,5))
ax1 = fig.add_subplot(2,2,1)
ax1.plot(pop_plot)
legend = ax1.legend(pop_plot, bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand')
legend.get_frame().set_alpha(0)
ax1.set_xticks(pop_plot.index)
plt.show()
The easiest option is to use the xticks parameter for pandas.DataFrame.plot
Pass the dataframe index to xticks: xticks=pop_plot.index
# given the dataframe in the OP
ax = pop_plot.plot(xticks=pop_plot.index, figsize=(15, 5))
# move the legend
ax.legend(bbox_to_anchor=(0.1, 1, 0.8, .45), loc=3, ncol=1, mode='expand', frameon=False)
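Since the question mentions four plots sharing the same X axis, here is a sketch of how the same idea might extend to a 2x2 grid. It assumes all four panels plot the same dataframe, which is only a placeholder for the real data, and uses the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same dataframe as in the question
years = np.arange(1990, 2015)
pop_plot = pd.DataFrame(
    {"Total Population": np.arange(150, 175),
     "Urban Population": np.arange(50, 125, 3)},
    index=years,
)

# sharex=True ties the x axes of all four panels together
fig, axes = plt.subplots(2, 2, figsize=(12, 6), sharex=True)
for ax in axes.flat:
    pop_plot.plot(ax=ax, legend=False)  # placeholder: same data in every panel
    ax.set_xticks(pop_plot.index)       # one tick per year in the index
    ax.tick_params(axis="x", labelrotation=90)

ticks = axes[0, 0].get_xticks()
```

Rotating the labels is optional, but 25 four-digit years tend to overlap in a narrow subplot.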

Pandas dataframe as input for matplotlib.pyplot.boxplot

I have a pandas dataframe which looks like this:
[('1975801_m', 1 0.203244
10 -0.159756
16 -0.172756
19 -0.089756
20 -0.033756
23 -0.011756
24 0.177244
32 0.138244
35 -0.104756
36 0.157244
40 0.108244
41 0.032244
42 0.063244
45 0.362244
59 -0.093756
62 -0.070756
65 -0.030756
66 -0.100756
73 -0.140756
77 -0.110756
81 -0.100756
84 -0.090756
86 -0.180756
87 0.119244
88 0.709244
102 -0.030756
105 -0.000756
107 -0.010756
109 0.039244
111 0.059244
Name: RTdiff), ('3878418_m', 1637 0.13811
1638 -0.21489
1644 -0.15989
1657 -0.11189
1662 -0.03289
1666 -0.09489
1669 0.03411
1675 -0.00489
1676 0.03511
1677 0.39711
1678 -0.02289
1679 -0.05489
1681 -0.01989
1691 0.14411
1697 -0.10589
1699 0.09411
1705 0.01411
1711 -0.12589
1713 0.04411
1715 0.04411
1716 0.01411
1731 0.06411
1738 -0.25589
1741 -0.21589
1745 0.39411
1746 -0.13589
1747 -0.10589
1748 0.08411
Name: RTdiff)
I would like to use it as input for the matplotlib.pyplot.boxplot function.
The error I get from matplotlib.pyplot.boxplot(mydataframe) is ValueError: cannot set an array element with a sequence
I tried to use list(mydataframe) instead of mydataframe. That fails with the same error.
I also tried matplotlib.pyplot.boxplot(np.fromiter(mydataframe, np.float)) - that fails with ValueError: setting an array element with a sequence.
It's not clear that your data are in a DataFrame. It appears to be a list of Series objects.
Once it's really in a DataFrame, the trick here is to create your figure and axes ahead of time and use the **kwargs that you would normally use with matplotlib.axes.Axes.boxplot. You also need to make sure that your data is a DataFrame and not a Series.
import numpy as np
import matplotlib.pyplot as plt
import pandas
fig, ax = plt.subplots()
df = pandas.DataFrame(np.random.normal(size=(37,5)), columns=list('ABCDE'))
df.boxplot(ax=ax, positions=[2,3,4,6,8], notch=True, bootstrap=5000)
ax.set_xticks(range(10))
ax.set_xticklabels(range(10))
plt.show()
Which gives me:
Failing that, you can take a similar approach, looping through the columns you would like to plot using your ax object directly.
import numpy as np
import matplotlib.pyplot as plt
import pandas
df = pandas.DataFrame(np.random.normal(size=(37,5)), columns=list('ABCDE'))
fig, ax = plt.subplots()
for n, col in enumerate(df.columns):
    ax.boxplot(df[col], positions=[n+1], notch=True)
ax.set_xticks(range(10))
ax.set_xticklabels(range(10))
plt.show()
Which gives:
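If the data really is a list of (name, Series) tuples, as the printout in the question suggests, it can also be passed to boxplot without building a DataFrame first. The sketch below makes that assumption and uses only the first few values of each Series from the question; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Assumed structure: a list of (name, Series) tuples, truncated for brevity
groups = [
    ("1975801_m", pd.Series([0.203244, -0.159756, -0.172756],
                            index=[1, 10, 16], name="RTdiff")),
    ("3878418_m", pd.Series([0.13811, -0.21489, -0.15989, -0.11189],
                            index=[1637, 1638, 1644, 1657], name="RTdiff")),
]

names = [name for name, _ in groups]
# boxplot accepts a list of 1-D arrays, which may differ in length
samples = [series.to_numpy() for _, series in groups]

fig, ax = plt.subplots()
result = ax.boxplot(samples)
ax.set_xticklabels(names)  # one label per box, in the same order

labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

Because the Series have different indexes and lengths, combining them into one DataFrame would introduce NaN values, so passing a list of arrays avoids that step.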
