matplotlib bar chart just appears transparent white - python

I have the following code. It was working fine a month ago, but now I suddenly get a strange-looking, transparent white bar chart.
fig, ax1 = plt.subplots(figsize=(16,6))
ax1.bar(loanapp.date, loanapp.total.rolling(14).mean(), width = 4.4, color = 'tab:blue', label="Rejected applicants")
Why is this happening?
My df looks like
date total accepted
0 2017-11-08 147 30
1 2017-11-09 402 230
2 2017-11-10 529 350
3 2017-11-11 186 106
4 2017-11-12 222 153
...

Related

How to plot Numerical Values in matplotlib

So I have this kind of database:
Time Type Profit
2 82 s/l -51.3
5 9 t/p 164.32
8 38 s/l -53.19
11 82 s/l -54.4
14 107 s/l -54.53
.. ... ... ...
730 111 s/l -70.72
731 111 s/l -70.72
732 111 s/l -70.72
733 113 s/l -65.13
734 113 s/l -65.13
[239 rows x 3 columns]
I want to plot a chart with time on the X axis (already converted to hour of the week) and profit on the Y axis (which can be positive or negative). For each hour on X, I would like two bars: one for profits and one for losses. Losses should be drawn as positive heights too, but in a separate bar.
For example, -65 and 70 would show as 65 and 70 on the chart, with the loss bar in a different color.
This is my code so far:
#reading the csv file
data = pd.read_csv(filename)
df = pd.DataFrame(data, columns = ['Time','Type','Profit']).astype(str)
#turns time column into hours of week
df['Time'] = df['Time'].apply(lambda x: findHourOfWeek(x))
#Takes in winning trades (t/p) and losing trades(s/l)
df = df[(df['Type'] == 't/p') | (df['Type'] == 's/l')]
#Plots the chart
ax = df.plot(title='Profits and Losses (Hour Of Week)',kind='bar')
#ax.legend(['Losses', 'Winners'])
plt.xlabel('Hour of Week')
plt.ylabel('Amount Of Profit/Loss')
plt.show()
You can groupby, unstack and plot:
(df.groupby(['Time','Type']).Profit.sum().abs()
.unstack('Type')
.plot.bar()
)
For your sample data above, the output is:
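As a rough illustration of the intermediate table that chain builds before plotting, here is a minimal sketch on a made-up frame. Only the column names come from the question; the values are invented. It assumes Profit is numeric, so the question's .astype(str) would have to be dropped (or undone with pd.to_numeric) first.

```python
import pandas as pd

# Hypothetical sample with the same column names as the question.
df = pd.DataFrame({
    "Time": [82, 9, 38, 82, 9],
    "Type": ["s/l", "t/p", "s/l", "s/l", "t/p"],
    "Profit": [-51.3, 164.32, -53.19, -54.4, 10.0],
})

# Sum per (hour, trade type), take absolute values,
# then spread the trade types into separate columns.
table = df.groupby(["Time", "Type"]).Profit.sum().abs().unstack("Type")
print(table)

# table.plot.bar() would then draw one bar per column for each hour.
```

Hours that have only one trade type end up with NaN in the other column, which pandas simply leaves out when drawing the bars.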

Matplotlib is printing the line plot twice/multiple times

What could be the problem if Matplotlib draws a line plot twice or multiple times, like this one:
Here is my code:
import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
from scipy import integrate
def compute_integrated_spectral_response_ikonos(file, sheet):
    df = pd.read_excel(file, sheet_name=sheet, header=2)
    blue = integrate.cumtrapz(df['Blue'], df['Wavelength'])
    green = integrate.cumtrapz(df['Green'], df['Wavelength'])
    red = integrate.cumtrapz(df['Red'], df['Wavelength'])
    nir = integrate.cumtrapz(df['NIR'], df['Wavelength'])
    pan = integrate.cumtrapz(df['Pan'], df['Wavelength'])
    plt.figure(num=None, figsize=(6, 4), dpi=80, facecolor='w', edgecolor='k')
    plt.plot(df[1:], blue, label='Blue', color='darkblue')
    plt.plot(df[1:], green, label='Green', color='b')
    plt.plot(df[1:], red, label='Red', color='g')
    plt.plot(df[1:], nir, label='NIR', color='r')
    plt.plot(df[1:], pan, label='Pan', color='darkred')
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    plt.xlabel('Wavelength (nm)')
    plt.ylabel('Spectral Response (%)')
    plt.title(f'Integrated Spectral Response of {sheet} Bands')
    plt.show()
compute_integrated_spectral_response_ikonos('Sorted Wavelengths.xlsx', 'IKONOS')
Here is my dataset.
This is because plotting df[1:] is plotting the entire dataframe as the x-axis.
>>> df[1:]
Wavelength Blue Green Red NIR Pan
1 355 0.001463 0.000800 0.000504 0.000532 0.000619
2 360 0.000866 0.000729 0.000391 0.000674 0.000361
3 365 0.000731 0.000806 0.000597 0.000847 0.000244
4 370 0.000717 0.000577 0.000328 0.000729 0.000435
5 375 0.001251 0.000842 0.000847 0.000906 0.000914
.. ... ... ... ... ... ...
133 1015 0.002601 0.002100 0.001752 0.002007 0.149330
134 1020 0.001602 0.002040 0.002341 0.001793 0.136372
135 1025 0.001946 0.002218 0.001260 0.002754 0.118682
136 1030 0.002417 0.001376 0.000898 0.000000 0.103634
137 1035 0.001300 0.001602 0.000000 0.000000 0.089097
[137 rows x 6 columns]
The slice [1:] just gives the dataframe without the first row. Altering each instance of df[1:] to df['Wavelength'][1:] gives us what I presume is the expected output:
>>> df['Wavelength'][1:]
1 355
2 360
3 365
4 370
5 375
133 1015
134 1020
135 1025
136 1030
137 1035
Name: Wavelength, Length: 137, dtype: int64
Output:
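To make the shape difference concrete, here is a small sketch with invented numbers (only the column names follow the question). The cumulative integral is computed with a plain NumPy stand-in rather than the SciPy call from the question, so treat it as an approximation of that step, not the asker's exact code.

```python
import numpy as np
import pandas as pd

# Made-up values; only the column names follow the question.
df = pd.DataFrame({
    "Wavelength": [350, 355, 360, 365],
    "Blue": [0.0, 0.2, 0.4, 0.2],
})

x = df["Wavelength"].to_numpy()
y = df["Blue"].to_numpy()

# NumPy stand-in for the cumulative trapezoidal integral:
# one value per interval, so one fewer element than the input.
blue = np.cumsum(np.diff(x) * (y[1:] + y[:-1]) / 2)

x_axis = df["Wavelength"][1:]   # a single column: one x value per integrated point
print(len(blue), len(x_axis))   # both are len(df) - 1
print(df[1:].shape)             # the whole frame: every column would be used as x
```

Because df[1:] is two-dimensional, plt.plot treats each of its columns as a separate x series, which is where the repeated lines come from.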

Python Matplotlib X-axis label dual axis with dataframe

I've got a dual axis bar and line plot using matplotlib. I read the data in as a dataframe,
[WEEK SIGNUPS APPLICATIONS PRECOURSE_WORK QUALIFIED ENROLLED SPEND
2019-10-07 5674 2938 2220 106 2 77581.67
2019-10-14 4538 2225 2309 567 204 61258.08
2019-10-21 3865 1997 1801 121 39 53700.58
2019-10-28 3559 1886 1641 162 39 53543.28
2019-11-04 3782 1946 1980 190 109 49495.64
2019-11-11 4033 2035 1568 118 109 49952.17
2019-11-18 3999 2009 1537 83 77 58545.72
2019-11-25 6170 3322 1660 110 61 52332.4
2019-12-02 5189 2658 7041 73 30 56727.55
2019-12-09 4631 2497 7904 174 116 60977.49
2019-12-16 4935 2501 3492 108 82 68179.54
2019-12-23 5289 2603 1983 80 38 76956.81
2019-12-30 5843 3037 2150 90 80 76246.14
2020-01-06 4194 1930 1619 74 57 46114.68]
My code works and produces a graph (below)
Here is my code
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
from pylab import rcParams
from matplotlib import style
style.use('seaborn-paper')
#print(plt.style.available)
rcParams['figure.figsize'] = 20, 10
#plt.xticks(df[['WEEK']])
ax = df[['SPEND']].plot(kind='bar', color = 'lightblue')
ax.set_ylabel("Spend",color="blue",fontsize=20)
ax.set_xlabel('Weeks',color="blue",fontsize=20)
ax2 = ax.twinx()
ax2.plot(df[['SIGNUPS','APPLICATIONS','ENROLLED']].values, linestyle='-', marker='o', linewidth=4.0)
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)
When I uncomment the line plt.xticks(df[['WEEK']]) I get the following error
ConversionError Failed to convert value(s) to axis unit.
Can anyone help me out?
plt.xticks expects the tick locations and, optionally, the labels. From the docs, the signature is
xticks(ticks, [labels], **kwargs)
So when you do
plt.xticks(df[['WEEK']])
it tries to interpret the dates in the 'WEEK' column as the tick locations. What you want instead is set_xticklabels, which takes only the labels. It is a method of the Axes object (there is no plt.set_xticklabels), so call it on the ax returned by the plot, i.e.
ax.set_xticklabels(df['WEEK'])
# or
ax.set_xticklabels(df['WEEK'].values)
You may also need to manually convert the values to strings, depending on how they are defined.
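A minimal sketch of relabelling the ticks on the Axes that pandas returns, using three rows shaped like the question's data. set_xticklabels is an Axes method, so it is called on ax here. The Agg backend and the explicit string conversion are assumptions made so the snippet runs without a display and regardless of how WEEK is stored.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Three rows shaped like the question's data.
df = pd.DataFrame({
    "WEEK": ["2019-10-07", "2019-10-14", "2019-10-21"],
    "SPEND": [77581.67, 61258.08, 53700.58],
})

ax = df[["SPEND"]].plot(kind="bar", color="lightblue")

# The bars sit at positions 0, 1, 2, ...; only the label text is replaced here.
ax.set_xticklabels(df["WEEK"].astype(str).tolist(), rotation=45)

ax.figure.canvas.draw()  # make sure the tick texts are up to date before reading them
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

The tick positions stay where pandas put them, so a line drawn on a twinx axis against 0, 1, 2, ... still lines up with the bars.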

X-Axis scales not matching with 2 data sets on same plot

I have 2 datasets that I'm trying to plot on the same figure. They share a common column that I'm using for the X-axis, however one of my sets of data is collected annually and the other monthly so the number of data points in each set is significantly different.
Pyplot is not plotting the X values for each set where I would expect when I plot both sets on the same graph
When I plot just my annually collected data set I get:
When I plot just my monthly collected data set I get:
But when I plot the two sets overlaid (code below) I get:
tframe:
10003 Date
0 257 201401
1 216 201402
2 417 201403
3 568 201404
4 768 201405
5 836 201406
6 798 201407
7 809 201408
8 839 201409
9 796 201410
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 201301
0 5380 ... 201401
1 5320 ... 201501
3 5030 ... 201601
So I did as wwii suggested in the comments and converted my Date columns to datetime objects:
tframe:
10003 Date
0 257 2014-01-31
1 216 2014-02-28
2 417 2014-03-31
3 568 2014-04-30
4 768 2014-05-31
5 836 2014-06-30
6 798 2014-07-31
7 809 2014-08-31
8 839 2014-09-30
9 796 2014-10-31
tax_for_zip_data:
TAX BRACKET $1 under $25,000 ... Date
2 5740 ... 2013-01-31
0 5380 ... 2014-01-31
1 5320 ... 2015-01-31
3 5030 ... 2016-01-31
But the dates are still plotted with an offset:
None of my data goes back to 2012; January 2013 is the earliest. The tax_for_zip_data points are all offset by a year. If I plot just that set alone, it plots properly.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
tframe.plot(kind = 'line',x = 'Date', y = "10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
tax_for_zip_data.plot(kind = 'line', x = 'Date', y = tax_for_zip_data.columns[:-1], ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
If you can make the DataFrame index a datetime index, plotting is easier.
import io
import pandas as pd
import matplotlib.pyplot as plt
# columns are separated by two or more spaces, matching the delimiter below
s = '''10003  Date
257  201401
216  201402
417  201403
568  201404
768  201405
836  201406
798  201407
809  201408
839  201409
796  201410
'''
df1 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
df1.index = pd.to_datetime(df1['Date'], format='%Y%m')
s = '''TAX BRACKET  $1 under $25,000  Date
2  5740  201301
0  5380  201401
1  5320  201501
3  5030  201601
'''
df2 = pd.read_csv(io.StringIO(s), delimiter=r'\s{2,}', engine='python')
df2.index = pd.to_datetime(df2['Date'], format='%Y%m')
You don't need to specify an argument for plot's x parameter.
fig, ax1 = plt.subplots(sharex = True)
color = "tab:red"
ax1.set_xlabel('Date')
ax1.set_ylabel('Trips', color = color)
df1.plot(kind = 'line',y="10003", ax = ax1, color = color)
ax1.tick_params(axis = 'y', labelcolor = color)
ax2 = ax1.twinx()
color = "tab:blue"
ax2.set_ylabel('Num Returns', color = color)
df2.plot(kind = 'line', y='$1 under $25,000', ax = ax2)
ax2.tick_params(axis = 'y', labelcolor = color)
plt.show()
plt.close()
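The reason a datetime index helps is that both frames then live on the same time axis. A small sketch of that idea, with made-up values in the question's YYYYMM layout (the variable names are illustrative):

```python
import pandas as pd

# Made-up values in the question's YYYYMM layout.
monthly = pd.Series([201401, 201402, 201403])
annual = pd.Series([201301, 201401, 201501])

# Converting via str keeps the parse explicit; each value becomes
# the first day of that month.
monthly_idx = pd.DatetimeIndex(pd.to_datetime(monthly.astype(str), format="%Y%m"))
annual_idx = pd.DatetimeIndex(pd.to_datetime(annual.astype(str), format="%Y%m"))

print(monthly_idx[0], annual_idx[1])          # the same timestamp for 201401
print(annual_idx.intersection(monthly_idx))   # dates the two series share
```

Since 201401 parses to the same timestamp in both series, a monthly point and an annual point for that date land at the same x position when plotted on shared axes.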

python plotly create a color scale related to max and min number of value

This is my code; I am trying to show the car speed at each point.
import plotly.plotly as py
from plotly.graph_objs import *
mapbox_access_token = 'MAPBOX API KEY'
data = Data([
    Scattermapbox(
        lat=dataframe_monday_morning['latitude'],
        lon=dataframe_monday_morning['longitude'],
        mode='markers',
        marker=Marker(
            size=5,
            color=dataframe_monday_morning['speed'],
            colorscale='YlOrRd',
            #opacity=0.3,
            symbol='circle',
        ),
    )
])
layout = Layout(
    autosize=True,
    hovermode='closest',
    width=1300,
    # Margin comes from the star import above; `go` is never imported
    margin=Margin(
        l=0,
        r=0,
        b=0,
        t=0
    ),
    height=700,
    mapbox=dict(
        accesstoken=mapbox_access_token,
        bearing=0,
        center=dict(
            lat=-36.7526,
            lon=174.7274
        ),
        pitch=0,
        zoom=16.2,
        style='dark',
    ),
)
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='Multiple Mapbox')
but when I use
color =dataframe_monday_morning['speed']
the code takes the min and max speed from the current data and colours the graph from those. Some gaps between speed values in the data are very large, so I would like to define the colour scale for my speed values myself.
(E.g. if the max speed is 200 km/h, then 30 km/h and 90 km/h get similar colours, even though they are really quite different speeds.)
My question is: how can I create a scale for choosing a colour for speed?
EDIT:
This is an example of data I edited.
13 1.464301e+10 2015-11-15 18:28:50 191 10051 76 -36.817540 174.750526
14 1.464298e+10 2015-11-15 18:27:20 209 10051 48 -36.806104 174.759209
15 1.464180e+10 2015-11-15 17:41:27 171 10051 0 -36.718553 174.713503
16 1.464186e+10 2015-11-15 17:43:44 172 10051 25 -36.720747 174.713897
17 1.464238e+10 2015-11-15 18:05:36 137 10051 5 -36.753691 174.728945
18 1.464199e+10 2015-11-15 17:49:22 170 10051 0 -36.728252 174.715084
19 1.464279e+10 2015-11-15 18:20:41 153 10051 20 -36.787389 174.752337
20 1.464229e+10 2015-11-15 18:01:47 146 10051 16 -36.749369 174.724865
21 1.464298e+10 2015-11-15 18:27:39 216 10051 51 -36.807940 174.757603
22 1.464254e+10 2015-11-15 18:11:35 162 10051 36 -36.765195 174.739728
23 1.464301e+10 2015-11-15 18:28:37 197 10051 66 -36.815369 174.751177
Not 100% sure if I got the question right but you could try adding
showscale=True,
cmax=200,
cmin=0
to your Marker object to get a graph with a colorbar with fixed max and min values.
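To see why fixing cmin and cmax changes how far apart two speeds look, here is a rough sketch in plain Python. The helper scale_position is hypothetical, not part of Plotly. It assumes the colour is picked linearly between cmin and cmax, and that values outside that range take the end colours. The marker settings are written as plain dicts with the same keys as above.

```python
def scale_position(value, cmin, cmax):
    """Fraction along the colorscale, assuming a linear map clipped to [cmin, cmax]."""
    frac = (value - cmin) / (cmax - cmin)
    return min(max(frac, 0.0), 1.0)

speeds = [0, 30, 90, 200]  # illustrative speeds in km/h

# Same marker settings, differing only in the fixed upper bound.
marker_wide = dict(color=speeds, colorscale="YlOrRd", showscale=True, cmin=0, cmax=200)
marker_tight = dict(color=speeds, colorscale="YlOrRd", showscale=True, cmin=0, cmax=100)

for marker in (marker_wide, marker_tight):
    positions = [scale_position(s, marker["cmin"], marker["cmax"]) for s in speeds]
    print(marker["cmax"], positions)
```

With the lower cmax, 30 km/h and 90 km/h sit much further apart on the scale, at the cost of everything above 100 km/h sharing the top colour.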
