How to make a bar plot after converting the month column in Python?

I have a dataframe like this. The month column is of type string.
I want to make a bar plot from 201501 to 201505 with month on the x axis and total_gmv on the y axis. The x labels should be formatted like "Jan, 2015", "Feb, 2015". How can I do this in Python? Thanks.
month total_gmv
201501 NaN
201502 2.824294e+09
201503 7.742665e+09
201504 2.024132e+10
201505 6.705012e+10

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(
{'month': ['201501', '201502', '201503', '201504', '201505'],
'total_gmv': [np.nan, 2.824294e+09, 7.742665e+09, 2.024132e+10, 6.705012e+10]})
df['month'] = pd.to_datetime(df['month'], format='%Y%m').dt.month
df = df.set_index('month')
print(df)
df.plot(kind='bar')
plt.show()
Result:
total_gmv
month
1 NaN
2 2.824294e+09
3 7.742665e+09
4 2.024132e+10
5 6.705012e+10

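You should be able to convert month to a timestamp, set it as the index, and plot it.
df['month'] = pd.to_datetime(df['month'], format='%Y%m')
ax = df.set_index('month').plot(kind='bar')
You will probably also have to change the date format. Pandas bar plots place the bars at integer positions, so a matplotlib DateFormatter may not produce the expected labels here; setting the tick labels from formatted strings is more reliable.
ax.set_xticklabels(df['month'].dt.strftime('%b, %Y'))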

The previous replies contain some clues, but they do not give a complete answer.
You have to set custom xtick labels and rotate them, like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(
{'month': ['201501', '201502', '201503', '201504', '201505'],
'total_gmv': [np.nan, 2.824294e+09, 7.742665e+09, 2.024132e+10, 6.705012e+10]})
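# parse strictly; errors='ignore' would leave unparseable strings in place and break .dt below
df['month'] = pd.to_datetime(df['month'], format='%Y%m')
# plot only total_gmv so the datetime column is not drawn as a series
ax = df.plot(kind='bar', y='total_gmv')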
ax.set_xticklabels(df['month'].dt.strftime('%b, %Y'))
plt.xticks(rotation=0)
plt.show()

You can use matplotlib.pyplot together with the calendar module.
import matplotlib.pyplot as plt
import calendar
# change the numeric representation to text (201501 -> Jan,2015)
df['month_name'] = [calendar.month_abbr[int(date[4:])] + ',' + date[:4] for date in df['month']]
# use integer bar positions so the bars are evenly spaced
x = range(len(df))
y = df['total_gmv']
plt.bar(x, y, align='center')
# label each bar with its month name; xticks is a function and must be called
plt.xticks(x, df['month_name'])
plt.show()
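The label conversion that the answers above rely on can also be checked in isolation. The following is a minimal sketch using only the standard library; the helper name month_label is made up for illustration, and the %b abbreviations assume the default English (C) locale.

```python
from datetime import datetime


def month_label(yyyymm: str) -> str:
    """Turn a 'YYYYMM' string such as '201501' into a label like 'Jan, 2015'."""
    return datetime.strptime(yyyymm, "%Y%m").strftime("%b, %Y")


months = ["201501", "201502", "201503", "201504", "201505"]
labels = [month_label(m) for m in months]
print(labels)
```

The resulting list can be passed to ax.set_xticklabels(labels) on a pandas bar plot, or to plt.xticks(range(len(labels)), labels) when plotting with plt.bar.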

Related

How to use time as x axis for a scatterplot with seaborn?

I have a simple dataframe with the time as index and dummy values as an example.
I did a simple scatter plot.
Simple question: how do I adjust the x axis so that all time values from 00:00 to 23:00 are visible on it? The rest of the plot is fine; it shows all the data points, and only the labeling is off. I tried different things, but they didn't work out.
All my code so far is:
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates
from datetime import time
data = []
for i in range(0, 24):
    temp_list = []
    temp_list.append(time(i))
    temp_list.append(i)
    data.append(temp_list)
my_df = pd.DataFrame(data, columns=["time", "values"])
my_df.set_index(['time'],inplace=True)
my_df
fig = sns.scatterplot(my_df.index, my_df['values'])
fig.set(xlabel='time', ylabel='values')
I think you're gonna have to go down to the matplotlib level for this:
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates
from datetime import time
import matplotlib.pyplot as plt
data = []
for i in range(0, 24):
    temp_list = []
    temp_list.append(time(i))
    temp_list.append(i)
    data.append(temp_list)
df = pd.DataFrame(data, columns=["time", "values"])
# datetime.time objects are converted via their string form ('HH:MM:SS')
df.time = pd.to_datetime(df.time.astype(str), format='%H:%M:%S')
df.set_index(['time'],inplace=True)
# recent seaborn versions require x and y as keyword arguments
ax = sns.scatterplot(x=df.index, y=df["values"])
ax.set(xlabel="time", ylabel="measured values")
ax.set_xlim(df.index[0], df.index[-1])
ax.xaxis.set_major_locator(mdates.HourLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
ax.tick_params(axis="x", rotation=45)
This produces a scatter plot with one tick per hour on the x axis.
I think you have 2 options:
Convert the time to the hour only. For that, just extract the hour into a new column in your df
df['hour_'] = df['time'].dt.hour  # assumes 'time' is a datetime64 column
then use it as your x axis.
If you need the time in the format you described, it may cause a visibility problem in which the timestamps overlap each other. I'm using
plt.xticks(rotation=45, horizontalalignment='right')
ax.xaxis.set_major_locator(plt.MaxNLocator(12))
so first I rotate the text, then I limit the number of ticks.
Here is a full script where I used it:
sns.set()
sns.set_style("whitegrid")
sns.axes_style("whitegrid")
for k, g in df_forPlots.groupby('your_column'):
    fig = plt.figure(figsize=(10,5))
    wide_df = g[['x', 'y', 'z']]
    # set_index returns a new frame, which avoids modifying a slice in place
    wide_df = wide_df.set_index('x')
    ax = sns.lineplot(data=wide_df)
    plt.xticks(rotation=45,
               horizontalalignment='right')
    ax.yaxis.set_major_locator(plt.MaxNLocator(14))
    ax.xaxis.set_major_locator(plt.MaxNLocator(35))
    plt.title(f"your {k} in something{g.z.unique()}")
    plt.tight_layout()
Hope this helped.
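The first option (plotting against the hour and labelling every tick) can be sketched with plain matplotlib. This is an illustrative example, not the answerer's script: it uses the same dummy data as the question, skips seaborn, and selects the Agg backend only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
from datetime import time

# same dummy data as the question: one value per full hour
times = [time(h) for h in range(24)]
values = list(range(24))

# plot against the integer hour so that every hour can get its own tick
hours = [t.hour for t in times]
fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(hours, values)
ax.set_xticks(hours)
# label each tick as HH:MM and rotate the text so the labels do not overlap
ax.set_xticklabels([t.strftime("%H:%M") for t in times], rotation=45, ha="right")
ax.set(xlabel="time", ylabel="values")
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards to view the result.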

How to set datetime xlim in seaborn

I have a dataframe:
df = pd.DataFrame({"max_cr_date":{"0":1569115380000,"1":1569115500000,"2":1569115560000,"3":1569115620000,"4":1569115680000,"5":1569115740000,"6":1569115800000,"7":1569115860000,"8":1569115920000,"9":1569115980000,"10":1569116040000,"11":1569116100000,"12":1569116160000,"13":1569116220000,"14":1569130800000,"15":1569130800000,"16":1569130800000,"17":1569130800000,"18":1569130860000,"19":1569130860000,"20":1569130860000,"21":1569130860000,"22":1569131100000,"23":1569131100000,"24":1569131160000,"25":1569131160000,"26":1569131220000,"27":1569131220000,"28":1569131280000,"29":1569131280000,"30":1569131340000,"31":1569131340000,"32":1569131400000,"33":1569131400000,"34":1569131460000,"35":1569131460000,"36":1569131520000,"37":1569131520000,"38":1569131580000,"39":1569131580000,"40":1569131640000,"41":1569131640000,"42":1569131700000,"43":1569131700000},"cnt":{"0":14,"1":14,"2":14,"3":14,"4":14,"5":14,"6":14,"7":14,"8":14,"9":14,"10":14,"11":14,"12":14,"13":14,"14":11,"15":12,"16":13,"17":14,"18":11,"19":12,"20":13,"21":14,"22":11,"23":12,"24":11,"25":12,"26":11,"27":12,"28":11,"29":12,"30":11,"31":12,"32":11,"33":12,"34":11,"35":12,"36":11,"37":12,"38":11,"39":12,"40":11,"41":12,"42":11,"43":12},"uuid":{"0":80,"1":66,"2":70,"3":80,"4":72,"5":110,"6":358,"7":123,"8":110,"9":123,"10":96,"11":89,"12":83,"13":58,"14":7,"15":28,"16":9,"17":5,"18":129,"19":116,"20":266,"21":87,"22":57,"23":86,"24":99,"25":36,"26":89,"27":30,"28":88,"29":18,"30":75,"31":26,"32":94,"33":29,"34":81,"35":32,"36":64,"37":19,"38":74,"39":26,"40":77,"41":17,"42":51,"43":21}})
df.max_cr_date = pd.to_datetime(df.max_cr_date, unit='ms')
df
df.max_cr_date.agg(['min', 'max'])
min 2019-09-22 01:23:00
max 2019-09-22 05:55:00
Name: max_cr_date, dtype: datetime64[ns]
When I try to plot the dataframe using seaborn, I get the wrong xlim. For example, the max_cr_date range is from 2019-09-22 01:23:00 to 2019-09-22 05:55:00, but on the graph you can see the years 2000, 2004...
How to set xlim to min/max of the max_cr_date column?
Regards.
You can do it this way:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates
df.max_cr_date = pd.to_datetime(df.max_cr_date, unit='ms')
ax = sns.scatterplot(data=df, x="max_cr_date", y="uuid", hue='cnt', palette="vlag")
ax.set_xlim(df['max_cr_date'].min(), df['max_cr_date'].max())
myFmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(myFmt)
for item in ax.get_xticklabels():
    item.set_rotation(45)
plt.show()
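To confirm that the limits really match the data, the same idea can be reduced to a small sketch. It uses four of the timestamps from the question and plain matplotlib instead of seaborn (an assumption made to keep the example short); matplotlib stores date limits as days, so the span can be converted to minutes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# a few of the epoch-millisecond values from the question
df = pd.DataFrame({
    "max_cr_date": [1569115380000, 1569115800000, 1569130860000, 1569131700000],
    "uuid": [80, 358, 129, 21],
})
df["max_cr_date"] = pd.to_datetime(df["max_cr_date"], unit="ms")

fig, ax = plt.subplots()
ax.scatter(df["max_cr_date"], df["uuid"])
# pin the x limits to the observed time range
ax.set_xlim(df["max_cr_date"].min(), df["max_cr_date"].max())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.autofmt_xdate(rotation=45)

lo, hi = ax.get_xlim()
span_minutes = (hi - lo) * 24 * 60  # matplotlib date numbers are measured in days
print(round(span_minutes))
```

The printed span should correspond to the distance between 01:23 and 05:55 on the same day.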

How to format x axis time data from 'Month-Day Hour' (mm-dd HH) to just 'Hour' in matplotlib? Or how to place Hour data on the minor axis? [duplicate]

I have a series whose index is datetime that I wish to plot. I want to plot the values of the series on the y axis and the index of the series on the x axis. The Series looks as follows:
2014-01-01 7
2014-02-01 8
2014-03-01 9
2014-04-01 8
...
I generate a graph using plt.plot(series.index, series.values). But the graph looks like:
The problem is that I would like to have only year and month (yyyy-mm or 2016 March). However, the graph contains hours, minutes and seconds. How can I remove them so that I get my desired formatting?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# sample data
N = 30
drange = pd.date_range("2014-01", periods=N, freq="MS")
np.random.seed(365) # for a reproducible example of values
values = {'values':np.random.randint(1,20,size=N)}
df = pd.DataFrame(values, index=drange)
fig, ax = plt.subplots()
ax.plot(df.index, df.values)
ax.set_xticks(df.index)
# use formatters to specify major and minor ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
_ = plt.xticks(rotation=90)
You can try something like this:
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
# 36 month-start dates from 2014-01 through 2016-12
df = pd.DataFrame({'values':np.random.randint(0,1000,36)},index=pd.date_range(start='2014-01-01',end='2016-12-31',freq='MS'))
fig,ax1 = plt.subplots()
ax1.plot(df.index,df.values)
monthyearFmt = mdates.DateFormatter('%Y %B')
ax1.xaxis.set_major_formatter(monthyearFmt)
_ = plt.xticks(rotation=90)
You should check out this built-in matplotlib method, which rotates and aligns the date tick labels:
fig.autofmt_xdate()
See the "Custom tick formatter" example in the matplotlib documentation.
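The pieces from these answers (a locator, a DateFormatter and autofmt_xdate) can be combined as in the sketch below. The data is made up for illustration, and the choice of one tick every three months is an assumption, not something the question asks for.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# 30 month-start dates beginning in January 2014, with made-up values
dates = [datetime(2014 + i // 12, i % 12 + 1, 1) for i in range(30)]
values = [(7 * i) % 20 for i in range(30)]

fig, ax = plt.subplots()
ax.plot(dates, values)
# one major tick every third month, labelled as year-month only
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
formatter = mdates.DateFormatter("%Y-%m")
ax.xaxis.set_major_formatter(formatter)
fig.autofmt_xdate(rotation=90)

# the formatter can be called directly to see what a given date turns into
sample_label = formatter(mdates.date2num(datetime(2014, 3, 1)))
print(sample_label)
```

Switching the format string to "%Y %B" gives labels in the "2016 March" style mentioned in the question.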

Plot the x-axis as a date

I am trying to perform some analysis on data. I got a csv file and converted it into a pandas dataframe. It has several columns, but I am trying to draw the x-axis from the date column.
The pandas dataframe looks like this:
print(df.head(10))
cus-id date value_limit
0 10173 2011-06-12 455
1 95062 2011-09-11 455
2 171081 2011-07-05 212
3 122867 2011-08-18 123
4 107186 2011-11-23 334
5 171085 2011-09-02 376
6 169767 2011-07-03 34
7 80170 2011-03-23 34
8 154178 2011-10-02 34
9 3494 2011-01-01 34
I am trying to plot by date because there are multiple values for the same date. For this purpose I am trying to use the dates as x-axis ticks. The minimum date in the date column is 2011-01-01 and the maximum date is 2012-04-20.
I tried something like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
import matplotlib.dates as mdates
df = pd.read_csv('rio_data.csv', delimiter=',')
print (df.head(10))
d = []
for dat in df.date:
    # print (dat)
    d.append(datetime.strptime(df['date'], '%Y-%m-%d'))
days = dates.DayLocator()
datemin = datetime(2011, 1, 1)
datemax = datetime(2012, 4, 20)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.xaxis.set_major_locator(days)
ax.set_xlim(datemin, datemax)
ax.set_ylabel('Count values')
But I am getting this error.
AttributeError: 'DataFrame' object has no attribute 'date'
I am trying to draw date as x-axis, it should look like this.
Can someone help me to draw the x-axis as date column. I would be grateful.
Set the index as a datetime dtype
If you set the index to the datetime series by converting the dates with pd.to_datetime(...), matplotlib will handle the x axis for you.
Here is a minimal example of how you might deal with this visualization.
Plot directly with pandas.DataFrame.plot, which uses matplotlib as the default backend.
Simple example:
import pandas as pd
import matplotlib.pyplot as plt
date_time = ["2011-09-01", "2011-08-01", "2011-07-01", "2011-06-01", "2011-05-01"]
# convert the list of strings to a datetime and .date will remove the time component
date_time = pd.to_datetime(date_time).date
temp = [2, 4, 6, 4, 6]
DF = pd.DataFrame({'temp': temp}, index=date_time)
ax = DF.plot(x_compat=True, rot=90, figsize=(6, 5))
This will yield a plot that looks like the following:
Setting the index makes things easier
The important note is that setting the DataFrame index to the datetime series allows matplotlib to deal with x axis on time series data without much help.
Follow this link for detailed explanation on spacing axis ticks (specifically dates)
You missed a ' on line 12, which causes the SyntaxError.
This should correct the error.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# import the datetime class itself so strptime and datetime(...) work
from datetime import datetime
import matplotlib.dates as mdates
df = pd.read_csv('rio_data.csv', delimiter=',')
print(df.head(10))
d = []
for dat in df['date']:
    # parse each date string individually
    d.append(datetime.strptime(dat, '%Y-%m-%d'))
# the module was imported as mdates, not dates
days = mdates.DayLocator()
datemin = datetime(2011, 1, 1)
datemax = datetime(2012, 4, 20)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.xaxis.set_major_locator(days)
ax.set_xlim(datemin, datemax)
ax.set_ylabel('Count values')
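Because the question mentions several values per date, one possible approach is to aggregate per day before plotting. The sketch below is only an illustration under stated assumptions: the CSV text is inlined instead of reading rio_data.csv, the first four rows come from the question, the last row (cus-id 99999) is invented to show a repeated date, and summing is just one of several reasonable aggregations.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# inline stand-in for the CSV file; the last row is hypothetical
csv_text = """cus-id,date,value_limit
10173,2011-06-12,455
95062,2011-09-11,455
171081,2011-07-05,212
122867,2011-08-18,123
99999,2011-06-12,100
"""

df = pd.read_csv(io.StringIO(csv_text))
# stray whitespace in header names is a common cause of missing-column errors
df.columns = df.columns.str.strip()
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# one value per day: sum all rows that share a date, in chronological order
daily = df.groupby("date")["value_limit"].sum().sort_index()

fig, ax = plt.subplots()
ax.plot(daily.index, daily.values, marker="o")
ax.set_ylabel("value_limit per day")
fig.autofmt_xdate()
```

With a datetime index, matplotlib chooses the date ticks itself; a locator such as mdates.MonthLocator() can be added if a fixed spacing is wanted.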

How do I change the year interval on a Pandas DataFrame area plot?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dts
def use_matplot():
    ax = df.plot(x='year', kind="area" )
    years = dts.YearLocator(20)
    ax.xaxis.set_major_locator(years)
    fig = ax.get_figure()
    fig.savefig('output.pdf')
dates = np.arange(1990,2061, 1)
dates = dates.astype('str').astype('datetime64')
df = pd.DataFrame(np.random.randint(0, dates.size, size=(dates.size,3)), columns=list('ABC'))
df['year'] = dates
cols = df.columns.tolist()
cols = [cols[-1]] + cols[:-1]
df = df[cols]
use_matplot()
In the above code, I get an error, "ValueError: year 0 is out of range" when trying to set the YearLocator so as to ensure the X-Axis has year labels for every 20th year. By default the plot has the years show up every 10 years. What am I doing wrong? Desired outcome is simply a plot with 1990, 2010, 2030, 2050 on the bottom. (Instead of default 1990, 2000, 2010, etc.)
Since the years are simple numbers, you may opt for not using them as dates at all and keeping them as numbers.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dates = np.arange(1990,2061, 1)
df = pd.DataFrame(np.random.randint(0,dates.size,size=(dates.size,3)),columns=list('ABC'))
df['year'] = dates
cols = df.columns.tolist()
cols = [cols[-1]] + cols[:-1]
df = df[cols]
ax = df.plot(x='year', kind="area" )
# start at 1990 to get the desired ticks 1990, 2010, 2030, 2050
ax.set_xticks(range(1990,2061,20))
plt.show()
Apart from that, using Matplotlib locators and formatters on date axes created via pandas will most often fail. This is due to pandas using a completely different datetime convention. In order to have more freedom for setting custom tickers for datetime axes, you may use matplotlib. A stackplot can be plotted with plt.stackplot. On such a matplotlib plot, the use of the usual matplotlib tickers is unproblematic.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dts
dates = np.arange(1990,2061, 1)
df = pd.DataFrame(np.random.randint(0,dates.size,size=(dates.size,3)),columns=list('ABC'))
df['year'] = pd.to_datetime(dates.astype(str))
cols = df.columns.tolist()
cols = [cols[-1]] + cols[:-1]
df = df[cols]
plt.stackplot(df["year"].values, df[list('ABC')].values.T)
years = dts.YearLocator(20)
plt.gca().xaxis.set_major_locator(years)
plt.margins(x=0)
plt.show()
Consider using set_xticklabels to specify values of x axis tick marks:
ax.set_xticklabels(sum([[i,''] for i in range(1990, 2060, 20)], []))
# [1990, '', 2010, '', 2030, '', 2050, '']
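Putting the numeric-year approach together with a stacked area plot, a minimal sketch could look like this. It uses plain matplotlib with made-up random data and an arbitrary seed, and the tick positions are set explicitly rather than through a date locator.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# years are kept as plain integers, so no date conversion is involved
years = np.arange(1990, 2061)
rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
data = rng.integers(0, years.size, size=(3, years.size))

fig, ax = plt.subplots()
ax.stackplot(years, data, labels=list("ABC"))
# explicit tick positions: every 20th year starting at 1990
ax.set_xticks(range(1990, 2061, 20))
ax.margins(x=0)
ax.legend(loc="upper left")

ticks = [int(t) for t in ax.get_xticks()]
print(ticks)
```

Setting the positions with set_xticks keeps the labels tied to the data, whereas set_xticklabels alone only renames whatever ticks already exist.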
