Cannot prepare proper labels in Matplotlib - python

I have very simple code:
from matplotlib import dates
import matplotlib.ticker as ticker
my_plot=df_h.boxplot(by='Day',figsize=(12,5), showfliers=False, rot=90)
This is what I get:
but I would like to have fewer labels on the X axis. To do this I've added:
my_plot.xaxis.set_major_locator(ticker.MaxNLocator(12))
It generates fewer labels, but the labels show the wrong values (just the first few labels from the whole list).
What am I doing wrong?
Additional information:
I forgot to show what is inside the DataFrame.
I have three columns:
reg_Date - datetime64 (index)
temperature - float64
Day - date converted from reg_Date to string, it looks like '2017-10' (YYYY-MM)
The box plot groups the data by 'Day', and I would like to show the 'Day' values as labels, but not all of them: for example, every third one.

You were almost there. Just set ticker.MultipleLocator.
pandas.DataFrame.boxplot also returns the axes, an object of class matplotlib.axes.Axes, so you can use this code snippet to customize your labels:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
center = np.random.randint(50,size=(10, 20))
spread = np.random.rand(10, 20) * 30
flier_high = np.random.rand(10, 20) * 30 + 30
flier_low = np.random.rand(10, 20) * -30
y = np.concatenate((spread, center, flier_high, flier_low))
fig, ax = plt.subplots(figsize=(10, 5))
ax.boxplot(y)
x = ['Label '+str(i) for i in range(20)]
ax.set_xticklabels(x)
ax.set_xlabel('Day')
# Set a tick on each integer multiple of a base within the view interval.
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
plt.xticks(rotation=90)
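One caveat with the snippet above, as far as I can tell: set_xticklabels stores a fixed list of labels that is handed out to the ticks in order, so once a locator moves the ticks, the labels may no longer match their boxes. A minimal sketch of one way around this is to compute each label from the tick position instead. The helper name label_for_position is hypothetical, and it assumes the boxes sit at positions 1..N (the default for boxplot).

```python
def label_for_position(pos, labels):
    """Return the label of the box drawn at 1-based position `pos`.

    Positions that do not correspond to a box (0, fractions, N+1, ...)
    get an empty label.
    """
    index = int(round(pos)) - 1  # boxes are assumed to sit at 1, 2, ..., len(labels)
    if abs(pos - round(pos)) > 1e-9 or not 0 <= index < len(labels):
        return ''
    return labels[index]


labels = ['Label ' + str(i) for i in range(20)]
# Ticks placed by MultipleLocator(5) would fall at 0, 5, 10, 15, 20
print([label_for_position(p, labels) for p in (0, 5, 10, 15, 20)])
```

With Matplotlib this could be hooked in with ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda pos, _: label_for_position(pos, x))) in place of ax.set_xticklabels(x).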

I think there is a compatibility issue with Pandas plots and Matplotlib formatters.
With the following code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('lt_stream-1001-full.csv', header=0, encoding='utf8')
df['reg_date'] = pd.to_datetime(df['reg_date'] , format='%Y-%m-%d %H:%M:%S')
df.set_index('reg_date', inplace=True)
df_h = df.resample(rule='H').mean()
df_h['Day']=df_h.index.strftime('%Y-%m')
print(df_h)
f, ax = plt.subplots()
my_plot = df_h.boxplot(by='Day',figsize=(12,5), showfliers=False, rot=90, ax=ax)
locs, labels = plt.xticks()
i = 0
new_labels = list()
for l in labels:
    if i % 3 == 0:
        label = labels[i]
        i += 1
        new_labels.append(label)
    else:
        label = ''
        i += 1
        new_labels.append(label)
ax.set_xticklabels(new_labels)
plt.show()
You get this chart:
But I notice that this is grouped by month instead of by day. It may not be what you wanted.
Adding the day component to the string 'Day' messes up the chart, as there seem to be too many boxes.
df = pd.read_csv('lt_stream-1001-full.csv', header=0, encoding='utf8')
df['reg_date'] = pd.to_datetime(df['reg_date'] , format='%Y-%m-%d %H:%M:%S')
df.set_index('reg_date', inplace=True)
df_h = df.resample(rule='H').mean()
df_h['Day']=df_h.index.strftime('%Y-%m-%d')
print(df_h)
f, ax = plt.subplots()
my_plot = df_h.boxplot(by='Day',figsize=(12,5), showfliers=False, rot=90, ax=ax)
locs, labels = plt.xticks()
i = 0
new_labels = list()
for l in labels:
    if i % 15 == 0:
        label = labels[i]
        i += 1
        new_labels.append(label)
    else:
        label = ''
        i += 1
        new_labels.append(label)
ax.set_xticklabels(new_labels)
plt.show()
The for loop keeps one tick label every N periods and blanks out the rest. In the first chart the labels were set every 3 months; in the second one, every 15 days.
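The same loop can be written more compactly. This is only a sketch on plain strings; thin_labels is a hypothetical helper name, and the month strings are made up for illustration.

```python
def thin_labels(labels, step):
    """Keep every `step`-th label and blank out the others, preserving the length."""
    return [label if i % step == 0 else '' for i, label in enumerate(labels)]


months = ['2017-%02d' % m for m in range(1, 13)]
print(thin_labels(months, 3))
```

The labels returned by plt.xticks() are Text objects, so one option is to pass [t.get_text() for t in labels] to the helper and hand the result to ax.set_xticklabels.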
If you would like to see fewer grid lines:
df = pd.read_csv('lt_stream-1001-full.csv', header=0, encoding='utf8')
df['reg_date'] = pd.to_datetime(df['reg_date'] , format='%Y-%m-%d %H:%M:%S')
df.set_index('reg_date', inplace=True)
df_h = df.resample(rule='H').mean()
df_h['Day']=df_h.index.strftime('%Y-%m-%d')
print(df_h)
f, ax = plt.subplots()
my_plot = df_h.boxplot(by='Day',figsize=(12,5), showfliers=False, rot=90, ax=ax)
locs, labels = plt.xticks()
i = 0
new_labels = list()
new_locs = list()
for l in labels:
    if i % 3 == 0:
        label = labels[i]
        loc = locs[i]
        i += 1
        new_labels.append(label)
        new_locs.append(loc)
    else:
        i += 1
ax.set_xticks(new_locs)
ax.set_xticklabels(new_labels)
ax.grid(axis='y')
plt.show()
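Dropping ticks and labels together, as in the last variant, comes down to slicing both lists with the same step so the pairs stay matched. A small sketch with made-up positions and date strings (every_nth_tick is a hypothetical name):

```python
def every_nth_tick(locs, labels, step):
    """Return the locations and labels of every `step`-th tick, kept in matching pairs."""
    if len(locs) != len(labels):
        raise ValueError('locs and labels must have the same length')
    return list(locs[::step]), list(labels[::step])


locs = list(range(1, 11))
labels = ['2017-10-%02d' % d for d in range(1, 11)]
new_locs, new_labels = every_nth_tick(locs, labels, 3)
print(new_locs)
print(new_labels)
```

The two results would then go to ax.set_xticks(new_locs) and ax.set_xticklabels(new_labels), in that order.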
I've read about x_compat in Pandas plot in order to apply Matplotlib formatters, but I get an error when trying to apply it. I'll give it another shot later.
Old unsuccessful answer
The tick labels seem to be dates. If they are set as datetime in your dataframe, you can:
months = mdates.MonthLocator(bymonth=(1, 4, 7, 10)) #Choose the months you like the most
ax.xaxis.set_major_locator(months)
Otherwise, you can let Matplotlib know they are dates by:
ax.xaxis_date()
Your comment:
Additional information:
I forgot to show what is inside the DataFrame.
I have three columns:
reg_Date - datetime64 (index)
temperature - float64
Day - date converted from reg_Date to string, it looks like '2017-10' (YYYY-MM)
The box plot groups the data by 'Day', and I would like to show the 'Day' values as labels, but not all of them: for example, every third one.
Based on your comment in italic above, I would use reg_Date as the input and the following lines:
days = mdates.DayLocator(interval=3)
daysFmt = mdates.DateFormatter('%Y-%m') #to format display
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(daysFmt)
I forgot to mention that you will need to:
import matplotlib.dates as mdates
Does this work?

Related

Matplotlib graph problems

I'm trying to display data from a weather station with Matplotlib. For some reason that I can't quite figure out, the last values on the graph behave erratically, going back in time on the x axis.
x axis is the dates,
y axis is the water level
y1 axis is the discharge flow
Here's a picture of the result
Graph
import pandas as pd
import matplotlib.pyplot as plt
url_hourly = "https://dd.weather.gc.ca/hydrometric/csv/BC/hourly/BC_08MG005_hourly_hydrometric.csv"
url_daily = "https://dd.weather.gc.ca/hydrometric/csv/BC/daily/BC_08MG005_daily_hydrometric.csv"
fields = ["Date","Water Level / Niveau d'eau (m)", "Discharge / Débit (cms)"]
#Read csv files
hourly_data = pd.read_csv(url_hourly, usecols=fields)
day_data = pd.read_csv(url_daily, usecols=fields)
#Merge csv files
water_data = pd.concat([day_data,hourly_data])
#Convert date to datetime
water_data['Date'] = pd.to_datetime(water_data['Date']).dt.normalize()
water_data['Date'] = water_data['Date'].dt.strftime('%m/%d/%Y')
# CSV files contains 288 data entries per day (12per hour * 24hrs). Selecting every 288th element to represent one day
data_24hr = water_data[::288]
# Assigning columns to x, y, y1 axis
x = data_24hr[fields[0]]
y1 = data_24hr[fields[1]]
y2= data_24hr[fields[2]]
#Ploting the graph
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
curve1 = ax1.plot(x,y1, label='Water Level', color = 'r', marker="o")
curve2 = ax2.plot(x,y2,label='Discharge Volume', color = 'b',marker="o")
plt.plot()
plt.show()
Any tips would be greatly appreciated as I'm quite new to this
thank you
Okay, I went through the code and removed the duplicates in the "Date" column (as suggested by Arne). I also made the graph formatting slightly more readable. This plotted without going back in time:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker
url_hourly = "https://dd.weather.gc.ca/hydrometric/csv/BC/hourly/BC_08MG005_hourly_hydrometric.csv"
url_daily = "https://dd.weather.gc.ca/hydrometric/csv/BC/daily/BC_08MG005_daily_hydrometric.csv"
fields = ["Date","Water Level / Niveau d'eau (m)", "Discharge / Débit (cms)"]
#Read csv files
hourly_data = pd.read_csv(url_hourly, usecols=fields)
day_data = pd.read_csv(url_daily, usecols=fields)
#Merge csv files
water_data = pd.concat([day_data,hourly_data])
#Convert date to datetime
water_data['Date'] = pd.to_datetime(water_data['Date']).dt.normalize()
water_data['Date'] = water_data['Date'].dt.strftime('%m/%d/%Y')
# CSV files contains 288 data entries per day (12per hour * 24hrs). Selecting every 288th element to represent one day
data_24hr = water_data.iloc[::288]
# Remove duplicates according to the date column; reassigning avoids modifying a slice in place
data_24hr = data_24hr.drop_duplicates(subset="Date")
# Assigning columns to x, y, y1 axis
x = data_24hr[fields[0]]
y1 = data_24hr[fields[1]]
y2= data_24hr[fields[2]]
print(len(x), len(y1))
#Ploting the graph
fig, ax1 = plt.subplots()
ax2 = plt.twinx()
curve1 = ax1.plot(x, y1, label='Water Level', color = 'r', marker="o")
curve2 = ax2.plot(x, y2, label='Discharge Volume', color = 'b',marker="o")
fig.autofmt_xdate(rotation=90)
plt.show()
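The answer above attributes the backward jumps to dates that occur more than once. As far as I understand, when the x values are strings Matplotlib treats each distinct string as a category, so a repeated date sends the line back to an earlier position. The sketch below shows the deduplication idea on made-up readings using only the standard library; first_reading_per_day is a hypothetical helper, not part of pandas or Matplotlib.

```python
from datetime import datetime


def first_reading_per_day(rows):
    """Keep the first (timestamp, value) pair seen for each calendar day, in date order."""
    seen = {}
    for stamp, value in rows:
        day = datetime.fromisoformat(stamp).date()
        seen.setdefault(day, value)  # later readings for the same day are ignored
    return sorted(seen.items())


rows = [
    ('2021-03-01 00:00:00', 1.20),
    ('2021-03-02 00:00:00', 1.25),
    ('2021-03-02 00:05:00', 1.26),  # same day again: would revisit an earlier x position
    ('2021-03-03 00:00:00', 1.31),
]
for day, level in first_reading_per_day(rows):
    print(day, level)
```

Keeping the x values as date objects rather than formatted strings should also let Matplotlib use a real date axis instead of a categorical one.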

How to plot a large dataset of date vs time using Matplotlib

I want to plot a date vs time graph using Matplotlib. The issue I am facing is that, due to the excess of data, many lines show up on the x axis, and I can't find a way to plot my times on the x axis cleanly with a one-hour gap. Say I have the data in my list as strings, such as ['6:01','6:30','7:20','7:25']. I want to divide my x axis from 6:00 to 7:00, and the time points between them should be plotted according to their time.
Note: the time list is just an example; I want to do this for the whole 24 hours.
I tried to use ticks and many other options to complete my task, but unfortunately I am stuck on this problem. My data is in a CSV file.
Below is my code:
def arrivalGraph():
    from datetime import datetime, timedelta
    from matplotlib import pyplot as plt
    from matplotlib import dates as mpl_dates
    with open("Timetable2021.csv","r") as f:
        fileData = f.readlines()
    del fileData[0]
    date = []
    train1 = []
    for data in fileData:
        ind = data.split(",")
        date.append(datetime.strptime(ind[0],"%d/%m/%Y").date())
        train1Time = datetime.strptime(ind[1],"%H:%M").time()
        train1.append(train1Time.strftime("%H:%M"))
    plt.style.use("seaborn")
    plt.figure(figsize = (10,10))
    plt.plot_date(train1,date)
    plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
    dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
    plt.gca().xaxis.set_major_formatter(dateformater) # to format the xaxis
    plt.xlabel("Date")
    plt.ylabel("Time")
    plt.title("Train Time vs Date Schedule")
    plt.tight_layout()
    plt.show()
When I run the code I get the following output:
Assuming that every single minute is present in train1 (i.e. train1 = ["00:00", "00:01", "00:02", "00:03", ... , "23:59"]), you can use plt.xticks() by generating an array of x tick labels that holds an empty string for every minute that is not on the hour.
unique_times = sorted(set(train1))
xticks = ['' if time[-2:]!='00' else time for time in unique_times]
plt.style.use("seaborn")
plt.figure(figsize = (10,10))
plt.plot_date(train1,date)
plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
# I think you wanted to format the yaxis instead of xaxis
plt.gca().yaxis.set_major_formatter(dateformater) # to format the yaxis
plt.ylabel("Date")
plt.xlabel("Time")
plt.title("Train Time vs Date Schedule")
plt.xticks(range(len(xticks)), xticks)
plt.tight_layout()
plt.show()
If not every minute is present in the train1 array, you have to keep the train1 data as datetime objects and generate arrays representing the xticks locations and values to be used as plt.xticks() parameters.
date = []
train1 = []
for data in fileData:
    ind = data.split(",")
    date.append(datetime.strptime(ind[0],"%d/%m/%Y").date())
    train1Time = datetime.strptime(ind[1],"%H:%M")
    train1.append(train1Time)
plt.style.use("seaborn")
plt.figure(figsize = (10,10))
plt.plot_date(train1,date)
plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
# I think you wanted to format the y axis instead of xaxis
plt.gca().yaxis.set_major_formatter(dateformater) # to format the yaxis
plt.ylabel("Date")
plt.xlabel("Time")
plt.title("Train Time vs Date Schedule")
ax = plt.gca()
xticks_val = []
xticks_loc = []
distance = (ax.get_xticks()[-1] - ax.get_xticks()[0]) / 24
def to_hour_str(x):
    x = str(x)
    if len(x) < 2:
        x = '0' + x
    return x + ':00'
for h in range(25):
    xticks_val.append(to_hour_str(h))
    xticks_loc.append(ax.get_xticks()[0] + h * distance)
plt.xticks(xticks_loc, xticks_val, rotation=90, ha='left')
plt.tight_layout()
plt.show()
Here's the code output using dummy data I generated myself.
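An alternative that avoids reading the tick range back from the axes is to turn each time into a plain number first. The sketch below is a swapped-in technique, not what the answer does: it converts 'H:MM' strings into minutes after midnight and builds one tick per hour. Both helper names are hypothetical.

```python
def minutes_since_midnight(hhmm):
    """Convert a time string such as '6:30' into minutes after 00:00."""
    hours, minutes = hhmm.split(':')
    return int(hours) * 60 + int(minutes)


def hourly_ticks():
    """Return tick positions (in minutes) and 'HH:00' labels for a full day."""
    positions = list(range(0, 24 * 60 + 1, 60))
    labels = ['%02d:00' % (p // 60) for p in positions]
    return positions, labels


times = ['6:01', '6:30', '7:20', '7:25']
x = [minutes_since_midnight(t) for t in times]
positions, labels = hourly_ticks()
print(x)
print(labels[6:8])
```

The numeric x values can then be plotted against the dates, with plt.xticks(positions, labels, rotation=90) placing one label per hour regardless of which minutes occur in the data.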

How to label my x-axis with years extracted from my time-series data?

I have data in this format / shape etc in a dataframe that I would like to represent in the form of a graph showing the total counts per each month. I have resampled the data so that it shows one row for one month, and then I wrote the following code to chart it out:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
#Read in data & create total column
stacked_bar_data = new_df
stacked_bar_data["total"] = stacked_bar_data.var1 + stacked_bar_data.var2
#Set general plot properties
sns.set_style("whitegrid")
sns.set_context({"figure.figsize": (24, 10)})
sns.set_context("poster")
#Plot 1 - background - "total" (top) series
sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.total, color = "red")
#Plot 2 - overlay - "bottom" series
bottom_plot = sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.attended, color = "#0000A3")
topbar = plt.Rectangle((0,0),1,1,fc="red", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='#0000A3', edgecolor = 'none')
l = plt.legend([bottombar, topbar], ['var1', 'var2'], loc=1, ncol = 2, prop={'size':18})
l.draw_frame(False)
#Optional code - Make plot look nicer
sns.despine(left=True)
bottom_plot.set_ylabel("Count")
# bottom_plot.set_xlabel("date")
#Set fonts to consistent 16pt size
for item in ([bottom_plot.xaxis.label, bottom_plot.yaxis.label] +
             bottom_plot.get_xticklabels() + bottom_plot.get_yticklabels()):
    item.set_fontsize(16)
# making sure our xticks is formatted correctly
plt.xticks(fontsize=20)
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
years_fmt = mdates.DateFormatter('%Y')
bottom_plot.xaxis.set_major_locator(years)
bottom_plot.xaxis.set_major_formatter(years_fmt)
bottom_plot.xaxis.set_minor_locator(months)
plt.show()
# bottom_plot.axes.xaxis.set_visible(False)
Thing is, my chart doesn't show me the years at the bottom. I believe I have all the pieces necessary to solve this problem, but for some reason I can't figure out what I'm doing wrong.
I think I'm doing something wrong with how I set up the subplots of the sns.barplot. Maybe I should be assigning them to fig and ax or something like that? That's how I saw it done on the matplotlib site. I just can't manage to transfer that logic over to my example.
Any help would be most appreciated. Thanks!
There are a few things to consider. First of all, please try to convert your date column (new_df.date) to datetime.
new_df.date = pd.to_datetime(new_df.date)
Second, do not use this part:
bottom_plot.xaxis.set_major_locator(years)
bottom_plot.xaxis.set_major_formatter(years_fmt)
bottom_plot.xaxis.set_minor_locator(months)
Instead use:
x_dates = stacked_bar_data['date'].dt.strftime('%Y').sort_values().unique()
bottom_plot.set_xticklabels(labels=x_dates, rotation=0, ha='center')
This is because seaborn re-locates the bars to integer positions, even if we set them to be dates. Note that you used the index explicitly as x. Below is a fully working example. Note that this gives you major ticks only; you'll have to work out the minor ticks yourself. My comments, and the things I've commented out, are marked with a double #.
stacked_bar_data.date = pd.to_datetime(stacked_bar_data.date)
stacked_bar_data["total"] = stacked_bar_data.var1 + stacked_bar_data.var2
#Set general plot properties
sns.set_style("whitegrid")
sns.set_context({"figure.figsize": (14, 7)}) ## modified size :)
sns.set_context("poster")
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
years_fmt = mdates.DateFormatter('%Y')
sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.total, color = "red")
bottom_plot = sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.attended, color = "#0000A3")
topbar = plt.Rectangle((0,0),1,1,fc="red", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='#0000A3', edgecolor = 'none')
l = plt.legend([bottombar, topbar], ['var1', 'var2'], loc=1, ncol = 2, prop={'size':18})
l.draw_frame(False)
#Optional code - Make plot look nicer
sns.despine(left=True)
bottom_plot.set_ylabel("Count")
# bottom_plot.set_xlabel("date")
# making sure our xticks is formatted correctly
## plt.xticks(fontsize=20) # not needed as you change font below in the loop
## Do not use at all
## bottom_plot.xaxis.set_major_locator(years)
## bottom_plot.xaxis.set_major_formatter(years_fmt)
## bottom_plot.xaxis.set_minor_locator(months)
#Set fonts to consistent 16pt size
for item in ([bottom_plot.xaxis.label, bottom_plot.yaxis.label] +
             bottom_plot.get_xticklabels() + bottom_plot.get_yticklabels()):
    item.set_fontsize(16)
## This part is required if you want to stick to seaborn
## This is because the moment you start using seaborn it will "re-position" the bars
## at integer position rather than dates. W/o seaborn there is no such need
x_dates = stacked_bar_data['date'].dt.strftime('%Y').sort_values().unique()
bottom_plot.set_xticklabels(labels=x_dates, rotation=0, ha='center')
plt.show()
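Because the bars sit at integer positions, one label per bar is needed. A list of unique years is shorter than the number of monthly bars, so as far as I can tell it may not line up with the ticks. One possible variation, sketched here with made-up dates and a hypothetical helper name, labels the first bar of each year and leaves the others blank.

```python
from datetime import date


def year_tick_labels(dates):
    """Label the first bar of each year with that year and leave the other bars blank."""
    labels = []
    previous_year = None
    for d in dates:
        labels.append(str(d.year) if d.year != previous_year else '')
        previous_year = d.year
    return labels


# One bar per month, October 2019 through March 2020 (made-up dates for illustration)
monthly = [date(2019, 10, 1), date(2019, 11, 1), date(2019, 12, 1),
           date(2020, 1, 1), date(2020, 2, 1), date(2020, 3, 1)]
print(year_tick_labels(monthly))
```

Assuming the date column is sorted and holds datetimes, the result of year_tick_labels(stacked_bar_data['date']) could be passed to bottom_plot.set_xticklabels.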

divide x and y labels in Matplotlib

I have a graph with X as a date and Y as some readings. The X axis has a date interval with an increment of one day. What I want is to show the hours on the x axis between two days (just to set the hours in the yellow area in the graph).
The idea of the code is:
Date=[];Readings=[] # will be filled from another function
dateconv=np.vectorize(datetime.fromtimestamp)
Date_F=dateconv(Date)
ax1 = plt.subplot2grid((1,1), (0,0))
ax1.plot_date(Date_F,Readings,'-')
for label in ax1.xaxis.get_ticklabels():
    label.set_rotation(45)
ax1.grid(True)
plt.xlabel('Date')
plt.ylabel('Readings')
ax1.set_yticks(range(0,800,50))
plt.legend()
plt.show()
You can use MultipleLocator from matplotlib.ticker with set_major_locator and set_minor_locator. See example.
Example
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import datetime
# Generate some data
d = datetime.timedelta(hours=1/5)
now = datetime.datetime.now()
times = [now + d * j for j in range(250)]
ax = plt.gca() # get the current axes
ax.plot(times, range(len(times)))
for label in ax.xaxis.get_ticklabels():
    label.set_rotation(30)
# Set the positions of the major and minor ticks
dayLocator = MultipleLocator(1)
hourLocator = MultipleLocator(1/24)
ax.xaxis.set_major_locator(dayLocator)
ax.xaxis.set_minor_locator(hourLocator)
# Convert the labels to the Y-m-d format
xax = ax.get_xaxis() # get the x-axis
adf = xax.get_major_formatter() # get the auto-formatter
adf.scaled[1/24] = '%Y-%m-%d' # set the < 1d scale to Y-m-d
adf.scaled[1.0] = '%Y-%m-%d' # set the > 1d < 1m scale to Y-m-d
plt.show()
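The locator values 1 and 1/24 work because Matplotlib measures date axes in days, so one unit is a day and 1/24 of a unit is an hour. The same arithmetic can be checked with the standard library alone; hourly_between is a hypothetical helper and the dates are made up.

```python
from datetime import datetime, timedelta


def hourly_between(start, end):
    """Return every whole hour from `start` up to and including `end`."""
    ticks = []
    current = start
    while current <= end:
        ticks.append(current)
        current += timedelta(hours=1)
    return ticks


day_one = datetime(2021, 5, 1)
day_two = datetime(2021, 5, 2)
hours = hourly_between(day_one, day_two)
print(len(hours))
# One hour expressed as a fraction of a day, the unit used on a date axis
print((hours[1] - hours[0]) / timedelta(days=1))
```

matplotlib.dates also provides DayLocator and HourLocator, which are purpose-built for this kind of tick placement and may be a more readable choice than MultipleLocator.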
Result

convert Panda Line graph to Bar graph with month name

I am trying to convert a line graph to a bar graph using Python pandas.
Here is my code, which gives a perfect line graph as per my requirement.
conn = sqlite3.connect('Demo.db')
collection = ['ABC','PQR']
df = pd.read_sql("SELECT * FROM Table where ...", conn)
df['DateTime'] = df['Timestamp'].apply(lambda x: dt.datetime.fromtimestamp(x))
df.groupby('Type').plot(x='DateTime', y='Value',linewidth=2)
plt.legend(collection)
plt.show()
Here is my DataFrame df
http://postimg.org/image/75uy0dntf/
Here is my Line graph output from above code.
http://postimg.org/image/vc5lbi9xv/
I want to draw a bar graph instead of a line graph. I want the month name on the x axis and the value on the y axis. I want a colorful bar graph.
Attempt made
df.plot(x='DateTime', y='Value',linewidth=2, kind='bar')
plt.show()
It gives an improper bar graph with the date and time (instead of month and year) on the x axis. Thank you for your help.
Here is a code that might do what you want.
In this code, I first sort your database by time. This step is important, because I use the indices of the sorted database as abscissa of your plots, instead of the timestamp. Then, I group your data frame by type and I plot manually each group at the right position (using the sorted index). Finally, I re-define the ticks and the tick labels to display the date in a given format (in this case, I chose MM/YYYY but that can be changed).
import datetime
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
types = ['ABC','BCD','PQR']*3
vals = [126,1587,141,10546,1733,173,107,780,88]
ts = [1414814371, 1414814371, 1406865621, 1422766793, 1422766793, 1425574861, 1396324799, 1396324799, 1401595199]
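aset = list(zip(types, vals, ts))  # materialize the zip for the DataFrame constructor (Python 3)
df = pd.DataFrame(data=aset, columns=['Type', 'Value', 'Timestamp'])
# sort_values replaces the removed DataFrame.sort; reset the index so it follows the sorted order
df = df.sort_values(['Timestamp', 'Type']).reset_index(drop=True)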
df['Date'] = df['Timestamp'].apply(lambda x: datetime.datetime.fromtimestamp(x).strftime('%m/%Y'))
groups = df.groupby('Type')
ngroups = len(groups)
colors = ['r', 'g', 'b']
fig = plt.figure()
ax = fig.add_subplot(111, position=[0.15, 0.15, 0.8, 0.8])
offset = 0.1
width = 1-2*offset
#
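for j, group in enumerate(groups):
    x = group[1].index+offset
    y = group[1].Value
    # align='edge' makes x the left edge of each bar, which the offset and the +0.5 tick positions assume
    ax.bar(x, y, width=width, color=colors[j], label=group[0], align='edge')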
xmin, xmax = min(df.index), max(df.index)+1
ax.set_xlim([xmin, xmax])
ax.tick_params(axis='x', which='both', top=False, bottom=False)
plt.xticks(np.arange(xmin, xmax)+0.5, list(df['Date']), rotation=90)
ax.legend()
plt.show()
I hope this works for you. This is the output that I get, given my subset of your database.
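The labelling step on its own, turning Unix timestamps into month/year strings in chronological order, can be sketched without pandas. Unlike the answer, which uses the local time zone, this sketch interprets the timestamps in UTC so the result does not depend on the machine. That choice, and both helper names, are assumptions made for illustration.

```python
from datetime import datetime, timezone


def month_label(timestamp):
    """Format a Unix timestamp as 'MM/YYYY', interpreting it in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime('%m/%Y')


def sorted_month_labels(timestamps):
    """Sort the timestamps chronologically and return one label per entry."""
    return [month_label(ts) for ts in sorted(timestamps)]


ts = [1414814371, 1406865621, 1422766793, 1425574861, 1396324799, 1401595199]
print(sorted_month_labels(ts))
```

Changing the strftime pattern (for example to '%Y-%m') changes the label format without touching the sorting.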
