Matplotlib mangled line graphs when using datetime data on x axis


When using matplotlib to graph time series data, I get strange, mangled output.
This only happens when I convert the time series strings in the CSV to datetime objects. I need to do this conversion to make use of Matplotlib's time series x axis labelling.
abc = pd.read_csv(path + "Weekly average capacity factor and demand - regular year" + ".csv", parse_dates=True, index_col="Month", header=0)
fig, ax = plt.subplots()
x = abc.index
y1 = abc["Wind"]
curve1 = ax.plot(x, y1)
pd.read_csv(parse_dates=True) creates the index as a datetime64[ns] object. Perhaps this isn't optimized for use by matplotlib?
How can I make this work?

Your date index is not in order. Matplotlib does not sort the data before plotting; it assumes list order is the data order and connects the points in that order. (You can check this by drawing a scatter plot, where your data will look fine.) You have to sort your data before plotting it.
abc = pd.read_csv(path + "Weekly average capacity factor and demand - regular year" + ".csv", parse_dates=True, index_col="Month", header=0)
x, y1 = zip(*sorted(zip(abc.index, abc["Wind"])))
fig, ax = plt.subplots()
curve1 = ax.plot(x, y1)
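As an alternative to zipping and sorting, the DataFrame itself can be sorted by its index before plotting. The following is a minimal sketch rather than the original poster's data: the dates and "Wind" values are made up for illustration, and the non-interactive "Agg" backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the CSV: weekly dates deliberately out of order.
dates = pd.to_datetime(["2021-01-15", "2021-01-01", "2021-01-22", "2021-01-08"])
abc = pd.DataFrame({"Wind": [0.35, 0.20, 0.40, 0.30]}, index=dates)
abc.index.name = "Month"

# False here: a line plot would join the points in this out-of-order sequence.
print(abc.index.is_monotonic_increasing)

# Sort by the datetime index so consecutive rows are consecutive in time.
abc_sorted = abc.sort_index()

fig, ax = plt.subplots()
ax.plot(abc_sorted.index, abc_sorted["Wind"])
fig.autofmt_xdate()
```

Checking is_monotonic_increasing on the index is a quick way to confirm whether out-of-order dates are the cause before changing anything else.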

Related

Mathplotlib graph problems

I'm trying to display data from a weather station with matplotlib. For some reason that I can't quite figure out, the last values on my graph behave erratically, going back in time on the x axis.
x axis is the dates,
y axis is the water level
y1 axis is the discharge flow
Here's a picture of the result
Graph
import pandas as pd
import matplotlib.pyplot as plt
url_hourly = "https://dd.weather.gc.ca/hydrometric/csv/BC/hourly/BC_08MG005_hourly_hydrometric.csv"
url_daily = "https://dd.weather.gc.ca/hydrometric/csv/BC/daily/BC_08MG005_daily_hydrometric.csv"
fields = ["Date","Water Level / Niveau d'eau (m)", "Discharge / Débit (cms)"]
#Read csv files
hourly_data = pd.read_csv(url_hourly, usecols=fields)
day_data = pd.read_csv(url_daily, usecols=fields)
#Merge csv files
water_data = pd.concat([day_data,hourly_data])
#Convert date to datetime
water_data['Date'] = pd.to_datetime(water_data['Date']).dt.normalize()
water_data['Date'] = water_data['Date'].dt.strftime('%m/%d/%Y')
# CSV files contains 288 data entries per day (12per hour * 24hrs). Selecting every 288th element to represent one day
data_24hr = water_data[::288]
# Assigning columns to x, y, y1 axis
x = data_24hr[fields[0]]
y1 = data_24hr[fields[1]]
y2= data_24hr[fields[2]]
#Ploting the graph
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
curve1 = ax1.plot(x,y1, label='Water Level', color = 'r', marker="o")
curve2 = ax2.plot(x,y2,label='Discharge Volume', color = 'b',marker="o")
plt.plot()
plt.show()
Any tips would be greatly appreciated as I'm quite new to this
thank you

Okay, I went through the code and removed the duplicates in the "Date" column (as suggested by Arne). I also made the graph formatting slightly more readable. This plotted without going back in time:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import ticker
url_hourly = "https://dd.weather.gc.ca/hydrometric/csv/BC/hourly/BC_08MG005_hourly_hydrometric.csv"
url_daily = "https://dd.weather.gc.ca/hydrometric/csv/BC/daily/BC_08MG005_daily_hydrometric.csv"
fields = ["Date","Water Level / Niveau d'eau (m)", "Discharge / Débit (cms)"]
#Read csv files
hourly_data = pd.read_csv(url_hourly, usecols=fields)
day_data = pd.read_csv(url_daily, usecols=fields)
#Merge csv files
water_data = pd.concat([day_data,hourly_data])
#Convert date to datetime
water_data['Date'] = pd.to_datetime(water_data['Date']).dt.normalize()
water_data['Date'] = water_data['Date'].dt.strftime('%m/%d/%Y')
# CSV files contains 288 data entries per day (12per hour * 24hrs). Selecting every 288th element to represent one day
data_24hr = water_data.iloc[::288]
# remove duplicates according to the date column (reassign instead of inplace=True to avoid SettingWithCopyWarning on a slice)
data_24hr = data_24hr.drop_duplicates(subset="Date")
# Assigning columns to x, y, y1 axis
x = data_24hr[fields[0]]
y1 = data_24hr[fields[1]]
y2= data_24hr[fields[2]]
print(len(x), len(y1))
#Plotting the graph
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
curve1 = ax1.plot(x, y1, label='Water Level', color = 'r', marker="o")
curve2 = ax2.plot(x, y2, label='Discharge Volume', color = 'b',marker="o")
fig.autofmt_xdate(rotation=90)
plt.show()
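Because the dates above are converted to strings with strftime, matplotlib treats them as categories and plots them in row order. One possible variation, sketched here with made-up rows and a shortened hypothetical "Level" column instead of the real CSV, keeps a true datetime dtype and sorts before dropping duplicates, so the x axis is chronological by construction.

```python
import pandas as pd

# Hypothetical sample standing in for the concatenated daily + hourly data.
water_data = pd.DataFrame({
    "Date": ["2021-03-02 00:00", "2021-03-01 00:00",
             "2021-03-02 12:00", "2021-03-03 00:00"],
    "Level": [1.2, 1.0, 1.3, 1.1],
})

# Keep a real datetime dtype (no strftime), truncated to midnight of each day.
water_data["Date"] = pd.to_datetime(water_data["Date"]).dt.normalize()

# One row per day, in chronological order.
daily = (
    water_data.sort_values("Date", kind="stable")
    .drop_duplicates(subset="Date", keep="first")
    .reset_index(drop=True)
)
```

The resulting daily["Date"] column can be passed to ax1.plot and ax2.plot in place of the string column, and fig.autofmt_xdate() still applies.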

How to plot observations from each row of a dataframe as a line plot

I want to display multiple datasets in one graph.
But I can't seem to get the y axis to work, and I get the following error: ValueError: x and y must have same first dimension, but have shapes (2,) and (6060000,)
Since I am still a beginner and I copied parts of my code from different sources, my code is most likely pretty bad in some places.
I have never asked any pandas/matplotlib questions before, so I hope this is reproducible.
The dataframe has many columns, but only a small subset has been provided in the code sample.
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
channel_data = pd.DataFrame({'Creation date': ['2014-01-02', '2013-09-11', '2007-08-19'], 'Subscriber count': [6060000, 4110000, 4440000 ]})
# get x and y from first channel
now = str(dt.datetime.now())
now = now[:10]
dates = [channel_data["Creation date"][0], now]
dates2 = [channel_data["Creation date"][1], now]
dates3 = [channel_data["Creation date"][2], now]
x1 = [dt.datetime.strptime(d,'%Y-%m-%d').date() for d in dates]
x2 = [dt.datetime.strptime(d,'%Y-%m-%d').date() for d in dates2]
x3 = [dt.datetime.strptime(d,'%Y-%m-%d').date() for d in dates3]
# PROBLEM HERE
y1 = range(len(x1)) # i got the x axis to work but am having problems with this part
y2 = range(len(x2))
y3 = range(len(x3))
#y1 = range(0, channel_data["Subscriber count"][0])
# this was my idea of displaying the data (y-axis)
# -----------------------------------------------------------
plt.figure(figsize=(10, 5))
plt.title("Channel growth over time [USD]", fontdict={"fontweight": "bold"})
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
plt.plot(x1, y1, "b.-", label="Carwow") #b.- to choose color=blue, pointer=. , line=normal line
plt.plot(x2, y2, "r.-", label="Doug Demuro")
plt.plot(x3, y3, "g.-", label="Scotty Kilmer")
plt.xlabel("Date", fontdict={"fontsize": 13})
plt.ylabel("Subscribers", fontdict={"fontsize": 12})
plt.legend()
plt.show()
The first image shows the current graph (with wrong y values).
The second image shows a sketch of how I want to display the data.
I know this is a lot to ask at once, but maybe someone has an idea or a direction I could go in. I tried out a bunch of things but nothing worked.
Thank you for reading.

As a note, this is not the correct way to visualize growth rate. The plot implies linear growth, because you're just plotting a line between two points. Growth rate should be determined from the intermediate counts on other dates.
The error was occurring at plt.plot(x1, y1,...), because x1 had one entry per date in dates (2 entries), but y1 had 6060000 entries.
Use pandas.DataFrame.iterrows to iterate through and plot each observation.
Each list for x and y of plot is comprised of 2 values
x always begins at the creation date, and ends at now
y always begins at 0, and ends at the subscriber count
import pandas as pd
import matplotlib.pyplot as plt
# create a dataframe
df = pd.DataFrame({'Creation date': ['2014-01-02', '2013-09-11', '2007-08-19'], 'Subscriber count': [6060000, 4110000, 4440000], 'Channel name': ['Carwow', 'Doug Demuro', 'Scotty Kilmer']})
# convert any date columns to a datetime dtype
df['Creation date'] = pd.to_datetime(df['Creation date']).dt.date
# display(df)
Creation date Subscriber count Channel name
0 2014-01-02 6060000 Carwow
1 2013-09-11 4110000 Doug Demuro
2 2007-08-19 4440000 Scotty Kilmer
# get the current date (pd.Timestamp is used because the datetime module is not imported above)
now = pd.Timestamp.now().date()
# iterate through the rows and plot
for i, v in df.iterrows():
    # get the values and labels to plot
    x0 = v['Creation date']
    y1 = v['Subscriber count']
    label = v['Channel name']
    plt.plot([x0, now], [0, y1], label=label)
plt.legend()
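The shape error comes down to x and y having different lengths, so it can help to build both lists together and check them before plotting. The sketch below does this with a small helper, line_endpoints, which is a hypothetical name introduced here for illustration and not part of pandas or matplotlib.

```python
from datetime import date
import pandas as pd

df = pd.DataFrame({
    "Creation date": ["2014-01-02", "2013-09-11", "2007-08-19"],
    "Subscriber count": [6060000, 4110000, 4440000],
    "Channel name": ["Carwow", "Doug Demuro", "Scotty Kilmer"],
})
df["Creation date"] = pd.to_datetime(df["Creation date"]).dt.date

def line_endpoints(row, end_date):
    """Hypothetical helper: return matching two-element x and y lists for one channel."""
    x = [row["Creation date"], end_date]   # creation date -> end date
    y = [0, row["Subscriber count"]]       # zero -> current subscriber count
    return x, y

end = date.today()
segments = {row["Channel name"]: line_endpoints(row, end) for _, row in df.iterrows()}

# Every x list has the same length as its y list, so plt.plot(x, y) would not raise.
for name, (x, y) in segments.items():
    assert len(x) == len(y) == 2, name
```

Each (x, y) pair in segments can then be handed to plt.plot(x, y, label=name) exactly as in the loop above.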

How to plot large dataset of date vs time using matplot lib

I want to plot a date vs. time graph using matplotlib. The issue I am facing is that, due to the excess of data, many lines are showing on the x axis, and I can't find a way to plot my time on the x axis cleanly at one-hour intervals. Say I have data in my list as strings, such as ['6:01','6:30','7:20','7:25']. I want to divide my x axis from 6:00 to 7:00, and the time points between them should be plotted based on time.
Note: the time list is just an example; I want to do this for the whole 24 hours.
I tried to use ticks and many other options to complete my task, but unfortunately I am stuck on this problem. My data is in a CSV file.
Below is my code:
def arrivalGraph():
    from datetime import datetime, timedelta
    from matplotlib import pyplot as plt
    from matplotlib import dates as mpl_dates
    with open("Timetable2021.csv","r") as f:
        fileData = f.readlines()
    del fileData[0]
    date = []
    train1 = []
    for data in fileData:
        ind = data.split(",")
        date.append(datetime.strptime(ind[0],"%d/%m/%Y").date())
        train1Time = datetime.strptime(ind[1],"%H:%M").time()
        train1.append(train1Time.strftime("%H:%M"))
    plt.style.use("seaborn")
    plt.figure(figsize = (10,10))
    plt.plot_date(train1,date)
    plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
    dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
    plt.gca().xaxis.set_major_formatter(dateformater) # to format the xaxis
    plt.xlabel("Date")
    plt.ylabel("Time")
    plt.title("Train Time vs Date Schedule")
    plt.tight_layout()
    plt.show()
When i run the code i get the following output:
output of above code

Assuming that every single minute is present in train1 (i.e. train1 = ["00:00", "00:01", "00:02", "00:03", ... , "23:59"]), you can use plt.xticks() by generating an array of tick labels that holds an empty string for every time that is not on the hour.
unique_times = sorted(set(train1))
xticks = ['' if time[-2:]!='00' else time for time in unique_times]
plt.style.use("seaborn")
plt.figure(figsize = (10,10))
plt.plot_date(train1,date)
plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
# I think you wanted to format the yaxis instead of xaxis
plt.gca().yaxis.set_major_formatter(dateformater) # to format the yaxis
plt.ylabel("Date")
plt.xlabel("Time")
plt.title("Train Time vs Date Schedule")
plt.xticks(range(len(xticks)), xticks)
plt.tight_layout()
plt.show()
If not every minute is present in the train1 array, you have to keep the train1 data as datetime objects and generate arrays of tick locations and tick labels to pass to plt.xticks().
date = []
train1 = []
for data in fileData:
    ind = data.split(",")
    date.append(datetime.strptime(ind[0],"%d/%m/%Y").date())
    train1Time = datetime.strptime(ind[1],"%H:%M")
    train1.append(train1Time)
plt.style.use("seaborn")
plt.figure(figsize = (10,10))
plt.plot_date(train1,date)
plt.gcf().autofmt_xdate()#gcf is get current figure - autofmt is auto format
dateformater = mpl_dates.DateFormatter("%b ,%d %Y")
# I think you wanted to format the y axis instead of xaxis
plt.gca().yaxis.set_major_formatter(dateformater) # to format the yaxis
plt.ylabel("Date")
plt.xlabel("Time")
plt.title("Train Time vs Date Schedule")
ax = plt.gca()
xticks_val = []
xticks_loc = []
distance = (ax.get_xticks()[-1] - ax.get_xticks()[0]) / 24
def to_hour_str(x):
    x = str(x)
    if len(x) < 2:
        x = '0' + x
    return x + ':00'
for h in range(25):
    xticks_val.append(to_hour_str(h))
    xticks_loc.append(ax.get_xticks()[0] + h * distance)
plt.xticks(xticks_loc, xticks_val, rotation=90, ha='left')
plt.tight_layout()
plt.show()
Here's the code output using dummy data I generated myself.
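Another option, which is not what the answer above does, is to let matplotlib place the hourly ticks itself with matplotlib.dates.HourLocator and DateFormatter. The sketch below assumes a few made-up rows in place of Timetable2021.csv and attaches every clock time to one arbitrary day (anchor_day, a name chosen here) so that only the time of day varies along the x axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from datetime import date, datetime, timedelta
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical rows standing in for Timetable2021.csv: (date, arrival time).
rows = [("01/01/2021", "6:01"), ("02/01/2021", "6:30"),
        ("03/01/2021", "7:20"), ("04/01/2021", "7:25")]

# Attach every clock time to the same arbitrary day so only the time of day varies on x.
anchor_day = date(2000, 1, 1)
times = [datetime.combine(anchor_day, datetime.strptime(t, "%H:%M").time())
         for _, t in rows]
dates = [datetime.strptime(d, "%d/%m/%Y").date() for d, _ in rows]

fig, ax = plt.subplots(figsize=(10, 10))
ax.plot(times, dates, "o")

# One major tick per hour, labelled HH:MM, across the full 24 hours.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
day_start = datetime.combine(anchor_day, datetime.min.time())
ax.set_xlim(day_start, day_start + timedelta(days=1))

# Dates on the y axis, formatted as in the answer above.
ax.yaxis.set_major_formatter(mdates.DateFormatter("%b %d, %Y"))
ax.set_xlabel("Time")
ax.set_ylabel("Date")
fig.autofmt_xdate(rotation=90)
```

With this approach the tick positions do not depend on which minutes happen to be present in the data, and narrowing set_xlim (for example to 06:00 through 08:00) zooms in without recomputing any tick arrays.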

Adjusting x-axis in matplotlib

I have a range of values for every hour of year. Which means there are 24 x 365 = 8760 values. I want to plot this information neatly with matplotlib, with x-axis showing January, February......
Here is my current code:
from matplotlib import pyplot as plt
plt.plot(x_data,y_data,label=str("Plot"))
plt.xticks(rotation=45)
plt.xlabel("Time")
plt.ylabel("Y axis values")
plt.title("Y axis values vs Time")
plt.legend(loc='upper right')
axes = plt.gca()
axes.set_ylim([0,some_value * 3])
plt.show()
x_data is a list containing dates in datetime format. y_data contains values corresponding to the values in x_data. How can I get the plot neatly done with months on the X axis? An example:

You could create a scatter plot with horizontal lines as markers. The month is extracted by using the datetime module. In case the dates are not ordered, the plot sorts both lists first according to the date:
#creating a toy dataset for one year, random data points within month-specific limits
from datetime import date, timedelta
import random
x_data = [date(2017, 1, 1) + timedelta(days = i) for i in range(365)]
random.shuffle(x_data)
y_data = [random.randint(50 * (i.month - 1), 50 * i.month) for i in x_data]
#the actual plot starts here
from matplotlib import pyplot as plt
#get a scatter plot with horizontal markers for each data point
#in case the dates are not ordered, sort first the dates and the y values accordingly
plt.scatter([day.strftime("%b") for day in sorted(x_data)], [y for _xsorted, y in sorted(zip(x_data, y_data))], marker = "_", s = 900)
plt.show()
Output
The disadvantage is obviously that the lines have a fixed length. Also, if a month doesn't have a data point, it will not appear in the graph.
Edit 1:
You could also use Axes.hlines, as seen here.
This has the advantage, that the line length changes with the window size. And you don't have to pre-sort the lists, because each start and end point is calculated separately.
The toy dataset is created as above.
from matplotlib import pyplot as plt
#prepare the axis with categories Jan to Dec
x_ax = [date(2017, 1, 1) + timedelta(days = 31 * i) for i in range(12)]
#create invisible bar chart to retrieve start and end points from automatically generated bars
Bars = plt.bar([month.strftime("%b") for month in x_ax], [month.month for month in x_ax], align = "center", alpha = 0)
start_1_12 = [plt.getp(item, "x") for item in Bars]
end_1_12 = [plt.getp(item, "x") + plt.getp(item, "width") for item in Bars]
#retrieve start and end point for each data point line according to its month
x_start = [start_1_12[day.month - 1] for day in x_data]
x_end = [end_1_12[day.month - 1] for day in x_data]
#plot hlines for all data points
plt.hlines(y_data, x_start, x_end, colors = "blue")
plt.show()
Output
Edit 2:
Now your description of the problem is totally different from what you show in your question. You want a simple line plot with specific axis formatting. This can be found easily in the matplotlib documentation and all over SO. An example, how to achieve this with the above created toy dataset would be:
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, MonthLocator
ax = plt.subplot(111)
ax.plot([day for day in sorted(x_data)], [y for _xsorted, y in sorted(zip(x_data, y_data))], "r.-")
ax.xaxis.set_major_locator(MonthLocator(bymonthday=15))
ax.xaxis.set_minor_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter("%B"))
plt.show()
Output

Matplotlib - Formatting two plots on the same figure

I'm trying to plot some data but I'm getting stuck on plotting 2 plots on same figure. It looks like this:
The code is:
import re
import sqlite3
import matplotlib.pyplot as plt
from matplotlib.dates import datetime as dt
from matplotlib.dates import DateFormatter
...
for company in companies:
    cursor.execute("select distinct url from t_surv_data where company = ? order by product_type", (company,))
    urls = [r[0] for r in cursor.fetchall()]
    for idx, url in enumerate(urls):
        cursor.execute("select price, timestamp from t_surv_data where url = ? order by timestamp", (url,))
        data = [[r[0], r[1]] for r in cursor.fetchall()]
        price, date = zip(*data)
        date = [dt.datetime.strptime(d, '%Y-%m-%d %H:%M:%S') for d in date]
        f = plt.figure('''figsize=(3, 2)''')
        ax = f.add_subplot(111)
        ax.plot(date, price) # x, y
        ax.xaxis.set_major_formatter(DateFormatter('%d\n%h\n%Y'))
        #ax.set_ylim(ymin=0) # If I use this a break the plot
        ax2 = f.add_subplot(211)
        ax2.scatter(date, [1,1,-1])
        ax2.xaxis.set_major_formatter(DateFormatter('%d\n%h\n%Y'))
        #ax2.set_ylim(ymin=-1, ymax=1) # If I use this a break the plot
        plt.savefig('plt/foo' + str(idx) + '.png')
        plt.close()
How can I solve these problems?
1 - The plots look like they are one on top of the other. How can I format this so they look like independent plots on the same figure?
2 - I'm using this line of code for both plots, "ax2.xaxis.set_major_formatter(DateFormatter('%d\n%h\n%Y'))", but the dates are not in sync. The dates should be the same in the two plots.
Can someone give me a clue on these questions?
Best Regards,

You are not using add_subplot correctly:
ax = f.add_subplot(2,1,1)
ax2 = f.add_subplot(2,1,2)
The first number indicates the number of rows, the second the number of columns and the third the index of the plot.

If you want the plots to share the x axis (that is the axis with dates), you have to specify the sharex property.
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(...)
ax2.scatter(...)
ax1.xaxis.set_major_formatter(DateFormatter('%d\n%h\n%Y'))
You only have to set the major formatter once since they share the x axis.
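Here is a self-contained sketch of the shared-axis layout with the elided plot arguments filled in by made-up data; the dates, prices and flags are illustrative only. It uses %b for the abbreviated month name, and the "Agg" backend is selected just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from datetime import datetime
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter

# Hypothetical data in place of the database query results.
date = [datetime(2017, 1, 1), datetime(2017, 2, 1), datetime(2017, 3, 1)]
price = [10.0, 12.5, 11.0]
flags = [1, 1, -1]

# Two rows, one column; sharex=True keeps the date axes of both plots in sync.
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(date, price)
ax2.scatter(date, flags)
ax2.set_ylim(-1.5, 1.5)

# Set once; the shared x axis applies it to both subplots.
ax1.xaxis.set_major_formatter(DateFormatter("%d\n%b\n%Y"))
fig.tight_layout()
```

Because the axes are shared, zooming or changing the x limits on one subplot also updates the other.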
