pandas read from CSV and plot the data

I want to read a .csv file and then plot its data. The file has the following format:
Date/Time,Humidity,Temperature
00:00:56,90.00,16.90
00:01:56,90.00,16.90
00:02:56,90.00,16.90
00:03:56,91.00,16.90
00:04:56,90.00,16.90
00:05:56,91.00,16.90
00:06:56,91.00,16.90
00:07:56,91.00,16.90
00:08:56,91.00,16.90
00:09:56,91.00,16.90
00:10:56,91.00,16.90
00:11:56,91.00,16.90
00:12:56,91.00,16.90
00:13:56,91.00,16.90
00:14:56,91.00,16.90
00:15:56,91.00,16.90
00:16:56,91.00,16.90
00:17:56,91.00,16.80
00:18:56,91.00,16.90
After reading the data, I convert the Date/Time column to datetime using the code below:
data = pd.read_csv("Data-24-January-2021.csv")
data["Date/Time"] = pd.to_datetime(data["Date/Time"])
data.sort_values("Date/Time", inplace=True)
dt_time = data["Date/Time"]
When I print dt_time, I get:
0 2021-01-25 00:00:56
1 2021-01-25 00:01:56
2 2021-01-25 00:02:56
3 2021-01-25 00:03:56
4 2021-01-25 00:04:56
...
1435 2021-01-25 23:55:46
1436 2021-01-25 23:56:46
1437 2021-01-25 23:57:46
1438 2021-01-25 23:58:46
1439 2021-01-25 23:59:46
Name: Date/Time, Length: 1440, dtype: datetime64[ns]
When I plot the data, the x-axis tick labels include the date as well as the time.
I want to get rid of the 01-25 in the plot.
I managed to extract only the time from the .csv file, but every method I tried returned dtype: object, so the plot takes too long to render and the x-axis isn't displayed correctly.
Can you please advise on the best way to approach this? Basically, I want to read the Date/Time column from the .csv file and plot it on the x-axis without the 01-25.
The code I'm currently running is below:
import pandas as pd
import os
from matplotlib import pyplot as plt
from datetime import datetime
plt.style.use("seaborn")
data = pd.read_csv("Data-24-January-2021.csv")
data["Date/Time"] = pd.to_datetime(data["Date/Time"])
data.sort_values("Date/Time", inplace=True)
dt_time = data["Date/Time"].dt.strftime("%H :%M")
# print(dt_time)
humidity = data["Humidity"]
temperature = data["Temperature"]
# plt.plot(dt_time, humidity, label="Humidity")
plt.plot(dt_time, temperature, label="Temperature")
plt.gcf().autofmt_xdate()
plt.legend(loc="upper left")
plt.title("Test Plot")
plt.xlabel("Time")
plt.ylabel("Temperature")
plt.tight_layout()
plt.savefig("Figure_example3.png")
plt.show()
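One possible approach, sketched here rather than taken from the original post, is to keep the column as a datetime dtype and change only how the ticks are rendered, using matplotlib.dates.DateFormatter. The inline CSV text stands in for Data-24-January-2021.csv, and passing format="%H:%M:%S" to pd.to_datetime is an assumption based on the sample rows shown above.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the real file: a few rows in the same format as the question.
csv_text = """Date/Time,Humidity,Temperature
00:00:56,90.00,16.90
00:01:56,90.00,16.90
00:02:56,90.00,16.90
00:03:56,91.00,16.90
"""

data = pd.read_csv(io.StringIO(csv_text))
# Assumption: every value is a plain HH:MM:SS time, so the format is explicit.
data["Date/Time"] = pd.to_datetime(data["Date/Time"], format="%H:%M:%S")
data = data.sort_values("Date/Time")

fig, ax = plt.subplots()
# The x values stay datetime, so matplotlib treats them as a continuous time axis.
ax.plot(data["Date/Time"], data["Temperature"], label="Temperature")
# Only the tick labels change: show hours and minutes, no date.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.autofmt_xdate()
ax.legend(loc="upper left")
ax.set_xlabel("Time")
ax.set_ylabel("Temperature")
```

Because the column is never converted to strings with strftime, the axis is not treated as a list of categorical labels, which is the likely cause of the slow, cluttered plot described above.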

Related

Plotly, Python. Can i plot vertical line based on a datetime object?

I am forecasting Covid cases and using Plotly for visualization. I would like to draw a straight vertical line at the point where the forecast starts.
I just want a vertical line on 25 Jan 2021, so that it is visible where the forecast starts.

Because you didn't share your data, I tried to answer using a sample code snippet from Plotly:
import plotly.express as px
df = px.data.stocks()
fig = px.line(df, x='date', y="GOOG")
fig.add_vline(x='2019-01-25')
fig.show()
The only line I added to the sample code is the add_vline call before fig.show(). For your forecast start date it would be:
fig.add_vline(x='2021-01-25')
If your date format differs from mine, you can check it by printing the data you pass to the graph:
print(df)
date ...
0 2018-01-01 ...
1 2018-01-08 ...
2 2018-01-15 ...
3 2018-01-22 ...
4 2018-01-29 ...
... ...
If you need more info and examples check: https://plotly.com/python/horizontal-vertical-shapes/
My result chart

Seaborn Bar Plot with Dates as X-Axis

I have tried unsuccessfully to create a bar plot of a time series dataset. I have tried converting the dates to pandas datetime objects, Timestamp objects, primitive strings, floats, and ints. No matter what I do, I get the following error:
TypeError: float() argument must be a string or a number, not 'Timestamp'
Here are a few minimal examples that produce the error:
Here, the 'Date' object is of type <class 'pandas._libs.tslibs.timestamps.Timestamp'>, so I know why this doesn't work:
import pandas as pd
import matplotlib.pylab as plt
import matplotlib.dates as mdates
import seaborn as sns
def main():
    path = 'Data/AQ+RX Counts.csv'
    df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])
    weekly_df = df.resample('W').mean().reset_index()
    weekly_df['count'] = df['count'].resample('W').sum().reset_index()
    sns.barplot(x = 'Date', y='count', data = weekly_df)
    plt.show()
main()
I then tried making the dates floats, intending to format them back to dates afterwards, but this still doesn't work:
dates = mdates.datestr2num(weekly_df.Date.astype(str))
weekly_df['n_dates'] = dates
sns.barplot(x = 'n_dates', y='count', data = weekly_df)
plt.show()
I also tried making them integers, to no avail:
dates = mdates.datestr2num(weekly_df.Date.astype(str))
dates = dates.astype(int)
dates = pd.Series(dates)
weekly_df['n_dates'] = dates
sns.barplot(x = 'n_dates', y='count', data = weekly_df)
plt.show()
I've tried many other variations, and all produce the same error. I've even compared it to other code and verified that all the types are identical, and the comparison code works fine. I am at a complete loss of where to go from here.
Dataframe:
Date,WSA,WSV,WDV,WSM,SGT,T2M,T10M,DELTA_T,PBAR,SRAD,RH,PM25,AQI,count
2015-01-01,1.0708333333333335,0.8750000000000001,132.95833333333334,3.4708333333333337,35.39166666666667,30.72916666666667,30.625,-0.11666666666666667,738.8249999999998,72.66666666666667,99.75416666666666,24.80833333333333,73.30793131580873,0.0
2015-01-02,1.1086956521739129,0.9391304347826086,148.47826086956522,3.734782608695653,32.46521739130434,34.39130434782609,34.27826086956521,-0.11739130434782602,738.3478260869565,61.39130434782609,100.01304347826084,23.500000000000004,64.15072523318715,4.0
2015-01-03,1.0173913043478258,0.7173913043478259,168.04347826086956,3.773913043478261,42.71739130434783,36.24782608695652,36.160869565217396,-0.09565217391304348,739.4434782608695,49.60869565217392,100.76956521739132,20.460869565217394,55.65271063058384,0.0
2015-01-04,1.0,0.6,159.95833333333334,3.85,49.15,38.8875,38.66666666666666,-0.225,741.5000000000001,31.54166666666667,101.47916666666669,13.012499999999998,46.835258118800965,0.0
2015-01-05,1.0333333333333334,0.4416666666666667,137.0,4.0,57.56666666666666,42.99583333333333,42.94583333333333,-0.04999999999999995,742.5333333333333,44.58333333333334,101.00416666666666,16.654166666666665,52.420271225456766,4.0
2015-01-06,0.7818181818181817,0.5590909090909091,114.72727272727272,3.654545454545455,42.86818181818182,40.7409090909091,41.09545454545454,0.36818181818181817,740.9045454545453,48.27272727272727,100.57727272727274,21.954545454545453,67.31833852518514,6.0
2015-01-07,0.9739130434782608,0.8304347826086954,110.82608695652172,3.956521739130436,30.817391304347833,40.36521739130435,40.59565217391304,0.22173913043478266,739.8652173913043,60.04347826086956,100.19565217391305,24.456521739130434,72.3472505968891,6.0
2015-01-08,0.9833333333333336,0.8250000000000001,156.5,4.208333333333333,32.67083333333333,41.520833333333336,41.36666666666667,-0.12916666666666668,736.35,69.58333333333333,99.95833333333331,22.274999999999995,65.77072473472253,10.0
2015-01-09,0.9583333333333331,0.7291666666666669,133.70833333333334,3.3791666666666664,39.645833333333336,42.279166666666654,42.15833333333333,-0.11666666666666665,735.2041666666665,60.41666666666666,100.04166666666669,19.370833333333334,59.08512936837911,10.0
2015-01-10,0.9666666666666668,0.7583333333333336,164.5,3.675,37.34583333333333,42.96250000000001,42.775,-0.2,734.2875,41.5,100.12083333333337,14.658333333333335,49.31465266245389,0.0

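The transformation in the function is converting 'count' from floats to a datetime dtype. The cause is the line weekly_df['count'] = df['count'].resample('W').sum().reset_index(): reset_index() returns a DataFrame with both a 'Date' and a 'count' column, not a Series of weekly sums.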
Using the posted sample data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
path = 'data/test.csv'
df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])
# display(df)
WSA WSV WDV WSM SGT T2M T10M DELTA_T PBAR SRAD RH PM25 AQI count
Date
2015-01-01 1.070833 0.875000 132.958333 3.470833 35.391667 30.729167 30.625000 -0.116667 738.825000 72.666667 99.754167 24.808333 73.307931 0.0
2015-01-02 1.108696 0.939130 148.478261 3.734783 32.465217 34.391304 34.278261 -0.117391 738.347826 61.391304 100.013043 23.500000 64.150725 4.0
2015-01-03 1.017391 0.717391 168.043478 3.773913 42.717391 36.247826 36.160870 -0.095652 739.443478 49.608696 100.769565 20.460870 55.652711 0.0
2015-01-04 1.000000 0.600000 159.958333 3.850000 49.150000 38.887500 38.666667 -0.225000 741.500000 31.541667 101.479167 13.012500 46.835258 0.0
2015-01-05 1.033333 0.441667 137.000000 4.000000 57.566667 42.995833 42.945833 -0.050000 742.533333 44.583333 101.004167 16.654167 52.420271 4.0
2015-01-06 0.781818 0.559091 114.727273 3.654545 42.868182 40.740909 41.095455 0.368182 740.904545 48.272727 100.577273 21.954545 67.318339 6.0
2015-01-07 0.973913 0.830435 110.826087 3.956522 30.817391 40.365217 40.595652 0.221739 739.865217 60.043478 100.195652 24.456522 72.347251 6.0
2015-01-08 0.983333 0.825000 156.500000 4.208333 32.670833 41.520833 41.366667 -0.129167 736.350000 69.583333 99.958333 22.275000 65.770725 10.0
2015-01-09 0.958333 0.729167 133.708333 3.379167 39.645833 42.279167 42.158333 -0.116667 735.204167 60.416667 100.041667 19.370833 59.085129 10.0
2015-01-10 0.966667 0.758333 164.500000 3.675000 37.345833 42.962500 42.775000 -0.200000 734.287500 41.500000 100.120833 14.658333 49.314653 0.0
# resample mean
dfr = df.resample('W').mean()
# add the resampled sum to dfr
dfr['mean'] = df['count'].resample('W').sum()
# reset index
dfr = dfr.reset_index()
# display(dfr)
Date WSA WSV WDV WSM SGT T2M T10M DELTA_T PBAR SRAD RH PM25 AQI count mean
0 2015-01-04 1.049230 0.782880 152.359601 3.707382 39.931069 35.063949 34.932699 -0.138678 739.529076 53.802083 100.503986 20.445426 59.986656 1.0 4.0
1 2015-01-11 0.949566 0.690615 136.210282 3.812261 40.152457 41.810743 41.822823 0.015681 738.190794 54.066590 100.316321 19.894900 61.042728 6.0 36.0
# plot dfr
fig, ax = plt.subplots(figsize=(16, 10))
sns.barplot(x='Date', y='count', data=dfr, ax=ax)
# configure the xaxis ticks from datetime to date
x_dates = dfr.Date.dt.strftime('%Y-%m-%d').sort_values().unique()
ax.set_xticklabels(labels=x_dates, rotation=90, ha='right')
plt.show()
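The difference between the two assignments can be checked without plotting. The following is a minimal sketch that rebuilds only the 'count' column of the posted sample data; the column name count_sum is a hypothetical name chosen here for illustration.

```python
import pandas as pd

# Only the 'count' column of the posted sample, indexed by date.
dates = pd.date_range("2015-01-01", periods=10, freq="D")
df = pd.DataFrame(
    {"count": [0.0, 4.0, 0.0, 0.0, 4.0, 6.0, 6.0, 10.0, 10.0, 0.0]},
    index=dates,
)
df.index.name = "Date"

# Weekly sums as a Series indexed by the week-end date.
weekly_sum = df["count"].resample("W").sum()

# reset_index() turns that Series into a two-column DataFrame ('Date', 'count'),
# which is what the question's code tried to store in a single column.
as_frame = weekly_sum.reset_index()

# Assign the Series while both objects still share the weekly DatetimeIndex,
# then reset the index once at the end.
weekly_df = df.resample("W").mean()
weekly_df["count_sum"] = weekly_sum  # hypothetical column name
weekly_df = weekly_df.reset_index()

print(as_frame.columns.tolist())
print(weekly_df)
```

Assigning before reset_index() lets pandas align the values on the week-end dates, so the numeric column stays numeric and 'Date' remains the only datetime column.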

Plotting two dataframe time-series on same graph with different sampling (and using double Y axis)

I have two data frames that look like:
Temp [Degrees_C] Cond [mS/cm]
yyyy-mm-ddThh:mm:ss.sss
2020-01-28 03:00:59 14.553947 19.301285
2020-01-28 08:00:59 14.501740 19.310037
2020-01-28 13:00:59 14.425415 18.531609
2020-01-28 18:00:59 14.414717 16.155998
...
And this:
CONDUCTIVITY Temp [C]
DATE TIME
2020-01-28 03:00:00 18.240 15.761111
2020-01-28 04:00:00 18.147 15.722222
2020-01-28 05:00:00 17.930 15.722222
2020-01-28 06:00:00 17.873 15.666667
...
I want to create one plot using these two data sets, they should share the same x-axis as the date-time, and two different y-axes (one for temperature and one for conductivity).
However, since the sampling is different for both of them, I'm not sure how to do it.
Any suggestions?
Thank you!!

To get the dates to match between the two datasets, I would use pd.DateOffset. Alternatively, you could use dt.round, but that could lead to some issues if you aren't sure the rounding will work.
To plot two y-axes, you are looking for ax.twinx
import pandas as pd
from io import StringIO
str1 = """
2020-01-28 03:00:59, 14.553947, 19.301285
2020-01-28 08:00:59, 14.501740, 19.310037
2020-01-28 13:00:59, 14.425415, 18.531609
2020-01-28 18:00:59, 14.414717, 16.155998
"""
str2="""
2020-01-28 03:00:00, 18.240, 15.761111
2020-01-28 04:00:00, 18.147, 15.722222
2020-01-28 05:00:00, 17.930, 15.722222
2020-01-28 06:00:00, 17.873, 15.666667
"""
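# Column order follows the data frames in the question: df1 is temperature then conductivity, df2 is conductivity then temperature
colnames1 = ['date','temp','cond']
colnames2 = ['date','cond','temp']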
df1 = pd.read_csv(StringIO(str1), header=None, names = colnames1, parse_dates=['date'])
df2 = pd.read_csv(StringIO(str2), header=None, names = colnames2, parse_dates=['date'])
#Offset to even seconds
df1.date = df1.date - pd.DateOffset(seconds=59)
#plot
ax = df1.plot(x='date',y='temp', label='df1', color='k', ls = '--')
#Create second y axis
ax_tw = ax.twinx()
df1.plot(x='date',y='cond', ax = ax_tw, label='df1', color='k', ls='-')
df2.plot(x='date',y='temp', ax= ax, label='df2', color='red', ls='--')
df2.plot(x='date',y='cond', ax = ax_tw,label='df2', color='red', ls ='-')
ax.legend()
which returns:

This matplotlib documentation webpage shows a simple example how to plot with double y axis: https://matplotlib.org/gallery/api/two_scales.html
You wrote that you have data frames, which sounds like you use pandas for data handling. You could simply convert your data to NumPy arrays and follow the example.
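As a rough sketch of the twinx approach applied to the two data sets from the question: matplotlib places each series at its own timestamps, so the different sampling rates do not need to be reconciled just to draw them on one time axis. The data frames are rebuilt by hand from the rows shown in the question, and the short column names temp and cond are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Five-hourly samples (first data set in the question).
df1 = pd.DataFrame(
    {"temp": [14.553947, 14.501740, 14.425415, 14.414717],
     "cond": [19.301285, 19.310037, 18.531609, 16.155998]},
    index=pd.to_datetime(["2020-01-28 03:00:59", "2020-01-28 08:00:59",
                          "2020-01-28 13:00:59", "2020-01-28 18:00:59"]),
)
# Hourly samples (second data set in the question).
df2 = pd.DataFrame(
    {"cond": [18.240, 18.147, 17.930, 17.873],
     "temp": [15.761111, 15.722222, 15.722222, 15.666667]},
    index=pd.to_datetime(["2020-01-28 03:00:00", "2020-01-28 04:00:00",
                          "2020-01-28 05:00:00", "2020-01-28 06:00:00"]),
)

fig, ax_temp = plt.subplots()
ax_cond = ax_temp.twinx()  # second y-axis that shares the same x-axis

# Dashed lines for temperature (left axis), solid lines for conductivity (right axis).
ax_temp.plot(df1.index, df1["temp"], "k--", label="df1 temp")
ax_temp.plot(df2.index, df2["temp"], "r--", label="df2 temp")
ax_cond.plot(df1.index, df1["cond"], "k-", label="df1 cond")
ax_cond.plot(df2.index, df2["cond"], "r-", label="df2 cond")

ax_temp.set_ylabel("Temperature")
ax_cond.set_ylabel("Conductivity")

# Merge the legend entries of both axes into a single legend.
handles_t, labels_t = ax_temp.get_legend_handles_labels()
handles_c, labels_c = ax_cond.get_legend_handles_labels()
ax_temp.legend(handles_t + handles_c, labels_t + labels_c, loc="best")
fig.autofmt_xdate()
```

Shifting or rounding the timestamps, as suggested in the other answer, is only needed if the two frames are to be joined row by row.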

How to save a windrose plot

I plotted my wind data (direction and speed) with the windrose module https://windrose.readthedocs.io/en/latest/index.html.
The results look nice but I cannot export them as a figure (png, eps, or anything to start with) because the result is a special object type that does not have a 'savefig' attribute, or I don't find it.
I have two pandas.core.series.Series: ff, dd
print(ff)
result:
TIMESTAMP
2016-08-01 00:00:00 1.643
2016-08-01 01:00:00 2.702
2016-08-01 02:00:00 1.681
2016-08-01 03:00:00 2.208
....
print(dd)
result:
TIMESTAMP
2016-08-01 00:00:00 328.80
2016-08-01 01:00:00 299.60
2016-08-01 02:00:00 306.90
2016-08-01 03:00:00 288.60
...
My code looks like:
from windrose import WindroseAxes
ax2 = WindroseAxes.from_ax()
ax2.bar(dd, ff, normed=True, opening=0.8, edgecolor='white', bins = [0,4,11,17])
ax2.set_legend()
ax2.tick_params(labelsize=18)
ax2.set_legend(loc='center', bbox_to_anchor=(0.05, 0.005), fontsize = 18)
ax2.savefig('./figures/windrose.eps')
ax2.savefig('./figures/windrose.png')
But the result is:
AttributeError: 'WindroseAxes' object has no attribute 'savefig'
Do you know how to create a figure out of the result so I can use it in my work?

We can use pyplot.savefig() from matplotlib.
import pandas as pd
import numpy as np
from windrose import WindroseAxes
from matplotlib import pyplot as plt
from IPython.display import Image
df_ws = pd.read_csv('WindData.csv')
# df_ws has `Wind Direction` and `Wind Speed`
ax = WindroseAxes.from_ax()
ax.bar(df_ws['Wind Direction'], df_ws['Wind Speed'])
ax.set_legend()
# savefig() supports eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff
plt.savefig('WindRose.jpg')
plt.close()
Image(filename='WindRose.jpg')

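Assuming you are using the box plot type of the windrose module, the following code converts your wind rose into an image (here loaded into a Qt pixmap):
import io
from matplotlib import pyplot as plt
from windrose import WindroseAxes
# QtGui comes from your Qt binding; wd, ws, bins and dialog are defined elsewhere in the application
ax = WindroseAxes.from_ax()
ax.box(direction=wd, var=ws, bins=bins)
# Render the current figure into an in-memory JPEG
buff = io.BytesIO()
plt.savefig(buff, format="jpeg")
# Load the image bytes into a pixmap and show it in the label
pixmap = QtGui.QPixmap()
pixmap.loadFromData(buff.getvalue())
dialog.ui.windrose_label.setScaledContents(True)
dialog.ui.windrose_label.setPixmap(pixmap)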

The error occurs because savefig is a method of the figure, not of the axes (subplot). Try:
fig,ax2 = plt.subplots(1,1) # Or whatever you need.
# The windrose code you showed
fig.savefig('./figures/windrose.png')
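Another way to reach the figure, shown here as a sketch, is the figure attribute that every matplotlib axes carries. A plain polar axes stands in for WindroseAxes so that the example runs without the windrose package (WindroseAxes is a subclass of matplotlib's polar axes); the bar angles and heights are made-up values, and the image is written to an in-memory buffer instead of ./figures/windrose.png.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

fig = plt.figure()
# Stand-in for WindroseAxes.from_ax(): an ordinary polar axes.
ax = fig.add_subplot(projection="polar")
ax.bar([0.0, 1.57, 3.14], [1.6, 2.7, 1.7], width=0.5)  # made-up values

# Axes objects have no savefig method, but each one knows its parent figure.
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```

With the code from the question, the equivalent call would be ax2.figure.savefig('./figures/windrose.png').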

Matplotlib - Changing Xticks to value on the line not a data point

I'm using matplotlib.pyplot (plt) to plot a graph of temperature against time. I want the xticks to be at 12am and 12pm only, but matplotlib picks the tick positions automatically.
How can I place ticks at exactly 12am and 12pm for each date when my data doesn't contain those points? Basically, I want an xtick on the line between two data points.
Below is a snippet of my data and my function.
Temp UNIX Time Time
5.04 1490562000 2017-03-26 22:00:00
3.21 1490572800 2017-03-27 01:00:00
2.15 1490583600 2017-03-27 04:00:00
1.66 1490594400 2017-03-27 07:00:00
6.92 1490605200 2017-03-27 10:00:00
11.73 1490616000 2017-03-27 13:00:00
13.77 1490626800 2017-03-27 16:00:00
def ploting_graph(self):
    ax=plt.gca()
    xfmt = md.DateFormatter('%d/%m/%y %p')
    ax.xaxis.set_major_formatter(xfmt)
    plt.plot(self.df['Time'],self.df['Temp'], 'k--^')
    plt.xticks(rotation=10)
    plt.grid(axis='both',color='r')
    plt.ylabel("Temp (DegC)")
    plt.xlim(min(self.df['Time']),max(self.df['Time']))
    plt.ylim(min(self.df['Temp'])-1,max(self.df['Temp'])+1)
    plt.title("Forecast Temperature {}".format(self.location))
    plt.savefig('{}_day_forecast_{}.png'.format(self.num_of_days,self.location),dpi=300)
As you can hopefully see, I'm looking for an xtick at 2017-03-27 00:00:00 and 12:00:00.
Any help would be greatly appreciated, and even a push in the right direction would be fantastic! Thank you very much in advance!
John.

You need a locator and a formatter. The locator determines that you only want ticks every 12 hours, while the formatter determines how the datetime should look.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates
dates = ["2017-03-26 22:00:00","2017-03-27 01:00:00","2017-03-27 04:00:00","2017-03-27 07:00:00",
"2017-03-27 10:00:00","2017-03-27 13:00:00","2017-03-27 16:00:00"]
temps = [5.04, 3.21, 2.15,1.66, 6.92, 11.73, 13.77 ]
df = pd.DataFrame({"Time":dates, "Temp":temps})
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S")
fig, ax = plt.subplots()
xfmt = matplotlib.dates.DateFormatter('%d/%m/%y 12%p')
ax.xaxis.set_major_formatter(xfmt)
locator = matplotlib.dates.HourLocator(byhour=[0,12])
ax.xaxis.set_major_locator(locator)
plt.plot(df['Time'],df['Temp'], 'k--^')
plt.xticks(rotation=10)
plt.grid(axis='both',color='r')
plt.ylabel("Temp (DegC)")
plt.xlim(min(df['Time']),max(df['Time']))
plt.ylim(min(df['Temp'])-1,max(df['Temp'])+1)
plt.title("Forecast Temperature")
plt.show()
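To see that the locator places ticks by time rather than by data point, the tick positions can be read back after drawing. This is a small sketch using the same sample data; the '%I%p' format code is a substitution made here for the literal '12%p' above, and tick_times and tick_labels are names introduced for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

times = pd.to_datetime([
    "2017-03-26 22:00:00", "2017-03-27 01:00:00", "2017-03-27 04:00:00",
    "2017-03-27 07:00:00", "2017-03-27 10:00:00", "2017-03-27 13:00:00",
    "2017-03-27 16:00:00",
])
temps = [5.04, 3.21, 2.15, 1.66, 6.92, 11.73, 13.77]

fig, ax = plt.subplots()
ax.plot(times, temps, "k--^")
ax.set_xlim(times.min(), times.max())

# Ticks at midnight and noon, whether or not a sample exists at those times.
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=[0, 12]))
# %I is the 12-hour clock hour, so midnight and noon both render as "12".
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%y %I%p"))

fig.canvas.draw()
# Convert the tick positions (matplotlib date numbers) back to datetimes.
tick_times = [mdates.num2date(t) for t in ax.get_xticks()]
tick_labels = [t.strftime("%Y-%m-%d %H:%M") for t in tick_times]
print(tick_labels)
```

None of the seven samples falls on 00:00 or 12:00, yet the ticks land there because the locator works on the axis range, not on the plotted points.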

Try playing with the xticks function on plot. Below is a quick example I tried.
import matplotlib.pyplot as plt
x = [2,3,4,5,6,14,16,18,24,22,20]
y = [1, 4, 9, 6,5,6,7,8,9,5,7]
labels = ['12Am', '12Pm']
plt.plot(x, y, 'ro')
# 12 and 24 are the x-axis locations for the labels
plt.xticks((12, 24), labels)
plt.margins(0.2)
plt.show()
