Plotting txt file with date, time and value data - python

I am pretty new to Python programming and have already run into a problem that is driving me insane. I kept searching for a solution, including here on Stack Overflow, but found nothing that solved it, which made me sign up for this site.
This is my problem:
I have several txt files that contain 3 columns. The first one can be neglected, the second one contains date and time separated by the letter "T", and the third column contains the value (pressure, temperature, whatever).
What I want to do is plot the second column (date and time) on the x axis and the value on the y axis.
I've tried MANY code snippets, some of them described here on Stack Overflow, but none of them produced the right result.
In more detail, this is what my txt files look like:
# MagPy ASCII
234536326.456,2014-06-17T14:23:00.000000,459.7463393940044
674346235.235,2014-06-17T14:28:00.000000,462.8783040474751
and so on.
Forget about the first column. Only the second and third are relevant. So here, I guess, I have to skip the first line (and the first column), right?
HOWEVER, and here comes the part I cannot solve: because of the "T" inside the second column, it is read as a string.
One of the many errors I get is: could not convert string to float
I searched Stack Overflow and came across the following code:
x, y = np.loadtxt('example.txt', dtype=int, delimiter=',',
unpack=True, usecols=(1,2))
plt.plot(x, y)
plt.title('example 1')
plt.xlabel('D')
plt.ylabel('Frequency')
plt.show()
I edited "usecols" to 1 and 2, but with this code I get the error: list index out of range
So no matter what I do, I get an error every time. All I want is a plot (with matplotlib) that has date and time on the x axis and the value (e.g. 459.7463393940044 from above) on the y axis.
And regarding what I need in the end: I have to put several diagrams (about 4-6), generated from MANY txt files, into one figure.
Can anyone help me with this? I'd appreciate your help a lot!

The second column is an ISO 8601 timestamp, which is also the string format NumPy uses for datetimes. You need to add an explicit converter for the datetime field. The documentation contains additional format details. This question shows how to fill in the converters argument.
date, closep, highp, lowp, openp, volume = np.loadtxt(
    f, delimiter=',', unpack=True,
    converters={0: mdates.strpdate2num('%d-%b-%y')})
Is that enough to lead you to a full solution?

First, thanks for your response! Unfortunately, this doesn't work at all. I combined your converter with my code, but it didn't work. I then tried this code:
# Converter function
datefunc = lambda x: mdates.date2num(datetime.strptime(x, '%d %m %Y %H %M %S'))
# Read data from 'file.dat'
dates, levels = np.genfromtxt('BMP085_10085001_0001_201508.txt', # Data to be read
delimiter=19, # First column is 19 characters wide
converters={1: datefunc}, # Formatting of column 0
dtype=float, # All values are floats
unpack=True) # Unpack to several variables
fig = plt.figure()
ax = fig.add_subplot(111)
# Configure x-ticks
ax.set_xticks(dates) # Tickmark + label at every plotted point
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%Y %H:%M'))
ax.plot_date(dates, levels, ls='-', marker='o')
ax.set_title('title')
ax.set_ylabel('Waterlevel (m)')
ax.grid(True)
# Format the x-axis for dates (label formatting, rotation)
fig.autofmt_xdate(rotation=45)
fig.tight_layout()
fig.show()
And I get a list index out of range error again. Seriously, I do not know how to make it look like this in the end: {Plot date and time (x axis) versus a value (y axis) using data from file} - see the diagram at the bottom of the page.

I've had success doing the following. First, it's fine to read in your ISO 8601 date as a string (so a basic "read a line and split on comma" will work). To convert the date strings to datetime objects you can use
import dateutil.parser
import matplotlib.pyplot as plt
# code to read in date_strings and my_values as lists goes here ...
# Here's the magic to parse the ISO 8601 strings and make them into datetime objects
# (parse() takes one string at a time, so loop over the list)
my_dates = [dateutil.parser.parse(s) for s in date_strings]
# Now plot the results
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(my_dates, my_values, marker='o', linestyle='')
ax.set_xlabel('Time')
plt.show()
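To tie the steps of this thread together, here is a minimal sketch of the whole pipeline for the file layout shown in the question. It is only one possible way to do it: the helper names read_series and plot_series are hypothetical, it assumes every data row has exactly the three comma-separated columns shown above, and it uses the standard library's datetime.strptime instead of dateutil.

```python
import csv
from datetime import datetime


def read_series(lines):
    """Parse MagPy-style rows into (timestamps, values).

    `lines` is any iterable of text lines, e.g. an open file.
    Blank lines and lines starting with '#' are skipped; column 0 is ignored.
    """
    times, values = [], []
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    for row in csv.reader(data_lines):
        # Column 1 looks like 2014-06-17T14:23:00.000000 (ISO 8601).
        times.append(datetime.strptime(row[1], "%Y-%m-%dT%H:%M:%S.%f"))
        values.append(float(row[2]))
    return times, values


def plot_series(series_by_title):
    """Draw one subplot per {title: (times, values)} entry in a single figure."""
    import matplotlib.pyplot as plt  # imported here so parsing works without it

    fig, axes = plt.subplots(len(series_by_title), 1, sharex=True, squeeze=False)
    for ax, (title, (times, values)) in zip(axes[:, 0], series_by_title.items()):
        ax.plot(times, values, marker="o")
        ax.set_title(title)
    fig.autofmt_xdate()  # rotate the date labels so they do not overlap
    return fig


# The two sample rows from the question, including the header line.
sample = [
    "# MagPy ASCII\n",
    "234536326.456,2014-06-17T14:23:00.000000,459.7463393940044\n",
    "674346235.235,2014-06-17T14:28:00.000000,462.8783040474751\n",
]
times, values = read_series(sample)
```

With real data you would pass an open file to read_series, collect one (times, values) pair per txt file in a dictionary, hand that dictionary to plot_series, and then call plt.show(). Using one subplot per file is an assumption about what "several diagrams in one figure" means here.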

Related

Time Series Chart: Groupby seasons (or specfic months) for multiple years in xarray

Thank you for taking an interest in my question.
I am hoping to plot a temperature time series chart specifically for the months January to August from 1981-1999.
Below are my code and attempts:
temperature = xr.open_dataarray('temperature.nc')
temp = temperature.sel(latitude=slice(34.5,30), longitude=slice(73,78.5))
templatlonmean = temp.mean(dim=['latitude','longitude'])-273.15
tempgraph1 = templatlonmean.sel(time=slice('1981','1999'))
The above commands read in fine without any errors.
Below are my attempts to divide the months into seasons:
1st Attempt
tempseason1 = tempgraph1.groupby("time.season").mean("time")
#Plotting Graph Command
myfig, myax = plt.subplots(figsize=(14,8))
timeyears = np.unique(tempgraph1["time.season"])
tempseason1.plot.line('b-', color='red', linestyle='--',linewidth=4, label='1981-1999 Mean')
I got this error:
"Plotting requires coordinates to be numeric, boolean, or dates of type numpy.datetime64, datetime.datetime, cftime.datetime or pandas.Interval. Received data of type object instead."
I tried this as my second attempt (retrieved from this post Select xarray/pandas index based on specific months)
However, I wasn't sure how can I plot a graph with this, so I tried the following:
def is_amj(month):
    return (month >= 4) & (month <= 6)
temp_seasonal = tempgraph1.sel(time=is_amj(tempgraph1['time.month']))
#Plotting Graph Command
timeyears = np.unique(tempgraph1["time.season"])
temp_seasonal.plot.line('b-', color='red', linestyle='--',linewidth=4, label='1981-1999 Mean')
It caused no error, but the graph was not ideal.
So I moved on to my 3rd attempt (from here http://xarray.pydata.org/en/stable/examples/monthly-means.html):
month_length = tempmean.time.dt.days_in_month
weights = month_length.groupby('time.season') / month_length.groupby('time.season').sum()
np.testing.assert_allclose(weights.groupby('time.season').sum().values, np.ones(4))
ds_weighted = (tempmean * weights).groupby('time.season').sum(dim='time')
ds_unweighted = tempmean.groupby('time.season').mean('time')
#Plot Commands
timeyears = np.unique(tempgraph1["time.season"])
ds_unweighted.plot.line('b-', color='red', linestyle='--',linewidth=4, label='1981-1999 Mean')
Still I got the same error as the 1st attempt:
"Plotting requires coordinates to be numeric, boolean, or dates of type numpy.datetime64, datetime.datetime, cftime.datetime or pandas.Interval. Received data of type object instead."
That example was used to plot weather maps rather than a time series chart, but I believed the groupby process would be similar or even the same, which is why I used it.
As I am relatively new to coding, please excuse any syntax errors and the fact that I am not able to spot any obvious way to go about this.
I am wondering if you could suggest any other way to plot data for specific months with xarray, or whether there are any adjustments I need to make to the commands I have attempted.
I greatly appreciate your generous help.
Please let me know if you need any more further information, I will respond as soon as possible.
Thank you!
Regarding attempts 1 and 3: the data of type object that the error complains about is the season coordinate created by the grouping.
You can visualize that by doing:
tempseason1 = tempgraph1.groupby("time.season").mean("time")
print(tempseason1.coords)
You should see something like:
Coordinates:
* lon (lon) float32 ...
* lat (lat) float32 ...
* season (season) object 'DJF' 'JJA' 'MAM' 'SON'
Notice that the season dimension has type object.
I think you should use resample instead of groupby here.
Resample is basically a groupby that upsamples or downsamples a time series.
It would look like:
tempseason1 = tempgraph1.resample(time="Q").mean("time")
The argument "Q" is a pandas offset alias for quarterly frequency; see the pandas documentation for details.
I don't know much about plotting though.
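For the original goal (one January to August value per year), another option is to combine the month mask from the second attempt with a groupby on time.year, which yields an integer year coordinate that plots without the object-type error. The following is only a sketch under assumptions: the monthly series is synthetic stand-in data, not the asker's temperature.nc, and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly series for 1981-1999 (stand-in for the area-mean temperature).
time = pd.date_range("1981-01-01", "1999-12-01", freq="MS")
month = time.month.to_numpy()
values = 15.0 + 10.0 * np.sin(2 * np.pi * (month - 4) / 12)
temp = xr.DataArray(values, coords={"time": time}, dims="time", name="temperature")

# Keep only January to August, using a boolean mask on the month number.
mask = temp["time.month"] <= 8
jan_aug = temp.sel(time=mask)

# One mean per calendar year; the new 'year' coordinate is an integer.
yearly = jan_aug.groupby("time.year").mean("time")

# With matplotlib installed, this draws the time series:
# yearly.plot.line(marker="o")
```

Because the resulting coordinate is numeric, the usual plot.line call accepts it directly. Whether a plain mean over the eight months is the right statistic (rather than a mean weighted by days per month, as in the third attempt) is a choice this sketch does not settle.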

Setting personalized xticks (dates given in `int` format) in matplotlib without ruining the figure?

I have a text data file with a time series whose entries are in the form (This is the first column):
20000101
20000102
20000103
...
20001231
20010101
...
20151231
Using these int values results in the points bunching up within each year with unequal spacing (this is logical, since plotting them as numbers leaves a gap between 20001231 and 20010101).
One solution is to use an array like this one (let's suppose I have the dates stored in an array called date):
xaxis= np.arange(0, len(date))
The problem is that, although the plot is correct, the x axis ticks are then labeled 0,1,2,3...
I have been trying to modify the xticks, but whatever I do changes the whole figure and results in a weird plot.
What is the best solution to this?
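You have to pass both the tick positions and the tick labels to plt.xticks(positions, labels): the positions are values from your xaxis array, and the labels are the corresponding date strings. As seen here.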
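Two ways of handling the integer dates can be sketched as follows. The first converts each integer to a real datetime so that matplotlib spaces the points by calendar time; the second keeps the 0,1,2,... positions and only relabels a subset of ticks. The helper names to_datetimes and sparse_ticks are hypothetical, and the four sample dates are made up.

```python
from datetime import datetime


def to_datetimes(int_dates):
    """Turn integers such as 20001231 into datetime objects."""
    return [datetime.strptime(str(d), "%Y%m%d") for d in int_dates]


def sparse_ticks(int_dates, step):
    """Pick every `step`-th index as a tick position, labelled with its date."""
    positions = list(range(0, len(int_dates), step))
    labels = [str(int_dates[i]) for i in positions]
    return positions, labels


date = [20001230, 20001231, 20010101, 20010102]  # made-up sample values

# Option 1: real dates. The year boundary is now one day wide,
# whereas the raw integers differ by 8870.
x = to_datetimes(date)
gap_days = (x[2] - x[1]).days

# Option 2: keep index positions, but label only every second tick.
positions, labels = sparse_ticks(date, step=2)
```

With option 1 you would call ax.plot(x, values) followed by fig.autofmt_xdate(); with option 2 you would plot against np.arange(len(date)) and then call plt.xticks(positions, labels, rotation=45). Labelling only a subset of ticks is what keeps the figure readable for sixteen years of daily data.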

Histogram of times from a CSV via Pandas

I am analysing race results from a CSV which looks like this:
Position,Time,Race #,Batch,Name,Surname,Category,Sex,Age
1,00:25:04,58,E,Luke,Schlebusch,Junior,Male,17
2,00:25:16,92,E,Anrich,Zimmermann,Junior,Male,17
3,00:26:27,147,E,Ryan,Mathaba,Open,Male,33
4,00:26:58,53,E,Daniel,Rademan,Junior,Male,16
5,00:27:17,19,E,Werner,Du Preez,Open,Male,29
6,00:27:44,148,E,Mazu,Ndandani,Open,Male,37
7,00:27:45,42,E,Dakota,Murphy,Open,Male,20
8,00:28:29,56,E,David,Schlebusch,Master,Male,51
9,00:28:32,52,E,Caleb,Rademan,Minimee,Male,12
I am using the following call to read_csv to parse this into a Pandas dataframe:
race1 = pandas.read_csv('data.csv', parse_dates='Time', index_col='Time')
This enables me to plot a cumulative distribution of race times very easily by just doing:
race1.Position.plot()
Pandas handles all the intricacies of the date data type and makes a nice x axis with proper formatting of the times.
Is there an elegant way of getting a histogram of times which is similarly straightforward? Ideally, I would like to be able to do race1.index.hist() or race1.index.to_series().hist(), but I know that doesn't work.
I've been able to coerce the time to a timedelta and get a working result with
times = race1.index.to_series()
((times - times[0]).dt.seconds/60).hist()
This produces a histogram of the correct shape, but obviously with wrong x values (they are off by the fastest time).
Is there an elegant way to read the column as a timedelta to begin with, and is there a better way of creating the histogram, including proper ticks? Proper ticks here mean that they use the correct locator and updates properly.
This appears to work pretty well, although I would be happier with it if it didn't go through the Matplotlib date specifics regarding ordinal dates.
times = race1.index.to_series()
today = pandas.Timestamp('00:00:00')
timedelta = times - today
times_ordinal = timedelta.dt.seconds/(24*60*60) + today.toordinal()
ax = times_ordinal.hist()
ax.xaxis_date()
plt.gcf().autofmt_xdate()
plt.ylabel('Number of finishers')
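As for reading the column as a timedelta in the first place, one possible approach is pandas.to_timedelta, which accepts HH:MM:SS strings. The sketch below uses three rows from the question, converts the finishing times to minutes, and formats histogram ticks as HH:MM with a matplotlib FuncFormatter. The function name plot_time_hist and the bin count are assumptions made for illustration.

```python
import io

import pandas as pd

csv_text = """Position,Time,Race #
1,00:25:04,58
2,00:25:16,92
3,00:26:27,147
"""

race = pd.read_csv(io.StringIO(csv_text))
# Convert the HH:MM:SS strings to timedeltas instead of timestamps.
race["Time"] = pd.to_timedelta(race["Time"])
# Elapsed minutes as plain floats, which hist() handles directly.
minutes = race["Time"].dt.total_seconds() / 60


def plot_time_hist(minutes, bins=10):
    """Histogram of elapsed minutes with tick labels shown as HH:MM."""
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FuncFormatter

    fig, ax = plt.subplots()
    ax.hist(minutes, bins=bins)
    ax.xaxis.set_major_formatter(
        FuncFormatter(lambda m, _pos: f"{int(m // 60):02d}:{int(m % 60):02d}")
    )
    ax.set_ylabel("Number of finishers")
    return fig
```

This avoids the detour through ordinal dates, and the x values are the actual race times rather than offsets from the fastest finisher.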

How to get multiple legends from multiple pandas plots

I've got two dataframes (both indexed on time), and I'd like to plot columns from both dataframes together on the same plot, with a legend as if the two columns were in the same dataframe.
If I turn on the legend with one column, it works fine, but if I try to do both, the second one overwrites the first one.
import pandas as pd
# Use ERDDAP's built-in relative time functionality to get the last 7 days:
start='now-7days'
stop='now'
# URL for wind data
url='http://www.neracoos.org/erddap/tabledap/E01_met_all.csv?\
station,time,air_temperature,barometric_pressure,wind_gust,wind_speed,\
wind_direction,visibility\
&time>=%s&time<=%s' % (start,stop)
# load CSV data into Pandas
df_met = pd.read_csv(url,index_col='time',parse_dates=True,skiprows=[1]) # skip the units row
# URL for wave data
url='http://www.neracoos.org/erddap/tabledap/E01_accelerometer_all.csv?\
station,time,mooring_site_desc,significant_wave_height,dominant_wave_period&\
time>=%s&time<=%s' % (start,stop)
# Load the CSV data into Pandas
df_wave = pd.read_csv(url,index_col='time',parse_dates=True,skiprows=[1]) # skip the units row
plotting one works fine:
df_met['wind_speed'].plot(figsize=(12,4),legend=True);
but if I try to plot both, the first legend disappears:
df_met['wind_speed'].plot(figsize=(12,4),legend=True)
df_wave['significant_wave_height'].plot(secondary_y=True,legend=True);
Okay, thanks to the comment by unutbu pointing me to essentially the same question (which I searched for but didn't find), I just need to modify my plot command to:
import matplotlib.pyplot as plt
df_met['wind_speed'].plot(figsize=(12,4))
df_wave['significant_wave_height'].plot(secondary_y=True)
ax = plt.gca()
lines = ax.left_ax.get_lines() + ax.right_ax.get_lines()
ax.legend(lines, [l.get_label() for l in lines])
and now I get this, which is what I was looking for:
Well. Almost. It would be nice to get (right) and (left) in the legend to make it clear which scale belongs to which line. @unutbu to the rescue again:
import matplotlib.pyplot as plt
df_met['wind_speed'].plot(figsize=(12,4))
df_wave['significant_wave_height'].plot(secondary_y=True)
ax = plt.gca()
lines = ax.left_ax.get_lines() + ax.right_ax.get_lines()
ax.legend(lines, ['{} ({})'.format(l.get_label(), side) for l, side in zip(lines, ('left', 'right'))])
This produces a legend in which each label is followed by (left) or (right).
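The same idea can be sketched with matplotlib alone, which may help if the left_ax and right_ax attributes are not available in your pandas version. The data points here are invented placeholders for the two columns, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig, ax_left = plt.subplots(figsize=(12, 4))
ax_right = ax_left.twinx()  # second y axis sharing the same x axis

# Placeholder data standing in for the two dataframe columns.
ax_left.plot([0, 1, 2], [5.0, 7.5, 6.0], label="wind_speed")
ax_right.plot([0, 1, 2], [0.8, 1.1, 0.9], color="tab:orange",
              label="significant_wave_height")

# Collect the line handles from both axes and build a single legend.
lines = ax_left.get_lines() + ax_right.get_lines()
labels = [f"{ln.get_label()} ({side})"
          for ln, side in zip(lines, ("left", "right"))]
legend = ax_left.legend(lines, labels)
```

Building one legend from both axes' handles is what prevents the second plot call from overwriting the first legend.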

Plotting Pandas DataFrames as single days on the x-axis in Python/Matplotlib

I've got data like this:
col1 ;col2
2001-01-01;1
2001-01-01;2
2001-01-02;3
2001-01-03;4
2001-01-03;2
2001-01-04;2
I'm reading it in Python/Pandas using pd.read_csv(...) into a DataFrame.
Now I want to plot col2 on the y-axis and col1 on the x-axis day-wise. I searched a lot but couldn't find many useful pages describing this in detail. I found that matplotlib currently does NOT support the data format in which the dates are stored (datetime64).
I tried converting it like this:
fig, ax = plt.subplots()
X = np.asarray(df['col1']).astype(DT.datetime)
xfmt = mdates.DateFormatter('%b %d')
ax.xaxis.set_major_formatter(xfmt)
ax.plot(X, df['col2'])
plt.show()
but this does NOT work.
What is the best way?
I can only find bits here and there, but nothing complete that really works and, more importantly, no up-to-date resources on this functionality for the latest versions of pandas/numpy/matplotlib.
I'd also be interested in converting these absolute dates to consecutive day indices, i.e.:
The starting day 2001-01-01 is Day 1, thus the data would look like this:
col1 ;col2 ; col3
2001-01-01;1;1
2001-01-01;2;1
2001-01-02;3;2
2001-01-03;4;3
2001-01-03;2;3
2001-01-04;2;4
.....
2001-02-01;2;32
Thank you very much in advance.
pandas.read_csv supports parse_dates=True (the default is False). That would save you from converting the dates separately.
Also, for a simple dataframe like this, the pandas plot() function works perfectly well.
Example:
dates = pd.date_range('20160601',periods=4)
dt = pd.DataFrame(np.random.randn(4,1),index=dates,columns=['col1'])
dt.plot()
plt.show()
OK, as far as I can see there's no longer any need to use matplotlib directly; pandas itself already offers plotting functions that can be used as methods of the dataframe objects, see http://pandas.pydata.org/pandas-docs/stable/visualization.html. These functions use matplotlib themselves, but are easier to use because they handle the datatypes correctly :-)
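The consecutive day index asked about in the question can be derived from the parsed dates. A minimal sketch, assuming the semicolon-separated layout shown above with the stray space in the header removed, and with the sample rows inlined so it runs on its own:

```python
import io

import pandas as pd

raw = """col1;col2
2001-01-01;1
2001-01-01;2
2001-01-02;3
2001-01-03;4
2001-01-03;2
2001-01-04;2
2001-02-01;2
"""

# Parse col1 as dates while reading, so no separate conversion is needed.
df = pd.read_csv(io.StringIO(raw), sep=";", parse_dates=["col1"])

# Day index: whole days since the first date, counting that date as day 1.
df["col3"] = (df["col1"] - df["col1"].min()).dt.days + 1

# With matplotlib installed, either column works as the x axis:
# df.plot(x="col1", y="col2", style="o")
# df.plot(x="col3", y="col2", style="o")
```

Subtracting the earliest date gives a timedelta column, and its dt.days accessor turns that into the integer offsets shown in the desired output.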
