Incorrect xtick setting python - python

I've got a CSV file that contains five days of data, arranged in a few columns. Measurements were taken every 5 minutes, so one day has 288 rows and five days have 1440, and the times run like this (0:00, 0:05, 0:10 ...).
I used this code to draw everything in one plot, but somehow arranging the xticks doesn't work properly.
Here is the code:
fig, ax = plt.subplots(1,1)
ax.set_xticks(x)
ax.set_xticklabels([v for v in data.Time], rotation=45)
ax.plot(x, data.Decfreq)
plt.xticks(np.arange(1, 1440, 60))
Plot I receive:
My data:
00:00 7.680827152169027 0.14000897718551028 7.600809170600135 0.23361947896117427
00:05 7.650820409080692 0.1564676061198724 7.530793436727354 0.2561764164383169
00:10 7.630815913688469 0.15549587808153068 7.540795684423466 0.2576230038042995
00:15 7.820858619914587 0.17966340911411277 7.540795684423466 0.28225658521669184
00:20 7.540795684423466 0.17165693216100902 7.50078669363902 0.2630767707044145
00:25 7.670824904472915 0.13538117325249963 7.390761968981794 0.24547505458369223
00:30 7.84086311530681 0.18094062831351296 7.630815913688469 0.26532083891716435
00:35 7.9608900876601485 0.14987576886445067 7.660822656776803 0.25499025558872285
00:40 7.200719262755675 0.12533028213451503 7.120701281186783 0.23856516035634334
Where only first (time) and second (data) columns interest me.
Implementing code by #Anwarvic I've got this:

As I understood from your comment, the problem is with the labels, not the ticks themselves. Take every 60th entry of data.Time and pass those to set_xticklabels, like so:
fig, ax = plt.subplots(1,1)
ax.set_xticks(x)
values = data.Time.values
ax.set_xticklabels([values[i] for i in range(0, len(values), 60)], rotation=45)
ax.plot(data.Decfreq)
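There is no need for the plt.xticks() call, as it does the same job as ax.set_xticks(). Note that x must hold one tick position per label, e.g. np.arange(0, 1440, 60) as in the full example below.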
EDIT
I don't know why your plot is so different from mine. Here is my code, using sample data that I created to look exactly like yours:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('sample.csv')
x = np.arange(0, 1440, 60)
fig, ax = plt.subplots(1,1)
ax.set_xticks(x)
# ax.set_xticklabels([v for v in data.Time], rotation=45)
values = data.Time.values
ax.set_xticklabels([values[i] for i in range(0, len(values), 60)], rotation=45)
ax.plot(data.Decfreq)
plt.show()
And here is the plot:
So my advice is to swap my CSV file for yours, changing as little as possible, and see if it works.
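The idea above, one tick every 60 rows with the matching time as its label, can be sketched on synthetic data. This is only an illustration: the helper names `make_times`, `sparse_ticks` and `plot_with_sparse_ticks` are made up for this example, and the generated times stand in for the Time column of the real CSV.

```python
def make_times(n_rows=1440, step_minutes=5):
    """Build 'HH:MM' strings that wrap around every 24 hours (stand-in for data.Time)."""
    labels = []
    for i in range(n_rows):
        minutes = (i * step_minutes) % (24 * 60)
        labels.append(f"{minutes // 60:02d}:{minutes % 60:02d}")
    return labels


def sparse_ticks(times, every=60):
    """Return (positions, labels) for every `every`-th row."""
    positions = list(range(0, len(times), every))
    labels = [times[i] for i in positions]
    return positions, labels


def plot_with_sparse_ticks(values, times, every=60):
    """Plot values against row number, labelling only every `every`-th row."""
    # Imported here so the two helpers above work without a plotting backend.
    import matplotlib.pyplot as plt

    positions, labels = sparse_ticks(times, every)
    fig, ax = plt.subplots()
    ax.plot(range(len(values)), values)
    ax.set_xticks(positions)                 # tick positions first
    ax.set_xticklabels(labels, rotation=45)  # then exactly one label per position
    return fig, ax
```

Calling `plot_with_sparse_ticks(data.Decfreq, list(data.Time))` and then `plt.show()` should give 24 labelled ticks for 1440 rows, one for every five hours of data.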

Related

Change matplotlib subplots to separate plots

I have put together some code to make plots from data spanning multiple days. My data file covers over 40 days and 19k timestamps, and I need one plot per day. I want Python to generate them as separate plots.
Mr. T helped me a lot by providing the code, but I cannot get it to draw individual plots instead of putting them all in one figure as subplots. Can somebody help me with this?
Picture shows the current output:
My code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
#read your data and create datetime index
df= pd.read_csv('test-februari.csv', sep=";")
df.index = pd.to_datetime(df["Date"]+df["Time"].str[:-5], format="%Y:%m:%d %H:%M:%S")
#group by date and hour, count entries
dfcounts = df.groupby([df.index.date, df.index.hour]).size().reset_index()
dfcounts.columns = ["Date", "Hour", "Count"]
maxcount = dfcounts.Count.max()
#group by date for plotting
dfplot = dfcounts.groupby(dfcounts.Date)
#plot each day into its own subplot
fig, axs = plt.subplots(dfplot.ngroups, figsize=(6,8))
for i, groupdate in enumerate(dfplot.groups):
    ax=axs[i]
    #the marker is not really necessary but has been added in case there is just one entry per day
    ax.plot(dfplot.get_group(groupdate).Hour, dfplot.get_group(groupdate).Count, color="blue", marker="o")
    ax.set_title(str(groupdate))
    ax.set_xlim(0, 24)
    ax.set_ylim(0, maxcount * 1.1)
    ax.xaxis.set_ticks(np.arange(0, 25, 2))
plt.tight_layout()
plt.show()
Welcome to Stack Overflow.
Instead of creating multiple subplots, you can create a figure on the fly and plot onto it in every loop separately. And at the end show all of them at the same time.
# iterating over the groupby object yields (key, sub-dataframe) pairs
for groupdate, group in dfplot:
    fig = plt.figure()
    plt.plot(group.Hour, group.Count, color="blue", marker="o")
    plt.title(str(groupdate))
    plt.xlim(0, 24)
    plt.ylim(0, maxcount * 1.1)
    plt.xticks(np.arange(0, 25, 2))
    plt.tight_layout()
# show all figures at once, after the loop
plt.show()

Adding a shaded box to a plot in python

I am looking to add a shaded box to my plot below. I want the box to go from Aug 25-Aug 30 and to run the length of the Y axis.
The following is my code for the two plots I have made...
df = pd.read_excel('salinity_temp.xlsx')
dates = df['Date']
sal = df['Salinity']
temp = df['Temperature']
fig, axes = plt.subplots(2, 1, figsize=(8,8), sharex=True)
axes[0].plot(dates, sal, lw=5, color="red")
axes[0].set_ylabel('Salinity (PSU)')
axes[0].set_title('Salinity', fontsize=14)
axes[1].set_title('Temperature', fontsize=14)
axes[1].plot(dates, temp, lw=5, color="blue")
axes[1].set_ylabel('Temperature (C)')
axes[1].set_xlabel('Dates, 2017', fontsize=12)
axes[1].xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b %d'))
axes[0].xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%b %d'))
axes[1].xaxis_date()
axes[0].xaxis_date()
I want the shaded box to highlight when Hurricane Harvey hit Houston, Texas (Aug 25- Aug 30). My data looks like:
Date Salinity Temperature
20-Aug 15.88144647 31.64707184
21-Aug 18.83088846 31.43848419
22-Aug 19.51015264 31.47655487
23-Aug 23.41655369 31.198349
24-Aug 25.16410124 30.63014984
25-Aug 25.2273574 28.8677597
26-Aug 28.35557667 27.49458313
27-Aug 18.52829235 25.92834473
28-Aug 7.423231661 24.06635284
29-Aug 0.520394177 23.47881317
30-Aug 0.238508327 23.90857697
31-Aug 0.143210364 24.30892944
1-Sep 0.206473387 25.20442963
2-Sep 0.241343182 26.32663727
3-Sep 0.58000503 26.93431854
4-Sep 1.182055098 27.8212738
5-Sep 3.632014919 28.23947906
6-Sep 4.672006985 27.29686737
7-Sep 5.938766377 26.8693161
8-Sep 9.107671159 26.48963928
9-Sep 8.180587303 26.05213165
10-Sep 6.200532091 25.73104858
11-Sep 5.144526191 25.60035706
12-Sep 5.106032451 25.73139191
13-Sep 4.279492562 26.06132507
14-Sep 5.255868992 26.74919128
15-Sep 8.026764063 27.23724365
I have tried using the rectangle function in this link (https://discuss.analyticsvidhya.com/t/how-to-add-a-patch-in-a-plot-in-python/5518) however can't seem to get it to work properly.
Independent of your specific data, it sounds like you need axvspan. Try running this after your plotting code:
for ax in axes:
    ax.axvspan('2017-08-25', '2017-08-30', color='black', alpha=0.5)
This will work if dates = df['Date'] is stored as type datetime64. It might not work with other datetime data types, and it won't work if dates contains date strings.
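If the Date column holds strings such as '25-Aug', one option is to convert them to datetime values before plotting, so that the plotted data and the span limits share a type. A minimal sketch, assuming the year is 2017 as in the axis label of the question; `parse_day_month` and `shade_range` are hypothetical helper names, not part of matplotlib or pandas:

```python
from datetime import datetime


def parse_day_month(text, year=2017):
    """Turn a string like '25-Aug' into a datetime; the year is an assumption."""
    return datetime.strptime(f"{text}-{year}", "%d-%b-%Y")


def shade_range(axes, start="25-Aug", end="30-Aug"):
    """Shade the same date range on every Axes in `axes`."""
    x0, x1 = parse_day_month(start), parse_day_month(end)
    for ax in axes:
        # axvspan covers the full height of the Axes by default
        ax.axvspan(x0, x1, color="black", alpha=0.5)
```

The plotted x values would need the same conversion, for example `dates = [parse_day_month(d) for d in df['Date']]`, before calling `shade_range(axes)`.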

pandas dataframe recession highlighting plot

I have a pandas dataframe, shown in the table below, whose index is yyyy-mm and which has
a US recession indicator (USREC) and a time series variable M1.
Date USREC M1
2000-12 1088.4
2001-01 1095.08
2001-02 1100.58
2001-03 1108.1
2001-04 1 1116.36
2001-05 1 1117.8
2001-06 1 1125.45
2001-07 1 1137.46
2001-08 1 1147.7
2001-09 1 1207.6
2001-10 1 1166.64
2001-11 1 1169.7
2001-12 1182.46
2002-01 1190.82
2002-02 1190.43
2002-03 1194.85
2002-04 1186.82
2002-05 1186.9
2002-06 1194.55
2002-07 1199.26
2002-08 1183.7
2002-09 1197.1
2002-10 1203.47
I want to plot a chart in Python that looks like the attached chart, which was created in Excel.
I have searched for various examples online, but none of them shows a chart like the one below. Can you please help? Thank you.
I would also appreciate a pointer to any easier plotting library that needs few inputs and covers most of the plot types Excel provides.
EDIT:
I checked out the example in the page https://matplotlib.org/examples/pylab_examples/axhspan_demo.html. The code I have used is below.
fig, axes = plt.subplots()
df['M1'].plot(ax=axes)
ax.axvspan(['USREC'],color='grey',alpha=0.5)
I didn't see any example on the matplotlib.org pages where another column can be passed as the axvspan range. With my code above I get the error
TypeError: axvspan() missing 1 required positional argument: 'xmax'
I figured it out. I created secondary Y axis for USREC and hid the axis label just like I wanted to, but it also hid the USREC from the legend. But that is a minor thing.
def plot_var(y1):
    fig0, ax0 = plt.subplots()
    ax1 = ax0.twinx()
    y1.plot(kind='line', stacked=False, ax=ax0, color='blue')
    df['USREC'].plot(kind='area', secondary_y=True, ax=ax1, alpha=.2, color='grey')
    ax0.legend(loc='upper left')
    ax1.legend(loc='upper left')
    plt.ylim(ymax=0.8)
    plt.axis('off')
    plt.xlabel('Date')
    plt.show()
    plt.close()
plot_var(df['M1'])
There is a problem with Zenvega's answer: The recession lines are not vertical, as they should be. What exactly goes wrong, I am not entirely sure, but I show below how to get vertical lines.
My answer uses the following syntax ax.fill_between(date_index, y1=ymin, y2=ymax, where=True/False), where I compute the y1 and y2 arguments manually from the axis object and where the where argument takes the recession data as a boolean of True or False values.
import pandas as pd
import matplotlib.pyplot as plt
# get data: see further down for `string_data`
df = pd.read_csv(string_data, skipinitialspace=True)
df['Date'] = pd.to_datetime(df['Date'])
# convenience function
def plot_series(ax, df, index='Date', cols=['M1'], area='USREC'):
    # convert area variable to boolean
    df[area] = df[area].astype(int).astype(bool)
    # set up an index based on date
    df = df.set_index(keys=index, drop=False)
    # line plot
    df.plot(ax=ax, x=index, y=cols, color='blue')
    # extract limits
    y1, y2 = ax.get_ylim()
    ax.fill_between(df[index].index, y1=y1, y2=y2, where=df[area], facecolor='grey', alpha=0.4)
    return ax
# set up figure, axis
f, ax = plt.subplots()
plot_series(ax, df)
ax.grid(True)
plt.show()
# copy-pasted data from OP
from io import StringIO
string_data=StringIO("""
Date,USREC,M1
2000-12,0,1088.4
2001-01,0,1095.08
2001-02,0,1100.58
2001-03,0,1108.1
2001-04,1,1116.36
2001-05,1,1117.8
2001-06,1,1125.45
2001-07,1,1137.46
2001-08,1,1147.7
2001-09,1,1207.6
2001-10,1,1166.64
2001-11,1,1169.7
2001-12,0,1182.46
2002-01,0,1190.82
2002-02,0,1190.43
2002-03,0,1194.85
2002-04,0,1186.82
2002-05,0,1186.9
2002-06,0,1194.55
2002-07,0,1199.26
2002-08,0,1183.7
2002-09,0,1197.1
2002-10,0,1203.47""")
# after formatting, the data would look like this:
>>> df.head(2)
Date USREC M1
Date
2000-12-01 2000-12-01 False 1088.40
2001-01-01 2001-01-01 False 1095.08
See how the lines are vertical:
An alternative approach would be to use plt.axvspan(), which spans the full height of the axes by default, so the y1 and y2 values do not have to be computed by hand.
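A rough sketch of that axvspan alternative: collapse the 0/1 USREC column into contiguous (start, end) runs and shade each run. The helper names `recession_spans` and `shade_recessions` are invented for this illustration. Note that each span ends on the last flagged date rather than at the start of the following month, which may or may not be the convention you want.

```python
def recession_spans(dates, flags):
    """Collapse a 0/1 flag sequence into (start, end) pairs of contiguous runs."""
    spans = []
    start = None
    previous = None
    for date, flag in zip(dates, flags):
        if flag and start is None:
            start = date                     # a run begins here
        elif not flag and start is not None:
            spans.append((start, previous))  # the run ended on the previous date
            start = None
        previous = date
    if start is not None:                    # run still open at the end of the data
        spans.append((start, previous))
    return spans


def shade_recessions(ax, dates, flags):
    """Draw one axvspan per recession run on an existing Axes."""
    for x0, x1 in recession_spans(dates, flags):
        ax.axvspan(x0, x1, facecolor="grey", alpha=0.4)
```

With the dataframe from the answer above, `shade_recessions(ax, df['Date'], df['USREC'])` would be called after the line plot has been drawn.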

Changing x-axis labels to hours instead of the sample number

I currently have a dataset of 70,000 samples (sampled at 1Hz), and I am graphing it using MatPlotLib.
I am wondering how to change the x-axis labels to be in hours, instead of sample #.
The code that I am using today is as follows:
test = pd.read_csv("test.txt", sep='\t')
test.columns = ['TS', 'ppb', 'ppm']
test.head()
# The first couple minutes were with an empty container
# Then the apple was inserted into the container.
fig5 = plt.figure()
ax1 = fig5.add_subplot(111)
ax1.scatter(test.index, test['ppm'])
ax1.set_ylabel('(ppm)', color='b')
ax1.set_xlabel('Sampling Time', color='k')
ax2 = ax1.twinx()
ax2.scatter(test.index, test['ppb'], color = 'c')
ax2.set_ylabel('(ppb)', color='c')
plt.show()
My data looks as follows:
If the data is sampled at 1Hz, then every 3600 samples make up one hour. So create a new column like:
test['hours'] = (test.index - test.index[0])/3600.0
Then plot against test['hours'] instead of test.index, and label the x-axis in hours.
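To make the conversion concrete, here is a small sketch of the same arithmetic as a plain function, with the sampling rate as a parameter. The name `samples_to_hours` is made up for this example.

```python
def samples_to_hours(sample_numbers, rate_hz=1.0):
    """Convert sample numbers to elapsed hours since the first sample."""
    first = sample_numbers[0]
    return [(n - first) / (rate_hz * 3600.0) for n in sample_numbers]


hours = samples_to_hours([0, 1800, 3600, 7200])
print(hours)  # [0.0, 0.5, 1.0, 2.0]
```

For the 70,000 samples in the question, the last value would come out at roughly 19.4 hours.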

Changing axis on scatterplot to fixed intervals involving time

I have the following code. My problem is that I want the y axis to range from 0:00 to 12:00, equally spaced in increments of one hour, e.g. 0:00, 1:00, 2:00 etc. Any suggestions on how I would go about doing this?
I also want to get rid of the extra :00 at the end of each label. Right now they read 00:00:00, 01:00:00 and so on, when I only want 0:00, 1:00. Any ideas how I can do this? Here is the code I have so far.
import pandas as pd
import matplotlib.pyplot as plt
import datetime
data = pd.read_csv('data.csv', sep=',', header=None)
print (data)
ints = data[data[1]=='INT']
exts = data[data[1]=='EXT']
int_times = [datetime.datetime.time(datetime.datetime.strptime(t, '%H:%M')) for t in ints[4]]
ext_times = [datetime.datetime.time(datetime.datetime.strptime(t, '%H:%M')) for t in exts[4]]
int_dist = [d for d in ints[3]]
ext_dist = [d for d in exts[3]]
fig, ax = plt.subplots()
ax.scatter(int_dist, int_times, c='red', s=80)
ax.scatter(ext_dist, ext_times, c='blue', s=80)
plt.legend(['INT', 'EXT'], loc=4)
plt.xlabel('Distance')
plt.ylim(0,45000)
plt.show()
Well, it's possible to generate a list of times that contain only the minutes and seconds. You need to change the format to '%M:%S'.
Next, you need to change the labels using plt.xticks(). I did this for the x axis.
Here is a sample:
from datetime import datetime, date, time, timedelta
import matplotlib.pyplot as plt
start = datetime.combine(date.today(), time(0, 0))
axis_times = []
y_values = []
i = 0
while i<9:
    start += timedelta(seconds=7)
    axis_times.append(start.strftime("%M:%S"))
    y_values.append(i)
    i+=1
fig, ax = plt.subplots()
ax.scatter(range(len(axis_times)), y_values, c='red', s=80)
ax.scatter(range(len(axis_times)), y_values, c='blue', s=20)
plt.legend(['INT', 'EXT'], loc=4)
plt.xlabel('Distance')
plt.xticks(range(len(axis_times)), axis_times, size='small')
plt.show()
You can manually set the ticks to whatever you need. I can't run your example without the CSV data, but you can do something like this:
import numpy as np
import pylab as plt
import datetime
#Some arbitrary data
x = np.linspace(0.,12.,100)
fig, ax = plt.subplots(1, 1)
ax.plot(x,np.sin(x)+6.)
#Set one tick per hour from 0 to 12 (13 ticks)
ax.set_yticks(range(13))
#Relabel the ticks as needed
locs, labels = plt.yticks()
new_labels = [str(time) + ":00" for time in range(0,13)]
plt.yticks(locs, new_labels)
plt.show()
You can replace the new labels with datetime values or formatted strings that you obtain from your data (e.g. convert to string and remove the trailing ":00").
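One way to get labels such as 0:00, 1:00, ... on the original scatter plot is to convert each time to a number of hours, plot those numbers, and then label the whole-hour ticks. A sketch under those assumptions; `time_to_hours` and `hour_labels` are hypothetical helpers, not part of matplotlib:

```python
from datetime import datetime


def time_to_hours(text, fmt="%H:%M"):
    """Convert a time string such as '01:30' to fractional hours (1.5)."""
    t = datetime.strptime(text, fmt)
    return t.hour + t.minute / 60.0


def hour_labels(first=0, last=12):
    """Tick positions and 'H:00' labels for whole hours from first to last."""
    positions = list(range(first, last + 1))
    labels = [f"{h}:00" for h in positions]
    return positions, labels
```

With these, `ax.scatter(int_dist, [time_to_hours(t) for t in ints[4]])` followed by `ax.set_yticks(positions)`, `ax.set_yticklabels(labels)` and `ax.set_ylim(0, 12)` should give evenly spaced hourly ticks without the trailing seconds.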
