I currently have a dataset of 70,000 samples (sampled at 1 Hz), and I am graphing it using Matplotlib.
I am wondering how to change the x-axis labels to show hours instead of the sample number.
The code that I am using today is as follows:
import pandas as pd
import matplotlib.pyplot as plt
test = pd.read_csv("test.txt", sep='\t')
test.columns = ['TS', 'ppb', 'ppm']
test.head()
# The first couple of minutes were with an empty container;
# then the apple was inserted into the container.
fig5 = plt.figure()
ax1 = fig5.add_subplot(111)
ax1.scatter(test.index, test['ppm'])
ax1.set_ylabel('(ppm)', color='b')
ax1.set_xlabel('Sampling Time', color='k')
ax2 = ax1.twinx()
ax2.scatter(test.index, test['ppb'], color='c')
ax2.set_ylabel('(ppb)', color='c')
plt.show()
My data looks as follows:
If the data is sampled at 1 Hz, every 3600 samples make up one hour. So create a new column like this and pass it to the scatter calls instead of test.index:
test['hours'] = (test.index - test.index[0])/3600.0
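A minimal sketch of the whole approach, assuming a constant 1 Hz rate and the ppm/ppb columns from the question (add_hours_column and plot_ppm_ppb are illustrative names, not part of any library). It counts samples with np.arange instead of using the index, so it also works if the index is not 0..n-1:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

SAMPLE_RATE_HZ = 1.0  # from the question: one sample per second


def add_hours_column(df, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return a copy of df with an 'hours' column (0 at the first sample)."""
    out = df.copy()
    # Sample k is taken k / sample_rate_hz seconds after the first one;
    # dividing by 3600 converts seconds to hours.
    out["hours"] = np.arange(len(out)) / (sample_rate_hz * 3600.0)
    return out


def plot_ppm_ppb(df):
    """Scatter ppm (left axis) and ppb (right axis) against elapsed hours."""
    fig, ax1 = plt.subplots()
    ax1.scatter(df["hours"], df["ppm"], color="b", s=2)
    ax1.set_xlabel("Time (hours)")
    ax1.set_ylabel("(ppm)", color="b")
    ax2 = ax1.twinx()  # second y-axis sharing the same hours x-axis
    ax2.scatter(df["hours"], df["ppb"], color="c", s=2)
    ax2.set_ylabel("(ppb)", color="c")
    return fig, ax1, ax2
```

Usage would be `test = add_hours_column(test)`, then `plot_ppm_ppb(test)` followed by `plt.show()`.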
I have created a bar plot of given days of the year against the number of people born on each day (figure a). I want to set the x-axis limits of my seaborn barplot to xlim = (0, 365) to show the whole year.
But once I use ax.set_xlim(0, 365), the bar plot is simply shifted to the left (figure b).
This is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# data
df = pd.DataFrame()
df['day'] = np.arange(41, 200)
df['born'] = np.random.randn(159) * 100
# plot
f, axes = plt.subplots(4, 4, figsize=(12, 12))
# (hue=df.time is omitted: this example DataFrame has no 'time' column)
ax = sns.barplot(data=df, x='day', y='born', ax=axes[0, 0], color='skyblue')
ax.get_xaxis().set_label_text('')
ax.set_xticklabels([])
ax.set_yscale('log')
ax.set_ylim(1, 10e3)  # a log-scaled axis cannot start at 0
ax.set_xlim(0, 366)
ax.set_title('SE Africa')
How can I set the x-axes limits to day 0 and 365 without the bars being shifted to the left?
IIUC, the expected output is difficult to obtain directly given the nature of the data because, as per the documentation of seaborn.barplot:
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
This means that seaborn.barplot creates categories from the values in x (here, df.day) and maps them to integer positions starting from 0.
Therefore, even though the data starts at day 41, seaborn places the first category at x = 0, which makes it difficult to adjust the lower limit of the x-axis after the function call.
The following code and the corresponding plot clarify what I explained above:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# data
rng = np.random.default_rng(101)
day = np.arange(41,200)
born = rng.integers(low=0, high=100_000, size=200-41)
df = pd.DataFrame({"day":day, "born":born})
# plot
f, ax = plt.subplots(figsize=(4, 4))
sns.barplot(data=df, x='day', y='born', ax=ax, color='b')
ax.set_xlim(0,365)
ax.set_xticks(ticks=np.arange(0, 365, 30), labels=np.arange(0, 365, 30))
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
I suggest using matplotlib.axes.Axes.bar to overcome this issue, although coloring the bars is not as straightforward as with sns.barplot(..., hue=..., ...):
# plot
f, ax = plt.subplots(figsize=(4, 4))
ax.bar(x=df.day, height=df.born) # instead of sns.barplot
ax.get_xaxis().set_label_text('')
ax.set_xlim(0,365)
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
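If you would rather keep seaborn (for example for hue), one workaround that follows from the documentation quoted above is to pass every day of the year through order=, so that category i sits at x = i and set_xlim(0, 365) lines up with the day numbers. A sketch, assuming the day/born columns used above; barplot_full_year is an illustrative name. Recent seaborn releases (0.13+) also document a native_scale parameter for keeping numeric positions, which is worth checking for your version.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def barplot_full_year(df, ax, color="skyblue"):
    """Bar plot of 'born' per 'day' on a fixed 0..365 day axis."""
    # One categorical level per day of the year. seaborn draws level i at
    # x = i, so each bar's position equals its day number, and days without
    # data simply stay empty.
    all_days = np.arange(0, 366)
    sns.barplot(data=df, x="day", y="born", order=all_days, color=color, ax=ax)
    ax.set_xlim(0, 365)
    # Label every 30th day; tick positions and day numbers now coincide.
    ticks = np.arange(0, 366, 30)
    ax.set_xticks(ticks)
    ax.set_xticklabels(ticks)
    ax.set_yscale("log")
    return ax


# Same kind of data as in the answer above (days 41..199)
rng = np.random.default_rng(101)
df = pd.DataFrame({"day": np.arange(41, 200),
                   "born": rng.integers(1, 100_000, size=159)})
fig, ax = plt.subplots(figsize=(4, 4))
barplot_full_year(df, ax)
ax.set_title("SE Africa")
fig.tight_layout()
```

The trade-off is 366 categorical slots instead of 159, which is cheap, and it keeps seaborn's hue handling available.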
I have data for all the time I've spent coding. This data is represented as a dictionary where the key is the date and the value is a list of tuples containing the time I started a coding session and how long the coding session lasted.
I have successfully plotted this with broken_barh using the code below, where the y-axis is the date, the x-axis is the time of day, and each broken bar is an individual session.
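from datetime import datetime
import matplotlib.pyplot as plt
from pandas import period_range
# sessions: dict of date -> list of (start_second, duration_seconds)
# getStartMonth() is a helper not shown here; assumed to return a datetime
for i, subSessions in enumerate(sessions.values()):
    plt.broken_barh(subSessions, (i, 1))
months = {}
start = getStartMonth()
for month in period_range(start=start, end=datetime.today(), freq="M"):
    month = str(month)
    months[month] = (datetime.strptime(month, '%Y-%m') - start).days
plt.yticks(list(months.values()), list(months.keys()))
plt.xticks(range(0, 24*3600, 3600), [str(i) + ":00" for i in range(24)], rotation=45)
plt.gca().invert_yaxis()
plt.show()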
I want to use this data to discover at what times of day I spend the most time coding, but it isn't very clear from the above chart. I'd like to display it as a line graph or heatmap where the y-axis is the number of days on which I was coding at the time shown on the x-axis (in other words, how many sessions are present in that column of the above chart). How do I accomplish this?
You can find some great examples of how to create a heatmap on the Matplotlib website.
Here is some basic code with random data:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
index_labels = np.arange(0, 24)
column_labels = pd.date_range(start='1/1/2022', end='1/31/2022').strftime('%m/%d')
# random data
np.random.seed(12345)
data = np.random.randint(0, 60, size=(len(index_labels), len(column_labels)))
df = pd.DataFrame(data=data, columns=column_labels, index=index_labels)
# heatmap function
def heatmap(df, ax, cbarlabel="", cmap="Greens", label_num_dec_place=0):
    # (label_num_dec_place is accepted but not used in this version)
    # plot the values as an image
    im = ax.imshow(df.values, cmap=cmap)
    # create a customized colorbar
    cbar = ax.figure.colorbar(im, ax=ax, fraction=0.05, extend='both', extendfrac=0.05)
    cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom", fontsize=14)
    # one tick per column/row, labelled with the DataFrame labels
    ax.set_xticks(np.arange(df.shape[1]), labels=df.columns, fontsize=12)
    ax.set_yticks(np.arange(df.shape[0]), labels=list(df.index), fontsize=12)
    # put the x ticks and their labels on top
    ax.tick_params(axis='x', top=True, bottom=False,
                   labeltop=True, labelbottom=False)
    ax.spines[:].set_visible(False)
    # no grid lines over the cells
    ax.grid(False)
    # rotation of tick labels
    plt.setp(ax.get_xticklabels(), rotation=-60,
             ha="right", rotation_mode=None)
    plt.setp(ax.get_yticklabels(), rotation=30)
    return im, cbar
# plotting and saving
fig, ax = plt.subplots(facecolor=(1, 1, 1), figsize=(20, 8), dpi=200)
heatmap(df=df, ax=ax, cbarlabel="time (min)", cmap="Greens", label_num_dec_place=0)
plt.savefig('time_heatmap.png',
            bbox_inches='tight',
            facecolor=fig.get_facecolor(),
            transparent=True,
            )
Output:
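To feed the heatmap with real numbers instead of random data, the sessions can be turned into a 24 × n_days table of minutes coded per hour. A minimal sketch, assuming the question's sessions dict maps each date to (start_second, duration_seconds) tuples and ignoring any time past midnight; minutes_per_hour is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def minutes_per_hour(sessions):
    """Build a 24 x n_days DataFrame of minutes coded in each hour of each day."""
    hour_starts = np.arange(24) * 3600
    columns = {}
    for date, day_sessions in sessions.items():
        minutes = np.zeros(24)
        for start, duration in day_sessions:
            end = min(start + duration, 24 * 3600)  # clip at midnight
            # Overlap of [start, end) with every hour [h*3600, (h+1)*3600);
            # negative values mean "no overlap" and are clipped to 0.
            overlap = (np.minimum(end, hour_starts + 3600)
                       - np.maximum(start, hour_starts))
            minutes += np.clip(overlap, 0, None) / 60
        columns[str(date)] = minutes
    return pd.DataFrame(columns, index=np.arange(24))
```

The result has the same layout as the random df above (hours as rows, dates as columns), so it could be passed as `heatmap(df=minutes_per_hour(sessions), ax=ax, cbarlabel="time (min)")`.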
One way to do it is to use sampling. Choose how many samples you want to take per day (the precision, for example 288 samples per day, i.e. one every 5 minutes), split the day into that many intervals, and count how many sessions overlap each interval. The downside is that the result can't be 100% precise, and increasing the precision increases the time it takes to generate the plot (for me, a second-precise image takes several minutes, although that level of precision makes little to no difference to the result).
Here is some code that can produce both a heatmap and a line graph:
import matplotlib.pyplot as plt
# sessions: dict mapping each date to a list of (start_second, duration_seconds)
# Configuration options
precisionPerDay = 288
timeTicksPerDay = 24
timeTickRotation = 60
timeTickFontSize = 6
heatmap = True
# Constants
hoursInDay = 24
secondsInHour = 3600
secondsInDay = hoursInDay*secondsInHour
xInterval = secondsInDay/precisionPerDay  # seconds per sample
timeTickSampleInterval = precisionPerDay/timeTicksPerDay  # samples per time tick
# Calculating x-axis (time) ticks
xAxis = range(precisionPerDay)
timeTickLabels = []
timeTickLocations = []
for timeTick in range(timeTicksPerDay):
    timeTickLocations.append(int(timeTick*timeTickSampleInterval))
    hours = timeTick/timeTicksPerDay*hoursInDay
    hour = int(hours)
    minute = int((hours-hour)*60)
    timeTickLabels.append(f"{hour:02d}:{minute:02d}")
# Calculating y-axis (height): number of sessions overlapping each interval
heights = []
for dayX in xAxis:
    rangeStart = dayX*xInterval
    rangeEnd = rangeStart+xInterval
    y = 0
    # daySessions must not be called `sessions`, or the dict gets overwritten
    for date, daySessions in sessions.items():
        for start, duration in daySessions:
            if start < rangeEnd and start+duration > rangeStart:
                y += 1
    heights.append(y)
# Plotting data
if heatmap:
    plt.yticks([])
    plt.imshow([heights], aspect="auto")
else:
    plt.plot(xAxis, heights)
    plt.ylim(bottom=0)
plt.xlim(left=0, right=len(heights))
plt.xlabel("Time of day")
plt.ylabel("How often I've coded at that time")
plt.xticks(timeTickLocations, timeTickLabels,
           fontsize=timeTickFontSize, rotation=timeTickRotation)
plt.show()
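The nested loop above scales with (number of intervals) × (number of sessions), which is why second-level precision is slow. A faster sketch of the same count uses a difference array: each session adds +1 at its first overlapping interval and -1 just after its last one, and a cumulative sum yields the per-interval counts. It assumes the same sessions layout, reproduces the loop's `start < rangeEnd and start+duration > rangeStart` test, and clips sessions at midnight like the loop does; overlap_counts is an illustrative name.

```python
import numpy as np

SECONDS_IN_DAY = 24 * 3600


def overlap_counts(sessions, bins_per_day=288):
    """Number of sessions overlapping each of `bins_per_day` equal intervals."""
    width = SECONDS_IN_DAY / bins_per_day
    pairs = [(start, duration)
             for day in sessions.values() for start, duration in day]
    starts = np.array([p[0] for p in pairs], dtype=float)
    ends = np.array([p[0] + p[1] for p in pairs], dtype=float)
    # Session [start, end) overlaps interval k = [k*width, (k+1)*width)
    # exactly when floor(start/width) <= k <= ceil(end/width) - 1.
    first = np.maximum(np.floor(starts / width).astype(int), 0)
    last = np.minimum(np.ceil(ends / width).astype(int) - 1, bins_per_day - 1)
    keep = last >= first  # drop sessions that overlap no interval
    # Difference array: +1 where a run of intervals starts, -1 after it ends.
    diff = np.zeros(bins_per_day + 1, dtype=int)
    np.add.at(diff, first[keep], 1)
    np.add.at(diff, last[keep] + 1, -1)
    return np.cumsum(diff[:-1])
```

With this, `heights = list(overlap_counts(sessions, precisionPerDay))` could replace the loop, and even `bins_per_day=86400` stays cheap because the work is proportional to sessions plus intervals rather than their product.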
And here are some sample results:
Graph produced by the configuration options shown in the code above
The same data as a line graph with a lower precision (24 per day) and more time ticks (48)
I have put together some code to make plots from data covering multiple days. My data file contains over 40 days and 19k timestamps, and I need one plot per day, generated by Python as separate figures.
Mr. T helped me a lot by providing the code, but I cannot get it to produce individual plots instead of putting all of them into one figure as subplots. Can somebody help me with this?
The picture shows the current output:
My code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# read your data and create datetime index
df = pd.read_csv('test-februari.csv', sep=";")
df.index = pd.to_datetime(df["Date"]+df["Time"].str[:-5], format="%Y:%m:%d %H:%M:%S")
# group by date and hour, count entries
dfcounts = df.groupby([df.index.date, df.index.hour]).size().reset_index()
dfcounts.columns = ["Date", "Hour", "Count"]
maxcount = dfcounts.Count.max()
# group by date for plotting
dfplot = dfcounts.groupby(dfcounts.Date)
# plot each day into its own subplot
fig, axs = plt.subplots(dfplot.ngroups, figsize=(6, 8))
for i, groupdate in enumerate(dfplot.groups):
    ax = axs[i]
    # the marker is not really necessary but has been added in case there is just one entry per day
    ax.plot(dfplot.get_group(groupdate).Hour, dfplot.get_group(groupdate).Count, color="blue", marker="o")
    ax.set_title(str(groupdate))
    ax.set_xlim(0, 24)
    ax.set_ylim(0, maxcount * 1.1)
    ax.xaxis.set_ticks(np.arange(0, 25, 2))
plt.tight_layout()
plt.show()
Welcome to Stack Overflow.
Instead of creating multiple subplots, you can create a new figure in every iteration of the loop and plot onto it separately, then show all of them at the same time at the end.
# iterating over the GroupBy object yields (date, sub-DataFrame) pairs
for groupdate, group in dfplot:
    fig = plt.figure()
    plt.plot(group.Hour, group.Count, color="blue", marker="o")
    plt.title(str(groupdate))
    plt.xlim(0, 24)
    plt.ylim(0, maxcount * 1.1)
    plt.xticks(np.arange(0, 25, 2))
    plt.tight_layout()
plt.show()
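With 40+ days, opening every figure at once gets unwieldy, and Matplotlib warns once more than 20 figures are open (the figure.max_open_warning setting). A sketch that writes one PNG per day and closes each figure instead, assuming a dfcounts frame with Date, Hour and Count columns as built in the question; save_daily_plots and the daily_plots folder are illustrative names:

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np


def save_daily_plots(dfcounts, out_dir="daily_plots"):
    """Write one PNG per date from a Date/Hour/Count DataFrame."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    maxcount = dfcounts["Count"].max()  # shared y-limit for all days
    paths = []
    for groupdate, group in dfcounts.groupby("Date"):
        fig, ax = plt.subplots(figsize=(6, 3))
        ax.plot(group["Hour"], group["Count"], color="blue", marker="o")
        ax.set_title(str(groupdate))
        ax.set_xlim(0, 24)
        ax.set_ylim(0, maxcount * 1.1)
        ax.set_xticks(np.arange(0, 25, 2))
        fig.tight_layout()
        path = out / f"{groupdate}.png"
        fig.savefig(path)
        plt.close(fig)  # release the figure so they don't pile up
        paths.append(path)
    return paths
```

Calling `save_daily_plots(dfcounts)` after building dfcounts would leave one file per day in the folder, named after the date.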
I am trying to write a function that plots a line graph from the data in an imported CSV file.
This is a small sample of my data (a temperature reading every minute):
00:01:00.0305040, 35.35985
00:02:00.0438094, 35.48547
00:03:00.0571148, 35.65295
00:04:00.0704203, 35.90417
00:05:00.0837257, 36.23914
...
08:52:07.2370729, 74.92772
08:53:07.2503783, 75.01146
08:54:07.2648837, 75.05333
08:55:07.2781891, 75.0952
08:56:07.2914945, 75.0952
When I try to make the x-axis ticks appear every hour, they do not show up in the plotted graph.
This is my code:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
def plot_temperature(file):  # function name is illustrative
    df = pd.read_csv(file, names=["time", "temp"])
    df["time"] = pd.to_datetime(df["time"])
    df = df.set_index('time')
    df.index = df.index.map(lambda t: t.strftime('%H:%M'))
    print(df)
    fig, ax = plt.subplots()
    df.plot(ax=ax, color='black', linewidth=0.4, x_compat=True)
    ax.set(xlabel='Time (Hour:Minutes)', ylabel='Temperature (Celsius)')
    ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
    fig.autofmt_xdate()
    return plt.show()
I have tried labeling the x ticks manually:
plt.xticks(['0:00', '1:00', '2:00', '3:00', '4:00:0', '5:00', '6:00:0', '7:00', '8:00', '9:00', '10:00'])
and it worked, but is there a way that works for any given case?
According to the official Matplotlib documentation:
All of plotting functions expect np.array or np.ma.masked_array as input. Classes that are 'array-like' such as pandas data objects and np.matrix may or may not work as intended. It is best to convert these to np.array objects prior to plotting.
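So I changed your code slightly, basically converting the pandas DataFrame columns into NumPy arrays. Just as important, the times are no longer turned into strings with strftime, so the x values stay datetimes that HourLocator and DateFormatter can work with.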
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv(file, names=["time", "temp"])
df["time"] = pd.to_datetime(df["time"])
x_axis = np.array(df.time.values)
y_axis = np.array(df.temp.values)
fig, ax = plt.subplots()
ax.plot(x_axis, y_axis)
ax.set(xlabel='Time (Hour:Minutes)', ylabel='Temperature (Celsius)')
ax.xaxis.set_major_locator(mdates.HourLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.show()
The ticks are now visible, as shown below.
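A self-contained sketch of the same idea that can be checked without the CSV, assuming a DataFrame with a datetime time column and a temp column; plot_temperature_frame is an illustrative name and the readings below are synthetic, spanning 00:01 to 08:56 like the sample:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_temperature_frame(df):
    """Plot 'temp' against a datetime 'time' column with hourly ticks."""
    fig, ax = plt.subplots()
    # Keep real datetimes on the x-axis (no strftime): date locators and
    # formatters can only place ticks on date values, not on strings.
    ax.plot(df["time"].to_numpy(), df["temp"].to_numpy(),
            color="black", linewidth=0.4)
    ax.set(xlabel="Time (Hour:Minutes)", ylabel="Temperature (Celsius)")
    ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
    fig.autofmt_xdate()
    return fig, ax


# Synthetic demo data: one reading per minute from 00:01 to 08:56
times = pd.date_range("2022-01-01 00:01", periods=536, freq="min")
demo = pd.DataFrame({"time": times,
                     "temp": np.linspace(35.36, 75.10, len(times))})
fig, ax = plot_temperature_frame(demo)
```

Calling `plt.show()` afterwards displays the figure; with real data, the DataFrame from `pd.read_csv(...)` plus `pd.to_datetime` can be passed in directly.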
I'm trying to display two curves. The abscissas are dates and the ordinates are double values (in this case, power).
The two data sets do not share the same dates, and where dates do match, the second data set added is stacked on top of the first one.
Example 1: FR is added after DE and has 4 times fewer data points.
Example 2: DE is added after FR and has 4 times more data points.
The code I'm currently running is:
import matplotlib.pyplot as plt
# getProduction, start, end and session are defined elsewhere (not shown)
# Clean figure
fig = plt.figure()
for country in ['DE', 'FR']:
    production = getProduction(
        country=country,
        start=start,
        end=end,
        session=session,
        verbose=False,
        debug=False)
    allTimeseries = production['all']['timeseries']
    print(allTimeseries)
    timestamps = []
    values = []
    for date in allTimeseries.keys():
        timestamps.append(date)
        values.append(allTimeseries[date]['power']['quantity'])
    # Add the plot to the figure
    plt.plot_date(timestamps, values, label=country, antialiased=True)
plt.xticks(rotation=30, ha="right")
plt.legend(loc='upper left', ncol=1)
# plt.show()
plt.tight_layout()
plt.savefig('test.png', dpi=fig.dpi)
How can I prevent the two series from stacking?
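The cause can't be confirmed from the question alone, but one way to make sure both series share a single time axis, whatever their sampling rates, is to parse the keys into real timestamps and sort them before plotting. If the keys of allTimeseries are currently strings or unsorted, this may be enough to stop one series being drawn after the other. A sketch under that assumption; plot_series is an illustrative name and the input layout ({label: {date: value}}) is assumed:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_series(series_by_country):
    """Plot several {date: value} mappings on one shared time axis."""
    fig, ax = plt.subplots()
    for country, timeseries in series_by_country.items():
        s = pd.Series(timeseries, dtype=float)
        # Parse the keys into timestamps and sort them, so every series is
        # placed by its own dates on one continuous time axis.
        s.index = pd.to_datetime(s.index)
        s = s.sort_index()
        ax.plot(s.index.to_numpy(), s.to_numpy(), marker="o", label=country)
    ax.legend(loc="upper left", ncol=1)
    fig.autofmt_xdate(rotation=30, ha="right")
    fig.tight_layout()
    return fig, ax
```

The input dicts could be built inside the existing loop as `{date: allTimeseries[date]['power']['quantity'] for date in allTimeseries}`, and the returned figure saved with `fig.savefig('test.png')`.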