I'm plotting a stacked line chart from a pandas DataFrame. The data was collected irregularly over the course of two days. In the image below, you can see that the elapsed time between equally spaced x-axis intervals varies (from ~7 hours to ~36 hours). I don't want this; I want the points to be stretched and squeezed so that time scales linearly along the x-axis. How can I do this?
The data was read in as follows:
df = pd.read_csv("filepath", index_col=0)
df = df.T
Above, I had to transpose the dataframe for the pandas stacked line plot to work as I wanted it to. The plot was produced as follows:
plot = df.plot.area(rot=90)
plot.axhline(y=2450, color="black")
In response to ImportanceOfBeingErnest, here is a minimal, complete, and verifiable example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as mpl
dateTimeIndex = ["04.12.17 23:03", "05.12.17 00:09", "05.12.17 21:44", "05.12.17 22:34", "08.12.17 16:23"]
d = {'one' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex),
'two' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex),
'three' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex)}
df = pd.DataFrame(d)
plot = df.plot.area(rot=90)
Here is what the dataframe looks like (random values will vary):
                     one     three       two
04.12.17 23:03  0.472832  0.283329  0.739657
05.12.17 00:09  3.166099  1.065015  0.561079
05.12.17 21:44  0.209190  0.674236  0.143453
05.12.17 22:34  1.275056  0.764328  0.650507
08.12.17 16:23  0.764038  0.265599  0.342435
and the plot produced:
As you can see, the dateTimeIndex entries are irregularly spaced in time, but they are given equal spacing on the x-axis. I don't mind whether the tick marks coincide with the data points; I only want time to scale linearly. How can this be achieved?
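What's happening above is that pandas uses the index strings as plain labels and spaces them evenly. You need to convert dateTimeIndex to datetime values; pandas then positions each point by its actual time. Assuming the strings are day.month.year (4 to 8 December 2017), pass an explicit format, because by default pandas parses month-first and would read "04.12.17" as April 12:
import numpy as np
import pandas as pd
# Explicit day.month.year format; the default parser is month-first
dateTimeIndex = pd.to_datetime(["04.12.17 23:03", "05.12.17 00:09",
                                "05.12.17 21:44", "05.12.17 22:34", "08.12.17 16:23"],
                               format="%d.%m.%y %H:%M")
d = {'one' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex),
'two' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex),
'three' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex)}
df = pd.DataFrame(d)
plot = df.plot.area(rot=90)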
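As a quick check of what the datetime index buys you, the sketch below parses the same strings (assuming day.month.year order) and computes the gap between consecutive samples in hours. With a datetime index, pandas spaces the points in proportion to these gaps instead of placing them one unit apart.

```python
import pandas as pd

# Assumption: the strings are day.month.year (European order)
raw = ["04.12.17 23:03", "05.12.17 00:09", "05.12.17 21:44",
       "05.12.17 22:34", "08.12.17 16:23"]
idx = pd.to_datetime(raw, format="%d.%m.%y %H:%M")

# Gap between consecutive samples, expressed in hours
gaps = pd.Series(idx).diff().dropna() / pd.Timedelta(hours=1)
print(gaps.round(2).tolist())  # [1.1, 21.58, 0.83, 65.82]
```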
I want my matplotlib plot to display my df's DateTimeIndex as consecutive count data (in seconds) on the x-axis and my df's Load data on the y-axis. Then I want to overlay it with a scipy.signal find_peaks result (which has an x-axis of consecutive seconds). My data is not consecutive (real-world data), though it does have a frequency of seconds.
Code
import pandas as pd
import matplotlib.pyplot as plt
from scipy import signal
import numpy as np
# Create Sample Dataset
df = pd.DataFrame([['2020-07-25 09:26:28',2],['2020-07-25 09:26:29',10],['2020-07-25 09:26:32',203],['2020-07-25 09:26:33',30]],
columns = ['Time','Load'])
df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index("Time")
print(df)
# Try to solve the problem
rng = pd.date_range(df.index[0], df.index[-1], freq='s')
print(rng)
peaks, _ = signal.find_peaks(df["Load"])
plt.plot(rng, df["Load"])
plt.plot(peaks, df["Load"][peaks], "x")
plt.plot(np.zeros_like(df["Load"]), "--", color="gray")
plt.show()
This code does not work because rng has a length of 6, while the df has a length of 4. I think I might be going about this the wrong way entirely. Thoughts?
You are really close; I think you can get what you want by reindexing your df with your range. For instance:
df = df.reindex(rng).fillna(0)
peaks, _ = signal.find_peaks(df["Load"])
...
Does that do what you expect?
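A fuller, runnable sketch of the same reindexing idea is below. It is one possible reading of the goal, not the only way: it plots both the signal and the peaks against seconds elapsed since the first sample, so the two plots share one numeric x-axis (find_peaks returns integer positions, which equal elapsed seconds once there is exactly one row per second).

```python
import matplotlib.pyplot as plt
import pandas as pd
from scipy import signal

df = pd.DataFrame(
    [['2020-07-25 09:26:28', 2], ['2020-07-25 09:26:29', 10],
     ['2020-07-25 09:26:32', 203], ['2020-07-25 09:26:33', 30]],
    columns=['Time', 'Load'])
df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index('Time')

# One row per second; the missing seconds are filled with 0
rng = pd.date_range(df.index[0], df.index[-1], freq='s')
df = df.reindex(rng).fillna(0)

# Seconds elapsed since the first sample: 0.0, 1.0, 2.0, ...
seconds = (df.index - df.index[0]).total_seconds()

# Integer positions of the local maxima
peaks, _ = signal.find_peaks(df['Load'])

plt.plot(seconds, df['Load'])
plt.plot(seconds[peaks], df['Load'].iloc[peaks], 'x')
plt.axhline(0, linestyle='--', color='gray')
plt.xlabel('Seconds since start')
plt.show()
```

Filling with 0 treats missing seconds as zero load, which can create artificial peaks beside gaps; df.reindex(rng).interpolate() is an alternative if zero isn't meaningful for your data.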
I'm using seaborn to do a line plot.
Here's some sample data:
error_mean.head(5)
Output:
error_rate
10 0.829440
20 0.833747
30 0.835182
40 0.837922
50 0.835835
So the index values do appear to be ordered.
Here's my code plotting the above data:
plt.figure(figsize=(15,5))
sns.lineplot(x=error_mean.index.values, y=error_mean['error_rate'])
and I keep getting a plot like the following:
As you can see, the x-axis values are badly out of order! I tried googling this but couldn't find a similar issue with an answer.
I'd appreciate any help!
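I suspect the issue is that error_mean.index.values holds strings rather than numbers, so the x values are handled as text labels (compared as text, "100" sorts before "20"). You need to convert them to int.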
Check the difference between:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df1 = pd.DataFrame([
["10", 0.829440],
["20", 0.833747],
["100", 0.835182],
["40" , 0.837922],
["50", 0.835835]])
sns.lineplot(x=df1[0], y=df1[1])
and
df1 = pd.DataFrame([
["10", 0.829440],
["20", 0.833747],
["100", 0.835182],
["40" , 0.837922],
["50", 0.835835]])
sns.lineplot(x=(df1[0]).astype(int), y=df1[1])
So I would try:
plt.figure(figsize=(15,5))
sns.lineplot(x=error_mean.index.values.astype(int), y=error_mean['error_rate'])
I'm trying to plot the x-axis from the top row of my dataframe, and the y-axis from another row in my dataframe.
My dataframe looks like this:
sector_data =
Time 13:00 13:15 13:30 13:45
Utilities 1235654 1456267 1354894 1423124
Transports 506245 554862 534685 524962
Telecomms 142653 153264 162357 154698
I've tried many different things, and this seemed to make the most sense, but nothing works:
sector_data.plot(kind='line',x='Time',y='Utilities')
plt.show()
I keep getting:
KeyError: 'Time'
It should end up looking like this:
Expected Chart
Given the limited information you provided, I believe this should help:
df = sector_data.T
df.plot(kind='line',x='Time',y='Utilities')
plt.show()
This is how I built a test case (with the DataFrame already transposed):
import pandas as pd
import matplotlib.pyplot as plt
a = {'Time':['13:00','13:15','13:30','13:45'],'Utilities':[1235654,1456267,1354894,1423124],'Transports':[506245,554862,534685,524962],'Telecomms':[142653,153264,162357,154698]}
df = pd.DataFrame(a)
df.plot(kind='line',x='Time',y='Utilities')
plt.show()
Output:
Let's take an example DataFrame:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'ColA':['Time','Utilities','Transports','Telecomms'],'ColB':['13:00', 1235654, 506245, 142653],'ColC':['14:00', 1234654, 506145, 142650], 'ColD':['15:00', 4235654, 906245, 142053],'ColE':['16:00', 4205654, 906845, 742053]})
df = df.set_index('ColA')  # use the labels in ColA (Time, Utilities, ...) as the row index
Now you can easily plot with matplotlib:
plt.plot(df.loc['Time'].values,df.loc['Utilities'].values)
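If you'd rather keep using pandas plotting with x='Time' and y='Utilities', a sketch is to transpose so each row label becomes a column. Transposing rows of mixed strings and numbers leaves object-dtype columns, so this converts Utilities to numbers first. The frame is a cut-down version of the example above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Row-oriented frame: the labels live in the first column
df = pd.DataFrame({'ColA': ['Time', 'Utilities', 'Transports', 'Telecomms'],
                   'ColB': ['13:00', 1235654, 506245, 142653],
                   'ColC': ['14:00', 1234654, 506145, 142650]})

# Transpose so Time, Utilities, ... become columns
wide = df.set_index('ColA').T
wide['Utilities'] = pd.to_numeric(wide['Utilities'])  # object -> int64

wide.plot(kind='line', x='Time', y='Utilities')
plt.show()
```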
I am trying to create a line plot in time order. For the df below, the first value occurs at 07:00:00 and the last at 00:40:00 (after midnight).
However, the timestamps aren't shown on the x-axis, and the row after midnight is plotted first instead of last.
import pandas as pd
import matplotlib.pyplot as plt
d = ({
'Time' : ['7:00:00','10:30:00','12:40:00','16:25:00','18:30:00','22:40:00','00:40:00'],
'Value' : [1,2,3,4,5,4,10],
})
df = pd.DataFrame(d)
df['Time'] = pd.to_timedelta(df['Time'])
plt.plot(df['Time'], df['Value'])
plt.show()
print(df)
Matplotlib is converting your timedelta values to plain numbers, which is why you don't get time labels on your x-axis. The plot is in fact in order: '00:40:00' is smaller than all the other times, so it is plotted as the leftmost point.
Instead, use full datetimes that include the date, so that 00:40:00 falls on the next day and therefore comes last. You can also use the pandas plotting method for easier formatting:
d = ({
'Time' : ['2019/1/1 7:00:00','2019/1/1 10:30:00','2019/1/1 12:40:00',
'2019/1/1 16:25:00','2019/1/1 18:30:00','2019/1/1 22:40:00',
'2019/1/2 00:40:00'],
'Value' : [1,2,3,4,5,4,10],
})
df = pd.DataFrame(d)
df['Time'] = pd.to_datetime(df['Time'])
df.plot(x='Time', y='Value')
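If you'd rather not type the dates by hand, here is a sketch that derives them under two assumptions: the rows are already in chronological order, and consecutive samples are less than 24 hours apart. Whenever the time of day goes backwards it assumes midnight was crossed and adds a day. The start date 2019-01-01 is arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Time': ['7:00:00', '10:30:00', '12:40:00', '16:25:00',
             '18:30:00', '22:40:00', '00:40:00'],
    'Value': [1, 2, 3, 4, 5, 4, 10],
})

td = pd.to_timedelta(df['Time'])
# A time of day smaller than the previous one means midnight was crossed;
# the cumulative count of such crossings is the day offset of each row
day = (td.diff() < pd.Timedelta(0)).cumsum()
# Arbitrary start date; only the relative spacing matters for the plot
df['Time'] = pd.Timestamp('2019-01-01') + td + pd.to_timedelta(day, unit='D')

df.plot(x='Time', y='Value')
plt.show()
```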
Update
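Setting the ticks/tick labels at your time points is a bit trickier. This post will give you an idea of how the positioning works. Basically, you'll need something like matplotlib.dates.date2num to get the numerical representation of each datetime:
import matplotlib.dates as mdates
# x_compat=True makes pandas use matplotlib's own date units on the x-axis,
# so the positions returned by date2num line up with the plotted points
ax = df.plot(x='Time', y='Value', x_compat=True)
xticks = [mdates.date2num(x) for x in df['Time']]
xticklabels = [x.strftime('%H:%M') for x in df['Time']]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)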
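An alternative sketch that skips pandas plotting and puts one tick on each sample with plain matplotlib: set_xticks takes the date numbers from date2num, and a DateFormatter (the '%H:%M' pattern is my choice) renders them as labels. The timestamps are the same as above, written as ISO strings.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2019-01-01 07:00:00', '2019-01-01 10:30:00',
                            '2019-01-01 12:40:00', '2019-01-01 16:25:00',
                            '2019-01-01 18:30:00', '2019-01-01 22:40:00',
                            '2019-01-02 00:40:00']),
    'Value': [1, 2, 3, 4, 5, 4, 10],
})

fig, ax = plt.subplots()
ax.plot(df['Time'], df['Value'])
# One tick per sample, at each timestamp's matplotlib date number
ax.set_xticks(mdates.date2num(df['Time'].to_numpy()))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
fig.autofmt_xdate()  # rotate the labels so they don't overlap
plt.show()
```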
One year of sample data:
import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A":rnd.randn(n), "B":rnd.randn(n)+1},
index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
I want to boxplot these data side-by-side grouped by the month (i.e., two boxes per month, one for A and one for B).
For a single column sns.boxplot(df.index.month, df["A"]) works fine. However, sns.boxplot(df.index.month, df[["A", "B"]]) throws an error (ValueError: cannot copy sequence with size 2 to array axis with dimension 365). Melting the data by the index (pd.melt(df, id_vars=df.index, value_vars=["A", "B"], var_name="column")) in order to use seaborn's hue property as a workaround doesn't work either (TypeError: unhashable type: 'DatetimeIndex').
(A solution doesn't necessarily need to use seaborn, if it is easier using plain matplotlib.)
Edit
I found a workaround that basically produces what I want. However, it becomes somewhat awkward to work with once the DataFrame includes more variables than I want to plot. So if there is a more elegant/direct way to do it, please share!
df_stacked = df.stack().reset_index()
df_stacked.columns = ["date", "vars", "vals"]
df_stacked.index = df_stacked["date"]
sns.boxplot(x=df_stacked.index.month, y="vals", hue="vars", data=df_stacked)
Produces:
Here's a solution using pandas melting and seaborn:
import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A": rnd.randn(n),
"B": rnd.randn(n)+1,
"C": rnd.randn(n) + 10, # will not be plotted
},
index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
df['month'] = df.index.month
df_plot = df.melt(id_vars='month', value_vars=["A", "B"])
sns.boxplot(x='month', y='value', hue='variable', data=df_plot)
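To see what seaborn receives, this sketch rebuilds the same df_plot and inspects it. melt turns the wide frame into long format, with one (month, variable, value) row per original cell of A and B, which is why hue='variable' yields two boxes per month; C is ignored because it isn't listed in value_vars.

```python
import numpy.random as rnd
import pandas as pd

n = 365
df = pd.DataFrame({"A": rnd.randn(n), "B": rnd.randn(n) + 1,
                   "C": rnd.randn(n) + 10},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
df['month'] = df.index.month

df_plot = df.melt(id_vars='month', value_vars=["A", "B"])
print(df_plot.head())
# Rows per (month, variable): one per day in that month
print(df_plot.groupby(['month', 'variable']).size().head(4))
```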
Alternatively, you can draw one subplot of box plots per month:
import matplotlib.pyplot as plt
month_dfs = []
for _, month_df in df.groupby(df.index.month):
    month_dfs.append(month_df)
plt.figure(figsize=(30,5))
for i, month_df in enumerate(month_dfs):
    axi = plt.subplot(1, len(month_dfs), i + 1)
    month_df.plot(kind='box', subplots=False, ax=axi)
    plt.title(i + 1)
    plt.ylim([-4, 4])
plt.show()
This gives:
Not exactly what you're looking for, but you keep a readable DataFrame if you add more variables.
You can also hide the y-axis on every subplot except the first by adding the following inside the loop (after plt.ylim([-4, 4]), before plt.show()):
    if i > 0:
        y_axis = axi.get_yaxis()
        y_axis.set_visible(False)
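Instead of hiding axes one by one, a sketch using plt.subplots(..., sharey=True): the panels share one y-range and matplotlib keeps y tick labels only on the leftmost panel, which replaces the fixed plt.ylim([-4, 4]) with a common scale. I haven't confirmed that pandas' box plot leaves those shared settings untouched on every version, so treat this as something to verify on yours.

```python
import matplotlib.pyplot as plt
import numpy.random as rnd
import pandas as pd

n = 365
df = pd.DataFrame({"A": rnd.randn(n), "B": rnd.randn(n) + 1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

groups = list(df.groupby(df.index.month))
# sharey=True: one common y-range; y tick labels only on the first panel
fig, axes = plt.subplots(1, len(groups), sharey=True, figsize=(30, 5))
for ax, (month, month_df) in zip(axes, groups):
    month_df.plot(kind='box', ax=ax)
    ax.set_title(month)
plt.show()
```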
This is quite straightforward using Altair:
import altair as alt
alt.Chart(
df.reset_index().melt(id_vars = ["index"], value_vars=["A", "B"]).assign(month = lambda x: x["index"].dt.month)
).mark_boxplot(
extent='min-max'
).encode(
alt.X('variable:N', title=''),
alt.Y('value:Q'),
column='month:N',
color='variable:N'
)
The code above melts the DataFrame and adds a month column. Altair then creates box plots for each variable, with one plot column per month.