How do I "reset the index" for a matplotlib plot? - python

I have the following code:
fig, ax = plt.subplots(1, 1)
calls["2016-12-24"].resample("1h").sum().plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().plot(ax=ax)
which generates the following image:
How can I make this so the lines share the x-axis? In other words, how do I make them not switch days?

If you don't need the actual datetimes as the index, you can reset the index of each series, as you suggested. All three series are then plotted over the same x positions and overlap, if that is what you're trying to achieve.
# drop the datetime index so each day is plotted against positions 0, 1, 2, ...
calls["2016-12-24"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
Otherwise, you could try resampling the three columns at the same time. Try the snippet below, but without knowing what your original dataframe looks like, I'm not sure it will fit your case. You should post more information about the input dataframe.
# try the below (assumes the three dates are column labels of calls)
calls[["2016-12-24","2016-12-25","2016-12-26"]].resample('1h').sum().plot()
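Another way to put the days on a shared hour-of-day axis is sketched below. It assumes calls is a Series of counts with a DatetimeIndex; the sample data and the name by_hour are made up for illustration, since the original dataframe was not posted.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for `calls`: one value every 15 minutes over three days.
idx = pd.date_range("2016-12-24", periods=3 * 24 * 4, freq="15min")
calls = pd.Series(np.arange(len(idx)) % 7, index=idx)

hourly = calls.resample("1h").sum()

# Rows: hour of day (0..23); columns: calendar date.
# Each column is drawn as one line, so all days share the same x-axis.
by_hour = hourly.groupby([hourly.index.hour, hourly.index.date]).sum().unstack()

fig, ax = plt.subplots(1, 1)
by_hour.plot(ax=ax)
ax.set_xlabel("hour of day")
```

Unlike resetting the index, this keeps the date of each line available as a column label, so the legend still says which day is which.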

Related

same data produces different pandas plot

I created a graph using DOGE crypto data:
import pandas as pd
import matplotlib.pyplot as plt  # needed for plt.plot / plt.show
df2 = pd.read_csv("https://raw.githubusercontent.com/peoplecure/pandoras-box/master/doge.csv")
plt.plot(df2['begins_at'], df2['open_price'])
plt.show()
The graph above looks fine. But when I try to create a graph from the exact same data using another method, the graph looks totally off:
from pandas import DataFrame
df = DataFrame (DOGE_data)
plt.plot(df['begins_at'], df['open_price'])
plt.show()
Regrettably, I don't have a way to share the data used in the second method. However, the data used in the first graph was created from df. Does anyone have any idea what may be going on here?
The messed-up y-axis could be the hint: with numerical data, there are usually 4-12 y-axis ticks and labels, whereas with non-numerical data there is one tick for each "category".
Check the data type of the y-data in the second dataset: df['open_price'].dtype
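As a minimal sketch of that check, assuming the prices in the second dataset arrived as strings (the three rows below are invented, since the real data was not shared), the column can be inspected and converted before plotting:

```python
import pandas as pd

# Hypothetical rows standing in for DOGE_data, with prices stored as text.
df = pd.DataFrame({
    "begins_at": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "open_price": ["0.0047", "0.0105", "0.0098"],
})

# A non-numeric dtype here means matplotlib treats each value as a category.
print(df["open_price"].dtype)
was_numeric = pd.api.types.is_numeric_dtype(df["open_price"])

# Convert text to numbers; values that cannot be parsed become NaN.
df["open_price"] = pd.to_numeric(df["open_price"], errors="coerce")
# Parsing the timestamps gives a proper date axis as well.
df["begins_at"] = pd.to_datetime(df["begins_at"])
```

Reading from CSV usually hides this problem because pd.read_csv infers numeric types, which would explain why the first graph looks fine.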

How can I loop through a list of elements and create time series plots in Python

Here is a sample of the data I'm working with: WellAnalyticalData. I'd like to loop through each well name and create a time series chart for each parameter, with sample date on the x-axis and the value on the y-axis. I don't think I want subplots; I'm just looking for individual plots of each analyte for each well. I've used pandas to try grouping by well name and then attempting to plot, but that doesn't seem to be the way to go. I'm fairly new to Python and I think I'm also having trouble figuring out how to construct the loop statement. I'm running Python 3.x and am using the matplotlib library to generate the plots.
If I understand your question correctly, you want one plot for each combination of Well and Parameter. No subplots, just a new plot for each combination. Each plot should have SampleDate on the x-axis and Value on the y-axis. I've written a loop here that does just that, although you'll see that since your data has just one date per well per parameter, each plot is just a single dot.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({'WellName':['A','A','A','A','B','B','C','C','C'],
                   'SampleDate':['2018-02-15','2018-03-31','2018-06-07','2018-11-14','2018-02-15','2018-11-14','2018-02-15','2018-03-31','2018-11-14'],
                   'Parameter':['Arsenic','Lead','Iron','Magnesium','Arsenic','Iron','Arsenic','Lead','Magnesium'],
                   'Value':[0.2,1.6,0.05,3,0.3,0.79,0.3,2.7,2.8]
                   })
for well in df.WellName.unique():
    temp1 = df[df.WellName==well]
    for param in temp1.Parameter.unique():
        fig = plt.figure()
        temp2 = temp1[temp1.Parameter==param]
        plt.scatter(temp2.SampleDate,temp2.Value)
        plt.title('Well {} and Parameter {}'.format(well,param))
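The same loop can also be written with a single groupby over both columns. This is only a sketch of an alternative, reusing the sample frame above; converting SampleDate with pd.to_datetime and collecting the titles in a list are additions made here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'WellName': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'SampleDate': ['2018-02-15', '2018-03-31', '2018-06-07', '2018-11-14',
                   '2018-02-15', '2018-11-14', '2018-02-15', '2018-03-31',
                   '2018-11-14'],
    'Parameter': ['Arsenic', 'Lead', 'Iron', 'Magnesium', 'Arsenic', 'Iron',
                  'Arsenic', 'Lead', 'Magnesium'],
    'Value': [0.2, 1.6, 0.05, 3, 0.3, 0.79, 0.3, 2.7, 2.8],
})
# Real datetimes give a proper time axis instead of string categories.
df['SampleDate'] = pd.to_datetime(df['SampleDate'])

titles = []
# One group, and therefore one figure, per (well, parameter) combination.
for (well, param), group in df.groupby(['WellName', 'Parameter']):
    fig, ax = plt.subplots()
    ax.plot(group['SampleDate'], group['Value'], marker='o')
    title = 'Well {} and Parameter {}'.format(well, param)
    ax.set_title(title)
    ax.set_xlabel('SampleDate')
    ax.set_ylabel('Value')
    titles.append(title)
```

With more than one sample per combination, marker='o' plus the connecting line makes the time series visible, while single samples still show as a dot.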

Indexing my dataframe properly with pandas

I'm trying to plot a bar graph with error bars from my test results. I found some code on the internet that does this, but it does not produce the table layout I want.
I've tried leaving things out, but I don't understand the dataframe well enough to know what code I need to process the data correctly.
order=pd.MultiIndex.from_arrays([['402515','402515','402515','402510','402510','402510'],
['z','z','z','z','z','z']],names=['letter','word'])
datas=pd.DataFrame({'first cracking strength':[em1,em2,em3,em4,em5,em6],'flexural strength':[en1,en2,en3,en4,en5,en6]},index=order)
gp4 = datas.groupby(level=('letter', 'word'))
means = gp4.mean()
errors = gp4.std()
print(means)
fig, ax = plt.subplots()
means.plot.bar(yerr=errors, ax=ax, capsize=4);
The multi-index code requires two labels on the dataset (the 'z' and the '402515'/'402510'), but I only want one: the '402515'/'402510'. What code would do that?
How it looks when I run the code.
How I want it to look.
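One possible approach, sketched here with invented numbers in place of em1..em6 and en1..en6, is to use a plain single-level index and group on it. The level name 'mix' is a hypothetical label chosen for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Single-level index: only the '402515' / '402510' labels, no second level.
order = pd.Index(['402515', '402515', '402515', '402510', '402510', '402510'],
                 name='mix')

# Made-up measurements standing in for em1..em6 and en1..en6.
datas = pd.DataFrame({
    'first cracking strength': [3.1, 3.4, 2.9, 4.0, 4.2, 3.8],
    'flexural strength': [5.0, 5.3, 4.8, 6.1, 6.4, 5.9],
}, index=order)

gp = datas.groupby(level='mix')
means = gp.mean()
errors = gp.std()

fig, ax = plt.subplots()
means.plot.bar(yerr=errors, ax=ax, capsize=4)
```

The x-axis then carries one label per group instead of a (label, 'z') pair.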

Pandas plotting: How to format datetimeindex?

I am doing a barplot out of a dataframe with a 15min datetimeindex over a couple of years.
Using this code:
# resample(..., how='sum') no longer works in current pandas; call .sum() instead
df_Vol.resample('A').sum().plot.bar(
    title='Sums per year',
    style='ggplot',
    alpha=0.8
)
Unfortunately, the ticks on the x-axis are now shown with the full timestamp, like this: 2009-12-31 00:00:00.
I would prefer to keep the plotting code short, but I couldn't find an easy way to format the timestamp as just the year (2009...2016) for the plot.
Can someone help with this?
As it does not seem to be possible to format the date within the pandas df.plot(), I decided to create a new dataframe and plot from it.
The solution below worked for me:
# 'A' is the year-end alias; newer pandas releases spell it 'YE'
df_Vol_new = df_Vol.resample('A').sum()
# replace the timestamps with year strings such as '2009'
df_Vol_new.index = df_Vol_new.index.strftime('%Y')
ax2 = df_Vol_new.plot.bar(title='Sums per year', stacked=True, style='ggplot', alpha=0.8)
An alternative (better, at least to me) is to add the following after the df_Vol_new.plot() command:
plt.legend(df_Vol_new.index.to_period('A'))
This way you preserve the datetime format of df_Vol_new.index while getting better plots at the same time.
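If the goal is to keep the datetime index and change only what the x-axis shows, the tick labels of the bar plot can be overwritten after plotting. The sketch below assumes the resampled frame has one row per year-end; the frame and its values are invented stand-ins for df_Vol_new.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the yearly sums, indexed by year-end timestamps.
df_Vol_new = pd.DataFrame(
    {"volume": [120.0, 135.5, 128.25]},
    index=pd.to_datetime(["2009-12-31", "2010-12-31", "2011-12-31"]),
)

ax = df_Vol_new.plot.bar(title="Sums per year", alpha=0.8)

# Bar plots place one tick per row, so one label per row lines up with the bars.
ax.set_xticklabels(list(df_Vol_new.index.strftime("%Y")))
```

The index itself is left untouched, so later date-based operations on df_Vol_new still work.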

How to output a large number of histograms in a pandas groupby

df is a dataframe with a days column. There are 100 days. I want to look at a histogram for my data column for each of the 100 days. The problem is that this code outputs everything on a single chart and all histograms are stacked together. Two questions:
Any advice to get one histogram for each day?
Any advice to save each histogram to an appropriately named file?
Note: When I replace hist in my code below with describe, it perfectly gives me 100 describe series. Also, the type of the grouper.get_group(days) object is pandas.series.
My simple code:
grouper = df.groupby('days')['data']
for days in grouper.groups.keys():
    print(grouper.get_group(days).hist())
One option would be to use inline plotting, either in the IPython qtconsole or the IPython notebook:
%matplotlib inline
import matplotlib.pyplot as plt
for days in grouper.groups.keys():
    grouper.get_group(days).hist()
    plt.show()  # finish the current figure so the next group starts a new one
Actually, if you are using the IPython notebook, you can simply do:
df.groupby('days')['data'].hist()
Any function added to the end of the groupby is applied to each group in turn; this is the strength of the groupby function.
There is no need to iterate.
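For the second part of the question, saving each histogram to its own file, a minimal sketch is to create one figure per group and call savefig on it. The sample data, the temporary output directory, and the hist_day_N.png naming scheme are assumptions made for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; files are written without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: three days with 50 values each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "days": np.repeat([1, 2, 3], 50),
    "data": rng.normal(size=150),
})

out_dir = tempfile.mkdtemp()
saved = []
# Iterating over a grouped Series yields (key, sub-series) pairs.
for day, series in df.groupby("days")["data"]:
    fig, ax = plt.subplots()  # a fresh figure, so histograms do not stack
    series.hist(ax=ax)
    ax.set_title("day {}".format(day))
    path = os.path.join(out_dir, "hist_day_{}.png".format(day))
    fig.savefig(path)
    plt.close(fig)  # free the figure; matters with 100 days
    saved.append(path)
```

Passing ax explicitly is what keeps each histogram on its own chart, and closing the figure after saving avoids accumulating open figures in memory.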
