recognizing date variable in python - python

I am trying to plot two time series whose rows are not in sequential date order, so the plotted lines jump back and forth and look wrong. I don't know how to fix it. Here are my code and figure. Thank you.
import datetime as dt
import matplotlib.pyplot as plt

date=read_myfile['Date']
# parse the 'MM/DD/YYYY' strings into date objects
x=[dt.datetime.strptime(d,'%m/%d/%Y').date() for d in date]
y=read_myfile['Observed']
y1=read_myfile['Simulated']
plt.plot(x,y,color='blue')
plt.plot(x,y1,color='red')
plt.gcf().autofmt_xdate()
[Figure: two time series of data.]
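One possible fix, sketched under assumptions: matplotlib connects points in the order they are given, so sorting the rows by the parsed date before plotting should remove the back-and-forth lines. The small DataFrame below is a hypothetical stand-in for read_myfile, and the use of pandas for parsing and sorting is an assumption, not something the question confirms.

```python
import pandas as pd

# Hypothetical stand-in for read_myfile: rows deliberately out of date order.
read_myfile = pd.DataFrame({
    "Date": ["03/01/2020", "01/01/2020", "02/01/2020"],
    "Observed": [3.0, 1.0, 2.0],
    "Simulated": [3.5, 1.5, 2.5],
})

# Parse the strings into real timestamps, then sort the rows chronologically.
read_myfile["Date"] = pd.to_datetime(read_myfile["Date"], format="%m/%d/%Y")
ordered = read_myfile.sort_values("Date").reset_index(drop=True)

# The sorted frame can then be plotted as in the question:
# import matplotlib.pyplot as plt
# plt.plot(ordered["Date"], ordered["Observed"], color="blue")
# plt.plot(ordered["Date"], ordered["Simulated"], color="red")
# plt.gcf().autofmt_xdate()
```

Sorting the whole frame, rather than only the date list, keeps each Observed and Simulated value paired with its own date.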

Related

same data produces different pandas plot

I created a graph using DOGE crypto data:
import pandas as pd
import matplotlib.pyplot as plt
df2 = pd.read_csv("https://raw.githubusercontent.com/peoplecure/pandoras-box/master/doge.csv")
plt.plot(df2['begins_at'], df2['open_price'])
plt.show()
The graph above looks fine. But when I build the DataFrame another way from exactly the same data, the graph looks totally off:
from pandas import DataFrame
df = DataFrame (DOGE_data)
plt.plot(df['begins_at'], df['open_price'])
plt.show()
Regrettably, I don't have a way to share the data used in the second method. However, the data used in the first graph was created from df. Does anyone have an idea what may be going on here?
The messed-up y-axis could be the hint. With numerical data there are usually about 4-12 tick labels on the y-axis. With non-numerical data (strings, for example) matplotlib treats every distinct value as a category and draws one tick per value.
Check the data type of the y-data in the second dataset: df['open_price'].dtype
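If the dtype check shows that the prices are strings, converting the column should restore a normal numeric axis. This is a minimal sketch with hypothetical values; only the column names begins_at and open_price come from the question.

```python
import pandas as pd

# Hypothetical data: prices stored as strings, as can happen when a
# DataFrame is built from raw API or JSON output.
df = pd.DataFrame({
    "begins_at": ["2021-05-01", "2021-05-02", "2021-05-03"],
    "open_price": ["0.35", "0.41", "0.38"],
})

# Inspect the dtype first; a string column is plotted as categories.
print(df["open_price"].dtype)

# Convert to real numbers and real timestamps before plotting.
df["open_price"] = pd.to_numeric(df["open_price"])
df["begins_at"] = pd.to_datetime(df["begins_at"])
```

pd.read_csv infers numeric types on its own, which would explain why the first method plots correctly while a DataFrame built directly from raw values does not.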

How do I "reset the index" for a matplotlib plot?

I have the following code:
fig, ax = plt.subplots(1, 1)
calls["2016-12-24"].resample("1h").sum().plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().plot(ax=ax)
which generates the following image:
How can I make this so the lines share the x-axis? In other words, how do I make them not switch days?
If you don't care about using the correct datetime as index, you could just reset the index as you suggested for all the series. This is going to overlap all the time series, if this is what you're trying to achieve.
# drop the datetime index so every day is plotted against positions 0..23
calls["2016-12-24"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-25"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
calls["2016-12-26"].resample("1h").sum().reset_index(drop=True).plot(ax=ax)
Otherwise, you could try resampling the three columns at the same time. Have a go with the code below, but without knowing what your original dataframe looks like I'm not sure it will fit your case. You should post some more information about the input dataframe.
# this assumes the three dates are column labels of calls
calls[["2016-12-24","2016-12-25","2016-12-26"]].resample('1h').sum().plot()
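Another way to overlay the days, sketched under assumptions: reshape the hourly sums so that the hour of the day becomes the index and each calendar date becomes a column. The calls series below is a hypothetical stand-in (one count every 15 minutes), since the question does not show the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `calls`: one count per 15 minutes over three days.
idx = pd.date_range("2016-12-24", "2016-12-26 23:45", freq="15min")
calls = pd.Series(np.ones(len(idx)), index=idx)

hourly = calls.resample("1h").sum()

# Rows are hours of the day (0..23), columns are calendar dates.
by_day = pd.DataFrame({
    "hour": hourly.index.hour,
    "day": hourly.index.date,
    "calls": hourly.to_numpy(),
}).pivot(index="hour", columns="day", values="calls")

# by_day.plot() would then draw one line per day on a shared 0..23 axis.
```

Compared with resetting the index, this keeps the hour information on the x-axis and labels each line with its date in the legend.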

Pandas plotting: How to format datetimeindex?

I am doing a barplot out of a dataframe with a 15min datetimeindex over a couple of years.
Using this code:
df_Vol.resample(
    'A', how='sum'
).plot.bar(
    title='Sums per year',
    style='ggplot',
    alpha=0.8
)
Unfortunately the ticks on the X-axis are now shown with the full timestamp like this: 2009-12-31 00:00:00.
I would prefer to keep the plotting code short, but I couldn't find an easy way to format the timestamps as just the year (2009...2016) for the plot.
Can someone help on this?
As it does not seem to be possible to format the dates within pandas' df.plot(), I decided to create a new dataframe and plot from it.
The solution below worked for me:
# resample(..., how='sum') was removed from pandas; call .sum() on the resampler instead
df_Vol_new = df_Vol.resample('A').sum()
# replace the timestamps with year strings such as '2009'
df_Vol_new.index = df_Vol_new.index.strftime('%Y')
ax2 = df_Vol_new.plot.bar(title='Sums per year', stacked=True, style='ggplot', alpha=0.8)
An alternative (better, at least to me) is to leave the index alone and add the following after the df_Vol_new.plot() command:
plt.legend(df_Vol_new.index.to_period('A'))
This way you preserve the datetime format of df_Vol_new.index while still getting a more readable plot.
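A further option, offered as a sketch rather than as either answerer's method: group by the year of the index instead of resampling, so the resulting index already holds plain integers and the bar labels need no formatting. The df_Vol frame and its volume column below are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_Vol: 15-minute data spanning two years.
idx = pd.date_range("2009-01-01", "2010-12-31 23:45", freq="15min")
df_Vol = pd.DataFrame({"volume": np.ones(len(idx))}, index=idx)

# Group by calendar year directly, so the index holds integers, not timestamps.
yearly = df_Vol.groupby(df_Vol.index.year).sum()

# yearly.plot.bar(title="Sums per year", alpha=0.8) would label the bars by year.
```

The trade-off is that the yearly index is no longer a DatetimeIndex, so this suits plotting better than further time-based processing.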

How to output a large number of histograms in a pandas groupby

df is a dataframe with a days column. There are 100 days. I want to look at a histogram for my data column for each of the 100 days. The problem is that this code outputs everything on a single chart and all histograms are stacked together. Two questions:
Any advice to get one histogram for each day?
Any advice to save each histogram to an appropriately named file?
Note: When I replace hist with describe in my code below, it correctly gives me 100 describe series. Also, the type of the grouper.get_group(days) object is pandas.Series.
My simple code:
grouper = df.groupby('days')['data']
for days in grouper.groups.keys():
    print(grouper.get_group(days).hist())
One option would be to use inline plotting, either in the IPython qtconsole or in the IPython notebook:
%matplotlib inline
import matplotlib.pyplot as plt
for days in grouper.groups.keys():
    grouper.get_group(days).hist()
    plt.show()  # finish the current figure so the next group gets its own
Actually, if you are using the IPython notebook, you can simply do:
df.groupby('days')['data'].hist()
A method called on the end of a groupby is applied to every group in turn, which is the strength of groupby.
There is no need to iterate.
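For the second question (saving each histogram to its own file), a loop is still useful. The sketch below makes several assumptions: the sample data, the temporary output directory, and the hist_day_<day>.png naming scheme are all hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: three "days" with ten values each.
df = pd.DataFrame({
    "days": np.repeat([1, 2, 3], 10),
    "data": np.arange(30, dtype=float),
})

out_dir = tempfile.mkdtemp()  # assumption: any writable directory works
saved = []
for day, values in df.groupby("days")["data"]:
    fig, ax = plt.subplots()      # a fresh figure per day avoids stacking
    values.hist(ax=ax)
    ax.set_title(f"Day {day}")
    path = os.path.join(out_dir, f"hist_day_{day}.png")
    fig.savefig(path)
    plt.close(fig)                # free the figure before the next iteration
    saved.append(path)
```

Creating and closing one figure per group is what keeps the histograms separate, and with 100 days it also stops open figures from piling up in memory.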

ggplot Bar Plot semantics

I am trying to use ggplot in Python for the first time and the semantics are completely unobvious to me.
I have a pandas dataframe with two columns: date and entries_sum. What I would like to do is plot a bar plot with the date column as each entry on the x-axis and entries_sum as the respective heights.
I cannot figure out how to do this with the ggplot API. Am I formatting my data wrong for this?
How about:
ggplot(aes(x='date', y='entries_sum'), data=data) + geom_bar(stat='identity')
