Bar Plot with recent dates left where date is datetime index - python

I sorted the dataframe by its datetime index and then plotted it, but the plot did not change: the latest dates (2017, 2018) were still on the right and earlier ones (2008, 2009) on the left.
I want the latest year on the left and the oldest on the right.
This is the original dataframe:
Title
Date
2001-01-01 0
2002-01-01 9
2003-01-01 11
2004-01-01 17
2005-01-01 23
2006-01-01 25
2007-01-01 51
2008-01-01 55
2009-01-01 120
2010-01-01 101
2011-01-01 95
2012-01-01 118
2013-01-01 75
2014-01-01 75
2015-01-01 3
2016-01-01 35
2017-01-01 75
2018-01-01 55
Ignore the values.
I then sorted the dataframe by index in descending order and plotted it again, but the plot still did not change:
df.sort_index(ascending=False, inplace=True)

Use invert_xaxis():
df.plot.bar().invert_xaxis()
If you don't want to use Pandas plotting:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# plot the years rather than raw timestamps so the default bar width (0.8) is visible
ax.bar(df.index.year, df['Title'])
ax.invert_xaxis()

I guess you have not changed your index to the year, which is why it is not working. You can do so with:
# Date is the index here; if it were a column, use pd.to_datetime(df['Date']).dt.year
df.index = pd.to_datetime(df.index).year
# then sort the index in descending order
df.sort_index(ascending=False, inplace=True)
df.plot.bar()
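Both suggestions above can be tried side by side in a small self-contained sketch. The four-row frame is a shortened copy of the question's data, and the `Agg` backend is an assumption made only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import pandas as pd

# Shortened copy of the question's data, with a DatetimeIndex named "Date"
df = pd.DataFrame(
    {"Title": [0, 9, 11, 17]},
    index=pd.DatetimeIndex(
        ["2001-01-01", "2002-01-01", "2003-01-01", "2004-01-01"], name="Date"
    ),
)

# Option 1: sort newest-first and label bars by year.
# pandas bar plots draw the rows in row order, so the sort decides the layout.
newest_first = df.sort_index(ascending=False)
newest_first.index = newest_first.index.year
ax1 = newest_first.plot.bar(legend=False)
labels = [tick.get_text() for tick in ax1.get_xticklabels()]

# Option 2: keep the original order and flip the x axis instead.
ax2 = df.plot.bar(legend=False)
ax2.invert_xaxis()
```

Option 1 changes the data order, while option 2 leaves the data alone and only changes how the axis is drawn.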

Related

Group datetime column with hourly granularity by days

How can I group the following data frame (with an hourly granularity in the date column)
import pandas as pd
import numpy as np
np.random.seed(42)
date_rng = pd.date_range(start='1/1/2018', end='1/03/2018', freq='H')
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0,100,size=(len(date_rng)))
print(df.head())
date data
0 2018-01-01 00:00:00 51
1 2018-01-01 01:00:00 92
2 2018-01-01 02:00:00 14
3 2018-01-01 03:00:00 71
4 2018-01-01 04:00:00 60
by day, to calculate min and max values per day?
Use DataFrame.resample:
print(df.resample('d', on='date')['data'].agg(['min','max']))
min max
date
2018-01-01 1 99
2018-01-02 2 91
2018-01-03 72 72
You can also specify the column names:
df1 = df.resample('d', on='date')['data'].agg([('min_data', 'min'),('max_data','max')])
print (df1)
min_data max_data
date
2018-01-01 1 99
2018-01-02 2 91
2018-01-03 72 72
Another solution with Grouper:
df1 = (df.groupby(pd.Grouper(freq='d', key='date'))['data']
.agg([('min_data', 'min'),('max_data','max')]))
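As a rough sketch, the same daily min/max can also be written with keyword-style named aggregation. The sample data here is hypothetical (the value equals the row number rather than a random integer), so the result is easy to verify by eye.

```python
import pandas as pd

# Hypothetical sample: 49 hourly timestamps, data equal to the row number
date_rng = pd.date_range(start="2018-01-01", periods=49, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"date": date_rng, "data": range(len(date_rng))})

# Group the hourly rows into calendar days and name the output columns directly
daily = (
    df.groupby(pd.Grouper(key="date", freq="D"))["data"]
      .agg(min_data="min", max_data="max")
)
print(daily)
```

The third day contains a single row (midnight of 2018-01-03), so its minimum and maximum are the same value.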

Resample with categories in pandas, keep non-numerical columns

I have hourly data for a variable X across three categories (the Category column), with ds set as the index.
> df
ds Category X
2010-01-01 01:00:00 A 32
2010-01-01 01:00:00 B 13
2010-01-01 01:00:00 C 09
2010-01-01 02:00:00 A 12
2010-01-01 02:00:00 B 62
2010-01-01 02:00:00 C 12
I want to resample it to weekly frequency. But if I use df2 = df.resample('W').mean(), it simply drops the 'Category' column.
If you need to resample per Category per week, add groupby so that DataFrameGroupBy.resample is used.
Note: this only works correctly with a DatetimeIndex.
df2 = df.groupby('Category').resample('W').mean()
print (df2)
X
Category ds
A 2010-01-03 22.0
B 2010-01-03 37.5
C 2010-01-03 10.5
To complete the answer by jezrael: I found it useful to turn the result back into a DataFrame with a plain timestamp index instead of the (Category, ds) MultiIndex that groupby produces. So the answer becomes:
df2 = df.groupby('Category').resample('W').mean()
# the inverse of groupby, reset_index
df2 = df2.reset_index()
# set again the timestamp as index
df2 = df2.set_index("ds")
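The whole sequence can be sketched end to end with the six rows from the question. Selecting the `X` column before resampling is a choice made here so that only the numeric column is averaged; it is not part of the original answers.

```python
import pandas as pd

# The six sample rows from the question, with ds as a DatetimeIndex
df = pd.DataFrame(
    {"Category": ["A", "B", "C", "A", "B", "C"], "X": [32, 13, 9, 12, 62, 12]},
    index=pd.DatetimeIndex(
        ["2010-01-01 01:00:00"] * 3 + ["2010-01-01 02:00:00"] * 3, name="ds"
    ),
)

# Weekly mean per category; the result is indexed by (Category, ds)
weekly = df.groupby("Category")["X"].resample("W").mean()

# Flatten the MultiIndex and make the timestamp the index again
flat = weekly.reset_index().set_index("ds")
print(flat)
```

After the flattening step, `Category` is an ordinary column again and each row is labelled with the week-ending date.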

Split rows in a column and plot graph for a dataframe. Python

My data set contains hourly data for several days:
time slot hr_slot location_point
2019-01-21 00:00:00 0 34
2019-01-21 01:00:00 1 564
2019-01-21 02:00:00 2 448
2019-01-21 03:00:00 3 46
.
.
.
.
2019-01-21 23:00:00 23 78
2019-01-22 00:00:00 0 34
2019-01-22 01:00:00 1 165
2019-01-22 02:00:00 2 65
2019-01-22 03:00:00 3 156
.
.
.
.
2019-01-22 23:00:00 23 78
The data set contains 7 days, that is 7*24 rows. How can I plot a graph for the data set above, with:
hr_slot on the X axis (hours 0-23)
location_point on the Y axis
a different color for each day (Day1: color1, Day2: color2, ...)
Consider pivoting your data first:
# Create normalized date column
df['date'] = df['time slot'].dt.date.astype(str)
# Pivot
piv = df.pivot(index='hr_slot', columns='date', values='location_point')
piv.plot()
Update
To filter which dates are plotted, use loc or iloc:
# Exclude first and last day
piv.iloc[:, 1:-1].plot()
# Include specific dates only
piv.loc[:, ['2019-01-21', '2019-01-22']].plot()
Alternate approach using pandas.crosstab instead:
(pd.crosstab(df['hr_slot'],
df['time slot'].dt.date,
values=df['location_point'],
aggfunc='sum')
.plot())
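To see what the pivot produces before plotting, here is a minimal sketch on two days of made-up data. The `location_point` values are hypothetical (simply the row number); the column names follow the question.

```python
import pandas as pd

# Two days of hourly rows; location_point values are hypothetical
times = pd.date_range("2019-01-21", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(
    {"time slot": times, "hr_slot": times.hour, "location_point": range(48)}
)

# One row per hour of the day, one column per calendar date
df["date"] = df["time slot"].dt.date.astype(str)
piv = df.pivot(index="hr_slot", columns="date", values="location_point")
print(piv.head())

# piv.plot() would then draw one line per date column (requires matplotlib)
```

Because each date becomes its own column, the plotting call assigns a separate color to each day without further configuration.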

best way to fill up gaps by yearly dates in Python dataframe

I'm new to Python and am stuck on the problem below. I have a DF as:
ipdb> DF
asofdate port_id
1 2010-01-01 76
2 2010-04-01 43
3 2011-02-01 76
4 2013-01-02 93
5 2017-02-01 43
For the yearly gaps, say 2012, 2014, 2015, and 2016, I'd like to fill in the gap using the new year date for each of the missing years, and port_id from previous year. Ideally, I'd like:
ipdb> DF
asofdate port_id
1 2010-01-01 76
2 2010-04-01 43
3 2011-02-01 76
4 2012-01-01 76
5 2013-01-02 93
6 2014-01-01 93
7 2015-01-01 93
8 2016-01-01 93
9 2017-02-01 43
I tried multiple approaches, but to no avail. Could someone shed some light on how to make this work? Thanks in advance!
You can use a set difference with range to find the missing years and then add a dataframe of fill dates:
# convert to datetime if not already converted
df['asofdate'] = pd.to_datetime(df['asofdate'])
# calculate missing years
years = df['asofdate'].dt.year
missing = set(range(years.min(), years.max())) - set(years)
# concatenate the missing dates, sort and forward-fill
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
df = pd.concat([df, pd.DataFrame({'asofdate': pd.to_datetime(list(missing), format='%Y')})])\
.sort_values('asofdate')\
.ffill()
print(df)
asofdate port_id
1 2010-01-01 76.0
2 2010-04-01 43.0
3 2011-02-01 76.0
1 2012-01-01 76.0
4 2013-01-02 93.0
2 2014-01-01 93.0
3 2015-01-01 93.0
0 2016-01-01 93.0
5 2017-02-01 43.0
I would create a helper dataframe, containing all the year start dates, then filter out the ones where the years match what is in df, and finally merge them together:
# First make sure it is proper datetime
df['asofdate'] = pd.to_datetime(df.asofdate)
# Create your temporary dataframe of year start dates
helper = pd.DataFrame({'asofdate':pd.date_range(df.asofdate.min(), df.asofdate.max(), freq='YS')})
# Filter out the rows where the year is already in df
helper = helper[~helper.asofdate.dt.year.isin(df.asofdate.dt.year)]
# Merge back in to df, sort, and forward fill
new_df = df.merge(helper, how='outer').sort_values('asofdate').ffill()
>>> new_df
asofdate port_id
0 2010-01-01 76.0
1 2010-04-01 43.0
2 2011-02-01 76.0
5 2012-01-01 76.0
3 2013-01-02 93.0
6 2014-01-01 93.0
7 2015-01-01 93.0
8 2016-01-01 93.0
4 2017-02-01 43.0
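For reference, here is a self-contained sketch of the set-difference approach using the five rows from the question. It builds the fill dates from explicit `YYYY-01-01` strings and casts `port_id` back to integers at the end; both are choices made for this sketch rather than part of the answers above.

```python
import pandas as pd

# The five rows from the question
df = pd.DataFrame({
    "asofdate": pd.to_datetime(
        ["2010-01-01", "2010-04-01", "2011-02-01", "2013-01-02", "2017-02-01"]
    ),
    "port_id": [76, 43, 76, 93, 43],
})

# Years between the first and last date that have no row at all
years = df["asofdate"].dt.year
missing = sorted(set(range(years.min(), years.max() + 1)) - set(years))

# One January 1st row per missing year; port_id is left empty for now
filler = pd.DataFrame({"asofdate": pd.to_datetime([f"{y}-01-01" for y in missing])})

# Combine, order by date, and carry the previous port_id forward
filled = (
    pd.concat([df, filler], ignore_index=True)
      .sort_values("asofdate")
      .ffill()
      .reset_index(drop=True)
)
filled["port_id"] = filled["port_id"].astype(int)
print(filled)
```

The forward fill only works after sorting, because each inserted row takes the `port_id` of the row immediately before it.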

Matplotlib plots (0,0) even though there is no NaN or NULL value in dataframe

I use matplotlib to plot multiple lines, each in a different color. It works quite nicely, but somehow every plotted line connects to (0,0) at its last value.
X-axis: bin, which holds my time frames. Y-axis: count, which contains the values I'd like to plot.
My dataframe looks like this:
>df3.head()
start_time count date_day bin cw
0 2016-05-02 00:00:00 45 2016-05-02 00:00:00 18
1 2016-05-02 00:15:00 35 2016-05-02 00:15:00 18
2 2016-05-02 00:30:00 34 2016-05-02 00:30:00 18
3 2016-05-02 00:45:00 31 2016-05-02 00:45:00 18
4 2016-05-02 01:00:00 34 2016-05-02 01:00:00 18
>df3.tail()
start_time count date_day bin cw
17563 2016-10-31 22:45:00 114 2016-10-31 22:45:00 44
17564 2016-10-31 23:00:00 94 2016-10-31 23:00:00 44
17565 2016-10-31 23:15:00 121 2016-10-31 23:15:00 44
17566 2016-10-31 23:30:00 127 2016-10-31 23:30:00 44
17567 2016-10-31 23:45:00 135 2016-10-31 23:45:00 44
This is how I plot:
I separate the lines by calendar week cw
cw = np.arange(18,45,1)
for x in cw:
    # one line per calendar week
    df4 = df3[df3.cw == x]
    xa = df4['bin']
    ya = df4['count']
    plt.plot(xa, ya)
plt.show()
What I get is this:
Plot (labels and axis are not formatted yet..)
For df3.any().isnull() I don't get any NaN which should be ok, but it still plots (0,0).
df3.any().isnull()
Out[297]:
start_time False
count False
date_day False
bin False
cw False
dtype: bool
Any ideas how I can get rid of this connecting line to (0,0)?
Thanks a lot!
I have found the solution: I was a little too careless when slicing my data.
I first sliced my data set down to all Mondays (df.weekday == 0) and THEN grouped the Monday data into 15-minute bins with pandas.Grouper.
And here's the error: Grouper creates bins across the whole range between the earliest and latest values in the start_time column, and consequently brings back every day between my start date and end date. The bins on the non-Monday days had a count of zero, which explains my plot.
Somehow I failed to check for this issue. Thanks for your time though!
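The effect described above can be reproduced in a small sketch. The three timestamps are hypothetical and fall on two Mondays one week apart; counting rows per 15-minute bin with `pd.Grouper` then returns every bin in between as well.

```python
import pandas as pd

# Hypothetical events on two Mondays, one week apart
events = pd.DataFrame({
    "start_time": pd.to_datetime(
        ["2016-05-02 00:00:00", "2016-05-02 00:05:00", "2016-05-09 00:00:00"]
    )
})

# Count rows per 15-minute bin; empty bins in between get a count of 0
counts = events.groupby(pd.Grouper(key="start_time", freq="15min")).size()
print(len(counts), (counts == 0).sum())

# Keep only the bins that actually contain data before plotting
nonzero = counts[counts > 0]
print(nonzero)
```

Dropping the empty bins before plotting is one way to avoid the lines that fall to zero; filtering the binned result back down to Mondays would be another.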
