Pandas Drop rows for current Year-month - python

My dataframe has multiple years and months in "yyyy-mm-dd" format.
I would like to dynamically drop all current Year-month rows from the df

You could use a simple strftime comparison, keeping only the rows whose %Y%m rendering is not equal to the year-month of the current date:
df1 = df.loc[df['Date'].dt.strftime('%Y%m') != pd.Timestamp('today').strftime('%Y%m')]
Example (this was run in November 2021, so the November rows are dropped):
d = pd.date_range('01 oct 2021', '01 dec 2021',freq='d')
df = pd.DataFrame(d,columns=['Date'])
print(df)
Date
0 2021-10-01
1 2021-10-02
2 2021-10-03
3 2021-10-04
4 2021-10-05
.. ...
57 2021-11-27
58 2021-11-28
59 2021-11-29
60 2021-11-30
61 2021-12-01
print(df1)
Date
0 2021-10-01
1 2021-10-02
2 2021-10-03
3 2021-10-04
4 2021-10-05
5 2021-10-06
6 2021-10-07
7 2021-10-08
8 2021-10-09
9 2021-10-10
10 2021-10-11
11 2021-10-12
12 2021-10-13
13 2021-10-14
14 2021-10-15
15 2021-10-16
16 2021-10-17
17 2021-10-18
18 2021-10-19
19 2021-10-20
20 2021-10-21
21 2021-10-22
22 2021-10-23
23 2021-10-24
24 2021-10-25
25 2021-10-26
26 2021-10-27
27 2021-10-28
28 2021-10-29
29 2021-10-30
30 2021-10-31
61 2021-12-01
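A variant sketch of the same filter using monthly Period objects instead of formatted strings (equivalent logic, not from the original answer):

```python
import pandas as pd

d = pd.date_range('2021-10-01', '2021-12-01', freq='d')
df = pd.DataFrame(d, columns=['Date'])

# Keep only rows whose calendar month differs from the current one.
current_month = pd.Timestamp('today').to_period('M')
df1 = df.loc[df['Date'].dt.to_period('M') != current_month]
```

Comparing periods avoids formatting every date to a string and reads a little more explicitly as "not this month".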

Related

Remove duplicate dataframe column items just at the beginning while keeping the last entry

The dataframe below is what I'm trying to plot, but each column contains several duplicate entries. I want to remove the initial run of repeated values in each column, keeping only the last entry of that run, so the repeats don't appear in the graph (duplicates in the middle or at the end can stay).
Could someone please help me solve this?
Code I tried; this only removes rows whose values are duplicated across the entire row:
df = df.drop_duplicates(subset=df.columns[1:], keep='last')
df = df.groupby((df.shift() != df).cumsum()).filter(lambda x: len(x) < 5)
Input:
Date Build1 Build2 Build3 Build4 Build5 Build6
2022-11-26 00:00:00 30 30 30 30 30 30
2022-11-27 00:00:00 30 30 30 30 30 30
2022-11-28 00:00:00 30 30 30 30 30 30
2022-11-29 00:00:00 30 30 30 30 30 30
2022-11-30 00:00:00 30 30 30 30 30 30
2022-12-01 00:00:00 28 30 30 30 30 30
2022-12-02 00:00:00 25 30 30 30 30 30
2022-12-03 00:00:00 25 30 30 30 30 30
2022-12-04 00:00:00 22 28 30 30 30 30
2022-12-05 00:00:00 22 26 30 30 30 30
2022-12-06 00:00:00 22 23 30 30 30 30
2022-12-07 00:00:00 22 22 30 30 30 30
2022-12-08 00:00:00 22 20 30 30 30 30
2022-12-09 00:00:00 22 20 25 30 30 30
2022-12-10 00:00:00 22 20 23 30 30 30
2022-12-11 00:00:00 22 20 23 30 30 30
2022-12-12 00:00:00 22 20 18 30 30 30
2022-12-13 00:00:00 22 20 14 30 30 30
2022-12-14 00:00:00 22 20 11 30 30 30
2022-12-15 00:00:00 22 20 10 27 30 30
2022-12-16 00:00:00 22 20 10 20 30 30
2022-12-17 00:00:00 22 20 10 20 30 30
2022-12-18 00:00:00 22 20 10 20 30 30
2022-12-19 00:00:00 22 20 10 13 30 30
2022-12-20 00:00:00 22 20 10 2 30 30
2022-12-21 00:00:00 22 20 10 2 19 30
2022-12-22 00:00:00 22 20 10 2 11 30
2022-12-23 00:00:00 22 20 10 2 4 30
2022-12-24 00:00:00 22 20 10 2 0 30
2022-12-25 00:00:00 22 20 10 2 0 22
2022-12-26 00:00:00 22 20 10 2 0 15
2022-12-27 00:00:00 22 20 10 2 0 15
2022-12-28 00:00:00 22 20 10 2 0 9
Expected output:
Date Build1 Build2 Build3 Build4 Build5 Build6
2022-11-26 00:00:00
2022-11-27 00:00:00
2022-11-28 00:00:00
2022-11-29 00:00:00
2022-11-30 00:00:00 30
2022-12-01 00:00:00 28
2022-12-02 00:00:00 25
2022-12-03 00:00:00 25 30
2022-12-04 00:00:00 22 28
2022-12-05 00:00:00 22 26
2022-12-06 00:00:00 22 23
2022-12-07 00:00:00 22 22
2022-12-08 00:00:00 22 20 30
2022-12-09 00:00:00 22 20 25
2022-12-10 00:00:00 22 20 23
2022-12-11 00:00:00 22 20 23
2022-12-12 00:00:00 22 20 18
2022-12-13 00:00:00 22 20 14
2022-12-14 00:00:00 22 20 11 30
2022-12-15 00:00:00 22 20 10 27
2022-12-16 00:00:00 22 20 10 20
2022-12-17 00:00:00 22 20 10 20
2022-12-18 00:00:00 22 20 10 20
2022-12-19 00:00:00 22 20 10 13
2022-12-20 00:00:00 22 20 10 2 30
2022-12-21 00:00:00 22 20 10 2 19
2022-12-22 00:00:00 22 20 10 2 11
2022-12-23 00:00:00 22 20 10 2 4
2022-12-24 00:00:00 22 20 10 2 0 30
2022-12-25 00:00:00 22 20 10 2 0 22
2022-12-26 00:00:00 22 20 10 2 0 15
2022-12-27 00:00:00 22 20 10 2 0 15
2022-12-28 00:00:00 22 20 10 2 0 9
You can simply do
import numpy as np

is_duplicate = df.apply(pd.Series.duplicated, axis=1)
df.where(~is_duplicate, np.nan)
which gives
Date Build1 Build2 Build3 Build4
0 2022-11-26 00:00:00 30 30 NaN NaN NaN
1 2022-11-27 00:00:00 30 30 NaN NaN NaN
2 2022-11-28 00:00:00 30 30 NaN NaN NaN
3 2022-11-29 00:00:00 30 30 NaN NaN NaN
4 2022-11-30 00:00:00 30 30 NaN NaN NaN
5 2022-12-01 00:00:00 28 30 NaN NaN NaN
6 2022-12-02 00:00:00 25 30 NaN NaN NaN
7 2022-12-03 00:00:00 25 30 NaN NaN NaN
8 2022-12-04 00:00:00 22 30 NaN NaN NaN
9 2022-12-05 00:00:00 22 30 NaN NaN NaN
10 2022-12-06 00:00:00 22 30 NaN NaN NaN
11 2022-12-07 00:00:00 22 30 NaN NaN NaN
12 2022-12-08 00:00:00 22 30 NaN NaN NaN
13 2022-12-09 00:00:00 22 25 30.0 NaN NaN
14 2022-12-10 00:00:00 22 23 30.0 NaN NaN
15 2022-12-11 00:00:00 22 23 30.0 NaN NaN
16 2022-12-12 00:00:00 22 18 30.0 NaN NaN
17 2022-12-13 00:00:00 22 14 30.0 NaN NaN
18 2022-12-14 00:00:00 22 11 30.0 NaN NaN
19 2022-12-15 00:00:00 22 10 27.0 30.0 NaN
20 2022-12-16 00:00:00 22 10 20.0 30.0 NaN
21 2022-12-17 00:00:00 22 10 20.0 30.0 NaN
22 2022-12-18 00:00:00 22 10 20.0 30.0 NaN
23 2022-12-19 00:00:00 22 10 13.0 30.0 NaN
24 2022-12-20 00:00:00 22 10 2.0 30.0 NaN
25 2022-12-21 00:00:00 22 10 2.0 19.0 30.0
26 2022-12-22 00:00:00 22 10 2.0 11.0 30.0
27 2022-12-23 00:00:00 22 10 2.0 4.0 30.0
28 2022-12-24 00:00:00 22 10 2.0 0.0 30.0
29 2022-12-25 00:00:00 22 10 2.0 0.0 22.0
30 2022-12-26 00:00:00 22 10 2.0 0.0 15.0
31 2022-12-27 00:00:00 22 10 2.0 0.0 15.0
32 2022-12-28 00:00:00 22 10 2.0 0.0 9.0
or
is_duplicate = df.apply(pd.Series.duplicated, axis=1)
print(df.where(~is_duplicate, ''))
which gives:
Date Build1 Build2 Build3 Build4
0 2022-11-26 00:00:00 30 30
1 2022-11-27 00:00:00 30 30
2 2022-11-28 00:00:00 30 30
3 2022-11-29 00:00:00 30 30
4 2022-11-30 00:00:00 30 30
5 2022-12-01 00:00:00 28 30
6 2022-12-02 00:00:00 25 30
7 2022-12-03 00:00:00 25 30
8 2022-12-04 00:00:00 22 30
9 2022-12-05 00:00:00 22 30
10 2022-12-06 00:00:00 22 30
11 2022-12-07 00:00:00 22 30
12 2022-12-08 00:00:00 22 30
13 2022-12-09 00:00:00 22 25 30
14 2022-12-10 00:00:00 22 23 30
15 2022-12-11 00:00:00 22 23 30
16 2022-12-12 00:00:00 22 18 30
17 2022-12-13 00:00:00 22 14 30
18 2022-12-14 00:00:00 22 11 30
19 2022-12-15 00:00:00 22 10 27 30
20 2022-12-16 00:00:00 22 10 20 30
21 2022-12-17 00:00:00 22 10 20 30
22 2022-12-18 00:00:00 22 10 20 30
23 2022-12-19 00:00:00 22 10 13 30
24 2022-12-20 00:00:00 22 10 2 30
25 2022-12-21 00:00:00 22 10 2 19 30
26 2022-12-22 00:00:00 22 10 2 11 30
27 2022-12-23 00:00:00 22 10 2 4 30
28 2022-12-24 00:00:00 22 10 2 0 30
29 2022-12-25 00:00:00 22 10 2 0 22
30 2022-12-26 00:00:00 22 10 2 0 15
31 2022-12-27 00:00:00 22 10 2 0 15
32 2022-12-28 00:00:00 22 10 2 0 9
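If the goal is specifically the expected output above (blank out only the leading run in each column while keeping its last entry), a per-column sketch could look like this; `trim_leading_duplicates` is a hypothetical helper, not part of the original answer:

```python
import numpy as np
import pandas as pd

def trim_leading_duplicates(s):
    # True for the leading run of values equal to the first entry.
    run = s.ne(s.iloc[0]).cumsum().eq(0)
    # Keep the last entry of that run so the plotted line starts where values change.
    run.iloc[run.sum() - 1] = False
    return s.where(~run, np.nan)

# Small illustrative frame (shortened from the question's data).
df = pd.DataFrame({'Build1': [30, 30, 30, 28, 25],
                   'Build2': [30, 30, 30, 30, 26]})
out = df.apply(trim_leading_duplicates)
```

Applied column by column, this leaves NaN where the leading repeats were, which most plotting backends simply skip.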

Mixing days and months in datetime value column in Pandas

I want to format dates in pandas to year-month-day. My dates run from April to September; I have no values from January, February, etc., but sometimes pandas reads the day as the month and the month as the day. Look at index 16 or 84.
6 2019-08-26 15:10:00
7 2019-08-25 13:22:00
8 2019-08-24 16:06:00
9 2019-08-23 15:13:00
10 2019-08-22 14:24:00
11 2019-08-21 14:02:00
12 2019-08-16 12:31:00
13 2019-08-15 15:31:00
14 2019-08-14 14:46:00
15 2019-08-13 17:13:00
16 2019-11-08 15:54:00
17 2019-10-08 10:07:00
68 2019-06-06 11:22:00
69 2019-05-06 15:16:00
70 2019-01-06 17:02:00
75 2019-05-21 09:01:00
76 2019-05-19 16:52:00
77 2019-05-15 15:40:00
78 2019-10-05 13:34:00
81 2019-06-05 11:55:00
82 2019-03-05 17:28:00
83 2019-02-05 18:01:00
84 2019-01-05 17:05:00
85 2019-01-05 09:57:00
86 2019-04-30 10:16:00
87 2019-04-29 17:51:00
88 2019-04-27 17:42:00
How can I fix this?
I want date-type values (year-month-day) without the time, so that I can group by day or by month.
I have tried this, but it does not work:
df['Created'] = pd.to_datetime(df['Created'], format = 'something')
And for grouping by month, I have tried this:
df['Created'] = df['Created'].dt.to_period('M')
Solution for the sample data: parse the column twice, once per candidate format, with errors='coerce' so rows that do not match become NaT. Then fill the missing values of the second Series (YYYY-DD-MM) from the first Series (YYYY-MM-DD) using Series.combine_first or Series.fillna:
a = pd.to_datetime(df['Created'], format = '%Y-%m-%d %H:%M:%S', errors='coerce')
b = pd.to_datetime(df['Created'], format = '%Y-%d-%m %H:%M:%S', errors='coerce')
df['Created'] = b.combine_first(a).dt.to_period('M')
#alternative
#df['Created'] = b.fillna(a).dt.to_period('M')
print (df)
Created
6 2019-08
7 2019-08
8 2019-08
9 2019-08
10 2019-08
11 2019-08
12 2019-08
13 2019-08
14 2019-08
15 2019-08
16 2019-08
17 2019-08
68 2019-06
69 2019-06
70 2019-06
75 2019-05
76 2019-05
77 2019-05
78 2019-05
81 2019-05
82 2019-05
83 2019-05
84 2019-05
85 2019-05
86 2019-04
87 2019-04
88 2019-04
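The mechanics of the two-format parse can be seen on a tiny made-up sample (the dates here are illustrative, not the OP's data):

```python
import pandas as pd

# Two mixed-format strings: the first only parses as YYYY-MM-DD,
# the second parses under both formats.
raw = pd.Series(['2019-08-26 15:10:00',
                 '2019-11-08 15:54:00'])

a = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S', errors='coerce')  # month-first
b = pd.to_datetime(raw, format='%Y-%d-%m %H:%M:%S', errors='coerce')  # day-first
# Prefer the day-first parse, fall back to month-first where it produced NaT.
result = b.combine_first(a).dt.to_period('M')
```

For the first string, the day-first parse yields NaT (month 26 does not exist), so combine_first falls back to the month-first result; the second string is read day-first, giving August rather than November.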
I created a dummy dataframe to demonstrate. Try strftime:
from datetime import datetime
import time
import pandas as pd
time1 = datetime.now()
time.sleep(6)
time2 = datetime.now()
df = pd.DataFrame({'Created': [time1, time2]})
df['Created2'] = df['Created'].apply(lambda x: x.strftime('%Y-%m-%d'))
print(df.head())

python: compare data of different date types

I have a question of comparing data of datetime64[ns] and date like '2017-01-01'.
here is the code:
df.loc[(df['Date'] >= datetime.date(2017.1.1), 'TimeRange'] = '2017.1'
but an error is raised saying descriptor 'date' requires a 'datetime.datetime' object but received a 'int'.
How can I compare a datetime64 column to a date like 2017-01-01 or 2017-6-1?
Thanks
Demo:
Source DF:
In [83]: df = pd.DataFrame({'tm':pd.date_range('2000-01-01', freq='9999T', periods=20)})
In [84]: df
Out[84]:
tm
0 2000-01-01 00:00:00
1 2000-01-07 22:39:00
2 2000-01-14 21:18:00
3 2000-01-21 19:57:00
4 2000-01-28 18:36:00
5 2000-02-04 17:15:00
6 2000-02-11 15:54:00
7 2000-02-18 14:33:00
8 2000-02-25 13:12:00
9 2000-03-03 11:51:00
10 2000-03-10 10:30:00
11 2000-03-17 09:09:00
12 2000-03-24 07:48:00
13 2000-03-31 06:27:00
14 2000-04-07 05:06:00
15 2000-04-14 03:45:00
16 2000-04-21 02:24:00
17 2000-04-28 01:03:00
18 2000-05-04 23:42:00
19 2000-05-11 22:21:00
Filtering:
In [85]: df.loc[df.tm > '2000-03-01']
Out[85]:
tm
9 2000-03-03 11:51:00
10 2000-03-10 10:30:00
11 2000-03-17 09:09:00
12 2000-03-24 07:48:00
13 2000-03-31 06:27:00
14 2000-04-07 05:06:00
15 2000-04-14 03:45:00
16 2000-04-21 02:24:00
17 2000-04-28 01:03:00
18 2000-05-04 23:42:00
19 2000-05-11 22:21:00
In [86]: df.loc[df.tm > '2000-3-1']
Out[86]:
tm
9 2000-03-03 11:51:00
10 2000-03-10 10:30:00
11 2000-03-17 09:09:00
12 2000-03-24 07:48:00
13 2000-03-31 06:27:00
14 2000-04-07 05:06:00
15 2000-04-14 03:45:00
16 2000-04-21 02:24:00
17 2000-04-28 01:03:00
18 2000-05-04 23:42:00
19 2000-05-11 22:21:00
not standard date format:
In [87]: df.loc[df.tm > pd.to_datetime('03/01/2000')]
Out[87]:
tm
9 2000-03-03 11:51:00
10 2000-03-10 10:30:00
11 2000-03-17 09:09:00
12 2000-03-24 07:48:00
13 2000-03-31 06:27:00
14 2000-04-07 05:06:00
15 2000-04-14 03:45:00
16 2000-04-21 02:24:00
17 2000-04-28 01:03:00
18 2000-05-04 23:42:00
19 2000-05-11 22:21:00
You need to ensure that the value you compare against is in a comparable form. datetime.date takes three integer arguments, so build it with commas rather than dots:
import datetime
print(df.loc[df['Date'] >= datetime.date(2017, 1, 1), 'TimeRange'])
This creates a proper datetime.date object and lists the filtered results. You can also assign an updated value to that selection, as you mentioned above.
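For reference, a datetime64[ns] column also compares directly against an ISO date string, which avoids constructing a datetime.date at all (a minimal sketch with made-up data):

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2016-12-30', '2017-01-02', '2017-02-15'])})

# Pandas parses the string on the right-hand side of the comparison.
df.loc[df['Date'] >= '2017-01-01', 'TimeRange'] = '2017.1'
```

Rows before the cutoff are left untouched, so their 'TimeRange' is NaN.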

Changing time components of pandas datetime64 column

I have a dataframe that can be simplified as:
date id
0 02/04/2015 02:34 1
1 06/04/2015 12:34 2
2 09/04/2015 23:03 3
3 12/04/2015 01:00 4
4 15/04/2015 07:12 5
5 21/04/2015 12:59 6
6 29/04/2015 17:33 7
7 04/05/2015 10:44 8
8 06/05/2015 11:12 9
9 10/05/2015 08:52 10
10 12/05/2015 14:19 11
11 19/05/2015 19:22 12
12 27/05/2015 22:31 13
13 01/06/2015 11:09 14
14 04/06/2015 12:57 15
15 10/06/2015 04:00 16
16 15/06/2015 03:23 17
17 19/06/2015 05:37 18
18 23/06/2015 13:41 19
19 27/06/2015 15:43 20
It can be created using:
tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"]})
The data has the following types:
tempDF.dtypes
date object
id int64
dtype: object
I have set the 'date' variable to Pandas datetime64 format (if that's the right way to describe it) using:
import numpy as np
import pandas as pd
tempDF['date'] = pd.to_datetime(tempDF['date'])
(Note these strings are day-first, so pd.to_datetime(tempDF['date'], dayfirst=True) would parse them as intended; the demos below use the default month-first parse, which is why some dates come out swapped.)
So now, the dtypes look like:
tempDF.dtypes
date datetime64[ns]
id int64
dtype: object
I want to change the hours of the original date data. I can use .normalize() to convert to midnight via the .dt accessor:
tempDF['date'] = tempDF['date'].dt.normalize()
And, I can get access to individual datetime components (e.g. year) using:
tempDF['date'].dt.year
This produces:
0 2015
1 2015
2 2015
3 2015
4 2015
5 2015
6 2015
7 2015
8 2015
9 2015
10 2015
11 2015
12 2015
13 2015
14 2015
15 2015
16 2015
17 2015
18 2015
19 2015
Name: date, dtype: int64
The question is, how can I change specific date and time components? For example, how could I change the midday (12:00) for all the dates? I've found that datetime.datetime has a .replace() function. However, having converted dates to Pandas format, it would make sense to keep in that format. Is there a way to do that without changing the format again?
EDIT:
A vectorized way to do this is to normalize the series and then add 12 hours to it using a timedelta. Example -
tempDF['date'].dt.normalize() + datetime.timedelta(hours=12)
Demo -
In [59]: tempDF
Out[59]:
date id
0 2015-02-04 12:00:00 1
1 2015-06-04 12:00:00 2
2 2015-09-04 12:00:00 3
3 2015-12-04 12:00:00 4
4 2015-04-15 12:00:00 5
5 2015-04-21 12:00:00 6
6 2015-04-29 12:00:00 7
7 2015-04-05 12:00:00 8
8 2015-06-05 12:00:00 9
9 2015-10-05 12:00:00 10
10 2015-12-05 12:00:00 11
11 2015-05-19 12:00:00 12
12 2015-05-27 12:00:00 13
13 2015-01-06 12:00:00 14
14 2015-04-06 12:00:00 15
15 2015-10-06 12:00:00 16
16 2015-06-15 12:00:00 17
17 2015-06-19 12:00:00 18
18 2015-06-23 12:00:00 19
19 2015-06-27 12:00:00 20
In [60]: tempDF['date'].dt.normalize() + datetime.timedelta(hours=12)
Out[60]:
0 2015-02-04 12:00:00
1 2015-06-04 12:00:00
2 2015-09-04 12:00:00
3 2015-12-04 12:00:00
4 2015-04-15 12:00:00
5 2015-04-21 12:00:00
6 2015-04-29 12:00:00
7 2015-04-05 12:00:00
8 2015-06-05 12:00:00
9 2015-10-05 12:00:00
10 2015-12-05 12:00:00
11 2015-05-19 12:00:00
12 2015-05-27 12:00:00
13 2015-01-06 12:00:00
14 2015-04-06 12:00:00
15 2015-10-06 12:00:00
16 2015-06-15 12:00:00
17 2015-06-19 12:00:00
18 2015-06-23 12:00:00
19 2015-06-27 12:00:00
dtype: datetime64[ns]
Timing information for both methods at bottom
One method is to use Series.apply along with the .replace() method the OP mentions in the post. Example -
tempDF['date'] = tempDF['date'].apply(lambda x:x.replace(hour=12,minute=0))
Demo -
In [12]: tempDF
Out[12]:
date id
0 2015-02-04 02:34:00 1
1 2015-06-04 12:34:00 2
2 2015-09-04 23:03:00 3
3 2015-12-04 01:00:00 4
4 2015-04-15 07:12:00 5
5 2015-04-21 12:59:00 6
6 2015-04-29 17:33:00 7
7 2015-04-05 10:44:00 8
8 2015-06-05 11:12:00 9
9 2015-10-05 08:52:00 10
10 2015-12-05 14:19:00 11
11 2015-05-19 19:22:00 12
12 2015-05-27 22:31:00 13
13 2015-01-06 11:09:00 14
14 2015-04-06 12:57:00 15
15 2015-10-06 04:00:00 16
16 2015-06-15 03:23:00 17
17 2015-06-19 05:37:00 18
18 2015-06-23 13:41:00 19
19 2015-06-27 15:43:00 20
In [13]: tempDF['date'] = tempDF['date'].apply(lambda x:x.replace(hour=12,minute=0))
In [14]: tempDF
Out[14]:
date id
0 2015-02-04 12:00:00 1
1 2015-06-04 12:00:00 2
2 2015-09-04 12:00:00 3
3 2015-12-04 12:00:00 4
4 2015-04-15 12:00:00 5
5 2015-04-21 12:00:00 6
6 2015-04-29 12:00:00 7
7 2015-04-05 12:00:00 8
8 2015-06-05 12:00:00 9
9 2015-10-05 12:00:00 10
10 2015-12-05 12:00:00 11
11 2015-05-19 12:00:00 12
12 2015-05-27 12:00:00 13
13 2015-01-06 12:00:00 14
14 2015-04-06 12:00:00 15
15 2015-10-06 12:00:00 16
16 2015-06-15 12:00:00 17
17 2015-06-19 12:00:00 18
18 2015-06-23 12:00:00 19
19 2015-06-27 12:00:00 20
Timing information
In [52]: df = pd.DataFrame([[datetime.datetime.now()] for _ in range(100000)],columns=['date'])
In [54]: %%timeit
....: df['date'].dt.normalize() + datetime.timedelta(hours=12)
....:
The slowest run took 12.53 times longer than the fastest. This could mean that an intermediate result is being cached
1 loops, best of 3: 32.3 ms per loop
In [57]: %%timeit
....: df['date'].apply(lambda x:x.replace(hour=12,minute=0))
....:
1 loops, best of 3: 1.09 s per loop
Here's the solution I used to replace the time component of the datetime values in a Pandas DataFrame. Not sure how efficient this solution is, but it fit my needs.
import pandas as pd
# Create a list of EOCY dates for a specified period
sDate = pd.Timestamp('2022-01-31 23:59:00')
eDate = pd.Timestamp('2060-01-31 23:59:00')
dtList = pd.date_range(sDate, eDate, freq='Y').to_pydatetime()
# Create a DataFrame with a single column called 'Date' and fill the rows with the list of EOCY dates.
df = pd.DataFrame({'Date': dtList})
# Loop through the DataFrame rows, using replace to zero out the hours and minutes of each date value.
for i in range(df.shape[0]):
    df.iloc[i, 0] = df.iloc[i, 0].replace(hour=0, minute=0)
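The same result is available without the row loop, assuming the goal is just to zero out the time component: Series.dt.normalize is vectorized. A small sketch with two illustrative timestamps:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2022-01-31 23:59:00',
                                           '2023-01-31 23:59:00'])})

# normalize() sets every timestamp to midnight in one vectorized call.
df['Date'] = df['Date'].dt.normalize()
```

On large frames this is typically much faster than element-wise .replace(), as the timing comparison earlier in this answer suggests.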

How to plot a pandas multiindex dataFrame with all xticks

I have a pandas dataFrame like this:
content
date
2013-12-18 12:30:00 1
2013-12-19 10:50:00 1
2013-12-24 11:00:00 0
2014-01-02 11:30:00 1
2014-01-03 11:50:00 0
2013-12-17 16:40:00 10
2013-12-18 10:00:00 0
2013-12-11 10:00:00 0
2013-12-18 11:45:00 0
2013-12-11 14:40:00 4
2010-05-25 13:05:00 0
2013-11-18 14:10:00 0
2013-11-27 11:50:00 3
2013-11-13 10:40:00 0
2013-11-20 10:40:00 1
2008-11-04 14:49:00 1
2013-11-18 10:05:00 0
2013-08-27 11:00:00 0
2013-09-18 16:00:00 0
2013-09-27 11:40:00 0
date being the index.
I reduce the values to months using:
dataFrame = dataFrame.groupby([lambda x: x.year, lambda x: x.month]).agg([sum])
which outputs:
content
sum
2006 3 66
4 65
5 48
6 87
7 37
8 54
9 73
10 74
11 53
12 45
2007 1 28
2 40
3 95
4 63
5 56
6 66
7 50
8 49
9 18
10 28
Now when I plot this dataFrame, I want the x-axis to show every month/year as a tick. I have tried setting xticks but it doesn't seem to work. How could this be achieved? This is my current plot using dataFrame.plot():
You can use set_xticks() and set_xticklabels():
idx = pd.date_range("2013-01-01", periods=1000)
val = np.random.rand(1000)
s = pd.Series(val, idx)
g = s.groupby([s.index.year, s.index.month]).mean()
ax = g.plot()
ax.set_xticks(range(len(g)));
ax.set_xticklabels(["%s-%02d" % item for item in g.index.tolist()], rotation=90);
Output: the plot now shows one x-tick for every year-month label.
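A variant of the same idea, sketched with a monthly PeriodIndex instead of the two-level groupby (the names here are illustrative); the periods format directly as "YYYY-MM" tick labels:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01", periods=1000)
s = pd.Series(np.random.rand(1000), idx)

# Aggregate by calendar month; the PeriodIndex renders as "2013-01", "2013-02", ...
g = s.groupby(s.index.to_period("M")).mean()
labels = [str(p) for p in g.index]
```

The labels list can then be passed to set_xticklabels() exactly as in the answer above, without the "%s-%02d" formatting step.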
