columnx columny results
2019-02-15 2 2019-04-15
2019-05-08 1 2019-06-08
It should not change the day of the month: 15 should stay 15 and 8 should stay 8. In the case of 31 becoming 30 (or vice versa) it's okay. Most importantly, I don't want to use .apply(). Thanks!
This should solve it.
Please check that your columnx is in datetime format, then run the line below.
EDIT
df["results"] = df["columnx"] + df["columny"].astype('timedelta64[M]')
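Note that astype('timedelta64[M]') adds average-length months (about 30.44 days each), so it can shift the day of the month, and recent pandas versions reject this conversion outright. A day-preserving, vectorized alternative (a sketch, assuming columny holds integer month counts) goes through monthly periods:

```python
import pandas as pd

df = pd.DataFrame({
    "columnx": pd.to_datetime(["2019-02-15", "2019-05-08"]),
    "columny": [2, 1],
})

# Shift the month via Period arithmetic, then restore the original day.
shifted = df["columnx"].dt.to_period("M") + df["columny"]
df["results"] = shifted.dt.start_time + pd.to_timedelta(df["columnx"].dt.day - 1, unit="D")
print(df["results"])
```

One caveat: a day of 31 landing in a shorter target month rolls over into the next month here rather than clamping to the month's last day.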
I'm not very good at formulating questions so I'm sorry for the title. I hope this example is specific enough; if I have a python pandas dataframe (df) that holds the following data:
Date Num
01-01-2020 2
01-02-2020 4
01-01-2020 8
01-02-2020 16
how do I add up all of the 'Num' values that have the same date, so the df would look like this:
Date Num
01-01-2020 10
01-02-2020 20
df.groupby("Date").sum()
should do it (thank you, Henry Yik).
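A minimal self-contained sketch of the groupby-sum; note that a plain sum() leaves Date as the index, so pass as_index=False (or call reset_index()) to keep it as a column:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-01-2020", "01-02-2020", "01-01-2020", "01-02-2020"],
    "Num": [2, 4, 8, 16],
})

# Sum Num within each distinct Date; as_index=False keeps Date as a column.
summed = df.groupby("Date", as_index=False)["Num"].sum()
print(summed)
```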
I am new to Pandas
I am accessing the date column which is in the format of
Restaurent ISSDTM
CREAMERY INC 4/5/2013 12:47
CREAMERY INC 4/5/2013 12:47
SANDRA 3/5/2009 11:23
SANDRA 8/26/2009 13:11
print(df['ISSDTM'].dtype) shows that it is an object.
I want to do a count plot for this as per the year.
I tried using
df1 = df['ISSDTM'].apply(lambda x: x.split('/'))
to access the date, but I am unable to also split on the space in between. Likewise,
df1 = df['ISSDTM'].apply(lambda x: x.split(['/', ' ']))
didn't work.
I also tried to access the last four digits using
df2 = df['ISSDTM'].apply(lambda x: x[-1:-4])
(which returns empty strings, since that slice runs backwards). Is there any approach to splitting this type of date format? Should I use dt.strftime?
Yes, you were on the right track with dt. Coerce to datetime and use dt.year.
pd.to_datetime(df.ISSDTM, errors='coerce').dt.year
0 2013
1 2013
2 2009
3 2009
Name: ISSDTM, dtype: int64
You can use DataFrame.plot.bar or seaborn.countplot to generate a count plot.
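A sketch of the whole pipeline, assuming the column holds strings like "4/5/2013 12:47"; the plotting call at the end needs matplotlib, so it is left commented out:

```python
import pandas as pd

df = pd.DataFrame({"ISSDTM": [
    "4/5/2013 12:47", "4/5/2013 12:47", "3/5/2009 11:23", "8/26/2009 13:11",
]})

# Parse the strings to datetimes (unparseable values become NaT), then pull the year.
years = pd.to_datetime(df["ISSDTM"], errors="coerce").dt.year

# Number of rows per year, in year order.
counts = years.value_counts().sort_index()
print(counts)
# counts.plot.bar()  # renders the count plot (requires matplotlib)
```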
I have big DataFrame (df) which looks like:
Acc_num date_diff
0 29 0:04:43
1 29 0:01:43
2 29 2:22:45
3 29 0:16:21
4 29 0:58:20
5 30 0:00:35
6 34 7:15:26
7 34 4:40:01
8 34 0:56:02
9 34 6:53:44
10 34 1:36:58
.....
Acc_num int64
date_diff timedelta64[ns]
dtype: object
I need to calculate 'date_diff' mean (in timedelta format) for each account number.
df.date_diff.mean() works correctly. But when I try next:
df.groupby('Acc_num').date_diff.mean() it raises an exception:
"DataError: No numeric types to aggregate"
I also tried the df.pivot_table() method, but didn't achieve anything. Could someone help me with this? Thank you in advance!
Weird limitation indeed. But a simple solution would be:
df.groupby('Acc_num').date_diff.agg(lambda g: g.sum() / g.count())
Edit:
Pandas will actually attempt to aggregate non-numeric columns if you pass numeric_only=False:
df.groupby('Acc_num').date_diff.mean(numeric_only=False)
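For what it's worth, recent pandas versions aggregate timedelta columns directly, so a plain mean() works in a groupby. A sketch with a cut-down version of the data:

```python
import pandas as pd

df = pd.DataFrame({
    "Acc_num": [29, 29, 30],
    "date_diff": pd.to_timedelta(["0:04:43", "0:01:43", "0:00:35"]),
})

# Mean timedelta per account number; the result is itself timedelta64[ns].
means = df.groupby("Acc_num")["date_diff"].mean()
print(means)
```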
Consider the DataFrame data:
one two three four
Ohio 2013-01-01 1 2 3
Colorado 2014-01-05 5 6 7
Utah 2015-05-06 9 10 11
New York 2016-10-11 13 14 15
I'd like to extract the row using only the criterion that the year is a given year, e.g., something like data['one'][:][0:4] == '2013'. But the command data['one'][:][0:4] returns
Ohio 2013-01-01
Colorado 2014-01-05
Utah 2015-05-06
New York 2016-10-11
Name: one, dtype: object
I thought this is the right thing to do because the command data['one'][0][0:4] returns
'2013'
Why the difference, and what's the correct way to do this?
Since column 'one' consists of dates, it'd be best to have pandas recognize it as such, instead of recognizing it as strings. You can use pd.to_datetime to do this:
df['one'] = pd.to_datetime(df['one'])
This allows you to filter on date properties without needing to worry about slicing strings. For example, you can check for year using Series.dt.year:
df['one'].dt.year == 2013
Combining this with loc allows you to get all rows where the year is 2013:
df.loc[df['one'].dt.year == 2013, :]
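Putting it together on the example data (a sketch; the column values are assumed to start out as date strings before the conversion):

```python
import pandas as pd

data = pd.DataFrame(
    {
        "one": ["2013-01-01", "2014-01-05", "2015-05-06", "2016-10-11"],
        "two": [1, 5, 9, 13],
        "three": [2, 6, 10, 14],
        "four": [3, 7, 11, 15],
    },
    index=["Ohio", "Colorado", "Utah", "New York"],
)

# Convert strings to datetimes, then filter rows by year.
data["one"] = pd.to_datetime(data["one"])
result = data.loc[data["one"].dt.year == 2013, :]
print(result)
```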
The condition you are looking for is
df['one'].str[0:4] == "2013"
Basically, you need to tell Pandas to read your column as a string, then operate on the strings from that column.
The way you have it written, df['one'][:] says "give me the column called 'one', then give me all of it" (the [:]).
query works out well too on datetime columns
In [13]: df.query('one == 2013')
Out[13]:
one two three four
Ohio 2013-01-01 1 2 3
DataFrame I have:
A B C
2012-01-01 1 2 3
2012-01-05 4 5 6
2012-01-10 7 8 9
2012-01-15 10 11 12
What I am using now:
date_after = dt.datetime( 2012, 1, 7 )
frame.ix[date_after:].ix[0:1]
Out[1]:
A B C
2012-01-10 7 8 9
Is there any better way of doing this? I do not like that I have to specify .ix[0:1] instead of .ix[0], but if I don't, the output changes to a TimeSeries instead of a single-row DataFrame, and I find a rotated TimeSeries harder to work with than a row of the original DataFrame.
Without .ix[0:1]:
frame.ix[date_after:].ix[0]
Out[1]:
A 7
B 8
C 9
Name: 2012-01-10 00:00:00
Thanks,
John
Couldn't resist answering this, even though the question was asked, and answered, in 2012 by Wes himself, and again in 2015 by ajsp. Yes, besides truncate, you can also use get_loc with method='backfill' to get the nearest date after the specified date. By the way, if you want the nearest date before the specified date, use 'ffill'; if you just want the nearest date in either direction, use 'nearest'.
df.iloc[df.index.get_loc(datetime.datetime(2016, 2, 2), method='backfill')]
You might want to go directly to the index:
i = frame.index.searchsorted(date)
frame.ix[frame.index[i]]
A touch verbose, but you could put it in a function. About as good as you'll get (O(log n)).
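Since .ix has been removed from modern pandas, the same idea can be sketched with searchsorted plus iloc; wrapping the position in a list keeps the result a one-row DataFrame rather than a Series, which addresses the original complaint:

```python
import pandas as pd

frame = pd.DataFrame(
    {"A": [1, 4, 7, 10], "B": [2, 5, 8, 11], "C": [3, 6, 9, 12]},
    index=pd.to_datetime(["2012-01-01", "2012-01-05", "2012-01-10", "2012-01-15"]),
)

# Binary search for the first index position at or after the target date.
i = frame.index.searchsorted(pd.Timestamp("2012-01-07"))
row = frame.iloc[[i]]  # [[i]] keeps a DataFrame; [i] would return a Series
print(row)
```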
Couldn't resist answering this, even though the question was asked, and answered, in 2012, by Wes himself. Yes, just use truncate.
df.truncate(before='2012-01-07')
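A sketch of truncate on the question's data; combined with iloc[[0]], it yields the first row at or after the cutoff as a one-row DataFrame:

```python
import pandas as pd

frame = pd.DataFrame(
    {"A": [1, 4, 7, 10], "B": [2, 5, 8, 11], "C": [3, 6, 9, 12]},
    index=pd.to_datetime(["2012-01-01", "2012-01-05", "2012-01-10", "2012-01-15"]),
)

# Drop all rows before the cutoff, then keep the first remaining row.
first_after = frame.truncate(before="2012-01-07").iloc[[0]]
print(first_after)
```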