Python Conditional on a date time Object - python

for index, row in HDFC_Nifty.iterrows():
    if HDFC_Nifty.iat(index,0).dt.year ==2016:
        TE_Nifty_2016.append(row['TE'])
    else:
        TE_Nifty_2017.append(row['TE'])
Hello,
I am attempting to iterate over the DataFrame and, more specifically, apply a conditional to the Date column, which holds datetime objects.
I keep getting the error below:
TypeError: '_iAtIndexer' object is not callable
I do not know how to proceed. I have tried various loops without success, and I cannot work out what I am doing incorrectly.
Thanks for the help!
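For what it's worth, here is a minimal sketch of one way the loop could be fixed. The error message points at .iat being called with parentheses; it is an indexer and takes square brackets. The sample frame is made up for illustration, and it assumes the date is the first column and that every row that is not from 2016 belongs in the 2017 list, as in the else branch of the question.

```python
import pandas as pd

# Hypothetical stand-in for HDFC_Nifty: a datetime column first, then 'TE'.
HDFC_Nifty = pd.DataFrame({
    "Date": pd.to_datetime(["2016-03-01", "2016-11-15", "2017-01-10", "2017-06-30"]),
    "TE": [0.10, 0.20, 0.30, 0.40],
})

# .iat is an indexer, so it takes square brackets. A single Timestamp
# exposes .year directly; the .dt accessor exists only on a Series.
first_year = HDFC_Nifty.iat[0, 0].year

# Vectorized alternative to the loop: build a boolean mask from the column.
is_2016 = HDFC_Nifty["Date"].dt.year == 2016
TE_Nifty_2016 = HDFC_Nifty.loc[is_2016, "TE"].tolist()
TE_Nifty_2017 = HDFC_Nifty.loc[~is_2016, "TE"].tolist()
```

The mask version avoids iterrows() altogether, which tends to matter on larger frames.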

Related

Why am I getting error 'Series' object has no attribute 'days' when trying to get number of days between two dates?

I have come across some examples that show how to calculate the difference between two dates using .days. However for some reason does not seem to be working for me. I have the following code:
import datetime
import pandas as pd
from datetime import date
dfPort = pd.read_csv('C:\\Research\\Lockup\\Data\\lockupdates.csv')
dfPort = pd.DataFrame(dfPort)
todaysDate=datetime.datetime.today()
dfPort['LDate']=pd.to_datetime(dfPort['LockupExpDate'], format='%m/%d/%Y')
dfPort['TimeLeft']=(todaysDate-dfPort['LDate']).days
I get the following error:
AttributeError: 'Series' object has no attribute 'days'
So I tried the following:
xx=dfPort['LDate']-todaysDate
xx.days
and got the same error message. The references in Stack Overflow that I was reading are:
Difference between two dates in Python
How to fix?
The problem is that you are accessing the days attribute directly on a Series object. Subtracting the Series dfPort["LDate"] from todaysDate gives a Series of timedelta values, and a Series has no days attribute of its own.
As @MrFruppes pointed out, the Series has a .dt accessor. It returns a pandas.core.indexes.accessors.TimedeltaProperties object that exposes the Timedelta members.
So the most performant solution would be:
dfPort['TimeLeft'] = (todaysDate-dfPort['LDate']).dt.days
Credit goes to @MrFruppes.
Below is my modified original answer, with some examples of less performant approaches (so you probably want to stop reading here).
You could call the apply method of the series where you can access each member of the series like this:
dfPort['TimeLeft'] = (todaysDate-dfPort['LDate']).apply(lambda x: x.days)
Here is a list comprehension approach:
dfPort['TimeLeft'] = [v.days for v in (todaysDate-dfPort['LDate'])]
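To see the three approaches side by side, here is a small self-contained sketch. The dates are invented stand-ins for the CSV in the question, and a fixed reference date replaces datetime.datetime.today() so the numbers stay reproducible.

```python
import pandas as pd

# Hypothetical data standing in for lockupdates.csv.
dfPort = pd.DataFrame({"LockupExpDate": ["01/15/2024", "03/01/2024"]})
dfPort["LDate"] = pd.to_datetime(dfPort["LockupExpDate"], format="%m/%d/%Y")

# Fixed reference date (an assumption; the question uses today()).
todaysDate = pd.Timestamp("2024-01-01")

delta = todaysDate - dfPort["LDate"]       # Series of timedelta values

via_dt = delta.dt.days                     # vectorized accessor
via_apply = delta.apply(lambda x: x.days)  # one Python call per element
via_list = [v.days for v in delta]         # days is an attribute, not a method
```

All three give the same numbers; the .dt version simply avoids the per-element Python overhead.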

Replace string in one part pandas dataframe

print(df["date"].str.replace("2016","16"))
The code above works fine. What I really want to do is to make this replacement in just a small part of the data-frame. Something like:
df.loc[2:4,["date"]].str.replace("2016","16")
However here I get an error:
AttributeError: 'DataFrame' object has no attribute 'str'
What about df['date'].loc[2:4].str.replace('2016', '16')?
By selecting ['date'] first you know you are dealing with a Series, which does have the .str accessor.
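Note that str.replace returns a new Series, so the result has to be written back to change the frame. A minimal sketch, with invented sample data, might look like this:

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame(
    {"date": ["01-2016", "02-2016", "03-2016", "04-2016", "05-2016", "06-2016"]}
)

# Passing the column label as a scalar (not a list) makes .loc return a Series,
# which has .str. regex=False treats "2016" as a literal string.
df.loc[2:4, "date"] = df.loc[2:4, "date"].str.replace("2016", "16", regex=False)
```

Label-based slices with .loc include both endpoints, so rows 2, 3 and 4 are updated and the rest are left alone.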

How can I fill the NAs using groupby function in Pandas (using Python) when I have more than one column?

I am trying to fill the NAs in my data frame using the following code, but I get an error. Can anyone help? Why is it not working? I need to group by more than one column (gender and age). With a single column it works, but with more than one I get an error.
Here is the code:
df['NewCol'].fillna(df.groupby(['gender','age'])['grade'].transform('mean'),inplace=True)
The error message is:
TypeError: 'NoneType' object is not subscriptable
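The snippet alone does not show where the TypeError comes from, so the following is only a sketch of a pattern that is known to work with two grouping columns. One common source of NoneType errors in this area is keeping the return value of a call made with inplace=True, which has historically been None; whether that is what happened here is an assumption. The sample data is invented, and the sketch fills 'grade' from its own group means rather than a separate 'NewCol'.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "M"],
    "age": [20, 20, 30, 30, 30],
    "grade": [80.0, np.nan, 70.0, 90.0, np.nan],
})

# Mean of 'grade' within each (gender, age) group, aligned to the original rows.
group_mean = df.groupby(["gender", "age"])["grade"].transform("mean")

# Assign the result back instead of relying on inplace=True.
df["grade"] = df["grade"].fillna(group_mean)
```

Because transform returns a Series with the same index as df, fillna can line the group means up row by row.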

Question on how to create a new column based on current df columns

I am trying to figure out how to create a new column for my df and can't seem to get it to work with what I have tried.
I have tried using
loans_df.insert("Debt_Ratio",["MonthlyDebt"*12/"Income"])
but I keep getting an error stating unsupported operand type.
BTW I am calculating the new column using already predefined columns in my df
My expected results would be that this new column is inserted into the df with the specific calculation defining it.
Hope this all makes sense!
The strings "MonthlyDebt" and "Income" are just text, not columns, which is why the arithmetic fails. Assuming the dataframe is loans_df and the column you want to create is named Debt_Ratio, the following will do the job:
loans_df['Debt_Ratio'] = loans_df['MonthlyDebt'] * 12/loans_df['Income']
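As a quick illustration with invented numbers, here is the assignment next to a working DataFrame.insert call. insert takes a position, a column name and the values, in that order; the name Debt_Ratio_First is hypothetical and only there to show both options in one frame.

```python
import pandas as pd

# Made-up sample data for illustration.
loans_df = pd.DataFrame({"MonthlyDebt": [500.0, 1000.0], "Income": [60000.0, 48000.0]})

debt_ratio = loans_df["MonthlyDebt"] * 12 / loans_df["Income"]

# Option 1: plain assignment appends the column at the end.
loans_df["Debt_Ratio"] = debt_ratio

# Option 2: DataFrame.insert(loc, column, value) places it at a chosen position.
loans_df.insert(0, "Debt_Ratio_First", debt_ratio)
```

Plain assignment is usually enough; insert is mainly useful when the column order matters.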

changing column types of a pandas data frame -- finding offending rows that prevent casting

My PANDAS data has columns that were read as objects. I want to change these into floats. Following the post linked below (1), I tried:
pdos[cols] = pdos[cols].astype(float)
But PANDAS gives me an error saying that an object can't be recast as float.
ValueError: invalid literal for float(): 17_d
But when I search for 17_d in my data set, it tells me it's not there.
>>> '17_d' in pdos
False
I can look at the raw data to see what's happening outside of Python, but I feel that if I'm going to take Python seriously, I should know how to deal with this sort of issue. Why doesn't this search work? How can I search object columns for strings in pandas? Any advice?
Pandas: change data type of columns
Of course it returns False, because you're only looking in the column labels!
'17_d' in pdos
checks whether '17_d' is in pdos.columns.
What you want is pdos[cols] == '17_d', which gives you a boolean table. To find which rows contain it, use (pdos[cols] == '17_d').any(axis=1).
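Here is a small sketch of that check on invented data. It also shows a more general way to locate offending rows when the bad value is not known in advance, using pd.to_numeric with errors="coerce"; that part is an addition, not something from the answer above.

```python
import pandas as pd

# Made-up sample data for illustration.
pdos = pd.DataFrame({"a": ["1.5", "17_d", "3"], "b": ["2", "4", "6.5"]})
cols = ["a", "b"]

in_columns = "17_d" in pdos                      # membership test on column labels
exact_rows = (pdos[cols] == "17_d").any(axis=1)  # rows holding that exact string

# errors="coerce" turns unparseable entries into NaN, so any NaN that was not
# missing before marks a value that blocks the cast.
converted = pdos[cols].apply(pd.to_numeric, errors="coerce")
offending = converted.isna() & pdos[cols].notna()
bad_rows = pdos[offending.any(axis=1)]
```

bad_rows then holds the original, unconverted rows, so the problematic strings can be inspected directly.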
