csv_read: convert dates to int/boolean - python

I have a csv file with a column "graduated" which either shows the date of graduation, or 0 if there is no graduation yet.
df.dtypes returns 'object' for this column. I want to turn all the dates into a 1 (indicating that the person in that row graduated). How can I do that?

Use pandas.to_datetime with errors='coerce' to parse the dates; values that cannot be parsed become NaT. Then use notnull() to get a boolean Series and cast it to int to get the desired result.
pd.to_datetime(df.graduated, errors='coerce').notnull().astype(int)
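A minimal sketch of the one-liner above on made-up data. The sample values, the graduated_flag column name, and the explicit format argument are assumptions for illustration; the real file may write its dates differently.

```python
import pandas as pd

# Hypothetical sample: a date string for graduates, "0" for everyone else.
df = pd.DataFrame({"graduated": ["2015-06-30", "0", "2018-01-15", "0"]})

# Assumed date layout; anything that does not match it is coerced to NaT.
parsed = pd.to_datetime(df["graduated"], format="%Y-%m-%d", errors="coerce")

# notnull() is True only for real dates; astype(int) maps True/False to 1/0.
df["graduated_flag"] = parsed.notnull().astype(int)

print(df)
```

Writing to a new column keeps the original dates available in case they are needed later.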

Related

Split the given integer value as date

20160116
Suppose this is the data, stored as an integer in a column. I want to convert it to 2016/01/16 or 2016-01-16 with a date datatype. My column name is system and the dataframe is df. How can I do that?
I tried many date format functions, but none of them gave the right result.
Convert using to_datetime and provide the format,
then convert to the format you want:
pd.to_datetime(df['dte'], format='%Y%m%d').dt.strftime('%Y/%m/%d')
0 2016/01/16
Name: dte, dtype: object
Using str.replace we can try:
df["date"] = df["system"].astype(str).str.replace(r'(\d{4})(\d{2})(\d{2})', r'\1/\2/\3', regex=True)
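The two answers differ in what they produce: to_datetime gives a real datetime column, while strftime and str.replace give text. A small sketch, assuming the column is named system as in the question; the second sample value and the system_date and system_text column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with the integer layout from the question (YYYYMMDD).
df = pd.DataFrame({"system": [20160116, 20171231]})

# Go through str so each value is read as year, month, day.
parsed = pd.to_datetime(df["system"].astype(str), format="%Y%m%d")

df["system_date"] = parsed                          # datetime values
df["system_text"] = parsed.dt.strftime("%Y/%m/%d")  # plain strings

print(df)
```

Keep the datetime column if you need date arithmetic or sorting; use the text column only for display.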

Split date column into YYYY.MM.DD

I have a dataframe column in the format of 20180531.
I need to split this properly, i.e. so that I get 2018/05/31.
I need to work with this dataframe column in a datetime format.
Currently this column is identified as int64 type.
I'm not sure how efficient it'll be, but you can convert it to a string and then use pd.to_datetime with format=..., e.g.:
df['actual_datetime'] = pd.to_datetime(df['your_column'].astype(str), format='%Y%m%d')
As Emma points out - the astype(str) is redundant here and just:
df['actual_datetime'] = pd.to_datetime(df['your_column'], format='%Y%m%d')
will work fine.
Assuming the integer dates would always be fixed width at 8 digits, you may try:
df['dt'] = df['dt_int'].astype(str).str.replace(r'(\d{4})(\d{2})(\d{2})', r'\1-\2-\3', regex=True)
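One practical difference between the regex approach and to_datetime is validation: the regex only rearranges digits, while to_datetime checks that the result is a real date. A short sketch, assuming a hypothetical dt_int column in which the second value has an impossible month:

```python
import pandas as pd

# Hypothetical sample: 20181301 has month 13 and is not a valid date.
df = pd.DataFrame({"dt_int": [20180531, 20181301]})

# Regex reshaping: always "succeeds" and returns strings, valid or not.
as_text = df["dt_int"].astype(str).str.replace(
    r"(\d{4})(\d{2})(\d{2})", r"\1-\2-\3", regex=True
)

# Parsing: invalid values are turned into NaT when errors="coerce" is passed.
as_date = pd.to_datetime(df["dt_int"].astype(str), format="%Y%m%d", errors="coerce")

print(as_text.tolist())
print(as_date.isna().tolist())
```

Without errors="coerce", to_datetime raises on the invalid value instead, which may be preferable if bad data should stop the script.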

How to convert all timestamp values to a numeric value in a dataframe column Python Pandas

I have a pandas data frame which has int, float, and timestamp datatypes. I would like to convert the timestamp values to a numeric value, something like 2018-08-20 18:57:07.797 to 20180820185707797, and store it as a number, replacing the original timestamp column. How can I do this in Python?
Timestamp is stored internally as int64 as the number of nanoseconds from 1970-01-01 (I think). You can get this number by:
df['time'] = pd.to_datetime(df['time']).astype(np.int64)
Or if you really want the format you described, try:
df['time'] = pd.to_datetime(df['time']).dt.strftime('%Y%m%d%H%M%S%f').apply(lambda x: int(x[:-3]))
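A sketch of the second approach on made-up data, using the vectorized .str slice instead of apply. The sample rows and the time_num column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample with millisecond precision, as in the question.
df = pd.DataFrame({"time": ["2018-08-20 18:57:07.797", "2018-08-21 00:00:00.001"]})
ts = pd.to_datetime(df["time"])

# %f always yields six digits (microseconds); dropping the last three
# characters leaves milliseconds, then the digit string is cast to int64.
df["time_num"] = ts.dt.strftime("%Y%m%d%H%M%S%f").str[:-3].astype("int64")

print(df["time_num"].tolist())
```

The result sorts in the same order as the timestamps, but differences between these numbers are not meaningful durations.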

how to reproduce this excel formula in python-pandas

I have a pandas data frame with a column representing dates in the format yyyy-mm-dd. These are sorted oldest to newest. I want to add a column next to it with the difference in time between the date in that row and the previous date.
In Excel this would be something like:
Assuming your "date" column is stored as a datetime64 type, you can just do
df['difference'] = df.date.diff()
Check df.dtypes to ensure the date type is correct first.
solved it
data['lowered'] = data['date'].shift(+1)
data['difference'] = data['date'] - data['lowered']
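Both answers compute the same thing; diff() is shorthand for subtracting the shifted column. A minimal sketch on made-up dates, where the difference_days column is an extra added for illustration:

```python
import pandas as pd

# Hypothetical sample, sorted oldest to newest as in the question.
df = pd.DataFrame({"date": ["2020-01-01", "2020-01-04", "2020-01-10"]})
df["date"] = pd.to_datetime(df["date"])  # make sure the column holds datetimes

# Row minus previous row; the first row has no predecessor, so it is NaT.
df["difference"] = df["date"].diff()

# Same result written out with shift, as in the second answer.
manual = df["date"] - df["date"].shift(1)

# Optional: whole days as numbers (NaN in the first row).
df["difference_days"] = df["difference"].dt.days

print(df)
```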

String date to date (pandas)

I have a dataframe called dfactual with a column ForeCastEndDate, so
dfactual['ForeCastEndDate'] contains:
311205
311205
This must be a date in the format 31-12-2005, but the current format is int64. I tried the following:
dfactual['ForeCastEndDate'] = pd.to_datetime(pd.Series(dfactual['ForecastEndDate']))
I also tried adding the format argument, but it didn't work: the dtype stays the same, int64.
How should I do it?
When no format is given, to_datetime treats integers as offsets from the epoch (nanoseconds by default), so the values are not read as day, month, and year. Convert the dtype to str using astype first, then use to_datetime and pass the format string:
In [154]:
df = pd.DataFrame({'ForecastEndDate':[311205]})
pd.to_datetime(df['ForecastEndDate'].astype(str), format='%d%m%y')
Out[154]:
0 2005-12-31
Name: ForecastEndDate, dtype: datetime64[ns]
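One caveat with ddmmyy stored as an integer: a day below 10 loses its leading zero, so 01-01-06 is stored as 10106. A sketch that pads to six digits before parsing; the second sample value and the as_text column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample: 10106 stands for 01-01-06 with its leading zero dropped.
df = pd.DataFrame({"ForecastEndDate": [311205, 10106]})

# Restore the fixed width of six characters before parsing as ddmmyy.
padded = df["ForecastEndDate"].astype(str).str.zfill(6)
df["ForecastEndDate"] = pd.to_datetime(padded, format="%d%m%y")

# Optional: render as 31-12-2005 style text for display.
df["as_text"] = df["ForecastEndDate"].dt.strftime("%d-%m-%Y")

print(df)
```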
