I'm trying to get a year (1980) into a datetime column in pandas, but I'm getting an error. Anybody know what I'm doing wrong?
import pandas as pd
import datetime
df = pd.read_csv(r'd:\downloads\googlebooks-eng-all-1gram-20120701-a', sep='\t',
header=None, \
names=["word","year","occurred","books"], \
dtype={"word":"str","year":"datetime","occured":"int64","books":"int64"},
parse_dates=True)
df.head()
The error is
TypeError: data type "datetime" not understood
This is a well-documented limitation rather than a bug: "datetime" is not a valid dtype name, and read_csv() does not parse datetime columns through dtype. My suggestion is to:
Remove dtype from pd.read_csv().
-> read_csv() automatically infers the data type of the columns.
Check df.dtypes to make sure you have your preferred data types.
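Now, to explicitly convert the year column to datetime, you can use pd.to_datetime with a format. Without the format, a plain integer such as 1980 is read as nanoseconds since the epoch rather than as a year. For example:
df['year'] = pd.to_datetime(df['year'], format='%Y')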
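To make the steps above concrete, here is a minimal sketch. It assumes a tiny in-memory sample in place of the real Google Books file (the two rows are made up for illustration), lets read_csv infer the types, and then parses the year with an explicit format:

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for the real 1-gram file.
sample = "a\t1980\t10\t5\na\t1981\t12\t6\n"

df = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,
    names=["word", "year", "occurred", "books"],
)

# The year column is inferred as an integer; format="%Y" tells pandas
# to read each value as a four-digit year rather than as nanoseconds.
df["year"] = pd.to_datetime(df["year"], format="%Y")
print(df.dtypes)
```

Each year ends up as a timestamp for 1 January of that year, which is usually enough for sorting and resampling.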
Importing the datetime package alone does not fix this, because dtype expects NumPy/pandas dtype names and "datetime" is not one of them.
Related
How can I convert the following string format of datetime into datetime object to be used in pandas Dataframe? I tried many examples, but it seems my format is different from the standard Pandas datetime object. I know this could be a repetition, but I tried solutions on the Stackexchange, but they don't work!
The code below converts it to a proper datetime:
df = pd.DataFrame({'datetime':['2013-11-1_00:00','2013-11-1_00:10','2013-11-1_00:20']})
df['datetime_changed'] = pd.to_datetime(df['datetime'].str.replace('_','T'))
df.head()
You can use pd.to_datetime with an explicit format:
df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%d_%H:%M')
I am trying to convert a datetime datatype of the form 24/12/2021 07:24:00 to mm-yyyy format which is 12-2021 with datetime datatype. I need the mm-yyyy in datetime format in order to sort the column 'Month-Year' in a time series. I have tried
import pandas as pd
from datetime import datetime
df = pd.read_excel('abc.xlsx')
df['Month-Year'] = df['Due Date'].map(lambda x: x.strftime('%m-%y'))
df.set_index(['ID', 'Month-Year'], inplace=True)
df.sort_index(inplace=True)
df
The column 'Month-Year' does not sort as a time series because it is of object datatype. How do I convert the 'Month-Year' column to a datetime datatype?
I have been able to obtain a solution to the problem.
df['month_year'] = pd.to_datetime(df['Due Date']).dt.to_period('M')
I got this from the link below
https://www.interviewqs.com/ddi-code-snippets/extract-month-year-pandas
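df['Month-Year'] = pd.to_datetime(df['Month-Year'], format='%m-%y')
will convert Month-Year to datetime64[ns], with each value set to the first day of its month. The format matches the '%m-%y' strings produced in the question.
Use it before sorting.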
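A minimal sketch of the to_period('M') approach, assuming made-up rows and the day-first format from the question (24/12/2021 07:24:00). The column names mirror the question; the data is hypothetical:

```python
import pandas as pd

# Hypothetical data; the column names mirror the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Due Date": [
        "24/12/2021 07:24:00",
        "03/01/2022 09:00:00",
        "15/11/2021 18:30:00",
    ],
})

# Parse the day-first strings explicitly, then keep only month and year.
due = pd.to_datetime(df["Due Date"], format="%d/%m/%Y %H:%M:%S")
df["Month-Year"] = due.dt.to_period("M")

# Periods sort chronologically, unlike "12-2021"-style strings.
df = df.sort_values("Month-Year")
print(df["Month-Year"].tolist())
```

If the mm-yyyy spelling is needed for display only, df["Month-Year"].dt.strftime("%m-%Y") gives it back as strings, so it is best applied after sorting.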
I have a date as a string (example: 3/24/2020) that I would like to convert to datetime64[ns] format:
df2['date'] = pd.to_datetime(df1["str_date"], format='%m/%d/%Y')
Using pandas to_datetime on a vaex dataframe results in an error:
ValueError: time data 'str_date' does not match format '%m/%d/%Y' (match)
I have seen a possible duplicate question, which suggests:
df2['pdate']=df2.date.astype('datetime64[ns]')
However, that answer is a type cast. My case requires parsing the string with a format ('%m/%d/%Y') into datetime64[ns], not just a type cast.
Solution: write a custom function, then use .apply
vaex can use the apply function for object operations, so you can convert each date string with datetime and np.datetime64, then apply the function to the column.
import numpy as np
from datetime import datetime
def convert_to_datetime(date_string):
    # The format must match the strings; '%m/%d/%Y' fits dates like 3/24/2020
    return np.datetime64(datetime.strptime(str(date_string), "%m/%d/%Y"))
df['date'] = df.date.apply(convert_to_datetime)
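The per-value conversion can be checked on its own before handing it to apply. This is only a sketch of the parsing step, assuming the '%m/%d/%Y' format from the question; the fmt parameter is added here for illustration, and the vaex side is not exercised:

```python
from datetime import datetime
import numpy as np

def convert_to_datetime(date_string, fmt="%m/%d/%Y"):
    """Parse one date string with an explicit format into numpy.datetime64."""
    return np.datetime64(datetime.strptime(str(date_string), fmt))

# %m and %d also accept values without a leading zero, such as 3/24/2020.
converted = convert_to_datetime("3/24/2020")
print(converted)
```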
I have a date column in a dataset where the dates are like 'Apr-12','Jan-12' format. I would like to change the format to 04-2012,01-2012. I am looking for a function which can do this.
I think I know one guy with the same name. Jokes apart here is the solution to your problem.
There is a built-in function named strptime(), which takes a string and parses it into a datetime object; strftime() then writes that object out in the format you want.
You need to import datetime first, since strptime() is part of Python's datetime package. There is no need to install anything, just import it.
It works like this: datetime.strptime(your_string, format_of_your_string)
# You can also do this: from datetime import * (this imports every name in the datetime module)
from datetime import datetime
date_string = 'Apr-12'
# '%b-%y' describes the input: abbreviated month name, then two-digit year
date_object = datetime.strptime(date_string, '%b-%y')
# '%m-%Y' describes the output: prints 04-2012
print(date_object.strftime('%m-%Y'))
I hope this will work for you. Happy coding :)
You can do following:
import pandas as pd
df = pd.DataFrame({
'date': ['Apr-12', 'Jan-12', 'May-12']
})
pd.to_datetime(df['date'], format='%b-%y')
This will output:
0 2012-04-01
1 2012-01-01
2 2012-05-01
Name: date, dtype: datetime64[ns]
Which means you can update your date column right away:
df['date'] = pd.to_datetime(df['date'], format='%b-%y')
You can chain a couple of pandas methods together to get the desired output:
df = pd.DataFrame({'date_fmt':['Apr-12','Jan-12']})
df
Input dataframe:
date_fmt
0 Apr-12
1 Jan-12
Use pd.to_datetime chained with the .dt accessor and strftime:
pd.to_datetime(df['date_fmt'], format='%b-%y').dt.strftime('%m-%Y')
Output:
0 04-2012
1 01-2012
Name: date_fmt, dtype: object
I've got an imported csv file which has multiple columns with dates in the format "5 Jan 2001 10:20". (Note: the day is not zero-padded.)
If I do df.dtypes then it shows the columns as being objects rather than strings or datetimes. I need to be able to subtract two column values to work out the difference, so I'm trying to get them into a state where I can do that.
At the moment if I try the test subtraction at the end I get the error unsupported operand type(s) for -: 'str' and 'str'.
I've tried multiple methods but have run into a problem every way I've tried.
Any help would be appreciated. If I need to give any more information then I will.
As suggested by @MaxU, you can use the pd.to_datetime() method to convert the values of the given column to the appropriate dtype, like this:
df['datetime'] = pd.to_datetime(df.datetime)
You would have to do this on every column that needs to be transformed to the right dtype.
Alternatively, you can use parse_dates argument of pd.read_csv() method, like this:
df = pd.read_csv(path, parse_dates=[1,2,3])
where columns 1,2,3 are expected to contain data that can be interpreted as dates.
I hope this helps.
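A small sketch of the parse_dates route, assuming an in-memory CSV with made-up column names in place of the real file. It relies on pandas inferring the "5 Jan 2001 10:20" layout by itself:

```python
import io
import pandas as pd

# Hypothetical CSV; the column names are made up for illustration.
csv_text = (
    "id,start,end\n"
    "1,5 Jan 2001 10:20,6 Jan 2001 12:50\n"
    "2,17 Feb 2001 08:00,17 Feb 2001 09:30\n"
)

# parse_dates also accepts column names instead of positions.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["start", "end"])
print(df.dtypes)

# Datetime columns can be subtracted directly.
print(df["end"] - df["start"])
```

If a column still shows up as object afterwards, some value in it could not be parsed, and pd.to_datetime with an explicit format is the easier way to find out why.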
Convert a column to datetime using this approach:
df["Date"] = pd.to_datetime(df["Date"])
If the column has empty or unparseable values, set errors to 'coerce' so that they become NaT instead of raising an error:
df["Date"] = pd.to_datetime(df["Date"], errors='coerce')
After which you should be able to subtract two dates.
example:
import pandas
df = pandas.DataFrame(columns=['to','fr','ans'])
df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
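(df.fr - df.to) / pandas.Timedelta(hours=1)  # difference in hours as floats; astype('timedelta64[h]') is rejected by recent pandas versions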
consult this answer for more details:
Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes
If you want to directly load the column as datetime object while reading from csv, consider this example :
Pandas read csv dateint columns to datetime
I found that the problem was caused by missing values within the column. Using errors='coerce', so df["Date"] = pd.to_datetime(df["Date"], errors='coerce'), solves the problem. (Very old pandas versions spelled this coerce=True, which has since been removed.)
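Putting the pieces together for the "5 Jan 2001 10:20" case, here is a minimal sketch with an explicit format, coerced missing values, and the subtraction. The column names and rows are hypothetical:

```python
import pandas as pd

# Hypothetical columns; the empty string stands in for a missing value.
df = pd.DataFrame({
    "start": ["5 Jan 2001 10:20", "17 Feb 2001 08:00", ""],
    "end": ["6 Jan 2001 12:50", "17 Feb 2001 09:30", "18 Feb 2001 10:00"],
})

fmt = "%d %b %Y %H:%M"  # %d also matches a day without a leading zero
for col in ["start", "end"]:
    # errors="coerce" turns values that cannot be parsed into NaT.
    df[col] = pd.to_datetime(df[col], format=fmt, errors="coerce")

# Subtracting datetime columns yields timedeltas; divide to get hours.
df["hours"] = (df["end"] - df["start"]) / pd.Timedelta(hours=1)
print(df)
```

Rows with a missing date end up with NaN in the hours column, so they can be dropped or inspected afterwards.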