I am trying to use read_csv on a .csv file that contains a date column. The problem is that the date column is in a foreign language (Romanian), with entries like:
'26 septembrie 2017'
'13 iulie 2017'
etc. How can I parse this nicely into a pandas dataframe which has a US date format?
You can pass a converter for that column:
df = pd.read_csv(myfile, converters={'date_column': foreign_date_converter})
But first you have to define the converter to do what you want. This approach uses locale manipulation:
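import datetime
import locale

def foreign_date_converter(text):
    # Temporarily switches LC_TIME to "ro_RO" so strptime recognises Romanian month names
    # (not thread-safe; the ro_RO locale must be installed on the system)
    loc = locale.getlocale(locale.LC_TIME)
    locale.setlocale(locale.LC_TIME, 'ro_RO')
    try:
        # %B matches full month names such as "septembrie" (%b would expect abbreviations)
        date = datetime.datetime.strptime(text, '%d %B %Y').date()
    finally:
        locale.setlocale(locale.LC_TIME, loc)  # restores locale
    return date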
Use the dateparser module.
import dateparser
df = pd.read_csv('yourfile.csv', parse_dates=['date'], date_parser=dateparser.parse)
Put the name of your date column in the parse_dates parameter; here it is assumed to be date.
You may have output like this:
date
0 2017-09-26
1 2017-07-13
If you want to change the format, use strftime:
df['date'] = df.date.dt.strftime(date_format = '%d %B %Y')
output:
date
0 26 September 2017
1 13 July 2017
The easiest solution would be to simply call str.replace(old, new) twelve times, once per month name.
It is not pretty, but you only have to write the function once:
def translator(date_string_with_exactly_one_date):
    date_str = date_string_with_exactly_one_date
    date_str = date_str.replace("iulie", "july")
    date_str = date_str.replace("septembrie", "september")
    # do this 10 more times with the right translation
    return date_str
Now you just have to call it for every entry. After that you can handle it like a US date string. This is not very efficient but it will get the job done and you do not have to search for special libraries.
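The twelve replacements can also be written as a lookup table. Here is a minimal sketch using only the standard library, assuming every entry has the form "day month year" with a full Romanian month name. RO_MONTHS and parse_romanian_date are illustrative names, not part of any library:

```python
from datetime import date

# Full Romanian month names mapped to month numbers.
RO_MONTHS = {
    'ianuarie': 1, 'februarie': 2, 'martie': 3, 'aprilie': 4,
    'mai': 5, 'iunie': 6, 'iulie': 7, 'august': 8,
    'septembrie': 9, 'octombrie': 10, 'noiembrie': 11, 'decembrie': 12,
}

def parse_romanian_date(text):
    """Parse a string such as '26 septembrie 2017' into a datetime.date."""
    day, month_name, year = text.split()
    return date(int(year), RO_MONTHS[month_name.lower()], int(day))

# Format the parsed date in US style (month/day/year).
us_style = parse_romanian_date('26 septembrie 2017').strftime('%m/%d/%Y')
print(us_style)  # 09/26/2017
```

A function like this could probably be passed to read_csv through the converters argument, in the same way as the locale-based converter above, and it does not depend on which locales are installed.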
I have a question: in my output the datetime looks like 2020-11-03, but I need something like this:
13-Nov-20
My code: _weekday = str(_row[1].strftime('%A'))
Any idea?
You can specify the output format by passing a format string to strftime. For example:
"%d %b, %Y" # gives you 30 Nov, 2020
In your case it would be:
"%d-%b-%y"
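As a small illustration of that format string, here is a sketch that uses a hard-coded date in place of the asker's _row[1] value (an assumption, since the surrounding code is not shown). Month and weekday names depend on the active locale; the output shown is for Python's default C locale:

```python
from datetime import date

d = date(2020, 11, 3)

# Day, abbreviated month name, and two-digit year, separated by hyphens.
short = d.strftime('%d-%b-%y')
print(short)  # 03-Nov-20

# '%A' (used in the question) yields only the full weekday name.
weekday = d.strftime('%A')
print(weekday)  # Tuesday
```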
I have the following date (stored as an object dtype) in a pandas Series: Tue 31 Jan
and I am trying to change it into: 31/01/2019
How can I achieve this? I understand that pandas can convert a date string easily when it is unambiguous (like 6/1/1930 22:00), but not in my case, where there is a weekday name and no year.
Thank you for your help.
Concatenate the year and call pd.to_datetime with a custom format:
s = pd.Series(['Tue 31 Jan', 'Mon 20 Feb',])
pd.to_datetime(s + ' 2019', format='%a %d %b %Y')
0 2019-01-31
1 2019-02-20
dtype: datetime64[ns]
This is fine as long as all your dates follow this format. If that is not the case, this cannot be solved reliably.
More information on datetime formats at strftime.org.
Another option is using the 3rd party dateutil library:
import dateutil.parser
s.apply(dateutil.parser.parse)
0 2018-01-31
1 2018-02-20
dtype: datetime64[ns]
It can be installed from PyPI (pip install python-dateutil).
Another, slower but more flexible option is the 3rd party datefinder library, which sniffs dates out of strings containing arbitrary text (if this is what you need):
import datefinder
s.apply(lambda x: next(datefinder.find_dates(x)))
0 2018-01-31
1 2018-02-20
dtype: datetime64[ns]
You can install it from PyPI (pip install datefinder).
Convert to a datetime object
If you wanted to use the datetime module, you could get the year by doing the following:
import datetime as dt
d = dt.datetime.strptime('Tue 31 Jan', '%a %d %b').replace(year=dt.datetime.now().year)
This is taking the date in your format, but replacing the default year 1900 with the current year in a reliable way.
This is similar to the other answers, but uses the builtin replace method as opposed to concatenating a string.
Output
To get the desired output from your new datetime object, you could perform the following:
>>> d.strftime('%d/%m/%Y')
'31/01/2018'
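If the year is known in advance rather than taken from the clock, a variant is to append it to the string before parsing, so strptime never has to fall back to the default year 1900. This is a minimal sketch; add_year_and_reformat is a hypothetical helper name, and the weekday name in the input is parsed but not checked against the actual calendar date:

```python
from datetime import datetime

def add_year_and_reformat(text, year):
    """Turn e.g. 'Tue 31 Jan' plus a year into a 'dd/mm/yyyy' string."""
    parsed = datetime.strptime(f'{text} {year}', '%a %d %b %Y')
    return parsed.strftime('%d/%m/%Y')

print(add_year_and_reformat('Tue 31 Jan', 2019))  # 31/01/2019
```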
Here are two alternative ways to achieve the same result.
Method 1: Using datetime module
from datetime import datetime
datetime_object = datetime.strptime('Tue 31 Jan', '%a %d %b')
print(datetime_object) # outputs 1900-01-31 00:00:00
Without a year in the string, the year defaults to 1900. If the string includes the year, as in Tue 31 Jan 2018, the following code works:
from datetime import datetime
datetime_object = datetime.strptime('Tue 31 Jan 2018', '%a %d %b %Y')
print(datetime_object) # outputs 2018-01-31 00:00:00
To print the resulting date in a format like 31/01/2019, you can use:
print(datetime_object.strftime("%d/%m/%Y")) # outputs 31/01/2018
Here are all the possible formatting options available with datetime object.
Method 2: Using dateutil.parser
This method automatically fills in the Year parameter with current year.
from dateutil import parser
string = "Tue 31 Jan"
date = parser.parse(string)
print(date) # outputs 2018-01-31 00:00:00
I scraped a website and got the following output:
2018-06-07T12:22:00+0200
2018-06-07T12:53:00+0200
2018-06-07T13:22:00+0200
Is there a way I can take the first one and convert it into a DateTime value?
Just parse the string into year, month, day, hour and minute integers and then create a new date time object with those variables.
Check out the datetime docs
You can convert the string to a datetime object with strptime; here %z matches the UTC offset (+0200):
import datetime
dt = "2018-06-07T12:22:00+0200"
ndt = datetime.datetime.strptime(dt, "%Y-%m-%dT%H:%M:%S%z")
# output
2018-06-07 12:22:00+02:00
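The same format string can be applied to all of the scraped values. The sketch below assumes they are collected in a list (the name scraped is illustrative) and also shows how to read the offset or convert the result to UTC:

```python
from datetime import datetime, timezone

scraped = [
    '2018-06-07T12:22:00+0200',
    '2018-06-07T12:53:00+0200',
    '2018-06-07T13:22:00+0200',
]

# %z consumes the '+0200' offset, so each result is timezone-aware.
parsed = [datetime.strptime(text, '%Y-%m-%dT%H:%M:%S%z') for text in scraped]

first = parsed[0]
print(first.utcoffset())               # 2:00:00
print(first.astimezone(timezone.utc))  # 2018-06-07 10:22:00+00:00
```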
The following function (not mine) should help you with what you want:
df['date_column'] = pd.to_datetime(df['date_column'], format = '%d/%m/%Y %H:%M').dt.strftime('%Y%V')
You can mess around with the keys next to the % symbols to achieve what you want. You may, however, need to do some light cleaning of your values before you can use them with this function, i.e. replacing 2018-06-07T12:22:00+0200 with 2018-06-07 12:22.
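The light cleaning mentioned here could look like the sketch below, which assumes every value starts with the fixed-width 'YYYY-MM-DDTHH:MM' prefix. strip_seconds_and_offset is a hypothetical helper name. Note that this discards the seconds and the UTC offset, and that the format string then has to match the cleaned value ('%Y-%m-%d %H:%M'):

```python
from datetime import datetime

def strip_seconds_and_offset(text):
    # Keep the first 16 characters ('YYYY-MM-DDTHH:MM') and swap 'T' for a space.
    return text[:16].replace('T', ' ')

cleaned = strip_seconds_and_offset('2018-06-07T12:22:00+0200')
print(cleaned)  # 2018-06-07 12:22

# The cleaned string parses with a format that mirrors its layout.
parsed = datetime.strptime(cleaned, '%Y-%m-%d %H:%M')
print(parsed)  # 2018-06-07 12:22:00
```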
You can use datetime lib.
from datetime import datetime
datetime_object = datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')
datetime.strptime documentation
Solution here
I have a DataFrame that includes the columns df['date'] and df['time'],
which I have combined into one column named df['datetime'].
The output looks like the following: 2017-04-12 17:30:18.733
My end goal is to have it converted to a string like Wed, 12 Apr 2017 17:30:18 733
When I try different methods such as pd.to_datetime(), it tells me it needs to be a string,
and I can't find a method to turn the whole column into strings.
I tried calling .astype(str) and .apply(str).
Any suggestions?
You are taking two strings (one in the date column and the other in the time column) and joining them with a space to create a new datetime string (e.g. "2017-04-12 17:30:18.733"). You then use strptime to parse this string into a datetime object. The function below handles strings with or without fractional seconds. Finally, strftime formats the datetime object into your desired string format.
from datetime import datetime
import pandas as pd

df = pd.DataFrame({'date': ['2017-04-12', '2017-04-13'],
                   'time': ['17:30:18.733', '07:30:18']})

def date_parser(date_string):
    try:
        # With fractional seconds: %f prints six digits, so drop the last three
        timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S.%f')
        timestamp = timestamp.strftime('%a, %d %b %Y %H:%M:%S %f')[:-3]
    except ValueError:
        # No fractional seconds: append 000 as the millisecond part
        timestamp = datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S')
        timestamp = timestamp.strftime('%a, %d %b %Y %H:%M:%S 000')
    return timestamp

df['datetime_str'] = (df['date'] + ' ' + df['time']).apply(date_parser)
>>> df
date time datetime_str
0 2017-04-12 17:30:18.733 Wed, 12 Apr 2017 17:30:18 733
1 2017-04-13 07:30:18 Thu, 13 Apr 2017 07:30:18 000
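A variant without the try/except is sketched below. It assumes the combined strings are ISO-like ('YYYY-MM-DD HH:MM:SS' with an optional three- or six-digit fraction), which datetime.fromisoformat accepts on Python 3.7 and later. format_with_millis is a hypothetical helper name:

```python
from datetime import datetime

def format_with_millis(date_string):
    """Format e.g. '2017-04-12 17:30:18.733' as 'Wed, 12 Apr 2017 17:30:18 733'."""
    ts = datetime.fromisoformat(date_string)
    # microsecond is 0 when the fraction is absent, giving '000'.
    millis = ts.microsecond // 1000
    return ts.strftime('%a, %d %b %Y %H:%M:%S') + f' {millis:03d}'

print(format_with_millis('2017-04-12 17:30:18.733'))  # Wed, 12 Apr 2017 17:30:18 733
print(format_with_millis('2017-04-13 07:30:18'))      # Thu, 13 Apr 2017 07:30:18 000
```

It could be applied to the combined column in the same way as date_parser above.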
If the datetime column already has a datetime dtype, use something like this:
df['datetime'].dt.strftime('%D %d ...the format you want...')
I have a string and a date format, and I want to get the date based on that format.
If the date format is YYYY.MM.dd and the string is 2017.01.01, it should be transformed into a valid date object.
How can I get the date?
You can use the datetime module, something like this:
from datetime import datetime
date_object = datetime.strptime('2017.01.01', '%Y.%m.%d') # Converting the given date string into a datetime object.
formatted_date = date_object.strftime('%c') #User Defined Output Format
print(formatted_date)
This will result in :
Sun Jan 1 00:00:00 2017
You can refer to the documentation here.
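If the format really arrives as a pattern such as YYYY.MM.dd rather than as strptime directives, it has to be translated first. This is a rough sketch that assumes only the three tokens from the question (YYYY, MM, dd) occur. PATTERN_TO_DIRECTIVE and parse_with_pattern are illustrative names, not an existing API:

```python
from datetime import date, datetime

# Pattern tokens from the question mapped to strptime directives.
PATTERN_TO_DIRECTIVE = [('YYYY', '%Y'), ('MM', '%m'), ('dd', '%d')]

def parse_with_pattern(text, pattern):
    """Parse text according to a pattern such as 'YYYY.MM.dd'."""
    fmt = pattern
    for token, directive in PATTERN_TO_DIRECTIVE:
        fmt = fmt.replace(token, directive)
    return datetime.strptime(text, fmt).date()

print(parse_with_pattern('2017.01.01', 'YYYY.MM.dd'))  # 2017-01-01
```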