I have a table with a Date column and several country-specific columns (see the picture below). I want to create a heatmap in Seaborn, but for that I need the Date column to be a datetime object. How can I change the dates from the current format (e.g. 2021Q3) to 2021-09-01 (YYYY-MM-DD)?
I have tried the solution below (which works for monthly data - to_date = lambda x: pd.to_datetime(x['Date'], format='%YM%m')), but it does not work for the quarterly data. I get a ValueError: 'q' is a bad directive in format '%YQ%q'... I could not find any solution to the error online...
# loop to transform the Date column's format
to_date = lambda x: pd.to_datetime(x['Date'], format='%YQ%q')
df_eurostat_reg_bank_x = df_eurostat_reg_bank.assign(Date=to_date)
I have also tried this solution, but I get the first month of the quarter in return, whereas I want the last month of the quarter:
df_eurostat_reg_bank['Date'] = df_eurostat_reg_bank['Date'].str.replace(r'(\d+)(Q\d)', r'\1-\2')
df_eurostat_reg_bank['Date'] = pd.PeriodIndex(df_eurostat_reg_bank.Date, freq='Q').to_timestamp()
df_eurostat_reg_bank.Date = df_eurostat_reg_bank.Date.dt.strftime('%m/%d/%Y')
df_eurostat_reg_bank = df_eurostat_reg_bank.set_index('Date')
Thank you in advance!
I assume that your example of 2021Q3 is a string, on the grounds that it's not a date format that I recognise.
If so, simple arithmetic and an f-string do the job without any external modules:
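def convdate(d):
    # d looks like '2022Q3': characters 0-3 are the year, character 5 is the quarter.
    # Quarter q ends in month 3*q, which is the month the question asks for.
    return f'{d[:4]}-{int(d[5]) * 3:02d}-01'

for d in ['2022Q1', '2022Q2', '2022Q3', '2022Q4']:
    print(convdate(d))
Output:
2022-03-01
2022-06-01
2022-09-01
2022-12-01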
Note:
There is no attempt to ensure that the input string to convdate() is valid
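Since the question asks for a real datetime column holding the first day of each quarter's last month (2021Q3 becomes 2021-09-01), a pandas-only route may also be worth considering. The following is a minimal sketch, not the asker's actual data: the small `df` is made up for illustration. It parses the strings as quarterly periods, converts each quarter to its last month with `asfreq`, and then takes the start of that month.

```python
import pandas as pd

# Hypothetical sample standing in for the real table.
df = pd.DataFrame({"Date": ["2021Q1", "2021Q2", "2021Q3", "2021Q4"]})

# Parse strings such as '2021Q3' as quarterly periods.
quarters = pd.PeriodIndex(df["Date"], freq="Q")

# how="end" picks the last month of each quarter; to_timestamp() then
# returns the first day of that month (its default is the period start).
df["Date"] = quarters.asfreq("M", how="end").to_timestamp()

print(df["Date"].dt.strftime("%Y-%m-%d").tolist())
```

The column stays a datetime64 column, so it can be used as an index or axis directly; `strftime` is only applied here for printing.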
My DataFrame 'df' shows the 'Date' value as 1970-01-01 00:00:00.019990103 after it is converted with pandas' to_datetime. How do I show the date as 01/03/1999?
Consider LoneWanderer's comment for next time and show some of the code that you have tried.
I would try this:
from datetime import datetime
now = datetime.now()
print(now.strftime('%d/%m/%Y'))
Printing now on its own shows the default format, which matches what you have; strftime then produces the format you need.
I see that the actual date is in the last 8 characters of your source string.
To convert such strings to a Timestamp (ignoring the starting part), run:
df.Date = df.Date.apply(lambda src: pd.to_datetime(src[-8:]))
Consider keeping this column as a Timestamp, as that simplifies
date and time operations, and applying your formatting only when printing.
But if you want the date as a string in "your" format, in the
"original" column, perform a second conversion (Timestamp to string):
df.Date = df.Date.dt.strftime('%m/%d/%Y')
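A value like 1970-01-01 00:00:00.019990103 is what `pd.to_datetime` produces when it is handed the integer 19990103, because integers are read as nanoseconds since the epoch. Assuming that is what happened here (the question does not show the raw column), a sketch that parses the raw values directly could look like this. The second value, 20000215, is invented for illustration.

```python
import pandas as pd

# Assumed raw data: dates stored as integers in YYYYMMDD form.
df = pd.DataFrame({"Date": [19990103, 20000215]})

# Without a format, integers are treated as nanoseconds since 1970-01-01.
misparsed = pd.to_datetime(df["Date"])
print(misparsed.iloc[0])  # 1970-01-01 00:00:00.019990103

# Converting to strings and giving an explicit format parses the real date.
parsed = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

# Format only for display; keep `parsed` for any further date arithmetic.
formatted = parsed.dt.strftime("%m/%d/%Y")
print(formatted.tolist())  # ['01/03/1999', '02/15/2000']
```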
I work with data from a data logger, and its timestamp format is not recognised as a datetime in the pandas DataFrame.
I would like to convert this timestamp into a format pandas knows and then convert the datetime into seconds, starting at 0.
>>>df.time
0 05/20/2019 19:20:27:374
1 05/20/2019 19:20:28:674
2 05/20/2019 19:20:29:874
3 05/20/2019 19:20:30:274
Name: time, dtype: object
I tried to convert the column from object to datetime64[ns], using either %m or %b for the month:
df_time = pd.to_datetime(df["time"], format = '%m/%d/%y %H:%M:%S:%MS')
df_time = pd.to_datetime(df["time"], format = '%b/%d/%y %H:%M:%S:%MS')
Both attempts fail with the error: redefinition of group name 'M' as group 7; was group 5 at position 155
I also tried to reduce the data set and strip the milliseconds, without success:
df['time'] = pd.to_datetime(df['time'],).str[:-3]
ValueError: ('Unknown string format:', '05/20/2019 19:20:26:383')
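Or is it possible to just subtract the first value from all the other values in the time column?
Use '%m/%d/%Y %H:%M:%S:%f' as the format instead of '%m/%d/%y %H:%M:%S:%MS'. %Y matches the four-digit year (%y expects two digits), and %f matches the fractional seconds. %MS is read as %M (minutes) followed by a literal S, and because %M already appears earlier in the format, it triggers the "redefinition of group name 'M'" error.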
Here is the format documentation for future reference
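To cover the second half of the question (seconds starting at 0), one option is to subtract the first parsed timestamp from the whole column and read the result with `.dt.total_seconds()`. This is a minimal sketch that rebuilds the four sample rows from the question; the variable names `parsed` and `elapsed` are illustrative.

```python
import pandas as pd

# The four sample rows from the question.
df = pd.DataFrame({"time": [
    "05/20/2019 19:20:27:374",
    "05/20/2019 19:20:28:674",
    "05/20/2019 19:20:29:874",
    "05/20/2019 19:20:30:274",
]})

# %f reads the digits after the last colon as fractional seconds.
parsed = pd.to_datetime(df["time"], format="%m/%d/%Y %H:%M:%S:%f")

# Subtracting the first timestamp gives timedeltas; total_seconds()
# turns them into floats that start at 0.
elapsed = (parsed - parsed.iloc[0]).dt.total_seconds()
print(elapsed.round(3).tolist())
```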
I am not exactly sure what you are looking for, but you can use the format above to parse your data and then drop parts of the result, such as the microseconds, by slicing the string:
date = str(datetime.now())
print(date)
2019-07-28 14:04:28.986601
print(date[11:-7])
14:04:28
time = date[11:-7]
print(time)
14:04:28
I am trying to do a simple test of pandas' capabilities for handling dates and formats.
For that I have created a dataframe with the values below:
df = pd.DataFrame({'date1' : ['10-11-11','12-11-12','10-10-10','12-11-11',
'12-12-12','11-12-11','11-11-11']})
Here I am assuming that the values are dates, and I am converting them into a proper format using pandas' to_datetime function.
df['format_date1'] = pd.to_datetime(df['date1'])
print(df)
Out[3]:
date1 format_date1
0 10-11-11 2011-10-11
1 12-11-12 2012-12-11
2 10-10-10 2010-10-10
3 12-11-11 2011-12-11
4 12-12-12 2012-12-12
5 11-12-11 2011-11-12
6 11-11-11 2011-11-11
Here, pandas reads the dates in the dataframe as month-day-year (MM-DD-YY) and converts them to its native format (YYYY-MM-DD). I want to check whether pandas can accept a hint that the input format is actually year-month-day (YY-MM-DD) and then convert it to its native format. This would change the value in row 5. To do this, I ran the following code, but it gives me an error.
df3['format_date2'] = pd.to_datetime(df3['date1'], format='%Y/%m/%d')
ValueError: time data '10-10-10' does not match format '%Y/%m/%d' (match)
I have seen a similar solution here, but I was hoping for a simpler, more concise answer.
%Y in the format specifier matches a 4-digit year (e.g. 2016), whereas %y matches a 2-digit year (e.g. 16, meaning 2016). Change %Y to %y.
Also, your format uses slashes, but the data is separated by dashes, so the format needs to be %y-%m-%d.
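The effect of the directive order is easy to check on row 5 of the sample data ('11-12-11'). The sketch below uses the standard library's `datetime.strptime` rather than pandas, on the understanding that `pd.to_datetime(..., format=...)` accepts the same directives; the variable names are illustrative.

```python
from datetime import datetime

raw = "11-12-11"  # row 5 of the question's sample data

# Month first: the interpretation pandas guessed in the question.
month_first = datetime.strptime(raw, "%m-%d-%y")

# Year first: two-digit year, then month, then day.
year_first = datetime.strptime(raw, "%y-%m-%d")

print(month_first.date())  # 2011-11-12
print(year_first.date())   # 2011-12-11

# %Y expects a four-digit year, so it rejects the two-digit input.
try:
    datetime.strptime(raw, "%Y-%m-%d")
    four_digit_year_ok = True
except ValueError:
    four_digit_year_ok = False
print(four_digit_year_ok)  # False
```

In pandas the same format string would be passed as `pd.to_datetime(df['date1'], format='%y-%m-%d')`.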
I have a .txt file data-set like this with the date column of interest:
1181206,3560076,2,01/03/2010,46,45,M,F
2754630,2831844,1,03/03/2010,56,50,M,F
3701022,3536017,1,04/03/2010,40,38,M,F
3786132,3776706,2,22/03/2010,54,48,M,F
1430789,3723506,1,04/05/2010,55,43,F,M
2824581,3091019,2,23/06/2010,59,58,M,F
4797641,4766769,1,04/08/2010,53,49,M,F
I want to work out the number of days between each date and 01/03/2010 and replace the date with the days offset {0, 2, 3, 21...} yielding an output like this:
1181206,3560076,2,0,46,45,M,F
2754630,2831844,1,2,56,50,M,F
3701022,3536017,1,3,40,38,M,F
3786132,3776706,2,21,54,48,M,F
1430789,3723506,1,64,55,43,F,M
2824581,3091019,2,114,59,58,M,F
4797641,4766769,1,156,53,49,M,F
I've been trying for ages and it's getting really frustrating. I've tried converting to datetime using the datetime.datetime.strptime('01/03/2010', "%d/%m/%Y").date() method and then subtracting the two dates, but it gives me an output of e.g. '3 days, 0:00:00' and I can't seem to get an output of only the number!
The difference between two dates is a timedelta. Every timedelta instance has a days attribute, which is the integer value you want.
This is fairly simple. Using the code you gave:
date1 = datetime.datetime.strptime('01/03/2010', '%d/%m/%Y').date()
date2 = datetime.datetime.strptime('04/03/2010', '%d/%m/%Y').date()
You get two date objects.
(date2-date1)
will give you the time delta. The mistake you're making is to convert that timedelta to a string. timedelta objects have a days attribute. Therefore, you can get the number of days using it:
(date2-date1).days
This generates the desired output.
Using your input (a bit verbose...)
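#!/usr/bin/env python3
import datetime

# Reference date: 1 March 2010 (written without leading zeros,
# since literals like 03 are a syntax error in Python 3).
d_first = datetime.date(2010, 3, 1)
with open('input') as fd:
    for line in fd:
        # The date is the fourth comma-separated field, in DD/MM/YYYY form.
        date = line.split(',')[3]
        day, month, year = date.split('/')
        d = datetime.date(int(year), int(month), int(day))
        diff = d - d_first
        print(diff.days)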
Gives
0
2
3
21
64
114
156
Have a look at PLEAC; it has a lot of date examples in Python.
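The question asks for the whole line with the date replaced by the offset, not just the offsets on their own. A minimal sketch of that step, combining `strptime` with the `days` attribute, might look like the following. The helper name `replace_date_with_offset` and the in-memory `rows` list are illustrative; in practice the lines would come from the .txt file.

```python
from datetime import datetime

# Reference date that every row is measured against.
REFERENCE = datetime.strptime("01/03/2010", "%d/%m/%Y").date()

def replace_date_with_offset(line, date_field=3):
    """Return the CSV line with its date field replaced by days since REFERENCE."""
    fields = line.rstrip("\n").split(",")
    d = datetime.strptime(fields[date_field], "%d/%m/%Y").date()
    # Subtracting two dates gives a timedelta; .days is the plain integer.
    fields[date_field] = str((d - REFERENCE).days)
    return ",".join(fields)

# Three of the rows from the question.
rows = [
    "1181206,3560076,2,01/03/2010,46,45,M,F",
    "3786132,3776706,2,22/03/2010,54,48,M,F",
    "4797641,4766769,1,04/08/2010,53,49,M,F",
]
converted = [replace_date_with_offset(r) for r in rows]
for row in converted:
    print(row)
```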