One of the columns in my import file contains dates, but when I view the same column in the dataframe it has been converted to integers. How do I convert it back to a date format?
In the data file the column looks like 'Oct-17', but in the dataframe it looks like '43009'. How do I convert the integer back to a date in Python so that my data looks like 'Oct-17'?
I appreciate your help.
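The integers are Excel serial dates. Once the data is read into pandas, you can convert them with xlrd:
import pandas as pd
import xlrd
df = pd.DataFrame({'Date_String': [43009, 43000, 42345, 43134, 43917]})
# datemode 0 selects the 1900-based Excel date system
df['Date'] = df['Date_String'].apply(lambda x: xlrd.xldate.xldate_as_datetime(x, 0))
df['MMMYY'] = df['Date'].apply(lambda x: x.strftime('%b-%y'))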
print(df)
   Date_String       Date   MMMYY
0        43009 2017-10-01  Oct-17
1        43000 2017-09-22  Sep-17
2        42345 2015-12-07  Dec-15
3        43134 2018-02-03  Feb-18
4        43917 2020-03-27  Mar-20
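If you would rather not depend on xlrd, the same conversion can be sketched with pandas alone. This assumes the workbook uses Excel's 1900 date system, for which day 0 is conventionally taken as 1899-12-30 (valid for dates after February 1900); the column names are the ones from the answer above.

```python
import pandas as pd

df = pd.DataFrame({"Date_String": [43009, 43000, 42345, 43134, 43917]})

# Treat each integer as a number of days counted from the Excel epoch.
# Assumption: 1900 date system, so the origin is 1899-12-30.
df["Date"] = pd.to_datetime(df["Date_String"], unit="D", origin="1899-12-30")

# Format as abbreviated month name plus two-digit year, e.g. 'Oct-17'.
df["MMMYY"] = df["Date"].dt.strftime("%b-%y")
print(df)
```

Because this is vectorized, it avoids calling a Python function once per row.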
I have the following DataFrame with a Date column,
0 2021-12-13
1 2021-12-10
2 2021-12-09
3 2021-12-08
4 2021-12-07
...
7990 1990-01-08
7991 1990-01-05
7992 1990-01-04
7993 1990-01-03
7994 1990-01-02
I am trying to find the index for a specific date in this DataFrame using the following code,
# import raw data into DataFrame
df = pd.DataFrame.from_records(data['dataset']['data'])
df.columns = data['dataset']['column_names']
df['Date'] = pd.to_datetime(df['Date'])
# sample date to search for
sample_date = dt.date(2021,12,13)
print(sample_date)
# return index of sample date
date_index = df.index[df['Date'] == sample_date].tolist()
print(date_index)
The output of the program is,
2021-12-13
[]
I can't understand why. I have cast the Date column in the DataFrame to a DateTime and I'm doing a like-for-like comparison.
I have reproduced your DataFrame with a minimal sample. The comparison works if you compare the column against a datetime object rather than a date object, like this:
import pandas as pd
import datetime as dt
df = pd.DataFrame({'Date':['2021-12-13','2021-12-10','2021-12-09','2021-12-08']})
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%Y-%m-%d')
sample_date = dt.datetime.strptime('2021-12-13', '%Y-%m-%d')
date_index = df.index[df['Date'] == sample_date].tolist()
print(date_index)
output:
[0]
The date being searched for was found at index 0 of the DataFrame.
Please let me know if you run into any issues with this.
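As a hedged sketch of why the original attempt returned an empty list: the column holds datetime64 values, and, as the empty result in the question suggests, pandas does not treat those as equal to a plain datetime.date. Two ways to keep the original `sample_date` are shown below; the small frame is made up for illustration.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame(
    {"Date": pd.to_datetime(["2021-12-13", "2021-12-10", "2021-12-09"])}
)
sample_date = dt.date(2021, 12, 13)

# Option 1: promote the date to a Timestamp so both sides are datetime-like.
idx_from_timestamp = df.index[df["Date"] == pd.Timestamp(sample_date)].tolist()

# Option 2: reduce the column to datetime.date objects before comparing.
idx_from_dates = df.index[df["Date"].dt.date == sample_date].tolist()

print(idx_from_timestamp, idx_from_dates)
```

Option 1 keeps the comparison vectorized; option 2 builds an object column and is likely slower on large frames.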
I am working with time series data with pandas and my data frame looks a little bit like this
Date Layer
0 2000-01-01 0.408640
1 2000-01-02 0.842065
2 2000-01-03 1.271810
3 2000-01-04 1.699399
4 2000-01-05 2.128098
... ...
7300 2019-12-27 149.323520
7301 2019-12-28 149.744012
7302 2019-12-29 150.155702
7303 2019-12-30 150.562771
7304 2019-12-31 151.003031
I need to make a column for each year, like this:
2000 2001 2002
0 0.408640 0.415863 0.425689
1 0.852653 0.826542 0.863524
... ... ...
364 156.235978 158.564578 152.135689
365 156.685421 158.924556 152.528978
Is there a way I can manage to do that? The resulting data can be in a new data frame
The approach for this will be to create separate year and day of year columns, and then create a pivot table:
#Convert Date column to pandas datetime if you haven't already:
df['Date'] = pd.to_datetime(df['Date'])
#Create year column
df['Year'] = df['Date'].dt.year
#Create day of year column
df['DayOfYear'] = df['Date'].dt.dayofyear
#create pivot table in new dataframe
df2 = pd.pivot_table(df, index = 'DayOfYear', columns = 'Year', values = 'Layer')
This won't look exactly like your desired output because the index will be numbered from 1 (up to 366 in leap years) and will have a name, rather than starting at 0. If you want a plain 0-based index, you can add:
df2 = df2.reset_index(drop=True)
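Here is a self-contained sketch of the pivot steps on synthetic data. The `Layer` values are made up (a simple running count), and the name `wide` is only illustrative. Note that 2000 is a leap year, so the shorter years end with a NaN in the last row.

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering three full years (values are placeholders).
dates = pd.date_range("2000-01-01", "2002-12-31", freq="D")
df = pd.DataFrame({"Date": dates, "Layer": np.arange(len(dates), dtype=float)})

df["Year"] = df["Date"].dt.year
df["DayOfYear"] = df["Date"].dt.dayofyear

# One row per day of year, one column per year.
wide = pd.pivot_table(df, index="DayOfYear", columns="Year", values="Layer")

# Switch to a 0-based index and drop the 'Year' label on the column axis.
wide = wide.reset_index(drop=True)
wide.columns.name = None
print(wide.head())
```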
I need to convert a column of integers in a df to dates in the current year, in the format YYYY-MM-DD HH:MM:SS. I have a DF that looks like this:
Date LT Mean
0 7 5.491916
1 8 5.596823
2 9 5.793934
3 10 7.501096
4 11 8.152358
5 12 8.426322
And, I need it to look like this using the current year 2020:
Date LT Mean
0 2020-07-01 5.491916
1 2020-08-01 5.596823
2 2020-09-01 5.793934
3 2020-10-01 7.501096
4 2020-11-01 8.152358
5 2020-12-01 8.426322
I have not been able to find a reference for converting a single integer that represents the date into the yyyy-mm-dd hh:mm:ss format I need. Thank you.
You can use the pandas to_datetime function. Assuming your Date column represents the month, you can use it like this:
df['Date'] = pd.to_datetime(df["Date"], format='%m').apply(lambda dt: dt.replace(year=2020))
Then, if you need to transform the column to a string in the specified format (note that %M and %S are minutes and seconds, while %m is the month):
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d %H:%M:%S')
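An alternative sketch that avoids the per-row `replace` call is to assemble the dates from year, month, and day components, which `pd.to_datetime` accepts as a DataFrame with those column names. The year 2020 and day 1 are taken from the desired output in the question; `Date_str` is a hypothetical column name used only here.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": [7, 8, 9, 10, 11, 12],
        "LT Mean": [5.491916, 5.596823, 5.793934, 7.501096, 8.152358, 8.426322],
    }
)

# Build the date from its parts: fixed year, month from the column, day 1.
parts = pd.DataFrame({"year": 2020, "month": df["Date"], "day": 1})
df["Date"] = pd.to_datetime(parts)

# Optional string version in YYYY-MM-DD HH:MM:SS form.
df["Date_str"] = df["Date"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(df)
```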
I have a dataframe that contains a column with dates e.g. 24/07/15 etc
Is there a way to create a new column into the dataframe that displays all the days of the week corresponding to the already existing 'Date' column?
I want the output to appear as:
[Date][DayOfTheWeek]
This might work:
If you want day name:
In [1405]: df
Out[1405]:
dates
0 24/07/15
1 25/07/15
2 26/07/15
In [1406]: df['dates'] = pd.to_datetime(df['dates'], format='%d/%m/%y') # An explicit format avoids day/month ambiguity.
In [1408]: df['dow'] = df['dates'].dt.day_name()
In [1409]: df
Out[1409]:
dates dow
0 2015-07-24 Friday
1 2015-07-25 Saturday
2 2015-07-26 Sunday
If you want the day of the month as a number (for the weekday number, use .dt.dayofweek instead, where Monday is 0):
In [1410]: df['dow'] = df['dates'].dt.day
In [1411]: df
Out[1411]:
dates dow
0 2015-07-24 24
1 2015-07-25 25
2 2015-07-26 26
I would try the apply function, so something like this:
def extractDayOfWeek(dateString):
...
df['DayOfWeek'] = df.apply(lambda x: extractDayOfWeek(x['Date']), axis=1)
The idea is that you map over every row, extract the 'Date' column, and apply your own function to it to fill a new column named 'DayOfWeek'.
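The body of the function is left open above. One possible sketch, assuming the dates are strings in dd/mm/yy form as in the question, is shown below; `extract_day_of_week` is a hypothetical helper, not the answerer's implementation.

```python
from datetime import datetime

import pandas as pd


def extract_day_of_week(date_string):
    """Hypothetical helper: parse a dd/mm/yy string and return the weekday name."""
    return datetime.strptime(date_string, "%d/%m/%y").strftime("%A")


df = pd.DataFrame({"Date": ["24/07/15", "25/07/15", "26/07/15"]})

# Applying to the single column is enough here; no row-wise axis=1 is needed.
df["DayOfWeek"] = df["Date"].apply(extract_day_of_week)
print(df)
```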
This depends on the type of your Date column. If it holds strings, convert it first:
df['Date']=pd.to_datetime(df['Date'], format="%d/%m/%y")
df['weekday'] = df['Date'].dt.dayofweek
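For reference, a small runnable sketch of this vectorized approach on the sample dates from the question. `dayofweek` numbers the days from Monday = 0 to Sunday = 6; the `weekday_name` column is an illustrative addition.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["24/07/15", "25/07/15", "26/07/15"]})

# Parse day-first strings with an explicit format.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%y")

# Numeric weekday (Monday=0 ... Sunday=6) and the matching name.
df["weekday"] = df["Date"].dt.dayofweek
df["weekday_name"] = df["Date"].dt.day_name()
print(df)
```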
So let's say this is my code:
df = pd.read_table('file_name', sep=';')
today = pd.Timestamp("today").strftime('%d.%m.%y')
df = df[(df['column1'] < today)]
df
Here's the table from the csv file:
Column 1
27.02.2018
05.11.2018
22.05.2018
01.11.2018
01.08.2018
01.08.2018
16.10.2018
22.08.2018
21.11.2018
So as you can see, I imported a table from a csv file. I only need to see dates before today (16.10.2018), but when I run the code this is what I get:
Column 1
05.11.2018
01.11.2018
01.08.2018
01.08.2018
Which means Python is only looking at the days and ignoring the months, and this is wrong. I need it to understand this is a date not just numbers. What do I do to achieve that?
PS I'm new to Python
You should convert your column to the date type, not strings, since strings are compared lexicographically.
You can thus convert it with:
# convert the strings to date(time) objects
df['column1'] = pd.to_datetime(df['column1'], format='%d.%m.%Y')
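Then you can compare it with a pandas Timestamp (recent pandas versions may raise a TypeError when a datetime column is ordered against a plain datetime.date). Run on 16.10.2018, this gives:
>>> today = pd.Timestamp('today').normalize()
>>> df[df['column1'] < today]
column1
0 2018-02-27
2 2018-05-22
4 2018-08-01
5 2018-08-01
7 2018-08-22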
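To make the difference concrete, here is a sketch that runs both comparisons on the dates from the question. It uses a fixed cutoff of 16.10.2018 instead of today's date so the result is reproducible; the `parsed` column and the variable names are illustrative.

```python
import pandas as pd

raw = [
    "27.02.2018", "05.11.2018", "22.05.2018", "01.11.2018", "01.08.2018",
    "01.08.2018", "16.10.2018", "22.08.2018", "21.11.2018",
]
df = pd.DataFrame({"column1": raw})

# String comparison is lexicographic, so only the leading day digits matter.
before_as_text = df[df["column1"] < "16.10.2018"]

# Parse day.month.year explicitly, then compare real timestamps.
df["parsed"] = pd.to_datetime(df["column1"], format="%d.%m.%Y")
cutoff = pd.Timestamp("2018-10-16")
before_as_dates = df[df["parsed"] < cutoff]

print(before_as_text["column1"].tolist())
print(before_as_dates["column1"].tolist())
```

The text comparison keeps the November rows and drops February and May, which matches the surprising result in the question; the parsed comparison keeps only the dates that actually fall before the cutoff.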