Is there a way of parsing the column names themselves as datetime? My column names look like this:
Name SizeRank 1996-06 1996-07 1996-08 ...
I know that I can convert the values of a column to datetime, e.g. for a column named datetime I can do something like this:
temp = pd.read_csv('data.csv', parse_dates=['datetime'])
Is there a way of converting the column names themselves? I have 285 columns, i.e. my data spans 1996 to 2019.
There's no way of doing that directly while reading the data from a file, as far as I know, but you can fairly simply convert the columns to datetime after you've read them in. You just need to watch out that you don't pass columns that don't actually contain a date to the function.
It could look something like this, assuming all columns after the first two are dates (as in your example):
dates = pd.to_datetime(df.columns[2:])
You can then do whatever you need to do with those datetimes.
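For illustration, a minimal sketch with a made-up frame shaped like the question's data (first two columns Name and SizeRank, the rest year-month labels):

```python
import pandas as pd

# Toy frame shaped like the question's data: two non-date columns
# followed by year-month column labels.
df = pd.DataFrame(
    [["CityA", 1, 100, 110]],
    columns=["Name", "SizeRank", "1996-06", "1996-07"],
)

# Convert only the date-like labels; '1996-06' parses to 1996-06-01.
dates = pd.to_datetime(df.columns[2:])
print(dates)
```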
You could do something like this.
df.columns = df.columns[:2].append(pd.to_datetime(df.columns[2:]))
(Index.append is needed here; a plain + between the string labels and the DatetimeIndex would raise a TypeError.)
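A self-contained sketch of that approach, using made-up data in the shape of the question:

```python
import pandas as pd

# Toy frame: two non-date columns followed by year-month labels.
df = pd.DataFrame(
    [["CityA", 1, 100, 110]],
    columns=["Name", "SizeRank", "1996-06", "1996-07"],
)

# Index.append concatenates the untouched labels with the converted
# DatetimeIndex; a plain + would attempt elementwise addition and fail.
df.columns = df.columns[:2].append(pd.to_datetime(df.columns[2:]))
print(df.columns.tolist())
```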
It seems pandas will accept a datetime object as a column name...
import pandas as pd
import re

columns = ["Name", "2019-01-01", "2019-01-02"]
data = [["Tom", 1, 0], ["Dick", 1, 1], ["Harry", 0, 0]]
df = pd.DataFrame(data, columns=columns)
print(df)

# Build a mapping of old -> new column names, converting only the
# labels that look like dates.
newcolumns = {}
for col in df.columns:
    if re.search(r"\d+-\d+-\d+", col):
        newcolumns[col] = pd.to_datetime(col)
    else:
        newcolumns[col] = col
print(newcolumns)

df.rename(columns=newcolumns, inplace=True)
print("--------------------")
print(df)
print("--------------------")
for col in df.columns:
    print(type(col), col)
OUTPUT:
Name 2019-01-01 2019-01-02
0 Tom 1 0
1 Dick 1 1
2 Harry 0 0
{'Name': 'Name', '2019-01-01': Timestamp('2019-01-01 00:00:00'), '2019-01-02': Timestamp('2019-01-02 00:00:00')}
--------------------
Name 2019-01-01 00:00:00 2019-01-02 00:00:00
0 Tom 1 0
1 Dick 1 1
2 Harry 0 0
--------------------
<class 'str'> Name
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2019-01-01 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2019-01-02 00:00:00
For brevity you can use...
newcolumns = {col: (pd.to_datetime(col) if re.search(r"\d+-\d+-\d+", col) else col) for col in df.columns}
df.rename(columns = newcolumns, inplace = True)
Related
I've got the following issue: I have a column in my pandas DataFrame with some dates and some empty values.
Example:
1 - 3-20-2019
2 -
3 - 2-25-2019
etc
I want to convert the format from month-day-year to day-month-year, and when it's empty, I just want to keep it empty.
What is the fastest approach?
Thanks!
One can initialize the data for the days as strings, then convert the strings to datetimes; strftime can then deliver the values in the needed format.
I will use a different output format (with dots as separators), so that the conversion between the steps is clear.
Sample code first:
import pandas as pd
data = {'day': ['3-20-2019', None, '2-25-2019'] }
df = pd.DataFrame( data )
df['day'] = pd.to_datetime(df['day'])
df['day'] = df['day'].dt.strftime('%d.%m.%Y')
df[ df == 'NaT' ] = ''
Comments on the above.
The first instance of df is in the ipython interpreter:
In [56]: df['day']
Out[56]:
0 3-20-2019
1 None
2 2-25-2019
Name: day, dtype: object
After the conversion to datetime:
In [58]: df['day']
Out[58]:
0 2019-03-20
1 NaT
2 2019-02-25
Name: day, dtype: datetime64[ns]
so that we have
In [59]: df['day'].dt.strftime('%d.%m.%Y')
Out[59]:
0 20.03.2019
1 NaT
2 25.02.2019
Name: day, dtype: object
That NaT causes problems, so we replace all of its occurrences with the empty string.
In [73]: df[ df=='NaT' ] = ''
In [74]: df
Out[74]:
day
0 20.03.2019
1
2 25.02.2019
Not sure if this is the fastest way to get it done. Anyway,
df = pd.DataFrame({'Date': {0: '3-20-2019', 1:"", 2:"2-25-2019"}}) #your dataframe
df['Date'] = pd.to_datetime(df.Date) #convert to datetime format
df['Date'] = [d.strftime('%d-%m-%Y') if not pd.isnull(d) else '' for d in df['Date']]
Output:
Date
0 20-03-2019
1
2 25-02-2019
I have a column in a dataframe that contains time in the below format.
Dataframe: df
column: time
value: 07:00:00, 13:00:00 or 14:00:00
The column will have only one of these three values in each row. I want to convert these to 0, 1 and 2. Can you help replace the times with these numeric values?
Current:
df['time'] = [07:00:00, 13:00:00, 14:00:00]
Expected:
df['time'] = [0, 1, 2]
Thanks in advance.
You can use map to do this:
import datetime

# Note: leading-zero literals like 07 are a SyntaxError in Python 3,
# so the times are written as time(7, 0, 0).
mapping = {datetime.time(7, 0, 0): 0,
           datetime.time(13, 0, 0): 1,
           datetime.time(14, 0, 0): 2}
df['time'] = df['time'].map(mapping)
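A self-contained sketch with made-up rows, assuming the column already holds datetime.time objects rather than strings (if it holds strings, map on the strings directly, as the next answer shows):

```python
import datetime
import pandas as pd

# Toy column of time objects, one of the three expected values per row.
df = pd.DataFrame({"time": [datetime.time(7, 0, 0),
                            datetime.time(13, 0, 0),
                            datetime.time(14, 0, 0)]})

# map replaces each time object with its numeric code.
mapping = {datetime.time(7, 0, 0): 0,
           datetime.time(13, 0, 0): 1,
           datetime.time(14, 0, 0): 2}
df["time"] = df["time"].map(mapping)
print(df["time"].tolist())  # [0, 1, 2]
```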
One approach is to use map
Ex:
val = {"07:00:00":0, "13:00:00":1, "14:00:00":2}
df = pd.DataFrame({'time':["07:00:00", "13:00:00", "14:00:00"] })
df["time"] = df["time"].map(val)
print(df)
Output:
time
0 0
1 1
2 2
I want to add the name of index of multi-index dataframe.
I want to set the name shown in the red box in the image to 'Ticker'.
How can I do that?
Set index.names (plural, because it's a MultiIndex) or use rename_axis:

df.index.names = ['Ticker', 'date']
# if you want to keep the existing second level name:
df.index.names = ['Ticker', df.index.names[1]]

Or:

df = df.rename_axis(['Ticker', 'date'])
# if you want to keep the existing second level name:
df = df.rename_axis(['Ticker', df.index.names[1]])
Sample:
mux = pd.MultiIndex.from_product(
    [['NAVER'], ['2018-11-28', '2018-12-01', '2018-12-02']],
    names=[None, 'date'])
df = pd.DataFrame({'open': [1, 2, 3]}, index=mux)
print(df)
open
date
NAVER 2018-11-28 1
2018-12-01 2
2018-12-02 3
df = df.rename_axis(['Ticker','date'])
print (df)
open
Ticker date
NAVER 2018-11-28 1
2018-12-01 2
2018-12-02 3
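For completeness, the index.names assignment from the first variant works on the same sample data:

```python
import pandas as pd

# Same sample as above: the first level name is missing (None).
mux = pd.MultiIndex.from_product(
    [['NAVER'], ['2018-11-28', '2018-12-01', '2018-12-02']],
    names=[None, 'date'])
df = pd.DataFrame({'open': [1, 2, 3]}, index=mux)

# Fill in the missing first level name in place, keeping the second.
df.index.names = ['Ticker', df.index.names[1]]
print(df.index.names)
```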
Every time I append a dataframe to a text file, I want it to contain a column with the same timestamp for each row. The timestamp can be any arbitrary time, as long as it's different from the next time I append a new dataframe to the existing text file. The code below inserts a column named TimeStamp, but doesn't actually insert datetime values; the column is simply empty. I must be overlooking something simple. What am I doing wrong?
t = [datetime.datetime.now().replace(microsecond=0) for i in range(df.shape[0])]
s = pd.Series(t, name = 'TimeStamp')
df.insert(0, 'TimeStamp', s)
I think the simplest is to use insert with a scalar only:
df = pd.DataFrame({'A': list('AAA'), 'B': range(3)}, index=list('xyz'))
print (df)
A B
x A 0
y A 1
z A 2
df.insert(0, 'TimeStamp', pd.to_datetime('now').replace(microsecond=0))
print (df)
TimeStamp A B
x 2018-02-15 07:35:35 A 0
y 2018-02-15 07:35:35 A 1
z 2018-02-15 07:35:35 A 2
Your working version needs two changes: iterate over df.index and, more importantly, build the Series on the DataFrame's index so that insert can align the values row-for-row (with the default RangeIndex the labels don't match and the inserted column comes out empty):

t = [datetime.datetime.utcnow().replace(microsecond=0) for i in df.index]
s = pd.Series(t, index=df.index, name='TimeStamp')
df.insert(0, 'TimeStamp', s)
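A runnable end-to-end sketch of the aligned-Series approach on the sample frame from above:

```python
import datetime
import pandas as pd

df = pd.DataFrame({'A': list('AAA'), 'B': range(3)}, index=list('xyz'))

# Build the Series on df's own index; with the default RangeIndex the
# labels 0, 1, 2 would not match 'x', 'y', 'z' and the inserted column
# would come out empty (all NaT).
t = [datetime.datetime.utcnow().replace(microsecond=0) for i in df.index]
s = pd.Series(t, index=df.index, name='TimeStamp')
df.insert(0, 'TimeStamp', s)
print(df)
```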
I've traditionally used Stata for data analysis, but I've been exploring pandas today. I successfully replicated some analysis I did in Stata, but I'm having a hard time exporting it to excel.
Example of what I'm getting with to_excel():
Column1 Column2
Date
2014-01-01 00:00:00 x a
2014-01-02 00:00:00 y b
2014-01-03 00:00:00 z c
I'd like to align the index so that it's in line with the column headers. Essentially, I'd like to keep the column headers where they are, but shift everything up by one cell.
I want my index to only have the date (YYYY-MM-DD) without the hours, minutes, and seconds (it's always 00:00:00). How do I change my index to only have the date?
Much thanks.
What worked for me was to reset the index so that 'Date' becomes an ordinary column, use the dt.date accessor to keep just the date portion, and pass index=False when writing to excel:
In [34]:
df = df.reset_index()
df['Date'] = df.Date.dt.date
df
Out[34]:
Date Column1 Column2
0 2014-01-01 x a
1 2014-01-02 y b
2 2014-01-03 z c
and then
df.to_excel(r'c:\data\t.xlsx',index=False)
This results in plain YYYY-MM-DD dates in excel (screenshot omitted).
I think the simplest way of handling the pandas DatetimeIndex format is the datetime_format kwarg of pandas ExcelWriter itself:

datetime_format='yyyy-mm-dd'
for example,
import pandas as pd

# Suppose 'df' is a pandas DataFrame with a DatetimeIndex
# (e.g. 2015-04-15 10:15:30) as its index.
writer = pd.ExcelWriter('result.xlsx', engine='xlsxwriter',
                        datetime_format='yyyy-mm-dd')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
For any other cell formatting (batch or individual), use xlsxwriter's add_format(), set_column(), etc.
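A hedged sketch of that (requires the xlsxwriter package; the frame, column index, and number format here are arbitrary examples):

```python
import pandas as pd

# Made-up frame with a DatetimeIndex, as in the answer above.
df = pd.DataFrame(
    {'value': [1.2345, 2.3456]},
    index=pd.to_datetime(['2015-04-15 10:15:30', '2015-04-16 11:20:00']),
)

with pd.ExcelWriter('result.xlsx', engine='xlsxwriter',
                    datetime_format='yyyy-mm-dd') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
    book = writer.book
    sheet = writer.sheets['Sheet1']
    # add_format builds a reusable cell format; set_column applies it
    # (here: two decimal places and width 15 for column B).
    two_dp = book.add_format({'num_format': '0.00'})
    sheet.set_column(1, 1, 15, two_dp)
```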
(1) Mimicking your format:
import pandas as pd
from pandas import *
df = pd.read_csv('input.txt',sep=',',header=None,names=['Date','Column A','Column B'])
df = df.set_index(['Date'])
(2) Doing a reindexing:
df = df.reset_index()
(3) To excel
writer = ExcelWriter('output.xlsx')
df.to_excel(writer,'Sheet1',index=False)
writer.save()
Note:
For the excel writer you will need openpyxl, a breeze to install with pip install openpyxl. Info on this here: https://openpyxl.readthedocs.org/en/latest/.
Alternatively, a write to csv would be more trivial.
Example of implementation in context of steps above in ipython:
In [1]: import pandas as pd
In [2]: from pandas import *
In [3]: # 1. Mimicking your format:
In [4]: df = pd.read_csv('input.txt',sep=',',header=None,names=['Date','Column A','Column B'])
In [5]: print ( df )
Date Column A Column B
0 2014-01-01 00:00:00 x a
1 2014-01-02 00:00:00 y b
2 2014-01-03 00:00:00 z c
In [6]: df = df.set_index(['Date'])
In [7]: print ( df )
Column A Column B
Date
2014-01-01 00:00:00 x a
2014-01-02 00:00:00 y b
2014-01-03 00:00:00 z c
In [8]: ## 2. Doing a reindexing:
In [9]: df = df.reset_index()
In [10]: print ( df )
Date Column A Column B
0 2014-01-01 00:00:00 x a
1 2014-01-02 00:00:00 y b
2 2014-01-03 00:00:00 z c
In [11]: ## 3. To excel
In [12]: writer = ExcelWriter('output.xlsx')
In [13]: df.to_excel(writer,'Sheet1',index=False)
In [14]: writer.save()