Convert column to date format - python

I am trying to convert the dates to a proper date format. I have tested some of the possibilities I have read in the forum, but I still don't know how to tackle this issue.
After importing:
df = pd.read_excel(r'/path/df_datetime.xlsb', sheet_name="12FEB22", engine='pyxlsb')
I get the following df:
{'Unnamed: 0': {0: 'Administrative ID',
1: '000002191',
2: '000002382',
3: '000002434',
4: '000002728',
5: '000002826',
6: '000003265',
7: '000004106',
8: '000004333'},
'Unnamed: 1': {0: 'Service',
1: 'generic',
2: 'generic',
3: 'generic',
4: 'generic',
5: 'generic',
6: 'generic',
7: 'generic',
8: 'generic'},
'Unnamed: 2': {0: 'Movement type',
1: 'New',
2: 'New',
3: 'New',
4: 'Modify',
5: 'New',
6: 'New',
7: 'New',
8: 'New'},
'Unnamed: 3': {0: 'Date',
1: 37503,
2: 37475,
3: 37453,
4: 44186,
5: 37711,
6: 37658,
7: 37770,
8: 37820},
'Unnamed: 4': {0: 'Contract Term',
1: '12',
2: '12',
3: '12',
4: '12',
5: '12',
6: '12',
7: '12',
8: '12'}}
However, even though I have tried to convert the 'Date' column (or 'Unnamed: 3', because the original dataset has no header row, so I have to set the header afterwards) during the import, it has been unsuccessful.
Is there any other option I can try?
Thanks!

try this:
from xlrd import xldate_as_datetime

def trans_date(x):
    if isinstance(x, int):
        return xldate_as_datetime(x, 0).date()
    else:
        return x

print(df['Unnamed: 3'].apply(trans_date))
>>>
0 Date
1 2002-09-04
2 2002-08-07
3 2002-07-16
4 2020-12-21
5 2003-03-31
6 2003-02-06
7 2003-05-29
8 2003-07-18
Name: Unnamed: 3, dtype: object
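An alternative that stays within pandas, in case xlrd is not available: Excel stores these dates as serial day counts with origin 1899-12-30, so pd.to_datetime can convert the integers directly. A minimal sketch, assuming df is the frame shown above (the stray 'Date' header cell becomes NaT):

import pandas as pd

# Coerce non-numeric cells (e.g. the 'Date' header text) to NaN, then
# interpret the remaining integers as Excel serial days.
serial = pd.to_numeric(df['Unnamed: 3'], errors='coerce')
df['Unnamed: 3'] = pd.to_datetime(serial, unit='D', origin='1899-12-30')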


Pandas - Group by multiple columns and datetime

I have a df of tennis results and I would like to be able to see how many days it's been since each player last won a game.
This is what my df looks like
Player 1  Player 2  Date        p1_win  p2_win
Murray    Nadal     2022-05-16  1       0
Nadal     Murray    2022-05-25  1       0
and this is what I want it to look like
Player 1  Player 2  Date        p1_win  p2_win  p1_lastwin  p2_lastwin
Murray    Nadal     2022-05-16  1       0       na          na
Nadal     Murray    2022-05-25  1       0       na          9
The results will have to include the days since the last win, whether the player was Player 1 or Player 2, using groupby I think. It would also be good to have a win percentage for the year, if possible.
Any help is much appreciated.
edit - here is the dict
{'Player 1': {0: 'Murray',
1: 'Nadal',
2: 'Murray',
3: 'Nadal',
4: 'Murray',
5: 'Nadal',
6: 'Murray',
7: 'Nadal',
8: 'Murray',
9: 'Nadal',
10: 'Murray'},
'Player 2': {0: 'Nadal',
1: 'Murray',
2: 'Nadal',
3: 'Murray',
4: 'Nadal',
5: 'Murray',
6: 'Nadal',
7: 'Murray',
8: 'Nadal',
9: 'Murray',
10: 'Nadal'},
'Date': {0: '2022-05-16',
1: '2022-05-26',
2: '2022-05-27',
3: '2022-05-28',
4: '2022-05-29',
5: '2022-06-01',
6: '2022-06-02',
7: '2022-06-05',
8: '2022-06-09',
9: '2022-06-13',
10: '2022-06-17'},
'p1_win': {0: '1',
1: '1',
2: '0',
3: '1',
4: '0',
5: '0',
6: '1',
7: '0',
8: '1',
9: '0',
10: '1'},
'p2_win': {0: '0',
1: '0',
2: '1',
3: '0',
4: '1',
5: '1',
6: '0',
7: '1',
8: '0',
9: '1',
10: '0'}}
Thanks :)
I leveraged pd.merge_asof to find the latest win, and then did a merge to the relevant index.
import pandas as pd

df = pd.DataFrame({'Player 1': {0: 'Murray', 1: 'Nadal', 2: 'Murray', 3: 'Nadal', 4: 'Murray', 5: 'Nadal', 6: 'Murray'}, 'Player 2': {0: 'Nadal', 1: 'Murray', 2: 'Nadal', 3: 'Murray', 4: 'Nadal', 5: 'Murray', 6: 'Nadal'}, 'Date': {0: '2022-05-16', 1: '2022-05-26', 2: '2022-05-27', 3: '2022-05-28', 4: '2022-05-29', 5: '2022-06-01', 6: '2022-06-02'}, 'p1_win': {0: '1', 1: '1', 2: '0', 3: '1', 4: '0', 5: '0', 6: '1'}, 'p2_win': {0: '0', 1: '0', 2: '1', 3: '0', 4: '1', 5: '1', 6: '0'}})
df['p1_win']=df.p1_win.astype(int)
df['p2_win']=df.p2_win.astype(int)
df['Date'] = pd.to_datetime(df['Date'])
df['match'] = [x+'_'+y if x>y else y+'_'+x for x,y in zip(df['Player 1'],df['Player 2'])]
# df['winner'] = np.where(df.p1_win==1,df['Player 1'],df['Player 2'])
# df['looser'] = np.where(df.p1_win==0,df['Player 1'],df['Player 2'])
df = df.reset_index()
df = df.sort_values(by='Date')
df = pd.merge_asof(df,df[df.p1_win==1][['match','Date','index']],by=['match'],on='Date',suffixes=('','_latest_win_p1'),allow_exact_matches=False,direction='backward')
df = pd.merge_asof(df,df[df.p2_win==1][['match','Date','index']],by=['match'],on='Date',suffixes=('','_latest_win_p2'),allow_exact_matches=False,direction='backward')
# df = df[['index','Date','Player 1','Player 2','p1_win','p2_win','match','winner','looser','index_latest_win_p2','index_latest_win_p1']]
df = df.merge(df[['Date','index','match']],how='left',left_on=['index_latest_win_p1','match'],right_on=['index','match'],suffixes=('','_latest_win_winner'))
df = df.merge(df[['Date','index','match']],how='left',left_on=['index_latest_win_p2','match'],right_on=['index','match'],suffixes=('','_latest_win_looser'))
df['days_since_last_win_winner'] = (df['Date']-df.Date_latest_win_winner).dt.days
df['days_since_last_win_looser'] = (df['Date']-df.Date_latest_win_looser).dt.days
Not sure that this is exactly what you meant, but let me know if you need anything else.
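The yearly win percentage the question also mentions is not covered above. A minimal sketch, assuming df holds the posted dict with its original column names: reshape to one row per player per match, then group by player and year.

import pandas as pd

# Stack Player 1 and Player 2 into a single 'player' column with their win flags.
long_form = pd.concat([
    df.rename(columns={'Player 1': 'player', 'p1_win': 'win'})[['player', 'Date', 'win']],
    df.rename(columns={'Player 2': 'player', 'p2_win': 'win'})[['player', 'Date', 'win']],
])
long_form['win'] = long_form['win'].astype(int)
long_form['year'] = pd.to_datetime(long_form['Date']).dt.year
# Mean of the 0/1 win flags per player and year, expressed as a percentage.
win_pct = long_form.groupby(['player', 'year'])['win'].mean().mul(100).round(1)
print(win_pct)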

Start looking at a column position based on a column name and return the next value

Do you know how I can start looking for a specific text, starting at a column whose name is given by the values found in another column?
Let me explain better with an example. Col 8 contains the column names where I have to start looking for the text 'Office'. So, for the first row, Col 8 indicates that I have to start at Col 2. Then I have to find the NEXT 'Office' text and return the value of the next column (always in the same row). Once I get it, I will create a DataFrame containing that next value, Col 4 in this example.
{'Col 1': {0: 3.4, 1: 4.6, 2: 7.6, 3: 3.7, 4: 5.9, 5: 2.5, 6: 2.6},
'Col 2': {0: 'LTE', 1: 'LTE', 2: 'LTE', 3: 'LTE', 4: 'LTE', 5: 'LTE', 6: 'LTE'},
'Col 3': {0: 'Office', 1: 'Office', 2: nan, 3: 'Office', 4: nan, 5: nan, 6: nan},
'Col 4': {0: 1.2, 1: 3.1, 2: 23.0, 3: 11.0, 4: 34.0, 5: 12.0, 6: 123.0},
'Col 5': {0: 'LTE', 1: 'LTE', 2: 'LTE', 3: 'LTE', 4: 'LTE', 5: 'LTE', 6: 'LTE'},
'Col 6': {0: 'Office', 1: nan, 2: 'Office', 3: 'Office', 4: 'Office', 5: 'Office', 6: 'Office'},
'Col 7': {0: 1.2, 1: 6.7, 2: 12.0, 3: 143.0, 4: 674.0, 5: 354.0, 6: 134.0},
'Col 8': {0: 'Col 2', 1: 'Col 2', 2: 'Col 6', 3: 'Col 2', 4: 'Col 6', 5: 'Col 6', 6: 'Col 6'}}
Any ideas on how to deal with this problem?
Output expected:
{'Col1': {0: '3.4', 1: '4.6', 2: '7.6', 3: '3.7', 4: '5.9', 5: '2.5', 6: '2.6'},
'Col 4': {0: 1.2, 1: 3.1, 2: 12.0, 3: 11.0, 4: 674.0, 5: 354.0, 6: 134.0}}
which looks like:
Col1 Col 4
0 3.4 1.2
1 4.6 3.1
2 7.6 12.0
3 3.7 11.0
4 5.9 674.0
5 2.5 354.0
6 2.6 134.0
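One possible row-wise sketch (not vectorized), assuming the posted dict is loaded as df = pd.DataFrame(data); the result column is named 'Col 4' only to match the expected output above:

import numpy as np
import pandas as pd

df = pd.DataFrame(data)  # data is the dict posted above
cols = list(df.columns)

def next_office_value(row):
    # Start at the column named in 'Col 8', scan to the right for the next
    # 'Office' cell, and return the value one column further right.
    start = cols.index(row['Col 8'])
    for i in range(start, len(cols) - 1):
        if row[cols[i]] == 'Office':
            return row[cols[i + 1]]
    return np.nan

result = pd.DataFrame({'Col 1': df['Col 1'], 'Col 4': df.apply(next_office_value, axis=1)})
print(result)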

Python conditional lookup

I have a transactional table and a lookup table as below. I need to add the val field from df_lkp to df_txn by lookup.
For each record of df_txn, I need to loop through df_lkp. If the grp field value is a, then compare only field a in both tables to get a match. If the grp value is ab, then compare fields a and b in both tables. If it is abc, then fields a, b and c should be compared to fetch the val field, and so on. Is there a way this could be done in pandas without a for-loop?
df_txn = pd.DataFrame({'id': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'},
'amt': {0: 100, 1: 200, 2: 300, 3: 400, 4: 500, 5: 600, 6: 700},
'a': {0: '226', 1: '227', 2: '248', 3: '236', 4: '248', 5: '236', 6: '236'},
'b': {0: '0E31', 1: '0E32', 2: '0E40', 3: '0E35', 4: '0E40', 5: '0E40', 6: '0E33'},
'c': {0: '3014', 1: '3015', 2: '3016', 3: '3016', 4: '3016', 5: '3016', 6: '3016'}})
df_lkp = pd.DataFrame({'a': {0: '226', 1: '227', 2: '236', 3: '237', 4: '248'},
'b': {0: '0E31', 1: '0E32', 2: '0E33', 3: '0E35', 4: '0E40'},
'c': {0: '3014', 1: '3015', 2: '3016', 3: '3018', 4: '3019'},
'grp': {0: 'a', 1: 'ab', 2: 'abc', 3: 'b', 4: 'bc'},
'val': {0: 'KE00CH0004', 1: 'KE00CH0003', 2: 'KE67593065', 3: 'KE67593262', 4: 'KE00CH0003'}})
The expected output:
df_tx2 = pd.DataFrame({'id': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'},
'amt': {0: 100, 1: 200, 2: 300, 3: 400, 4: 500, 5: 600, 6: 700},
'a': {0: '226', 1: '227', 2: '248', 3: '236', 4: '248', 5: '236', 6: '236'},
'b': {0: '0E31', 1: '0E32', 2: '0E40', 3: '0E35', 4: '0E40', 5: '0E40', 6: '0E33'},
'c': {0: '3014', 1: '3015', 2: '3016', 3: '3016', 4: '3016', 5: '3016', 6: '3016'},
'val': {0: 'KE00CH0004', 1: 'KE00CH0003', 2: '', 3: '', 4: '', 5: '', 6: 'KE67593065'}
})
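One possible way to avoid a row-by-row loop is to merge once per distinct grp pattern, using the letters of grp as the join keys. A sketch, assuming df_txn and df_lkp as defined above and that each grp group has unique key combinations; note that this also matches the grp 'b' row against id 4, which the posted expected output leaves blank, so that rule may need adjusting:

import pandas as pd

result = df_txn.copy()
result['val'] = ''
for grp, lkp in df_lkp.groupby('grp'):
    keys = list(grp)  # e.g. 'ab' -> ['a', 'b']
    merged = df_txn.merge(lkp[keys + ['val']], on=keys, how='left')
    fill = merged['val'].fillna('')
    # Keep a val found by an earlier pattern, otherwise take this pattern's match.
    result['val'] = result['val'].where(result['val'] != '', fill)
print(result)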

How to export a pandas DataFrame with multi-level columns to Excel

I'm stuck exporting a multi-index DataFrame to Excel in the layout I'm looking for.
This is what I'm looking for in Excel.
I know I have to add an extra index level on the left for the rows SRR (%) and Traction (-), but how?
My code so far.
import pandas as pd
import matplotlib.pyplot as plt
data = {'Step 1': {'Step Typ': 'Traction', 'SRR (%)': {1: 8.384, 2: 9.815, 3: 7.531, 4: 10.209, 5: 7.989, 6: 7.331, 7: 5.008, 8: 2.716, 9: 9.6, 10: 7.911}, 'Traction (-)': {1: 5.602, 2: 6.04, 3: 2.631, 4: 2.952, 5: 8.162, 6: 9.312, 7: 4.994, 8: 2.959, 9: 10.075, 10: 5.498}, 'Temperature': 30, 'Load': 40}, 'Step 3': {'Step Typ': 'Traction', 'SRR (%)': {1: 2.909, 2: 5.552, 3: 5.656, 4: 9.043, 5: 3.424, 6: 7.382, 7: 3.916, 8: 2.665, 9: 4.832, 10: 3.993}, 'Traction (-)': {1: 9.158, 2: 6.721, 3: 7.787, 4: 7.491, 5: 8.267, 6: 2.985, 7: 5.882, 8: 3.591, 9: 6.334, 10: 10.43}, 'Temperature': 80, 'Load': 40}, 'Step 5': {'Step Typ': 'Traction', 'SRR (%)': {1: 4.765, 2: 9.293, 3: 7.608, 4: 7.371, 5: 4.87, 6: 4.832, 7: 6.244, 8: 6.488, 9: 5.04, 10: 2.962}, 'Traction (-)': {1: 6.656, 2: 7.872, 3: 8.799, 4: 7.9, 5: 4.22, 6: 6.288, 7: 7.439, 8: 7.77, 9: 5.977, 10: 9.395}, 'Temperature': 30, 'Load': 70}, 'Step 7': {'Step Typ': 'Traction', 'SRR (%)': {1: 9.46, 2: 2.83, 3: 3.249, 4: 9.273, 5: 8.792, 6: 9.673, 7: 6.784, 8: 3.838, 9: 8.779, 10: 4.82}, 'Traction (-)': {1: 5.245, 2: 8.491, 3: 10.088, 4: 9.988, 5: 4.886, 6: 4.168, 7: 8.628, 8: 5.038, 9: 7.712, 10: 3.961}, 'Temperature': 80, 'Load': 70} }
df = pd.DataFrame(data)
items = list()
series = list()
for item, d in data.items():
    items.append(item)
    series.append(pd.DataFrame.from_dict(d))
df = pd.concat(series, keys=items)
df.set_index(['Step Typ', 'Load', 'Temperature']).T.to_excel('testfile.xlsx')
The picture below shows df.set_index(['Step Typ', 'Load', 'Temperature']).T as a DataFrame (somewhat close, but not exactly what I'm looking for):
Edit 1:
Found a good solution; not the exact one I was looking for, but it's still worth using.
df.reset_index().drop(["level_0","level_1"], axis=1).pivot(columns=["Step Typ", "Load", "Temperature"], values=["SRR (%)", "Traction (-)"]).apply(lambda x: pd.Series(x.dropna().values)).to_excel("solution.xlsx")
Can you explain clearly and show the output you are looking for?
To export a table to Excel use df.to_excel('path', index=True/False), where index=True or False controls whether the index column is written to the file.
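For instance, a minimal illustration (the example frame and file names are hypothetical):

import pandas as pd

example = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
example.to_excel('with_index.xlsx', index=True)       # writes the row index as the first column
example.to_excel('without_index.xlsx', index=False)   # omits the row index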

In pandas, how to ignore invalid values when converting columns from hex to decimal?

When I use:
df[["Type 2", "Type 4"]].applymap(lambda n: int(n, 16))
it stops with an error because of invalid values in the Type 2 column (negative values, NaN, strings...) that cannot be converted from hex. How can I ignore this error or mark the invalid values as zero?
{'Type 1': {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 11, 6: 13, 7: 15, 8: 17},
'Type 2': {0: 'AA',
1: 'BB',
2: 'NaN',
3: '55',
4: '3.14',
5: '-96',
6: 'String',
7: 'FFFFFF',
8: 'FEEE'},
'Type 3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0},
'Type 4': {0: '23',
1: 'fefe',
2: 'abcd',
3: 'dddd',
4: 'dad',
5: 'cfe',
6: 'cf42',
7: '321',
8: '0'},
'Type 5': {0: -120,
1: -120,
2: -120,
3: -120,
4: -120,
5: -120,
6: -120,
7: -120,
8: -120}}
You can create a custom function that handles this exception and use it in your lambda. For example:
def lambda_int(n):
    try:
        return int(n, 16)
    except ValueError:
        return 0

df[["Type 2", "Type 4"]] = df[["Type 2", "Type 4"]].applymap(lambda n: lambda_int(n))
Please go through this; I reconstructed your question and gave steps to follow.
1. The first dictionary you provided does not have a NaN value, it has the string "NaN":
data = {'Type 1': {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 11, 6: 13, 7: 15, 8: 17},
'Type 2': {0: 'AA',
1: 'BB',
2: 'NaN',
3: '55',
4: '3.14',
5: '-96',
6: 'String',
7: 'FFFFFF',
8: 'FEEE'},
'Type 3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0},
'Type 4': {0: '23',
1: 'fefe',
2: 'abcd',
3: 'dddd',
4: 'dad',
5: 'cfe',
6: 'cf42',
7: '321',
8: '0'},
'Type 5': {0: -120,
1: -120,
2: -120,
3: -120,
4: -120,
5: -120,
6: -120,
7: -120,
8: -120}}
import pandas as pd
df = pd.DataFrame(data)
df.head()
To check for NaN values in your df and remove them:
columns_with_na = df.isna().sum()
#filter starting from 1 missing value
columns_with_na = columns_with_na[columns_with_na != 0]
print(len(columns_with_na))
print(len(columns_with_na.sort_values(ascending = False))) #print them in descending order
Prints 0 and 0 because there is no nan
Reconstructed your data to include a nan by using numpy.nan
import numpy as np
#recreated a dataset and included a nan value : np.nan at Type 2
data = {'Type 1': {0: 1, 1: 3, 2: 5, 3: 7, 4: 9, 5: 11, 6: 13, 7: 15, 8: 17},
'Type 2': {0: 'AA',
1: 'BB',
2: np.nan,
3: '55',
4: '3.14',
5: '-96',
6: 'String',
7: 'FFFFFF',
8: 'FEEE'},
'Type 3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0},
'Type 4': {0: '23',
1: 'fefe',
2: 'abcd',
3: 'dddd',
4: 'dad',
5: 'cfe',
6: 'cf42',
7: '321',
8: '0'},
'Type 5': {0: -120,
1: -120,
2: -120,
3: -120,
4: -120,
5: -120,
6: -120,
7: -120,
8: -120}}
df2 = pd.DataFrame(data)
df2.head()
#sum up number of columns with nan
columns_with_na = df2.isna().sum()
#filter starting from 1 missing value
columns_with_na = columns_with_na[columns_with_na != 0]
print(len(columns_with_na))
print(len(columns_with_na.sort_values(ascending = False)))
Prints 1 and 1 because there is a NaN in the Type 2 column
#drop nan values
df2 = df2.dropna(how = 'any')
#sum up number of columns with nan
columns_with_na = df2.isna().sum()
#filter starting from 1 missing value
columns_with_na = columns_with_na[columns_with_na != 0]
print(len(columns_with_na))
#prints 0 because I dropped all the nan values
df2.head()
To fill nan in df with 0 use:
df2.fillna(0, inplace = True)
Fill in nan with 0 in df2['Type 2'] only:
#if you don't want to change the original dataframe, set inplace to False
df2['Type 2'].fillna(0, inplace = True) #inplace is set to True to change the original df
