Merge two dataframes with python

I have two dataframes: dfDepas and df7.
dfDepas.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 0 to 6
Data columns (total 4 columns):
day_of_week 7 non-null object
P_ACT_KW 7 non-null float64
P_SOUSCR 7 non-null float64
depassement 7 non-null float64
dtypes: float64(3), object(1)
memory usage: 280.0+ bytes
df7.info()
<class 'pandas.core.frame.DataFrame'>
Index: 7 entries, Fri to Thurs
Data columns (total 6 columns):
ACT_TIME_AERATEUR_1_F1 7 non-null float64
ACT_TIME_AERATEUR_1_F3 7 non-null float64
ACT_TIME_AERATEUR_1_F5 7 non-null float64
ACT_TIME_AERATEUR_1_F6 7 non-null float64
ACT_TIME_AERATEUR_1_F7 7 non-null float64
ACT_TIME_AERATEUR_1_F8 7 non-null float64
dtypes: float64(6)
memory usage: 392.0+ bytes
I am trying to merge these two dataframes on day_of_week, which is a column in dfDepas and the index of df7.
I don't know how to use this: merged_df = pd.merge(dfDepas, df7, how='inner', on=['day_of_week'])
Any ideas?
Thank you
Kind regards
EDIT
dfDepas
day_of_week P_ACT_KW P_SOUSCR depassement
Fri 157.258929 427.142857 0.0
Mon 157.788110 426.875000 0.0
Sat 166.989236 426.875000 0.0
Sun 149.676215 426.875000 0.0
Thurs 157.339286 427.142857 0.0
Tues 151.122913 427.016021 0.0
Weds 159.569444 427.142857 0.0
df7
ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3 ACT_TIME_AERATEUR_1_F5 ACT_TIME_AERATEUR_1_F6 ACT_TIME_AERATEUR_1_F7 ACT_TIME_AERATEUR_1_F8
Fri 0.326258 0.330253 0.791144 0.654682 3.204544 1.008550
Sat -0.201327 -0.228196 0.044616 0.184003 -0.579214 0.292886
Sun 5.068735 5.250199 5.407271 5.546657 7.823564 5.786713
Mon -0.587129 -0.559986 -0.294890 -0.155503 2.013379 -0.131496
Tues -1.244922 -1.510025 -0.788717 -1.098790 -0.996845 -0.718881
Weds -3.264598 -3.391776 -3.188409 -3.041306 -4.846189 -4.668533
Thurs -0.178179 0.011002 -1.907544 -2.084516 -6.119337

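You can use reset_index and rename the resulting column to day_of_week for matching. Because df7's index has no name, reset_index puts it in a column called index (not 0):
merged_df = pd.merge(dfDepas,
                     df7.reset_index().rename(columns={'index': 'day_of_week'}),
                     on=['day_of_week'])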
Thanks to Quickbeam2k1 for another solution, which joins index to index:
merged_df = pd.merge(dfDepas.set_index('day_of_week'),
                     df7,
                     left_index=True,
                     right_index=True)
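To check both approaches end to end, here is a minimal sketch on a trimmed copy of the question's data (two days and one column per frame, chosen only for illustration). It also shows why the rename key is "index": that is the column name reset_index gives an unnamed index.

```python
import pandas as pd

# Trimmed copies of the question's frames (two days, one value column each).
dfDepas = pd.DataFrame({"day_of_week": ["Fri", "Mon"],
                        "P_ACT_KW": [157.258929, 157.788110]})
df7 = pd.DataFrame(
    {"ACT_TIME_AERATEUR_1_F1": [0.326258, -0.587129]},
    index=["Fri", "Mon"],  # unnamed index, like df7 in the question
)

# Option 1: reset_index() turns the unnamed index into a column called "index".
right = df7.reset_index().rename(columns={"index": "day_of_week"})
merged_cols = pd.merge(dfDepas, right, on="day_of_week")

# Option 2: move day_of_week into dfDepas's index, then join index to index.
merged_idx = pd.merge(dfDepas.set_index("day_of_week"), df7,
                      left_index=True, right_index=True)

print(merged_cols)
print(merged_idx)
```

Option 1 keeps day_of_week as an ordinary column with a fresh integer index; option 2 leaves it as the index of the result.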

Related

read_json not showing column as datetime64[ns]

I'm having a lot of trouble getting a column of a pandas dataframe to be read back as datetime64[ns] after saving the dataframe as JSON. I've tried pretty much everything I've seen online: pd.datetime(coerce, format), astype(datetime64[ns]), dateformat = 'iso', etc.
This is strange and very frustrating, because all my other dataframes with date columns that were saved as JSON files are read back correctly with dtype datetime64[ns].
I would really appreciate some help.
Here are the last few lines of my code, where I create the dataframe, and what it returns:
player = pd.DataFrame(full, index = list(range(len(full))), columns = ['Name', 'Handedness', 'Height', 'Bday'])
player.Height = player.Height.str[:-2]
player.Height = pd.to_numeric(player.Height)
player.Bday = pd.to_datetime(player.Bday, format = '%d/%m/%Y')
player = player.reset_index(drop = True)
player.to_json(f'../../Datasets/Singles_players/Player_Traits/{Event}_players.json', date_format = 'iso')
print(player.info())
print(player.head())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25 entries, 0 to 24
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 25 non-null object
1 Handedness 25 non-null object
2 Height 25 non-null float64
3 Bday 25 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 928.0+ bytes
None
Name Handedness Height Bday
0 KENTO MOMOTA Left 175.0 1994-09-01
1 VIKTOR AXELSEN Right 194.0 1994-01-04
2 ANDERS ANTONSEN Right 183.0 1997-04-27
3 CHOU TIEN CHEN Right 180.0 1990-01-08
4 ANTHONY SINISUKA GINTING Right 171.0 1996-05-11
All good BUT here is what happens when I read the file:
player = pd.read_json('../Datasets/Singles_Players/Player_Traits/MS_players.json')
print(player.info())
print(player.head())
<class 'pandas.core.frame.DataFrame'>
Int64Index: 25 entries, 0 to 24
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 25 non-null object
1 Handedness 25 non-null object
2 Height 25 non-null int64
3 Bday 25 non-null object
dtypes: int64(1), object(3)
memory usage: 1000.0+ bytes
None
Name Handedness Height Bday
0 KENTO MOMOTA Left 175 1994-09-01T00:00:00.000Z
1 VIKTOR AXELSEN Right 194 1994-01-04T00:00:00.000Z
2 ANDERS ANTONSEN Right 183 1997-04-27T00:00:00.000Z
3 CHOU TIEN CHEN Right 180 1990-01-08T00:00:00.000Z
4 ANTHONY SINISUKA GINTING Right 171 1996-05-11T00:00:00.000Z
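One likely cause: with the default convert_dates=True, read_json only parses columns whose names look date-like (for example "date", "modified", or names ending in "_at" or "_time"), and "Bday" is not one of them. Passing the column name in convert_dates asks read_json to parse it. A minimal sketch, assuming a two-row stand-in for the player frame and an in-memory string instead of the file:

```python
from io import StringIO

import pandas as pd

# Two-row stand-in for the question's `player` frame.
player = pd.DataFrame({
    "Name": ["KENTO MOMOTA", "VIKTOR AXELSEN"],
    "Height": [175.0, 194.0],
    "Bday": pd.to_datetime(["01/09/1994", "04/01/1994"], format="%d/%m/%Y"),
})

# Same to_json call as in the question, kept in memory instead of a file.
json_str = player.to_json(date_format="iso")

# Default read: "Bday" is not a date-like name, so it stays as ISO strings.
plain = pd.read_json(StringIO(json_str))

# Listing the column in convert_dates makes read_json parse it as dates.
parsed = pd.read_json(StringIO(json_str), convert_dates=["Bday"])

print(plain.dtypes)
print(parsed.dtypes)
```

Because the ISO strings end in Z, some pandas versions may return the column as timezone-aware UTC; converting afterwards with pd.to_datetime(player['Bday']) is an alternative if you prefer to keep read_json's defaults.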

Pandas dataframe adding zero-padding before the datetime

I'm using pandas and have a DataFrame df like the following:
time id
-------------
5:13:40 1
16:20:59 2
...
For the first row, the time 5:13:40 has no leading zero, and I want to convert it to 05:13:40. So my expected df would be:
time id
-------------
05:13:40 1
16:20:59 2
...
The type of time is <class 'datetime.timedelta'>. Could anyone give me some hints on how to handle this? Thanks so much!
Use pd.to_timedelta:
df['time'] = pd.to_timedelta(df['time'])
Before:
print(df)
time id
1 5:13:40 1.0
2 16:20:59 2.0
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 1 to 2
Data columns (total 2 columns):
time 2 non-null object
id 2 non-null float64
dtypes: float64(1), object(1)
memory usage: 48.0+ bytes
After:
print(df)
time id
1 05:13:40 1.0
2 16:20:59 2.0
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 1 to 2
Data columns (total 2 columns):
time 2 non-null timedelta64[ns]
id 2 non-null float64
dtypes: float64(1), timedelta64[ns](1)
memory usage: 48.0 bytes
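How a timedelta64 column is printed depends on the pandas version, and newer releases may show 0 days 05:13:40 rather than 05:13:40. If what you actually need is a zero-padded HH:MM:SS string (for display or export), a minimal sketch is to format each value yourself; fmt_hms and time_str are hypothetical names introduced here.

```python
import pandas as pd

df = pd.DataFrame({"time": ["5:13:40", "16:20:59"], "id": [1, 2]})
df["time"] = pd.to_timedelta(df["time"])  # same conversion as in the answer


def fmt_hms(td):
    """Hypothetical helper: format a timedelta as zero-padded HH:MM:SS.

    Hours are not wrapped at 24, so a value over one day prints e.g. 26:00:00.
    Works for pandas Timedelta and plain datetime.timedelta objects.
    """
    total = int(td.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Keep the real timedelta column and add a separate string column for display.
df["time_str"] = df["time"].map(fmt_hms)
print(df)
```

Keeping time as timedelta64 preserves arithmetic and comparisons; the string column is only for presentation.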

Using set_index within a custom function

I would like to convert the date observations from a column into the index for my dataframe. I am able to do this with the code below:
Sample data:
test = pd.DataFrame({'Values':[1,2,3], 'Date':["1/1/2016 17:49","1/2/2016 7:10","1/3/2016 15:19"]})
Indexing code:
test['Date Index'] = pd.to_datetime(test['Date'])
test = test.set_index('Date Index')
test['Index'] = test.index.date
However, when I put this code in a function, the 'Date Index' column is created but set_index does not seem to work as expected:
def date_index(df):
    df['Date Index'] = pd.to_datetime(df['Date'])
    df = df.set_index('Date Index')
    df['Index'] = df.index.date
Without the function, info() returns:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3 entries, 2016-01-01 17:49:00 to 2016-01-03 15:19:00
Data columns (total 3 columns):
Date 3 non-null object
Values 3 non-null int64
Index 3 non-null object
dtypes: int64(1), object(2)
memory usage: 96.0+ bytes
With the function, info() returns:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
Date 3 non-null object
Values 3 non-null int64
dtypes: int64(1), object(1)
memory usage: 120.0+ bytes
I would like the DatetimeIndex.
How can set_index be used within a function? Am I using it incorrectly?
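IIUC, return df is missing. set_index returns a new DataFrame, and df = df.set_index(...) only rebinds the local name inside the function, so the caller never sees the re-indexed frame unless the function returns it: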
df1 = pd.DataFrame({'Values':[1,2,3], 'Exam Completed Date':["1/1/2016 17:49","1/2/2016 7:10","1/3/2016 15:19"]})

def date_index(df):
    df['Exam Completed Date Index'] = pd.to_datetime(df['Exam Completed Date'])
    df = df.set_index('Exam Completed Date Index')
    df['Index'] = df.index.date
    return df
print (date_index(df1))
Exam Completed Date Values Index
Exam Completed Date Index
2016-01-01 17:49:00 1/1/2016 17:49 1 2016-01-01
2016-01-02 07:10:00 1/2/2016 7:10 2 2016-01-02
2016-01-03 15:19:00 1/3/2016 15:19 3 2016-01-03
print (date_index(df1).info())
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3 entries, 2016-01-01 17:49:00 to 2016-01-03 15:19:00
Data columns (total 3 columns):
Exam Completed Date 3 non-null object
Values 3 non-null int64
Index 3 non-null object
dtypes: int64(1), object(2)
memory usage: 96.0+ bytes
None
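Returning the frame is only half of it: the call site also has to keep the result. A minimal sketch with the question's own column names, which additionally works on a copy so the caller's frame does not silently gain a 'Date Index' column (the copy is a design choice here, not required for the fix):

```python
import pandas as pd


def date_index(df):
    """Return a copy of df indexed by the parsed 'Date' column."""
    out = df.copy()  # avoid mutating the caller's frame
    out["Date Index"] = pd.to_datetime(out["Date"])
    out = out.set_index("Date Index")
    out["Index"] = out.index.date
    return out


test = pd.DataFrame({"Values": [1, 2, 3],
                     "Date": ["1/1/2016 17:49", "1/2/2016 7:10", "1/3/2016 15:19"]})
test = date_index(test)  # rebind the caller's name to the returned frame
test.info()
```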

sum columns in dataframe with pandas

I have a dataframe df_F1
df_F1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 7 columns):
class_energy 2 non-null object
ACT_TIME_AERATEUR_1_F1 2 non-null float64
ACT_TIME_AERATEUR_1_F3 2 non-null float64
dtypes: float64(6), object(1)
memory usage: 128.0+ bytes
df_F1.head()
class_energy ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3
low 5.875550 431
medium 856.666667 856
I am trying to create a dataframe Ratio which contains, for each class_energy, the energy of each ACT_TIME_AERATEUR_1_Fx divided by the sum of that column's energy over all class_energy values. For example:
ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3
low 5.875550/(5.875550 + 856.666667) 431/(431+856)
medium 856.666667/(5.875550+856.666667) 856/(431+856)
Can you help me please to resolve it?
Thank you in advance
Best regards
You can do this:
In [20]: df.set_index('class_energy').apply(lambda x: x/x.sum()).reset_index()
Out[20]:
class_energy ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3
0 low 0.006812 0.334887
1 medium 0.993188 0.665113
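The same ratio can be computed without apply: dividing a DataFrame by a Series aligns the Series' index with the DataFrame's columns, so each column is divided by its own total. A minimal sketch using the two rows and two columns shown in the question:

```python
import pandas as pd

df_F1 = pd.DataFrame({
    "class_energy": ["low", "medium"],
    "ACT_TIME_AERATEUR_1_F1": [5.875550, 856.666667],
    "ACT_TIME_AERATEUR_1_F3": [431, 856],
})

values = df_F1.set_index("class_energy")
# values.sum() is a Series indexed by column name; division aligns on columns,
# so every entry is divided by the total of its own column.
Ratio = values / values.sum()
print(Ratio)
```

Each column of Ratio sums to 1, matching the lambda x: x/x.sum() version.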

Python Pandas - Create DataFrame based on a value from a file

I have a DataFrame (df1), something like this:
CUST_KEY SDATE QTI
0 1997041501 2016-06-21 2.000000
1 1975122001 2016-07-08 1.000000
2 1978091401 2016-07-01 31.000000
3 1950090501 2016-06-01 2.000000
I also have a dataframe I made from an excel file:
metadf = pd.read_excel(r'C:\TEMP\METADATA.xlsx')  # raw string so the backslashes are not treated as escapes
metadf1 = metadf[0:1]
eff_from = pd.to_datetime(metadf1['EFF_FROM'], format="%d/%m/%Y")
metadf1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 4 columns):
EFF_FROM 1 non-null datetime64[ns]
SDATE 1 non-null datetime64[ns]
EDATE 1 non-null datetime64[ns]
NOTES 1 non-null object
dtypes: datetime64[ns](3), object(1)
memory usage: 112.0+ bytes
0 2016-07-01
What I'm trying to do is create a new DataFrame containing only the rows of df1 where SDATE >= EFF_FROM from metadf1.
I don't think a merge is going to work. Can I use eff_from as a variable? It looks like my eff_from = line created a Series (I'm very new to Python and a bit confused by all the different data types!).
Many thanks for your help.
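eff_from is indeed a Series (one value per row of metadf1), so one way to use it as a single value is to pull out the scalar with .iloc[0] and filter df1 with a boolean mask. A minimal sketch, assuming metadf1 has exactly one row and SDATE is a datetime column; metadf1 is rebuilt inline as a stand-in for the Excel file, and df_new is a hypothetical name:

```python
import pandas as pd

# df1 as shown in the question.
df1 = pd.DataFrame({
    "CUST_KEY": [1997041501, 1975122001, 1978091401, 1950090501],
    "SDATE": pd.to_datetime(["2016-06-21", "2016-07-08", "2016-07-01", "2016-06-01"]),
    "QTI": [2.0, 1.0, 31.0, 2.0],
})

# Stand-in for metadf1 read from METADATA.xlsx (assumed: one row).
metadf1 = pd.DataFrame({"EFF_FROM": pd.to_datetime(["01/07/2016"], format="%d/%m/%Y")})

# metadf1["EFF_FROM"] is a Series; .iloc[0] extracts the single Timestamp.
eff_from = metadf1["EFF_FROM"].iloc[0]

# Boolean mask: keep rows whose SDATE is on or after EFF_FROM.
df_new = df1[df1["SDATE"] >= eff_from]
print(df_new)
```

If metadf1 can have several rows, decide first which EFF_FROM applies (for example the earliest via metadf1["EFF_FROM"].min()) before filtering.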
