I have a MultiIndex DataFrame in pandas that looks like this (created using pivot_table):
I need help adding a level above (or below) the Date level that shows the day of the week for each date, like this:
I know I can get the day of a date like this:
lt.DATE.dt.strftime('%a')
# lt is a DataFrame and DATE is a column in it.
Here is the code to reproduce a similar pivot_table:
import pandas as pd
import numpy as np
dlist = pd.date_range('2015-01-01',periods=5)
df = pd.DataFrame(dlist, columns=['DATE'])
df['EC'] = range(7033,7033+len(df))
df['HS'] = np.random.randint(0,9,5)
df['AH'] = np.random.randint(0,9,5)
pv = pd.pivot_table(df, columns=[df.DATE, 'EC'], values=['HS','AH'])
pv = pv.unstack(level=1).unstack(level=0)
I got the solution! Here it goes:
import pandas as pd
import numpy as np
dlist = pd.date_range('2015-01-01',periods=5)
df = pd.DataFrame(dlist, columns=['DATE'])
df['EC'] = range(7033,7033+len(df))
df['HS'] = np.random.randint(0,9,5)
df['AH'] = np.random.randint(0,9,5)
df['DAY'] = df.DATE.dt.strftime('%a')
pv = pd.pivot_table(df, columns=[df.DATE.dt.date, df.DAY, 'EC'], values=['HS','AH'])
pv = pv.unstack(level=[1,2]).unstack(level=0)
pv.to_excel('solution.xlsx')
And it produces an output like this:
Pay attention to unstack: it accepts a list of levels, so every level that needs to move can be unstacked in a single call.
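As an alternative sketch (not the method used in the answer above), the day-of-week level can also be added to an existing column MultiIndex after pivoting by rebuilding the index with pd.MultiIndex.from_arrays. The frame below is a hypothetical stand-in for the pivot table, and the level names DATE, DAY and VALUE are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the pivot table: dates on the outer column level.
dates = pd.date_range('2015-01-01', periods=3)
cols = pd.MultiIndex.from_product([dates, ['AH', 'HS']], names=['DATE', 'VALUE'])
pv = pd.DataFrame(np.arange(12).reshape(2, 6), index=[7033, 7034], columns=cols)

# Pull out the existing levels, derive the day names, and rebuild the index
# with the new DAY level placed directly below DATE.
date_level = pv.columns.get_level_values('DATE')
value_level = pv.columns.get_level_values('VALUE')
pv.columns = pd.MultiIndex.from_arrays(
    [date_level, date_level.strftime('%a'), value_level],
    names=['DATE', 'DAY', 'VALUE'],
)
print(pv)
```

Swapping the first two arrays (and names) would put the day level above the date level instead.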
import numpy as np
import pandas as pd
# The file is stored at the following path:
'https://media-doselect.s3.amazonaws.com/generic/NMgEjwkAEGGQZBoNYGr9Ld7w0/rating.csv'
df = pd.read_csv('https://media-doselect.s3.amazonaws.com/generic/NMgEjwkAEGGQZBoNYGr9Ld7w0/rating.csv')
df['Training'] = df.Rating.apply(lambda x : 'No' if x <= 3.5 else 'Yes')
df.head()
If I'm understanding the question correctly:
import pandas as pd
df = pd.read_csv('https://media-doselect.s3.amazonaws.com/generic/NMgEjwkAEGGQZBoNYGr9Ld7w0/rating.csv')
df = (
    df
    .groupby("Department")["Rating"]
    .mean()
    .reset_index()
    .min()["Department"]
)
print(df)
Finance
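One caveat about the snippet above: calling .min() on a DataFrame takes the minimum of each column independently, so ["Department"] returns the alphabetically first department name, not the department with the lowest mean rating. If the goal is the department with the lowest average rating (an assumption, since the original question is not shown here), idxmin on the grouped means is one way to get it. The data below is made up purely for illustration.

```python
import pandas as pd

# Made-up ratings; the real CSV is not reproduced here.
df = pd.DataFrame({
    "Department": ["Finance", "Finance", "HR", "HR", "Sales", "Sales"],
    "Rating": [4.0, 4.5, 2.0, 3.0, 3.5, 4.0],
})

mean_rating = df.groupby("Department")["Rating"].mean()

# Label (department) at which the mean rating is smallest.
lowest = mean_rating.idxmin()

# For comparison: column-wise min picks the alphabetically first name.
alphabetical_first = mean_rating.reset_index().min()["Department"]

print(lowest, alphabetical_first)
```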
I'm trying to achieve this kind of transformation with Pandas.
I made this code but unfortunately it doesn't give the result I'm searching for.
CODE :
import pandas as pd
df = pd.read_csv('file.csv', delimiter=';')
df = df.count().reset_index().T.reset_index()
df.columns = df.iloc[0]
df = df[1:]
df
RESULT :
Do you have any suggestions? Any help would be appreciated.
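First create boolean helper columns that flag the nonOK tests, then use named aggregation: size to count the IDs, sum for the Values column, and sum again for the helper columns (summing booleans counts the True values). Finally, add the two test counts together: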
df = (df.assign(NumberOfTest1 = df['Test one'].eq('nonOK'),
                NumberOfTest2 = df['Test two'].eq('nonOK'))
        .groupby('Category', as_index=False)
        .agg(NumberOfID = ('ID','size'),
             Values = ('Values','sum'),
             NumberOfTest1 = ('NumberOfTest1','sum'),
             NumberOfTest2 = ('NumberOfTest2','sum'))
        .assign(TotalTest = lambda x: x['NumberOfTest1'] + x['NumberOfTest2']))
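To make the named-aggregation approach concrete, here is a small self-contained run. The input rows are hypothetical (the original file.csv is not shown); only the column names are taken from the answer above.

```python
import pandas as pd

# Hypothetical input with the column names used in the answer.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Category': ['A', 'A', 'B', 'B'],
    'Values': [10, 20, 30, 40],
    'Test one': ['OK', 'nonOK', 'nonOK', 'nonOK'],
    'Test two': ['nonOK', 'OK', 'OK', 'nonOK'],
})

out = (df.assign(NumberOfTest1=df['Test one'].eq('nonOK'),
                 NumberOfTest2=df['Test two'].eq('nonOK'))
         .groupby('Category', as_index=False)
         .agg(NumberOfID=('ID', 'size'),
              Values=('Values', 'sum'),
              NumberOfTest1=('NumberOfTest1', 'sum'),
              NumberOfTest2=('NumberOfTest2', 'sum'))
         .assign(TotalTest=lambda x: x['NumberOfTest1'] + x['NumberOfTest2']))
print(out)
```

Each category ends up on one row, with the nonOK counts for both tests and their total.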
import pandas as pd
import yfinance as yf
import numpy as np
tickers = ['BRIGADE.NS', 'DLF.NS', 'GODREJPROP.NS', 'OBEROIRLTY.NS', 'PRESTIGE.NS']
Tickers_Data = yf.download(tickers, period='5y')
display(Tickers_Data)
Ret_Log = np.log(Tickers_Data['Adj Close'] / Tickers_Data['Adj Close'].shift(1))
Ret_Cumulative = Ret_Log.cumsum().apply(np.exp)
Ret_Absolute = Tickers_Data['Adj Close'].pct_change()
MA50 = Tickers_Data['Open'].rolling(50).mean()
MA200 = Tickers_Data['Open'].rolling(200).mean()
display(Tickers_Data.columns)
I want to add Ret_Log, Ret_Cumulative, Ret_Absolute, MA50 and MA200 at the end of my MultiIndex DataFrame Tickers_Data. If I run a for loop (added at the end of the code), I get the expected output, but I need to achieve this without a for loop.
Any help is highly appreciated.
Thanks in advance.
This is the for loop that gives the expected output and that I would like to avoid:
# Add one ('MA50', ticker) / ('MA200', ticker) column per ticker.
for ticker in tickers:
    Tickers_Data['MA50', ticker] = Tickers_Data['Open'][ticker].rolling(50).mean()
for ticker in tickers:
    Tickers_Data['MA200', ticker] = Tickers_Data['Open'][ticker].rolling(200).mean()
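One possible way to avoid the loop, sketched under assumptions: pd.concat accepts a dict of DataFrames, and with axis=1 the dict keys become the outer column level, which matches the (field, ticker) layout. The example uses a small random frame as a stand-in for the yf.download result, made-up ticker names, and short rolling windows so the toy data produces values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the yf.download result: two fields, two tickers.
tickers = ['AAA', 'BBB']
idx = pd.date_range('2024-01-01', periods=6)
cols = pd.MultiIndex.from_product([['Adj Close', 'Open'], tickers])
rng = np.random.default_rng(0)
Tickers_Data = pd.DataFrame(rng.uniform(90, 110, size=(6, 4)), index=idx, columns=cols)

# Each derived table has one column per ticker (windows shortened for the toy data).
derived = {
    'Ret_Log': np.log(Tickers_Data['Adj Close'] / Tickers_Data['Adj Close'].shift(1)),
    'MA2': Tickers_Data['Open'].rolling(2).mean(),
    'MA3': Tickers_Data['Open'].rolling(3).mean(),
}

# The dict keys become the outer column level of the combined block.
extra = pd.concat(derived, axis=1)
Tickers_Data = pd.concat([Tickers_Data, extra], axis=1)
print(Tickers_Data.columns)
```

The same pattern should extend to Ret_Cumulative, Ret_Absolute, MA50 and MA200 by adding them to the dict.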
I have the following DataFrame:
df = pd.DataFrame({'Field':'FAPERF',
                   'Form':'LIVERID',
                   'Folder':'ALL',
                   'Logline':'9',
                   'Data':'Yes',
                   'Data':'Blank',
                   'Data':'No',
                   'Logline':'10'})
I need this DataFrame:
df = pd.DataFrame({'Field':['FAPERF','FAPERF'],
                   'Form':['LIVERID','LIVERID'],
                   'Folder':['ALL','ALL'],
                   'Logline':['9','10'],
                   'Data':['Yes','Blank','No']})
I tried the code below but could not achieve the desired output.
res3.set_index(res3.groupby(level=0).cumcount(), append=True)['Data'].unstack(0)
Can anyone please help me?
I believe your best option is to create several DataFrames that share a column name (for example, three DataFrames with a "Data" column) and then simply concatenate them:
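import pandas as pd

# Values are wrapped in lists because all-scalar dicts need an explicit index.
df1 = pd.DataFrame({'Field': ['FAPERF'],
                    'Form': ['LIVERID'],
                    'Folder': ['ALL'],
                    'Logline': ['9'],
                    'Data': ['Yes']})
df2 = pd.DataFrame({'Data': ['No'],
                    'Logline': ['10']})
df3 = pd.DataFrame({'Data': ['Blank']})
frames = [df1, df2, df3]
result = pd.concat(frames)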
You just need two lists that specify the Logline and Data values for each row, then build one single-row DataFrame per entry and concatenate them.
import pandas as pd
import numpy as np
list_df = []
data_type_list = ["yes","no","Blank"]
logline_type = ["9","10",'10']
for x in range(len(data_type_list)):
    new_dict = { 'Field':['FAPERF'], 'Form':['LIVERID'],'Folder':['ALL'],"Data" : [data_type_list[x]], "Logline" : [logline_type[x]]}
    df = pd.DataFrame(new_dict)
    list_df.append(df)
new_df = pd.concat(list_df)
print(new_df)
I have a large temperature time series that I'm performing some functions on. I'm taking hourly observations and creating daily statistics. After I'm done with my calculations, I want to use the grouped year and Julian days that are objects in the Groupby ('aa' below) and the drangeT and drangeHI arrays that come out and make an entirely new DataFrame with those variables. Code is below:
import numpy as np
import scipy.stats as st
import pandas as pd
city = ['BUF']#,'PIT','CIN','CHI','STL','MSP','DET']
mons = np.arange(5,11,1)
for a in city:
    data = 'H:/Classwork/GEOG612/Project/'+a+'Data_cut.txt'
    df = pd.read_table(data,sep='\t')
    df['TempF'] = ((9./5.)*df['TempC'])+32.
    df1 = df.loc[df['Month'].isin(mons)]
    aa = df1.groupby(['Year','Julian'],as_index=False)
    maxT = aa.aggregate({'TempF':np.max})
    minT = aa.aggregate({'TempF':np.min})
    maxHI = aa.aggregate({'HeatIndex':np.max})
    minHI = aa.aggregate({'HeatIndex':np.min})
    drangeT = maxT - minT
    drangeHI = maxHI - minHI
    df2 = pd.DataFrame(data = {'Year':aa.Year,'Day':aa.Julian,'TRange':drangeT,'HIRange':drangeHI})
All variables in the df2 command are of length 8250, but I get this error message when I run it:
ValueError: cannot copy sequence with size 3 to array axis with dimension 8250
Any suggestions are welcomed and appreciated. Thanks!
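A possible direction, offered as a sketch rather than a confirmed fix: in the code above, aa.Year and aa.Julian are groupby objects rather than columns, and drangeT and drangeHI are whole DataFrames (with as_index=False they still carry the Year and Julian columns), so none of them are the one-dimensional arrays the DataFrame constructor expects. Doing all four aggregations in one named-aggregation call keeps the group keys as ordinary columns. The hourly data below is made up; only the column names come from the question.

```python
import pandas as pd

# Made-up hourly observations using the column names from the question.
df1 = pd.DataFrame({
    'Year': [2000, 2000, 2000, 2000],
    'Julian': [150, 150, 151, 151],
    'TempF': [60.0, 75.0, 62.0, 80.0],
    'HeatIndex': [61.0, 78.0, 63.0, 85.0],
})

# One row per (Year, Julian) with the daily extremes as plain columns.
daily = (df1.groupby(['Year', 'Julian'], as_index=False)
            .agg(maxT=('TempF', 'max'), minT=('TempF', 'min'),
                 maxHI=('HeatIndex', 'max'), minHI=('HeatIndex', 'min')))

# Build the new frame from aligned columns of the aggregated result.
df2 = pd.DataFrame({
    'Year': daily['Year'],
    'Day': daily['Julian'],
    'TRange': daily['maxT'] - daily['minT'],
    'HIRange': daily['maxHI'] - daily['minHI'],
})
print(df2)
```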