Suppose I have a MultiIndex DataFrame as follows:
import pandas as pd
import numpy as np
input_id = np.array(['input_id'])
docType = np.array(['pre','pub','app','dw'])
docId = np.array(['34455667'])
sec_type = np.array(['bib','abs','cl','de'])
sec_ids = np.array(['x-y','z-k'])
index = pd.MultiIndex.from_product([input_id,docType,docId,sec_type,sec_ids])
content= [str(np.random.randint(1,10))+ '##' + str(np.random.randint(1,10)) for i in range(len(index))]
df = pd.DataFrame(content, index=index, columns=['content'])
df.rename_axis(index=['input_id','docType','docId','secType','sec_ids'], inplace=True)
I would like to query the MultiIndex DataFrame:
# query a multiindex DF
idx = pd.IndexSlice
df.loc[idx[:,'pub',:,'de',:]]
Resulting in:
I would like to get the values of the MultiIndex level sec_ids directly as a list. How do I have to modify the query to get the following result:
['x-y','z-k']
Thanks
You can use the MultiIndex.get_level_values() method to get the values of a specific level of a MultiIndex. So in this case call it after your slice.
df.loc[idx[:,'pub',:,'de',:]].index.get_level_values('sec_ids').tolist()
#['x-y', 'z-k']
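A minimal, self-contained sketch of the same idea on a smaller index (the three-level index and the name `small` are made up for illustration). Adding `.unique()` is an optional extra, useful when a slice matches several rows that share the same sec_ids value:

```python
import pandas as pd

# Hypothetical, reduced version of the index from the question.
index = pd.MultiIndex.from_product(
    [["app", "pub"], ["bib", "de"], ["x-y", "z-k"]],
    names=["docType", "secType", "sec_ids"],
)
small = pd.DataFrame({"content": range(len(index))}, index=index)

idx = pd.IndexSlice
# Slice the rows, then read one index level back as a plain list.
subset = small.loc[idx["pub", "de", :], :]
sec_id_list = subset.index.get_level_values("sec_ids").unique().tolist()
print(sec_id_list)
```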
Related
I have a DataFrame and I create a Series with its first line:
import pandas as pd
df = pd.DataFrame({
'market_name':['z','a'],
'values':[2,1]
})
markets_name = ['a','z']
df["market_name"] = pd.Categorical(df['market_name'], markets_name)
final_row = df.sort_values("market_name").reset_index(drop=True).iloc[0]
print(final_row)
But the dtype of the resulting Series (in my original code) is not the same as the dtypes of the DataFrame columns. How can I keep exactly the same dtypes as the original DataFrame?
My failed attempts:
final_row = final_row.astype(dict(df.dtypes))
final_row = pd.Series(final_row, dtype=df.dtypes)
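One possible approach, sketched under the assumption that a one-row DataFrame is acceptable instead of a Series: a Series has a single dtype, so a row with mixed column types is upcast to object, whereas selecting with a list (`iloc[[0]]`) returns a DataFrame that keeps each column's dtype. The name `final_row_df` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "market_name": ["z", "a"],
    "values": [2, 1],
})
markets_name = ["a", "z"]
df["market_name"] = pd.Categorical(df["market_name"], markets_name)

# iloc[[0]] (list indexer) returns a one-row DataFrame, so per-column dtypes survive.
final_row_df = df.sort_values("market_name").reset_index(drop=True).iloc[[0]]
print(final_row_df.dtypes)
```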
I need to be able to sort the result of a second pandas groupby by category.
The first groupby creates a list from another column, and the second one produces the result I need. The problem is that the second groupby does not honour the original sorted categorical index of the DataFrame:
import pandas as pd
import numpy as np
import numpy.ma as ma
from pathlib import Path
fr = Path('../data/rules-1.xlsx')
df = pd.read_excel(fr, sheet_name='MS')
from pandas.api.types import CategoricalDtype
print('Before:')
display(df)
ms_cat = ['Parent-C', 'Parent-A', 'Parent-B']
df['ParentMS'] = df['ParentMS'].astype(CategoricalDtype(ms_cat, ordered=True))
df = df.reset_index()
df = df.set_index('ParentMS')
df = df.sort_index()
print('After:')
display(df)
df_g = df.groupby(['ParentMS', 'Milestone'])['Tasks'].apply(list)
df_g = df_g.groupby('ParentMS')
# Category sort is not honored after the second groupby()
for name, group in df_g:
    print(name, group)
This is the input file:
[enter image description here][1]
[1]: https://i.stack.imgur.com/KZnZD.png
Combining the two "df_g" lines did the trick for me. I cannot explain it, but it worked:
df_g = df.groupby(['ParentMS', 'Milestone'])['RN'].apply(list).groupby('ParentMS')
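For reference, a small sketch of how an ordered categorical drives the group order. The toy data and the name `toy` are invented, since the Excel file from the question is not available; the point is only that `ordered=True` belongs to `CategoricalDtype` and that groupby then sorts groups by category order rather than alphabetically.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical stand-in for the spreadsheet data.
toy = pd.DataFrame({
    "ParentMS": ["Parent-A", "Parent-B", "Parent-C", "Parent-A"],
    "Tasks": ["t1", "t2", "t3", "t4"],
})

# The custom order is declared on the dtype itself.
ms_order = CategoricalDtype(["Parent-C", "Parent-A", "Parent-B"], ordered=True)
toy["ParentMS"] = toy["ParentMS"].astype(ms_order)

# Groups come back in category order, not alphabetical order.
tasks_by_parent = toy.groupby("ParentMS", observed=True)["Tasks"].apply(list)
group_order = tasks_by_parent.index.tolist()
print(group_order)
```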
I have a dataframe that currently looks like this:
import numpy as np
raw_data = {'Series_Date':['2017-03-10','2017-03-13','2017-03-14','2017-03-15'],'SP':[35.6,56.7,41,41],'1M':[-7.8,56,56,-3.4],'3M':[24,-31,53,5]}
import pandas as pd
df = pd.DataFrame(raw_data,columns=['Series_Date','SP','1M','3M'])
print(df)
I would like to transpose it so that all the value fields move into a single Value column, with the date repeated on each row. The column name of each value field becomes the row's entry in the Desc column. That is, the resulting DataFrame should look like this:
import numpy as np
raw_data = {'Series_Date':['2017-03-10','2017-03-10','2017-03-10','2017-03-13','2017-03-13','2017-03-13','2017-03-14','2017-03-14','2017-03-14','2017-03-15','2017-03-15','2017-03-15'],'Value':[35.6,-7.8,24,56.7,56,-31,41,56,53,41,-3.4,5],'Desc':['SP','1M','3M','SP','1M','3M','SP','1M','3M','SP','1M','3M']}
import pandas as pd
df = pd.DataFrame(raw_data,columns=['Series_Date','Value','Desc'])
print(df)
Could someone please explain how I can reshape my DataFrame this way?
Use pd.melt to transform DF from a wide format to a long one:
idx = "Series_Date" # identifier variable
pd.melt(df, id_vars=idx, var_name="Desc").sort_values(idx).reset_index(drop=True)
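A runnable sketch of the melt approach on the data from the question. Two details here are additions rather than part of the answer above: `value_name="Value"` so the column name matches the desired output, and `kind="stable"` so that rows within each date keep the SP, 1M, 3M order.

```python
import pandas as pd

raw_data = {
    "Series_Date": ["2017-03-10", "2017-03-13", "2017-03-14", "2017-03-15"],
    "SP": [35.6, 56.7, 41, 41],
    "1M": [-7.8, 56, 56, -3.4],
    "3M": [24, -31, 53, 5],
}
df = pd.DataFrame(raw_data, columns=["Series_Date", "SP", "1M", "3M"])

# Wide to long: one row per (date, original column name) pair.
long_df = (
    pd.melt(df, id_vars="Series_Date", var_name="Desc", value_name="Value")
    .sort_values("Series_Date", kind="stable")
    .reset_index(drop=True)
)
print(long_df)
```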
I have a dataframe defined as follows:
import datetime
import pandas as pd
import random
import numpy as np
todays_date = datetime.datetime.today().date()
index = pd.date_range(todays_date - datetime.timedelta(10), periods=10, freq='D')
index = index.append(index)
idname = ['A']*10 + ['B']*10
values = random.sample(range(100), 20)
data = np.vstack((idname, values)).T
tmp_df = pd.DataFrame(data, columns=['id', 'value'])
tmp_index = pd.DataFrame(index, columns=['date'])
tmp_df = pd.concat([tmp_index, tmp_df], axis=1)
tmp_df = tmp_df.set_index('date')
Note that there are 2 values for each date. I would like to resample the dataframe tmp_df on a weekly basis but keep the two separate values. I tried tmp_df.resample('W-FRI') but it doesn't seem to work.
The solution you're looking for is groupby, which lets you perform operations on dataframe slices (here 'A' and 'B') independently:
df.groupby('id').resample('W-FRI')
Note: your code produces an error (No numeric types to aggregate) because the 'value' column is not converted to int. You need to convert it first:
df['value'] = pd.to_numeric(df['value'])
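A small sketch of the groupby-then-resample idea with fixed dates and deterministic values, so the result is reproducible. The name `demo`, the start date, and the choice of `sum()` as the aggregation are assumptions for illustration; any other aggregation could be used instead.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-03-01", periods=10, freq="D")
demo = pd.DataFrame(
    {"id": ["A"] * 10 + ["B"] * 10, "value": np.arange(20)},
    index=dates.append(dates).rename("date"),
)

# Resample each id separately into weeks ending on Friday.
weekly = demo.groupby("id")["value"].resample("W-FRI").sum()
print(weekly)
```

The result is indexed by (id, date), so the two series stay separate.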
Consider the following code:
import datetime
import pandas as pd
import numpy as np
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')
columns = ['A','B', 'C']
df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
data = np.array([np.arange(10)]*3).T
df = pd.DataFrame(data, index=index, columns=columns)
df
Here we create an empty DataFrame in Python using pandas and then fill it. However, is it possible to add columns dynamically in a similar manner? That is, starting from columns = ['A','B', 'C'], it should be possible to add columns D, E, F, etc. up to a specified number.
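I think the pandas.DataFrame.append method is what you are after, e.g.
output_frame = input_frame.append(appended_frame)
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat is its replacement.
There are additional examples in the pandas documentation on merging, joining and concatenating.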
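As an alternative sketch for adding columns specifically (append and concat along the default axis add rows), `DataFrame.reindex` can extend the column set to a requested size. The variable `n_total`, the fixed start date, and the fill value of 0 are assumptions for illustration.

```python
import string

import numpy as np
import pandas as pd

index = pd.date_range("2017-03-01", periods=10, freq="D")
df = pd.DataFrame(np.arange(30).reshape(10, 3), index=index, columns=list("ABC"))

# Hypothetical target: grow from 3 columns to n_total columns named A, B, C, ...
n_total = 6
all_columns = list(string.ascii_uppercase[:n_total])

# Existing columns keep their data; new ones are created and filled with 0.
df = df.reindex(columns=all_columns, fill_value=0)
print(df.head())
```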