I have the following pandas DataFrame:
I would like to convert it to a pandas DataFrame with one row. Is there a simple way to do it? I tried pivot but was getting weird results.
You can pivot, swap the column levels, shift the values up to fill the NaN gaps, and then flatten the column names:
out = df.pivot(columns='Study Identification').swaplevel(0,1,axis=1).apply(lambda x: pd.Series(x.dropna().values)).fillna('')
out.columns = out.columns.map(''.join)
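A minimal runnable sketch of the pivot approach above. The original data is not shown, so the input here is made up: two studies with hypothetical Value and Note columns. The ''.join flattening assumes every column label is a string.

```python
import pandas as pd

# Hypothetical input: one row per study, with two attribute columns
df = pd.DataFrame({
    'Study Identification': ['S1', 'S2'],
    'Value': [1, 2],
    'Note': ['a', 'b'],
})

# Columns become (attribute, study); each row holds values for one study only
wide = df.pivot(columns='Study Identification')

# Put the study ID first, then shift non-NaN values up so they share row 0
out = (wide.swaplevel(0, 1, axis=1)
           .apply(lambda x: pd.Series(x.dropna().values))
           .fillna(''))

# Flatten the (study, attribute) MultiIndex into names like 'S1Value'
out.columns = out.columns.map(''.join)
print(out)
```

The Value columns come back as floats because pivot introduces NaN before the values are shifted up.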
In your case, you can also reshape the DataFrame with unstack:
s = df.set_index('A',append=True).unstack(level=1).swaplevel(0,1,axis=1)
s.columns = s.columns.map(''.join)
I have a pandas DataFrame (I got it from an API, so I don't have much control over its structure) similar to this:
I want to have the datetime as one column and the value as another column. Any hints?
You can use T to transpose the DataFrame and then reset_index to turn the current index into a regular column (named index by default, so you may want to rename it). The last two lines then promote the first row to the column header:
df = df.T.reset_index()
df.columns = df.iloc[0]
df = df[1:]
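If the API frame is a single row whose column labels are timestamps (an assumption, since the original sample is not shown), DataFrame.melt is a swapped-in alternative to the transpose approach that produces the two columns directly. The column names datetime and value below are chosen for illustration.

```python
import pandas as pd

# Hypothetical API response: one row, one column per timestamp
df = pd.DataFrame([[1.5, 2.0, 3.25]],
                  columns=['2021-01-01', '2021-01-02', '2021-01-03'])

# Unpivot every column into (column label, cell value) pairs
long = df.melt(var_name='datetime', value_name='value')

# Parse the former column labels into real timestamps
long['datetime'] = pd.to_datetime(long['datetime'])
print(long)
```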
I have a list of DataFrames.
Note that the DataFrames differ in shape (between 2 and 7 columns), and their columns are named 0 to len(columns) - 1 (e.g. df1 has 5 columns named 0, 1, 2, 3, 4, and df2 has 4 columns named 0, 1, 2, 3).
I would like to check whether any row in a column contains a certain string and, if so, delete that column.
list_dfs1=[df1,df2,df3...df100]
What I have done so far is below, and I get an error that column 5 is not in the axis (it does exist in some of the DataFrames):
for i, df in enumerate(list_dfs1):
    for index, row in df.iterrows():
        if np.where(row.str.contains("DEC")):
            df.drop(index, axis=1)
Any suggestions?
You could try:
for df in list_dfs1:
    # Iterate over a copy of the labels, since columns are dropped inside the loop
    for col in list(df.columns):
        # If you are unsure about column types, cast the column to string:
        df[col] = df[col].astype(str)
        # Check whether the column contains the string of interest
        if df[col].str.contains("DEC").any():
            df.drop(columns=[col], inplace=True)
If you know that all columns already hold strings, you can skip df[col] = df[col].astype(str). Note that this cast permanently converts the values of the kept columns to strings.
You can write a custom function that checks whether a column contains the pattern, using pd.Series.str.contains together with pd.Series.any, and then keep only the columns for which it returns False:
def func(s):
    # Cast to str so non-string columns don't break the .str accessor
    return s.astype(str).str.contains('DEC').any()
list_df = [df.loc[:, ~df.apply(func)] for df in list_dfs1]
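A self-contained sketch of the column-filtering idea on two made-up DataFrames of different widths. The helper name drop_columns_containing is hypothetical; it casts to str so numeric columns do not break the .str accessor, passes regex=False so the pattern is matched literally, and returns new frames instead of modifying the originals.

```python
import pandas as pd

def drop_columns_containing(df: pd.DataFrame, pattern: str) -> pd.DataFrame:
    """Return df without the columns where any cell contains pattern."""
    # Boolean frame: True where the cell text contains the pattern
    hits = df.astype(str).apply(lambda s: s.str.contains(pattern, regex=False))
    # Keep only the columns with no hit in any row
    return df.loc[:, ~hits.any()]

# Hypothetical inputs with positional column labels 0..n-1
df1 = pd.DataFrame([['a', 'DEC 2020', 1], ['b', 'x', 2]])
df2 = pd.DataFrame([['DEC', 'y'], ['z', 'w']])
list_dfs1 = [df1, df2]

cleaned = [drop_columns_containing(df, 'DEC') for df in list_dfs1]
for df in cleaned:
    print(list(df.columns))
```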
I would take another approach: concatenate the list into a single DataFrame and then eliminate the columns where the string is found.
import pandas as pd
df = pd.concat(list_dfs1)
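Let us say your condition is to eliminate any column with a cell equal to "DEC" (an exact match, not a substring match):
df.mask(df == "DEC").dropna(axis=1, how="any")
Note that dropna(how="any") also drops columns that already contained NaN. After concatenating DataFrames with different numbers of columns, the missing cells are NaN, so those columns are dropped as well. To drop only the columns that contain "DEC", use df.loc[:, ~(df == "DEC").any()] instead.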
My DF has the following columns:
df.columns = ['not-changing1', 'not-changing2', 'changing1', 'changing2', 'changing3', 'changing4']
I want to reorder the last 4 columns (here, swap changing2 and changing3) WITHOUT USING THE COLUMN NAMES, using their positional index instead.
So, the final column order would be:
result.columns = ['not-changing1', 'not-changing2', 'changing1', 'changing3', 'changing2', 'changing4']
How do I do that?
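One way to do this purely by position is to build a list of integer positions, swap the entries you want, and select with iloc. The sample values are made up; positions 3 and 4 correspond to changing2 and changing3.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5, 6]],
                  columns=['not-changing1', 'not-changing2', 'changing1',
                           'changing2', 'changing3', 'changing4'])

# Start from the current positional order: [0, 1, 2, 3, 4, 5]
order = list(range(df.shape[1]))
# Swap positions 3 and 4 (changing2 <-> changing3) without naming them
order[3], order[4] = order[4], order[3]

result = df.iloc[:, order]
print(list(result.columns))
```

Any other permutation works the same way; for example, order[2:] = [5, 4, 3, 2] would reverse the last four columns.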
I have the following data sample and I am trying to flatten it using pandas; I want to flatten this data over Candidate_Name.
This is my implementation:
df = df.merge(df, on='Candidate_Name')
but I am not getting the desired result. My desired output is below: basically, all rows that share the same Candidate_Name should end up in a single row, with duplicate column names given a suffix such as _x.
I think you need GroupBy.cumcount with DataFrame.unstack, then flatten the MultiIndex columns: keep the original names for the first row in each group and append the counter for the later ones, so there are no duplicate column names:
df = df.set_index(['Candidate_Name', df.groupby('Candidate_Name').cumcount()]).unstack()
df.columns = [a if b == 0 else f'{a}_{b}' for a, b in df.columns]
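A small runnable version of this answer on made-up data (the candidate names and the Skill and City columns are hypothetical), showing how the per-group counter turns into column suffixes:

```python
import pandas as pd

df = pd.DataFrame({
    'Candidate_Name': ['Ann', 'Ann', 'Bob'],
    'Skill': ['SQL', 'Python', 'Java'],
    'City': ['Oslo', 'Bergen', 'Rome'],
})

# Number the rows within each candidate: 0, 1, ... per name
counter = df.groupby('Candidate_Name').cumcount()

# Index by (name, counter), then move the counter level into the columns
wide = df.set_index(['Candidate_Name', counter]).unstack()

# First occurrence keeps its name; later ones get _1, _2, ...
wide.columns = [a if b == 0 else f'{a}_{b}' for a, b in wide.columns]
print(wide)
```

Candidates with fewer rows than the largest group get NaN in the extra columns (Bob has no Skill_1 or City_1 here).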
I have a DataFrame with 4 columns: the first 3 are numerical variables describing the features of the value in the last column, and the last column contains strings.
I want to merge the strings in the last column, grouped by the previous 3 columns, using groupby. That part works: strings that share the same features in the first three columns are merged successfully.
Previously the DataFrame had 1200 rows, and the merged DataFrame has 1100. I found that the latter has a MultiIndex (hierarchical index) and only 2 columns, so I tried reindex with a generated ascending list of integers. Sadly, it failed.
df1.columns
*[Out]Index(['time', 'column','author', 'text'], dtype='object')
series = df1.groupby(['time', 'column', 'author'])['body_text'].sum()  # merge the last column by the first 3 columns
dfx = series.to_frame()  # get the new df
dfx.columns
*[Out]Index(['author', 'text'], dtype='object')
len(dfx)
*[Out]1100
indexs = list(range(1100))
dfx.reindex(index = indexs)
*[Out]Exception: cannot handle a non-unique multi-index!
Reindexing is not necessary here; it is better to call reset_index on the result, or to pass as_index=False to DataFrame.groupby:
dfx = df1.groupby(['time', 'column','author'])['body_text'].sum().reset_index()
Or:
dfx = df1.groupby(['time', 'column','author'], as_index=False)['body_text'].sum()
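A runnable sketch of both variants on a tiny made-up frame (the values are hypothetical; body_text is the column name the answer uses). Summing an object column concatenates the strings within each group, and both variants return a flat DataFrame with a default integer index, so no reindexing is needed.

```python
import pandas as pd

df1 = pd.DataFrame({
    'time': [1, 1, 2],
    'column': [10, 10, 20],
    'author': ['a', 'a', 'b'],
    'body_text': ['Hello ', 'world', 'Hi'],
})

keys = ['time', 'column', 'author']

# Variant 1: aggregate, then turn the group keys back into columns
dfx = df1.groupby(keys)['body_text'].sum().reset_index()

# Variant 2: ask groupby not to put the keys into the index at all
dfx2 = df1.groupby(keys, as_index=False)['body_text'].sum()

print(dfx)
```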