I want to shift some columns in the middle of the dataframe to the rightmost.
I can do this for an individual column with this code:
cols=list(df.columns.values)
cols.pop(cols.index('one_column'))
df=df[cols +['one_column']]
df
But it's inefficient to do this one column at a time when there are 100 columns spanning 2 series, i.e. series1_1 ... series1_50 and series2_1 ... series2_50, in the middle of the dataframe.
How can I do it by assigning the 2 series as lists, popping them and putting them back? Maybe something like
cols=list(df.columns.values)
series1 = list(df.loc['series1_1':'series1_50'])
series2 = list(df.loc['series2_1':'series2_50'])
cols.pop('series1', 'series2')
df=df[cols +['series1', 'series2']]
but this didn't work. Thanks
If you just want to shift the columns, you could call concat like this:
cols_to_shift = ['colA', 'colB']
pd.concat([
    df[df.columns.difference(cols_to_shift)],
    df[cols_to_shift]
], axis=1)
Or, you could do a little list manipulation on the columns.
cols_to_keep = [c for c in df.columns if c not in cols_to_shift]
df[cols_to_keep + cols_to_shift]
Minimal Example
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(1, 10, (3, 5)), columns=list('ABCDE'))
df
A B C D E
0 6 1 4 4 8
1 4 6 3 5 8
2 7 9 9 2 7
cols_to_shift = ['B', 'C']
pd.concat([
    df[df.columns.difference(cols_to_shift)],
    df[cols_to_shift]
], axis=1)
A D E B C
0 6 4 8 1 4
1 4 5 8 6 3
2 7 2 7 9 9
cols_to_keep = [c for c in df.columns if c not in cols_to_shift]
df[cols_to_keep + cols_to_shift]
A D E B C
0 6 4 8 1 4
1 4 5 8 6 3
2 7 2 7 9 9
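If the columns to move follow the series1_/series2_ naming from the question, the shift list can be built programmatically rather than typed out. A minimal sketch, assuming that prefix pattern:
# Collect every column belonging to the two series by name prefix
cols_to_shift = [c for c in df.columns if c.startswith(('series1_', 'series2_'))]
cols_to_keep = [c for c in df.columns if c not in cols_to_shift]
df = df[cols_to_keep + cols_to_shift]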
I think list.pop only takes indices of the elements in the list.
You should use list.remove instead.
cols = df.columns.tolist()
for s in ('series1', 'series2'):
    cols.remove(s)
df = df[cols + ['series1', 'series2']]
Related
I have a dataframe like this
df = pd.DataFrame(
np.arange(2, 11).reshape(-1, 3),
index=list('ABC'),
columns=pd.MultiIndex.from_arrays([
['data1', 'data2','data3'],
['F', 'K',''],
['', '','']
], names=['meter', 'Sleeper',''])
).rename_axis('Index')
df
meter data1 data2 data3
Sleeper F K
Index
A 2 3 4
B 5 6 7
C 8 9 10
So I want to join the level names and flatten the columns,
following this solution: Pandas dataframe with multiindex column - merge levels
df.columns = df.columns.map('_'.join).str.strip('|')
df.reset_index(inplace=True)
Getting this
Index data1_F_ data2_K_ data3__
0 A 2 3 4
1 B 5 6 7
2 C 8 9 10
but I don't want those _ at the end of the column names, so I added
df.columns = df.columns.apply(lambda x: x[:-1] if x.endswith('_') else x)
df
But got
AttributeError: 'Index' object has no attribute 'apply'
How can I combine map and apply to flatten the column names and remove the _ at the end of the column names in one run?
expected output
Index data1_F data2_K data3
0 A 2 3 4
1 B 5 6 7
2 C 8 9 10
Thanks
You can try this:
df.columns = df.columns.map('_'.join).str.strip('_')
df
Out[132]:
data1_F data2_K data3
Index
A 2 3 4
B 5 6 7
C 8 9 10
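To also get Index as a column, as in the expected output, follow the same line with reset_index (the question already used it):
df.columns = df.columns.map('_'.join).str.strip('_')
df.reset_index(inplace=True)
df
  Index  data1_F  data2_K  data3
0     A        2        3      4
1     B        5        6      7
2     C        8        9     10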
Having two dataframes df1 and df2 (same number of rows), how can we, very simply, take all the columns from df2 and add them to df1? Using join, we join them on the index or a given column, but assume their indexes are completely different and they have no columns in common. Is that doable (without the obvious way of looping over each column in df2 and adding them as new columns to df1)?
EDIT: added an example.
Note: no index or column names are mentioned since they should not matter (that is the "problem").
df1 = [[1, 3, 2],
       [11, 20, 33]]
df2 = [["bird", np.nan, 37, np.sqrt(2)],
       ["dog", 0.123, 3.14, 0]]
pd.some_operation(df1, df2)
#[[1, 3, 2, "bird", np.nan, 37, np.sqrt(2)],
# [11, 20, 33, "dog", 0.123, 3.14, 0]]
Samples:
df1 = pd.DataFrame({
'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
}, index = list('QRSTUW'))
df2 = pd.DataFrame({
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
}, index = list('KLMNOP'))
Pandas always uses index values with join or with concat on axis=1, so for correct alignment it is necessary to create the same index values:
df = df1.join(df2.set_index(df1.index))
df = pd.concat([df1, df2.set_index(df1.index)], axis=1)
print (df)
A B C D E F
Q a 4 7 1 5 a
R b 5 8 3 3 a
S c 4 9 5 6 a
T d 5 4 7 9 b
U e 5 2 1 2 b
W f 4 3 0 4 b
Or create a default index in both DataFrames:
df = df1.reset_index(drop=True).join(df2.reset_index(drop=True))
df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
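Both variants assume df1 and df2 have the same number of rows (as the question states); a quick length check before aligning can catch mismatches early. A minimal sketch:
assert len(df1) == len(df2), "frames must have the same number of rows"
df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)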
I have to copy columns from one DataFrame A to another DataFrame B. The column names in A and B do not match.
What is the best way to do it? There are several columns like this. Do I need to write it out for each column, like B["SO"] = A["Sales Order"], etc.?
I would use pd.concat:
combined_df = pd.concat([df1, df2[['column_a', 'column_b']]], axis=1)
It also gives you the power to concat different-sized dataframes, outer join, etc.
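For example, with frames of different lengths the default outer join fills the missing rows with NaN. A small sketch with made-up frames:
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'b': ['x', 'y']})
print(pd.concat([df1, df2], axis=1))
#    a    b
# 0  1    x
# 1  2    y
# 2  3  NaN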
Use:
df1 = pd.DataFrame({
'SO':list('abcdef'),
'RI':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
})
print (df1)
SO RI C
0 a 4 7
1 b 5 8
2 c 4 9
3 d 5 4
4 e 5 2
5 f 4 3
df2 = pd.DataFrame({
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')
})
print (df2)
D E F
0 1 5 a
1 3 3 a
2 5 6 a
3 7 9 b
4 1 2 b
5 0 4 b
Create a dictionary for renaming, select the matching columns, rename them with the dict, and DataFrame.join to the original - the DataFrames are matched by index values:
d = {'SO':'Sales Order',
'RI':'Retail Invoices'}
df11 = df1[d.keys()].rename(columns=d)
print (df11)
Sales Order Retail Invoices
0 a 4
1 b 5
2 c 4
3 d 5
4 e 5
5 f 4
df = df2.join(df11)
print (df)
D E F Sales Order Retail Invoices
0 1 5 a a 4
1 3 3 a b 5
2 5 6 a c 4
3 7 9 b d 5
4 1 2 b e 5
5 0 4 b f 4
Make a dictionary of abbreviations and try this code.
Ex:
full_form_dict = {'SO':'Sales Order',
'RI':'Retail Invoices',}
A_col = list(A.columns)
B_col = [v for k,v in full_form_dict.items() if k in A_col]
# to loop over A_col
# B_col = [v for col in A_col for k,v in full_form_dict.items() if k == col]
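To actually perform the copy with such a mapping, one option is to loop over the dictionary. A sketch, assuming A holds the full column names and B receives the abbreviated ones:
full_form_dict = {'SO': 'Sales Order', 'RI': 'Retail Invoices'}
for abbrev, full in full_form_dict.items():
    if full in A.columns:
        # .values avoids index alignment between the two frames
        # (assumes both frames have the same number of rows)
        B[abbrev] = A[full].values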
I have a "sample.txt" like this.
idx A B C D cat
J 1 2 3 1 x
K 4 5 6 2 x
L 7 8 9 3 y
M 1 2 3 4 y
N 4 5 6 5 z
O 7 8 9 6 z
With this dataset, I want to get sums by row and by column.
By row, it is not a big deal.
I made a result like this.
### MY CODE ###
import pandas as pd
df = pd.read_csv('sample.txt',sep="\t",index_col='idx')
df.info()
df2 = df.groupby('cat').sum()
print( df2 )
The result is like this.
A B C D
cat
x 5 7 9 3
y 8 10 12 7
z 11 13 15 11
But I don't know how to write code to get a result like this
(simply add the values in columns A and B, as well as columns C and D):
AB CD
J 3 4
K 9 8
L 15 12
M 3 7
N 9 11
O 15 15
Could anybody help with how to write this code?
By the way, I don't want to do it like this
(it looks too dull, but if it is the only way, I'll accept it):
df2 = df['A'] + df['B']
df3 = df['C'] + df['D']
df = pd.DataFrame([df2,df3],index=['AB','CD']).transpose()
print( df )
When you pass a dictionary or callable to groupby, it gets applied along an axis. I specified axis=1, which is columns.
d = dict(A='AB', B='AB', C='CD', D='CD')
df.groupby(d, axis=1).sum()
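Note: in newer pandas versions, grouping columns via axis=1 is deprecated; the same idea can be expressed by grouping the transposed frame. A sketch, assuming the non-numeric cat column is excluded first:
d = dict(A='AB', B='AB', C='CD', D='CD')
# Transpose so the column labels become the index, group them via the dict, sum, transpose back
df[['A', 'B', 'C', 'D']].T.groupby(d).sum().T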
Use concat with sum:
df = df.set_index('idx')
df = pd.concat([df[['A', 'B']].sum(1), df[['C', 'D']].sum(1)], axis=1, keys=['AB','CD'])
print( df)
AB CD
idx
J 3 4
K 9 8
L 15 12
M 3 7
N 9 11
O 15 15
Does this do what you need? By using axis=1 with DataFrame.apply, you can use the data that you want in a row to construct a new column. Then you can drop the columns that you don't want anymore.
In [1]: import pandas as pd
In [5]: df = pd.DataFrame(columns=['A', 'B', 'C', 'D'], data=[[1, 2, 3, 4], [1, 2, 3, 4]])
In [6]: df
Out[6]:
A B C D
0 1 2 3 4
1 1 2 3 4
In [7]: df['CD'] = df.apply(lambda x: x['C'] + x['D'], axis=1)
In [8]: df
Out[8]:
A B C D CD
0 1 2 3 4 7
1 1 2 3 4 7
In [13]: df.drop(['C', 'D'], axis=1)
Out[13]:
A B CD
0 1 2 7
1 1 2 7
I know this is probably a basic question, but somehow I can't find the answer. I was wondering how it's possible to return a value from a dataframe if I know the row and column to look for? E.g. If I have a dataframe with columns 1-4 and rows A-D, how would I return the value for B4?
You can use ix for this (note: .ix is deprecated and removed in newer pandas versions; a .loc equivalent is shown after the example):
In [236]:
df = pd.DataFrame(np.random.randn(4,4), index=list('ABCD'), columns=[1,2,3,4])
df
Out[236]:
1 2 3 4
A 1.682851 0.889752 -0.406603 -0.627984
B 0.948240 -1.959154 -0.866491 -1.212045
C -0.970505 0.510938 -0.261347 -1.575971
D -0.847320 -0.050969 -0.388632 -1.033542
In [237]:
df.ix['B',4]
Out[237]:
-1.2120448782618383
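Since .ix has been removed in current pandas versions, the equivalent label-based lookup uses .loc:
df.loc['B', 4]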
Use at, if rows are A-D and columns 1-4:
print (df.at['B', 4])
If rows are 1-4 and columns A-D:
print (df.at[4, 'B'])
Fast scalar value getting and setting.
Sample:
df = pd.DataFrame(np.arange(16).reshape(4,4),index=list('ABCD'), columns=[1,2,3,4])
print (df)
1 2 3 4
A 0 1 2 3
B 4 5 6 7
C 8 9 10 11
D 12 13 14 15
print (df.at['B', 4])
7
df = pd.DataFrame(np.arange(16).reshape(4,4),index=[1,2,3,4], columns=list('ABCD'))
print (df)
A B C D
1 0 1 2 3
2 4 5 6 7
3 8 9 10 11
4 12 13 14 15
print (df.at[4, 'B'])
13
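Since .at supports both getting and setting, the same accessor can also write a single cell. A minimal sketch with the second sample frame:
df.at[4, 'B'] = 99   # set a single cell by label
print(df.at[4, 'B'])
99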