I'm hoping to replace values in all columns within a df using integers from a specified column. Using the df below I want to use the values in Code and replace them in all other columns.
df = pd.DataFrame({
'Place' : ['X','Y','X','Y','X','Y','X','Y'],
'Number' : ['A','B','C','D','F','G','H','I'],
'Code' : [1,2,3,0,1,2,5,4],
'Value' : ['','','','','','','','']
})
df[:] = df['Code'].apply(lambda x: x if np.isreal(x) else 0).astype(int)
print(df)
Intended Output:
Place Number Code Value
0 1 1 1 1
1 2 2 2 2
2 3 3 3 3
3 0 0 0 0
4 1 1 1 1
5 2 2 2 2
6 5 5 5 5
7 4 4 4 4
Use reindex, ffill, bfill
df[['Code']].reindex(columns=df.columns).ffill(1).bfill(1).astype(int)
Out[256]:
Place Number Code Value
0 1 1 1 1
1 2 2 2 2
2 3 3 3 3
3 0 0 0 0
4 1 1 1 1
5 2 2 2 2
6 5 5 5 5
7 4 4 4 4
Numpy solution
df[:] = np.transpose([df.Code] * df.shape[1])
Out[314]:
Place Number Code Value
0 1 1 1 1
1 2 2 2 2
2 3 3 3 3
3 0 0 0 0
4 1 1 1 1
5 2 2 2 2
6 5 5 5 5
7 4 4 4 4
Try this:
df[df.columns] = df[['Code', 'Code', 'Code', 'Code']]
or:
df[df.columns] = df[['Code']*len(df.columns)]
Hope it helps you.
Related
I have an excel dataset which contains 100 rows and 100 clolumns with order frequencies in locations described by x and y.(gird like structure)
I'd like to convert it to the following structure with 3 columns:
x-Coördinaten | y-Coördinaten | value
The "value" column only contains positive integers. The x and y column contain float type data (geograohical coordinates.
The order does not matter, as it can easily be sorted afterwards.
So,basicly a merge of lists could work, e.g.:
[[1,5,3,5], [4,2,5,6], [2,3,1,5]] ==> [1,5,3,5,4,2,5,6,2,3,1,5]
But then i would lose the location...which is key for my project.
What is the best way to accomplish this?
Assuming this input:
l = [[1,5,3,5],[4,2,5,6],[2,3,1,5]]
df = pd.DataFrame(l)
you can use stack:
df2 = df.rename_axis(index='x', columns='y').stack().reset_index(name='value')
output:
x y value
0 0 0 1
1 0 1 5
2 0 2 3
3 0 3 5
4 1 0 4
5 1 1 2
6 1 2 5
7 1 3 6
8 2 0 2
9 2 1 3
10 2 2 1
11 2 3 5
or melt for a different order:
df2 = df.rename_axis('x').reset_index().melt('x', var_name='y', value_name='value')
output:
x y value
0 0 0 1
1 1 0 4
2 2 0 2
3 0 1 5
4 1 1 2
5 2 1 3
6 0 2 3
7 1 2 5
8 2 2 1
9 0 3 5
10 1 3 6
11 2 3 5
You should be able to get the results with a melt operation -
df = pd.DataFrame(np.arange(9).reshape(3, 3))
df.columns = [2, 3, 4]
df.loc[:, 'x'] = [3, 4, 5]
This is what df looks like
2 3 4 x
0 0 1 2 3
1 3 4 5 4
2 6 7 8 5
The melt operation -
df.melt(id_vars='x', var_name='y')
output -
x y value
0 3 2 0
1 4 2 3
2 5 2 6
3 3 3 1
4 4 3 4
5 5 3 7
6 3 4 2
7 4 4 5
8 5 4 8
I have a dataframe. I assigned a uniuqe value to each group. But also want to assign a unique value to each element or subgroup of each group.
df = pd.DataFrame({'A':[1,2,3,4,6,3,7,3,2],'B':[4,3,8,2,6,3,9,1,0], 'C':['a','a','c','b','b','b','b','c','c']})
I assigned a unique value to each group as follow
df.groupby('C').ngroup()
But i want output as
index grp subgrp
0 0 0
1 0 1
2 2 0
3 1 0
4 1 1
5 1 2
6 1 3
7 2 1
8 2 2
Adding cumcount after get the grp column
df['grp'] = df.groupby('C').ngroup()
df['subgrp'] = df.groupby('grp').cumcount()
df
Out[356]:
A B C grp subgrp
0 1 4 a 0 0
1 2 3 a 0 1
2 3 8 c 2 0
3 4 2 b 1 0
4 6 6 b 1 1
5 3 3 b 1 2
6 7 9 b 1 3
7 3 1 c 2 1
8 2 0 c 2 2
I have a Pandas dataframe like this :
id A B
0 2 2
1 1 1
2 3 3
3 7 7
And I want to duplicate the first row 3 times just below the selected row :
id A B
0 2 2
1 2 2
2 2 2
3 2 2
4 1 1
5 3 3
6 7 7
Is there a method that already exist in Pandas library ?
There is no built-in method for doing just this. However, you can create a list of indexes, and use df.loc + df.index.repeat:
new_df = df.loc[df.index.repeat([4] + [1] * (len(df) - 1))].reset_index(drop=True)
Output:
>>> new_df
id A B
0 0 2 2
1 0 2 2
2 0 2 2
3 0 2 2
4 1 1 1
5 2 3 3
6 3 7 7
Use reindex and Index.repeat to create your dataframe:
>>> df.reindex(df.index.repeat([3] + [1] * (len(df) - 1)))
id A B
0 0 2 2
0 0 2 2
0 0 2 2
1 1 1 1
2 2 3 3
3 3 7 7
Another way:
>>> df.loc[[df.index[0]]*3 + df.index[1:].tolist()]
id A B
0 0 2 2
0 0 2 2
0 0 2 2
1 1 1 1
2 2 3 3
3 3 7 7
A more generalized way proposed by #MuhammadHassan:
row_index = 0
repeat_time = 3
>>> df.reindex(df.index.tolist() + [row_index]*repeat_time).sort_index()
id A B
0 0 2 2
0 0 2 2
0 0 2 2
0 0 2 2
1 1 1 1
2 2 3 3
3 3 7 7
Let us try
n=3
row = 0
df = df.append(df.loc[[row]*(n-1)]).sort_index()
df
id A B
0 0 2 2
0 0 2 2
0 0 2 2
1 1 1 1
2 2 3 3
3 3 7 7
I would like to combine two columns in a new column.
Lets suppose I have:
Index A B
0 1 0
1 1 0
2 1 0
3 1 0
4 1 0
5 1 2
6 1 2
7 1 2
8 1 2
9 1 2
10 1 2
Now I would like to create a column C with the entries from A from Index 0 to 4 and from column B from Index 5 to 10. It should look like this:
Index A B C
0 1 0 1
1 1 0 1
2 1 0 1
3 1 0 1
4 1 0 1
5 1 2 2
6 1 2 2
7 1 2 2
8 1 2 2
9 1 2 2
10 1 2 2
Is there a python code how I can get this? Thanks in advance!
If Index is an actual column you can use numpy.where and specify your condition
import numpy as np
df['C'] = np.where(df['Index'] <= 4, df['A'], df['B'])
Index A B C
0 0 1 0 1
1 1 1 0 1
2 2 1 0 1
3 3 1 0 1
4 4 1 0 1
5 5 1 2 2
6 6 1 2 2
7 7 1 2 2
8 8 1 2 2
9 9 1 2 2
10 10 1 2 2
if your index is your actual index
you can slice your indices with iloc and create your column with concat.
df['C'] = pd.concat([df['A'].iloc[:5], df['B'].iloc[5:]])
print(df)
A B C
0 1 0 1
1 1 0 1
2 1 0 1
3 1 0 1
4 1 0 1
5 1 2 2
6 1 2 2
7 1 2 2
8 1 2 2
9 1 2 2
10 1 2 2
I have original dataframe:
ID T value
1 0 1
1 4 3
2 0 0
2 4 1
2 7 3
The value is same previous row.
The output should be like:
ID T value
1 0 1
1 1 1
1 2 1
1 3 1
1 4 3
2 0 0
2 1 0
2 2 0
2 3 0
2 4 1
2 5 1
2 6 1
2 7 3
... ... ...
I tried loop it take long time process.
Any idea how to solve this for large dataframe?
Thanks!
For solution is necessary unique integer values in T for each group.
Use groupby with custom function - for each group use reindex and then replace NaNs in value column by forward filling ffill:
df1 = (df.groupby('ID')['T', 'value']
.apply(lambda x: x.set_index('T').reindex(np.arange(x['T'].min(), x['T'].max() + 1)))
.ffill()
.astype(int)
.reset_index())
print (df1)
ID T value
0 1 0 1
1 1 1 1
2 1 2 1
3 1 3 1
4 1 4 3
5 2 0 0
6 2 1 0
7 2 2 0
8 2 3 0
9 2 4 1
10 2 5 1
11 2 6 1
12 2 7 3
If get error:
ValueError: cannot reindex from a duplicate axis
it means some duplicated values per group like:
print (df)
ID T value
0 1 0 1
1 1 4 3
2 2 0 0
3 2 4 1 <-4 is duplicates per group 2
4 2 4 3 <-4 is duplicates per group 2
5 2 7 3
Solution is aggregate values first for unique T - e.g.by sum:
df = df.groupby(['ID', 'T'], as_index=False)['value'].sum()
print (df)
ID T value
0 1 0 1
1 1 4 3
2 2 0 0
3 2 4 4
4 2 7 3