Python: Replace a cell value in Dataframe with if statement - python

I have a matrix with that looks like this:
com 0 1 2 3 4 5
AAA 0 5 0 4 2 1 4
ABC 0 9 8 9 1 0 3
ADE 1 4 3 5 1 0 1
BCD 1 6 7 8 3 4 1
BCF 2 3 4 2 1 3 0 ...
Where AAA, ABC ... is the dataframe index. The dataframe columns are com 0 1 3 4 5 6
I want to set the cell values in my dataframe equal to 0 when the row values of com is equal the column "number". So for instance, the above matrix will look like:
com 0 1 2 3 4 5
AAA 0 0 0 4 2 1 4
ABC 0 0 8 9 1 0 3
ADE 1 4 0 5 1 0 1
BCD 1 6 0 8 3 4 1
BCF 2 3 4 0 1 3 0 ...
I tried to iterate over rows and use both .loc and .ix but no success.

Just require some numpy trick
In [22]:
print df
0 1 2 3 4 5
0 5 0 4 2 1 4
0 9 8 9 1 0 3
1 4 3 5 1 0 1
1 6 7 8 3 4 1
2 3 4 2 1 3 0
[5 rows x 6 columns]
In [23]:
#making a masking matrix, 0 where column and index values equal, 1 elsewhere, kind of the vectorized way of doing if TURE 0, else 1
print df*np.where(df.columns.values==df.index.values[..., np.newaxis], 0,1)
0 1 2 3 4 5
0 0 0 4 2 1 4
0 0 8 9 1 0 3
1 4 0 5 1 0 1
1 6 0 8 3 4 1
2 3 4 0 1 3 0
[5 rows x 6 columns]

I think this should work.
for line in range(len(matrix)):
matrix[matrix[line][0]+1]=0
NOTE
Depending on your matrix setup you may not need the +1
Basically it takes the first digit of each line in the matrix and uses that as the index of the value to change to 0
i.e. if the row was
c 0 1 2 3 4 5
AAA 4 3 2 3 9 5 9,
it would change the 5 below the number 4 to 0
c 0 1 2 3 4 5
AAA 4 3 2 3 9 0 9

Related

Grid-like dataframe to list

I have an excel dataset which contains 100 rows and 100 clolumns with order frequencies in locations described by x and y.(gird like structure)
I'd like to convert it to the following structure with 3 columns:
x-Coördinaten | y-Coördinaten | value
The "value" column only contains positive integers. The x and y column contain float type data (geograohical coordinates.
The order does not matter, as it can easily be sorted afterwards.
So,basicly a merge of lists could work, e.g.:
[[1,5,3,5], [4,2,5,6], [2,3,1,5]] ==> [1,5,3,5,4,2,5,6,2,3,1,5]
But then i would lose the location...which is key for my project.
What is the best way to accomplish this?
Assuming this input:
l = [[1,5,3,5],[4,2,5,6],[2,3,1,5]]
df = pd.DataFrame(l)
you can use stack:
df2 = df.rename_axis(index='x', columns='y').stack().reset_index(name='value')
output:
x y value
0 0 0 1
1 0 1 5
2 0 2 3
3 0 3 5
4 1 0 4
5 1 1 2
6 1 2 5
7 1 3 6
8 2 0 2
9 2 1 3
10 2 2 1
11 2 3 5
or melt for a different order:
df2 = df.rename_axis('x').reset_index().melt('x', var_name='y', value_name='value')
output:
x y value
0 0 0 1
1 1 0 4
2 2 0 2
3 0 1 5
4 1 1 2
5 2 1 3
6 0 2 3
7 1 2 5
8 2 2 1
9 0 3 5
10 1 3 6
11 2 3 5
You should be able to get the results with a melt operation -
df = pd.DataFrame(np.arange(9).reshape(3, 3))
df.columns = [2, 3, 4]
df.loc[:, 'x'] = [3, 4, 5]
This is what df looks like
2 3 4 x
0 0 1 2 3
1 3 4 5 4
2 6 7 8 5
The melt operation -
df.melt(id_vars='x', var_name='y')
output -
x y value
0 3 2 0
1 4 2 3
2 5 2 6
3 3 3 1
4 4 3 4
5 5 3 7
6 3 4 2
7 4 4 5
8 5 4 8

Python: create new column conditionally on values from two other columns

I would like to combine two columns in a new column.
Lets suppose I have:
Index A B
0 1 0
1 1 0
2 1 0
3 1 0
4 1 0
5 1 2
6 1 2
7 1 2
8 1 2
9 1 2
10 1 2
Now I would like to create a column C with the entries from A from Index 0 to 4 and from column B from Index 5 to 10. It should look like this:
Index A B C
0 1 0 1
1 1 0 1
2 1 0 1
3 1 0 1
4 1 0 1
5 1 2 2
6 1 2 2
7 1 2 2
8 1 2 2
9 1 2 2
10 1 2 2
Is there a python code how I can get this? Thanks in advance!
If Index is an actual column you can use numpy.where and specify your condition
import numpy as np
df['C'] = np.where(df['Index'] <= 4, df['A'], df['B'])
Index A B C
0 0 1 0 1
1 1 1 0 1
2 2 1 0 1
3 3 1 0 1
4 4 1 0 1
5 5 1 2 2
6 6 1 2 2
7 7 1 2 2
8 8 1 2 2
9 9 1 2 2
10 10 1 2 2
if your index is your actual index
you can slice your indices with iloc and create your column with concat.
df['C'] = pd.concat([df['A'].iloc[:5], df['B'].iloc[5:]])
print(df)
A B C
0 1 0 1
1 1 0 1
2 1 0 1
3 1 0 1
4 1 0 1
5 1 2 2
6 1 2 2
7 1 2 2
8 1 2 2
9 1 2 2
10 1 2 2

pandas: replace values by row based on condition

I have a pandas dataframe as follows:
df2
amount 1 2 3 4
0 5 1 1 1 1
1 7 0 1 1 1
2 9 0 0 0 1
3 8 0 0 1 0
4 2 0 0 0 1
What I want to do is replace the 1s on every row with the value of the amount field in that row and leave the zeros as is. The output should look like this
amount 1 2 3 4
0 5 5 5 5 5
1 7 0 7 7 7
2 9 0 0 0 9
3 8 0 0 8 0
4 2 0 0 0 2
I've tried applying a lambda function row-wise like this, but I'm running into errors
df2.apply(lambda x: x.loc[i].replace(0, x['amount']) for i in len(x), axis=1)
Any help would be much appreciated. Thanks
Let's use mask:
df2.mask(df2 == 1, df2['amount'], axis=0)
Output:
amount 1 2 3 4
0 5 5 5 5 5
1 7 0 7 7 7
2 9 0 0 0 9
3 8 0 0 8 0
4 2 0 0 0 2
You can also do it wit pandas.DataFrame.mul() method, like this:
>>> df2.iloc[:, 1:] = df2.iloc[:, 1:].mul(df2['amount'], axis=0)
>>> print(df2)
amount 1 2 3 4
0 5 5 5 5 5
1 7 0 7 7 7
2 9 0 0 0 9
3 8 0 0 8 0
4 2 0 0 0 2

Stacking Pandas Dataframe without dropping row

Currently, I have a dataframe like this:
0 0 0 3 0 0
0 7 8 9 1 0
0 4 5 2 4 0
My code to stack it:
dt = dataset.iloc[:,0:7].stack().sort_index(level=1).reset_index(level=0, drop=True).to_frame()
dt['variable'] = pandas.Categorical(dt.index).codes+1
dt.rename(columns={0:index_column_name}, inplace=True)
dt.set_index(index_column_name, inplace=True)
dt['variable'] = numpy.sort(dt['variable'])
However, it drops the first row when I'm stacking it, and I want to keep the headers / first row, how would I achieve this?
In essence, I'm losing the data from the first row (a.k.a column headers) and I want to keep it.
Desired Output:
value,variable
0 1
0 1
0 1
0 2
7 2
4 2
0 3
8 3
5 3
3 4
9 4
2 4
0 5
1 5
4 5
0 6
0 6
0 6
Current output:
value,variable
0 1
0 1
7 2
4 2
8 3
5 3
9 4
2 4
1 5
4 5
0 6
0 6
Why not use df.melt as #WeNYoBen mentioned?
print(df)
1 2 3 4 5 6
0 0 0 0 3 0 0
1 0 7 8 9 1 0
2 0 4 5 2 4 0
print(df.melt())
variable value
0 1 0
1 1 0
2 1 0
3 2 0
4 2 7
5 2 4
6 3 0
7 3 8
8 3 5
9 4 3
10 4 9
11 4 2
12 5 0
13 5 1
14 5 4
15 6 0
16 6 0
17 6 0

Repeating rows in a DataFrame based on a column

I have a dataframe now:
class1 class2 value value2
0 1 0 1 4
1 2 1 2 3
2 2 0 3 5
3 3 1 4 6
I want to repeat rows and insert an increment column in the same amount according to the difference between value and value2. I want to get the dataframe should like this:
class1 class2 value value2 value3
0 1 0 1 4 1
1 1 0 1 4 2
2 1 0 1 4 3
3 1 0 1 4 4
4 2 1 2 3 2
5 2 1 2 3 3
6 2 0 3 5 3
7 2 0 3 5 4
8 2 0 3 5 5
9 3 1 4 6 4
10 3 1 4 6 5
11 3 1 4 6 6
I tried it like:
def func(x):
copy = x.copy()
num = x.value2+1-x.value
return pd.concat([copy]*num.values[0])
df= df.groupby(['class1','class2']).apply(lambda x:func(x))
But there will be a oredr problem that leads me to not know how to add column value3. And I'd like to have an elegant way of doing it.
Can anyone help me? Thanks in advance
Compute the difference and call Index.repeat:
idx = df.index.repeat(df.value2 - df.value + 1)
Now, either use reindex:
df = df.reindex(idx).reset_index(drop=True)
Or loc:
df = df.loc[idx].reset_index(drop=True)
And you get
df
class1 class2 value value2
0 1 0 1 4
1 1 0 1 4
2 1 0 1 4
3 1 0 1 4
4 2 1 2 3
5 2 1 2 3
6 2 0 3 5
7 2 0 3 5
8 2 0 3 5
9 3 1 4 6
10 3 1 4 6
11 3 1 4 6
For the second part of your question, you'll need groupby.cumcount:
s = idx.to_series()
df['value3'] = df['value'] + s.groupby(idx).cumcount().values
df
class1 class2 value value2 value3
0 1 0 1 4 1
1 1 0 1 4 2
2 1 0 1 4 3
3 1 0 1 4 4
4 2 1 2 3 2
5 2 1 2 3 3
6 2 0 3 5 3
7 2 0 3 5 4
8 2 0 3 5 5
9 3 1 4 6 4
10 3 1 4 6 5
11 3 1 4 6 6
Here's a sequence of things that would get you the desired output:
df.join(df
.apply(lambda x: pd.Series(range(x.value, x.value2+1)), axis=1)
.stack().astype(int)
.reset_index(level=1, drop=1)
.to_frame('value3')).reset_index(drop=1)
Out[]:
class1 class2 value value2 value3
0 1 0 1 4 1
1 1 0 1 4 2
2 1 0 1 4 3
3 1 0 1 4 4
4 2 1 2 3 2
5 2 1 2 3 3
6 2 0 3 5 3
7 2 0 3 5 4
8 2 0 3 5 5
9 3 1 4 6 4
10 3 1 4 6 5
11 3 1 4 6 6

Categories