Say I have a dataframe with negative values on specific columns:
df = pd.DataFrame([[1, 1, -1],[-1, 1, 1],[-1, -1, 1]])
Now, I want to inplace clip the negative values to 0 on only specific lines and columns:
df.loc[[1, 2], [0, 1]].clip(lower=0, inplace=True)
But this doesn't work:
df
Out:
   0  1  2
0  1  1 -1
1 -1  1  1
2 -1 -1  1
This is because slicing a DataFrame with a list of integers returns a copy:
df.loc[[1, 2], [0, 1]] is df.loc[[1, 2], [0, 1]]
Out: False
How do I make inplace changes to specific rows and columns then?
How about using df.lt instead:
df[df.loc[[1, 2], [0, 1]].lt(0)] = 0
print(df)
   0  1  2
0  1  1 -1
1  0  1  1
2  0  0  1
You can do this:
df.loc[[1, 2], [0, 1]] = df.loc[[1, 2], [0, 1]].clip(lower=0)
Output:
   0  1  2
0  1  1 -1
1  0  1  1
2  0  0  1
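For completeness, `DataFrame.where` achieves the same in-place effect as the assignment above — a sketch using the frame from the question:

```python
import pandas as pd

df = pd.DataFrame([[1, 1, -1], [-1, 1, 1], [-1, -1, 1]])

rows, cols = [1, 2], [0, 1]
block = df.loc[rows, cols]
# where() keeps values satisfying the condition and replaces the rest,
# so this clips the negatives in the selected block to 0
df.loc[rows, cols] = block.where(block >= 0, 0)
```

The write-back through `df.loc[rows, cols] = ...` is what actually mutates `df`; any method called on the slice itself only modifies a copy.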
Here's an example of DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame([
[0, "file_0", 5],
[0, "file_1", 0],
[1, "file_2", 0],
[1, "file_3", 8],
[2, "file_4", 0],
[2, "file_5", 5],
[2, "file_6", 100],
[2, "file_7", 0],
[2, "file_8", 50]
], columns=["case", "filename", "num"])
I want to select the rows where num == 0 together with their immediately preceding rows that have the same case value, regardless of the num value of the preceding row.
Finally, we should get
case  filename  num
   0    file_0    5
   0    file_1    0
   1    file_2    0
   2    file_4    0
   2    file_6  100
   2    file_7    0
I have got that I can select the previous row by
df[(df['num']==0).shift(-1).fillna(False)]
However, this doesn't take the case value into account. One solution that came to my mind is to group by case first and then filter the data, but I have no idea how to code it...
I figured out the answer myself:
# create a boolean mask which is True when `num` is 0 and the previous `case` is the same
mask = (df.case.eq(df.case.shift())) & (df['num']==0)
# concat previous rows and num==0 rows
df_res = pd.concat([df[mask.shift(-1).fillna(False)], df[df['num']==0]]).sort_values(['case', 'filename'])
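The same idea can also be written as a single boolean mask, avoiding the concat/sort step — a sketch using the frame from the question:

```python
import pandas as pd

df = pd.DataFrame([
    [0, "file_0", 5],
    [0, "file_1", 0],
    [1, "file_2", 0],
    [1, "file_3", 8],
    [2, "file_4", 0],
    [2, "file_5", 5],
    [2, "file_6", 100],
    [2, "file_7", 0],
    [2, "file_8", 50],
], columns=["case", "filename", "num"])

is_zero = df["num"].eq(0)
# keep a row if it is a zero row itself, or if the NEXT row is a
# zero row belonging to the same case
same_case_as_next = df["case"].eq(df["case"].shift(-1))
keep = is_zero | (is_zero.shift(-1, fill_value=False) & same_case_as_next)
df_res = df[keep]
```

Because the mask is built in original row order, no re-sorting is needed afterwards.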
How about merging df with itself?
df = pd.DataFrame([
[0, "file_0", 0],
[0, "file_1", 0],
[1, "file_2", 0],
[2, "file_3", 0],
[2, "file_4", 100],
[2, "file_5", 0],
[2, "file_6", 50],
[2, "file_7", 0]
], columns=["case", "filename", "num"])
df = df.merge(df, on='filename', how='inner')
df[(df['case_x'] == df['case_y']) & (df['num_x'] == 0)]
Out[219]:
   case_x filename  num_x  case_y  num_y
0       0   file_0      0       0      0
1       0   file_1      0       0      0
2       1   file_2      0       1      0
3       2   file_3      0       2      0
5       2   file_5      0       2      0
7       2   file_7      0       2      0
then you can rename columns back
df[['case_x', 'filename', 'num_x']].rename({'case_x':'case','num_x':'num'},axis=1)
Out[223]:
   case filename  num
0     0   file_0    0
1     0   file_1    0
2     1   file_2    0
3     2   file_3    0
4     2   file_4  100
5     2   file_5    0
6     2   file_6   50
7     2   file_7    0
Do you mean:
df.join(df.groupby('case').shift(-1)
.loc[df['num']==0]
.dropna(how='all').add_suffix('_next'),
how='inner')
Output:
   case filename  num filename_next  num_next
0     0   file_0    0        file_1       0.0
3     2   file_3    0        file_4     100.0
5     2   file_5    0        file_6      50.0
I need to find the rows where columns A, B and C all have the value 1, and then create a new column holding the result.
My idea is to use np.where() with some condition, but I don't know the correct way to deal with this problem. From what I have read, I'm not supposed to iterate through a DataFrame, but to use one of pandas' vectorized methods?
df1 = pd.DataFrame({'A': [0, 1, 1, 0],
                    'B': [1, 1, 0, 1],
                    'C': [0, 1, 1, 1]},
                   index=[0, 1, 2, 4])
print(df1)
what I am after is this:
   A  B  C  TRUE
0  0  1  0     0
1  1  1  1     1  <----
2  1  0  1     0
4  0  1  1     0
If the data is always 0/1, you can simply take the product per row:
df1['TRUE'] = df1.prod(axis=1)
output:
   A  B  C  TRUE
0  0  1  0     0
1  1  1  1     1
2  1  0  1     0
4  0  1  1     0
This is what you are looking for:
df1["TRUE"] = (df1==1).all(axis=1).astype(int)
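Since the question mentions np.where(), here is a sketch of that route as well; it is equivalent to the `.all(axis=1)` answer and also works when the data isn't strictly 0/1:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [0, 1, 1, 0],
                    'B': [1, 1, 0, 1],
                    'C': [0, 1, 1, 1]},
                   index=[0, 1, 2, 4])

# all(axis=1) is True only for rows where every column equals 1;
# np.where then maps True/False to 1/0
df1['TRUE'] = np.where((df1 == 1).all(axis=1), 1, 0)
```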
For example, if I have a data frame
   x          f
0  0     [0, 1]
1  1        [3]
2  2  [2, 3, 4]
3  3     [3, 6]
4  4     [3, 5]
I want to remove the rows where the value in column x is not contained in the list in column f. I tried with where and apply but couldn't get the expected results. I got the table below, and I want to know why rows 0, 2, 3 are 0 instead of 1:
   x          f  mask
0  0     [0, 1]     0
1  1        [3]     0
2  2  [2, 3, 4]     0
3  3     [3, 6]     0
4  4     [3, 5]     0
Does anyone know why? And what should I do to handle this number-vs-list case?
df1 = pd.DataFrame({'x': [0,1,2,3,4],'f' :[[0,1],[3],[2,3,4],[3,6],[3,5]]}, index = [0,1,2,3,4])
df1['mask'] = np.where(df1.x.values in df1.f.values ,1,0)
Here it is necessary to test the values by pairs - a solution using in within a list comprehension:
df1['mask'] = np.where([a in b for a, b in df1[['x', 'f']].values],1,0)
Or with DataFrame.apply and axis=1:
df1['mask'] = np.where(df1.apply(lambda x: x.x in x.f, axis=1),1,0)
print (df1)
   x          f  mask
0  0     [0, 1]     1
1  1        [3]     0
2  2  [2, 3, 4]     1
3  3     [3, 6]     1
4  4     [3, 5]     0
IIUC, widen the lists into a DataFrame, then use isin - passing a Series to isin aligns on the index, so each widened row is compared against its own x value:
pd.DataFrame(df1.f.tolist()).isin(df1.x).any(axis=1).astype(int)
Out[10]:
0    1
1    0
2    1
3    1
4    0
dtype: int32
df1['mask'] = pd.DataFrame(df1.f.tolist()).isin(df1.x).any(1).astype(int)
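If your pandas version has DataFrame.explode (0.25+), another sketch of the same pairwise test, without building the wide intermediate frame:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [0, 1, 2, 3, 4],
                    'f': [[0, 1], [3], [2, 3, 4], [3, 6], [3, 5]]})

# explode turns each list element into its own row while keeping the
# original index, so the x-in-f test becomes a plain element comparison
exploded = df1.explode('f')
df1['mask'] = exploded['f'].eq(exploded['x']).groupby(level=0).any().astype(int)
```

The groupby(level=0) collapses the exploded rows back to one result per original row.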
I want to merge two datasets by index and columns, summing the overlapping values across the entire dataset.
df1 = pd.DataFrame([[1, 0, 0], [0, 2, 0], [0, 0, 3]],columns=[1, 2, 3])
df1
   1  2  3
0  1  0  0
1  0  2  0
2  0  0  3
df2 = pd.DataFrame([[0, 0, 1], [0, 2, 0], [3, 0, 0]],columns=[1, 2, 3])
df2
   1  2  3
0  0  0  1
1  0  2  0
2  3  0  0
I have tried this code but I got the error below. I can't see why it complains about the size of the axis.
df_sum = pd.concat([df1, df2])\
.groupby(df2.index)[df2.columns]\
.sum().reset_index()
ValueError: Grouper and axis must be same length
This was what I expected the output of df_sum
df_sum
   1  2  3
0  1  0  1
1  0  4  0
2  3  0  3
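For what it's worth, the ValueError comes from passing df2.index (3 labels) as a grouper for the concatenated frame (6 rows). Grouping by the index level instead appears to rescue the original concat approach:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 0, 0], [0, 2, 0], [0, 0, 3]], columns=[1, 2, 3])
df2 = pd.DataFrame([[0, 0, 1], [0, 2, 0], [3, 0, 0]], columns=[1, 2, 3])

# the concat stacks the rows with repeated index labels 0, 1, 2;
# grouping by that index level sums the matching rows
df_sum = pd.concat([df1, df2]).groupby(level=0).sum()
```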
You can use df1.add(df2, fill_value=0). It adds df2 to df1 element-wise, and fill_value=0 means a value missing from one of the frames is treated as 0 instead of producing NaN.
>>> import numpy as np
>>> import pandas as pd
>>> df2 = pd.DataFrame([(10,9),(8,4),(7,np.nan)], columns=['a','b'])
>>> df1 = pd.DataFrame([(1,2),(3,4),(5,6)], columns=['a','b'])
>>> df1.add(df2, fill_value=0)
    a     b
0  11  11.0
1  11   8.0
2  12   6.0
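Applied to the frames from the question, the same call gives the expected df_sum:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 0, 0], [0, 2, 0], [0, 0, 3]], columns=[1, 2, 3])
df2 = pd.DataFrame([[0, 0, 1], [0, 2, 0], [3, 0, 0]], columns=[1, 2, 3])

# element-wise sum, aligned on both index and columns
df_sum = df1.add(df2, fill_value=0)
```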
I need to compute an all-pairs column-wise operation on a DataFrame. I came up with a naive solution but am wondering if a more elegant way is available.
The following script counts, for every pair of columns, the number of rows that have a 1 in both columns.
input:
   a  b  c  d
0  0  0  1  0
1  1  1  0  1
2  1  1  1  0
Output:
2 2 1 1
2 2 1 1
1 1 2 0
1 1 0 1
Code:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, high=2, size=(3, 4)),
                  columns=['a', 'b', 'c', 'd'])
mycolumns = df.columns
for i in range(df.shape[1]):
    for j in range(df.shape[1]):
        print((df[mycolumns[i]] & df[mycolumns[j]]).sum())
That is basically the matrix multiplication of X' and X, where X' is the transpose of X:
>>> xs = df.values
>>> xs.T.dot(xs)
array([[2, 2, 1, 1],
[2, 2, 1, 1],
[1, 1, 2, 0],
[1, 1, 0, 1]])
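Staying in pandas, df.T.dot(df) computes the same product while keeping the column labels - a sketch using the example input from the question:

```python
import pandas as pd

df = pd.DataFrame([[0, 0, 1, 0],
                   [1, 1, 0, 1],
                   [1, 1, 1, 0]], columns=['a', 'b', 'c', 'd'])

# X'X as a labeled DataFrame: entry (i, j) counts the rows with a 1
# in both column i and column j
counts = df.T.dot(df)
```

The result can then be read by name, e.g. counts.loc['a', 'b'].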