I would like to set a df value to True wherever x lies between the two limits in x_lim = (0, 2).
I would like to get a df that looks like this:
x | y | z
0 | 7 | True
1 | 3 | True
2 | 4 | True
3 | 8 | False
I tried:
def set_label(df, x_lim, y_lim, variable):
    for index, row in df.iterrows():
        for i in range(x_lim[0], x_lim[1]):
            df['Label'] = variable.get()
    print(df)
Could anyone help me solve this problem?
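A minimal sketch of one way to produce the table above, assuming the goal is a boolean column z that is True where x falls within x_lim with both ends included. The function name set_flag and the sample data are assumptions reconstructed from the expected output, not part of the original code.

```python
import pandas as pd

# Sample data reconstructed from the expected table (assumption)
df = pd.DataFrame({"x": [0, 1, 2, 3], "y": [7, 3, 4, 8]})


def set_flag(df, x_lim, column="z"):
    """Return a copy of df with `column` True where x_lim[0] <= x <= x_lim[1]."""
    out = df.copy()
    # Series.between includes both endpoints by default
    out[column] = out["x"].between(x_lim[0], x_lim[1])
    return out


result = set_flag(df, (0, 2))
print(result)
```

This avoids iterrows() entirely: the comparison is evaluated for the whole column at once.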
Python newbie here. I have written code that solves the issue, but there should be a much better way of doing it.
I have two Series that come from the same table, but due to some earlier process I get them as separate sets. (They could be joined into a single dataframe again, since the entries belong to the same record.)
Ser1 Ser2
| id | | section |
| ---| |-------- |
| 1 | | A |
| 2 | | B |
| 2 | | C |
| 3 | | D |
df2
| id | section |
| ---|---------|
| 1 | A |
| 2 | B |
| 2 | Z |
| 2 | Y |
| 4 | X |
First, I would like to find the entries in Ser1 that match an id in df2. Then I want to check whether the corresponding values in Ser2 can NOT be found in the section column of df2.
My expected results:
| id | section | result |
| ---|-------- |---------|
| 1 | A | False | # Both id(1) and section(A) are also in df2
| 2 | B | False | # Both id(2) and section(B) are also in df2
| 2 | C | True | # id(2) is in df2 but section(C) is not
| 3 | D | False | # id(3) is not in df2, in that case the result should also be False
My code:
for k, v in Ser2.items():
    rslt_df = df2[df2['id'] == Ser1[k]]
    if rslt_df.empty:
        print(False)
    elif v not in rslt_df['section'].tolist():
        print(True)
    else:
        print(False)
I know the code is not very good. But after reading about merging and list comprehensions, I am confused about what the best way to improve it would be.
You can concat the series and compute the "result" with boolean arithmetic (XOR):
out = (
    pd.concat([ser1, ser2], axis=1)
      .assign(result=ser1.isin(df2['id']) != ser2.isin(df2['section']))
)
Output:
id section result
0 1 A False
1 2 B False
2 2 C True
3 3 D False
Intermediates:
m1 = ser1.isin(df2['id'])
m2 = ser2.isin(df2['section'])
m1 m2 m1!=m2
0 True True False
1 True True False
2 True False True
3 False False False
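Note that the two isin masks test id and section independently of each other. If the check has to hold per record (the section must be missing among the df2 rows with the same id), a left merge with an indicator is one possible alternative. This is a sketch under that assumption; the sample data is rebuilt from the question's tables.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 2, 3], name="id")
ser2 = pd.Series(["A", "B", "C", "D"], name="section")
df2 = pd.DataFrame({"id": [1, 2, 2, 2, 4],
                    "section": ["A", "B", "Z", "Y", "X"]})

left = pd.concat([ser1, ser2], axis=1)

# True where the id occurs anywhere in df2
id_known = left["id"].isin(df2["id"])

# Left merge on both columns; `_merge` is "both" when the exact
# (id, section) pair exists in df2. Dropping duplicates from df2
# keeps one output row per input row.
merged = left.merge(df2.drop_duplicates(), on=["id", "section"],
                    how="left", indicator=True)
pair_known = (merged["_merge"] == "both").to_numpy()

# Flag rows whose id is known but whose (id, section) pair is not
left["result"] = id_known.to_numpy() & ~pair_known
print(left)
```

On the sample data both approaches agree; they differ only when a section exists in df2 under a different id, or when the id is missing but the section is present.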
I have the below data in a Dataframe.
+----+------+----+------+
| Id | Name | Id | Name |
+----+------+----+------+
| 1 | A | 1 | C |
| 2 | B | 2 | B |
+----+------+----+------+
Though the column names repeat, it is really a comparison of the first two columns (old data) with the last two columns (new data).
I was trying to rename the second-to-last column, selected by its position, by appending _New to it using the code below. Unfortunately, the first column also gets _New appended.
df.rename(columns={df.columns[2]: df.columns[2] + '_New'}, inplace=True)
Here's the result I am getting using the above code.
+--------+------+--------+------+
| Id_New | Name | Id_New | Name |
+--------+------+--------+------+
| 1 | A | 1 | C |
| 2 | B | 2 | B |
+--------+------+--------+------+
My understanding is that it should add _New to only the 2nd last column. Below is the expected result.
+----+------+--------+------+
| Id | Name | Id_New | Name |
+----+------+--------+------+
| 1 | A | 1 | C |
| 2 | B | 2 | B |
+----+------+--------+------+
Is there any way to accomplish this?
You can use a simple loop with a dictionary to keep track of the increments. I generalized the logic here to handle an arbitrary number of duplicates:
cols = {}
new_cols = []
for c in df.columns:
    if c in cols:
        new_cols.append(f'{c}_New{cols[c]}')
        cols[c] += 1
    else:
        new_cols.append(c)
        cols[c] = 1
df.columns = new_cols
output:
Id Name Id_New1 Name_New1
0 1 A 1 C
1 2 B 2 B
If you really want Id_New, then Id_New2, and so on, change:
new_cols.append(f'{c}_New{cols[c]}')
to
i = cols[c] if cols[c] != 1 else ''
new_cols.append(f'{c}_New{i}')
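The original rename call changes both columns because rename maps labels, not positions: df.columns[2] is just the string 'Id', and every column carrying that label is renamed. If only the column at position 2 should change, as in the expected result, one option is to edit a copy of the label list by position and assign it back. A minimal sketch, with the sample frame rebuilt from the question:

```python
import pandas as pd

# Frame with duplicate labels, as in the question
df = pd.DataFrame([[1, "A", 1, "C"], [2, "B", 2, "B"]],
                  columns=["Id", "Name", "Id", "Name"])

# Work on a plain list so the change is positional, not label-based
cols = df.columns.tolist()
cols[2] = cols[2] + "_New"
df.columns = cols

print(df.columns.tolist())
```

The second Name column is left untouched here, so the frame still contains one duplicated label.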
I need help with creating a conditional column using values from multiple other columns with pandas.
Column1|Column2|Column3|Column4
1 | 2 | 5 | A
2 | 3 | 4 | B
3 | 4 | 3 | C
4 | 5 | 2 | B
5 | 1 | 1 | C
I want to create a new column such that if Column4 equals A, the new column takes the value from Column1 (and likewise B takes Column2 and C takes Column3), so the final dataframe would look like this:
Column1|Column2|Column3|Column4|column5
1 | 2 | 5 | A | 1
2 | 3 | 4 | B | 3
3 | 4 | 3 | C | 3
4 | 5 | 2 | B | 5
5 | 1 | 1 | C | 1
Here is what I have tried so far, but I keep getting an error saying that the data.column1(x) object is not callable:
def column5(x):
    if x['column4'] == 'A':
        return data.column1(x)
    elif x['column4'] == 'B':
        return data.column2(x)
    elif x['column4'] == 'C':
        return data.column3(x)
You get the error because data.column1 is a pandas.Series, so you cannot call it like a function with data.column1(x).
Also, the desired value differs for each row depending on the value of Column4, so you will need either a loop or, better, pandas's apply() function.
Try this:
# map value to column
val_to_col = {
    'A': 'Column1',
    'B': 'Column2',
    'C': 'Column3'
}
# get data from col, based on row[col4]
df['column5'] = df.apply(lambda row: row[val_to_col.get(row['Column4'])], axis=1)
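For reference, here is a self-contained version of the lookup above on the question's data, next to a vectorized variant built on numpy.select. The column name column5_select is hypothetical and only there to compare the two results; the np.nan default for unmapped values is also an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5],
    "Column2": [2, 3, 4, 5, 1],
    "Column3": [5, 4, 3, 2, 1],
    "Column4": ["A", "B", "C", "B", "C"],
})

# Map each Column4 value to the column that supplies the result
val_to_col = {"A": "Column1", "B": "Column2", "C": "Column3"}

# Row-wise lookup with apply
df["column5"] = df.apply(lambda row: row[val_to_col[row["Column4"]]], axis=1)

# Vectorized variant: one boolean condition and one source column per key
conditions = [df["Column4"].eq(key) for key in val_to_col]
choices = [df[col] for col in val_to_col.values()]
df["column5_select"] = np.select(conditions, choices, default=np.nan)

print(df)
```

Because of the np.nan default, the numpy.select column comes back as floats; rows whose Column4 value is not in the mapping get NaN instead of raising a KeyError.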
Here is a pandas.DataFrame df.
| Foo | Bar |
|-----|-----|
| 0 | A |
| 1 | B |
| 2 | C |
| 3 | D |
| 4 | E |
I selected some rows and defined a new dataframe, by df1 = df.iloc[[1,3],:].
| Foo | Bar |
|-----|-----|
| 1 | B |
| 3 | D |
What is the best way to get the rest of df, like the following?
| Foo | Bar |
|-----|-----|
| 0 | A |
| 2 | C |
| 4 | E |
Fast set-based diffing.
df2 = df.loc[df.index.difference(df1.index)]
df2
Foo Bar
0 0 A
2 2 C
4 4 E
Works as long as your index values are unique.
If I'm understanding correctly, you want to take a dataframe, select some rows from it and store those in a variable df2, and then select the rows in df that are not in df2.
If that's the case, you can do df[~df.isin(df2)].dropna(). Note that dropna() would also remove any rows of df that already contained NaN.
df[ x ] subsets the dataframe df based on the condition x
~df.isin(df2) is the negation of df.isin(df2), which evaluates to True for rows of df belonging to df2.
.dropna() drops rows with a NaN value. In this case the rows we don't want were coerced to NaN in the filtering expression above, so we get rid of those.
I assume that Foo can be treated as a unique index.
First select Foo values from df1:
idx = df1['Foo'].values
Then filter your original dataframe:
df2 = df[~df['Foo'].isin(idx)]
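A small runnable comparison of the index-based and value-based approaches on the question's data. The names by_index and by_value are assumptions used only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"Foo": range(5), "Bar": list("ABCDE")})
df1 = df.iloc[[1, 3], :]

# Index-based: keep the labels of df that are not in df1's index
by_index = df.loc[df.index.difference(df1.index)]

# Value-based: keep rows whose Foo value does not occur in df1
by_value = df[~df["Foo"].isin(df1["Foo"])]

print(by_index)
print(by_value)
```

The index-based version relies on unique index labels, the value-based one on Foo being unique; on this frame both hold, so the two results contain the same rows.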
So I have a dataframe with some values. This is my dataframe:
|in|x|y|z|
+--+-+-+-+
| 1|a|a|b|
| 2|a|b|b|
| 3|a|b|c|
| 4|b|b|c|
I would like to get the number of unique values in each row, and the number of values in each row that are not equal to the value in column x. The result should look like this:
|in | x | y | z | count of not x |unique|
+---+---+---+---+---+---+
| 1 | a | a | b | 1 | 2 |
| 2 | a | b | b | 2 | 2 |
| 3 | a | b | c | 2 | 3 |
| 4 | b | b |nan| 0 | 1 |
I could come up with some dirty solutions here, but there must be an elegant way of doing this. The options going through my mind are drop_duplicates (which does not work row-wise on a series), converting to an array and calling .unique(), df.iterrows() (which I want to avoid), and .apply on each row.
Here are solutions using apply.
df['count of not x'] = df.apply(lambda x: (x[['y','z']] != x['x']).sum(), axis=1)
df['unique'] = df.apply(lambda x: x[['x','y','z']].nunique(), axis=1)
A non-apply solution for getting count of not x:
df['count of not x'] = (~df[['y','z']].isin(df['x'])).sum(1)
Can't think of anything great for unique. This uses apply, but may be faster, depending on the shape of the data.
df['unique'] = df[['x','y','z']].T.apply(lambda x: x.nunique())
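As a possible apply-free variant of both columns, DataFrame.ne can compare y and z against x row by row, and DataFrame.nunique accepts axis=1. A minimal sketch, using the input table from the question as given (so row 4 has z = c):

```python
import pandas as pd

df = pd.DataFrame({
    "in": [1, 2, 3, 4],
    "x": ["a", "a", "a", "b"],
    "y": ["a", "b", "b", "b"],
    "z": ["b", "b", "c", "c"],
})

# Compare each of y and z with x on the same row, then count mismatches
df["count of not x"] = df[["y", "z"]].ne(df["x"], axis=0).sum(axis=1)

# Number of distinct values per row across x, y, z
df["unique"] = df[["x", "y", "z"]].nunique(axis=1)

print(df)
```

nunique skips NaN by default, so a missing value in a row would not be counted as an extra unique value.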