How to add a value to the next cell with Python pandas

I have a value 'x' from a table that I'm interested in. I want to first find where 'x' is in the table, and then write a string 's' into the cell to the right of 'x' (the next column, same row).
df[df.ix('x')] = s # would replace 'x' with 's'
df[df.ix('x')+1] = s # so I tried '+1' to indicate the same row but the next column, but the syntax is wrong
UPDATE:
example raw table data -
columnA columnB
A
B
X
C
X
D
desired outcome -
columnA columnB
A
X S
X S
B
X S
C
My code in a simplified version:
data = pd.read_excel('C:/Users/....table.xlsx', sep='\t')
for vh in data["columnA"]:
    data[df.ix('X')+1] = s
    # Obviously the '+1' syntax is wrong; how should I change it?
    # I want S in columnB wherever there is X in columnA.
Thanks in advance!
UPDATE NEW CODE:
for line in f:
for vh in data["columnA"]:
vh = vh.rstrip()
tmp = data[line in vh]
tmp = tmp[list(tmp.columns[-1]) + tmp.columns.tolist()[:-1]]
tmp.columns = data.columns
data[tmp] = string
I think the syntax is wrong. Does anyone have any idea?
thanks

Assuming you have no 'X' values in the last column of your DataFrame:
tmp = df == 'X' # boolean mask
tmp = tmp[[tmp.columns[-1]] + tmp.columns.tolist()[:-1]] # rotate the columns one position to the right
tmp.columns = df.columns # restore the original column names on the mask
df[tmp] = 'S' # set 'S' in the cell to the right of each 'X'
For your two-column DataFrame, it is as simple as this:
df["columnB"] = df["columnA"].apply(lambda x: 'S' if x == 'X' else '')
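As a hedged alternative to the two snippets above, here is a minimal sketch that assumes the example table from the question (the sample data and the names two_col and general are made up for illustration). It shows a .loc assignment for the two-column case and a shifted boolean mask for the general case.

```python
import pandas as pd

# Illustrative data modelled on the question's example table (an assumption).
df = pd.DataFrame({"columnA": ["A", "B", "X", "C", "X", "D"],
                   "columnB": [""] * 6})

# Two-column case: write 'S' into columnB on the rows where columnA is 'X'.
two_col = df.copy()
two_col.loc[two_col["columnA"] == "X", "columnB"] = "S"

# General case: build a mask of the 'X' cells, then shift it one column to
# the right so it marks the cell next to each 'X'.
mask = df.eq("X").shift(periods=1, axis=1, fill_value=False)
general = df.mask(mask, "S")  # returns a new DataFrame, df is unchanged

print(general)
```

Unlike the apply version, neither variant overwrites columnB on rows where columnA is not 'X'.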

Related

Custom function creation: How to skip first line of 'if' statement based on input parameters?

I have created a custom function that appends columns to an existing data frame based on user inputs. If the last argument is None, I want the function to skip the final column; if it is anything else (a string), I want the column to be added. How can I properly format my if-else statement for this?
Here is the data:
import pandas as pd
data = {'col1':['a','b','c','d','e','f','g'],
'foo':[1,2,3,4,5,6,7],
'bar':[8,9,10,11,12,13,14],
'foobar':[15,16,17,18,19,20,21]}
df = pd.DataFrame(data)
df
Here is my function:
def func(a, b, c, d):
    """
    Parameters
    ---------------
    a = String
        name of col 2 in data frame
    b = String
        name of col 3 in data frame
    c = String
        name of col 4 in data frame
    d = String
        Name of last desired output column. Can assign None if nothing is desired.
    Returns:
    ---------------
    Input data frame with appended columns
    Example:
    ---------------
    func('foo', 'bar', 'foobar', 'lastcol')
    """
    df['new_col1'] = df[a] + 44
    df['new_col2'] = df[b] + 88
    df['new_col3'] = df[c] + 133
    if d == None:
        continue
    else:
        df[d] = df[a] + df[b]
    return df.head(5)
The following error is produced:
Input In [20]
continue
^
SyntaxError: 'continue' not properly in loop
continue can only be used inside a for loop or a while loop.
To skip the optional column and finish the function, you could do:
if d is not None:
    df[d] = df[a] + df[b]
return df.head(5)
Why not, instead of
if d == None:
    continue
else:
    df[d] = df[a] + df[b]
...just do this:
if d is not None:
    df[d] = df[a] + df[b]
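Putting the answers together, here is a minimal sketch of the whole function with the optional last column. It departs from the question in two ways that are assumptions of this sketch: the data frame is passed in as a parameter instead of being read from a global, and d defaults to None. The name add_columns is hypothetical.

```python
import pandas as pd


def add_columns(df, a, b, c, d=None):
    """Return a copy of df with three derived columns and, optionally, a fourth."""
    out = df.copy()  # leave the caller's frame untouched
    out["new_col1"] = out[a] + 44
    out["new_col2"] = out[b] + 88
    out["new_col3"] = out[c] + 133
    if d is not None:  # only add the last column when a name was given
        out[d] = out[a] + out[b]
    return out


data = {"col1": ["a", "b", "c", "d", "e", "f", "g"],
        "foo": [1, 2, 3, 4, 5, 6, 7],
        "bar": [8, 9, 10, 11, 12, 13, 14],
        "foobar": [15, 16, 17, 18, 19, 20, 21]}
df = pd.DataFrame(data)

with_last = add_columns(df, "foo", "bar", "foobar", "lastcol")
without_last = add_columns(df, "foo", "bar", "foobar")
print(with_last.head(5))
```

Comparing with `is not None` rather than `== None` is the usual idiom, since None is a singleton.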

Set a new column using Pandas

I have a dataframe like this:
     A         Status_A  Invalid_A
0         Null OR Blank       True
1  NaN    Null OR Blank       True
2   Xv            Valid      False
I want a dataframe like this:
     A           Status_A  Invalid_A
0         Null OR Blank A       True
1  NaN    Null OR Blank A       True
2   Xv              Valid      False
I want to append the column name to the Status_A value when I create the columns using
def checkNull(ele):
    if pd.isna(ele) or (ele == ''):
        return ("Null OR Blank", True)
    else:
        return ("Valid", False)
df[['Status_A', 'Invalid_A']] = df['A'].apply(checkNull).tolist()
I want to pass the column name to this function.
You have a couple of options here.
One option is that when you create the dataframe, you can pass additional arguments to pd.Series.apply:
def checkNull(ele, suffix):
    if pd.isna(ele) or (ele == ''):
        return (f"Null OR Blank {suffix}", True)
    else:
        return ("Valid", False)
df[['Status_A', 'Invalid_A']] = df['A'].apply(checkNull, args=('A',)).tolist()
Another option is to post-process the dataframe to add the suffix
df.loc[df['Invalid_A'], 'Status_A'] += '_A'
That being said, the two columns are redundant, which is usually a code smell. Consider just using the boolean series pd.isna(df['A']) | (df['A'] == '') as an index instead.
A more efficient way is to use np.where, which is vectorised:
mask = df['A'].isnull() | (df['A'] == '') # True where A is missing or blank
df['Status_A'] = np.where(mask, 'Null OR Blank A', 'Valid')
df['Invalid_A'] = mask # keep real booleans rather than the strings 'True'/'False'
Maybe something like this
def append_col_name(df, col_name):
    col = f"Status_{col_name}"
    df[col] = df[col].apply(lambda x : x + " " + col_name if x != "Valid" else x)
    return df
Then with your df
append_col_name(df, "A")
If you're checking each element, you can use a vectorised operation and return the entire dataframe instead of operating on one value at a time.
def str_col_check(colname: str, dataframe: pd.DataFrame) -> pd.DataFrame:
    suffix = colname.split('_')[-1] # 'Status_A' -> 'A'
    mask = dataframe[colname].isin(['Null OR Blank', ''])
    dataframe.loc[mask, colname] = dataframe.loc[mask, colname] + '_' + suffix
    return dataframe
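The answers above can be combined into one vectorised helper that takes the column name as a parameter. This is only a sketch under stated assumptions: the function name add_status is hypothetical, and the sample frame is reconstructed from the question's table.

```python
import numpy as np
import pandas as pd


def add_status(df, col):
    """Return a copy of df with Status_<col> and Invalid_<col> columns added."""
    invalid = df[col].isna() | (df[col] == "")  # missing or blank entries
    out = df.copy()
    out[f"Status_{col}"] = np.where(invalid, f"Null OR Blank {col}", "Valid")
    out[f"Invalid_{col}"] = invalid
    return out


# Sample data reconstructed from the question (an assumption).
df = pd.DataFrame({"A": ["", np.nan, "Xv"]})
result = add_status(df, "A")
print(result)
```

Because the column name is an argument, the same helper can be called for any other column without hard-coding 'A'.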

Pandas CSV : Check for each row if a column is empty

I want to test, for each row of a CSV file, whether some columns are empty, and change the value of another column depending on that.
Here is what I have:
df = df.replace(r'^\s*$', np.NaN, regex=True)
df['Multi-line'] = pd.Series(dtype=object)
for i, row in df.iterrows():
    if (row['Directory Number 1'] != np.NaN and row['Directory Number 2'] != np.NaN and row['Directory Number 3'] != np.NaN and row['Directory Number 4'] != np.NaN):
        df.at[i,'Multi-line'] = 'Yes'
If 2 "Directory Number X" or more are not empty, I want the "Multi-line" column to be "Yes" and if 1 or 0 "Directory Number X" are not empty then "Multi-line" should be "No".
I have shown only one if here, just to illustrate how it looks. In my test sample, every Multi-line value is set to "Yes". The problem seems to be the comparison of the row value with np.nan inside the if condition, but I don't know how to check whether a row value is empty.
Thanks for your help!
I assume that you executed df = df.replace(r'^\s*$', np.NaN, regex=True)
before.
Then, to generate the new column, run:
df['Multi-line'] = df.apply(lambda row: 'Yes' if row.notna().sum() >= 2 else 'No', axis=1)
No need for explicit call to iterrows, as apply arranges just such
a loop, invoking the passed function for each row.
If your DataFrame has also other columns, especially when they can
have NaN values, then application of this lambda function should be
limited to just these 4 columns of interest.
In this case run:
cols = [ f'Directory Number {i}' for i in range(1, 5) ]
df['Multi-line'] = df[cols].apply(lambda row:
'Yes' if row.notna().sum() >= 2 else 'No', axis=1)
Note also that a check like row[...] != np.NaN, as used in the question's
loop, cannot work: NaN by definition is not equal to anything, not even
another NaN, so the comparison is always True.
To check it, try:
s = np.nan
s2 = np.nan
s != s2 # True
s == s2 # False
Then assign an actual string to s, for example s = 'xx', and repeat:
s != s2 # True
s == s2 # False
with just the same result.
You can use a counter instead:
df = df.replace(r'^\s*$', np.nan, regex=True)
df['Multi-line'] = pd.Series(dtype=object)
cols = ['Directory Number 1', 'Directory Number 2', 'Directory Number 3', 'Directory Number 4']
for i, row in df.iterrows():
    cnt = 0 # number of non-empty directory numbers in this row
    for s in cols:
        if pd.notna(row[s]): # comparing with != np.nan is always True, so use pd.notna
            cnt += 1
    # two or more non-empty values means multi-line
    df.at[i, 'Multi-line'] = 'Yes' if cnt >= 2 else 'No'
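The same rule (two or more non-empty directory numbers means "Yes") can also be written without any row loop. The following is a minimal sketch with made-up sample data; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; blanks and whitespace stand for empty cells.
df = pd.DataFrame({
    "Directory Number 1": ["100", "200", " ", "400"],
    "Directory Number 2": ["101", "", "", "401"],
    "Directory Number 3": ["", "", "", "402"],
    "Directory Number 4": ["103", " ", "", "403"],
})

# Turn empty or whitespace-only strings into NaN, as in the question.
df = df.replace(r"^\s*$", np.nan, regex=True)

cols = [f"Directory Number {i}" for i in range(1, 5)]
filled = df[cols].notna().sum(axis=1)  # non-empty count per row
df["Multi-line"] = np.where(filled >= 2, "Yes", "No")
print(df)
```

Counting with notna().sum(axis=1) avoids comparing against NaN directly, which is the pitfall in the original loop.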

Search for a value anywhere in a pandas DataFrame

This seems like a simple question, but I couldn't find it asked before (this and this are close but the answers aren't great).
The question is: how do I search for a value anywhere in my df (I don't know which column it's in) and return all rows with a match?
What's the most Pandaic way to do it? Is there anything better than:
for col in list(df):
    try:
        df[col] == var
        return df[df[col] == var]
    except TypeError:
        continue
?
You can perform equality comparison on the entire DataFrame:
df[df.eq(var).any(axis=1)]
You can use isin. This returns the columns that contain the value; if you want the rows, check cold's answer :-)
df.isin(['bal1']).any()
A False
B True
C False
CLASS False
dtype: bool
Or
df[df.isin(['bal1'])].stack() # level 0 index is row index , level 1 index is columns which contain that value
0 B bal1
1 B bal1
dtype: object
You can try the code below:
import pandas as pd
x = pd.read_csv(r"filePath")
x.columns = x.columns.str.lower().str.replace(' ', '_')
print("Note: the search is case-sensitive.")
keyWord = input("Type a Keyword to Search: ")
for col in x.columns:
    try:
        # .str raises AttributeError on non-string columns, so skip those
        matches = x[x[col].str.match(keyWord, na=False)]
    except AttributeError:
        continue
    print(matches.head(10))
This is a solution which will return the actual column you need.
df.columns[df.isin(['Yes']).any()]
Minimal solution:
import pandas as pd
import numpy as np
def locate_in_df(df, value):
    a = df.to_numpy()
    rows, cols = np.where(a == value) # positions of every match
    return rows[0], cols[0] # position of the first match only
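To compare what the approaches above return, here is a small sketch on made-up data (the frame and the value 'bal1' are illustrative, loosely following the output shown in one of the answers). It collects the matching rows, the matching columns, and every matching (row, column) position.

```python
import numpy as np
import pandas as pd

# Illustrative frame; only the value 'bal1' echoes the answers above.
df = pd.DataFrame({"A": ["foo", "bar", "baz"],
                   "B": ["bal1", "bal1", "qux"],
                   "C": [1, 2, 3]})

hits = df.eq("bal1")                        # boolean frame, True where equal

rows = df[hits.any(axis=1)]                 # rows containing the value
cols = df.columns[hits.any(axis=0)]         # columns containing the value
positions = np.argwhere(hits.to_numpy())    # all (row, col) integer positions

print(rows)
print(list(cols))
print(positions.tolist())
```

Passing axis by keyword matters here: recent pandas versions no longer accept it positionally in any().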

Issues calling values above each other in a matrix, python

This is my first post so let me know if I need to change anything!
I've created a grid based on the following input:
1;2;12;12;12
11;12;2;12;12
1;2;12;2;12
11;12;2;1;2
To create this grid I used this piece of code:
node = hou.pwd()
geo = node.geometry()
text = node.evalParm('text')
lines = text.splitlines()
numbers = [map(int, line.split(';') ) for line in lines]
geo.addAttrib(hou.attribType.Point, 'instance', -1)
for row, n in enumerate(numbers):
    for col, value in enumerate(n):
        pos = (col, 0.0, row)
        pt_add = geo.createPoint()
        pt_add.setAttribValue('instance', value)
        pt_add.setPosition(pos)
This works great and creates the grid with points spaced 1 apart and with the correct value.
Now I want to do the following:
if value != 0:
    a = #Value of current index
    b = #Value of index above
    if len(a) < len(b):
        if a[-len(a)] == b[-len(a)]:
            a = '0'
            b = b
        else:
            pass
    else:
        if len(a)== len(b):
            if len(a) < 2:
                pass
            else:
                a = '0'
                b = b + 'x'
        else:
            if a[-len(a)] == b[-len(a)]:
                b = a + 'y'
                a = '0'
            else:
                pass
Now I'm assuming I need to go over the rows and columns again but if I do that with a for loop then it won't allow me to call the value of the index above in that column. Could someone help me figure this out? And I'll need to change the value of "instance" to the new value.
As a brief explanation of what I'm trying to achieve:
Example image
Edit: Adding something other than x or y to the int to differentiate between a "11" with 1 changed to "0" under it and an "11" with 2 or 3 changed to "0" under them.
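One possible way to reach the value above the current one, sketched in plain Python without the Houdini calls: keep the parsed input as a list of lists, so any cell can be looked up by (row, col), and start the row loop at 1 so that row - 1 always exists. The names grid and pairs are assumptions made for this sketch, and the comparison rules from the question are left out.

```python
text = "1;2;12;12;12\n11;12;2;12;12\n1;2;12;2;12\n11;12;2;1;2"

# Materialise every row as a list so cells can be indexed more than once.
grid = [[int(v) for v in line.split(";")] for line in text.splitlines()]

pairs = []
for row in range(1, len(grid)):          # row 0 has nothing above it
    for col, current in enumerate(grid[row]):
        above = grid[row - 1][col]       # same column, previous row
        pairs.append((row, col, current, above))

print(pairs[:3])
```

New values could be written back into grid[row][col] and grid[row - 1][col] in the same loop, and the 'instance' attribute set in a second pass once the grid is final.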
