I have a dataframe in which I want to clean up a specific row, in this case the first row. I have written a function which I believe will return the string if the regex is matched:
import re

def clean_cells(string):
    if '201' in string:
        return re.findall('201[0-9]', string)[0]
    else:
        return string
I want to apply this function to the first row of the dataframe, replace the first row with the cleaned-up version, and then concatenate the rest of the original dataframe.
I have tried:
df = df.iloc[[0]].apply(clean_cells)
Select the first row by position and all columns with :, then assign back:
df.iloc[0, :] = df.iloc[0, :].apply(clean_cells)
Another solution:
df.iloc[0] = df.iloc[0].apply(clean_cells)
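Putting the pieces together, here is a minimal runnable sketch of the last approach; the toy data (a first row that mixes years with extra text) is an assumption for the demo:

```python
import re
import pandas as pd

def clean_cells(string):
    # Keep only the first four-digit year starting with "201", if present
    if '201' in string:
        return re.findall('201[0-9]', string)[0]
    return string

# Toy dataframe (assumption): only the first row needs cleaning
df = pd.DataFrame([['x 2015 y', 'foo 2018'], ['a', 'b']],
                  columns=['c1', 'c2'])

# Clean only the first row and assign it back; the rest is untouched
df.iloc[0] = df.iloc[0].apply(clean_cells)
print(df)
```

Assigning the cleaned Series back with `df.iloc[0] = ...` avoids having to split and re-concatenate the dataframe.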
I have the following DataFrame:
Student      food
      1  R0100000
      2  R0200000
      3  R0300000
      4  R0400000
I need to extract as a string the values of the "food" column of the df DataFrame when I filter the data.
For example, when I filter by the Student=1, I need the return value of "R0100000" as a string value, without any other characters or spaces.
This is the code to create the same DataFrame as mine:
import pandas as pd

data = {'Student': [1, 2, 3, 4],
        'food': ['R0100000', 'R0200000', 'R0300000', 'R0400000']}
df = pd.DataFrame(data)
I tried to select the DataFrame column and apply str(), but it does not return the desired result:
df_new=df.loc[df['Student'] == 1]
df_new=df_new.food
df_str=str(df_new)
del df_new
This works for me:
s = df[df.Student == 1]['food'].iloc[0]
s = s.strip()
Note that .iloc[0] takes the first matching value by position, and strip() returns a new string rather than modifying in place, so assign the result back.
It's pretty simple: first get the column, e.g. col = df["food"], and then use col[index] to get the respective value. So your answer would be df["food"][0].
You can also look these up with iloc and loc:
df.iloc[rows, columns], so here the answer is df.iloc[0, 1]
df.loc[rows, column_names], for example df.loc[0, "food"]
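A short runnable check of both lookups above, using the sample data from the question:

```python
import pandas as pd

data = {'Student': [1, 2, 3, 4],
        'food': ['R0100000', 'R0200000', 'R0300000', 'R0400000']}
df = pd.DataFrame(data)

# Filter by Student, then take the first matching value as a plain string
s = df.loc[df['Student'] == 1, 'food'].iloc[0]

# Positional and label-based lookups give the same result here
by_iloc = df.iloc[0, 1]       # row 0, column 1 ('food')
by_loc = df.loc[0, 'food']    # index label 0, column 'food'
print(s, by_iloc, by_loc)
```

Using `.iloc[0]` after the boolean filter also works when the matching row does not happen to sit at index label 0.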
I have the following function:
def match_function(column):
    df_1 = df[column].str.split(',', expand=True)
    df_11 = df_1.apply(lambda s: s.value_counts(), axis=1).fillna(0)
    match = df_11.iloc[:, 0][0] / df_11.sum(axis=1) * 100
    df[column] = match
    return match
This function only works if I enter a specific column name.
How can I change this function so that, if I pass it a certain dataframe, it loops through all of its columns automatically, so I won't have to enter each column separately?
PS: I know the function itself is written very poorly, but I'm kind of new to coding, sorry.
You need to wrap the function so that it runs iteratively over all columns.
If you add this to your code, it will iterate over the columns and return the match results in a list (you will have multiple results because you are running over multiple columns):
def match_over_dataframe_columns(dataframe):
    return [match_function(column) for column in dataframe.columns]

results = match_over_dataframe_columns(df)
Instead of passing a column to your function, pass the entire dataframe. Then cast the columns of the df to a list and loop over them, performing your analysis on each column. For example:
def match_function(df):
    columns = df.columns.tolist()
    matches = {}
    for column in columns:
        # do your analysis
        # instead of returning match,
        matches[column] = match
    return matches
This will return a dictionary whose keys are your columns and whose values are the corresponding match values.
Just loop through the columns:
def match_function(df):
    l_match = []
    for column in df.columns:
        df_1 = df[column].str.split(',', expand=True)
        df_11 = df_1.apply(lambda s: s.value_counts(), axis=1).fillna(0)
        match = df_11.iloc[:, 0][0] / df_11.sum(axis=1) * 100
        df[column] = match
        l_match.append(match)
    return l_match
I want to find pandas columns using a list of strings, matching even when the string is only part of the column name. For example, if the column name is 'TVD' and I have 'tv' in my list, I want it to be found. The reason is that I want to pull these columns out and move them to the front of the dataframe. With my current code I can only find the exact column name, so a column named 'TVD (feet)' is a problem.
df = sts.read_df(dataset)
depth_names_lower = ['tvd', 'tvdss', 'md']
depth_names_upper = [depth.upper() for depth in depth_names_lower]
depth_names = depth_names_lower + depth_names_upper
tvd_cols = [col for col in df.columns if depth_names in col]
cols = list(df.columns)
for depth in tvd_cols:
    cols.pop(cols.index(depth))
df = df[tvd_cols + cols]
You can use a regexp to find the target columns, with flags=re.IGNORECASE to ignore case:
import re

pattern = '|'.join(depth_names_lower)
cond = df.columns.str.contains(pattern, regex=True, flags=re.IGNORECASE)
cols = df.columns[cond]
df[cols]
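Extending the regex answer to the original goal of moving the matched columns to the front; the column names here are toy assumptions:

```python
import re
import pandas as pd

# Toy dataframe with depth-like column names (assumption)
df = pd.DataFrame(columns=['WELL', 'TVD (feet)', 'MD', 'PRESSURE'])

depth_names_lower = ['tvd', 'tvdss', 'md']
pattern = '|'.join(depth_names_lower)
cond = df.columns.str.contains(pattern, regex=True, flags=re.IGNORECASE)

# Move matching columns to the front, keeping the rest in order
depth_cols = list(df.columns[cond])
other_cols = list(df.columns[~cond])
df = df[depth_cols + other_cols]
print(list(df.columns))
```

Note that substring matching is greedy: a column like 'COMMAND' would also match 'md', so tighten the pattern (e.g. with word boundaries) if that matters.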
You are attempting to check whether depth_names, which is a list, is contained in col, which is a string. That membership test does not check each element; a list operand on the left of "in" with a string on the right raises a TypeError. You want to check each string in depth_names individually to see whether it is a substring of col. One way to do that is another list comprehension:
tvd_cols = [col for col in df.columns if any(d in col for d in depth_names)]
The inner expression yields booleans; any() reduces them to a single boolean, True if and only if at least one of them is True.
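A tiny self-contained demo of that comprehension; the column names and search terms are hypothetical:

```python
# Hypothetical column names and search terms
columns = ['TVD (feet)', 'MD', 'WELL NAME']
depth_names = ['tvd', 'TVD', 'md', 'MD']

# Keep a column if any search term is a substring of its name
tvd_cols = [col for col in columns if any(d in col for d in depth_names)]
print(tvd_cols)
```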
What I have is a list of DataFrames.
It is important to note that the shapes of the dataframes differ, between 2 and 7 columns, and the columns are simply numbered from 0 upward (e.g. df1 has 5 columns named 0, 1, 2, 3, 4; df2 has 4 columns named 0, 1, 2, 3).
I would like to check whether any row in a column contains a certain string, and if so delete that column.
list_dfs1 = [df1, df2, df3...df100]
What I have done so far is the code below, and I get an error that column 5 is not in axis (it is only there for some DFs):
for i, df in enumerate(list_dfs1):
    for index, row in df.iterrows():
        if np.where(row.str.contains("DEC")):
            df.drop(index, axis=1)
Any suggestions?
You could try:
for df in list_dfs1:
    for col in df.columns:
        # If you are unsure about column types, cast the column as string:
        df[col] = df[col].astype(str)
        # Check if the column contains the string of interest
        if df[col].str.contains("DEC").any():
            df.drop(columns=[col], inplace=True)
If you know that all columns are already of string type, you don't have to do df[col] = df[col].astype(str).
You can write a custom function that checks whether a column matches the pattern, using pd.Series.str.contains with pd.Series.any:
def func(s):
    return s.str.contains('DEC').any()

list_df = [df.loc[:, ~df.apply(func)] for df in list_dfs1]
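Here is that approach run end to end on two toy dataframes of different widths (the sample values are assumptions):

```python
import pandas as pd

def func(s):
    # True if any cell in the column contains "DEC"
    return s.str.contains('DEC').any()

# Two toy dataframes of different widths (assumption)
df1 = pd.DataFrame({0: ['a', 'DEC-1'], 1: ['b', 'c'], 2: ['d', 'e']})
df2 = pd.DataFrame({0: ['x', 'y'], 1: ['DEC', 'z']})
list_dfs1 = [df1, df2]

# df.apply(func) yields one boolean per column; ~ inverts it,
# so .loc keeps only the columns without "DEC"
list_df = [df.loc[:, ~df.apply(func)] for df in list_dfs1]
print([list(d.columns) for d in list_df])
```

Because each dataframe is filtered against its own columns, the differing shapes are not a problem.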
I would take another approach: concatenate the list into a single data frame and then eliminate the columns where the string is found.
import pandas as pd
df = pd.concat(list_dfs1)
Let us say your condition was to eliminate any column with "DEC"
df.mask(df == "DEC").dropna(axis=1, how="any")
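A small demo of the mask/dropna step on toy data (an assumption). Note that df == "DEC" is an exact-equality test on each cell, not a substring match:

```python
import pandas as pd

# Toy dataframe (assumption): one cell is exactly "DEC"
df = pd.DataFrame({0: ['a', 'DEC'], 1: ['b', 'c']})

# mask() replaces cells equal to "DEC" with NaN, then dropna removes
# any column that now contains a NaN
out = df.mask(df == 'DEC').dropna(axis=1, how='any')
print(list(out.columns))
```

For substring matching ("contains DEC" rather than "equals DEC"), the str.contains approaches above are a better fit.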
What is the easiest way to make some changes in the index column of different rows in a DataFrame?
def fn(country):
    if any(char.isdigit() for char in country):
        return country[:-2]
    else:
        return country

df.loc["Country"].apply(fn, axis=1)
I can't test now. Can you try df['Country'] = df.apply(lambda row: fn(row), axis=1) and change your function argument to take the row into account (like row['Country'])? This way you can manipulate anything you want, row by row, using other column values.
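A runnable sketch of that suggestion; the sample country names with two-character numeric suffixes are assumptions for the demo:

```python
import pandas as pd

def fn(country):
    # Strip the trailing two characters if the name contains any digit
    if any(char.isdigit() for char in country):
        return country[:-2]
    return country

# Toy data (assumption): some names carry a two-character suffix
df = pd.DataFrame({'Country': ['France12', 'Spain', 'Italy07'],
                   'Value': [1, 2, 3]})

# Row-wise apply: fn receives the 'Country' value of each row
df['Country'] = df.apply(lambda row: fn(row['Country']), axis=1)
print(df['Country'].tolist())
```

Since only one column is involved here, df['Country'].apply(fn) would work too; the row-wise form is useful when fn needs other columns.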