I have a dataframe in which I want to clean up a specific row, in this case the first row. I have written a function which I believe will return the string if the regex is matched:
import re

def clean_cells(string):
    if '201' in string:
        return re.findall('201[0-9]', string)[0]
    else:
        return string
I want to apply this function to the first row of the dataframe, replace the first row with the cleaned-up version, and then concatenate the rest of the original dataframe.
I have tried:
df = df.iloc[[0]].apply(clean_cells)
Select the first row by position and all columns with :, then assign back:
df.iloc[0, :] = df.iloc[0, :].apply(clean_cells)
Another solution:
df.iloc[0] = df.iloc[0].apply(clean_cells)
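Putting the pieces together, here is a minimal runnable sketch of the last approach; the toy data (a first row that mixes years with extra text) is an assumption for the demo:

```python
import re
import pandas as pd

def clean_cells(string):
    # Keep only the first four-digit year starting with "201", if present
    if '201' in string:
        return re.findall('201[0-9]', string)[0]
    return string

# Toy dataframe (assumption): only the first row needs cleaning
df = pd.DataFrame([['x 2015 y', 'foo 2018'], ['a', 'b']],
                  columns=['c1', 'c2'])

# Clean only the first row and assign it back; the rest is untouched
df.iloc[0] = df.iloc[0].apply(clean_cells)
print(df)
```

Assigning the cleaned Series back with `df.iloc[0] = ...` avoids having to split and re-concatenate the dataframe.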
I have the following DataFrame:
Student      food
      1  R0100000
      2  R0200000
      3  R0300000
      4  R0400000
I need to extract as a string the values of the "food" column of the df DataFrame when I filter the data.
For example, when I filter by the Student=1, I need the return value of "R0100000" as a string value, without any other characters or spaces.
This is the code to create the same DataFrame as mine:
import pandas as pd

data = {'Student': [1, 2, 3, 4],
        'food': ['R0100000', 'R0200000', 'R0300000', 'R0400000']}
df = pd.DataFrame(data)
I tried to select the DataFrame column and apply str(), but it does not return the desired result:
df_new=df.loc[df['Student'] == 1]
df_new=df_new.food
df_str=str(df_new)
del df_new
This works for me:
s = df[df.Student == 1]['food'].iloc[0]
s = s.strip()
Note that .iloc[0] takes the first matching value by position, and strip() returns a new string rather than modifying in place, so assign the result back.
It's pretty simple: first get the column, e.g. col = df["food"], and then use col[index] to get the respective value. So your answer would be df["food"][0].
You can also look these up with iloc and loc:
df.iloc[rows, columns], so here the answer is df.iloc[0, 1]
df.loc[rows, column_names], for example df.loc[0, "food"]
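A short runnable check of both lookups above, using the sample data from the question:

```python
import pandas as pd

data = {'Student': [1, 2, 3, 4],
        'food': ['R0100000', 'R0200000', 'R0300000', 'R0400000']}
df = pd.DataFrame(data)

# Filter by Student, then take the first matching value as a plain string
s = df.loc[df['Student'] == 1, 'food'].iloc[0]

# Positional and label-based lookups give the same result here
by_iloc = df.iloc[0, 1]       # row 0, column 1 ('food')
by_loc = df.loc[0, 'food']    # index label 0, column 'food'
print(s, by_iloc, by_loc)
```

Using `.iloc[0]` after the boolean filter also works when the matching row does not happen to sit at index label 0.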
I have the following function:
def match_function(column):
    df_1 = df[column].str.split(',', expand=True)
    df_11 = df_1.apply(lambda s: s.value_counts(), axis=1).fillna(0)
    match = df_11.iloc[:, 0][0] / df_11.sum(axis=1) * 100
    df[column] = match
    return match
This function only works if I enter a specific column name.
How can I change this function so that, if I pass it a certain dataframe, it loops through all of its columns automatically, so I won't have to enter each column separately?
PS: I know the function itself is written very poorly, but I'm kind of new to coding, sorry.
You need to wrap the function so that it runs iteratively over all columns.
If you add this to your code, it will iterate over the columns and return the match results in a list (you will have multiple results because you are running over multiple columns):
def match_over_dataframe_columns(dataframe):
    return [match_function(column) for column in dataframe.columns]

results = match_over_dataframe_columns(df)
Instead of passing a column to your function, pass the entire dataframe. Then cast the columns of the df to a list and loop over them, performing your analysis on each column. For example:
def match_function(df):
    columns = df.columns.tolist()
    matches = {}
    for column in columns:
        # do your analysis
        # instead of returning match,
        matches[column] = match
    return matches
This will return a dictionary whose keys are your columns and whose values are the corresponding match values.
Just loop through the columns:
def match_function(df):
    l_match = []
    for column in df.columns:
        df_1 = df[column].str.split(',', expand=True)
        df_11 = df_1.apply(lambda s: s.value_counts(), axis=1).fillna(0)
        match = df_11.iloc[:, 0][0] / df_11.sum(axis=1) * 100
        df[column] = match
        l_match.append(match)
    return l_match
I want to find pandas columns using a list of strings, matching even when the string is only part of the column name. For example, if the column name is 'TVD' and I have 'tv' in my list, I want it to be found. The reason is that I want to pull these columns out and move them to the front of the dataframe. With my current code I can only find the exact column name, so a column named 'TVD (feet)' is a problem.
df = sts.read_df(dataset)
depth_names_lower = ['tvd', 'tvdss', 'md']
depth_names_upper = [depth.upper() for depth in depth_names_lower]
depth_names = depth_names_lower + depth_names_upper
tvd_cols = [col for col in df.columns if depth_names in col]
cols = list(df.columns)
for depth in tvd_cols:
    cols.pop(cols.index(depth))
df = df[tvd_cols + cols]
You can use a regexp to find the target columns, with flags=re.IGNORECASE to ignore case:
import re

pattern = '|'.join(depth_names_lower)
cond = df.columns.str.contains(pattern, regex=True, flags=re.IGNORECASE)
cols = df.columns[cond]
df[cols]
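Extending the regex answer to the original goal of moving the matched columns to the front; the column names here are toy assumptions:

```python
import re
import pandas as pd

# Toy dataframe with depth-like column names (assumption)
df = pd.DataFrame(columns=['WELL', 'TVD (feet)', 'MD', 'PRESSURE'])

depth_names_lower = ['tvd', 'tvdss', 'md']
pattern = '|'.join(depth_names_lower)
cond = df.columns.str.contains(pattern, regex=True, flags=re.IGNORECASE)

# Move matching columns to the front, keeping the rest in order
depth_cols = list(df.columns[cond])
other_cols = list(df.columns[~cond])
df = df[depth_cols + other_cols]
print(list(df.columns))
```

Note that substring matching is greedy: a column like 'COMMAND' would also match 'md', so tighten the pattern (e.g. with word boundaries) if that matters.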
You are attempting to check whether depth_names, which is a list, is contained in col, which is a string. That membership test does not check each element; a list operand on the left of "in" with a string on the right raises a TypeError. You want to check each string in depth_names individually to see whether it is a substring of col. One way to do that is another list comprehension:
tvd_cols = [col for col in df.columns if any(d in col for d in depth_names)]
The inner expression yields booleans; any() reduces them to a single boolean, True if and only if at least one of them is True.
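A tiny self-contained demo of that comprehension; the column names and search terms are hypothetical:

```python
# Hypothetical column names and search terms
columns = ['TVD (feet)', 'MD', 'WELL NAME']
depth_names = ['tvd', 'TVD', 'md', 'MD']

# Keep a column if any search term is a substring of its name
tvd_cols = [col for col in columns if any(d in col for d in depth_names)]
print(tvd_cols)
```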
What I have is a list of DataFrames.
It is important to note that the shapes of the dataframes differ, between 2 and 7 columns, and the columns are simply numbered from 0 upward (e.g. df1 has 5 columns named 0, 1, 2, 3, 4; df2 has 4 columns named 0, 1, 2, 3).
I would like to check whether any row in a column contains a certain string, and if so delete that column.
list_dfs1 = [df1, df2, df3...df100]
What I have done so far is the code below, and I get an error that column 5 is not in axis (it is only there for some DFs):
for i, df in enumerate(list_dfs1):
    for index, row in df.iterrows():
        if np.where(row.str.contains("DEC")):
            df.drop(index, axis=1)
Any suggestions?
You could try:
for df in list_dfs1:
    for col in df.columns:
        # If you are unsure about column types, cast the column as string:
        df[col] = df[col].astype(str)
        # Check if the column contains the string of interest
        if df[col].str.contains("DEC").any():
            df.drop(columns=[col], inplace=True)
If you know that all columns are already of string type, you don't have to do df[col] = df[col].astype(str).
You can write a custom function that checks whether a column matches the pattern, using pd.Series.str.contains with pd.Series.any:
def func(s):
    return s.str.contains('DEC').any()

list_df = [df.loc[:, ~df.apply(func)] for df in list_dfs1]
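Here is that approach run end to end on two toy dataframes of different widths (the sample values are assumptions):

```python
import pandas as pd

def func(s):
    # True if any cell in the column contains "DEC"
    return s.str.contains('DEC').any()

# Two toy dataframes of different widths (assumption)
df1 = pd.DataFrame({0: ['a', 'DEC-1'], 1: ['b', 'c'], 2: ['d', 'e']})
df2 = pd.DataFrame({0: ['x', 'y'], 1: ['DEC', 'z']})
list_dfs1 = [df1, df2]

# df.apply(func) yields one boolean per column; ~ inverts it,
# so .loc keeps only the columns without "DEC"
list_df = [df.loc[:, ~df.apply(func)] for df in list_dfs1]
print([list(d.columns) for d in list_df])
```

Because each dataframe is filtered against its own columns, the differing shapes are not a problem.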
I would take another approach: concatenate the list into a single data frame and then eliminate the columns where the string is found.
import pandas as pd
df = pd.concat(list_dfs1)
Let us say your condition was to eliminate any column with "DEC"
df.mask(df == "DEC").dropna(axis=1, how="any")
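A small demo of the mask/dropna step on toy data (an assumption). Note that df == "DEC" is an exact-equality test on each cell, not a substring match:

```python
import pandas as pd

# Toy dataframe (assumption): one cell is exactly "DEC"
df = pd.DataFrame({0: ['a', 'DEC'], 1: ['b', 'c']})

# mask() replaces cells equal to "DEC" with NaN, then dropna removes
# any column that now contains a NaN
out = df.mask(df == 'DEC').dropna(axis=1, how='any')
print(list(out.columns))
```

For substring matching ("contains DEC" rather than "equals DEC"), the str.contains approaches above are a better fit.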
What is the easiest way to make some changes in the index column of different rows in a DataFrame?
def fn(country):
    if any(char.isdigit() for char in country):
        return country[:-2]
    else:
        return country

df.loc["Country"].apply(fn, axis=1)
I can't test now. Can you try df['Country'] = df.apply(lambda row: fn(row), axis=1) and change your function argument to take the row into account (like row['Country'])? This way you can manipulate anything you want, row by row, using other column values.
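A runnable sketch of that suggestion; the sample country names with two-character numeric suffixes are assumptions for the demo:

```python
import pandas as pd

def fn(country):
    # Strip the trailing two characters if the name contains any digit
    if any(char.isdigit() for char in country):
        return country[:-2]
    return country

# Toy data (assumption): some names carry a two-character suffix
df = pd.DataFrame({'Country': ['France12', 'Spain', 'Italy07'],
                   'Value': [1, 2, 3]})

# Row-wise apply: fn receives the 'Country' value of each row
df['Country'] = df.apply(lambda row: fn(row['Country']), axis=1)
print(df['Country'].tolist())
```

Since only one column is involved here, df['Country'].apply(fn) would work too; the row-wise form is useful when fn needs other columns.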