Function to create a DF - python

I want to create a DF from another DF using a function like this:
def create_df_region(df,region):
    df = pd.DataFrame(index=df_reduced.index)
    df['Cons'] = df_reduced['ind_{region}'.format()].value
The problem is that ind_{} can take values like ind_s, ind_n, ind_no, and I want to pass these values when creating the DF, because n means north, s means south, and so on.
Then, to create the df:
df_south = create_df_region(df_reduced, s)
where s means the south, because in df_reduced I have columns ind_s, ind_n, ...
How can I do this? The way I am trying above is not working.

You need to return the newly created dataframe at the end of the function,
use .values instead of .value, and use an f-string to build the source column name, as follows:
def create_df_region(df, region):
    df = pd.DataFrame(index=df_reduced.index)
    df['Cons'] = df_reduced[f'ind_{region}'].values  # use .values instead of .value
    return df
Also, when you call the function, you need to pass a string 's' instead of the variable name s as follows:
df_south = create_df_region(df_reduced, 's')
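Putting the pieces above together, here is a minimal, self-contained sketch. The sample values and index labels in df_reduced are made up for illustration, and the first parameter is named df_reduced so the function no longer relies on a global variable.

```python
import pandas as pd

def create_df_region(df_reduced, region):
    """Build a one-column frame from the 'ind_<region>' column of df_reduced."""
    out = pd.DataFrame(index=df_reduced.index)
    out["Cons"] = df_reduced[f"ind_{region}"].values
    return out

# Hypothetical sample data: only the column naming scheme comes from the question.
df_reduced = pd.DataFrame(
    {"ind_s": [1.0, 2.0], "ind_n": [3.0, 4.0]},
    index=["2020", "2021"],
)

df_south = create_df_region(df_reduced, "s")  # pass the region as a string

# One frame per region code, keyed by that code.
frames = {code: create_df_region(df_reduced, code) for code in ["s", "n"]}
print(df_south)
```

Keeping the frames in a dict keyed by region code avoids writing one call per region by hand.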

Use f'ind_{region}' instead of .format(), use .values instead of .value, and return the result:
def create_df_region(df_reduced,region):
    df = pd.DataFrame(index=df_reduced.index)
    df['Cons'] = df_reduced[f'ind_{region}'].values
    return df
*I've also changed the first parameter of the function from df to df_reduced so that the function uses its argument rather than a global variable.

Related

How to use multiple pandas functions in a single variable python

I want to drop a column level and columns to the right, from data downloaded from yahoo finance.
FAANG = yf.download(['AAPL','GOOGL','NFLX','META','AMZN','SPY'],
                    start = '2008-01-01',end = '2022-12-31')
FAANG_AC = FAANG.drop(FAANG.columns[6:36],axis=1)
FAC = FAANG_AC.droplevel(0,axis=1)
How do I combine .drop and .droplevel into a single variable, so that I do not have to use multiple variables in this situation?
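You don't need to use intermediate variables; you can chain everything. Unless FAANG already exists from an earlier run, FAANG.columns is undefined while the chain is being evaluated, so refer to the intermediate result through .pipe instead:
FAANG = (yf.download(['AAPL','GOOGL','NFLX','META','AMZN','SPY'],
                     start='2008-01-01', end='2022-12-31')
           .pipe(lambda d: d.drop(d.columns[6:36], axis=1))
           .droplevel(0, axis=1)
        )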
You can pass inplace=True when calling drop (droplevel has no inplace parameter, so its result still has to be assigned). Like:
FAANG.drop(FAANG.columns[6:36],axis=1, inplace=True)
Careful: this modifies the FAANG variable itself.
Reference: https://www.askpython.com/python-modules/pandas/inplace-true-parameter
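To try the chaining idea without a network call, here is a small sketch on a made-up frame with two-level columns. The tickers, level names, and values are assumptions for illustration and do not reproduce the real yfinance layout.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded data: 2 price fields x 2 tickers.
cols = pd.MultiIndex.from_product([["Adj Close", "Close"], ["AAPL", "SPY"]])
prices = pd.DataFrame(np.arange(8.0).reshape(2, 4), columns=cols)

result = (
    prices
    # .pipe hands the intermediate frame to the lambda, so no outer name is needed.
    .pipe(lambda d: d.drop(columns=d.columns[2:]))
    # Remove the outer column level, leaving only the ticker names.
    .droplevel(0, axis=1)
)
print(result)
```

Because each step returns a new DataFrame, prices itself stays unchanged.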

How do I apply a function over a column?

I have created a function I would like to apply over a given dataframe column. Is there an apply function so that I can create a new column and apply my created function?
Example code:
dat = pd.DataFrame({'title': ['cat', 'dog', 'lion','turtle']})
Manual method that works:
print(calc_similarity(chosen_article,str(df['title'][1]),model_word2vec))
print(calc_similarity(chosen_article,str(df['title'][2]),model_word2vec))
Attempt to apply over dataframe column:
dat['similarity']= calc_similarity(chosen_article, str(df['title']), model_word2vec)
The issue I have been running into is that the function outputs the same result over the entirety of the newly created column.
I have tried apply() as follows:
dat['similarity'] = dat['title'].apply(lambda x: calc_similarity(chosen_article, str(x), model_word2vec))
and
dat['similarity'] = dat['title'].astype(str).apply(lambda x: calc_similarity(chosen_article, x, model_word2vec))
Both result in a ZeroDivisionError, which I don't understand since I am not passing empty strings.
Function being used:
def calc_similarity(input1, input2, vectors):
    s1words = set(vocab_check(vectors, input1.split()))
    s2words = set(vocab_check(vectors, input2.split()))
    output = vectors.n_similarity(s1words, s2words)
    return output
It sounds like you are having difficulty applying a function while passing additional keyword arguments. Here's how you can execute that:
# By default, function will use values for first arg.
# You can specify kwargs in the apply method though
df['similarity'] = df['title'].apply(
    calc_similarity,
    input2=chosen_article,
    vectors=model_word2vec
)
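The keyword-argument form of Series.apply can be checked with a toy function. Here toy_similarity and vocab are hypothetical stand-ins for calc_similarity and the word2vec model. The empty-set guard reflects one possible cause of the ZeroDivisionError, on the assumption that vocab_check drops out-of-vocabulary words and can therefore leave nothing to compare; this is a guess, not something the question confirms.

```python
import math
import pandas as pd

def toy_similarity(input1, input2, vocab):
    """Hypothetical stand-in: Jaccard overlap of the in-vocabulary words."""
    words1 = {w for w in input1.split() if w in vocab}
    words2 = {w for w in input2.split() if w in vocab}
    if not words1 or not words2:
        # Nothing left to compare after filtering: return NaN instead of failing.
        return math.nan
    return len(words1 & words2) / len(words1 | words2)

dat = pd.DataFrame({"title": ["cat", "dog", "lion", "turtle"]})
vocab = {"cat", "dog", "lion"}  # "turtle" is deliberately missing

# Each title is passed as input1; the remaining arguments go in as keywords.
dat["similarity"] = dat["title"].apply(toy_similarity, input2="cat", vocab=vocab)
print(dat)
```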

How to create conditional columns in Pandas with any?

I'm working with Pandas. I need to create a new column in a dataframe according to conditions in other columns. For each value in a series, I check whether it contains a given value (the condition for returning text). This works when the values are exactly the same, but not when the value is only part of the value in the series.
Sample data:
df = pd.DataFrame([["ores"], ["ores + more texts"], ["anything else"]], columns=['Symptom'])
def conditions(df5):
    if ("ores") in df5["Symptom"]: return "Things"
df["new_column"] = df.swifter.apply(conditions, axis=1)
It doesn't work because any("something") is always True.
So I tried:
df['new_column'] = np.where(df2["Symptom"].str.contains('ores'), 'yes', 'no') : return "Things"
It doesn't work because it's inside a loop.
I can't use np.select because it needs two separate lists, and my code has to be easily editable (and it can't come from a dict).
It also doesn't work with find_all, nor with:
df["new_column"] == "ores" is True: return "things"
I don't really understand why nothing works. What do I have to do?
Edit:
df5 = pd.DataFrame([["ores"], ["ores + more texts"], ["anything else"]], columns=['Symptom'])
def conditions(df5):
    (df5["Symptom"].str.contains('ores'), 'Things')
df5["Deversement Service"] = np.where(conditions)
df5
For the moment I have a length-of-values problem.
To add a new column with condition, use np.where:
df = pd.DataFrame([["ores"], ["ores + more texts"], ["anything else"]], columns=['Symptom'])
df['new'] = np.where(df["Symptom"].str.contains('ores'), 'Things', "")
print (df)
             Symptom     new
0               ores  Things
1  ores + more texts  Things
2      anything else
If you need a single boolean value, use pd.Series.any:
if df["Symptom"].str.contains('ores').any():
    print ("Things")
# Things
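A slightly more defensive variant of the np.where approach above is sketched here. The extra rows (a capitalised value and a missing value) are assumptions added for illustration; case and na are existing parameters of Series.str.contains.

```python
import numpy as np
import pandas as pd

# Sample data extended with a capitalised entry and a missing value.
df = pd.DataFrame(
    {"Symptom": ["ores", "Ores + more texts", "anything else", None]}
)

# case=False ignores letter case; na=False treats missing values as "no match".
mask = df["Symptom"].str.contains("ores", case=False, na=False)
df["new"] = np.where(mask, "Things", "")
print(df)
```

Without na=False, a missing value can propagate into the mask as a missing result rather than False, which is easy to overlook when the mask feeds np.where.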

How to lowercase an entire Data Frame?

I'm trying to build a function to do the job because my data frames are in a list. This is the function that I am working on:
def lower(x):
    '''
    This function lowercases the entire Data Frame.
    '''
    for x in clean_lst:
        for x.columns in x:
            x.columns['i'].map(lambda i: i.lower())
It's not working like that!
This is the list of data frames:
clean_lst = [pop_movies, trash_movies]
I am planning to access the list like this:
lower = [pd.DataFrame(lower(x)) for x in clean_list]
pop_movies = lower[0]
trash_movies = lower[1]
HELP!!!
You can use pandas' apply, which works on both DataFrames and Series (this assumes every column holds strings; the .str accessor raises an AttributeError on numeric columns):
clean_lst = [i.apply(lambda x: x.str.lower()) for i in clean_lst]
You should use a vectorized method for every column in the dataframe:
x["column_i"].str.lower()
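If the frames mix text and numeric columns, one option is to lowercase only the values that are strings. The sketch below assumes that is the goal; lower_strings and the sample movie data are hypothetical.

```python
import pandas as pd

def lower_strings(frame):
    """Return a copy of frame with every string value lowercased."""
    return frame.apply(
        # Applied column by column; non-string values pass through unchanged.
        lambda col: col.map(lambda v: v.lower() if isinstance(v, str) else v)
    )

# Hypothetical sample frames standing in for the real ones.
pop_movies = pd.DataFrame({"Title": ["JAWS", "Alien"], "Year": [1975, 1979]})
trash_movies = pd.DataFrame({"Title": ["The ROOM"], "Year": [2003]})

clean_lst = [lower_strings(frame) for frame in [pop_movies, trash_movies]]
pop_movies, trash_movies = clean_lst  # unpack the list back into names
print(pop_movies)
```

This is slower than the vectorized .str.lower() on large frames, but it does not fail when a column contains numbers.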

Why are my variables not accessible after a function?

I can't figure out why my function's changes to the variables don't persist after I execute it, or why the variables aren't accessible after the function returns. I'm providing a dataframe and telling the function which column to compare. I want the function to add the matching values to the original dataframe and to create a separate dataframe where I can see just the matches. When I run the code, I can see the dataframe and the matching dataframe in the function's output, but when I try to use the matching dataframe afterwards, Python reports the variable as not defined, and the original dataframe isn't modified when I look at it again. I've tried declaring them both as global variables at the beginning of the function, but that didn't work either.
def scorer_tester_function(dataframe, score_type, source, compare, limit_num):
    match = []
    match_index = []
    similarity = []
    org_index = []
    match_df = pd.DataFrame()
    for i in zip(source.index, source):
        position = list(source.index)
        print(str(position.index(i[0])) + " of " + str(len(position)))
        if pd.isnull(i[1]):
            org_index.append(i[0])
            match.append(np.nan)
            similarity.append(np.nan)
            match_index.append(np.nan)
        else:
            ratio = process.extract( i[1], compare, limit=limit_num,
                                     scorer=scorer_dict[score_type])
            org_index.append(i[0])
            match.append(ratio[0][0])
            similarity.append(ratio[0][1])
            match_index.append(ratio[0][2])
    match_df['org_index'] = pd.Series(org_index)
    match_df['match'] = pd.Series(match)
    match_df['match_index'] = pd.Series(match_index)
    match_df['match_score'] = pd.Series(similarity)
    match_df.set_index('org_index', inplace=True)
    dataframe = pd.concat([dataframe, match_df], axis=1)
    return match_df, dataframe
I'm calling the function like this:
scorer_tester_function(df_ven, 'WR', df_ven['Name 1'].sample(2), df_emp['Name 2'], 1)
My expectation is that I can access match_df and df_ven afterwards and be able to see and further manipulate these variables, but after the call the original dataframe df_ven is unchanged and match_df raises a variable-not-defined error.
return doesn't inject local variables into the caller's scope; it makes the function call evaluate to their values.
If you write
a, b = scorer_tester_function(df_ven, 'WR', df_ven['Name 1'].sample(2), df_emp['Name 2'], 1)
then a will have the value of match_df from inside the function and b will have the value of dataframe, but the names match_df and dataframe go out of scope after the function returns; they do not exist outside of it.
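The same behaviour can be reproduced with a much smaller function. In this sketch, find_matches, the score column, and the threshold are hypothetical simplifications of scorer_tester_function. It also shows why df_ven stays unchanged: assigning to the parameter name inside the function rebinds a local name and does not touch the caller's object.

```python
import pandas as pd

def find_matches(dataframe, threshold):
    """Hypothetical, simplified stand-in for scorer_tester_function."""
    match_df = dataframe[dataframe["score"] > threshold].add_suffix("_match")
    # Rebinds the local name only; the caller's DataFrame is not modified.
    dataframe = pd.concat([dataframe, match_df], axis=1)
    return match_df, dataframe

df_ven = pd.DataFrame({"score": [0.2, 0.9]})

find_matches(df_ven, 0.5)              # return value discarded
columns_before = list(df_ven.columns)  # df_ven still has its original columns

# Bind the returned objects to names in the calling scope.
match_df, df_ven = find_matches(df_ven, 0.5)
columns_after = list(df_ven.columns)
print(columns_before, columns_after)
```

Reusing the name df_ven on the left-hand side is what makes the "modified" frame visible to the rest of the program.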
