I'm trying to build a function to do the job because my data frames are in a list. This is the function I am working on:
def lower(x):
    '''
    This function lowercase the entire Data Frame.
    '''
    for x in clean_lst:
        for x.columns in x:
            x.columns['i'].map(lambda i: i.lower())
It's not working like that!
This is the list of data frames:
clean_lst = [pop_movies, trash_movies]
I am planning to access the list like this:
lower = [pd.DataFrame(lower(x)) for x in clean_list]
pop_movies = lower[0]
trash_movies = lower[1]
HELP!!!
You can use the apply function from the pandas package, which works on DataFrames / Series.
clean_lst = [i.apply(lambda x: x.str.lower()) for i in clean_lst]
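If the frames also contain non-string columns, .str.lower() on those would fail, so here is a minimal sketch that only lowercases the object (string) columns, assuming clean_lst holds your two frames:

import pandas as pd

def lower_frame(df):
    # Lowercase only the string (object) columns; leave other columns untouched.
    out = df.copy()
    for col in out.select_dtypes(include='object').columns:
        out[col] = out[col].str.lower()
    return out

clean_lst = [pop_movies, trash_movies]
pop_movies, trash_movies = [lower_frame(df) for df in clean_lst]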
You should use a vectorized string method on every column in the dataframe:
x["column_i"].str.lower()
I have a data frame that looks like the following:
Category Class
==========================
['org1', 'org2'] A
['org2', 'org3'] B
org1 C
['org3', 'org4'] A
org2 A
['org2', 'org4'] B
...
When I read in this data using Pandas, the lists are read in as strings (e.g., dat['Category'][0][0] returns [ rather than returning org1). I have several columns like this. I want every categorical column that already contains at least one list to have all of its records be lists. For example, the above data frame should look like the following:
Category Class
==========================
['org1', 'org2'] A
['org2', 'org3'] B
['org1'] C
['org3', 'org4'] A
['org2'] A
['org2', 'org4'] B
...
Notice how the singular values in the Category column are now contained in lists. When I reference dat['Category'][0][0], I'd like org1 to be returned.
What is the best way to accomplish this? I was thinking of using ast.literal_eval with an apply and lambda function, but I'd like to try and use best-practices if possible. Thanks in advance!
You could create a boolean mask of the values that need to be changed. If there are no lists, no change is needed. If there are lists, you can apply literal_eval or a list-creation lambda to subsets of the data.
import ast
import pandas as pd

def normalize_category(df):
    is_list = df['Category'].str.startswith('[')
    if is_list.any():
        df.loc[is_list, 'Category'] = df.loc[is_list, 'Category'].apply(ast.literal_eval)
        df.loc[~is_list, 'Category'] = df.loc[~is_list, 'Category'].apply(lambda val: [val])

df = pd.DataFrame({"Category": ["['org1', 'org2']", "org1"], "Class": ["A", "B"]})
normalize_category(df)
print(df)

df = pd.DataFrame({"Category": ["org2", "org1"], "Class": ["A", "B"]})
normalize_category(df)
print(df)
You can do it like this:
from ast import literal_eval

df['Category'] = df['Category'].apply(lambda x: literal_eval(x) if x.startswith('[') else [x])
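For example, on a small frame mirroring the one in the question (the data here is made up for illustration):

from ast import literal_eval
import pandas as pd

df = pd.DataFrame({"Category": ["['org1', 'org2']", "org1"], "Class": ["A", "C"]})
df['Category'] = df['Category'].apply(lambda x: literal_eval(x) if x.startswith('[') else [x])
print(df['Category'][1][0])  # prints: org1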
I am a new coder using Jupyter Notebook. I have a dataframe that contains 23 columns with different numbers of values (at most 23 and at least 2). I have created a function that normalizes the contents of one column, shown below.
def normalize(column):
    y = DFref[column].values
    y = y.astype(int)
    KGF = list()
    for element in y:
        element_norm = element / y.sum()
        KGF.append(element_norm)
    return KGF
I am now trying to create a function that loops through all columns in the Data frame. Right now if I plug in the name of one column, it works as intended. What would I need to do in order to create a function that loops through each column and normalizes the values of each column, and then adds it to a new dataframe?
It's not clear if all 23 columns are numeric, but I will assume they are. Then there are a number of ways to solve this. The method below probably isn't the best, but it might be a quick fix for you...
colnames = DFref.columns.tolist()
normalised_data = {}
for colname in colnames:
    normalised_data[colname] = normalize(colname)
df2 = pd.DataFrame(normalised_data)
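If all 23 columns really are numeric, a possible alternative is to skip the per-column helper entirely and normalise every column in one vectorised step. This is only a sketch, assuming every value can take the same int cast used in normalize():

numeric = DFref.astype(int)        # same cast as in normalize()
df2 = numeric.div(numeric.sum())   # divide each column by that column's sum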
I want to read three columns from my pandas data frame and combine them with a separator character to form a new data frame column. The iteration code below works fine.
def date_creation(a, b, c):
    date = str(a) + '/' + str(b) + '/' + str(c)
    return date

df.loc["Test_FL_DATE"] = df[:, ["DAY_OF_MONTH", "MONTH", "AYEAR"]].apply(date_creation)
(Sample input and output were provided as images.)
However, I want to do the same job using apply or a lambda. I am trying, but it is not working; the code below is what I have, and I believe it is not correct. Thanks in advance for helping me out.
def date_creation(a, b, c):
    date = str(a) + '/' + str(b) + '/' + str(c)
    return date

df.loc["Test_FL_DATE"] = df[:, ["DAY_OF_MONTH", "MONTH", "AYEAR"]].apply(date_creation)
Here is a possible solution if you need a lambda function:
cols = ["DAY_OF_MONTH","MONTH","AYEAR"]
df["Test_FL_DATE"] = df[cols].astype(str).apply(lambda x: '/'.join(x))
Or:
df["Test_FL_DATE"] = df[cols].apply(lambda x: '/'.join(x.astype(str)))
But nicer is:
df["Test_FL_DATE"] = df[["DAY_OF_MONTH","MONTH","AYEAR"]].astype(str).apply('/'.join)
And a faster solution is to simply join with +:
df["Test_FL_DATE"] = (df["DAY_OF_MONTH"].astype(str) + '/' +
df["MONTH"].astype(str) + '/' +
df["AYEAR"].astype(str))
Probably easiest to use pd.Series.str.cat, which concatenates one string Series with other Series.
df['Test_FL_Date'] = (df['DAY_OF_MONTH']
                      .astype(str)
                      .str
                      .cat([df['MONTH'].astype(str), df['AYEAR'].astype(str)], sep='/'))
I am trying to add columns to a Python pandas df using the apply function. However, the number of columns to be added depends on the output of the function used in the apply function.
example code:
number_of_columns_to_be_added = 2

def add_columns(number_of_columns_to_be_added):
    df['n1'], df['n2'] = zip(*df['input'].apply(lambda x: do_something(x, number_of_columns_to_be_added)))
Any idea on how to define the ugly column part (df['n1'], ..., df['n696969']) before the = zip( ... part programmatically?
I'm guessing that zip yields one tuple per output column, so you could try this:
temp = zip(*df['input'].apply(lambda x: do_something(x, number_of_columns_to_be_added)))
for i, value in enumerate(temp, 1):
    key = 'n' + str(i)
    df[key] = value
temp will hold all the entries, and then you iterate over temp to assign the values to your dataframe under the generated keys. Hope this matches your original idea.
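A small self-contained sketch of that idea; do_something here is a placeholder that returns number_of_columns_to_be_added values per row, not the real function from the question:

import pandas as pd

def do_something(value, n):
    # placeholder: produce n derived values from one input value
    return tuple(value * (i + 1) for i in range(n))

df = pd.DataFrame({'input': [1, 2, 3]})
number_of_columns_to_be_added = 2
temp = zip(*df['input'].apply(lambda x: do_something(x, number_of_columns_to_be_added)))
for i, value in enumerate(temp, 1):
    df['n' + str(i)] = list(value)
print(df)  # columns: input, n1, n2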
I am relatively new to Python programming. I have a pandas DataFrame object, say obj1. I need to apply a series of transformations to records stored in obj1['key']. Suppose obj1['key'] has 300 entries and I need to apply func1, then func2, then func3 on each of the 300 entries and store the final result in obj1['key'].
One way would be to do as below. Is there a better way to do the same?
obj1['key']=[func3(func2(func1(item))) for item in obj1['key']]
Python generators can't be used for this purpose, right?
Yes, you could use the built-in DataFrame.apply() method:
df = df.apply(f1).apply(f2).apply(fn)
Define a function:
def recursivator(x, fs):
    return recursivator(fs[-1](x), fs[:-1]) if len(fs) > 0 else x
x is the thing being operated on; fs is a list of functions, which are applied starting from the last element of the list.
df = df.applymap(lambda x: recursivator(x, fs))
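For example, a sketch with placeholder functions (note that recursivator applies them starting from the end of the list; also, newer pandas versions spell applymap as DataFrame.map):

import pandas as pd

fs = [lambda v: v + 1, lambda v: v * 2]   # placeholder transformations
df = pd.DataFrame({'key': [1, 2, 3]})
df = df.applymap(lambda x: recursivator(x, fs))
print(df['key'].tolist())  # [3, 5, 7]: each value was doubled, then incremented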