How to lowercase an entire Data Frame? - python

I'm trying to build a function to do the job because my data frames are in a list. This is the function I am working on:
def lower(x):
    '''
    This function lowercases the entire DataFrame.
    '''
    for x in clean_lst:
        for x.columns in x:
            x.columns['i'].map(lambda i: i.lower())
It's not working like that!
This is the list of data frames:
clean_lst = [pop_movies, trash_movies]
I am planning to access the list like this:
lower = [pd.DataFrame(lower(x)) for x in clean_list]
pop_movies = lower[0]
trash_movies = lower[1]
HELP!!!

You can use the apply function from the pandas package, which works on a DataFrame or Series:
clean_lst = [i.apply(lambda x: x.str.lower()) for i in clean_lst]

You should use a vectorized string method on every column of the dataframe:
x["column_i"].str.lower()
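Putting the two answers together, here is a minimal end-to-end sketch. It assumes the frames mix string and numeric columns, so it only lowercases the object-dtype columns (the movie data below is made up for illustration):

```python
import pandas as pd

def lower_frame(df):
    # Lowercase every object (string) column; leave other dtypes untouched.
    out = df.copy()
    str_cols = out.select_dtypes(include='object').columns
    out[str_cols] = out[str_cols].apply(lambda col: col.str.lower())
    return out

pop_movies = pd.DataFrame({'title': ['Alien', 'Heat'], 'year': [1979, 1995]})
trash_movies = pd.DataFrame({'title': ['The Room'], 'year': [2003]})

clean_lst = [lower_frame(df) for df in [pop_movies, trash_movies]]
print(clean_lst[0]['title'].tolist())  # ['alien', 'heat']
```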

Related

Use ast.literal_eval on all columns of a Pandas Dataframe

I have a data frame that looks like the following:
Category Class
==========================
['org1', 'org2'] A
['org2', 'org3'] B
org1 C
['org3', 'org4'] A
org2 A
['org2', 'org4'] B
...
When I read in this data using Pandas, the lists are read in as strings (e.g., dat['Category'][0][0] returns [ rather than returning org1). I have several columns like this. I want every categorical column that already contains at least one list to have all records be a list. For example, the above data frame should look like the following:
Category Class
==========================
['org1', 'org2'] A
['org2', 'org3'] B
['org1'] C
['org3', 'org4'] A
['org2'] A
['org2', 'org4'] B
...
Notice how the singular values in the Category column are now contained in lists. When I reference dat['Category'][0][0], I'd like org1 to be returned.
What is the best way to accomplish this? I was thinking of using ast.literal_eval with an apply and lambda function, but I'd like to try and use best-practices if possible. Thanks in advance!
You could create a boolean mask of the values that need to be changed. If there are no lists, no change is needed. If there are lists, you can apply literal_eval or a list-creation lambda to subsets of the data.
import ast
import pandas as pd

def normalize_category(df):
    is_list = df['Category'].str.startswith('[')
    if is_list.any():
        df.loc[is_list, 'Category'] = df.loc[is_list, 'Category'].apply(ast.literal_eval)
        df.loc[~is_list, 'Category'] = df.loc[~is_list, 'Category'].apply(lambda val: [val])

df = pd.DataFrame({"Category": ["['org1', 'org2']", "org1"], "Class": ["A", "B"]})
normalize_category(df)
print(df)

df = pd.DataFrame({"Category": ["org2", "org1"], "Class": ["A", "B"]})
normalize_category(df)
print(df)
You can do it like this (with from ast import literal_eval):
df['Category'] = df['Category'].apply(lambda x: literal_eval(x) if x.startswith('[') else [x])
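As a runnable sketch of that one-liner (assuming every Category cell is either a bare string or a string that starts with '['):

```python
import pandas as pd
from ast import literal_eval

df = pd.DataFrame({'Category': ["['org1', 'org2']", 'org1'], 'Class': ['A', 'C']})
# Parse list-like strings; wrap bare values in a single-element list.
df['Category'] = df['Category'].apply(lambda x: literal_eval(x) if x.startswith('[') else [x])
print(df['Category'][0][0])  # org1
```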

Create function that loops through columns in a Data frame

I am a new coder using Jupyter Notebook. I have a dataframe that contains 23 columns with different numbers of values (at most 23 and at least 2). I have created a function, below, that normalizes the contents of one column.
def normalize(column):
    y = DFref[column].values
    y = y.astype(int)
    KGF = list()
    for element in y:
        element_norm = element / y.sum()
        KGF.append(element_norm)
    return KGF
I am now trying to create a function that loops through all columns in the Data frame. Right now if I plug in the name of one column, it works as intended. What would I need to do in order to create a function that loops through each column and normalizes the values of each column, and then adds it to a new dataframe?
It's not clear if all 23 columns are numeric, but I will assume they are. Then there are a number of ways to solve this. The method below probably isn't the best, but it might be a quick fix for you...
colnames = DFref.columns.tolist()
normalised_data = {}
for colname in colnames:
    normalised_data[colname] = normalize(colname)
df2 = pd.DataFrame(normalised_data)
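If all the columns really are numeric, pandas can also normalize the whole frame in one vectorized step, with no per-column loop. This is a sketch with made-up data, not the asker's DFref:

```python
import pandas as pd

DFref = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 3, 5]})
# Divide each column by its own sum; pandas broadcasts column-wise.
df2 = DFref / DFref.sum()
print(df2['a'].tolist())  # [0.25, 0.25, 0.5]
```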

How to combine multiple columns in a pandas Dataframe by using apply?

I want to read three columns from my pandas data frame and then combine them with a separator character to form a new data frame column. The iteration code below works fine.
def date_creation(a, b, c):
    date = str(a) + '/' + str(b) + '/' + str(c)
    return date
df.loc["Test_FL_DATE"] = df[:, ["DAY_OF_MONTH", "MONTH", "AYEAR"]].apply(date_creation)
However, I want to do the same job using apply with a lambda. I am trying, but it is not working. The code is below, which I believe is not correct. Thanks in advance for helping me out.
def date_creation(a, b, c):
    date = str(a) + '/' + str(b) + '/' + str(c)
    return date
df.loc["Test_FL_DATE"] = df[:, ["DAY_OF_MONTH", "MONTH", "AYEAR"]].apply(date_creation)
Here is a possible solution if you need a lambda function:
cols = ["DAY_OF_MONTH","MONTH","AYEAR"]
df["Test_FL_DATE"] = df[cols].astype(str).apply(lambda x: '/'.join(x))
Or:
df["Test_FL_DATE"] = df[cols].apply(lambda x: '/'.join(x.astype(str)))
But nicer is:
df["Test_FL_DATE"] = df[["DAY_OF_MONTH","MONTH","AYEAR"]].astype(str).apply('/'.join)
And faster solution is simply join by +:
df["Test_FL_DATE"] = (df["DAY_OF_MONTH"].astype(str) + '/' +
df["MONTH"].astype(str) + '/' +
df["AYEAR"].astype(str))
Probably easiest to use pd.Series.str.cat, which concatenates a string Series with other Series:
df['Test_FL_Date'] = (df['DAY_OF_MONTH']
                      .astype(str)
                      .str
                      .cat([df['MONTH'].astype(str), df['AYEAR'].astype(str)], sep='/'))
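If the three columns are ultimately meant to become a real date rather than a string, pd.to_datetime can assemble them directly once the columns are renamed to day/month/year. This is an alternative approach, not what the asker tried, and the sample values are invented:

```python
import pandas as pd

df = pd.DataFrame({'DAY_OF_MONTH': [5], 'MONTH': [7], 'AYEAR': [2019]})
# to_datetime accepts a DataFrame whose columns are named year/month/day.
parts = df[['DAY_OF_MONTH', 'MONTH', 'AYEAR']].rename(
    columns={'DAY_OF_MONTH': 'day', 'MONTH': 'month', 'AYEAR': 'year'})
df['Test_FL_DATE'] = pd.to_datetime(parts)
print(df['Test_FL_DATE'].dt.strftime('%d/%m/%Y').tolist())  # ['05/07/2019']
```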

How to define a variable amount of columns in python pandas apply

I am trying to add columns to a python pandas df using the apply function.
However the number of columns to be added depend on the output of the function
used in the apply function.
example code:
number_of_columns_to_be_added = 2

def add_columns(number_of_columns_to_be_added):
    df['n1'], df['n2'] = zip(*df['input'].apply(lambda x: do_something(x, number_of_columns_to_be_added)))
Any idea how to define the ugly column part (df['n1'], ..., df['n696969']) before the = zip( ... part programmatically?
I'm guessing that the output of zip is a tuple, therefore you could try this:
temp = zip(*df['input'].apply(lambda x: do_something(x, number_of_columns_to_be_added)))
for i, value in enumerate(temp, 1):
    key = 'n' + str(i)
    df[key] = value
temp will hold all the entries, and then you iterate over temp to assign the values to your DataFrame with your specific keys. Hope this matches your original idea.
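Another way to avoid naming the columns by hand is to build a new DataFrame from the per-row tuples and join it on. The do_something below is a hypothetical stand-in (the original question never shows it) that returns a tuple whose length is the number of new columns:

```python
import pandas as pd

# Hypothetical do_something: splits a string into n parts.
def do_something(x, n):
    return tuple(x.split('-')[:n])

df = pd.DataFrame({'input': ['a-b', 'c-d']})
number_of_columns_to_be_added = 2

parts = df['input'].apply(lambda x: do_something(x, number_of_columns_to_be_added))
# One row per tuple; column names n1..nN are generated, not hard-coded.
new_cols = pd.DataFrame(parts.tolist(), index=df.index,
                        columns=['n%d' % i for i in range(1, number_of_columns_to_be_added + 1)])
df = df.join(new_cols)
print(df['n1'].tolist())  # ['a', 'c']
```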

Apply series of transformations to pandas DataFrame object

I am relatively new to Python programming. I have a pandas DataFrame object, say obj1. I need to apply a series of transformations to the records stored in obj1['key']. Suppose obj1['key'] has 300 entries and I need to apply func1, then func2, then func3 to each of the 300 entries and store the final result in obj1['key'].
One way would be to do as below. Is there a better way to do the same?
obj1['key']=[func3(func2(func1(item))) for item in obj1['key']]
Python generators can't be used for this purpose, right?
Yes, you could chain the built-in method DataFrame.apply():
df = df.apply(f1).apply(f2).apply(fn)
Define a function:
def recursivator(x, fs):
    return recursivator(fs[-1](x), fs[:-1]) if len(fs) > 0 else x
x is the thing being operated on, fs is a list of functions.
df = df.applymap(lambda x: recursivator(x, fs))
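An equivalent without explicit recursion folds the list of functions over each value with functools.reduce. The three toy functions below stand in for the asker's func1/func2/func3, which are never shown:

```python
from functools import reduce
import pandas as pd

# Toy stand-ins for func1, func2, func3.
def func1(x): return x + 1
def func2(x): return x * 2
def func3(x): return x - 3

obj1 = pd.DataFrame({'key': [1, 2, 3]})
funcs = [func1, func2, func3]
# reduce threads each value through funcs left to right.
obj1['key'] = obj1['key'].map(lambda v: reduce(lambda acc, f: f(acc), funcs, v))
print(obj1['key'].tolist())  # [1, 3, 5]
```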