Apply functions to nested list inside pandas rows - python

I have the following df and I'm trying to figure out how to extract the unique values from each list in each row in order to simplify my df.
For example, applying unique() to the list in the first row should give 'NEUTRALREGION' only once. Please note that I have another 4 columns with the same requirement.

I solved this using df.applymap(lambda x: set(x)).
That allowed me to check the unique values in each cell.
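A minimal sketch of the same idea, assuming each cell holds a Python list; the column names region_a and region_b and the sample values are hypothetical. It de-duplicates column by column with Series.map, which does the same job as df.applymap here (applymap is deprecated in recent pandas releases in favour of DataFrame.map), and it uses dict.fromkeys so the unique values keep their first-seen order instead of the arbitrary order of a set.

```python
import pandas as pd

# Hypothetical data: every cell holds a list that may contain repeats.
df = pd.DataFrame({
    "region_a": [["NEUTRALREGION", "NEUTRALREGION"], ["EAST", "WEST", "EAST"]],
    "region_b": [["NORTH", "NORTH", "SOUTH"], ["NEUTRALREGION"]],
})


def unique_in_order(values):
    """Return the distinct items of a list, keeping first-occurrence order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


simplified = df.copy()
for col in simplified.columns:
    # Series.map calls the function once per cell, i.e. once per list.
    simplified[col] = simplified[col].map(unique_in_order)

print(simplified)
```

If order does not matter, df.applymap(set) (or df.map(set) in newer pandas) gives the same distinct values as sets.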

Related

Group By and ILOC Errors

I'm getting the following error when trying to groupby and sum my dataframe by specific columns:
ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional
I've checked other solutions, and it's not a duplicated column name issue.
See df3 below: I want to group by all columns except the last two, and sum() those last two.
The dfs head shows that grouping by the column names directly works fine, but grouping with iloc does not, even though iloc selects the columns I want to group by.
I need to use iloc because the final dataframe will have many more columns.
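df.iloc[:,0:3] returns a DataFrame, so you are trying to group a DataFrame by another DataFrame.
But groupby just needs a list of column labels.
Can you try this (note the double brackets around the columns to sum; recent pandas versions raise an error for a bare comma-separated pair of names after groupby):
dfs = df3.groupby(list(df3.iloc[:,0:3].columns))[['Churn_Alive_1','Churn_Alive_0']].sum()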
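A small reproduction, assuming a hypothetical df3 whose first three columns (region, segment and month are invented names) are the grouping keys and whose last two are Churn_Alive_1 and Churn_Alive_0. Because df3.columns is an Index, slicing it by position (df3.columns[:-2] for "all but the last two") gives plain labels that groupby accepts, which scales to many columns just like iloc. The value columns are passed as a list, which recent pandas requires when selecting more than one column from a groupby.

```python
import pandas as pd

# Hypothetical frame: three key columns followed by the two columns to sum.
df3 = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "segment": ["A", "A", "B", "B"],
    "month": [1, 1, 1, 2],
    "Churn_Alive_1": [1, 0, 1, 1],
    "Churn_Alive_0": [0, 1, 0, 0],
})

# Positional selection of column *labels*, not of a sub-DataFrame.
key_cols = list(df3.columns[:-2])    # every column except the last two
value_cols = list(df3.columns[-2:])  # the last two columns

dfs = df3.groupby(key_cols)[value_cols].sum()
print(dfs)
```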

I have a dataframe containing arrays; is there a way to collect all of the elements and store them in a separate dataframe?

I can't seem to find a way to split all of the array values from a column of a dataframe.
I have managed to get all the array values using this code:
The dataframe is as follows:
I want to use value_counts() on the dataframe, and I get this:
I want the array values that are grouped together in one cell to be split so that I can get an accurate count of every value.
Thanks in advance!
You could try .explode(), which would create a new row for every value in each list.
df_mentioned_id_exploded = pd.DataFrame(df_mentioned_id.explode('entities.user_mentions'))
With the above code you would create a new dataframe, df_mentioned_id_exploded, with one row per list element in the entities.user_mentions column (a single-column frame if that is the only column in df_mentioned_id), which you could then call .value_counts() on.
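For illustration, a sketch with made-up mention lists (the user names are hypothetical) showing both steps: explode turns each list element into its own row, repeating the original index, and value_counts then counts every individual value. Calling explode on a DataFrame already returns a DataFrame, so the pd.DataFrame(...) wrapper is optional. An empty list becomes a single NaN row after explode, which value_counts skips by default.

```python
import pandas as pd

# Hypothetical column where each cell is a list of mentioned users.
df_mentioned_id = pd.DataFrame({
    "entities.user_mentions": [["alice", "bob"], ["bob"], [], ["alice", "carol", "bob"]],
})

# One row per list element; the original index is repeated for each element.
df_mentioned_id_exploded = df_mentioned_id.explode("entities.user_mentions")

# Count each individual value; the NaN produced by the empty list is dropped.
counts = df_mentioned_id_exploded["entities.user_mentions"].value_counts()
print(counts)
```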

Pandas DataFrame: info() function for one column only

I have a dataframe named df_train with 20 columns. Is there a pythonic way to view info() for only one column by selecting its name?
Basically, I am trying to loop through the df and extract the number of unique values and the number of missing values for each column:
print("\nUnique Values:")
for col in df_train.columns:
    print(f'{col:<25}: {df_train[col].nunique()} unique values. \tMissing values: {} ')
If you want the total number of null values, this is the pythonic way to achieve it:
df_train[col].isnull().sum()
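Putting the question's loop and this answer together, a sketch with the empty placeholder filled in by isnull().sum(); the small df_train here is made-up data standing in for the real 20-column frame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_train.
df_train = pd.DataFrame({
    "age": [22, 35, np.nan, 35],
    "city": ["Paris", None, "Lyon", "Paris"],
})

print("\nUnique Values:")
for col in df_train.columns:
    n_unique = df_train[col].nunique()        # NaN/None are not counted as values
    n_missing = df_train[col].isnull().sum()  # number of NaN/None cells
    print(f"{col:<25}: {n_unique} unique values. \tMissing values: {n_missing}")
```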
Yes, there is a way to select individual columns from a dataframe:
df_train['your_column_name']
This selects only the column named your_column_name (as a Series).
PS: This is my first StackOverflow answer. Please be nice.
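On the original info() question: selecting with double brackets keeps a one-column DataFrame, so DataFrame.info() works on it (newer pandas versions also provide Series.info() for the single-bracket form). A minimal sketch with a hypothetical frame; buf= is used only to capture the report as text.

```python
import io

import pandas as pd

# Hypothetical stand-in for df_train.
df_train = pd.DataFrame({"age": [22, 35, None], "city": ["Paris", "Lyon", "Paris"]})

# Double brackets -> one-column DataFrame, so DataFrame.info() applies.
buffer = io.StringIO()
df_train[["age"]].info(buf=buffer)
report = buffer.getvalue()
print(report)
```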

Python: Create New Column based on values of other column using len()

My dataframe is a pandas dataframe df with many rows & columns.
Now I want to create a new column (Series) based on the values of an object column, e.g.:
df.loc[0, 'oldcolumn'] outputs 0, which should give me 0 in the new column, and
df.loc[1, 'oldcolumn'] outputs 'ab%$.', which should give me 5 in the same new column (the number of characters, including spaces).
In addition, is there a way to avoid loops or custom functions?
Thank you
To create a new column based on the length of the value in another column, you should do
df['newcol'] = df['oldcol'].apply(lambda x: len(str(x)))
Although this is a generic way of creating a new column based on data from existing columns, Henry's approach is also a good one.
In addition, is there a way to avoid loops or own functions?
I recommend you take a look at the article "How To Make Your Pandas Loop 71803 Times Faster".
You can try this:
df['strlen'] = df['oldcolumn'].apply(len)
print(df)
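On avoiding apply and custom functions: the .str accessor computes lengths in a single vectorized call. A sketch with hypothetical sample values. Note that len(str(x)) and astype(str) count the numeric 0 as the one-character string "0", and .apply(len) raises a TypeError on an integer cell such as 0. The second new column shows one way to get 0 instead, assuming that any non-string cell should count as length 0.

```python
import pandas as pd

# Hypothetical data mixing a numeric 0 with strings, as in the question.
df = pd.DataFrame({"oldcolumn": [0, "ab%$.", "a b"]})

# Vectorized: convert everything to text, then count characters (spaces included).
df["strlen"] = df["oldcolumn"].astype(str).str.len()

# Assumption: non-string cells should count as 0. .str.len() yields NaN for them,
# which fillna(0) replaces before converting back to integers.
df["strlen_text_only"] = df["oldcolumn"].str.len().fillna(0).astype(int)

print(df)
```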

How can I create a pandas dataframe column for each part-of-speech tag?

I have a dataset that consists of tokenized, POS-tagged phrases as one column of a dataframe:
Current Dataframe
I want to create a new column in the dataframe, consisting only of the proper nouns in the previous column:
Desired Solution
Right now, I'm trying something like this for a single row:
if 'NNP' in df['Description_POS'][96][0:-1]:
    df['Proper Noun'] = df['Description_POS'][96]
But then I don't know how to loop this for each row, and how to obtain the tuple which contains the proper noun.
I'm very new to this and at a loss for what to use, so any help would be really appreciated!
Edit: I tried the solution recommended, and it seems to work, but there is an issue.
This was my dataframe:
Original dataframe
After implementing the recommended code:
df['Proper Nouns'] = df['POS_Description'].apply(
    lambda row: [i[0] for i in row if i[1] == 'NNP'])
it looks like this:
Dataframe after creating a proper nouns column
You can use the apply method, which, as the name suggests, applies the given function to every element of the Series (here, each row's list of tuples). It returns a Series, which you can add as a new column to your dataframe:
df['Proper Nouns'] = df['POS_Description'].apply(
    lambda row: [i[0] for i in row if i[1] == 'NNP'])
I am assuming each value in POS_Description is a list of (token, tag) tuples.
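A self-contained sketch of that apply call on made-up tagged phrases, assuming each cell is a list of (token, tag) tuples as the answer does. If the column was instead read back from a CSV file, each cell may be the string form of that list, in which case i[1] would index characters rather than tags; ast.literal_eval can turn such strings back into lists first. That second step is an assumption about the data, not something the question confirms.

```python
import ast

import pandas as pd

# Hypothetical tagged phrases: each cell is a list of (token, tag) tuples.
df = pd.DataFrame({
    "POS_Description": [
        [("Alice", "NNP"), ("visited", "VBD"), ("Paris", "NNP")],
        [("the", "DT"), ("cat", "NN")],
    ]
})

# Assumption: if cells arrived as strings (e.g. via read_csv), parse them back into lists.
df["POS_Description"] = df["POS_Description"].map(
    lambda cell: ast.literal_eval(cell) if isinstance(cell, str) else cell
)

# Keep only the tokens whose tag is NNP (singular proper noun).
df["Proper Nouns"] = df["POS_Description"].apply(
    lambda row: [token for token, tag in row if tag == "NNP"]
)
print(df["Proper Nouns"].tolist())
```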
