How to convert List from column to list of DataFrame? - python

I have a DataFrame which contains categories, and I want to split the DataFrame using categories as df_name:
df_name = df['category'].unique()
print(df_name)
result:
['df1' 'df2']
After splitting the DataFrame in a loop, I get two smaller DataFrames, df1 and df2.
Next, I want to modify these DataFrames: I want to remove the category column from df1 and df2 using df_name, but I got an error. After trying for a while, I think the problem is that df_name is a list of strings.
How do I convert df_name from
['df1' 'df2']
to
[df1 df2]
?

Why use a loop to filter? You can select the rows whose value is 'df1' directly with df[df['column'] == 'df1'].
Then, if you want to remove the column, you can use del df['category'].
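Turning the strings 'df1' and 'df2' into variable names is usually unnecessary. A common alternative is to keep the pieces in a dict keyed by category. The following is a minimal sketch, assuming a hypothetical "value" column next to the "category" column from the question:

```python
import pandas as pd

# Hypothetical sample data; only the 'category' column name comes from the question.
df = pd.DataFrame({
    "category": ["df1", "df2", "df1", "df2"],
    "value": [10, 20, 30, 40],
})

# Map each category name to its sub-DataFrame and drop the
# now-redundant 'category' column from each piece.
frames = {
    name: group.drop(columns="category")
    for name, group in df.groupby("category")
}

print(frames["df1"])
```

Here frames['df1'] plays the role of df1, and list(frames.values()) gives the list of DataFrames.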

Related

How can I take values from a selection of columns in my Pandas dataframe to create a new column that contains a list of those values?

I have a dataframe in Pandas where I would like to turn the values of a set of columns (more specifically, from column index 3 to the end) into a new single column that contains a list of those values in each row.
Right now, I have code that can print out a list of the values in the columns, but only for a single row. How can I do this for the whole dataframe?
import pandas as pd
orig_df = pd.read_csv('zipcode_price_dataset.csv')
df = orig_df.loc[(orig_df['State'] == "CA")]
row = df.head(1)
print(row[df.columns[3:].values].values[0])
I could iterate through the rows using a for loop, but is there a more concise way to do this?
Something like the following:
df['new'] = df[df.columns[3:]].values.tolist()
Use .iloc:
df.iloc[: , 3:].agg(list, axis=1)
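As a rough end-to-end illustration of the first suggestion, here is a small sketch. The data and every column name except 'State' are made up for the example, and the .copy() call is an addition meant to avoid assigning into a filtered view of orig_df:

```python
import pandas as pd

# Hypothetical stand-in for zipcode_price_dataset.csv.
orig_df = pd.DataFrame({
    "Zip": [90001, 94105, 10001],
    "City": ["Los Angeles", "San Francisco", "New York"],
    "State": ["CA", "CA", "NY"],
    "2019": [500, 1200, 900],
    "2020": [520, 1250, 950],
})

# Filter, then copy so that adding a column does not touch orig_df.
df = orig_df.loc[orig_df["State"] == "CA"].copy()

# Collect the values from column index 3 onward into one list per row.
df["new"] = df.iloc[:, 3:].values.tolist()

print(df[["Zip", "new"]])
```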

Merge Dataframes using List of Columns (Pandas Vlookup)

I'd like to look up several columns, which I have in a list, from another dataframe and bring them over to my main dataframe, essentially doing a "v-lookup" of ~30 columns using ID as the key or lookup value for all columns.
However, for the columns that are the same between the two dataframes, I don't want to bring over the duplicate columns but have those values be filled in df1 from df2.
I've tried below:
df = pd.merge(df,df2[['ID', [look_up_cols]]] ,
on ='ID',
how ='left',
#suffixes=(False,False)
)
but it brings in the shared columns from df2 when I want df2's values filled into the same columns in df1.
I've also tried creating a dictionary with the column pairs from each df and using this for loop to look up each item in the dictionary (lookup_map) in the other df, with ID as the key:
for col in look_up_cols:
    df1[col] = df2['ID'].map(lookup_map)
but this just returns NaNs.
You should be able to do something like the following:
df = pd.merge(df,df2[look_up_cols + ['ID']] ,
on ='ID',
how ='left')
This just appends 'ID' to the look_up_cols list so that the key column is present in the df2 subset passed to the merge function.
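The merge above still yields suffixed duplicates for columns that exist in both frames. One possible way to handle them is sketched below, assuming that "filled" means filling missing values in df from df2 (overwriting would need different logic). The sample data and the "_df2" suffix are hypothetical:

```python
import pandas as pd

# Hypothetical data: 'price' exists in both frames, 'region' only in df2.
df = pd.DataFrame({"ID": [1, 2, 3], "price": [10.0, None, 30.0]})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "price": [11.0, 22.0, 33.0],
    "region": ["N", "S", "E"],
})
look_up_cols = ["price", "region"]

# Keep df's names unchanged and mark df2's overlapping columns with a suffix.
merged = pd.merge(
    df, df2[["ID"] + look_up_cols], on="ID", how="left", suffixes=("", "_df2")
)

# For each shared column, fill gaps from df2's copy, then drop that copy.
for col in look_up_cols:
    dup = f"{col}_df2"
    if dup in merged.columns:
        merged[col] = merged[col].fillna(merged[dup])
        merged = merged.drop(columns=dup)

print(merged)
```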

how to get column names in pandas of getdummies

After I created a data frame, I called get_dummies on it:
df_final=pd.get_dummies(df,columns=['type'])
I got the new columns that I want and everything is working.
My question is, how can I get the names of the new columns created by get_dummies? My dataframe is dynamic, so I can't hard-code them; I want to save all the new column names in a list.
An option would be:
df_dummy = pd.get_dummies(df, columns=target_cols)
df_dummy.columns.difference(df.columns).tolist()
where df is your original dataframe, df_dummy the output from pd.get_dummies, and target_cols your list of columns to get the dummies.
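A small sketch of this approach on made-up data (the "size" column and the values are hypothetical). Note that Index.difference returns the names sorted, so a list comprehension is shown as an alternative when the original column order matters:

```python
import pandas as pd

# Hypothetical data; only the 'type' column name comes from the question.
df = pd.DataFrame({"type": ["a", "b", "a"], "size": [1, 2, 3]})
target_cols = ["type"]

df_dummy = pd.get_dummies(df, columns=target_cols)

# Names present after get_dummies but not before (returned sorted).
new_cols = df_dummy.columns.difference(df.columns).tolist()

# Same idea, but preserving the column order of df_dummy.
new_cols_ordered = [c for c in df_dummy.columns if c not in df.columns]

print(new_cols)
```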

Adding a new column to a pandas dataframe

I have a dataframe df with one column and 500k rows (the first 5 elements of df are given below). I want to append new data to the existing column. The new data is a matrix of 200k rows and 1 column. How can I do it? I also want to add a new column named op.
X098_DE_time
0.046104
-0.037134
-0.089496
-0.084906
-0.038594
We can use the concat function after renaming the column of the second dataframe.
df2.rename(columns={'op': 'X098_DE_time'}, inplace=True)
new_df = pd.concat([df, df2], axis=0)
Note: If we don't rename the df2 column, the resulting new_df will have 2 different columns.
To add a new column, you can use
df["new column"] = list_of_values  # one value per row
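Putting both steps together, a minimal sketch might look like this. It assumes, as the answer does, that the new data lives in a DataFrame df2 whose single column is called 'op'; the values and the contents of the added column are placeholders:

```python
import pandas as pd

# First rows of the existing column, taken from the question.
df = pd.DataFrame({"X098_DE_time": [0.046104, -0.037134, -0.089496]})

# Hypothetical new data to append (stands in for the 200k-row matrix).
df2 = pd.DataFrame({"op": [0.5, 0.6]})

# Give both frames the same column name so the rows stack in one column.
df2 = df2.rename(columns={"op": "X098_DE_time"})
new_df = pd.concat([df, df2], axis=0, ignore_index=True)

# Add a new column; the list must have one value per row.
new_df["op"] = list(range(len(new_df)))  # placeholder values

print(new_df)
```

Passing ignore_index=True renumbers the rows from 0; without it, the appended rows keep their original index labels.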

Get unique values of multiple columns as a new dataframe in pandas

Given a pandas data frame df with at least the columns C1, C2, C3, how would you get all the unique C1, C2, C3 values as a new DataFrame?
In other words, similar to:
SELECT C1,C2,C3
FROM T
GROUP BY C1,C2,C3
I tried this:
print df.groupby(by=['C1','C2','C3'])
but I'm getting
<pandas.core.groupby.DataFrameGroupBy object at 0x000000000769A9E8>
I believe you need drop_duplicates if you want all unique triples:
df = df.drop_duplicates(subset=['C1','C2','C3'])
If you want to use groupby, add first:
df = df.groupby(by=['C1','C2','C3'], as_index=False).first()
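Both snippets above keep the other columns of df. If only the three key columns are wanted, as in the SQL query, one option is to select them before dropping duplicates. A minimal sketch with made-up data (the "other" column is hypothetical):

```python
import pandas as pd

# Hypothetical data with one extra column besides C1, C2, C3.
df = pd.DataFrame({
    "C1": [1, 1, 2, 2],
    "C2": ["a", "a", "b", "b"],
    "C3": [True, True, False, True],
    "other": [10, 20, 30, 40],
})

# Keep only the key columns, then drop repeated combinations.
unique_triples = (
    df[["C1", "C2", "C3"]].drop_duplicates().reset_index(drop=True)
)

print(unique_triples)
```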
