Update dataframe 1 using two columns in dataframe 2 in python

I want to update the Freq column in df1 using the Freq column in df2, as shown below.
import pandas as pd
data = {'Cell': [1, 2, 3, 4, '10-05', '10-09'], 'Freq': [True, True, True, True, True, True]}
df1 = pd.DataFrame(data)
Dataframe 1
Dataframe 2
data2 = {'Cell-1':[1,1,1,1,1,1,2,2,2,2,2,2],'Cell-2':[1,2,3,4,'10-05','10-09',1,2,3,4,'10-05','10-09'] ,'Freq':[True, False,True,False,True,True,True, False,True,False,True,False]}
df2 = pd.DataFrame(data2)
In df1, the first column (Cell) holds the keys and the second column (Freq) holds the corresponding value, which in this case is either True or False.
Let's take key = 1 in Dataframe 1 as an example. This key has multiple values in Dataframe 2, as shown in the figure, one for each entry in the second column of Dataframe 2 (Cell-2). Those Cell-2 entries are in turn keys of Dataframe 1, and they identify the rows whose second column (Freq) I want to update in df1.
Figure: Algorithm in action
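The exact update rule is given only in a figure that is not reproduced here, so the following is just a sketch under an assumed rule: a cell in df1 keeps Freq = True only if every df2 row whose Cell-2 equals that cell has Freq = True. That rule, and the names rows_for_key_1 and combined, are assumptions for illustration and do not come from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'Cell': [1, 2, 3, 4, '10-05', '10-09'],
                    'Freq': [True] * 6})
df2 = pd.DataFrame({
    'Cell-1': [1] * 6 + [2] * 6,
    'Cell-2': [1, 2, 3, 4, '10-05', '10-09'] * 2,
    'Freq': [True, False, True, False, True, True,
             True, False, True, False, True, False],
})

# All df2 rows that belong to key 1 (the "multiple values" for that key).
rows_for_key_1 = df2[df2['Cell-1'] == 1]

# ASSUMED rule: combine the df2 values per Cell-2 with a logical AND.
# sort=False avoids ordering the mixed int/str keys.
combined = df2.groupby('Cell-2', sort=False)['Freq'].all()

# Look up each df1 key in the combined result and overwrite Freq.
df1['Freq'] = df1['Cell'].map(combined)
```

If the intended rule is different (for example "any" instead of "all", or only the rows of one Cell-1 key), only the aggregation step would need to change.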

Related

How to add new column to existing dataframe (no headers) using iloc?

I want to add a third column to an existing DataFrame, with the same values as the 2nd column, using the pandas iloc method.
What options do I have?
df = pd.DataFrame([*zip([1,2,3],[4,5,6])])
Here is one way to do it:
df.insert(len(df.columns),      # position of the new column: right after the last column
          len(df.columns),      # name of the new column: here, simply its position
          value=df.iloc[:, 1])  # values taken from the 2nd (currently last) column
df
   0  1  2
0  1  4  4
1  2  5  5
2  3  6  6
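As a complement to the insert call above, here is a minimal sketch of plain column assignment. It relies on the fact that iloc can read by position but cannot enlarge a DataFrame with a new column, so iloc is used only on the right-hand side. The name new_label is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame([*zip([1, 2, 3], [4, 5, 6])])

# iloc selects by position but cannot create a new column,
# so read with iloc and assign under a new label instead.
new_label = len(df.columns)      # next unused integer label
df[new_label] = df.iloc[:, 1]    # copy of the 2nd column
```

Assignment always appends the column at the end, while insert also lets you choose a position.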

Stick the columns into one column, keeping the ids

I have a DataFrame with 100 columns (however I provide only three columns here) and I want to build a new DataFrame with two columns. Here is the DataFrame:
import pandas as pd
df = pd.DataFrame()
df['id'] = [1, 2, 3]
df['c1'] = [1, 5, 1]
df['c2'] = [-1, 6, 5]
df
I want to stick the values of all columns for each id together and put them in one column. For example, for id=1 I want to stick 1 and -1 in one column. Here is the DataFrame that I want.
Note: df.melt does not solve my problem, since I also want to keep the ids.
Note 2: I have already tried stack and reset_index, and it did not help:
df = df.stack().reset_index()
df.columns = ['id','c']
df
You could first set_index with "id"; then stack + reset_index:
out = (df.set_index('id').stack()
         .droplevel(1).reset_index(name='c'))
Output:
   id  c
0   1  1
1   1 -1
2   2  5
3   2  6
4   3  1
5   3  5
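For comparison, melt can also keep the ids when the id column is passed as id_vars. This is a sketch of an alternative, not the method used in the answer above. The stable sort is there only to reproduce the same row order as the stack-based output.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'c1': [1, 5, 1], 'c2': [-1, 6, 5]})

out = (df.melt(id_vars='id', value_name='c')  # one row per (id, original column)
         .drop(columns='variable')            # drop the original column names
         .sort_values('id', kind='stable')    # group rows by id, keeping c1 before c2
         .reset_index(drop=True))
```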

Create Pivot table for each column in Pandas df

I have a DataFrame with many columns (one dependent variable and many independent variables):
variable_id  dep_var  variable_1  variable_2
new          1        6           3
new          0        3           6
new          0        8           7
new          1        11          1
new          0        17          9
new          1        1           2
I want to create a Pivot table such as this:
pd.pivot_table(df,index=['variable_1'], columns=['dep_var'], values=['variable_id'],aggfunc='count')
I want to create it for each column separately (so I need to change index in pd.pivot_table).
I have written some sample code:
def pivot_table(df):
    df_columns = list(df)
    for column in df_columns:
        print("indexing by: ", column)
        print(pd.pivot_table(df, index=[column], columns=['dep_var'], values=['variable_id'], aggfunc='count'))
but I want each result to be saved as a pandas DataFrame.
Desired output (figure): how I want my output for each variable separately
Use:
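def pivot_table(df):
    dfs = []
    for column in df:
        if column == 'dep_var':
            continue  # skip the dependent variable itself
        print("indexing by: ", column)
        # use a new name so the input df is not overwritten inside the loop
        table = pd.pivot_table(df, index=[column], values=['dep_var'])
        dfs.append(table)
    return dfs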
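A variant that keeps the count-based pivot from the question and stores each result under its column name could look like the sketch below. The function name pivot_tables, its dep and counted parameters, and the choice of a dict as the return type are assumptions for illustration. The data is the sample table from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'variable_id': ['new'] * 6,
    'dep_var': [1, 0, 0, 1, 0, 1],
    'variable_1': [6, 3, 8, 11, 17, 1],
    'variable_2': [3, 6, 7, 1, 9, 2],
})

def pivot_tables(df, dep='dep_var', counted='variable_id'):
    """Return {column name: pivot table} for every independent variable."""
    tables = {}
    for column in df.columns:
        if column in (dep, counted):
            continue  # these two are used as columns/values, not as an index
        tables[column] = pd.pivot_table(
            df, index=[column], columns=[dep],
            values=[counted], aggfunc='count')
    return tables

tables = pivot_tables(df)
```

Each entry, for example tables['variable_1'], is an ordinary DataFrame that can be inspected or saved on its own.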

How to get the value of the next column using python?

I have a dataset where I need to match a value in column A and fetch the corresponding value in the next column, B.
For example, I have to check whether 1 is matched in column A and, if so, print "First Page".
Similarly, every value in column A has to be compared with some value X and, if it matches, the corresponding value in column B should be printed.
Example:
With df.iloc you can get the row or column you want by position.
With a boolean mask you can filter the data frame down to the rows you want (where column A == some value) and then take the value in the second column of the first match with .iloc[0, 1].
import pandas as pd
d = {'col1': [1, 2,3,4], 'col2': [4,3,2,1]}
df = pd.DataFrame(data=d)
df
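   col1  col2
0     1     4
1     2     3
2     3     2
3     4     1
# a is the value to look for in the first column and df is the data frame
def a2b(a, df):
    # keep rows whose first column equals a, then take the second column of the first match
    return df[df.iloc[:, 0] == a].iloc[0, 1]
a2b(2, df)
returns 3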
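Note that a2b raises an IndexError when no row matches. A label-based sketch that returns None in that case is shown below. The function name lookup and its key_col and value_col parameters are hypothetical, introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [4, 3, 2, 1]})

def lookup(a, df, key_col='col1', value_col='col2'):
    """Return the value_col entry of the first row where key_col == a, else None."""
    matches = df.loc[df[key_col] == a, value_col]
    return None if matches.empty else matches.iloc[0]

found = lookup(2, df)     # first match for key 2
missing = lookup(99, df)  # no such key in col1
```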

Pandas: drop values in columns but keep columns

As the title says, I would like to erase the values in a data frame from a given column through the last column, while keeping the columns themselves, but I can't find a way to do so.
I would like to start with
A B C
-----------
1 1 1
1 1 1
1 1 1
and get
A B C
-----------
1
1
1
I was trying with
df.drop(df.loc[:, 'B':].columns, axis = 1, inplace = True)
But this deletes the columns themselves too:
A
-
1
1
1
Am I missing something?
If you only know the name of the column that you want to keep:
import pandas as pd
new_df = pd.DataFrame(df["A"])
If you only know the names of the columns that you want to drop:
new_df = df.drop(["B", "C"], axis=1)
For your case, to keep the columns but remove their content, one possible way is:
new_df = pd.DataFrame(df["A"], columns=df.columns)
The resulting DataFrame keeps all three columns: "A" retains its values, while "B" and "C" contain no values (NaN instead).
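Another way to get the same shape, sketched here as an alternative to the constructor call above, is to overwrite the unwanted columns with NaN via assign. The column selection df.loc[:, 'B':].columns is taken from the question. The name cols_to_clear is made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1], 'B': [1, 1, 1], 'C': [1, 1, 1]})

# Every column from 'B' to the end, as selected in the question.
cols_to_clear = df.loc[:, 'B':].columns

# assign returns a new DataFrame; the listed columns are replaced by NaN.
new_df = df.assign(**{col: np.nan for col in cols_to_clear})
```

Using an empty string instead of np.nan would make the cleared cells print as blanks, at the cost of turning those columns into object dtype.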
