create a new column with a value from rows - python

I have a data table as below
I want to put "a" in a new column, like below:

Delete the first row:
df = df.drop(0, axis=0)
Add a new first column with an empty header:
df.insert(0,"",["a","","",""])
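A minimal runnable sketch of the two steps above. The question's table was an image, so the five-row frame below (columns `x` and `y`) is a hypothetical stand-in. The marker list is built from `len(df)` rather than typed out, so it always matches the row count left after the drop.

```python
import pandas as pd

# Hypothetical stand-in for the question's table (the original was an image)
df = pd.DataFrame({"x": [10, 20, 30, 40, 50], "y": [1, 2, 3, 4, 5]})

# Delete the row with index label 0
df = df.drop(0, axis=0)

# Insert an unnamed column at position 0: "a" in the first row, empty strings below.
# The list length is derived from len(df) so it always matches the frame.
df.insert(0, "", ["a"] + [""] * (len(df) - 1))
print(df)
```

`df.insert` needs a list exactly as long as the frame, which is why a hard-coded four-element list only works when four rows remain after the drop.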
First, you need to learn how to use pandas.

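import pandas as pd

df1 = pd.DataFrame({'a':[1,2,3,4],'b':[1,2,3,4],'c':[1,2,3,4]})
print(df1)
# drop the first row and renumber the index from 0
df2 = df1.drop([0])
df2 = df2.reset_index(drop=True)
print(df2)
# one-column frame with an empty header; the value goes in the first row only
df3 = pd.DataFrame({'': [1,' ', ' ']})
# concat along columns aligns on the index and puts the new column first
df4 = pd.concat([df3,df2],axis = 1)
print(df4)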
   a  b  c
0  1  1  1
1  2  2  2
2  3  3  3
3  4  4  4
   a  b  c
0  2  2  2
1  3  3  3
2  4  4  4
      a  b  c
0  1  2  2  2
1     3  3  3
2     4  4  4

Related

Concatenate dataframe to a master dataframe only if the values do not exist in the master dataframe

Let's say I have 2 dataframes:
Df1 =
Batch Result
0 A 3
1 A 5
2 B 5
3 B 6
4 C 8
5 C 3
Df2=
Batch Result
0 C 8
1 C 3
2 D 6
3 D 1
I want to concat Df2 to Df1, but I only want batch D from Df2 to end up in Df1. The output should look like this:
Df1 =
Batch Result
0 A 3
1 A 5
2 B 5
3 B 6
4 C 8
5 C 3
2 D 6
3 D 1
How can I do this with Pandas?
You can remove the batches that already exist in Df1 from Df2 before concatenating:
pd.concat([Df1, Df2[ ~Df2['Batch'].isin(Df1['Batch'])] ])
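A self-contained sketch of the one-liner above, rebuilt from the question's Df1 and Df2. The filter works on the Batch label only: a Df2 row is dropped whenever its batch already appears in Df1, regardless of its Result value.

```python
import pandas as pd

Df1 = pd.DataFrame({"Batch": list("AABBCC"), "Result": [3, 5, 5, 6, 8, 3]})
Df2 = pd.DataFrame({"Batch": list("CCDD"), "Result": [8, 3, 6, 1]})

# Boolean mask: True for Df2 rows whose Batch does NOT already occur in Df1
new_batches = ~Df2["Batch"].isin(Df1["Batch"])

# Append only those rows; the original index labels (2, 3) are kept, as in the question
Df1 = pd.concat([Df1, Df2[new_batches]])
print(Df1)
```

Passing `ignore_index=True` to `pd.concat` would renumber the result 0..7 instead of keeping the duplicated labels 2 and 3.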

Python: how to merge two dataframes based only on different columns?

I have two dataframes df1 and df2
df1
A B
0 4 2
1 3 3
2 1 2
df2
B AB C
0 4 8 3
1 3 9 2
2 1 2 4
I would like to join only the columns of df2 that df1 does not already have:
df3
A B AB C
0 4 2 8 3
1 3 3 9 2
2 1 2 2 4
Use Index.isin with an inverted mask, or Index.difference:
df22 = df2.loc[:, ~df2.columns.isin(df1.columns)]
df = df1.join(df22)
Or:
df22 = df2[df2.columns.difference(df1.columns)]
df = df1.join(df22)
print (df)
A B AB C
0 4 2 8 3
1 3 3 9 2
2 1 2 2 4
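A runnable check that both variants give the same frame on the question's data. One difference to keep in mind: `Index.difference` sorts the resulting labels by default, so with other column names the added columns may come out in a different order than in df2, while the `isin` mask keeps df2's order.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [4, 3, 1], "B": [2, 3, 2]})
df2 = pd.DataFrame({"B": [4, 3, 1], "AB": [8, 9, 2], "C": [3, 2, 4]})

# Variant 1: boolean mask over df2's columns, keeping those not in df1
via_mask = df1.join(df2.loc[:, ~df2.columns.isin(df1.columns)])

# Variant 2: set difference of the column labels (sorted by default)
via_difference = df1.join(df2[df2.columns.difference(df1.columns)])

print(via_mask)
```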
You can also use merge as an alternative:
df3 = pd.merge(df1, df2, left_on='A', right_on='B', how='left', suffixes=('', '_')).drop('B_', axis=1)
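The merge call above matches rows by value (df1.A against df2.B), so it reproduces df3 only because those two columns happen to hold the same values here. If the goal is to line rows up by index, the way `join` does, a sketch with `merge` would pass `left_index`/`right_index` instead and merge only the extra columns:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [4, 3, 1], "B": [2, 3, 2]})
df2 = pd.DataFrame({"B": [4, 3, 1], "AB": [8, 9, 2], "C": [3, 2, 4]})

# Keep only df2's columns that df1 does not have
extra = df2[df2.columns.difference(df1.columns)]

# Align on the index (like DataFrame.join) instead of on column values
df3 = pd.merge(df1, extra, left_index=True, right_index=True, how="left")
print(df3)
```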

return rows with unique pairs across columns

I'm trying to find rows that have unique pairs of values across 2 columns, so this dataframe:
A B
1 0
2 0
3 0
0 1
2 1
3 1
0 2
1 2
3 2
0 3
1 3
2 3
should be reduced so that each pair appears only once, regardless of column order. For instance, 1 and 3 is a combination I only want returned once: if the same pair also exists with the columns flipped (3 and 1), that row can be removed. The table I'm looking to get is:
A B
0 2
0 3
1 0
1 2
1 3
2 3
Where there is only one occurrence of each pair of values that are mirrored if the columns are flipped.
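I think you can use apply with sorted + drop_duplicates. In recent pandas versions, pass result_type='broadcast' so the sorted lists are written back into columns A and B (otherwise apply returns a Series of lists, which drop_duplicates cannot hash):
df = df.apply(sorted, axis=1, result_type='broadcast').drop_duplicates()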
print (df)
A B
0 0 1
1 0 2
2 0 3
4 1 2
5 1 3
8 2 3
Faster solution with numpy.sort:
df = pd.DataFrame(np.sort(df.values, axis=1), index=df.index, columns=df.columns).drop_duplicates()
print (df)
A B
0 0 1
1 0 2
2 0 3
4 1 2
5 1 3
8 2 3
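A self-contained version of the `numpy.sort` approach, using the question's twelve rows typed in directly (as integers, which is an assumption; the loading snippet below reads them as floats). Sorting each row turns (3, 1) and (1, 3) into the same (1, 3), and `drop_duplicates` then keeps the first occurrence.

```python
import numpy as np
import pandas as pd

# The question's data: every ordered pair (A, B) with A != B from 0..3
df = pd.DataFrame({
    "A": [1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2],
    "B": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Sort within each row, rebuild the frame with the same labels, keep first of each pair
pairs = pd.DataFrame(np.sort(df.to_numpy(), axis=1),
                     index=df.index, columns=df.columns).drop_duplicates()
print(pairs)
```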
Solution without sorting with DataFrame.min and DataFrame.max:
a = df.min(axis=1)
b = df.max(axis=1)
df['A'] = a
df['B'] = b
df = df.drop_duplicates()
print (df)
A B
0 0 1
1 0 2
2 0 3
4 1 2
5 1 3
8 2 3
Loading the data:
import numpy as np
import pandas as pd
a = np.array("1 2 3 0 2 3 0 1 3 0 1 2".split(),dtype=np.double)
b = np.array("0 0 0 1 1 1 2 2 2 3 3 3".split(),dtype=np.double)
df = pd.DataFrame(dict(A=a,B=b))
If you don't want to sort the entire DataFrame, build an order-insensitive key per row and drop duplicates on that key:
df["trans"] = df.apply(
lambda row: (min(row['A'], row['B']), max(row['A'], row['B'])), axis=1
)
df.drop_duplicates("trans")
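A variant of the same key idea that keeps each surviving row in its original orientation and adds no helper column to `df`. It swaps the row-wise `apply` for the vectorized `np.minimum`/`np.maximum`, which is a substitution on my part rather than part of the answer above; the data is the question's, typed in as integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2],
    "B": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Order-insensitive key per row, computed column-wise instead of with apply
lo = np.minimum(df["A"], df["B"])
hi = np.maximum(df["A"], df["B"])

# Flag rows whose (lo, hi) key was already seen; df's own A/B values are untouched
dupes = pd.DataFrame({"lo": lo, "hi": hi}).duplicated()
unique_rows = df[~dupes]
print(unique_rows)
```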

Combine two distinct dataframes to show all possible iterations

I'm looking to combine dataframes df1 and df2 to get df3 in Python, most preferably in a one-liner (that is, no "for all x in df1.LETS...").
I'm at a loss for which search terms to feed my Google-fu, so here I am on StackExchange, hoping another programmer can help fill in my mental blank.
Thank you!
df1     df2     df3
LETS    NUMS    LETS  NUMS
A       1       A     1
B       2       A     2
        3       A     3
        4       A     4
                B     1
                B     2
                B     3
                B     4
You can use:
df1 = pd.DataFrame({'LETS':list('AB')})
df2 = pd.DataFrame({'NUMS':range(1,5)})
Cross join with merge: add a helper column A with a constant value to both frames, merge on it, then drop it:
df = pd.merge(df1.assign(A=1), df2.assign(A=1), on='A').drop('A', axis=1)
print (df)
LETS NUMS
0 A 1
1 A 2
2 A 3
3 A 4
4 B 1
5 B 2
6 B 3
7 B 4
Another solution uses MultiIndex.from_product together with MultiIndex.to_frame, a function new in pandas 0.20.1:
df = pd.MultiIndex.from_product([df1['LETS'], df2['NUMS']]).to_frame()
df.columns = ['LETS','NUMS']
print (df)
    LETS  NUMS
A 1    A     1
  2    A     2
  3    A     3
  4    A     4
B 1    B     1
  2    B     2
  3    B     3
  4    B     4
print (df.reset_index(drop=True))
LETS NUMS
0 A 1
1 A 2
2 A 3
3 A 4
4 B 1
5 B 2
6 B 3
7 B 4
Or build an empty DataFrame on the MultiIndex and move its levels into columns with reset_index:
pd.DataFrame(index=pd.MultiIndex.from_product([df1.LETS, df2.NUMS],
names=("LETS", "NUMS"))).reset_index()
# LETS NUMS
#0 A 1
#1 A 2
#2 A 3
#3 A 4
#4 B 1
#5 B 2
#6 B 3
#7 B 4
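On newer pandas (1.2 and later), `merge` accepts `how="cross"`, which performs the same cross join without the helper column. This is a newer API than the answers above rely on, so it is a sketch that assumes an up-to-date pandas:

```python
import pandas as pd

df1 = pd.DataFrame({"LETS": list("AB")})
df2 = pd.DataFrame({"NUMS": range(1, 5)})

# Requires pandas >= 1.2: every row of df1 paired with every row of df2,
# left order first, so all A rows come before all B rows
df3 = df1.merge(df2, how="cross")
print(df3)
```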

Rearrange dataframe structure

I get a dataframe
df
A B
0 1 4
1 2 5
2 3 6
For further processing, it would be more convenient to have the df restructured as follows:
letters numbers
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6
How can I achieve that?
Use unstack with reset_index:
df = df.unstack().reset_index(level=1, drop=True).reset_index()
df.columns = ['letters','numbers']
print (df)
letters numbers
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6
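Or numpy.concatenate + numpy.repeat + DataFrame. Transpose first (.T) so the values are read column by column and line up with the repeated column labels; without it the values come out row by row (1, 4, 2, 5, 3, 6) and get paired with the wrong letters:
a = np.concatenate(df.values.T)
b = np.repeat(df.columns,len(df.index))
df = pd.DataFrame({'letters':b, 'numbers':a})
print (df)
letters numbers
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6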
Probably the simplest option is melt:
In [36]: pd.melt(df, var_name="letters", value_name="numbers")
Out[36]:
letters numbers
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6
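A self-contained comparison of `melt` with a NumPy version that reads the values in column-major order (`ravel(order="F")`), which is what makes the repeated labels line up. The frame is the question's df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# melt stacks the columns one after another: all of A, then all of B
long_melt = pd.melt(df, var_name="letters", value_name="numbers")

# Same result with NumPy: read the values column by column (order="F")
# and repeat each column label once per row
long_np = pd.DataFrame({
    "letters": np.repeat(df.columns, len(df.index)),
    "numbers": df.to_numpy().ravel(order="F"),
})
print(long_melt)
```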
