create a new column with a value from rows


I have a data table (below), and I want to put "a" into a new column, like this:

Delete the first row:
df = df.drop(0, axis=0)
Add a column:
df.insert(0,"",["a","","",""])
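A note on the insert call above: DataFrame.insert needs exactly one value per row, so after a row is dropped the list has to match the new row count. The sketch below assumes a hypothetical 4-row frame (the asker's real table is not shown) and derives the list length from len(df) instead of hard-coding four values:

```python
import pandas as pd

# Hypothetical stand-in for the asker's table (the original is not shown).
df = pd.DataFrame({"x": [10, 20, 30, 40], "y": [1, 2, 3, 4]})

# Delete the first row (index label 0); 3 rows remain.
df = df.drop(0, axis=0)

# "a" in the first row, empty strings below; the length always matches len(df).
values = ["a"] + [""] * (len(df) - 1)
df.insert(0, "", values)

print(df)
```

With a hard-coded four-element list, this 3-row frame would raise a length-mismatch ValueError.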

First, it helps to know the basics of pandas. One approach is to drop the row, reset the index, and concatenate a new column:

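import pandas as pd

df1 = pd.DataFrame({'a':[1,2,3,4],'b':[1,2,3,4],'c':[1,2,3,4]})
print(df1)
# Drop the first row and renumber the index from 0
df2 = df1.drop([0])
df2 = df2.reset_index(drop=True)
print(df2)
# New unnamed column: a value in the first row, blanks below
df3 = pd.DataFrame({'': [1,' ', ' ']})
# Put it on the left; concat along axis=1 aligns rows on the index
df4 = pd.concat([df3,df2],axis = 1)
print(df4)
   a  b  c
0  1  1  1
1  2  2  2
2  3  3  3
3  4  4  4
   a  b  c
0  2  2  2
1  3  3  3
2  4  4  4
      a  b  c
0  1  2  2  2
1     3  3  3
2     4  4  4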

Related

Concatenate dataframe to a master dataframe only if the values do not exist in the master dataframe

Let's say I have 2 dataframes:
Df1 =
Batch Result
0 A 3
1 A 5
2 B 5
3 B 6
4 C 8
5 C 3
Df2=
Batch Result
0 C 8
1 C 3
2 D 6
3 D 1
I want to concatenate Df2 to Df1, but only batch D from Df2 should be added (batch C is already in Df1). The output should look like this:
Df1 =
Batch Result
0 A 3
1 A 5
2 B 5
3 B 6
4 C 8
5 C 3
2 D 6
3 D 1
How can I do this with Pandas?

You can remove the batches that already appear in Df1 from Df2 before concatenating:
pd.concat([Df1, Df2[ ~Df2['Batch'].isin(Df1['Batch'])] ])
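A self-contained version of the answer above, rebuilding Df1 and Df2 from the tables in the question, shows that the isin filter keeps only batch D and that concat preserves the original index labels (2, 3) exactly as in the expected output:

```python
import pandas as pd

Df1 = pd.DataFrame({"Batch": list("AABBCC"), "Result": [3, 5, 5, 6, 8, 3]})
Df2 = pd.DataFrame({"Batch": list("CCDD"), "Result": [8, 3, 6, 1]})

# Keep only the Df2 rows whose batch does not already appear in Df1.
new_rows = Df2[~Df2["Batch"].isin(Df1["Batch"])]

# Append them; index labels are kept as-is (pass ignore_index=True for 0..n-1).
result = pd.concat([Df1, new_rows])
print(result)
```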

Python: how to merge two dataframes based only on different columns?

I have two dataframes df1 and df2
df1
A B
0 4 2
1 3 3
2 1 2
df2
B AB C
0 4 8 3
1 3 9 2
2 1 2 4
I would like to join only the columns of df2 that are not already in df1:
df3
A B AB C
0 4 2 8 3
1 3 3 9 2
2 1 2 2 4

Use Index.isin with an inverted mask, or Index.difference:
df22 = df2.loc[:, ~df2.columns.isin(df1.columns)]
df = df1.join(df22)
Or (Index.difference returns the labels sorted, which here happens to match the original order):
df22 = df2[df2.columns.difference(df1.columns)]
df = df1.join(df22)
print (df)
A B AB C
0 4 2 8 3
1 3 3 9 2
2 1 2 2 4

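You can also use merge as an alternative. Note that it matches rows by value (df1['A'] against df2['B']), so it only gives the same result here because those two columns hold the same values in the same order:
df3 = pd.merge(df1, df2, left_on='A', right_on='B', how='left', suffixes=('', '_')).drop('B_', axis=1)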
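For comparison, a runnable sketch of the isin-mask answer with the question's data. The mask keeps df2's column order, and join aligns on the index (0..2 in both frames) rather than on any column values:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [4, 3, 1], "B": [2, 3, 2]})
df2 = pd.DataFrame({"B": [4, 3, 1], "AB": [8, 9, 2], "C": [3, 2, 4]})

# Boolean mask over df2's column labels: True for columns df1 does not have.
new_cols = df2.loc[:, ~df2.columns.isin(df1.columns)]

# join aligns rows by index label, so no key column is needed.
df3 = df1.join(new_cols)
print(df3)
```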

return rows with unique pairs across columns

I'm trying to find rows that have unique pairs of values across 2 columns, so this dataframe:
A B
1 0
2 0
3 0
0 1
2 1
3 1
0 2
1 2
3 2
0 3
1 3
2 3
should be reduced to the rows whose pair does not already appear with the columns flipped. For instance, 1 and 3 is a combination I only want returned once: if the same pair already exists with the columns flipped (3 and 1), that row can be removed. The table I'm looking to get is:
A B
0 2
0 3
1 0
1 2
1 3
2 3
That is, each pair of values should appear only once, regardless of which column each value is in.

I think you can use apply sorted + drop_duplicates:
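# Wrap the sorted values in a Series so they expand back into columns A and B
# (newer pandas returns a bare list as a single column of lists)
df = df.apply(lambda row: pd.Series(sorted(row), index=row.index), axis=1).drop_duplicates()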
print (df)
A B
0 0 1
1 0 2
2 0 3
4 1 2
5 1 3
8 2 3
Faster solution with numpy.sort:
df = (pd.DataFrame(np.sort(df.values, axis=1), index=df.index, columns=df.columns)
        .drop_duplicates())
print (df)
A B
0 0 1
1 0 2
2 0 3
4 1 2
5 1 3
8 2 3
Solution without sorting with DataFrame.min and DataFrame.max:
a = df.min(axis=1)
b = df.max(axis=1)
df['A'] = a
df['B'] = b
df = df.drop_duplicates()
print (df)
A B
0 0 1
1 0 2
2 0 3
4 1 2
5 1 3
8 2 3
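The solutions above overwrite A and B with the sorted pair. If the kept rows should stay in their original orientation, one variant (an assumption about what might be wanted, not part of the answers) is to use the sorted pair only as a key for duplicated() and filter with the resulting mask:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2],
    "B": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Sort each row's pair so (3, 1) and (1, 3) produce the same key.
key = pd.DataFrame(np.sort(df.to_numpy(), axis=1), index=df.index)

# duplicated() flags every repeat of a key after its first occurrence;
# inverting it keeps the first row of each unordered pair, values unchanged.
result = df[~key.duplicated()]
print(result)
```

The same rows (index 0, 1, 2, 4, 5, 8) are kept as in the outputs above; only the A/B values are left as they were.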

Loading the data:
import numpy as np
import pandas as pd
a = np.array("1 2 3 0 2 3 0 1 3 0 1 2".split(), dtype=np.double)
b = np.array("0 0 0 1 1 1 2 2 2 3 3 3".split(), dtype=np.double)
df = pd.DataFrame(dict(A=a,B=b))
In case you don't need to sort the entire DF:
df["trans"] = df.apply(
lambda row: (min(row['A'], row['B']), max(row['A'], row['B'])), axis=1
)
df.drop_duplicates("trans")

Combine two distinct dataframes to show all possible iterations

I'm looking to combine dataframes df1 and df2 to get df3 in Python, preferably in a one-liner (that is, no "for all x in df1.LETS...").
I'm at a loss for the right words to use with my Google-fu, so here I am at StackExchange, hoping another programmer can help fill in my mental blank.
Thank you!
df1     df2     df3
LETS    NUMS    LETS  NUMS
A       1       A     1
B       2       A     2
        3       A     3
        4       A     4
                B     1
                B     2
                B     3
                B     4

You can use:
df1 = pd.DataFrame({'LETS':list('AB')})
df2 = pd.DataFrame({'NUMS':range(1,5)})
A cross join with merge: assign a helper column A holding a constant to both frames, merge on it, then drop the helper column:
df = pd.merge(df1.assign(A=1), df2.assign(A=1), on='A').drop('A', axis=1)
print (df)
LETS NUMS
0 A 1
1 A 2
2 A 3
3 A 4
4 B 1
5 B 2
6 B 3
7 B 4
Another solution uses MultiIndex.from_product together with MultiIndex.to_frame, a function new in pandas 0.20.1:
df = pd.MultiIndex.from_product([df1['LETS'], df2['NUMS']]).to_frame()
df.columns = ['LETS','NUMS']
print (df)
    LETS  NUMS
A 1    A     1
  2    A     2
  3    A     3
  4    A     4
B 1    B     1
  2    B     2
  3    B     3
  4    B     4
print (df.reset_index(drop=True))
LETS NUMS
0 A 1
1 A 2
2 A 3
3 A 4
4 B 1
5 B 2
6 B 3
7 B 4

pd.DataFrame(index=pd.MultiIndex.from_product([df1.LETS, df2.NUMS],
names=("LETS", "NUMS"))).reset_index()
# LETS NUMS
#0 A 1
#1 A 2
#2 A 3
#3 A 4
#4 B 1
#5 B 2
#6 B 3
#7 B 4
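In pandas 1.2 and later, merge also accepts how="cross", which builds the same Cartesian product without the helper column. A minimal sketch, assuming pandas >= 1.2 is available:

```python
import pandas as pd

df1 = pd.DataFrame({"LETS": list("AB")})
df2 = pd.DataFrame({"NUMS": range(1, 5)})

# how="cross" pairs every row of df1 with every row of df2 (pandas >= 1.2),
# keeping df1's row order on the outside.
df3 = df1.merge(df2, how="cross")
print(df3)
```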

Rearrange dataframe structure

I get a dataframe
df
A B
0 1 4
1 2 5
2 3 6
For further processing, it would be more convenient to have the df restructured as follows:
letters numbers
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6
How can I achieve that?

Use unstack with reset_index:
df = df.unstack().reset_index(level=1, drop=True).reset_index()
df.columns = ['letters','numbers']
print (df)
letters numbers
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6
Or numpy.concatenate + numpy.repeat + DataFrame (transpose first so the values are read column by column, matching the repeated column labels):
a = np.concatenate(df.values.T)
b = np.repeat(df.columns,len(df.index))
df = pd.DataFrame({'letters':b, 'numbers':a})
print (df)
letters numbers
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6

Probably the simplest option is melt:
In [36]: pd.melt(df, var_name="letters", value_name="numbers")
Out[36]:
letters numbers
0 A 1
1 A 2
2 A 3
3 B 4
4 B 5
5 B 6
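Why the order works out: melt reads the frame column by column. The sketch below (using the 3x2 frame from the question) checks that melt matches a manual column-major flattening with NumPy (ravel with order="F"), which is also why the concatenate-based answer must walk the values column-wise to pair them with the repeated labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

melted = pd.melt(df, var_name="letters", value_name="numbers")

# Manual equivalent: each column label repeated once per row, and the
# values flattened column by column (Fortran order) to line up with them.
manual = pd.DataFrame({
    "letters": np.repeat(df.columns.to_numpy(), len(df.index)),
    "numbers": df.to_numpy().ravel(order="F"),
})

print(melted)
print(manual)
```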
