How to take column of dataframe in pandas [duplicate] - python

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 5 years ago.
I have this DataFrame:
A B C
0 m 1 b
1 n 4 a
2 p 3 c
3 o 4 d
4 k 6 e
How can I get only the rows where column A is n, p, or k, as follows:
A B C
0 n 4 a
1 p 3 c
2 k 6 e
thanks

Use .loc
df = df.loc[df.A.isin(['n','p','k']),:]
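A self-contained sketch of the same idea, rebuilding the sample frame from the question. The `reset_index(drop=True)` call is an addition here, assumed to be wanted because the expected output is numbered 0 to 2:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["m", "n", "p", "o", "k"],
    "B": [1, 4, 3, 4, 6],
    "C": ["b", "a", "c", "d", "e"],
})

# Keep rows whose value in column A is one of the wanted letters.
mask = df["A"].isin(["n", "p", "k"])

# reset_index(drop=True) renumbers the surviving rows from 0.
selected = df.loc[mask].reset_index(drop=True)
print(selected)
```

Rows keep the order they have in df, not the order of the list passed to isin.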

Related

Pandas: How to create a new column by randomly selecting from other columns? [duplicate]

This question already has answers here:
Create new column by random sampling of other columns data
(4 answers)
Closed 5 months ago.
Example df:
A B C
0 X 9 0
1 5 7 5
2 5 6 Y
Expected output is below. Each value in column D is randomly selected from columns A, B, and C of the same row.
A B C D
0 X 9 0 X(random from column A)
1 5 7 5 7(random from column B)
2 5 6 Y Y(random from column C)
You can use numpy advanced indexing with numpy.random.randint:
df['D'] = df.to_numpy()[np.arange(df.shape[0]),
                        np.random.randint(0, df.shape[1], df.shape[0])]
Example:
A B C D
0 X 9 0 X # A
1 5 7 5 7 # B
2 5 6 Y 5 # A
If you want exactly one value from each column, use a random permutation with numpy.random.choice (with replace=False this only works when there are no more rows than columns):
df['D'] = df.to_numpy()[np.arange(df.shape[0]),
                        np.random.choice(np.arange(df.shape[1]),
                                         df.shape[0], replace=False)]
Example:
A B C D
0 X 9 0 9 # B
1 5 7 5 5 # A
2 5 6 Y Y # C
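Both variants can be tried together in a runnable sketch. The seeded `numpy.random.default_rng` generator and the column name `E` are assumptions made for this example, not part of the answer; the results depend on the seed, so no output is shown:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (mixed strings and integers).
df = pd.DataFrame({"A": ["X", 5, 5], "B": [9, 7, 6], "C": [0, 5, "Y"]})

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility

values = df.to_numpy()      # snapshot taken before any new column is added
rows = np.arange(len(df))   # one row position per output value

# Variant 1: an independent random column for every row (columns may repeat).
cols = rng.integers(0, values.shape[1], size=len(df))
df["D"] = values[rows, cols]

# Variant 2: each column used at most once; needs rows <= columns.
perm = rng.permutation(values.shape[1])[: len(df)]
df["E"] = values[rows, perm]

print(df)
```

Taking the snapshot first keeps the second draw from picking values out of the freshly added D column.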

How to concatenate two dataframes when there are different number of columns? [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Pandas: append dataframe to another df
(2 answers)
Closed 1 year ago.
I have two dataframes:
df1 looks like below:
A B C D E F G H
2 6 8 1 3 9 2 3
df2 looks like below:
A C D G
3 2 4 9
Now I want to append the data of df2 to df1, inserting "N/A" for every column that exists in df1 but not in df2. I want something like this:
A B C D E F G H
2 6 8 1 3 9 2 3
3 N/A 2 4 N/A N/A 9 N/A
Can anyone please help, how can I do this?
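No answer was captured for this question. One possible approach, offered as a sketch rather than as the thread's accepted solution, is `pd.concat`, which aligns frames on column names and leaves NaN where a column is missing. Replacing NaN with the literal string "N/A" is optional and turns the affected columns into object dtype:

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame([[2, 6, 8, 1, 3, 9, 2, 3]], columns=list("ABCDEFGH"))
df2 = pd.DataFrame([[3, 2, 4, 9]], columns=list("ACDG"))

# Rows are stacked; columns missing from df2 become NaN.
combined = pd.concat([df1, df2], ignore_index=True)

# Optional: show the gaps as the string "N/A" instead of NaN.
labeled = combined.fillna("N/A")
print(labeled)
```

Keeping NaN (the `combined` frame) is usually more convenient for later numeric work than the "N/A" strings.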

What is the most efficient way to populate one pandas dataframe using another dataframe? [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I am wondering how I can do the following operation most efficiently, so that it also scales to DataFrames with a million or more rows.
I have two pandas DataFrames:
Data1:
Position Letter
1 a
2 b
3 c
4 b
5 a
Data2:
Weight Letter
1 a
2 b
3 c
Now I want to create an extra column (Weight) in Data1, resulting in the following:
Position Letter Weight
1 a 1
2 b 2
3 c 3
4 b 2
5 a 1
One way is to use merge (with Data1 as df1 and Data2 as df2):
df = df1.merge(df2, on=['Letter'])
print(df)
Position Letter Weight
0 1 a 1
1 5 a 1
2 2 b 2
3 4 b 2
4 3 c 3
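The merged output above is not in Position order. A sketch of two alternatives that keep the row order of df1, assuming every Letter appears exactly once in df2: a left merge, and a lookup through `Series.map`. The names `merged`, `weights`, and `mapped` are illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"Position": [1, 2, 3, 4, 5],
                    "Letter": ["a", "b", "c", "b", "a"]})
df2 = pd.DataFrame({"Weight": [1, 2, 3], "Letter": ["a", "b", "c"]})

# how="left" keeps every row of df1 in its original order.
merged = df1.merge(df2, on="Letter", how="left")

# Alternative: build a Letter -> Weight lookup and map it onto df1.
weights = df2.set_index("Letter")["Weight"]
mapped = df1.assign(Weight=df1["Letter"].map(weights))

print(merged)
```

Which of the two is faster on millions of rows depends on the data, so it is worth timing both on a realistic sample.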

How can I duplicate a value or string in pandas? [duplicate]

This question already has answers here:
How to repeat a Pandas DataFrame?
(7 answers)
Closed 2 years ago.
I have a pandas column containing multiple strings. I want each of these strings to be repeated 3 times.
df = pd.DataFrame(data = ['a','b','c'])
Current table:
0
0 a
1 b
2 c
I want to transform this table so it looks like this:
0
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 c
I can't seem to find an easy way to do this.
Anything will help.
Try:
df[0].repeat(3).reset_index(drop=True)
Out:
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 c
Name: 0, dtype: object
You can use repeat + reindex:
df = df.reindex(df.index.repeat(3))
Out:
0
0 a
0 a
0 a
1 b
1 b
1 b
2 c
2 c
2 c
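Or concat (this stacks whole copies as a, b, c, a, b, c, ..., so sort by index to group the repeats):
df = pd.concat([df]*3).sort_index()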
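The three suggestions can be compared side by side in a short sketch. The variable names and the final `reset_index(drop=True)` calls are additions for this example, used so that all three results are numbered 0 to 8 as in the desired output:

```python
import pandas as pd

df = pd.DataFrame(data=["a", "b", "c"])

# 1. Repeat the values of column 0; the result is a Series.
by_repeat = df[0].repeat(3).reset_index(drop=True)

# 2. Repeat the index labels and reindex; the result is a DataFrame.
by_reindex = df.reindex(df.index.repeat(3)).reset_index(drop=True)

# 3. Stack three copies, then sort by the old index to group equal rows.
by_concat = (
    pd.concat([df] * 3)
    .sort_index(kind="stable")
    .reset_index(drop=True)
)

print(by_reindex)
```

The reindex and concat variants keep all columns of a wider frame, while the first one works on a single column.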

Separate strings and create a DataFrame column [duplicate]

This question already has answers here:
Split cell into multiple rows in pandas dataframe
(5 answers)
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Closed 3 years ago.
I am working on the problem below:
df_temp = pd.DataFrame()
df_temp.insert(0, 'Label', ["A|B|C","A|C","C|B","A","B"])
df_temp.insert(1, 'ID', [1,2,3,4,5])
df_temp
Label ID
0 A|B|C 1
1 A|C 2
2 C|B 3
3 A 4
4 B 5
I want to convert this DataFrame into something like the one below, with one row for each Label of each ID.
Expected output:
ID Label
1 A
1 B
1 C
2 A
2 C
3 C
3 B
4 A
5 B
Try this:
(df_temp.set_index('ID')['Label']
    .str.split('|', expand=True)
    .reset_index()
    .melt('ID')
    .drop('variable', axis=1)
    .dropna()
    .sort_values('ID'))
Output:
ID value
0 1 A
5 1 B
10 1 C
1 2 A
6 2 C
2 3 C
7 3 B
3 4 A
4 5 B
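An alternative sketch, not taken from the answer above, uses `DataFrame.explode` (the method named in the linked duplicate, available since pandas 0.25). It keeps the column name Label and the within-row label order:

```python
import pandas as pd

# Sample data copied from the question.
df_temp = pd.DataFrame({
    "Label": ["A|B|C", "A|C", "C|B", "A", "B"],
    "ID": [1, 2, 3, 4, 5],
})

result = (
    df_temp
    .assign(Label=df_temp["Label"].str.split("|"))  # each cell becomes a list
    .explode("Label")                               # one row per list item
    .loc[:, ["ID", "Label"]]                        # column order from the question
    .reset_index(drop=True)
)
print(result)
```

Since explode repeats the original row for every list element, no melt, dropna, or sorting step is needed.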
