How can I duplicate a value or string in pandas? [duplicate] - python

This question already has answers here:
How to repeat a Pandas DataFrame?
(7 answers)
Closed 2 years ago.
I have a pandas column containing multiple strings. I want each of these strings to be repeated 3 times.
df = pd.DataFrame(data=['a', 'b', 'c'])
Current table:
0
0 a
1 b
2 c
I want to transform this table so it looks like this:
0
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 c
I can't seem to find an easy way to do this.
Anything will help.

Try:
df[0].repeat(3).reset_index(drop=True)
Out:
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 c
Name: 0, dtype: object

You can use repeat + reindex:
df = df.reindex(df.index.repeat(3))
Out[105]:
0
0 a
0 a
0 a
1 b
1 b
1 b
2 c
2 c
2 c
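Or concat, which repeats the whole frame (a, b, c, a, b, c), so sort by index afterwards to group the copies of each value:
df = pd.concat([df]*3).sort_index()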
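A minimal sketch, assuming the goal is a DataFrame (not a Series) with a fresh 0..8 index as in the desired output. The name `out` is only for illustration:

```python
import pandas as pd

df = pd.DataFrame(data=['a', 'b', 'c'])

# Repeat each index label 3 times, select those rows,
# then renumber the index from 0
out = df.loc[df.index.repeat(3)].reset_index(drop=True)
print(out)
```

Dropping the old index is what turns the repeated labels 0, 0, 0, 1, 1, 1, ... into a clean running index.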

Related

What is the most efficient way to populate one pandas dataframe using another dataframe? [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I am wondering how I can do the following operation most efficiently, so that it also scales to dataframes with a million rows or more.
I have two pandas dataframes:
Data1:
Position Letter
1 a
2 b
3 c
4 b
5 a
Data2:
Weight Letter
1 a
2 b
3 c
Now I want to create an extra column (Weight) in Data1, resulting in the following:
Position Letter Weight
1 a 1
2 b 2
3 c 3
4 b 2
5 a 1
The best way is to use merge:
df = df1.merge(df2, on=['Letter'])
print(df)
Position Letter Weight
0 1 a 1
1 5 a 1
2 2 b 2
3 4 b 2
4 3 c 3
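The output above comes back grouped by Letter rather than in the original Position order. A minimal sketch, assuming the original row order of Data1 should be kept, is a left merge. The frames are rebuilt from the question's tables and the name `merged` is illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'Position': [1, 2, 3, 4, 5],
                    'Letter': ['a', 'b', 'c', 'b', 'a']})
df2 = pd.DataFrame({'Weight': [1, 2, 3],
                    'Letter': ['a', 'b', 'c']})

# how='left' keeps every row of df1 in its original order
# and looks up the matching Weight from df2
merged = df1.merge(df2, on='Letter', how='left')
print(merged)
```

A left merge also keeps rows of df1 whose Letter has no match in df2 (their Weight becomes NaN), which an inner merge would silently drop.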

Separate a string and create a dataframe column [duplicate]

This question already has answers here:
Split cell into multiple rows in pandas dataframe
(5 answers)
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Closed 3 years ago.
I am working on the problem below:
df_temp = pd.DataFrame()
df_temp.insert(0, 'Label', ["A|B|C","A|C","C|B","A","B"])
df_temp.insert(1, 'ID', [1,2,3,4,5])
df_temp
Label ID
0 A|B|C 1
1 A|C 2
2 C|B 3
3 A 4
4 B 5
I want to convert this dataframe into something like the dataframe below, with one row per label for each ID.
Expected Output :
ID Label
1 A
1 B
1 C
2 A
2 C
3 C
3 B
4 A
5 B
Try this:
(df_temp.set_index('ID')['Label']
    .str.split('|', expand=True)
    .reset_index()
    .melt('ID')
    .drop('variable', axis=1)
    .dropna()
    .sort_values('ID'))
Output:
ID value
0 1 A
5 1 B
10 1 C
1 2 A
6 2 C
2 3 C
7 3 B
3 4 A
4 5 B
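An alternative sketch, assuming a pandas version that provides `DataFrame.explode` (0.25 or later): split each label string into a list, then explode the lists into rows. The name `out` is illustrative:

```python
import pandas as pd

df_temp = pd.DataFrame({'Label': ["A|B|C", "A|C", "C|B", "A", "B"],
                        'ID': [1, 2, 3, 4, 5]})

# Split "A|B|C" into ['A', 'B', 'C'], then emit one row per list element
out = (df_temp.assign(Label=df_temp['Label'].str.split('|'))
       .explode('Label')[['ID', 'Label']]
       .reset_index(drop=True))
print(out)
```

This keeps the column name Label and the within-row order of the labels, so no separate sort or rename step is needed.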

Is there a way to iterate over a column in Pandas to find matching index values from another dataframe?

I have two dataframes, df_diff and df_three. Each column of df_three contains the index values of the three largest values in the corresponding column of df_diff. For example, let's say df_diff looks like this:
A B C
0 4 7 8
1 5 5 7
2 8 2 1
3 10 3 4
4 1 12 3
Using
df_three = df_diff.apply(lambda s: pd.Series(s.nlargest(3).index))
df_three would look like this:
A B C
0 3 4 0
1 2 0 1
2 1 1 3
How could I match the index values in df_three to the column values of df_diff? In other words, how could I get df_three to look like this:
A B C
0 10 12 8
1 8 7 7
2 5 5 4
Am I making this problem too complicated? Would there be an easier way?
Any help is appreciated!
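def top_3(s, top_values):
    # Sort the column in descending order and keep the first top_values entries
    res = s.sort_values(ascending=False)[:top_values]
    # Replace the original index labels with 0..top_values-1
    res.index = range(top_values)
    return res
# df here is the df_diff frame from the question
res = df.apply(lambda x: top_3(x, 3))
print(res)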
Use numpy.sort with dataframe values:
n=3
arr = df.copy().to_numpy()
df_three = pd.DataFrame(np.sort(arr, 0)[::-1][:n], columns=df.columns)
print(df_three)
A B C
0 10 12 8
1 8 7 7
2 5 5 4
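Since the question already uses nlargest, another possible sketch is to keep the values instead of the index and simply renumber each column's result. The frame is rebuilt from the question's example:

```python
import pandas as pd

df_diff = pd.DataFrame({'A': [4, 5, 8, 10, 1],
                        'B': [7, 5, 2, 3, 12],
                        'C': [8, 7, 1, 4, 3]})

# For each column, take the 3 largest values and reset their index to 0..2
# so that the columns line up when apply combines them
df_three = df_diff.apply(lambda s: s.nlargest(3).reset_index(drop=True))
print(df_three)
```

Resetting the index matters here: without it, apply would align the three results on their original row labels and fill the gaps with NaN.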

Why does the groupby function return duplicated data? [duplicate]

This question already has answers here:
Pandas GroupBy.apply method duplicates first group
(3 answers)
Closed 2 years ago.
I am testing the pandas groupby function and have generated a random dataframe:
df = pd.DataFrame(np.random.randint(5,size=(6,3)), columns=list('abc'))
In one random case, df is:
a b c
0 2 2 2
1 1 4 2
2 3 0 1
3 2 1 3
4 0 2 2
5 2 1 4
When I use the following code to print out the groupby object, I get some interesting results.
def func(x):
    print(x)
df.groupby("a").apply(lambda x: func(x))
a b c
0 0 1 4
a b c
0 0 1 4
a b c
2 2 4 1
3 2 2 1
a b c
1 4 0 0
4 4 4 3
Could anybody let me know why index 0 appears twice in this case?
DataFrame.groupby.apply evaluates the first group twice to determine whether a fast path for calculation can be followed for the remaining groups. This behavior has changed in recent versions of pandas as discussed here
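If the goal is only to inspect each group, one option that sidesteps apply entirely is to iterate over the groupby object, which yields each (key, group) pair once. A minimal sketch using the same kind of random frame as the question (the list `seen` is only there for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(5, size=(6, 3)), columns=list('abc'))

seen = []
# Each iteration yields the group key and the sub-frame for that key
for key, group in df.groupby('a'):
    print(group)
    seen.append(key)
```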

How to take column of dataframe in pandas [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 5 years ago.
Now I have this dataframe:
A B C
0 m 1 b
1 n 4 a
2 p 3 c
3 o 4 d
4 k 6 e
So, how can I get the rows where column A is n, p, or k, as follows:
A B C
0 n 4 a
1 p 3 c
2 k 6 e
thanks
Use .loc
df = df.loc[df.A.isin(['n','p','k']),:]
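The filter above keeps the original index labels (1, 2, 4). A minimal sketch, assuming the renumbered 0, 1, 2 index from the desired output is wanted; the frame is rebuilt from the question and `out` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'A': list('mnpok'),
                   'B': [1, 4, 3, 4, 6],
                   'C': list('bacde')})

# Boolean mask selects the matching rows; reset_index renumbers them from 0
out = df.loc[df['A'].isin(['n', 'p', 'k'])].reset_index(drop=True)
print(out)
```

Note that isin keeps rows in their original order, not in the order of the list passed to it.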
