This question already has answers here:
How to repeat a Pandas DataFrame?
(7 answers)
Closed 2 years ago.
I have a pandas column containing multiple strings. I want each of these strings to be duplicated 3 times.
df = pd.DataFrame(data=['a', 'b', 'c'])
Current table:
0
0 a
1 b
2 c
I want to transform this table so it looks like this:
0
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 c
I can't seem to find an easy way to do this.
Anything will help.
Try:
df[0].repeat(3).reset_index(drop=True)
Out:
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 c
Name: 0, dtype: object
You can use repeat + reindex:
df = df.reindex(df.index.repeat(3))
Out[105]:
0
0 a
0 a
0 a
1 b
1 b
1 b
2 c
2 c
2 c
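Or concat (this stacks whole copies of the frame, so the rows come out as a, b, c, a, b, c rather than grouped):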
df = pd.concat([df]*3)
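The approaches above can be compared in one runnable sketch. It assumes the single column is labelled 0, as in the question; the names `by_repeat`, `by_reindex`, and `by_concat` are illustrative and do not come from the answers.

```python
import pandas as pd

df = pd.DataFrame(data=['a', 'b', 'c'])
expected = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']

# Series.repeat duplicates each element in place; reset_index renumbers 0..8.
by_repeat = df[0].repeat(3).reset_index(drop=True)

# Repeating the index and reindexing keeps the result a DataFrame.
by_reindex = df.reindex(df.index.repeat(3)).reset_index(drop=True)

# concat stacks whole copies (a, b, c, a, b, c); sorting on the index
# groups the duplicates before renumbering.
by_concat = pd.concat([df] * 3).sort_index().reset_index(drop=True)

print(by_reindex)
```

If the original row labels should be kept (0, 0, 0, 1, 1, 1, ...), the `reset_index(drop=True)` calls can simply be dropped.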
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I am wondering how to do the following operation efficiently, so that it also scales to dataframes with millions of rows.
I have two pandas dataframes:
Data1:
Position Letter
1 a
2 b
3 c
4 b
5 a
Data2:
Weight Letter
1 a
2 b
3 c
Now I want to create an extra column (Weight) in Data1, resulting in the following:
Position Letter Weight
1 a 1
2 b 2
3 c 3
4 b 2
5 a 1
Best way is to use merge:
df = df1.merge(df2, on=['Letter'])
print(df)
Position Letter Weight
0 1 a 1
1 5 a 1
2 2 b 2
3 4 b 2
4 3 c 3
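The merged output above is ordered by Letter rather than by Position. If the original row order of Data1 matters, a left merge is one option. This is a minimal sketch that rebuilds the two frames from the question; the `validate` argument and the `map` alternative are additions for illustration, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'Position': [1, 2, 3, 4, 5],
                    'Letter': ['a', 'b', 'c', 'b', 'a']})
df2 = pd.DataFrame({'Weight': [1, 2, 3],
                    'Letter': ['a', 'b', 'c']})

# how='left' keeps every row of df1 in its original order;
# validate raises if a letter appears more than once in df2.
merged = df1.merge(df2, on='Letter', how='left', validate='many_to_one')

# Alternative for a single lookup column: map letters through a Series.
weights = df2.set_index('Letter')['Weight']
mapped = df1.assign(Weight=df1['Letter'].map(weights))

print(merged)
```

With a left merge, letters missing from df2 would produce NaN in Weight instead of silently dropping the row.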
This question already has answers here:
Split cell into multiple rows in pandas dataframe
(5 answers)
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Closed 3 years ago.
I am working on the problem below:
df_temp = pd.DataFrame()
df_temp.insert(0, 'Label', ["A|B|C","A|C","C|B","A","B"])
df_temp.insert(1, 'ID', [1,2,3,4,5])
df_temp
Label ID
0 A|B|C 1
1 A|C 2
2 C|B 3
3 A 4
4 B 5
I want to convert this dataframe into something like the dataframe below, with one row per label for each ID.
Expected Output :
ID Label
1 A
1 B
1 C
2 A
2 C
3 C
3 B
4 A
5 B
Try this:
(df_temp.set_index('ID')['Label']
.str.split('|', expand=True)
.reset_index()
.melt('ID')
.drop('variable', axis=1)
.dropna()
.sort_values('ID'))
Output:
ID value
0 1 A
5 1 B
10 1 C
1 2 A
6 2 C
2 3 C
7 3 B
3 4 A
4 5 B
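A shorter route to the same shape, assuming pandas 0.25 or later (where `DataFrame.explode` is available), is to split the strings into lists and explode them. This is a sketch rather than the answer's method; the name `result` is illustrative.

```python
import pandas as pd

df_temp = pd.DataFrame({'Label': ["A|B|C", "A|C", "C|B", "A", "B"],
                        'ID': [1, 2, 3, 4, 5]})

# Split each label string into a list, then emit one row per list element.
result = (df_temp.assign(Label=df_temp['Label'].str.split('|'))
                 .explode('Label')
                 .reset_index(drop=True)[['ID', 'Label']])

print(result)
```

This keeps the labels in their original order within each ID and leaves the column named Label instead of value.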
I have two dataframes, df_diff and df_three. Each column of df_three contains the index values of the three largest values in the corresponding column of df_diff. For example, let's say df_diff looks like this:
A B C
0 4 7 8
1 5 5 7
2 8 2 1
3 10 3 4
4 1 12 3
Using
df_three = df_diff.apply(lambda s: pd.Series(s.nlargest(3).index))
df_three would look like this:
A B C
0 3 4 0
1 2 0 1
2 1 1 3
How could I match the index values in df_three to the column values of df_diff? In other words, how could I get df_three to look like this:
A B C
0 10 12 8
1 8 7 7
2 5 5 4
Am I making this problem too complicated? Would there be an easier way?
Any help is appreciated!
def top_3(s, top_values):
    # Sort descending, keep the first top_values entries, and renumber them from 0.
    res = s.sort_values(ascending=False).iloc[:top_values]
    res.index = range(top_values)
    return res
res = df_diff.apply(lambda x: top_3(x, 3))
print(res)
Use numpy.sort with dataframe values:
n = 3
arr = df_diff.to_numpy()
# Sort each column ascending, reverse to descending, and keep the first n rows.
df_three = pd.DataFrame(np.sort(arr, axis=0)[::-1][:n], columns=df_diff.columns)
print(df_three)
A B C
0 10 12 8
1 8 7 7
2 5 5 4
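The question as literally asked, turning the index labels in df_three back into values, can also be sketched directly. The example below rebuilds df_diff from the question; the names `looked_up` and `direct` are illustrative assumptions.

```python
import pandas as pd

df_diff = pd.DataFrame({'A': [4, 5, 8, 10, 1],
                        'B': [7, 5, 2, 3, 12],
                        'C': [8, 7, 1, 4, 3]})

# Index labels of the three largest values per column (from the question).
df_three = df_diff.apply(lambda s: pd.Series(s.nlargest(3).index))

# Look the labels back up column by column to recover the values.
looked_up = pd.DataFrame({col: df_diff[col].loc[df_three[col]].to_numpy()
                          for col in df_three.columns})

# Or skip the index step entirely and take the values directly.
direct = df_diff.apply(lambda s: s.nlargest(3).reset_index(drop=True))

print(looked_up)
```

The second form avoids building df_three at all, which is probably simpler when only the values are needed.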
This question already has answers here:
Pandas GroupBy.apply method duplicates first group
(3 answers)
Closed 2 years ago.
I am testing the pandas groupby function and have generated a random dataframe:
df = pd.DataFrame(np.random.randint(5,size=(6,3)), columns=list('abc'))
In one random case, df is:
a b c
0 2 2 2
1 1 4 2
2 3 0 1
3 2 1 3
4 0 2 2
5 2 1 4
When I use the following code to print out each group, I get some interesting results.
def func(x):
    print(x)
df.groupby("a").apply(lambda x: func(x))
a b c
0 0 1 4
a b c
0 0 1 4
a b c
2 2 4 1
3 2 2 1
a b c
1 4 0 0
4 4 4 3
Could anybody let me know why index 0 appears twice in this case?
DataFrame.groupby.apply evaluates the first group twice to determine whether a fast path for calculation can be followed for the remaining groups. This behavior has changed in recent versions of pandas as discussed here
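If the goal is only to inspect the groups, iterating over the groupby object avoids apply altogether and visits each group exactly once, regardless of pandas version. This is a minimal sketch; the fixed seed and the `seen` list are assumptions added for reproducibility and checking.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
df = pd.DataFrame(rng.integers(5, size=(6, 3)), columns=list('abc'))

# Iterating yields (key, sub-frame) pairs, one pair per distinct value of "a".
seen = []
for key, group in df.groupby('a'):
    seen.append(key)
    print(key)
    print(group)
```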
This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 5 years ago.
Now I have this dataframe:
A B C
0 m 1 b
1 n 4 a
2 p 3 c
3 o 4 d
4 k 6 e
So, how can I get the rows with n, p, k in column A, as follows:
A B C
0 n 4 a
1 p 3 c
2 k 6 e
Thanks.
Use .loc
df = df.loc[df.A.isin(['n','p','k']),:]
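The filter above keeps the original row labels (1, 2, 4), whereas the desired output is numbered from 0. A minimal sketch that also renumbers the index follows; the frame is rebuilt from the question, and the names `wanted` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['m', 'n', 'p', 'o', 'k'],
                   'B': [1, 4, 3, 4, 6],
                   'C': ['b', 'a', 'c', 'd', 'e']})

# Keep rows whose A value is in the wanted set, then renumber from 0
# so the index matches the desired output in the question.
wanted = ['n', 'p', 'k']
result = df.loc[df['A'].isin(wanted)].reset_index(drop=True)

print(result)
```

Note that isin keeps rows in their order of appearance in df, not in the order of the wanted list.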