I am new to Python and am writing some code.
I want to search for a word in a column and, if a match is found, insert an empty row below it.
My attempt so far is below:
if df.columnname == 'total':
    df.insert
Could someone please help me?
Do give the following a try:
>>>df
id Label
0 1 A
1 2 B
2 3 B
3 4 B
4 5 A
5 6 B
6 7 A
7 8 A
8 9 C
9 10 C
10 11 C
# Create a separate dataframe with the id of the rows to be duplicated
df1 = df.loc[df['Label'] == 'B', ['id']]
# Join it back, sort stably so each copy follows its original row, and reset the index
df = pd.concat([df, df1]).sort_index(kind='mergesort').reset_index(drop=True)
>>>df
id Label
0 1 A
1 2 B
2 2 NaN
3 3 B
4 3 NaN
5 4 B
6 4 NaN
7 5 A
8 6 B
9 6 NaN
10 7 A
11 8 A
12 9 C
13 10 C
14 11 C
Use the code below:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Column1': ['A0', 'total', 'total', 'A3'],'Column2': ['B0', 'B1',
'B2', 'B3'],'Column3': ['C0', 'C1', 'C2', 'C3'],'Column4': ['D0', 'D1', 'D2',
'D3']},index=[0, 1, 2, 3])
count = 0  # number of blank rows inserted so far
for index, row in df1.iterrows():
    if row["Column1"] == 'total':
        # Insert a row of blanks right after the matching row
        df1 = pd.DataFrame(np.insert(df1.values, index+1+count, values=[" "]
                           * len(df1.columns), axis=0), columns=df1.columns)
        count += 1
print(df1)
Input:
Column1 Column2 Column3 Column4
0 A0 B0 C0 D0
1 total B1 C1 D1
2 total B2 C2 D2
3 A3 B3 C3 D3
Output:
Column1 Column2 Column3 Column4
0 A0 B0 C0 D0
1 total B1 C1 D1
2
3 total B2 C2 D2
4
5 A3 B3 C3 D3
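A loop-free variant of the same idea can be sketched as follows. This is only a sketch under assumptions: it assumes the frame has a default integer index, it fills the new rows with NaN rather than blanks, and the names matches, blanks and out are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Column1': ['A0', 'total', 'total', 'A3'],
                   'Column2': ['B0', 'B1', 'B2', 'B3']})

# Positions of the rows that match the search word
matches = df.index[df['Column1'] == 'total']

# One all-NaN row per match, labelled halfway between the match and the next row
blanks = pd.DataFrame(index=matches + 0.5, columns=df.columns)

# Sorting by index slots each blank row right below its match
out = pd.concat([df, blanks]).sort_index().reset_index(drop=True)
print(out)
```

If real blanks are preferred over NaN, calling out.fillna("") afterwards should give a result closer to the output shown above.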
I have a data frame where some rows have one ID and one related ID. In the example below, a1 and a2 are related (say to the same person) while b and c don't have any related rows.
import pandas as pd
test = pd.DataFrame(
[['a1', 1, 'a2'],
['a1', 2, 'a2'],
['a1', 3, 'a2'],
['a2', 4, 'a1'],
['a2', 5, 'a1'],
['b', 6, ],
['c', 7, ]],
columns=['ID1', 'Value', 'ID2']
)
test
ID1 Value ID2
0 a1 1 a2
1 a1 2 a2
2 a1 3 a2
3 a2 4 a1
4 a2 5 a1
5 b 6 None
6 c 7 None
What I need to achieve is to add a column containing the sum of all values for related rows. In this case, the desired output should be like below. Is there a way to get this, please?
ID1  Value  ID2  Group by ID1 and ID2
a1   1      a2   15
a1   2      a2   15
a1   3      a2   15
a2   4      a1   15
a2   5      a1   15
b    6           6
c    7           7
Note that I learnt to use group by to get sum for ID1 (from this question); but not for 'ID1' and 'ID2' together.
test['Group by ID1'] = test.groupby("ID1")["Value"].transform("sum")
test
ID1 Value ID2 Group by ID1
0 a1 1 a2 6
1 a1 2 a2 6
2 a1 3 a2 6
3 a2 4 a1 9
4 a2 5 a1 9
5 b 6 None 6
6 c 7 None 7
Update
I think I can still get this done with a for loop, as below, but I wonder whether there is a non-loop way. Thanks.
bottle = pd.DataFrame().reindex_like(test)
bottle['ID1'] = test['ID1']
bottle['ID2'] = test['ID2']
for index, row in bottle.iterrows():
    bottle.loc[index, "Value"] = test[test['ID1'] == row['ID1']]['Value'].sum() + \
                                 test[test['ID1'] == row['ID2']]['Value'].sum()
print(bottle)
ID1 Value ID2
0 a1 15.0 a2
1 a1 15.0 a2
2 a1 15.0 a2
3 a2 15.0 a1
4 a2 15.0 a1
5 b 6.0 None
6 c 7.0 None
A possible solution would be to sort the pairs in ID1 and ID2, such that they always appear in the same order.
Swapping the IDs:
s = df['ID1'] > df['ID2']
df.loc[s, ['ID1', 'ID2']] = df.loc[s, ['ID2', 'ID1']].values
print(df)
>>> ID1 Value ID2
0 a1 1 a2
1 a1 2 a2
2 a1 3 a2
3 a1 4 a2
4 a1 5 a2
5 b 6 None
6 c 7 None
Then we can do a simple groupby:
df['RSUM'] = df.groupby(['ID1', 'ID2'], dropna=False)['Value'].transform("sum")
print(df)
>>> ID1 Value ID2 RSUM
0 a1 1 a2 15
1 a1 2 a2 15
2 a1 3 a2 15
3 a1 4 a2 15
4 a1 5 a2 15
5 b 6 None 6
6 c 7 None 7
Note the dropna=False to not discard IDs that have no pairing.
If you do not want to permanently swap the IDs, you can just create a temporary dataframe.
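One way to avoid touching the ID columns at all is to build the ordered pair as two temporary Series and group by those. The sketch below is an assumption-laden illustration, not the answer's own code: it works on the question's test frame, treats a row without a partner as paired with itself, and the names id2, in_order, lo and hi are invented here.

```python
import pandas as pd

test = pd.DataFrame({
    'ID1': ['a1', 'a1', 'a1', 'a2', 'a2', 'b', 'c'],
    'Value': [1, 2, 3, 4, 5, 6, 7],
    'ID2': ['a2', 'a2', 'a2', 'a1', 'a1', None, None],
})

# Rows without a partner are treated as paired with themselves
id2 = test['ID2'].fillna(test['ID1'])

# Order each pair so that (a1, a2) and (a2, a1) produce the same key
in_order = test['ID1'] <= id2
lo = test['ID1'].where(in_order, id2)
hi = id2.where(in_order, test['ID1'])

# Group by the temporary keys; ID1 and ID2 themselves stay unchanged
test['RSUM'] = test.groupby([lo, hi])['Value'].transform('sum')
print(test)
```

Because the missing partners are filled in before grouping, this variant does not need dropna=False.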
Basically, I just want to flatten (maybe not the right term) a dataframe.
For example, given this dataframe:
A B C
0 1 [1,2] [1, 10]
1 2 [2, 14] [2, 18]
I want to get the output of:
A B1 B2 B3 B4
0 1 1 2 1 10
1 2 2 14 2 18
I've tried:
print(pd.DataFrame(df.values.flatten().tolist(), columns=['%sG'%i for i in range(6)], index=df.index))
But that did not give the result I want.
I hope you get what I mean :)
General solution, which also works if the lists have different lengths:
df1 = pd.DataFrame(df['B'].values.tolist())
df2 = pd.DataFrame(df['C'].values.tolist())
df = pd.concat([df[['A']], df1, df2], axis=1)
df.columns = [df.columns[0]] + [f'B{i+1}' for i in range(len(df.columns)-1)]
print (df)
A B1 B2 B3 B4
0 1 1 2 1 10
1 2 2 14 2 18
If all lists have the same size:
import numpy as np
df1 = pd.DataFrame(np.array(df[['B','C']].values.tolist()).reshape(len(df),-1))
df1.columns = [f'B{i+1}' for i in range(len(df1.columns))]
df1.insert(0, 'A', df['A'])
print (df1)
A B1 B2 B3 B4
0 1 1 2 1 10
1 2 2 14 2 18
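To illustrate the different-lengths case mentioned above, here is a small self-contained sketch. The input data is made up for the example (one list in B has three items), and the names parts and flat are hypothetical; shorter lists are padded with NaN.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2],
                   'B': [[1, 2], [2, 14, 7]],   # lists of unequal length
                   'C': [[1, 10], [2, 18]]})

# Expand each list column into its own block of columns (short lists get NaN)
parts = [pd.DataFrame(df[col].tolist(), index=df.index) for col in ['B', 'C']]

# Put the scalar column and the expanded blocks side by side, then rename
flat = pd.concat([df[['A']]] + parts, axis=1)
flat.columns = ['A'] + [f'B{i + 1}' for i in range(flat.shape[1] - 1)]
print(flat)
```

Note that a column containing NaN is stored as float, so the padded column prints its values as 7.0 rather than 7.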
In more recent versions you can use explode:
>>> x = df.select_dtypes(exclude=list).join(df.select_dtypes(list).apply(pd.Series.explode, axis=1))
>>> x.columns = x.columns + x.columns.to_series().groupby(level=0).cumcount().add(1).astype(str)
>>> x
A1 B1 B2 C1 C2
0 1 1 2 1 10
1 2 2 14 2 18
>>>
I'm looking to iterate through a list of orders and assign an owner id to each order. The ids are in a separate pandas dataframe (I've also tried changing this into a Series and an OrderedDict). I would like to find the id with the minimum count in the df and use it for the first order in orders, then add 1 to that id's count, and repeat until all the orders are filled.
Reproducible Example:
df = pd.DataFrame({'Id':['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], 'count':[2, 3, 5, 6, 8, 9, 12, 13, 15, 55]})
orders = pd.DataFrame({'order_id':['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8', 'a9', 'a10', 'a11', 'a12', 'a13']})
orders['newowner'] = ""
Owners:
df
Id count
0 a 2
1 b 3
2 c 5
3 d 6
4 e 8
5 f 9
6 g 12
7 h 13
8 i 15
9 j 55
Orders:
order_id newowner
0 a1
1 a2
2 a3
3 a4
4 a5
5 a6
6 a7
7 a8
8 a9
9 a10
10 a11
11 a12
12 a13
expected result:
order_id newowner
0 a1 a # brings a up to 3 records
1 a2 a # a and b are tied with 3, so it goes to a again (doesn't matter which gets it first)
2 a3 b # now b has 3, and a has 4, so it goes to b
3 a4 a # both have 4 so a
4 a5 b # etc.
5 a6 a
6 a7 b
7 a8 c
8 a9 a
9 a10 b
10 a11 c
11 a12 a
12 a13 b
I've tried finding the min of the df.count, as well as tried to loop through each, but am having a hard time isolating each order.
for order in orders.iteritems():
    order['newowner'] = df.count.min()
for order in orders.iteritems():
    for name in df.iteritems:
        idx = df[df.count == df.count.min()]['Id']
        order['newonwer'] = idx
Here is one way via Series.apply:
def set_owner(order_id):
    # Pick the owner with the smallest count (the first one on ties) and bump its count
    min_idx = df['count'].idxmin()
    df.loc[min_idx, 'count'] += 1
    return df.loc[min_idx, 'Id']
orders['newowner'] = orders['order_id'].apply(set_owner)
orders
# order_id newowner
# 0 a1 a
# 1 a2 a
# 2 a3 b
# 3 a4 a
# 4 a5 b
# 5 a6 a
# 6 a7 b
# 7 a8 c
# 8 a9 a
# 9 a10 b
# 10 a11 c
# 11 a12 d
# 12 a13 a
df
# Id count
# 0 a 8
# 1 b 7
# 2 c 7
# 3 d 7
# 4 e 8
# 5 f 9
# 6 g 12
# 7 h 13
# 8 i 15
# 9 j 55
I'm not sure this is the way I'd do it. I'd probably look for a way to use df.apply if possible. But I think this code will give you the expected results. Note that the row yielded by iterrows is a copy, so the result has to be written back through orders.loc.
for idx, order in orders.iterrows():
    idxmin = df['count'].idxmin()
    df.loc[idxmin, 'count'] += 1
    orders.loc[idx, 'newowner'] = df.loc[idxmin, 'Id']
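Both answers rescan the whole count column for every order. As a different technique (not what either answer uses), the same "always give the next order to the least-loaded owner" rule can be sketched with a min-heap from the standard library. The names heap and assigned are made up for illustration, and unlike the answers this sketch leaves df's counts untouched.

```python
import heapq
import pandas as pd

df = pd.DataFrame({'Id': list('abcdefghij'),
                   'count': [2, 3, 5, 6, 8, 9, 12, 13, 15, 55]})
orders = pd.DataFrame({'order_id': [f'a{i}' for i in range(1, 14)]})

# Heap entries are (count, row position, Id); the position breaks ties in row order
heap = [(int(c), pos, owner)
        for pos, (owner, c) in enumerate(zip(df['Id'], df['count']))]
heapq.heapify(heap)

assigned = []
for _ in range(len(orders)):
    count, pos, owner = heapq.heappop(heap)         # least-loaded owner
    assigned.append(owner)
    heapq.heappush(heap, (count + 1, pos, owner))   # owner now has one more order

orders['newowner'] = assigned
print(orders)
```

For a handful of owners the difference is negligible; the heap mainly pays off when there are many owners and many orders.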
This relates to the question below; I would like to count the number of following rows.
Thanks to the answer there, I could handle my data,
but I ran into trouble with an exceptional case.
How to count the number of following rows in pandas
A B
1 a0
2 a1
3 b1
4 a0
5 b2
6 a2
7 a2
First, I would like to split df at every row where B starts with "a":
df1
A B
1 a0
df2
A B
2 a1
3 b1
df3
A B
4 a0
5 b2
df4
A B
6 a2
df5
A B
7 a2
Then I would like to count the rows of each df:
"a" number
a0 1
a1 2
a0 2
a2 1
a2 1
How can this be done?
I would be happy if someone could tell me how to handle this kind of problem.
You can aggregate by a custom Series created with cumsum:
print (df.B.str.startswith("a").cumsum())
0 1
1 2
2 2
3 3
4 3
5 4
6 5
Name: B, dtype: int32
df1 = df.B.groupby(df.B.str.startswith("a").cumsum()).agg(['first', 'size'])
df1.columns =['"A"','number']
df1.index.name = None
print (df1)
"A" number
1 a0 1
2 a1 2
3 a0 2
4 a2 1
5 a2 1
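For reference, here is the same approach as a self-contained script. The sample frame is rebuilt from the question, and the names groups and out are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 8),
                   'B': ['a0', 'a1', 'b1', 'a0', 'b2', 'a2', 'a2']})

# Every row starting with "a" opens a new group; cumsum turns the flags into group ids
groups = df['B'].str.startswith('a').cumsum()

# For each group, keep its first label and the number of rows it contains
out = df['B'].groupby(groups).agg(['first', 'size'])
print(out)
```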
Say I have two dataframes, is it possible to concatenate them by columns, but with the second one appearing as a single column in the concatenated dataframe?
Pictorially, I'm looking for:
df_A:
C1 C2 C3
1 2 3
11 22 33
df_B:
D1 D2 D3
3 4 5
33 44 55
Concatenated:
C1 C2 C3 df_B
D1 D2 D3
1 2 3 3 4 5
11 22 33 33 44 55
You can construct a MultiIndex to create a DataFrame with the desired appearance:
import pandas as pd
df_A = pd.DataFrame([(1,2,3), (11,22,33)], columns=['C1', 'C2', 'C3'])
df_B = pd.DataFrame([(3,4,5), (33,44,55)], columns=['D1', 'D2', 'D3'])
result = pd.concat([df_A, df_B], axis=1)
result.columns = pd.MultiIndex.from_tuples([(col,'') for col in df_A]
+ [('df_B', col) for col in df_B])
print(result)
yields
C1 C2 C3 df_B
D1 D2 D3
0 1 2 3 3 4 5
1 11 22 33 33 44 55
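Once the columns are a MultiIndex, the nested block can be pulled back out by its top-level label. The following sketch rebuilds the result from above and shows both kinds of selection; the names block and first are made up for the example.

```python
import pandas as pd

df_A = pd.DataFrame([(1, 2, 3), (11, 22, 33)], columns=['C1', 'C2', 'C3'])
df_B = pd.DataFrame([(3, 4, 5), (33, 44, 55)], columns=['D1', 'D2', 'D3'])

result = pd.concat([df_A, df_B], axis=1)
result.columns = pd.MultiIndex.from_tuples(
    [(col, '') for col in df_A] + [('df_B', col) for col in df_B])

# Selecting the top-level label returns the original df_B columns
block = result['df_B']

# A column from df_A is addressed by its full (label, '') tuple
first = result[('C1', '')]
print(block)
print(first)
```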