Search for a word and insert an empty row - python

I am new to Python and am writing some code.
I want to search for a word in a column and, if a match is found, insert an empty row below it.
My code so far is below:
if df.columnname == 'total':
    df.insert
Could someone please help me?

Do give the following a try:
>>>df
id Label
0 1 A
1 2 B
2 3 B
3 4 B
4 5 A
5 6 B
6 7 A
7 8 A
8 9 C
9 10 C
10 11 C
# Create a separate dataframe with the id of the rows to be duplicated
df1 = df.loc[df['Label'] == 'B', ['id']]
# Join it back, sort stably so each copy follows its original row, and reset the index
df = pd.concat([df, df1]).sort_index(kind='mergesort').reset_index(drop=True)
>>>df
id Label
0 1 A
1 2 B
2 2 NaN
3 3 B
4 3 NaN
5 4 B
6 4 NaN
7 5 A
8 6 B
9 6 NaN
10 7 A
11 8 A
12 9 C
13 10 C
14 11 C

Use the code below:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Column1': ['A0', 'total', 'total', 'A3'],
                    'Column2': ['B0', 'B1', 'B2', 'B3'],
                    'Column3': ['C0', 'C1', 'C2', 'C3'],
                    'Column4': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 1, 2, 3])
count = 0  # number of blank rows inserted so far
for index, row in df1.iterrows():
    if row["Column1"] == 'total':
        # insert a row of blanks right after the matching row,
        # shifted by the rows already inserted
        df1 = pd.DataFrame(np.insert(df1.values, index + 1 + count,
                                     values=[" "] * len(df1.columns), axis=0),
                           columns=df1.columns)
        count += 1
print(df1)
Input:
Column1 Column2 Column3 Column4
0 A0 B0 C0 D0
1 total B1 C1 D1
2 total B2 C2 D2
3 A3 B3 C3 D3
Output:
Column1 Column2 Column3 Column4
0 A0 B0 C0 D0
1 total B1 C1 D1
2
3 total B2 C2 D2
4
5 A3 B3 C3 D3
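For larger frames, a loop-free variant may be worth considering. The sketch below is not taken from either answer: it assumes a default 0..n-1 index (hence the reset_index), and insert_blank_after is a hypothetical helper name. Each blank row gets a half-step index so that a stable sort slots it directly beneath its match.

```python
import pandas as pd

def insert_blank_after(df, column, word):
    """Return a copy of df with an empty-string row after each row where df[column] == word."""
    df = df.reset_index(drop=True)
    mask = df[column].eq(word).to_numpy()
    # blank rows get index i + 0.5 so they sort just below row i
    blanks = pd.DataFrame("", index=df.index[mask] + 0.5, columns=df.columns)
    return (pd.concat([df, blanks])
              .sort_index(kind="mergesort")
              .reset_index(drop=True))

df1 = pd.DataFrame({'Column1': ['A0', 'total', 'total', 'A3'],
                    'Column2': ['B0', 'B1', 'B2', 'B3']})
result = insert_blank_after(df1, 'Column1', 'total')
print(result)
```

The blank rows hold empty strings here; whether NaN or a single space suits better depends on how the frame is used afterwards.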

Related

Sum of values from related rows in Python dataframe

I have a data frame where some rows have one ID and one related ID. In the example below, a1 and a2 are related (say to the same person) while b and c don't have any related rows.
import pandas as pd
test = pd.DataFrame(
[['a1', 1, 'a2'],
['a1', 2, 'a2'],
['a1', 3, 'a2'],
['a2', 4, 'a1'],
['a2', 5, 'a1'],
['b', 6, ],
['c', 7, ]],
columns=['ID1', 'Value', 'ID2']
)
test
ID1 Value ID2
0 a1 1 a2
1 a1 2 a2
2 a1 3 a2
3 a2 4 a1
4 a2 5 a1
5 b 6 None
6 c 7 None
What I need to achieve is to add a column containing the sum of all values for related rows. In this case, the desired output should be like below. Is there a way to get this, please?
ID1  Value  ID2  Group by ID1 and ID2
a1   1      a2   15
a1   2      a2   15
a1   3      a2   15
a2   4      a1   15
a2   5      a1   15
b    6           6
c    7           7
Note that I learnt how to use groupby to get the sum for ID1 (from this question), but not for 'ID1' and 'ID2' together.
test['Group by ID1'] = test.groupby("ID1")["Value"].transform("sum")
test
ID1 Value ID2 Group by ID1
0 a1 1 a2 6
1 a1 2 a2 6
2 a1 3 a2 6
3 a2 4 a1 9
4 a2 5 a1 9
5 b 6 None 6
6 c 7 None 7
Update
I think I can still use a for loop to get this done, as below, but I wonder whether there is a non-loop way. Thanks.
bottle = pd.DataFrame().reindex_like(test)
bottle['ID1'] = test['ID1']
bottle['ID2'] = test['ID2']
for index, row in bottle.iterrows():
    bottle.loc[index, "Value"] = test[test['ID1'] == row['ID1']]['Value'].sum() + \
        test[test['ID1'] == row['ID2']]['Value'].sum()
print(bottle)
ID1 Value ID2
0 a1 15.0 a2
1 a1 15.0 a2
2 a1 15.0 a2
3 a2 15.0 a1
4 a2 15.0 a1
5 b 6.0 None
6 c 7.0 None
A possible solution is to sort each ID1/ID2 pair so that the two IDs always appear in the same order (df below is the question's test dataframe).
Swapping the IDs:
s = df['ID1'] > df['ID2']
df.loc[s, ['ID1', 'ID2']] = df.loc[s, ['ID2', 'ID1']].values
print(df)
>>> ID1 Value ID2
0 a1 1 a2
1 a1 2 a2
2 a1 3 a2
3 a1 4 a2
4 a1 5 a2
5 b 6 None
6 c 7 None
Then we can do a simple groupby:
df['RSUM'] = df.groupby(['ID1', 'ID2'], dropna=False)['Value'].transform("sum")
print(df)
>>> ID1 Value ID2 RSUM
0 a1 1 a2 15
1 a1 2 a2 15
2 a1 3 a2 15
3 a1 4 a2 15
4 a1 5 a2 15
5 b 6 None 6
6 c 7 None 7
Note the dropna=False (available since pandas 1.1), which keeps the rows whose ID has no pairing instead of discarding them.
If you do not want to permanently swap the IDs, you can just create a temporary dataframe.
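One way to leave the original columns untouched is to build the ordered pair as temporary Series and group by those. This is only a sketch: the names partner, first and second are made up here, and pairing an unmatched row with itself (instead of relying on dropna=False) is an assumption that goes beyond the answer.

```python
import pandas as pd

test = pd.DataFrame(
    [['a1', 1, 'a2'], ['a1', 2, 'a2'], ['a1', 3, 'a2'],
     ['a2', 4, 'a1'], ['a2', 5, 'a1'], ['b', 6, None], ['c', 7, None]],
    columns=['ID1', 'Value', 'ID2'],
)

# Assumption (not in the answer): a row without a partner is paired with itself,
# so no missing keys reach groupby.
partner = test['ID2'].fillna(test['ID1'])
in_order = test['ID1'] <= partner
first = test['ID1'].where(in_order, partner)    # smaller ID of each pair
second = partner.where(in_order, test['ID1'])   # larger ID of each pair

# ID1 and ID2 in `test` are left as they were; only RSUM is added
test['RSUM'] = test.groupby([first, second])['Value'].transform('sum')
print(test)
```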

How to widen a dataframe - pandas

Basically, I just want to flatten a dataframe (maybe not the right term).
For example, given this dataframe:
A B C
0 1 [1,2] [1, 10]
1 2 [2, 14] [2, 18]
I want to get the output of:
A B1 B2 B3 B4
0 1 1 2 1 10
1 2 2 14 2 18
I've tried:
print(pd.DataFrame(df.values.flatten().tolist(), columns=['%sG'%i for i in range(6)], index=df.index))
But it did not give anything useful.
I hope you get what I mean :)
A general solution that also works if the lists have different lengths:
df1 = pd.DataFrame(df['B'].values.tolist())
df2 = pd.DataFrame(df['C'].values.tolist())
df = pd.concat([df[['A']], df1, df2], axis=1)
df.columns = [df.columns[0]] + [f'B{i+1}' for i in range(len(df.columns)-1)]
print (df)
A B1 B2 B3 B4
0 1 1 2 1 10
1 2 2 14 2 18
If all lists have the same length (this one also needs import numpy as np):
df1 = pd.DataFrame(np.array(df[['B','C']].values.tolist()).reshape(len(df),-1))
df1.columns = [f'B{i+1}' for i in range(len(df1.columns))]
df1.insert(0, 'A', df['A'])
print (df1)
A B1 B2 B3 B4
0 1 1 2 1 10
1 2 2 14 2 18
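Both variants above number the new columns B1..B4 regardless of where the values came from. If it helps to keep the source column in the name, the general approach might be adapted as follows; this is a sketch with made-up variable names (parts, wide, out), not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2],
                   'B': [[1, 2], [2, 14]],
                   'C': [[1, 10], [2, 18]]})

parts = [df[['A']]]
for col in ['B', 'C']:
    # one new column per list position; shorter lists are padded with NaN
    wide = pd.DataFrame(df[col].tolist(), index=df.index)
    wide.columns = [f'{col}{i + 1}' for i in range(wide.shape[1])]
    parts.append(wide)

out = pd.concat(parts, axis=1)
print(out)
```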
In more recent versions of pandas you can use explode:
>>> x = df.select_dtypes(exclude=list).join(df.select_dtypes(list).apply(pd.Series.explode, axis=1))
>>> x.columns = x.columns + x.columns.to_series().groupby(level=0).cumcount().add(1).astype(str)
>>> x
A1 B1 B2 C1 C2
0 1 1 2 1 10
1 2 2 14 2 18
>>>

populate a pandas column with the id from the min value of another pandas DF

I'm looking to iterate through a list of orders and assign an owner id to each order. The ids are in a separate pandas dataframe (I've also tried changing this into a Series and an OrderedDict). I would like to locate the id with the min count in the df and use it for the first order in orders, then add 1 to that id's count, and repeat until all the orders are filled.
Reproducible Example:
df = pd.DataFrame({'Id':['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'], 'count':[2, 3, 5, 6, 8, 9, 12, 13, 15, 55]})
orders = pd.DataFrame({'order_id':['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8', 'a9', 'a10', 'a11', 'a12', 'a13']})
orders['newowner'] = ""
Owners:
df
Id count
0 a 2
1 b 3
2 c 5
3 d 6
4 e 8
5 f 9
6 g 12
7 h 13
8 i 15
9 j 55
Orders:
order_id newowner
0 a1
1 a2
2 a3
3 a4
4 a5
5 a6
6 a7
7 a8
8 a9
9 a10
10 a11
11 a12
12 a13
expected result:
order_id newowner
0 a1 a # brings a up to 3 records
1 a2 a # a and b are tied with 3, so it goes to a again (doesn't matter which gets it first)
2 a3 b # now b has 3, and a has 4, so it goes to b
3 a4 a # both have 4 so a
4 a5 b # etc.
5 a6 a
6 a7 b
7 a8 c
8 a9 a
9 a10 b
10 a11 c
11 a12 a
12 a13 b
I've tried finding the min of df.count, and also tried looping through each order, but I'm having a hard time isolating each order.
for order in orders.iteritems():
    order['newowner'] = df.count.min()
for order in orders.iteritems():
    for name in df.iteritems:
        idx = df[df.count == df.count.min()]['Id']
        order['newonwer'] = idx
Here is one way via Series.apply:
def set_owner(order_id):
    # pick the owner with the lowest count (the first one on ties) and bump its count
    min_idx = df['count'].idxmin()
    df.loc[min_idx, 'count'] += 1
    return df.loc[min_idx, 'Id']

orders['newowner'] = orders['order_id'].apply(set_owner)
orders
# order_id newowner
# 0 a1 a
# 1 a2 a
# 2 a3 b
# 3 a4 a
# 4 a5 b
# 5 a6 a
# 6 a7 b
# 7 a8 c
# 8 a9 a
# 9 a10 b
# 10 a11 c
# 11 a12 d
# 12 a13 a
df
# Id count
# 0 a 8
# 1 b 7
# 2 c 7
# 3 d 7
# 4 e 8
# 5 f 9
# 6 g 12
# 7 h 13
# 8 i 15
# 9 j 55
I'm not sure this is the way I'd do it; I'd probably look for a way to use df.apply if possible. But I think this code will give you the expected results.
for idx, order in orders.iterrows():
    idxmin = df['count'].idxmin()
    df.loc[idxmin, 'count'] += 1
    # write through orders.loc: assigning to the row copy from iterrows would not update orders
    orders.loc[idx, 'newowner'] = df.loc[idxmin, 'Id']
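Both answers above update the counts in df as a side effect. If the owner table should stay unchanged, a min-heap is one possible alternative. The sketch below uses the standard-library heapq module and is not part of either answer; ties are broken by original position, which mirrors what idxmin does.

```python
import heapq
import pandas as pd

df = pd.DataFrame({'Id': list('abcdefghij'),
                   'count': [2, 3, 5, 6, 8, 9, 12, 13, 15, 55]})
orders = pd.DataFrame({'order_id': [f'a{i}' for i in range(1, 14)]})

# heap entries are (count, original position, Id); the position breaks ties
# in favour of the earlier owner, like idxmin does
heap = [(int(c), pos, owner)
        for pos, (owner, c) in enumerate(zip(df['Id'], df['count']))]
heapq.heapify(heap)

assigned = []
for _ in range(len(orders)):
    count, pos, owner = heapq.heappop(heap)        # owner with the lowest count
    assigned.append(owner)
    heapq.heappush(heap, (count + 1, pos, owner))  # that owner now has one more order

orders['newowner'] = assigned
print(orders)
```

Each assignment costs O(log n) in the number of owners rather than a full scan for the minimum, which may matter when there are many owners.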

How to count the following number of rows in pandas (new)

This relates to the question below. I would like to count the number of following rows.
Thanks to the answer there I could handle my data,
but I ran into some trouble and exceptions.
How to count the number of following rows in pandas
A B
1 a0
2 a1
3 b1
4 a0
5 b2
6 a2
7 a2
First, I would like to split df at every row where B starts with "a":
df1
A B
1 a0
df2
A B
2 a1
3 b1
df3
A B
4 a0
5 b2
df4
A B
6 a2
df5
A B
7 a2
Then I would like to count the rows of each df:
"a" number
a0 1
a1 2
a0 2
a2 1
a2 1
How could this be done?
I would be glad if someone could tell me how to handle this kind of problem.
You can aggregate by a custom Series created with cumsum:
print (df.B.str.startswith("a").cumsum())
0 1
1 2
2 2
3 3
4 3
5 4
6 5
Name: B, dtype: int32
df1 = df.B.groupby(df.B.str.startswith("a").cumsum()).agg(['first', 'size'])
df1.columns =['"A"','number']
df1.index.name = None
print (df1)
"A" number
1 a0 1
2 a1 2
3 a0 2
4 a2 1
5 a2 1
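The same cumsum key can also produce the individual pieces (df1 to df5 in the question) if they are needed as separate dataframes. A minimal sketch, assuming the sample data from the question; key and pieces are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 8),
                   'B': ['a0', 'a1', 'b1', 'a0', 'b2', 'a2', 'a2']})

# every row starting with "a" opens a new group
key = df['B'].str.startswith('a').cumsum()
pieces = [group for _, group in df.groupby(key)]

for piece in pieces:
    print(piece, end='\n\n')
```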

Pandas hierarchical columns

Say I have two dataframes, is it possible to concatenate them by columns, but with the second one appearing as a single column in the concatenated dataframe?
Pictorially, I'm looking for:
df_A:
C1 C2 C3
1 2 3
11 22 33
df_B:
D1 D2 D3
3 4 5
33 44 55
Concatenated:
C1  C2  C3  df_B
            D1  D2  D3
1   2   3   3   4   5
11  22  33  33  44  55
You can construct a MultiIndex to create a DataFrame with the desired appearance:
import pandas as pd
df_A = pd.DataFrame([(1,2,3), (11,22,33)], columns=['C1', 'C2', 'C3'])
df_B = pd.DataFrame([(3,4,5), (33,44,55)], columns=['D1', 'D2', 'D3'])
result = pd.concat([df_A, df_B], axis=1)
result.columns = pd.MultiIndex.from_tuples([(col, '') for col in df_A]
                                           + [('df_B', col) for col in df_B])
print(result)
yields
   C1  C2  C3 df_B
               D1  D2  D3
0   1   2   3    3   4   5
1  11  22  33   33  44  55
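With the columns set up this way, the df_B block can be pulled back out by its top-level key. A short usage sketch, where sub and c1 are illustrative names:

```python
import pandas as pd

df_A = pd.DataFrame([(1, 2, 3), (11, 22, 33)], columns=['C1', 'C2', 'C3'])
df_B = pd.DataFrame([(3, 4, 5), (33, 44, 55)], columns=['D1', 'D2', 'D3'])

result = pd.concat([df_A, df_B], axis=1)
result.columns = pd.MultiIndex.from_tuples(
    [(col, '') for col in df_A] + [('df_B', col) for col in df_B])

sub = result['df_B']      # the top-level key returns the whole D1..D3 block
c1 = result[('C1', '')]   # the full tuple selects a single column
print(sub)
print(c1)
```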
