I have an example (the left one in the image).
There are several indexes in the first column, and the rows repeat in intervals. However, in the third repetition (and not just the third one, since I have data from more than a thousand repeating intervals) one of the repeating characters, 'GG', is missing.
Question: I want to add the missing rows (like 'GG') with the value NaN, and display the values in separate columns based on the characters of the repeated section (from 'II' to '//\n').
Is there any way to do this?
Assuming your data is a dataframe with all data as columns (if not, you first need to reset_index):
(df
#.reset_index() # uncomment if first column is index
 .assign(cols=lambda d: d.groupby('col1').cumcount())
.pivot(index='col1', columns='cols', values='col2')
)
output:
cols 0 1 2
col1
A 0.0 4.0 7.0
B 1.0 5.0 8.0
C 2.0 9.0 NaN
D 3.0 6.0 10.0
input:
col1 col2
0 A 0
1 B 1
2 C 2
3 D 3
4 A 4
5 B 5
6 D 6
7 A 7
8 B 8
9 C 9
10 D 10
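For reference, a minimal, self-contained sketch that rebuilds the input above and runs the chain end-to-end (column names col1/col2 as in the answer; the lambda keeps the cumcount working even if reset_index is uncommented):
import pandas as pd

df = pd.DataFrame({'col1': list('ABCDABDABCD'), 'col2': range(11)})

out = (df
 #.reset_index() # uncomment if first column is index
 .assign(cols=lambda d: d.groupby('col1').cumcount())
 .pivot(index='col1', columns='cols', values='col2')
)
print(out)  # C gets NaN in column 2, since its third occurrence is missing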
I have the following Dataset:
col value
0 A 1
1 A NaN
2 B NaN
3 B NaN
4 B NaN
5 B 1
6 C 3
7 C NaN
8 C NaN
9 D 5
10 E 6
There is only one value set per group; the rest is NaN. What I want to do now is fill the NaNs with the value of the group. If a group has no NaNs, I just want to ignore it.
Outcome should look like this:
col value
0 A 1
1 A 1
2 B 1
3 B 1
4 B 1
5 B 1
6 C 3
7 C 3
8 C 3
9 D 5
10 E 6
What I've tried so far is the following:
df["value"] = df.groupby(col).transform(lambda x: x.fillna(x.mean()))
However, this method is not only super slow, but doesn't give me the wished result.
Anybody an idea?
It depends on the data. If there is always at most one non-missing value per group, you can sort and then fill forward with GroupBy.ffill; this works even if some groups contain only NaNs:
df = df.sort_values(['col','value'])
df["value"] = df.groupby('col')["value"].ffill()
# simpler alternative if there is always exactly one non-missing value per group;
# fails if some group contains only NaNs:
# df["value"] = df["value"].ffill()
print (df)
col value
0 A 1.0
1 A 1.0
5 B 1.0
2 B 1.0
3 B 1.0
4 B 1.0
6 C 3.0
7 C 3.0
8 C 3.0
9 D 5.0
10 E 6.0
Or, if there are multiple values per group and NaNs need to be replaced by the group mean, you can improve the performance of your solution by using GroupBy.transform only for the mean and passing the result to Series.fillna:
df["value"] = df["value"].fillna(df.groupby('col')["value"].transform('mean'))
print (df)
col value
0 A 1.0
1 A 1.0
5 B 1.0
2 B 1.0
3 B 1.0
4 B 1.0
6 C 3.0
7 C 3.0
8 C 3.0
9 D 5.0
10 E 6.0
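As a side note (an alternative not shown in the answer above): if there is exactly one non-missing value per group, GroupBy.transform('first') broadcasts it to the whole group without any sorting, because first skips NaNs:
df["value"] = df.groupby('col')["value"].transform('first')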
You can use ffill, which is the same as fillna() with method='ffill' (see the docs):
df["value"] = df["value"].ffill()
I am trying to group by in pandas, sort the values, and add a result column that shows what you need to add to get to the next row in the group; if a row is the last one in its group, the value should be replaced with the number 3. Does anyone have an idea how to do it?
import pandas as pd
df = pd.DataFrame({'label': 'a a b c b c'.split(), 'Val': [2, 6, 6, 4, 16, 8]})
df
label Val
0 a 2
1 a 6
2 b 6
3 c 4
4 b 16
5 c 8
I'd like the results as shown below: within each group you have to add 4 to 2 to get 6, so the groups are sorted by value. If there is no next value in the group, a NaN would be produced; it should be replaced with the value 3. I have shown below what the results should look like:
label Val Results
0 a 2 4.0
1 a 6 3.0
2 b 6 10.0
3 c 4 4.0
4 b 16 3.0
5 c 8 3.0
I tried this, and was thinking of shifting values up, but the problem is that the labels aren't sorted:
df['Results'] = df.groupby('label')['Val'].apply(lambda x: x - x.shift())
df
label Val Results
0 a 2 NaN
1 a 6 4.0
2 b 6 NaN
3 c 4 NaN
4 b 16 10.0
5 c 8 4.0
Hope someone can help! :D
Use groupby, diff and abs:
df['Results'] = abs(df.groupby('label')['Val'].diff(-1)).fillna(3)
label Val Results
0 a 2 4.0
1 a 6 3.0
2 b 6 10.0
3 c 4 4.0
4 b 16 3.0
5 c 8 3.0
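For clarity, the intermediate diff(-1) computes each value minus the next value within its group (hence the abs to flip the sign), and the last row of each group comes out as NaN, which fillna(3) then replaces:
print(df.groupby('label')['Val'].diff(-1))
# 0    -4.0
# 1     NaN
# 2   -10.0
# 3    -4.0
# 4     NaN
# 5     NaN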
Consider a dataframe which contains several groups of integers:
d = pd.DataFrame({'label': ['a','a','a','a','b','b','b','b'], 'value': [1,2,3,2,7,1,8,9]})
d
label value
0 a 1
1 a 2
2 a 3
3 a 2
4 b 7
5 b 1
6 b 8
7 b 9
For each of these groups of integers, each integer has to be greater than or equal to the previous one; if that is not the case, it takes on the value of the previous integer. I replace values using
s.where(~(s < s.shift()), s.shift())
which works fine for a single series. I can even group the dataframe, and loop through each extracted series:
grouped = d.groupby('label')['value']
for _, s in grouped:
print(s.where(~(s < s.shift()), s.shift()))
0 1.0
1 2.0
2 3.0
3 3.0
Name: value, dtype: float64
4 7.0
5 7.0
6 8.0
7 9.0
Name: value, dtype: float64
However, how do I now get these values back into my original dataframe?
Or, is there a better way to do this? I don't care for using .groupby and don't consider the for loop a pretty solution either...
IIUC, you can use cummax within the groupby, like this:
d['val_max'] = d.groupby('label')['value'].cummax()
print (d)
label value val_max
0 a 1 1
1 a 2 2
2 a 3 3
3 a 2 3
4 b 7 7
5 b 1 7
6 b 8 8
7 b 9 9
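If the goal is to write the clipped values back into the original column rather than a new one, the same cummax can simply overwrite value:
d['value'] = d.groupby('label')['value'].cummax()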
I'm new to Python and I need your assistance with getting a result where the column names become the rows and each row shows that column's average.
Here's an example:
columns
A B C
1 2 3
4 5 6
7 8 9
Expected result:
avg
A 4
B 5
C 6
I can do it easily in Excel by placing the columns in "Values" and then moving the values into rows to get the average, but I can't seem to do it in Python.
df = pd.DataFrame({'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]})
df
A B C
0 1 2 3
1 4 5 6
2 7 8 9
ser = df.mean()                  # the result is a Series
df = pd.DataFrame({'avg': ser})  # convert the Series into a DataFrame
df
avg
A 4.0
B 5.0
C 6.0
Using to_frame:
df.mean().to_frame('avg')
Out[186]:
     avg
A 4.0
B 5.0
C 6.0
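If you would rather have the labels as a regular column instead of the index, a small follow-up on the same idea (rename only gives the Series a column name before the index is reset):
df.mean().rename('avg').reset_index()
#   index  avg
# 0     A  4.0
# 1     B  5.0
# 2     C  6.0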
I am trying to turn multiple dataframes into a single one based on the values in the first column, but not every dataframe has the same values in the first column. Take this example:
df1:
A 4
B 6
C 8
df2:
A 7
B 4
F 3
full_df:
A 4 7
B 6 4
C 8
F 3
How do I do this using python and pandas?
You can use pandas merge with an outer join:
df1.merge(df2, on='first_column', how='outer')
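The frames in the question are shown without headers, so the shared key column has to be named first; a minimal sketch with assumed names key/v1/v2:
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'v1': [4, 6, 8]})
df2 = pd.DataFrame({'key': ['A', 'B', 'F'], 'v2': [7, 4, 3]})
print(df1.merge(df2, on='key', how='outer'))
#   key   v1   v2
# 0   A  4.0  7.0
# 1   B  6.0  4.0
# 2   C  8.0  NaN
# 3   F  NaN  3.0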
You can use pd.concat, remembering to align indices:
res = pd.concat([df1.set_index(0), df2.set_index(0)], axis=1)
print(res)
1 1
A 4.0 7.0
B 6.0 4.0
C 8.0 NaN
F NaN 3.0
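Both value columns come out labelled 1 (their original positional column name), so you may want to rename them afterwards, for example:
res.columns = ['from_df1', 'from_df2']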