Add a new counter based on an existing counter in Python pandas

I have a Series that looks like this:
col
0 1
1 2
2 3
3 4
4 1
5 2
6 3
7 1
8 2
9 3
10 1
11 2
and I would like to generate a second counter that looks like this
col col2
0 1 1
1 2 1
2 3 1
3 4 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
10 1 4
11 2 4
How can I do that in python?

If 1 always marks the start of a group, build a boolean mask by comparing with Series.eq, then take the cumulative sum of that mask with Series.cumsum:
df['col2'] = df['col'].eq(1).cumsum()
print (df)
col col2
0 1 1
1 2 1
2 3 1
3 4 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
10 1 4
11 2 4
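As a self-contained sketch of the same idea, the snippet below rebuilds the sample data from the question and keeps the intermediate mask in a variable (the name `starts` is only illustrative), so each step can be inspected:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"col": [1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 1, 2]})

# True wherever a new group starts, i.e. where the counter resets to 1
starts = df["col"].eq(1)

# Each True adds 1 to the running total, so the total is the group number
df["col2"] = starts.cumsum()

print(df)
```

This only works under the stated assumption that every group begins with the value 1.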

Related

Modify equivalent values in a column

I'm working with pandas and I want the "class" column to be binary: the value 1 should stay as it is, and 2 and 3 should be changed to 0. The sample below shows only a few rows; the full data has 70. How do I do it?
id class col2 col3
0 1 2 0
1 1 1 1
2 1 9 9
3 2 8 4
4 2 7 2
5 2 4 3
6 2 7 2
7 3 1 4
8 3 2 8
9 3 3 9
I want this result:
id class col2 col3
0 1 2 0
1 1 1 1
2 1 9 9
3 0 8 4
4 0 7 2
5 0 4 3
6 0 7 2
7 0 1 4
8 0 2 8
9 0 3 9
Try this:
df.loc[df['class'] > 1, 'class'] = 0
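A runnable sketch of this on the sample data, assuming the frame is built by hand as below. The second variant with eq(1) is an alternative not in the answer above; it gives the same result here because the only values are 1, 2 and 3:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "class": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "col2": [2, 1, 9, 8, 7, 4, 7, 1, 2, 3],
    "col3": [0, 1, 9, 4, 2, 3, 2, 4, 8, 9],
})

# Alternative (assumption): 1 where class equals 1, 0 everywhere else
binary = df["class"].eq(1).astype(int)

# Approach from the answer: overwrite every value greater than 1 with 0
df.loc[df["class"] > 1, "class"] = 0

print(df)
```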

Pandas Add an incremental number based on another column

Consider a dataframe with a column like this:
sequence
1
2
3
4
5
1
2
3
1
2
3
4
5
6
7
I wish to create a column that increments each time the sequence resets. The sequences are of variable length.
Such that I'd get something like:
sequence run
1 1
2 1
3 1
4 1
5 1
1 2
2 2
3 2
1 3
2 3
3 3
4 3
5 3
6 3
7 3
Try diff followed by cumsum:
df['run'] = df['sequence'].diff().ne(1).cumsum()
Out[349]:
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 3
9 3
10 3
11 3
12 3
13 3
14 3
Name: sequence, dtype: int32
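To see why this works, here is a small sketch on the sample data that keeps the intermediate steps in separate variables (`step` and `is_start` are illustrative names). It assumes that values within a run always increase by exactly 1:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"sequence": [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7]})

# Difference to the previous row: NaN for the first row, 1 inside a run,
# and some other value wherever the sequence resets
step = df["sequence"].diff()

# NaN != 1 is True, so the very first row also counts as the start of a run
is_start = step.ne(1)

# Running total of run starts gives the run number
df["run"] = is_start.cumsum()

print(df)
```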
Use groupby with cumcount, which numbers the occurrences of each value:
dataset['run'] = dataset.groupby('sequence').cumcount().add(1)
output example:
sequence run
y 1
a 1
g 1
a 2
b 1
a 3
b 2

Better way other than for loops

I want to create a DataFrame with the columns feature1, month and feature_segment. I have over 3,000 unique values in feature1 and 3 feature segments, and I need to pair each feature with every month and every feature segment.
For example, for feature1 = 1 the mapping should produce a data frame like this:
feature1 month feature_Segment
1 1 1
1 1 2
1 1 3
1 2 1
1 2 2
1 2 3
1 3 1
1 3 2
1 3 3
1 4 1
1 4 2
1 4 3
1 5 1
1 5 2
1 5 3
1 6 1
1 6 2
1 6 3
1 7 1
1 7 2
1 7 3
1 8 1
1 8 2
1 8 3
1 9 1
1 9 2
1 9 3
1 10 1
1 10 2
1 10 3
1 11 1
1 11 2
1 11 3
1 12 1
1 12 2
1 12 3
Is there any way to create this data frame without using a for loop?
The values for all the columns are held in lists.
Use itertools.product:
from itertools import product
import pandas as pd

feature = [1]
feature_Segment = [1,2,3]
month = range(1, 13)
# product yields every (feature, month, segment) combination
df = pd.DataFrame(product(feature, month, feature_Segment),
                  columns=['feature1','month','feature_Segment'])
print (df.head(10))
feature1 month feature_Segment
0 1 1 1
1 1 1 2
2 1 1 3
3 1 2 1
4 1 2 2
5 1 2 3
6 1 3 1
7 1 3 2
8 1 3 3
9 1 4 1
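The same pattern scales to more than one feature. The sketch below uses three made-up feature values (an assumption standing in for the 3,000 real ones) and also shows pd.MultiIndex.from_product as a pandas-only alternative that yields the same rows in the same order:

```python
from itertools import product

import pandas as pd

# Hypothetical inputs: three features instead of the single one above
features = [1, 2, 3]
months = range(1, 13)
segments = [1, 2, 3]
cols = ["feature1", "month", "feature_Segment"]

# Cartesian product via itertools, materialised as a list of tuples
df = pd.DataFrame(list(product(features, months, segments)), columns=cols)

# Alternative: let pandas build the product, then turn the index into columns
alt = pd.MultiIndex.from_product(
    [features, months, segments], names=cols
).to_frame(index=False)

print(df.shape)
print(alt.head())
```

With 3 features, 12 months and 3 segments, both frames have 3 * 12 * 3 = 108 rows.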

Repeating rows in a DataFrame based on a column

I have a dataframe now:
class1 class2 value value2
0 1 0 1 4
1 2 1 2 3
2 2 0 3 5
3 3 1 4 6
I want to repeat each row according to the difference between value and value2, and add an increment column that counts from value up to value2. The dataframe should look like this:
class1 class2 value value2 value3
0 1 0 1 4 1
1 1 0 1 4 2
2 1 0 1 4 3
3 1 0 1 4 4
4 2 1 2 3 2
5 2 1 2 3 3
6 2 0 3 5 3
7 2 0 3 5 4
8 2 0 3 5 5
9 3 1 4 6 4
10 3 1 4 6 5
11 3 1 4 6 6
I tried this:
def func(x):
    copy = x.copy()
    num = x.value2+1-x.value
    return pd.concat([copy]*num.values[0])
df = df.groupby(['class1','class2']).apply(lambda x:func(x))
But this causes an ordering problem, so I don't know how to add the column value3. I'd also like a more elegant way of doing it.
Can anyone help me? Thanks in advance
Compute the difference and call Index.repeat:
idx = df.index.repeat(df.value2 - df.value + 1)
Now, either use reindex:
df = df.reindex(idx).reset_index(drop=True)
Or loc:
df = df.loc[idx].reset_index(drop=True)
And you get
df
class1 class2 value value2
0 1 0 1 4
1 1 0 1 4
2 1 0 1 4
3 1 0 1 4
4 2 1 2 3
5 2 1 2 3
6 2 0 3 5
7 2 0 3 5
8 2 0 3 5
9 3 1 4 6
10 3 1 4 6
11 3 1 4 6
For the second part of your question, you'll need groupby.cumcount:
s = idx.to_series()
df['value3'] = df['value'] + s.groupby(idx).cumcount().values
df
class1 class2 value value2 value3
0 1 0 1 4 1
1 1 0 1 4 2
2 1 0 1 4 3
3 1 0 1 4 4
4 2 1 2 3 2
5 2 1 2 3 3
6 2 0 3 5 3
7 2 0 3 5 4
8 2 0 3 5 5
9 3 1 4 6 4
10 3 1 4 6 5
11 3 1 4 6 6
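Put together as one runnable sketch on the sample data. It differs slightly from the steps above in that it groups on the repeated index with groupby(level=0) before resetting it, rather than building a separate Series from idx; the name `out` is illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "class1": [1, 2, 2, 3],
    "class2": [0, 1, 0, 1],
    "value": [1, 2, 3, 4],
    "value2": [4, 3, 5, 6],
})

# Repeat each index label once per integer in the range value..value2
idx = df.index.repeat(df["value2"] - df["value"] + 1)
out = df.loc[idx].copy()

# cumcount numbers the copies of each original row 0, 1, 2, ...
offset = out.groupby(level=0).cumcount().to_numpy()
out["value3"] = out["value"].to_numpy() + offset

out = out.reset_index(drop=True)
print(out)
```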
Here's a sequence of things that would get you the desired output:
df.join(df
        .apply(lambda x: pd.Series(range(x.value, x.value2+1)), axis=1)
        .stack().astype(int)
        .reset_index(level=1, drop=True)
        .to_frame('value3')).reset_index(drop=True)
Out[]:
class1 class2 value value2 value3
0 1 0 1 4 1
1 1 0 1 4 2
2 1 0 1 4 3
3 1 0 1 4 4
4 2 1 2 3 2
5 2 1 2 3 3
6 2 0 3 5 3
7 2 0 3 5 4
8 2 0 3 5 5
9 3 1 4 6 4
10 3 1 4 6 5
11 3 1 4 6 6

With python dataframes add column of counts of rows that meet condition to each row that meets it

Say I have a python DataFrame with the following structure:
pd.DataFrame([[1,2,3,4],[1,2,3,4],[1,3,5,6],[1,4,6,7],[1,4,6,7],[1,4,6,7]])
Out[262]:
0 1 2 3
0 1 2 3 4
1 1 2 3 4
2 1 3 5 6
3 1 4 6 7
4 1 4 6 7
5 1 4 6 7
How can I add a column called 'ct' that holds, for each row, the number of rows in the DataFrame whose columns 1-3 match it? The DataFrame would look like this when completed:
0 1 2 3 ct
0 1 2 3 4 2
1 1 2 3 4 2
2 1 3 5 6 1
3 1 4 6 7 3
4 1 4 6 7 3
5 1 4 6 7 3
You can use groupby + transform + size:
df['ct'] = df.groupby([1,2,3])[1].transform('size')
#alternatively
#df['ct'] = df.groupby([1,2,3])[1].transform(len)
print (df)
0 1 2 3 ct
0 1 2 3 4 2
1 1 2 3 4 2
2 1 3 5 6 1
3 1 4 6 7 3
4 1 4 6 7 3
5 1 4 6 7 3
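For completeness, a self-contained sketch of the transform approach on the sample data. Unlike an aggregation, which would return one row per group, transform broadcasts each group's size back onto every row of that group, so the result can be assigned directly as a column:

```python
import pandas as pd

# Sample data from the question; columns are labelled 0..3 by default
df = pd.DataFrame([
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 3, 5, 6],
    [1, 4, 6, 7],
    [1, 4, 6, 7],
    [1, 4, 6, 7],
])

# Group on columns 1, 2 and 3, then write each group's row count to its rows
df["ct"] = df.groupby([1, 2, 3])[1].transform("size")

print(df)
```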
