I have a data frame that contains a column like the following:
1 string;string
2 string;string;string
I would like to go through the whole column and replace each value with the count of ";" plus 1 (i.e. the number of strings), to get:
1 2
2 3
Thank you for any help.
You can use the str.count function:
print (df)
col
1 string;string
2 string;string;string
df['col'] = df['col'].str.count(';') + 1
print (df)
col
1 2
2 3
Or, equivalently, with add:
df['col'] = df['col'].str.count(';').add(1)
print (df)
col
1 2
2 3
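Both forms are equivalent; here is a minimal runnable sketch, assuming the column holds non-null strings. For columns that may contain missing values, splitting and measuring the list length is a common alternative, since it propagates NaN instead of raising:

```python
import pandas as pd

df = pd.DataFrame({"col": ["string;string", "string;string;string"]})

# count the separators and add 1 to get the number of pieces
counts = df["col"].str.count(";") + 1

# equivalent: split on ";" and take the length of each resulting list
counts_alt = df["col"].str.split(";").str.len()

print(counts.tolist())  # [2, 3]
```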
I have the following DataFrame in my Python project:
df1 = pd.DataFrame({"Col A":[1,2,3],"Col B":[3,2,2]})
I wish to order it in this kind of way:
df2 = pd.DataFrame({"Col A":[1,3,2],"Col B":[3,2,2]})
My goal is for each value in Col A to line up with the previous row's value in Col B.
Do you have any idea how to make this work properly, with as little effort as possible?
I tried working with .sort_values(by=), but that is also where my current knowledge stops.
If you need to roll the values one position within each Col B group, use a lambda function with np.roll (this assumes numpy is imported as np):
df1 = pd.DataFrame({"Col A":[1,2,3,7,4,8],"Col B":[3,2,2,1,1,1]})
print (df1)
Col A Col B
0 1 3
1 2 2
2 3 2
3 7 1
4 4 1
5 8 1
df1['Col A'] = df1.groupby('Col B')['Col A'].transform(lambda x: np.roll(x, -1))
print (df1)
Col A Col B
0 1 3
1 3 2
2 2 2
3 4 1
4 8 1
5 7 1
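Put together, a minimal runnable sketch of the groupby/np.roll approach (note the numpy import it relies on):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Col A": [1, 2, 3, 7, 4, 8], "Col B": [3, 2, 2, 1, 1, 1]})

# within each Col B group, shift every Col A value one position earlier,
# wrapping the first value of the group around to the end
df1["Col A"] = df1.groupby("Col B")["Col A"].transform(lambda x: np.roll(x, -1))

print(df1["Col A"].tolist())  # [1, 3, 2, 4, 8, 7]
```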
Yes, you can achieve the desired output by using sort_values() together with a mapping dictionary, like so:
import pandas as pd
df1 = pd.DataFrame({"Col A":[1,2,3],"Col B":[3,2,2]})
# mapping_dict for ordering
mapping_dict = {1:3, 3:2, 2:2}
df1["sort_order"] = df1["Col A"].map(mapping_dict)
df2 = df1.sort_values(by="sort_order").drop(columns=["sort_order"])
print(df2)
Output:
Col A Col B
0 1 3
2 3 2
1 2 2
I'd like to know how to convert a column containing a single character in each row into the integer corresponding to its alphabetic position. From this:
to this:
Any hints on how to solve it?
Thanks! :)
Here is an alternative solution using pandas.Series.map and enumerate.
import pandas as pd
import string
df = pd.DataFrame({"col": ["A", "B", "C"]})
# col
# 0 A
# 1 B
# 2 C
df.col.map(
{letter: index for index, letter in enumerate(string.ascii_uppercase, start=1)}
)
# 0 1
# 1 2
# 2 3
# Name: col, dtype: int64
Use Series.rank:
df['col'] = df['col'].rank(method='dense').astype(int)
print (df)
col
0 3
1 1
2 2
3 1
4 1
Or, if all values are uppercase letters, use ord (64 is ord('A') - 1):
df['col'] = df['col'].apply(ord) - 64
print (df)
col
0 3
1 1
2 2
3 1
4 1
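A runnable sketch of both variants on the same sample data. One caveat worth hedging: Series.rank gives each letter its position among the values actually present, which equals the alphabetic index only when no letters are skipped; the ord arithmetic always gives the true alphabetic position, but only works for single uppercase letters:

```python
import pandas as pd

df = pd.DataFrame({"col": ["C", "A", "B", "A", "A"]})

# dense rank: ties share a rank and used ranks are consecutive
ranked = df["col"].rank(method="dense").astype(int)

# code-point arithmetic: 'A' -> 1, 'B' -> 2, ... (uppercase letters only)
coded = df["col"].apply(ord) - 64

print(ranked.tolist())  # [3, 1, 2, 1, 1]
print(coded.tolist())   # [3, 1, 2, 1, 1]
```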
I have a single column dataframe:
col1
1
2
3
4
I need to create another column where it will be a string like:
Result:
col1 col2
1 Value is 1
2 Value is 2
3 Value is 3
4 Value is 4
I know about formatted strings, but I am not sure how to apply them in a dataframe.
Convert the column to string and prepend the text:
df['col2'] = 'Value is ' + df['col1'].astype(str)
Or use f-strings with Series.map:
df['col2'] = df['col1'].map(lambda x: f'Value is {x}')
print (df)
col1 col2
0 1 Value is 1
1 2 Value is 2
2 3 Value is 3
3 4 Value is 4
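A runnable sketch of both variants; they produce identical strings, with the f-string form easier to extend to more elaborate templates:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4]})

# string concatenation after casting the numbers to str
concat = "Value is " + df["col1"].astype(str)

# f-string applied per element via map
mapped = df["col1"].map(lambda x: f"Value is {x}")

print(concat.tolist())  # ['Value is 1', 'Value is 2', 'Value is 3', 'Value is 4']
```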
I have a table like
key Name
1 snake
2 panda
3 parrot
4 catipie
5 cattie
Now I want to count the occurrences of the first character of each row, sorted by count in descending order; if there is a tie, it should sort in lexical order. My output should look like:
c 2
p 2
s 1
Select the first character by indexing with str[0] and count with value_counts:
s = df['Name'].str[0].value_counts()
print (s)
p 2
c 2
s 1
Name: Name, dtype: int64
For a DataFrame, add rename_axis with reset_index:
df = df['Name'].str[0].value_counts().rename_axis('first').reset_index(name='count')
print (df)
first count
0 p 2
1 c 2
2 s 1
If ties in count need to be broken alphabetically, add sort_values (count descending first, then letter ascending):
df = df.sort_values(['count','first'], ascending=[False, True])
print (df)
first count
1 c 2
0 p 2
2 s 1
And for Series:
s = df.set_index('first')['count']
print (s)
first
c 2
p 2
s 1
Name: count, dtype: int64
Finally, use to_string to drop the header:
print (s.to_string(header=None))
c 2
p 2
s 1
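Putting the chain together as one runnable sketch; the sort keys are ordered count-first so that ties in count fall back to lexical order:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["snake", "panda", "parrot", "catipie", "cattie"]})

out = (
    df["Name"].str[0]                    # first character of each name
      .value_counts()                    # occurrences per character
      .rename_axis("first")
      .reset_index(name="count")
      .sort_values(["count", "first"],   # count descending, ties lexically
                   ascending=[False, True])
)
print(out.to_string(index=False, header=False))
```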
I have a Series that looks the following:
col
0 B
1 B
2 A
3 A
4 A
5 B
It's a time series, therefore the index is ordered by time.
For each row, I'd like to count how many times the value has appeared consecutively, i.e.:
Output:
col count
0 B 1
1 B 2
2 A 1 # Value does not match previous row => reset counter to 1
3 A 2
4 A 3
5 B 1 # Value does not match previous row => reset counter to 1
I found 2 related questions, but I can't figure out how to "write" that information as a new column in the DataFrame, for each row (as above). Using rolling_apply does not work well.
Related:
Counting consecutive events on pandas dataframe by their index
Finding consecutive segments in a pandas data frame
I think there is a nice way to combine the solutions of @chrisb and @CodeShaman (as was pointed out, @CodeShaman's solution counts total and not consecutive values).
df['count'] = df.groupby((df['col'] != df['col'].shift(1)).cumsum()).cumcount()+1
col count
0 B 1
1 B 2
2 A 1
3 A 2
4 A 3
5 B 1
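A runnable sketch of the combined one-liner: the inner comparison marks where the value changes, cumsum turns those marks into run ids, and cumcount numbers the rows within each run:

```python
import pandas as pd

df = pd.DataFrame({"col": ["B", "B", "A", "A", "A", "B"]})

# True wherever the value differs from the previous row;
# cumsum then gives every consecutive run its own id
run_id = (df["col"] != df["col"].shift(1)).cumsum()

# number the rows within each run, starting at 1
df["count"] = df.groupby(run_id).cumcount() + 1

print(df["count"].tolist())  # [1, 2, 1, 2, 3, 1]
```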
One-liner:
df['count'] = df.groupby('col').cumcount()
or
df['count'] = df.groupby('col').cumcount() + 1
if you want the counts to begin at 1. Note that this counts all previous occurrences of each value, not consecutive runs, so the counter never resets when the value changes back.
Based on the second answer you linked, assuming s is your series:
df = pd.DataFrame(s)
df['block'] = (df['col'] != df['col'].shift(1)).astype(int).cumsum()
df['count'] = df.groupby('block')['col'].transform(lambda x: np.arange(1, len(x) + 1))  # needs numpy imported as np
In [88]: df
Out[88]:
col block count
0 B 1 1
1 B 1 2
2 A 2 1
3 A 2 2
4 A 2 3
5 B 3 1
I like the answer by @chrisb, but I wanted to share my own solution, since some people might find it more readable and easier to adapt to similar problems.
1) Create a function that uses static variables
def rolling_count(val):
if val == rolling_count.previous:
rolling_count.count +=1
else:
rolling_count.previous = val
rolling_count.count = 1
return rolling_count.count
rolling_count.count = 0 #static variable
rolling_count.previous = None #static variable
2) Apply it to your Series after converting it to a DataFrame
df = pd.DataFrame(s)
df['count'] = df['col'].apply(rolling_count) #new column in dataframe
Output of df:
col count
0 B 1
1 B 2
2 A 1
3 A 2
4 A 3
5 B 1
If you wish to do the same thing but filter on two columns, you can use this.
def count_consecutive_items_n_cols(df, col_name_list, output_col):
cum_sum_list = [
(df[col_name] != df[col_name].shift(1)).cumsum().tolist() for col_name in col_name_list
]
df[output_col] = df.groupby(
["_".join(map(str, x)) for x in zip(*cum_sum_list)]
).cumcount() + 1
return df
col_a col_b count
0 1 B 1
1 1 B 2
2 1 A 1
3 2 A 1
4 2 A 2
5 2 B 1
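A runnable sketch of the two-column variant, with sample data chosen to match the output above (the col_a/col_b names come from that example):

```python
import pandas as pd

def count_consecutive_items_n_cols(df, col_name_list, output_col):
    # one change-marker cumsum per column; a new run starts whenever
    # any of the watched columns changes value
    cum_sum_list = [
        (df[c] != df[c].shift(1)).cumsum().tolist() for c in col_name_list
    ]
    # join the per-column run ids into one composite key and number rows within it
    df[output_col] = df.groupby(
        ["_".join(map(str, x)) for x in zip(*cum_sum_list)]
    ).cumcount() + 1
    return df

df = pd.DataFrame({"col_a": [1, 1, 1, 2, 2, 2], "col_b": list("BBAAAB")})
df = count_consecutive_items_n_cols(df, ["col_a", "col_b"], "count")
print(df["count"].tolist())  # [1, 2, 1, 1, 2, 1]
```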