I'd like to know how to convert a column containing a single character on each row to the integer corresponding to its alphabetic order. From this:
to this:
Any hints on how to solve it?
Thanks! :)
Here is an alternative solution using pandas.Series.map and enumerate.
import pandas as pd
import string
df = pd.DataFrame({"col": ["A", "B", "C"]})
# col
# 0 A
# 1 B
# 2 C
df.col.map(
{letter: index for index, letter in enumerate(string.ascii_uppercase, start=1)}
)
# 0 1
# 1 2
# 2 3
# Name: col, dtype: int64
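For illustration, the mapped result can be assigned back to the column. One thing to note (my addition, not part of the answer above): any value missing from the dict becomes NaN, so the input should contain only uppercase letters.

```python
import string
import pandas as pd

# Build the letter -> position mapping once
letter_to_index = {letter: i for i, letter in enumerate(string.ascii_uppercase, start=1)}

df = pd.DataFrame({"col": ["A", "B", "Z"]})
df["col"] = df["col"].map(letter_to_index)
print(df["col"].tolist())  # [1, 2, 26]
```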
Use Series.rank:
df['col'] = df['col'].rank(method='dense').astype(int)
print(df)
col
0 3
1 1
2 2
3 1
4 1
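One caveat worth spelling out: dense ranking numbers the values by their sorted order among the values actually present, not by absolute alphabet position (['B', 'D'] would become [1, 2], not [2, 4]). A small sketch reproducing the output above:

```python
import pandas as pd

# Input implied by the printed output: ['C', 'A', 'B', 'A', 'A']
df = pd.DataFrame({"col": ["C", "A", "B", "A", "A"]})
df["col"] = df["col"].rank(method="dense").astype(int)
print(df["col"].tolist())  # [3, 1, 2, 1, 1]
```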
Or, if all values are uppercase letters, use:
df['col'] = df['col'].apply(ord) - 64
print(df)
col
0 3
1 1
2 2
3 1
4 1
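If the column may also contain lowercase letters, one option (a sketch of my own, not from the answer above) is to normalize the case first:

```python
import pandas as pd

df = pd.DataFrame({"col": ["c", "A", "b"]})
# ord('A') == 65, so subtracting 64 maps 'A' -> 1 after uppercasing
df["col"] = df["col"].str.upper().apply(ord) - 64
print(df["col"].tolist())  # [3, 1, 2]
```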
I have the following DataFrame in my Python project:
df1 = pd.DataFrame({"Col A":[1,2,3],"Col B":[3,2,2]})
I wish to order it in this kind of way:
df2 = pd.DataFrame({"Col A":[1,3,2],"Col B":[3,2,2]})
My goal is that each value in Col A lines up with the previous row's value in Col B.
Do you have any idea how to make this work properly and with as little effort as possible?
I tried to work with .sort_values(by=) but that's also where my current knowledge stops.
If you need to roll Col A by one value within each Col B group, use a lambda function:
df1 = pd.DataFrame({"Col A":[1,2,3,7,4,8],"Col B":[3,2,2,1,1,1]})
print(df1)
Col A Col B
0 1 3
1 2 2
2 3 2
3 7 1
4 4 1
5 8 1
import numpy as np
df1['Col A'] = df1.groupby('Col B')['Col A'].transform(lambda x: np.roll(x, -1))
print(df1)
Col A Col B
0 1 3
1 3 2
2 2 2
3 4 1
4 8 1
5 7 1
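For reference, numpy.roll with a shift of -1 moves every element one position to the left and wraps the first element around to the end, which is what produces the rotation within each group:

```python
import numpy as np

a = np.array([7, 4, 8])
print(np.roll(a, -1).tolist())  # [4, 8, 7]
```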
Yes, you can achieve the desired output by using sort_values() and a mapping dictionary, like so:
import pandas as pd
df1 = pd.DataFrame({"Col A":[1,2,3],"Col B":[3,2,2]})
# mapping_dict: desired position of each Col A value
mapping_dict = {1: 1, 3: 2, 2: 3}
df1["sort_order"] = df1["Col A"].map(mapping_dict)
df2 = df1.sort_values(by="sort_order").drop(columns=["sort_order"])
print(df2)
Output:
Col A Col B
0 1 3
2 3 2
1 2 2
Suppose I have a dataframe df as shown below
qty
0 1.300
1 1.909
Now I want to extract only the integer portion of the qty column and the df should look like
qty
0 1
1 1
Tried using df['qty'].round(0) but didn't get the desired result, as it rounds off the number to the nearest integer.
Java has a function intValue() which does the desired operation. Is there a similar function in pandas ?
Convert values to integers by Series.astype:
df['qty'] = df['qty'].astype(int)
print(df)
qty
0 1
1 1
If the above does not work, it is possible to use numpy.modf to extract the values before the .:
import numpy as np
a, b = np.modf(df['qty'])
df['qty'] = b.astype(int)
print(df)
qty
0 1
1 1
Or split on the ., though this will be slow for a large DataFrame:
df['qty'] = df['qty'].astype(str).str.split('.').str[0].astype(int)
Or use numpy.floor:
df['qty'] = np.floor(df['qty']).astype(int)
You can use the method floordiv:
df['col'].floordiv(1).astype(int)
For example:
col
0 9.748333
1 6.612708
2 2.888753
3 8.913470
4 2.354213
Output:
0 9
1 6
2 2
3 8
4 2
Name: col, dtype: int64
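One caveat worth knowing (my note, not from the answers above): for negative values, astype(int) truncates toward zero, while floordiv and numpy.floor round toward negative infinity, so the approaches differ:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.9, -1.9])
print(s.astype(int).tolist())             # [1, -1]  truncates toward zero
print(np.floor(s).astype(int).tolist())   # [1, -2]  floors toward -inf
print(s.floordiv(1).astype(int).tolist()) # [1, -2]  same as floor
```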
I want to convert a dataframe like:
id event_type count
1 "a" 3
1 "b" 5
2 "a" 1
3 "b" 2
into a dataframe like:
id a b a > b
1 3 5 0
2 1 0 1
3 0 2 0
Without using for-loops. What's a proper pythonic (Pandas-tonic?) way of doing this?
Well, not sure if this is exactly what you need or if it has to be more flexible than this. However, this would be one way to do it - assuming missing values can be replaced by 0.
import pandas as pd
from io import StringIO
# Creating and reading the data
data = """
id event_type count
1 "a" 3
1 "b" 5
2 "a" 1
3 "b" 2
"""
df = pd.read_csv(StringIO(data), sep=r'\s+')
# Transforming
df_ = pd.pivot_table(df, index='id', values='count', columns='event_type') \
.fillna(0).astype(int)
df_['a > b'] = (df_['a'] > df_['b']).astype(int)
Where df_ will take the form:
event_type a b a > b
id
1 3 5 0
2 1 0 1
3 0 2 0
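An equivalent sketch using pd.crosstab (my suggestion, not part of the answer above) builds the same table directly from an in-memory DataFrame, without the StringIO round-trip:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "event_type": ["a", "b", "a", "b"],
    "count": [3, 5, 1, 2],
})
# Pivot counts into one column per event_type, filling missing combinations with 0
out = pd.crosstab(df["id"], df["event_type"],
                  values=df["count"], aggfunc="sum").fillna(0).astype(int)
out["a > b"] = (out["a"] > out["b"]).astype(int)
print(out)
```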
This can be split up into two parts:
pivot (see post)
assign a new column
Solution
df.set_index(
    ['id', 'event_type']
)['count'].unstack(
    fill_value=0
).assign(**{
    'a > b': lambda d: d.eval('a > b').astype(int)
})
I have a data frame that contains a column as the following:
1 string;string
2 string;string;string
I would like to iterate through the whole column and replace the values with the count of ";" + 1 (the number of strings) to get:
1 2
2 3
Thank you for any help.
You can use the str.count function:
print(df)
col
1 string;string
2 string;string;string
df['col'] = df['col'].str.count(';') + 1
print(df)
col
1 2
2 3
Alternatively, with add:
df['col'] = df['col'].str.count(';').add(1)
print(df)
col
1 2
2 3
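Another option (a sketch of my own, not from the answer above) is to split on the separator and take the length of each resulting list:

```python
import pandas as pd

df = pd.DataFrame({"col": ["string;string", "string;string;string"]}, index=[1, 2])
df["col"] = df["col"].str.split(";").str.len()
print(df["col"].tolist())  # [2, 3]
```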
I have a Series that looks the following:
col
0 B
1 B
2 A
3 A
4 A
5 B
It's a time series, therefore the index is ordered by time.
For each row, I'd like to count how many times the value has appeared consecutively, i.e.:
Output:
col count
0 B 1
1 B 2
2 A 1 # Value does not match previous row => reset counter to 1
3 A 2
4 A 3
5 B 1 # Value does not match previous row => reset counter to 1
I found 2 related questions, but I can't figure out how to "write" that information as a new column in the DataFrame, for each row (as above). Using rolling_apply does not work well.
Related:
Counting consecutive events on pandas dataframe by their index
Finding consecutive segments in a pandas data frame
I think there is a nice way to combine the solutions of @chrisb and @CodeShaman (as was pointed out, CodeShaman's solution counts total and not consecutive values).
df['count'] = df.groupby((df['col'] != df['col'].shift(1)).cumsum()).cumcount()+1
col count
0 B 1
1 B 2
2 A 1
3 A 2
4 A 3
5 B 1
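To see why the solution above works, it helps to inspect the intermediate block ids produced by comparing each value to its shifted neighbor:

```python
import pandas as pd

df = pd.DataFrame({"col": ["B", "B", "A", "A", "A", "B"]})
# A new block starts wherever the value differs from the previous row
blocks = (df["col"] != df["col"].shift(1)).cumsum()
print(blocks.tolist())  # [1, 1, 2, 2, 2, 3]
# Counting within each block restarts the counter at every value change
df["count"] = df.groupby(blocks).cumcount() + 1
print(df["count"].tolist())  # [1, 2, 1, 2, 3, 1]
```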
One-liner:
df['count'] = df.groupby('col').cumcount()
or
df['count'] = df.groupby('col').cumcount() + 1
if you want the counts to begin at 1.
Based on the second answer you linked, assuming s is your series.
df = pd.DataFrame(s)
df['block'] = (df['col'] != df['col'].shift(1)).astype(int).cumsum()
df['count'] = df.groupby('block').transform(lambda x: range(1, len(x) + 1))
In [88]: df
Out[88]:
col block count
0 B 1 1
1 B 1 2
2 A 2 1
3 A 2 2
4 A 2 3
5 B 3 1
I like the answer by @chrisb but wanted to share my own solution, since some people might find it more readable and easier to adapt to similar problems.
1) Create a function that uses static variables
def rolling_count(val):
    if val == rolling_count.previous:
        rolling_count.count += 1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count
rolling_count.count = 0  # static variable
rolling_count.previous = None  # static variable
2) apply it to your Series after converting to dataframe
df = pd.DataFrame(s)
df['count'] = df['col'].apply(rolling_count) #new column in dataframe
Output of df:
col count
0 B 1
1 B 2
2 A 1
3 A 2
4 A 3
5 B 1
If you wish to do the same thing but filter on two columns, you can use this.
def count_consecutive_items_n_cols(df, col_name_list, output_col):
    cum_sum_list = [
        (df[col_name] != df[col_name].shift(1)).cumsum().tolist()
        for col_name in col_name_list
    ]
    df[output_col] = df.groupby(
        ["_".join(map(str, x)) for x in zip(*cum_sum_list)]
    ).cumcount() + 1
    return df
col_a col_b count
0 1 B 1
1 1 B 2
2 1 A 1
3 2 A 1
4 2 A 2
5 2 B 1
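Restated in a self-contained form, with sample data inferred from the table above, the two-column version could be exercised like this:

```python
import pandas as pd

def count_consecutive_items_n_cols(df, col_name_list, output_col):
    # One block-id sequence per column: a new block starts at each value change
    cum_sum_list = [
        (df[col_name] != df[col_name].shift(1)).cumsum().tolist()
        for col_name in col_name_list
    ]
    # Combine the per-column block ids into a single composite group key
    df[output_col] = df.groupby(
        ["_".join(map(str, x)) for x in zip(*cum_sum_list)]
    ).cumcount() + 1
    return df

df = pd.DataFrame({"col_a": [1, 1, 1, 2, 2, 2],
                   "col_b": ["B", "B", "A", "A", "A", "B"]})
df = count_consecutive_items_n_cols(df, ["col_a", "col_b"], "count")
print(df["count"].tolist())  # [1, 2, 1, 1, 2, 1]
```

Note that the run of A's is split at row 3 even though col_b does not change there, because col_a does, which is the point of filtering on both columns.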