I have a data frame that contains a column like the following:
1 string;string
2 string;string;string
I would like to go through the whole column and replace each value with the count of ";" plus 1 (i.e. the number of strings), to get:
1 2
2 3
Thank you for any help.
You can use the str.count function:
print (df)
col
1 string;string
2 string;string;string
df['col'] = df['col'].str.count(';') + 1
print (df)
col
1 2
2 3
Or, equivalently, with add:
df['col'] = df['col'].str.count(';').add(1)
print (df)
col
1 2
2 3
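Both forms are equivalent; here is a minimal runnable sketch, assuming the column holds non-null strings. For columns that may contain missing values, splitting and measuring the list length is a common alternative, since it propagates NaN instead of raising:

```python
import pandas as pd

df = pd.DataFrame({"col": ["string;string", "string;string;string"]})

# count the separators and add 1 to get the number of pieces
counts = df["col"].str.count(";") + 1

# equivalent: split on ";" and take the length of each resulting list
counts_alt = df["col"].str.split(";").str.len()

print(counts.tolist())  # [2, 3]
```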
I have the following DataFrame in my Python project:
df1 = pd.DataFrame({"Col A":[1,2,3],"Col B":[3,2,2]})
I wish to order it in this kind of way:
df2 = pd.DataFrame({"Col A":[1,3,2],"Col B":[3,2,2]})
My goal is for each value in Col A to line up with the previous row's value in Col B.
Do you have any idea how to make this work properly, with as little effort as possible?
I tried working with .sort_values(by=), but that is also where my current knowledge stops.
If you need to roll the values one position within each Col B group, use a lambda function with np.roll (this assumes numpy is imported as np):
df1 = pd.DataFrame({"Col A":[1,2,3,7,4,8],"Col B":[3,2,2,1,1,1]})
print (df1)
Col A Col B
0 1 3
1 2 2
2 3 2
3 7 1
4 4 1
5 8 1
df1['Col A'] = df1.groupby('Col B')['Col A'].transform(lambda x: np.roll(x, -1))
print (df1)
Col A Col B
0 1 3
1 3 2
2 2 2
3 4 1
4 8 1
5 7 1
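Put together, a minimal runnable sketch of the groupby/np.roll approach (note the numpy import it relies on):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Col A": [1, 2, 3, 7, 4, 8], "Col B": [3, 2, 2, 1, 1, 1]})

# within each Col B group, shift every Col A value one position earlier,
# wrapping the first value of the group around to the end
df1["Col A"] = df1.groupby("Col B")["Col A"].transform(lambda x: np.roll(x, -1))

print(df1["Col A"].tolist())  # [1, 3, 2, 4, 8, 7]
```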
Yes, you can achieve the desired output by using sort_values() together with a mapping dictionary, like so:
import pandas as pd
df1 = pd.DataFrame({"Col A":[1,2,3],"Col B":[3,2,2]})
# mapping_dict for ordering
mapping_dict = {1:3, 3:2, 2:2}
df1["sort_order"] = df1["Col A"].map(mapping_dict)
df2 = df1.sort_values(by="sort_order").drop(columns=["sort_order"])
print(df2)
Output:
Col A Col B
0 1 3
2 3 2
1 2 2
I'd like to know how to convert a column containing a single character in each row into the integer corresponding to its alphabetic position. From this:
to this:
Any hints on how to solve it?
Thanks! :)
Here is an alternative solution using pandas.Series.map and enumerate.
import pandas as pd
import string
df = pd.DataFrame({"col": ["A", "B", "C"]})
# col
# 0 A
# 1 B
# 2 C
df.col.map(
{letter: index for index, letter in enumerate(string.ascii_uppercase, start=1)}
)
# 0 1
# 1 2
# 2 3
# Name: col, dtype: int64
Use Series.rank:
df['col'] = df['col'].rank(method='dense').astype(int)
print (df)
col
0 3
1 1
2 2
3 1
4 1
Or, if all values are uppercase letters, use ord (64 is ord('A') - 1):
df['col'] = df['col'].apply(ord) - 64
print (df)
col
0 3
1 1
2 2
3 1
4 1
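A runnable sketch of both variants on the same sample data. One caveat worth hedging: Series.rank gives each letter its position among the values actually present, which equals the alphabetic index only when no letters are skipped; the ord arithmetic always gives the true alphabetic position, but only works for single uppercase letters:

```python
import pandas as pd

df = pd.DataFrame({"col": ["C", "A", "B", "A", "A"]})

# dense rank: ties share a rank and used ranks are consecutive
ranked = df["col"].rank(method="dense").astype(int)

# code-point arithmetic: 'A' -> 1, 'B' -> 2, ... (uppercase letters only)
coded = df["col"].apply(ord) - 64

print(ranked.tolist())  # [3, 1, 2, 1, 1]
print(coded.tolist())   # [3, 1, 2, 1, 1]
```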
I have a single column dataframe:
col1
1
2
3
4
I need to create another column where it will be a string like:
Result:
col1 col2
1 Value is 1
2 Value is 2
3 Value is 3
4 Value is 4
I know about formatted strings, but I am not sure how to apply them in a dataframe.
Convert the column to string and prepend the text:
df['col2'] = 'Value is ' + df['col1'].astype(str)
Or use f-strings with Series.map:
df['col2'] = df['col1'].map(lambda x: f'Value is {x}')
print (df)
col1 col2
0 1 Value is 1
1 2 Value is 2
2 3 Value is 3
3 4 Value is 4
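A runnable sketch of both variants; they produce identical strings, with the f-string form easier to extend to more elaborate templates:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4]})

# string concatenation after casting the numbers to str
concat = "Value is " + df["col1"].astype(str)

# f-string applied per element via map
mapped = df["col1"].map(lambda x: f"Value is {x}")

print(concat.tolist())  # ['Value is 1', 'Value is 2', 'Value is 3', 'Value is 4']
```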
I have a table like
key Name
1 snake
2 panda
3 parrot
4 catipie
5 cattie
Now I want to count the occurrences of the first character of each row, sorted by count in descending order; if there is a tie, it should sort in lexical order. My output should look like:
c 2
p 2
s 1
Select the first character by indexing with str[0] and count with value_counts:
s = df['Name'].str[0].value_counts()
print (s)
p 2
c 2
s 1
Name: Name, dtype: int64
For a DataFrame, add rename_axis with reset_index:
df = df['Name'].str[0].value_counts().rename_axis('first').reset_index(name='count')
print (df)
first count
0 p 2
1 c 2
2 s 1
If ties in count need to be broken alphabetically, add sort_values (count descending first, then letter ascending):
df = df.sort_values(['count','first'], ascending=[False, True])
print (df)
first count
1 c 2
0 p 2
2 s 1
And for Series:
s = df.set_index('first')['count']
print (s)
first
c 2
p 2
s 1
Name: count, dtype: int64
Finally, use to_string to drop the header:
print (s.to_string(header=None))
c 2
p 2
s 1
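Putting the chain together as one runnable sketch; the sort keys are ordered count-first so that ties in count fall back to lexical order:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["snake", "panda", "parrot", "catipie", "cattie"]})

out = (
    df["Name"].str[0]                    # first character of each name
      .value_counts()                    # occurrences per character
      .rename_axis("first")
      .reset_index(name="count")
      .sort_values(["count", "first"],   # count descending, ties lexically
                   ascending=[False, True])
)
print(out.to_string(index=False, header=False))
```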
I have a Series that looks the following:
col
0 B
1 B
2 A
3 A
4 A
5 B
It's a time series, therefore the index is ordered by time.
For each row, I'd like to count how many times the value has appeared consecutively, i.e.:
Output:
col count
0 B 1
1 B 2
2 A 1 # Value does not match previous row => reset counter to 1
3 A 2
4 A 3
5 B 1 # Value does not match previous row => reset counter to 1
I found 2 related questions, but I can't figure out how to "write" that information as a new column in the DataFrame, for each row (as above). Using rolling_apply does not work well.
Related:
Counting consecutive events on pandas dataframe by their index
Finding consecutive segments in a pandas data frame
I think there is a nice way to combine the solutions of @chrisb and @CodeShaman (as was pointed out, @CodeShaman's solution counts total and not consecutive values).
df['count'] = df.groupby((df['col'] != df['col'].shift(1)).cumsum()).cumcount()+1
col count
0 B 1
1 B 2
2 A 1
3 A 2
4 A 3
5 B 1
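A runnable sketch of the combined one-liner: the inner comparison marks where the value changes, cumsum turns those marks into run ids, and cumcount numbers the rows within each run:

```python
import pandas as pd

df = pd.DataFrame({"col": ["B", "B", "A", "A", "A", "B"]})

# True wherever the value differs from the previous row;
# cumsum then gives every consecutive run its own id
run_id = (df["col"] != df["col"].shift(1)).cumsum()

# number the rows within each run, starting at 1
df["count"] = df.groupby(run_id).cumcount() + 1

print(df["count"].tolist())  # [1, 2, 1, 2, 3, 1]
```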
One-liner:
df['count'] = df.groupby('col').cumcount()
or
df['count'] = df.groupby('col').cumcount() + 1
if you want the counts to begin at 1. Note that this counts all previous occurrences of each value, not consecutive runs, so the counter never resets when the value changes back.
Based on the second answer you linked, assuming s is your series:
df = pd.DataFrame(s)
df['block'] = (df['col'] != df['col'].shift(1)).astype(int).cumsum()
df['count'] = df.groupby('block')['col'].transform(lambda x: np.arange(1, len(x) + 1))  # needs numpy imported as np
In [88]: df
Out[88]:
col block count
0 B 1 1
1 B 1 2
2 A 2 1
3 A 2 2
4 A 2 3
5 B 3 1
I like the answer by @chrisb, but I wanted to share my own solution, since some people might find it more readable and easier to adapt to similar problems.
1) Create a function that uses static variables
def rolling_count(val):
if val == rolling_count.previous:
rolling_count.count +=1
else:
rolling_count.previous = val
rolling_count.count = 1
return rolling_count.count
rolling_count.count = 0 #static variable
rolling_count.previous = None #static variable
2) Apply it to your Series after converting it to a DataFrame
df = pd.DataFrame(s)
df['count'] = df['col'].apply(rolling_count) #new column in dataframe
Output of df:
col count
0 B 1
1 B 2
2 A 1
3 A 2
4 A 3
5 B 1
If you wish to do the same thing but filter on two columns, you can use this.
def count_consecutive_items_n_cols(df, col_name_list, output_col):
cum_sum_list = [
(df[col_name] != df[col_name].shift(1)).cumsum().tolist() for col_name in col_name_list
]
df[output_col] = df.groupby(
["_".join(map(str, x)) for x in zip(*cum_sum_list)]
).cumcount() + 1
return df
col_a col_b count
0 1 B 1
1 1 B 2
2 1 A 1
3 2 A 1
4 2 A 2
5 2 B 1
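A runnable sketch of the two-column variant, with sample data chosen to match the output above (the col_a/col_b names come from that example):

```python
import pandas as pd

def count_consecutive_items_n_cols(df, col_name_list, output_col):
    # one change-marker cumsum per column; a new run starts whenever
    # any of the watched columns changes value
    cum_sum_list = [
        (df[c] != df[c].shift(1)).cumsum().tolist() for c in col_name_list
    ]
    # join the per-column run ids into one composite key and number rows within it
    df[output_col] = df.groupby(
        ["_".join(map(str, x)) for x in zip(*cum_sum_list)]
    ).cumcount() + 1
    return df

df = pd.DataFrame({"col_a": [1, 1, 1, 2, 2, 2], "col_b": list("BBAAAB")})
df = count_consecutive_items_n_cols(df, ["col_a", "col_b"], "count")
print(df["count"].tolist())  # [1, 2, 1, 1, 2, 1]
```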