How to count all values of one column? [duplicate] - python

This question already has answers here:
Count the frequency that a value occurs in a dataframe column
(15 answers)
Closed 3 years ago.
I am trying to count how many times each value occurs in col_a.
For example:
col_a
A
B
C
A
D
B
A
Is there one line of code I can use that would tell me how many times each value (A, B, C, D) occurs in that column?

The solution is value_counts:
df.col_a.value_counts()

Or use groupby with size:
>>> df.groupby('col_a').size()
col_a
A    3
B    2
C    1
D    1
dtype: int64
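As a minimal, self-contained sketch of both answers, the snippet below rebuilds the sample column from the question and counts each value. The variable names counts and sizes are introduced here for illustration only.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"col_a": ["A", "B", "C", "A", "D", "B", "A"]})

# value_counts returns a Series indexed by the distinct values,
# ordered from most to least frequent.
counts = df["col_a"].value_counts()

# groupby + size gives the same counts, ordered by the group key instead.
sizes = df.groupby("col_a").size()

print(counts)
print(sizes)
```

Bracket access (df["col_a"]) is used instead of attribute access (df.col_a) because it also works when a column name clashes with a DataFrame method or contains spaces.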

Related

Repeat the values in one column to fill the empty cells in that column [duplicate]

This question already has answers here:
How to replace NaNs by preceding or next values in pandas DataFrame?
(10 answers)
Closed 1 year ago.
I have a dataframe like this:
A B
a
a
a b
a
a
a B
I want to fill the empty cells in column "B" with the existing values in "B", so that the end result is:
A B
a b
a b
a b
a B
a B
a B
I tried extracting column "B" as a pandas Series and removing the empty cells:
tmp=df['B']
tmp.dropna(axis=0, inplace=True, how=None)
Then I wanted to repeat each item in the tmp series three times and put it back into the original dataframe, but that failed.
My solution may not be a good one. Any suggestion could help!
Thanks in advance.
I cannot find a duplicate. bfill alone works only if the empty cells are missing values (NaN); if they are empty strings, convert them to NaN first:
df["B"] = df["B"].replace('', np.nan).bfill()
You need to replace the empty strings with NaN using replace, then back-fill them with bfill:
>>> df.replace('', np.nan).bfill()
   A  B
0  a  b
1  a  b
2  a  b
3  a  B
4  a  B
5  a  B
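A self-contained sketch of the back-fill approach, assuming the empty cells in the question are empty strings (if they are already NaN, the replace step is harmless):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; empty cells assumed to be ''.
df = pd.DataFrame({
    "A": ["a"] * 6,
    "B": ["", "", "b", "", "", "B"],
})

# Turn empty strings into NaN, then fill each NaN with the next valid value below it.
df["B"] = df["B"].replace("", np.nan).bfill()

print(df)
```

bfill fills upward from the next non-missing value, which matches the expected output in the question. If each value should instead be carried downward, ffill is the counterpart.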

Replace Value in column based on value in given list [duplicate]

This question already has answers here:
Pandas dataframe column value case insensitive replace where <condition>
(2 answers)
Closed 1 year ago.
I have a column in a data frame that is allowed to contain only values present in a defined list.
E.g.: given a list l1 = [1,2,5,6], I need to replace every value in the column that is not present in the list with 0.
column  Expected column
1       1
5       5
2       2
3       0
4       0
3       0
6       6
I have tried using loc:
df.loc[~l1, 0, df.column]
But this raises a TypeError. What is an efficient way in Python to replace the values?
df.loc[~df['column'].isin(l1), 'Expected column'] = 0
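A runnable sketch of this answer on the sample data. One assumption made here: 'Expected column' is first created as a copy of 'column', since assigning through loc to a column that does not exist yet would leave the unmatched rows as NaN. The name alt is hypothetical and shows an equivalent form using Series.where.

```python
import pandas as pd

l1 = [1, 2, 5, 6]
df = pd.DataFrame({"column": [1, 5, 2, 3, 4, 3, 6]})

# Start from a copy of the original values (assumption: the target column
# does not exist yet), then overwrite rows whose value is not in l1.
df["Expected column"] = df["column"]
df.loc[~df["column"].isin(l1), "Expected column"] = 0

# Equivalent form: keep values where the condition holds, use 0 elsewhere.
alt = df["column"].where(df["column"].isin(l1), 0)

print(df)
```

isin builds a boolean mask over the column, and ~ inverts it, which is what the original attempt with ~l1 was missing: a plain Python list cannot be negated or used as a row mask.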

How to aggregate duplicate rows in python? [duplicate]

This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
How to count duplicate rows in pandas dataframe?
(10 answers)
Closed 2 years ago.
I have a dataframe that looks like this:
Cell1 Cell2 Cell3
A B B
A B B
B B A
C B A
I am trying to get the following output:
Cell1 Cell2 Cell3 sum
A B B 2
B B A 1
C B A 1
I tried the aggregate function, but can't find the solution for this.
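No answer is recorded here, so the following is only a sketch of one possible approach, in line with the linked duplicates: group by all three columns and take the size of each group. It assumes the goal is to count identical rows; the output column name sum is taken from the desired output above.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Cell1": ["A", "A", "B", "C"],
    "Cell2": ["B", "B", "B", "B"],
    "Cell3": ["B", "B", "A", "A"],
})

# Group identical rows together and count how many rows fall in each group.
out = (
    df.groupby(["Cell1", "Cell2", "Cell3"])
      .size()
      .reset_index(name="sum")
)

print(out)
```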

Pandas merge two dataframes summing values [duplicate]

This question already has answers here:
how to merge two dataframes and sum the values of columns
(2 answers)
Closed 4 years ago.
Suppose I have two dataframes with partly repeated entries:
source1=pandas.DataFrame({'key':['a','b'],'value':[1,2]})
# key value
#0 a 1
#1 b 2
source2=pandas.DataFrame({'key':['b','c'],'value':[3,0]})
# key value
#0 b 3
#1 c 0
What do I need to do with source1 and source2 to get a resulting frame with the following entries:
# key value
#0 a 1
#1 b 5
#2 c 0
Just add
source1.set_index('key').add(source2.set_index('key'), fill_value=0)
If key is already the index, just use
source1.add(source2, fill_value=0)
You may want to call .reset_index() at the end if you don't want key as the index.
With grouping:
>>> pd.concat([source1, source2]).groupby('key', as_index=False).sum()
  key  value
0   a      1
1   b      5
2   c      0
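Both answers can be tried side by side with the sketch below. The names added and grouped are introduced here for illustration. Note that the add variant aligns the two frames on the index first, so its value column typically comes back as floats, while the groupby variant keeps integers.

```python
import pandas as pd

source1 = pd.DataFrame({"key": ["a", "b"], "value": [1, 2]})
source2 = pd.DataFrame({"key": ["b", "c"], "value": [3, 0]})

# Variant 1: align on key and add, treating keys missing from one side as 0.
added = (
    source1.set_index("key")
    .add(source2.set_index("key"), fill_value=0)
    .reset_index()
)

# Variant 2: stack both frames and sum the values per key.
grouped = pd.concat([source1, source2]).groupby("key", as_index=False).sum()

print(added)
print(grouped)
```

The groupby variant extends naturally to more than two frames, since pd.concat accepts a list of any length.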

translate dataframe with dictionary in python [duplicate]

This question already has answers here:
Remap values in pandas column with a dict, preserve NaNs
(11 answers)
Closed 5 years ago.
Having the following pandas Dataframe sample:
df = pd.DataFrame([[1,2],[1,2],[3,5]])
df
0 1
0 1 2
1 1 2
2 3 5
And the following dictionary:
d = {1:'foo',2:'bar',3:'tar',4:'tartar',5:'foofoo'}
I would like to "translate" the dataframe by using the dictionary d. The output looks like:
result = pd.DataFrame([['foo','bar'],['foo','bar'],['tar','foofoo']])
result
0 1
0 foo bar
1 foo bar
2 tar foofoo
I would like to avoid using for loops. The solution I'm trying to find is something with map or similar.
Solution
Replacing whole dataframe:
result_1 = df.replace(d)
Replacing a specific column of a dataframe:
result_2 = df.replace({"COLUMN":d})
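A self-contained sketch of both variants. The column names "x" and "y" are hypothetical, chosen only so the per-column form has a name to refer to (the question's dataframe uses the default integer labels 0 and 1). The last line shows Series.map, which the question mentions, as an alternative for a single column.

```python
import pandas as pd

d = {1: "foo", 2: "bar", 3: "tar", 4: "tartar", 5: "foofoo"}

# Hypothetical column names "x" and "y", used only for this illustration.
df = pd.DataFrame({"x": [1, 1, 3], "y": [2, 2, 5]})

# Replace every matching value in the whole dataframe.
result_1 = df.replace(d)

# Replace values in column "x" only; "y" is left untouched.
result_2 = df.replace({"x": d})

# Series.map also translates one column; values missing from d become NaN.
mapped = df["x"].map(d)

print(result_1)
print(result_2)
```

The main difference between the two methods: replace leaves values that are not keys of d unchanged, whereas map turns them into NaN.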
