This question already has answers here:
Finding count of distinct elements in DataFrame in each column
(8 answers)
Closed 5 years ago.
How can I count the number of unique values in each of several columns in pandas? I can do it for one column using the "nunique" function. I need something like:
print("Number of unique values in Var1", DF.var1.nunique(),sep="= ")
for all the variables in the dataset, perhaps with a loop or an apply function. I tried a lot of things but failed to get what I wanted.
Thanks for the help!
You want to print number of unique values per column, so use:
for k, v in df.nunique().to_dict().items():
    print('{}={}'.format(k, v))
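For reference, here is a minimal sketch of the same idea on a made-up DataFrame. The column names var1 and var2 and their values are assumptions for illustration, not the asker's data. Since DataFrame.nunique() returns a Series indexed by column name, iterating over its .items() directly should also work without the to_dict() step.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "var1": [1, 1, 2, 3],
    "var2": ["a", "b", "b", "b"],
})

# nunique() counts the distinct non-null values in each column
unique_counts = df.nunique()

# Print one line per column, in the format the question asks for
for col, n in unique_counts.items():
    print("Number of unique values in {}= {}".format(col, n))
```

By default nunique() ignores missing values; passing dropna=False counts NaN as a value of its own.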
This question already has answers here:
Python: get a frequency count based on two columns (variables) in pandas dataframe some row appears
(3 answers)
Closed last year.
I'm working on the following dataset:
and I want to count each value in the LearnCode column for each Age category. I've tried using the groupby method but didn't manage to get it right. Can anyone help with how to do it?
You can do this using a groupby on two columns:
results = df.groupby(by=['Age', 'LearnCode']).count()
This outputs, for each ['Age', 'LearnCode'] pair, the number of non-null values in each remaining column.
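If only the number of rows per pair is wanted, GroupBy.size() may be a closer fit than count(), because it does not depend on the other columns or on missing values. The sketch below uses invented Age and LearnCode values, since the original dataset is not shown.

```python
import pandas as pd

# Hypothetical data, for illustration only (the real dataset is not shown)
df = pd.DataFrame({
    "Age": ["18-24", "18-24", "25-34", "25-34", "25-34"],
    "LearnCode": ["Books", "Online", "Books", "Books", "Online"],
})

# size() counts rows per (Age, LearnCode) group;
# reset_index(name=...) turns the result into a flat DataFrame
pair_counts = (
    df.groupby(["Age", "LearnCode"])
      .size()
      .reset_index(name="count")
)
print(pair_counts)
```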
This question already has answers here:
Pandas Groupby and Sum Only One Column
(3 answers)
Closed 2 years ago.
I have a dataframe in pandas with the number of cows stolen per year:
stolen_each_year = data[['stolen cows','year_of_date' ]]
I would like to collapse the duplicate years so that each year appears once, with the sum of all its stolen cows.
I have an idea using a plain Python function, but I am trying to use pandas as much as possible.
Thank you for your time.
EDIT: I tried the .groupby method but it does not seem to work properly.
You can groupby and then sum the stolen cows.
stolen_each_year.groupby("year_of_date")['stolen cows'].sum()
also... interesting dataset 🐮...
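A small sketch of the groupby-and-sum answer above, using made-up numbers. The column names come from the question and the values are assumptions. Passing as_index=False is one way to keep year_of_date as a regular column in the result.

```python
import pandas as pd

# Hypothetical values, for illustration only
stolen_each_year = pd.DataFrame({
    "stolen cows": [3, 5, 2, 7],
    "year_of_date": [2015, 2015, 2016, 2016],
})

# One row per year, with the total number of stolen cows
yearly_totals = (
    stolen_each_year
    .groupby("year_of_date", as_index=False)["stolen cows"]
    .sum()
)
print(yearly_totals)
```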
This question already has answers here:
Sorting columns and selecting top n rows in each group pandas dataframe
(3 answers)
Closed 3 months ago.
I have a pandas dataframe with the following columns:
open_year, open_month, type, col1, col2, ....
I'd like to find the top type in each (year, month), so I first find the count of each type in each (year, month):
freq_df = df.groupby(['open_year','open_month','type']).size().reset_index()
freq_df.columns = ['open_year','open_month','type','count']
Then I want to find the top n types by frequency (i.e. count) for each (year, month). How can I do that?
I can use nlargest:
freq_df.groupby(['open_year','open_month'])['count'].nlargest(5)
but then the result is missing the type column.
I'd recommend sorting your counts in descending order first; you can then call GroupBy.head:
(freq_df.sort_values('count', ascending=False)
        .groupby(['open_year','open_month'], sort=False).head(5)
)
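The sort-then-head approach can be tried end to end on a toy frame. The data and the top_n value below are assumptions for illustration. The result keeps the type column because head() returns whole rows of freq_df.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "open_year":  [2020] * 6 + [2021] * 3,
    "open_month": [1] * 6 + [2] * 3,
    "type": ["a", "a", "a", "b", "b", "c", "x", "y", "y"],
})

# Count each type within each (year, month)
freq_df = (
    df.groupby(["open_year", "open_month", "type"])
      .size()
      .reset_index(name="count")
)

top_n = 2  # hypothetical choice of n

# Sort by count, then keep the first top_n rows of each (year, month) group
top_types = (
    freq_df.sort_values("count", ascending=False)
           .groupby(["open_year", "open_month"], sort=False)
           .head(top_n)
)
print(top_types)
```

Ties in count are broken by whatever order the sort leaves them in, so a secondary sort key may be worth adding if that matters.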
This question already has answers here:
Use a list of values to select rows from a Pandas dataframe
(8 answers)
Filter dataframe rows if value in column is in a set list of values [duplicate]
(7 answers)
Closed 4 years ago.
I have a pandas DataFrame, and I want to keep only the rows for certain locations and drop the rest.
I know that when selecting a specific value in a column to get a row I can use:
x = df[df['location']=='Aberdeen']
However, how can I do this for many locations without having to select each one individually and then concatenate the results?
This is what I've tried:
x = df[[df['location']=='Aberdeen' & 'Glasgow' & 'London']]
But I am receiving a:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I know there must be a super simple solution to this but I haven't figured it out, please help.
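As the first linked duplicate suggests, one common way to select rows by a list of values is Series.isin. The sketch below uses made-up rows. The location column name comes from the question, while the value column and the data are assumptions.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "location": ["Aberdeen", "Leeds", "Glasgow", "London", "Cardiff"],
    "value": [1, 2, 3, 4, 5],
})

wanted = ["Aberdeen", "Glasgow", "London"]

# isin() builds a boolean mask that is True where location is in the list
x = df[df["location"].isin(wanted)]
print(x)
```

Negating the mask with ~ selects the rows whose location is not in the list.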
This question already has answers here:
How to determine the length of lists in a pandas dataframe column
(2 answers)
Closed 5 years ago.
I currently have a dataframe with a column that contains lists of floats, and I want to add a second column that holds the length of each list (the number of items in it). What would be the easiest way to do this? Would I have to write a function that iterates over each item in the column?
This should work:
df['list_len'] = df['list_column'].str.len()
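A quick check of this on invented data (the list values are assumptions). The .str accessor's len() works element-wise on lists as well as on strings, so no explicit loop is needed.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "list_column": [[1.0, 2.5], [], [3.0, 4.0, 5.5]],
})

# Length of each list, computed element-wise
df["list_len"] = df["list_column"].str.len()
print(df)
```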