This question already has answers here:
How to get value counts for multiple columns at once in Pandas DataFrame?
(14 answers)
Closed 2 years ago.
Hello, I have the following dataset:
I want to count the frequency of each value occurring across the entire dataset. I am aware of value_counts(), but it works only on individual columns, not on the entire dataset.
I used the following code:
df.value_counts()
But it results in an error:
AttributeError: 'DataFrame' object has no attribute 'value_counts'
Please could you help me count the frequency of values across the whole dataset?
You can use the stack function to stack all values in one column, and then use value_counts:
df.stack().value_counts()
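As a side note, newer pandas versions (1.1+) do have a DataFrame.value_counts, but it counts unique rows rather than individual cell values, so stack() is still the way to count across all cells. A minimal sketch with a toy frame (the column names here are made up):
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2], "b": [2, 3, 3]})

# stack() reshapes the frame into a single Series containing every cell,
# so value_counts() then counts frequencies across the whole dataset.
print(df.stack().value_counts())
# 2    3
# 3    2
# 1    1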
This question already has answers here:
GroupBy pandas DataFrame and select most common value
(13 answers)
Closed 4 months ago.
I can use mean and median with groupby with these lines:
newdf.groupby('dropoff_site')['load_weight'].mean()
newdf.groupby('dropoff_site')['load_weight'].median()
But when I use it for mode like this:
newdf.groupby('dropoff_site')['load_weight'].mode()
An error popped up, saying:
'SeriesGroupBy' object has no attribute 'mode'
What should I do?
Update:
From GroupBy pandas DataFrame and select most common value, I used
source2.groupby(['Country','City'])['Short name'].agg(pd.Series.mode)
as
newdf.groupby(['dropoff_site'])['load_weight'].agg(pd.Series.mode)
because my data is multimodal, but now the error is:
Must produce aggregated value
Try this...
newdf.groupby('dropoff_site')['load_weight'].agg(pd.Series.mode)
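If the "Must produce aggregated value" error persists (older pandas raises it when mode returns several tied values for a multimodal group), a common workaround, not part of the answer above, is to wrap the result so each group yields a single object:
# Sketch: return a list per group so multimodal groups still aggregate
# to one value (a list of all tied modes) instead of raising.
newdf.groupby('dropoff_site')['load_weight'].agg(lambda x: list(x.mode()))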
This question already has answers here:
How can repetitive rows of data be collected in a single row in pandas?
(3 answers)
pandas group by and find first non null value for all columns
(3 answers)
Closed 7 months ago.
Using iterrows to implement the logic takes a lot of time. Can someone suggest how I could optimize the code with vectorized operations or apply()?
Below is the input table. From a partition of (ITEMSALE, ITEMID), I need to populate rows with rank=1. If any column value is null in the rank=1 row, I need to populate the next available value from that column. This has to be done for all columns in the dataset.
Below is the expected output format.
I have tried the below logic using iterrows, where I am accessing values row-wise. Performance is too low with this method.
This should get you what you need
df.loc[df.loc[df['Item_ID'].isna()].groupby('Item_Sale')['Date'].idxmin()]
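A small sketch of how that line behaves, using hypothetical data, since the question's input table isn't included in the text:
import pandas as pd

# Hypothetical frame shaped to match the answer's column names.
df = pd.DataFrame({
    "Item_Sale": ["A", "A", "B", "B"],
    "Item_ID": [None, None, 10, None],
    "Date": pd.to_datetime(["2021-01-02", "2021-01-01", "2021-01-01", "2021-01-03"]),
})

# Among rows missing Item_ID, keep the earliest-dated row per Item_Sale.
print(df.loc[df.loc[df["Item_ID"].isna()].groupby("Item_Sale")["Date"].idxmin()])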
This question already has answers here:
How do I expand the output display to see more columns of a Pandas DataFrame?
(22 answers)
Closed 1 year ago.
I am a complete novice when it comes to Python so this might be badly explained.
I have a pandas dataframe with 2485 entries for the years 1960-2020. I want to know how many entries there are for each year, which I can easily get with the .value_counts() method. My issue is that when I print this, the output only shows the top 5 and bottom 5 entries rather than the count for every year. Is there a way to display all the value counts for all the years in the DataFrame?
Use pd.set_option and set display.max_rows to None:
>>> pd.set_option("display.max_rows", None)
Now you can display all rows of your dataframe.
Options and settings
pandas.set_option
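If you'd rather not change the setting globally, pandas also offers option_context to lift the limit for a single block (assuming your column is literally named year, as in the answer below):
import pandas as pd

# The row limit is raised only inside the with-block and restored afterwards.
with pd.option_context("display.max_rows", None):
    print(df["year"].value_counts())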
Supposing the DataFrame is named 'df', use:
counts = df.year.value_counts()
counts.to_csv('name.csv', index=False)
The terminal can't display every row, so it shows only the top and bottom entries and collapses the rest; try saving to a CSV and inspecting the records there.
This question already has answers here:
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 3 years ago.
I am fairly new to using Python and, having come from SQL, I have been using pandas to build reports from CSV files with reasonable success. I have been able to answer most of my questions thanks mainly to this site, but I don't seem to be able to find an answer to this one:
I have a dataframe with 2 columns. I want to group on the first column and display the lowest and highest alphabetical values from the second column, concatenated into a third column. I could do this fairly easily in SQL, but as I say, I am struggling to get my head around it in Python/Pandas.
example:
source data:
LINK_NAME, CITY_NAME
Linka, Citya
Linka, Cityz
Linkb, Cityx
Linkb, Cityc
Desired output:
LINK_NAME, LINKID
Linka, CityaCityz
Linkb, CitycCityx
Edit:
Sorry for missing part of your question.
To sort the strings within each group alphabetically, you could define a function to apply to the grouped items:
def first_and_last_alpha(series):
    sorted_series = series.sort_values()
    return "".join([sorted_series.iloc[0], sorted_series.iloc[-1]])
df.groupby("LINK_NAME")["CITY_NAME"].apply(first_and_last_alpha)
Original:
Your question seems to be a duplicate of this one.
The same effect, with your data, is achieved by:
df.groupby("LINK_NAME")["CITY_NAME"].apply(lambda x: "".join(x))
where df is your pandas.DataFrame object
In the future, it's good to provide a reproducible example, including anything you've attempted, before posting. For example, the output from df.to_dict() would allow me to recreate your example data instantly.
This question already has answers here:
Pandas column creation
(3 answers)
Accessing Pandas column using squared brackets vs using a dot (like an attribute)
(5 answers)
pandas dataframe where clause with dot versus brackets column selection
(1 answer)
Closed 5 years ago.
I just thought I added a column to a pandas dataframe with
df.NewColumn = np.log1p(df.ExistingColumn)
but when I looked, it wasn't there! No error was raised either. I executed it many times, not believing what I was seeing, but yeah, it wasn't there. So then I did this:
df['NewColumn'] = np.log1p(df.ExistingColumn)
and now the new column was there.
Does anyone know the reason for this confusing behaviour? I thought those two ways of operating on a column were equivalent.
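For what it's worth, a minimal sketch reproducing the behaviour. As I understand it, attribute-style assignment sets an ordinary Python attribute on the DataFrame object rather than creating a column (recent pandas versions even emit a UserWarning about this):
import numpy as np
import pandas as pd

df = pd.DataFrame({"ExistingColumn": [1.0, 2.0, 3.0]})

# Attribute assignment: sets a plain instance attribute, not a column,
# because NewColumn is not an existing column name.
df.NewColumn = np.log1p(df.ExistingColumn)
print("NewColumn" in df.columns)  # False

# Bracket assignment goes through DataFrame.__setitem__ and adds a column.
df["NewColumn"] = np.log1p(df.ExistingColumn)
print("NewColumn" in df.columns)  # True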