Selecting the row with the maximum value in a column in geopandas - python

I am using geopandas with Python. I would like to select the row with the maximum value of column "pop".
I am trying the solution given here: Find maximum value of a column and return the corresponding row values using Pandas:
city_join.loc[city_join['pop'].idxmax()]
However, it does not work: it raises "TypeError: reduction operation 'argmax' not allowed for this dtype". I think the reason is that that solution works for pandas and not for geopandas. Am I right? How can I select the row with the maximum value of column "pop" in a geopandas dataframe?

Check your dtype with print(df['columnName'].dtype) and make sure it is numeric (i.e. integer, float, ...). If it returns object, use df['columnName'].astype(float) instead.
Try city_join.loc[city_join['pop'].astype(float).idxmax()] if the pop column is of object dtype.
Or
You can convert the column to numeric first:
city_join['pop'] = pd.to_numeric(city_join['pop'])
and then run your original code: city_join.loc[city_join['pop'].idxmax()]
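As a quick sanity check, a plain-pandas sketch of the fix (the city names and populations here are made up; a GeoDataFrame behaves the same once pop is numeric):

```python
import pandas as pd

# Stand-in for the GeoDataFrame (geometry column omitted); 'pop' is
# stored as strings, which is what triggers the argmax TypeError
city_join = pd.DataFrame({'city': ['A', 'B', 'C'],
                          'pop': ['100', '2500', '730']})

city_join['pop'] = pd.to_numeric(city_join['pop'])  # object -> int64
row = city_join.loc[city_join['pop'].idxmax()]      # row with max pop
print(row['city'])  # -> B
```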


single positional indexer is out-of-bounds: an error

why dataframe.iloc[1,5] return an error but dataframe.iloc[[1,5]] not?
dataframe.iloc[1,5] raises an error because it asks for a single scalar value at the intersection of row position 1 and column position 5; if the DataFrame has fewer than six columns, that position does not exist, so pandas raises "IndexError: single positional indexer is out-of-bounds".
dataframe.iloc[[1,5]], on the other hand, passes a list, so it selects whole rows: the rows at positions 1 and 5. The output is a new DataFrame containing those specific rows, and it succeeds as long as both row positions exist.
The .iloc attribute accesses locations in the DataFrame by integer position. When you pass two arguments, .iloc[row, column], the second integer indexes the columns; when you pass a single list of integers, it indexes rows only.
So in short, dataframe.iloc[1,5] fails because column position 5 does not exist in your dataframe, while dataframe.iloc[[1,5]] selects row positions 1 and 5, which do exist.
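To make the difference concrete, a small sketch (the frame shape is chosen so that row position 5 exists but column position 5 does not):

```python
import pandas as pd
import numpy as np

# 6 rows x 3 columns: row positions 0-5 exist, column positions only 0-2
df = pd.DataFrame(np.arange(18).reshape(6, 3))

print(df.iloc[1, 2])    # scalar at row 1, column 2 -> 5
print(df.iloc[[1, 5]])  # DataFrame containing rows 1 and 5

try:
    df.iloc[1, 5]       # column position 5 does not exist
except IndexError as exc:
    print(exc)          # single positional indexer is out-of-bounds
```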

Do not convert numerical column names to float in pandas read_excel

I have an Excel file where a column name might be a number, e.g. 2839238. I am reading it using pd.read_excel(bytes(filedata), engine='openpyxl') and, for some reason, this column name gets converted to the float 2839238.0. How do I disable this conversion?
This is an issue for me because I then operate on column names using string-only methods like df = df.loc[:, ~df.columns.str.contains('^Unnamed')], and it gives me the following error:
TypeError: bad operand type for unary ~: 'float'
Column names are arbitrary.
Try changing the type of the column:
df['col'] = df['col'].astype(int)
Note that Excel stores numeric cells as floating-point values, which is why openpyxl hands a numeric header back as 2839238.0; casting it back to int (or str) undoes this.
Verify that you don't have any duplicate column names. Pandas will add .0 or .1 if there is another instance of 2839238 as a header name.
See the description of the mangle_dupe_cols (bool) parameter, which says:
Duplicate columns will be specified as ‘X’, ‘X.1’, …’X.N’, rather than ‘X’…’X’. Passing in False will cause data to be overwritten if there are duplicate names in the columns.
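If the conversion itself can't be disabled, one workaround is to normalise the headers after reading. A sketch, with a hand-built frame standing in for read_excel's output:

```python
import pandas as pd

# Stand-in for what read_excel produced: the numeric header came back
# as the float 2839238.0 instead of the string '2839238'
df = pd.DataFrame([[1, 2]], columns=[2839238.0, 'name'])

# Cast whole-number float headers back to clean strings
df.columns = [str(int(c)) if isinstance(c, float) and c.is_integer() else str(c)
              for c in df.columns]

print(df.columns.tolist())  # -> ['2839238', 'name']

# String-only methods now work again:
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
```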

create an int column after dividing a column by a number in pandas

Assume that I have a pandas dataframe with a column that holds seconds.
I want to create a new column that holds minutes, so I divide the sec column by 60. The problem I have is that the min column is no longer an integer. How can I make it an integer?
I have this code:
alldata['min']=alldata['sec']/60
I tried this, but it did not work:
alldata['min']=int(alldata['sec']/60)
I am getting this error:
TypeError: cannot convert the series to <class 'int'>
I tried this:
alldata["min"] = pd.to_numeric(alldata["min"], downcast='integer')
but I still have float values in min column.
How can I get integer values for min? (Just drop the value to its floor value.)
You should just use floor (integer) division:
alldata['min']=alldata['sec']//60
Try
alldata['min'] = alldata['min'].astype('int64')
or .astype('int32'), depending on your platform.
You can use .astype() to convert dataframe columns to different types. In your case use the following line:
alldata['min']= (alldata['sec']/60).astype("int")
See the DataFrame.astype documentation.
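Both suggestions side by side, on made-up data (// floors in one step; dividing and then casting also floors for non-negative values, since the int cast truncates):

```python
import pandas as pd

alldata = pd.DataFrame({'sec': [59, 60, 125, 3600]})

# Floor division keeps the column integer from the start
alldata['min'] = alldata['sec'] // 60
print(alldata['min'].tolist())   # -> [0, 1, 2, 60]

# Equivalent for non-negative values: divide, then cast
alldata['min2'] = (alldata['sec'] / 60).astype('int64')
print(alldata['min2'].tolist())  # -> [0, 1, 2, 60]
```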

Pandas: Find string in a column and replace them with numbers with incrementing values

I am working on a dataframe with multiple columns; one of the columns, with roughly 1000+ rows, contains string values. Kindly check the table below for more details:
In the above image I want to change the string values in the column Group_Number to numbers by picking the value from the first column (MasterGroup) and incrementing by one (01), so the values look like below:
I also need to ensure that if a string is duplicated, instead of being given a new number it is replaced with the number already assigned. For example, in the above image ANAYSIM is duplicated, and instead of a new sequence number I want the already-given number repeated for the duplicate string.
I have checked different links, but they focus on values supplied by the user:
Pandas DataFrame: replace all values in a column, based on condition
Change one value based on another value in pandas
Conditional Replace Pandas
Any help with achieving the desired outcome is highly appreciated.
We could use cumcount with groupby:
s = (df.groupby('MasterGroup').cumcount() + 1).mul(10).astype(str)
t = pd.to_datetime(df.Group_number, errors='coerce')
With errors='coerce', any value that does not parse becomes NaT, so t.isnull() marks exactly the rows to overwrite. Then we assign:
df.loc[t.isnull(), 'Group_number'] = df.MasterGroup.astype(str) + s
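The to_datetime trick assumes the existing Group_number values are date-like. Since the original table is only shown as an image, here is a more general sketch on invented data that also reuses the same number for a repeated string:

```python
import pandas as pd

# Invented sample: Group_Number mixes real numbers and string placeholders
df = pd.DataFrame({
    'MasterGroup': [1200, 1200, 1200, 1400],
    'Group_Number': ['ANAYSIM', '120002', 'ANAYSIM', 'BETA'],
})

mapping, counters, new_vals = {}, {}, []
for mg, gn in zip(df['MasterGroup'], df['Group_Number']):
    if str(gn).isdigit():                  # already numeric: keep as-is
        new_vals.append(gn)
    elif (mg, gn) in mapping:              # repeated string reuses its number
        new_vals.append(mapping[(mg, gn)])
    else:                                  # new string: next two-digit suffix
        counters[mg] = counters.get(mg, 0) + 1
        mapping[(mg, gn)] = f"{mg}{counters[mg]:02d}"
        new_vals.append(mapping[(mg, gn)])

df['Group_Number'] = new_vals
print(df['Group_Number'].tolist())  # -> ['120001', '120002', '120001', '140001']
```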

Pandas Groupby First Value in Column

Is there a way to get the first or last value in a particular column of a group in a pandas dataframe after performing a groupby?
For example, I want to get the first value in column_z, but this does not work:
df.groupby(by=['A', 'B']).agg({'x':np.sum, 'y':np.max, 'datetime':'count', 'column_z':first()})
The point of getting the first and last value in the group is I would like to eventually get the difference between the two.
I know there is this function: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
But I don't know how to use it for my use case: getting the first value in a particular column after grouping.
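One way to do this is with pandas' built-in 'first' and 'last' string aggregators (no separate first() function is needed); a sketch on made-up data that also takes the first/last difference:

```python
import pandas as pd

# Made-up data; 'A' plays the role of the groupby key
df = pd.DataFrame({
    'A': ['a', 'a', 'b', 'b'],
    'x': [1, 2, 3, 4],
    'column_z': [10, 30, 5, 45],
})

agg = df.groupby('A').agg(
    x_sum=('x', 'sum'),
    z_first=('column_z', 'first'),
    z_last=('column_z', 'last'),
)
agg['z_diff'] = agg['z_last'] - agg['z_first']
print(agg['z_diff'].tolist())  # -> [20, 40]
```

The dict style from the question also works if you pass the string instead of a function call: df.groupby(by=['A', 'B']).agg({'x': 'sum', 'column_z': 'first'}).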
