Limit pandas .loc method output within an iloc range - python

I am looking for the maximum value within my pandas dataframe, but only within a certain index range:
df.loc[df['Score'] == df['Score'].iloc[430:440].max()]
This gives me a pandas.core.frame.DataFrame type output with multiple rows.
I specifically need the integer index of the maximum value within iloc[430:440], and only the first index at which the maximum occurs.
Is there any way to limit the range of the .loc method?
Thank you

If you just want the index:
i = df['Score'].iloc[430:440].idxmax()
If you want to get the row as well:
df.loc[i]
If you want to get the first row in the entire dataframe with that value (rather than just within the range you specified originally):
df[df['Score'] == df['Score'].iloc[430:440].max()].iloc[0]
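A minimal sketch of the idxmax approach, using a small made-up frame and a shorter slice in place of the 440-row original (the column name 'Score' follows the question):

```python
import pandas as pd

# Toy stand-in for the original dataframe; only the column name is taken
# from the question, the values are invented.
df = pd.DataFrame({"Score": [3, 9, 1, 9, 5]})

# Restrict to positions 1..3 with an iloc slice, then take the label of
# the maximum -- idxmax returns only the first occurrence, even on ties.
i = df["Score"].iloc[1:4].idxmax()
print(i)            # index label of the first max inside the slice
print(df.loc[i])    # the full row at that label
```

Because idxmax already stops at the first occurrence, no extra deduplication step is needed.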

Related

How do I check for null values in dataframes that are beyond what can be shown in the console? [duplicate]

I have been trying to find the indices of all rows with null values in a particular column of a pandas dataframe in Python. If A is one of the entries in df.columns, I need to find the indices of each row with null values in A.
Supposing you need the indices as a list, one option would be:
df[df['A'].isnull()].index.tolist()
np.where(df['column_name'].isnull())[0]
np.where(Series_object) returns the indices of True occurrences in the column. So, you will be getting the indices where isnull() returned True.
The [0] is needed because np.where returns a tuple and you need to access the first element of the tuple to get the array of indices.
Similarly, to get the indices of all non-null values in the column, you can run
np.where(df['column_name'].notnull())[0]
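The two variants above can be compared side by side on a small invented frame (the column name 'A' follows the question). Note that np.where returns integer positions, while .index.tolist() returns index labels; with a default RangeIndex the two coincide:

```python
import numpy as np
import pandas as pd

# Small frame with a couple of missing values in column 'A'.
df = pd.DataFrame({"A": [1.0, None, 3.0, None]})

null_idx = np.where(df["A"].isnull())[0]         # positions of the NaN rows
label_idx = df[df["A"].isnull()].index.tolist()  # same rows, as index labels
print(null_idx)    # [1 3]
print(label_idx)   # [1, 3]
```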

How can I assign a list's elements to corresponding rows of a dataframe in pandas?

I have numbers in a List that should get assigned to certain rows of a dataframe consecutively.
List=[2,5,7,12….]
In my dataframe that looks similar to the below table, I need to do the following:
Each row where frame_index is 1 should get the next element of List as its "sequence_number":
The first time Frame_Index == 1, assign the first element of List as Sequence_number.
The next time Frame_index == 1, assign the second element of List as Sequence_number.
So my goal is to achieve a new dataframe like this:
I don't know which functions to use. If this weren't Python, I would use a for loop and check where frame_index == 1, but my dataset is large and I need a pythonic way to achieve this. I appreciate any help.
EDIT: I tried the following to fill with my List values to use fillna with ffill afterwards:
concatenated_df['Sequence_number']=[List[i] for i in
concatenated_df.index if (concatenated_df['Frame_Index'] == 1).any()]
But of course I'm getting "list index out of range" error.
I think you could do that in two steps.
Add column and fill with your list where frame_index == 1.
Use df.fillna() with method="ffill" kwarg.
import pandas as pd
df = pd.DataFrame({"frame_index": [1,2,3,4,1,2]})
sequence = [2,5]
df.loc[df["frame_index"] == 1, "sequence_number"] = sequence
df.ffill(inplace=True) # alias for df.fillna(method="ffill")
This leaves sequence_number as float64, which might be acceptable in your use case; if you want int64, you can force the dtype when creating the column (the .loc assignment) or cast it later.
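Putting the two steps together with the dtype cast mentioned above (same column names as the answer, made-up data):

```python
import pandas as pd

# Step 0: toy data -- a 1 in frame_index marks the start of a new block.
df = pd.DataFrame({"frame_index": [1, 2, 3, 4, 1, 2]})
sequence = [2, 5]

# Step 1: write the list only into the rows where frame_index == 1;
# the list length must match the number of matching rows.
df.loc[df["frame_index"] == 1, "sequence_number"] = sequence

# Step 2: forward-fill the gaps, then cast back to int.
df["sequence_number"] = df["sequence_number"].ffill().astype(int)
print(df["sequence_number"].tolist())   # [2, 2, 2, 2, 5, 5]
```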

Extract a column value based on conditions applied to other columns in pandas

This is the dataframe:
Here I am trying to fetch the 'start_planting_date' based on unique 'crop' value having the maximum 'count'.
pandas loc[] query:
Example: I want to know the start_planting_date for the crop == Maize having the maximum count value, i.e., 1087 in this case.
Can there be a better/more optimized way of writing this query?
In case you want only the max value for each group, try using:
df.groupby('crop').max()
You can use .groupby() on crop and get the index of maximum count for each crop with GroupBy.idxmax() and use .loc to locate the entries of those max indexes, as follows:
df.loc[df.groupby('crop')['count'].idxmax()]
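A runnable sketch of the idxmax-then-loc pattern, on hypothetical data shaped like the question (the column names 'crop', 'count', and 'start_planting_date' follow the question; the values, including the dates, are invented):

```python
import pandas as pd

# One row per (crop, planting date) pair with an observation count.
df = pd.DataFrame({
    "crop": ["Maize", "Maize", "Wheat", "Wheat"],
    "count": [1087, 500, 300, 900],
    "start_planting_date": ["2020-03-01", "2020-04-01",
                            "2020-10-01", "2020-11-01"],
})

# idxmax gives, per crop, the index label of the row with the largest
# count; .loc then pulls those full rows, so the date comes along.
best = df.loc[df.groupby("crop")["count"].idxmax()]
print(best[["crop", "start_planting_date"]])
```

Unlike df.groupby('crop').max(), which takes the maximum of each column independently, this keeps whole rows, so the date really belongs to the row with the maximum count.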

Select a single minimum value from a Pandas dataframe column instead of multiple

I want to get the minimum value per year from a dataframe (df_greater_TDS) column ('DTS38').
So I grouped by the year column and applied transform(min). However, as there are multiple minimum values, the min function returns multiple rows.
How do I get only one value, or here, a single row?
idx = df_greater_TDS.groupby('year')['DTS38'].transform(min)==df_greater_TDS['DTS38']
df_TDS=df_greater_TDS[idx]
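One way to get exactly one row per year is to swap the transform(min) mask for GroupBy.idxmin, which returns a single index label per group (the first occurrence on ties). A sketch on made-up data, with the names df_greater_TDS, 'year', and 'DTS38' taken from the question:

```python
import pandas as pd

# Toy stand-in; the 2020 minimum is deliberately tied so the
# transform(min) mask would keep two rows for that year.
df_greater_TDS = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021],
    "DTS38": [5.0, 5.0, 7.0, 3.0],
})

# idxmin yields one label per year, so .loc returns one row per year.
one_per_year = df_greater_TDS.loc[
    df_greater_TDS.groupby("year")["DTS38"].idxmin()
]
print(len(one_per_year))   # 2 -- one row per year, even with ties
```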

Comparing dataframe cell value to previous cell value

I am trying to iterate through each row of a pandas dataframe and compare a certain cell's value to the same cell in the previous row. I imagined this could be done using .shift(), but it is returning a series instead of the cell value.
I have seen some usage of groupby and iloc for accessing the value of a cell but not for iterative comparisons, and using some sort of incrementing counter method or manually storing the value of each cell and then comparing doesn't seem very efficient.
Here is what I imagined would work, but no joy.
for index, row in df.iterrows():
    if row['apm'] > df['apm'].shift(1):
        # do something
You can just create a new column (e.g. flag) to indicate whether or not the boolean check is true.
df = df.assign(flag=df['apm'].gt(df['apm'].shift()))
Then you could perform your action based on the value of this column.
df['apm'].shift(1)
returns a series with the previous row's value for each row, with NaN in the first row.
Thus, the code
df['apm'] > df['apm'].shift(1)
will return a series of boolean values, i.e. True or False for each row.
Would that be enough for your task?
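Putting the vectorised version together on a small invented series (only the column name 'apm' is taken from the question):

```python
import pandas as pd

df = pd.DataFrame({"apm": [10, 12, 11, 15]})

# shift() aligns each row with its predecessor; gt() compares them
# element-wise. The first row compares against NaN and comes out False.
df["flag"] = df["apm"].gt(df["apm"].shift())
print(df["flag"].tolist())   # [False, True, False, True]
```

This replaces the iterrows loop entirely: any "do something" step can then be driven off the flag column, e.g. with df.loc[df["flag"], ...].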
