How is the grouping done in groupby [duplicate]

This question already has an answer here:
How is pandas groupby method actually working?
(1 answer)
Closed 1 year ago.
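import numpy as np
import pandas as pd
rng = np.random.RandomState(0)  # rng is not defined in the question; a seeded RandomState is assumed
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6), 'data2': rng.randint(0, 10, 6)},
                  columns=['key', 'data1', 'data2'])
l = [0, 1, 0, 1, 2, 0]
df.groupby(l).sum()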

So the output of your code is:
https://ibb.co/kgXSwvL
There are three distinct values in the list l: 0, 1 and 2, and each one becomes a group label. Row i of df is assigned to the group named by l[i], and each output cell is the sum of that column over the rows in the group.
For example, 0 appears in l at positions 0, 2 and 5. At those positions data1 holds 0, 2 and 5, which sum to 7, and data2 holds 3, 5 and 2, which sum to 10.
Similarly, 1 appears in l at positions 1 and 3, where data1 holds 1 and 3, which sum to 4.
The group for 2 can be verified the same way: 2 appears only at position 4, so its data1 sum is simply 4.
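A minimal sketch to check this reasoning. It assumes a seeded NumPy RandomState for data2 (the question's rng is not shown, so the data2 numbers here will not match 3, 5 and 2) and compares df.groupby(l).sum() with sums computed by hand from boolean masks over row positions.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded RandomState stands in for the question's undefined rng.
rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': rng.randint(0, 10, 6)})
l = [0, 1, 0, 1, 2, 0]

# With a list the same length as df, l[i] is the group label of row i.
grouped = df[['data1', 'data2']].groupby(l).sum()

# Same sums by hand: for each label g, mask the positions where l == g.
labels = np.array(l)
groups = sorted(set(l))
manual = pd.DataFrame(
    {col: [df[col].to_numpy()[labels == g].sum() for g in groups]
     for col in ['data1', 'data2']},
    index=groups)

print(grouped)
print(np.array_equal(grouped.to_numpy(), manual.to_numpy()))  # True
```

Only the numeric columns are selected before grouping; depending on the pandas version, the string column key would otherwise be dropped or have its strings concatenated per group.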

Related

How to find last occurrence of value meeting condition in column in python

I have the following dataframe:
df = pd.DataFrame({"A":['a','b','c','d','e','f','g','h','i','j','k'],
"B":[1,3,4,5,6,7,6,5,8,5,5]})
df
displayed as:
A B
0 a 1
1 b 3
2 c 4
3 d 5
4 e 6
5 f 7
6 g 6
7 h 5
8 i 8
9 j 5
10 k 5
I first want to find the letter in column "A" that corresponds to the first occurrence of a value in column "B" that is >= 6. Looking at this, we see that this would be row index 4, corresponding to a value of 6 and "e" in column "A".
I can identify the column "A" value we just got with this code:
#Find first occurrence >= threshold
threshold = 6
array = df.values
array[np.where(array[:,1] >= threshold)][0,0]
This code returns 'e', which is what I want.
This code is referenced from this Stack Overflow source: Python find first occurrence in Pandas dataframe column 2 below threshold and return column 1 value same row using NumPy
What I can't figure out is how to modify this code to find the last occurrence meeting my criterion of being >= the threshold of 6. Looking at the data frame above, I want to get 'i': the row with "i" in column "A" has a value of 8 in column "B", which is the last value >= 6. I want to keep the rows in their original order (alphabetical by column "A") rather than re-sorting them. I suspect the fix involves the indexing in my code, specifically the array[:,1] component or the [0,0] component, but I am not sure how to ask for the last occurrence that meets my criterion. How can I modify my code to return the value in column "A" corresponding to the last value >= 6 in column "B"?

To get the first occurrence, you can use idxmax:
df.loc[df['B'].ge(6).idxmax()]
output:
A e
B 6
Name: 4, dtype: object
For just the value in 'A':
df.loc[df['B'].ge(6).idxmax(), 'A']
output: 'e'
For the last, do the same on the reversed Series:
df.loc[df.loc[::-1,'B'].ge(6).idxmax()]
output:
A i
B 8
Name: 8, dtype: object
df.loc[df.loc[::-1, 'B'].ge(6).idxmax(), 'A']
output: 'i'

Here is one way to do it: select the rows that meet your criterion, then take the last row of the result.
df.loc[df['B'] >= 6][-1:]
Output:
A B
8 i 8
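A sketch putting the approaches side by side on the question's data: the question's NumPy route with [-1, 0] instead of [0, 0], a label-based route, and a hypothetical helper (last_match is an illustrative name, not a pandas function) that returns None when nothing meets the threshold, a case where the reversed idxmax trick would silently return a non-matching row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("abcdefghijk"),
                   "B": [1, 3, 4, 5, 6, 7, 6, 5, 8, 5, 5]})
threshold = 6

# 1. The question's NumPy approach, taking the last matching row instead of the first.
array = df.to_numpy()
last_np = array[np.where(array[:, 1] >= threshold)][-1, 0]

# 2. Label-based: index labels where the condition holds, then the last one.
mask = (df["B"] >= threshold).to_numpy()
last_pd = df.loc[df.index[mask][-1], "A"]

# 3. Hypothetical helper that also handles "no match"; idxmax on an
#    all-False mask would return the first label instead.
def last_match(frame, col, thr, out_col):
    hits = frame.index[(frame[col] >= thr).to_numpy()]
    return frame.loc[hits[-1], out_col] if len(hits) else None

print(last_np, last_pd, last_match(df, "B", threshold, "A"))  # i i i
```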

How to find elements that are in the first pandas DataFrame and not in the second, and vice versa. python [duplicate]

This question already has answers here:
set difference for pandas
(12 answers)
Closed 11 months ago.
I have two data frames.
first_dataframe
id
9
8
6
5
7
4
second_dataframe
id
6
4
1
5
2
3
Note: My dataframe has many columns, but I only need to compare based on id.
I need to find:
ids that are in the first dataframe and not in the second [7,8,9]
ids that are in the second dataframe and not in the first [1,2,3]
I have searched for an answer, but none of the solutions I've found seem to work for me, because they look for changes based on the index.

Use set subtraction:
inDF1_notinDF2 = set(df1['id']) - set(df2['id']) # Removes all items that are in df2 from df1
inDF2_notinDF1 = set(df2['id']) - set(df1['id']) # Removes all items that are in df1 from df2
Output:
>>> inDF1_notinDF2
{7, 8, 9}
>>> inDF2_notinDF1
{1, 2, 3}
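The set approach returns only the ids. If, as the question says, the frames have many other columns and you want the full rows back, a sketch using Series.isin (or an outer merge with indicator=True) could look like this; df1 and df2 are rebuilt here from the id values in the question only.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [9, 8, 6, 5, 7, 4]})
df2 = pd.DataFrame({"id": [6, 4, 1, 5, 2, 3]})

# Boolean masks keep whole rows (including any other columns), unlike a plain set.
only_in_df1 = df1[~df1["id"].isin(df2["id"])]
only_in_df2 = df2[~df2["id"].isin(df1["id"])]

# Alternative: an outer merge with indicator=True labels each id by its origin.
merged = df1[["id"]].merge(df2[["id"]], on="id", how="outer", indicator=True)
left_ids = sorted(merged.loc[merged["_merge"] == "left_only", "id"].tolist())
right_ids = sorted(merged.loc[merged["_merge"] == "right_only", "id"].tolist())

print(sorted(only_in_df1["id"].tolist()), sorted(only_in_df2["id"].tolist()))
print(left_ids, right_ids)
```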

Getting highest value out of a dataframe with value_counts()

I want to print the highest value in my dataframe that is not unique, i.e. that appears more than once.
With df['Value'].value_counts() I can count the values, but how do I select them by how often they appear?
Value
1
2
1
2
3
2

As I understand it, you want the highest value that has a frequency greater than 1. In that case you can write:
for val, cnt in df['Value'].value_counts().sort_index(ascending=False).items():
    if cnt > 1:
        print(val)
        break
sort_index sorts the counts by 'Value' rather than by frequency. For example, if your 'Value' column has the values [1, 2, 3, 3, 2, 2, 2, 1, 3, 2], then df['Value'].value_counts().sort_index(ascending=False) is:
3 3
2 5
1 2
Name: Value, dtype: int64
The answer in this example is 3, since it is the highest value with a frequency greater than 1.
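A vectorized sketch of the same idea on the question's sample column: keep only the value_counts() entries above 1 and take the largest remaining value. Series.duplicated gives the same answer, since it marks every repeat after the first occurrence. Both return NaN rather than raising if no value repeats.

```python
import pandas as pd

df = pd.DataFrame({"Value": [1, 2, 1, 2, 3, 2]})

counts = df["Value"].value_counts()
# Keep only values seen more than once, then take the largest of them.
highest_repeated = counts[counts > 1].index.max()

# Equivalent: duplicated() is True for every repeat after the first.
also_highest = df.loc[df["Value"].duplicated(), "Value"].max()

print(highest_repeated, also_highest)  # 2 2
```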

Add new column based on odd/even condition [duplicate]

This question already has answers here:
How to conditionally update DataFrame column in Pandas based on list
(2 answers)
Check if a number is odd or even in Python [duplicate]
(6 answers)
Closed 3 years ago.
Let's say I have the dataframe below:
my_df = pd.DataFrame({'A': [1, 2, 3]})
my_df
A
0 1
1 2
2 3
I want to add a column B with values X if the corresponding number in A is odd, otherwise Y. I would like to do it in this way if possible:
my_df['B'] = np.where(my_df['A'] IS ODD, 'X', 'Y')
I don't know how to check if the value is odd.

You were so close!
my_df['B'] = np.where(my_df['A'] % 2 != 0, 'X', 'Y')
value % 2 != 0 checks whether a number is odd, whereas value % 2 == 0 checks for even numbers.
Output:
A B
0 1 X
1 2 Y
2 3 X
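A self-contained version of the answer, with a negative number and zero added purely as an illustration (they are not in the question). Integer % in NumPy and pandas follows Python's rule that the result takes the sign of the divisor, so -3 % 2 is 1 and the != 0 test still labels negative odd numbers correctly.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({"A": [1, 2, 3, -3, 0]})

# -3 % 2 == 1 (sign of the divisor), so negative odd numbers are caught too.
my_df["B"] = np.where(my_df["A"] % 2 != 0, "X", "Y")
print(my_df)
```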

How to subset by fixed column and row by boolean in pandas? [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 3 years ago.
I am coming from an R background and need help with elementary pandas.
If I have a dataframe like this:
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
I want to subset the dataframe by selecting a fixed column and selecting rows with a boolean condition.
For example:
df.iloc[df.2 > 4][2]
Then I want to set the selected cell to a new value.
Something like:
df.iloc[df.2 > 4][2] = 7
It seems valid to me, but pandas seems to handle booleans more strictly than R.

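Here you want .loc, with the row mask and the column label in the same call (integer column labels are written df[2]; df.2 is not valid Python):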
df.loc[df[2] > 4,2]
1 6
Name: 2, dtype: int64
df.loc[df[2] > 4,2]=7
df
0 1 2
0 1 2 3
1 4 5 7
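A runnable sketch of the .loc answer plus a positional read of the same cell. The linked duplicate is about SettingWithCopyWarning: chained indexing such as df[mask][2] = 7 assigns into the copy produced by the boolean selection, so df is left unchanged, whereas a single .loc call writes into df. .iloc also accepts a boolean mask, but only as a plain array rather than a Series, hence .to_numpy() (the names mask and selected are illustrative).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# Integer column labels are accessed as df[2]; df.2 is a syntax error.
mask = df[2] > 4

# Row mask and column label in one .loc call, so the assignment
# writes into df itself rather than into a temporary copy.
df.loc[mask, 2] = 7

# Positional read of the same cell: .iloc needs a plain boolean array,
# so the Series mask is converted with .to_numpy().
selected = df.iloc[mask.to_numpy(), 2]

print(df)
print(selected.tolist())  # [7]
```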
