I have a pandas dataframe with 1 million rows. I want to replace the values in 900,000 rows of a column with another set of values. Is there a fast way to do this without a for loop (which takes me two days to complete)?
For example, look at this sample dataframe, where I have condensed the 1 million rows down to 8 rows:
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['a'] = [-1,-3,-4,-4,-3, 4,5,6]
df['b'] = [23,45,67,89,0,-1, 2, 3]
L2 = [-1,-3,-4]
L5 = [9,10,11]
I want to replace the values where a is -1, -3, or -4 in a single shot if possible, or as fast as possible without a for loop. The crucial part is that the values in L5 have to be repeated as needed.
I have tried
df.loc[df.a < 0, 'a'] = L5
but this works only when len(df.a.values) == len(L5)
Use map with a dictionary created from both lists by zip, then restore the original values for the non-matched rows with fillna:
d = dict(zip(L2, L5))
print (d)
{-1: 9, -3: 10, -4: 11}
df['a'] = df['a'].map(d).fillna(df['a'])
print (df)
a b
0 9.0 23
1 10.0 45
2 11.0 67
3 11.0 89
4 10.0 0
5 4.0 -1
6 5.0 2
7 6.0 3
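Note that map returns NaN for the non-matched rows, so the column is upcast to float before fillna runs (hence the 9.0, 10.0, ... above). If the integer dtype should be preserved, a cast back works; a minimal sketch, assuming the sample df above:
# fillna removes all NaN, so casting back to int is safe here
df['a'] = df['a'].map(d).fillna(df['a']).astype(int)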
Performance:
It depends on the number of values to replace and on the length of the lists:
Length of lists is 100:
np.random.seed(123)
N = 1000000
df = pd.DataFrame({'a':np.random.randint(1000, size=N)})
L2 = np.arange(100)
L5 = np.arange(100) + 10
In [336]: %timeit df['d'] = np.select([df['a'] == i for i in L2], L5, df['a'])
180 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [337]: %timeit df['a'].map(dict(zip(L2, L5))).fillna(df['a'])
56.9 ms ± 2.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
If the length of the lists is small (e.g. 3):
np.random.seed(123)
N = 1000000
df = pd.DataFrame({'a':np.random.randint(100, size=N)})
L2 = np.arange(3)
L5 = np.arange(3) + 10
In [339]: %timeit df['d'] = np.select([df['a'] == i for i in L2], L5, df['a'])
11.9 ms ± 40.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [340]: %timeit df['a'].map(dict(zip(L2, L5))).fillna(df['a'])
54 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
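As a side note, Series.replace accepts the same dictionary and leaves non-matched values untouched, so no fillna step is needed; it can be slower than map on large frames, but reads cleanly. A sketch under the same setup:
# replace only touches values that appear as keys in the dictionary
df['a'] = df['a'].replace(dict(zip(L2, L5)))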
You can use np.select like this:
import numpy as np
condition = [df['a'] == i for i in L2]
df['a'] = np.select(condition, L5, df['a'])
and you get:
a b
0 9 23
1 10 45
2 11 67
3 11 89
4 10 0
5 4 -1
6 5 2
7 6 3
Timing: let's create a bigger dataframe by repeating your df:
df_l = pd.concat([df]*10000)
print (df_l.shape)
(80000, 2)
Now some timeit:
# with map, #jezrael
d = dict(zip(L2, L5))
%timeit df_l['a'].map(d).fillna(df_l['a'])
100 loops, best of 3: 7.71 ms per loop
# with np.select
condition = [df_l['a'] == i for i in L2]
%timeit np.select(condition, L5, df_l['a'])
1000 loops, best of 3: 350 µs per loop
Related
I want to modify a single value in a DataFrame. The typical suggestion for doing this is to use df.at[] and reference the position as the index label and the column label, or to use df.iat[] and reference the position as the integer row and the integer column. But I want to reference the position as the integer row and the column label.
Assume this DataFrame:
dateindex                   apples oranges bananas
2021-01-01 14:00:01.384624       1       X       3
2021-01-05 13:43:26.203773       4       5       6
2021-01-31 08:23:29.837238       7       8       9
2021-02-08 10:23:09.095632       0       1       2
data = [{'apples':1, 'oranges':'X', 'bananas':3},
{'apples':4, 'oranges':5, 'bananas':6},
{'apples':7, 'oranges':8, 'bananas':9},
{'apples':0, 'oranges':1, 'bananas':2}]
indexes = [pd.to_datetime('2021-01-01 14:00:01.384624'),
pd.to_datetime('2021-01-05 13:43:26.203773'),
pd.to_datetime('2021-01-31 08:23:29.837238'),
pd.to_datetime('2021-02-08 10:23:09.095632')]
idx = pd.Index(indexes, name='dateindex')
df = pd.DataFrame(data, index=idx)
I want to change the value "X" to "2". I don't know the exact time; I just know that it's the first row. But I do know that I want to change the "oranges" column.
I want to do something like df.at[0,'oranges'], but I can't do that; I get a KeyError.
The best thing that I can figure out is to do df.at[df.index[0],'oranges'], but that seems so awkward when they've gone out of their way to provide both by-label and by-integer-offset interfaces. Is that the best thing?
Regarding:
The best thing that I can figure out is to do df.at[df.index[0],'oranges'], but that seems so awkward when they've gone out of their way to provide both by-label and by-integer-offset interfaces. Is that the best thing?
Yes, it is. And I agree, it is awkward. The old .ix indexer used to support these mixed indexing cases better, but its behaviour depended on the dtype of the axis, which made it inconsistent, and it was eventually removed. In the meantime:
The other options, which have been used in the other answers, can all issue the SettingWithCopyWarning. The warning is not guaranteed to appear, but it might, depending on what the indexing criteria are and how the values are assigned.
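For illustration, this is the shape of chained assignment that can trigger it (a hypothetical sketch, not code from the question):
# Chained indexing: the column lookup may return a copy, so the
# assignment may never reach the original frame (and older pandas
# treats the integer 0 as a positional lookup here).
df['oranges'][0] = 2
# A single .at call resolves row and column in one step instead:
df.at[df.index[0], 'oranges'] = 2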
Referencing Combining positional and label-based indexing and starting with this df, which has dateindex as the index:
apples oranges bananas
dateindex
2021-01-01 14:00:01.384624 1 X 3
2021-01-05 13:43:26.203773 4 5 6
2021-01-31 08:23:29.837238 7 8 9
2021-02-08 10:23:09.095632 0 1 2
Using both options:
with .loc or .at:
df.at[df.index[0], 'oranges'] = -50
apples oranges bananas
dateindex
2021-01-01 14:00:01.384624 1 -50 3
2021-01-05 13:43:26.203773 4 5 6
2021-01-31 08:23:29.837238 7 8 9
2021-02-08 10:23:09.095632 0 1 2
with .iloc or .iat:
df.iat[0, df.columns.get_loc('oranges')] = -20
apples oranges bananas
dateindex
2021-01-01 14:00:01.384624 1 -20 3
2021-01-05 13:43:26.203773 4 5 6
2021-01-31 08:23:29.837238 7 8 9
2021-02-08 10:23:09.095632 0 1 2
FWIW, I find approach #1 more consistent since it can handle multiple row indexes without changing the functions/methods used: df.loc[df.index[[0, 2]], 'oranges'] but approach #2 needs a different column indexer when there are multiple columns: df.iloc[[0, 2], df.columns.get_indexer(['oranges', 'bananas'])].
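Spelled out on the question's frame, the two multi-cell variants look like this (a short sketch, assuming the df built above):
# Approach 1: label-based; rows picked by position via df.index.
df.loc[df.index[[0, 2]], 'oranges'] = -50
# Approach 2: position-based; columns resolved via get_indexer.
df.iloc[[0, 2], df.columns.get_indexer(['oranges', 'bananas'])] = -20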
Solution with Series.iat
If it doesn't seem more awkward to you, you can use the iat method of pandas Series:
df["oranges"].iat[0] = 2
Time performance comparison with other methods
As this method doesn't raise any warning, it can be interesting to compare its time performance with other proposed solutions.
%%timeit
df.at[df.index[0], 'oranges'] = 2
# > 9.91 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
df.iat[0, df.columns.get_loc('oranges')] = 2
# > 13.5 µs ± 74.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
df["oranges"].iat[0] = 2
# > 3.49 µs ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The pandas.Series.iat method seems to be the most performant one (I took the median of three runs).
Let's try again with huge DataFrames
With a DatetimeIndex
# Generating random data
df_large = pd.DataFrame(np.random.randint(0, 50, (100000, 100000)))
df_large.columns = ["col_{}".format(i) for i in range(100000)]
df_large.index = pd.date_range(start=0, periods=100000)
# 2070-01-01 to 2243-10-16, a bit unrealistic
%%timeit
df_large.at[df_large.index[55555], 'col_55555'] = -2
# > 10.1 µs ± 85.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
df_large.iat[55555, df_large.columns.get_loc('col_55555')] = -2
# > 13.2 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
df_large["col_55555"].iat[55555] = -2
# > 3.31 µs ± 19 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
With a RangeIndex
# Generating random data
df_large = pd.DataFrame(np.random.randint(0, 50, (100000, 100000)))
df_large.columns = ["col_{}".format(i) for i in range(100000)]
%%timeit
df_large.at[df_large.index[55555], 'col_55555'] = 2
# > 4.5 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
df_large.iat[55555, df_large.columns.get_loc('col_55555')] = 2
# > 13.5 µs ± 50.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
df_large["col_55555"].iat[55555] = 2
# > 3.49 µs ± 20.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Since this is simple indexing with O(1) complexity, the size of the array barely changes the results, except for the "at + index" method; strangely enough, it shows worse performance with small dataframes. Thanks to the author wfaulk for spotting that using a RangeIndex decreases the access time of the "at + index" method. With a DatetimeIndex, pd.Series.iat remains the fastest option, and its timing stays constant.
You were actually quite close with your initial guess.
You would do it like this:
import pandas as pd
mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
{'a': 100, 'b': 200, 'c': 300, 'd': 400},
{'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
df = pd.DataFrame(mydict)
print(df)
# change the value of column a, row 2
df['a'][2] = 100
# print column a, row 2
print(df['a'][2])
There are lots of different variants such as loc and iloc. Be aware, though, that chained indexing like df['a'][2] can raise SettingWithCopyWarning and silently fails under pandas' copy-on-write mode; df.loc[2, 'a'] = 100 is the more robust spelling.
In the example we discovered that loc was the better choice, since the chained df[][] assignment throws an error:
import pandas as pd
data = [{'apples':1, 'oranges':'X', 'bananas':3},
{'apples':4, 'oranges':5, 'bananas':6},
{'apples':7, 'oranges':8, 'bananas':9},
{'apples':0, 'oranges':1, 'bananas':2}]
indexes = [pd.to_datetime('2021-01-01 14:00:01.384624'),
pd.to_datetime('2021-01-05 13:43:26.203773'),
pd.to_datetime('2021-01-31 08:23:29.837238'),
pd.to_datetime('2021-02-08 10:23:09.095632')]
idx = pd.Index(indexes, name='dateindex')
df = pd.DataFrame(data, index=idx)
print(df)
df.loc['2021-01-01 14:00:01.384624','oranges'] = 10
# df['oranges'][0] = 10
print(df)
This works.
You can use the loc method. It receives the row and column you want to change.
Changing X to 2: df.loc[0, 'oranges'] = 2. Note that loc treats 0 as an index label, so this works only when the index actually contains the label 0; with the DatetimeIndex above it raises a KeyError.
See: pandas.DataFrame.loc
A similar dataframe can be created:
import pandas as pd
df = pd.DataFrame()
df["nodes"] = list(range(1, 11))
df["x"] = [1,4,9,12,27,87,99,121,156,234]
df["y"] = [3,5,6,1,8,9,2,1,0,-1]
df["z"] = [2,3,4,2,1,5,9,99,78,1]
df.set_index("nodes", inplace=True)
So the dataframe looks like this:
x y z
nodes
1 1 3 2
2 4 5 3
3 9 6 4
4 12 1 2
5 27 8 1
6 87 9 5
7 99 2 9
8 121 1 99
9 156 0 78
10 234 -1 1
My first try for searching e.g. all nodes containing number 1 is:
>>> df[(df == 1).any(axis=1)].index.values
[1 4 5 8 10]
As I have to do this for many numbers, and my real dataframe is much bigger than this one, I'm searching for a very fast way to do this.
I just tried something that may be enlightening.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(10000, 4)), columns=list('ABCD'))
df.set_index("A", inplace=True)
df_no_index = df.reset_index()
This sets up a dataframe with ints all the way through. It is not the same as yours, but it will suffice.
Then I ran four tests:
%timeit df[(df == 1).any(axis=1)].index.values
%timeit df[(df['B'] == 1) | (df['C']==1)| (df['D']==1)].index.values
%timeit df_no_index[(df_no_index == 1).any(axis=1)].A.values
%timeit df_no_index[(df_no_index['B'] == 1) | (df_no_index['C']==1)| (df_no_index['D']==1)].A.values
The results I got were:
940 µs ± 41.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.47 ms ± 7.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.08 ms ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.55 ms ± 51.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
This shows that your initial approach, using the index, seems to be the fastest of these. Removing the index does not improve the speed with a moderately sized dataframe.
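If it needs to be faster still, one option worth trying is to skip the boolean DataFrame entirely and build the mask on the underlying numpy array (a hedged sketch, assuming the all-int frame above):
# Compare on the raw ndarray, then select the row labels with the mask.
mask = (df.values == 1).any(axis=1)
result = df.index.values[mask]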
I have the following DataFrame with named columns and index:
     'a'  'a*'   'b'  'b*'
1      5   NaN     9   NaN
2    NaN     3     3   NaN
3      4   NaN     1   NaN
4    NaN     9   NaN     7
The data source has caused some column headings to be copied slightly differently. For example, as above, some column headings are a string and some are the same string with an additional '*' character.
I want to copy any values (which are not null) from a* and b* columns to a and b, respectively.
Is there an efficient way to do such an operation?
Use np.where
df['a']= np.where(df['a'].isnull(), df['a*'], df['a'])
df['b']= np.where(df['b'].isnull(), df['b*'], df['b'])
Output:
a a* b b*
0 5.0 NaN 9.0 NaN
1 3.0 3.0 3.0 NaN
2 4.0 NaN 1.0 NaN
3 9.0 9.0 7.0 7.0
Using fillna() is a lot slower than np.where but has the advantage of being pandas only. If you want a faster method and keep it pandas pure, you can use combine_first() which according to the documentation is used to:
Combine Series values, choosing the calling Series’s values first. Result index will be the union of the two indexes
Translation: this is a method designed to do exactly what is asked in the question.
How do I use it?
df['a'].combine_first(df['a*'])
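To actually update the frame rather than just compute a Series, assign the result back; the same pattern covers both column pairs (a minimal sketch for the question's columns):
# Assign back for each base column and its starred twin.
for col in ['a', 'b']:
    df[col] = df[col].combine_first(df[col + '*'])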
Performance:
df = pd.DataFrame({'A': [0, None, 1, 2, 3, None] * 10000, 'A*': [4, 4, 5, 6, 7, 8] * 10000})
def using_fillna(df):
    return df['A'].fillna(df['A*'])

def using_combine_first(df):
    return df['A'].combine_first(df['A*'])

def using_np_where(df):
    return np.where(df['A'].isnull(), df['A*'], df['A'])

def using_np_where_numpy(df):
    return np.where(np.isnan(df['A'].values), df['A*'].values, df['A'].values)
%timeit -n 100 using_fillna(df)
%timeit -n 100 using_combine_first(df)
%timeit -n 100 using_np_where(df)
%timeit -n 100 using_np_where_numpy(df)
1.34 ms ± 71.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
281 µs ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
257 µs ± 16.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
166 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
For better performance it is possible to use numpy.isnan and convert the Series to numpy arrays with .values:
df['a'] = np.where(np.isnan(df['a'].values), df['a*'].values, df['a'].values)
df['b'] = np.where(np.isnan(df['b'].values), df['b*'].values, df['b'].values)
Another general solution, for when the DataFrame contains only pairs of columns with and without * and the * columns should be removed afterwards:
First create a MultiIndex by appending '*val' to each column name and splitting on the first '*':
df.columns = (df.columns + '*val').str.split('*', expand=True, n=1)
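For the question's columns this produces the following MultiIndex, pairing each base name with either 'val' or '*val' (shown here purely as an illustration):
print(df.columns)
# MultiIndex([('a',  'val'),    # 'a'  + '*val' -> 'a*val'  -> ('a', 'val')
#             ('a', '*val'),    # 'a*' + '*val' -> 'a**val' -> ('a', '*val')
#             ('b',  'val'),
#             ('b', '*val')])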
Then select each half with DataFrame.xs, so that DataFrame.fillna aligns the columns nicely:
df = df.xs('*val', axis=1, level=1).fillna(df.xs('val', axis=1, level=1))
print (df)
a b
1 5.0 9.0
2 3.0 3.0
3 4.0 1.0
4 9.0 7.0
Performance (depends on the number of missing values and the length of the DataFrame):
df = pd.DataFrame({'A': [0, np.nan, 1, 2, 3, np.nan] * 10000,
                   'A*': [4, 4, 5, 6, 7, 8] * 10000})

def using_fillna(df):
    df['A'] = df['A'].fillna(df['A*'])
    return df

def using_np_where(df):
    df['B'] = np.where(df['A'].isnull(), df['A*'], df['A'])
    return df

def using_np_where_numpy(df):
    df['C'] = np.where(np.isnan(df['A'].values), df['A*'].values, df['A'].values)
    return df

def using_combine_first(df):
    df['D'] = df['A'].combine_first(df['A*'])
    return df
%timeit -n 100 using_fillna(df)
%timeit -n 100 using_np_where(df)
%timeit -n 100 using_combine_first(df)
%timeit -n 100 using_np_where_numpy(df)
1.15 ms ± 89.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
533 µs ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
591 µs ± 38.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
423 µs ± 21.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Here's my data
Id Amount
1 6
2 2
3 0
4 6
What I need is a mapping: if Amount is 3 or more, Map is 1; if Amount is less than 3, Map is 0.
Id Amount Map
1 6 1
2 2 0
3 0 0
4 6 1
What I did
a = df[['Id','Amount']]
a = a[a['Amount'] >= 3]
a['Map'] = 1
a = a[['Id', 'Map']]
df= df.merge(a, on='Id', how='left')
df['Map'] = df['Map'].fillna(0)
It works, but it's neither very configurable nor efficient.
Convert boolean mask to integer:
#for better performance convert to numpy array
df['Map'] = (df['Amount'].values >= 3).astype(int)
#pure pandas solution
df['Map'] = (df['Amount'] >= 3).astype(int)
print (df)
Id Amount Map
0 1 6 1
1 2 2 0
2 3 0 0
3 4 6 1
Performance:
#[400000 rows x 3 columns]
df = pd.concat([df] * 100000, ignore_index=True)
In [133]: %timeit df['Map'] = (df['Amount'].values >= 3).astype(int)
2.44 ms ± 97.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [134]: %timeit df['Map'] = (df['Amount'] >= 3).astype(int)
2.6 ms ± 66.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
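If the two output values ever need to be something other than 0 and 1, np.where keeps it vectorized while making both branches explicit (a sketch on the same df):
# Explicit branch values; swap 1/0 for any other pair of outputs.
df['Map'] = np.where(df['Amount'] >= 3, 1, 0)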
I'd like to know if there's a way to find the location (column and row index) of the highest value in a dataframe. So if for example my dataframe looks like this:
A B C D E
0 100 9 1 12 6
1 80 10 67 15 91
2 20 67 1 56 23
3 12 51 5 10 58
4 73 28 72 25 1
How do I get a result that looks like this: [0, 'A'] using Pandas?
Use np.argmax
NumPy's argmax can be helpful:
>>> df.stack().index[np.argmax(df.values)]
(0, 'A')
In steps
df.values is a two-dimensional NumPy array:
>>> df.values
array([[100, 9, 1, 12, 6],
[ 80, 10, 67, 15, 91],
[ 20, 67, 1, 56, 23],
[ 12, 51, 5, 10, 58],
[ 73, 28, 72, 25, 1]])
argmax gives you the index for the maximum value for the "flattened" array:
>>> np.argmax(df.values)
0
Now, you can use this index to find the row-column location on the stacked dataframe:
>>> df.stack().index[0]
(0, 'A')
Fast Alternative
If you need it fast, do as few steps as possible.
Working only on the NumPy array to find the indices np.argmax seems best:
v = df.values
i, j = [x[0] for x in np.unravel_index([np.argmax(v)], v.shape)]
[df.index[i], df.columns[j]]
Result:
[0, 'A']
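A slightly simpler equivalent, since np.unravel_index also accepts a plain scalar index:
# Map the flat argmax position back to a (row, column) pair.
v = df.values
i, j = np.unravel_index(np.argmax(v), v.shape)
print([df.index[i], df.columns[j]])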
Timings
Timings are most meaningful for large data frames:
df = pd.DataFrame(data=np.arange(int(1e6)).reshape(-1,5), columns=list('ABCDE'))
Sorted slowest to fastest:
Mask:
%timeit df.mask(~(df==df.max().max())).stack().index.tolist()
33.4 ms ± 982 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Stack-idxmax
%timeit list(df.stack().idxmax())
17.1 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Stack-argmax
%timeit df.stack().index[np.argmax(df.values)]
14.8 ms ± 392 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Where
%%timeit
i,j = np.where(df.values == df.values.max())
list((df.index[i].values.tolist()[0],df.columns[j].values.tolist()[0]))
4.45 ms ± 84.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Argmax-unravel_index
%%timeit
v = df.values
i, j = [x[0] for x in np.unravel_index([np.argmax(v)], v.shape)]
[df.index[i], df.columns[j]]
499 µs ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Compare
d = {'name': ['Mask', 'Stack-idxmax', 'Stack-argmax', 'Where', 'Argmax-unravel_index'],
'time': [33.4, 17.1, 14.8, 4.45, 499],
'unit': ['ms', 'ms', 'ms', 'ms', 'µs']}
timings = pd.DataFrame(d)
timings['seconds'] = timings.time * timings.unit.map({'ms': 1e-3, 'µs': 1e-6})
timings['factor slower'] = timings.seconds / timings.seconds.min()
timings.sort_values('factor slower')
Output:
name time unit seconds factor slower
4 Argmax-unravel_index 499.00 µs 0.000499 1.000000
3 Where 4.45 ms 0.004450 8.917836
2 Stack-argmax 14.80 ms 0.014800 29.659319
1 Stack-idxmax 17.10 ms 0.017100 34.268537
0 Mask 33.40 ms 0.033400 66.933868
So the "Argmax-unravel_index" version seems to be one to nearly two orders of magnitude faster for large data frames, i.e. where often speeds matters most.
Use stack to get a Series with a MultiIndex, then idxmax for the index of the max value:
print (df.stack().idxmax())
(0, 'A')
print (list(df.stack().idxmax()))
[0, 'A']
Detail:
print (df.stack())
0 A 100
B 9
C 1
D 12
E 6
1 A 80
B 10
C 67
D 15
E 91
2 A 20
B 67
C 1
D 56
E 23
3 A 12
B 51
C 5
D 10
E 58
4 A 73
B 28
C 72
D 25
E 1
dtype: int64
mask + max
df.mask(~(df==df.max().max())).stack().index.tolist()
Out[17]: [(0, 'A')]
This should work:
def max_df(df):
    m = None
    p = None
    # df.idxmax() gives, for each column, the row label of that column's max
    for idx, item in enumerate(df.idxmax()):
        c = df.columns[idx]
        val = df[c][item]
        if m is None or val > m:
            m = val
            p = item, c
    return p
This uses the idxmax function, then compares all of the values returned by it.
Example usage:
>>> df
A B
0 100 9
1 90 8
>>> max_df(df)
(0, 'A')
Here's a one-liner (for fun):
def max_df2(df):
    return max((df[df.columns[idx]][item], item, df.columns[idx])
               for idx, item in enumerate(df.idxmax()))[1:]
In my opinion, stack() becomes inefficient for larger datasets, so let's use np.where to return the index positions:
i,j = np.where(df.values == df.values.max())
list((df.index[i].values.tolist()[0],df.columns[j].values.tolist()[0]))
Output:
[0, 'A']
Timings for larger datafames:
df = pd.DataFrame(data=np.arange(10000).reshape(-1,5), columns=list('ABCDE'))
np.where method
%%timeit
i, j = np.where(df.values == df.values.max())
list((df.index[i].values.tolist()[0], df.columns[j].values.tolist()[0]))
1000 loops, best of 3: 364 µs per loop
Other stack methods
%timeit df.mask(~(df==df.max().max())).stack().index.tolist()
100 loops, best of 3: 7.68 ms per loop
%timeit df.stack().index[np.argmax(df.values)]
10 loops, best of 3: 50.5 ms per loop
%timeit list(df.stack().idxmax())
1000 loops, best of 3: 1.58 ms per loop
Even larger dataframe:
df = pd.DataFrame(data=np.arange(100000).reshape(-1,5), columns=list('ABCDE'))
Respectively:
1000 loops, best of 3: 1.62 ms per loop
10 loops, best of 3: 18.2 ms per loop
100 loops, best of 3: 5.69 ms per loop
100 loops, best of 3: 6.64 ms per loop
Simple, fast one-liner:
loc = [df.max(axis=1).idxmax(), df.max().idxmax()]
(For large data frames, .stack() can be quite slow.)
print('Max value:', df.stack().max())
print('Parameters :', df.stack().idxmax())
This is the best way imho.