This question already has answers here:
Find the column name which has the maximum value for each row
(5 answers)
Closed 4 years ago.
I am stuck on calculating the highest of the 3 confidence columns and the respective category for this dataset:
<Dataset Image>
I want to calculate the max confidence category with confidence level and create a dataset with 4 columns like:
TC_Name Failure MaxErrCategory MaxConfidence
I have tried capturing the max confidence level for each row but unable to figure out the category:
max_conf=data.max(axis=1)
Kindly help..
The idxmax method of the dataframe will give you, for each row, the name of the column where the first occurrence of the maximum has been found.
data.idxmax(axis=1)
As you have strings in some of your columns, you should first select the columns on which you want to compute the max:
data[ ["confidence1", "confidence2", "confidence3"] ].idxmax(axis=1)
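Combining idxmax with max then gives the 4-column result the question asks for. A minimal sketch, assuming a toy frame with the column names from the question (the actual values are made up for illustration):

```python
import pandas as pd

# Hypothetical data shaped like the question's dataset
data = pd.DataFrame({
    "TC_Name": ["tc1", "tc2"],
    "Failure": ["err_a", "err_b"],
    "confidence1": [0.2, 0.9],
    "confidence2": [0.7, 0.1],
    "confidence3": [0.5, 0.3],
})

conf_cols = ["confidence1", "confidence2", "confidence3"]
result = data[["TC_Name", "Failure"]].copy()
result["MaxErrCategory"] = data[conf_cols].idxmax(axis=1)  # column name of the max
result["MaxConfidence"] = data[conf_cols].max(axis=1)      # the max value itself
```
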
This question already has answers here:
seaborn heatmap color scheme based on row values
(1 answer)
Coloring Cells in Pandas
(4 answers)
Closed 5 months ago.
I have a table (which is currently produced in Excel), where a row wise comparison is made and each cell in a row is ranked from lowest to highest. The best score is strong green, the second best score is less green and the worst score is red. For cells with an equal score, the color of the cells will also be similar and based on their shared rank.
For some rows, the ranking is based on an ascending score, while other rows have a descending ranking.
How can this be done using Python? Do you know any modules that are capable of doing something similar? I've used Seaborn for other heatmaps, but none of them were based on a row wise comparison.
Any ideas?
The colors are not important. I just want to know how to rank the cells of each row compared to each other.
Use background_gradient. The RdYlGn colormap sounds like it matches your description, though it won't be a 100% reproduction of Excel's color map.
df.style.background_gradient("RdYlGn", axis=1)
List of color maps: https://matplotlib.org/stable/tutorials/colors/colormaps.html
This question already has answers here:
How to select rows in a DataFrame between two values, in Python Pandas?
(7 answers)
Closed 1 year ago.
Example df
How do I create a histogram that in this case only uses the range of 2–5 points, instead of the entire points data range of 1–6?
I'm trying to only display the average data spread, not the extreme areas. Is there maybe a function to zoom in to the significant ranges? And is there a smart way to describe those?
For your specific data, you can first filter your DataFrame, then call .hist(). Note that Series.between(left, right) includes both the left and right values:
df[df['points'].between(2, 5)].hist()
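To illustrate the inclusive endpoints with a small made-up series:

```python
import pandas as pd

# between(2, 5) keeps both endpoints; only 1 and 6 are dropped
s = pd.Series([1, 2, 3, 5, 6], name="points")
kept = s[s.between(2, 5)]
print(kept.tolist())  # → [2, 3, 5]
```
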
This question already has answers here:
Pandas how to use pd.cut()
(5 answers)
Closed 6 months ago.
I am using Pandas cut to bin certain values into ranges according to a column. I am using user-defined bins, i.e. the ranges are passed as an array.
df['Range'] = pd.cut(df.TOTAL, bins=[0,100,200,300,400,450,500,600,700,800,900,1000,2000])
However, my values range up to 100000. This bin list caps at 2000 as an upper limit, so I am losing values greater than 2000. I want to keep an interval for values greater than 2000. Is there any way to do this?
You can add np.inf to the end of your bin list:
import numpy as np

pd.cut(df.TOTAL, bins=[0,100,200,300,400,450,500,600,700,800,900,1000,2000,np.inf])
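A small runnable sketch of the idea, with made-up totals and a shortened bin list; the value above 2000 now lands in the open-ended (2000, inf] bin instead of becoming NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical values, one of them above the old 2000 cap
totals = pd.Series([150, 2500, 95000])
ranges = pd.cut(totals, bins=[0, 1000, 2000, np.inf])

# No value falls outside the bins anymore
print(ranges.isna().sum())  # → 0
```
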
This question already has answers here:
Create new column based on values from other columns / apply a function of multiple columns, row-wise in Pandas
(8 answers)
Closed 2 years ago.
I am trying to build a column based on another one. The new column should keep the values meeting certain criteria and put 0's where the values do not meet it.
For example, a bank balance column will have negative and positive values; the new column, overdraft, should hold the negative value for the appropriate rows and 0 where the balance is greater than 0.
Bal Ovr
21 0
-34 -34
45 0
-32 -32
The final result should look like that.
Assuming your dataframe is called df, you can use np.where and do:
import numpy as np
df['Ovr'] = np.where(df['Bal'] < 0, df['Bal'], 0)
which will create a column called Ovr, with 0's when Bal is +ve, and the same as Bal when Bal is -ve.
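A self-contained sketch reproducing the example table from the question:

```python
import numpy as np
import pandas as pd

# The sample balances from the question
df = pd.DataFrame({"Bal": [21, -34, 45, -32]})

# Keep Bal where it is negative, otherwise 0
df["Ovr"] = np.where(df["Bal"] < 0, df["Bal"], 0)
print(df["Ovr"].tolist())  # → [0, -34, 0, -32]
```
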
df["over"] = df.Bal.apply(lambda x: 0 if x>0 else x)
An additional method to enrich your coding skills, though it isn't needed for such an easy task.
This question already has answers here:
Binning a column with pandas
(4 answers)
Closed 3 years ago.
I have a dataframe of cars with a car price column, and I want to create a new column carsrange that holds values like 'high', 'low' etc. according to the price. For example:
if the price is between 0 and 9000, carsrange should be 'low' for those cars; similarly, if the price is between 9000 and 30,000, carsrange should be 'medium', and so on. I tried doing it, but my code replaces one value with the other. Any help please?
I ran a for loop in the price column, and use the if-else iterations to define my column values.
for i in cars_data['price']:
    if (i>0 and i<9000):
        cars_data['carsrange']='Low'
    elif (i<9000 and i<18000):
        cars_data['carsrange']='Medium-Low'
    elif (i<18000 and i>27000):
        cars_data['carsrange']='Medium'
    elif (i>27000 and i<36000):
        cars_data['carsrange']='High-Medium'
    else:
        cars_data['carsrange']='High'
Now, when I run the unique function for carsrange, it shows only 'High'.
cars_data['carsrange'].unique()
This is the Output:
In[74]:cars_data['carsrange'].unique()
Out[74]: array(['High'], dtype=object)
I believe I have applied the wrong concept here. Any ideas as to what I should do now?
you can use a list:
resultList = []
for i in cars_data['price']:
    if i > 0 and i < 9000:
        resultList.append("Low")
    # write the other elif conditions here
    else:
        resultList.append("High")
cars_data["carsrange"] = resultList
then find unique values from cars_data["carsrange"]
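Since this is a binning task, pd.cut (the approach from the linked duplicate) can replace the loop entirely. A sketch assuming the question's bands were meant to be contiguous at 9000/18000/27000/36000 (the prices below are made up to hit each band):

```python
import numpy as np
import pandas as pd

# Hypothetical prices, one per band
cars_data = pd.DataFrame({"price": [5000, 12000, 20000, 30000, 50000]})

cars_data["carsrange"] = pd.cut(
    cars_data["price"],
    bins=[0, 9000, 18000, 27000, 36000, np.inf],
    labels=["Low", "Medium-Low", "Medium", "High-Medium", "High"],
)
print(cars_data["carsrange"].tolist())
# → ['Low', 'Medium-Low', 'Medium', 'High-Medium', 'High']
```
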