This question already has answers here:
Pandas how to use pd.cut()
(5 answers)
Closed 6 months ago.
I am using pandas cut to bin values into ranges according to a column. I am using user-defined bins, i.e. the ranges are passed as an array.
df['Range'] = pd.cut(df.TOTAL, bins=[0,100,200,300,400,450,500,600,700,800,900,1000,2000])
However, my values range up to 100000. The bin list caps the upper limit at 2000, so I am losing values greater than 2000. I want to keep an interval for values greater than 2000. Is there any way to do this?
Add np.inf to the end of your bin list, so the last bin is open-ended:
pd.cut(df.TOTAL, bins=[0,100,200,300,400,450,500,600,700,800,900,1000,2000,np.inf])
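A minimal sketch of the fix above on made-up data (the real df.TOTAL is not shown in the question, so the sample values here are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's TOTAL column.
df = pd.DataFrame({"TOTAL": [50, 450, 1500, 2500, 100000]})

# Same bin edges as in the question, plus an open-ended final edge.
bins = [0, 100, 200, 300, 400, 450, 500, 600, 700, 800, 900, 1000, 2000, np.inf]
df["Range"] = pd.cut(df["TOTAL"], bins=bins)

# Values above 2000 now fall into the (2000.0, inf] interval
# instead of being assigned NaN.
print(df)
```

Note that the intervals are right-closed by default, so a value of exactly 0 would still fall outside the first bin; passing include_lowest=True to pd.cut is one way to cover that case.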
This question already has answers here:
Comparing floats in a pandas column
(4 answers)
Closed 6 months ago.
So I have a dataframe with the index and a column named 0
image1
When I search for the index value of all the rows equal to '0' with
df.index[df[0] == 0].tolist()
It is working no problem
image2
When I search for the index of a row equal to a specific value such as 0.000376 with df.index[df[0] == 0.000376].tolist(), the output is empty even though this value does exist in the data set.
image3
Must be veeery basic but yeah I've been stuck on this for 2 days lol
This is due to floating-point approximation: the stored value is not exactly 0.000376, so == finds no match. Compare within a tolerance instead:
import numpy as np
df.index[np.isclose(df[0], 0.000376)].tolist()
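As a rough illustration of the difference, here is a small sketch on invented values (the asker's data is only shown as images, so these numbers are assumptions). It also shows that the default tolerances of np.isclose may be too tight when the value you typed is a rounded display of the stored one:

```python
import numpy as np
import pandas as pd

# Hypothetical column: index 1 is 0.000376 up to float noise,
# index 3 is a value that would merely display as 0.000376 when rounded.
df = pd.DataFrame({0: [0.0, 0.00037600001, 0.5, 0.0003764]})

# Exact comparison finds nothing.
exact = df.index[df[0] == 0.000376].tolist()

# Default tolerances (rtol=1e-05, atol=1e-08) catch the float-noise case.
close = df.index[np.isclose(df[0], 0.000376)].tolist()

# A wider absolute tolerance also catches the display-rounded value.
loose = df.index[np.isclose(df[0], 0.000376, atol=5e-7)].tolist()

print(exact, close, loose)
```

Which tolerance is appropriate depends on how many digits of the stored values you actually know.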
This question already has answers here:
How to select rows in a DataFrame between two values, in Python Pandas?
(7 answers)
Closed 1 year ago.
Example df
How do I create a histogram that in this case only uses the range of 2–5 points, instead of the entire points data range of 1–6?
I'm trying to only display the average data spread, not the extreme areas. Is there maybe a function to zoom in to the significant ranges? And is there a smart way to describe those?
For your specific data, you can first filter your DataFrame, then call .hist(). Note that Series.between(left, right) includes both the left and right values:
df[df['points'].between(2, 5)].hist()
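A small sketch of the filtering step on its own, using an invented points column since the example df is not reproduced here:

```python
import pandas as pd

# Hypothetical data covering the full 1-6 range mentioned in the question.
df = pd.DataFrame({"points": [1, 2, 2, 3, 4, 5, 5, 6]})

# between() keeps both endpoints, so 2 and 5 are included.
subset = df[df["points"].between(2, 5)]
print(subset["points"].tolist())  # [2, 2, 3, 4, 5, 5]

# subset.hist() would then plot only the filtered rows (requires matplotlib).
```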
This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 2 years ago.
I have used the following code to make a point plot.
data_agg = data.groupby('HourOfDay')['travel_time'].aggregate(np.median).reset_index()
plt.figure(figsize=(12,3))
sns.pointplot(data.HourOfDay.values, data.travel_time.values)
plt.show()
However, I want to plot only hours 8 and above, not 0-7. How do I proceed with that?
What about filtering first?
data_filtered = data[data['HourOfDay'] > 7]
# (adjust the comparison above to match the dtype of the HourOfDay column)
data_agg = data_filtered.groupby('HourOfDay')['travel_time'].aggregate(np.median).reset_index()
plt.figure(figsize=(12,3))
sns.pointplot(x=data_filtered.HourOfDay.values, y=data_filtered.travel_time.values)
plt.show()
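To check the filter and the aggregation without plotting, a sketch like the following can help. The sample rows are invented, and it assumes HourOfDay is stored as an integer:

```python
import pandas as pd

# Hypothetical travel-time data; HourOfDay is assumed to be numeric.
data = pd.DataFrame({
    "HourOfDay": [6, 7, 8, 8, 9, 9, 9],
    "travel_time": [10.0, 12.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Keep hours 8 and above, then take the median travel time per hour.
data_filtered = data[data["HourOfDay"] > 7]
data_agg = data_filtered.groupby("HourOfDay")["travel_time"].median().reset_index()

print(data_agg)
```

Here .median() is used directly on the grouped column; it computes the same per-group median as passing np.median to aggregate.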
This question already has answers here:
Find the column name which has the maximum value for each row
(5 answers)
Closed 4 years ago.
I am stuck calculating the highest of the 3 columns and the respective category for this dataset:
<Dataset Image>
I want to calculate the max confidence category with confidence level and create a dataset with 4 columns like:
TC_Name Failure MaxErrCategory MaxConfidence
I have tried capturing the max confidence level for each row, but I am unable to figure out the category:
max_conf=data.max(axis=1)
Kindly help..
The idxmax method of the dataframe will give you, for each row, the name of the column where the first occurrence of the maximum has been found.
data.idxmax(axis=1)
As you have strings in some of your columns, you should first select the columns on which you want to compute the max:
data[ ["confidence1", "confidence2", "confidence3"] ].idxmax(axis=1)
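Putting idxmax and max together, the four-column result from the question could be built roughly like this. The rows are invented, and the column names confidence1 to confidence3 are taken from the answer above rather than from the (unseen) dataset:

```python
import pandas as pd

# Hypothetical data shaped like the description in the question.
data = pd.DataFrame({
    "TC_Name": ["tc1", "tc2", "tc3"],
    "Failure": ["f1", "f2", "f3"],
    "confidence1": [0.1, 0.7, 0.3],
    "confidence2": [0.6, 0.2, 0.3],
    "confidence3": [0.3, 0.1, 0.4],
})

conf_cols = ["confidence1", "confidence2", "confidence3"]

result = data[["TC_Name", "Failure"]].copy()
# Name of the column holding the row-wise maximum.
result["MaxErrCategory"] = data[conf_cols].idxmax(axis=1)
# The row-wise maximum itself.
result["MaxConfidence"] = data[conf_cols].max(axis=1)

print(result)
```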
This question already has answers here:
Absolute value for column in Python
(2 answers)
Closed 5 years ago.
In my dataframe I have a column containing numbers, some positive, some negative. Example
Amount
0 -500
1 659
3 -10
4 344
I want to turn all numbers in df['Amount'] into positive numbers. I thought about multiplying all numbers by -1, but although this turns negative numbers positive, it also turns positive numbers negative.
Is there a better way to do this?
You can assign the result back to the original column:
df['Amount'] = df['Amount'].abs()
Or you can create a new column, instead:
df['AbsAmount'] = df['Amount'].abs()
You can take the absolute value of each element:
df['Amount'].apply(abs)
abs() is the standard way to get absolute values.
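For reference, a short sketch comparing the two approaches on the sample values from the question (the index 0, 1, 3, 4 is copied from the example as shown):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"Amount": [-500, 659, -10, 344]}, index=[0, 1, 3, 4])

# Vectorized absolute value, stored in a new column.
df["AbsAmount"] = df["Amount"].abs()

# Element-wise alternative using the built-in abs; same result here.
via_apply = df["Amount"].apply(abs)

print(df)
```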