How to cut the lower values when calculating percentile? - python

If I calculate the 90th percentile using numpy:
import numpy as np
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p = np.percentile(a, 90)
print (p)
It cuts the highest value so the result is:
9.1
How can I cut the lower values instead, so that the output would be:
2
Thank you!

You want the 10th percentile, not the 90th.
p = np.percentile(a, 10)
print (p)
# 1.9
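By default np.percentile interpolates linearly between neighbouring values, which is why the 10th percentile comes out as 1.9 rather than the 2 asked for. The sketch below re-creates that calculation in plain Python. The helper `percentile` and its `method` argument are hypothetical names used only for illustration (this is not NumPy's API), and the "higher" option simply snaps to the next data point instead of interpolating.

```python
import math

def percentile(data, q, method="linear"):
    """Illustrative percentile: q in [0, 100], data must be non-empty."""
    values = sorted(data)
    # Fractional position of the q-th percentile in the sorted data.
    pos = (len(values) - 1) * q / 100
    lower = math.floor(pos)
    upper = math.ceil(pos)
    if method == "higher":
        # Snap to the next data point instead of interpolating.
        return values[upper]
    # Linear interpolation between the two neighbouring data points.
    fraction = pos - lower
    return values[lower] + fraction * (values[upper] - values[lower])

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(percentile(a, 90))            # close to 9.1
print(percentile(a, 10))            # close to 1.9
print(percentile(a, 10, "higher"))  # 2
```

The position for q=10 is 0.9, so the linear result sits 90% of the way from 1 to 2, while rounding the position up lands exactly on 2.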

Related

Percentage distribution of values in a python list

Let's say we have following list. This list contains response times of a REST server in a traffic run.
[1, 2, 3, 3, 4, 5, 6, 7, 9, 1]
I need following output
Percentage of the requests served within a certain time (ms)
50% 3
60% 4
70% 5
80% 6
90% 7
100% 9
How can we do this in Python? This is the kind of output Apache Bench produces. So basically lets say at 50%, we need to find point in list below which 50% of the list elements are present and so on.
You can try something like this:
responseTimes = [1, 2, 3, 3, 4, 5, 6, 7, 9, 1]
for time in range(3,10):
    percentage = len([x for x in responseTimes if x <= time])/(len(responseTimes))
    print(f'{percentage*100}%')
"So basically lets say at 50%, we need to find point in list below which 50% of the list elements are present and so on"
responseTimes = [1, 2, 3, 3, 4, 5, 6, 7, 9, 1]
percentage = 0
time = 0
while percentage < 0.5:
    # advance to the next candidate time, then measure the share served within it
    time += 1
    percentage = len([x for x in responseTimes if x <= time])/(len(responseTimes))
print(f'50% of the requests were served within {time} ms')
You basically need to compute the cumulative ratio of the sorted response times.
from collections import Counter
values = [1, 2, 3, 3, 4, 5, 6, 7, 9, 1]
frequency = Counter(values) # {1: 2, 2: 1, 3: 2, ...}
total = 0
n = len(values)
for time in sorted(frequency):
    total += frequency[time]
    print(time, f'{100*total/n}%')
This will print all times with the corresponding ratios.
1 20.0%
2 30.0%
3 50.0%
4 60.0%
5 70.0%
6 80.0%
7 90.0%
9 100.0%
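To get the Apache Bench style table from the question, one option is to sort the times and, for each percentage, pick the smallest time that covers at least that share of the requests. This is a minimal sketch using a nearest-rank rule. The function name `served_within` is made up for illustration, and Apache Bench itself may round differently.

```python
def served_within(times, percentages=(50, 60, 70, 80, 90, 100)):
    """Map each percentage to the smallest time covering that share of requests."""
    ordered = sorted(times)
    n = len(ordered)
    table = {}
    for p in percentages:
        # Nearest-rank: number of requests needed to reach p percent (rounded up).
        rank = max(1, -(-p * n // 100))
        table[p] = ordered[rank - 1]
    return table

response_times = [1, 2, 3, 3, 4, 5, 6, 7, 9, 1]
for p, t in served_within(response_times).items():
    print(f'{p}% {t}')
```

For the list in the question this prints 3, 4, 5, 6, 7 and 9 for 50% through 100%, matching the requested output.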

Python: Find outliers inside a list

I have a list with a random number of integers and/or floats. What I'm trying to achieve is to find the exceptions among my numbers (hoping to use the right words to explain this). For example:
list = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
90 to 99% of my integer values are between 1 and 20
sometimes there are values that are much higher, let's say somewhere around 100 or 1.000 or even more
My problem is, that these values can be different all the time. Maybe the regular range is somewhere between 1.000 to 1.200 and the exceptions are in the range of half a million.
Is there a function to filter out these special numbers?
Assuming your list is l:
If you know you want to filter by a certain percentile/quantile, you can use:
This removes the values at or below the 10th percentile and at or above the 90th percentile. Of course, you can change either cut-off to whatever you need (for example, you can drop the lower filter and only cut at the 90th percentile in your example):
import numpy as np
l = np.array(l)
l = l[(l>np.quantile(l,0.1)) & (l<np.quantile(l,0.9))].tolist()
output:
[3, 2, 14, 2, 8, 4, 3, 5]
If you are not sure of the percentile cut-off and are looking to remove outliers:
You can adjust the cut-off for outliers through the argument m in the function call. The larger it is, the fewer outliers are removed. This function seems to be more robust to various types of outliers than other outlier-removal techniques.
import numpy as np
l = np.array(l)
def reject_outliers(data, m=6.):
    d = np.abs(data - np.median(data))
    mdev = np.median(d)
    s = d / (mdev if mdev else 1.)
    return data[s < m].tolist()
print(reject_outliers(l))
output:
[1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
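The same median-based idea can be sketched without NumPy, using only the standard library. This is an illustrative rewrite under the same assumptions as the function above (distance from the median, scaled by the median of those distances). The name `reject_outliers_mad` is hypothetical.

```python
from statistics import median

def reject_outliers_mad(data, m=6.0):
    """Keep values whose scaled distance from the median is below m."""
    center = median(data)
    distances = [abs(x - center) for x in data]
    mad = median(distances)
    scale = mad if mad else 1.0  # avoid dividing by zero when most values are equal
    return [x for x, d in zip(data, distances) if d / scale < m]

numbers = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
print(reject_outliers_mad(numbers))  # [1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
```

Here the median is 3 and the median distance is 2, so with m=6 anything 12 or more away from 3 is dropped, which removes 108 and 97 but keeps 14.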
You can use the built-in filter() function with a fixed threshold (here it keeps only the values greater than 5):
lst1 = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
lst2 = list(filter(lambda x: x > 5,lst1))
print(lst2)
Output:
[14, 108, 8, 97]
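Here is a method to filter out those outliers. It keeps only the values whose normal probability density, computed from the list's own mean and standard deviation, exceeds a threshold:
import math
_list = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
def consts(_list):
    # mean of the list
    mu = 0
    for i in _list:
        mu += i
    mu = mu/len(_list)
    # population standard deviation
    sigma = 0
    for i in _list:
        sigma += math.pow(i-mu,2)
    sigma = math.sqrt(sigma/len(_list))
    return sigma, mu
def frequence(x, sigma, mu):
    # normal probability density at x
    return (1/(sigma*math.sqrt(2*math.pi)))*math.exp(-(1/2)*math.pow(((x-mu)/sigma),2))
sigma, mu = consts(_list)
new_list = []
for i in range(len(_list)):
    # 0.01 is an ad hoc threshold; tune it for your data
    if frequence(_list[i], sigma, mu) > 0.01:
        new_list.append(_list[i])  # keep the value itself, not its index
print(new_list)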

How do I generate dice rolls with different probabilities?

I'm working on a homework question that asks for a dice-rolling program where one or two numbers are more likely to come up than the rest. For example, 2 and/or 3 might be 10% or 20% more likely to come up than the other faces of the die. I got the code to the point where it prints random numbers on the die, but I can't figure out how to get a weighted output.
input:
import random
def roll_dice(n, faces = 6):
    rolls = []
    rand = random.randrange
    for x in range (n):
        rolls.append(rand(1, faces + 1 ))
    return rolls
print (roll_dice(5))
output:
[5, 11, 6, 7, 6, 5]
import numpy as np
from scipy import stats
values = np.arange(1, 7)
prob = (0.1, 0.2, 0.3, 0.1, 0.1, 0.2) # probabilities must sum to 1
custm = stats.rv_discrete(values=(values, prob))
for i in range(10):
    print(custm.rvs())
1, 2, 3, 6, 3, 2, 3, 2, 2, 1
Source: scipy.stats.rv_discrete
If you are on Python 3.6 or later, you can use random.choices:
import random
rolls = random.choices([1,2,3,4,5,6],weights=[10,10,10,10,10,50], k=10)
rolls
out:[1, 3, 2, 5, 3, 4, 6, 6, 6, 4]
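To tie this back to the homework wording ("2 and/or 3 might be 10% or 20% more likely"), the weights can be built from a relative boost rather than typed by hand. This is a minimal sketch. `roll_weighted_dice`, `favoured`, `boost` and `rng` are names invented for illustration, and it reads "20% more likely" as a relative weight of 1.2 against 1.0, which is an assumption about what the exercise means.

```python
import random

def roll_weighted_dice(n, favoured=(2, 3), boost=0.2, faces=6, rng=random):
    """Roll n times; faces in `favoured` are `boost` more likely than the others."""
    values = list(range(1, faces + 1))
    # Relative weights: 1.0 for ordinary faces, 1.0 + boost for favoured ones.
    weights = [1.0 + boost if v in favoured else 1.0 for v in values]
    return rng.choices(values, weights=weights, k=n)

print(roll_weighted_dice(5))
```

random.choices normalises the weights itself, so they do not need to sum to 1. Passing `rng=random.Random(42)` (or any seeded instance) makes the rolls reproducible.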

Single line chunk re-assignment

As shown in the following code, I have a chunk list x and the full list h. I want to reassign back the values stored in x in the correct positions of h.
index = 0
for t1 in range(lbp, ubp):
    h[4 + t1] = x[index]
    index = index + 1
Does anyone know how to write it in a single line/expression?
Disclaimer: This is part of a bigger project and I simplified the questions as much as possible. You can expect the matrix sizes to be correct but if you think I am missing something please ask for it. For testing you can use the following variable values:
h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x = [20, 21]
lbp = 2
ubp = 4
You can use slice assignment to expand on the left-hand side and assign your x list directly to the indices of h, e.g.:
h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x = [20, 21]
lbp = 2
ubp = 4
h[4 + lbp:4 + ubp] = x # or better yet h[4 + lbp:4 + lbp + len(x)] = x
print(h)
# [1, 2, 3, 4, 5, 6, 20, 21, 9, 10]
I'm not really sure why you are adding 4 to the indexes in your loop, or what lbp and ubp are supposed to mean, though. Keep in mind that when you assign to a slice like this, the list on the right-hand side should have the same length as the slice; otherwise Python will not raise an error but will silently grow or shrink h.
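The length caveat is easy to see with the test values from the question. The sketch below is only an illustration. The variable `offset` stands in for the literal 4 from the loop, and the final check is one possible guard, not something the question requires.

```python
h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x = [20, 21]
offset, lbp, ubp = 4, 2, 4

# Same length on both sides: elements are replaced in place.
same = h.copy()
same[offset + lbp:offset + ubp] = x
print(same)    # [1, 2, 3, 4, 5, 6, 20, 21, 9, 10]

# Longer right-hand side: the list grows instead of raising an error.
longer = h.copy()
longer[offset + lbp:offset + ubp] = [20, 21, 22]
print(longer)  # [1, 2, 3, 4, 5, 6, 20, 21, 22, 9, 10]

# Guarding against silent resizing with an explicit length check.
start, stop = offset + lbp, offset + ubp
assert stop - start == len(x), "slice and replacement differ in length"
```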

How do I implement this similarity measure in Python?

I tried implementing the distance measure shown in the image, in Python as such:
import numpy as np
A = [1, 2, 3, 4, 5, 6, 7, 8, 1]
B = [1, 2, 3, 2, 4, 6, 7, 8, 2]
A = np.asarray(A).flatten()
B = np.asarray(B).flatten()
x = np.sum(1 - np.divide((1 + np.minimum(A, B)), (1 + np.maximum(A, B))))
print("Distance: {}".format(x))
but after testing, it doesn't seem to be the right approach. The maximum value returned if there's no similarity at all between the given vectors should be 1, with 0 as perfect similarity. A and B in the image are both vectors with size m.
Edit: forgot to add that I ignored the part for min(A, B) < 0 as that won't ever happen for my intentions
This should work. First, we create a matrix AB by stacking the columns and calculate the minimum vector AB_min and maximum vector AB_max out of that. Then, we compute D as you defined it, making use of numpy.where to specify the two conditions. After that, we sum the elements to get the D_proposed as you defined it. It gives a value of 0.9 for this example.
import numpy as np
A = [1, 2, 3, 4, 5, 6, 7, 8, 1]
B = [1, 2, 3, 2, 4, 6, 7, 8, 2]
AB = np.column_stack((A,B))
AB_min = np.min(AB,1)
AB_max = np.max(AB,1)
print(AB_min)
print(AB_max)
D = np.where(AB_min >= 0.,
             1. - (1. + AB_min) / (1. + AB_max),
             1. - (1. + AB_min + abs(AB_min)) / (1. + AB_max + abs(AB_min)))
print(D)
D_proposed = np.sum(D)
print(D_proposed)
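Each term of D lies between 0 and 1 (1 excluded), so the plain sum can approach m rather than 1. If the measure is meant to stay between 0 and 1, as the question expects, one possibility is to divide the sum by m. The formula image is not reproduced here, so that normalisation is an assumption rather than the definition, and `proposed_distance` is a hypothetical name. The sketch uses plain Python so the arithmetic is easy to follow.

```python
def proposed_distance(a, b, normalise=True):
    """Sum (or mean) of 1 - (1 + min) / (1 + max) per element pair; inputs non-empty."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for x, y in zip(a, b):
        lo, hi = min(x, y), max(x, y)
        shift = abs(lo) if lo < 0 else 0  # same shift as the negative branch above
        total += 1 - (1 + lo + shift) / (1 + hi + shift)
    return total / len(a) if normalise else total

A = [1, 2, 3, 4, 5, 6, 7, 8, 1]
B = [1, 2, 3, 2, 4, 6, 7, 8, 2]
print(proposed_distance(A, B, normalise=False))  # about 0.9
print(proposed_distance(A, B))                   # about 0.1
```

Only three positions differ in this example, contributing 0.4, 1/6 and 1/3, which is where the 0.9 comes from.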
