How to calculate percentiles given array of values? [closed]


Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I want to calculate percentile values for 10%, 50% and 90%. The inputs would be the percentile you want to find and an array of values. How would I do this? It's been a while since stats...
Help in PowerShell or Python would be appreciated.
Edit: Sorry, I meant creating my own function rather than using a pre-built function or library.

You can do it using numpy in the following way:
import numpy as np
a = np.array([1,2,3,4,5])
p = np.percentile(a, 50)
You can read more about the percentile function in the numpy documentation for np.percentile.
Another option is statistics.quantiles, which returns a list of n - 1 cut points separating the n quantile intervals.
Example of use:
from statistics import quantiles
quantiles([1, 2, 3, 4, 5], n=100)
# [0.06, 0.12, 0.18, 0.24, 0.3, 0.36, 0.42, 0.48, 0.54, 0.6, 0.66, 0.72, 0.78, 0.84, 0.9, 0.96, 1.02, 1.08, 1.14, 1.2, 1.26, 1.32, 1.38, 1.44, 1.5, 1.56, 1.62, 1.68, 1.74, 1.8, 1.86, 1.92, 1.98, 2.04, 2.1, 2.16, 2.22, 2.28, 2.34, 2.4, 2.46, 2.52, 2.58, 2.64, 2.7, 2.76, 2.82, 2.88, 2.94, 3.0, 3.06, 3.12, 3.18, 3.24, 3.3, 3.36, 3.42, 3.48, 3.54, 3.6, 3.66, 3.72, 3.78, 3.84, 3.9, 3.96, 4.02, 4.08, 4.14, 4.2, 4.26, 4.32, 4.38, 4.44, 4.5, 4.56, 4.62, 4.68, 4.74, 4.8, 4.86, 4.92, 4.98, 5.04, 5.1, 5.16, 5.22, 5.28, 5.34, 5.4, 5.46, 5.52, 5.58, 5.64, 5.7, 5.76, 5.82, 5.88, 5.94]
quantiles([1, 2, 3, 4, 5], n=100)[49]
# 3.0 (index 49 is the 50th percentile)
Edit
To create your own function please refer to the following link: https://code.activestate.com/recipes/511478-finding-the-percentile-of-the-values/
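Since the question asks for a hand-written function, here is a minimal sketch of one. It assumes linear interpolation between the two nearest ranks, which is the same convention np.percentile uses by default. The name percentile and its signature are illustrative and are not taken from the linked recipe.

```python
import math


def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 100) of values,
    interpolating linearly between the two closest ranks."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    data = sorted(values)
    # Fractional position of the requested percentile in the sorted data.
    rank = (len(data) - 1) * p / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:
        return data[lower]
    # Weight the two neighbours by how far the rank lies between them.
    fraction = rank - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


for p in (10, 50, 90):
    print(p, percentile([20, 2, 7, 1, 34], p))
```

Other interpolation rules exist (nearest rank, for example), so results can differ slightly from other tools on small samples.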

You can find percentiles with numpy:
import numpy as np
arr = [20, 2, 7, 1, 34]
percentile_arr = [10,50,90]
for p in percentile_arr:
    percentile = np.percentile(arr, p)
    print(f"{p}th percentile of array is : {percentile}")
Edit
You can find different approaches with and without numpy here
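np.percentile also accepts a sequence of percentiles, so the loop is only needed for printing. A short sketch using the same data (the variable names percentiles and results are just for illustration):

```python
import numpy as np

arr = [20, 2, 7, 1, 34]
percentiles = [10, 50, 90]

# One call returns an array with one result per requested percentile.
results = np.percentile(arr, percentiles)

for p, value in zip(percentiles, results):
    print(f"{p}th percentile of array is : {value}")
```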

By an array, do you mean a list?
If so, a for loop is a good option:
my_values = [...]
result = []
percentage = 0.5  # fraction to multiply each value by
for i in my_values:
    result.append(i * percentage)
The append method of the list result is a way of telling Python "hey, I want you to add this thing to the end of this list".

Related

Pandas Group By column to generate quantiles (.25, 0.5, .75)

Let's say we have CityName, Min-Temperature, Max-Temperature and Humidity for different cities.
We need an output dataframe grouped on CityName, with the 0.25, 0.5 and 0.75 quantiles of each column. The new column names would be OldColumnName + 'Q1'/'Q2'/'Q3'.
Example INPUT
import pandas as pd
df = pd.DataFrame({'cityName': pd.Categorical(['a','a','a','a','b','b','b','b','a','a','a','a','b','b','b','b']),
                   'MinTemp': [1.1, 2.1, 3.1, 1.1, 2, 2.1, 2.2, 2.4, 2.5, 1.11, 1.31, 2.1, 1, 2, 2.3, 2.1],
                   'MaxTemp': [2.1, 4.2, 5.1, 2.13, 4, 3.1, 5.2, 3.4, 3.5, 2.11, 2.31, 3.1, 2, 4.3, 4.3, 3.1],
                   'Humidity': [0.29, 0.19, .45, 0.1, 0.1, 0.1, 0.2, 0.5, 0.11, 0.31, 0.1, .1, .2, 0.3, 0.3, 0.1]
                   })
OUTPUT

First Approach
First, group your data on the column you want, which is 'cityName'. Then, because you want several different aggregations on each column, you can use the 'agg' function. Functions passed to 'agg' in a list cannot be given parameters, so define one per quantile as follows:
def quantile_50(x):
    return x.quantile(0.5)
def quantile_25(x):
    return x.quantile(0.25)
def quantile_75(x):
    return x.quantile(0.75)
quantile_df = df.groupby('cityName').agg([quantile_25, quantile_50, quantile_75])
quantile_df
Second Approach
You can use the describe method and select the statistics you need. With pd.IndexSlice (idx below) you can pick which sub-columns to keep.
idx = pd.IndexSlice
df.groupby('cityName').describe().loc[:, idx[:, ['25%', '50%', '75%']]]
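Neither approach produces the flat OldColumnName + Q1/Q2/Q3 names the question asks for. A possible sketch of that last step, using groupby(...).quantile with a list of quantiles and then flattening the column index. The small frame and the labels mapping are made up for illustration; they are not the question's data.

```python
import pandas as pd

# Small made-up frame with the same layout as the question's example.
df = pd.DataFrame({
    "cityName": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "MinTemp": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
    "Humidity": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
})

# Assumed mapping from quantile to the suffix requested in the question.
labels = {0.25: "Q1", 0.5: "Q2", 0.75: "Q3"}

# quantile() with a list gives one row per (city, quantile);
# unstack() moves the quantile level into the columns.
wide = df.groupby("cityName").quantile(list(labels)).unstack()

# Flatten the (column, quantile) pairs into names such as "MinTempQ1".
wide.columns = [f"{col}{labels[q]}" for col, q in wide.columns]
print(wide)
```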

Is there a parameter to set the precision for numpy.linspace?

I am trying to check if a numpy array contains a specific value:
>>> x = np.linspace(-5,5,101)
>>> x
array([-5. , -4.9, -4.8, -4.7, -4.6, -4.5, -4.4, -4.3, -4.2, -4.1, -4. ,
-3.9, -3.8, -3.7, -3.6, -3.5, -3.4, -3.3, -3.2, -3.1, -3. , -2.9,
-2.8, -2.7, -2.6, -2.5, -2.4, -2.3, -2.2, -2.1, -2. , -1.9, -1.8,
-1.7, -1.6, -1.5, -1.4, -1.3, -1.2, -1.1, -1. , -0.9, -0.8, -0.7,
-0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0. , 0.1, 0.2, 0.3, 0.4,
0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3, 1.4, 1.5,
1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6,
2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7,
3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8,
4.9, 5. ])
>>> -5. in x
True
>>> a = 0.2
>>> a
0.2
>>> a in x
False
I assigned a constant to variable a. It seems that the precision of a is not compatible with the elements in the numpy array generated by np.linspace().
I've searched the docs, but didn't find anything about this.

This is not a question of the precision of np.linspace, but rather of the type of the elements in the generated array.
np.linspace generates elements which, conceptually, equally divide the input range between them. However, these elements are then stored as floating point numbers with limited precision, which makes the generation process itself appear to lack precision.
By passing the dtype argument to np.linspace, you can specify the precision of the floating point type used to store its result, which can increase the apparent precision of the generation process.
Nevertheless, you should not use the equality operator to compare floating point numbers. Instead, use np.isclose in conjunction with np.ndarray.any, or some equivalent:
>>> floats_64 = np.linspace(-5, 5, 101, dtype='float64')
>>> floats_128 = np.linspace(-5, 5, 101, dtype='float128')
>>> print(0.2 in floats_64)
False
>>> print(floats_64[52])
0.20000000000000018
>>> print(np.isclose(0.2, floats_64).any()) # check if any element in floats_64 is close to 0.2
True
>>> print(0.2 in floats_128)
False
>>> print(floats_128[52])
0.20000000000000017764
>>> print(np.isclose(0.2, floats_128).any()) # check if any element in floats_128 is close to 0.2
True
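The membership check can be wrapped in a small helper. This is only a sketch: the name contains_close is made up, and it relies on np.isclose's default tolerances, which may need tightening or loosening for other value ranges.

```python
import numpy as np


def contains_close(arr, value):
    """Return True if any element of arr is approximately equal to value."""
    # np.isclose is called with its default tolerances (rtol=1e-05, atol=1e-08).
    return bool(np.isclose(arr, value).any())


x = np.linspace(-5, 5, 101)
print(contains_close(x, 0.2))   # 0.2 is a grid point, up to rounding error
print(contains_close(x, 0.25))  # 0.25 lies halfway between two grid points
```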

How to efficiently generate matrix from vector in numpy? [duplicate]

This question already has answers here:
Most efficient way to map function over numpy array
(11 answers)
Numpy vectorize function with non-scalar output
(1 answer)
Closed 5 years ago.
I have a function f(x):[0,1]-> Rⁿ such as:
>>> f(0.54)
array([0.2, 0.3, 4.0, 5.2, ... , 1.0])
How can I efficiently apply that to a vector, in order to generate a matrix?
Example:
>>> f([0.54, 0.32, 0.56, 0.21])
array([[0.2, 0.3, 4.0, 5.2, ... , 1.0],
       [0.6, 0.1, 0.0, 2.3, ... , 4.7],
       [0.1, 7.1, 0.2, 4.9, ... , 3.1],
       [1.3, 2.8, 1.2, 1.1, ... , 5.3]])
Note: numpy solutions are very welcome :)
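One way this could be done, sketched under assumptions: the function f below is a hypothetical stand-in (the real f is not shown in the question), and apply_rows is an illustrative name. np.stack builds the matrix from one row per input, and np.vectorize with a signature does the same through NumPy's own machinery. Both still call f once per element, so if f can be rewritten to accept a whole array, broadcasting inside f will usually be faster.

```python
import numpy as np


def f(x, n=4):
    # Hypothetical stand-in for the question's f: maps a scalar to a length-n vector.
    return x ** np.arange(1, n + 1)


def apply_rows(func, xs):
    """Call func on each scalar in xs and stack the results as matrix rows."""
    return np.stack([func(x) for x in xs])


xs = [0.54, 0.32, 0.56, 0.21]
matrix = apply_rows(f, xs)

# The signature tells np.vectorize that a scalar input yields a 1-D output.
f_vec = np.vectorize(f, signature="()->(n)")
matrix_vec = f_vec(xs)

print(matrix.shape)
```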

How do I write a custom generator function with python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I have this
for A in [0, -0.25, 0.25, -0.5, 0.5, -0.75, 0.75, -1.0, 1.0, -1.25, 1.25, -1.5, 1.5, -1.75, 1.75, -2.0, 2.0, -2.25, 2.25, -2.5, 2.5, -2.75, 2.75, -3.0, 3.0, -3.25, 3.25, -3.5, 3.5, -3.75, 3.75, -4.0, 4.0, -4.25, 4.25, -4.5, 4.5, -4.75, 4.75, -5.0, 5.0]:
Is it possible to do this with a generator function? This is what I have now:
def frange(start, stop, step=1.0):
    while start <= stop:
        yield start
        start += step
and I use it like this:
for error in self.frange(-2.5, 2.5, 0.25):
but it returns [-2.5, -2.25, ..., 0, ..., 2.25, 2.5], which is very hard for my program to work with, because I am looking for the value nearest to zero and I don't know how many combinations there could be.
I need to start from zero, and each next value must appear as both a negative and a positive value,
like [0, -0.25, 0.25, ...].

Maybe you meant a generator instead of a lambda:
def opposing_numbers(increment, maximum):
    yield 0
    value = increment
    while value <= maximum:
        yield -value
        yield value
        value += increment
Then call it as:
opposing_numbers(0.25, 5)
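Calling the function only creates a generator object; the values appear once it is iterated. A minimal sketch of consuming it follows. This variant counts whole steps and multiplies, rather than adding the increment repeatedly, so rounding errors do not build up for increments such as 0.1. That change is a swapped-in detail and not part of the answer above, and it assumes maximum is a whole multiple of increment.

```python
def opposing_numbers(increment, maximum):
    """Yield 0, -increment, +increment, -2*increment, ... up to maximum."""
    yield 0.0
    # Assumption: maximum is (close to) a whole multiple of increment.
    steps = round(maximum / increment)
    for k in range(1, steps + 1):
        # Multiply instead of accumulating so rounding errors do not add up.
        value = k * increment
        yield -value
        yield value


values = list(opposing_numbers(0.25, 5))
print(values[:5])

# The generator can also be used directly in a for loop.
for error in opposing_numbers(0.25, 1):
    print(error)
```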

You could use the numpy.arange() function, and then sort the values by their absolute value:
import numpy as np
answer = sorted(np.arange(-5, 5.25, 0.25), key=abs)
print(answer)
Output
[0.0,
-0.25,
0.25,
-0.5,
0.5,
...,
-4.5,
4.5,
-4.75,
4.75,
-5.0,
5.0]

Try this:
[(i // 2) * 0.25 * (2*(i&1)- 1) for i in range(1,42)]

Here is a cool one-liner:
from itertools import chain
list(chain(*zip([i /4.0 for i in range(10)], [-i/4.0 for i in range(9)])))
[0.0, 0.0, 0.25, -0.25, 0.5, -0.5, 0.75, -0.75, 1.0, -1.0, 1.25, -1.25, 1.5, -1.5, 1.75, -1.75, 2.0, -2.0]

Convert pandas DataFrame into list of lists [duplicate]

This question already has answers here:
Pandas DataFrame to List of Lists
(14 answers)
Closed 3 years ago.
I have a pandas data frame like this:
admit gpa gre rank
0 3.61 380 3
1 3.67 660 3
1 3.19 640 4
0 2.93 520 4
Now I want to get a list of rows in pandas like:
[[0,3.61,380,3], [1,3.67,660,3], [1,3.19,640,4], [0,2.93,520,4]]
How can I do it?

There is a built-in method, which is also the fastest: call tolist on the .values NumPy array:
df.values.tolist()
[[0.0, 3.61, 380.0, 3.0],
[1.0, 3.67, 660.0, 3.0],
[1.0, 3.19, 640.0, 4.0],
[0.0, 2.93, 520.0, 4.0]]

You can do it like this (in Python 3, map returns an iterator, so wrap it in list):
list(map(list, df.values))

EDIT: as_matrix is deprecated since version 0.23.0
You can use the built-in values attribute or the to_numpy method (the recommended option) on the dataframe:
In [8]:
df.to_numpy()
Out[8]:
array([[ 0.9, 7. , 5.2, ..., 13.3, 13.5, 8.9],
[ 0.9, 7. , 5.2, ..., 13.3, 13.5, 8.9],
[ 0.8, 6.1, 5.4, ..., 15.9, 14.4, 8.6],
...,
[ 0.2, 1.3, 2.3, ..., 16.1, 16.1, 10.8],
[ 0.2, 1.3, 2.4, ..., 16.5, 15.9, 11.4],
[ 0.2, 1.3, 2.4, ..., 16.5, 15.9, 11.4]])
If you explicitly want lists and not a numpy array add .tolist():
df.to_numpy().tolist()
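Converting through a single NumPy array gives every column one common dtype, which is why the integers in the earlier output came back as floats. A sketch of one alternative, assuming the integer columns should stay integers as in the list the question asks for; the frame below is rebuilt from the question's table.

```python
import pandas as pd

df = pd.DataFrame({
    "admit": [0, 1, 1, 0],
    "gpa": [3.61, 3.67, 3.19, 2.93],
    "gre": [380, 660, 640, 520],
    "rank": [3, 3, 4, 4],
})

# Going through one NumPy array upcasts the integer columns to float.
as_floats = df.to_numpy().tolist()

# Building each row from the columns keeps every column's own type.
rows = [list(row) for row in df.itertuples(index=False, name=None)]
print(rows)
```

On large frames the itertuples version is likely slower than to_numpy().tolist(), so it is worth using only when the mixed types matter.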
