How can I get a suitable representation of levels in pandas.cut? - python

Is there an easy way to obtain the values of the levels produced by pandas.cut?
For example:
import numpy as np
import pandas as pd
x = pd.cut(np.arange(0,20), 10)
x
Out[1]:
(-0.019, 1.9]
(-0.019, 1.9]
(1.9, 3.8]
(1.9, 3.8]
(3.8, 5.7]
(3.8, 5.7]
(5.7, 7.6]
(5.7, 7.6]
(7.6, 9.5]
(7.6, 9.5]
(9.5, 11.4]
(9.5, 11.4]
(11.4, 13.3]
(11.4, 13.3]
(13.3, 15.2]
(13.3, 15.2]
(15.2, 17.1]
(15.2, 17.1]
(17.1, 19]
(17.1, 19]
Levels (10): Index(['(-0.019, 1.9]', '(1.9, 3.8]', '(3.8, 5.7]',
'(5.7, 7.6]', '(7.6, 9.5]', '(9.5, 11.4]',
'(11.4, 13.3]', '(13.3, 15.2]', '(15.2, 17.1]',
'(17.1, 19]'], dtype=object)
What I would like to get is something like:
x.magic_method
Out[2]:
[[-0.019, 1.9], [1.9, 3.8], [3.8, 5.7],
[5.7, 7.6], [7.6, 9.5], [9.5, 11.4],
[11.4, 13.3], [13.3, 15.2], [15.2, 17.1],
[17.1, 19]]
or some other representation that is easier to manipulate. Instead, x.levels gives me an Index of unicode strings, so I have to use a couple of loops to parse the numbers out of it.
UPDATE:
By the way, I need a solution that also works when the second argument is a sequence of bin edges: pd.cut(np.arange(0,20), arr)

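You can convert the list of unicode labels to a float array with the following code:
import numpy as np
import pandas as pd
x = pd.cut(np.arange(0,20), 10)
# strip the brackets from each label, split on the comma, parse both ends as floats
# (x.levels is the attribute name in the question's pandas version)
np.array([t[1:-1].split(",") for t in x.levels], dtype=float)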
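In later pandas releases the Categorical attribute is categories rather than levels, and in recent versions (roughly 0.20 onward) the categories are Interval objects with numeric left and right endpoints, so no string parsing should be needed. A sketch assuming a recent pandas; the explicit edges [0, 5, 10, 20] are only an illustration of the UPDATE case:

```python
import numpy as np
import pandas as pd

# Equal-width bins, as in the question
x = pd.cut(np.arange(0, 20), 10)
cats = x.categories                                # an IntervalIndex in recent pandas
edges = np.column_stack([cats.left, cats.right])  # shape (10, 2): one [left, right] row per bin

# Explicit bin edges (the UPDATE case); these edges are illustrative only
y = pd.cut(np.arange(0, 20), [0, 5, 10, 20])
pairs = [[iv.left, iv.right] for iv in y.categories]
```

The endpoints are the rounded values pandas uses for the labels (controlled by the precision argument of pd.cut), so they should match what the printed labels show.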

You can do this, but it is probably better to explain what you are actually trying to do; e.g. you already have the Categorical variable.
In [27]: x, bins = pd.cut(np.arange(0,20), 10, retbins=True)
In [28]: [ [ round(l,3), round(r,3) ] for l, r in zip(bins[:-1],bins[1:]) ]
Out[28]:
[[-0.019, 1.9],
[1.9, 3.8],
[3.8, 5.7],
[5.7, 7.6],
[7.6, 9.5],
[9.5, 11.4],
[11.4, 13.3],
[13.3, 15.2],
[15.2, 17.1],
[17.1, 19.0]]
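The same retbins approach handles the UPDATE in the question, where the second argument is a sequence of edges: pandas hands those edges back as bins, so pairing each edge with its neighbour gives the intervals. A sketch; the edges in arr are hypothetical:

```python
import numpy as np
import pandas as pd

arr = [0, 4, 9, 15, 19]  # hypothetical bin edges standing in for the question's arr
x, bins = pd.cut(np.arange(0, 20), arr, retbins=True)

# Pair each edge with the next one: [[left, right], ...]
intervals = [[float(left), float(right)] for left, right in zip(bins[:-1], bins[1:])]
print(intervals)
```

With explicit edges no rounding is needed, because the returned bins are the edges you passed in.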


How to calculate percentiles given array of values? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I want to calculate percentile values for 10%, 50% and 90%. So the inputs would be a percentile you want to find and an array of values to calculate. How would I do this? It's been a while since stats...
Help in powershell or python would be appreciated.
Edit: Sorry, I meant writing my own function rather than using a pre-built function/library.
You can do it using numpy in the following way:
import numpy as np
a = np.array([1,2,3,4,5])
p = np.percentile(a, 50)
You can read more about the percentile function in the attached link.
Another option is statistics.quantiles (Python 3.8+), which returns a list of n - 1 cut points separating the n quantile intervals.
Example of use:
from statistics import quantiles
quantiles([1, 2, 3, 4, 5], n=100)
# [0.06, 0.12, 0.18, 0.24, 0.3, 0.36, 0.42, 0.48, 0.54, 0.6, 0.66, 0.72, 0.78, 0.84, 0.9, 0.96, 1.02, 1.08, 1.14, 1.2, 1.26, 1.32, 1.38, 1.44, 1.5, 1.56, 1.62, 1.68, 1.74, 1.8, 1.86, 1.92, 1.98, 2.04, 2.1, 2.16, 2.22, 2.28, 2.34, 2.4, 2.46, 2.52, 2.58, 2.64, 2.7, 2.76, 2.82, 2.88, 2.94, 3.0, 3.06, 3.12, 3.18, 3.24, 3.3, 3.36, 3.42, 3.48, 3.54, 3.6, 3.66, 3.72, 3.78, 3.84, 3.9, 3.96, 4.02, 4.08, 4.14, 4.2, 4.26, 4.32, 4.38, 4.44, 4.5, 4.56, 4.62, 4.68, 4.74, 4.8, 4.86, 4.92, 4.98, 5.04, 5.1, 5.16, 5.22, 5.28, 5.34, 5.4, 5.46, 5.52, 5.58, 5.64, 5.7, 5.76, 5.82, 5.88, 5.94]
quantiles([1, 2, 3, 4, 5], n=100)[49]
# 3.0 (the 50th percentile, i.e. the median)
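The default method='exclusive' can extrapolate beyond the data for small samples (note the first cut point above, 0.06, is below the minimum value 1). Passing method='inclusive' keeps the cut points inside the data range; on this sample it gives the same 10th, 50th and 90th percentiles as np.percentile's default. A sketch:

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5]

# n=10 returns 9 cut points: the 10th, 20th, ..., 90th percentiles
deciles = quantiles(data, n=10, method="inclusive")
p10, p50, p90 = deciles[0], deciles[4], deciles[8]
print(p10, p50, p90)
```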
Edit
To create your own function please refer to the following link: https://code.activestate.com/recipes/511478-finding-the-percentile-of-the-values/
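Since the edit asks for a hand-written function rather than a library call, here is a minimal sketch of one common definition: linear interpolation between the two closest ranks of the sorted data. Several percentile definitions exist, so which one you want is an assumption; this one agrees with np.percentile's default on the sample data used in the next answer. The function name percentile is illustrative.

```python
import math


def percentile(values, p):
    """p-th percentile (0 <= p <= 100) of values, interpolating linearly
    between the two closest ranks of the sorted data."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    data = sorted(values)
    k = (len(data) - 1) * p / 100    # fractional position in the sorted data
    lower = math.floor(k)
    upper = math.ceil(k)
    if lower == upper:               # k falls exactly on a data point
        return float(data[lower])
    weight = k - lower               # how far k lies between the two neighbours
    return data[lower] * (1 - weight) + data[upper] * weight


for p in (10, 50, 90):
    print(p, percentile([20, 2, 7, 1, 34], p))
```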
You can find percentiles with numpy:
import numpy as np
arr = [20, 2, 7, 1, 34]
percentile_arr = [10, 50, 90]
for p in percentile_arr:
    percentile = np.percentile(arr, p)
    print(f"{p}th percentile of array is : {percentile}")
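np.percentile also accepts a sequence of percentiles, so all three values can be computed in one call. A minimal sketch using the same data:

```python
import numpy as np

arr = [20, 2, 7, 1, 34]
percentile_arr = [10, 50, 90]

# One call returns one value per requested percentile
results = np.percentile(arr, percentile_arr)
for p, value in zip(percentile_arr, results):
    print(f"{p}th percentile of array is : {value}")
```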
Edit
You can find different approaches with and without numpy here
Hmm, by an array do you mean a list?
If so, a for loop is a good option:
my_values = [...]
result = []
percentage = 0.5
for i in my_values:
    result.append(i * percentage)
The append method of the list result is a way of telling Python "add this item to the end of the list".

Append selected values from a multi dimensional array to a new array

Hello :) I am a Python beginner and I started working with numpy lately. I have an nd-array with data.shape = (55000, 784), filled with float32 values. Based on a condition I made, I want to append specific rows (with all their columns) to a new array, and it's important that the format stays the same, e.g. I want data[5][0-784] appended to an empty array. I heard about something called fancy indexing but still couldn't figure out how to use it; an example would help me out big time. I would appreciate any help from you guys! - Greets
I'd recommend skimming through the documentation for Indexing. But, here is an example to demonstrate.
import numpy as np
data = np.array([[0, 1, 2], [3, 4, 5]])
print(data.shape)
(2, 3)
print(data)
[[0 1 2]
[3 4 5]]
selection = data[1, 1:3]
print(selection)
[4 5]
Fancy indexing is an advanced indexing feature that lets you index with integer arrays. Here is an example.
fancy_selection = data[[0, 1], [0, 2]]
print(fancy_selection)
[0 5]
Since you also asked about appending, have a look at Append a NumPy array to a NumPy array. Here is an example anyway.
data_two = np.array([[6, 7, 8]])
appended_array = np.concatenate((data, data_two))
print(appended_array)
[[0 1 2]
[3 4 5]
[6 7 8]]
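For the question's actual case (whole rows chosen by a condition, keeping the float32 row format), a boolean mask over the rows may be the most direct tool. A sketch on a small stand-in array; the condition data[:, 0] > 4 is hypothetical:

```python
import numpy as np

# Small stand-in for the (55000, 784) float32 array from the question
data = np.arange(12, dtype=np.float32).reshape(4, 3)

# Hypothetical condition: keep rows whose first column is greater than 4
row_mask = data[:, 0] > 4
selected = data[row_mask]            # boolean indexing keeps whole rows, shape (2, 3)

# The same rows via fancy indexing with explicit row numbers
rows = np.flatnonzero(row_mask)      # indices of the rows that passed
selected_too = data[rows]

# If rows must be picked one at a time, collect them in a list and stack once
picked = [row for row in data if row[0] > 4]
stacked = np.vstack(picked) if picked else np.empty((0, data.shape[1]), dtype=data.dtype)
```

All three results keep the original dtype and row width; stacking once at the end avoids the repeated copying that appending row by row would cause.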
As @hpaulj recommends in his comment, appending to arrays is possible but inefficient and should be avoided. Let's turn to your example, but make the numbers a bit smaller.
rows, cols = np.ogrid[1:5, 0.1:0.39:0.1]  # shapes (4, 1) and (1, 3)
a = rows + cols  # broadcasting builds the (4, 3) grid
a
# array([[ 1.1, 1.2, 1.3],
# [ 2.1, 2.2, 2.3],
# [ 3.1, 3.2, 3.3],
# [ 4.1, 4.2, 4.3]])
a.shape
# (4, 3)
Selecting an element:
a[1,2]
# 2.3
Selecting an entire row:
a[2, :]  # or a[2] or a[2, ...]
# array([ 3.1, 3.2, 3.3])
or column:
a[:, 1] # or a[..., 1]
# array([ 1.2, 2.2, 3.2, 4.2])
Fancy indexing; observe that the first index is not a slice but a list or array:
a[[3,0,0,1], :] # or a[[3,0,0,1]]
# array([[ 4.1, 4.2, 4.3],
# [ 1.1, 1.2, 1.3],
# [ 1.1, 1.2, 1.3],
# [ 2.1, 2.2, 2.3]])
Fancy indexing can be used on multiple axes to select arbitrary elements and assemble them into a new shape. For example, you could make a 2x2x2 array like so:
a[ [[[0,1], [1,2]], [[3,3], [3,2]]], [[[2,1], [1,1]], [[2,1], [0,0]]] ]
# array([[[ 1.3, 2.2],
# [ 2.2, 3.2]],
#
# [[ 4.3, 4.2],
# [ 4.1, 3.1]]])
There is also logical (boolean) indexing:
mask = np.isclose(a%1.1, 1.0)
mask
# array([[False, False, False],
# [ True, False, False],
# [False, True, False],
# [False, False, True]], dtype=bool)
a[mask]
# array([ 2.1, 3.2, 4.3])
To combine arrays, collect them in a list and use concatenate:
np.concatenate([a[1:, :2], a[:0:-1, [2,0]]], axis=1)
# array([[ 2.1, 2.2, 4.3, 4.1],
# [ 3.1, 3.2, 3.3, 3.1],
# [ 4.1, 4.2, 2.3, 2.1]])
Hope that helps get you started.

How to merge the values of a list of lists and a list into one resulting list of lists

I have a list of lists (a) and a list (b) which have the same "length" (in this case "4"):
a = [
[1.0, 2.0],
[1.1, 2.1],
[1.2, 2.2],
[1.3, 2.3]
]
b = [3.0, 3.1, 3.2, 3.3]
I would like to merge the values to obtain the following (c):
c = [
[1.0, 2.0, 3.0],
[1.1, 2.1, 3.1],
[1.2, 2.2, 3.2],
[1.3, 2.3, 3.3]
]
currently I'm doing the following to achieve it:
c = []
for index, elem in enumerate(a):
    x = [a[index], [b[index]]]  # x assigned here for better readability
    c.append(sum(x, []))
My feeling is that there is a more elegant way to do this...
Note: the real lists are a lot larger; I shortened them for simplicity. They are always(!) the same length.
In Python 3.5+, use zip() within a list comprehension together with iterable unpacking:
In [7]: [[*j, i] for i, j in zip(b, a)]
Out[7]: [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.2, 2.2, 3.2], [1.3, 2.3, 3.3]]
In Python 2:
In [8]: [j+[i] for i, j in zip(b, a)]
Out[8]: [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [1.2, 2.2, 3.2], [1.3, 2.3, 3.3]]
Or use numpy.column_stack in numpy:
In [16]: import numpy as np
In [17]: np.column_stack((a, b))
Out[17]:
array([[ 1. , 2. , 3. ],
[ 1.1, 2.1, 3.1],
[ 1.2, 2.2, 3.2],
[ 1.3, 2.3, 3.3]])

How to find two consecutive positive-negative values in an array?

I have the following array:
X
array([ 3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
For each value of X I know its recording time T. I want to find the time intervals between consecutive values whose sign flips (positive to negative or vice versa). In the end I would like an array like
Y = array([ T(1)-T(0), T(2)-T(1), T(5)-T(4), T(7)-T(6)])
Perhaps iterating over the array in a list comprehension would work for you:
In [35]: x=np.array([ 3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
In [36]: y=np.array([b-a for a,b in zip(x, x[1:]) if (a<0) != (b<0)])
In [37]: y
Out[37]: array([ -6.5, 8.4, -22.7, 5.6])
Edit
I apparently didn't understand the question completely. Try this instead:
In [38]: X=np.array([ 3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
In [39]: T=np.array([ 0, 0.1, 2, 3.5, 5, 22, 25, 50])
In [40]: y=np.array([t1-t0 for x0,x1,t0,t1 in zip(X, X[1:], T, T[1:]) if (x0<0) != (x1<0)])
In [41]: y
Out[41]: array([ 0.1, 1.9, 17. , 25. ])
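The comprehension can also be vectorized with NumPy, which may help for long recordings. A sketch using the same X and T; like the loop above, it treats 0 as positive:

```python
import numpy as np

X = np.array([3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
T = np.array([0, 0.1, 2, 3.5, 5, 22, 25, 50])

# True where X[i] and X[i+1] lie on different sides of zero
sign_change = (X[:-1] < 0) != (X[1:] < 0)
Y = np.diff(T)[sign_change]          # T[i+1] - T[i] for each sign change
idx = np.flatnonzero(sign_change)    # the i of each (i, i+1) pair
print(Y, idx)
```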

Convert pandas DataFrame into list of lists [duplicate]

This question already has answers here:
Pandas DataFrame to List of Lists
(14 answers)
Closed 3 years ago.
I have a pandas data frame like this:
admit gpa gre rank
0 3.61 380 3
1 3.67 660 3
1 3.19 640 4
0 2.93 520 4
Now I want to get a list of rows in pandas like:
[[0,3.61,380,3], [1,3.67,660,3], [1,3.19,640,4], [0,2.93,520,4]]
How can I do it?
There is a built-in method, which should also be the fastest: call tolist on the underlying NumPy array (.values):
df.values.tolist()
[[0.0, 3.61, 380.0, 3.0],
[1.0, 3.67, 660.0, 3.0],
[1.0, 3.19, 640.0, 4.0],
[0.0, 2.93, 520.0, 4.0]]
You can also do it like this (in Python 3, wrap the call in list(), since map returns a lazy iterator):
list(map(list, df.values))
EDIT: as_matrix is deprecated since version 0.23.0
You can use the built-in values attribute or the to_numpy() method (the recommended option) on the DataFrame:
In [8]:
df.to_numpy()
Out[8]:
array([[ 0.9, 7. , 5.2, ..., 13.3, 13.5, 8.9],
[ 0.9, 7. , 5.2, ..., 13.3, 13.5, 8.9],
[ 0.8, 6.1, 5.4, ..., 15.9, 14.4, 8.6],
...,
[ 0.2, 1.3, 2.3, ..., 16.1, 16.1, 10.8],
[ 0.2, 1.3, 2.4, ..., 16.5, 15.9, 11.4],
[ 0.2, 1.3, 2.4, ..., 16.5, 15.9, 11.4]])
If you explicitly want lists and not a numpy array add .tolist():
df.to_numpy().tolist()
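Note that to_numpy() and values upcast the mixed int and float columns to a single float dtype, which is why the output shows 0.0 and 380.0 rather than the ints in the desired result. If the integer columns should stay ints, one option is itertuples; a sketch rebuilding the question's DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "admit": [0, 1, 1, 0],
    "gpa": [3.61, 3.67, 3.19, 2.93],
    "gre": [380, 660, 640, 520],
    "rank": [3, 3, 4, 4],
})

# to_numpy() upcasts everything to float64
as_floats = df.to_numpy().tolist()

# itertuples yields one plain tuple per row, each value keeping its column's
# type as a Python scalar, so the integer columns stay ints
rows = [list(t) for t in df.itertuples(index=False, name=None)]
print(rows)
```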
