Numpy: recode numeric array to which quintile each element belongs - python

I have a numeric vector a:
import numpy as np
a = np.random.rand(100)
I wish to get the vector (or any other vector) recoded so that each element is 0, 1, 2, 3 or 4, according to which quintile it falls in (ideally this would generalize to any quantile: quartile, decile, etc.).
This is what I'm doing. There has to be something more elegant, no?
from scipy.stats import percentileofscore
n_quantiles = 5
def get_quantile(i, a, n_quantiles):
    if a[i] >= max(a):
        return n_quantiles - 1
    return int(percentileofscore(a, a[i])/(100/n_quantiles))
a_recoded = np.array([get_quantile(i, a, n_quantiles) for i in range(len(a))])
print(a)
print(a_recoded)
[0.04708996 0.86267278 0.23873192 0.02967989 0.42828385 0.58003015
0.8996666 0.15359369 0.83094778 0.44272398 0.60211289 0.90286434
0.40681163 0.91338397 0.3273745 0.00347029 0.37471307 0.72735901
0.93974808 0.55937197 0.39297097 0.91470761 0.76796271 0.50404401
0.1817242 0.78244809 0.9548256 0.78097562 0.90934337 0.89914752
0.82899983 0.44116683 0.50885813 0.2691431 0.11676798 0.84971927
0.38505195 0.7411976 0.51377242 0.50243197 0.89677377 0.69741088
0.47880953 0.71116534 0.01717348 0.77641096 0.88127268 0.17925502
0.53053573 0.16935597 0.65521692 0.19042794 0.21981197 0.01377195
0.61553814 0.8544525 0.53521604 0.88391848 0.36010949 0.35964882
0.29721931 0.71257335 0.26350287 0.22821314 0.8951419 0.38416004
0.19277649 0.67774468 0.27084229 0.46862229 0.3107887 0.28511048
0.32682302 0.14682896 0.10794566 0.58668243 0.16394183 0.88296862
0.55442047 0.25508233 0.86670299 0.90549872 0.04897676 0.33042884
0.4348465 0.62636481 0.48201213 0.49895892 0.36444648 0.01410316
0.46770595 0.09498391 0.96793139 0.03931124 0.64286295 0.50934846
0.59088907 0.56368594 0.7820928 0.77172038]
[0 4 1 0 2 3 4 0 4 2 3 4 2 4 1 0 1 3 4 2 1 4 3 2 0 3 4 3 4 4 4 2 2 1 0 4 1
3 2 2 4 3 2 3 0 3 4 0 2 0 3 0 1 0 3 4 2 4 1 1 1 3 1 1 4 1 0 3 1 2 1 1 1 0
0 3 0 4 2 1 4 4 0 1 2 3 2 2 1 0 2 0 4 0 3 2 3 2 3 3]
Update: just wanted to say this is so easy in R; see "How to get the x which belongs to a quintile?"

You could use argpartition. Example:
>>> a = np.random.random(20)
>>> N = len(a)
>>> nq = 5
>>> o = a.argpartition(np.arange(1, nq) * N // nq)
>>> out = np.empty(N, int)
>>> out[o] = np.arange(N) * nq // N
>>> a
array([0.61238649, 0.37168998, 0.4624829 , 0.28554766, 0.00098016,
       0.41979328, 0.62275886, 0.4254548 , 0.20380679, 0.762435  ,
       0.54054873, 0.68419986, 0.3424479 , 0.54971072, 0.06929464,
       0.51059431, 0.68448674, 0.97009023, 0.16780152, 0.17887862])
>>> out
array([3, 1, 2, 1, 0, 2, 3, 2, 1, 4, 3, 4, 1, 3, 0, 2, 4, 4, 0, 0])
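For readers who find the argpartition indexing hard to follow, the same labels can be derived from ranks. This is only a sketch, not part of the answer above: it assumes the values are distinct (ties would be split arbitrarily by the sort), it uses a full sort rather than a partition, and the helper name quantile_labels is made up for illustration.

```python
import numpy as np

def quantile_labels(a, nq):
    """Label each element 0..nq-1 by the quantile its rank falls in."""
    a = np.asarray(a)
    n = len(a)
    # argsort of argsort gives each element's rank (0 = smallest)
    ranks = a.argsort().argsort()
    # integer arithmetic maps rank r to bucket r * nq // n
    return ranks * nq // n

rng = np.random.default_rng(0)
a = rng.random(20)
labels = quantile_labels(a, 5)
print(labels)
```

For distinct values this should agree with the argpartition result, since both assign label p * nq // N to the element at sorted position p.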

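Here's one way to do it using pd.cut(). Note that the fixed edges np.arange(0, 1.2, 0.2) define equal-width bins, which only approximate quintiles here because the data is uniform on [0, 1]: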
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(100))
df.columns = ['values']
# Apply the quantiles
gdf = df.groupby(pd.cut(df.loc[:, 'values'], np.arange(0, 1.2, 0.2)))['values'].apply(lambda x: list(x)).to_frame()
# Make use of the automatic indexing to assign quantile numbers
gdf.reset_index(drop=True, inplace=True)
# Re-expand the grouped lists of values. Method provided by @Zero at https://stackoverflow.com/questions/32468402/how-to-explode-a-list-inside-a-dataframe-cell-into-separate-rows
gdf['values'].apply(pd.Series).stack().reset_index(level=1, drop=True).to_frame('values').reset_index()
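If the goal is true quantile bins rather than equal-width ones, pandas also has pd.qcut, which picks the bin edges from the data. The following is a minimal sketch, assuming the input is a plain NumPy array with distinct values; the seed and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.random(100)

# labels=False returns the integer bin index (0..4) instead of interval labels
codes = pd.qcut(a, 5, labels=False)
print(codes)
```

Changing the second argument to 4 or 10 would give quartiles or deciles in the same way.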


What will be the best approach for a digit like pattern in python?

I was trying to print a pattern in Python.
For n == 6:
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4
After thinking about it a lot, I did it like this:
n = 6
for i in range(1,n):
    x = 1
    countj = 0
    for j in range(i,n):
        countj +=1
        print(j,end=" ")
        if j == n-1 and countj < n-1 :
            while countj < n-1:
                print(x , end =" ")
                countj +=1
                x +=1
    print()
But I don't think it is the best approach. I searched for a better one but could not find anything suitable, so I came here. Is there a better approach to this problem?
I would do it like this, using a rotating deque instance:
>>> from collections import deque
>>> n = 6
>>> d = deque(range(1, n))
>>> for _ in range(1, n):
...     print(*d)
...     d.rotate(-1)
...
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4
A similar, shorter version is possible using just range slicing, though it may be a bit harder to see how it works:
>>> ns = range(1, 6)
>>> for i in ns:
...     print(*ns[i-1:], *ns[:i-1])
...
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4
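To see why the slicing version works, it may help to spell out the two pieces of each row: the tail starting at position k, followed by the head before it. This is a sketch using a plain list; the helper name rotate_left is hypothetical and not from the answer.

```python
def rotate_left(items, k):
    """Return items rotated left by k: the tail from k, then the head before k."""
    return items[k:] + items[:k]

ns = list(range(1, 6))
rows = [rotate_left(ns, k) for k in range(len(ns))]
for row in rows:
    print(*row)
```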
You could also create a mathematical function of the coordinates, which might look something like this:
>>> for row in range(5):
...     for col in range(5):
...         print((row + col) % 5 + 1, end=" ")
...     print()
...
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4
A too-clever way using list comprehension:
>>> r = range(5)
>>> [[1 + r[i - j - 1] for i in r] for j in reversed(r)]
[[1, 2, 3, 4, 5],
[2, 3, 4, 5, 1],
[3, 4, 5, 1, 2],
[4, 5, 1, 2, 3],
[5, 1, 2, 3, 4]]
more-itertools has this function:
>>> from more_itertools import circular_shifts
>>> circular_shifts(range(1, 6))
[(1, 2, 3, 4, 5),
(2, 3, 4, 5, 1),
(3, 4, 5, 1, 2),
(4, 5, 1, 2, 3),
(5, 1, 2, 3, 4)]
You can use itertools.cycle to make the sequence generated from range repeat itself, and then use itertools.islice to slice the sequence according to the iteration count:
from itertools import cycle, islice
n = 6
for i in range(n - 1):
    print(*islice(cycle(range(1, n)), i, i + n - 1))
This outputs:
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4
Your 'pattern' is actually known as a Hankel matrix, commonly used in linear algebra.
So there's a scipy function for creating them.
from scipy.linalg import hankel
hankel([1, 2, 3, 4, 5], [5, 1, 2, 3, 4])
or
from scipy.linalg import hankel
import numpy as np
def my_hankel(n):
    x = np.arange(1, n)
    return hankel(x, np.roll(x, 1))
print(my_hankel(6))
Output:
[[1 2 3 4 5]
[2 3 4 5 1]
[3 4 5 1 2]
[4 5 1 2 3]
[5 1 2 3 4]]
Seeing lots of answers involving Python libraries. If you want a simple way to do it, here it is.
n = 5
arr = [[1 + (start + i) % n for i in range(n)] for start in range(n)]
arr_str = "\n".join(" ".join(str(cell) for cell in row) for row in arr)
print(arr_str)

Code does not give the results I want when using the equals sign in Python

I have code like this:
import numpy as np
a = np.zeros(shape = (4,4))
a+= 2
b = np.zeros(shape = (4,4))
b+=2
t = 0
while t<2:
    for i in range(1,3):
        for j in range(1,3):
            if a[i,j] == a[i-1,j]:
                b[i,j] = a[i,j]+1
    print(a,t)
    print(b,t)
    a = b
    t+= 1
I am hoping that at t = 2
a = [2 2 2 2, 2 3 3 2, 2 3 3 2, 2 2 2 2] and b = [2 2 2 2, 2 3 3 2, 2 4 4 2, 2 2 2 2]
but in fact at the end of the run a = [2 2 2 2, 2 3 3 2, 2 4 4 2, 2 2 2 2].
Does anyone know why? Is it because I am assigning a = b? If so, is there a way to get the result I want?
Thanks.
Replace a=b (which makes a and b the same array) with a[:,:]=b (which copies the elements of b into a).
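The difference between binding a name and copying elements can be seen in a small sketch. The 2x2 arrays and the names alias and c are illustrative only, and b.copy() is shown as one further option that the answer above does not mention.

```python
import numpy as np

a = np.zeros((2, 2))
b = np.full((2, 2), 2.0)

alias = b        # no copy: alias and b are two names for the same array
a[:, :] = b      # copies b's elements into a's existing buffer
c = b.copy()     # a new, independent array with the same contents

b[0, 0] = 99     # modify b after the fact

print(alias[0, 0])       # follows b, because it is the same object
print(a[0, 0], c[0, 0])  # both keep the old value
```

In the loop from the question, either a[:, :] = b or a = b.copy() keeps a and b as separate arrays between iterations.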

For loop iteration with range restarting at index 1

I'm trying to iterate through a list two indexes at a time and, once the loop reaches the end, restart from index 1 rather than index 0.
I have already read various posts on Stack Overflow, like this one, that use a while-loop workaround. However, I'm looking for an option that simply uses a for loop with range, without itertools, other libraries, or a nested loop.
Here is my code:
j = [0,0,1,1,2,2,3,3,9,11]
count = 0
for i in range(len(j)):
    if i >= len(j)/2:
        print(j[len(j)-i])
        count += 1
    else:
        count +=1
        print(j[i*2],i)
Here is the output:
0 0
1 1
2 2
3 3
9 4
2
2
1
1
0
The loop does not start back from where it is supposed to.
Here is the desired output:
0 0
1 1
2 2
3 3
9 4
0 5
1 6
2 7
3 8
11 9
How can I fix it?
You can do that by combining two range() calls like:
Code:
j = [0, 0, 1, 1, 2, 2, 3, 3, 9, 11]
for i in (j[k] for k in
          (list(range(0, len(j), 2)) + list(range(1, len(j), 2)))):
    print(i)
and using an itertools solution:
import itertools as it
for i in it.chain.from_iterable((it.islice(j, 0, len(j), 2),
                                 it.islice(j, 1, len(j), 2))):
    print(i)
Results:
0
1
2
3
9
0
1
2
3
11
Another itertools solution:
import itertools as it
lst = [0, 0, 1, 1, 2, 2, 3, 3, 9, 11]
a, b = it.tee(lst)
next(b)
for i, x in enumerate(it.islice(it.chain(a, b), None, None, 2)):
    print(x, i)
Output
0 0
1 1
2 2
3 3
9 4
0 5
1 6
2 7
3 8
11 9
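Since the question asks for a single for loop over range with no libraries, the index can also be computed arithmetically. This is a sketch of that idea, not taken from the answers above; the names half, idx and result are illustrative.

```python
j = [0, 0, 1, 1, 2, 2, 3, 3, 9, 11]

n = len(j)
half = (n + 1) // 2  # how many even indexes (0, 2, 4, ...) the list has

result = []
for i in range(n):
    # the first pass walks the even indexes, the second pass the odd ones
    idx = 2 * i if i < half else 2 * (i - half) + 1
    result.append((j[idx], i))
    print(j[idx], i)
```

Using (n + 1) // 2 rather than n // 2 is meant to keep the index in range when the list has an odd length.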

How to return all opposite pairs in a Pandas DataFrame?

For the dataframe below, how to return all opposite pairs?
import pandas as pd
df1 = pd.DataFrame([1,2,-2,2,-1,-1,1,1], columns=['a'])
   a
0  1
1  2
2 -2
3  2
4 -1
5 -1
6  1
7  1
The output should be as below:
(1) sum of all rows is 0
(2) as there are 3 "1" and 2 "-1" in the original data, the output includes 2 "1" and 2 "-1".
   a
0  1
1  2
2 -2
4 -1
5 -1
6  1
Thank you very much.
Well, I thought this would take fewer lines (and probably can) but this does work. First just create a couple of new columns to simplify the later syntax:
>>> import numpy as np
>>> df1['abs_a'] = np.abs( df1['a'] )
>>> df1['ones'] = 1
Then the main thing you need is to do some counting. For example, are there fewer 1s or fewer -1s?
>>> df2 = df1.groupby(['abs_a','a']).count()
          ones
abs_a a
1     -1     2
       1     3
2     -2     1
       2     2
>>> df3 = df2.groupby(level=0).min()
       ones
abs_a
1         2
2         1
That's basically the answer right there, but I'll put it closer to the form you asked for:
>>> lst = [ [i]*j for i, j in zip( df3.index.tolist(), df3['ones'].tolist() ) ]
>>> arr = np.array( [item for sublist in lst for item in sublist] )
>>> np.hstack( [arr,-1*arr] )
array([ 1, 1, 2, -1, -1, -2], dtype=int64)
Or if you want to put it back into a dataframe:
>>> pd.DataFrame( np.hstack( [arr,-1*arr] ) )
   0
0  1
1  1
2  2
3 -1
4 -1
5 -2
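The output requested in the question keeps the original rows and index labels, which the answer above does not preserve. One possible way to get that, sketched here under the assumption that the first occurrences of each value are the ones to keep, is to number each row within its value group and compare that number with the count of the opposite value. The names counts, opposite, rank and result are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, -2, 2, -1, -1, 1, 1], columns=["a"])

# how often each value occurs
counts = df1["a"].value_counts()
# for each row, how often the opposite value occurs (0 if it never does)
opposite = (-df1["a"]).map(counts).fillna(0)
# 0-based position of each row among the rows holding the same value
rank = df1.groupby("a").cumcount()

# keep a row only while an unmatched opposite value is still available
result = df1[rank < opposite]
print(result)
```

With the sample data this keeps the rows labelled 0, 1, 2, 4, 5 and 6, and the kept values sum to zero.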

Python 2D array doesn't work as expected

_R = [0] * 5
R = [_R] * 4
num_user = 0
num_item = 0
for i in range(8):
    s = input().split()
    for j in range(4):
        s[j] = int(s[j])
    R[s[0]][s[1]] = s[2]
    print(s[0], s[1], R[s[0]][s[1]])
    num_user = max(num_user, s[0])
    num_item = max(num_item, s[1])
print("=====")
for i in range(num_user + 1):
    for j in range(num_item + 1):
        print(i, j, R[i][j])
exit()
Probably you already understand what I am going to ask. The output confused me:
#output
1 2 3
2 4 2
1 1 5
3 2 2
2 2 1
3 3 4
1 4 3
2 1 4
=====
0 0 0
0 1 4
0 2 1
0 3 4
0 4 3
1 0 0
1 1 4
1 2 1
1 3 4
1 4 3
2 0 0
2 1 4
2 2 1
2 3 4
2 4 3
3 0 0
3 1 4
3 2 1
3 3 4
3 4 3
What am I doing wrong? The last time I coded in Python it was 2.7, and that was a long time ago. Have I forgotten some important syntax?
You're creating the list of lists the wrong way:
>>> _R = [0] * 5
>>> R = [_R] * 4
>>> [id(x) for x in R] # all four elements are actually the same object
[36635392, 36635392, 36635392, 36635392]
>>> R[0][1]=1 # changing one row changes every other row as well
>>> R
[[0, 1, 0, 0, 0], [0, 1, 0, 0, 0], [0, 1, 0, 0, 0], [0, 1, 0, 0, 0]]
It is better to create your list this way:
>>> R=[[0]*5 for _ in range(4) ]
>>> [id(x) for x in R]
[37254008, 36635712, 38713784, 38714664]
_R = [0] * 5
R = [_R] * 4
That is a no-go: R will contain _R four times, i.e. the same list four times.
Use this instead:
R = [[0 for col in range(5)] for row in range(4)]
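Applied to the program in the question, the fix might look like the sketch below. It makes two assumptions: the input is passed in as (user, item, rating) triples instead of being read with input(), and the triples are the ones visible in the first eight lines of the question's output. The function name fill_ratings is hypothetical.

```python
def fill_ratings(triples, n_rows=4, n_cols=5):
    """Build an n_rows x n_cols table from (user, item, rating) triples."""
    # each row is a distinct list, so writing to one row leaves the others alone
    table = [[0] * n_cols for _ in range(n_rows)]
    for user, item, rating in triples:
        table[user][item] = rating
    return table

triples = [(1, 2, 3), (2, 4, 2), (1, 1, 5), (3, 2, 2),
           (2, 2, 1), (3, 3, 4), (1, 4, 3), (2, 1, 4)]
R = fill_ratings(triples)
for row in R:
    print(row)
```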
