Python: sort one array when sorting another array

I have two arrays:
a = np.array([1,3,4,2,6])
b = np.array(['c', 'd', 'e', 'f', 'g'])
These two arrays are linked (there is a one-to-one correspondence between their elements), so when I sort a in decreasing order I would like b to be reordered the same way.
For instance, when I do:
a = np.sort(a)[::-1]
I get:
a = [6, 4, 3, 2, 1]
and I would like to be able to get also:
b = ['g', 'e', 'd', 'f', 'c']

I would do something like this:
import numpy as np
a = np.array([1,3,4,2,6])
b = np.array(['c', 'd', 'e', 'f', 'g'])
idx_order = np.argsort(a)[::-1]
a = a[idx_order]
b = b[idx_order]
output:
a = [6 4 3 2 1]
b = ['g' 'e' 'd' 'f' 'c']
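One caveat, offered as a sketch: reversing an ascending argsort also reverses the relative order of equal values in a. If ties should keep their original order, one option for signed numeric arrays is to argsort the negated values with kind="stable". The tie data (t, labels) below is hypothetical, and negation would wrap around for unsigned integer dtypes.

```python
import numpy as np

a = np.array([1, 3, 4, 2, 6])
b = np.array(['c', 'd', 'e', 'f', 'g'])

# Descending order via negation; kind="stable" keeps equal values in input order
idx_order = np.argsort(-a, kind="stable")
a_sorted = a[idx_order]  # [6 4 3 2 1]
b_sorted = b[idx_order]  # ['g' 'e' 'd' 'f' 'c']

# Hypothetical data with a tie, to show the difference
t = np.array([2, 1, 2])
labels = np.array(['x', 'y', 'z'])
reversed_ties = labels[np.argsort(t, kind="stable")[::-1]]  # ['z' 'x' 'y']
stable_ties = labels[np.argsort(-t, kind="stable")]         # ['x' 'z' 'y']
```

Without ties both approaches give the same result as the answer above.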

I don't know whether you can do this directly with NumPy arrays, but there is a way using standard lists, albeit a slightly convoluted one. Consider this:
a = [1, 3, 4, 2, 6]
b = ['c', 'd', 'e', 'f', 'g']
assert len(a) == len(b)
c = []
for i in range(len(a)):
    c.append((a[i], b[i]))
r = sorted(c)
for i in range(len(r)):
    a[i], b[i] = r[i]
print(a)
print(b)
In your problem statement, nothing ties the two lists together. Here we create that link by pairing corresponding elements of each list into a temporary list of tuples. sorted() then sorts the tuples in ascending order by their first element (equal first elements are ordered by the second). Finally, we rebuild the original lists from the sorted pairs. For the decreasing order you asked for, use sorted(c, reverse=True).
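The same pairing idea can be written more compactly with zip. This is a sketch rather than the answerer's code: it sorts only on the value from a (via key) and in decreasing order, as the question asks. Python's sort stays stable with reverse=True, so ties keep their input order.

```python
a = [1, 3, 4, 2, 6]
b = ['c', 'd', 'e', 'f', 'g']

# Pair elements, sort pairs by the value from a only, largest first
pairs = sorted(zip(a, b), key=lambda p: p[0], reverse=True)

# Unzip the sorted pairs back into two lists
a_sorted = [p[0] for p in pairs]
b_sorted = [p[1] for p in pairs]
print(a_sorted)  # [6, 4, 3, 2, 1]
print(b_sorted)  # ['g', 'e', 'd', 'f', 'c']
```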

Related

Python Equivalent for R's order function

According to this post, np.argsort() should be the function I am looking for.
However, it is not giving me the result I want.
Below is the R code that I am trying to convert to Python and my current Python code.
R Code
data.frame %>% select(order(colnames(.)))
Python Code
dataframe.iloc[numpy.array(dataframe.columns).argsort()]
The dataframe I am working with has 1,000,000+ rows and 42 columns, so I cannot re-create the output exactly.
But I believe I can re-create the order() outputs.
From my understanding, each number represents the original position in the columns list.
order(colnames(data.frame)) returns
3,2,5,6,8,4,7,10,9,11,12,13,14,15,16,17,18,19,23,20,21,22,1,25,26,28,24,27,38,29,34,33,36,30,31,32,35,41,42,39,40,37
numpy.array(dataframe.columns).argsort() returns
2,4,5,7,3,6,9,8,10,11,12,13,14,15,16,17,18,22,19,20,21,0,24,25,27,23,26,37,28,33,32,35,29,30,31,34,40,41,38,39,36,1
I know R indexes from 1 rather than 0 like Python, so I know the first two numbers, 3 and 2, refer to the same column.
I am looking for Python code that could return the same ordering as the R code.
Do you have mixed case? It is handled differently in Python and R.
R:
order(c('a', 'b', 'B', 'A', 'c'))
# [1] 1 4 2 3 5
x <- c('a', 'b', 'B', 'A', 'c')
x[order(c('a', 'b', 'B', 'A', 'c'))]
# [1] "a" "A" "b" "B" "c"
Python:
np.argsort(['a', 'b', 'B', 'A', 'c'])+1
# array([4, 3, 1, 2, 5])
x = np.array(['a', 'b', 'B', 'A', 'c'])
x[np.argsort(x)]
# array(['A', 'B', 'a', 'b', 'c'], dtype='<U1')
You can mimic R's behavior with numpy.lexsort, which treats the last key as the primary one: sort by the lowercased strings, then break ties with the swap-cased strings so that lowercase comes before uppercase:
x = np.array(['a', 'b', 'B', 'A', 'c'])
x[np.lexsort([np.char.swapcase(x), np.char.lower(x)])]
# array(['a', 'A', 'b', 'B', 'c'], dtype='<U1')
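Wrapped as a reusable helper, the lexsort trick might look like the sketch below. The name r_like_order is hypothetical, and this only approximates R's order() for plain alphabetic names in a typical English locale; R's locale collation also handles punctuation, digits and accents in ways this does not reproduce.

```python
import numpy as np

def r_like_order(names):
    """Hypothetical helper: 0-based positions that sort `names`
    case-insensitively, lowercase before uppercase on ties."""
    x = np.asarray(names, dtype=str)
    # np.lexsort uses the LAST key as the primary key:
    # primary = lowercased text, tie-breaker = swap-cased text
    return np.lexsort([np.char.swapcase(x), np.char.lower(x)])

idx = r_like_order(['a', 'b', 'B', 'A', 'c'])
print(idx + 1)  # 1-based like R: [1 4 2 3 5]
```

For a pandas DataFrame, the columns could then be reordered with dataframe.iloc[:, r_like_order(dataframe.columns)]. Note that dataframe.iloc[positions] with a single indexer, as in the question, selects rows rather than columns.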
np.argsort is the same thing as R's order.
Just experiment:
> x=c(1,2,3,10,20,30,5,15,25,35)
> x
[1] 1 2 3 10 20 30 5 15 25 35
> order(x)
[1] 1 2 3 7 4 8 5 9 6 10
>>> x=np.array([1,2,3,10,20,30,5,15,25,35])
>>> x
array([ 1, 2, 3, 10, 20, 30, 5, 15, 25, 35])
>>> x.argsort()+1
array([ 1, 2, 3, 7, 4, 8, 5, 9, 6, 10])
The +1 here only makes the indices start at 1, since argsort returns 0-based indices.
So maybe the problem comes from your columns (a shot in the dark: you have 2D arrays and are passing rows to R and columns to Python, or something like that).
But np.argsort is R's order.
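One detail that can make the two disagree when the data has ties: R's order() documentation states that unresolved ties keep their original ordering, whereas np.argsort's default kind='quicksort' is not guaranteed to be stable. A minimal sketch on hypothetical data, requesting a stable sort explicitly:

```python
import numpy as np

x = np.array([3, 1, 3, 1, 2])
# Stable sort so tied values keep their input order, as R's order() does
idx = np.argsort(x, kind="stable") + 1  # +1 for R-style 1-based positions
print(idx)  # [2 4 5 1 3]
```

This should match R's order(c(3, 1, 3, 1, 2)).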

Write many columns (more than 20)

I want to know an efficient and simple way to write many columns to a single file with Python.
For example, I have arrays a and b with 20 values each, for N rows.
Each row has a different a and b.
I would like to write a file with a format like this:
Names of each column
0 a[0] b[0] a[1] b[1] ... a[19] b[19]
1 a[0] b[0] a[1] b[1] ....a[19] b[19]
The only way I can think of is:
data = open(output_filename,'w')
for i in range(0, N):
    data.write('{} {} {} ...\n'.format(i, a[0], b[0], ....))
import numpy as np
# Assuming a, b are numpy arrays of shape (N, 20); else convert them accordingly:
# a = np.array(a)
# b = np.array(b)
N, m = a.shape
c = np.zeros((N, 2 * m))  # one column per a value and per b value
for i in range(m):
    c[:, 2*i] = a[:, i]      # even columns hold a
    c[:, 2*i+1] = b[:, i]    # odd columns hold b
np.savetxt("test.txt", c)
This is the simplest way I could think of.
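The explicit loop can also be replaced by a vectorized interleave. The sketch below is an alternative, not the answerer's code: it uses hypothetical data, adds the row-number column and a header line the question asks for, and writes to an in-memory buffer so it runs anywhere (pass a filename such as "test.txt" instead to write to disk).

```python
import io
import numpy as np

# Hypothetical data: N rows, m = 20 values each for a and b
N, m = 3, 20
a = np.arange(N * m, dtype=float).reshape(N, m)
b = a + 100

# Stack to shape (N, m, 2), then flatten the last two axes so each row
# reads a[0] b[0] a[1] b[1] ... a[m-1] b[m-1]
interleaved = np.stack([a, b], axis=2).reshape(N, 2 * m)

# Prepend the row number as the first column
table = np.column_stack([np.arange(N), interleaved])

# Header line with column names; comments="" drops the default "# " prefix
header = "row " + " ".join(f"a{j} b{j}" for j in range(m))
out = io.StringIO()
np.savetxt(out, table, fmt="%g", header=header, comments="")
lines = out.getvalue().splitlines()
```

The fmt="%g" format prints whole-number floats without trailing zeros; adjust it to the precision your data needs.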
If I understand correctly, you want to interleave two lists. You can do that with zip and some post-processing of it:
>>> a = ['a', 'b', 'c']
>>> b = ['d', 'e', 'f']
>>> print(list(zip(a, b)))
[('a', 'd'), ('b', 'e'), ('c', 'f')]
>>> from itertools import chain
>>> print(list(chain.from_iterable(zip(a, b))))
['a', 'd', 'b', 'e', 'c', 'f']
>>> print(' '.join(chain.from_iterable(zip(a, b))))
a d b e c f
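You probably want to apply it something like this (map(str, ...) is needed when the values are numbers, because ' '.join only accepts strings):
data.write('{} {}\n'.format(i, ' '.join(map(str, chain.from_iterable(zip(a, b))))))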
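Putting the zip/chain pieces together into a complete loop, here is a sketch assuming a and b are lists of N rows each. The helper name write_interleaved and the data are hypothetical, and io.StringIO stands in for a real file; use `with open(output_filename, 'w') as f:` to write to disk.

```python
import io
from itertools import chain

# Hypothetical data: 2 rows, each with 3 values of a and 3 values of b
a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8, 9], [10, 11, 12]]

def write_interleaved(f, a, b):
    m = len(a[0])
    # Header: column names for each interleaved pair
    f.write("row " + " ".join(f"a{j} b{j}" for j in range(m)) + "\n")
    for i, (a_row, b_row) in enumerate(zip(a, b)):
        values = chain.from_iterable(zip(a_row, b_row))
        f.write("{} {}\n".format(i, " ".join(map(str, values))))

buf = io.StringIO()
write_interleaved(buf, a, b)
print(buf.getvalue())
```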

Placing values of a smaller list at appropriate places in a larger list

I need to enter the values of a smaller list at appropriate locations in a larger list, e.g.,
l_1 = ['a', 'c', 'e', 'd']
v_1 = [5, 6, 8, 10]
l_2 = ['a', 'ab', 'c', 'd', 'e']
I am looking to generate a v_2 of values (initialized as zeros of size l_2) which takes v_1 at locations where l_1 belongs to l_2. So, for the above example, I would like to get
v_2 = [5, 0, 6, 10, 8]
Clearly, l_1 is a subset of l_2, i.e., l_2 will always contain every element of l_1.
I found the first answer here helpful to determine the location of l_1 in l_2 and am looking to modify it to suit my case. Ideally, I would like to have 3 inputs
a = l_1
b = l_2
val = v_1
def ismember(a, b, val):
    bind = {}
    for i, elt in enumerate(b):
        if elt not in bind:
            bind[elt] = i
    return [bind.get(itm, None) for itm in a]
I need to modify the return statement so that the matching entries of v_1 are placed into the padded v_2, which can be initialized as np.zeros((len(b),1)) within the function.
It is much easier to construct a dict for lookups using the values from l_1 as keys and v_1 as values. For example:
>>> l_1 = ['a', 'c', 'e', 'd']
>>> v_1 = [5, 6, 8, 10]
>>> l_2 = ['a', 'ab', 'c', 'd', 'e']
then
>>> d = dict(zip(l_1, v_1))
>>> [d.get(i, 0) for i in l_2]
[5, 0, 6, 10, 8]
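If you want to keep the question's three-input function shape and a NumPy result, one sketch is to reuse the position-lookup idea from ismember and write the values into a zero array. The name place_values is hypothetical, and it returns a 1-D array; call .reshape(-1, 1) on the result if you need the (len(b), 1) column shape from the question.

```python
import numpy as np

def place_values(a, b, val):
    """Hypothetical helper: array of len(b) holding val[k] wherever b has a[k], 0 elsewhere."""
    # First position of each element in b (same idea as the bind dict in ismember)
    bind = {}
    for i, elt in enumerate(b):
        bind.setdefault(elt, i)
    v_2 = np.zeros(len(b))
    for itm, v in zip(a, val):
        v_2[bind[itm]] = v  # KeyError if itm is missing; the question says l_1 is a subset of l_2
    return v_2

l_1 = ['a', 'c', 'e', 'd']
v_1 = [5, 6, 8, 10]
l_2 = ['a', 'ab', 'c', 'd', 'e']
v_2 = place_values(l_1, l_2, v_1)  # holds 5, 0, 6, 10, 8 as floats
```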

Accessing index of particular values in Python

I have a matrix of coordinates (numpy arrays)
arr = [[a,b,c],
[d,e,f],
......]]
where every tuple is unique, but a,b,c,d,e,f are not.
I'm wondering how to obtain the index at which
arr == [d,e,f]
I'm using
np.where(arr==[d,e,f])
but it returns a whole mess of values at which other individual elements are true.
For example,
vals = arr==[d,e,f]
returns
vals = [[False,False,False],
[True,True,True],
...............]]
But doing
np.where(vals==[True,True,True])
returns the other elements that contain only one or two trues, as well as the three trues. I just want the one tuple with all three trues.
You can get the indices of the rows whose values are all True by applying numpy.all along axis 1:
>>> arr1 = np.array(['d', 'e', 'f'])
>>> arr2 = np.array([['a' , 'b', 'c'],
['d', 'e', 'f'],
['g', 'h', 'i']])
>>> np.all(arr2==arr1, axis=1)
array([False, True, False], dtype=bool)
# Now get the indices using `numpy.where`
>>> np.where(np.all(arr2==arr1, axis=1))[0]
array([1])
>>> arr2[_]
array([['d', 'e', 'f']],
dtype='|S1')
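Since the question mentions coordinates, which are often computed floats, here is a sketch with hypothetical numeric data. Exact == can miss a row after tiny rounding errors; np.isclose (a swapped-in technique, not part of the answer above) tolerates them, and np.flatnonzero returns the matching row indices directly. Row 2 matches in two of three columns, which is exactly the case the all(axis=1) step filters out.

```python
import numpy as np

# Hypothetical coordinates; row 2 matches target in two of three columns
arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0],
                [1.0, 5.0, 6.0]])
target = np.array([4.0, 5.0, 6.0])

# Exact match: broadcast the comparison, keep rows where every column matches
exact_rows = np.flatnonzero((arr == target).all(axis=1))

# Simulate tiny rounding noise: exact comparison now finds nothing,
# while isclose (default rtol=1e-05, atol=1e-08) still finds row 1
noisy = arr + 1e-12
exact_on_noisy = np.flatnonzero((noisy == target).all(axis=1))
close_rows = np.flatnonzero(np.isclose(noisy, target).all(axis=1))
```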

Trouble finding a value in a list of lists

I have two lists. The first is a_list and is like this:
a_list = [1,2,3]
The second is b_list, and it's a list with lists in it. It's like this:
b_list = [['a',1,'b'],['c',2,'g'],['e',3,'5']]
What I'm trying to do is use a_list to find the matching sublists in b_list and print the value at index 2 of each.
My code looks like this:
for a in a_list:
    for b in b_list:
        if b[1] == a:
            print(b[2])
The actual a_list has 136 values in it, and the real b_list has 315 lists in it.
I had initially written code to index the b item and remove it from b_list if b[1] == a.
I've taken that code out in order to solve the real problem.
There is no need to loop over a_list; a simple membership test with in suffices:
for b in b_list:
    if b[1] in a_list:
        print(b[2])
This performs better if you make a_list a set, because membership tests on a set take constant time on average instead of scanning the whole list:
a_set = set(a_list)
for b in b_list:
    if b[1] in a_set:
        print(b[2])
Either way, this code prints:
b
g
5
for your example data.
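Both loops above print results in b_list's order. If you need them in a_list's order instead, one sketch is a dict keyed on each sublist's middle value. This assumes the b[1] values are unique; a later duplicate would silently overwrite an earlier one.

```python
a_list = [1, 2, 3]
b_list = [['a', 1, 'b'], ['c', 2, 'g'], ['e', 3, '5']]

# Map each sublist's middle value to its last value
by_key = {b[1]: b[2] for b in b_list}

# Look up in a_list's order, skipping values that have no match
found = [by_key[a] for a in a_list if a in by_key]
for value in found:
    print(value)
```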
If I understood correctly what you want to do:
a_list = [1,2,3,5]
b_list = [['a',1,'b'],['c',2,'g'],['e',3,'5'],
          ['d',4,'h'],['Z',5,'X'],['m',6,'i']]
print('a_list ==', a_list)
print('\nb_list before :')
print(b_list)
print('\nEnumerating b_list in reversed order :')
L = len(b_list)
print(' i el L-i b_list[L-i] \n'
      ' -------------------------------------')
for i, el in enumerate(b_list[::-1], 1):
    print(' %d %r %d %r' % (i, el, L-i, b_list[L-i]))
L = len(b_list)
for i, el in enumerate(b_list[::-1], 1):
    if el[1] in a_list:
        del b_list[L-i]
print('\nb_list after :')
print(b_list)
result
a_list == [1, 2, 3, 5]
b_list before :
[['a', 1, 'b'], ['c', 2, 'g'], ['e', 3, '5'],
['d', 4, 'h'], ['Z', 5, 'X'], ['m', 6, 'i']]
Enumerating b_list in reversed order :
i el L-i b_list[L-i]
-------------------------------------
1 ['m', 6, 'i'] 5 ['m', 6, 'i']
2 ['Z', 5, 'X'] 4 ['Z', 5, 'X']
3 ['d', 4, 'h'] 3 ['d', 4, 'h']
4 ['e', 3, '5'] 2 ['e', 3, '5']
5 ['c', 2, 'g'] 1 ['c', 2, 'g']
6 ['a', 1, 'b'] 0 ['a', 1, 'b']
b_list after :
[['d', 4, 'h'], ['m', 6, 'i']]
Iterating over b_list in reverse order is necessary for the reason abarnert gave, which the documentation explains as follows:
Note: There is a subtlety when the sequence is being modified by the
loop (this can only occur for mutable sequences, i.e. lists). An
internal counter is used to keep track of which item is used next, and
this is incremented on each iteration. When this counter has reached
the length of the sequence the loop terminates. This means that if the
suite deletes the current (or a previous) item from the sequence, the
next item will be skipped (since it gets the index of the current item
which has already been treated). Likewise, if the suite inserts an
item in the sequence before the current item, the current item will be
treated again the next time through the loop. This can lead to nasty
bugs that can be avoided by making a temporary copy using a slice of
the whole sequence, e.g.,
for x in a[:]:
    if x < 0: a.remove(x)
http://docs.python.org/2/reference/compound_stmts.html#the-for-statement
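A common alternative that avoids deleting while iterating altogether is to build a filtered list and assign it back through a slice. This is a sketch of that idiom, not the answerer's code; the slice assignment keeps the same list object, so any other references to b_list see the change.

```python
a_list = [1, 2, 3, 5]
b_list = [['a', 1, 'b'], ['c', 2, 'g'], ['e', 3, '5'],
          ['d', 4, 'h'], ['Z', 5, 'X'], ['m', 6, 'i']]

a_set = set(a_list)  # constant-time average membership tests
# Keep only sublists whose middle value is not in a_list
b_list[:] = [b for b in b_list if b[1] not in a_set]
print(b_list)  # [['d', 4, 'h'], ['m', 6, 'i']]
```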
