Merge arrays in Python based on a similar value

I want to merge two arrays in Python based on the first element of each row of each array.
For example,
A = ([[1, 2, 3],
[4, 5, 6],
[4, 6, 7],
[5, 7, 8],
[5, 9, 1]])
B = ([[1, .002],
[4, .005],
[5, .006]])
So that I get an array
C = ([[1, 2, 3, .002],
[4, 5, 6, .005],
[4, 6, 7, .005],
[5, 7, 8, .006],
[5, 9, 1, .006]])
For more clarity:
First column in A is 1, 4, 4, 5, 5 and
First column of B is 1, 4, 5
So that 1 in A matches up with 1 in B and gets .002
How would I do this in python? Any suggestions would be great.

Is it OK to modify A in place?
d = dict((x[0], x[1:]) for x in B)
Now d is a dictionary whose keys are the first column of B and whose values are the remaining columns.
for lst in A:
    if lst[0] in d:  # Is the first value something that we can extend?
        lst.extend(d[lst[0]])
print(A)
To do it out of place (inspired by the answer by Ashwini):
d = dict((x[0],x[1:]) for x in B)
C = [lst + d.get(lst[0],[]) for lst in A]
However, with this approach, the rows of both A and B need to be lists. If some rows are lists and some are tuples, the concatenation will fail. That can be worked around if you need to, but it complicates the code slightly.
With either of these answers, B can have an arbitrary number of columns.
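If A or B might mix lists and tuples, one possible workaround is to normalize every row with list() before concatenating. This is only a sketch: the function name merge_on_first is hypothetical and not part of the answer above, and it builds a new result rather than modifying A.

```python
def merge_on_first(A, B):
    """Return new rows: each row of A extended by the matching row of B, if any."""
    # Map the first column of B to the remaining columns, stored as lists.
    lookup = {row[0]: list(row[1:]) for row in B}
    # list(row) accepts lists and tuples alike; unmatched rows are copied unchanged.
    return [list(row) + lookup.get(row[0], []) for row in A]


# Rows deliberately mix tuples and lists.
A = [(1, 2, 3), [4, 5, 6], [4, 6, 7], (5, 7, 8), [5, 9, 1]]
B = [[1, .002], (4, .005), [5, .006]]
C = merge_on_first(A, B)
```

Every row of C comes back as a list, regardless of the type of the input rows.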
As a side note on style: I would write the lists as:
A = [[1, 2, 3],
[4, 5, 6],
[4, 6, 7],
[5, 7, 8],
[5, 9, 1]]
Where I've dropped the parentheses. They make it look too much like you're putting a list in a tuple. Python's automatic line continuation happens within parentheses (), square brackets [] or braces {}.

(This answer assumes these are just regular lists. If they’re NumPy arrays, you have more options.)
It looks like you want to use B as a lookup table to find values to add to each row of A.
I would start by making a dictionary out of the data in B. As it happens, B is already in just the right form to be passed to the dict() builtin:
B_dict = dict(B)
Then you just need to build C row by row.
For each row in A, row[0] is the first element, so B_dict[row[0]] is the value you want to add to the end of the row. Therefore row + [B_dict[row[0]]] is the row you want to add to C.
Here is a list comprehension that builds C from A and B_dict.
C = [row + [B_dict[row[0]]] for row in A]

You can convert B to a dictionary first, with the first element of each sublist as key and second one as value.
Then simply iterate over A and append the related value fetched from the dict.
In [114]: A = ([1, 2, 3],
[4, 5, 6],
[4, 6, 7],
[5, 7, 8],
[6, 9, 1])
In [115]: B = ([1, .002],
[4, .005],
[5, .006])
In [116]: dic = dict(B)
In [117]: [x + [dic[x[0]]] if x[0] in dic else x for x in A]
Out[117]:
[[1, 2, 3, 0.002],
[4, 5, 6, 0.005],
[4, 6, 7, 0.005],
[5, 7, 8, 0.006],
[6, 9, 1]]

Here is a solution using itertools.product() that avoids having to create a dictionary for B (at the cost of comparing every row of A with every row of B):
In [1]: from itertools import product
In [2]: [lst_a + lst_b[1:] for (lst_a, lst_b) in product(A, B) if lst_a[0] == lst_b[0]]
Out[2]:
[[1, 2, 3, 0.002],
[4, 5, 6, 0.005],
[4, 6, 7, 0.005],
[5, 7, 8, 0.006],
[5, 9, 1, 0.006]]

The naive, simple way:
for alist in A:
    for blist in B:
        if blist[0] == alist[0]:
            alist.extend(blist[1:])
            # alist.append(blist[1]) if B will only ever contain 2-tuples.
            break  # Remove this if you want to append more than one.
The downside here is that it's O(N^2) complexity (more precisely, proportional to len(A) * len(B)). For most small data sets, that should be OK. If you're looking for something more comprehensive, you'll probably want to look at @mgilson's answer. Some comparison:
His response converts everything in B to a dict and performs list slicing on each element. If you have a lot of values in B, that could be expensive. This uses the existing lists (you're only looking at the first value, anyway).
Because he's using dicts, he gets O(1) lookup times (his answer also assumes that you're never going to append multiple values to the end of the values in A). That means overall, his algorithm will achieve O(N). You'll need to weigh whether the overhead of creating a dict is going to outweigh the iteration of the values in B.
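If B can contain several rows with the same first value and you want all of them appended, a dictionary can still be used by accumulating the values per key. The following is a rough sketch under that assumption; merge_all_matches is a hypothetical name, and the extra [4, .007] row in B is made up for illustration.

```python
from collections import defaultdict


def merge_all_matches(A, B):
    """Extend each row of A with the trailing columns of every matching row of B."""
    extras = defaultdict(list)
    for row in B:
        # Accumulate rather than overwrite, so duplicate keys in B are all kept.
        extras[row[0]].extend(row[1:])
    # .get() avoids inserting empty entries into the defaultdict for unmatched keys.
    return [list(row) + extras.get(row[0], []) for row in A]


A = [[1, 2, 3], [4, 5, 6], [5, 7, 8]]
B = [[1, .002], [4, .005], [4, .007], [5, .006]]  # key 4 appears twice
C = merge_all_matches(A, B)
```

Building the dictionary takes one pass over B and the merge takes one pass over A, so the total work grows with len(A) + len(B) instead of len(A) * len(B).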

Related

Numpy increment array indexed array? [duplicate]

I am trying to efficiently update some elements of a numpy array A, using another array b to indicate the indexes of the elements of A to be updated. However, b can contain duplicates, which are ignored, whereas I would like them to be taken into account. I would like to avoid looping over b with a for loop. To illustrate:
>>> A = np.arange(10).reshape(2,5)
>>> A[0, np.array([1,1,1,2])] += 1
>>> A
array([[0, 2, 3, 3, 4],
[5, 6, 7, 8, 9]])
whereas I would like the output to be:
array([[0, 4, 3, 3, 4],
[5, 6, 7, 8, 9]])
Any ideas?
To correctly handle the duplicate indices, you'll need to use np.add.at instead of +=. Therefore, to update the first row of A (starting again from the original, unmodified array), the simplest way would probably be to do the following:
>>> np.add.at(A[0], [1,1,1,2], 1)
>>> A
array([[0, 4, 3, 3, 4],
[5, 6, 7, 8, 9]])
The ufunc.at method is described in the NumPy documentation.
One approach is to use numpy.histogram to find out how many values there are at each index, then add the result to A:
A[0, :] += np.histogram(np.array([1,1,1,2]), bins=np.arange(A.shape[1]+1))[0]
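For non-negative integer indices, the same per-index counts can also be obtained with np.bincount, which may read more directly than building histogram bins. This is a sketch of that alternative, not something either answer proposed; the variable names idx and counts are illustrative.

```python
import numpy as np

A = np.arange(10).reshape(2, 5)
idx = np.array([1, 1, 1, 2])

# Count how many times each column index occurs; minlength pads the result
# so that it has one entry per column of A.
counts = np.bincount(idx, minlength=A.shape[1])

# Add the counts to the first row, so duplicates in idx are all applied.
A[0] += counts
```

Starting from the original A, index 1 occurs three times and index 2 once, so the first row gains 3 and 1 in those positions.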

Python Sorting List of List

Suppose I have this list:
newlis = [[3, 6, 4, 10], [1, 9, 2, 5], [0, 7, 8]]
I want to sort it in a way that each list is sorted. For instance:
newlis = [[3, 4, 6, 10], [1, 2, 5, 9], [0, 7, 8]]
I tried to write this code:
for i in range(len(newlis)):
    if j in newlis[i] < newlis[i+1]:
        newlis[i],newlis[i+1]=newlis[i+1],newlis[i]
print newlis
It's not working though. Can someone please help me out? Built-in functions are not allowed.
There are many things wrong here (among which is that this sounds like a homework question and we aren't supposed to respond to those) but I will give you some helpful advice:
You are comparing element J in list I to list I + 1.
You would want to compare element J in list I to element J + 1 in list I.
Also, you appear to be attempting to sort backwards. You will end up with large left and small right.
Also this is not a sorting algorithm. What happens when you have an array like
[3,6,4,10] => [6,4,10,3]
which is still not ordered, at all. Sorting algorithms are simple, but not that simple. I recommend looking them up.
In if j in newlis[i] < newlis[i+1]:, you are comparing sublists and not the elements of the sublists themselves. You need two loops, one for iterating over newlis, and one for sorting the elements of each sublist of newlis.
A sample using Bubble Sort (repeated passes are needed, because a single pass only guarantees that the largest element ends up last):
>>> newlis = [[3, 6, 4, 10], [1, 9, 2, 5], [0, 7, 8]]
>>> for sublist in newlis:
...     for end in range(len(sublist) - 1, 0, -1):  # the unsorted part shrinks each pass
...         for i in range(end):
...             if sublist[i] > sublist[i + 1]:
...                 sublist[i], sublist[i + 1] = sublist[i + 1], sublist[i]
...
>>> print(newlis)
[[3, 4, 6, 10], [1, 2, 5, 9], [0, 7, 8]]
Links about Bubble Sort:
http://www-ee.eng.hawaii.edu/~tep/EE160/Book/chap10/subsection2.1.2.2.html
http://www.go4expert.com/articles/bubble-sort-algorithm-absolute-beginners-t27883/

Remove elements from several lists simultaneously

I have three lists with the same length and another list that stores indexes of elements that I need to remove from all three lists. This is an example of what I mean:
a = [3,4,5,12,6,8,78,5,6]
b = [6,4,1,2,8,784,43,6,2]
c = [8,4,32,6,1,7,2,9,23]
(all have len()=9)
The other list contains the indexes of those elements I need to remove from all three lists:
d = [8,5,3]
(note that it is already sorted)
I know I can remove one element at the time from the three lists with:
for indx in d:
    del a[indx]
    del b[indx]
    del c[indx]
How could I do this in one single line?
Not one line, but concise, readable, and completely idiomatic Python:
for indx in d:
    for x in a, b, c:
        del x[indx]
However, the fact that you're doing this in the first place implies that maybe rather than 3 separate list variables, you should have a list of 3 lists, or a dict of three lists keyed by the names 'a', 'b', and 'c'.
If you really want it in one line:
for indx in d: a.pop(indx), b.pop(indx), c.pop(indx)
But that's really terrible. You're calling pop when you don't care about the values, and building up a tuple you don't need.
If you want to play code golf, you can save a few characters by using a list comprehension—which adds one more language abuse, and builds another, larger object you don't actually want—as in Ioan Alexandru Cucu's answer:
[x.pop(indx) for indx in d for x in (a, b, c)]
Of course the best way to write it in one line is to factor it out into a function:
def delemall(indices, *lists):
    for index in indices:
        for x in lists:
            del x[index]
And now, each of the 300 times you need to do this, it's just:
delemall(d, a, b, c)
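Deleting by position in a loop only works here because d is already sorted from highest to lowest index: removing a low index first would shift every later element. If that cannot be guaranteed, a helper can sort the indices itself. This is a minimal sketch under that assumption; the name delete_indices is hypothetical.

```python
def delete_indices(indices, *lists):
    """Delete the given positions from every list, in place."""
    # Deleting from the highest index down keeps the remaining positions valid;
    # set() drops accidental duplicates.
    for index in sorted(set(indices), reverse=True):
        for lst in lists:
            del lst[index]


a = [3, 4, 5, 12, 6, 8, 78, 5, 6]
b = [6, 4, 1, 2, 8, 784, 43, 6, 2]
c = [8, 4, 32, 6, 1, 7, 2, 9, 23]
delete_indices([3, 8, 5], a, b, c)  # the order of the indices no longer matters
```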
Maybe NumPy is useful for something like this. If your three lists were a 2D numpy.array, deleting specified columns would be very easy.
import numpy as np
a = [3,4,5,12,6,8,78,5,6]
b = [6,4,1,2,8,784,43,6,2]
c = [8,4,32,6,1,7,2,9,23]
big_array = np.array([a,b,c])
d = [8,5,3]
Result:
>>> big_array
array([[ 3, 4, 5, 12, 6, 8, 78, 5, 6],
[ 6, 4, 1, 2, 8, 784, 43, 6, 2],
[ 8, 4, 32, 6, 1, 7, 2, 9, 23]])
>>> np.delete(big_array, d, axis=1)
array([[ 3, 4, 5, 6, 78, 5],
[ 6, 4, 1, 8, 43, 6],
[ 8, 4, 32, 1, 2, 9]])
I think your code is fine as it is. To make it a single line:
In [234]: for i in d: del a[i], b[i], c[i]
In [235]: a,b,c
Out[235]: ([3, 4, 5, 6, 78, 5], [6, 4, 1, 8, 43, 6], [8, 4, 32, 1, 2, 9])
but I still like leaving that for loop two lines ;)
import operator
a = [3,4,5,12,6,8,78,5,6]
b = [6,4,1,2,8,784,43,6,2]
c = [8,4,32,6,1,7,2,9,23]
d = [8,5,3]
for _ in (operator.delitem(q,i) for q in (a,b,c) for i in d): pass
print(a,b,c)

sort 2-D list python

I'm relatively new to programming, and I want to sort a 2-D array (lists as they're called in Python) by the value of all the items in each sub-array. For example:
pop = [[1,5,3],[1,1,1],[7,5,8],[2,5,4]]
The sum of the first element of pop would be 9, because 1 + 5 + 3 = 9. The sum of the second would be 3, because 1 + 1 + 1 = 3, and so on.
I want to rearrange this so the new order would be:
newPop = [pop[1], pop[0], pop[3], pop[2]]
How would I do this?
Note: I don't want to sort the elements of each sub-array, but sort according to the sum of all the numbers in each sub-array.
You can use sorted():
>>> pop = [[1,5,3],[1,1,1],[7,5,8],[2,5,4]]
>>> newPop = sorted(pop, key=sum)
>>> newPop
[[1, 1, 1], [1, 5, 3], [2, 5, 4], [7, 5, 8]]
You can also sort in-place with pop.sort(key=sum). Unless you definitely want to preserve the original list, you should prefer in-place sorting.
Try this:
sorted(pop, key=sum)
Explanation:
The sorted() procedure sorts an iterable (a list in this case) in ascending order
Optionally, a key parameter can be passed to determine what property of the elements in the list is going to be used for sorting
In this case, the property is the sum of each of the elements (which are sublists)
So essentially this is what's happening:
[[1,5,3], [1,1,1], [7,5,8], [2,5,4]] # original list
[sum([1,5,3]), sum([1,1,1]), sum([7,5,8]), sum([2,5,4])] # key=sum
[9, 3, 20, 11] # apply key
sorted([9, 3, 20, 11]) # sort
[3, 9, 11, 20] # sorted
[[1,1,1], [1,5,3], [2,5,4], [7,5,8]] # elements corresponding to keys
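The steps above can also be spelled out by hand, which recovers the index order the question asks for (pop[1], pop[0], pop[3], pop[2]). This is only an illustration of what key=sum does conceptually; the names keys and order are made up for the example, and in practice sorted(pop, key=sum) is all you need.

```python
pop = [[1, 5, 3], [1, 1, 1], [7, 5, 8], [2, 5, 4]]

# Step 1: compute one key per sub-list.
keys = [sum(row) for row in pop]  # [9, 3, 20, 11]

# Step 2: sort the positions 0..len(pop)-1 by their key.
order = sorted(range(len(pop)), key=lambda i: keys[i])  # [1, 0, 3, 2]

# Step 3: pick the sub-lists in that order.
newPop = [pop[i] for i in order]
```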
@arshajii beat me to the punch, and his answer is good. However, if you would prefer an in-place sort:
>>> pop = [[1,5,3],[1,1,1],[7,5,8],[2,5,4]]
>>> pop.sort(key=sum)
>>> pop
[[1, 1, 1], [1, 5, 3], [2, 5, 4], [7, 5, 8]]
I have to look up Python's sorting algorithm (I think it's called Timsort), but I'm pretty sure an in-place sort would be less memory intensive and about the same speed.
Edit: As per this answer, I would definitely recommend x.sort()
If you wanted to sort the lists in a less traditional way, you could write your own function (one that takes a single parameter). At risk of starting a flame war, I would heavily advise against lambda.
For example, if you wanted the first number to be weighted more heavily than the second, the second more heavily than the third, and so on:
>>> def weightedSum(listToSum):
...     ws = 0
...     weight = len(listToSum)
...     for i in listToSum:
...         ws += i * weight
...         weight -= 1
...     return ws
...
>>> weightedSum([1, 2, 3])
10
>>> 1 * 3 + 2 * 2 + 3 * 1
10
>>> pop
[[1, 5, 3], [1, 1, 1], [7, 5, 8], [2, 5, 4]]
>>> pop.sort(key=weightedSum)
>>> pop
[[1, 1, 1], [1, 5, 3], [2, 5, 4], [7, 5, 8]]
>>> pop += [[1, 3, 8]]
>>> pop.sort(key=weightedSum)
>>> pop
[[1, 1, 1], [1, 5, 3], [1, 3, 8], [2, 5, 4], [7, 5, 8]]
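The same weighting can be written more compactly by pairing each value with its weight using zip. This is an equivalent sketch rather than a replacement for the function above; weighted_sum and ranked are illustrative names.

```python
def weighted_sum(values):
    """Weight the first item by len(values), the next by len(values) - 1, and so on."""
    weights = range(len(values), 0, -1)
    return sum(w * v for w, v in zip(weights, values))


pop = [[1, 5, 3], [1, 1, 1], [7, 5, 8], [2, 5, 4], [1, 3, 8]]
ranked = sorted(pop, key=weighted_sum)  # returns a new list; pop is unchanged
```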
