zero padding numpy array - python

Suppose I have a list that contains lists of unequal length.
a = [ [ 1, 2, 3], [2], [2, 4] ]
What is the best way to obtain a zero-padded numpy array with a regular shape?
zero_a = [ [1, 2, 3], [2, 0, 0], [2, 4, 0] ]
I know I can use list operations like
n = max( map( len, a ) )
for x in a:
    x.extend( [0] * (n-len(x)) )
zero_a = np.array(a)
but I was wondering whether there is an easy numpy way to do this?

Since numpy has to know the size of an array when it is created, the best solution would be a numpy constructor for this case. Sadly, as far as I know, there is none.
Probably not ideal, but a slightly faster solution is to create a numpy array of zeros and fill it with the list values.
import numpy as np
def pad_list(lst):
    # Note: this extends the inner lists of lst in place.
    inner_max_len = max(map(len, lst))
    for x in lst:
        x.extend([0] * (inner_max_len - len(x)))
    return np.array(lst)
def apply_to_zeros(lst, dtype=np.int64):
    inner_max_len = max(map(len, lst))
    result = np.zeros([len(lst), inner_max_len], dtype)
    for i, row in enumerate(lst):
        for j, val in enumerate(row):
            result[i][j] = val
    return result
Test case:
>>> pad_list([[ 1, 2, 3], [2], [2, 4]])
array([[1, 2, 3],
       [2, 0, 0],
       [2, 4, 0]])
>>> apply_to_zeros([[ 1, 2, 3], [2], [2, 4]])
array([[1, 2, 3],
       [2, 0, 0],
       [2, 4, 0]])
Performance:
>>> import timeit
>>> timeit.timeit('from __main__ import pad_list as f; f([[ 1, 2, 3], [2], [2, 4]])', number = 10000)
0.3937079906463623
>>> timeit.timeit('from __main__ import apply_to_zeros as f; f([[ 1, 2, 3], [2], [2, 4]])', number = 10000)
0.1344289779663086
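The zeros-then-fill idea can also be written without the inner Python loop. The following is a minimal sketch, assuming a non-empty list of flat numeric lists; pad_with_mask is a hypothetical name, not a numpy function. It builds a boolean mask of the valid positions in each row and assigns all values in one step:

```python
import numpy as np

def pad_with_mask(lst, dtype=np.int64):
    """Zero-pad a list of unequal-length lists (hypothetical helper)."""
    lens = np.array([len(row) for row in lst])
    # mask[i, j] is True where row i has a j-th element
    mask = np.arange(lens.max()) < lens[:, None]
    out = np.zeros(mask.shape, dtype=dtype)
    # Boolean assignment fills the True positions in row-major order,
    # which matches the order of the concatenated rows.
    out[mask] = np.concatenate(lst)
    return out

padded = pad_with_mask([[1, 2, 3], [2], [2, 4]])
print(padded)
```

Whether this beats the explicit loop depends on the data; for many short rows the mask version tends to help, but it is worth timing on real input.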

Not strictly a function from numpy, but you could do something like this
from itertools import izip, izip_longest
import numpy
a=[[1,2,3], [4], [5,6]]
res1 = numpy.array(list(izip(*izip_longest(*a, fillvalue=0))))
or, alternatively:
res2=numpy.array(list(izip_longest(*a, fillvalue=0))).transpose()
If you use Python 3, use the built-in zip instead of izip and itertools.zip_longest instead of izip_longest.
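For reference, a Python 3 version of the second variant might look like the sketch below. It uses the same input as above; only the import and the function name change:

```python
from itertools import zip_longest

import numpy as np

a = [[1, 2, 3], [4], [5, 6]]
# zip_longest pads the shorter lists with fillvalue while transposing,
# so the result has to be transposed back.
res = np.array(list(zip_longest(*a, fillvalue=0))).T
print(res)
```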

Related

Append indices of element to each element

So basically I want to create a new array for each element and append the coordinates of the element to the original value (so adding the x and y position to the original element):
[ [7,2,4],[1,5,3] ]
then becomes
[ [[0,0,7],[0,1,2],[0,2,4]],
  [[1,0,1],[1,1,5],[1,2,3]] ]
I've been looking for different ways to make this work with the axis system in NumPy, but I'm probably overlooking some more obvious way.
You can use np.meshgrid to create a grid of indices and then np.stack to combine it with the input array:
import numpy as np
a = np.asarray([[7,2,4],[1,5,3]])
# list(...) keeps this working whether meshgrid returns a list or a tuple
grids = list(np.meshgrid(range(a.shape[1]), range(a.shape[0])))
result = np.stack(grids[::-1] + [a], axis=-1)
Output:
array([[[0, 0, 7],
[0, 1, 2],
[0, 2, 4]],
[[1, 0, 1],
[1, 1, 5],
[1, 2, 3]]])
Let me know if it helps.
Without numpy you could use a list comprehension:
old_list = [ [7,2,4],[1,5,3] ]
new_list = [ [[i,j,old_list[i][j]] for j in range(len(old_list[i]))] for i in range(len(old_list)) ]
I'd assume that numpy is faster, but this solution does not require the sublists to have equal length.
Another approach, using enumerate:
In [38]: merge = list()
    ...: for i, j in enumerate(val):  # val is the input list of lists
    ...:     merge.append([[i, m, n] for m, n in enumerate(j)])
    ...:
In [39]: merge
Out[39]: [[[0, 0, 7], [0, 1, 2], [0, 2, 4]], [[1, 0, 1], [1, 1, 5], [1, 2, 3]]]
Hope it's useful.
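import numpy as np
a = np.array([[7,2,4], [1,5,3]])
# argwhere on an all-ones array lists every index, even where a is zero
idx = np.argwhere(np.ones_like(a))
idx = idx.reshape((*(a.shape), -1))
a = np.expand_dims(a, axis=-1)
a = np.concatenate((idx, a), axis=-1)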
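Another option is np.indices, which produces the coordinate grids directly from the shape and so does not depend on the values stored in the array. A minimal sketch, with a zero placed in the input on purpose:

```python
import numpy as np

a = np.array([[7, 2, 4], [1, 0, 3]])  # contains a zero on purpose

# rows[i, j] == i and cols[i, j] == j for every position in a
rows, cols = np.indices(a.shape)
result = np.stack([rows, cols, a], axis=-1)
print(result)
```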

How to find a row-wise intersection of 2d numpy arrays?

I am looking for an efficient way to get the row-wise intersection of two two-dimensional numpy ndarrays. There is only one common element per row. For example:
[[1, 2],   ∩   [[0, 1],   ->   [1,
 [3, 4]]        [0, 3]]         3]
Ideally, zeros should be ignored:
[[1, 2, 0],   ∩   [[0, 1, 0],   ->   [1,
 [3, 4, 0]]        [0, 3, 0]]         3]
My solution:
import numpy as np
arr1 = np.array([[1, 2],
                 [3, 4]])
arr2 = np.array([[0, 1],
                 [0, 3]])
arr3 = np.empty(len(arr1))
for i in range(len(arr1)):
    arr3[i] = np.intersect1d(arr1[i], arr2[i])
print(arr3)
# [ 1. 3.]
I have about 1 million rows, so vectorized operations are strongly preferred. You are welcome to use other Python packages.
You can use np.apply_along_axis.
I wrote a solution that pads the result to the width of arr1.
I didn't test the efficiency.
import numpy as np
def intersect1d_padded(x):
    # x holds a row of a followed by the matching row of b
    x, y = np.split(x, 2)
    padded_intersection = -1 * np.ones(x.shape, dtype=int)
    intersection = np.intersect1d(x, y)
    padded_intersection[:intersection.shape[0]] = intersection
    return padded_intersection
def rowwise_intersection(a, b):
    return np.apply_along_axis(intersect1d_padded,
                               1, np.concatenate((a, b), axis=1))
result = rowwise_intersection(arr1, arr2)
# array([[ 1, -1],
#        [ 3, -1]])
If you know there is only one element in the intersection, you can use
result = rowwise_intersection(arr1, arr2)[:, 0]
# array([1, 3])
You can also modify intersect1d_padded to return a scalar with the intersection value.
I don't know of an elegant way to do it in numpy, but a simple list comprehension can do the trick (x and y are the two arrays):
[list(set.intersection(set(_x),set(_y)).difference({0})) for _x,_y in zip(x,y)]
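For the one-million-row case, a fully vectorized variant can be sketched with broadcasting. This is only a sketch under the question's stated assumption that each row has exactly one common non-zero value; rowwise_single_intersection and its ignore parameter are hypothetical names. It compares every element of a row of a with every element of the same row of b, so memory use grows with rows × columns²:

```python
import numpy as np

def rowwise_single_intersection(a, b, ignore=0):
    """Return the single common value of each row pair (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    # match[i, j] is True when a[i, j] occurs anywhere in row i of b
    match = (a[:, :, None] == b[:, None, :]).any(axis=2)
    # Drop the value that should be ignored (zero by default)
    match &= (a != ignore)
    # Assumption: exactly one True per row, so argmax finds its column.
    # A row with no match would silently yield a[i, 0].
    cols = match.argmax(axis=1)
    return a[np.arange(a.shape[0]), cols]

arr1 = np.array([[1, 2, 0], [3, 4, 0]])
arr2 = np.array([[0, 1, 0], [0, 3, 0]])
common = rowwise_single_intersection(arr1, arr2)
print(common)  # [1 3]
```

If some rows may have no common value, checking match.any(axis=1) first would be a sensible guard.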

Permuting characters in a string

Warning: this question is not what you think!
Suppose I have a string like this (Python):
'[[1, 2], [2, 3], [0, 3]]'
Now suppose further that I have the permutation of the characters 0, 1, 2, 3 which swaps 0 and 1, as well as (separately) 2 and 3. Then I would wish to obtain
'[[0, 3], [3, 2], [1, 2]]'
from this. As another example, suppose I want to use the more complicated permutation where 1 goes to 2, 2 goes to 3, and 3 goes to 1? Then I would desire the output
'[[2, 3], [3, 1], [0, 1]]'
Question: Given a permutation (encoded however one likes) of characters/integers 0 to n-1 and a string containing (some of) them, I would like a function which takes such a string and gives the appropriate resulting string where these characters/integers have been permuted - and nothing else.
I have been having a lot of difficulty seeing whether there is some obvious use of re or even just indexing that will help me, because usually these replacements are sequential, which would obviously be bad in this case. Any help will be much appreciated, even if it makes me look dumb.
(If someone has an idea for the original list [[1, 2], [2, 3], [0, 3]], that is fine too, but that is a list of lists and presumably more annoying than a string, and the string would suffice for my purposes.)
Here's a simple solution using a regular expression with a callback:
import re
s = '[[1, 2], [2, 3], [0, 3]]'
mapping = [3, 2, 1, 0]  # mapping[i] is the replacement for i
print(re.sub(r'\d+',                                   # substitute all numbers
             lambda m: str(mapping[int(m.group(0))]),  # ... with the mapping
             s                                         # ... for string s
             )
      )
# output: [[2, 1], [1, 0], [3, 0]]
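The same callback idea can be wrapped in a function and applied to the two permutations from the question. This is a minimal sketch; permute_digits is a hypothetical name, and the permutation is assumed to be given as a dict in which unlisted numbers map to themselves:

```python
import re

def permute_digits(s, perm):
    """Replace every integer in s according to perm (hypothetical helper)."""
    def replace(match):
        n = int(match.group(0))
        # Numbers missing from perm are left unchanged
        return str(perm.get(n, n))
    # Each number is replaced exactly once, so replacements never chain
    return re.sub(r'\d+', replace, s)

s = '[[1, 2], [2, 3], [0, 3]]'
swapped = permute_digits(s, {0: 1, 1: 0, 2: 3, 3: 2})
cycled = permute_digits(s, {1: 2, 2: 3, 3: 1})
print(swapped)  # [[0, 3], [3, 2], [1, 2]]
print(cycled)   # [[2, 3], [3, 1], [0, 1]]
```

Because the pattern is \d+ and not a single character, this also handles values of 10 and above.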
Well, I think in general you'll need to work on a copy of the result in memory to avoid the sequential-replacement issue you mention. Converting to a structured data format such as an array also makes things much easier (you don't say so, but your target string is clearly a stringified array, so I'm taking that for granted). Here is one idea using eval and numpy:
import numpy as np
s = '[[2, 3], [3, 1], [0, 1]]'
a = np.array(eval(s))
print('before\n', a)
mymap = [1,2,3,0]
a = np.array([mymap[i] for i in a.flatten()]).reshape(a.shape)
print('after\n', a)
Gives:
before
[[2 3]
[3 1]
[0 1]]
after
[[3 0]
[0 2]
[1 2]]
permutation = {'0':'1', '1':'0', '2':'3', '3':'2'}
s = '[[1, 2], [2, 3], [0, 3]]'
rv = ''
for c in s:
    rv += permutation.get(c, c)
print(rv)
?
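The character-by-character loop above can also be expressed with the built-in str.translate. A minimal sketch, assuming every value is a single digit (as with the loop, multi-digit numbers would be permuted digit by digit):

```python
s = '[[1, 2], [2, 3], [0, 3]]'

# maketrans pairs up characters by position: '0'->'1', '1'->'0', '2'->'3', '3'->'2'
table = str.maketrans('0123', '1032')
translated = s.translate(table)
print(translated)  # [[0, 3], [3, 2], [1, 2]]
```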
You can build a mapping of your desired transformations:
import ast
d = ast.literal_eval('[[1, 2], [2, 3], [0, 3]]')
m = {1: 2, 2: 3, 3: 1}
new_d = [[m.get(i) if i in m else
(lambda x:i if not x else x[0])([a for a, b in m.items() if b == i]) for i in b] for b in d]
Output:
[[2, 3], [3, 1], [0, 1]]
For the first desired swap:
m = {0:1, 2:3}
d = ast.literal_eval('[[1, 2], [2, 3], [0, 3]]')
new_d = [[m.get(i) if i in m else
(lambda x:i if not x else x[0])([a for a, b in m.items() if b == i]) for i in b] for b in d]
Output:
[[0, 3], [3, 2], [1, 2]]
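If the permutation is written out in full (both directions of each swap), the reverse lookup is no longer needed and dict.get is enough. The sketch below works on the original list of lists at any nesting depth; permute_nested is a hypothetical name:

```python
def permute_nested(obj, perm):
    """Apply perm to every non-list item of a nested list (hypothetical helper)."""
    if isinstance(obj, list):
        return [permute_nested(item, perm) for item in obj]
    # Values missing from perm are left unchanged
    return perm.get(obj, obj)

d = [[1, 2], [2, 3], [0, 3]]
swapped_list = permute_nested(d, {0: 1, 1: 0, 2: 3, 3: 2})
cycled_list = permute_nested(d, {1: 2, 2: 3, 3: 1})
print(swapped_list)  # [[0, 3], [3, 2], [1, 2]]
print(cycled_list)   # [[2, 3], [3, 1], [0, 1]]
```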
I confess this is quite inelegant given the quality of this forum, but here is my suggestion, in case it helps:
string = '[[1, 2], [2, 3], [0, 3]]'
numbers = dict(zero = 0, one = 1, two = 2, three=3, four = 4, five = 5, six=6, seven=7, eight=8, nine = 9)
# First swap digits for words, so that a replaced value is never replaced again
string = string.replace('0', 'one').replace('1', 'zero').replace('2','three').replace('3', 'two')
for x in numbers.keys():
    string = string.replace(x, str(numbers[x]))
[[0, 3], [3, 2], [1, 2]]

Sum of two nested lists

I have two nested lists:
a = [[1, 1, 1], [1, 1, 1]]
b = [[2, 2, 2], [2, 2, 2]]
I want to make:
c = [[3, 3, 3], [3, 3, 3]]
I have been referencing the zip documentation, and researching other posts, but don't really understand how they work. Any help would be greatly appreciated!
You can use a list comprehension with zip():
>>> a = [[1, 1, 1], [1, 1, 1]]
>>> b = [[2, 2, 2], [2, 2, 2]]
>>> [[i1+j1 for i1, j1 in zip(i,j)] for i, j in zip(a, b)]
[[3, 3, 3], [3, 3, 3]]
A more generic way is to create a function:
def my_sum(*nested_lists):
    return [[sum(items) for items in zip(*zipped_list)] for zipped_list in zip(*nested_lists)]
which can accept any number of lists. Sample run:
>>> a = [[1, 1, 1], [1, 1, 1]]
>>> b = [[2, 2, 2], [2, 2, 2]]
>>> c = [[3, 3, 3], [3, 3, 3]]
>>> my_sum(a, b, c)
[[6, 6, 6], [6, 6, 6]]
If you're going to do this a whole bunch, you'll be better off using numpy:
import numpy as np
a = [[1, 1, 1], [1, 1, 1]]
b = [[2, 2, 2], [2, 2, 2]]
aa = np.array(a)
bb = np.array(b)
c = aa + bb
Working with numpy arrays will be much more efficient than repeated uses of zip on lists. On top of that, numpy allows you to work with arrays much more expressively, so the resulting code is usually much easier to read.
If you don't want the third-party dependency, you'll need to do something a little different:
c = []
for a_sublist, b_sublist in zip(a, b):
    c.append([a_sublist_item + b_sublist_item for a_sublist_item, b_sublist_item in zip(a_sublist, b_sublist)])
Hopefully the variable names make it clear enough what is going on here, but basically, each zip takes the inputs and combines them (one element from each input). We need two zips here: the outermost zip pairs lists from a with lists from b, whereas the inner zip pairs up individual elements from the sublists that were already paired.
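To see what each zip produces, it may help to print the intermediate pairs. A small sketch using the lists from the question (outer_pairs and inner_pairs are illustrative names only):

```python
a = [[1, 1, 1], [1, 1, 1]]
b = [[2, 2, 2], [2, 2, 2]]

# Outer zip: pairs each sublist of a with the sublist of b at the same position
outer_pairs = list(zip(a, b))
print(outer_pairs)  # [([1, 1, 1], [2, 2, 2]), ([1, 1, 1], [2, 2, 2])]

# Inner zip: pairs up the individual elements of one such pair of sublists
first_a, first_b = outer_pairs[0]
inner_pairs = list(zip(first_a, first_b))
print(inner_pairs)  # [(1, 2), (1, 2), (1, 2)]

# Summing each inner pair gives one row of the result
first_row = [x + y for x, y in inner_pairs]
print(first_row)  # [3, 3, 3]
```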
I use Python's built-in function map() to do this.
If I have simple lists a and b, I sum them this way (on Python 3, map returns an iterator, hence the list() call):
>>> a = [1,1,1]
>>> b = [2,2,2]
>>> list(map(lambda x, y: x + y, a, b))
[3, 3, 3]
If I have nested lists a and b, I sum them in a similar way:
>>> a = [[1, 1, 1], [1, 1, 1]]
>>> b = [[2, 2, 2], [2, 2, 2]]
>>> list(map(lambda x, y: list(map(lambda i, j: i + j, x, y)), a, b))
[[3, 3, 3], [3, 3, 3]]

Sort a 2D numpy array by the median value of the rows

If I have a 2D list in python, I can easily sort by the median value of each sublist like this:
import numpy as np
a = [[1,2,3],[1,1,1],[3,3,3,]]
a.sort(key=lambda x: np.median(x))
print(a)
Yielding...
[[1, 1, 1], [1, 2, 3], [3, 3, 3]]
Is there a way to do this with a numpy array without converting it to a regular python list?
a = np.array([[1,2,3],[1,1,1],[3,3,3,]])
a.sort(key=lambda x: np.median(x))
I guess the numpythonic way would be to use fancy-indexing:
>>> a = np.array([[1,2,3],[1,1,1],[3,3,3,]])
>>> a[np.median(a,axis=1).argsort()]
array([[1, 1, 1],
[1, 2, 3],
[3, 3, 3]])
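Broken into steps, the one-liner above computes one median per row, asks argsort for the order of those medians, and then uses that order to index the rows. A minimal sketch; kind='stable' is an optional addition here so that rows with equal medians keep their original relative order:

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 1, 1], [3, 3, 3]])

medians = np.median(a, axis=1)              # one median per row
order = np.argsort(medians, kind='stable')  # row indices, smallest median first
sorted_a = a[order]                         # fancy indexing returns a reordered copy

print(medians)
print(order)
print(sorted_a)
```

Unlike list.sort, this does not sort in place; assigning the result back (a = a[order]) replaces the original array.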
