Python generator with numpy array - python

I'd like to create a generator that returns a array on fly. For example:
import numpy as np
def my_gen():
c = np.ones(5)
j = 0
t = 10
while j < t:
c[0] = j
yield c
j += 1
With a simple for loop:
for g in my_gen():
print (g)
I got what I want. But with list(my_gen()), I got a list which contains always the same thing.
I digged a little deeper and I find when I yield c.tolist() instead of yield c, everything went ok...
I just cannot explain myself how come this strange behaviour...

That is because c is always pointing to the same numpy array reference, you are just changing the element inside c in the generator function.
When simply printing, it prints the complete c array at that particular moment , hence you correctly get the values printed.
But when you are using list(my_gen()) , you keep adding the same reference to c numpy array into the list, and hence any changes to that numpy array also reflect in the previously added elements in the list.
It works for you when you do yield c.tolist() , because that creates a new list from the numpy array, hence you keep adding new list objects to the list and hence changes in the future to c does not reflect in the previously added lists.

An alternative generator returns a copy of a list. I'm retaining the np.ones() as a convenient way of creating the numbers, but converting it to a list right away (just once) (array.tolist() is relatively expensive).
I yield c[:] to avoid that 'current version' problem.
def gen_c():
c = np.ones(5,dtype=int).tolist()
j = 0
t = 10
while j < t:
c[0] = j
yield c[:]
j += 1
In [54]: list(gen_c())
Out[54]:
[[0, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[2, 1, 1, 1, 1],
[3, 1, 1, 1, 1],
[4, 1, 1, 1, 1],
[5, 1, 1, 1, 1],
[6, 1, 1, 1, 1],
[7, 1, 1, 1, 1],
[8, 1, 1, 1, 1],
[9, 1, 1, 1, 1]]
In [55]: np.array(list(gen_c()))
Out[55]:
array([[0, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[2, 1, 1, 1, 1],
[3, 1, 1, 1, 1],
[4, 1, 1, 1, 1],
[5, 1, 1, 1, 1],
[6, 1, 1, 1, 1],
[7, 1, 1, 1, 1],
[8, 1, 1, 1, 1],
[9, 1, 1, 1, 1]])

Ok, I think because in this generator, since I'm returning the same reference, generator yield always the same thing. If I yield np.array(c), that'll work...

Related

Create a PyTorch tensor of sequences which excludes specified value

I have a 1d PyTorch tensor containing integers between 0 and n-1. Now I need to create a 2d PyTorch tensor with n-1 columns, where each row is a sequence from 0 to n-1 excluding the value in the first tensor. How can I achieve this efficiently?
Ex:
n = 3
a = torch.Tensor([0, 1, 2, 1, 2, 0])
# desired output
b = [
[1, 2],
[0, 2],
[0, 1],
[0, 2],
[0, 1],
[1, 2]
]
Typically, the a.numel() >> n.
Detailed Explanation:
The first element of a is 0, hence it has to map to the sequence [0, 1, 2] excluding 0, which is [1, 2].
Similarly, the second element of a is 1, hence it has to map to [0, 2] and so on.
PS: I actually have an additional batch dimension, which I've excluded here for simplicity. Hence, I need the solution to be easily extendable to one additional dimension.
We can construct a tensor with the desired sequences and index with tensor a.
import torch
n = 3
a = torch.Tensor([0, 1, 2, 1, 2, 0]) # using torch.tensor is recommended
def exclude_gather(a, n):
sequences = torch.nonzero(torch.arange(n) != torch.arange(n)[:,None], as_tuple=True)[1].reshape(-1, n-1)
return sequences[a.long()]
exclude_gather(a, n)
Output
tensor([[1, 2],
[0, 2],
[0, 1],
[0, 2],
[0, 1],
[1, 2]])
We can add a batch dimension with functorch.vmap
from functorch import vmap
n = 4
b = torch.Tensor([[0, 1, 2, 1, 3, 0],[0, 3, 1, 0, 2, 1]])
vmap(exclude_gather, in_dims=(0, None))(b, n)
Output
tensor([[[1, 2, 3],
[0, 2, 3],
[0, 1, 3],
[0, 2, 3],
[0, 1, 2],
[1, 2, 3]],
[[1, 2, 3],
[0, 1, 2],
[0, 2, 3],
[1, 2, 3],
[0, 1, 3],
[0, 2, 3]]])
All you have to do is initialize a multi-dimension array with all possible indices using torch.arange(). After that, purge indices that you don't want from each tensor using a boolean mask.
import torch
a = torch.Tensor([0, 1, 2, 1, 2, 0])
n = 3
b = [torch.arange(n) for i in range(len(a))]
c = [b[i]!=a[i] for i in range(len(b))]
# use the boolean array as a mask to apply on b
d = [[b[i][c[i]] for i in range(len(b))]]
print(d) # this can be converted to a list of numbers or torch tensor
This prints the output - [[tensor([1, 2]), tensor([0, 2]), tensor([0, 1]), tensor([0, 2]), tensor([0, 1]), tensor([1, 2])]] which you can convert to int/numpy/torch array/tensor easily.
This can be extended to multiple dimensions as well.
The following does the trick
b = []
for i in range(n-1):
b.append(i * torch.ones_like(a) + (a <= i))
b = torch.stack(b, dim=1)
Since n << size(a), the for loop should not be very costly.

xarray.DataArray how get single element

I want to create a three-dimensional table that contains numbers. For two-dimensional data I would just use a csv-file or a pandas.DataFrame, but this appeared to be not so easy to use for three dimensions. I decided to use xarray. If you think, there are easier solutions, feel free to tell: I only want to create the array, save it in file and read it out for later use. Each column won't have more than 100 elements.
I create the array (and access it) using
import numpy as np
import xarray as xr
da =xr.DataArray(
np.ones((4,4,4),dtype=int),
[
("x-col", ['a','b','c','d']),
('y-col', ['A','B','C','D']),
('z-col', [1,2,3,4])
]
)
da.loc['a','C',2]=4
da
which gives
array([[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 4, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1],
[1, 1, 1, 1]]])
Coordinates:
x-col (x-col) <U1 'a' 'b' 'c' 'd'
y-col (y-col) <U1 'A' 'B' 'C' 'D'
z-col (z-col) int32 1 2 3 4
Attributes: (0)
as expected. However if I want to access the number using da.loc['a','C',2] I still get
array(4)
Coordinates:
x-col () <U1 'a'
y-col () <U1 'C'
z-col () int32 2
Attributes: (0)
Is there a way such that I get the number 4 without the wrapping?
Also, can you propose an elegant method how to store the DataArray to disk so I can use it later?

Matrix code not working even is similar to other

I was trying to create a code for a identity matrix and came out with this code:
def identidade(n):
i =0
l = [0] * n
l1 = [l.copy()] *n
for i in range (n):
l1[i][i] = 1
print(l1)
return l1
the output is:
[[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]]
But i came along with a very similar code in the internet:
def identity(n):
m=[[0 for x in range(n)] for y in range(n)]
for i in range(0,n):
m[i][i] = 1
return m
that returns:
[[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1]]
So, my question is why my code doesn't return the correct output when selecting the element in the list of lists (l1[i][i] = 1) ?
tks in advance
The actual problem here is that you are using the * operator to create (as you hope) the copies of the '[l.copy()]' list, but it actually creates references. Using the copy() inside of the square brackets just breaks the connection to the original 'l' list, but does not solve the problem with creation of references to the newly created copy.
Just try to replace the * operator with for loop - this will solve your problem.

Subset Sum in python using DP

I'm following this link to write a DP solution for Subset problem.
def subsetSum(input, target):
row, col = len(input)+1, target+1
db = [[False] * col] * row
for i in range(row):
db[i][0] = True
for i in range(1, row):
for j in range(1, col):
db[i][j]=db[i-1][j]
if db[i][j]==False and j>=input[i-1]:
db[i][j] = db[i-1][j-input[i-1]]
return db[i][j]
target = 5
input = [1,3,9,2]
subsetSum(input, target)
Interestingly after every iteration of "j", db[i-1] (the previous row where we are referring to the values) is also getting updated. I'm really lost whats happening here. Please suggest.
Please find this link for the printed statements.
The issue is in this line
db = [[False] * col] * row.
When you use the * operator, a copy of the original list is made that refers to the original list.
Consider the following example:
l = [[1]*5]*3
print(l) # prints [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
l[0][0] = 0
print(l) # prints [[0, 1, 1, 1, 1], [0, 1, 1, 1, 1], [0, 1, 1, 1, 1]]
Each inner list refers to the same object. Thus, when the first element of the first list is changed, all lists appear to change.
To remedy this, you can use a list comprehension:
l = [[1]*5 for _ in range(3)]
print(l) # prints [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
l[0][0] = 0
print(l) # prints [[0, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
Specifically, you can replace your assignment to db with the following:
db = [[False]*col for _ in range(row)].

Confounding recursive list append in Python

I'm trying to create a pair of functions that, given a list of "starting" numbers, will recursively add to each index position up to a defined maximum value (much in the same way that a odometer works in a car--each counter wheel increasing to 9 before resetting to 1 and carrying over onto the next wheel).
The code looks like this:
number_list = []
def counter(start, i, max_count):
if start[len(start)-1-i] < max_count:
start[len(start)-1-i] += 1
return(start, i, max_count)
else:
for j in range (len(start)):
if start[len(start)-1-i-j] == max_count:
start[len(start)-1-i-j] = 1
else:
start[len(start)-1-i-j] += 1
return(start, i, max_count)
def all_values(fresh_start, i, max_count):
number_list.append(fresh_start)
new_values = counter(fresh_start,i,max_count)
if new_values != None:
all_values(*new_values)
When I run all_values([1,1,1],0,3) and print number_list, though, I get:
[[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1],
[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1],
[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1],
[1, 1, 1], [1, 1, 1], [1, 1, 1]]
Which is unfortunate. Doubly so knowing that if I replace the first line of all_values with
print(fresh_start)
I get exactly what I'm after:
[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 1]
[1, 2, 2]
[1, 2, 3]
[1, 3, 1]
[1, 3, 2]
[1, 3, 3]
[2, 1, 1]
[2, 1, 2]
[2, 1, 3]
[2, 2, 1]
[2, 2, 2]
[2, 2, 3]
[2, 3, 1]
[2, 3, 2]
[2, 3, 3]
[3, 1, 1]
[3, 1, 2]
[3, 1, 3]
[3, 2, 1]
[3, 2, 2]
[3, 2, 3]
[3, 3, 1]
[3, 3, 2]
[3, 3, 3]
I have already tried making a copy of fresh_start (by way of temp = fresh_start) and appending that instead, but with no change in the output.
Can anyone offer any insight as to what I might do to fix my code? Feedback on how the problem could be simplified would be welcome as well.
Thanks a lot!
temp = fresh_start
does not make a copy. Appending doesn't make copies, assignment doesn't make copies, and pretty much anything that doesn't say it makes a copy doesn't make a copy. If you want a copy, slice it:
fresh_start[:]
is a copy.
Try the following in the Python interpreter:
>>> a = [1,1,1]
>>> b = []
>>> b.append(a)
>>> b.append(a)
>>> b.append(a)
>>> b
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
>>> b[2][2] = 2
>>> b
[[1, 1, 2], [1, 1, 2], [1, 1, 2]]
This is a simplified version of what's happening in your code. But why is it happening?
b.append(a) isn't actually making a copy of a and stuffing it into the array at b. It's making a reference to a. It's like a bookmark in a web browser: when you open a webpage using a bookmark, you expect to see the webpage as it is now, not as it was when you bookmarked it. But that also means that if you have multiple bookmarks to the same page, and that page changes, you'll see the changed version no matter which bookmark you follow.
It's the same story with temp = a, and for that matter, a = [1,1,1]. temp and a are "bookmarks" to a particular array which happens to contain three ones. And b in the example above, is a bookmark to an array... which contains three bookmarks to that same array that contains three ones.
So what you do is create a new array and copy in the elements of the old array. The quickest way to do that is to take an array slice containing the whole array, as user2357112 demonstrated:
>>> a = [1,1,1]
>>> b = []
>>> b.append(a[:])
>>> b.append(a[:])
>>> b.append(a[:])
>>> b
[[1, 1, 1], [1, 1, 1], [1, 1, 1]]
>>> b[2][2] = 2
>>> b
[[1, 1, 1], [1, 1, 1], [1, 1, 2]]
Much better.
When I look at the desired output I can't help but think about using one of the numpy grid data production functions.
import numpy
first_column, second_column, third_column = numpy.mgrid[1:4,1:4,1:4]
numpy.dstack((first_column.flatten(),second_column.flatten(),third_column.flatten()))
Out[23]:
array([[[1, 1, 1],
[1, 1, 2],
[1, 1, 3],
[1, 2, 1],
[1, 2, 2],
[1, 2, 3],
[1, 3, 1],
[1, 3, 2],
[1, 3, 3],
[2, 1, 1],
[2, 1, 2],
[2, 1, 3],
[2, 2, 1],
[2, 2, 2],
[2, 2, 3],
[2, 3, 1],
[2, 3, 2],
[2, 3, 3],
[3, 1, 1],
[3, 1, 2],
[3, 1, 3],
[3, 2, 1],
[3, 2, 2],
[3, 2, 3],
[3, 3, 1],
[3, 3, 2],
[3, 3, 3]]])
Of course, the utility of this particular approach might depend on the variety of input you need to deal with, but I suspect this could be an interesting way to build the data and numpy is pretty fast for this kind of thing. Presumably if your input list has more elements you could have more min:max arguments fed into mgrid[] and then unpack / stack in a similar fashion.
Here is a simplified version of your program, which works. Comments will follow.
number_list = []
def _adjust_counter_value(counter, n, max_count):
"""
We want the counter to go from 1 to max_count, then start over at 1.
This function adds n to the counter and then returns a tuple:
(new_counter_value, carry_to_next_counter)
"""
assert max_count >= 1
assert 1 <= counter <= max_count
# Counter is in closed range: [1, max_count]
# Subtract 1 so expected value is in closed range [0, max_count - 1]
x = counter - 1 + n
carry, x = divmod(x, max_count)
# Add 1 so expected value is in closed range [1, max_count]
counter = x + 1
return (counter, carry)
def increment_counter(start, i, max_count):
last = len(start) - 1 - i
copy = start[:] # make a copy of the start
add = 1 # start by adding 1 to index
for i_cur in range(last, -1, -1):
copy[i_cur], add = _adjust_counter_value(copy[i_cur], add, max_count)
if 0 == add:
return (copy, i, max_count)
else:
# if we have a carry out of the 0th position, we are done with the sequence
return None
def all_values(fresh_start, i, max_count):
number_list.append(fresh_start)
new_values = increment_counter(fresh_start,i,max_count)
if new_values != None:
all_values(*new_values)
all_values([1,1,1],0,3)
import itertools as it
correct = [list(tup) for tup in it.product(range(1,4), range(1,4), range(1,4))]
assert number_list == correct
Since you want the counters to go from 1 through max_count inclusive, it's a little bit tricky to update each counter. Your original solution was to use several if statements, but here I have made a helper function that uses divmod() to compute each new digit. This lets us add any increment to any digit and will find the correct carry out of the digit.
Your original program never changed the value of i so my revised one doesn't either. You could simplify the program further by getting rid of i and just having increment_counter() always go to the last position.
If you run a for loop to the end without calling break or return, the else: case will then run if there is one present. Here I added an else: case to handle a carry out of the 0th place in the list. If there is a carry out of the 0th place, that means we have reached the end of the counter sequence. In this case we return None.
Your original program is kind of tricky. It has two explicit return statements in counter() and an implicit return at the end of the sequence. It does return None to signal that the recursion can stop, but the way it does it is too tricky for my taste. I recommend using an explicit return None as I showed.
Note that Python has a module itertools that includes a way to generate a counter series like this. I used it to check that the result is correct.
I'm sure you are writing this to learn about recursion, but be advised that Python isn't the best language for recursive solutions like this one. Python has a relatively shallow recursion stack, and does not automatically turn tail recursion into an iterative loop, so this could cause a stack overflow inside Python if your recursive calls nest enough times. The best solution in Python would be to use itertools.product() as I did to just directly generate the desired counter sequence.
Since your generated sequence is a list of lists, and itertools.product() produces tuples, I used a list comprehension to convert each tuple into a list, so the end result is a list of lists, and we can simply use the Python == operator to compare them.

Categories