rows is a 343x30 matrix of real numbers. I'm trying to append row vectors from rows to true_rows and false_rows, but it only adds the first row and doesn't do anything afterwards. I've tried vstack and also tried passing example as a 2D array ([example]), but it crashed my PyCharm. What can I do?
true_rows = []
true_labels = []
false_rows = []
false_labels = []
i = 0
for example in rows:
    if question.match(example):
        true_rows = np.append(true_rows , example , axis=0)
        true_labels.append(labels[i])
    else:
        #false_rows = np.vstack(false_rows, example_t)
        false_rows = np.append(false_rows, example, axis=0)
        false_labels.append(labels[i])
    i += 1
You can append your rows to a plain list and then convert that list to a NumPy array, like this:
exemple1 = np.array([1,2,3,4,5])
exemple2 = np.array([6,7,8,9,10])
exemple3 = np.array([11,12,13,14,15])
true_rows = []
true_rows.append(exemple1)
true_rows.append(exemple2)
true_rows.append(exemple3)
true_rows = np.array(true_rows)
You will get this result:
true_rows = array([[ 1,  2,  3,  4,  5],
                   [ 6,  7,  8,  9, 10],
                   [11, 12, 13, 14, 15]])
You can also use np.concatenate if you want a one-dimensional array, like this:
true_rows = np.concatenate(true_rows , axis =0)
You will get this result:
true_rows = array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
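Applied to the loop in the question, the same list-based pattern might look like the sketch below. The small rows and labels values and the matches function are made-up stand-ins for the asker's data and for question.match, neither of which is shown in the post.

```python
import numpy as np

rows = np.arange(20.0).reshape(5, 4)      # made-up data standing in for the (343, 30) matrix
labels = ["a", "b", "c", "d", "e"]        # made-up labels, one per row

def matches(example):
    # Hypothetical stand-in for question.match(example)
    return example[0] >= 8

true_rows, true_labels = [], []
false_rows, false_labels = [], []
for example, label in zip(rows, labels):  # zip replaces the manual counter i
    if matches(example):
        true_rows.append(example)
        true_labels.append(label)
    else:
        false_rows.append(example)
        false_labels.append(label)

# Convert once, after the loop, instead of rebuilding an array on every iteration.
true_rows = np.array(true_rows)
false_rows = np.array(false_rows)
print(true_rows.shape, false_rows.shape)
```

One caveat: if no row lands in one of the lists, np.array([]) has shape (0,) rather than (0, 4), so that case may need separate handling.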
Your use of [] and np.append suggests you are trying to imitate the common list append model with arrays. You at least read enough of the np.append docs to know that you need to use axis, and that it returns a new array (the docs are quite clear that this is a copy).
But did you test this idea with a small example, and actually look at the results (step by step)?
In [326]: rows = []
In [327]: rows = np.append(rows, np.arange(3), axis=0)
In [328]: rows
Out[328]: array([0., 1., 2.])
In [329]: rows.shape
Out[329]: (3,)
The first append doesn't do anything useful: the result has the same values as arange(3), just as floats.
In [330]: rows = np.append(rows, np.arange(3), axis=0)
In [331]: rows
Out[331]: array([0., 1., 2., 0., 1., 2.])
In [332]: rows.shape
Out[332]: (6,)
Do you understand why? We join two 1d arrays on axis 0, making a 1d array.
Using [] as a starting point is the same as starting with this array:
In [333]: np.array([])
Out[333]: array([], dtype=float64)
In [334]: np.array([]).shape
Out[334]: (0,)
And with axis, np.append is just a call to concatenate:
In [335]: np.concatenate(( [], np.arange(3)), axis=0)
Out[335]: array([0., 1., 2.])
np.append sort of looks like list append, but it is not a clone. It's really just a poorly named way to use concatenate, and you can't use it properly without actually understanding dimensions. The np.append docs include an example with an error much like the one you would get from concatenate.
Repeated use of these array concatenates in a loop is not a good idea. It's hard to get the dimensions right, as you found. And even when it works, it is slow, since each step makes a copy (which grows with the iteration).
That's why the other answer sticks with list append.
vstack is like concatenate with axis 0, but it makes sure all arguments are 2d. But if the number of columns differs, it raises an error:
In [336]: np.vstack(( [],np.arange(3)))
Traceback (most recent call last):
  File "<ipython-input-336-22038d6ef0f7>", line 1, in <module>
    np.vstack(( [],np.arange(3)))
  File "<__array_function__ internals>", line 180, in vstack
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/shape_base.py", line 282, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 0 and the array at index 1 has size 3
In [337]: np.vstack(( [0,0,0],np.arange(3)))
Out[337]:
array([[0, 0, 0],
       [0, 1, 2]])
If all you are joining are rows of a (n,30) array, then you do know the column size of the result.
In [338]: res = np.zeros((0,3))
In [339]: np.vstack(( res, np.arange(3)))
Out[339]: array([[0., 1., 2.]])
If you pay attention to the shape details, it is possible to create an array iteratively.
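As a minimal sketch of that iterative approach, assuming a small made-up rows array and an arbitrary condition chosen only for illustration, the empty (0, n) starting array can be sized from the data itself:

```python
import numpy as np

rows = np.arange(12.0).reshape(4, 3)      # made-up data standing in for the (343, 30) matrix

# Start from an empty array that already has the right number of columns.
res = np.zeros((0, rows.shape[1]))
for example in rows:
    if example.sum() > 10:                # hypothetical condition, for illustration only
        # vstack promotes the 1d row to shape (1, 3) before joining.
        res = np.vstack((res, example))

print(res.shape)
```

This works, but each vstack call still copies everything collected so far, so the earlier point about speed applies.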
But instead of collecting rows one by one, why not create a mask and do the collection once?
Roughly:
mask = np.array([question.match(example) for example in rows])
true_rows = rows[mask]
false_rows = rows[~mask]
This still requires an iteration, but overall it should be faster.
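A runnable version of the mask idea could look like this. The data, the labels and the matches function are assumptions standing in for the asker's rows, labels and question.match. Storing the labels in an array lets the same mask split them too.

```python
import numpy as np

rows = np.arange(20.0).reshape(5, 4)            # made-up data
labels = np.array(["a", "b", "c", "d", "e"])    # made-up labels as an array

def matches(example):
    # Hypothetical stand-in for question.match(example)
    return example[0] >= 8

# One pass to build the boolean mask, then a single indexing step per output.
mask = np.array([matches(example) for example in rows])
true_rows, false_rows = rows[mask], rows[~mask]
true_labels, false_labels = labels[mask], labels[~mask]

# If the test is a plain comparison, the Python-level loop can be dropped entirely.
vector_mask = rows[:, 0] >= 8
print(true_rows.shape, false_rows.shape)
```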
Construct a 2D, 3x3 matrix with random numbers from 1 to 8 with no duplicates
import numpy as np
random_matrix = np.random.randint(0,10,size=(3,3))
print(random_matrix)
If you want an answer where we don't have to rely on numpy then you can do this:
import random
# Generates a randomized list between 0-9, where 0 is replaced by "#"
x = ["#" if i == 0 else i for i in random.sample(range(10), k=9)]
print(x)
# Slices the list into a 3x3 format
newx = [x[idx:idx+3] for idx in range(0, len(x), 3)]
print(newx)
Output:
[6, 2, 7, 4, '#', 8, 9, 1, 3]
[[6, 2, 7], [4, '#', 8], [9, 1, 3]]
import numpy
x = numpy.arange(0, 9)
numpy.random.shuffle(x)
x = numpy.reshape(x, (3,3))
print(numpy.where(x==0, '#', x))
Note that with my solution the integers are converted to strings; I don't know if that matters to you. If it does, let me know and I will look for another solution.
You can achieve your goal using a few steps:
Generate sequence of values (in some range) you would like to randomly select into matrix.
Take randomly some number of elements from this sequence to new sequence.
From this new sequence make matrix with wanted shape.
import numpy as np
from random import sample
#step one
values = range(0,11)
#step two
random_sequence = sample(values, 9)
#step three
random_matrix = np.array(random_sequence).reshape(3,3)
Because you sample the elements from a sequence of unique values, the new sequence is guaranteed to contain no duplicates, and so is the matrix.
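The same three steps can also be sketched with NumPy's Generator API alone. This is an alternative to random.sample, not what the answer above uses; permutation(9) shuffles the integers 0 to 8, so every value appears exactly once.

```python
import numpy as np

rng = np.random.default_rng()                     # pass a seed here for reproducible output
random_matrix = rng.permutation(9).reshape(3, 3)  # each of 0..8 exactly once
print(random_matrix)
```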
You can use np.random.choice with replace=False to generate the (3, 3) array:
x = np.random.choice(np.arange(9), size=(3, 3), replace=False)
Replacing 0 with np.nan:
>>> np.where(x, x, np.nan)
array([[ 4., 1., 3.],
[ 5., nan, 8.],
[ 2., 6., 7.]])
However, I think Hampus Larsson's answer is better, as this problem is not appropriate for numpy if you intend to replace 0 with the string "#".
You could use numpy, but random is enough:
import random
numbers = list(range(9))
random.shuffle(numbers)
my_list = [[numbers[i*3 + j] for j in range(0,3)] for i in range(0,3)]
I am writing a jury-rigged PyTorch version of scipy.linalg.toeplitz, which currently has the following form:
def toeplitz_torch(c, r=None):
    c = torch.tensor(c).ravel()
    if r is None:
        r = torch.conj(c)
    else:
        r = torch.tensor(r).ravel()
    # Flip c left to right.
    idx = [i for i in range(c.size(0)-1, -1, -1)]
    idx = torch.LongTensor(idx)
    c = c.index_select(0, idx)
    vals = torch.cat((c, r[1:]))
    out_shp = len(c), len(r)
    n = vals.stride(0)
    return torch.as_strided(vals[len(c)-1:], size=out_shp, stride=(-n, n)).copy()
But torch.as_strided currently does not support negative strides. My function, therefore, throws the error:
RuntimeError: as_strided: Negative strides are not supported at the moment, got strides: [-1, 1].
My (perhaps incorrect) understanding of as_strided is that it inserts the values of the first argument into a new array whose size is specified by the second argument and it does so by linearly indexing those values in the original array and placing them at subscript-indexed strides given by the final argument.
Both the NumPy and PyTorch documentation concerning as_strided have scary warnings about using the function with "extreme care" and I don't understand this function fully, so I'd like to ask:
1. Is my understanding of as_strided correct?
2. Is there a simple way to rewrite this so negative strides work?
3. Will I be able to pass a gradient w.r.t c (or r) through toeplitz_torch?
> 1. Is my understanding of as_strided correct?
The stride is an interface for your tensor to access the underlying contiguous data buffer. It does not insert values, no copies of the values are done by torch.as_strided, the strides define the artificial layout of what we refer to as multi-dimensional array (in NumPy) or tensor (in PyTorch).
As Andreas K. puts it in another answer:
Strides are the number of bytes to jump over in the memory in order to get from one item to the next item along each direction/dimension of the array. In other words, it's the byte-separation between consecutive items for each dimension.
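To make the quoted definition concrete, here is a small NumPy sketch. Note that NumPy reports strides in bytes, as the quote says, whereas PyTorch's stride() counts elements, which is why the error message in the question shows [-1, 1].

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)   # 8 bytes per element

# Moving one row down skips 4 elements * 8 bytes; moving one column right skips 8 bytes.
print(a.strides)      # (32, 8)

# Transposing only swaps the strides; the data buffer is shared, not copied.
print(a.T.strides)    # (8, 32)
print(np.shares_memory(a, a.T))   # True
```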
Please feel free to read the answers over there if you have some trouble with strides. Here we will take your example and look at how it is implemented with as_strided.
The example given by Scipy for linalg.toeplitz is the following:
>>> toeplitz([1,2,3], [1,4,5,6])
array([[1, 4, 5, 6],
[2, 1, 4, 5],
[3, 2, 1, 4]])
To do so they first construct the list of values (what we can refer to as the underlying values, not actually underlying data): vals which is constructed as [3 2 1 4 5 6], i.e. the Toeplitz column and row flattened.
Now notice the arguments passed to np.lib.stride_tricks.as_strided:
values: vals[len(c)-1:] notice the slice: the tensors show up smaller, yet the underlying values remain, and they correspond to those of vals. Go ahead and compare the two with storage_offset: it's just an offset of 2, the values are still there! How this works is that it essentially shifts the indices such that index=0 will refer to value 1, index=1 to 4, etc...
shape: given by the column/row inputs, here (3, 4). This is the shape of the resulting object.
strides: this is the most important piece: (-n, n), in this case (-1, 1)
The most intuitive thing to do with strides is to describe a mapping between the multi-dimensional space: (i, j) ∈ [0,3[ x [0,4[ and the flattened 1D space: k ∈ [0, 3*4[. Since the strides are equal to (-n, n) = (-1, 1), the mapping is -n*i + n*j = -1*i + 1*j = j-i. Mathematically you can describe your matrix as M[i, j] = F[j-i], where F is the sliced vector vals[len(c)-1:], i.e. [1 4 5 6] sitting on top of the underlying values [3 2 1 4 5 6].
For instance, let's try with i=1 and j=2. If you look at the Toeplitz matrix above, M[1, 2] = 4. Indeed, F[k] = F[j-i] = F[1] = 4.
If you look closely you will see the trick behind negative strides: they allow you to 'reference' to negative indices: for instance, if you take j=0 and i=2, then you see k=-2. Remember how vals was given with an offset of 2 by slicing vals[len(c)-1:]. If you look at its own underlying data storage it's still [3 2 1 4 5 6], but has an offset. The mapping for vals (in this case i: 1D -> k: 1D) would be M'[i] = F'[k] = F'[i+2] because of the offset. This means M'[-2] = F'[0] = 3.
In the above I defined M' as vals[len(c)-1:], which is basically equivalent to the following tensor:
>>> torch.as_strided(vals, size=(len(vals)-2,), stride=(1,), storage_offset=2)
tensor([1, 4, 5, 6])
Similarly, I defined F' as the flattened vector of underlying values: [3 2 1 4 5 6].
The usage of strides is indeed a very clever way to define a Toeplitz matrix!
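The mapping can be checked in NumPy, where negative strides are allowed. The sketch below rebuilds the SciPy example twice, once with as_strided and once with the explicit index vals[offset + j - i]; the variable names offset, strided and explicit are mine, not SciPy's.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

c, r = np.array([1, 2, 3]), np.array([1, 4, 5, 6])
vals = np.concatenate((c[::-1], r[1:]))        # [3 2 1 4 5 6]
offset = len(c) - 1                            # index of the value 1 inside vals
n = vals.strides[0]                            # bytes per element

# Negative row stride: each new row starts one element earlier in vals.
strided = as_strided(vals[offset:], shape=(len(c), len(r)), strides=(-n, n))

# The same matrix from the explicit mapping M[i, j] = vals[offset + j - i].
i, j = np.indices((len(c), len(r)))
explicit = vals[offset + j - i]

print(strided)
print(np.array_equal(strided, explicit))
```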
> 2. Is there a simple way to rewrite this so negative strides work?
The issue is, negative strides are not implemented in PyTorch... I don't believe there is a way around it with torch.as_strided, otherwise it would be rather easy to extend the current implementation and provide support for that feature.
There are however alternative ways to solve the problem. It is entirely possible to construct a Toeplitz matrix in PyTorch, but that won't be with torch.as_strided.
We will do the mapping ourselves: for each element of M indexed by (i, j), we will find out the corresponding index k which is simply j-i. This can be done with ease, first by gathering all (i, j) pairs from M:
>>> i, j = torch.ones(3, 4).nonzero().T
(tensor([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]),
tensor([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]))
Now we essentially have k:
>>> j-i
tensor([ 0, 1, 2, 3, -1, 0, 1, 2, -2, -1, 0, 1])
We just need to construct a flattened tensor of all possible values from the row r and column c inputs. Negative indexed values (the content of c) are put last and flipped:
>>> values = torch.cat((r, c[1:].flip(0)))
tensor([1, 4, 5, 6, 3, 2])
Finally index values with k and reshape:
>>> values[j-i].reshape(3, 4)
tensor([[1, 4, 5, 6],
[2, 1, 4, 5],
[3, 2, 1, 4]])
To sum it up, my proposed implementation would be:
def toeplitz(c, r):
    vals = torch.cat((r, c[1:].flip(0)))
    shape = len(c), len(r)
    i, j = torch.ones(*shape).nonzero().T
    return vals[j-i].reshape(*shape)
> 3. Will I be able to pass a gradient w.r.t c (or r) through toeplitz_torch?
That's an interesting question because torch.as_strided doesn't have a backward function implemented. This means you wouldn't have been able to backpropagate to c and r! With the above method, however, which uses 'backward-compatible' builtins, the backward pass comes free of charge.
Notice the grad_fn on the output:
>>> toeplitz(torch.tensor([1.,2.,3.], requires_grad=True),
...          torch.tensor([1.,4.,5.,6.], requires_grad=True))
tensor([[1., 4., 5., 6.],
        [2., 1., 4., 5.],
        [3., 2., 1., 4.]], grad_fn=<ViewBackward>)
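To see the gradients themselves, here is a sketch that calls backward on the sum of the matrix. It uses a hypothetical variant, toeplitz_broadcast, which builds k = j - i by broadcasting two arange tensors instead of calling nonzero; the indexing idea is the same as above.

```python
import torch

def toeplitz_broadcast(c, r):
    # Hypothetical variant of the function above: same values vector, indices via broadcasting.
    vals = torch.cat((r, c[1:].flip(0)))
    i = torch.arange(len(c)).unsqueeze(1)   # row indices, shape (len(c), 1)
    j = torch.arange(len(r)).unsqueeze(0)   # column indices, shape (1, len(r))
    return vals[j - i]                      # negative k wraps around to the flipped tail of vals

c = torch.tensor([1., 2., 3.], requires_grad=True)
r = torch.tensor([1., 4., 5., 6.], requires_grad=True)
M = toeplitz_broadcast(c, r)
M.sum().backward()

# Each gradient entry counts how often that input appears in the matrix.
print(c.grad)   # c[0] is never used: the diagonal takes its value from r[0]
print(r.grad)
```

With a sum as the loss, c.grad comes out as [0, 2, 1] and r.grad as [3, 3, 2, 1], matching the number of times each entry occurs in the 3x4 matrix.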
This was a quick draft (that did take a little while to write down), I will make some edits. If you have some questions or remarks, don't hesitate to comment! I would be interested in seeing other answers as I am not an expert with strides, this is just my take on the problem.
Let's say I have the following arrays which contain the X and Y values for a bunch of vectors, respectively:
xdat = np.array([3,2,7,4])
ydat = np.array([2,4,4,9])
Let's say that I wanted to draw the sum total of these vectors (a+b+c+d), not only as a single line from the origin, but drawn sequentially, with each vector starting at the sum of the vectors before it.
How do I do this?
My idea is to use plt.plot for the values of two new arrays which contain the X and Y coordinates for each start/end point of all the vectors. The specific coordinates would be calculated from xdat and ydat. Assuming this was the most efficient method (without resorting to some easy-to-use function already built into python) how would I code this?
It sounds like you want numpy.cumsum
import numpy as np
xdat = np.array([3,2,7,4])
ydat = np.array([2,4,4,9])
dat = np.vstack((xdat, ydat))
# array([[3, 2, 7, 4],
# [2, 4, 4, 9]])
dat = np.cumsum(dat, axis=1)
# array([[ 3, 5, 12, 16],
# [ 2, 6, 10, 19]], dtype=int32)
# optionally start at 0, 0 (can do this before or after cumsum)
dat = np.hstack([np.zeros((2, 1)), dat])
# array([[ 0., 3., 5., 12., 16.],
# [ 0., 2., 6., 10., 19.]])
I stacked them up for convenience, but you could also run cumsum on the 1-D arrays. The axis argument selects whether to run over the whole flattened array (None, the default) or along the given axis; here axis=1 accumulates along each row.
If you want to plot the X-Y coordinates, I'd do so with plt.plot(*dat), which will unpack the X and Y rows as arguments to plot.
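For the 1-D route mentioned above, a minimal sketch could look like this. The names xs, ys and segments are mine; xs and ys could be passed straight to plt.plot(xs, ys).

```python
import numpy as np

xdat = np.array([3, 2, 7, 4])
ydat = np.array([2, 4, 4, 9])

# Prepend the origin so the first vector is drawn starting from (0, 0).
xs = np.concatenate(([0], np.cumsum(xdat)))   # [ 0  3  5 12 16]
ys = np.concatenate(([0], np.cumsum(ydat)))   # [ 0  2  6 10 19]

# Vector k runs from (xs[k], ys[k]) to (xs[k + 1], ys[k + 1]).
segments = list(zip(zip(xs[:-1], ys[:-1]), zip(xs[1:], ys[1:])))
print(segments[0])
```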
I'm trying to move a few Matlab libraries that I've built to the python environment. So far, the biggest issue I faced is the dynamic allocation of arrays based on index specification. For example, using Matlab, typing the following:
x = [1 2];
x(5) = 3;
would result in:
x = [ 1 2 0 0 3]
In other words, I didn't know before hand the size of (x), nor its content. The array must be defined on the fly, based on the indices that I'm providing.
In python, trying the following:
from numpy import *
x = array([1,2])
x[4] = 3
Would result in the following error: IndexError: index out of bounds. One workaround is to grow the array in a loop and then assign the desired value:
from numpy import *
x = array([1,2])
idx = 4
for i in range(size(x),idx+1):
    x = append(x,0)
x[idx] = 3
print(x)
It works, but it's not very convenient and it might become very cumbersome for n-dimensional arrays. I thought about subclassing ndarray to achieve my goal, but I'm not sure if it would work. Does anybody know of a better approach?
Thanks for the quick reply. I didn't know about the __setitem__ method (I'm fairly new to Python). I simply overrode it in a subclass of ndarray as follows:
import numpy as np
class marray(np.ndarray):
    def __setitem__(self, key, value):
        # Array properties
        nDim = np.ndim(self)
        dims = list(np.shape(self))
        # Requested Index
        if type(key)==int: key=key,
        nDim_rq = len(key)
        dims_rq = list(key)
        for i in range(nDim_rq): dims_rq[i]+=1
        # Provided indices match current array number of dimensions
        if nDim_rq==nDim:
            # Define new dimensions
            newdims = []
            for iDim in range(nDim):
                v = max([dims[iDim],dims_rq[iDim]])
                newdims.append(v)
            # Resize if necessary
            if newdims != dims:
                self.resize(newdims,refcheck=False)
        return super(marray, self).__setitem__(key, value)
And it works like a charm! However, I need to modify the above code so that __setitem__ can also change the number of dimensions, following a request like this:
a = marray([0,0])
a[3,1,0] = 0
Unfortunately, when I try to use numpy functions such as
self = np.expand_dims(self,2)
the returned type is numpy.ndarray instead of __main__.marray. Any idea how I could make numpy functions return an marray when an marray is given as input? I think it should be doable using __array_wrap__, but I could never find out exactly how. Any help would be appreciated.
Took the liberty of updating my old answer from Dynamic list that automatically expands. Think this should do most of what you need/want
class matlab_list(list):
    def __init__(self):
        # Endless generator of the fill value (0)
        def zero():
            while 1:
                yield 0
        self._num_gen = zero()
    def __setitem__(self,index,value):
        if isinstance(index, int):
            self.expandfor(index)
            return super(matlab_list,self).__setitem__(index,value)
        elif isinstance(index, slice):
            if index.stop<index.start:
                return super(matlab_list,self).__setitem__(index,value)
            else:
                self.expandfor(index.stop if abs(index.stop)>abs(index.start) else index.start)
                return super(matlab_list,self).__setitem__(index,value)
    def expandfor(self,index):
        # Pad with zeros until index is a valid position
        rng = []
        if abs(index)>len(self)-1:
            if index<0:
                rng = range(abs(index)-len(self))
                for i in rng:
                    self.insert(0,next(self._num_gen))
            else:
                rng = range(abs(index)-len(self)+1)
                for i in rng:
                    self.append(next(self._num_gen))
# Usage
spec_list = matlab_list()
spec_list[5] = 14
This isn't quite what you want, but...
x = np.array([1, 2])
try:
    x[index] = value
except IndexError:
    oldsize = len(x) # will be trickier for multidimensional arrays; you'll need to use x.shape or something and take advantage of numpy's advanced slicing ability
    x = np.resize(x, index+1) # Python uses C-style 0-based indices
    x[oldsize:index] = 0 # You could also do x[oldsize:] = 0, but that would mean you'd be assigning to the final position twice.
    x[index] = value
>>> x = np.array([1, 2])
>>> x = np.resize(x, 5)
>>> x[2:5] = 0
>>> x[4] = 3
>>> x
array([1, 2, 0, 0, 3])
Due to how numpy stores the data linearly under the hood (though whether it stores as row-major or column-major can be specified when creating arrays), multidimensional arrays are pretty tricky here.
>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> np.resize(x, (6, 4))
array([[1, 2, 3, 4],
[5, 6, 1, 2],
[3, 4, 5, 6],
[1, 2, 3, 4],
[5, 6, 1, 2],
[3, 4, 5, 6]])
You'd need to do this or something similar:
>>> y = np.zeros((6, 4))
>>> y[:x.shape[0], :x.shape[1]] = x
>>> y
array([[ 1., 2., 3., 0.],
[ 4., 5., 6., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
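The zero-fill-and-copy step above can be wrapped in a small helper that works for any number of dimensions. This is only a sketch: set_with_growth is a made-up name, it does not change the number of dimensions, and it returns a new array whenever growth is needed, so the caller has to rebind the result.

```python
import numpy as np

def set_with_growth(arr, index, value):
    """Hypothetical helper: assign arr[index] = value, zero-padding arr first if needed."""
    index = (index,) if isinstance(index, int) else tuple(index)
    if len(index) != arr.ndim:
        raise ValueError("index must have one entry per array dimension")
    # Every axis must be at least index + 1 long.
    new_shape = tuple(max(size, i + 1) for size, i in zip(arr.shape, index))
    if new_shape != arr.shape:
        grown = np.zeros(new_shape, dtype=arr.dtype)
        grown[tuple(slice(0, size) for size in arr.shape)] = arr   # copy old block into the corner
        arr = grown
    arr[index] = value
    return arr

x = set_with_growth(np.array([1, 2]), 4, 3)
y = set_with_growth(np.array([[1, 2, 3], [4, 5, 6]]), (3, 3), 9)
print(x)
print(y.shape)
```

When the index already fits, the helper writes into the original array in place and returns it unchanged in shape.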
A Python dict will work well as a sparse array. The main issue is that the syntax for initializing the sparse array is not as pretty:
listarray = [100,200,300]
dictarray = {0:100, 1:200, 2:300}
But after that, the syntax for inserting or retrieving elements is the same:
dictarray[5] = 2345
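One difference from Matlab is that positions never assigned raise KeyError on lookup instead of reading as 0. A small sketch of how that could be handled with dict.get, plus a conversion to a dense list (the name dense is mine):

```python
dictarray = {0: 100, 1: 200, 2: 300}
dictarray[5] = 2345

# Unassigned positions are simply absent; get() supplies the Matlab-style default of 0.
missing = dictarray.get(3, 0)

# Expand to a dense list when one is needed, filling the gaps with 0.
dense = [dictarray.get(k, 0) for k in range(max(dictarray) + 1)]
print(dense)
```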