Python: Parse string to array - python

I am having trouble parsing a string into a numpy array.
The string looks like this:
input = '{{13,1},{2,1},{4,4},{1,7},{9,1}}'
The string represents a sparse vector, where the vector itself is delimited by curly brackets. Each entry, itself delimited by curly brackets, indicates which indices have which entries. The first entry in the list encodes the dimensions of the vector.
In the above example, the vector has length 13 and 4 nonzero entries.
output = np.array([0,7,1,0,4,0,0,0,0,1,0,0,0])
After parsing it to an array, I have to convert it back to a string in its dense format:
stringoutput = '{0,7,1,0,4,0,0,0,0,1,0,0,0}'
While I managed to convert the numpy array to a string, I ran into the problem of having the wrong brackets (i.e. the built-in array2string function uses [], while I need {}).
I am open to any suggestions that help solve this efficiently (even for large sparse vectors).
Thank you.
Edit: The given vector is always one-dimensional, i.e. the second number within the first {} will always be 1 (and you only need 1 index to locate the position of elements).

Here is a numpythonic way:
In [130]: import ast
In [131]: import numpy as np
In [132]: inp = '{{13,1},{2,1},{4,4},{1,7},{9,1}}'
# Replace the curly brackets with parentheses so the string becomes a valid Python literal.
In [133]: inp = ast.literal_eval(inp.replace('{', '(').replace('}', ')'))
# Unpack the dimensions and the remaining (index, value) pairs from the parsed tuple.
In [134]: dim, *rest = inp
# Create the zero array based on the extracted dimensions.
In [135]: arr = np.zeros(dim)
# Use `zip` to collect the indices and values separately so they can be passed to `np.put`.
In [136]: indices, values = zip(*rest)
In [137]: np.put(arr, indices, values)
In [138]: arr
Out[138]:
array([[ 0.],
[ 7.],
[ 1.],
[ 0.],
[ 4.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 1.],
[ 0.],
[ 0.],
[ 0.]])

I like @Kasramvd's approach, but figured I'd put this one out there as well:
In [116]: r = (list(map(int, a.split(','))) for a in input[2:-2].split('},{'))
In [118]: l = np.zeros(next(r)[0], int)
In [119]: for a in r:
     ...:     l[a[0]] = a[1]
     ...:
In [122]: s = '{' + ','.join(map(str, l)) + '}'
In [123]: s
Out[123]: '{0,7,1,0,4,0,0,0,0,1,0,0,0}'

This is based on @Kasramvd's answer. I adjusted how the other values are populated.
From @Kasramvd:
import numpy as np
import ast
inp = '{{13,1},{2,1},{4,4},{1,7},{9,1}}'
inp = ast.literal_eval(inp.replace('{', '(').replace('}', ')'))
dim, *rest = inp
My adjustments:
a = np.zeros(dim, dtype=int)
r = np.array(rest)
a[r[:, 0], 0] = r[:, 1]
a
array([[0],
[7],
[1],
[0],
[4],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[0]])
In one dimension:
a = np.zeros(dim[0], dtype=int)
r = np.array(rest)
a[r[:, 0]] = r[:, 1]
a
array([0, 7, 1, 0, 4, 0, 0, 0, 0, 1, 0, 0, 0])
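For completeness, here is a minimal sketch of the whole round trip asked about in the question (sparse string to dense array to brace-delimited string). The function names parse_sparse and to_brace_string are made up for illustration, and the sketch assumes the input is always well-formed and one-dimensional, as the question's edit states.

```python
import numpy as np

def parse_sparse(text):
    """Parse '{{n,1},{i,v},...}' into a dense 1-D integer array of length n."""
    # Strip the outer '{{' and '}}', then split into 'a,b' chunks.
    pairs = [tuple(map(int, chunk.split(',')))
             for chunk in text.strip()[2:-2].split('},{')]
    # The first pair holds the dimensions; the rest are (index, value) pairs.
    (length, _), entries = pairs[0], pairs[1:]
    dense = np.zeros(length, dtype=int)
    if entries:
        idx, vals = zip(*entries)
        dense[list(idx)] = vals
    return dense

def to_brace_string(arr):
    """Format a 1-D array as '{a,b,c}' instead of numpy's '[a b c]'."""
    return '{' + ','.join(map(str, arr.tolist())) + '}'

dense = parse_sparse('{{13,1},{2,1},{4,4},{1,7},{9,1}}')
result = to_brace_string(dense)
print(result)
```

Building the string with join sidesteps array2string entirely, so there are no square brackets to replace afterwards.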

Related

How do I remove rows in a list containing numpy arrays based on a condition?

I have the following numpy array arr_split:
import numpy as np
arr1 = np.array([[1.,2,3], [4,5,6], [7,8,9]])
arr_split = np.array_split(arr1,
indices_or_sections = 4,
axis = 0)
arr_split
Output:
[array([[1., 2., 3.]]),
array([[4., 5., 6.]]),
array([[7., 8., 9.]]),
array([], shape=(0, 3), dtype=float64)]
How do I remove the entries that are "empty" (i.e., the last one in the example above)? arr_split can contain any number of "empty" entries; the example above just happens to have only one.
I have tried using list comprehension, as per below:
arr_split[[(arr_split[i].shape[0] != 0) for i in range(len(arr_split))]]
but this doesn't work because the list comprehension [(arr_split[i].shape[0] != 0) for i in range(len(arr_split))] part returns a list, when I actually just need the elements in the list to feed into arr_split[] as indices.
Does anyone know how I could fix this, or is there another way of doing it? If possible, I'm looking for the easiest way, without too many loops or if statements.
You can change the indices_or_sections value to the length of the first axis; this will prevent any empty arrays from being produced:
import numpy as np
arr1 = np.array([[1.,2,3], [4,5,6], [7,8,9]])
arr_split = np.array_split(arr1,
indices_or_sections = arr1.shape[0],
axis = 0)
arr_split
>>> [
array([[1., 2., 3.]]),
array([[4., 5., 6.]]),
array([[7., 8., 9.]])
]
Just loop through and check the size. Only add them to the new list if they have a size greater than 0.
arr_split_new = [arr for arr in arr_split if arr.size > 0]
You can use enumerate to get the indexes and size to check if empty
indexes = [idx for idx, v in enumerate(arr_split) if v.size != 0]
[0, 1, 2]
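The attempt in the question fails because arr_split is a plain Python list, and a list cannot be indexed with a list of booleans. If you want to keep the "mask" idea, one option is itertools.compress from the standard library. This is only a sketch; the names mask and non_empty are illustrative.

```python
from itertools import compress
import numpy as np

arr1 = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9]])
arr_split = np.array_split(arr1, 4, axis=0)

# One boolean per piece: True where the piece has at least one row.
mask = [piece.shape[0] != 0 for piece in arr_split]

# compress keeps only the items whose mask entry is true.
non_empty = list(compress(arr_split, mask))
print(len(non_empty))
```

In practice the one-line list comprehension with arr.size > 0 does the same thing with less machinery.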

Create a numpy array and update its values in every iteration

I'm using a video processing tool that needs to input the processing data from each frame into an array.
for p in det.read(frame, fac):
    point_values = np.array([])
    for j, (x, y) in enumerate(p):  # iterate over the points
        point_values = np.append(point_values, y)
        point_values = np.append(point_values, x)
This code runs again for each frame. I'm expecting "point_values = np.array([])" to reset the array and then start filling it again.
I'm not sure whether my logic is wrong or whether it is a syntax issue.
Your code does:
In [77]: p = [(0,0),(0,2),(1,0),(1,2)]
In [78]: arr = np.array([])
In [79]: for j,(x,y) in enumerate(p):
    ...:     arr = np.append(arr,y)
    ...:     arr = np.append(arr,x)
    ...:
In [80]: arr
Out[80]: array([0., 0., 2., 0., 0., 1., 2., 1.])
No syntax error. The list equivalent is faster and cleaner:
In [85]: alist =[]
In [86]: for x,y in p: alist.extend((y,x))
In [87]: alist
Out[87]: [0, 0, 2, 0, 0, 1, 2, 1]
But you don't give any indication of how this action is supposed to fit within a larger context. You create a new point_values for each p, but then don't do anything with it.
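If the points for one detection are already available as (x, y) pairs, the repeated np.append calls can be avoided altogether. The sketch below assumes p behaves like a sequence of 2-tuples, as in the example above; the helper name flatten_points_yx is made up for illustration.

```python
import numpy as np

def flatten_points_yx(points):
    """Return a flat float array [y0, x0, y1, x1, ...] from (x, y) pairs."""
    # Shape (n, 2): column 0 holds x, column 1 holds y.
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    # Reverse the columns to get (y, x) order, then flatten row by row.
    return pts[:, ::-1].ravel()

p = [(0, 0), (0, 2), (1, 0), (1, 2)]
point_values = flatten_points_yx(p)
print(point_values)
```

Because a fresh array is built on every call, there is nothing to "reset" between frames.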

mapping over 2 numpy.ndarray simultaneously

Here's the problem. Let's say I have a matrix A =
array([[ 1., 0., 2.],
[ 0., 0., 2.],
[ 0., -1., 3.]])
and a vector of indices p = array([0, 2, 1]). I want to turn a 3x3 matrix A to an array of length 3 (call it v) where v[j] = A[j, p[j]] for j = 0, 1, 2. I can do it the following way:
v = list(map(lambda pair: pair[0][pair[1]], zip(A, p)))  # Python 3 form; the Python 2 original was map(lambda (row, idx): row[idx], zip(A, p))
So for the above matrix A and a vector of indices p I expect to get array([1, 2, -1]) (ie 0th element of row 0, 2nd element of row 1, 1st element of row 2).
But can I achieve the same result by using native numpy (ie without explicitly zipping and then mapping)? Thanks.
I don't think a dedicated function for this exists. To achieve what you want, I can think of two easy ways. You could do:
np.diag(A[:, p])
Here the array p is applied as a column index for every row such that on the diagonal you will have the elements that you are looking for.
As an alternative, you can avoid producing a lot of unnecessary entries by using:
A[np.arange(A.shape[0]), p]
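A quick check of both suggestions against the matrix from the question, as a sketch. np.take_along_axis is added here as a third option that was not part of the answer above; the variable names v_index, v_diag and v_take are illustrative.

```python
import numpy as np

A = np.array([[1., 0., 2.],
              [0., 0., 2.],
              [0., -1., 3.]])
p = np.array([0, 2, 1])

# Pair row j with column p[j] using integer (fancy) indexing.
v_index = A[np.arange(A.shape[0]), p]

# Build the full 3x3 selection A[:, p] and keep only its diagonal.
v_diag = np.diag(A[:, p])

# take_along_axis expects an index array with the same number of dimensions as A.
v_take = np.take_along_axis(A, p[:, None], axis=1).ravel()

print(v_index, v_diag, v_take)
```

The np.diag variant allocates an n x n intermediate, so the indexing variant is the better choice for large matrices.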

reshape list of numpy arrays and then reshape back

I have a list which consists of several numpy arrays with different shapes.
I want to reshape this list of arrays into a numpy vector and then change each element in the vector and then reshape it back to the original list of arrays.
For example:
input
[numpy.zeros((2,2)), numpy.ones((3,3))]
First
To vector
[0,0,0,0,1,1,1,1,1,1,1,1,1]
Second
Each time, change only one element. For example, change the element at index 1 from 0 to 2:
[0,2,0,0,1,1,1,1,1,1,1,1,1]
Last
convert it back to
[array([[0,2],[0,0]]),array([[1,1,1],[1,1,1],[1,1,1]])]
Is there any fast implementation? Thanks very much.
It seems like converting to a list and back will be inefficient. Instead, why not figure out which array to index (and where) and then just update that index? e.g.
def change_element(arr1, arr2, ix, value):
    which = ix >= arr1.size
    arr = [arr1, arr2][which]
    ix = ix - arr1.size if which else ix
    arr.ravel()[ix] = value
And here's some example usage:
>>> arr1 = np.zeros((2, 2))
>>> arr2 = np.ones((3, 3))
>>> change_element(arr1, arr2, 1, 2)
>>> change_element(arr1, arr2, 6, 3.14)
>>> arr1
array([[ 0., 2.],
[ 0., 0.]])
>>> arr2
array([[ 1. , 1. , 3.14],
[ 1. , 1. , 1. ],
[ 1. , 1. , 1. ]])
>>> change_element(arr1, arr2, 7, 3.14)
>>> arr1
array([[ 0., 2.],
[ 0., 0.]])
>>> arr2
array([[ 1. , 1. , 3.14],
[ 3.14, 1. , 1. ],
[ 1. , 1. , 1. ]])
A few notes -- This updates the arrays in place. It doesn't create new arrays. If you really need to create new arrays, I suppose you could np.copy them and return. Also, this relies on the arrays sharing memory before and after the ravel. I don't remember the exact circumstances where ravel would return a new array rather than a view into the original array . . .
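To make the ravel caveat concrete: as far as I know, ravel returns a view when the array's memory layout allows it (for example a C-contiguous array) and a copy otherwise (for example a transposed array). The sketch below checks this with np.shares_memory and shows .flat as one alternative that writes through to the original array in both cases; the variable names are illustrative.

```python
import numpy as np

a = np.zeros((2, 3))

# C-contiguous array: ravel returns a view onto the same memory.
contiguous_is_view = np.shares_memory(a, a.ravel())

# Transposed array: the elements are not laid out in C order, so ravel copies.
t = a.T
transposed_is_view = np.shares_memory(t, t.ravel())

# Writing through ravel() on the transposed array changes only the temporary copy.
t.ravel()[1] = 5.0
lost_write = float(t[0, 1])

# Writing through .flat goes to the original array.
t.flat[1] = 5.0
kept_write = float(t[0, 1])

print(contiguous_is_view, transposed_is_view, lost_write, kept_write)
```

For arrays created with np.zeros or np.ones, as in the question, the ravel-based assignment behaves as shown in the example usage.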
Generalizing to more arrays is actually quite easy. We just need to walk down the list of arrays and see if ix is less than the array size. If it is, we've found our array. If it isn't, we need to subtract the array's size from ix to represent the number of elements we've traversed thus far:
def change_element(arrays, ix, value):
    for arr in arrays:
        if ix < arr.size:
            arr.ravel()[ix] = value
            return
        ix -= arr.size
And you can call this similar to before:
change_element([arr1, arr2], 6, 3.14159)
@mgilson probably has the best answer for you, but if you absolutely have to convert to a flat list first and then go back again (perhaps because you need to do something else with the flat list as well), then you can do this with list comprehensions:
import numpy as np
lst = [np.zeros((2,2)), np.ones((3,3))]
# Flatten every array into one Python list
tlist = [e for a in lst for e in a.ravel()]
tlist[1] = 2
i = 0
lst2 = []
# Remember the original shapes so the flat list can be split back up
dims = [a.shape for a in lst]
for n, m in dims:
    lst2.append(np.array(tlist[i:i+n*m]).reshape(n,m))
    i += n*m
lst2
[array([[ 0., 2.],
[ 0., 0.]]), array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])]
Of course, you lose the information about your array sizes when you flatten, so you need to store them somewhere (here, in dims).
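The same round trip can also stay inside numpy, using np.concatenate to flatten and np.split to cut the vector back up. This is a sketch with made-up helper names (flatten_arrays, unflatten_arrays); it works for arrays of any number of dimensions because it stores the full shapes.

```python
import numpy as np

def flatten_arrays(arrays):
    """Return one flat vector plus the shapes needed to undo the flattening."""
    shapes = [a.shape for a in arrays]
    flat = np.concatenate([a.ravel() for a in arrays])
    return flat, shapes

def unflatten_arrays(flat, shapes):
    """Split the flat vector at the array boundaries and restore each shape."""
    sizes = [int(np.prod(s)) for s in shapes]
    boundaries = np.cumsum(sizes)[:-1]
    return [chunk.reshape(s)
            for chunk, s in zip(np.split(flat, boundaries), shapes)]

lst = [np.zeros((2, 2)), np.ones((3, 3))]
flat, shapes = flatten_arrays(lst)
flat[1] = 2
restored = unflatten_arrays(flat, shapes)
print(restored)
```

Note that np.concatenate copies the data, so the arrays in lst are left unchanged; only restored reflects the edit.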

Python Dynamic Array allocation, Matlab style

I'm trying to move a few Matlab libraries that I've built to the python environment. So far, the biggest issue I faced is the dynamic allocation of arrays based on index specification. For example, using Matlab, typing the following:
x = [1 2];
x(5) = 3;
would result in:
x = [ 1 2 0 0 3]
In other words, I didn't know before hand the size of (x), nor its content. The array must be defined on the fly, based on the indices that I'm providing.
In python, trying the following:
from numpy import *
x = array([1,2])
x[4] = 3
Would result in the following error: IndexError: index out of bounds. One workaround is to grow the array in a loop and then assign the desired value:
from numpy import *
x = array([1,2])
idx = 4
for i in range(size(x),idx+1):
    x = append(x,0)
x[idx] = 3
print(x)
It works, but it's not very convenient, and it could become very cumbersome for n-dimensional arrays. I thought about subclassing ndarray to achieve my goal, but I'm not sure whether it would work. Does anybody know of a better approach?
Thanks for the quick reply. I didn't know about the __setitem__ method (I'm fairly new to Python). I simply overrode it in a subclass of ndarray as follows:
import numpy as np
class marray(np.ndarray):
    def __setitem__(self, key, value):
        # Array properties
        nDim = np.ndim(self)
        dims = list(np.shape(self))
        # Requested Index
        if type(key)==int: key=key,
        nDim_rq = len(key)
        dims_rq = list(key)
        for i in range(nDim_rq): dims_rq[i]+=1
        # Provided indices match current array number of dimensions
        if nDim_rq==nDim:
            # Define new dimensions
            newdims = []
            for iDim in range(nDim):
                v = max([dims[iDim],dims_rq[iDim]])
                newdims.append(v)
            # Resize if necessary
            if newdims != dims:
                self.resize(newdims,refcheck=False)
        return super(marray, self).__setitem__(key, value)
And it works like a charm! However, I need to modify the above code so that __setitem__ allows changing the number of dimensions, following this request:
a = marray([0,0])
a[3,1,0] = 0
Unfortunately, when I try to use numpy functions such as
self = np.expand_dims(self,2)
the returned type is numpy.ndarray instead of __main__.marray. Any idea how I could make numpy functions return an marray when an marray is provided as input? I think it should be doable using __array_wrap__, but I could never find out exactly how. Any help would be appreciated.
Took the liberty of updating my old answer from Dynamic list that automatically expands. Think this should do most of what you need/want
class matlab_list(list):
    def __init__(self):
        # Endless generator of the fill value used when the list grows
        def zero():
            while 1:
                yield 0
        self._num_gen = zero()
    def __setitem__(self,index,value):
        if isinstance(index, int):
            self.expandfor(index)
            return super(matlab_list,self).__setitem__(index,value)
        elif isinstance(index, slice):
            if index.stop<index.start:
                return super(matlab_list,self).__setitem__(index,value)
            else:
                self.expandfor(index.stop if abs(index.stop)>abs(index.start) else index.start)
                return super(matlab_list,self).__setitem__(index,value)
    def expandfor(self,index):
        # Pad with zeros (at the front for negative indices) until index is valid
        rng = []
        if abs(index)>len(self)-1:
            if index<0:
                rng = range(abs(index)-len(self))
                for i in rng:
                    self.insert(0,next(self._num_gen))
            else:
                rng = range(abs(index)-len(self)+1)
                for i in rng:
                    self.append(next(self._num_gen))
# Usage
spec_list = matlab_list()
spec_list[5] = 14
This isn't quite what you want, but...
x = np.array([1, 2])
try:
    x[index] = value
except IndexError:
    oldsize = len(x) # will be trickier for multidimensional arrays; you'll need to use x.shape or something and take advantage of numpy's advanced slicing ability
    x = np.resize(x, index+1) # Python uses C-style 0-based indices
    x[oldsize:index] = 0 # You could also do x[oldsize:] = 0, but that would mean you'd be assigning to the final position twice.
    x[index] = value
>>> x = np.array([1, 2])
>>> x = np.resize(x, 5)
>>> x[2:5] = 0
>>> x[4] = 3
>>> x
array([1, 2, 0, 0, 3])
Due to how numpy stores the data linearly under the hood (though whether it stores as row-major or column-major can be specified when creating arrays), multidimensional arrays are pretty tricky here.
>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> np.resize(x, (6, 4))
array([[1, 2, 3, 4],
[5, 6, 1, 2],
[3, 4, 5, 6],
[1, 2, 3, 4],
[5, 6, 1, 2],
[3, 4, 5, 6]])
You'd need to do this or something similar:
>>> y = np.zeros((6, 4))
>>> y[:x.shape[0], :x.shape[1]] = x
>>> y
array([[ 1., 2., 3., 0.],
[ 4., 5., 6., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
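The zero-padding idea above can be wrapped into a small helper that works for any number of dimensions. This is only a sketch: the name set_with_growth is made up, it always returns a new array rather than resizing in place, and it assumes the index has exactly one integer per dimension (it cannot add dimensions).

```python
import numpy as np

def set_with_growth(x, index, value):
    """Return a copy of x with x[index] = value, zero-padded if index is out of bounds."""
    x = np.asarray(x)
    index = (index,) if isinstance(index, int) else tuple(index)
    if len(index) != x.ndim:
        raise ValueError("index must have one entry per array dimension")
    # Each axis must be long enough to hold the requested position.
    new_shape = tuple(max(n, i + 1) for n, i in zip(x.shape, index))
    grown = np.zeros(new_shape, dtype=x.dtype)
    # Copy the old contents into the leading corner of the new array.
    grown[tuple(slice(0, n) for n in x.shape)] = x
    grown[index] = value
    return grown

x = set_with_growth(np.array([1, 2]), 4, 3)
y = set_with_growth(np.array([[1, 2, 3], [4, 5, 6]]), (3, 1), 9)
print(x)
print(y)
```

Allocating a fresh array on every call is simple but costly inside a loop; if the final size can be estimated up front, preallocating once with np.zeros is usually cheaper.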
A python dict will work well as a sparse array. The main issue is the syntax for initializing the sparse array will not be as pretty:
listarray = [100,200,300]
dictarray = {0:100, 1:200, 2:300}
but after that the syntax for inserting or retrieving elements is the same
dictarray[5] = 2345
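If a dense array is needed at the end, the dict can be converted in one pass. A minimal sketch, assuming non-negative integer keys; the helper name dict_to_dense and the fill parameter are made up for illustration.

```python
import numpy as np

def dict_to_dense(sparse, fill=0):
    """Build a dense 1-D array from a {index: value} dict, filling gaps with `fill`."""
    if not sparse:
        return np.array([], dtype=int)
    # The largest key determines the length of the dense array.
    dense = np.full(max(sparse) + 1, fill)
    for i, v in sparse.items():
        dense[i] = v
    return dense

dictarray = {0: 100, 1: 200, 2: 300}
dictarray[5] = 2345
dense = dict_to_dense(dictarray)
print(dense)
```

Reading a missing position from the dict itself can be done with dictarray.get(3, 0), which mimics Matlab's implicit zeros without raising a KeyError.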
