'numpy.float64' object is not iterable - python

I'm trying to iterate over an array of values generated with numpy.linspace:

slX = numpy.linspace(obsvX, flightX, numSPts)
slY = np.linspace(obsvY, flightY, numSPts)
for index,point in slX:
    yPoint = slY[index]
    arcpy.AddMessage(yPoint)
This code worked fine on my office computer, but I sat down this morning to work from home on a different machine and this error came up:
File "C:\temp\gssm_arcpy.1.0.3.py", line 147, in AnalyzeSightLine
for index,point in slX:
TypeError: 'numpy.float64' object is not iterable
slX is just an array of floats, and the script has no problem printing the contents -- just not, apparently, iterating through them. Any suggestions for what is causing it to break, and possible fixes?

numpy.linspace() gives you a one-dimensional NumPy array. For example:
>>> my_array = numpy.linspace(1, 10, 10)
>>> my_array
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
Therefore:
for index, point in my_array
cannot work. You would need some kind of two-dimensional array with two
elements in the second dimension:
>>> two_d = numpy.array([[1, 2], [4, 5]])
>>> two_d
array([[1, 2], [4, 5]])
Now you can do this:
>>> for x, y in two_d:
...     print(x, y)
1 2
4 5
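If the goal, as in the question, is to pair each x value with the y value at the same index, enumerate() is likely the simpler fix. A minimal sketch (the linspace arguments are made up, since the question's variables aren't defined here):

import numpy as np

slX = np.linspace(0.0, 10.0, 5)
slY = np.linspace(0.0, 20.0, 5)

# enumerate() yields (index, value) pairs -- the two items the
# original loop was trying to unpack straight from the array
for index, point in enumerate(slX):
    yPoint = slY[index]
    print(point, yPoint)

# or pair the two arrays directly
for xPoint, yPoint in zip(slX, slY):
    print(xPoint, yPoint)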

Related

How to iterate a tensorflow.train.Example or train.Feature object to get a python list?

I have a tensorflow.train.Feature object which I need to traverse to make a NumPy array out of it.
import tensorflow as tf

int_feature = tf.train.Feature(
    int64_list=tf.train.Int64List(value=[1, 2, 3, 4]))
float_feature = tf.train.Feature(
    float_list=tf.train.FloatList(value=[1., 2., 3., 4.]))

example = tf.train.Example(
    features=tf.train.Features(feature={
        'my_ints': int_feature,
        'my_floats': float_feature
    }))
How can I convert it to a numpy array or a python list?
Expected output:
[[1, 1.], [2, 2.], [3, 3.], [4, 4.]]
Okay, found the answer after some trial and error.
import numpy as np

# .value is a repeated field; np.array() turns it into a 1-D array
arr_1 = np.array(example.features.feature['my_ints'].int64_list.value)
arr_2 = np.array(example.features.feature['my_floats'].float_list.value)
print(arr_1)  # [1 2 3 4]
print(arr_2)  # [1. 2. 3. 4.]
# stack the two 1-D arrays along axis=1 to pair them element-wise
print(np.stack([arr_1, arr_2], axis=1))
# [[1. 1.] [2. 2.] [3. 3.] [4. 4.]]
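If you need to traverse an Example without knowing the feature names or kinds up front, the standard protobuf API offers WhichOneof. This is a sketch, not part of the answer above, and assumes the example object built in the question:

# iterate every feature in the Example, whatever its kind
for name, feature in example.features.feature.items():
    kind = feature.WhichOneof('kind')  # 'int64_list', 'float_list', or 'bytes_list'
    values = list(getattr(feature, kind).value)
    print(name, values)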

How to build an object numpy array from an iterator?

I want to create a NumPy array of np.ndarray from an iterable. This is because I have a function that will return np.ndarray of some constant shape, and I need to create an array of results from this function, something like this:
OUTPUT_SHAPE = some_constant

def foo(input) -> np.ndarray:
    # processing
    # generates an np.ndarray of shape OUTPUT_SHAPE
    return output

inputs = [i for i in range(100000)]
iterable = (foo(input) for input in inputs)
arr = np.fromiter(iterable, np.ndarray)

This obviously gives an error:

cannot create object arrays from iterator
I cannot first create a list and then convert it to an array, because that would keep a copy of every output array alive at once; for a time almost double the memory would be occupied, and I have very limited memory.
Can anyone help me?
You probably shouldn't make an object array. You should probably make an ordinary 2D array of non-object dtype. As long as you know the number of results the iterator will give in advance, you can avoid most of the copying you're worried about by doing it like this:
arr = numpy.empty((num_iterator_outputs, *OUTPUT_SHAPE), dtype=whatever_appropriate_dtype)
for i, output in enumerate(iterable):
    arr[i] = output
This only needs to hold arr and a single output in memory at once, instead of arr and every output.
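As a concrete, runnable version of that pattern -- foo and OUTPUT_SHAPE here are made-up stand-ins for the question's names:

import numpy as np

OUTPUT_SHAPE = (4,)  # hypothetical constant shape

def foo(x: int) -> np.ndarray:
    return np.full(OUTPUT_SHAPE, x, dtype=np.float64)

inputs = range(1000)
iterable = (foo(x) for x in inputs)

# preallocate once, then fill row by row; only one output is alive at a time
arr = np.empty((len(inputs), *OUTPUT_SHAPE), dtype=np.float64)
for i, output in enumerate(iterable):
    arr[i] = output
print(arr.shape)  # (1000, 4)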
If you really want an object array, you can get one. The simplest way would be to go through a list, which will not perform the copying you're worried about as long as you do it right:
outputs = list(iterable)
arr = numpy.empty(len(outputs), dtype=object)
arr[:] = outputs
Note that if you just try to call numpy.array on outputs, it will try to build a 2D array, which will cause the copying you're worried about. This is true even if you specify dtype=object - it'll try to build a 2D array of object dtype, and that'll be even worse, for both usability and memory.
An object dtype array contains references, just like a list.
Define 3 arrays:
In [589]: a,b,c = np.arange(3), np.ones(3), np.zeros(3)
put them in a list:
In [590]: alist = [a,b,c]
and in an object dtype array:
In [591]: arr = np.empty(3,object)
In [592]: arr[:] = alist
In [593]: arr
Out[593]:
array([array([0, 1, 2]), array([1., 1., 1.]), array([0., 0., 0.])],
dtype=object)
In [594]: alist
Out[594]: [array([0, 1, 2]), array([1., 1., 1.]), array([0., 0., 0.])]
Modify one, and see the change in the list and array:
In [595]: b[:] = [1,2,3]
In [596]: b
Out[596]: array([1., 2., 3.])
In [597]: alist
Out[597]: [array([0, 1, 2]), array([1., 2., 3.]), array([0., 0., 0.])]
In [598]: arr
Out[598]:
array([array([0, 1, 2]), array([1., 2., 3.]), array([0., 0., 0.])],
dtype=object)
A numeric dtype array created from these copies all values:
In [599]: arr1 = np.stack(arr)
In [600]: arr1
Out[600]:
array([[0., 1., 2.],
[1., 2., 3.],
[0., 0., 0.]])
So even if your use of fromiter worked, it wouldn't be any different, memory-wise, from a list accumulation:

alist = []
for i in range(n):
    alist.append(constant_array)
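As an aside not covered above: recent NumPy versions may let the original fromiter approach work directly via a subarray dtype (the NumPy 1.23 release notes mention fromiter gaining object and subarray dtype support). A sketch, under that assumption:

import numpy as np

OUTPUT_SHAPE = (4,)  # hypothetical constant shape
it = (np.full(OUTPUT_SHAPE, i, dtype=np.float64) for i in range(10))

# each iterator item fills one (4,)-shaped row of the result
arr = np.fromiter(it, dtype=np.dtype((np.float64, OUTPUT_SHAPE)), count=10)
print(arr.shape)  # (10, 4)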

How to reconstruct the raw data from a histogram?

I need to recover the "raw data" from a timing histogram provided by a timing counter as a .csv file.
I've got the code below, but since the actual data has several thousand counts in each bin, the for loop takes a very long time, so I was wondering if there is a better way.
import numpy as np

# Example histogram with 1 second bins
hist = np.array([[1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
                 [0, 17, 3, 34, 35, 100, 101, 107, 12, 1]])

# Arrays for bins and counts
time_bins = hist[0]
counts = hist[1]

# Empty array to append to
data = np.empty(0)
for i in range(np.size(counts)):
    for j in range(int(counts[i])):
        data = np.append(data, [time_bins[i]])
I get that the resolution of the raw data will be the smallest time bin but that is fine for my purposes.
In the end, this is to be able to produce another histogram with logarithmic bins, which I am able to do with the raw data.
EDIT
The code I'm using to load the CSV is
x = np.loadtxt(fname, delimiter=',', skiprows=1).T
a = x[0]
b = x[1]
data = np.empty(0)
for i in range(np.size(b)):
    for j in range(int(b[i])):
        data = np.append(data, [a[i]])
You can do this with a list comprehension and NumPy's concatenate:
import numpy as np
hist = np.array([[1., 2., 3., 4., 5., 6., 7., 8., 9., 10.], [0, 17, 3, 34, 35, 100, 101, 107, 12, 1]])
new_array = np.concatenate([[hist[0][i]]*int(hist[1][i]) for i in range(len(hist[0]))])
Especially if there's a lot of data, copying the array each iteration (which is what append does -- NumPy arrays can't be resized in place) will be costly. Try allocating first (i.e. data = np.zeros(int(np.sum(counts)))) and then just assigning to it.
I'm also not sure what your innermost for loop is doing, since each iteration appends the same thing?
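A fully vectorized alternative (not mentioned above, but a standard NumPy idiom) is np.repeat, which repeats each bin value by its count in one call and avoids the Python loops entirely:

import numpy as np

hist = np.array([[1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
                 [0, 17, 3, 34, 35, 100, 101, 107, 12, 1]])
time_bins, counts = hist[0], hist[1]

# repeat each bin value by its count; the repeats must be integers
data = np.repeat(time_bins, counts.astype(int))
print(data.size)  # 410, i.e. counts.sum()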

What does -1 mean in pytorch view?

As the question says, what does -1 do in pytorch view?
>>> a = torch.arange(1, 17)
>>> a
tensor([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.,
11., 12., 13., 14., 15., 16.])
>>> a.view(1,-1)
tensor([[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.,
11., 12., 13., 14., 15., 16.]])
>>> a.view(-1,1)
tensor([[ 1.],
[ 2.],
[ 3.],
[ 4.],
[ 5.],
[ 6.],
[ 7.],
[ 8.],
[ 9.],
[ 10.],
[ 11.],
[ 12.],
[ 13.],
[ 14.],
[ 15.],
[ 16.]])
Does -1 generate an additional dimension?
Does it behave the same as numpy reshape -1?
Yes, it does behave like -1 in numpy.reshape(), i.e. the actual value for this dimension will be inferred so that the number of elements in the view matches the original number of elements.
For instance:
import torch
x = torch.arange(6)
print(x.view(3, -1)) # inferred size will be 2 as 6 / 3 = 2
# tensor([[ 0., 1.],
# [ 2., 3.],
# [ 4., 5.]])
print(x.view(-1, 6)) # inferred size will be 1 as 6 / 6 = 1
# tensor([[ 0., 1., 2., 3., 4., 5.]])
print(x.view(1, -1, 2)) # inferred size will be 3 as 6 / (1 * 2) = 3
# tensor([[[ 0., 1.],
# [ 2., 3.],
# [ 4., 5.]]])
# print(x.view(-1, 5)) # throw error as there's no int N so that 5 * N = 6
# RuntimeError: invalid argument 2: size '[-1 x 5]' is invalid for input with 6 elements
print(x.view(-1, -1, 3)) # throw error as only one dimension can be inferred
# RuntimeError: invalid argument 1: only one dimension can be inferred
I love the answer that Benjamin gives https://stackoverflow.com/a/50793899/1601580
Yes, it does behave like -1 in numpy.reshape(), i.e. the actual value for this dimension will be inferred so that the number of elements in the view matches the original number of elements.
but I think the weird edge case that might not be intuitive for you (it wasn't for me, at least) is when calling it with a single -1, i.e. tensor.view(-1).
My guess is that it works exactly the same way as always, except that since you are giving a single number to view, it assumes you want a single dimension. If you had tensor.view(-1, Dnew), it would produce a tensor of two dimensions/indices, but it would make sure the first dimension was of the correct size according to the original dimensions of the tensor. Say you had (D1, D2); if Dnew = D1*D2, then the new first dimension would be 1.
For real examples with code you can run:
import torch
x = torch.randn(1, 5)
x = x.view(-1)
print(x.size())
x = torch.randn(2, 4)
x = x.view(-1, 8)
print(x.size())
x = torch.randn(2, 4)
x = x.view(-1)
print(x.size())
x = torch.randn(2, 4, 3)
x = x.view(-1)
print(x.size())
output:
torch.Size([5])
torch.Size([1, 8])
torch.Size([8])
torch.Size([24])
History/Context
I feel a good example (a common case early on in PyTorch, before the flatten layer was officially added) is this common code:
class Flatten(nn.Module):
    def forward(self, input):
        # input.size(0) usually denotes the batch size so we want to keep that
        return input.view(input.size(0), -1)
for use with nn.Sequential. Seen this way, x.view(-1) is a weird flatten layer, just missing the squeeze (i.e. a dimension of 1). Adding or removing this squeeze is usually important for the code to actually run.
Example 2
If you are wondering what x.view(-1) does, it flattens the tensor. Why? Because it has to construct a view with only one dimension and infer its size -- so it flattens. In addition, this operation seems to avoid the very nasty bugs that .resize() brings, since the order of the elements appears to be respected. FYI, PyTorch now has a dedicated op for flattening: https://pytorch.org/docs/stable/generated/torch.flatten.html
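A quick sketch of that built-in (torch.flatten exists in current PyTorch; the start_dim usage mirrors the Flatten module above by keeping the batch dimension):

import torch

x = torch.arange(24).reshape(2, 3, 4)
print(torch.flatten(x).shape)        # torch.Size([24]) -- same as x.view(-1)
print(x.flatten(start_dim=1).shape)  # torch.Size([2, 12]) -- keeps the batch dim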
#%%
"""
Summary: view(-1, ...) keeps the remaining dimensions as give and infers the -1 location such that it respects the
original view of the tensor. If it's only .view(-1) then it only has 1 dimension given all the previous ones so it ends
up flattening the tensor.
ref: my answer https://stackoverflow.com/a/66500823/1601580
"""
import torch
x = torch.arange(6)
print(x)
x = x.reshape(3, 2)
print(x)
print(x.view(-1))
output
tensor([0, 1, 2, 3, 4, 5])
tensor([[0, 1],
[2, 3],
[4, 5]])
tensor([0, 1, 2, 3, 4, 5])
See, the original tensor is returned!
I guess this works similarly to np.reshape:
The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.
If you have a = torch.arange(1, 19) (18 elements), you can view it in various ways, like a.view(-1, 6), a.view(-1, 9), a.view(3, -1), etc.
From the PyTorch documentation:
>>> x = torch.randn(4, 4)
>>> x.size()
torch.Size([4, 4])
>>> y = x.view(16)
>>> y.size()
torch.Size([16])
>>> z = x.view(-1, 8) # the size -1 is inferred from other dimensions
>>> z.size()
torch.Size([2, 8])
-1 is inferred to be 2, for instance, if you have
>>> a = torch.rand(4, 4)
>>> a.size()
torch.Size([4, 4])
>>> y = a.view(16)
>>> y.size()
torch.Size([16])
>>> z = a.view(-1, 8)  # -1 is inferred as 2, i.e. (2, 8)
>>> z.size()
torch.Size([2, 8])
-1 is a PyTorch alias for "infer this dimension given the others have all been specified" (i.e. the quotient of the original product by the new product). It is a convention taken from numpy.reshape().
Hence, for the 16-element tensor in the example, t.view(1, 16) would be equivalent to t.view(1, -1) or t.view(-1, 16).

Growing matrices columnwise in NumPy

In pure Python you can grow matrices column by column pretty easily:
data = []
for i in something:
    newColumn = getColumnDataAsList(i)
    data.append(newColumn)
NumPy's array doesn't have an append method. The hstack function doesn't work on zero-sized arrays, so the following won't work:
data = numpy.array([])
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    data = numpy.hstack((data, newColumn))  # ValueError: arrays must have same number of dimensions
So, my options are either to move the initialization inside the loop, with an appropriate condition:
data = None
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    if data is None:
        data = newColumn
    else:
        data = numpy.hstack((data, newColumn))  # works
... or to use a Python list and convert it later to an array:
data = []
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    data.append(newColumn)
data = numpy.array(data)
Both variants seem a little bit awkward to me. Are there nicer solutions?
NumPy actually does have an append function, which it seems might do what you want, e.g.,
import numpy as NP

my_data = NP.random.randint(0, 10, 9).reshape(3, 3)
new_col = NP.array((5, 5, 5)).reshape(3, 1)
res = NP.append(my_data, new_col, axis=1)
Your second snippet (hstack) will work if you add another line, e.g.,

my_data = NP.random.randint(0, 10, 16).reshape(4, 4)
# the line to add -- does not depend on array dimensions
new_col = NP.zeros_like(my_data[:, -1]).reshape(-1, 1)
res = NP.hstack((my_data, new_col))
hstack gives the same result as concatenate((my_data, new_col), axis=1); I'm not sure how they compare performance-wise.
While that's the most direct answer to your question, I should mention that looping through a data source to populate a target via append, while just fine in Python, is not idiomatic NumPy. Here's why:
Initializing a NumPy array is relatively expensive, and with this conventional Python pattern, you incur that cost, more or less, at each loop iteration (i.e., each append to a NumPy array is roughly like initializing a new array with a different size).
For that reason, the common pattern in NumPy for iterative addition of columns to a 2D array is to initialize an empty target array once (or pre-allocate a single 2D NumPy array having all of the empty columns), then successively populate those empty columns by setting the desired column-wise offset (index) -- much easier to show than to explain:
>>> # initialize your skeleton array using 'empty' for lowest-memory footprint
>>> M = NP.empty(shape=(10, 5), dtype=float)
>>> # create a small function to mimic step-wise populating this empty 2D array:
>>> fnx = lambda v: NP.random.randint(0, 10, v)

Populate the NumPy array as in the OP, except each iteration just re-sets the values of M at successive column-wise offsets:

>>> for index, itm in enumerate(range(5)):
...     M[:, index] = fnx(10)
>>> M
array([[ 1., 7., 0., 8., 7.],
[ 9., 0., 6., 9., 4.],
[ 2., 3., 6., 3., 4.],
[ 3., 4., 1., 0., 5.],
[ 2., 3., 5., 3., 0.],
[ 4., 6., 5., 6., 2.],
[ 0., 6., 1., 6., 8.],
[ 3., 8., 0., 8., 0.],
[ 5., 2., 5., 0., 1.],
[ 0., 6., 5., 9., 1.]])
Of course, if you don't know in advance what size your array should be, just create one much bigger than you need and trim the 'unused' portions when you finish populating it:
>>> M[:3,:3]
array([[ 9., 3., 1.],
[ 9., 6., 8.],
[ 9., 7., 5.]])
Usually you don't keep resizing a NumPy array when you create it. What don't you like about your third solution? If it's a very large matrix/array, then it might be worth allocating the array before you start assigning its values:
x = len(something)
y = getColumnDataAsNumpyArray.someLengthProperty
data = numpy.zeros((x, y))
for i in something:
    data[i] = getColumnDataAsNumpyArray(i)
hstack can work on zero-sized arrays:

import numpy as np

N = 5
M = 15
a = np.ndarray(shape=(N, 0))
for i in range(M):
    b = np.random.rand(N, 1)
    a = np.hstack((a, b))
Generally it is expensive to keep reallocating the NumPy array, so your third solution is really the best performance-wise.
However, I think hstack will do what you want -- the clue is in the error message:

ValueError: arrays must have same number of dimensions

I'm guessing that newColumn has two dimensions (rather than being a 1D vector), so you need data to also have two dimensions, for example data = np.array([[]]). Alternatively, make newColumn a 1D vector (generally, if things are 1D it is better to keep them 1D in NumPy, so broadcasting, etc. work better); in that case use np.squeeze(newColumn), and hstack or vstack should work with your original definition of data.
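A minimal sketch of that suggestion; the shapes and the get_column helper are assumptions, since the question never shows what getColumnDataAsNumpyArray returns:

import numpy as np

def get_column(i):  # hypothetical stand-in for getColumnDataAsNumpyArray
    return np.full((3, 1), float(i))  # a 2-D column vector

data = np.empty((3, 0))  # 2-D with zero columns, so hstack dimensions match
for i in range(4):
    data = np.hstack((data, get_column(i)))
print(data.shape)  # (3, 4)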
