Custom nditer Behavior - python

I'm working with some image data of shape (x, y, 3). Is there a way to use a numpy construct like nditer to iterate over the (r, g, b) tuples corresponding to pixels? Out of the box, nditer iterates over the scalar values, but many numpy functions have something like an axis= argument to change behavior like this.

Check np.ndindex. Its doc may be enough to get you started. Better yet, read its code.
I have answered similar questions by essentially reverse engineering ndindex. If there isn't a link in the Related sidebar, I'll do a quick search.
shallow iteration with nditer
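A minimal sketch of what the ndindex suggestion looks like in practice (the array here is just a stand-in for the (x, y, 3) image): iterate over the index grid of the first two axes and index in, or simply reshape to a flat list of pixels.

import numpy as np

img = np.random.rand(4, 5, 3)  # stand-in image of shape (x, y, 3)

# np.ndindex walks the index grid of the first two axes,
# so each step yields one (r, g, b) pixel.
for i, j in np.ndindex(img.shape[:2]):
    r, g, b = img[i, j]

# If the pixel positions don't matter, a plain reshape gives the pixels directly.
for r, g, b in img.reshape(-1, 3):
    pass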

Related

Resizing list from a 2D array

I came across a problem when trying to do the resize as follows:
I have a 2D array after processing data, and now I want to resize it, dropping the first 5 items of each row.
What I am doing right now is:
edit: this approach works fine as long as you make sure you are working with a list, not a string. It failed on my side because I hadn't done the conversion from string to list properly,
so it ended up eliminating the first five characters of the entire string.
2dArray=[[array1],[array2]]
new_Resize_2dArray= [array[5:] for array in 2dArray]
However, it does not seem to work; it just copies all the elements over to new_Resize_2dArray.
I would like to ask for help to see what I did wrong, or whether there is any scientific calculation library I could use to achieve this.
First, because Python list indexing is zero-based, array[5:] is exactly the slice that drops the first 5 columns, so your line new_Resize_2dArray = [array[5:] for array in 2dArray] should do what you want. Otherwise, I see no issue with your single line of code.
As for scientific computing libraries, numpy is a highly prevalent 3rd party package with a high-performance multidimensional array type ndarray. Using ndarrays, your code could be shortened to new_Resize_2dArray = 2dArray[:,5:]
Aside: it would help to include a bit more of your code, or a minimal example where you are getting the unexpected result (e.g., use a fake/stand-in 2D array to see if it works as expected or still fails).
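A small self-contained check along those lines (the data here is made up, and the array is renamed since 2dArray is not a legal Python identifier):

import numpy as np

# Stand-in data: each row must be a list, not a string,
# otherwise the slice drops characters instead of items.
data = [list(range(10)), list(range(10, 20))]

# Pure-Python version: keep everything after the first 5 items of each row.
trimmed = [row[5:] for row in data]

# NumPy version: slice every row at once along the second axis.
arr = np.array(data)
trimmed_np = arr[:, 5:]
print(trimmed_np.shape)  # (2, 5)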

numpy: why does np.append() flatten my array?

I'm trying to replicate this function in numpy, but for some reason it keeps doing this, or flattening the array and generally not behaving like I'd expect, or returning an error.
The docstring is very clear. It explains at least three times that:
If axis is None, out is a flattened array.
This is the only reasonable thing to do. If the inputs are multidimensional, but you don't specify which axis to operate on, how can the code determine the "right" axis? For example, what if the input is a square, 2D array? In that case, both axes are equally valid.
There are too many ways for code that tries to be smart about appending to fail, or worse, to succeed but with the wrong results. Instead, the authors decided that flattening is a reasonable default choice, and made that choice explicit in the documentation.
Also note that there is no way to replicate the behavior at the top of your post in NumPy. By definition, ndarrays are rectangular, but the list you have here is "ragged". You cannot have an ndarray where each row or column is a different size.
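To make the documented behavior concrete, here is a small illustration (not from the original post):

import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(6, 12).reshape(2, 3)

# No axis given: both inputs are flattened and a 1-D array comes back.
print(np.append(a, b).shape)          # (12,)

# Explicit axis: shapes must be compatible, and the result stays 2-D.
print(np.append(a, b, axis=0).shape)  # (4, 3)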

Selecting part of Numpy array

I'm kind of a newbie in Python, and I read some code written by someone experienced. This part is supposed to take a part of a NumPy array:
a=np.random.random((10000,32,32,3)) # random values as an example
mask=list(range(5000))
a=a[mask]
To me it looks rather wasteful to create another list just to get part of the array. Moreover, the resulting array is really just the first 5000 rows; no complex selection is required.
As far as I know, the following code should give the same result:
a=a[:5000]
What is the advantage of the first example? Is it faster? Or did I miss something?
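One concrete difference worth checking (a small sketch, not from the original post): indexing with a list is fancy indexing and produces a copy, while a basic slice returns a view of the same memory, which np.shares_memory can confirm.

import numpy as np

a = np.random.random((10000, 32, 32, 3))

fancy = a[list(range(5000))]  # fancy indexing: copies the selected rows
sliced = a[:5000]             # basic slicing: a view onto the same buffer

print(np.shares_memory(a, fancy))   # False
print(np.shares_memory(a, sliced))  # True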

Is there a way to efficiently vectorize Tensorflow ops on images?

TensorFlow has a great many transformations that can be applied to 3D tensors representing images ([height, width, depth]), like tf.image.rot90() or tf.image.random_flip_left_right(), for example.
I know that they are meant to be used with queues hence the fact that they operate on only one image.
But would there be a way to vectorize these ops, i.e. to transform a 4D tensor ([batch_size, height, width, depth]) into a tensor of the same size with the op applied image-wise along the first dimension, without explicitly looping over it with tf.while_loop()?
(EDIT: Regarding rot90(), a clever hack taken from numpy's rot90 would be to do:
rot90 = tf.reverse(x, tf.convert_to_tensor((False, False, True, False)))
rot90 = tf.transpose(rot90, [0, 2, 1, 3])
EDIT 2: It turns out this question has already been answered quite a few times (one example); it seems map_fn is the way to go if you want an optimized version. I had already seen it but had forgotten. I guess this makes this question a duplicate...
However, for random ops or more complex ops, it would still be nice to have a generic way to vectorize existing functions...)
Try tf.map_fn.
processed_images = tf.map_fn(process_fn, images)
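For completeness, a self-contained sketch of the map_fn approach (written against current TensorFlow; the image batch and the choice of per-image op are made up):

import tensorflow as tf

# Hypothetical batch of images: [batch_size, height, width, depth].
images = tf.random.uniform((8, 32, 32, 3))

def process_fn(img):
    # Stand-in for any per-image op, e.g. a 90-degree rotation.
    return tf.image.rot90(img)

# map_fn applies process_fn to each image along the first dimension.
processed_images = tf.map_fn(process_fn, images)
print(processed_images.shape)  # (8, 32, 32, 3)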

Questions regarding numpy in Python

I wrote a program using normal Python, and I now think it would be a lot better to use numpy instead of standard lists. The problem is there are a number of things where I'm confused about how to use numpy, or whether I can use it at all.
In general, how do np.arrays work? Are they dynamic in size like a C++ vector, or do I have to declare their length and type beforehand like a standard C++ array? In my program I've got a lot of cases where I create a list
ex_list = [] and then cycle through something and append to it with ex_list.append(some_lst). Can I do something like that with a numpy array? What if I knew the size of ex_list, could I declare an empty one and then add to it?
If I can't, let's say I only ever read from this list, would it be worth converting it to numpy afterwards, i.e. is accessing a numpy array faster?
Can I do more complicated operations on each element using a numpy array (not just adding 5 to each, etc.)? Example below.
full_pallete = [(int(1+i*(255/127.5)),0,0) for i in range(0,128)]
full_pallete += [col for col in right_palette if col[1]!=0 or col[2]!=0 or col==(0,0,0)]
In other words, does it make sense to convert to a numpy array and then cycle through it using something other than for loop?
Numpy arrays can be appended to (see http://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html), although in general calling the append function many times in a loop carries a heavy performance cost; it is generally better to pre-allocate a large array and then fill it as necessary. This is because the arrays themselves have a fixed size under the hood, although this is hidden from you in Python.
Yes, Numpy is well designed for many operations similar to these. In general, however, you don't want to be looping through numpy arrays (or arrays in general in python) if they are very large. By using inbuilt numpy functions, you basically make use of all sorts of compiled speed up benefits. As an example, rather than looping through and checking each element for a condition, you would use numpy.where().
The real reason to use numpy is to benefit from pre-compiled mathematical functions and data processing utilities on large arrays - both those in the core numpy library as well as many other packages that use them.
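As a rough sketch of both points (the numbers here are arbitrary): pre-allocate and fill instead of appending in a loop, and use np.where instead of an element-wise check.

import numpy as np

n = 1000

# Pre-allocate once and fill it in, rather than calling np.append inside the loop.
values = np.empty(n)
for i in range(n):
    values[i] = i * 0.5  # stand-in for whatever each iteration computes

# Vectorized conditional with np.where instead of checking each element in a loop.
clipped = np.where(values > 100, 100, values)
print(clipped[:5], clipped[-5:])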
