I have multiple 2D arrays saved in a list called image_concat. This list will be composed of over a hundred of these arrays, but for now I'm just trying to make my code run for a list with only two of them. These arrays all have different shapes, and I would like to find the largest x-dimension and largest y-dimension out of all the arrays, and then pad all the other ones with enough zeros around the edges so that in the end, they all have the same shape. Note that the largest x-dimension and largest y-dimension might belong to separate arrays, or they might belong to the same one. What I have tried writing so far is not successfully changing the shape of the smaller array for some reason. But I also think that some issues will arise even after changing the shapes, since some arrays might be off by one in the end due to elements in the shape being even or odd.
import astropy
import numpy as np
import math
import matplotlib.pyplot as plt
from astropy.utils.data import download_file
from astropy.io import fits
images = ['http://irsa.ipac.caltech.edu/ibe/data/wise/allsky/4band_p1bm_frm/9a/02729a/148/02729a148-w2-int-1b.fits?center=89.353536,37.643864deg&size=0.6deg', 'http://irsa.ipac.caltech.edu/ibe/data/wise/allsky/4band_p1bm_frm/2a/03652a/123/03652a123-w4-int-1b.fits?center=294.772333,-19.747157deg&size=0.6deg']
image_list = []
for url in images:
    image_list.append(download_file(url, cache=True))
image_concat = [fits.getdata(image) for image in image_list]
# See shapes in the beginning
print(np.shape(image_concat[0]))
print(np.shape(image_concat[1]))
def pad(image_concat):
    # Identify largest x and y dimensions
    xdims, ydims = np.zeros(len(image_concat)), np.zeros(len(image_concat))
    for i in range(len(xdims)):
        x, y = np.shape(image_concat[i])
        xdims[i] = x
        ydims[i] = y
    x_max = int(np.max(xdims))
    y_max = int(np.max(ydims))
    # Pad all arrays except the largest dimensions
    for A in image_concat:
        x_len, y_len = np.shape(A)
        print(math.ceil((y_max-y_len)/2))
        print(math.ceil((x_max-x_len)/2))
        np.pad(A, ((math.ceil((y_max-y_len)/2), math.ceil((y_max-y_len)/2)), (math.ceil((x_max-x_len)/2), math.ceil((x_max-x_len)/2))), 'constant', constant_values=0)
    return image_concat
image_concat = pad(image_concat)
# See shapes afterwards (they haven't changed for some reason)
print(np.shape(image_concat[0]))
print(np.shape(image_concat[1]))
I can't understand why the shape isn't changing for this case. And also, is there a way to easily generalize this so that it will work on many arrays regardless of if they have even or odd dimensions?
np.pad doesn't modify the array in place; it returns a new, padded array. So you'd need to assign the result back, e.g. image_concat[i] = np.pad(...), where i is the index of A.
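For the even/odd part of the question, here is a minimal sketch rather than a definitive implementation. The function name pad_to_common_shape and the stand-in demo arrays are made up for illustration; the idea is to put the floor of the difference on the leading edge and the remainder on the trailing edge, so an odd difference never overshoots the target shape. It also pairs the row difference with axis 0 and the column difference with axis 1.

```python
import numpy as np

def pad_to_common_shape(arrays):
    """Zero-pad each 2D array so that all share the largest height and width."""
    max_rows = max(a.shape[0] for a in arrays)
    max_cols = max(a.shape[1] for a in arrays)
    padded = []
    for a in arrays:
        extra_rows = max_rows - a.shape[0]
        extra_cols = max_cols - a.shape[1]
        # Floor on the leading edge, remainder on the trailing edge,
        # so an odd difference never overshoots the target shape.
        top, left = extra_rows // 2, extra_cols // 2
        pad_width = ((top, extra_rows - top), (left, extra_cols - left))
        # np.pad returns a new array, so the result has to be kept.
        padded.append(np.pad(a, pad_width, mode="constant", constant_values=0))
    return padded

# Stand-in arrays (assumption) instead of the downloaded FITS images
demo = [np.ones((3, 6)), np.ones((4, 5))]
result = pad_to_common_shape(demo)
print([a.shape for a in result])
```

Here the largest height and the largest width come from different arrays, and both results should end up with the same shape.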
Related
I'm having some trouble trying to check if a python tuple is in a one dimensional numpy array. I'm working on a loop that will record all the colors present in an image and store them into an array. It worked well using normal lists, but the image is very large and I think NumPy Arrays will speed up the loop as it took several minutes to complete the loop.
Here's what the code looks like:
from PIL import Image
import numpy as np
img = Image.open("bg.jpg").convert("RGB")
pixels = img.load()
colors = np.array([])
for h in range(img.size[1]):
    for w in range(img.size[0]):
        if pixels[w,h] not in colors:
            colors = np.append(colors, pixels[w,h])
        else:
            continue
When I run this, I get the following error:
DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
if pixels[w,h] in colors:
Thanks in advance, and if you know a faster way to do this please let me know.
I'm not sure exactly what you need, but I hope the following piece of code will help you.
import numpy as np
image = np.arange(75).reshape(5, 5, 3) % 8
# Get the set of unique pixels
pixel_list = image.reshape(-1, 3)
unique_pixels = np.unique(pixel_list, axis = 0)
# Test whether a pixel is in the list of pixels:
i = 0
pixel_in_list = (unique_pixels[i] == pixel_list).all(1).any(0)
# all(1) - all the dimensions (rgb) of the pixels need to match
# any(0) - test if any of the pixels match
# Test whether any of the pixels in the set is in the list of pixels:
compare_all = unique_pixels.reshape(-1, 1, 3) == pixel_list.reshape(1, -1, 3)
pixels_in_list = compare_all.all(2).any()
# all(2) - all the dimensions (rgb) of the pixels need to match
# any() - test if any of the pixels in the set matches any of the pixels in the list
I found a way to speed up my loop without NumPy: using a set, which is much faster here than using lists or NumPy arrays. This is what the code looks like now:
from PIL import Image
img = Image.open("bg.jpg").convert("RGB")
pixels = img.load()
colors = set({})
for h in range(img.size[1]):
    for w in range(img.size[0]):
        if pixels[w,h] in colors:
            continue
        else:
            colors.add(pixels[w,h])
This solves my initial problem of the lists being too slow to loop through, and it sidesteps the second problem of NumPy being unable to compare the tuples. Thanks for all the replies, have a good day.
Assuming pixels is of shape (3, w, h) or (3, h, w) (i.e., the color channels are along the first axis), and assuming all you're after are the unique colors in the image:
channels = (channel.flatten() for channel in pixels)
colors = set(zip(*channels))
If you want a list instead of a set, colors = list(set(zip(*channels))).
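To make the channels-first assumption concrete, here is a small sketch on a synthetic array (the 2x2 "image" is invented for illustration, not taken from bg.jpg). It also shows an np.unique variant that should give the same colors as rows of an array.

```python
import numpy as np

# Synthetic stand-in for an image, channels first: shape (3, h, w)
pixels = np.array([
    [[255, 0], [255, 0]],   # red channel
    [[0, 0], [0, 0]],       # green channel
    [[0, 255], [0, 255]],   # blue channel
], dtype=np.uint8)

channels = (channel.flatten() for channel in pixels)
colors = set(zip(*channels))  # one (r, g, b) tuple of NumPy scalars per pixel

# Same information as an array of unique rows, staying inside NumPy
unique_rows = np.unique(pixels.reshape(3, -1).T, axis=0)

print(len(colors), unique_rows.shape)
```

If the array is laid out as (h, w, 3) instead, reshaping with reshape(-1, 3) replaces the reshape(3, -1).T step.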
You seem to be misunderstanding where numpy comes in handy. A numpy array of tuples is not going to be any faster than a Python list of tuples. The speed of numpy comes into play in numerical computation on matrices and vectors. A numpy array of tuples cannot take advantage of any of the things that make numpy so fast.
What you're trying to do is simply not appropriate for numpy, and won't help speed up your code at all.
I have a 3D numpy array that is a stack of 2D (m,n) images at certain timestamps, t. So my array is of shape (t, m, n). I want to plot the value of one of the pixels as a function of time.
e.g.:
import numpy as np
import matplotlib.pyplot as plt
data_cube = []
for i in xrange(10):
    a = np.random(100,100)
    data_cube.append(a)
So my (t, m, n) data now has shape (10,100,100). Say I wanted a 1D plot of the value at index [12][12] at each of the 10 steps. I would do:
plt.plot(data_cube[:][12][12])
plt.show()
But I'm getting index out of range errors. I thought I might have my indices mixed up, but every plot I generate seems to be in the 'wrong' axis, i.e. across one of the 2D arrays, but instead I want it 'through' the vertical stack. Thanks in advance!
Here is the solution: since you are already using numpy, convert your final list to an array and just use slicing. The problem in your case was two-fold:
First: Your final data_cube was not an array. For a list, you would have to iterate over the values.
Second: The slicing was incorrect.
import numpy as np
import matplotlib.pyplot as plt
data_cube = []
for i in range(10):
    a = np.random.rand(100,100)
    data_cube.append(a)
data_cube = np.array(data_cube) # Added this step
plt.plot(data_cube[:,12,12]) # Modified the slicing
A less verbose version that avoids iteration:
data_cube = np.random.rand(10, 100,100)
plt.plot(data_cube[:,12,12])
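A short sketch of why the chained indexing goes wrong, in case it helps (the names frames, cube, pixel_series and row are just for illustration). On a list, [:] only makes a shallow copy, so [:][12] asks for a 13th frame that does not exist. On an array, chained indexing runs without error but still picks out something different from the comma-separated slice.

```python
import numpy as np

frames = [np.random.rand(100, 100) for _ in range(10)]  # list of 10 frames

# On a list, [:] is just a shallow copy, so [:][12] asks for frame 12,
# which does not exist in a list of length 10.
try:
    frames[:][12][12]
except IndexError as err:
    print("list indexing failed:", err)

cube = np.stack(frames)          # shape (10, 100, 100)
pixel_series = cube[:, 12, 12]   # one value per time step, shape (10,)

# Even on an array, chained indexing is not the same thing:
# cube[:][5][5] is row 5 of frame 5, shape (100,)
row = cube[:][5][5]

print(pixel_series.shape, row.shape)
```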
Note: I'm using numpy
import numpy as np
Given 4 arrays of the same (but arbitrary) shape, I am trying to write a function that forms 2x2 matrices from each corresponding element of the arrays, finds the eigenvalues, and returns two arrays of the same shape as the original four, with its elements being eigenvalues (i.e. the resulting arrays would have the same shape as the input, with array1 holding all the first eigenvalues and array2 holding all the second eigenvalues).
I tried doing the following, but unsurprisingly, it gives me an error that says the array is not square.
temp = np.linalg.eig([[m1, m2],[m3, m4]])[0]
I suppose I can make an empty temp variable in the same shape,
temp = np.zeros_like(m1)
and go over each element of the original arrays and repeat the process. My problem is that I want this generalised for arrays of any arbitrary shape (need not be one dimensional). I would guess that finding the shape of the arrays and designing loops to go over each element would not be a very good way of doing it. How do I do this efficiently?
Construct a 2x2x... array:
temp = np.array([[m1, m2], [m3, m4]])
Move the first two dimensions to the end for a ...x2x2 array:
for _ in range(2):
    temp = np.rollaxis(temp, 0, temp.ndim)
Call np.linalg.eigvals (which broadcasts) for a ...x2 array of eigenvalues:
eigvals = np.linalg.eigvals(temp)
And split this into an array of first eigenvalues and an array of second eigenvalues:
eigvals1, eigvals2 = eigvals[..., 0], eigvals[..., 1]
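Putting the steps above together, a possible sketch looks like this. The function name eigvals_2x2 and the test inputs are made up for illustration, and np.moveaxis is swapped in for the two np.rollaxis calls; it should produce the same ...x2x2 layout in one step.

```python
import numpy as np

def eigvals_2x2(m1, m2, m3, m4):
    """Eigenvalues of [[m1, m2], [m3, m4]] for every element position."""
    stacked = np.array([[m1, m2], [m3, m4]])            # shape (2, 2, ...)
    matrices = np.moveaxis(stacked, [0, 1], [-2, -1])   # shape (..., 2, 2)
    eigvals = np.linalg.eigvals(matrices)               # shape (..., 2)
    return eigvals[..., 0], eigvals[..., 1]

# Illustrative input: every element position holds the matrix [[1, 0], [0, 2]]
shape = (4, 5)
ev1, ev2 = eigvals_2x2(np.ones(shape), np.zeros(shape),
                       np.zeros(shape), 2 * np.ones(shape))
print(ev1.shape, ev2.shape)
```

One caveat: np.linalg.eigvals does not promise any particular ordering, so "first" and "second" eigenvalue are only labels here. If a consistent ordering matters, sorting along the last axis before splitting is one option for real eigenvalues.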
I have a list of matrices. Some of them have a bigger height (.shape[0]) than the others, and I want to make them all equal in height. So I want to find the height of the biggest matrix and pad the rest of the matrices with the difference, so that the content of each matrix stays in the middle (if the difference is not even, add one row more to the bottom than to the top). This is my code so far:
def equalize_heights(matrices,maxHeight):
    newMatrices = []
    matricesNum = len(matrices)
    for i in xrange(matricesNum):
        matrixHeight = matrices[i].shape[0]
        if (matrixHeight == maxHeight):
            newMatrices.append(matrices[i])
        else:
            addToTop = (maxHeight-matrixHeight)/2
            addToBottom = (maxHeight-matrixHeight)/2 +((maxHeight-matrixHeight)%2)
Now, matrices that are not as high as the biggest one should have 'addToTop' rows (filled with 0's) added to the top of the matrix and 'addToBottom' rows added to the bottom.
I think I'm supposed to use the numpy.pad function, but I don't understand how exactly.
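Keep in mind that np.pad with a single pad width pads in every dimension, not just the height; to pad only the height you would have to pass a separate (before, after) pair for each axis. Consider using np.concatenate instead. Also note that you don't need to pass the maximum height - your function can just calculate that itself.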
E.g.:
import numpy as np
matrices = [np.array([[1,2], [1,2]]), np.array([[1,2], [1,2], [1,2]])]
def equalize_heights(matrices):
    max_height = matrices[0].shape[0]
    for matrix in matrices[1:]:
        max_height = max(max_height, matrix.shape[0])
    for idx, matrix in enumerate(matrices):
        matrices[idx] = np.concatenate((
            matrix,
            np.zeros((max_height - matrix.shape[0], matrix.shape[1]))
        ))
Note that this will not center your matrices the way you wanted, but that shouldn't be too difficult to add: put three arrays in the tuple to concatenate (zeros, matrix, zeros) rather than two.
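For the centered version, here is one possible sketch. It uses np.pad rather than np.concatenate, relying on the fact that np.pad accepts one (before, after) pair per axis, so the columns can be left alone with (0, 0). The function name equalize_heights_centered and the third test matrix are made up for illustration.

```python
import numpy as np

def equalize_heights_centered(matrices):
    """Zero-pad rows so every matrix has the height of the tallest one."""
    max_height = max(m.shape[0] for m in matrices)
    result = []
    for m in matrices:
        diff = max_height - m.shape[0]
        top = diff // 2
        bottom = diff - top   # gets the extra row when diff is odd
        # ((top, bottom), (0, 0)) pads rows only and leaves columns alone
        result.append(np.pad(m, ((top, bottom), (0, 0)), mode="constant"))
    return result

matrices = [
    np.array([[1, 2], [1, 2]]),
    np.array([[1, 2], [1, 2], [1, 2]]),
    np.array([[7, 8]]),
]
equalized = equalize_heights_centered(matrices)
print([m.shape for m in equalized])
```

Unlike the np.concatenate version, this returns a new list instead of modifying the input list, and it keeps the integer dtype of the inputs.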
I have an array of values and would like to create a matrix from that, where each row is my starting point vector multiplied by a sample from a (normal) distribution.
The number of rows of this matrix will then vary in dependence from the number of samples I want.
%pylab
my_vec = array([1,2,3])
my_rand_vec = my_vec*randn(100)
Last command does not work, because array shapes do not match.
I could think of using a for loop, but I am trying to leverage on array operations.
Try this
my_rand_vec = my_vec[None,:]*randn(100)[:,None]
For small numbers I get for example
import numpy as np
my_vec = np.array([1,2,3])
my_rand_vec = my_vec[None,:]*np.random.randn(5)[:,None]
my_rand_vec
# array([[ 0.45422416, 0.90844831, 1.36267247],
# [-0.80639766, -1.61279531, -2.41919297],
# [ 0.34203295, 0.6840659 , 1.02609885],
# [-0.55246431, -1.10492863, -1.65739294],
# [-0.83023829, -1.66047658, -2.49071486]])
Your solution my_vec*randn(100) does not work because * corresponds to element-wise multiplication, which only works if both arrays have identical or broadcast-compatible shapes; (3,) and (100,) are neither.
What you have to do is add an additional dimension using [None,:] and [:,None] so that numpy's broadcasting works: a (1, 3) array times a (100, 1) array gives a (100, 3) result.
As a side note, I would recommend not using pylab. Instead, import modules explicitly with import ... as ..., e.g. import numpy as np.
It is the outer product of vectors:
my_rand_vec = numpy.outer(randn(100), my_vec)
You can pass the dimensions of the array you require to numpy.random.randn:
my_rand_vec = my_vec*np.random.randn(100,3)
To multiply each vector by the same random number, you need to add an extra axis:
my_rand_vec = my_vec*np.random.randn(100)[:,np.newaxis]
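To compare the suggestions side by side, here is a small sketch. It uses np.random.default_rng with a fixed seed only so the comparison is repeatable, which is my choice and not something from the answers above; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make the sketch repeatable
my_vec = np.array([1, 2, 3])
samples = rng.standard_normal(100)              # shape (100,)

via_newaxis = my_vec * samples[:, np.newaxis]   # (3,) * (100, 1) -> (100, 3)
via_outer = np.outer(samples, my_vec)           # same values, shape (100, 3)

# Different behaviour: one independent sample per element,
# so rows are no longer scalar multiples of my_vec
independent = my_vec * rng.standard_normal((100, 3))

print(via_newaxis.shape, np.allclose(via_newaxis, via_outer))
```

The first two forms scale the whole vector by one sample per row, which is what the question describes; the (100, 3) form draws a separate sample for every entry.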