string representation of a numpy array with commas separating its elements - python

I have a numpy array, for example:
points = np.array([[-468.927, -11.299, 76.271, -536.723],
[-429.379, -694.915, -214.689, 745.763],
[ 0., 0., 0., 0. ]])
if I print it or turn it into a string with str() I get:
print(points)
[[-468.927 -11.299 76.271 -536.723]
[-429.379 -694.915 -214.689 745.763]
[ 0. 0. 0. 0. ]]
I need to turn it into a string that prints with separating commas while keeping the 2D array structure, that is:
[[-468.927, -11.299, 76.271, -536.723],
[-429.379, -694.915, -214.689, 745.763],
[ 0., 0., 0., 0. ]]
Does anybody know an easy way of turning a numpy array to that form of string?
I know that .tolist() adds the commas but the result loses the 2D structure.

Try using repr
>>> import numpy as np
>>> points = np.array([[-468.927, -11.299, 76.271, -536.723],
... [-429.379, -694.915, -214.689, 745.763],
... [ 0., 0., 0., 0. ]])
>>> print(repr(points))
array([[-468.927, -11.299, 76.271, -536.723],
[-429.379, -694.915, -214.689, 745.763],
[ 0. , 0. , 0. , 0. ]])
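If you plan on using large numpy arrays, import sys and set np.set_printoptions(threshold=sys.maxsize) first. (Older NumPy versions accepted threshold=np.nan, but recent versions raise a ValueError for it.) Without this, the array representation is summarized with "..." once the array has more than 1000 elements (the default threshold).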
>>> arr = np.arange(1001)
>>> print(repr(arr))
array([ 0, 1, 2, ..., 998, 999, 1000])
Of course, if you have arrays that large, this starts to become less useful: you should probably analyze the data some way other than just looking at it, and there are better ways of persisting a numpy array than saving its repr to a file...
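As a rough sketch of the truncation behaviour described above, the threshold can also be lifted only temporarily with the np.printoptions context manager (available in NumPy 1.15 and later), which avoids changing the global print settings. The variable names here are just for illustration.

```python
import sys
import numpy as np

arr = np.arange(1001)

# Default print options: arrays with more than 1000 elements are summarized with "..."
truncated = repr(arr)

# Temporarily lift the threshold; the previous options are restored on exit
with np.printoptions(threshold=sys.maxsize):
    full = repr(arr)

print("..." in truncated)  # True
print("..." in full)       # False
```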

Now, in numpy 1.11, there is numpy.array2string:
In [279]: a = np.reshape(np.arange(25, dtype='int8'), (5, 5))
In [280]: print(np.array2string(a, separator=', '))
[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]]
Comparing with repr from @mgilson (shows "array()" and dtype):
In [281]: print(repr(a))
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]], dtype=int8)
P.S. You still need np.set_printoptions(threshold=sys.maxsize) for large arrays (older NumPy versions also accepted threshold=np.nan, which recent versions reject).
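For the array from the question, a minimal sketch could look like the following. It assumes the array holds plain finite numbers, in which case the resulting string is also a valid nested-list literal and can be parsed back with ast.literal_eval. The threshold argument of array2string applies to this call only.

```python
import ast
import sys
import numpy as np

points = np.array([[-468.927, -11.299, 76.271, -536.723],
                   [-429.379, -694.915, -214.689, 745.763],
                   [0., 0., 0., 0.]])

# threshold=sys.maxsize disables "..." summarization for this call only
text = np.array2string(points, separator=', ', threshold=sys.maxsize)
print(text)

# The string keeps one row per line and can be parsed back into nested lists
parsed = ast.literal_eval(text)
```

Values are printed with the current print precision (8 digits by default), so the round trip is only exact for values that fit in that many digits.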

The function you are looking for is np.set_string_function. Note that it was removed in NumPy 2.0, so this approach only works on older versions.
What this function does is let you override the default __str__ or __repr__ functions for numpy arrays. If you set the repr flag to True, the __repr__ function will be overridden with your custom function. Likewise, if you set repr=False, the __str__ function will be overridden. Since print calls the __str__ function of the object, we need to set repr=False.
For example:
np.set_string_function(lambda x: repr(x), repr=False)
x = np.arange(5)
print(x)
will print the output
array([0, 1, 2, 3, 4])
A more aesthetically pleasing version is
np.set_string_function(lambda x: repr(x).replace('(', '').replace(')', '').replace('array', '').replace(" ", ' ') , repr=False)
print(np.eye(3))
which gives
[[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]]
Hope this answers your question.

Another way to do it, which is particularly helpful when an object doesn't have a __repr__() method, is to employ Python's pprint module (which has various formatting options). Here is what that looks like, by example:
>>> import numpy as np
>>> import pprint
>>>
>>> A = np.zeros(10, dtype=np.int64)
>>>
>>> print(A)
[0 0 0 0 0 0 0 0 0 0]
>>>
>>> pprint.pprint(A)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Related

Copy and replace values in numpy array.. nans will be nan but other values will be zero

I have a numpy array which I want to copy to another numpy array and replace all values to zero except the nan values. Can you help please?
One way is to use numpy.where.
Data from @GrantWilliams.
Problem 1
import numpy as np
a = np.array([1, 2, np.nan, 4, 5, np.nan])
c = np.array([10, 11, 12, 13, 14, 15])
res1 = np.where(np.isnan(a), np.nan, 0)
array([ 0., 0., nan, 0., 0., nan])
Problem 2
res2 = np.where(np.isnan(a), c, 0)
array([ 0, 0, 12, 0, 0, 15])
If you want an array of zeros, I guess there is no need to copy the original array. Just make an array of zeros and set the appropriate indices to nan.
import numpy as np
a = np.array([1, 2, np.nan, 4, 5, np.nan])
b = np.zeros(a.shape)
b[np.isnan(a)] = np.nan
outputs
>>> b
[ 0. 0. nan 0. 0. nan]
edit: now that you've updated the problem you can use this instead:
import numpy as np
a = np.array([1, 2, np.nan, 4, 5, np.nan])
b = np.zeros(a.shape)
c = np.array([10, 11, 12, 13, 14, 15])
b[np.isnan(a)] = c[np.isnan(a)]
print(b)
outputs
>>> b
[ 0. 0. 12. 0. 0. 15.]
Feel free to change the dtype to int if that's what you're using as well. (This only applies to the second case: an int array cannot hold nan.)
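Both variants can be wrapped in one small helper. This is only a sketch: the name zeros_keep_nan and its fill parameter are made up for illustration and are not part of numpy.

```python
import numpy as np

def zeros_keep_nan(a, fill=None):
    """Return zeros shaped like `a`; NaN positions stay NaN, or take values from `fill`."""
    a = np.asarray(a, dtype=float)
    mask = np.isnan(a)
    if fill is None:
        # Problem 1: zeros everywhere except where `a` is NaN
        return np.where(mask, np.nan, 0.0)
    # Problem 2: zeros everywhere except NaN positions, which take the value from `fill`
    return np.where(mask, np.asarray(fill), 0)

a = np.array([1, 2, np.nan, 4, 5, np.nan])
c = np.array([10, 11, 12, 13, 14, 15])

res1 = zeros_keep_nan(a)
res2 = zeros_keep_nan(a, c)
print(res2.tolist())  # [0, 0, 12, 0, 0, 15]
```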

Creating an N-dimensional grid with Python

I'm trying to create a grid of coordinates for an algorithm that requires an understanding of distance. I know how to do this for a known number of dimensions, like so for 2D:
x = [0,1,2]
y = [10,11,12]
z = np.zeros((3,3,2))
for i,X in enumerate(x):
    for j,Y in enumerate(y):
        z[i][j][0] = X
        z[i][j][1] = Y
print(z)
--------------------------
array([[[ 0., 10.],
[ 0., 11.],
[ 0., 12.]],
[[ 1., 10.],
[ 1., 11.],
[ 1., 12.]],
[[ 2., 10.],
[ 2., 11.],
[ 2., 12.]]])
This works well enough. I end up with a shape of (3,3,2), where the last axis of length 2 holds the coordinates of that point. I'm trying to use this to create a probability surface, so each point needs its own "location" value. Is there a way to easily extend this to N dimensions? There I would need an unknown number of for loops. Due to project constraints I have access to Python built-ins and numpy, but that's more or less it.
I've tried np.meshgrid() but it results in an output shape of (2,3,3) and my attempts to reshape it never give me the coordinates in the correct order. Any ideas on how I could do this cleanly?
I can replicate your z with
In [223]: np.stack([np.tile([x],(1,3)).reshape(3,3).T,np.tile([y],(3,1))],2)
Out[223]:
array([[[ 0, 10],
[ 0, 11],
[ 0, 12]],
[[ 1, 10],
[ 1, 11],
[ 1, 12]],
[[ 2, 10],
[ 2, 11],
[ 2, 12]]])
The tile pieces look like
In [224]: np.tile([y],(3,1))
Out[224]:
array([[10, 11, 12],
[10, 11, 12],
[10, 11, 12]])
In [225]: np.tile([x],(1,3)).reshape(3,3).T
Out[225]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
I might be able to clean up the 2nd one. But the basic idea is to replicate the inputs in such a way that stack can combine them into the desired (n,n,2) array.
Once this is understood, it shouldn't be hard to extend things to 3d and up. But I haven't fully processed your intentions.
Possibly simpler (and repeat is faster than tile):
np.stack([np.repeat(x,3).reshape(3,3), np.repeat(y,3).reshape(3,3).T], 2)
With more dimensions the transpose might require refinement.
Same thing with meshgrid (it probably uses repeat or tile internally):
In [232]: np.stack(np.meshgrid(x,y, indexing='ij'),2)
Out[232]:
array([[[ 0, 10],
[ 0, 11],
[ 0, 12]],
[[ 1, 10],
[ 1, 11],
[ 1, 12]],
[[ 2, 10],
[ 2, 11],
[ 2, 12]]])
In higher dimensions:
In [237]: np.stack(np.meshgrid([1,2], [10,20,30], [100,200,300,400], indexing='ij'), 3).sum(axis=-1)
Out[237]:
array([[[111, 211, 311, 411],
[121, 221, 321, 421],
[131, 231, 331, 431]],
[[112, 212, 312, 412],
[122, 222, 322, 422],
[132, 232, 332, 432]]])
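The meshgrid approach generalizes to any number of axes by stacking on the last axis. Here is a minimal sketch. The helper name coordinate_grid is hypothetical, and it assumes every input is a 1D sequence of coordinates.

```python
import numpy as np

def coordinate_grid(*axes):
    """Stack the 'ij'-indexed meshgrid of N axes into shape (len(a0), ..., len(aN-1), N)."""
    return np.stack(np.meshgrid(*axes, indexing='ij'), axis=-1)

grid2 = coordinate_grid([0, 1, 2], [10, 11, 12])
print(grid2.shape)             # (3, 3, 2)
print(grid2[1, 2].tolist())    # [1, 12]

grid3 = coordinate_grid([1, 2], [10, 20, 30], [100, 200, 300, 400])
print(grid3.shape)             # (2, 3, 4, 3)
print(grid3[1, 2, 3].tolist()) # [2, 30, 400]
```

With indexing='ij', grid[i, j, ...] holds the coordinates (axes[0][i], axes[1][j], ...), which matches the layout produced by the nested loops in the question.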

An elegant way of inserting a numpy matrix into another

I have a requirement where I have 2 2D numpy arrays, and I would like to combine them in a specific manner:
x = [[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]
| | |
0 1 2
y = [[10, 11, 12],
[13, 14, 15],
[16, 17, 18]]
| | |
3 4 5
x op y = [ 0 3 1 4 2 5 ] (in terms of the columns)
In other words,
The combination of x and y should look something like this:
[[ 0., 10., 1., 11., 2., 12.],
[ 3., 13., 4., 14., 5., 15.],
[ 6., 16., 7., 17., 8., 18.]]
Where I alternately combine the columns of each individual array to form the final 2D array. I have come up with one way of doing so, but it is rather ugly. Here's my code:
x = np.arange(9).reshape(3, 3)
y = np.arange(start=10, stop=19).reshape(3, 3)
>>> a = np.zeros((6, 3)) # create a 2D array where num_rows(a) = num_cols(x) + num_cols(y)
>>> a[: : 2] = x.T
>>> a[1: : 2] = y.T
>>> a.T
array([[ 0., 10., 1., 11., 2., 12.],
[ 3., 13., 4., 14., 5., 15.],
[ 6., 16., 7., 17., 8., 18.]])
As you can see, this is a very ugly sequence of operations. Furthermore, things become even more cumbersome in higher dimensions. For example, if you have x and y to be [3 x 3 x 3], then this operation has to be repeated in each dimension. So I'd probably have to tackle this with a loop.
Is there a simpler way around this?
Thanks.
In [524]: x=np.arange(9).reshape(3,3)
In [525]: y=np.arange(10,19).reshape(3,3)
This doesn't look at all ugly to me (one-liners are overrated):
In [526]: a = np.zeros((3,6),int)
....
In [528]: a[:,::2]=x
In [529]: a[:,1::2]=y
In [530]: a
Out[530]:
array([[ 0, 10, 1, 11, 2, 12],
[ 3, 13, 4, 14, 5, 15],
[ 6, 16, 7, 17, 8, 18]])
Still, if you want a one-liner, this might do:
In [535]: np.stack((x.T,y.T),axis=1).reshape(6,3).T
Out[535]:
array([[ 0, 10, 1, 11, 2, 12],
[ 3, 13, 4, 14, 5, 15],
[ 6, 16, 7, 17, 8, 18]])
The idea behind this last one was to combine the arrays along a new dimension and then reshape the result. I found it by trial and error.
and with another trial:
In [539]: np.stack((x,y),2).reshape(3,6)
Out[539]:
array([[ 0, 10, 1, 11, 2, 12],
[ 3, 13, 4, 14, 5, 15],
[ 6, 16, 7, 17, 8, 18]])
Here is a compact way to write it with a loop, it might be generalizable to higher dimension arrays with a little work:
x = np.array([[0,1,2], [3,4,5], [6,7,8]])
y = np.array([[10,11,12], [13,14,15], [16,17,18]])
z = np.zeros((3,6))
for i in range(3):
    # stack row i of x over row i of y, then read the result column by column
    z[i] = np.vstack((x[i], y[i])).reshape((-1,), order='F')
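The stack-and-reshape idea from the answers above also covers the higher-dimensional case mentioned in the question, at least when the interleaving is along the last axis. A sketch under that assumption follows. The function name interleave_last_axis is made up here, and x and y are assumed to have identical shapes.

```python
import numpy as np

def interleave_last_axis(x, y):
    """Alternate the entries of x and y along the last axis (x first)."""
    x = np.asarray(x)
    y = np.asarray(y)
    stacked = np.stack((x, y), axis=-1)           # shape (..., n, 2)
    return stacked.reshape(x.shape[:-1] + (-1,))  # shape (..., 2 * n)

x = np.arange(9).reshape(3, 3)
y = np.arange(10, 19).reshape(3, 3)
print(interleave_last_axis(x, y).tolist())
# [[0, 10, 1, 11, 2, 12], [3, 13, 4, 14, 5, 15], [6, 16, 7, 17, 8, 18]]

x3 = np.arange(27).reshape(3, 3, 3)
y3 = x3 + 100
out3 = interleave_last_axis(x3, y3)
print(out3.shape)  # (3, 3, 6)
```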

huge matrix sorted and then find smallest elements with their indices into a list

I have a matrix M that is rather large. I am trying to find the top 5 closest distances along with their indices.
M = csr_matrix(M)
dst = pairwise_distances(M,Y=None,metric='euclidean')
dst becomes a huge matrix and I am trying to sort it efficiently or use scipy or sklearn to find the closest 5 distances.
Here is an example of what I am trying to do:
X = np.array([[2, 3, 5], [2, 3, 6], [2, 3, 8], [2, 3, 3], [2, 3, 4]])
I then calculate dst as:
[[ 0. 1. 3. 2. 1.]
[ 1. 0. 2. 3. 2.]
[ 3. 2. 0. 5. 4.]
[ 2. 3. 5. 0. 1.]
[ 1. 2. 4. 1. 0.]]
So, row 0 to itself has a distance of 0., row 0 to 1 has a distance of 1.,... row 2 to row 3 has a distance of 5., and so on. I want to find these closest 5 distances and put them in a list with the corresponding rows, maybe like [distance, row, row]. I don't want any diagonal elements or duplicate elements so I take the upper triangular matrix as follows:
[[ inf 1. 3. 2. 1.]
[ nan inf 2. 3. 2.]
[ nan nan inf 5. 4.]
[ nan nan nan inf 1.]
[ nan nan nan nan inf]]
Now, the top 5 distances least to greatest are:
[1, 0, 1], [1, 0, 4], [1, 3, 4], [2, 1, 2], [2, 0, 3], [2, 1, 4]
As you can see there are three elements that have distance 2 and three elements that have distance 1. From these I want to randomly choose one of the elements with distance 2 to keep as I only want the top f elements where f=5 in this case.
This is just a sample as this matrix could be very large. Is there an efficient way to do the above besides using a basic sorted function? I couldn't find any sklearn or scipy to help me with this.
Here's a fully vectorized solution to your problem:
import numpy as np
from scipy.spatial.distance import pdist
def smallest(M, f):
    # compute the condensed distance matrix
    dst = pdist(M, 'euclidean')
    # indices of the upper triangular matrix
    rows, cols = np.triu_indices(M.shape[0], k=1)
    # indices of the f smallest distances
    idx = np.argsort(dst)[:f]
    # gather results in the specified format: distance, row, column
    return np.vstack((dst[idx], rows[idx], cols[idx])).T
Notice that np.argsort(dst)[:f] yields the indices of the smallest f elements of the condensed distance matrix dst sorted in ascending order.
The following demo reproduces the result of your toy example and shows how the function smallest deals with a fairly large matrix of integers:
In [59]: X = np.array([[2, 3, 5], [2, 3, 6], [2, 3, 8], [2, 3, 3], [2, 3, 4]])
In [60]: smallest(X, 5)
Out[60]:
array([[ 1., 0., 1.],
[ 1., 0., 4.],
[ 1., 3., 4.],
[ 2., 0., 3.],
[ 2., 1., 2.]])
In [61]: large_X = np.random.randint(100, size=(10000, 2000))
In [62]: large_X
Out[62]:
array([[ 8, 78, 97, ..., 23, 93, 90],
[42, 2, 21, ..., 68, 45, 62],
[28, 45, 30, ..., 0, 75, 48],
...,
[26, 88, 78, ..., 0, 88, 43],
[91, 53, 94, ..., 85, 44, 37],
[39, 8, 10, ..., 46, 15, 67]])
In [63]: %time smallest(large_X, 5)
Wall time: 1min 32s
Out[63]:
array([[ 1676.12529365, 4815. , 5863. ],
[ 1692.97253374, 1628. , 2950. ],
[ 1693.558384 , 5742. , 8240. ],
[ 1695.86408654, 2140. , 6969. ],
[ 1696.68853948, 5477. , 6641. ]])
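Since only the f smallest distances are needed, a full sort of the condensed distances can be avoided with np.argpartition, which selects them without ordering the rest. The following is a sketch of that variation, not the method of the answer above. The name smallest_partition is hypothetical, the distances are computed with plain numpy so the example is self-contained (for large inputs the pdist result could be used for dst instead), and f is assumed to be no larger than the number of pairs.

```python
import numpy as np

def smallest_partition(M, f):
    """Return the f smallest pairwise distances as rows of (distance, row, col)."""
    M = np.asarray(M, dtype=float)
    rows, cols = np.triu_indices(M.shape[0], k=1)
    # Euclidean distance for every pair with row < col (same order as a condensed matrix)
    dst = np.linalg.norm(M[rows] - M[cols], axis=1)
    # argpartition puts the f smallest values in the first f slots, in no particular order
    idx = np.argpartition(dst, f - 1)[:f]
    # sort only those f candidates by distance
    idx = idx[np.argsort(dst[idx])]
    return np.column_stack((dst[idx], rows[idx], cols[idx]))

X = np.array([[2, 3, 5], [2, 3, 6], [2, 3, 8], [2, 3, 3], [2, 3, 4]])
result = smallest_partition(X, 5)
print(result[:, 0].tolist())  # [1.0, 1.0, 1.0, 2.0, 2.0]
```

When several pairs tie for the last place, which of them is kept is arbitrary rather than uniformly random.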

numpy.resize() rearranging instead of resizing?

I'm trying to resize a numpy array, but it seems that resize works by first flattening the array, then taking the first X*Y elements and putting them in the new shape. What I want instead is to cut the array at coordinate 3,3, not rearrange it. A similar thing happens when I try to upsize it, say to 7,7: instead of "rearranging", I want to fill the new columns and rows with zeros and keep the data as it is.
Is there a way to do that?
> a = np.zeros((5,5))
> a.flat = range(25)
> a
array(
[[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.],
[ 10., 11., 12., 13., 14.],
[ 15., 16., 17., 18., 19.],
[ 20., 21., 22., 23., 24.]])
> a.resize((3,3),refcheck=False)
> a
array(
[[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.]])
thank you ...
Upsizing to 7x7 goes like this
upsized = np.zeros([7, 7])
upsized[:5, :5] = a
I believe you want to use numpy's slicing syntax instead of resize. resize works by first raveling the array and working with a 1D view.
>>> a = np.arange(25).reshape(5,5)
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
>>> a[:3,:3]
array([[ 0, 1, 2],
[ 5, 6, 7],
[10, 11, 12]])
What you are doing here is taking a view of the numpy array. For example to update the original array by slicing:
>>> a[:3,:3] = 0
>>> a
array([[ 0, 0, 0, 3, 4],
[ 0, 0, 0, 8, 9],
[ 0, 0, 0, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
An excellent guide on numpy's slicing syntax can be found here.
Upsizing (or padding) only works by making a copy of the data. You start with an array of zeros and fill in appropriately
upsized = np.zeros([7, 7])
upsized[:5, :5] = a
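Cropping and zero-padding can be combined in one small helper that copies the overlapping region into a fresh array. This is a sketch only: crop_or_pad and its fill parameter are names invented for illustration, and the target shape is assumed to have the same number of dimensions as the input.

```python
import numpy as np

def crop_or_pad(a, shape, fill=0):
    """Copy `a` into a new array of `shape`, cropping or padding with `fill` per axis."""
    a = np.asarray(a)
    out = np.full(shape, fill, dtype=a.dtype)
    # region that exists in both the old and the new shape
    region = tuple(slice(0, min(old, new)) for old, new in zip(a.shape, shape))
    out[region] = a[region]
    return out

a = np.arange(25).reshape(5, 5)

small = crop_or_pad(a, (3, 3))
print(small.tolist())  # [[0, 1, 2], [5, 6, 7], [10, 11, 12]]

big = crop_or_pad(a, (7, 7))
print(big.shape)       # (7, 7)
```

Unlike slicing, the result is always a copy, so writing to it does not modify the original array.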
