I want to use a matrix in my Python code but I don't know the exact size of my matrix to define it.
For other matrices, I have used np.zeros(a), where a is known.
What should I do to define a matrix with unknown size?
In this case, one approach is to use a Python list and append to it until it reaches the desired size, then convert it to a NumPy array.
Pseudocode:
matrix = []
while matrix not full:
    matrix.append(elt)
matrix = np.array(matrix)
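A minimal runnable sketch of the list-then-convert pattern described above. The `read_rows` generator is a hypothetical stand-in for whatever produces your data; it is an assumption made for illustration.

```python
import numpy as np

def read_rows():
    """Hypothetical data source: yields equal-length rows, count unknown in advance."""
    for i in range(4):
        yield [i, i * 2, i * 3]

rows = []
for row in read_rows():
    rows.append(row)      # appending to a Python list is cheap

matrix = np.array(rows)   # convert once, after all rows are collected
print(matrix.shape)       # (4, 3)
```

Converting once at the end avoids reallocating the array on every append.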
You could write code that tries to modify the np.array, and expands it if it encounters an IndexError:
import numpy as np
x = np.random.normal(size=(2, 2))
r, c = (5, 10)
val = 1.0  # value to store
try:
    x[r, c] = val
except IndexError:
    r0, c0 = x.shape
    r_ = r + 1 - r0  # number of rows to add
    c_ = c + 1 - c0  # number of columns to add
    if r_ > 0:
        x = np.concatenate([x, np.zeros((r_, x.shape[1]))], axis=0)
    if c_ > 0:
        x = np.concatenate([x, np.zeros((x.shape[0], c_))], axis=1)
    x[r, c] = val  # retry now that the array is large enough
There are problems with this implementation, though. First, each expansion copies the array into a new, larger one, which can become a bottleneck if you do it many times. Second, the code I provided only works if you're modifying a single element. You could extend it to slices, at the cost of more effort; or you could go the whole nine yards and create a new class inheriting from np.ndarray that overrides the .__getitem__ and .__setitem__ methods.
Or you could just use a huge matrix, or better yet, see if you can avoid having to work with matrices of unknown size.
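One way to reduce the copying cost mentioned above is to grow the buffer geometrically, so that copies happen only occasionally. This is a swapped-in technique (capacity doubling), not the approach from the answer, and `GrowableMatrix` is a hypothetical helper name. The sketch assumes non-negative integer indices and float data.

```python
import numpy as np

class GrowableMatrix:
    """Hypothetical helper: a 2-D float buffer that doubles its capacity as needed."""

    def __init__(self):
        self._buf = np.zeros((1, 1))
        self._shape = (0, 0)  # logical size actually in use

    def set(self, r, c, val):
        rows, cols = self._buf.shape
        new_rows, new_cols = rows, cols
        while new_rows <= r:
            new_rows *= 2
        while new_cols <= c:
            new_cols *= 2
        if (new_rows, new_cols) != (rows, cols):
            bigger = np.zeros((new_rows, new_cols))
            bigger[:rows, :cols] = self._buf  # one copy per growth step
            self._buf = bigger
        self._buf[r, c] = val
        self._shape = (max(self._shape[0], r + 1), max(self._shape[1], c + 1))

    def to_array(self):
        """Return a copy trimmed to the logical size."""
        return self._buf[:self._shape[0], :self._shape[1]].copy()

m = GrowableMatrix()
m.set(5, 10, 3.0)
m.set(0, 0, 1.0)
result = m.to_array()
print(result.shape)  # (6, 11)
```

The trade-off is some wasted memory in exchange for far fewer reallocations.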
If you have a python generator you can use np.fromiter:
def gen():
    yield 1
    yield 2
    yield 3
In [11]: np.fromiter(gen(), dtype='int64')
Out[11]: array([1, 2, 3])
Beware: if you pass an infinite iterator you will most likely crash Python, so it's often a good idea to cap the length with the count argument:
In [21]: from itertools import count # an infinite iterator
In [22]: np.fromiter(count(), dtype='int64', count=3)
Out[22]: array([0, 1, 2])
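Since np.fromiter produces a 1-D array, a 2-D result needs a reshape afterwards. The sketch below assumes every row has the same known width (3 here); `gen_rows` is a hypothetical generator used only for illustration.

```python
import itertools
import numpy as np

def gen_rows():
    """Hypothetical generator: yields rows of 3 numbers each."""
    yield (1, 2, 3)
    yield (4, 5, 6)

# Flatten the rows into one stream of scalars, then reshape
flat = np.fromiter(itertools.chain.from_iterable(gen_rows()), dtype='int64')
matrix = flat.reshape(-1, 3)  # -1 lets numpy infer the number of rows
```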
Best practice is usually to either pre-allocate (if you know the size) or build the array as a list first (using list.append). But lists don't build in 2d very well, which I assume you want since you specified a "matrix."
In that case, I'd suggest pre-allocating an oversize scipy.sparse matrix. These can be defined to have a size much larger than your memory, and lil_matrix or dok_matrix can be built sequentially. Then you can pare it down once you enter all of your data.
import numpy as np
from scipy.sparse import dok_matrix
dummy = dok_matrix((1000000, 1000000))  # as big as you think you might need
for i, j, data in generator():  # generator() stands for your own source of (row, column, value) triples
    dummy[i, j] = data
s = np.array(list(dummy.keys())).max() + 1  # largest row or column index used, plus one
M = dummy.tocsr()[:s, :s]  # COO cannot be sliced, so convert to CSR first; follow with toarray() etc. if needed
This way you build your array as a Dictionary of Keys (dictionaries support dynamic assignment much better than ndarray does), but still have a matrix-like output that can be (somewhat) efficiently used for math, even in a partially built state.
I have 2 arrays of a million elements (created from an image with the brightness of each pixel)
I need to get a single number: the sum of the products of the elements at the same position in both arrays. That is, A(1,1) * B(1,1) + A(1,2) * B(1,2) + ...
In my loop, Python takes the innermost variable (j1) and runs through all of its values, then adds 1 to the next variable out and runs through the innermost one again, and so on. How can I make it pair up only the elements at the same position?
res1, res2 - arrays (specifically - numpy.ndarray)
Perhaps there is a ready-made function for this, but I need to write it out as explicitly as possible, without using one.
sum = 0
for i in range(len(res1)):
    for j in range(len(res2[i])):
        for i1 in range(len(res2)):
            for j1 in range(len(res1[i1])):
                sum += res1[i][j]*res2[i1][j1]
In the first part of my answer I'll explain how to fix your code directly. Your code is almost correct but contains one big mistake in logic. In the second part of my answer I'll explain how to solve your problem using numpy. numpy is the standard python package to deal with arrays of numbers. If you're manipulating big arrays of numbers, there is no excuse not to use numpy.
Fixing your code
Your code uses 4 nested for-loops, with indices i and j to iterate on the first array, and indices i1 and j1 to iterate on the second array.
Thus you're multiplying every element res1[i][j] from the first array, with every element res2[i1][j1] from the second array. This is not what you want. You only want to multiply every element res1[i][j] from the first array with the corresponding element res2[i][j] from the second array: you should use the same indices for the first and the second array. Thus there should only be two nested for-loops.
s = 0
for i in range(len(res1)):
    for j in range(len(res1[i])):
        s += res1[i][j] * res2[i][j]
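To make the difference concrete, here is a small worked comparison on made-up 2x2 data (the values are assumptions chosen for easy arithmetic). The four-loop version multiplies every element by every element, so it computes the product of the two totals rather than the sum of matching products.

```python
res1 = [[1, 2], [3, 4]]
res2 = [[10, 20], [30, 40]]

# Four nested loops: every element of res1 times every element of res2
all_pairs = 0
for i in range(len(res1)):
    for j in range(len(res1[i])):
        for i1 in range(len(res2)):
            for j1 in range(len(res2[i1])):
                all_pairs += res1[i][j] * res2[i1][j1]

# Two nested loops: only elements at the same position
same_index = 0
for i in range(len(res1)):
    for j in range(len(res1[i])):
        same_index += res1[i][j] * res2[i][j]

print(all_pairs)   # 1000 = (1+2+3+4) * (10+20+30+40)
print(same_index)  # 300  = 1*10 + 2*20 + 3*30 + 4*40
```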
Note that I called the variable s instead of sum. This is because sum is the name of a builtin function in python. Shadowing the name of a builtin is heavily discouraged. Here is the list of builtins: https://docs.python.org/3/library/functions.html ; do not name a variable with a name from that list.
Now, in general, in python, we dislike using range(len(...)) in a for-loop. If you read the official tutorial and its section on for loops, it suggests that a for-loop can be used to iterate on elements directly, rather than on indices.
For instance, here is how to iterate on one array, to sum the elements on an array, without using range(len(...)) and without using indices:
# sum the elements in an array
s = 0
for row in res1:
    for x in row:
        s += x
Here row is a whole row, and x is an element. We don't refer to indices at all.
Useful tools for looping are the builtin functions zip and enumerate:
enumerate can be used if you need access both to the elements, and to their indices;
zip can be used to iterate on two arrays simultaneously.
I won't show an example with enumerate, but zip is exactly what you need since you want to iterate on two arrays:
s = 0
for row1, row2 in zip(res1, res2):
    for x, y in zip(row1, row2):
        s += x * y
You can also use builtin function sum to write this all without += and without the initial = 0:
s = sum(x * y for row1,row2 in zip(res1, res2) for x,y in zip(row1, row2))
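For completeness, here is a sketch of the same computation with enumerate, which was mentioned above but not demonstrated. The 2x2 data is made up for illustration.

```python
res1 = [[1, 2], [3, 4]]
res2 = [[10, 20], [30, 40]]

s = 0
for i, row in enumerate(res1):
    for j, x in enumerate(row):
        # enumerate supplies the indices, the loop supplies the element
        s += x * res2[i][j]

print(s)  # 300
```

Here zip is still the better fit; enumerate becomes useful when the index itself matters, for example to report where a particular product occurs.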
Using numpy
As I mentioned in the introduction, numpy is a standard python package to deal with arrays of numbers. In general, operations on arrays using numpy are much, much faster than loops on arrays in core python. Plus, code using numpy is usually easier to read than code using core python only, because there are a lot of useful functions and convenient notations. For instance, here is a simple way to achieve what you want:
import numpy as np
# convert to numpy arrays
res1 = np.array(res1)
res2 = np.array(res2)
# multiply elements with corresponding elements, then sum
s = (res1 * res2).sum()
Relevant documentation:
sum: .sum() or np.sum();
pointwise multiplication: np.multiply() or *;
dot product: np.dot.
Solution 1:
import numpy as np
a,b = np.array(range(100)), np.array(range(100))
print((a * b).sum())
Solution 2 (more open, because of use of pd.DataFrame):
import pandas as pd
import numpy as np
a,b = np.array(range(100)), np.array(range(100))
df = pd.DataFrame(dict({'col1': a, 'col2': b}))
df['vect_product'] = df.col1 * df.col2
print(df['vect_product'].sum())
Two simple and fast options using numpy are: (A*B).sum() and np.dot(A.ravel(),B.ravel()). The first method sums all elements of the element-wise multiplication of A and B. np.sum() defaults to sum(axis=None), so we will get a single number. In the second method, you create a 1D view into the two matrices and then apply the dot-product method to get a single number.
import numpy as np
A = np.random.rand(1000,1000)
B = np.random.rand(1000,1000)
s = (A*B).sum() # method 1
s = np.dot(A.ravel(),B.ravel()) # method 2
The second method should be extremely fast: for contiguous arrays, ravel() returns a view into A and B rather than a copy, so no extra memory is allocated for the inputs.
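A small check of both methods and of the view claim, using np.shares_memory. The arrays are made up for illustration. Note that ravel() only avoids a copy when the array is already laid out contiguously in the requested order; for a transposed array it has to copy.

```python
import numpy as np

A = np.arange(12.0).reshape(3, 4)
B = np.ones((3, 4))

s1 = (A * B).sum()                  # method 1
s2 = np.dot(A.ravel(), B.ravel())   # method 2

view_for_contiguous = np.shares_memory(A, A.ravel())    # True: no copy made
view_for_transposed = np.shares_memory(A, A.T.ravel())  # False: ravel had to copy
```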
I am trying to 'expand' an array (generate a new array with proportionally more elements in all dimensions). I have an array with known numbers (let's call it X) and I want to make it j times bigger (in each dimension).
So far I generated a new array of zeros with more elements, then I used broadcasting to insert the original numbers in the new array (at fixed intervals).
Finally, I used linspace to fill the gaps, but this part is actually not directly relevant to the question.
The code I used (for ndim=3, with ratio as the expansion factor j) is:
import numpy as np
new_shape = (np.array(X.shape) - 1 ) * ratio + 1
new_array = np.zeros(shape=new_shape)
new_array[::ratio,::ratio,::ratio] = X
My problem is that this is not general: I would have to modify the last line based on ndim. Is there a way to use such broadcasting for any number of dimensions in my array?
Edit: to be more precise, the last line would have to be:
new_array[::ratio,::ratio] = X
if ndim=2
or
new_array[::ratio,::ratio,::ratio,::ratio] = X
if ndim=4
etc. etc. I want to avoid having to write code for each case of ndim
p.s. If there is a better tool to do the entire process (such as 'inner-padding') that I am not aware of, I will be happy to learn about it.
Thank you
array = array[..., np.newaxis] will add another dimension
This article might help
You can use slice notation -
slicer = tuple(slice(None,None,ratio) for i in range(X.ndim))
new_array[slicer] = X
Build the slicing tuple manually. ::ratio is equivalent to slice(None, None, ratio):
new_array[(slice(None, None, ratio),)*new_array.ndim] = ...
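Putting the slicing tuple together with the question's code gives a function that works for any number of dimensions. This is a sketch; the function name `expand` is an assumption, and the gaps are left as zeros (the linspace filling step is not included).

```python
import numpy as np

def expand(X, ratio):
    """Place X on a grid `ratio` times finer; gaps are left as zeros."""
    new_shape = tuple((n - 1) * ratio + 1 for n in X.shape)
    new_array = np.zeros(new_shape)
    slicer = (slice(None, None, ratio),) * X.ndim  # same as [::ratio, ::ratio, ...]
    new_array[slicer] = X
    return new_array

X2 = np.arange(1, 5).reshape(2, 2)
X3 = np.arange(1, 9).reshape(2, 2, 2)
out2 = expand(X2, 3)  # shape (4, 4)
out3 = expand(X3, 3)  # shape (4, 4, 4)
```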
I work with Python 2.7, numpy and pandas.
I have :
a function y=f(x) where both x and y are scalars.
a one-dimensional array of scalars of length n : [x0, x1, ..., x(n-1)]
I need to construct a 2-dimensional array D[i,j]=f(xi)*f(xj) where i,j are indices in [0,...,n-1].
I could use loops and/or a comprehension list, but that would be slow. I would like to use a vectorized approach instead.
I thought that "numpy.indices" would help me (see Create a numpy matrix with elements a function of indices), but I admit I am at a loss on how to use that command for my purpose.
Thanks in advance!
Ignore the comments that dismiss vectorization; it's a good habit to have, and it does deliver performance with the right accelerators. Anyway, what I really wanted to say was that you want to find the outer product:
x_ = numpy.array(x)
y = f(x_)
numpy.outer(y, y)
If you're working with numbers you should be working with numpy data structures anyway. Then you get fast, readable code like this.
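A small check that the outer product matches the explicit double loop. The function `f` below is a hypothetical example chosen so that it also works elementwise on arrays; your own f may not, in which case it needs to be wrapped first.

```python
import numpy as np

def f(x):
    """Hypothetical scalar function; also works elementwise on arrays."""
    return x ** 2 + 1

x = np.array([0.0, 1.0, 2.0])
y = f(x)  # [1, 2, 5]

D_outer = np.outer(y, y)
D_broadcast = y[:, np.newaxis] * y[np.newaxis, :]  # equivalent, via broadcasting

# Reference: the slow double loop
D_loop = np.array([[f(xi) * f(xj) for xj in x] for xi in x])
```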
I would like to use a vectorized approach instead.
You sound like you might be a Matlab user -- you should be aware that numpy's vectorize function provides no performance benefit:
The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop.
Unless it just so happens that there's already an operation in numpy that does exactly what you want, you're going to be stuck with numpy.vectorize and nothing to really gain over a for loop. With that being said, you should be able to do that like so:
import numpy
def makeArray():
    a = [1, 2, 3, 4]
    def addTo(arr):
        # arr is a flat index; integer division and modulo recover (i, j)
        return f(a[arr // 4]) * f(a[arr % 4])
    vecAdd = numpy.vectorize(addTo)
    return vecAdd(numpy.arange(4 * 4).reshape(4, 4))
EDIT:
If f is actually a one-dimensional array, you can do this:
f_matrix = numpy.matrix(f)
D = f_matrix.T * f_matrix
You can use frompyfunc to vectorize the function then use the dot product to multiply:
f2 = numpy.frompyfunc(f, 1, 1)  # vectorize the function
res1 = f2(x).astype(float)  # get the results for f(x); frompyfunc returns an object array, so cast it
res1 = res1[numpy.newaxis]  # result has to be 2D for the next step
res2 = numpy.dot(res1.T, res1)  # get f(xi)*f(xj)
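A runnable sketch of this approach using np.frompyfunc, which is the NumPy function that wraps a Python callable as a ufunc. The function `f` here is a hypothetical scalar-only example (math.sqrt does not accept arrays). As with numpy.vectorize, this is a convenience and still calls f once per element.

```python
import math
import numpy as np

def f(x):
    """Hypothetical scalar-only function."""
    return math.sqrt(x) + 1.0

x = np.array([0.0, 1.0, 4.0])

f2 = np.frompyfunc(f, 1, 1)   # 1 input, 1 output
y = f2(x).astype(float)       # frompyfunc returns dtype=object; cast back to float
D = np.outer(y, y)            # D[i, j] = f(x[i]) * f(x[j])
```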
I'm writing some modelling routines in NumPy that need to select cells randomly from a NumPy array and do some processing on them. All cells must be selected without replacement (as in, once a cell has been selected it can't be selected again, but all cells must be selected by the end).
I'm transitioning from IDL where I can find a nice way to do this, but I assume that NumPy has a nice way to do this too. What would you suggest?
Update: I should have stated that I'm trying to do this on 2D arrays, and therefore get a set of 2D indices back.
How about using numpy.random.shuffle or numpy.random.permutation if you still need the original array?
If you need to change the array in place, then you can shuffle an index array instead:
import numpy
your_array = numpy.arange(20)  # stand-in for your own array
index_array = numpy.arange(your_array.size)
numpy.random.shuffle(index_array)
print(your_array.flat[index_array[:10]])  # .flat lets the flat indices work for any number of dimensions
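For a 2D array, the shuffled flat indices can be converted to (row, column) pairs with np.unravel_index. A minimal sketch, assuming a small made-up array; the `visited` counter is only there to show that every cell is reached exactly once.

```python
import numpy as np

your_array = np.arange(12).reshape(3, 4)

flat_index = np.random.permutation(your_array.size)  # every cell exactly once
rows, cols = np.unravel_index(flat_index, your_array.shape)

visited = np.zeros(your_array.shape, dtype=int)
for r, c in zip(rows, cols):
    visited[r, c] += 1  # stand-in for the real processing
```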
All of these answers seemed a little convoluted to me.
I'm assuming that you have a multi-dimensional array from which you want to generate an exhaustive list of indices. You'd like these indices shuffled so you can then access each of the array elements in a random order.
The following code will do this in a simple and straightforward manner:
#!/usr/bin/python
import numpy as np
#Define a two-dimensional array
#Use any number of dimensions, and dimensions of any size
d=np.zeros(30).reshape((5,6))
#Get a list of indices for an array of this shape
indices=list(np.ndindex(d.shape))
#Shuffle the indices in-place
np.random.shuffle(indices)
#Access array elements using the indices to do cool stuff
for i in indices:
    d[i]=5
print(d)
Printing d verified that all elements have been accessed.
Note that the array can have any number of dimensions and that the dimensions can be of any size.
The only downside to this approach is that if d is large, then indices may become pretty sizable. Therefore, it would be nice to have a generator. Sadly, I can't think offhand of how to build a shuffled iterator.
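One possible partial answer to the generator wish: shuffle the flat integer indices and convert each one to an index tuple lazily. This is a sketch, not a true constant-memory shuffled iterator, since it still stores one integer per element; it only avoids building the full list of tuples. The name `shuffled_indices` is an assumption.

```python
import numpy as np

def shuffled_indices(shape):
    """Yield every index tuple of `shape` exactly once, in random order."""
    size = int(np.prod(shape))
    for flat in np.random.permutation(size):
        # Convert one flat index to an N-dimensional index tuple on demand
        yield tuple(int(k) for k in np.unravel_index(flat, shape))

d = np.zeros((5, 6))
for idx in shuffled_indices(d.shape):
    d[idx] = 5
```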
Extending the nice answer from @WoLpH
For a 2D array I think it will depend on what you want or need to know about the indices.
You could do something like this:
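data = np.arange(25).reshape((5,5))
x, y = np.where(data == data)  # condition is true everywhere, so this returns every index
idx = list(zip(x, y))  # shuffle needs a list, not a zip iterator
np.random.shuffle(idx)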
OR
data = np.arange(25).reshape((5,5))
grid = np.indices(data.shape)
idx = list(zip(grid[0].ravel(), grid[1].ravel()))  # shuffle needs a list, not a zip iterator
np.random.shuffle(idx)
You can then use the list idx to iterate over randomly ordered 2D array indices as you wish, and to get the values at that index out of the data which remains unchanged.
Note: You could also generate the randomly ordered indices via itertools.product too, in case you are more comfortable with this set of tools.
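The itertools.product variant mentioned in the note could look like the sketch below, which uses only the standard library. The 5x5 shape is taken from the examples above.

```python
import itertools
import random

n_rows, n_cols = 5, 5

# All (row, column) pairs, then shuffled in place
idx = list(itertools.product(range(n_rows), range(n_cols)))
random.shuffle(idx)
```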
Use random.sample to generate ints in 0 .. A.size with no duplicates,
then split them into index pairs:
import random
import numpy as np
def randint2_nodup( nsample, A ):
    """ uniform int pairs, no dups:
        r = randint2_nodup( nsample, A )
        A[r]
        for jk in zip(*r):
            ... A[jk]
    """
    assert A.ndim == 2
    sample = np.array( random.sample( range( A.size ), nsample ))  # nodup ints
    return sample // A.shape[1], sample % A.shape[1]  # pairs
if __name__ == "__main__":
    import sys
    nsample = 8
    ncol = 5
    exec( "\n".join( sys.argv[1:] ))  # run this.py N= ...
    A = np.arange( 0, 2*ncol ).reshape((2,ncol))
    r = randint2_nodup( nsample, A )
    print( "r:", r )
    print( "A[r]:", A[r] )
    for jk in zip(*r):
        print( jk, A[jk] )
Let's say you have an array of data points of size 8x3
data = np.arange(50,74).reshape(8,-1)
If you truly want to sample, as you say, all the indices as 2d pairs, the most compact way I can think of is:
#generate a permutation of data's size, coerced to data's shape
idxs = divmod(np.random.permutation(data.size),data.shape[1])
#iterate over it
for x,y in zip(*idxs):
    #do something to data[x,y] here
    pass
More generally, though, one often does not need to access 2d arrays as 2d arrays simply to shuffle them, in which case one can be even more compact. Just make a 1d view onto the array and save yourself some index-wrangling.
flat_data = data.ravel()
flat_idxs = np.random.permutation(flat_data.size)
for i in flat_idxs:
    #do something to flat_data[i] here
    pass
Because flat_data is a view, changes made through it still apply to the original 2d array. To see this, try:
flat_data[12] = 1000000
print(data[4,0])
#prints 1000000
People using numpy version 1.7 or later can also use the built-in function numpy.random.choice.
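A sketch of the numpy.random.choice variant: with replace=False it draws distinct flat indices, which can then be split into row and column indices as before. The sample size of 5 is an arbitrary choice for illustration.

```python
import numpy as np

data = np.arange(50, 74).reshape(8, -1)

# Draw 5 distinct flat indices, then split them into (row, column) pairs
flat_idxs = np.random.choice(data.size, size=5, replace=False)
rows, cols = divmod(flat_idxs, data.shape[1])
sampled = data[rows, cols]
```

Passing size=data.size would visit every cell exactly once, equivalent to a permutation.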
What's the best way to create 2D arrays in Python?
What I want is to store values like this:
X , Y , Z
so that I access data like X[2],Y[2],Z[2] or X[n],Y[n],Z[n] where n is variable.
I don't know in the beginning how big n would be so I would like to append values at the end.
>>> a = []
>>> for i in range(3):
...     a.append([])
...     for j in range(3):
...         a[i].append(i+j)
...
>>> a
[[0, 1, 2], [1, 2, 3], [2, 3, 4]]
>>>
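For the X, Y, Z use case in the question, one option is to append whole records and split them into columns only when needed. This is a sketch with made-up values; storing records as tuples is an assumption about the data.

```python
rows = []  # each entry is one (X, Y, Z) record
rows.append((1.0, 2.0, 3.0))
rows.append((4.0, 5.0, 6.0))
rows.append((7.0, 8.0, 9.0))

# Transpose the list of records into three column lists
X, Y, Z = (list(col) for col in zip(*rows))

n = 2
point = (X[n], Y[n], Z[n])
print(point)  # (7.0, 8.0, 9.0)
```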
Depending what you're doing, you may not really have a 2-D array.
80% of the time you have a simple list of "row-like objects", which might be proper sequences.
myArray = [ ('pi',3.14159,'r',2), ('e',2.71828,'theta',.5) ]
myArray[0][1] == 3.14159
myArray[1][1] == 2.71828
More often, they're instances of a class or a dictionary or a set or something more interesting that you didn't have in your previous languages.
myArray = [ {'pi':3.1415925,'r':2}, {'e':2.71828,'theta':.5} ]
20% of the time you have a dictionary, keyed by a pair
myArray = { (2009,'aug'):(some,tuple,of,values), (2009,'sep'):(some,other,tuple) }
Rarely will you actually need a matrix.
You have a large, large number of collection classes in Python. Odds are good that you have something more interesting than a matrix.
In Python one would usually use lists for this purpose. Lists can be nested arbitrarily, thus allowing the creation of a 2D array. Not every sublist needs to be the same size, so that solves your other problem. Have a look at the examples I linked to.
If you want to do some serious work with arrays then you should use the numpy library. This will allow you for example to do vector addition and matrix multiplication, and for large arrays it is much faster than Python lists.
However, numpy requires that the size is predefined. Of course you can also store numpy arrays in a list, like:
import numpy as np
vec_list = [np.zeros((3,)) for _ in range(10)]
vec_list.append(np.array([1,2,3]))
vec_sum = vec_list[0] + vec_list[1] # possible because we use numpy
print(vec_list[10][2]) # prints 3
But since your numpy arrays are pretty small I guess there is some overhead compared to using a tuple. It all depends on your priorities.
See also this other question, which is pretty similar (apart from the variable size).
I would suggest that you use a dictionary like so:
arr = {}
arr[1] = (1, 2, 4)
arr[18] = (3, 4, 5)
print(arr[1])
>>> (1, 2, 4)
If you're not sure an entry is defined in the dictionary, you'll need a validation mechanism when calling "arr[x]", e.g. try-except.
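The validation mentioned above can be done in at least two ways: catching KeyError, or using dict.get with a default. A minimal sketch with the same dictionary; the key 7 and the default values are arbitrary choices for illustration.

```python
arr = {}
arr[1] = (1, 2, 4)
arr[18] = (3, 4, 5)

# Option 1: try/except around the lookup
try:
    value = arr[7]
except KeyError:
    value = None

# Option 2: dict.get returns a default instead of raising
fallback = arr.get(7, (0, 0, 0))
present = arr.get(18)
```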
If you are concerned about memory footprint, the Python standard library contains the array module; these arrays contain elements of the same type.
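A short sketch of the array module in use. The type code 'd' (C double) is an assumption about the data; other type codes exist for integers and smaller floats.

```python
from array import array

xs = array('d')  # 'd' = C double; every element must be of this type
for v in (1.5, 2.5, 3.5):
    xs.append(v)  # arrays grow by appending, like lists

bytes_per_item = xs.itemsize  # size of one element in bytes
```

These arrays are one-dimensional, so a 2D layout would need one array per column or manual index arithmetic.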
Please consider the following code:
from numpy import zeros
scores = zeros((len(chain1),len(chain2)), float)
x = list()
def enter(n):
    y = list()
    for i in range(0, n):
        y.append(int(input("Enter ")))
    return y
for i in range(0, 2):
    x.insert(i, enter(2))
print(x)
Here I wrote a function that builds a 1-D list and inserted its result into another list as a member. With several 1-D lists inside a list you get a multi-dimensional array, and changing n and the range of i changes its shape.