In Java you would do it like this: Node[][] nodes;, where Node.java is a custom class. How do I do the same in Python, where Node.py is:
class Node(object):
    def __init__(self):
        self.memory = []
        self.temporal_groups = []
I have imported numpy and created an object type
typeObject = numpy.dtype('O') # O stands for python objects
nodes = ???
You can try it this way, inside your node class create a function that will return the generic array:
def genArray(a, b):
    return [[0 for y in range(a)] for x in range(b)]
then you can assign them the way you want. Maybe you might change the 0 to your node object. Let me know if this helps
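For instance, you could fill the grid with fresh Node instances directly (a sketch; Node is the asker's class, assumed to take no constructor arguments):

```python
class Node(object):
    def __init__(self):
        self.memory = []
        self.temporal_groups = []

def gen_node_array(rows, cols):
    # one new Node per cell; avoid [[Node()] * cols] * rows, which would
    # put the same Node object in every cell of a row
    return [[Node() for _ in range(cols)] for _ in range(rows)]

nodes = gen_node_array(3, 4)
nodes[1][2].memory.append("x")  # each cell is an independent Node
```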
You have two easy options: use numpy or declare a nested list. The latter approach is more conceptually similar to Node[][] since it allows for ragged lists, as does Java, but the former approach will probably make processing faster.
numpy arrays
To make an array in numpy:
import numpy as np
x = np.full((m, n), None, dtype=object)
In numpy, you have to have some idea about the size of the array (here m, n) up-front. There are ways to grow an array, but they are not very efficient, especially for large arrays. np.full will initialize your array with a copy of whatever reference you want. You can modify the elements as you wish after that.
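A minimal demonstration of the np.full approach (the shape (2, 3) here is arbitrary):

```python
import numpy as np

m, n = 2, 3
x = np.full((m, n), None, dtype=object)  # object dtype: cells hold arbitrary Python objects

x[0, 1] = [1, 2, 3]  # modify elements as you wish after creation
print(x.shape)       # (2, 3)
```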
Python lists
To create a ragged list, you do not have to do much:
x = []
This creates an empty list. This is analogous to Node[][] in Java, which also declares a variable without allocating the elements. The main difference is that Python lists can change size and are untyped; they are effectively always Object[].
To add more dimensions to the list, append a nested list: x.append([]). This is similar to defining something like
Node[][] nodes = new Node[m][];
nodes[i] = new Node[n];
where m is the number of nested lists (rows) and n is the number of elements in each list (columns). Again, the main difference is that once you have a list in Python, you can expand or contract it at will, which you cannot do with a Java array.
Manipulate with x[i][j] as you would in Java. You can add new sublists by doing x.append([]).
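Put together, the Java snippet above translates roughly to the following (a sketch, using None as a stand-in for Node instances):

```python
m, n = 3, 4

nodes = []                     # Node[][] nodes = new Node[m][];
for i in range(m):
    nodes.append([])           # nodes[i] starts empty ...
    for j in range(n):
        nodes[i].append(None)  # ... then grows to n elements

nodes[1][2] = "some node"      # manipulate with nodes[i][j], as in Java
```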
Related
I wanted to know if there is a way to somehow shift an array by a non integer value in python. Let's say I have a matrix T[i,j] and I want to interpolate the value of T[1.3,4.5] by using T[1,4], T[1,5], T[2,4] and T[2,5]. Is there a simple and fast way to do that?
I have been stuck trying to use scipy.ndimage.shift() for the past few hours but I couldn't understand how to make it work.
EDIT: It seems this can be accomplished with scipy.interpolate.interp2d, though interp2d serves a more general purpose and thus may be slower in this case.
You'd want to do something like iT = interp2d(range(T.shape[0]), range(T.shape[1]), T.T). Please note that in T.T at the end, the .T stands for transpose and has nothing to do with the array being called T. iT can then be called like a function, for instance print(iT(1.1, 2.3)).
It returns arrays rather than single values, which indicates that one can pass arrays as arguments too (i.e. compute the value of the interpolation at several points "at once").
I am not aware of any standard way to do this. I'd accomplish it by simply wrapping the array in an instance of some kind of "interpolated array" class. For simplicity, let's start by assuming T is a 1D array:
class InterpolatedArray:
    def __init__(self, array):
        self.array = array

    def __getitem__(self, i):
        i0 = int(i)
        i1 = i0 + 1
        f = i - i0
        return self.array[i0] * (1 - f) + self.array[i1] * f
As you can see, it overloads the subscripting routine, so that when you attempt to access (for instance) array[1.1], this returns 0.9*array[1]+0.1*array[2]. You'd have to explicitly build this object from your previous array:
iT = InterpolatedArray(T)
print(iT[1.1])
As for the two-dimensional case, it works the same, just with a little bit more work on the indexing:
class InterpolatedMatrix:
    def __init__(self, array):
        self.array = array

    def __getitem__(self, index):
        # __getitem__ receives a single argument: for x[i, j] it is the tuple (i, j)
        i, j = index
        i0 = int(i)
        i1 = i0 + 1
        fi = i - i0
        j0 = int(j)
        j1 = j0 + 1
        fj = j - j0
        return self.array[i0, j0] * (1 - fi) * (1 - fj) \
            + self.array[i1, j0] * fi * (1 - fj) \
            + self.array[i0, j1] * (1 - fi) * fj \
            + self.array[i1, j1] * fi * fj
You can probably rewrite some of that code to optimize the amount of operations that are performed.
Note, also, that if for some reason you want to access every index in the image with some small fixed offset (i.e. T[i+0.3,j+0.5] for i,j in T.shape), then it would be better to do this with vectorization, using numpy (something similar may also be possible with scipy).
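That vectorized fixed-offset case can be sketched with plain numpy as follows (an illustration, not from the answer above; the last row and column are dropped because i+1 and j+1 must stay in bounds):

```python
import numpy as np

def shift_bilinear(T, di, dj):
    # bilinear value at (i + di, j + dj) for every valid i, j, with 0 <= di, dj < 1
    fi, fj = di, dj
    return (T[:-1, :-1] * (1 - fi) * (1 - fj)
            + T[1:, :-1] * fi * (1 - fj)
            + T[:-1, 1:] * (1 - fi) * fj
            + T[1:, 1:] * fi * fj)

T = np.arange(12, dtype=float).reshape(3, 4)  # T[i, j] == 4*i + j
shifted = shift_bilinear(T, 0.3, 0.5)         # shape (2, 3)
print(shifted[0, 0])  # 1.7, i.e. 4*0.3 + 0.5, since T is linear in i and j
```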
I have 2 arrays of a million elements (created from an image with the brightness of each pixel)
I need to get a number that is the sum of the products of the array elements of the same name. That is, A(1,1) * B(1,1) + A(1,2) * B(1,2)...
In the loop, python takes the value of the last variable from the loop (j1) and starts running through it, then adds 1 to the penultimate variable and runs through the last one again, and so on. How can I make it count elements of the same name?
res1, res2 - arrays (specifically - numpy.ndarray)
Perhaps there is a ready-made function for this, but I need to make it as open as possible, without a ready-made one.
sum = 0
for i in range(len(res1)):
    for j in range(len(res2[i])):
        for i1 in range(len(res2)):
            for j1 in range(len(res1[i1])):
                sum += res1[i][j]*res2[i1][j1]
In the first part of my answer I'll explain how to fix your code directly. Your code is almost correct but contains one big mistake in logic. In the second part of my answer I'll explain how to solve your problem using numpy. numpy is the standard python package to deal with arrays of numbers. If you're manipulating big arrays of numbers, there is no excuse not to use numpy.
Fixing your code
Your code uses 4 nested for-loops, with indices i and j to iterate on the first array, and indices i1 and j1 to iterate on the second array.
Thus you're multiplying every element res1[i][j] from the first array, with every element res2[i1][j1] from the second array. This is not what you want. You only want to multiply every element res1[i][j] from the first array with the corresponding element res2[i][j] from the second array: you should use the same indices for the first and the second array. Thus there should only be two nested for-loops.
s = 0
for i in range(len(res1)):
    for j in range(len(res1[i])):
        s += res1[i][j] * res2[i][j]
Note that I called the variable s instead of sum. This is because sum is the name of a builtin function in python. Shadowing the name of a builtin is heavily discouraged. Here is the list of builtins: https://docs.python.org/3/library/functions.html ; do not name a variable with a name from that list.
Now, in general, in python, we dislike using range(len(...)) in a for-loop. If you read the official tutorial and its section on for loops, you'll see that for-loops are meant to iterate over elements directly, rather than over indices.
For instance, here is how to iterate on one array, to sum the elements on an array, without using range(len(...)) and without using indices:
# sum the elements in an array
s = 0
for row in res1:
    for x in row:
        s += x
Here row is a whole row, and x is an element. We don't refer to indices at all.
Useful tools for looping are the builtin functions zip and enumerate:
enumerate can be used if you need access both to the elements, and to their indices;
zip can be used to iterate on two arrays simultaneously.
I won't show an example with enumerate, but zip is exactly what you need since you want to iterate on two arrays:
s = 0
for row1, row2 in zip(res1, res2):
    for x, y in zip(row1, row2):
        s += x * y
You can also use builtin function sum to write this all without += and without the initial = 0:
s = sum(x * y for row1,row2 in zip(res1, res2) for x,y in zip(row1, row2))
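A quick check on small inputs that the nested loop and the one-liner agree:

```python
res1 = [[1, 2], [3, 4]]
res2 = [[5, 6], [7, 8]]

# nested-loop version
s_loop = 0
for row1, row2 in zip(res1, res2):
    for x, y in zip(row1, row2):
        s_loop += x * y

# one-liner with the builtin sum
s_oneliner = sum(x * y for row1, row2 in zip(res1, res2)
                       for x, y in zip(row1, row2))

print(s_loop, s_oneliner)  # both 1*5 + 2*6 + 3*7 + 4*8 = 70
```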
Using numpy
As I mentioned in the introduction, numpy is a standard python package to deal with arrays of numbers. In general, operations on arrays using numpy is much, much faster than loops on arrays in core python. Plus, code using numpy is usually easier to read than code using core python only, because there are a lot of useful functions and convenient notations. For instance, here is a simple way to achieve what you want:
import numpy as np
# convert to numpy arrays
res1 = np.array(res1)
res2 = np.array(res2)
# multiply elements with corresponding elements, then sum
s = (res1 * res2).sum()
Relevant documentation:
sum: .sum() or np.sum();
pointwise multiplication: np.multiply() or *;
dot product: np.dot.
Solution 1:
import numpy as np
a,b = np.array(range(100)), np.array(range(100))
print((a * b).sum())
Solution 2 (more open, because of use of pd.DataFrame):
import pandas as pd
import numpy as np
a,b = np.array(range(100)), np.array(range(100))
df = pd.DataFrame({'col1': a, 'col2': b})
df['vect_product'] = df.col1 * df.col2
print(df['vect_product'].sum())
Two simple and fast options using numpy are: (A*B).sum() and np.dot(A.ravel(),B.ravel()). The first method sums all elements of the element-wise multiplication of A and B. np.sum() defaults to sum(axis=None), so we will get a single number. In the second method, you create a 1D view into the two matrices and then apply the dot-product method to get a single number.
import numpy as np
A = np.random.rand(1000,1000)
B = np.random.rand(1000,1000)
s = (A*B).sum() # method 1
s = np.dot(A.ravel(),B.ravel()) # method 2
The second method should be extremely fast, as it doesn't create new copies of A and B but a view into them, so no extra memory allocations.
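As an aside (not part of the answer above), np.einsum expresses the same reduction in one call, and all three methods agree:

```python
import numpy as np

A = np.arange(6, dtype=float).reshape(2, 3)
B = np.ones((2, 3))

s1 = (A * B).sum()                # method 1: element-wise product, then sum
s2 = np.dot(A.ravel(), B.ravel()) # method 2: dot product of 1D views
s3 = np.einsum('ij,ij->', A, B)   # sums A[i,j]*B[i,j] over all i, j

print(s1, s2, s3)  # all 15.0 (0+1+2+3+4+5)
```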
So here is my approach:
def transpose(m):
    output = [["null" for i in range(len(m))] for j in range(len(m[0]))]
    for i in range(len(m[0])):
        for j in range(len(m)):
            if i == j:
                output[i][j] = m[i][j]
            else:
                output[i][j] = m[j][i]
    return output
The above method creates an array/list as a placeholder so that new values can be assigned. I tried this approach because I am new to Python and was previously learning Java, which has built-in arrays; Python doesn't, and I found there was no easy way of indexing 2D lists similar to what we do in Java unless I predefine the list with these for-loops. I know there are packages that implement arrays, but I am fairly new to the language, so I tried simulating arrays the way I was familiar with.
So my main question is: is there a better approach to predefine lists of a restricted size (like arrays in Java) without these funky for-loops? Or, even better, a way to have a predefined list which I can easily index without needing to append lists inside lists and all that stuff? It's really difficult for me because it doesn't behave the way I want.
Also I made a helper method for prebuilding lists like this:
def arraybuilder(r, c, jagged=[]):  # builds an empty placeholder 2D array/list of required size
    output = []
    if not jagged:
        output = [["null" for i in range(c)] for j in range(r)]
        return output
    else:
        noOfColumns = []
        for i in range(len(jagged)):
            noOfColumns.append(len(jagged[i]))
        for i in range(len(jagged)):
            row = []
            for j in range(noOfColumns[i]):
                row.append("null")
            output.append(row)
        return output, noOfColumns  # returns noOfColumns as well for iteration purposes
The typical transposition pattern for 2d iterables is zip(*...):
def transpose(m):
    return [*map(list, zip(*m))]
    # same as:
    # return [list(col) for col in zip(*m)]
zip(*m) unpacks the nested lists and zips (interleaves) them into column tuples. Since zip returns a lazy iterator over tuples, we consume it into a list while converting all the tuples into lists as well.
And if you want to be more explicit, there are more concise ways of creating a nested list. Here is a nested comprehension:
def transpose(m):
    return [[row[c] for row in m] for c in range(len(m[0]))]
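Both versions agree on a small example:

```python
def transpose(m):
    # zip(*m) interleaves the rows into column tuples
    return [*map(list, zip(*m))]

m = [[1, 2, 3],
     [4, 5, 6]]
print(transpose(m))  # [[1, 4], [2, 5], [3, 6]]
```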
What I want to do:
I want to create an array and add each Item from a List to the array. This is what I have so far:
count = 0
arr = []
with open(path, encoding='utf-8-sig') as f:
    data = f.readlines()  # data is the list
for s in data:
    arr[count] = s
    count += 1
What am I doing wrong? The Error I get is IndexError: list assignment index out of range
When you try to access arr at index 0, there is not anything there. What you are trying to do is add to it. You should do arr.append(s)
Your arr is an empty array. So, arr[count] = s is giving that error.
Either you initialize your array with empty elements, or use the append method of array. Since you do not know how many elements you will be entering into the array, it is better to use the append method in this case.
for s in data:
    arr.append(s)
    count += 1
It's worth taking a step back and asking what you're trying to do here.
f is already an iterable of lines: something you can loop over with for line in f:. But it's "lazy"—once you loop over it once, it's gone. And it's not a sequence—you can loop over it, but you can't randomly access it with indexes or slices like f[20] or f[-10:].
f.readlines() copies that into a list of lines: something you can loop over, and index. While files have the readlines method for this, it isn't really necessary—you can convert any iterable to a list just like this by just calling list(f).
Your loop appears to be an attempt to create another list of the same lines. Which you could do with just list(data). Although it's not clear why you need another list in the first place.
Also, the term "array" betrays some possible confusion.
A Python list is a dynamic array, which can be indexed and modified, but can also be resized by appending, inserting, and deleting elements. So, technically, arr is an array.
But usually when people talk about "arrays" in Python, they mean fixed-size arrays, usually of fixed-size objects, like those provided by the stdlib array module, the third-party numpy library, or special-purpose types like the builtin bytearray.
In general, to convert a list or other iterable into any of these is the same as converting into a list: just call the constructor. For example, if you have a list of numbers between 0-255, you can do bytearray(lst) to get a bytearray of the same numbers. Or, if you have a list of lists of float values, np.array(lst) will give you a 2D numpy array of floats. And so on.
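Concretely, the conversions described above look like this (a sketch with made-up data):

```python
import numpy as np

lst = [10, 20, 255]            # numbers between 0-255
ba = bytearray(lst)            # fixed-size-element byte array
print(ba[2])                   # 255

nested = [[1.0, 2.0], [3.0, 4.0]]
arr = np.array(nested)         # 2D numpy array of floats
print(arr.shape)               # (2, 2)
```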
So, why doesn't your code work?
When you write arr = [], you're creating a list of 0 elements.
When you write arr[count] = s, you're trying to set the countth element in the list to s. But there is no countth element. You're writing past the end of the list.
One option is to call arr.append(s) instead. This makes the list 1 element longer than it used to be, and puts s in the new slot.
Another option is to create a list of the right size in the first place, like arr = [None for _ in data]. Then, arr[count] = s can replace the None in the countth slot with s.
But if you really just want a copy of data in another list, you're better off just using arr = list(data), or arr = data[:].
And if you don't have any need for another copy, just do arr = data, or just use data as-is—or even, if it works for your needs, just use f in the first place.
Seems like you are coming from a MATLAB or R background. When you do arr = [], it creates an empty list, not an array.
import numpy
count = 0
with open(path, encoding='utf-8-sig') as f:
    data = f.readlines()  # data is the list
size = len(data)
array = numpy.empty((size, 1), dtype=object)  # lines are strings, so use an object array
for s in data:
    array[count, 0] = s
    count += 1
I'm dealing with a big array D with which I'm running into memory problems. However, the entries of that big array are in fact just copies of elements of a much smaller array B. Now my idea would be to use something like a "dynamic view" into B instead of constructing the full D. For example, is it possible to use a function D_fun like an array which the reads the correct element of B? I.e. something like
def D_fun(B, I, J):
i = convert_i_idx(I, J)
j = convert_j_idx(I, J)
return B[i,j]
And then I could use D_fun to do some matrix and vector multiplications.
Of course, anything else that would keep me from copying the elements of B repeatedly into a huge matrix would be appreciated.
Edit: I realized that if I invest some time in my other code I can get the matrix D to be a block matrix with the Bs on the diagonal and zeros otherwise.
This is usually done by subclassing numpy.ndarray and overloading __getitem__, __setitem__, __delitem__
(array-like access via []) to remap the indices like D_fun(..) does. Still, I am not sure if this will work in combination with the numpy parts implemented in C.
Some concerns:
When you're doing calculations on your big matrix D via the small matrix B, numpy might create a copy of D with its real dimensions, thus using more space than wanted.
If several (I1,J1), (I2,J2).. are mapped to the same (i,j), D[I1,J1] = newValue will also set D[I2,J2] to newValue.
np.dot uses compiled libraries to perform fast matrix products. That constrains the data type (integer, floats), and requires that the data be contiguous. I'd suggest studying this recent question about large dot products, numpy: efficient, large dot products
Defining a class with a custom __getitem__ is a way of accessing a object with indexing syntax. Look in numpy/lib/index_tricks.py for some interesting examples of this, np.mgrid,np.r_, np.s_ etc. But this is largely a syntax enhancement. It doesn't avoid the issues of defining a robust and efficient mapping between your D and B.
And before trying to do much with subclassing ndarray take a look at the implementation for np.matrix or np.ma. scipy.sparse also creates classes that behave like ndarray in many ways, but does not subclass ndarray.
In your D_fun, are I and J scalars? If so, this conversion would be horribly inefficient. It would be better if they could be arrays, lists or slices (anything that B[atuple] implements), but that can be a lot of work.
def D_fun(B, I, J):
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return B[i, j]

def __getitem__(self, atuple):
    # sketch of a getitem version of your function
    I, J = atuple
    # <select B based on I, J?>
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return B.__getitem__((i, j))
What is the mapping from D to B like? The simplest, and most efficient mapping would be that D is just a higher dimensional collection of B, i.e.
D = np.array([B0,B1,B2,...,Bn])
D[0,...] == B0
Slightly more complicated is the case where D[n1:n2,....] == B0, a slice
But if the B0 values are scattered around D, your chances of an efficient, reliable mapping are very small.
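Regarding the block-diagonal case mentioned in the question's edit: scipy.sparse can store just the diagonal blocks and still support matrix products, so the dense D is never materialized (a sketch, assuming three copies of the same B):

```python
import numpy as np
from scipy.sparse import block_diag

B = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# D has three Bs on the diagonal and zeros elsewhere,
# but only the nonzero entries are stored
D = block_diag([B, B, B])
print(D.shape)  # (6, 6)

v = np.ones(6)
print(D @ v)    # matrix-vector product without building dense D
```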