Memory leak using pywin32com for opc - python

I am having a hard time trying to figure out how to address leaking memory. I think this might be an issue with pywin32 but I am not completely sure.
My code to do reading/writing individual items seems to work just fine, but when using group functions it slowly leaks memory. I suspect this is from the 1-based array that must be passed in the server_handles.
Does anyone know of a work around?
def read_group(self, group, mode=OPC_SYNC, source=OPC_DS_CACHE):
""" Read a group returning a list of Result tuples or None. """
if group not in self.groups:
raise OPCError("Group does not exist, add first.")
items = []
server_handles = []
for name, item in self.items.items():
if item.group == group:
items.append(name)
server_handles.append(item.server_handle)
if not server_handles:
return None
# Note that arrays are 1 based so must add an element to beginning
server_handles.insert(0, 0)
if mode == OPC_SYNC:
val, err, qual, ts = self.groups[group].SyncRead(
Source=source,
NumItems=len(server_handles) - 1,
ServerHandles=server_handles)
# Map the return arrays into a list of result tuples
result = []
for i in range(len(items)):
result.append(Result(
items[i],
val[i],
OPC_QUALITY[qual[i]],
str(ts[i]),
err[i],
))
return result

Related

Delete element of list in pool.map() python

processPool.map(parserMethod, ((inputFile[line:line + chunkSize], sharedQueue) for line in xrange(0, lengthOfFile, chunkSize)))
Here, I am passing control to parserMethod with a tuple of params inputFile[line:line + chunkSize] and a sharedQueue.
Can anyone tell me how I can delete the elements of inputFile[line:line + chunkSize] after it is passed to the parserMethod ?
Thanks !
del inputFile[line:line + chunkSize]
will remove those items. However, your map is stepping through the entire file, which makes me wonder: are you trying to remove them as they're parsed? This requires the map or parser to alter an input argument, which invites trouble.
If you're only trying to save memory usage, it's a little late: you already saved the entire file in InputFile. If you need only to clean up after the parsing, then use the extreme form of delete, once, after the parsing is finished:
del inputFile[:]
If you want to reduce the memory requirement up front, you have to back up a step. Instead of putting the entire file into a list, try making an nice input pipeline. You didn't post the context of this code, so I'm going to use a generic case with a couple of name assumptions:
def line_chunk_stream(input_stream, chunk_size):
# Generator to return a stream of paring units,
# <chunk_size> lines each.
# To make sure you could check the logic here,
# I avoided several Pythonic short-cuts.
line_count = 0
parse_chunk = []
for line in input_stream:
line_count += 1
parse_chunk.append(line)
if line_count % chunk_size == 0:
yield parse_chunk
del parse_chunk[:]
input_stream = open("source_file", 'r')
parse_stream = line_chunk_stream(input_stream, chunk_size)
parserMethod(parse_stream)
I hope that at least one of these solves your underlying problem.

Python Linear Search Better Efficiency

I've got a question regarding Linear Searching in Python. Say I've got the base code of
for l in lines:
for f in search_data:
if my_search_function(l[1],[f[0],f[2]]):
print "Found it!"
break
in which we want to determine where in search_data exists the value stored in l[1]. Say my_search_function() looks like this:
def my_search_function(search_key, search_values):
for s in search_values:
if search_key in s:
return True
return False
Is there any way to increase the speed of processing? Binary Search would not work in this case, as lines and search_data are multidimensional lists and I need to preserve the indexes. I've tried an outside-in approach, i.e.
for line in lines:
negative_index = -1
positive_index = 0
middle_element = len(search_data) /2 if len(search_data) %2 == 0 else (len(search_data)-1) /2
found = False
while positive_index < middle_element:
# print str(positive_index)+","+str(negative_index)
if my_search_function(line[1], [search_data[positive_index][0],search_data[negative_index][0]]):
print "Found it!"
break
positive_index = positive_index +1
negative_index = negative_index -1
However, I'm not seeing any speed increases from this. Does anyone have a better approach? I'm looking to cut the processing speed in half as I'm working with large amounts of CSV and the processing time for one file is > 00:15 which is unacceptable as I'm processing batches of 30+ files. Basically the data I'm searching on is essentially SKUs. A value from lines[0] could be something like AS123JK and a valid match for that value could be AS123. So a HashMap would not work here, unless there exists a way to do partial matches in a HashMap lookup that wouldn't require me breaking down the values like ['AS123', 'AS123J', 'AS123JK'], which is not ideal in this scenario. Thanks!
Binary Search would not work in this case, as lines and search_data are multidimensional lists and I need to preserve the indexes.
Regardless, it may be worth your while to extract the strings (along with some reference to the original data structure) into a flat list, sort it, and perform fast binary searches on it with help of the bisect module.
Or, instead of a large number of searches, sort also a combined list of all the search keys and traverse both lists in parallel, looking for matches. (Proceeding in a similar manner to the merge step in merge sort, without actually outputting a merged list)
Code to illustrate the second approach:
lines = ['AS12', 'AS123', 'AS123J', 'AS123JK','AS124']
search_keys = ['AS123', 'AS125']
try:
iter_keys = iter(sorted(search_keys))
key = next(iter_keys)
for line in sorted(lines):
if line.startswith(key):
print('Line {} matches {}'.format(line, key))
else:
while key < line[:len(key)]:
key = next(iter_keys)
except StopIteration: # all keys processed
pass
Depends on problem detail.
For instance if you search for complete words, you could create a hashtable on searchable elements, and the final search would be a simple lookup.
Filling the hashtable is pseudo-linear.
Ultimately, I was broke down and implemented Binary Search on my multidimensional lists by sorting using the sorted() function with a lambda as a key argument.Here is the first pass code that I whipped up. It's not 100% efficient, but it's a vast improvement from where we were
def binary_search(master_row, source_data,master_search_index, source_search_index):
lower_bound = 0
upper_bound = len(source_data) - 1
found = False
while lower_bound <= upper_bound and not found:
middle_pos = (lower_bound + upper_bound) // 2
if source_data[middle_pos][source_search_index] < master_row[master_search_index]:
if search([source_data[middle_pos][source_search_index]],[master_row[master_search_index]]):
return {"result": True, "index": middle_pos}
break
lower_bound = middle_pos + 1
elif source_data[middle_pos][source_search_index] > master_row[master_search_index] :
if search([master_row[master_search_index]],[source_data[middle_pos][source_search_index]]):
return {"result": True, "index": middle_pos}
break
upper_bound = middle_pos - 1
else:
if len(source_data[middle_pos][source_search_index]) > 5:
return {"result": True, "index": middle_pos}
else:
break
and then where we actually make the Binary Search call
#where master_copy is the first multidimensional list, data_copy is the second
#the search columns are the columns we want to search against
for line in master_copy:
for m in master_search_columns:
found = False
for d in data_search_columns:
data_copy = sorted(data_copy, key=lambda x: x[d], reverse=False)
results = binary_search(line, data_copy,m, d)
found = results["result"]
if found:
line = update_row(line, data_copy[results["index"]], column_mapping)
found_count = found_count +1
break
if found:
break
Here's the info for sorting a multidimensional list Python Sort Multidimensional Array Based on 2nd Element of Subarray

TypeError: 'filter' object is not subscriptable

I am receiving the error
TypeError: 'filter' object is not subscriptable
When trying to run the following block of code
bonds_unique = {}
for bond in bonds_new:
if bond[0] < 0:
ghost_atom = -(bond[0]) - 1
bond_index = 0
elif bond[1] < 0:
ghost_atom = -(bond[1]) - 1
bond_index = 1
else:
bonds_unique[repr(bond)] = bond
continue
if sheet[ghost_atom][1] > r_length or sheet[ghost_atom][1] < 0:
ghost_x = sheet[ghost_atom][0]
ghost_y = sheet[ghost_atom][1] % r_length
image = filter(lambda i: abs(i[0] - ghost_x) < 1e-2 and
abs(i[1] - ghost_y) < 1e-2, sheet)
bond[bond_index] = old_to_new[sheet.index(image[0]) + 1 ]
bond.sort()
#print >> stderr, ghost_atom +1, bond[bond_index], image
bonds_unique[repr(bond)] = bond
# Removing duplicate bonds
bonds_unique = sorted(bonds_unique.values())
And
sheet_new = []
bonds_new = []
old_to_new = {}
sheet=[]
bonds=[]
The error occurs at the line
bond[bond_index] = old_to_new[sheet.index(image[0]) + 1 ]
I apologise that this type of question has been posted on SO many times, but I am fairly new to Python and do not fully understand dictionaries. Am I trying to use a dictionary in a way in which it should not be used, or should I be using a dictionary where I am not using it?
I know that the fix is probably very simple (albeit not to me), and I will be very grateful if someone could point me in the right direction.
Once again, I apologise if this question has been answered already
Thanks,
Chris.
I am using Python IDLE 3.3.1 on Windows 7 64-bit.
filter() in python 3 does not return a list, but an iterable filter object. Use the next() function on it to get the first filtered item:
bond[bond_index] = old_to_new[sheet.index(next(image)) + 1 ]
There is no need to convert it to a list, as you only use the first value.
Iterable objects like filter() produce results on demand rather than all in one go. If your sheet list is very large, it might take a long time and a lot of memory to put all the filtered results into a list, but filter() only needs to evaluate your lambda condition until one of the values from sheet produces a True result to produce one output. You tell the filter() object to scan through sheet for that first value by passing it to the next() function. You could do so multiple times to get multiple values, or use other tools that take iterables to do more complex things; the itertools library is full of such tools. The Python for loop is another such a tool, it too takes values from an iterable one by one.
If you must have access to all filtered results together, because you have to, say, index into the results at will (e.g. because this time your algorithm needed to access index 223, index 17 then index 42) only then convert the iterable object to a list, by using list():
image = list(filter(lambda i: ..., sheet))
The ability to access any of the values of an ordered sequence of values is called random access; a list is such a sequence, and so is a tuple or a numpy array. Iterables do not provide random access.
Use list before filter condtion then it works fine. For me it resolved the issue.
For example
list(filter(lambda x: x%2!=0, mylist))
instead of
filter(lambda x: x%2!=0, mylist)
image = list(filter(lambda i: abs(i[0] - ghost_x) < 1e-2 and abs(i[1] - ghost_y) < 1e-2, sheet))

Size-Incremental Numpy Array in Python

I just came across the need of an incremental Numpy array in Python, and since I haven't found anything I implemented it. I'm just wondering if my way is the best way or you can come up with other ideas.
So, the problem is that I have a 2D array (the program handles nD arrays) for which the size is not known in advance and variable amount of data need to be concatenated to the array in one direction (let's say that I've to call np.vstak a lot of times). Every time I concatenate data, I need to take the array, sort it along axis 0 and do other stuff, so I cannot construct a long list of arrays and then np.vstak the list at once.
Since memory allocation is expensive, I turned to incremental arrays, where I increment the size of the array of a quantity bigger than the size I need (I use 50% increments), so that I minimize the number of allocations.
I coded this up and you can see it in the following code:
class ExpandingArray:
__DEFAULT_ALLOC_INIT_DIM = 10 # default initial dimension for all the axis is nothing is given by the user
__DEFAULT_MAX_INCREMENT = 10 # default value in order to limit the increment of memory allocation
__MAX_INCREMENT = [] # Max increment
__ALLOC_DIMS = [] # Dimensions of the allocated np.array
__DIMS = [] # Dimensions of the view with data on the allocated np.array (__DIMS <= __ALLOC_DIMS)
__ARRAY = [] # Allocated array
def __init__(self,initData,allocInitDim=None,dtype=np.float64,maxIncrement=None):
self.__DIMS = np.array(initData.shape)
self.__MAX_INCREMENT = maxIncrement
if self.__MAX_INCREMENT == None:
self.__MAX_INCREMENT = self.__DEFAULT_MAX_INCREMENT
# Compute the allocation dimensions based on user's input
if allocInitDim == None:
allocInitDim = self.__DIMS.copy()
while np.any( allocInitDim < self.__DIMS ) or np.any(allocInitDim == 0):
for i in range(len(self.__DIMS)):
if allocInitDim[i] == 0:
allocInitDim[i] = self.__DEFAULT_ALLOC_INIT_DIM
if allocInitDim[i] < self.__DIMS[i]:
allocInitDim[i] += min(allocInitDim[i]/2, self.__MAX_INCREMENT)
# Allocate memory
self.__ALLOC_DIMS = allocInitDim
self.__ARRAY = np.zeros(self.__ALLOC_DIMS,dtype=dtype)
# Set initData
sliceIdxs = [slice(self.__DIMS[i]) for i in range(len(self.__DIMS))]
self.__ARRAY[sliceIdxs] = initData
def shape(self):
return tuple(self.__DIMS)
def getAllocArray(self):
return self.__ARRAY
def getDataArray(self):
"""
Get the view of the array with data
"""
sliceIdxs = [slice(self.__DIMS[i]) for i in range(len(self.__DIMS))]
return self.__ARRAY[sliceIdxs]
def concatenate(self,X,axis=0):
if axis > len(self.__DIMS):
print "Error: axis number exceed the number of dimensions"
return
# Check dimensions for remaining axis
for i in range(len(self.__DIMS)):
if i != axis:
if X.shape[i] != self.shape()[i]:
print "Error: Dimensions of the input array are not consistent in the axis %d" % i
return
# Check whether allocated memory is enough
needAlloc = False
while self.__ALLOC_DIMS[axis] < self.__DIMS[axis] + X.shape[axis]:
needAlloc = True
# Increase the __ALLOC_DIMS
self.__ALLOC_DIMS[axis] += min(self.__ALLOC_DIMS[axis]/2,self.__MAX_INCREMENT)
# Reallocate memory and copy old data
if needAlloc:
# Allocate
newArray = np.zeros(self.__ALLOC_DIMS)
# Copy
sliceIdxs = [slice(self.__DIMS[i]) for i in range(len(self.__DIMS))]
newArray[sliceIdxs] = self.__ARRAY[sliceIdxs]
self.__ARRAY = newArray
# Concatenate new data
sliceIdxs = []
for i in range(len(self.__DIMS)):
if i != axis:
sliceIdxs.append(slice(self.__DIMS[i]))
else:
sliceIdxs.append(slice(self.__DIMS[i],self.__DIMS[i]+X.shape[i]))
self.__ARRAY[sliceIdxs] = X
self.__DIMS[axis] += X.shape[axis]
The code shows considerably better performances than vstack/hstack several random sized concatenations.
What I'm wondering about is: is it the best way? Is there anything that do this already in numpy?
Further it would be nice to be able to overload the slice assignment operator of np.array, so that as soon as the user assign anything outside the actual dimensions, an ExpandingArray.concatenate() is performed. How to do such overloading?
Testing code: I post here also some code I used to make comparison between vstack and my method. I add up random chunk of data of maximum length 100.
import time
N = 10000
def performEA(N):
EA = ExpandingArray(np.zeros((0,2)),maxIncrement=1000)
for i in range(N):
nNew = np.random.random_integers(low=1,high=100,size=1)
X = np.random.rand(nNew,2)
EA.concatenate(X,axis=0)
# Perform operations on EA.getDataArray()
return EA
def performVStack(N):
A = np.zeros((0,2))
for i in range(N):
nNew = np.random.random_integers(low=1,high=100,size=1)
X = np.random.rand(nNew,2)
A = np.vstack((A,X))
# Perform operations on A
return A
start_EA = time.clock()
EA = performEA(N)
stop_EA = time.clock()
start_VS = time.clock()
VS = performVStack(N)
stop_VS = time.clock()
print "Elapsed Time EA: %.2f" % (stop_EA-start_EA)
print "Elapsed Time VS: %.2f" % (stop_VS-start_VS)
I think the most common design pattern for these things is to just use a list for the small arrays. Sure you could do things like dynamic resizing (if you want to do crazy things, you can try to use the resize array method too). I think a typical method is to always double the size, when you really don't know how large things will be. Of course if you know how large the array will grow to, just allocating the full thing up front is simplest.
def performVStack_fromlist(N):
l = []
for i in range(N):
nNew = np.random.random_integers(low=1,high=100,size=1)
X = np.random.rand(nNew,2)
l.append(X)
return np.vstack(l)
I am sure there are some use cases where an expanding array could be useful (for example when the appending arrays are all very small), but this loop seems better handled with the above pattern. The optimization is mostly about how often you need to copy everything around, and doing a list like this (other then the list itself) this is exactly once here. So it is much faster normally.
When I faced a similar problem, I used ndarray.resize() (http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.resize.html#numpy.ndarray.resize). Most of the time, it will avoid reallocation+copying altogether. I can't guarantee it would prove to be faster (it probably would), but it's so much simpler.
As for your second question, I think overriding slice assignment for extending purposes is not a good idea. That operator is meant for assigning to existing items/slices. If you want to change that, it's not immediately clear how you'd want it to behave in some cases, e.g.:
a = MyExtendableArray(np.arange(100))
a[200] = 6 # resize to 200? pad [100:200] with what?
a[90:110] = 7 # assign to existing items AND automagically-allocated items?
a[::-1][200] = 6 # ...
My suggestion is that slice-assignment and data appending should remain separate.

depth-first algorithm in python does not work

I have some project which I decide to do in Python. In brief: I have list of lists. Each of them also have lists, sometimes one-element, sometimes more. It looks like this:
rules=[
[[1],[2],[3,4,5],[4],[5],[7]]
[[1],[8],[3,7,8],[3],[45],[12]]
[[31],[12],[43,24,57],[47],[2],[43]]
]
The point is to compare values from numpy array to values from this rules (elements of rules table). We are comparing some [x][y] point to first element (e.g. 1 in first element), then, if it is true, value [x-1][j] from array with second from list and so on. Five first comparisons must be true to change value of [x][y] point. I've wrote sth like this (main function is SimulateLoop, order are switched because simulate2 function was written after second one):
def simulate2(self, i, j, w, rule):
data = Data(rule)
if w.world[i][j] in data.c:
if w.world[i-1][j] in data.n:
if w.world[i][j+1] in data.e:
if w.world[i+1][j] in data.s:
if w.world[i][j-1] in data.w:
w.world[i][j] = data.cc[0]
else: return
else: return
else: return
else: return
else: return
def SimulateLoop(self,w):
for z in range(w.steps):
for i in range(2,w.x-1):
for j in range(2,w.y-1):
for rule in w.rules:
self.simulate2(i,j,w,rule)
Data class:
class Data:
def __init__(self, rule):
self.c = rule[0]
self.n = rule[1]
self.e = rule[2]
self.s = rule[3]
self.w = rule[4]
self.cc = rule[5]
NumPy array is a object from World class. Rules is list as described above, parsed by function obtained from another program (GPL License).
To be honest it seems to work fine, but it does not. I was trying other possibilities, without luck. It is working, interpreter doesn't return any errors, but somehow values in array changing wrong. Rules are good because it was provided by program from which I've obtained parser for it (GPL license).
Maybe it will be helpful - it is Perrier's Loop, modified Langton's loop (artificial life).
Will be very thankful for any help!
)
I am not familiar with Perrier's Loop, but if you code something like famous "game life" you would have done simple mistake: store the next generation in the same array thus corrupting it.
Normally you store the next generation in temporary array and do copy/swap after the sweep, like in this sketch:
def do_step_in_game_life(world):
next_gen = zeros(world.shape) # <<< Tmp array here
Nx, Ny = world.shape
for i in range(1, Nx-1):
for j in range(1, Ny-1):
neighbours = sum(world[i-1:i+2, j-1:j+2]) - world[i,j]
if neighbours < 3:
next_gen[i,j] = 0
elif ...
world[:,:] = next_gen[:,:] # <<< Saving computed next generation

Categories