Separate overlapping polygons into regions - python

I came across the following post on Stack Overflow: Exploding overlapping polygons.
I downloaded the source code posted by the original author and made adjustments trying to get it to work, but I'm currently receiving the error message below and am not sure how to resolve it. Please be advised that I'm still learning to code, so I'm lacking fundamental theory.
Error message:
Executing: OverlapReg E:\Projects\2015\H111225_6\ArcHydro\27Jan15\01SouthNorthAlign\OverlappingWatershedsAnalysis.gdb\Watershed HydroID2
Start Time: Wed Mar 11 14:58:32 2015
Running script OverlapReg...
Failed script OverlapReg...
Traceback (most recent call last):
  File "E:\Python\Masters\Scripts\OverlappingRegions\OverlappingRegions.py", line 59, in <module>
    countOverlaps(fc,idName)
  File "E:\Python\Masters\Scripts\OverlappingRegions\OverlappingRegions.py", line 58, in countOverlaps
    urows.updateRow(urow)
  File "c:\program files (x86)\arcgis\desktop10.2\arcpy\arcpy\arcobjects\arcobjects.py", line 102, in updateRow
    return convertArcObjectToPythonObject(self._arc_object.UpdateRow(*gp_fixargs(args)))
RuntimeError: ERROR 999999: Error executing function.
The row contains a bad value. [Watershed]
The row contains a bad value. [overlaps]
Failed to execute (OverlapReg).
Failed at Wed Mar 11 14:58:35 2015 (Elapsed Time: 2.45 seconds)
I'm trying to assign IDs to my Watershed feature class with the code below, so that I can split the Watershed feature class into the smallest number of separate feature classes in which the watersheds don't overlap each other. I need to export them to an AutoCAD drawing, where a single layer may not contain overlapping features.
import os
import arcpy
from arcpy import GetParameterAsText

fc = GetParameterAsText(0)
idName = GetParameterAsText(1)
dirname = os.path.dirname(arcpy.Describe(fc).catalogPath)
desc = arcpy.Describe(dirname)
if hasattr(desc, "datasetType") and desc.datasetType == 'FeatureDataset':
    dirname = os.path.dirname(dirname)
arcpy.env.workspace = dirname

def countOverlaps(fc, idName):
    intersect = arcpy.Intersect_analysis(fc, 'intersect')
    findID = arcpy.FindIdentical_management(intersect, "explFindID", "Shape")
    arcpy.MakeFeatureLayer_management(intersect, "intlyr")
    arcpy.AddJoin_management("intlyr", arcpy.Describe("intlyr").OIDfieldName, findID, "IN_FID", "KEEP_ALL")
    segIDs = {}
    featseqName = "explFindID.FEAT_SEQ"
    idNewName = "intersect." + idName
    for row in arcpy.SearchCursor("intlyr"):
        idVal = row.getValue(idNewName)
        featseqVal = row.getValue(featseqName)
        segIDs[featseqVal] = []
    for row in arcpy.SearchCursor("intlyr"):
        idVal = row.getValue(idNewName)
        featseqVal = row.getValue(featseqName)
        segIDs[featseqVal].append(idVal)
    segIDs2 = {}
    for row in arcpy.SearchCursor("intlyr"):
        idVal = row.getValue(idNewName)
        segIDs2[idVal] = []
    for x, y in segIDs.iteritems():
        for segID in y:
            segIDs2[segID].extend([k for k in y if k != segID])
    for x, y in segIDs2.iteritems():
        segIDs2[x] = list(set(y))
    arcpy.RemoveJoin_management("intlyr", arcpy.Describe(findID).name)
    if 'overlaps' not in [k.name for k in arcpy.ListFields(fc)]:
        arcpy.AddField_management(fc, 'overlaps', "TEXT")
    if 'ovlpCount' not in [k.name for k in arcpy.ListFields(fc)]:
        arcpy.AddField_management(fc, 'ovlpCount', "SHORT")
    urows = arcpy.UpdateCursor(fc)
    for urow in urows:
        idVal = urow.getValue(idName)
        if segIDs2.get(idVal):
            urow.overlaps = str(segIDs2[idVal]).strip('[]')
            urow.ovlpCount = len(segIDs2[idVal])
        urows.updateRow(urow)

countOverlaps(fc, idName)

def explodeOverlaps(fc, idName):
    countOverlaps(fc, idName)
    arcpy.AddField_management(fc, 'expl', "SHORT")
    urows = arcpy.UpdateCursor(fc, '"overlaps" IS NULL')
    for urow in urows:
        urow.expl = 1
        urows.updateRow(urow)
    i = 1
    lyr = arcpy.MakeFeatureLayer_management(fc)
    while int(arcpy.GetCount_management(arcpy.SelectLayerByAttribute_management(lyr, "NEW_SELECTION", '"expl" IS NULL')).getOutput(0)) > 0:
        ovList = []
        urows = arcpy.UpdateCursor(fc, '"expl" IS NULL', '', '', 'ovlpCount D')
        for urow in urows:
            ovVal = urow.overlaps
            idVal = urow.getValue(idName)
            intList = ovVal.replace(' ', '').split(',')
            for x in intList:
                intList[intList.index(x)] = int(x)
            if idVal not in ovList:
                urow.expl = i
                urows.updateRow(urow)
                ovList.extend(intList)
        i += 1

explodeOverlaps(fc, idName)
Any assistance in resolving this would be truly appreciated.

The clues are in the errors.
the row contains a bad value [Watershed]
the row contains a bad value [overlaps]
This is most likely caused by trying to insert a value into the field overlaps that violates one of the field's properties: for example, the field length is 4 and your value is "long string", which is therefore too big to be inserted.
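A minimal sketch of that fix (mine, not from the thread), assuming it is the overlaps string that overflows: create the field with an explicit length before countOverlaps() fills it.
if 'overlaps' not in [f.name for f in arcpy.ListFields(fc)]:
    # field_length is the important part; the default TEXT length can be too
    # short for a long comma-separated ID list (1000 is an assumption; size
    # it to your data)
    arcpy.AddField_management(fc, 'overlaps', "TEXT", field_length=1000)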

Related

How to Access WorkPlanes of Occurrences in Assemblies using the Inventor API with Python

I want to add constraints between the planes of the base coordinate systems of components in an Inventor assembly using the Inventor API with Python. The placement of the components works. My issue is that I cannot access WorkPlanes in the Definition of my individual Occurrences.
My Code looks like this:
import win32com.client as win32
project_folder = "C:\\Users\\User_1\\210608_project\\"
#initialization
inv = win32.gencache.EnsureDispatch('Inventor.Application')
inv.Visible = True
#Open a new assembly
inv.Documents.Add(win32.constants.kAssemblyDocumentObject, "", True)
invActDoc = inv.ActiveDocument
invAssDoc = win32.CastTo(invActDoc, 'AssemblyDocument')
#Create the transient matrices
oTG = inv.TransientGeometry
oMatrix = oTG.CreateMatrix()
#Add component to assembly
invAssDocDef = invAssDoc.ComponentDefinition
invAssOcc = invAssDocDef.Occurrences
occ1 = invAssOcc.Add(project_folder + 'generic_part_1.ipt', oMatrix)
occ2 = invAssOcc.Add(project_folder + 'generic_part_2.ipt', oMatrix)
#create constraints
#get the Planes of the Base-Coordinate-System of Part 1
wp_YZ_1 = occ1.Definition.WorkPlanes.Item(1)
wp_XZ_1 = occ1.Definition.WorkPlanes.Item(2)
wp_XY_1 = occ1.Definition.WorkPlanes.Item(3)
#get the Planes of the Base-Coordinate-System of Part 2
wp_YZ_2 = occ2.Definition.WorkPlanes.Item(1)
wp_XZ_2 = occ2.Definition.WorkPlanes.Item(2)
wp_XY_2 = occ2.Definition.WorkPlanes.Item(3)
#Add the constraints
AssCons = invAssDoc.ComponentDefinition.Constraints
AssCons.AddFlushConstraint(wp_YZ_1, wp_YZ_2, 0)
AssCons.AddFlushConstraint(wp_XZ_1, wp_XZ_2, 0)
AssCons.AddFlushConstraint(wp_XY_1, wp_XY_2, 0)
It breaks when I try to get the WorkPlanes:
Traceback (most recent call last):
  File "C:/Users/User1/210608_projekt/how_to_constrain_components_in_assemblies.py", line 27, in <module>
    wp1 = occ1.Definition.WorkPlanes.Item(1)
  File "C:\Program Files\Python37\lib\site-packages\win32com\client\__init__.py", line 473, in __getattr__
    raise AttributeError("'%s' object has no attribute '%s'" % (repr(self), attr))
AttributeError: '<win32com.gen_py.Autodesk Inventor Object Library.ComponentDefinition instance at 0x2748634277928>' object has no attribute 'WorkPlanes'
This happens with everything I tried that is inside Occurrence.Item(i).Definition.
If I open the same assembly in a VBA script, everything is where it should be. Am I missing something about working with occurrences through the API?
You need to create a WorkPlaneProxy object for the WorkPlane, i.e. a representation of the work plane defined in the part, in the context of the specific occurrence in the assembly.
Here is part of the VB.NET code:
'Define variables for workplane proxy
Dim wp_YZ_1_proxy As WorkPlaneProxy
Dim wp_YZ_2_proxy As WorkPlaneProxy
'You need to pass result variable as argument
' ByRef in VB.NET, out in C#
'I don't know how to do in Python
occ1.CreateGeometryProxy(wp_YZ_1, wp_YZ_1_proxy)
occ2.CreateGeometryProxy(wp_YZ_2, wp_YZ_2_proxy)
Dim AssCons As AssemblyConstraints = asm.Constraints
'Use this proxies for constraint creation
AssCons.AddFlushConstraint(wp_YZ_1_proxy, wp_YZ_2_proxy, 0)
So this is what the solution (replacing the last three blocks) looks like in Python. Note that win32com's generated wrappers return COM [out] parameters as return values, which is why the proxy can simply be assigned from CreateGeometryProxy:
#cast the definitions to PartComponentDefinition
occ1_def = win32.CastTo(occ1.Definition, 'PartComponentDefinition')
occ2_def = win32.CastTo(occ2.Definition, 'PartComponentDefinition')
#create constraints
#get the Planes of the Base-Coordinate-System of Part 1
wp_YZ_1 = occ1_def.WorkPlanes.Item(1)
wp_XZ_1 = occ1_def.WorkPlanes.Item(2)
wp_XY_1 = occ1_def.WorkPlanes.Item(3)
#create Geometry-Proxys for Workplanes of Comp1
wp_YZ_1_proxy = occ1.CreateGeometryProxy(wp_YZ_1)
wp_XZ_1_proxy = occ1.CreateGeometryProxy(wp_XZ_1)
wp_XY_1_proxy = occ1.CreateGeometryProxy(wp_XY_1)
#get the Planes of the Base-Coordinate-System of Part 2
wp_YZ_2 = occ2_def.WorkPlanes.Item(1)
wp_XZ_2 = occ2_def.WorkPlanes.Item(2)
wp_XY_2 = occ2_def.WorkPlanes.Item(3)
#create Geometry-Proxys for Workplanes of Comp2
wp_YZ_2_proxy = occ2.CreateGeometryProxy(wp_YZ_2)
wp_XZ_2_proxy = occ2.CreateGeometryProxy(wp_XZ_2)
wp_XY_2_proxy = occ2.CreateGeometryProxy(wp_XY_2)
#Add the constraints
AssCons = invAssDoc.ComponentDefinition.Constraints
AssCons.AddFlushConstraint(wp_YZ_1_proxy, wp_YZ_2_proxy, 0)
AssCons.AddFlushConstraint(wp_XZ_1_proxy, wp_XZ_2_proxy, 0)
AssCons.AddFlushConstraint(wp_XY_1_proxy, wp_XY_2_proxy, 0)

Cannot generate subsets of feature class with arcpy (ArcGIS library in Python 2.7)

I'm having a hard time here processing GIS data in Python with the ArcPy library.
I've been trying to generate independent features from a feature class, based on an attribute-table field holding a unique code that represents productive forest units, but I can't get it done.
I've already done this in other situations, but this time I don't know what I am missing.
Here is the code and the error I get:
# coding utf-8
import arcpy

arcpy.env.overwriteOutput = True

ws = r'D:\Projeto_VANT\SIG\proc_parc.gdb'
arcpy.env.workspace = ws

talhoes = r'copy_talhoes'
estados = ('SP', 'MG')
florestas = ('PROPRIA', 'PARCERIA')

arcpy.MakeFeatureLayer_management(talhoes,
                                  'talhoes_layer',
                                  """ "ESTADO" IN {} AND "FLORESTA" IN {} """.format(estados, florestas),
                                  ws)

arcpy.FeatureClassToFeatureClass_conversion(in_features='talhoes_layer',
                                            out_path=ws,
                                            out_name='talhoes1')

talhoes1 = r'talhoes1'
arcpy.AddField_management(talhoes1, 'CONCAT_T', 'TEXT')
arcpy.CalculateField_management(talhoes1, 'CONCAT_T', """ [ESTADO] & "_" & [CODIGO] & "_" & [TALHAO] """, 'VB')

with arcpy.da.SearchCursor(talhoes1, ['CONCAT_T', 'AREA']) as tal_cursor:
    for x in tal_cursor:
        print(x[0] + " " + str(x[1]))  # This print is just to check if the cursor works, and it does!
        arcpy.MakeFeatureLayer_management(x,
                                          'teste',
                                          """ CONCAT_T = '{}' """.format(str(x[0])))  # Apparently the problem is here!
        arcpy.CopyFeatures_management('teste',
                                      'Layer{}'.format(x[0]))
Here is the error:
Traceback (most recent call last):
  File "D:/ArcPy_Classes/Scripts/sampling_sig.py", line 32, in <module>
    """ CONCAT_T = '{}' """.format(str(x[0])))
  File "C:\Program Files (x86)\ArcGIS\Desktop10.5\ArcPy\arcpy\management.py", line 6965, in MakeFeatureLayer
    raise e
RuntimeError: Object: Error in executing tool
I think the issue is with your in feature: you will want your in feature to be talhoes1, since x is a row from the cursor and not a feature.
arcpy.MakeFeatureLayer_management(talhoes1, 'teste', """ CONCAT_T = '{}' """.format(str(x[0])))

Python scripting with ete3 to query NCBI's Taxonomy: "sqlite3 Warning (can only execute one statement at a time)"

I am using this script:
import csv
import time
import sys
from ete3 import NCBITaxa

ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    lineage = ncbi.get_lineage(taxid)
    names = ncbi.get_taxid_translator(lineage)
    lineage2ranks = ncbi.get_rank(names)
    ranks2lineage = dict((rank, taxid) for (taxid, rank) in lineage2ranks.items())
    return {'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

if __name__ == '__main__':
    file = open(sys.argv[1], "r")
    taxids = []
    contigs = []
    for line in file:
        line = line.split("\n")[0]
        taxids.append(line.split(",")[0])
        contigs.append(line.split(",")[1])
    desired_ranks = ['superkingdom', 'phylum']
    results = list()
    for taxid in taxids:
        results.append(list())
        results[-1].append(str(taxid))
        ranks = get_desired_ranks(taxid, desired_ranks)
        for key, rank in ranks.items():
            if rank != '<not present>':
                results[-1].append(list(ncbi.get_taxid_translator([rank]).values())[0])
            else:
                results[-1].append(rank)
    i = 0
    for result in results:
        print(contigs[i] + ','),
        print(','.join(result))
        i += 1
    file.close()
The script takes taxids from a file and fetches their respective lineages from a local copy of NCBI's Taxonomy database. Strangely, this script works fine when I run it on small sets of taxids (~70, ~100), but most of my datasets are upwards of 280k taxids and these break the script.
I get this complete error:
Traceback (most recent call last):
  File "/data1/lstout/blast/scripts/getLineageByETE3.py", line 31, in <module>
    ranks = get_desired_ranks(taxid, desired_ranks)
  File "/data1/lstout/blast/scripts/getLineageByETE3.py", line 11, in get_desired_ranks
    lineage = ncbi.get_lineage(taxid)
  File "/data1/lstout/.local/lib/python2.7/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 227, in get_lineage
    result = self.db.execute('SELECT track FROM species WHERE taxid=%s' %taxid)
sqlite3.Warning: You can only execute one statement at a time.
The first two files in the traceback are simply the script I referenced above; the third file is one of ete3's. And as I stated, the script works fine with small datasets.
What I have tried:
Importing the time module and sleeping for a few milliseconds/hundredths of a second before/after my offending lines of code on lines 11 and 31. No effect.
Went to line 227 in ete3's code...
result = self.db.execute('SELECT track FROM species WHERE taxid=%s' %merged_conversion[taxid])
and changed the "execute" function to "executescript" in order to handle multiple queries at once (as that seems to be the problem). This produced a new error and led me down a rabbit hole of changing minor things in their script, trying to fudge it into working. No result. This is the complete offending function:
def get_lineage(self, taxid):
    """Given a valid taxid number, return its corresponding lineage track as a
    hierarchically sorted list of parent taxids.
    """
    if not taxid:
        return None
    result = self.db.execute('SELECT track FROM species WHERE taxid=%s' % taxid)
    raw_track = result.fetchone()
    if not raw_track:
        # perhaps is an obsolete taxid
        _, merged_conversion = self._translate_merged([taxid])
        if taxid in merged_conversion:
            result = self.db.execute('SELECT track FROM species WHERE taxid=%s' % merged_conversion[taxid])
            raw_track = result.fetchone()
        # if not raise error
        if not raw_track:
            #raw_track = ["1"]
            raise ValueError("%s taxid not found" % taxid)
        else:
            warnings.warn("taxid %s was translated into %s" % (taxid, merged_conversion[taxid]))
    track = list(map(int, raw_track[0].split(",")))
    return list(reversed(track))
What bothers me so much is that this works on small amounts of data! I'm running these scripts from my school's high performance computer and have tried running on their head node and in an interactive moab scheduler. Nothing has helped.
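For what it's worth, sqlite3 raises "You can only execute one statement at a time" when the SQL string handed to execute() contains more than one statement. Since get_lineage() interpolates the taxid directly into the query with %s, a single malformed taxid somewhere in a 280k-line file (a stray semicolon or other junk in the first column) would produce exactly this pattern: small, clean files work and the big ones break. A minimal guard of mine (hypothetical, not from the thread) to validate each taxid before calling get_lineage():
def clean_taxid(raw):
    # strip() also drops the '\r' that split("\n") leaves behind on CRLF files
    raw = raw.strip()
    if not raw.isdigit():
        raise ValueError("malformed taxid: %r" % raw)
    return int(raw)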

Optimize output of a script by varying input parameters

I have written a script that uses the code below, and I would like to optimize rsi_high and rsi_low to get the best sharpe_ratio:
import numpy
import talib as ta

global rsi_high, rsi_low
rsi_high = 63
rsi_low = 41

def myTradingSystem(DATE, OPEN, HIGH, LOW, CLOSE, VOL, exposure, equity, settings):
    ''' This system uses trend following techniques to allocate capital into the desired equities'''
    nMarkets = CLOSE.shape[1]  # SHAPE OF NUMPY ARRAY
    result, rsi_pos = numpy.apply_along_axis(rsicalc, axis=0, arr=CLOSE)
    pos = numpy.asarray(rsi_pos, dtype=numpy.float64)
    return pos, settings

def rsicalc(num):
    # print rsi_high
    try:
        rsival = ta.RSI(numpy.array(num, dtype='f8'), timeperiod=14)
        if rsival[14] > rsi_high: pos_rsi = 1
        elif rsival[14] < rsi_low: pos_rsi = -1
        else: pos_rsi = 0
    except:
        rsival = 0
        pos_rsi = 0
    return rsival, pos_rsi

def mySettings():
    ''' Define your trading system settings here '''
    settings = {}
    # Futures Contracts
    settings['markets'] = ['CASH', 'F_AD', 'F_BO', 'F_BP', 'F_C', 'F_CC', 'F_CD',
        'F_CL', 'F_CT', 'F_DX', 'F_EC', 'F_ED', 'F_ES', 'F_FC', 'F_FV', 'F_GC',
        'F_HG', 'F_HO', 'F_JY', 'F_KC', 'F_LB', 'F_LC', 'F_LN', 'F_MD', 'F_MP',
        'F_NG', 'F_NQ', 'F_NR', 'F_O', 'F_OJ', 'F_PA', 'F_PL', 'F_RB', 'F_RU',
        'F_S', 'F_SB', 'F_SF', 'F_SI', 'F_SM', 'F_TU', 'F_TY', 'F_US', 'F_W',
        'F_XX', 'F_YM']
    settings['slippage'] = 0.05
    settings['budget'] = 1000000
    settings['beginInSample'] = '19900101'
    settings['endInSample'] = '19931231'
    settings['lookback'] = 504
    return settings

# Evaluate trading system defined in current file.
if __name__ == '__main__':
    import quantiacsToolbox
    results = quantiacsToolbox.runts(__file__, plotEquity=False)
    sharpe_ratio = results['stats']['sharpe']
I suspect that something like scipy's minimize function would do the trick, but I am having trouble understanding how to package my script into a form it can use.
I have tried putting everything in a function and then running the code through a number of loops, incrementing the values each time, but there must be a more elegant way of doing this.
Apologies for posting all my code, but I thought it would help if a responder wanted to reproduce my setup, and it gives anyone new to quantiacs who faces the same issue a real example.
Thanks for your help in advance!
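One straightforward framing, before reaching for scipy (an illustrative sketch of mine, not from the thread): treat the backtest as a black-box objective of the two integer thresholds and search a grid. backtest_sharpe below is a placeholder standing in for "run the trading system with these thresholds and return results['stats']['sharpe']"; wiring that up, e.g. by refactoring rsi_high and rsi_low into function arguments before calling quantiacsToolbox.runts, is the part specific to the Quantiacs setup.
import itertools

def backtest_sharpe(rsi_high, rsi_low):
    # placeholder objective so the sketch runs standalone; replace with a
    # call that runs the backtest and returns the Sharpe ratio
    return -((rsi_high - 63) ** 2 + (rsi_low - 41) ** 2)

best_params, best_sharpe = None, float('-inf')
for hi, lo in itertools.product(range(50, 80), range(20, 50)):
    if lo >= hi:  # keep the thresholds ordered
        continue
    s = backtest_sharpe(hi, lo)
    if s > best_sharpe:
        best_sharpe, best_params = s, (hi, lo)
print("best params: %s, sharpe: %.3f" % (best_params, best_sharpe))
Once a coarse grid narrows the region, something like scipy.optimize could refine it, but with only two integer parameters an exhaustive sweep is often good enough.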

Cannot grok python multiprocessing

I need to run a function for each of the elements of my database.
When I try the following:
from multiprocessing import Pool
from pymongo import Connection

def foo():
    ...

connection1 = Connection('127.0.0.1', 27017)
db1 = connection1.data
my_pool = Pool(6)
my_pool.map(foo, db1.index.find())
I'm getting the following error:
Job 1, 'python myscript.py ' terminated by signal SIGKILL (Forced quit)
Which is, I think, caused by db1.index.find() eating all the available RAM while trying to return millions of database elements...
How should I modify my code for it to work?
Some logs are here:
dmesg | tail -500 | grep memory
[177886.768927] Out of memory: Kill process 3063 (python) score 683 or sacrifice child
[177891.001379] [<ffffffff8110e51a>] out_of_memory+0xfa/0x250
[177891.021362] Out of memory: Kill process 3063 (python) score 684 or sacrifice child
[177891.025399] [<ffffffff8110e51a>] out_of_memory+0xfa/0x250
The actual function is below:
def create_barrel(item):
    connection = Connection('127.0.0.1', 27017)
    db = connection.data
    print db.index.count()
    barrel = []
    fls = []
    if 'name' in item.keys():
        barrel.append(WhitespaceTokenizer().tokenize(item['name']))
        name = item['name']
    elif 'name.utf-8' in item.keys():
        barrel.append(WhitespaceTokenizer().tokenize(item['name.utf-8']))
        name = item['name.utf-8']
    else:
        print item.keys()
    if 'files' in item.keys():
        for file in item['files']:
            if 'path' in file.keys():
                barrel.append(WhitespaceTokenizer().tokenize(" ".join(file['path'])))
                fls.append(("\\".join(file['path']), file['length']))
            elif 'path.utf-8' in file.keys():
                barrel.append(WhitespaceTokenizer().tokenize(" ".join(file['path.utf-8'])))
                fls.append(("\\".join(file['path.utf-8']), file['length']))
            else:
                print file
                barrel.append(WhitespaceTokenizer().tokenize(file))
    if len(fls) < 1:
        fls.append((name, item['length']))
    barrel = sum(barrel, [])
    for s in barrel:
        vs = re.findall("\d[\d|\.]*\d", s)  # versions, i.e. numbers such as 4.2.7500
    b0 = []
    for s in barrel:
        b0.append(re.split("[" + string.punctuation + "]", s))
    b1 = filter(lambda x: x not in string.punctuation, sum(b0, []))
    flag = True
    while flag:
        bb = []
        flag = False
        for bt in b1:
            if bt[0] in string.punctuation:
                bb.append(bt[1:])
                flag = True
            elif bt[-1] in string.punctuation:
                bb.append(bt[:-1])
                flag = True
            else:
                bb.append(bt)
        b1 = bb
    b2 = b1 + barrel + vs
    b3 = list(set(b2))
    b4 = map(lambda x: x.lower(), b3)
    b_final = {}
    b_final['_id'] = item['_id']
    b_final['tags'] = b4
    b_final['name'] = name
    b_final['files'] = fls
    print db.barrels.insert(b_final)
I've noticed an interesting thing. When I press Ctrl+C to stop the process, I get the following:
python index2barrel.py
Traceback (most recent call last):
  File "index2barrel.py", line 83, in <module>
    my_pool.map(create_barrel, db1.index.find, 6)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 280, in map_async
    iterable = list(iterable)
TypeError: 'instancemethod' object is not iterable
I mean, why is multiprocessing trying to convert something to a list? Isn't that the source of the problem?
From the strace output:
brk(0x231ccf000) = 0x231ccf000
futex(0x1abb150, FUTEX_WAKE_PRIVATE, 1) = 1
sendto(3, "+\0\0\0\260\263\355\356\0\0\0\0\325\7\0\0\0\0\0\0data.index\0\0"..., 43, 0, NULL, 0) = 43
recvfrom(3, "Some text from my database."..., 491663, 0, NULL, NULL) = 491663
... [many, many times]
brk(0x2320d5000) = 0x2320d5000
... [many, many times]
The above sample repeats over and over in the strace output, and for some reason strace -o logfile python myscript.py does not halt. It just eats all the available RAM and writes to the log file.
UPDATE. Using imap instead of map solved my problem.
Since the find() operation returns a cursor to the map function, and since you say that this runs without a problem when you do
for item in db1.index.find(): create_barrel(item)
it looks like the create_barrel function is OK.
Can you try to limit the number of results returned in the cursor and see if this helps? I think the syntax would be:
db1.index.find().limit(100)
If you could try this and see whether it helps, it might point to the cause of the problem.
EDIT1: I think you are going about this the wrong way by using the map function - I think you should be using map_reduce in the mongo python driver - that way the map function will be executed by the mongod process.
The map() function gives the items to the given function in chunks. By default this chunksize is calculated like this (link to source):
chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
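To put rough numbers on that (my arithmetic, not from the answer): with a million documents in the cursor and a pool of 6 workers,
chunksize, extra = divmod(1000000, 6 * 4)  # (41666, 16)
if extra:
    chunksize += 1  # each worker task receives 41667 documents at once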
This probably results in too big a chunk size in your case and lets the process run out of memory. Try setting the chunk size manually like this:
my_pool.map(foo, db1.index.find(), 100)
EDIT: You should also consider reusing the db connection and closing it after usage. Right now you create a new db connection for each item and never call close() on them.
EDIT2: Also check if the while loop gets into an infinite loop (would explain the symptoms).
EDIT3: Based on the traceback you added, the map function tries to convert the cursor to a list, causing all the items to be fetched at once. This happens because it wants to find out how many items there are in the set. This is part of the map() code from pool.py:
if not hasattr(iterable, '__len__'):
    iterable = list(iterable)
You could try this to avoid conversion to list:
cursor = db1.index.find()
cursor.__len__ = cursor.count()
my_pool.map(foo, cursor)
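As the update above notes, Pool.imap side-steps this entirely: imap never needs len() on the iterable, so the cursor is consumed lazily in chunks instead of being materialized. (The __len__ patch shown above may not satisfy len() anyway, since Python looks __len__ up on the type, not the instance.) A minimal sketch, adapting the names from the question:
# imap pulls items from the cursor on demand instead of listing it first
for _ in my_pool.imap(create_barrel, db1.index.find(), chunksize=100):
    pass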
