Having problems in extracting duplicates

I am stumped by this problem: no matter how I approach it, I still get the same result.
I have 2 groups, GrpA_null and GrpB_null. Each contains 2 meshes, and the meshes are named exactly the same in both groups: brick_geo and bars_geo.
- Result: GrpA_null --> brick_geo, bars_geo
But for some reason, when the code below (which I presume is the part causing the problem) is run, the program reports that GrpA_null has the same duplicates as GrpB_null, probably because both reference brick_geo and bars_geo. As soon as the code is run, my child geometry gets a numerical suffix:
- Result: GrpA_null --> brick_geo0, bars_geo0, GrpB_null1 --> brick_geo, bars_geo1
So I tried to modify the code so that, as long as the parents (GrpA_null and GrpB_null) are different, it does not touch the children.
Could someone kindly advise me on this?
def extractDuplicateBoxList(self, inputs):
    result = {}
    for i in range(0, len(inputs)):
        print '<<< i is : %s' %i
        for n in range(0, len(inputs)):
            print '<<< n is %s' %n
            if i != n:
                name = inputs[i].getShortName()
                # Result: brick_geo
                Lname = inputs[i].getLongName()
                # Result: |GrpA_null|concrete_geo
                if name == inputs[n].getShortName():
                    # If list already created as result.
                    if result.has_key(name):
                        # Make sure its not already in the list and add it.
                        alreadyAdded = False
                        for box in result[name]:
                            if box == inputs[i]:
                                alreadyAdded = True
                        if alreadyAdded == False:
                            result[name].append(inputs[i])
                    # Otherwise create a new list and add it.
                    else:
                        result[name] = []
                        result[name].append(inputs[i])
    return result

There are a couple of things you may want to be aware of. First and foremost, indentation matters in Python. I don't know whether the indentation of your code as posted is what you intended, but the body of your function should be indented further than the def line.
Secondly, I find your question a little difficult to understand. But there are several things which would improve your code.
The collections module provides a type called defaultdict. It behaves like a dict, except that a missing key gets a default value produced by the type you specify. So a defaultdict(int) gives you 0 when you look up a key that wasn't there before. This makes it easy to implement counters, for example to find duplicates without sorting.
from collections import defaultdict
counter = defaultdict(int)
for item in items:
    counter[item] += 1
This brings me to another point. Python for loops implement a for-each structure. You almost never need to enumerate your items in order to then access them. So, instead of
for i in range(0,len(inputs)):
you want to use
for input in inputs:
and if you really need to enumerate your inputs
for i,input in enumerate(inputs):
Finally, you can iterate and filter through iterable objects using list comprehensions, dict comprehensions, or generator expressions. They are very powerful. See "Create a dictionary with list comprehension in Python".
Try this code out and play with it. See if it works for you.
from collections import defaultdict
def extractDuplicateBoxList(self, inputs):
    # count how many times each short name occurs
    counts = defaultdict(int)
    for input in inputs:
        counts[input.getShortName()] += 1
    # short names that occur more than once
    dup_shns = set([k for k,v in counts.items() if v > 1])
    # keep every input whose short name is duplicated
    dups = [i for i in inputs if i.getShortName() in dup_shns]
    return dups
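To try the counting idea outside the original application, here is a minimal self-contained sketch. The Node class is a hypothetical stand-in for whatever objects inputs really holds (only the method names getShortName and getLongName come from the question), and grouping by short name instead of only counting is a small variation that keeps the matching objects together.

```python
from collections import defaultdict


class Node(object):
    """Hypothetical stand-in for the objects in `inputs` (an assumption)."""

    def __init__(self, long_name):
        self.long_name = long_name

    def getLongName(self):
        return self.long_name

    def getShortName(self):
        # The short name is the last component of the '|'-separated path.
        return self.long_name.split('|')[-1]


def extract_duplicates(inputs):
    """Group nodes by short name and keep only names used more than once."""
    groups = defaultdict(list)
    for node in inputs:
        groups[node.getShortName()].append(node)
    return dict((name, nodes) for name, nodes in groups.items() if len(nodes) > 1)


nodes = [Node('|GrpA_null|brick_geo'), Node('|GrpA_null|bars_geo'),
         Node('|GrpB_null|brick_geo'), Node('|GrpB_null|unique_geo')]
dups = extract_duplicates(nodes)
dup_names = sorted(dups)
brick_paths = [n.getLongName() for n in dups['brick_geo']]
print(dup_names)
print(brick_paths)
```

Because the long names are kept, the parent group of each duplicate is still available when deciding whether two same-named children should really be treated as clashing.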

I was about to write the same remarks as bitsplit, but he has already done it.
So for the moment I will just give you code that I think does exactly the same as yours, based on these remarks and on the dictionary's get method:
from collections import defaultdict
def extract_Duplicate_BoxList(self, inputs):
    # defaultdict(list) creates an empty list the first time a key is used
    result = defaultdict(list)
    for i,A in enumerate(inputs):
        print('<<< i is : %s' % i)
        name = A.getShortName() # Result: brick_geo
        Lname = A.getLongName() # Result: |GrpA_null|concrete_geo
        for n in (j for j,B in enumerate(inputs)
                  if j!=i and B.getShortName()==name):
            print('<<< n is %s' % n)
            if A not in result.get(name,[]):
                result[name].append(A)
    return result
Secondly, as bitsplit said, I find your question hard to understand.
Could you give more information on the elements of inputs?
Your explanations about GrpA_null and GrpB_null, the names and the meshes are unclear.
EDIT:
If my simplification is correct, what your code essentially does is compare every pair of elements A and B of inputs (with A != B) and record A in the dictionary result under its short name (only once) whenever A and B share the same short name.
I think this can be reduced further to a plain grouping by short name. Note that, unlike the version above, this also records names that occur only once, so entries whose list holds a single element still have to be filtered out:
def extract_Duplicate_BoxList(inputs):
    result = defaultdict(list)
    for i,A in enumerate(inputs):
        print('<<< i is : %s' % i)
        result[A.getShortName()].append(A)
    return result

This may do what you're looking for, if I understand it correctly: comparing the sub-hierarchies of different nodes to see whether they have the same names.
import maya.cmds as cmds
def child_nodes(node):
    ''' returns a set with the relative paths of all <node>'s children'''
    root = cmds.ls(node, l=True)[0]
    # listRelatives returns None when there are no descendants, hence the "or []"
    children = cmds.listRelatives(node, ad=True, f=True) or []
    return set( [k[len(root):] for k in children])
child_nodes('group1')
# Result: set([u'|pCube1|pCubeShape1', u'|pSphere1', u'|pSphere1|pSphereShape1', u'|pCube1']) #
# note the returns are NOT valid maya paths, since i've removed the root <node>,
# you'd need to add it back in to actually access a real shape here:
all_kids = child_nodes('group1')
real_children = ['group1' + n for n in all_kids ]
Since the returns are sets, you can test whether they are equal, whether one is a subset or superset of the other, what they have in common and so on:
# compare children
child_nodes('group1') == child_nodes('group2')
# group2's children are a subset of group1's:
child_nodes('group1').issuperset(child_nodes('group2'))
Iterating over a bunch of nodes is easy:
# collect all the child sets of a bunch of nodes:
kids = dict ( (k, child_nodes(k)) for k in cmds.ls(*nodes))

Related

Python elegant way to map string structure

Let's say I know beforehand that the string
"key1:key2[]:key3[]:key4" should map to "newKey1[]:newKey2[]:newKey3"
then given "key1:key2[2]:key3[3]:key4",
my method should return "newKey1[2]:newKey2[3]:newKey3"
(the order of numbers within the square brackets should stay, like in the above example)
My solution looks like this:
import re
predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3"}
def transform(parent_key, parent_key_with_index):
    indexes_in_parent_key = re.findall(r'\[(.*?)\]', parent_key_with_index)
    target_list = predefined_mapping[parent_key].split(":")
    t = []
    i = 0
    for elem in target_list:
        try:
            sub_result = re.subn(r'\[(.*?)\]', '[{}]'.format(indexes_in_parent_key[i]), elem)
            if sub_result[1] > 0:
                i += 1
            new_elem = sub_result[0]
        except IndexError as e:
            new_elem = elem
        t.append(new_elem)
    print(":".join(t))
transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4")
This prints newKey1[2]:newKey2[3]:newKey3 as the result.
Can someone suggest a better and more elegant solution (especially around the usage of regex)?
Thanks!

You can do it a bit more elegantly by simply splitting the mapped structure on [], then interspersing the indexes from the actual data and, finally, joining everything together:
import itertools
import re
# split the map immediately on [] so that you don't have to split each time on transform
predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3".split("[]")}
def transform(key, source):
    mapping = predefined_mapping.get(key, None)
    if not mapping:  # no mapping for this key found, return unaltered
        return source
    indexes = re.findall(r'\[.*?\]', source)  # get individual indexes
    return "".join(i for e in itertools.izip_longest(mapping, indexes) for i in e if i)
print(transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4"))
# newKey1[2]:newKey2[3]:newKey3
NOTE: On Python 3 use itertools.zip_longest() instead.
I still think you're over-engineering this and that there is probably a much more elegant and far less error-prone approach to the whole problem. I'd advise stepping back and looking at the bigger picture instead of hammering out this particular solution just because it seems to be addressing the immediate need.
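For Python 3, one possible variant of the same idea lets re.sub fill each empty [] of the target template with the next index found in the source, so itertools is not needed. This is only a sketch: it keeps the question's predefined_mapping unsplit, and leaving surplus [] empty when the source has too few indexes is an assumption about the desired behaviour.

```python
import re

# Same mapping as in the question, kept as a plain string template.
predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3"}


def transform(key, source):
    """Copy the bracketed indexes of `source`, in order, into the mapped template."""
    template = predefined_mapping.get(key)
    if template is None:  # no mapping for this key, return the input unaltered
        return source
    indexes = iter(re.findall(r'\[(.*?)\]', source))
    # Each empty "[]" in the template consumes the next index; if the source
    # runs out of indexes, the remaining "[]" are left empty (an assumption).
    return re.sub(r'\[\]', lambda match: '[%s]' % next(indexes, ''), template)


result = transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4")
unmapped = transform("unknown", "a[1]:b")
print(result)
```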

Start a dictionary for loop at a specific key value

Here is the code:
EDIT: Please, no more "it's not possible with an unordered dictionary" replies. I pretty much already know that. I made this post on the off-chance that it MIGHT be possible or that someone has a workable idea.
# position equals some set of two dimensional coords
for name in self.regions["regions"]:  # I want to start the iteration with 'last_region'
    # I don't want to run these next two lines over every dictionary key each time since the likelihood is that the new
    # position is still within the last region that was matched.
    rect = (self.regions["regions"][name]["pos1"], self.regions["regions"][name]["pos2"])
    if all(self.point_inside(rect, position)):
        # record the name of this region in variable- 'last_region' so I can start with it on the next search...
        # other code I want to run when I get a match
        return
return  # if code gets here, the points were not inside any of the named regions
Hopefully the comments in the code explain my situation well enough. Let's say I was last inside region "delta" (i.e., the key name is delta and the value holds the sets of coordinates defining its boundaries) and I have 500 more regions. The first time I find myself in region delta, the code may not discover this until, let's say (hypothetically), the 389th iteration, so it makes 388 all(self.point_inside(rect, position)) calculations before it finds that out. Since I will probably still be in delta the next time it runs (but I must verify that each time the code runs), it would be helpful if the key "delta" was the first one checked by the for loop.
This particular code can run many times a second for many different users, so speed is critical. The design is such that very often the user will not be in any region, so all 500 records have to be cycled through and the loop exits with no match. I would still like to speed up the overall program by speeding it up for those users who are presently in one of the regions.
I don't want the additional overhead of sorting the dictionary in any particular order. I just want it to start looking with the last region that successfully matched all(self.point_inside(rect, position)).
Maybe this will help a bit more. The following is the dictionary I am using (only the first record shown) so you can see the structure I coded to above. And yes, despite the name "rect" in the code, it actually checks for the point in a cubical region.
{"regions": {"shop": {"flgs": {"breakprot": true, "placeprot": true}, "dim": 0, "placeplayers": {"4f953255-6775-4dc6-a612-fb4230588eff": "SurestTexas00"}, "breakplayers": {"4f953255-6775-4dc6-a612-fb4230588eff": "SurestTexas00"}, "protected": true, "banplayers": {}, "pos1": [5120025, 60, 5120208], "pos2": [5120062, 73, 5120257], "ownerUuid": "4f953255-6775-4dc6-a612-fb4230588eff", "accessplayers": {"4f953255-6775-4dc6-a612-fb4230588eff": "SurestTexas00"}}, more, more, more...}

You could try to implement a caching mechanism within a custom subclass of dict.
You could set self._cache = None in __init__, add a method like set_cache(self, key) to set the cache, and finally override __iter__ to yield self._cache before calling the default __iter__.
However, that can be kinda cumbersome, if you consider this stackoverflow answer and also this one.
Given what is written in your question, I would instead try to implement this caching logic in your own code.
def _match_region(self, name, position):
    rect = (self.regions["regions"][name]["pos1"], self.regions["regions"][name]["pos2"])
    return all(self.point_inside(rect, position))
# in the method that does the lookup: try the cached region first
if self.last_region and self._match_region(self.last_region, position):
    self.code_to_run_when_match(position)
    return
for name in self.regions["regions"]:
    if self._match_region(name, position):
        self.last_region = name
        self.code_to_run_when_match(position)
        return
return  # if code gets here, the points were not inside any of the named regions
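To see the effect of checking the cached region first, here is a small runnable sketch. Everything in it (the RegionFinder class, the checks counter, the two sample regions) is hypothetical and only mirrors the pos1/pos2 layout from the question; the containment test is a simple axis-aligned box check standing in for point_inside.

```python
class RegionFinder(object):
    """Hypothetical wrapper; the names here are assumptions, not the asker's API."""

    def __init__(self, regions):
        self.regions = regions      # {name: {"pos1": [...], "pos2": [...]}}
        self.last_region = None     # name of the most recent match
        self.checks = 0             # counts containment tests, for illustration

    def _inside(self, name, position):
        self.checks += 1
        region = self.regions[name]
        corners = list(zip(region["pos1"], region["pos2"]))
        return all(min(a, b) <= p <= max(a, b)
                   for (a, b), p in zip(corners, position))

    def find(self, position):
        cached = self.last_region
        # Try the cached region first; it is the most likely match.
        if cached in self.regions and self._inside(cached, position):
            return cached
        for name in self.regions:
            if name != cached and self._inside(name, position):
                self.last_region = name
                return name
        self.last_region = None
        return None


regions = {
    "shop": {"pos1": [0, 0, 0], "pos2": [10, 10, 10]},
    "delta": {"pos1": [20, 0, 20], "pos2": [30, 10, 30]},
}
finder = RegionFinder(regions)
first = finder.find((25, 5, 25))
checks_after_first = finder.checks
second = finder.find((26, 5, 26))
extra_checks = finder.checks - checks_after_first
outside = finder.find((100, 100, 100))
print(first, second, extra_checks, outside)
```

The second lookup costs a single containment test regardless of how many regions exist, while a position outside every region still has to visit all of them.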

That is right: a dictionary is an unordered type, and an OrderedDict won't help you much with what you want to do.
You could store the last region in your class. Then, on the next call, check whether the last region still matches before checking the entire dictionary.

Instead of a for-loop, you could use iterators directly. Here's an example function that does something similar to what you want, using iterators:
def iterate(what, iterator):
    iterator = iterator or what.iteritems()
    try:
        while True:
            k,v = iterator.next()
            print "Trying k = ", k
            if v > 100:
                return iterator
    except StopIteration:
        return None
Instead of storing the name of the region in last_region, you would store the result of this function, which is like a "pointer" to where you left off. Then, you can use the function like this (shown as if run in the Python interactive interpreter, including the output):
>>> x = {'a':12, 'b': 42, 'c':182, 'd': 9, 'e':12}
>>> last_region = None
>>> last_region = iterate(x, last_region)
Trying k = a
Trying k = c
>>> last_region = iterate(x, last_region)
Trying k = b
Trying k = e
Trying k = d
Thus, you can easily resume from where you left off, but there's one additional caveat to be aware of:
>>> last_region = iterate(x, last_region)
Trying k = a
Trying k = c
>>> x['z'] = 45
>>> last_region = iterate(x, last_region)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in iterate
RuntimeError: dictionary changed size during iteration
As you can see, it'll raise an error if you ever add a new key. So, if you use this method, you'll need to be sure to set last_region = None any time you add a new region to the dictionary.

TigerhawkT3 is right. Dicts are unordered in the sense that there is no guaranteed order of keys in a given dictionary. The order can even change for the same dictionary once keys are added or removed. If you want order you need to use either an OrderedDict or just a plain list. You can convert your dict to a list and sort it so that it reflects the order you need.

Without knowing what your objects are and whether self in the example is a user instance or an environment instance it is hard to come up with a solution. But if self in the example is the environment, its Class could have a class attribute that is a dictionary of all current users and their last known position, if the user instance is hashable.
Something like this
class Thing(object):
    __user_regions = {}
    def where_ami(self, user):
        try:
            region = self.__user_regions[user]
            print('AHA!! I know where you are!!')
        except KeyError:
            # find region
            print('Hmmmm. let me think about that')
            region = 'foo'
            self.__user_regions[user] = region
class User(object):
    def __init__(self, position):
        self.pos = position
thing = Thing()
thing2 = Thing()
u = User((1,2))
v = User((3,4))
Now you can try to retrieve the user's region from the class attribute. If there is more than one Thing they would share that class attribute.
>>>
>>> thing._Thing__user_regions
{}
>>> thing2._Thing__user_regions
{}
>>>
>>> thing.where_ami(u)
Hmmmm. let me think about that
>>>
>>> thing._Thing__user_regions
{<__main__.User object at 0x0433E2B0>: 'foo'}
>>> thing2._Thing__user_regions
{<__main__.User object at 0x0433E2B0>: 'foo'}
>>>
>>> thing2.where_ami(v)
Hmmmm. let me think about that
>>>
>>> thing._Thing__user_regions
{<__main__.User object at 0x0433EA90>: 'foo', <__main__.User object at 0x0433E2B0>: 'foo'}
>>> thing2._Thing__user_regions
{<__main__.User object at 0x0433EA90>: 'foo', <__main__.User object at 0x0433E2B0>: 'foo'}
>>>
>>> thing.where_ami(u)
AHA!! I know where you are!!
>>>

You say that you "don't want an additional overhead of sorting the dictionary in any particular order". What overhead? Presumably OrderedDict uses some additional data structure internally to keep track of the order of keys. But unless you know that this is costing you too much memory or time, OrderedDict is your solution. Ruling it out means profiling your code and confirming that the OrderedDict really is the source of your bottleneck.
If you want the cleanest code, just use an OrderedDict. It has a move_to_end method (available from Python 3.2) which takes a key and moves it either to the end of the dictionary or, with last=False, to the front. For example:
from collections import OrderedDict
animals = OrderedDict([('cat', 1), ('dog', 2), ('turtle', 3), ('lizard', 4)])
def check_if_turtle(animals):
    for animal in animals:
        print('Checking %s...' % animal)
        if animal == 'turtle':
            animals.move_to_end('turtle', last=False)
            return True
    else:
        return False
Our check_if_turtle function looks through an OrderedDict for a turtle key. If it doesn't find it, it returns False. If it does find it, it returns True, but not before moving the turtle key to the beginning of the OrderedDict.
Let's try it. On the first run:
>>> check_if_turtle(animals)
Checking cat...
Checking dog...
Checking turtle...
True
we see that it checked all of the keys up to turtle. Now, if we run it again:
>>> check_if_turtle(animals)
Checking turtle...
True
we see that it checked the turtle key first.

Python nested for loop: what am I doing wrong?

I am working with data pulled from a spreadsheet-like file. I am trying to find, for each "ligand", the item with the lowest corresponding "energy". To do this I'm trying to make a list of all the ligands I find in the file, and compare them to one another, using the index value to find the energy of each ligand, keeping the one with the lowest energy. However, the following loop is not working out for me. The program won't finish, it just keeps running until I cancel it manually. I'm assuming this is due to an error in the structure of my loop.
for item in ligandList:
    for i in ligandList:
        if ligandList.index(item) != ligandList.index(i):
            if ( item == i ) :
                if float(lineList[ligandList.index(i)][42]) < float(lineList[ligandList.index(item)][42]):
                    lineList.remove(ligandList.index(item))
                else:
                    lineList.remove(ligandList.index(i))
As you can see, I've created a separate ligandList containing the ligands, and am using the current index of that list to access the energy values in the lineList.
Does anyone know why this isn't working?

It is a bit hard to answer without some actual data to play with, but I hope this works, or at least leads you into the right direction:
for idx1, item1 in enumerate(ligandList):
    for idx2, item2 in enumerate(ligandList):
        if idx1 == idx2: continue
        if item1 != item2: continue
        if float(lineList[idx1][42]) < float(lineList[idx2][42]):
            del lineList [idx1]
        else:
            del lineList [idx2]

That’s a really inefficient way of doing things: lots of index calls. It might just feel infinite because it’s slow.
Zip your related things together:
l = zip(ligandList, lineList)
Sort them by “ligand” and “energy” (converting the energy to a number so that it sorts numerically):
l = sorted(l, key=lambda t: (t[0], float(t[1][42])))
Grab the first (lowest) “energy” for each:
import itertools
l = ((lig, next(lin)[1]) for lig, lin in itertools.groupby(l, key=lambda t: t[0]))
Yay.
result = ((lig, next(lin)[1]) for lig, lin in itertools.groupby(
    sorted(zip(ligandList, lineList), key=lambda t: (t[0], float(t[1][42]))),
    lambda t: t[0]
))
It would probably look more flattering if you made lineList contain classes of some kind.
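A small runnable sketch of the sort-then-group approach, using made-up rows. The layout is an assumption made for brevity: the ligand name sits at index 0 and the energy at index 1, whereas the question keeps the energy at index 42.

```python
import itertools

# Made-up rows: ligand name at index 0, energy (as text) at index 1.
ENERGY = 1
line_list = [
    ["ligA", "-7.2"],
    ["ligB", "-5.0"],
    ["ligA", "-9.1"],
    ["ligB", "-4.3"],
]
ligand_list = [row[0] for row in line_list]

# Sort by ligand, then by numeric energy, so the lowest energy comes first
# within each ligand.
pairs = sorted(zip(ligand_list, line_list),
               key=lambda t: (t[0], float(t[1][ENERGY])))

# groupby yields (ligand, iterator of pairs); the first pair of each group
# holds the row with the lowest energy.
lowest = dict((lig, next(group)[1])
              for lig, group in itertools.groupby(pairs, key=lambda t: t[0]))
print(lowest)
```

Nothing is deleted from the lists while they are being iterated, which sidesteps the index drift that removing rows inside the nested loops can cause.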

You look like you're trying to find the element in ligandList with the smallest value in index 42. Let's just do that....
min(ligandList, key=lambda x: float(x[42]))
If these "Ligands" are something you use regularly, STRONGLY consider writing a class wrapper for them, something like:
class Ligand(object):
    def __init__(self,lst):
        self.attr_name = lst[index_of_attr] # for each attribute
        ... # for each attribute
        ... # etc etc
        self.energy = float(lst[42]) # store as a number so comparisons are numeric
    def __str__(self):
        """This method defines what the class looks like if you call str() on
        it, e.g. printing a Ligand instance will show this function's return value."""
        return "A Ligand with energy {}".format(self.energy) # or w/e
    def transmogfiscate(self,other):
        pass # replace this with whatever Ligands do, if they do things...
In which case you can simply create a list of the Ligands:
ligands = [Ligand(ligand) for ligand in ligandList]
and return the object with the smallest energy:
lil_ligand = min(ligands, key=lambda ligand: ligand.energy)
As a huge aside, PEP 8 encourages lowercase names with words separated by underscores for variables, rather than the mixedCase that many languages use.

Python large list manipulation

I have python list like below:
DEMO_LIST = [
[{'unweighted_criket_data': [-46.14554728131345, 2.997789122813151, -23.66171024766996]},
{'weighted_criket_index_input': [-6.275794430258629, 0.4076993207025885, -3.2179925936831144]},
{'manual_weighted_cricket_data': [-11.536386820328362, 0.7494472807032877, -5.91542756191749]},
{'average_weighted_cricket_data': [-8.906090625293496, 0.5785733007029381, -4.566710077800302]}],
[{'unweighted_football_data': [-7.586729834820534, 3.9521665714843675, 5.702038461085529]},
{'weighted_football_data': [-3.512655913521907, 1.8298531225972623, 2.6400438074826]},
{'manual_weighted_football_data': [-1.8966824587051334, 0.9880416428710919, 1.4255096152713822]},
{'average_weighted_football_data': [-2.70466918611352, 1.4089473827341772, 2.0327767113769912]}],
[{'unweighted_rugby_data': [199.99999999999915, 53.91020408163265, -199.9999999999995]},
{'weighted_rugby_data': [3.3999999999999857, 0.9164734693877551, -3.3999999999999915]},
{'manual_rugby_data': [49.99999999999979, 13.477551020408162, -49.99999999999987]},
{'average_weighted_rugby_data': [26.699999999999886, 7.197012244897959, -26.699999999999932]}],
[{'unweighted_swimming_data': [2.1979283454982053, 14.079951031527246, -2.7585499298828777]},
{'weighted_swimming_data': [0.8462024130168091, 5.42078114713799, -1.062041723004908]},
{'manual_weighted_swimming_data': [0.5494820863745513, 3.5199877578818115, -0.6896374824707194]},
{'average_weighted_swimming_data': [0.6978422496956802, 4.470384452509901, -0.8758396027378137]}]]
I want to manipulate the list items and do some basic math operations, such as taking each data type's list (for example, taking the first element of every unweighted list and summing them).
Currently I am doing it like this.
The current solution is a very basic one. I want to do it in such a way that if the list grows, the results are still calculated automatically. Right now there are four lists, but there could be 5 or 8. The final result should be the sum of all the first elements of the unweighted values. For example:
now I am doing result_u1/4,result_u2/4,result_u3/4
I want it like result_u0/4,result_u1/4.......result_n4/4 # n is the number of lists inside DEMO_LIST
Any idea how I can do that?
(sorry for the beginner question)

You can implement your own list class that adds a new item's value to a running summary in append, and subtracts it again in remove:
class MyList(list):
    def __init__(self):
        self.summary = 0
        list.__init__(self)
    def append(self, item):
        # items are dicts, so the value is looked up by key
        self.summary += item['sample_value']
        list.append(self, item)
    def remove(self, item):
        self.summary -= item['sample_value']
        list.remove(self, item)
And a simple usage:
my_list = MyList()
print(my_list.summary) # Outputs 0
my_list.append({'sample_value': 10})
print(my_list.summary) # Outputs 10

In Python, whenever you start counting how many of something there are inside an iterable (a string, a list, a set, a collection of any of these) in order to loop over it, it's a sign that your code can be revised.
Code that works for 3 of something can work for 300, 3000 and 3 million of the same thing without any changes.
In your case, your logic is: "For every X inside DEMO_LIST, do something".
This translated into Python is:
for i in DEMO_LIST:
    # do something with i
This snippet will run through any size of DEMO_LIST, and each time i is one of the items inside DEMO_LIST. In your case that item is a list that contains your dictionaries.
Further expanding on that, you can say:
for i in DEMO_LIST:
    for k in i:
        # now k is each dictionary in each list inside the outer DEMO_LIST
Expanding this into a practical example, a sum of all unweighted_criket_data:
all_unweighted_cricket_data = []
for i in DEMO_LIST:
    for k in i:
        # the key is spelled 'criket' in DEMO_LIST, so use that exact spelling
        if 'unweighted_criket_data' in k:
            for data in k['unweighted_criket_data']:
                all_unweighted_cricket_data.append(data)
sum_of_data = sum(all_unweighted_cricket_data)
There are various "shortcuts" to do the same, but you can appreciate those once you understand the "expanded" version of what the shortcut is trying to do.
Remember there is nothing wrong with writing it out the 'long way', especially when you are not sure of the best way to do something. Once you are comfortable with the logic, then you can use shortcuts like list comprehensions.
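As one example of such a shortcut, the sketch below collects every "unweighted" list with a single comprehension and then averages position by position, so it keeps working when more sports are added. The demo_list here is a trimmed, made-up sample in the same shape as DEMO_LIST, and matching keys with startswith('unweighted') is an assumption about how the keys are named.

```python
# Trimmed-down sample in the same shape as DEMO_LIST (values shortened).
demo_list = [
    [{'unweighted_criket_data': [1.0, 2.0, 3.0]},
     {'weighted_criket_index_input': [0.1, 0.2, 0.3]}],
    [{'unweighted_football_data': [4.0, 5.0, 6.0]},
     {'weighted_football_data': [0.4, 0.5, 0.6]}],
]

# Collect every value list whose key starts with "unweighted", whatever the sport.
unweighted_rows = [values
                   for sport in demo_list
                   for entry in sport
                   for key, values in entry.items()
                   if key.startswith('unweighted')]

# zip(*rows) regroups the rows by position, so each column can be averaged.
averages = [sum(column) / len(unweighted_rows)
            for column in zip(*unweighted_rows)]
print(averages)
```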

Start by replacing this:
for i in range(0,len(data_list)-1):
    result_u1+=data_list[i][0].values()[0][0]
    result_u2+=data_list[i][0].values()[0][1]
    result_u3+=data_list[i][0].values()[0][2]
print "UNWEIGHTED",result_u1/4,result_u2/4,result_u3/4
With this:
# take the number of values from the first sport; i is not defined yet at this point
sz = len(data_list[0][0].values()[0])
result_u = [0] * sz
for i in range(0,len(data_list)-1):
    for j in range(0,sz):
        result_u[j] += data_list[i][0].values()[0][j]
print "UNWEIGHTED", [x/len(data_list) for x in result_u]
Apply similar changes elsewhere. This assumes that your data really is "rectangular", that is to say every corresponding inner list has the same number of values.
A slightly more "Pythonic"[*] version of:
for j in range(0,sz):
    result_u[j] += data_list[i][0].values()[0][j]
is:
for j, dataval in enumerate(data_list[i][0].values()[0]):
    result_u[j] += dataval
There are some problems with your code, though:
- values()[0] might give you any of the values in the dictionary, since dictionaries are unordered. Maybe it happens to give you the unweighted data, maybe not.
- I'm confused why you're looping on the range 0 to len(data_list)-1: if you want to include all the sports you need 0 to len(data_list), because the second parameter to range, the upper limit, is excluded.
You could perhaps consider reformatting your data more like this:
DEMO_LIST = {
    'cricket' : {
        'unweighted' : [1,2,3],
        'weighted' : [4,5,6],
        'manual' : [7,8,9],
        'average' : [10,11,12],
    },
    'rugby' : ...
}
Once you have the same keys in each sport's dictionary, you can replace values()[0] with ['unweighted'], so you'll always get the right dictionary entry. And once you have a whole lot of dictionaries all with the same keys, you can replace them with a class or a named tuple, to define/enforce that those are the values that must always be present:
import collections
Sport = collections.namedtuple('Sport', 'unweighted weighted manual average')
DEMO_LIST = {
    'cricket' : Sport(
        unweighted = [1,2,3],
        weighted = [4,5,6],
        manual = [7,8,9],
        average = [10,11,12],
    ),
    'rugby' : ...
}
Now you can replace ['unweighted'] with .unweighted.
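A minimal sketch of how the restructured data could be averaged across any number of sports. The numbers are made up, and column_means is a hypothetical helper, not something from the question.

```python
import collections

Sport = collections.namedtuple('Sport', 'unweighted weighted manual average')

# Made-up numbers, only to show the shape of the data.
sports = {
    'cricket': Sport(unweighted=[1, 2, 3], weighted=[4, 5, 6],
                     manual=[7, 8, 9], average=[10, 11, 12]),
    'rugby': Sport(unweighted=[3, 4, 5], weighted=[6, 7, 8],
                   manual=[9, 10, 11], average=[12, 13, 14]),
}


def column_means(sports, field):
    """Average the values of `field` position by position across all sports."""
    rows = [getattr(sport, field) for sport in sports.values()]
    # float() keeps the division exact on Python 2 as well as Python 3.
    return [sum(column) / float(len(rows)) for column in zip(*rows)]


unweighted_means = column_means(sports, 'unweighted')
weighted_means = column_means(sports, 'weighted')
print(unweighted_means)
```

Adding a fifth or eighth sport only means adding another entry to the dictionary; the helper does not need to know how many there are.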
[*] The word "Pythonic" officially means something like, "done in the style of a Python programmer, taking advantage of any useful Python features to produce the best idiomatic Python code". In practice it usually means "I prefer this, and I'm a Python programmer, therefore this is the correct way to write Python". It's an argument by authority if you're Guido van Rossum, or by appeal to nebulous authority if you're not. In almost all circumstances it can be replaced with "good IMO" without changing the sense of the sentence ;-)

Python dictionary error

In the code below, d_arr is an array of dictionaries:
def process_data(d_arr):
    flag2 = 0
    for dictionaries in d_arr:
        for k in dictionaries:
            if ( k == "*TYPE" ):
                """ Here we determine the type """
                if (dictionaries[k].lower() == "name"):
                    dictionaries.update({"type" : 0})
                    func = name(dictionaries)
                    continue
                elif (dictionaries[k].lower() == "ma"):
                    dictionaries.update({"type" : 1})
                    func = DCC(dictionaries)
                    logging.debug(type(func))
                    continue
When the above runs, I get an error saying:
for k in dictionaries:
RuntimeError: dictionary changed size during iteration
2010-08-02 05:26:44,167 DEBUG Returning
Is it forbidden to do something like this?

It is, indeed, forbidden. Moreover, you don't really need a loop over all keys here, given that the weirdly named dictionaries appears to be a single dict. Rather than the for k in dictionaries: (or the workable for k in dictionaries.keys() that @Triptych's answer suggests), you could use:
tp = dictionaries.get('*TYPE')
if tp is not None:
    """ Here we determine the type """
    if tp.lower() == 'name':
        dictionaries.update({"type" : 0})
        func = name(dictionaries)
    elif tp.lower() == "ma":
        dictionaries.update({"type" : 1})
        func = DCC(dictionaries)
        logging.debug(type(func))
This is going to be much faster if dictionaries has any considerable length, for you're reaching directly for the one entry you care about, rather than looping over all entries to check each of them for the purpose of seeing if it is the one you care about.
Even if you've chosen to omit part of your code, so that after this start the loop on dictionaries is still needed, I think my suggestion is still preferable because it lets you get any alteration to dictionaries done and over with (assuming of course that you don't keep altering it in the hypothetical part of your code I think you may have chosen to omit;-).

That error is pretty informative; you can't change the size of a dictionary you are currently iterating over.
The solution is to get the keys all at once and iterate over them:
# Do this (in Python 3, wrap it as list(dictionaries.keys()) because keys() returns a live view there)
for k in dictionaries.keys():
# Not this
for k in dictionaries:
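The difference can be reproduced with a tiny dictionary. In this sketch the keys and the copy_of_ prefix are made up for illustration; list(d) takes a snapshot of the keys and behaves the same on Python 2 and Python 3.

```python
d = {"*TYPE": "name", "other": 1}

# Adding a key while iterating over the dictionary itself raises RuntimeError.
try:
    for k in d:
        d["copy_of_" + k] = d[k]
    raised = False
except RuntimeError:
    raised = True

# Iterating over a snapshot of the keys is safe, because the snapshot
# does not change when the dictionary grows.
d = {"*TYPE": "name", "other": 1}
for k in list(d):
    d["copy_of_" + k] = d[k]

final_keys = sorted(d)
print(raised, final_keys)
```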
