Recursively removing items from tree in python

seen2 = set()
def eliminate_abs(d):  ## remove all entries that connect to the abstraction node, type(d) = list
    def rec(x):
        if x not in seen2:
            seen2.add(x)
            a = x.hypernyms()
            if len(a) != 0:
                kk = a[0]
                if re.search('abstraction', str(kk)):
                    syns.remove(ii)
                else:
                    rec(kk)
    for ii in d:  ## type(ii) = <class 'nltk.corpus.reader.wordnet.Synset'>
        rec(ii)
eliminate_abs(syns)
The list "syns" will eventually be converted into a tree, but I first need to remove all of the items which ultimately connect to the abstraction node. What I want this function to do is recursively look through all of the hypernyms for each item in "syns" and, if "abstraction" is ever found, remove the original term from "syns". For some reason this is only removing some of them.

Since you are mucking with syns while iterating over it, you should iterate over a slice of syns, i.e. make a copy of the list and iterate over the copy:
for ii in d[:]:
    rec(ii)

Figured it out. It works fine, but there are a bunch of repeats in syns, so every repeat after the first one gets skipped. Removing
    if x not in seen2:
        seen2.add(x)
makes it work fine.
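A minimal sketch of the same idea that avoids both problems discussed above: it builds a new list instead of removing items during iteration, and it caches a result per node rather than keeping a bare "seen" set, so repeated entries still get the right answer. The `PARENTS` dictionary is a hypothetical stand-in for WordNet's `hypernyms()` so the example runs without NLTK; `reaches` and the node names are illustrative and not part of the original code.

```python
# Hypothetical stand-in for Synset.hypernyms(): each node maps to its parents.
PARENTS = {
    "dog": ["animal"],
    "animal": ["entity"],
    "idea": ["abstraction"],
    "plan": ["idea"],
    "abstraction": ["entity"],
    "entity": [],
}

def reaches(node, target, cache):
    """Return True if following the first parent from node ever hits target."""
    if node in cache:
        return cache[node]
    parents = PARENTS.get(node, [])
    if not parents:
        result = False
    elif parents[0] == target:
        result = True
    else:
        result = reaches(parents[0], target, cache)
    cache[node] = result  # remember the answer, not just the visit
    return result

syns = ["dog", "plan", "idea", "plan", "animal"]
cache = {}
# Build a new list instead of calling remove() while looping over syns.
kept = [s for s in syns if not reaches(s, "abstraction", cache)]
print(kept)  # ['dog', 'animal']
```

Both copies of "plan" are dropped here, because the cache returns the stored verdict for the second one instead of skipping it.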

Related

Python elegant way to map string structure

Let's say I know beforehand that the string
"key1:key2[]:key3[]:key4" should map to "newKey1[]:newKey2[]:newKey3"
then given "key1:key2[2]:key3[3]:key4",
my method should return "newKey1[2]:newKey2[3]:newKey3"
(the order of numbers within the square brackets should stay, like in the above example)
My solution looks like this:
import re
predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3"}
def transform(parent_key, parent_key_with_index):
    indexes_in_parent_key = re.findall(r'\[(.*?)\]', parent_key_with_index)
    target_list = predefined_mapping[parent_key].split(":")
    t = []
    i = 0
    for elem in target_list:
        try:
            sub_result = re.subn(r'\[(.*?)\]', '[{}]'.format(indexes_in_parent_key[i]), elem)
            if sub_result[1] > 0:
                i += 1
            new_elem = sub_result[0]
        except IndexError as e:
            new_elem = elem
        t.append(new_elem)
    print(":".join(t))
transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4")
prints newKey1[2]:newKey2[3]:newKey3 as the result.
Can someone suggest a better and elegant solution (around the usage of regex especially)?
Thanks!
You can do it a bit more elegantly by simply splitting the mapped structure on [], then interspersing the indexes from the actual data and, finally, joining everything together:
import itertools
import re
# split the map immediately on [] so that you don't have to split each time on transform
predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3".split("[]")}
def transform(key, source):
    mapping = predefined_mapping.get(key, None)
    if not mapping:  # no mapping for this key found, return unaltered
        return source
    indexes = re.findall(r'\[.*?\]', source)  # get individual indexes
    return "".join(i for e in itertools.izip_longest(mapping, indexes) for i in e if i)
print(transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4"))
# newKey1[2]:newKey2[3]:newKey3
NOTE: On Python 3 use itertools.zip_longest() instead.
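For reference, here is a sketch of how the same approach might look on Python 3, using `itertools.zip_longest` with `fillvalue=""` so that no filtering of `None` values is needed. The constant name `PREDEFINED_MAPPING` is an illustrative choice; the logic follows the answer above.

```python
import re
from itertools import zip_longest

# Pre-split each mapped template on "[]" once, as in the answer above.
PREDEFINED_MAPPING = {
    "key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3".split("[]"),
}

def transform(key, source):
    parts = PREDEFINED_MAPPING.get(key)
    if not parts:  # no mapping for this key: return the input unaltered
        return source
    indexes = re.findall(r"\[.*?\]", source)  # e.g. ['[2]', '[3]']
    # Interleave template pieces with the bracketed indexes, padding with "".
    return "".join(a + b for a, b in zip_longest(parts, indexes, fillvalue=""))

result = transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4")
print(result)  # newKey1[2]:newKey2[3]:newKey3
```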
I still think you're over-engineering this and that there is probably a much more elegant and far less error-prone approach to the whole problem. I'd advise stepping back and looking at the bigger picture instead of hammering out this particular solution just because it seems to be addressing the immediate need.

mutating dna in parallel lists python

def mutations (list_a,string1,name,list_b):
    """ (list of str, str, list of str, list of str) -> NoneType
    """
    dna=list_a
    for i in range(len(list_b)):
        strand=dna[:dna.index(list_b[i])]
        string1=string1[string1.index(list_b[i]):]
        dna[strand+string1]
>>>dna=['TGCAGAATTCGGTT','ACGTCCCGGGTTGC']
>>>mutations(dna,'CCCGGGGAATTCTCGC',['EcoRI','SmaI'],['GAATTC','CCCGGG'])
>>>mutated
>>>['TGCAGAATTCTCGC','ACGTCCCGGGGAATTCTCGC']
It's supposed to modify the first parameter. Basically, I'm trying to modify list_a so that it becomes ['TGCAGAATTCTCGC','ACGTCCCGGGGAATTCTCGC']. However, I get an error saying
    strand=dna[:dna.index(string1[i])]
ValueError: 'GAATTC' is not in list
Also, if the sequence does not exist, is there a way to leave the list unmodified?
Well if I understand you correctly, you want to check each element in list_a if it contains its corresponding element from list_b. If so, you want to modify the element from list_a by replacing the rest of the string (including the list_b element) with part of a control string that does also contain the element from list_b, right?!
Ideally you would put this in your question!!
A way of doing this would be as follows:
def mut(list_a, control, list_b):
    check_others = False
    for i in range(len(list_a)):  # run through list_a (use xrange in python 2.x)
        if i == len(list_b):  # once we are past the end of list_b, we
                              # check against all elements of list_b (see below)
            check_others = True
        if not check_others:  # this is the normal 1 to 1 match.
            if list_b[i] in list_a[i]:  # if the element from list_b is in it
                # correct the element
                list_a[i] = list_a[i][:list_a[i].index(list_b[i])] +\
                            control[control.index(list_b[i]):]
        else:  # this happens if we are at the end of list_b
            for j in range(len(list_b)):  # run through list_b for the start
                if list_b[j] in list_a[i]:
                    list_a[i] = list_a[i][:list_a[i].index(list_b[j])] +\
                                control[control.index(list_b[j]):]
                    break  # only the first match with an element in list_b is used!
As described in the comments, dna is a list, not a string, so finding a substring won't quite work how you want.
dna=list_a is unnecessary, and dna[strand+string1] doesn't modify a list, so not sure what you were trying to accomplish there.
All in all, I know the following code doesn't get the output you are expecting (or maybe it does), but hopefully it sets you on the more correct path.
(I removed name because it was not used)
def mutations (mutated,clean,recognition):
    """ (list of str, str, list of str) -> NoneType
    """
    # Loop over 'mutated' list. We need the index to update the list
    for i,strand in enumerate(mutated):
        # Loop over 'recognition' list
        for rec in recognition:
            # Find the indices in the two strings
            strand_idx = strand.find(rec)
            clean_idx = clean.find(rec)
            # Check that 'rec' existed in both strings
            if strand_idx > 0 and clean_idx > 0:
                # both are found, so get the substrings
                strand_str = strand[:strand_idx]
                clean_str = clean[clean_idx:]
                # debug these values
                print(rec, (strand_idx, strand_str,), (clean_idx, clean_str, ))
                # updated 'mutated' like this
                mutated[i] = strand_str+clean_str
And, the output. (The first dna element changed, the second did not)
dna=['TGCAGAATTCGGTT','ACGTCCCGGGTTGC']
mutations(dna,'CCCGGGGAATTCTCGC',['GAATTC','CCCGGG'])
print(dna) # ['TGCAGAATTCTCGC', 'ACGTCCCGGGTTGC']
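One detail that may explain why the second element stays unchanged: `str.find` returns -1 when the substring is absent and 0 when it sits at the very start of the string, so a `> 0` test also rejects a match at position 0. 'CCCGGG' begins the control string, so it never triggers an update. A hedged variant that compares against -1 instead is sketched below; the name `mutations_fixed` and the decision to stop at the first matching site are assumptions, not part of the original answer.

```python
def mutations_fixed(mutated, clean, recognition):
    """Modify `mutated` in place; strands with no matching site are left alone."""
    for i, strand in enumerate(mutated):
        for rec in recognition:
            strand_idx = strand.find(rec)
            clean_idx = clean.find(rec)
            # find() returns -1 for "not found"; 0 is a valid match position
            if strand_idx != -1 and clean_idx != -1:
                mutated[i] = strand[:strand_idx] + clean[clean_idx:]
                break  # assumption: only the first matching site is used

dna = ['TGCAGAATTCGGTT', 'ACGTCCCGGGTTGC']
mutations_fixed(dna, 'CCCGGGGAATTCTCGC', ['GAATTC', 'CCCGGG'])
print(dna)  # ['TGCAGAATTCTCGC', 'ACGTCCCGGGGAATTCTCGC']
```

Because a missing site simply fails the `!= -1` test, a strand that contains none of the recognition sequences is not modified, which also covers the "sequence does not exist" case from the question.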

Python: remove() doesn't seem to work

I searched for a while but I can't find a solution to my problem. I'm still new to Python, so I sometimes struggle with obvious things... Thanks in advance for your advice!
I have a list containing objects and duplicates of these objects. Both have specific names: objects_ext and duplicatedObject_SREF_ext. If there is a duplicated object in my list, I want to check whether the original object is also in the list and, if it is, remove the duplicated object from the list.
I tried to use the remove() method, as there can only be one occurrence of each name in the list, but it doesn't work. Here is my code:
rawSelection = [u'crapacruk_high', u'doubidou_high', u'blahbli_high', u'crapacruk_SREF_high', u'doubidou_SREF_high', u'blahbli_SREF_high']
# objects with '_SREF_' in their names are the duplicated ones
for obj in rawSelection:
    if '_SREF_' in str(obj):
        rawName = str(obj).split('_')
        rootName = rawName[0]
        defName = rootName + '_' + '_'.join(rawName[2:])
        if defName in rawSelection:
            rawSelection.remove(obj)
# Always returns:
# [u'crapacruk_high', u'doubidou_high', u'blahbli_high', u'doubidou_SREF_high']
# Instead of:
# [u'crapacruk_high', u'doubidou_high', u'blahbli_high']
Edit: Oh, forgot to say that the duplicated object must be removed from list only if the original one is in it too.
The problem is that you're mutating the same list you're iterating over.
When you remove u'crapacruk_SREF_high' from the list, everything after it shifts one position to the left (this is done at the C source code level), so the slot the loop is currently on now holds u'doubidou_SREF_high'. When the loop then advances to the next index, obj becomes u'blahbli_SREF_high', and u'doubidou_SREF_high' is never visited.
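The skipping can be seen in a small, self-contained sketch. The list contents and the `visited` bookkeeping are made up for illustration; only the loop-and-remove pattern comes from the question.

```python
names = ['a_high', 'a_SREF_high', 'b_SREF_high', 'c_SREF_high']
visited = []
for obj in names:
    visited.append(obj)      # record what the loop actually sees
    if '_SREF_' in obj:
        names.remove(obj)    # shifts later items left under the loop
print(visited)  # ['a_high', 'a_SREF_high', 'c_SREF_high']
print(names)    # ['a_high', 'b_SREF_high']
```

'b_SREF_high' is never visited, so it survives even though it matches the removal condition.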
To fix this, iterate over a copy of the list:
for obj in rawSelection[:]:
    ...
You can turn the for loop from for obj in rawSelection: to for obj in list(rawSelection):. This should fix your issue as it iterates over the copy of the list. The way you do it, you modify the list while iterating over it, leading to problems.
rawSelection = [u'crapacruk_high', u'doubidou_high', u'blahbli_high', u'crapacruk_SREF_high', u'doubidou_SREF_high', u'blahbli_SREF_high']
for obj in list(rawSelection):
    if '_SREF_' in str(obj):
        rawName = str(obj).split('_')
        rootName = rawName[0]
        defName = rootName + '_' + '_'.join(rawName[2:])
        if defName in rawSelection:
            rawSelection.remove(obj)
print(rawSelection)
Break the problem up into subtasks
def get_orig_name(name):
    if '_SREF_' in name:
        return '_'.join(name.split('_SREF_'))
    else:
        return name
Then just construct a new list with no dups
rawSelection = [u'crapacruk_high',
                u'doubidou_high',
                u'blahbli_high',
                u'crapacruk_SREF_high',
                u'doubidou_SREF_high',
                u'blahbli_SREF_high']
uniqueList = [n for n in rawSelection if ('_SREF_' not in n) or
              (get_orig_name(n) not in rawSelection)]
print(uniqueList)
You could use filter to get quite a clean solution.
def non_duplicate(s):
    return not ('_SREF_' in s and s.replace('_SREF', '') in raw_selection)
filtered_selection = filter(non_duplicate, raw_selection)
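Note that on Python 3 `filter` returns a lazy iterator rather than a list, so wrapping it in `list()` is needed if a list is wanted. A runnable sketch of the filter approach follows; the sample data, including the orphan 'zorg_SREF_high' whose original is absent, is made up to show that such items are kept.

```python
raw_selection = ['crapacruk_high', 'doubidou_high',
                 'crapacruk_SREF_high', 'doubidou_SREF_high',
                 'zorg_SREF_high']  # hypothetical orphan: no 'zorg_high' present

def non_duplicate(s):
    # Drop s only when it is a duplicate AND its original is also selected.
    return not ('_SREF_' in s and s.replace('_SREF', '') in raw_selection)

# list() materializes the result on Python 3, where filter() is lazy.
filtered_selection = list(filter(non_duplicate, raw_selection))
print(filtered_selection)
# ['crapacruk_high', 'doubidou_high', 'zorg_SREF_high']
```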
This will do what you want (note that it doesn't matter what order the items appear in):
rawSelection = list({i.replace('_SREF', '') for i in rawSelection})
This works by iterating through the original list, and removing the '_SREF' substring from each item. Then each edited string object is added to a set comprehension (that's what the {} brackets mean: a new set object is being created). Then the set object is turned back into a list object.
This works because for set objects, you can't have duplicate items, so when an attempt is made to add a duplicate, it fails (silently). Note that the order of the original items is not preserved.
EDIT: as @PeterDeGlopper pointed out in the comments, this does not satisfy the constraint that the _SREF_ item gets removed only if the original appears. For that, we'll do the following:
no_SREF_Set = {i for i in rawSelection if '_SREF_' not in i}
rawSelection = list({i.replace('_SREF', '') if i.replace('_SREF', '') in no_SREF_Set else i for i in rawSelection})
You can combine this into a one-liner, but it's a little long for my taste:
rawSelection = list({i.replace('_SREF', '') if i.replace('_SREF', '') in {i for i in rawSelection if '_SREF_' not in i} else i for i in rawSelection})
This works by creating a set of the items that don't have '_SREF_', and then creating a new list (similar to the above) that strips '_SREF' from an item only if the stripped name appears in no_SREF_Set.

List comprehension break down, deconstruction and/or disassemble

In web2py I have been trying to break down this list comprehension so I can do what I like with the categories it creates. Any ideas as to what this breaks down to?
def menu_rec(items):
    return [(x.title,None,URL('shop', 'category',args=pretty_url(x.id, x.slug)),menu_rec(x.children)) for x in items or []]
In addition the following is what uses it:
response.menu = [(SPAN('Catalog', _class='highlighted'), False, '',
menu_rec(db(db.category).select().as_trees()) )]
So far I've come up with:
def menu_rec(items):
    for x in items:
        return (x.title,None,URL('shop', 'category',args=pretty_url(x.id, x.slug)),menu_rec(x.children))
I've got other variations of this but, every variation only gives me back 1(one) category, when compared to the original that gives me all the categories.
Can anyone see where I'm messing this up at? Any and all help is appreciated, thank you.
A list comprehension builds a list by appending:
def menu_rec(items):
    result = []
    for x in items or []:
        url = URL('shop', 'category', args=pretty_url(x.id, x.slug))
        menu = menu_rec(x.children)  # recursive call
        result.append((x.title, None, url, menu))
    return result
I've added two local variables to break up the long line somewhat, and to show how it recursively calls itself.
Your version returned directly out of the for loop, during the first iteration, and never built up a list.
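To check the equivalence outside web2py, here is a stripped-down sketch. `Node` is a hypothetical stand-in for a category row with only `title` and `children`, and the URL part of the tuple is left out; the point is only that the loop and the comprehension build the same nested list.

```python
from collections import namedtuple

# Hypothetical stand-in for a web2py category row; only the fields used here.
Node = namedtuple("Node", ["title", "children"])

def menu_comprehension(items):
    return [(x.title, menu_comprehension(x.children)) for x in items or []]

def menu_loop(items):
    result = []
    for x in items or []:          # "or []" handles children being None
        result.append((x.title, menu_loop(x.children)))
    return result                  # return only after the loop has finished

tree = [Node("Books", [Node("Fiction", None)]), Node("Music", None)]
print(menu_loop(tree))
# [('Books', [('Fiction', [])]), ('Music', [])]
```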
You don't want to do return. Instead append to a list and then return the list:
def menu_rec(items):
    result = []
    for x in items or []:
        result.append((x.title,None,URL('shop', 'category',args=pretty_url(x.id, x.slug)),menu_rec(x.children)))
    return result
If you return inside the loop, the function returns after only the first iteration. Instead, keep appending to a list and return that list at the end. This ensures that the result is returned only once all the values have been added, rather than returning a single value.

Python nested for loop: what am I doing wrong?

I am working with data pulled from a spreadsheet-like file. I am trying to find, for each "ligand", the item with the lowest corresponding "energy". To do this I'm trying to make a list of all the ligands I find in the file, and compare them to one another, using the index value to find the energy of each ligand, keeping the one with the lowest energy. However, the following loop is not working out for me. The program won't finish, it just keeps running until I cancel it manually. I'm assuming this is due to an error in the structure of my loop.
for item in ligandList:
    for i in ligandList:
        if ligandList.index(item) != ligandList.index(i):
            if ( item == i ) :
                if float(lineList[ligandList.index(i)][42]) < float(lineList[ligandList.index(item)][42]):
                    lineList.remove(ligandList.index(item))
                else:
                    lineList.remove(ligandList.index(i))
As you can see, I've created a separate ligandList containing the ligands, and am using the current index of that list to access the energy values in the lineList.
Does anyone know why this isn't working?
It is a bit hard to answer without some actual data to play with, but I hope this works, or at least leads you into the right direction:
for idx1, item1 in enumerate(ligandList):
    for idx2, item2 in enumerate(ligandList):
        if idx1 == idx2: continue
        if item1 != item2: continue
        if float(lineList[idx1][42]) < float(lineList[idx2][42]):
            del lineList[idx1]
        else:
            del lineList[idx2]
That’s a really inefficient way of doing things. Lots of index calls. It might just feel infinite because it’s slow.
Zip your related things together:
l = zip(ligandList, lineList)
Sort them by “ligand” and “energy”:
l = sorted(l, key=lambda t: (t[0], float(t[1][42])))
Grab the first (lowest) “energy” for each:
l = ((lig, next(lin)[1]) for lig, lin in itertools.groupby(l, key=lambda t: t[0]))
Yay.
result = ((lig, next(lin)[1]) for lig, lin in itertools.groupby(
    sorted(zip(ligandList, lineList), key=lambda t: (t[0], float(t[1][42]))),
    lambda t: t[0]
))
It would probably look more flattering if you made lineList contain classes of some kind.
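A self-contained sketch of the zip, sort, and group steps with made-up data. The rows are shortened, so the energy sits in a hypothetical column 2 instead of column 42, and the function name `lowest_energy_lines` is illustrative.

```python
import itertools

ENERGY_COL = 2  # hypothetical; the question's data uses column 42

ligand_list = ["ligA", "ligB", "ligA", "ligB", "ligC"]
line_list = [
    ["ligA", "x", "-5.0"],
    ["ligB", "x", "-7.5"],
    ["ligA", "x", "-9.1"],
    ["ligB", "x", "-2.0"],
    ["ligC", "x", "-3.3"],
]

def lowest_energy_lines(ligands, lines, energy_col):
    # Sort by ligand name, then by numeric energy, so that the first row
    # of each group is the lowest-energy row for that ligand.
    pairs = sorted(zip(ligands, lines),
                   key=lambda t: (t[0], float(t[1][energy_col])))
    return [(lig, next(group)[1])
            for lig, group in itertools.groupby(pairs, key=lambda t: t[0])]

best = lowest_energy_lines(ligand_list, line_list, ENERGY_COL)
for lig, line in best:
    print(lig, line[ENERGY_COL])
```

This builds a new list of (ligand, line) pairs rather than deleting from `lineList` while looping, which sidesteps the index-shifting problem entirely.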
It looks like you're trying to find the element in ligandList with the smallest value in index 42. Let's just do that....
min(ligandList, key=lambda x: float(x[42]))
If these "Ligands" are something you use regularly, STRONGLY consider writing a class wrapper for them, something like:
class Ligand(object):
    def __init__(self,lst):
        self.attr_name = lst[index_of_attr] # for each attribute
        ... # for each attribute
        ... # etc etc
        self.energy = float(lst[42])  # convert so comparisons are numeric
    def __str__(self):
        """This method defines what the class looks like if you call str() on
        it, e.g. a call to print(Ligand) will show this function's return value."""
        return "A Ligand with energy {}".format(self.energy) # or w/e
    def transmogfiscate(self,other):
        pass # replace this with whatever Ligands do, if they do things...
In which case you can simply create a list of the Ligands:
ligands = [Ligand(ligand) for ligand in ligandList]
and return the object with the smallest energy:
lil_ligand = min(ligands, key=lambda ligand: ligand.energy)
As a huge aside, PEP 8 recommends lowercase names with underscores (e.g. ligand_list) for variables, rather than the mixedCase that many other languages use.
