mutating dna in parallel lists python - python

def mutations (list_a,string1,name,list_b):
    """ (list of str, str, list of str, list of str) -> NoneType
    """
    dna=list_a
    for i in range(len(list_b)):
        strand=dna[:dna.index(list_b[i])]
        string1=string1[string1.index(list_b[i]):]
        dna[strand+string1]
>>> dna=['TGCAGAATTCGGTT','ACGTCCCGGGTTGC']
>>> mutations(dna,'CCCGGGGAATTCTCGC',['EcoRI','SmaI'],['GAATTC','CCCGGG'])
>>> dna
['TGCAGAATTCTCGC','ACGTCCCGGGGAATTCTCGC']
It's supposed to modify the first parameter. Basically, I'm trying to modify list_a so that it becomes ['TGCAGAATTCTCGC','ACGTCCCGGGGAATTCTCGC']. However, I get an error saying:
strand=dna[:dna.index(string1[i])].
ValueError: 'GAATTC' is not in list
Also, is there a way to leave the list unmodified if the sequence does not exist?

Well, if I understand you correctly, you want to check whether each element in list_a contains its corresponding element from list_b. If so, you want to modify the element from list_a by replacing the rest of the string (including the list_b element) with the part of a control string that also contains the element from list_b, right?
Ideally you would put this in your question!
A way of doing this would be as follows:
def mut(list_a, control, list_b):
    check_others = False
    for i in range(len(list_a)):  # run through list_a (use xrange in Python 2.x)
        if i == len(list_b):  # once we run past the end of list_b, we
            # check all elements of list_b instead (see below)
            check_others = True
        if not check_others:  # this is the normal 1-to-1 match
            if list_b[i] in list_a[i]:  # if the element from list_b is in it
                # correct the element
                list_a[i] = list_a[i][:list_a[i].index(list_b[i])] +\
                    control[control.index(list_b[i]):]
        else:  # this happens once we are past the end of list_b
            for j in range(len(list_b)):  # run through list_b from the start
                if list_b[j] in list_a[i]:
                    list_a[i] = list_a[i][:list_a[i].index(list_b[j])] +\
                        control[control.index(list_b[j]):]
                    break  # only the first match with an element in list_b is used!
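For the simple case where both lists have the same length, the 1-to-1 part of the approach above can be sketched more compactly. This is only an illustration: the name splice_in_place is made up here, and the extra check that the site also occurs in the control string is an added assumption so that nothing raises ValueError.

```python
def splice_in_place(strands, control, sites):
    """Modify strands in place; strands without their site are left alone.

    splice_in_place is a hypothetical helper name used for illustration.
    """
    for i, (strand, site) in enumerate(zip(strands, sites)):
        # Only splice when the site occurs in both the strand and the control.
        if site in strand and site in control:
            # Keep the strand up to the site, then append the control
            # string from the site onward.
            strands[i] = strand[:strand.index(site)] + control[control.index(site):]


dna = ['TGCAGAATTCGGTT', 'ACGTCCCGGGTTGC']
splice_in_place(dna, 'CCCGGGGAATTCTCGC', ['GAATTC', 'CCCGGG'])
print(dna)  # ['TGCAGAATTCTCGC', 'ACGTCCCGGGGAATTCTCGC']

untouched = ['AAAA', 'TTTT']
splice_in_place(untouched, 'CCCGGGGAATTCTCGC', ['GAATTC', 'CCCGGG'])
print(untouched)  # ['AAAA', 'TTTT']
```

Because the membership test comes first, a strand that lacks its recognition site is simply skipped, which covers the "don't modify if the sequence does not exist" part of the question.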

As described in the comments, dna is a list, not a string, so finding a substring won't quite work how you want.
dna=list_a is unnecessary, and dna[strand+string1] doesn't modify a list, so not sure what you were trying to accomplish there.
All in all, I know the following code doesn't get the output you are expecting (or maybe it does), but hopefully it sets you on the more correct path.
(I removed name because it was not used)
def mutations (mutated,clean,recognition):
    """ (list of str, str, list of str) -> NoneType
    """
    # Loop over 'mutated' list. We need the index to update the list
    for i,strand in enumerate(mutated):
        # Loop over 'recognition' list
        for rec in recognition:
            # Find the indices in the two strings
            strand_idx = strand.find(rec)
            clean_idx = clean.find(rec)
            # Check that 'rec' existed in both strings
            # (note: find() returns -1 on a miss, so '> 0' also rejects a match at index 0)
            if strand_idx > 0 and clean_idx > 0:
                # both are found, so get the substrings
                strand_str = strand[:strand_idx]
                clean_str = clean[clean_idx:]
                # debug these values
                print(rec, (strand_idx, strand_str,), (clean_idx, clean_str, ))
                # updated 'mutated' like this
                mutated[i] = strand_str+clean_str
And, the output. (The first dna element changed, the second did not)
dna=['TGCAGAATTCGGTT','ACGTCCCGGGTTGC']
mutations(dna,'CCCGGGGAATTCTCGC',['GAATTC','CCCGGG'])
print(dna) # ['TGCAGAATTCTCGC', 'ACGTCCCGGGTTGC']
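The second strand stays unchanged in the output above because 'CCCGGG' sits at index 0 of the clean string, and the test clean_idx > 0 rejects index 0. Since str.find returns -1 for a miss, comparing against -1 accepts a match at the very start. The variant below is a sketch of that one change; the name mutations_fixed and the break after the first applied recognition site are choices made for this illustration, not part of the answer.

```python
def mutations_fixed(mutated, clean, recognition):
    """Sketch: same idea as above, but a match at index 0 counts as found."""
    for i, strand in enumerate(mutated):
        for rec in recognition:
            strand_idx = strand.find(rec)
            clean_idx = clean.find(rec)
            # find() returns -1 when the substring is absent
            if strand_idx != -1 and clean_idx != -1:
                mutated[i] = strand[:strand_idx] + clean[clean_idx:]
                break  # assumption: apply only the first matching site per strand


dna = ['TGCAGAATTCGGTT', 'ACGTCCCGGGTTGC']
mutations_fixed(dna, 'CCCGGGGAATTCTCGC', ['GAATTC', 'CCCGGG'])
print(dna)  # ['TGCAGAATTCTCGC', 'ACGTCCCGGGGAATTCTCGC']
```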

Related

Segregate the list based on condition that starts with same pattern `string`

I have the list below, which I would like to split so that all strings starting with the same prefix end up in a new list.
E.g.:
list1 = ["glibc-2.11.3/include/sys/file.h", "glibc-2.11.3/include/sys/ioctl.h", "glibc-2.11.3/lib/crtn.o", "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h" , "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h", "test-3.7.10/asm/posix_types.h", "test-3.7.10/dsm/posix_types.h"]
Here is my attempt:
list1 = ["glibc-2.11.3/include/sys/file.h", "glibc-2.11.3/include/sys/ioctl.h", "glibc-2.11.3/lib/crtn.o", "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h" , "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h"]
element = list1[0].split("/")[0]
newlist = []
for i in list1:
    if i.startswith(element):
        newlist.append(i)
print(newlist)
Output: ['glibc-2.11.3/include/sys/file.h', 'glibc-2.11.3/include/sys/ioctl.h', 'glibc-2.11.3/lib/crtn.o']
I get the first set of paths that start with the same string. I need to loop over the remaining sets.
Basically, on the first iteration I expect to get all paths that start with glibc-2.11.3, on the second iteration all paths that start with linux-libc-headers-2.6.32, and so on. I need to perform some checks on each set of paths sharing the same prefix. Please help!
Use a dictionary to keep track of your filepaths
list1 = ["glibc-2.11.3/include/sys/file.h", "glibc-2.11.3/include/sys/ioctl.h", "glibc-2.11.3/lib/crtn.o", "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h" , "linux-libc-headers-2.6.32/asm-generic/bitsperlong.h", "test-3.7.10/asm/posix_types.h", "test-3.7.10/dsm/posix_types.h"]
directories = {}
for filepath in list1:
    key = filepath.split("/")[0]
    directories.setdefault(key, []).append(filepath)
print(directories)
Outputs:
{'glibc-2.11.3': ['glibc-2.11.3/include/sys/file.h',
'glibc-2.11.3/include/sys/ioctl.h',
'glibc-2.11.3/lib/crtn.o'],
'linux-libc-headers-2.6.32': ['linux-libc-headers-2.6.32/asm-generic/bitsperlong.h',
'linux-libc-headers-2.6.32/asm-generic/bitsperlong.h'],
'test-3.7.10': ['test-3.7.10/asm/posix_types.h',
'test-3.7.10/dsm/posix_types.h']}
list(directories.values()) would give you the list of lists you were trying to create (list(directories.items()) gives (prefix, paths) pairs), but instead of doing that you can just loop over directories.items() directly.
directories.setdefault(key, []) is a quirky way of saying: give me the list stored under this dictionary key, or, if there is no list there yet, create a new one, save it in the dictionary under this key, and then give me that. See the documentation for dict.setdefault.
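As an alternative to setdefault, the same grouping can be sketched with collections.defaultdict, followed by the per-group loop the question asks for. The shortened paths list and the variable names below are illustrative assumptions.

```python
from collections import defaultdict

# Shortened sample of the paths from the question (illustrative only).
paths = [
    "glibc-2.11.3/include/sys/file.h",
    "glibc-2.11.3/lib/crtn.o",
    "test-3.7.10/asm/posix_types.h",
]

groups = defaultdict(list)  # a missing key starts out as an empty list
for path in paths:
    prefix = path.split("/", 1)[0]
    groups[prefix].append(path)

# One iteration per prefix; 'members' holds every path sharing that prefix.
for prefix, members in groups.items():
    print(prefix, len(members))
```

Each pass of the final loop receives one complete set of paths, so any per-group check can go in its body.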

Manipulate the output of if any for loop

I need to compare the following sequences list items:
sequences = ['sphere_v002_', 'sphere_v002_0240_', 'test_single_abc_f401']
to:
folder = 'sphere_v002'
and then work on the list items containing folder.
I have a working function for this but I want to improve it.
Current code is:
foundSeq = False
for seq in sequences:
    headName = os.path.splitext(seq.head())[0]
    # Check name; added exception for when name has a trailing underscore
    if headName == folder or headName[:-1] == folder:
        foundSeq = True
        sequence = seq
if not foundSeq:
    ...
My improvement looks like this:
if any(folder in os.path.splitext(seq.head())[0] for seq in sequences):
    print(seq)
But then I get the following error:
local variable seq referenced before the assignment
How can I get the correct output working with the improved solution?
any returns only a Boolean value; it does not store the element of sequences that satisfied your condition in a variable seq.
What you can do is use a generator that yields only the matching elements:
def get_seq(sequences, folder):
    for seq in sequences:
        if folder in os.path.splitext(seq.head())[0]:
            yield seq
for seq in get_seq(sequences, folder):
    print(seq)
You can rewrite this, if you wish, as a generator expression:
for seq in (i for i in sequences if folder in os.path.splitext(i.head())[0]):
    print(seq)
If the condition is never satisfied, the generator or generator expression will not yield any values and the logic within your loop will not be executed.
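If only the first matching element is needed, the built-in next with a default value is another option. The sketch below assumes the sequences are plain strings (the seq.head() objects from the question are not reproduced here), and first_match is a hypothetical helper name.

```python
import os


def first_match(names, folder):
    """Return the first name whose stem contains folder, or None if none does."""
    return next((n for n in names if folder in os.path.splitext(n)[0]), None)


sequences = ['sphere_v002_', 'sphere_v002_0240_', 'test_single_abc_f401']
match = first_match(sequences, 'sphere_v002')
if match is not None:
    print(match)  # sphere_v002_
```

The second argument to next is returned when the generator is exhausted, so no StopIteration is raised when nothing matches.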
As pointed out by jpp, any just returns a boolean, so if any is not a good solution in this particular case.
As suggested by thebjorn, the most efficient code for us so far uses the filter function.
sequences = ['sphere_v002_', 'sphere_v002_0240_', 'test_single_abc_f401']
match = list(filter(lambda x: 'sphere_v002' == x[:-1] or 'sphere_v002' == x, sequences))
print(match)
['sphere_v002_']

Python: remove() doesn't seems to work

I searched for a while but I can't find a solution to my problem. I'm still new to Python, so I sometimes struggle with obvious things... Thanks in advance for your advice!
I have a list containing objects and duplicates of these objects. Both have specific names: objects_ext and duplicatedObject_SREF_ext. If there is a duplicated object in my list, I want to check whether the original object is also in the list and, if it is, remove the duplicated object from the list.
I tried to use the remove() method, since there can only be one occurrence of each name in the list, but it doesn't work. Here is my code:
rawSelection = [u'crapacruk_high', u'doubidou_high', u'blahbli_high', u'crapacruk_SREF_high', u'doubidou_SREF_high', u'blahbli_SREF_high']
# objects with '_SREF_' in their names are the duplicated ones
for obj in rawSelection:
    if '_SREF_' in str(obj):
        rawName = str(obj).split('_')
        rootName = rawName[0]
        defName = rootName + '_' + '_'.join(rawName[2:])
        if defName in rawSelection:
            rawSelection.remove(obj)
# Always returns:
# [u'crapacruk_high', u'doubidou_high', u'blahbli_high', u'doubidou_SREF_high']
# Instead of:
# [u'crapacruk_high', u'doubidou_high', u'blahbli_high']
Edit: Oh, forgot to say that the duplicated object must be removed from list only if the original one is in it too.
The problem is that you're mutating the same list you're iterating over.
When you remove u'crapacruk_SREF_high' from the list, everything after it shifts one position to the left (this is done at the C source code level), so u'doubidou_SREF_high' now sits at the index the loop has just visited. When the loop advances to the next index, obj becomes u'blahbli_SREF_high', and u'doubidou_SREF_high' is never examined.
To fix this, iterate over a copy of the list:
for obj in rawSelection[:]:
    ...
You can change the for loop from for obj in rawSelection: to for obj in list(rawSelection):. This fixes the issue because the loop iterates over a copy of the list, while the removals happen on the original. The way you do it, you modify the list while iterating over it, which leads to skipped elements.
rawSelection = [u'crapacruk_high', u'doubidou_high', u'blahbli_high', u'crapacruk_SREF_high', u'doubidou_SREF_high', u'blahbli_SREF_high']
for obj in list(rawSelection):
    if '_SREF_' in str(obj):
        rawName = str(obj).split('_')
        rootName = rawName[0]
        defName = rootName + '_' + '_'.join(rawName[2:])
        if defName in rawSelection:
            rawSelection.remove(obj)
print(rawSelection)
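The skipping effect is easy to reproduce on a tiny list. The sample data and variable names below are made up for illustration; the visited list records which elements the loop actually saw.

```python
names = ['a', 'b_dup', 'c_dup', 'd_dup']  # hypothetical sample data

# Removing while iterating over the same list: 'c_dup' is never visited.
broken = list(names)
visited = []
for name in broken:
    visited.append(name)
    if name.endswith('_dup'):
        broken.remove(name)
print(visited)  # ['a', 'b_dup', 'd_dup']
print(broken)   # ['a', 'c_dup']

# Iterating over a copy: every element is visited and removed as intended.
fixed = list(names)
for name in fixed[:]:
    if name.endswith('_dup'):
        fixed.remove(name)
print(fixed)    # ['a']
```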
Break the problem up into subtasks
def get_orig_name(name):
    if '_SREF_' in name:
        return '_'.join(name.split('_SREF_'))
    else:
        return name
Then just construct a new list with no duplicates:
rawSelection = [u'crapacruk_high',
                u'doubidou_high',
                u'blahbli_high',
                u'crapacruk_SREF_high',
                u'doubidou_SREF_high',
                u'blahbli_SREF_high']
uniqueList = [n for n in rawSelection if ('_SREF_' not in n) or
              (get_orig_name(n) not in rawSelection)]
print(uniqueList)
You could use filter to get quite a clean solution.
def non_duplicate(s):
    return not ('_SREF_' in s and s.replace('_SREF', '') in raw_selection)
filtered_selection = list(filter(non_duplicate, raw_selection))
This will do what you want (note that it doesn't matter what order the items appear in):
rawSelection = list({i.replace('_SREF', '') for i in rawSelection})
This works by iterating through the original list and removing the '_SREF' substring from each item. Each edited string is then collected by a set comprehension (that's what the {} brackets mean: a new set object is being created), and the set is turned back into a list.
This works because set objects can't contain duplicate items, so an attempt to add a duplicate is silently ignored. Note that the order of the original items is not preserved.
EDIT: as @PeterDeGlopper pointed out in the comments, this does not satisfy the constraint that the _SREF_ item gets removed only if the original appears. For that, we'll do the following:
no_SREF_Set = {i for i in rawSelection if '_SREF_' not in i}
rawSelection = list({i.replace('_SREF', '') if i.replace('_SREF', '') in no_SREF_Set else i for i in rawSelection})
You can combine this into a one-liner, but it's a little long for my taste:
rawSelection = list({i.replace('_SREF', '') if i.replace('_SREF', '') in {i for i in rawSelection if '_SREF_' not in i} else i for i in rawSelection})
This works by creating a set of the items that don't contain '_SREF_', and then building a new list (as above) that strips '_SREF' from an item only if the resulting name appears in no_SREF_Set.
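If the original order matters, a set can still be used purely for fast membership tests while a list comprehension keeps the items in place. This is a sketch under the assumption that the duplicate marker is exactly '_SREF'; drop_duplicates is a hypothetical name.

```python
def drop_duplicates(names, marker='_SREF'):
    """Drop a marked name only when its unmarked original is also present."""
    present = set(names)  # set lookup instead of scanning the list each time
    return [n for n in names
            if marker not in n or n.replace(marker, '', 1) not in present]


raw = ['crapacruk_high', 'doubidou_high', 'blahbli_high',
       'crapacruk_SREF_high', 'doubidou_SREF_high', 'blahbli_SREF_high']
print(drop_duplicates(raw))
# ['crapacruk_high', 'doubidou_high', 'blahbli_high']

# A duplicate whose original is missing is kept.
print(drop_duplicates(['a_high', 'b_SREF_high']))
# ['a_high', 'b_SREF_high']
```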

List comprehension break down, deconstruction and/or disassemble

In web2py I have been trying to break down this list comprehension so I can do what I like with the categories it creates. Any ideas as to what this breaks down to?
def menu_rec(items):
    return [(x.title,None,URL('shop', 'category',args=pretty_url(x.id, x.slug)),menu_rec(x.children)) for x in items or []]
In addition the following is what uses it:
response.menu = [(SPAN('Catalog', _class='highlighted'), False, '',
                  menu_rec(db(db.category).select().as_trees()))]
So far I've come up with:
def menu_rec(items):
    for x in items:
        return (x.title,None,URL('shop', 'category',args=pretty_url(x.id, x.slug)),menu_rec(x.children))
I've got other variations of this but, every variation only gives me back 1(one) category, when compared to the original that gives me all the categories.
Can anyone see where I'm messing this up at? Any and all help is appreciated, thank you.
A list comprehension builds a list by appending:
def menu_rec(items):
    result = []
    for x in items or []:
        url = URL('shop', 'category', args=pretty_url(x.id, x.slug))
        menu = menu_rec(x.children)  # recursive call
        result.append((x.title, None, url, menu))
    return result
I've added two local variables to break up the long line somewhat, and to show how it recursively calls itself.
Your version returned directly out of the for loop, during the first iteration, and never built up a list.
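The equivalence between the comprehension and the append loop can be checked outside web2py with a stand-in tree. Everything here is a simplified assumption: Node is a hypothetical class replacing the category rows, and the tuples omit the URL and None fields.

```python
class Node:
    """Hypothetical stand-in for a category row with optional children."""

    def __init__(self, title, children=None):
        self.title = title
        self.children = children


def menu_comprehension(items):
    # 'items or []' turns a missing children list (None) into an empty loop
    return [(x.title, menu_comprehension(x.children)) for x in items or []]


def menu_loop(items):
    result = []
    for x in items or []:
        result.append((x.title, menu_loop(x.children)))  # append one tuple
    return result  # returned only after every item has been processed


tree = [Node('Books', [Node('Fiction'), Node('Science')]), Node('Music')]
print(menu_loop(tree))
# [('Books', [('Fiction', []), ('Science', [])]), ('Music', [])]
```

Both functions walk every item at every level, which is what a return inside the loop body prevents.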
You don't want to return from inside the loop. Instead, append to a list and then return the list:
def menu_rec(items):
    result = []
    for x in items or []:
        result.append((x.title,None,URL('shop', 'category',args=pretty_url(x.id, x.slug)),menu_rec(x.children)))
    return result
If you do return, it will return the value after only the first iteration. Instead, keep adding to a list and return that list at the end. This ensures that the result is returned only once all the values have been added, instead of returning just one value.

How to compare an element of a tuple (int) to determine if it exists in a list

I have the two following lists:
# List of tuples representing the index of resources and their unique properties
# Format of (ID,Name,Prefix)
resource_types=[('0','Group','0'),('1','User','1'),('2','Filter','2'),('3','Agent','3'),('4','Asset','4'),('5','Rule','5'),('6','KBase','6'),('7','Case','7'),('8','Note','8'),('9','Report','9'),('10','ArchivedReport',':'),('11','Scheduled Task',';'),('12','Profile','<'),('13','User Shared Accessible Group','='),('14','User Accessible Group','>'),('15','Database Table Schema','?'),('16','Unassigned Resources Group','#'),('17','File','A'),('18','Snapshot','B'),('19','Data Monitor','C'),('20','Viewer Configuration','D'),('21','Instrument','E'),('22','Dashboard','F'),('23','Destination','G'),('24','Active List','H'),('25','Virtual Root','I'),('26','Vulnerability','J'),('27','Search Group','K'),('28','Pattern','L'),('29','Zone','M'),('30','Asset Range','N'),('31','Asset Category','O'),('32','Partition','P'),('33','Active Channel','Q'),('34','Stage','R'),('35','Customer','S'),('36','Field','T'),('37','Field Set','U'),('38','Scanned Report','V'),('39','Location','W'),('40','Network','X'),('41','Focused Report','Y'),('42','Escalation Level','Z'),('43','Query','['),('44','Report Template ','\\'),('45','Session List',']'),('46','Trend','^'),('47','Package','_'),('48','RESERVED','`'),('49','PROJECT_TEMPLATE','a'),('50','Attachments','b'),('51','Query Viewer','c'),('52','Use Case','d'),('53','Integration Configuration','e'),('54','Integration Command','f'),('55','Integration Target','g'),('56','Actor','h'),('57','Category Model','i'),('58','Permission','j')]
# This is a list of resource ID's that we do not want to reference directly, ever.
unwanted_resource_types=[0,1,3,10,11,12,13,14,15,16,18,20,21,23,25,27,28,32,35,38,41,47,48,49,50,57,58]
I'm attempting to compare the two in order to build a third list containing the 'Name' of each unique resource type that currently exists in unwanted_resource_types. e.g. The final result list should be:
result = ['Group','User','Agent','ArchivedReport','ScheduledTask','...','...']
I've tried the following that (I thought) should work:
result = []
for res in resource_types:
    if res[0] in unwanted_resource_types:
        result.append(res[1])
and when that failed to populate result I also tried:
result = []
for res in resource_types:
    for type in unwanted_resource_types:
        if res[0] == type:
            result.append(res[1])
also to no avail. Is there something I'm missing? I believe this would be the right place to use a list comprehension, but that's still in my grey basket of understanding (the Python docs are a bit too succinct for me in this case).
I'm also open to completely rethinking this problem, but I do need to retain the list of tuples as it's used elsewhere in the script. Thank you for any assistance you may provide.
Your resource types are using strings, and your unwanted resources are using ints, so you'll need to do some conversion to make it work.
Try this:
result = []
for res in resource_types:
    if int(res[0]) in unwanted_resource_types:
        result.append(res[1])
or using a list comprehension:
result = [item[1] for item in resource_types if int(item[0]) in unwanted_resource_types]
The numbers in resource_types are numbers contained within strings, whereas the numbers in unwanted_resource_types are plain numbers, so your comparison is failing. This should work:
result = []
for res in resource_types:
    if int(res[0]) in unwanted_resource_types:
        result.append(res[1])
The problem is that your triples contain strings and your unwanted resources contain numbers. Either change the data to
resource_types=[(0,'Group','0'), ...
or use int() to convert the strings to ints before the comparison, and it should work. Your result can be computed with a list comprehension:
result=[rt[1] for rt in resource_types if int(rt[0]) in unwanted_resource_types]
If you change ('0', ...) into (0, ... you can leave out the int() call.
Additionally, you may change the unwanted_resource_types variable into a set, like
unwanted_resource_types=set([0,1,3, ... ])
to speed up the membership test (only worth doing if speed is an issue).
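Putting the int() conversion and the set together, a minimal sketch on a shortened sample of the data might look like this. The four sample tuples and the name unwanted_ids are chosen for illustration only.

```python
# Shortened sample of (ID, Name, Prefix) tuples; IDs are strings as in the question.
resource_types = [('0', 'Group', '0'), ('1', 'User', '1'),
                  ('2', 'Filter', '2'), ('10', 'ArchivedReport', ':')]

# A set gives constant-time membership tests; the IDs here are ints.
unwanted_ids = {0, 1, 10}

# Convert each string ID to int before testing membership.
result = [name for res_id, name, _prefix in resource_types
          if int(res_id) in unwanted_ids]
print(result)  # ['Group', 'User', 'ArchivedReport']
```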
The one-liner:
result = list(map(lambda x: dict(map(lambda a: (int(a[0]), a[1]), resource_types))[x], unwanted_resource_types))
without any explicit loop does the job.
Ok - you don't want to use this in production code - but it's fun. ;-)
Comment:
The inner dict(map(lambda a: (int(a[0]), a[1]), resource_types)) creates a dictionary from the input data:
{0: 'Group', 1: 'User', 2: 'Filter', 3: 'Agent', ...
The outer map chooses the names from the dictionary.
