Confused by output from simple Python loop - python

I have a list of items and want to remove all items containing the number 16 (here a string, ind='16'). The list has six items with '16' in and yet my loop consistently only removes five of them. Puzzled (I even ran it on two separate machines)!
lines=['18\t4', '8\t5', '16\t5', '19\t6', '15\t7', '5\t8', '16\t8', '21\t8', '20\t12', '22\t13', '7\t15', '5\t16', '8\t16', '21\t16', '4\t18', '6\t19', '12\t20', '8\t21', '16\t21', '13\t22']
ind='16'
for query in lines:
    if ind in query:
        lines.remove(query)
Subsequently, typing 'lines' gives me: ['18\t4', '8\t5', '19\t6', '15\t7', '5\t8', '21\t8', '20\t12', '22\t13', '7\t15', '8\t16', '4\t18', '6\t19', '12\t20', '8\t21', '13\t22']
i.e. the item '8\t16' is still in the list???
Thank you
Clive

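It's a bad idea to modify the list you are iterating over: when an item is removed, every later item shifts one position to the left, but the iterator's internal index still advances, so the item immediately after a removed one is never examined. Here '8\t16' sits right after '5\t16', which is why it survives. Your example can be handled with a simple list comprehension, which creates a new list and assigns it to the original name.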
lines = [query for query in lines if ind not in query]
Or, use the filter function:
# In Python 2, you can omit the list() wrapper
lines = list(filter(lambda x: ind not in x, lines))
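To see why exactly one item survives, it can help to trace the loop on a shorter list. The sketch below uses a shortened list and two made-up helper names (`remove_while_iterating`, `remove_with_copy`), neither of which comes from the question. It shows that the item right after a removed one is never tested, and that iterating over a copy avoids the problem.

```python
def remove_while_iterating(items, needle):
    """Buggy on purpose: mutates the list it is iterating over."""
    items = list(items)  # work on a copy so the caller's list is untouched
    for item in items:
        if needle in item:
            items.remove(item)  # later items shift left; the next one is skipped
    return items


def remove_with_copy(items, needle):
    """Iterate over a snapshot (items[:]) while mutating the original."""
    items = list(items)
    for item in items[:]:
        if needle in item:
            items.remove(item)
    return items


sample = ['7\t15', '5\t16', '8\t16', '21\t16', '4\t18']
buggy = remove_while_iterating(sample, '16')
fixed = remove_with_copy(sample, '16')
print(buggy)  # '8\t16' is still there
print(fixed)  # all three items containing '16' are gone
```

The list comprehension is usually the simpler choice; the copy-based loop is only worth it when the original list object must be modified in place.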

Note: never modify a list while looping over it. Build a new list instead:
lines=['18\t4', '8\t5', '16\t5', '19\t6', '15\t7', '5\t8', '16\t8', '21\t8', '20\t12', '22\t13', '7\t15', '5\t16', '8\t16', '21\t16', '4\t18', '6\t19', '12\t20', '8\t21', '16\t21', '13\t22']
ind = '16'
new_lines = [ x for x in lines if ind not in x ]

Related

Printing Items From a Weaved List in a For Loop

I am trying to write a short python script for use in a genome assembly. I have generated a long list of compressed files, that are listed alphabetically into List1 and List2.
List1=[R1_aa.fq.gz, R1_ab.fq.gz, R1_ac.fq.gz]
List2=[R2_aa.fq.gz, R2_ab.fq.gz, R2_ac.fq.gz]
Both lists go all the way to R1/2_bv.fq.gz. The first part of my script needs to generate pe# for as many items as there are in each list. For the example List1 and List2 above, that would be pe1 pe2 pe3. This is easily done, and that part of my script works. My problem is in the second half, where I need to generate text that says where to locate the files in the lists. For instance:
pe1="Path/To_File/R1.aa.fq.gz Path/To_File/R2.aa.fq.gz", pe2="Path/To_File/R1.ab.fq.gz Path/To_File/R2.ab.fq.gz", and so on.
Below is a portion of my script.
List1 = file1in.read().split('\n')
List2 = file2in.read().split('\n')
CombinedList = []
for i,j in zip(List1, List2):
    CombinedList.append([i,j])
for i in range(len(CombinedList)//2):
    print("pe"+str(i+1), file=Output)
for i in range(len(CombinedList)//2):
    print("pe"+str(i+1)+'="'f'{Path}/To_List1/'+CombinedList[i]+" "+f'{Path}/To_list2/'+CombinedList[i+1], file=Output)
exit()
What I am instead getting as an output is the following:
pe1="/Users/devonboland/Desktop/Test RGA/PU_L/PairedUnmapped_R1_split_aa.fq.gz /Users/devonboland/Desktop/Test RGA/PU_R/PairedUnmapped_R2_split_aa.fq.gz"
pe2="/Users/devonboland/Desktop/Test RGA/PU_L/PairedUnmapped_R2_split_aa.fq.gz /Users/devonboland/Desktop/Test RGA/PU_R/PairedUnmapped_R1_split_ab.fq.gz"
pe3="/Users/devonboland/Desktop/Test RGA/PU_L/PairedUnmapped_R1_split_ab.fq.gz /Users/devonboland/Desktop/Test RGA/PU_R/PairedUnmapped_R2_split_ab.fq.gz"
I have been at this small script for over 2 weeks now and have gotten nowhere fast, I would appreciate any help offered!
Based on comments and some lucky guessing, this is what you seem to be looking for.
List1 = ["R1_aa.fq.gz", "R1_ab.fq.gz", "R1_ac.fq.gz"]
List2 = ["R2_aa.fq.gz", "R2_ab.fq.gz", "R2_ac.fq.gz"]
# No need to .append, just create the list
CombinedList = list(zip(List1, List2))
# Apparently
Path = "/Users/devonboland/Desktop/Test RGA"
PeList = []
for i in range(len(CombinedList)):
    PeList.append(
        f'pe{i+1}="{Path}/PU_L/{CombinedList[i][0]}'
        f' {Path}/PU_R/{CombinedList[i][1]}"')
print(", ".join(PeList))
Notice how the items of CombinedList are pairs of items, and the length of the list is simply the number of pairs. Notice also how you can use braces inside an f-string to refer to variables, like you were already doing in some places but not in others. And of course there's no need to exit() at the end of a script; Python will naturally stop executing it when it reaches the end.
... Actually there is no need to zip the lists into pairs, just loop over one and fetch items at the same index from the other in the same loop.
for i in range(len(List1)):
    PeList.append(
        f'pe{i+1}="{Path}/PU_L/{List1[i]}'
        f' {Path}/PU_R/{List2[i]}"')
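The index-based loop could also be written with `zip` and `enumerate`, which avoids indexing altogether. This is only a sketch: the `PU_L` and `PU_R` directory names and the `Path` value are taken from the sample output in the question, and the three file names are the shortened ones used in the answer.

```python
List1 = ["R1_aa.fq.gz", "R1_ab.fq.gz", "R1_ac.fq.gz"]
List2 = ["R2_aa.fq.gz", "R2_ab.fq.gz", "R2_ac.fq.gz"]
Path = "/Users/devonboland/Desktop/Test RGA"

# enumerate(..., start=1) numbers the pairs pe1, pe2, ...;
# zip pairs each R1 file with the R2 file at the same position.
PeList = [
    f'pe{n}="{Path}/PU_L/{left} {Path}/PU_R/{right}"'
    for n, (left, right) in enumerate(zip(List1, List2), start=1)
]
print(", ".join(PeList))
```

Note that `zip` stops at the shorter list, so a trailing empty string in only one of the lists (for example from `split('\n')`) would be dropped silently.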

Grouping Similar Strings in a long string [Python]

I have strings of four or five kinds, each carrying a sequence number, and I want to group them into lists by kind.
For example:
cake/1/
cake/2/
big/1/
nice/1/
cake/3/
I need the cakes in one list, the big in another and the nice in a third.
Here's what I've tried:
res = [list(i) for j, i in groupby(y, lambda a: a.split('/')[0])]
This didn't work. I thought of using regex, but I'm not sure that's the right way forward.
Here's the expected output
[['cake/1/', 'cake/2/', 'cake/3/'], ['big/1/'], ['nice/1/']]
You were nearly right.
groupby starts a new group every time the key changes:
1112111 will be grouped as: 111 - 2 - 111
So, to guarantee that all items with the same key end up in one group, sort your list of strings first, so that strings with the same first word are adjacent and are not split up by strings with a different first word:
from itertools import groupby

y = [
    'cake/1/',
    'cake/2/',
    'big/1/',
    'nice/1/',
    'cake/3/'
]
# sorted(y) is the fix: it places strings with equal prefixes next to each other
res = [list(i) for j, i in groupby(sorted(y), lambda a: a.split('/')[0])]
print(res)
Output:
[['big/1/'], ['cake/1/', 'cake/2/', 'cake/3/'], ['nice/1/']]
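Sorting changes the order of the groups (big now comes first), whereas the expected output in the question lists cake first. If first-seen order matters, a plain dict may be a simpler tool than groupby. The following is a sketch under that assumption; `group_by_prefix` is a made-up helper name, not something from the question or the answer.

```python
def group_by_prefix(strings, sep="/"):
    """Group strings by the text before the first separator, keeping first-seen order."""
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for s in strings:
        key = s.split(sep)[0]
        groups.setdefault(key, []).append(s)
    return list(groups.values())


y = ['cake/1/', 'cake/2/', 'big/1/', 'nice/1/', 'cake/3/']
res = group_by_prefix(y)
print(res)
```

No sorting is needed here, because items are collected by key regardless of where they appear in the input.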

How to filter elements of Cartesian product following specific ordering conditions

I have to generate multiple reactions with different variables. They have 3 elements. Let's call them B, S and H. And they all start with B1. S can be appended to the element if there is at least one B. So it can be B1S1 or B2S2 or B2S1 etc... but not B1S2. The same goes for H. B1S1H1 or B2S2H1 or B4S1H1 but never B2S2H3. The final variation would be B5S5H5. I tried with itertools.product. But I don't know how to get rid of the elements that don't match my condition and how to add the next element. Here is my code:
import itertools
a = list(itertools.product([1, 2, 3, 4], repeat=4))
#print (a)
met = open('random_dat.dat', 'w')
met.write('Reactions')
met.write('\n')
for i in range(1,256):
    met.write('\n')
    met.write('%s: B%sS%sH%s -> B%sS%sH%s' %(i, a[i][3], a[i][2], a[i][1], a[i][3], a[i][2], a[i][1]))
    met.write('\n')
met.close()
Simple for loops will do what you want:
bsh = []
for b in range(1,6):
    for s in range(1,b+1):
        for h in range(1,b+1):
            bsh.append( f"B{b}S{s}H{h}" )
print(bsh)
Output:
['B1S1H1', 'B2S1H1', 'B2S1H2', 'B2S2H1', 'B2S2H2', 'B3S1H1', 'B3S1H2', 'B3S1H3',
'B3S2H1', 'B3S2H2', 'B3S2H3', 'B3S3H1', 'B3S3H2', 'B3S3H3', 'B4S1H1', 'B4S1H2',
'B4S1H3', 'B4S1H4', 'B4S2H1', 'B4S2H2', 'B4S2H3', 'B4S2H4', 'B4S3H1', 'B4S3H2',
'B4S3H3', 'B4S3H4', 'B4S4H1', 'B4S4H2', 'B4S4H3', 'B4S4H4', 'B5S1H1', 'B5S1H2',
'B5S1H3', 'B5S1H4', 'B5S1H5', 'B5S2H1', 'B5S2H2', 'B5S2H3', 'B5S2H4', 'B5S2H5',
'B5S3H1', 'B5S3H2', 'B5S3H3', 'B5S3H4', 'B5S3H5', 'B5S4H1', 'B5S4H2', 'B5S4H3',
'B5S4H4', 'B5S4H5', 'B5S5H1', 'B5S5H2', 'B5S5H3', 'B5S5H4', 'B5S5H5']
Thanks to @mikuszefski for pointing out improvements.
Patrick's answer in list-comprehension style (note that range(1,5) stops at B4; use range(1,6) to go up to B5):
bsh = [f"B{b}S{s}H{h}" for b in range(1,5) for s in range(1,b+1) for h in range(1,b+1)]
Gives
['B1S1H1',
'B2S1H1',
'B2S1H2',
'B2S2H1',
'B2S2H2',
'B3S1H1',
'B3S1H2',
'B3S1H3',
'B3S2H1',
'B3S2H2',
'B3S2H3',
'B3S3H1',
'B3S3H2',
'B3S3H3',
'B4S1H1',
'B4S1H2',
'B4S1H3',
'B4S1H4',
'B4S2H1',
'B4S2H2',
'B4S2H3',
'B4S2H4',
'B4S3H1',
'B4S3H2',
'B4S3H3',
'B4S3H4',
'B4S4H1',
'B4S4H2',
'B4S4H3',
'B4S4H4']
I would implement your "use itertools.product and get rid of the unnecessary elements" idea in the following way:
import itertools
a = list(itertools.product([1,2,3,4,5],repeat=3))
a = [i for i in a if (i[1]<=i[0] and i[2]<=i[1] and i[2]<=i[0])]
Note that I assumed the last element needs to be smaller than or equal to each of the others. a is now a list of 35 tuples, each holding 3 ints, so you need to turn them into strs, for example using a so-called f-string:
a = [f"B{i[0]}S{i[1]}H{i[2]}" for i in a]
print(a)
output:
['B1S1H1', 'B2S1H1', 'B2S2H1', 'B2S2H2', 'B3S1H1', 'B3S2H1', 'B3S2H2', 'B3S3H1', 'B3S3H2', 'B3S3H3', 'B4S1H1', 'B4S2H1', 'B4S2H2', 'B4S3H1', 'B4S3H2', 'B4S3H3', 'B4S4H1', 'B4S4H2', 'B4S4H3', 'B4S4H4', 'B5S1H1', 'B5S2H1', 'B5S2H2', 'B5S3H1', 'B5S3H2', 'B5S3H3', 'B5S4H1', 'B5S4H2', 'B5S4H3', 'B5S4H4', 'B5S5H1', 'B5S5H2', 'B5S5H3', 'B5S5H4', 'B5S5H5']
However, you might also use other methods of formatting instead of f-strings if you wish.
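The answers here rest on different readings of the rule: the nested loops only require S <= B and H <= B (55 results for B up to 5), while the itertools version additionally requires H <= S (35 results). The sketch below filters the Cartesian product under either reading so the two can be compared. The function name `reactions` and the `strict` flag are invented for illustration, and which reading is correct depends on what the asker actually needs.

```python
import itertools


def reactions(max_level=5, strict=False):
    """Build labels B{b}S{s}H{h} from the Cartesian product, filtered by a rule.

    strict=False keeps triples with s <= b and h <= b (as in the nested loops).
    strict=True additionally requires h <= s (as in the itertools answer).
    """
    levels = range(1, max_level + 1)
    result = []
    for b, s, h in itertools.product(levels, repeat=3):
        if s <= b and h <= b and (not strict or h <= s):
            result.append(f"B{b}S{s}H{h}")
    return result


loose = reactions()
tight = reactions(strict=True)
print(len(loose), len(tight))
```

Because `itertools.product` varies the last position fastest, the labels come out in the same order as the nested loops produce them.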

List index out of range error must have an index indication

I have a nested_list that looks like
[
['"1"', '"Casey"', '176544.328149', '0.584286566204162', '0.415713433795838', '0.168573132408324'],
['"2"', '"Riley"', '154860.665173', '0.507639071226889', '0.492360928773111', '0.0152781424537786'],
['"3"', '"Jessie"', '136381.830656', '0.47783426831522', '0.52216573168478', '0.04433146336956'],
['"4"', '"Jackie"', '132928.78874', '0.421132601798505', '0.578867398201495', '0.15773479640299'],
['"5"', '"Avery"', '121797.419516', '0.335213073103216', '0.664786926896784', '0.329573853793568']
]
(My real nested_list is a very long list.) I tried to extract two values from each sublist, and here is what I did:
numerical_list = []
child_list = []
for l in nested_list:
    child_list.append(l[1])
    child_list.append(float(l[2]))
    numerical_list.append(child_list)
print(numerical_list)
This gave me a list index out of range error on the line child_list.append(l[1]). However, if I change for l in nested_list: to for l in nested_list[:4]: or any range that is within the length of nested_list, it works properly. This doesn't make any sense to me. Could someone help me find what is wrong? Thank you~
If you are just interested in those two elements, one way is to use try... except; a more direct way is to check the length of each sublist, as follows.
This way you only append from sublists where the elements at index 1 and index 2 both exist.
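numerical_list = []
for l in nested_list:
    # Skip rows that are too short to have l[1] and l[2]
    if len(l) >= 3:
        # Start a fresh pair for each row instead of reusing one shared list
        child_list = []
        child_list.append(l[1])
        child_list.append(float(l[2]))
        numerical_list.append(child_list)
print(numerical_list)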
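The try... except alternative mentioned above might look like the following sketch. The sample rows, including the deliberately short one, are invented for illustration; what actually causes the short row in the asker's data is not known.

```python
nested_list = [
    ['"1"', '"Casey"', '176544.328149'],
    ['"2"', '"Riley"', '154860.665173'],
    [''],  # hypothetical malformed row, e.g. from a trailing blank line
]

numerical_list = []
for row in nested_list:
    try:
        # Build a fresh [name, value] pair for every row
        pair = [row[1], float(row[2])]
    except (IndexError, ValueError):
        # IndexError: row too short; ValueError: value is not a number
        continue
    numerical_list.append(pair)

print(numerical_list)
```

Catching the exception skips bad rows silently; printing or logging the offending row inside the except branch would help locate the malformed data.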

Get just the very next list within a nested list in python

How do you get the very next list within a nested list in python?
I have a few lists:
charLimit = [101100,114502,124602]
conditionalNextQ = [101101, 101200, 114503, 114504, 124603, 124604]
response = [[100100,4]
,[100300,99]
,[1100500,6]
,[1100501,04]
,[100700,12]
,[100800,67]
,[100100,64]
,[100300,26]
,[100500,2]
,[100501,035]
,[100700,9]
,[100800,8]
,[101100,"hello"]
,[101101,"twenty"] ... ]
for question in charLimit:
    for limitQuestion in response:
        limitNumber = limitQuestion[0]
        if question == limitNumber:
            print(limitQuestion)
The above code is doing what I want, i.e. printing the entries in response whose first value is one of the numbers in charLimit. However, I also want it to print the immediately following entry in response.
For example, the second-to-last entry in response contains 101100 (a value that's in charLimit), so I want it to print not only
101100,"hello"
(as the code does at the moment)
but the very next list also (and only the next)
101100,"hello"
101101,"twenty"
Thanks in advance for any help here. Please note that response is a very long list, so I'm looking to make things fairly efficient if possible, although it's not crucial in the context of this work. I'm probably missing something very simple, but I can't find examples of anyone doing this without using specific indexes in very small lists.
You can use enumerate
Ex:
charLimit = [101100,114502,124602]
conditionalNextQ = [101101, 101200, 114503, 114504, 124603, 124604]
response = [[100100,4]
,[100300,99]
,[1100500,6]
,[1100501,4]
,[100700,12]
,[100800,67]
,[100100,64]
,[100300,26]
,[100500,2]
,[100501,35]
,[100700,9]
,[100800,8]
,[101100,"hello"]
,[101101,"twenty"]]
l = len(response) - 1
for question in charLimit:
    for i, limitQuestion in enumerate(response):
        limitNumber = limitQuestion[0]
        if question == limitNumber:
            print(limitQuestion)
            if (i+1) <= l:
                print(response[i+1])
Output:
[101100, 'hello']
[101101, 'twenty']
I would eliminate the loop over charLimit and loop over response instead. Using enumerate in this loop allows us to access the next element by index, in the case that we want to print it:
for i, limitQuestion in enumerate(response, 1):
    limitNumber = limitQuestion[0]
    # use the `in` operator to check if `limitNumber` equals any
    # of the numbers in `charLimit`
    if limitNumber in charLimit:
        print(limitQuestion)
        # if this isn't the last element in the list, also
        # print the next one
        if i < len(response):
            print(response[i])
If charLimit is very long, you should consider defining it as a set instead, because sets have faster membership tests than lists:
charLimit = {101100,114502,124602}
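Putting the two suggestions together (a single pass over response plus a set for membership tests) could look like this sketch. The function name `with_following` is made up here, and the `response` data is a shortened version of the list in the question.

```python
def with_following(response, char_limit):
    """Return each entry whose first value is in char_limit, plus the entry right after it."""
    wanted = set(char_limit)  # average O(1) membership test
    found = []
    for i, entry in enumerate(response):
        if entry[0] in wanted:
            found.append(entry)
            if i + 1 < len(response):  # guard against a match on the last entry
                found.append(response[i + 1])
    return found


response = [[100700, 9], [100800, 8], [101100, "hello"], [101101, "twenty"]]
charLimit = [101100, 114502, 124602]
result = with_following(response, charLimit)
for entry in result:
    print(entry)
```

Returning a list rather than printing inside the loop keeps the lookup reusable; for a very long response, the loop still visits each entry only once.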
