Python: create two lists with one list comprehension

I am currently using nested for loops to add data to two lists at once (see the code below).
headers in the code below is a BeautifulSoup object.
openData = []
count = 0
openC = 7
closeData = []
closeC = 10
for j in headers:
    for z in j:
        for data in z:
            count += 1
            if count == 1:
                openData.append(data)
            elif count == openC:
                openData.append(data)
                openC += 6
            if count == 4:
                closeData.append(data)
            elif closeC == count:
                closeData.append(data)
                closeC += 6
The two lists here are openData and closeData.
As a rough example, I know I could do something like
openData = [data for j in headers for z in j for data in z]
closeData = [data for j in headers for z in j for data in z]
I am worried that this would take roughly twice as long, since the looping is carried out twice. Is there a way to combine both statements, like
openData, closeData = [list comprehension]
I am also confused about how to incorporate the if/elif logic into the list comprehension. Finally, is this something I should be doing, or would it be an abuse of list comprehensions? The code above works, but it looks ugly. My goal is better code than what I have.

My attempt (with some basic initial data):
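# some initial data: each value equals the count it would get in the question's loop
data = list(range(1, 21))
# count == 1, 7, 13, ... are 0-based indices 0, 6, 12, ...; count == 4, 10, 16, ... are 3, 9, 15, ...
openData, closeData = data[0::6], data[3::6]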
print(openData, closeData)
Prints:
[1, 7, 13, 19] [4, 10, 16]
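A sketch of the same idea applied to the nested structure from the question, assuming headers is iterated exactly as in the original three loops. The nested list below is a hypothetical stand-in for the BeautifulSoup object. itertools.chain.from_iterable flattens it in one pass; after that, either two slices or a single loop over enumerate fills both lists. Because count starts at 1, the question's positions 1, 7, 13, ... and 4, 10, 16, ... are the 0-based indices 0, 6, 12, ... and 3, 9, 15, ...

```python
from itertools import chain

# Hypothetical stand-in for the BeautifulSoup `headers` object: three levels of
# nesting, matching `for j in headers: for z in j: for data in z:` in the question.
headers = [[["a0", "a1", "a2"], ["a3", "a4"]],
           [["a5", "a6", "a7", "a8"]],
           [["a9", "a10", "a11"]]]

# Flatten the innermost items, in order, in a single pass.
flat = list(chain.from_iterable(chain.from_iterable(headers)))

# Option 1: slicing. count == 1, 7, 13, ... -> index 0, 6, 12, ...
#                    count == 4, 10, 16, ... -> index 3, 9, 15, ...
openData = flat[0::6]
closeData = flat[3::6]

# Option 2: one loop that fills both lists, without building `flat` first.
openData2, closeData2 = [], []
for idx, data in enumerate(chain.from_iterable(chain.from_iterable(headers))):
    if idx % 6 == 0:
        openData2.append(data)
    elif idx % 6 == 3:
        closeData2.append(data)

print(openData, closeData)  # ['a0', 'a6'] ['a3', 'a9']
```

A plain loop like Option 2 is arguably clearer than forcing two outputs out of a single list comprehension.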


For all sets in a list, extract the first number only

I have a list that looks like this:
b = [{'dg_12.942_ch_293','dg_22.38_ca_627'},
{'dg_12.651_cd_286','dg_14.293_ce_334'},
{'dg_17.42_cr_432','dg_18.064_cm_461','dg_18.85_cn_474','dg_20.975_cf_489'}]
I want to keep only the first number for each item in each set:
b = [{'12','22'},
{'12','14'},
{'17','18','18','20'}]
I then want to find the difference between the smallest and the largest number of each set and put it in a list, so in this case I would have:
b = [3,2,3]
Ugly and without any sanity checks, but it does the job.
import re

SEARCH_NUMBER_REGEX = re.compile(r"(\d+)")

def foo(dataset):
    out = []
    for entries in dataset:
        numbers = []
        for entry in entries:
            # Search for the first number in the string
            n = SEARCH_NUMBER_REGEX.search(entry).group(1)
            n = int(n)
            numbers.append(n)
        # Sort the numbers and subtract the first one (smallest)
        # from the last one (largest)
        numbers.sort()
        out.append(numbers[-1] - numbers[0])
    return out

b = [
    {'dg_12.942_ch_293', 'dg_22.38_ca_627'},
    {'dg_12.651_cd_286', 'dg_14.293_ce_334'},
    {'dg_17.42_cr_432', 'dg_18.064_cm_461', 'dg_18.85_cn_474', 'dg_20.975_cf_489'}
]
print(foo(b))
# > [10, 2, 3]
This gives [10, 2, 3] as output (the difference between 22 and 12 is 10).
b = [{'12','22'},
     {'12','14'},
     {'17','18','18','20'}]
l = []
for i in b:
    # Start from infinities so any integer replaces them
    large, small = float('-inf'), float('inf')
    for j in i:
        j = int(j)
        if large < j:
            large = j
        if small > j:
            small = j
    l.append(large - small)
print(l)
Here's yet another way to do it:
import re
ba = [{'dg_12.942_ch_293', 'dg_22.38_ca_627'},
{'dg_12.651_cd_286', 'dg_14.293_ce_334'},
{'dg_17.42_cr_432', 'dg_18.064_cm_461', 'dg_18.85_cn_474', 'dg_20.975_cf_489'}]
bb = []
for s in ba:
    ns = sorted([int(re.search(r'(\d+)', ss)[0]) for ss in s])
    bb.append(ns[-1] - ns[0])
print(bb)
Output:
[10, 2, 3]
Or, if you want to be ridiculous:
import re
ba = [{'dg_12.942_ch_293', 'dg_22.38_ca_627'},
      {'dg_12.651_cd_286', 'dg_14.293_ce_334'},
      {'dg_17.42_cr_432', 'dg_18.064_cm_461', 'dg_18.85_cn_474', 'dg_20.975_cf_489'}]
# The assignment expression (:=) inside a comprehension requires Python 3.8+
bb = [(n := sorted([int(re.search(r'(\d+)', ss)[0]) for ss in s]))[-1] - n[0] for s in ba]
print(bb)
In your final result I see "[3,2,3]", but if I understand your question correctly, it should be [10,2,3]. Either way, the code below should at least point you in the right direction.
This code iterates through each tuple in the list, extracts the first number from each string (since that is all we want to compare), and collects those numbers in a list. It then subtracts the smallest number from the largest and appends the difference to a separate list. That separate list is the final result shown in your question.
Good luck, hopefully this helps!
import re
b = [('dg_12.942_ch_293','dg_22.38_ca_627'), ('dg_12.651_cd_286','dg_14.293_ce_334'), ('dg_17.42_cr_432','dg_18.064_cm_461','dg_18.85_cn_474','dg_20.975_cf_489')]
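final_array = []
for tup in b:
    temp_array = []
    for num in tup:
        # Extract the first run of digits and convert it to int right away,
        # so max()/min() compare numbers rather than strings ('9' > '10' as text)
        split_number = int(re.search(r'\d+', num).group())
        temp_array.append(split_number)
    difference = max(temp_array) - min(temp_array)
    final_array.append(difference)
print(final_array)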
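For comparison, a sketch that mirrors the two steps the question describes (keep the first number of each string, then take largest minus smallest per set) as two comprehensions. Like the answers above, it assumes the first run of digits in each string is the number wanted. Because the numbers go into sets, duplicates such as the two 18s collapse into one, which does not change max minus min.

```python
import re

b = [{'dg_12.942_ch_293', 'dg_22.38_ca_627'},
     {'dg_12.651_cd_286', 'dg_14.293_ce_334'},
     {'dg_17.42_cr_432', 'dg_18.064_cm_461', 'dg_18.85_cn_474', 'dg_20.975_cf_489'}]

# Step 1: keep only the first run of digits in each string, as an int.
firsts = [{int(re.search(r"\d+", s).group()) for s in group} for group in b]

# Step 2: largest minus smallest for each set.
spreads = [max(nums) - min(nums) for nums in firsts]
print(spreads)  # [10, 2, 3]
```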

Fast way to sort lists in Python so that their corresponding index is at a similar time stamp?

I have two lists with the same number of elements, but the time stamps fluctuate and are not matched with the same element of the other list. Is there a way to organize the lists so that each element lines up with the element at the same index in the other list?
Right now I have
sorted1 = []
sorted2 = []
for i in list1:
    for x in list2:
        if i - 1 <= x <= i + 1:
            sorted1.append(i)
            sorted2.append(x)
            break
This works, but runs extremely slowly.
My lists are epoch times that need to be paired.
[1412121504, 1412121512, 1412121516, 1412121520, 1412121525, 1412121580]
[1412121470, 1412121515, 1412121525, 1412121560, 1412121580, 1412121600]
If they do not have a corresponding time in the other list that is within 1 second either way, I do not want to include them. I would want it to look like this,
[1412121516, 1412121525, 1412121580]
[1412121515, 1412121525, 1412121580]
Thank you for even reading all this.
You can sort both lists, keep an index into each, and advance the indices while comparing the elements:
list1 = [1412121504, 1412121512, 1412121516, 1412121520, 1412121525, 1412121580]
list2 = [1412121470, 1412121515, 1412121525, 1412121560, 1412121580, 1412121600]
# sort the lists
list1.sort()
list2.sort()
# maintain two counters to each of the lists
list1i = 0
list2i = 0
paired1 = []
paired2 = []
while list1i < len(list1) and list2i < len(list2):
    cur1 = list1[list1i]
    cur2 = list2[list2i]
    # too small, advance the first list counter
    if cur1 < cur2 - 1:
        list1i += 1
    # too large, advance the second list counter
    elif cur1 > cur2 + 1:
        list2i += 1
    # we found a pair, increment both to avoid duplicates
    else:
        paired1.append(cur1)
        paired2.append(cur2)
        list1i += 1
        list2i += 1
print(paired1, paired2)
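The same two-pointer sweep can be wrapped in a reusable function; the name pair_timestamps and the tol parameter below are illustrative additions, not part of the answer. Sorting costs O(n log n) and the sweep is linear, compared with O(n*m) for the nested loops in the question. On the sample data it also pairs 1412121580 with 1412121580, since that pair is within one second as well.

```python
def pair_timestamps(a, b, tol=1):
    """Pair values from a and b that differ by at most tol (two-pointer sweep).

    Each value is used at most once. The inputs are sorted into new lists,
    so the caller's lists are left untouched.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    out_a, out_b = [], []
    while i < len(a) and j < len(b):
        if a[i] < b[j] - tol:
            i += 1  # a[i] is too small to match anything left in b
        elif a[i] > b[j] + tol:
            j += 1  # b[j] is too small to match anything left in a
        else:
            out_a.append(a[i])
            out_b.append(b[j])
            i += 1
            j += 1
    return out_a, out_b


list1 = [1412121504, 1412121512, 1412121516, 1412121520, 1412121525, 1412121580]
list2 = [1412121470, 1412121515, 1412121525, 1412121560, 1412121580, 1412121600]
print(pair_timestamps(list1, list2))
# ([1412121516, 1412121525, 1412121580], [1412121515, 1412121525, 1412121580])
```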

Python - multiple combinations maths question

I'm trying to make a program that lists all the 64 codons/triplet base sequences of DNA...
In more mathematical terms, there are 4 letters: A, T, G and C.
I want to list all possible outcomes of three letters, where a letter can be used more than once, but I have no idea how!
I know there are 64 possibilities and I wrote them all down on paper but I want to write a program that generates all of them for me instead of me typing up all 64!
Currently, I am at this point but I have most surely overcomplicated it and I am stuck:
list = ['A','T','G','C']
list2 = []
y = 0
x = 1
z = 2
skip = False
back = False
for i in range(4):
    print(list[y],list[y],list[y])
    if i == 0:
        skip = True
    else:
        y=y+1
for i in range(16):
    print(list[y],list[y],list[x])
    print(list[y],list[x], list[x])
    print(list[y],list[x], list[y])
    print(list[y],list[x], list[z])
    if i == 0:
        skip = True
    elif z == 3:
        back = True
        x = x+1
    elif back == True:
        z = z-1
        x = x-1
    else:
        x = x+1
        z = z+1
Any help would be much appreciated!!!!
You should really be using itertools.product for this.
from itertools import product
l = ['A','T','G','C']
combos = list(product(l, repeat=3))
# all 64 combinations
Since product() returns an iterator, you don't need to wrap it in list() if you're just going to loop over it. (Also, don't name your list list: it clobbers the built-in.)
If you want a list of strings, you can join() them as John Coleman shows in a comment under your question.
list_of_strings = ["".join(c) for c in product(l, repeat=3)]
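To see what product(..., repeat=3) does, here is a sketch of the same 64 codons written as three nested loops in one comprehension, checked against itertools.product. Both produce the same order: the rightmost letter varies fastest.

```python
from itertools import product

bases = "ATGC"

# Three nested loops in one comprehension: one loop per codon position.
codons_manual = [a + b + c for a in bases for b in bases for c in bases]

# itertools.product yields tuples in the same order; join them into strings.
codons_product = ["".join(t) for t in product(bases, repeat=3)]

print(len(codons_manual))  # 64
print(codons_manual[:5])   # ['AAA', 'AAT', 'AAG', 'AAC', 'ATA']
```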
Look for permutations with repetition; there's tons of code available for Python.
I would just use the library. If you want to see how it's implemented, look inside the library; its authors usually implement these things very efficiently.
import itertools
x = [1, 2, 3, 4, 5, 6]
[p for p in itertools.product(x, repeat=2)]

Delete elements from list based on substring in Python

I have a huge list of strings where some strings differ only in two or three characters, like this:
ENSH-DFFEV1-5F
ENSH-DFFEV2-5F
ENSH-DFFEV3-5F
FVB.DFFVRV2-4T
FVB.DFFVRV3-4T
What I would like to do is to keep only those elements for which the number after the 'V' is the largest. From the above example I would like to have
ENSH-DFFEV3-5F
FVB.DFFVRV3-4T
Is there a simple way to do this in Python?
@stevieb is right, but anyway, I made the effort for you.
s = """
ENSH-DFFEV1-5F
ENSH-DFFEV2-5F
ENSH-DFFEV3-5F
FVB.DFFVRV2-4T
FVB.DFFVRV3-4T
""".split()
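def custom_filter(s):
    # Keeps the items sharing the highest version number across the whole list
    # (not per prefix), reading only the single digit before the last '-'
    out = []
    current_max = -1
    for r in s:
        v = int(r.rsplit('-', 1)[0][-1])  # <- you should probably edit this line to fit your data structure
        if v > current_max:
            current_max = v
            out = []
        if v == current_max:
            out += [r]
    return out

for e in custom_filter(s):
    print(e)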
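If each family of strings should keep its own latest version (rather than only the items that share the single highest version in the whole list), one option is to group by everything except the version number. A minimal sketch, assuming every string has the form prefix + "V" + digits + "-" + suffix; the regex and the name keep_latest are assumptions, not from the answer above. Strings that don't match the assumed format are skipped.

```python
import re

# Assumed format: <prefix>V<version digits>-<suffix>, e.g. "ENSH-DFFEV3-5F".
# The greedy (.*V) backtracks to the last 'V' that is followed by digits.
VERSIONED = re.compile(r"^(.*V)(\d+)(-.*)$")


def keep_latest(items):
    """For each (prefix, suffix) family, keep only the item with the highest version."""
    best = {}  # (prefix, suffix) -> (version, original string)
    for item in items:
        m = VERSIONED.match(item)
        if m is None:
            continue  # not in the assumed format
        prefix, version, suffix = m.group(1), int(m.group(2)), m.group(3)
        key = (prefix, suffix)
        if key not in best or version > best[key][0]:
            best[key] = (version, item)
    # dicts keep insertion order, so families appear in first-seen order
    return [item for _, item in best.values()]


data = ["ENSH-DFFEV1-5F", "ENSH-DFFEV2-5F", "ENSH-DFFEV3-5F",
        "FVB.DFFVRV2-4T", "FVB.DFFVRV3-4T"]
print(keep_latest(data))  # ['ENSH-DFFEV3-5F', 'FVB.DFFVRV3-4T']
```

Because the version is parsed as a whole integer, V10 correctly beats V9.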

python random.shuffle() in a while loop

I have a list:
k = [1,2,3,4,5]
Now I want 3 permutations of this list stored in another list, but when I do this:
x = []
i = 0
while i < 3:
    random.shuffle(k)
    x.append(k)
    i += 1
I end up with 3 times the same permutation of k in x, like this:
x = [[1,3,5,2,4], [1,3,5,2,4], [1,3,5,2,4]]
Instead, I would like something like this:
x = [[1,5,4,2,3], [1,3,5,2,4], [5,3,4,1,2]]
Note that, because of the way the data in k is gathered, I can't create k inside the loop, which as far as I know would solve the problem. The real code is this:
def create_random_chromosomes(genes):
    temp_chromosomes = []
    chromosomes = []
    i = 0
    while i < 2000:
        print(genes)
        random.shuffle(genes)
        temp_chromosomes.append(genes)
        i += 1
    print(temp_chromosomes)
    for element in temp_chromosomes:
        if element not in chromosomes:
            chromosomes.append(element)
    return chromosomes
Shuffling a list changes it in-place, and you are creating 3 references to the same list. Create a copy of the list before shuffling:
x = []
for i in range(3):
    kcopy = k[:]
    random.shuffle(kcopy)
    x.append(kcopy)
I've simplified your loop as well; just use for i in range(3). Or, to place this in the context of your full method:
def create_random_chromosomes(genes):
    temp_chromosomes = []
    chromosomes = []
    for i in range(2000):
        print(genes)
        randomgenes = genes[:]
        random.shuffle(randomgenes)
        temp_chromosomes.append(randomgenes)
    print(temp_chromosomes)
    for element in temp_chromosomes:
        if element not in chromosomes:
            chromosomes.append(element)
    return chromosomes
You can further simplify the above by using a set to weed out dupes:
def create_random_chromosomes(genes):
    chromosomes = set()
    randomgenes = genes[:]
    for i in range(2000):
        random.shuffle(randomgenes)
        chromosomes.add(tuple(randomgenes))
    return list(chromosomes)
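This stores a tuple copy of the shuffled genes list, because set members must be hashable and lists are not.
You can then even ensure that you return 2000 unique items regardless. Note that this loop never ends if genes has fewer than 2000 distinct orderings, which is the case with 6 or fewer genes (6! = 720):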
def create_random_chromosomes(genes):
    chromosomes = set()
    randomgenes = genes[:]
    while len(chromosomes) < 2000:
        random.shuffle(randomgenes)
        chromosomes.add(tuple(randomgenes))
    return list(chromosomes)
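A related option, not from the original answer: random.sample(population, len(population)) returns a new list in random order and never modifies its input, so it avoids the shared-reference problem without an explicit copy. A sketch using the question's k:

```python
import random

k = [1, 2, 3, 4, 5]

# Each call to random.sample builds a brand-new shuffled list; k is not touched.
x = [random.sample(k, len(k)) for _ in range(3)]

print(x)  # three independent random orderings of k
print(k)  # [1, 2, 3, 4, 5]
```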
