convert and process a dictionary to matrix in python - python

I have a big list in Python, like the following small example:
['GAATTCCTTGAGGCCTAAATGCATCGGGGTGCTCTGGTTTTGTTGTTGTTATTTCTGAATGACATTTACTTTGGTGCTCTTTATTTTGCGTATTTAAAAC', 'TAAGTCCCTAAGCATATATATAATCATGAGTAGTTGTGGGGAAAATAACACCATTAAATGTACCAAAACAAAAGACCGATCACAAACACTGCCGATGTTTCTCTGGCTTAAATTAAATGTATATACAACTTATATGATAAAATACTGGGC']
I want to make a new list in which every string is converted to a list of tuples. In fact, I want to divide the length of each string into steps of 10: the 1st tuple would be (1, 10), the 2nd (10, 20), and so on until the end, depending on the length of the string. In the end, every string will be a list of tuples, and finally I will have a list of lists.
In the small example, the 1st string has 100 characters and the 2nd string has 150 characters.
For example, the expected output for the small example would be:
new_list = [[(1, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)], [(1, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100), (100, 110), (110, 120), (120, 130), (130, 140), (140, 150)]]
To make such a list I wrote the following code, but it does not return what I expect. Do you know how to fix it?
mylist = []
valTup = list()
for count, char in enumerate(mylist):
    if count % 10 == 0 and count > 0:
        valTup.append(count)
    else:
        new_list.append(tuple(valTup))

I recommend using the boltons package, specifically its boltons.iterutils module.
boltons.iterutils.chunked_iter(src, size) returns pieces of the source iterable in size-sized chunks (this example is copied from the docs):
>>> list(chunked_iter(range(10), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Example:
from boltons.iterutils import chunked_iter
adn = [
'GAATTCCTTGAGGCCTAAATGCATCGGGGTGCTCTGGTTTTGTTGTTGTTATTTCTGAATGACATTTACTTTGGTGCTCTTTATTTTGCGTATTTAAAAC',
'TAAGTCCCTAAGCATATATATAATCATGAGTAGTTGTGGGGAAAATAACACCATTAAATGTACCAAAACAAAAGACCGATCACAAACACTGCCGATGTTTCTCTGGCTTAAATTAAATGTATATACAACTTATATGATAAAATACTGGGC'
]
result = []
for s in adn:
    result.append(list(chunked_iter(list(s), 10)))
print(result)

I suggest the following solutions: the first is based on your code, the second takes only one line, and the third, my preferred one, is based on range(), zip() and slicing:
mylist = ['GAATTCCTTGAGGCCTAAATGCATCGGGGTGCTCTGGTTTTGTTGTTGTTATTTCTGAATGACATTTACTTTGGTGCTCTTTATTTTGCGTATTTAAAAC',
'TAAGTCCCTAAGCATATATATAATCATGAGTAGTTGTGGGGAAAATAACACCATTAAATGTACCAAAACAAAAGACCGATCACAAACACTGCCGATGTTTCTCTGGCTTAAATTAAATGTATATACAACTTATATGATAAAATACTGGGC']
# Here is the solution based on your code
resultlist = []
for s in mylist:
    valTup = []
    for count, char in enumerate(s, 1):
        if count % 10 == 0:
            valTup.append((count-10, count))
    resultlist.append(valTup)
print(resultlist)
# Here is the one-line style solution
resultlist = [[(n-10, n) for n,char in enumerate(s, 1) if n % 10 == 0] for s in mylist]
print(resultlist)
# Here is my preferred solution
resultlist = []
for s in mylist:
    temp = range(1+len(s))[::10]
    resultlist.append(list(zip(temp[:-1], temp[1:])))
print(resultlist)
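All three solutions above produce (0, 10) as the first tuple, while the question's expected output starts with (1, 10). The following is a minimal sketch that reproduces the expected output exactly. The helper name bounds is hypothetical. The sketch also assumes that, when a string's length is not a multiple of 10, the final shorter interval should end at the string's length.

```python
def bounds(s, step=10):
    # hypothetical helper: the first interval starts at 1, each later one at the
    # previous end; a shorter final chunk is assumed to end at len(s)
    return [(max(start, 1), min(start + step, len(s))) for start in range(0, len(s), step)]

mylist = ["A" * 100, "C" * 150]  # placeholder strings with the question's lengths
new_list = [bounds(s) for s in mylist]
print(new_list[0][:3])  # [(1, 10), (10, 20), (20, 30)]
print(new_list[1][-1])  # (140, 150)
```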

Are you looking for something like this?
mylist = ['GAATTCCTTGAGGCCTAAATGCATCGGGGTGCTCTGGTTTTGTTGTTGTTATTTCTGAATGACATTTACTTTGGTGCTCTTTATTTTGCGTATTTAAAAC', 'TAAGTCCCTAAGCATATATATAATCATGAGTAGTTGTGGGGAAAATAACACCATTAAATGTACCAAAACAAAAGACCGATCACAAACACTGCCGATGTTTCTCTGGCTTAAATTAAATGTATATACAACTTATATGATAAAATACTGGGC']
new_list1 = list()
new_list2 = list()
# floor division (//) is needed in Python 3: / returns a float, which range() rejects
for i in range(len(mylist[0])//10):
    if(10+i*10 <= len(mylist[0])):
        new_list1.append(mylist[0][0+i*10:10+i*10])
    else:
        new_list1.append(mylist[0][0+i*10:])
for i in range(len(mylist[1])//10):
    if(10+i*10 <= len(mylist[1])):
        new_list2.append(mylist[1][0+i*10:10+i*10])
    else:
        new_list2.append(mylist[1][0+i*10:])
new_list = [new_list1,new_list2]
print(new_list)
Output:
[['GAATTCCTTG', 'AGGCCTAAAT', 'GCATCGGGGT', 'GCTCTGGTTT',
'TGTTGTTGTT', 'ATTTCTGAAT', 'GACATTTACT', 'TTGGTGCTCT',
'TTATTTTGCG', 'TATTTAAAAC'], ['TAAGTCCCTA', 'AGCATATATA',
'TAATCATGAG', 'TAGTTGTGGG', 'GAAAATAACA', 'CCATTAAATG',
'TACCAAAACA', 'AAAGACCGAT', 'CACAAACACT', 'GCCGATGTTT',
'CTCTGGCTTA', 'AATTAAATGT', 'ATATACAACT', 'TATATGATAA',
'AATACTGGGC']]
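The loop above is written out twice and hardcodes mylist[0] and mylist[1]. A comprehension using the same slicing handles any number of strings. It differs from the loop in one way: the loop drops a trailing chunk shorter than 10 characters, while this sketch keeps it. Keeping it is an assumption about what the asker wants. The sample strings are shortened placeholders.

```python
mylist = ["GAATTCCTTGAGGCCTAAATGCATCG", "TAAGTCCCTAAGCA"]  # shortened sample strings
# slice every string into 10-character pieces; the last piece may be shorter
new_list = [[s[i:i + 10] for i in range(0, len(s), 10)] for s in mylist]
print(new_list)
# [['GAATTCCTTG', 'AGGCCTAAAT', 'GCATCG'], ['TAAGTCCCTA', 'AGCA']]
```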

Related

List comprehension not giving expected result

I have a list of tuples, where each tuple contains a pair of coordinate tuples:
list1 = [((11, 11), (12, 12)), ((21, 21), (22, 22)), ((31, 31), (32, 32))]
Using a list comprehension, I am trying to generate a list in which the coordinates are no longer paired but are still in the same order.
I was able to get the result by looping and using .append(), but I was trying to avoid this.
new_list = []
for i in list1:
    for x in i:
        new_list.append(x)
print(new_list)
This works and gives me the result I am looking for:
[(11, 11), (12, 12), (21, 21), (22, 22), (31, 31), (32, 32)]
But when I try list comprehension I get the last tuple pair repeated!
new_list = [x for x in i for i in list1]
print(new_list)
[(31, 31), (31, 31), (31, 31), (32, 32), (32, 32), (32, 32)]
I am sure it is a small thing I am doing wrong, so I would appreciate the help!
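Try:
print([inner_tuple for outer_tuple in list1 for inner_tuple in outer_tuple])
In a comprehension, the for clauses go in the same order as the equivalent nested loops, outer loop first: for sub in list1, then for x in sub.
By the way, you didn't get NameError: name 'i' is not defined because you wrote that list comprehension after the for loop. At that point i is still a global variable holding the last element of list1. So x only ranges over (31, 31) and (32, 32), and the inner for i in list1 repeats each of them three times.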
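Here is a small sketch that reproduces the stale-variable effect. The outermost iterable of a comprehension is evaluated in the enclosing scope, so for x in i reads the i left over from the earlier loop. The inner for i in list1 then only controls how many times each x is repeated.

```python
list1 = [((11, 11), (12, 12)), ((21, 21), (22, 22)), ((31, 31), (32, 32))]

for i in list1:  # after this loop, i still refers to list1[-1]
    pass

wrong = [x for x in i for i in list1]  # outer loop iterates over the stale i
right = [x for i in list1 for x in i]  # outer loop first, then inner loop

print(wrong)  # [(31, 31), (31, 31), (31, 31), (32, 32), (32, 32), (32, 32)]
print(right)  # [(11, 11), (12, 12), (21, 21), (22, 22), (31, 31), (32, 32)]
```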
The loops execute from outer to inner, left to right.
Your comprehension is close, but its for clauses are in the wrong order.
Your code:
new_list = []
for i in list1:
    for x in i:
        new_list.append(x)
which is equivalent to
new_list = [x for i in list1 for x in i]
Now you can see how each variable takes its values from the for loop that defines it.
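To flatten exactly one level of nesting, a common alternative to the nested comprehension is itertools.chain.from_iterable from the standard library. A brief sketch:

```python
from itertools import chain

list1 = [((11, 11), (12, 12)), ((21, 21), (22, 22)), ((31, 31), (32, 32))]
# chain.from_iterable concatenates the inner tuples' items, preserving order
new_list = list(chain.from_iterable(list1))
print(new_list)  # [(11, 11), (12, 12), (21, 21), (22, 22), (31, 31), (32, 32)]
```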

Return tuple with biggest increase of second value in a list of tuples

Like the title says, I have a list of tuples: [(3, 20), (9, 21), (18, 19)]. I need to find the tuple that has a positive y-increase with respect to its predecessor. In this case 21-20 = 1, so the tuple (9,21) should be returned. 19-21 = -2, so the tuple (18,19) shouldn't be returned. The very first tuple in the list should never be returned. I've tried putting all the values in a list and working it out from there, but I'm clueless. It should work for lists of tuples of any length. I hope you can help me out; thanks in advance.
You could compare the second element of each tuple with the previous one, while iterating over the list:
data = [(3, 20), (9, 21), (18, 19), (1, 35), (4, 37), (1, 2)]
maxIncrease = [0, 0] # store max increase value and its index
for i in range(1, len(data)):
    lst = data[i - 1]
    cur = data[i]
    diff = cur[1] - lst[1]
    if diff > maxIncrease[0]:
        maxIncrease = [diff, i]
print(
    f'Biggest increase of {maxIncrease[0]} found at index {maxIncrease[1]}: {data[maxIncrease[1]]}'
)
Out:
Biggest increase of 16 found at index 3: (1, 35)
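The same scan can be sketched with zip(), which pairs each tuple with its predecessor, and max(), which picks the largest positive increase. The function name biggest_increase is hypothetical. The sketch returns None when no tuple increases, an assumption the question leaves open.

```python
def biggest_increase(data):
    """Return (increase, index, tuple) for the largest positive y-increase, or None."""
    # zip(data, data[1:]) yields (previous, current) pairs; indices start at 1
    increases = [
        (cur[1] - prev[1], i, cur)
        for i, (prev, cur) in enumerate(zip(data, data[1:]), start=1)
        if cur[1] > prev[1]
    ]
    # key compares only the increase; max() keeps the first of any tied maxima
    return max(increases, key=lambda item: item[0], default=None)

data = [(3, 20), (9, 21), (18, 19), (1, 35), (4, 37), (1, 2)]
print(biggest_increase(data))  # (16, 3, (1, 35))
```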
I think something like this can solve your problem:
import numpy as np
data = [(3, 20), (9, 21), (18, 19), (10, 22)]
diff_with_previous = []
for i in range(len(data)):
    if i == 0:
        diff_with_previous.append(-np.inf)
    else:
        diff_with_previous.append(data[i][1] - data[i-1][1])
indices = np.where(np.array(diff_with_previous) > 0)
print([data[i] for i in indices[0]])
[EDIT]
Without numpy:
data = [(3, 20), (9, 21), (18, 19), (10, 22)]
indices = []
for i in range(1, len(data)):
    if (data[i][1] - data[i-1][1]) > 0:
        indices.append(i)
print([data[i] for i in indices])
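If you want every tuple with a positive increase rather than only the largest, the index bookkeeping can be folded into a single comprehension over consecutive pairs. A short sketch:

```python
data = [(3, 20), (9, 21), (18, 19), (10, 22)]
# keep each tuple whose second value exceeds its predecessor's; data[0] is never a candidate
increasing = [cur for prev, cur in zip(data, data[1:]) if cur[1] > prev[1]]
print(increasing)  # [(9, 21), (10, 22)]
```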

How to sort list of two paired elements by their weight?

I'm dealing with the following problem:
import sys
if __name__ == "__main__":
    data = list(map(int, sys.stdin.read().split()))
    n, capacity = data[0:2]
    elem1 = data[2:(2 * n + 2):2]
    elem2 = data[3:(2 * n + 2):2]
    ziplist = list(zip(elem1, elem2))
    # get_optimal_value is defined elsewhere in my program
    opt_value = get_optimal_value(capacity, elem1, elem2)
So, when I type
3 40
20 40
50 60
70 80
I get this list:
[(20, 40), (50, 60), (70, 80)]
I need to sort my list by "weight", where weight is elem1/elem2.
While testing, I made this list:
m = list(x/y for x,y in ziplist)
[0.5, 0.8333333333333334, 0.875]
And I see that the last element has the best weight, so I need my initial list sorted like this:
[(70, 80), (50, 60), (20, 40)]
I have read about sorting with key, but I can't figure out how to write the right condition. I tried something like this:
newlist = ziplist.sort(key=lambda m = x/y for x, y in ziplist m)
Moreover, how can I get both values (elem1 and elem2) of the first pair in my sorted list? For example, given this sorted list:
[(70, 80), (50, 60), (20, 40)]
#code implementation
a = 70 #output
b = 80 #output
sorted(ziplist, key=lambda elem: elem[0] / elem[1], reverse=True)
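Here is a fuller sketch covering both parts of the question, assuming ziplist is the list of (elem1, elem2) pairs built from the input. Note that list.sort() sorts in place and returns None, so assigning its result (as in newlist = ziplist.sort(...)) would give None. sorted() returns a new list instead, and tuple unpacking answers the second part.

```python
ziplist = [(20, 40), (50, 60), (70, 80)]

# sort by the ratio elem1 / elem2, largest first
by_weight = sorted(ziplist, key=lambda pair: pair[0] / pair[1], reverse=True)
print(by_weight)  # [(70, 80), (50, 60), (20, 40)]

# unpack the first (best) pair into two names
a, b = by_weight[0]
print(a, b)  # 70 80
```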

Find duplicates in a list of lists with tuples

I am trying to find duplicate tuples that are nested within lists, and this whole construction is itself a list. If there is a better way to organize the data so that the problem is easier to solve, I'd be glad to know, because I built this structure along the way.
pairsList = [
[1, (11, 12), (13, 14)], #list1
[2, (21, 22), (23, 24)], #list2
[3, (31, 32), (13, 14)], #list3
[4, (43, 44), (21, 22)], #list4
]
The first element in each list uniquely identifies each list.
From this object pairsList, I want to find out which lists have identical tuples. So I want to report that list1 has the same tuple as list3 (because both have (13,14)). Likewise, list2 and list4 have the same tuple (both have (21,22)) and need to be reported. The position of tuples within a list doesn't matter (list2 and list4 both have (21,22) even though the tuple's position differs between the two lists).
The output result could be anything iterable later on such as (1,3),(2,4) or [1,3],[2,4]. It is the pairs I am interested in.
I am aware of sets and have used them to remove duplicates within lists in other situations, but I cannot work out how to solve this problem. I can check whether one list contains any element from another list like this:
list1 = [1, (11, 12), (13, 14)]
list2 = [3, (31, 32), (13, 14)]
print(not set(list1).isdisjoint(list2))  # True
So, the code below tells me which lists share tuple(s) with the first one. But what is the correct way to do this for all the lists?
counter = 0
for pair in pairsList:
    list0 = pairsList[0]
    iterList = pairsList[counter]
    if not set(list0).isdisjoint(iterList):
        print(iterList[0]) # print list ID
    counter += 1
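The pairwise check above can be extended to all the lists with itertools.combinations, which yields each unordered pair of lists exactly once. This sketch compares only the tuples (the ID is sliced off with [1:]). Its cost grows quadratically with the number of lists, which may matter for large inputs.

```python
from itertools import combinations

pairsList = [
    [1, (11, 12), (13, 14)],
    [2, (21, 22), (23, 24)],
    [3, (31, 32), (13, 14)],
    [4, (43, 44), (21, 22)],
]

# report the IDs of every pair of lists that share at least one tuple
shared = [
    (a[0], b[0])
    for a, b in combinations(pairsList, 2)
    if not set(a[1:]).isdisjoint(b[1:])
]
print(shared)  # [(1, 3), (2, 4)]
```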
The first element in each list uniquely identifies each list.
Great, then let's convert it to a dict first:
d = {x[0]: x[1:] for x in pairsList}
# d:
{1: [(11, 12), (13, 14)],
2: [(21, 22), (23, 24)],
3: [(31, 32), (13, 14)],
4: [(43, 44), (21, 22)]}
Let's index the whole data structure:
index = {}
for k, vv in d.items():
    for v in vv:
        index.setdefault(v, []).append(k)
Now index is:
{(11, 12): [1],
(13, 14): [1, 3],
(21, 22): [2, 4],
(23, 24): [2],
(31, 32): [3],
(43, 44): [4]}
The output result could be anything iterable later on such as (1,3),(2,4) or [1,3],[2,4]. It is the pairs I am interested in.
pairs = [v for v in index.values() if len(v) == 2]
This returns [[1, 3], [2, 4]].
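The len(v) == 2 filter catches only tuples shared by exactly two lists. A tuple might appear in three or more lists, and every pair of those lists might need reporting. This sketch handles that case by expanding each group with itertools.combinations. It builds the same kind of index with collections.defaultdict, and the fifth list is hypothetical extra data added to show a group of three.

```python
from collections import defaultdict
from itertools import combinations

pairsList = [
    [1, (11, 12), (13, 14)],
    [2, (21, 22), (23, 24)],
    [3, (31, 32), (13, 14)],
    [4, (43, 44), (21, 22)],
    [5, (13, 14), (55, 56)],  # hypothetical extra list sharing (13, 14)
]

# map each tuple to the IDs of the lists containing it (set() ignores repeats within a list)
index = defaultdict(list)
for list_id, *tuples in pairsList:
    for t in set(tuples):
        index[t].append(list_id)

# every pair of IDs that shares some tuple, without duplicates
pairs = sorted({pair for ids in index.values() for pair in combinations(ids, 2)})
print(pairs)  # [(1, 3), (1, 5), (2, 4), (3, 5)]
```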

List of minimal pairs from a pair of lists

Given two lists of integers, generate the shortest list of pairs where every value in both lists is present. The first of each pair must be a value from the first list, and the second of each pair must be a value from the second list. The first of each pair must be less than the second of the pair.
A simple zip will not work if the lists are different lengths, or if the same integer exists at the same position in each list.
def gen_min_pairs(uplist, downlist):
    for pair in zip(uplist, downlist):
        yield pair
Here is what I can come up with so far:
def gen_min_pairs(uplist, downlist):
    up_gen = iter(uplist)
    down_gen = iter(downlist)
    last_up = None
    last_down = None
    while True:
        next_up = next(up_gen, last_up)
        next_down = next(down_gen, last_down)
        if (next_up == last_up and
                next_down == last_down):
            return
        while not next_up < next_down:
            next_down = next(down_gen, None)
            if next_down is None:
                return
        yield next_up, next_down
        last_up = next_up
        last_down = next_down
And here is a simple test routine:
if __name__ == '__main__':
    from pprint import pprint
    datalist = [
        {
            'up': [1,7,8],
            'down': [6,7,13]
        },
        {
            'up': [1,13,15,16],
            'down': [6,7,15]
        }
    ]
    for dates in datalist:
        min_pairs = [pair for pair in
                     gen_min_pairs(dates['up'], dates['down'])]
        pprint(min_pairs)
The program produces the expected output for the first set of dates, but fails for the second.
Expected:
[(1, 6), (7, 13), (8, 13)]
[(1, 6), (1, 7), (13, 15)]
Actual:
[(1, 6), (7, 13), (8, 13)]
[(1, 6), (13, 15)]
I think this can be done while looking at each element of each list only once, i.e. in O(len(up) + len(down)) time. I think it depends on the number of elements unique to each list.
EDIT: I should add that we can expect these lists to be sorted with the smallest integer first.
EDIT: uplist and downlist were just arbitrary names; less confusing ones might be A and B.
Also, here is a more robust test routine:
from random import uniform, sample
from pprint import pprint
def random_sorted_sample(maxsize=6, pop=31):
    size = int(round(uniform(1,maxsize)))
    # range replaces Python 2's xrange
    li = sample(range(1,pop), size)
    return sorted(li)
if __name__ == '__main__':
    A = random_sorted_sample()
    B = random_sorted_sample()
    min_pairs = list(gen_min_pairs(A, B))
    pprint(A)
    pprint(B)
    pprint(min_pairs)
This generates random realistic inputs, calculates the output, and displays all three lists. Here is an example of what a correct implementation would produce:
[11, 13]
[1, 13, 28]
[(11, 13), (13, 28)]
[5, 15, 24, 25]
[3, 13, 21, 22]
[(5, 13), (15, 21), (15, 22)]
[3, 28]
[4, 6, 15, 16, 30]
[(3, 4), (3, 6), (3, 15), (3, 16), (28, 30)]
[2, 5, 20, 24, 26]
[8, 12, 16, 21, 23, 28]
[(2, 8), (5, 12), (5, 16), (20, 21), (20, 23), (24, 28), (26, 28)]
[3, 4, 5, 6, 7]
[1, 2]
[]
I had many ideas for solving this (see the edit history ;-/), but none of them quite worked out or ran in linear time. It took me a while to see it, but I had a similar problem before, so I really wanted to figure this out ;-)
Anyway, in the end the solution came when I gave up on doing it directly and started drawing graphs of the matchings. I think your first list simply defines intervals, and you're looking for the items of the second list that fall into them:
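def intervals(seq):
    # yield consecutive (low, high) bounds from seq, then a final open-ended one
    seq = iter(seq)
    try:
        current = next(seq)
    except StopIteration:
        return  # empty first list: no intervals
    for s in seq:
        yield current, s
        current = s
    yield current, float("inf")
def gen_min_pairs(fst, snd):
    snd = iter(snd)
    # use a None sentinel instead of letting StopIteration escape the generator,
    # which Python 3.7+ turns into a RuntimeError (PEP 479)
    s = next(snd, None)
    for low, up in intervals(fst):
        # does it fall in the current interval?
        while s is not None and low < s <= up:
            yield low, s
            # try with the next
            s = next(snd, None)
        # second list exhausted: stop; otherwise go to the next interval
        if s is None:
            return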
Note that zip_longest is called izip_longest in Python 2.x.
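import itertools
def MinPairs(up,down):
    # nothing can be paired if either list is empty
    if not (up and down):
        return []
    # drop up values that no down value exceeds
    up=list(itertools.takewhile(lambda x:x<down[-1],up))
    if not up:
        return []
    # drop down values that have no smaller up value (a pair needs up < down)
    down=list(itertools.dropwhile(lambda x:x<=up[0],down))
    if not down:
        return []
    for i in range(min(len(up),len(down))):
        if up[i]>=down[i]:
            # repeat the previous up value so down[i] gets a smaller partner
            up.insert(i,up[i-1])
    return tuple(itertools.zip_longest(up,down,fillvalue=(up,down)[len(up)>len(down)][-1]))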
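For reference, here is a minimal illustration of itertools.zip_longest (its Python 3 name). It pads the shorter input with fillvalue, and MinPairs above uses this to repeat the last element of the shorter list. The input values follow what MinPairs would have in up and down for the question's first test case, just before its return statement.

```python
from itertools import zip_longest

up = [1, 1, 7, 8]
down = [6, 7, 13]
# down is shorter, so its missing slot is filled with down[-1]
result = list(zip_longest(up, down, fillvalue=down[-1]))
print(result)  # [(1, 6), (1, 7), (7, 13), (8, 13)]
```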
While not a complete answer (i.e. no code), have you tried looking at numpy's where function?
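Here is a partial building block rather than a complete answer. The standard library's bisect module can find, for each value b in the second list, the largest value a in the sorted first list with a < b. numpy.searchsorted is the vectorized counterpart; it is a different function from numpy.where. This sketch pairs only the second list's values and does not try to cover leftover values of the first list. So it reproduces some of the question's examples, such as [(5, 13), (15, 21), (15, 22)], but misses pairs like (2, 8) that exist only to cover a first-list value. The helper name is hypothetical.

```python
from bisect import bisect_left

def pair_each_down(up, down):
    """For each b in down, pair it with the largest a in sorted `up` with a < b."""
    pairs = []
    for b in down:
        i = bisect_left(up, b)  # number of values in up strictly less than b
        if i > 0:  # skip b when no value of up is smaller
            pairs.append((up[i - 1], b))
    return pairs

print(pair_each_down([1, 13, 15, 16], [6, 7, 15]))  # [(1, 6), (1, 7), (13, 15)]
```

Each lookup is a binary search, so this runs in O(len(down) * log(len(up))) time rather than the linear time the question hopes for.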
