Split every character from string in list - python

I have a list and I need to split every string into individual characters.
mylist = ['TCTAGTCCAGATAATCTGGT', 'GTGTTGGTACTGTAATGAAA', 'AGTTCTCTGGATCCTTCGGA', 'GGAATTGACGTCCCCAGGAA', 'GTCGTTGTCGTTCAGGAGTT', 'GGAGTCCGTCAGAAGAGGTC', 'GATTCCGATCAGATGAAGAA', 'CTTTCTATCGGGAAGAGGAG', 'ATGTCTTGAGATCGGGTCGT', 'ATTAAGATCCTCCATGATTC', 'ATCGTCGAAAGTAGTGGGAA']
And I need
output = ['T', 'C', 'T', ... 'A', 'A']
I've tried so many ways and can't figure it out.

You can use a nested list comprehension for this.
mylist = ['TCTAGTCCAGATAATCTGGT', 'GTGTTGGTACTGTAATGAAA', 'AGTTCTCTGGATCCTTCGGA', 'GGAATTGACGTCCCCAGGAA', 'GTCGTTGTCGTTCAGGAGTT', 'GGAGTCCGTCAGAAGAGGTC', 'GATTCCGATCAGATGAAGAA', 'CTTTCTATCGGGAAGAGGAG', 'ATGTCTTGAGATCGGGTCGT', 'ATTAAGATCCTCCATGATTC', 'ATCGTCGAAAGTAGTGGGAA']
chars = [c for s in mylist for c in s]
print(chars)
# ['T', 'C', 'T', 'A', 'G', 'T', 'C', 'C', 'A', 'G', 'A', 'T', 'A', 'A', 'T', 'C', 'T', 'G', 'G', 'T', 'G', 'T', 'G', 'T', 'T', 'G', 'G', 'T', 'A', 'C', 'T', 'G', 'T', 'A', 'A', 'T', 'G', 'A', 'A', 'A', 'A', 'G', 'T', 'T', 'C', 'T', 'C', 'T', 'G', 'G', 'A', 'T', 'C', 'C', 'T', 'T', 'C', 'G', 'G', 'A', 'G', 'G', 'A', 'A', 'T', 'T', 'G', 'A', 'C', 'G', 'T', 'C', 'C', 'C', 'C', 'A', 'G', 'G', 'A', 'A', 'G', 'T', 'C', 'G', 'T', 'T', 'G', 'T', 'C', 'G', 'T', 'T', 'C', 'A', 'G', 'G', 'A', 'G', 'T', 'T', 'G', 'G', 'A', 'G', 'T', 'C', 'C', 'G', 'T', 'C', 'A', 'G', 'A', 'A', 'G', 'A', 'G', 'G', 'T', 'C', 'G', 'A', 'T', 'T', 'C', 'C', 'G', 'A', 'T', 'C', 'A', 'G', 'A', 'T', 'G', 'A', 'A', 'G', 'A', 'A', 'C', 'T', 'T', 'T', 'C', 'T', 'A', 'T', 'C', 'G', 'G', 'G', 'A', 'A', 'G', 'A', 'G', 'G', 'A', 'G', 'A', 'T', 'G', 'T', 'C', 'T', 'T', 'G', 'A', 'G', 'A', 'T', 'C', 'G', 'G', 'G', 'T', 'C', 'G', 'T', 'A', 'T', 'T', 'A', 'A', 'G', 'A', 'T', 'C', 'C', 'T', 'C', 'C', 'A', 'T', 'G', 'A', 'T', 'T', 'C', 'A', 'T', 'C', 'G', 'T', 'C', 'G', 'A', 'A', 'A', 'G', 'T', 'A', 'G', 'T', 'G', 'G', 'G', 'A', 'A']
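As a variation on the comprehension above, here is a minimal sketch using itertools.chain.from_iterable, which flattens an iterable of strings into their characters. The two-item mylist is a shortened sample for illustration, not the full list from the question.

```python
from itertools import chain

# Shortened sample data (an assumption for this sketch, not the full list).
mylist = ['TCTAG', 'GTGTT']

# Each string is itself an iterable of characters, so chaining the strings
# together yields one flat stream of single characters.
chars = list(chain.from_iterable(mylist))
print(chars)
# ['T', 'C', 'T', 'A', 'G', 'G', 'T', 'G', 'T', 'T']
```

Both forms should give the same result; the comprehension is probably easier to read, while chain avoids writing the double loop by hand.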

You can use a list comprehension to create new sublists, one per string, where each character is split out. Note that this gives a list of lists rather than a single flat list.
mylist = ['TCTAGTCCAGATAATCTGGT', 'GTGTTGGTACTGTAATGAAA', 'AGTTCTCTGGATCCTTCGGA', 'GGAATTGACGTCCCCAGGAA', 'GTCGTTGTCGTTCAGGAGTT', 'GGAGTCCGTCAGAAGAGGTC', 'GATTCCGATCAGATGAAGAA', 'CTTTCTATCGGGAAGAGGAG', 'ATGTCTTGAGATCGGGTCGT', 'ATTAAGATCCTCCATGATTC', 'ATCGTCGAAAGTAGTGGGAA']
my_split_list = [[char for char in element] for element in mylist]
print(mylist)
print(my_split_list)
OUTPUT
['TCTAGTCCAGATAATCTGGT', 'GTGTTGGTACTGTAATGAAA', 'AGTTCTCTGGATCCTTCGGA', 'GGAATTGACGTCCCCAGGAA', 'GTCGTTGTCGTTCAGGAGTT', 'GGAGTCCGTCAGAAGAGGTC', 'GATTCCGATCAGATGAAGAA', 'CTTTCTATCGGGAAGAGGAG', 'ATGTCTTGAGATCGGGTCGT', 'ATTAAGATCCTCCATGATTC', 'ATCGTCGAAAGTAGTGGGAA']
[['T', 'C', 'T', 'A', 'G', 'T', 'C', 'C', 'A', 'G', 'A', 'T', 'A', 'A', 'T', 'C', 'T', 'G', 'G', 'T'], ['G', 'T', 'G', 'T', 'T', 'G', 'G', 'T', 'A', 'C', 'T', 'G', 'T', 'A', 'A', 'T', 'G', 'A', 'A', 'A'], ['A', 'G', 'T', 'T', 'C', 'T', 'C', 'T', 'G', 'G', 'A', 'T', 'C', 'C', 'T', 'T', 'C', 'G', 'G', 'A'], ['G', 'G', 'A', 'A', 'T', 'T', 'G', 'A', 'C', 'G', 'T', 'C', 'C', 'C', 'C', 'A', 'G', 'G', 'A', 'A'], ['G', 'T', 'C', 'G', 'T', 'T', 'G', 'T', 'C', 'G', 'T', 'T', 'C', 'A', 'G', 'G', 'A', 'G', 'T', 'T'], ['G', 'G', 'A', 'G', 'T', 'C', 'C', 'G', 'T', 'C', 'A', 'G', 'A', 'A', 'G', 'A', 'G', 'G', 'T', 'C'], ['G', 'A', 'T', 'T', 'C', 'C', 'G', 'A', 'T', 'C', 'A', 'G', 'A', 'T', 'G', 'A', 'A', 'G', 'A', 'A'], ['C', 'T', 'T', 'T', 'C', 'T', 'A', 'T', 'C', 'G', 'G', 'G', 'A', 'A', 'G', 'A', 'G', 'G', 'A', 'G'], ['A', 'T', 'G', 'T', 'C', 'T', 'T', 'G', 'A', 'G', 'A', 'T', 'C', 'G', 'G', 'G', 'T', 'C', 'G', 'T'], ['A', 'T', 'T', 'A', 'A', 'G', 'A', 'T', 'C', 'C', 'T', 'C', 'C', 'A', 'T', 'G', 'A', 'T', 'T', 'C'], ['A', 'T', 'C', 'G', 'T', 'C', 'G', 'A', 'A', 'A', 'G', 'T', 'A', 'G', 'T', 'G', 'G', 'G', 'A', 'A']]

Related

Find "seasonality" in a categorical time series in python

I have the following sequence:
states_list = ['H', 'M', 'M', 'M', 'H', 'H', 'H', 'H', 'C', 'C', 'H', 'H', 'C', 'C', 'H', 'A', 'A', 'A', 'A', 'A', 'S', 'S', 'S', 'A', 'S', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'C', 'H', 'H', 'H', 'H', 'H', 'S', 'H', 'S', 'S', 'S', 'H', 'H', 'H', 'H', 'H', 'H', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'H', 'H', 'H', 'H', 'H', 'C', 'C', 'C', 'A', 'C', 'C', 'A', 'A', 'A', 'A', 'A', 'H', 'H', 'H', 'H', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C']
Is there a way to find "seasonality" in this time series?
By "seasonality" I mean whether a specific sub-sequence of letters pops up every "n" letters.
The standard technique for seasonality detection is a lagged autocorrelation plot. That is, you shift your series by various time lags and check whether the shifted series is correlated with the original (search for "ACF" and "ACF plot").
Your time series is categorical, though, so the standard tools won't work out of the box. I searched briefly and didn't find anything ready-made, but all the ingredients are there.
The main one is a correlation measure for categorical variables, and that's Cramér's V. See, for example, https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.contingency.association.html.
Then you will need to write some code that, for each k = 1, 2, 3, ..., shifts the series by k, computes Cramér's V between the shifted and unshifted series, and saves the result.
After that, plot k against the correlations and see if anything stands out.
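The steps above can be sketched roughly as follows. To keep the example dependency-free, Cramér's V is computed by hand here (uncorrected, from the chi-squared statistic) instead of calling the SciPy function linked above; the names cramers_v and lagged_association and the toy series are assumptions made for this sketch.

```python
import random
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        # V is undefined with fewer than two categories; returning 0.0 is a
        # convention chosen for this sketch.
        return 0.0
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))

def lagged_association(series, max_lag):
    """Map each lag k to Cramér's V between the series and itself shifted by k."""
    return {k: cramers_v(series[:-k], series[k:]) for k in range(1, max_lag + 1)}

# Toy data (not the series from the question): every 4th state is 'H',
# the rest are drawn at random.
rng = random.Random(0)
toy = ['H' if i % 4 == 0 else rng.choice('CAS') for i in range(200)]

scores = lagged_association(toy, 8)
for lag, v in scores.items():
    print(lag, round(v, 3))
```

Keep in mind that V measures association rather than equality, so a strictly periodic series scores 1.0 at every lag. Peaks at particular lags are only informative when the series is noisy.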

How do I sort by the first element of each line?

I am attempting to sort the first element of each line alphabetically but struggling to get this to work.
[['^', 'G', 'A', 'T', 'T', 'A', 'C', 'A', '!']]
[['G', 'A', 'T', 'T', 'A', 'C', 'A', '!', '^']]
[['A', 'T', 'T', 'A', 'C', 'A', '!', '^', 'G']]
[['T', 'T', 'A', 'C', 'A', '!', '^', 'G', 'A']]
[['T', 'A', 'C', 'A', '!', '^', 'G', 'A', 'T']]
[['A', 'C', 'A', '!', '^', 'G', 'A', 'T', 'T']]
[['C', 'A', '!', '^', 'G', 'A', 'T', 'T', 'A']]
[['A', '!', '^', 'G', 'A', 'T', 'T', 'A', 'C']]
[['!', '^', 'G', 'A', 'T', 'T', 'A', 'C', 'A']]
I tried the sort and sorted functions as well as pandas, but can't seem to make it work.
with open('BWT_test.txt', 'r') as seq1:
    sequence1 = seq1.read()
    seq1.read()
list1 = list(sequence1)
list1.insert(0, '^')
list1.append('!')
for seq1 in range(len(list1)):
    table1 = [list1[seq1:] + list1[:seq1]]
    sorted(table1)
    print(table1)
The code should organise the list like this:
[['A', 'C', 'A', '!', '^', 'G', 'A', 'T', 'T']]
[['A', 'T', 'T', 'A', 'C', 'A', '!', '^', 'G']]
[['A', '!', '^', 'G', 'A', 'T', 'T', 'A', 'C']]
[['C', 'A', '!', '^', 'G', 'A', 'T', 'T', 'A']]
[['G', 'A', 'T', 'T', 'A', 'C', 'A', '!', '^']]
[['T', 'A', 'C', 'A', '!', '^', 'G', 'A', 'T']]
[['T', 'T', 'A', 'C', 'A', '!', '^', 'G', 'A']]
[['^', 'G', 'A', 'T', 'T', 'A', 'C', 'A', '!']]
[['!', '^', 'G', 'A', 'T', 'T', 'A', 'C', 'A']]
sorted(data, key=lambda x: x[0])
Or...
from operator import itemgetter
sort = sorted(data, key=itemgetter(0))
Change the loop to this:
import string
rotations = []
for seq1 in range(len(list1)):
    table1 = list1[seq1:] + list1[:seq1]
    rotations.append(table1)
# Letters sort first; the '^' and '!' markers go to the end.
rotations = sorted(rotations, key=lambda x: (x[0] not in string.ascii_letters, x[0]))
print([x[-1] for x in rotations])
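To try the loop above without the input file, here is a self-contained sketch that hard-codes the sequence "GATTACA" (inferred from the rotations shown in the question). The helper names rotations_of and first_char_key are made up for this example.

```python
import string

def rotations_of(seq):
    """Return every cyclic rotation of seq as a list of lists."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]

def first_char_key(row):
    # Letters sort before the '^' and '!' markers. Rows with the same first
    # character keep their original order because sorted() is stable.
    return (row[0] not in string.ascii_letters, row[0])

chars = ['^'] + list("GATTACA") + ['!']
table = sorted(rotations_of(chars), key=first_char_key)

last_column = [row[-1] for row in table]
print(last_column)
# ['G', 'T', 'C', 'A', '^', 'A', 'T', 'A', '!']
```

Note that this key only looks at the first element, so '!' ends up before '^' (plain character order) and rows starting with the same letter are not ordered among themselves. Sorting on the whole row would need a key that covers every element.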

Python group a range of percentage with increment and count the number of group

I have three groups of lists: A1, A2, A3 in group A; B1, B2, B3 in group B; and C1, C2, C3 in group C.
a1 = ["ID_A1", ['T', 'T', 'C', 'C', 'A', 'C', 'A', 'G', 'C', 'T', 'T', 'T', 'T', 'C', 'G', 'C', 'C', 'A', 'A', 'G', 'C', 'T', 'C']]
a2 = ["ID_A2", ['T', 'T', 'C', 'C', 'A', 'C', 'A', 'G', 'C', 'T', 'T', 'T', 'T', 'C', 'G', 'C', 'C', 'A', 'A', 'G', 'C', 'T', 'T']]
a3 = ["ID_A3", ['T', 'T', 'C', 'C', 'A', 'C', 'A', 'G', 'C', 'T', 'T', 'T', 'T', 'C', 'G', 'C', 'C', 'A', 'A', 'G', 'C', 'T', 'G']]
b1 = ["ID_B1", ['C', 'T', 'C', 'C', 'A', 'C', 'C', 'A', 'C', 'T', 'T', 'T', 'C', 'C', 'A', 'C', 'C', 'A', 'A', 'A', 'C', 'T', 'C']]
b2 = ["ID_B2", ['C', 'T', 'C', 'C', 'A', 'C', 'C', 'A', 'C', 'T', 'T', 'T', 'C', 'C', 'A', 'C', 'C', 'A', 'A', 'A', 'C', 'A', 'C']]
b3 = ["ID_B3", ['C', 'T', 'C', 'C', 'A', 'C', 'C', 'A', 'C', 'T', 'T', 'T', 'C', 'C', 'A', 'C', 'C', 'A', 'A', 'A', 'C', 'G', 'C']]
c1 = ["ID_C1", ['T', 'T', 'C', 'C', 'A', 'C', 'A', 'A', 'C', 'T', 'T', 'T', 'T', 'C', 'G', 'C', 'C', 'A', 'A', 'G', 'C', 'T', 'T']]
c2 = ["ID_C2", ['T', 'T', 'C', 'C', 'A', 'C', 'A', 'G', 'C', 'T', 'T', 'T', 'T', 'C', 'G', 'C', 'C', 'A', 'A', 'G', 'C', 'T', 'T']]
c3 = ["ID_C3", ['T', 'T', 'C', 'C', 'A', 'C', 'A', 'G', 'C', 'T', 'T', 'T', 'T', 'C', 'G', 'C', 'C', 'A', 'A', 'G', 'C', 'T', 'G']]
data_set = [a1, a2, a3, b1, b2, b3, c1, c2, c3]
I have already compared their similarities with the code below:
def compare(_from, _to):
    similarity = 0
    length = len(_from)
    if len(_from) != len(_to):
        raise Exception("Cannot be compared due to different length.")
    for i in range(length):
        if _from[i] == _to[i]:
            similarity += 1
    return similarity / length * 100
result = list()
for entry1 in data_set:
    for entry2 in data_set:
        percentage = compare(entry1[1], entry2[1])
        print("Compare ", entry1[0], " to ", entry2[0], "Percentage :", round(percentage, 2))
        result.append(round(percentage, 2))
print(result)
Instead of grouping the similarities by their exact values, I want to group them within a range such as 95% to 96% in increments of 0.1, depending on the range the user enters. I want a 0.1 increment because my real data set is very large and I can't include it here. When I loop over group A (comparing against ID_A1 to ID_C3), every similarity between 95% and 96% should go into group A, giving a group count of 1. When I loop over group B (comparing against ID_A1 to ID_C3), every similarity between 95% and 96% should go into group B, and the group count increases by 1. The result I want is the total number of groups in the range 95% to 96%.
I would also like to know: within the range 95.0% to 96.0%, if there are values of 95.5% and 95.6%, how can I count them as separate groups?
The example output would be like:
"In the range of 95% to 96%, there is 1 group of 95.5% and 1 group of 95.6%"
"Total number of groups: ... "
PS: I need to use the number of groups to plot a graph
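One possible reading of this requirement is to drop each percentage into a bucket of width 0.1 and count the buckets that fall inside the requested range. The sketch below does that with collections.Counter; the function name bucket_counts and the sample values are assumptions for illustration, not results from the data above.

```python
from collections import Counter

def bucket_counts(percentages, low, high, step=0.1):
    """Count values in [low, high], grouped into buckets of width `step`."""
    scale = round(1 / step)  # 10 buckets per percentage point when step is 0.1
    counts = Counter()
    for p in percentages:
        if low <= p <= high:
            # Floor to a whole number of steps; the tiny offset guards against
            # values such as 95.6 landing just below a bucket edge in floating point.
            bucket = int(p * scale + 1e-9)
            counts[bucket / scale] += 1
    return counts

# Hypothetical similarity values, already rounded to two decimals.
sample = [95.5, 95.52, 95.6, 95.65, 91.3, 100.0]

counts = bucket_counts(sample, 95.0, 96.0)
for bucket in sorted(counts):
    print(f"{bucket}%: {counts[bucket]} value(s)")
print("Total number of groups:", len(counts))
```

With this sample there are two buckets in range (95.5 and 95.6), and the per-bucket counts could be passed straight to a bar chart.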

Select n items by sequence of a list repeatedly in python

Say I have a long list:
>>> import string
>>> my_list = list(string.ascii_lowercase)
>>> my_list
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
I want to loop over this list and select n items by sequence repeatedly. E.g. if I want to select 5 items, then it should be like:
step 1: ['a', 'b', 'c', 'd', 'e']
step 2: ['f', 'g', 'h', 'i', 'j']
step 3: ['k', 'l', 'm', 'n', 'o']
step 4: ['p', 'q', 'r', 's', 't']
step 5: ['u', 'v', 'w', 'x', 'y']
step 6: ['z', 'a', 'b', 'c', 'd']
step 7: ['e', 'f', 'g', 'h', 'i']
......
So the point is: when I reach the last item of the list, the first items should be appended after it and the looping should just keep going.
For appending the first items to the last items, I've tried something like this:
def loop_slicing(lst_, i):
    """ Slice iterable repeatedly """
    if i[0] > i[1]:
        return [n for n in lst_[i[0]:]+lst_[:i[1]]]
    else:
        return lst_[i[0]:i[1]]
When I call this function, I can do this:
>>> loop_slicing(my_list, (0, 5))
['a', 'b', 'c', 'd', 'e']
>>> loop_slicing(my_list, (25, 4))
['z', 'a', 'b', 'c', 'd']
From there I could write a generator that produces 5 sequential indices in range(0, 26), so that looping over my_list returns 5 items each time.
I don't know if this is the best approach. Is there a more efficient way to do this?
Using the itertools module you can cycle and slice a string via an infinite generator:
from itertools import cycle, islice
from string import ascii_lowercase
def gen(x, n):
    c = cycle(x)
    while True:
        yield list(islice(c, n))
G = gen(ascii_lowercase, 5)
print(next(G)) # ['a', 'b', 'c', 'd', 'e']
print(next(G)) # ['f', 'g', 'h', 'i', 'j']
...
print(next(G)) # ['u', 'v', 'w', 'x', 'y']
print(next(G)) # ['z', 'a', 'b', 'c', 'd']
Debatably simpler solution using a list comprehension:
def batch_list(ns, batch_size):
    return [ns[i:i+batch_size] for i in range(0, len(ns), batch_size)]
>>> batch_list('abcdefghijk', 3)
['abc', 'def', 'ghi', 'jk']
This is a simple construction that I find myself writing often when I want to batch some list of tasks to perform.
EDIT: Just realized the OP asked for the construction to cycle around to the beginning to complete the last batch if needed. This does not do that and will have the last batch truncated.
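If the wrap-around is needed, one way to adapt the batching idea is to index modulo the length instead of slicing. This is only a sketch; batch_list_wrapped and its n_batches parameter are names invented here, and the caller has to say how many batches to produce because the sequence never ends.

```python
def batch_list_wrapped(ns, batch_size, n_batches):
    """Return n_batches batches of batch_size items, wrapping past the end of ns."""
    size = len(ns)
    return [
        # The modulo maps positions beyond the end back to the start.
        [ns[(start + offset) % size] for offset in range(batch_size)]
        for start in range(0, n_batches * batch_size, batch_size)
    ]

batches = batch_list_wrapped('abcdefghijk', 3, 5)
print(batches)
# [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'a'], ['b', 'c', 'd']]
```

Unlike the slicing version, this always returns lists of single items, even when the input is a string.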
Thanks for asking,
I took some time to understand the objective of your algorithm, but if you want to loop and save all of your sublists, I think this should work:
def slicing_items(slc_len, lst, iterate_num=25):
    # slc_len is the length of each slice, lst is the list to slice
    n = len(lst)
    k = 1
    p = k * slc_len
    slicing_list = []
    while k < iterate_num:
        current_slice = []
        if p >= n:
            for i in range(1, p//n):
                current_slice += lst  # How many times we passed the length of the list
            p = p % n  # How many items remaining?
            current_slice += lst[-(slc_len-p):]
            current_slice += lst[:p]
        else:
            current_slice = lst[p-slc_len:p]
        k += 1
        p += slc_len
        slicing_list.append(current_slice)
    return slicing_list
Output:
>>> slicing_items(5, my_list, 10)
[['a', 'b', 'c', 'd', 'e'],
['f', 'g', 'h', 'i', 'j'],
['k', 'l', 'm', 'n', 'o'],
['p', 'q', 'r', 's', 't'],
['u', 'v', 'w', 'x', 'y'],
['z', 'a', 'b', 'c', 'd'],
['e', 'f', 'g', 'h', 'i'],
['j', 'k', 'l', 'm', 'n'],
['o', 'p', 'q', 'r', 's']]
However, if you just want the last slice after iterate_num iterations, then your function should fit perfectly (for speed, you could return the concatenated slices directly in your first branch instead of rebuilding the list with a comprehension).
Using generator and slicing:
from string import ascii_lowercase
def gen(x, n):
    start, stop = 0, n
    while True:
        if start < stop:
            yield list(x[start:stop])
        else:
            yield ((list(x[start:])) + (list(x[:stop])))
        start = stop
        stop = (stop + n) % len(x)
G = gen(ascii_lowercase, 5)
print(next(G)) # ['a', 'b', 'c', 'd', 'e']
print(next(G)) # ['f', 'g', 'h', 'i', 'j']
print(next(G))
print(next(G))
print(next(G)) # ['u', 'v', 'w', 'x', 'y']
print(next(G)) # ['z', 'a', 'b', 'c', 'd']
print(next(G))
OUTPUT :
['a', 'b', 'c', 'd', 'e']
['f', 'g', 'h', 'i', 'j']
['k', 'l', 'm', 'n', 'o']
['p', 'q', 'r', 's', 't']
['u', 'v', 'w', 'x', 'y']
['z', 'a', 'b', 'c', 'd']
['e', 'f', 'g', 'h', 'i']
This is a really interesting problem. If you just want to solve it, go for the itertools cycle approach, since it is already built in. But if you want to enjoy building the algorithm yourself, go with your own solution and experiment:
Here I tried a recursive approach. Since you said it should keep going, you have to handle the recursion by setting a maximum limit:
import math
import sys
sys.setrecursionlimit(500)
data=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
try:
    def recursive_approch(m, x, n, hold=0, value=0, left=0):
        print(hold)
        max_time = len(m) / n
        max_t = int(math.modf(max_time)[1])
        left = len(m) % n
        if value == max_t:
            if len(x) == left:
                x = x + m[:-len(x)]
                value = 0
                left = 0
        else:
            hold = x[:n]
            value = value + 1
            return recursive_approch(m, x[n:], n, hold=hold, value=value, left=left)
        return recursive_approch(m, x, n, hold=hold, value=value, left=left)
    print(recursive_approch(data, data, 6))
except RecursionError:
    print('maximum recursion')
You have to pass the slice size, so if you want slices of 6 then:
print(recursive_approch(data, data, 6))
output:
['a', 'b', 'c', 'd', 'e', 'f']
['g', 'h', 'i', 'j', 'k', 'l']
['m', 'n', 'o', 'p', 'q', 'r']
['s', 't', 'u', 'v', 'w', 'x']
['s', 't', 'u', 'v', 'w', 'x']
['y', 'z', 'a', 'b', 'c', 'd']
['e', 'f', 'g', 'h', 'i', 'j']
['k', 'l', 'm', 'n', 'o', 'p']
['q', 'r', 's', 't', 'u', 'v']
['q', 'r', 's', 't', 'u', 'v']
['w', 'x', 'a', 'b', 'c', 'd']
['e', 'f', 'g', 'h', 'i', 'j']
['k', 'l', 'm', 'n', 'o', 'p']
['q', 'r', 's', 't', 'u', 'v']
['q', 'r', 's', 't', 'u', 'v']
['w', 'x', 'a', 'b', 'c', 'd']
['e', 'f', 'g', 'h', 'i', 'j']
...................
If you want slices of 3 then:
['a', 'b', 'c']
['d', 'e', 'f']
['g', 'h', 'i']
['j', 'k', 'l']
['m', 'n', 'o']
['p', 'q', 'r']
['s', 't', 'u']
['v', 'w', 'x']
['v', 'w', 'x']
['y', 'z', 'a']
['b', 'c', 'd']
['e', 'f', 'g']
['h', 'i', 'j']
['k', 'l', 'm']
['n', 'o', 'p']
['q', 'r', 's']
['t', 'u', 'v']
['t', 'u', 'v']
['w', 'x', 'a']
['b', 'c', 'd']
['e', 'f', 'g']
['h', 'i', 'j']
['k', 'l', 'm']
['n', 'o', 'p']
['q', 'r', 's']
['t', 'u', 'v']
['t', 'u', 'v']
['w', 'x', 'a']
['b', 'c', 'd']
['e', 'f', 'g']
['h', 'i', 'j']
['k', 'l', 'm']
['n', 'o', 'p']
['q', 'r', 's']
['t', 'u', 'v']
['t', 'u', 'v']
['w', 'x', 'a']
['b', 'c', 'd']
['e', 'f', 'g']
['h', 'i', 'j']
['k', 'l', 'm']
['n', 'o', 'p']
['q', 'r', 's']
['t', 'u', 'v']
['t', 'u', 'v']
['w', 'x', 'a']
['b', 'c', 'd']
.......
If you pass 12:
0
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
['m', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x']
['m', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x']
['y', 'z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v']
['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v']
['w', 'x', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v']
['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v']
['w', 'x', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v']
['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v']
['w', 'x', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v']
['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v']
['w', 'x', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v']
['k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v']
....

Generate a matrix from a given string

I need to create a matrix from a string s, where m is the given number of rows and len(s)/m is the number of columns. The first column must be filled with the first m chars of the string s (i.e. the chars at 0*m+i for every i in range(m)), the second column with the chars at 1*m+i, and so on.
What's the best way to do this in Python?
EDIT:
This is the code I have written so far.
def split_by_n(seq, n):
    """A generator to divide a sequence into chunks of n units."""
    while seq:
        yield seq[:n]
        seq = seq[n:]
# print(list(split_by_n("1234567890", 2)))
input=list("ZPFKYLGJPNSGNMQGFGCITLVRIWMGFBLBFDSIOAJGBGAVFVHBGLFSRPNIOFSYOBTFCGRQLWWZAAJFUPGAFZSNXLTGARUVFKOLGAIWGUUCMVSEKLIAGJGGUZFBAOILVRIZPORNXWVFRGNMEGCEUNUZSPNIUAHFRQLWALHWEQGQKDFDCCKLUZWFSITKWIKLSMUQKNJUWRTKZAHJGABKDEGEMNCVIMBFRNYXSSKYPWLWHUKKISHFAJPOOFGJBJTBXXSGTRYAJGBNRMYHOGXQBLSFEWVUCHRLEJWAQBIWFRLWSSKRKSBFRAKDFJVRGZUOCJUZEKWAPIQSBRYM")
l = list(split_by_n(input, 6))
# pad the last chunk with '$' so every chunk has the same length
for i in range(len(l[-2]) - len(l[-1])):
    l[-1].append('$')
print(l)
I learned from your comment that you want to transpose the matrix formed from the given string. Your code creates the matrix from the string just fine. I have tweaked it only slightly and added code to compute the transpose.
def split_by_n(seq, n):
    while seq:
        yield seq[:n]
        seq = seq[n:]
def make_matrix(string):
    col_count = 6
    matrix = list(split_by_n(string, col_count))
    row_count = len(matrix)
    # the last row has "less_by" fewer elements than the rest of the rows
    less_by = len(matrix[-2]) - len(matrix[-1])
    matrix[-1] += '$' * less_by
    return matrix
def make_transpose(matrix):
    col_count = len(matrix[0])
    transpose = []
    for i in range(col_count):
        transpose.append([row[i] for row in matrix])
    return transpose
string = list("ZPFKYLGJPNSGNMQGFGCITLVRIWMGFBLBFDSIOAJGBGAVFVHBGLFSRPNIOFSYOBTFCGRQLWWZAAJFUPGAFZSNXLTGARUVFKOLGAIWGUUCMVSEKLIAGJGGUZFBAOILVRIZPORNXWVFRGNMEGCEUNUZSPNIUAHFRQLWALHWEQGQKDFDCCKLUZWFSITKWIKLSMUQKNJUWRTKZAHJGABKDEGEMNCVIMBFRNYXSSKYPWLWHUKKISHFAJPOOFGJBJTBXXSGTRYAJGBNRMYHOGXQBLSFEWVUCHRLEJWAQBIWFRLWSSKRKSBFRAKDFJVRGZUOCJUZEKWAPIQSBRYM")
matrix = make_matrix(string)
transpose = make_transpose(matrix)
for e in matrix:
    print(e)
print('\nThe transpose:')
for e in transpose:
    print(e)
Output:
['Z', 'P', 'F', 'K', 'Y', 'L']
['G', 'J', 'P', 'N', 'S', 'G']
['N', 'M', 'Q', 'G', 'F', 'G']
['C', 'I', 'T', 'L', 'V', 'R']
['I', 'W', 'M', 'G', 'F', 'B']
['L', 'B', 'F', 'D', 'S', 'I']
['O', 'A', 'J', 'G', 'B', 'G']
['A', 'V', 'F', 'V', 'H', 'B']
['G', 'L', 'F', 'S', 'R', 'P']
['N', 'I', 'O', 'F', 'S', 'Y']
['O', 'B', 'T', 'F', 'C', 'G']
['R', 'Q', 'L', 'W', 'W', 'Z']
['A', 'A', 'J', 'F', 'U', 'P']
['G', 'A', 'F', 'Z', 'S', 'N']
['X', 'L', 'T', 'G', 'A', 'R']
['U', 'V', 'F', 'K', 'O', 'L']
['G', 'A', 'I', 'W', 'G', 'U']
['U', 'C', 'M', 'V', 'S', 'E']
['K', 'L', 'I', 'A', 'G', 'J']
['G', 'G', 'U', 'Z', 'F', 'B']
['A', 'O', 'I', 'L', 'V', 'R']
['I', 'Z', 'P', 'O', 'R', 'N']
['X', 'W', 'V', 'F', 'R', 'G']
['N', 'M', 'E', 'G', 'C', 'E']
['U', 'N', 'U', 'Z', 'S', 'P']
['N', 'I', 'U', 'A', 'H', 'F']
['R', 'Q', 'L', 'W', 'A', 'L']
['H', 'W', 'E', 'Q', 'G', 'Q']
['K', 'D', 'F', 'D', 'C', 'C']
['K', 'L', 'U', 'Z', 'W', 'F']
['S', 'I', 'T', 'K', 'W', 'I']
['K', 'L', 'S', 'M', 'U', 'Q']
['K', 'N', 'J', 'U', 'W', 'R']
['T', 'K', 'Z', 'A', 'H', 'J']
['G', 'A', 'B', 'K', 'D', 'E']
['G', 'E', 'M', 'N', 'C', 'V']
['I', 'M', 'B', 'F', 'R', 'N']
['Y', 'X', 'S', 'S', 'K', 'Y']
['P', 'W', 'L', 'W', 'H', 'U']
['K', 'K', 'I', 'S', 'H', 'F']
['A', 'J', 'P', 'O', 'O', 'F']
['G', 'J', 'B', 'J', 'T', 'B']
['X', 'X', 'S', 'G', 'T', 'R']
['Y', 'A', 'J', 'G', 'B', 'N']
['R', 'M', 'Y', 'H', 'O', 'G']
['X', 'Q', 'B', 'L', 'S', 'F']
['E', 'W', 'V', 'U', 'C', 'H']
['R', 'L', 'E', 'J', 'W', 'A']
['Q', 'B', 'I', 'W', 'F', 'R']
['L', 'W', 'S', 'S', 'K', 'R']
['K', 'S', 'B', 'F', 'R', 'A']
['K', 'D', 'F', 'J', 'V', 'R']
['G', 'Z', 'U', 'O', 'C', 'J']
['U', 'Z', 'E', 'K', 'W', 'A']
['P', 'I', 'Q', 'S', 'B', 'R']
['Y', 'M', '$', '$', '$', '$']
The transpose:
['Z', 'G', 'N', 'C', 'I', 'L', 'O', 'A', 'G', 'N', 'O', 'R', 'A', 'G', 'X', 'U', 'G', 'U', 'K', 'G', 'A', 'I', 'X', 'N', 'U', 'N', 'R', 'H', 'K', 'K', 'S', 'K', 'K', 'T', 'G', 'G', 'I', 'Y', 'P', 'K', 'A', 'G', 'X', 'Y', 'R', 'X', 'E', 'R', 'Q', 'L', 'K', 'K', 'G', 'U', 'P', 'Y']
['P', 'J', 'M', 'I', 'W', 'B', 'A', 'V', 'L', 'I', 'B', 'Q', 'A', 'A', 'L', 'V', 'A', 'C', 'L', 'G', 'O', 'Z', 'W', 'M', 'N', 'I', 'Q', 'W', 'D', 'L', 'I', 'L', 'N', 'K', 'A', 'E', 'M', 'X', 'W', 'K', 'J', 'J', 'X', 'A', 'M', 'Q', 'W', 'L', 'B', 'W', 'S', 'D', 'Z', 'Z', 'I', 'M']
['F', 'P', 'Q', 'T', 'M', 'F', 'J', 'F', 'F', 'O', 'T', 'L', 'J', 'F', 'T', 'F', 'I', 'M', 'I', 'U', 'I', 'P', 'V', 'E', 'U', 'U', 'L', 'E', 'F', 'U', 'T', 'S', 'J', 'Z', 'B', 'M', 'B', 'S', 'L', 'I', 'P', 'B', 'S', 'J', 'Y', 'B', 'V', 'E', 'I', 'S', 'B', 'F', 'U', 'E', 'Q', '$']
['K', 'N', 'G', 'L', 'G', 'D', 'G', 'V', 'S', 'F', 'F', 'W', 'F', 'Z', 'G', 'K', 'W', 'V', 'A', 'Z', 'L', 'O', 'F', 'G', 'Z', 'A', 'W', 'Q', 'D', 'Z', 'K', 'M', 'U', 'A', 'K', 'N', 'F', 'S', 'W', 'S', 'O', 'J', 'G', 'G', 'H', 'L', 'U', 'J', 'W', 'S', 'F', 'J', 'O', 'K', 'S', '$']
['Y', 'S', 'F', 'V', 'F', 'S', 'B', 'H', 'R', 'S', 'C', 'W', 'U', 'S', 'A', 'O', 'G', 'S', 'G', 'F', 'V', 'R', 'R', 'C', 'S', 'H', 'A', 'G', 'C', 'W', 'W', 'U', 'W', 'H', 'D', 'C', 'R', 'K', 'H', 'H', 'O', 'T', 'T', 'B', 'O', 'S', 'C', 'W', 'F', 'K', 'R', 'V', 'C', 'W', 'B', '$']
['L', 'G', 'G', 'R', 'B', 'I', 'G', 'B', 'P', 'Y', 'G', 'Z', 'P', 'N', 'R', 'L', 'U', 'E', 'J', 'B', 'R', 'N', 'G', 'E', 'P', 'F', 'L', 'Q', 'C', 'F', 'I', 'Q', 'R', 'J', 'E', 'V', 'N', 'Y', 'U', 'F', 'F', 'B', 'R', 'N', 'G', 'F', 'H', 'A', 'R', 'R', 'A', 'R', 'J', 'A', 'R', '$']
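For comparison, the chunk, pad and transpose steps can be folded into one call to itertools.zip_longest. This is a sketch rather than a drop-in replacement; columns_from_string is a name invented here, and the short input string is only for illustration.

```python
from itertools import zip_longest

def columns_from_string(s, m, fill='$'):
    """Build an m-row matrix whose columns are consecutive m-character chunks of s."""
    chunks = [s[i:i + m] for i in range(0, len(s), m)]
    # zip_longest groups the i-th character of every chunk into row i and
    # pads the short final chunk with `fill`.
    return [list(row) for row in zip_longest(*chunks, fillvalue=fill)]

matrix = columns_from_string("ABCDEFGH", 3)
for row in matrix:
    print(row)
# ['A', 'D', 'G']
# ['B', 'E', 'H']
# ['C', 'F', '$']
```

Here the first column holds the first m characters of the string, which is the layout the question describes.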
