I've got a text file which looks like this:
JB651 ACDCCADBCADCDA
JB831 ACACCBBBBBDDAC
JC124 DACBDBBACBDCDC
JD329 BAACDBABCCDDAB
JD830 BDDCDBABBBAAAD
JI428 DCBCCBBBBCCCCA
And I have to split it into a matrix (I would say I first have to count the lines for a for loop) that looks like this:
[[JI428] [D,C,B,C,C,B,B,B,B,C,C,C,C,A]]
And then, how can I refer to any line or letter of the 2nd part? (I'm a total beginner.)
You can open the file and read each line, splitting it at the space. To be able to access any line, and thus any letter, it may be best to use a dictionary:
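with open('filename.txt') as f:
    # Split each line into [id, letters]; a separate name avoids rebinding the file handle
    rows = [line.strip('\n').split() for line in f]
final_data = {a: list(b) for a, b in rows}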
Output:
{'JD830': ['B', 'D', 'D', 'C', 'D', 'B', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'D'], 'JC124': ['D', 'A', 'C', 'B', 'D', 'B', 'B', 'A', 'C', 'B', 'D', 'C', 'D', 'C'], 'JB651': ['A', 'C', 'D', 'C', 'C', 'A', 'D', 'B', 'C', 'A', 'D', 'C', 'D', 'A'], 'JB831': ['A', 'C', 'A', 'C', 'C', 'B', 'B', 'B', 'B', 'B', 'D', 'D', 'A', 'C'], 'JD329': ['B', 'A', 'A', 'C', 'D', 'B', 'A', 'B', 'C', 'C', 'D', 'D', 'A', 'B'], 'JI428': ['D', 'C', 'B', 'C', 'C', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'A']}
You can access the rows by passing a key from the first column:
print(final_data['JD830'])
Output:
['B', 'D', 'D', 'C', 'D', 'B', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'D']
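To reach a single letter, index into the list stored under a key. Here is a minimal sketch of that, which also builds the list-of-lists form asked about in the question. It assumes the same file contents, but reads them from an in-memory string (`sample` is a stand-in for the real file) so it runs without a file on disk:

```python
import io

# Stand-in for the real file; the contents are copied from the question.
sample = """JB651 ACDCCADBCADCDA
JB831 ACACCBBBBBDDAC
JC124 DACBDBBACBDCDC
JD329 BAACDBABCCDDAB
JD830 BDDCDBABBBAAAD
JI428 DCBCCBBBBCCCCA
"""

with io.StringIO(sample) as f:
    # Each row becomes [id, [letter, letter, ...]]; blank lines are skipped.
    rows = [[key, list(letters)]
            for key, letters in (line.split() for line in f if line.strip())]

# Dictionary form for lookup by id.
final_data = {key: letters for key, letters in rows}

print(rows[5])                 # the last row: id plus its list of letters
print(rows[5][1][0])           # first letter of the last row: D
print(final_data['JD830'][2])  # third letter of JD830: D
print(len(rows))               # number of lines: 6
```

There is no need to count the lines first, because iterating over the file already visits each line once.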
I have following sequence of data:
['A',
'A',
'A',
'A',
'A',
'A',
'A',
'D',
'D',
'D',
'D',
'D',
'D',
'A',
'A',
'A',
'A',
'A',
'D',
'D',
'D',
'D',
'D',
'D']
How would I be able to create subsequence (list of list) as an example:
[['A',
'A',
'A',
'A',
'A',
'A',
'A'],
['D',
'D',
'D',
'D',
'D',
'D'],
['A',
'A',
'A',
'A',
'A'], ['D',
'D',
'D',
'D', 'D', 'D']]
That is, I want to create a sublist which accumulates the first encountered value (e.g. 'A' or 'D'), appending it to the sublist until a different letter is reached. The second sublist then collects the run of letters that differ from the first run, and so on. The process continues until the last element of the main list.
itertools.groupby is a good tool for this kind of task:
from itertools import groupby
lst = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'D', 'D', 'D', 'D', 'D', 'D', 'A', 'A', 'A', 'A', 'A', 'D', 'D', 'D', 'D', 'D', 'D']
output = [list(g) for _, g in groupby(lst)]
print(output)
# [['A', 'A', 'A', 'A', 'A', 'A', 'A'], ['D', 'D', 'D', 'D', 'D', 'D'], ['A', 'A', 'A', 'A', 'A'], ['D', 'D', 'D', 'D', 'D', 'D']]
Solution:
import itertools
lis = ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'D', 'D', 'D', 'D', 'D', 'D', 'A', 'A', 'A', 'A', 'A', 'D', 'D', 'D', 'D', 'D', 'D']
print([list(x[1]) for x in itertools.groupby(lis)])
Output:
[['A', 'A', 'A', 'A', 'A', 'A', 'A'], ['D', 'D', 'D', 'D', 'D', 'D'], ['A', 'A', 'A', 'A', 'A'], ['D', 'D', 'D', 'D', 'D', 'D']]
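If the run lengths are also useful, groupby's key can be kept instead of discarded. This is only a sketch of that idea; the names `runs` and `sublists` are made up for illustration:

```python
from itertools import groupby

# Same data as in the question: 7 A's, 6 D's, 5 A's, 6 D's.
lst = ['A'] * 7 + ['D'] * 6 + ['A'] * 5 + ['D'] * 6

# groupby yields (key, group_iterator) pairs, one per run of equal neighbours.
runs = [(key, len(list(group))) for key, group in groupby(lst)]
print(runs)  # [('A', 7), ('D', 6), ('A', 5), ('D', 6)]

# Rebuild the sublists from the run lengths to confirm nothing was lost.
sublists = [[key] * count for key, count in runs]
assert [x for sub in sublists for x in sub] == lst
```

Note that groupby only groups consecutive equal items, which is exactly why the two separate runs of 'A' stay separate here.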
For example, I have a list:
[['aabbbb'], ['bbbbab'], ['babbab'], ['baaaaa'], ['bbbaaa'], ['bbbbaa']]
how do I split it so that I get [['a', 'a', 'b', 'b', 'b', 'b'],... etc? It would be very useful thanks!
You can use a list comprehension.
mylist = [['aabbbb'], ['bbbbab'], ['babbab'], ['baaaaa'], ['bbbaaa'], ['bbbbaa']]
new_list = [list(item[0]) for item in mylist]
This will return:
[['a', 'a', 'b', 'b', 'b', 'b'], ['b', 'b', 'b', 'b', 'a', 'b'], ['b', 'a', 'b', 'b', 'a', 'b'], ['b', 'a', 'a', 'a', 'a', 'a'], ['b', 'b', 'b', 'a', 'a', 'a'], ['b', 'b', 'b', 'b', 'a', 'a']]
Try the code below
oldList = [['aabbbb'], ['bbbbab'], ['babbab'], ['baaaaa'], ['bbbaaa'], ['bbbbaa']]
newList = []
for i in oldList:
    newList.append(list(i[0]))
Is there a way to measure and plot the dissimilarity between cluster centroids and rows in a matrix - maybe without clustering both? The centroids are from previous cluster analysis. I have five cluster centroids and want to check if the rows in the matrix can be assigned to a cluster. Both are categorical data.
In [1]: matrixDf = array([['a', 'a', 'c', 'b', 'b'],
['a', 'c', 'c', 'b', 'b'],
['a', 'c', 'e', 'c', 'b'],
['a', 'c', 'e', 'b', 'b'],
...
['a', 'a', 'c', 'c', 'b']])
In [10]: clusterCentroidsDf = pd.DataFrame(km_cao.cluster_centroids_)
Out [3]: clusterCentroidsDf = array([['c', 'a', 'a', 'c', 'b'],
['a', 'c', 'a', 'c', 'a'],
['b', 'f', 'b', 'd', 'c'],
['d', 'c', 'f', 'd', 'd'],
['d', 'e', 'd', 'c', 'b']])
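One simple option for categorical data is the simple matching (Hamming) dissimilarity, i.e. the number of positions at which a row and a centroid differ. The sketch below assumes rows and centroids share the same columns. It uses plain lists shaped like the arrays above (the three rows are just the first three from the question) and is not necessarily the measure the original clustering used:

```python
# Illustrative data shaped like the arrays in the question.
rows = [
    ['a', 'a', 'c', 'b', 'b'],
    ['a', 'c', 'c', 'b', 'b'],
    ['a', 'c', 'e', 'c', 'b'],
]
centroids = [
    ['c', 'a', 'a', 'c', 'b'],
    ['a', 'c', 'a', 'c', 'a'],
    ['b', 'f', 'b', 'd', 'c'],
    ['d', 'c', 'f', 'd', 'd'],
    ['d', 'e', 'd', 'c', 'b'],
]

def matching_dissimilarity(x, y):
    """Count the positions at which two equal-length category vectors differ."""
    return sum(a != b for a, b in zip(x, y))

# distances[i][j] is the dissimilarity between row i and centroid j.
distances = [[matching_dissimilarity(r, c) for c in centroids] for r in rows]

# Index of the nearest centroid for each row (ties go to the lowest index).
nearest = [min(range(len(centroids)), key=d.__getitem__) for d in distances]

for d, n in zip(distances, nearest):
    print(d, '-> centroid', n)
```

The resulting rows-by-centroids table of counts could then be drawn as a heat map to see how well each row fits each cluster.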
I have a list of lists containing unique strings and I want to produce an arbitrary number of different ways of sorting it. The list might look like the following:
list = [[a], [b,c], [d], [e,f,g]]
The order of the lists needs to stay the same, but I want to shuffle the ordering within each list and then combine them into a single list, e.g.:
list1 = [a,b,c,d,e,f,g]
list2 = [a,c,b,d,f,e,g]
...
...
listN = [a,c,b,d,f,g,e]
What is a good pythonic way of achieving this? I'm on python 2.7.
from itertools import permutations, product
L = [['a'], ['b','c'], ['d'], ['e', 'f', 'g']]
for l in product(*map(lambda l: permutations(l), L)):
    print([item for s in l for item in s])
produces:
['a', 'b', 'c', 'd', 'e', 'f', 'g']
['a', 'b', 'c', 'd', 'e', 'g', 'f']
['a', 'b', 'c', 'd', 'f', 'e', 'g']
['a', 'b', 'c', 'd', 'f', 'g', 'e']
['a', 'b', 'c', 'd', 'g', 'e', 'f']
['a', 'b', 'c', 'd', 'g', 'f', 'e']
['a', 'c', 'b', 'd', 'e', 'f', 'g']
['a', 'c', 'b', 'd', 'e', 'g', 'f']
['a', 'c', 'b', 'd', 'f', 'e', 'g']
['a', 'c', 'b', 'd', 'f', 'g', 'e']
['a', 'c', 'b', 'd', 'g', 'e', 'f']
['a', 'c', 'b', 'd', 'g', 'f', 'e']
You can do this by taking the Cartesian product of the permutations of the sub-lists, and then flattening the resulting nested tuples.
from itertools import permutations, product, chain
lst = [['a'], ['b', 'c'], ['d'], ['e', 'f', 'g']]
for t in product(*[permutations(u) for u in lst]):
    print([*chain.from_iterable(t)])
output
['a', 'b', 'c', 'd', 'e', 'f', 'g']
['a', 'b', 'c', 'd', 'e', 'g', 'f']
['a', 'b', 'c', 'd', 'f', 'e', 'g']
['a', 'b', 'c', 'd', 'f', 'g', 'e']
['a', 'b', 'c', 'd', 'g', 'e', 'f']
['a', 'b', 'c', 'd', 'g', 'f', 'e']
['a', 'c', 'b', 'd', 'e', 'f', 'g']
['a', 'c', 'b', 'd', 'e', 'g', 'f']
['a', 'c', 'b', 'd', 'f', 'e', 'g']
['a', 'c', 'b', 'd', 'f', 'g', 'e']
['a', 'c', 'b', 'd', 'g', 'e', 'f']
['a', 'c', 'b', 'd', 'g', 'f', 'e']
If you need to do this in Python 2, you can replace the print line with this:
print list(chain.from_iterable(t))
Here's a more compact version, inspired by ewcz's answer:
for t in product(*map(permutations, lst)):
    print list(chain.from_iterable(t))
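Since the question asks for an arbitrary number of orderings, the same product-of-permutations idea can be wrapped in a generator and cut off with islice. This is a sketch only; the function name `orderings` is made up for illustration:

```python
from itertools import chain, islice, permutations, product
from math import factorial

def orderings(groups):
    """Lazily yield every flattened list that permutes items only within each group."""
    for combo in product(*map(permutations, groups)):
        yield list(chain.from_iterable(combo))

lst = [['a'], ['b', 'c'], ['d'], ['e', 'f', 'g']]

# Take only the first three orderings; the combined orderings are produced on demand.
first_three = list(islice(orderings(lst), 3))
print(first_three)

# The total count is the product of the factorials of the group sizes.
expected = 1
for group in lst:
    expected *= factorial(len(group))
print(expected, sum(1 for _ in orderings(lst)))  # 12 12
```

This form runs unchanged on Python 3, and should also run on Python 2.7 if `from __future__ import print_function` is added at the top.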
This might not be the most elegant solution, but I think it does what you want
from itertools import permutations
import numpy as np
def fac(n):
    if n <= 1:
        return 1
    else:
        return n * fac(n-1)
lists = [['a'], ['b','c'], ['d'], ['e','f','g']]
combined = [[]]
for perm in [permutations(l,r=len(l)) for l in lists]:
    expanded = []
    for e in list(perm):
        # extend every ordering built so far with this permutation of the current sub-list
        expanded += [list(l) + list(e) for l in combined]
    combined = expanded
## check length
print np.prod(map(fac,map(len,lists))), len(combined)
print '\n'.join(map(str,combined))
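You can flatten the list and then generate its permutations. Note that this permutes across sub-list boundaries, so it yields all 5040 orderings of the seven items rather than only the 12 that shuffle within each sub-list: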
from itertools import chain, permutations
li = [['a'], ['b','c'], ['d'], ['e','f','g']]
flattened = list(chain.from_iterable(li))
for perm in permutations(flattened, r=len(flattened)):
    print(perm)
>> ('a', 'b', 'c', 'd', 'e', 'f', 'g')
('a', 'b', 'c', 'd', 'e', 'g', 'f')
('a', 'b', 'c', 'd', 'f', 'e', 'g')
('a', 'b', 'c', 'd', 'f', 'g', 'e')
('a', 'b', 'c', 'd', 'g', 'e', 'f')
('a', 'b', 'c', 'd', 'g', 'f', 'e')
('a', 'b', 'c', 'e', 'd', 'f', 'g')
('a', 'b', 'c', 'e', 'd', 'g', 'f')
('a', 'b', 'c', 'e', 'f', 'd', 'g')
...
...
...
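To see how permuting the flattened list differs from permuting within each sub-list, the two can be counted side by side. A small sketch; the names `all_perms` and `within_groups` are illustrative:

```python
from itertools import chain, permutations, product

li = [['a'], ['b', 'c'], ['d'], ['e', 'f', 'g']]
flattened = list(chain.from_iterable(li))

# Every ordering of all seven items, ignoring which sub-list they came from.
all_perms = set(permutations(flattened))

# Orderings that only shuffle items inside their own sub-list.
within_groups = {tuple(chain.from_iterable(t))
                 for t in product(*map(permutations, li))}

print(len(all_perms))              # 5040 (7!)
print(len(within_groups))          # 12 (1! * 2! * 1! * 3!)
print(within_groups <= all_perms)  # True: the constrained orderings are a subset
```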
from itertools import chain, permutations
your_list = [['a'], ['b', 'c'], ['d'], ['e', 'f', 'g']]
flattened = chain.from_iterable(your_list)
perms = permutations(flattened)
for perm in perms:
    print perm
References:
permutations in Python 2
chain in Python 2
I have a matrix:
matrix = [['F', 'B', 'F', 'A', 'C', 'F'],
['D', 'E', 'B', 'E', 'B', 'E'],
['F', 'A', 'D', 'B', 'F', 'B'],
['B', 'E', 'F', 'B', 'D', 'D']]
I want to remove and collect the first two elements of each sub-list, and add them to a new list.
So far I have got:
while messagecypher:
    for vector in messagecypher:
        final.extend(vector[:2])
The problem is that the slice doesn't seem to remove the elements, and I end up with a huge list of repeated chars. I could use .pop(0) twice, but that isn't very clean.
NOTE: the reason I remove the elements is because I need to keep going over each vector until the matrix is empty.
You can keep your slice and do:
final = []
for i in range(len(matrix)):
    matrix[i], final = matrix[i][2:], final + matrix[i][:2]
Note that this simultaneously assigns the remainder of each row back to matrix and adds the sliced-off first two elements to final.
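Because the question needs to keep going until the matrix is empty, the slice-and-reassign step can be repeated in an outer loop. A minimal sketch using the matrix from the question, with the step written as two statements for readability:

```python
matrix = [['F', 'B', 'F', 'A', 'C', 'F'],
          ['D', 'E', 'B', 'E', 'B', 'E'],
          ['F', 'A', 'D', 'B', 'F', 'B'],
          ['B', 'E', 'F', 'B', 'D', 'D']]

final = []
# Repeat until every row has been emptied (an empty list is falsy).
while any(matrix):
    for i in range(len(matrix)):
        # Move the first two items to `final`; keep the remainder in the row.
        final.extend(matrix[i][:2])
        matrix[i] = matrix[i][2:]

print(''.join(final))  # FBDEFABEFABEDBFBCFBEFBDD
print(matrix)          # [[], [], [], []]
```

Slicing never removes anything from the original list, which is why the remainder has to be assigned back to `matrix[i]`.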
Well, you can use a list comprehension to get the thing done, but it's perhaps counter-intuitive:
>>> matrix = [['F', 'B', 'F', 'A', 'C', 'F'],
['D', 'E', 'B', 'E', 'B', 'E'],
['F', 'A', 'D', 'B', 'F', 'B'],
['B', 'E', 'F', 'B', 'D', 'D']]
>>> while [] not in matrix: print([i for var in matrix for i in [var.pop(0), var.pop(0)]])
['F', 'B', 'D', 'E', 'F', 'A', 'B', 'E']
['F', 'A', 'B', 'E', 'D', 'B', 'F', 'B']
['C', 'F', 'B', 'E', 'F', 'B', 'D', 'D']
EDIT:
Using range makes the syntax look cleaner:
>>> matrix = [['C', 'B', 'B', 'D', 'F', 'B'], ['D', 'B', 'B', 'A', 'B', 'A'], ['B', 'D', 'E', 'F', 'C', 'B'], ['B', 'A', 'C', 'B', 'E', 'F']]
>>> while [] not in matrix: print([var.pop(0) for var in matrix for i in range(2)])
['C', 'B', 'D', 'B', 'B', 'D', 'B', 'A']
['B', 'D', 'B', 'A', 'E', 'F', 'C', 'B']
['F', 'B', 'B', 'A', 'C', 'B', 'E', 'F']
Deleting elements is not an efficient way to go about your task. It requires Python to perform a lot of unnecessary work shifting things around to fill the holes left by the deleted elements. Instead, just shift your slice over by two places each time through the loop:
final = []
for i in range(0, len(messagecypher[0]), 2):  # range works on Python 2 and 3
    for vector in messagecypher:
        final.extend(vector[i:i+2])
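Here is a runnable sketch of the shifting-slice approach on the matrix from the question. It assumes all rows have the same length, and the names `step` and `passes` are added for illustration. Each pass is collected separately so the result is easy to compare with the pop-based output:

```python
messagecypher = [['F', 'B', 'F', 'A', 'C', 'F'],
                 ['D', 'E', 'B', 'E', 'B', 'E'],
                 ['F', 'A', 'D', 'B', 'F', 'B'],
                 ['B', 'E', 'F', 'B', 'D', 'D']]

step = 2  # how many elements to take from each row per pass
passes = []
for i in range(0, len(messagecypher[0]), step):
    # One pass: the slice [i:i+step] of every row, in row order.
    passes.append([ch for vector in messagecypher for ch in vector[i:i + step]])

for p in passes:
    print(p)

# The rows themselves are left untouched.
print(len(messagecypher[0]))  # 6
```

Flattening `passes` gives the same `final` list as the loop above, without modifying `messagecypher`.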