compare list of lists and create clusters - python

I have a list of 10,000 lists of strings of different lengths. To keep this question simple, here is an example with only 10 lists:
list = [['a','w','r', 't'], ['e','r', 't', 't', 'r', 'd', 's'], ['a','w','r', 't'], ['n', 'g', 'd', 'e', 's'], ['a', 'b', 'c'], ['t', 'f', 'h', 'd', 'p'], ['a', 'b', 'c'], ['a','w','r', 't'], ['s','c','d'], ['e','r', 't', 't', 'r', 'd', 's']]
What I want is to compare each list with all the others and group matching lists into one new list (called a cluster), and also group their indices.
Expected output:
cluster_1_lists = [['a','w','r', 't'], ['a','w','r', 't'], ['a','w','r', 't']]
cluster_1_indices = [0,2,7]
cluster_2_lists = [['e','r', 't', 't', 'r', 'd', 's'],['e','r', 't', 't', 'r', 'd', 's']]
cluster_2_indices = [1,9]
cluster_3_lists = [['n', 'g', 'd', 'e', 's']]
cluster_3_indices = [3]
cluster_4_lists = [['a', 'b', 'c'], ['a', 'b', 'c']]
cluster_4_indices = [4,6]
cluster_5_lists = [['t', 'f', 'h', 'd', 'p']]
cluster_5_indices = [5]
cluster_6_lists = [['s','c','d']]
cluster_6_indices = [8]
Can you help me to implement this in python?

Ok, so here I'll use a dictionary to build the clusters. Here's what I've done:
list = [['a','w','r', 't'], ['e','r', 't', 't', 'r', 'd', 's'], ['a','w','r', 't'], ['n', 'g', 'd', 'e', 's'], ['a', 'b', 'c'], ['t', 'f', 'h', 'd', 'p'], ['a', 'b', 'c'], ['a','w','r', 't'], ['s','c','d'], ['e','r', 't', 't', 'r', 'd', 's']]
cluster = {}
# create one entry for the lists and one for their indices
for i in list:
    cluster[''.join(i)] = []
    cluster[''.join(i)+'_indices'] = []
# visit every index, including the last one
for j in range(len(list)):
    for k in cluster:
        if ''.join(list[j]) == k:
            cluster[k].append(list[j])
            cluster[k+'_indices'].append(j)
print(cluster)
The first for loop creates a key from the joined characters of each list, because a list cannot be used as a dictionary key. Each key starts with an empty list as its value, to be appended to later. The second for loop iterates over the list again by index, and inside it I iterate over the keys in cluster (the dict). If the joined list equals the key name, the list and its index are appended. The output will look like this:
Output: {'awrt': [['a', 'w', 'r', 't'], ['a', 'w', 'r', 't'], ['a', 'w', 'r', 't']], 'awrt_indices': [0, 2, 7], 'erttrds': [['e', 'r', 't', 't', 'r', 'd', 's'], ['e', 'r', 't', 't', 'r', 'd', 's']], 'erttrds_indices': [1, 9], 'ngdes': [['n', 'g', 'd', 'e', 's']], 'ngdes_indices': [3], 'abc': [['a', 'b', 'c'], ['a', 'b', 'c']], 'abc_indices': [4, 6], 'tfhdp': [['t', 'f', 'h', 'd', 'p']], 'tfhdp_indices': [5], 'scd': [['s', 'c', 'd']], 'scd_indices': [8]}
Note: Creating a separate variable for each cluster, as you asked, would just make the code messy. Python's dictionaries solve this, which is why I've used one.
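As a variation on the dictionary idea, here is a minimal sketch that uses tuples as keys instead of joined strings. This is a swapped-in technique, not the code from the answer above: cluster_lists and data are names made up for this illustration. A tuple key avoids the case where different lists join to the same string (for example ['ab', 'c'] and ['a', 'bc']), and a single pass over the input is enough, which should matter with 10,000 lists.

```python
from collections import defaultdict


def cluster_lists(lists):
    """Group identical lists; return (clusters, cluster_indices) in first-seen order."""
    groups = defaultdict(list)  # tuple of contents -> indices where it occurs
    for index, item in enumerate(lists):
        groups[tuple(item)].append(index)
    cluster_indices = list(groups.values())
    clusters = [[lists[i] for i in idxs] for idxs in cluster_indices]
    return clusters, cluster_indices


# hypothetical variable name; the data is the example from the question
data = [['a', 'w', 'r', 't'], ['e', 'r', 't', 't', 'r', 'd', 's'],
        ['a', 'w', 'r', 't'], ['n', 'g', 'd', 'e', 's'], ['a', 'b', 'c'],
        ['t', 'f', 'h', 'd', 'p'], ['a', 'b', 'c'], ['a', 'w', 'r', 't'],
        ['s', 'c', 'd'], ['e', 'r', 't', 't', 'r', 'd', 's']]

clusters, cluster_indices = cluster_lists(data)
print(cluster_indices)  # [[0, 2, 7], [1, 9], [3], [4, 6], [5], [8]]
```

clusters[n] and cluster_indices[n] then play the role of cluster_(n+1)_lists and cluster_(n+1)_indices from the question.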

Here is the working answer:
cluster = {}
for i in list:
    cluster[''.join(i)] = []
xx = []
xx_idx = []
for k in cluster:
    # collect every list (and its index) whose joined form equals this key
    yy = []
    yy_idx = []
    for j in range(len(list)):
        if k == ''.join(list[j]):
            yy.append(list[j])
            yy_idx.append(j)
    xx.append(yy)
    xx_idx.append(yy_idx)
print("output", xx)
print("indices:", xx_idx)
Output:
output [[['a', 'w', 'r', 't'], ['a', 'w', 'r', 't'], ['a', 'w', 'r', 't']], [['e', 'r', 't', 't', 'r', 'd', 's'], ['e', 'r', 't', 't', 'r', 'd', 's']], [['n', 'g', 'd', 'e', 's']], [['a', 'b', 'c'], ['a', 'b', 'c']], [['t', 'f', 'h', 'd', 'p']], [['s', 'c', 'd']]]
indices: [[0, 2, 7], [1, 9], [3], [4, 6], [5], [8]]

Related

How do I record the history of what happened while using sort in python for one string, and apply that to other strings?

import sys
with open(sys.argv[1]) as f:
    lst = list(f.readline().strip())
sortedLst = sorted(lst, key = lambda x: (x.lower(), x.swapcase()))
print(lst)
print(sortedLst)
The word I am using as an example is 'ThatCcer'.
My outputs are ['T', 'h', 'a', 't', 'C', 'c', 'e', 'r'] for lst and ['a', 'c', 'C', 'e', 'h', 'r', 't', 'T'] for sortedLst.
This is exactly what I am going for: sorting a word in alphabetical order, with lower case letters taking precedence over upper case.
What I am trying to achieve is to match other 8-letter inputs by sorting them in the exact way that I have sorted ThatCcer. How would I go about achieving this?
EDIT: I am being told the question is unclear - my apologies but it is a bit difficult to explain so I will try again.
By sorting ThatCcer to become acCehrtT, lst[0] ('T') took the position of sortedLst[7], lst[1] ('h') took the position of sortedLst[4], and so on...
This is the history I want to record, so that any other string can copy the steps that 'ThatCcer' took. For example, given s = ['h', 'o', 'w', 'e', 'v', 'e', 'r', 's'], I want s[0] to take its position at sortedS[7], just like ThatCcer did.
I hope this made it a little clearer!
IIUC, you want to achieve a behavior similar to that of numpy.argsort.
You can sort a range based on your criteria, and use it to reindex any string:
lst = ['T', 'h', 'a', 't', 'C', 'c', 'e', 'r']
idx = list(range(len(lst)))
sorted_idx = sorted(idx, key=lambda x: (lst[x].lower(), lst[x].swapcase()))
# [2, 5, 4, 6, 1, 7, 3, 0]
# now use the index to sort
[lst[i] for i in sorted_idx]
# ['a', 'c', 'C', 'e', 'h', 'r', 't', 'T']
# keep the same order on another string
lst2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
[lst2[i] for i in sorted_idx]
# ['c', 'f', 'e', 'g', 'b', 'h', 'd', 'a']
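To connect this back to the example in the question, here is a small sketch that wraps the same idea in two helpers and applies the recorded order to 'howevers'. The names sort_permutation and apply_permutation are made up for this illustration; the sort key is the one from the question.

```python
def sort_permutation(chars):
    """Return the indices that sort chars alphabetically, lower case before upper case."""
    return sorted(range(len(chars)),
                  key=lambda i: (chars[i].lower(), chars[i].swapcase()))


def apply_permutation(items, permutation):
    """Reorder items so that position n receives items[permutation[n]]."""
    if len(items) != len(permutation):
        raise ValueError("items and permutation must have the same length")
    return [items[i] for i in permutation]


history = sort_permutation(list('ThatCcer'))
print(history)  # [2, 5, 4, 6, 1, 7, 3, 0]

s = ['h', 'o', 'w', 'e', 'v', 'e', 'r', 's']
sorted_s = apply_permutation(s, history)
print(sorted_s)  # ['w', 'e', 'v', 'r', 'o', 's', 'e', 'h']
```

Here s[0] ('h') ends up at sorted_s[7], just as 'T' moved from lst[0] to sortedLst[7].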
Another approach using zip:
lst = ['T', 'h', 'a', 't', 'C', 'c', 'e', 'r']
lst2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
list(zip(*sorted(zip(lst, lst2), key=lambda x: (x[0].lower(), x[0].swapcase()))))
# or as individual lists
# (lst_sorted, lst2_sorted) = list(zip(*sorted(zip(lst, lst2),
#     key=lambda x: (x[0].lower(), x[0].swapcase()))))
output:
[('a', 'c', 'C', 'e', 'h', 'r', 't', 'T'),
('c', 'f', 'e', 'g', 'b', 'h', 'd', 'a')]
Sort the enumerated string on the string characters, then separate the (sorted) indices and characters; use operator.itemgetter to create a callable that you can re-use.
import operator
def f(thing):
    # sort (index, character) pairs by the character
    s_lst = sorted(enumerate(thing), key=lambda x: (x[1].lower(), x[1].swapcase()))
    # reusable callable that picks items in the sorted order
    argsort = operator.itemgetter(*[x[0] for x in s_lst])
    s_lst = [x[1] for x in s_lst]
    return s_lst, argsort
>>> s_lst, argsort = f('ThatCcerCTaa')
>>> s_lst
['a', 'a', 'a', 'c', 'C', 'C', 'e', 'h', 'r', 't', 'T', 'T']
>>> argsort('ThatCcerCTaa')
('a', 'a', 'a', 'c', 'C', 'C', 'e', 'h', 'r', 't', 'T', 'T')
>>> argsort
operator.itemgetter(2, 10, 11, 5, 4, 8, 6, 1, 7, 3, 0, 9)

Split a list into chunks based on the indices in another list

I want to split a list into chunks, using the values of another list as the boundaries of each chunk.
indices = [3, 5, 9, 13, 18]
my_list = ['a', 'b', 'c', ..., 'x', 'y', 'z']
So basically, split my_list at these ranges:
my_list[:3], my_list[3:5], my_list[5:9], my_list[9:13], my_list[13:18], my_list[18:]
I have tried splitting indices into chunks of 2, but the result is not what I need.
[indices[i:i + 2] for i in range(0, len(indices), 2)]
My actual list length is 1000.
You could also do it using simple python.
Data
indices = [3, 5, 9, 13, 18]
my_list = list('abcdefghijklmnopqrstuvwxyz')
Solution
Use list comprehension.
[(my_list+[''])[slice(ix,iy)] for ix, iy in zip([0]+indices, indices+[-1])]
Output
[['a', 'b', 'c'],
['d', 'e'],
['f', 'g', 'h', 'i'],
['j', 'k', 'l', 'm'],
['n', 'o', 'p', 'q', 'r'],
['s', 't', 'u', 'v', 'w', 'x', 'y', 'z']]
Check that the correct index pairs are extracted:
dict(((ix,iy), (my_list+[''])[slice(ix,iy)]) for ix, iy in zip([0]+indices, indices+[-1]))
Output
{(0, 3): ['a', 'b', 'c'],
(3, 5): ['d', 'e'],
(5, 9): ['f', 'g', 'h', 'i'],
(9, 13): ['j', 'k', 'l', 'm'],
(13, 18): ['n', 'o', 'p', 'q', 'r'],
(18, -1): ['s', 't', 'u', 'v', 'w', 'x', 'y', 'z']}
You can use itertools.zip_longest:
import itertools as it
[my_list[a:b] for a,b in it.zip_longest([0]+indices, indices)]
Output:
[['a', 'b', 'c'],
['d', 'e'],
['f', 'g', 'h', 'i'],
['j', 'k', 'l', 'm'],
['n', 'o', 'p', 'q', 'r'],
['s', 't', 'u', 'v', 'w', 'x', 'y', 'z']]
A little bit of code golf for fun (wrapped in list() because map is lazy in Python 3):
list(map(my_list.__getitem__, map(lambda s: slice(*s), it.zip_longest([0]+indices, indices))))
One way using itertools.tee and pairwise:
from itertools import tee
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
chunks = [my_list[i:j] for i, j in pairwise([0, *indices, len(my_list)])]
print(chunks)
Output:
[['a', 'b', 'c'],
['d', 'e'],
['f', 'g', 'h', 'i'],
['j', 'k', 'l', 'm'],
['n', 'o', 'p', 'q', 'r'],
['s', 't', 'u', 'v', 'w', 'x', 'y', 'z']]
If numpy is an option, use numpy.array_split, which is meant for this:
import numpy as np
np.array_split(my_list, indices)
Output:
[array(['a', 'b', 'c'], dtype='<U1'),
array(['d', 'e'], dtype='<U1'),
array(['f', 'g', 'h', 'i'], dtype='<U1'),
array(['j', 'k', 'l', 'm'], dtype='<U1'),
array(['n', 'o', 'p', 'q', 'r'], dtype='<U1'),
array(['s', 't', 'u', 'v', 'w', 'x', 'y', 'z'], dtype='<U1')]
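For a list of 1000 items it may be convenient to wrap the slicing in a small reusable generator. The following is only a sketch: split_at is a name made up here, and it assumes that indices is in ascending order and that every value lies within the length of the list.

```python
def split_at(items, indices):
    """Yield consecutive slices of items, cutting at each position in indices."""
    # assumption: indices is ascending and within range(len(items) + 1)
    bounds = [0, *indices, len(items)]
    for start, stop in zip(bounds, bounds[1:]):
        yield items[start:stop]


my_list = list('abcdefghijklmnopqrstuvwxyz')
indices = [3, 5, 9, 13, 18]

chunks = list(split_at(my_list, indices))
print([''.join(chunk) for chunk in chunks])
# ['abc', 'de', 'fghi', 'jklm', 'nopqr', 'stuvwxyz']
```

Because it is a generator, the chunks are produced one at a time; wrap the call in list() when all of them are needed at once.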

Sort a list by a key and sort another list in the same way as the first?

For example there are three lists:
unsorted_key = ['q', 'w', 'e', 'r', 't', 'y', 'u', 'i', 'o', 'p']
sorted_key = ['e', 'i', 'o', 'p', 'q', 'r', 't', 'u', 'w', 'y']
ciphertext = [
['u', 't', 'x', 'e'],
['p', 'r', 'k', 'p'],
['v', 'n', 'x', 'a'],
['n', 'h', 'e', 'x'],
['x', 'h', 'm', 's'],
['l', 'x', 'c', 'x'],
['x', 'c', 'y', 'a'],
['t', 'u', 'o', 'x'],
['e', 'r', 'm', 'e'],
['y', 'y', 'e', 'x']
]
Is it possible to take the order of the sorted_key and sort it into the unsorted_key, and take the order of the ciphertext and sort it in an identical way?
When moving 'q' from sorted_key[4] to sorted_key[0], it should move ciphertext[4] to ciphertext[0].
All three lists will always be of equal length.
The sorted_key and unsorted_key will never have repeating elements.
The sorted_key will always be a sorted version of unsorted_key.
I've been thinking about it and the only way I can think of would be to use a helper function to dynamically generate and return a lambda function from the order of unsorted_key, and then use something like:
sorted_key, ciphertext = (list(i) for i in zip(*sorted(zip(sorted_key, ciphertext), key=generate(unsorted_key))))
But I really don't know how zip() or lambda functions work or how to make a custom sorting order into one, or if one can even be returned to be used in sorted(). I really can't seem to wrap my head around this problem, so any help would be greatly appreciated!
An efficient way to solve this in linear time is to create a dict that maps each key to its index in sorted_key, and then a mapping dict that maps each index of unsorted_key to the index of the same key in sorted_key. You can then iterate an index over the length of ciphertext to generate a list in the mapped order:
order = dict(map(reversed, enumerate(sorted_key)))
mapping = {i: order[k] for i, k in enumerate(unsorted_key)}
print([ciphertext[mapping[i]] for i in range(len(ciphertext))])
This outputs:
[['x', 'h', 'm', 's'], ['e', 'r', 'm', 'e'], ['u', 't', 'x', 'e'], ['l', 'x', 'c', 'x'], ['x', 'c', 'y', 'a'], ['y', 'y', 'e', 'x'], ['t', 'u', 'o', 'x'], ['p', 'r', 'k', 'p'], ['v', 'n', 'x', 'a'], ['n', 'h', 'e', 'x']]
The builtin sorted with a custom key can do it for you:
sorted(ciphertext, key=lambda x: unsorted_key.index(sorted_key[ciphertext.index(x)]))
Output:
[['x', 'h', 'm', 's'],
['e', 'r', 'm', 'e'],
['u', 't', 'x', 'e'],
['l', 'x', 'c', 'x'],
['x', 'c', 'y', 'a'],
['y', 'y', 'e', 'x'],
['t', 'u', 'o', 'x'],
['p', 'r', 'k', 'p'],
['v', 'n', 'x', 'a'],
['n', 'h', 'e', 'x']]
The lambda basically boils down to:
Find the current index
Find the value of current index in sorted_key
Find the index of sorted_key value in unsorted_key
Sort it
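The steps above can be written out as a named key function. This is only a sketch that mirrors the lambda: sort_key and result are names chosen here for illustration, and the data is copied from the question.

```python
unsorted_key = ['q', 'w', 'e', 'r', 't', 'y', 'u', 'i', 'o', 'p']
sorted_key = ['e', 'i', 'o', 'p', 'q', 'r', 't', 'u', 'w', 'y']
ciphertext = [
    ['u', 't', 'x', 'e'], ['p', 'r', 'k', 'p'], ['v', 'n', 'x', 'a'],
    ['n', 'h', 'e', 'x'], ['x', 'h', 'm', 's'], ['l', 'x', 'c', 'x'],
    ['x', 'c', 'y', 'a'], ['t', 'u', 'o', 'x'], ['e', 'r', 'm', 'e'],
    ['y', 'y', 'e', 'x'],
]


def sort_key(row):
    current_index = ciphertext.index(row)   # 1. find the current index
    key_char = sorted_key[current_index]    # 2. value at that index in sorted_key
    return unsorted_key.index(key_char)     # 3. index of that value in unsorted_key


result = sorted(ciphertext, key=sort_key)   # 4. sort by that index
print(result[0])  # ['x', 'h', 'm', 's']
```

Note that each list.index call is a linear scan, so this is quadratic in the number of rows, and it relies on every row of ciphertext being distinct.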
The one thing that I'm not clear about is why do you need to "sort" sorted_key if the end result is identical to unsorted_key? Just sorted_key = unsorted_key[:] is simple enough if that's the case. But if you really need to sort sorted_key as well, you can do this (it would actually make the lambda simpler):
ciphertext, sorted_key = map(list, zip(*sorted(zip(ciphertext, sorted_key), key=lambda x: unsorted_key.index(x[1]))))
ciphertext
[['x', 'h', 'm', 's'],
['e', 'r', 'm', 'e'],
['u', 't', 'x', 'e'],
['l', 'x', 'c', 'x'],
['x', 'c', 'y', 'a'],
['y', 'y', 'e', 'x'],
['t', 'u', 'o', 'x'],
['p', 'r', 'k', 'p'],
['v', 'n', 'x', 'a'],
['n', 'h', 'e', 'x']]
sorted_key
['q', 'w', 'e', 'r', 't', 'y', 'u', 'i', 'o', 'p']
I'm not sure I get the point, but...
First determine the moves (the direction may need to be the opposite; it's not clear to me):
moves = [ [i, sorted_key.index(c)] for i, c in enumerate(unsorted_key) ]
#=> [[0, 4], [1, 8], [2, 0], [3, 5], [4, 6], [5, 9], [6, 7], [7, 1], [8, 2], [9, 3]]
Maybe swap elements in [i, sorted_key.index(c)].
Apply the moves to a receiver (res):
res = [ None for _ in range(len(ciphertext))]
for a, b in moves:
    res[a] = ciphertext[b]
So the output should be:
for line in res:
    print(line)
# ['x', 'h', 'm', 's']
# ['e', 'r', 'm', 'e']
# ['u', 't', 'x', 'e']
# ['l', 'x', 'c', 'x']
# ['x', 'c', 'y', 'a']
# ['y', 'y', 'e', 'x']
# ['t', 'u', 'o', 'x']
# ['p', 'r', 'k', 'p']
# ['v', 'n', 'x', 'a']
# ['n', 'h', 'e', 'x']
For testing execution time
import timeit, functools
def custom_sort(ciphertext, sorted_key, unsorted_key):
    return [ ciphertext[b] for _, b in [ [i, sorted_key.index(c)] for i, c in enumerate(unsorted_key) ] ]
# use a separate name for the timer so the function is not shadowed
timer = timeit.Timer(functools.partial(custom_sort, ciphertext, sorted_key, unsorted_key))
print(timer.timeit(20000))
I'm not sure I'm understanding your question properly, but if you're attempting to sort the unsorted key, and ensure that the ciphertexts are sorted accordingly, this should do what you want:
pairs = zip(unsorted_key, ciphertext)
sorted_key = []
sorted_ciphertexts = []
for t in sorted(pairs):
    sorted_key.append(t[0])
    sorted_ciphertexts.append(t[1])
I'm sure there's probably a more elegant way to do it, but this will ensure that the key and ciphertexts are placed at the same index.

Check if a Duplicated list present out of multiple lists

I have the following lists and need to check whether they contain duplicates. Assume ['f', 't'] equals ['t', 'f'] (the order of elements within a list does not matter), so this should return 'duplicate' because both lists are present.
['f', 't']
['f', 'r']
['t', 'f']
['f', 'u']
['b', 't']
['b', 'r']
['b', 'l']
['b', 'u']
['r', 't']
['r', 'u']
['l', 't']
['l', 'u']
I tried iterating over the lists to check for duplicates, but it fails because each element is also compared with itself once during the iteration. Any leads would be appreciated.
Try this:
duplicate_list = [['f', 't'],
                  ['f', 'r'],
                  ['t', 'f'],
                  ['f', 'u'],
                  ['b', 't'],
                  ['b', 'r'],
                  ['b', 'l'],
                  ['b', 'u'],
                  ['r', 't'],
                  ['r', 'u'],
                  ['l', 't'],
                  ['l', 'u']]
seen = set()
for el in duplicate_list:
    # a frozenset ignores element order and is hashable
    el = frozenset(el)
    if el in seen:
        print("Duplicate")
        break
    seen.add(el)
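If you also want to know which lists are duplicates of each other, a variation along the following lines may help. It is a sketch under stated assumptions rather than part of the answer above: find_unordered_duplicates is a made-up name, and it uses a sorted tuple as the key instead of a frozenset. Unlike a frozenset, a sorted tuple keeps repeated elements, so ['a', 'a', 'b'] and ['a', 'b'] are not treated as equal.

```python
def find_unordered_duplicates(lists):
    """Return (first_index, duplicate_index) pairs for lists equal up to element order."""
    first_seen = {}   # sorted tuple of elements -> index of first occurrence
    duplicates = []
    for index, item in enumerate(lists):
        key = tuple(sorted(item))
        if key in first_seen:
            duplicates.append((first_seen[key], index))
        else:
            first_seen[key] = index
    return duplicates


# first five lists from the question
pairs = [['f', 't'], ['f', 'r'], ['t', 'f'], ['f', 'u'], ['b', 't']]
print(find_unordered_duplicates(pairs))  # [(0, 2)]
```

An empty result means there are no duplicates, so the function can also be used as a plain yes/no check.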

Python: Iterating through the columns in a list of list to find the palindromes

Here I have a word list as:
[['r', 'o', 't', 'o', 'r'], ['e', 'v', 'e', 'i', 'a'], ['f', 'i', 'n', 'e', 'd'], ['e', 'n', 'e', 't', 'a'], ['r', 'a', 't', 'e', 'r']]
And I have to display all the palindromes in this list which are in rows as well as columns.
I have coded to find all the palindromes in the rows. But cannot implement a method to find the palindromes in the columns.
Here is my code so far:
result_1=""
if len(palindrome)==len_line_str:
    for row in range(len(palindrome)):
        for horizontal_line in range(len(palindrome[row])):
            if ''.join(palindrome[row])==''.join(reversed(palindrome[row])):
                result_1=''.join(palindrome[row])+" is a palindrome starting at ["+str(row)+"]["+str(row)+"] and is a row in the table"
    print(result_1)
Which will display the output:
rotor is a palindrome starting at [0][0] and is a row in the table
Where "rotor" is a palindrome.
I need a method to get the palindromes in the columns which are:
"refer", "tenet", "radar"
Any help is much appreciated. Thanks in advance!
You can use zip to transpose your lists:
>>> t = [['r', 'o', 't', 'o', 'r'], ['e', 'v', 'e', 'i', 'a'], ['f', 'i', 'n', 'e', 'd'], ['e', 'n', 'e', 't', 'a'], ['r', 'a', 't', 'e', 'r']]
>>> list(zip(*t))
[('r', 'e', 'f', 'e', 'r'), ('o', 'v', 'i', 'n', 'a'), ('t', 'e', 'n', 'e', 't'), ('o', 'i', 'e', 't', 'e'), ('r', 'a', 'd', 'a', 'r')]
Your columns are now rows, and you can apply the same method as before. If you just need the words, you can use list comprehensions:
>>> rows = [['r', 'o', 't', 'o', 'r'], ['e', 'v', 'e', 'i', 'a'], ['f', 'i', 'n', 'e', 'd'], ['e', 'n', 'e', 't', 'a'], ['r', 'a', 't', 'e', 'r']]
>>> [''.join(row) for row in rows if row[::-1] == row ]
['rotor']
>>> [''.join(column) for column in zip(*rows) if column[::-1] == column ]
['refer', 'tenet', 'radar']
This will do the job:
palindrome=[['r', 'o', 't', 'o', 'r'], ['e', 'v', 'e', 'i', 'a'], ['f', 'i', 'n', 'e', 'd'], ['e', 'n', 'e', 't', 'a'], ['r', 'a', 't', 'e', 'r']]
n=len(palindrome)
for col in range(len(palindrome[0])):
    col_word=[palindrome[i][col] for i in range(n)]
    if ''.join(col_word)==''.join(reversed(col_word)):
        result=''.join(col_word)+" is a palindrome starting at ["+str(col)+"] and is a col in the table"
        print(result)
This prints
refer is a palindrome starting at [0] and is a col in the table
tenet is a palindrome starting at [2] and is a col in the table
radar is a palindrome starting at [4] and is a col in the table
Basically, in order to access the words in the column, you can do
col_word=[palindrome[i][col] for i in range(n)]
This fixes the column and iterates over the rows. The rest of the code is structured similarly to yours.
I saw you did not want to use Zip (which I would recommend using):
Alternative answer:
list_ = [['r', 'o', 't', 'o', 'r'], ['e', 'v', 'e', 'i', 'a'], ['f', 'i', 'n', 'e', 'd'], ['e', 'n', 'e', 't', 'a'], ['r', 'a', 't', 'e', 'r']]
You can get the palindromes (rows) by checking each list with the reversed list [::-1]:
[i==i[::-1] for i in list_]
# prints [True, False, False, False, False]
And get the palindromes (columns) by 1. create the column list (called list_2 below) with a list comprehension and 2. same principle as above:
list_2 = [[i[ind] for i in list_] for ind in range(len(list_[0]))]
[i==i[::-1] for i in list_2]
# prints [True, False, True, False, True]
Update
If you want the answers directly you can do:
[i for i in list_ if i==i[::-1]]
# prints [['r', 'o', 't', 'o', 'r']]
# and list_2: [['r', 'e', 'f', 'e', 'r'],['t', 'e', 'n', 'e', 't'],['r', 'a', 'd', 'a', 'r']]
There are a lot of ways to do it. I will take your code as the example, given the effort you put into it.
Another alternative, following your code, is to build the columns in another list and check which of them are palindromes:
palindrome = [['r', 'o', 't', 'o', 'r'],
              ['e', 'v', 'e', 'i', 'a'],
              ['f', 'i', 'n', 'e', 'd'],
              ['e', 'n', 'e', 't', 'a'],
              ['r', 'a', 't', 'e', 'r']]
len_line_str = 5
result_1 = ""
def is_pal(string):
    # reversed() returns an iterator, so compare against a reversed slice
    return string == string[::-1]
columns = []
if len(palindrome) == len_line_str:
    for row in range(len(palindrome)):
        vertical = []
        if is_pal(''.join(palindrome[row])):
            result_1 += ''.join(palindrome[row]) + " is a palindrome starting at [" + str(row) + "][" + str(0) + "] and is a row in the table. " + "\n"
        # walk down the rows to collect column number `row`
        for horizontal_line in range(len(palindrome[row])):
            vertical += [palindrome[horizontal_line][row]]
        columns += [(vertical, row)]
    for word in columns:
        if is_pal(''.join(word[0])):
            result_1 += ''.join(word[0]) + " is a palindrome starting at [" + str(0) + "][" + str(word[1]) + "] and is a column in the table" + "\n"
print(result_1)
This should work. The outer loop iterates over the column indices, and the inner loop iterates over each row of the list s.
Assuming s is the name of the list: [['r', 'o', 't', 'o', 'r'], ['e', 'v', 'e', 'i', 'a'], ['f', 'i', 'n', 'e', 'd'], ['e', 'n', 'e', 't', 'a'], ['r', 'a', 't', 'e', 'r']]
for i in range(len(s[0])):
    word = ""
    # build the word in column i, one character per row
    for j in s:
        word = word + j[i]
    print(word)
    if word == word[::-1]:
        print(word, "is a palindrome - column", i)
    else:
        print(word, "is not a palindrome - column", i)
Python has no built-in column-wise traversal for a list of lists. One workaround is to transpose your input matrix. Below is a simple way to implement a transpose using a list comprehension.
def transpose(matrix):
    if not matrix:
        return []
    return [[row[i] for row in matrix] for i in range(len(matrix[0]))]
Your same logic should work once you transpose your input.
Hope this helps!!
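Putting the row check and the transposed column check together, a compact sketch could look like the following. find_palindromes is a name made up for this illustration, and it assumes that every row has the same length.

```python
def find_palindromes(grid):
    """Return (word, kind, index) for every palindromic row and column of grid."""
    found = []
    for index, row in enumerate(grid):
        word = ''.join(row)
        if word == word[::-1]:
            found.append((word, 'row', index))
    # zip(*grid) transposes the grid, so each tuple is one column
    for index, column in enumerate(zip(*grid)):
        word = ''.join(column)
        if word == word[::-1]:
            found.append((word, 'column', index))
    return found


table = [['r', 'o', 't', 'o', 'r'],
         ['e', 'v', 'e', 'i', 'a'],
         ['f', 'i', 'n', 'e', 'd'],
         ['e', 'n', 'e', 't', 'a'],
         ['r', 'a', 't', 'e', 'r']]

for word, kind, index in find_palindromes(table):
    print(f"{word} is a palindrome in {kind} {index}")
# rotor is a palindrome in row 0
# refer is a palindrome in column 0
# tenet is a palindrome in column 2
# radar is a palindrome in column 4
```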
