Finding unique elements in nested list of strings - python

Similar to the query posted at this URL:
https://stackoverflow.com/questions/54477996/finding-unique-elements-in-nested-list/,
I have another query.
I have a list imported from Pandas, and I need a single output list containing all the unique elements:
[Ac, Ad, An, Bi, Co, Cr, Dr, Fa, Mu, My, Sc]
Once I have all the unique elements, I want to check the count of each of these elements within the whole list.
Can someone advise how I can accomplish that?
mylist = df.Abv.str.split().tolist()
mylist
[['Ac,Cr,Dr'],
['Ac,Ad,Sc'],
['Ac,Bi,Dr'],
['Ad,Dr,Sc'],
['An,Dr,Fa'],
['Bi,Co,Dr'],
['Dr,Mu'],
['Ac,Co,My'],
['Co,Dr'],
['Ac,Ad,Sc'],
['An,Ac,Ad'],
]
I have tried different things but can't seem to make it work.
Tried to convert it into a string and apply split function on the string, but to no avail.

You can do it this way in Python3
mylist = [['Ac,Cr,Dr'],
['Ac,Ad,Sc'],
['Ac,Bi,Dr'],
['Ad,Dr,Sc'],
['An,Dr,Fa'],
['Bi,Co,Dr'],
['Dr,Mu'],
['Ac,Co,My'],
['Co,Dr'],
['Ac,Ad,Sc'],
['An,Ac,Ad'],
]
uniquedict = {}
for sublist in mylist:
    # Each sublist holds one comma-separated string
    for item in sublist[0].split(','):
        if item in uniquedict:
            uniquedict[item] += 1
        else:
            uniquedict[item] = 1
print(uniquedict)
print(list(uniquedict.keys()))
{'Ac': 6, 'Cr': 1, 'Dr': 7, 'Ad': 4, 'Sc': 3, 'Bi': 2, 'An': 2, 'Fa': 1, 'Co': 3, 'Mu': 1, 'My': 1}
['Ac', 'Cr', 'Dr', 'Ad', 'Sc', 'Bi', 'An', 'Fa', 'Co', 'Mu', 'My']

You can create a dictionary whose keys are the list values and whose values are their counts.
Your code might look like this:
mylists = [['Ac,Cr,Dr'],
['Ac,Ad,Sc'],
['Ac,Bi,Dr'],
['Ad,Dr,Sc'],
['An,Dr,Fa'],
['Bi,Co,Dr'],
['Dr,Mu'],
['Ac,Co,My'],
['Co,Dr'],
['Ac,Ad,Sc'],
['An,Ac,Ad'],
]
unique = {}
for mylist in mylists:
    # Each sublist holds one comma-separated string, so split it first
    for elem in mylist[0].split(','):
        unique[elem] = unique[elem] + 1 if elem in unique else 1
unique.keys() gives the unique elements, and the count of any value can be read from the dictionary, e.g. unique['Ad'].

You can use collections.Counter to make a dictionary of the counts of the elements. This will also give you easy access to a list of all unique elements. It looks like you have a list of lists where each sublist contains a single string. You will need to split these before you add them to the counter.
from collections import Counter
count = Counter()
mylist = [['Ac,Cr,Dr'],
['Ac,Ad,Sc'],
['Ac,Bi,Dr'],
['Ad,Dr,Sc'],
['An,Dr,Fa'],
['Bi,Co,Dr'],
['Dr,Mu'],
['Ac,Co,My'],
['Co,Dr'],
['Ac,Ad,Sc'],
['An,Ac,Ad'],
]
for arr in mylist:
    count.update(arr[0].split(','))
print(count) # dictionary of symbols: counts
print(list(count.keys())) # list of all unique elements
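The expected output in the question lists the unique elements alphabetically, whereas a Counter keeps them in first-seen order. A minimal sketch of sorting the keys and reading off the most frequent entries, assuming the same input shape as above (one comma-separated string per sublist); the names unique_sorted and top_two are illustrative and not part of the answer:

```python
from collections import Counter

mylist = [['Ac,Cr,Dr'], ['Ac,Ad,Sc'], ['Ac,Bi,Dr'], ['Ad,Dr,Sc'],
          ['An,Dr,Fa'], ['Bi,Co,Dr'], ['Dr,Mu'], ['Ac,Co,My'],
          ['Co,Dr'], ['Ac,Ad,Sc'], ['An,Ac,Ad']]

count = Counter()
for arr in mylist:
    count.update(arr[0].split(','))

# Iterating a Counter yields its keys, so sorted() returns the unique symbols in order
unique_sorted = sorted(count)
# most_common(k) returns the k (element, count) pairs with the highest counts
top_two = count.most_common(2)

print(unique_sorted)
print(top_two)
```

Sorting is only needed for display; lookups such as count['Ad'] work regardless of key order.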

You can take advantage of the very powerful tools offered by collections, itertools and functools and get a one-line solution.
If your lists contain only one element:
from collections import Counter
from itertools import chain
from functools import partial
if __name__ == '__main__':
    mylist = [
        ['Ac,Cr,Dr'],
        ['Ac,Ad,Sc'],
        ['Ac,Bi,Dr'],
        ['Ad,Dr,Sc'],
        ['An,Dr,Fa'],
        ['Bi,Co,Dr'],
        ['Dr,Mu'],
        ['Ac,Co,My'],
        ['Co,Dr'],
        ['Ac,Ad,Sc'],
        ['An,Ac,Ad'],
    ]
    # if lists contain only one element
    occurrence_count = Counter(chain(*map(lambda x: x[0].split(','), mylist)))
    items = list(occurrence_count.keys())  # items, with no repetitions
    all_items = list(occurrence_count.elements())  # all items
    ac_occurrences = occurrence_count['Ac']  # occurrences of 'Ac'
    print(f"Unique items: {items}")
    print(f"All list elements: {all_items}")
    print(f"Occurrences of 'Ac': {ac_occurrences}")
And this is what you get:
Unique items: ['Ac', 'Cr', 'Dr', 'Ad', 'Sc', 'Bi', 'An', 'Fa', 'Co', 'Mu', 'My']
All list elements: ['Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Cr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Ad', 'Ad', 'Ad', 'Ad', 'Sc', 'Sc', 'Sc', 'Bi', 'Bi', 'An', 'An', 'Fa', 'Co', 'Co', 'Co', 'Mu', 'My']
Occurrences of 'Ac': 6
Otherwise, if your lists have more than one element:
from collections import Counter
from itertools import chain
from functools import partial
if __name__ == '__main__':
    mylist_complex = [
        ['Ac,Cr,Dr', 'Ac,Ad,Sc'],
        ['Ac,Ad,Sc', 'Ac,Bi,Dr'],
        ['Ac,Bi,Dr', 'Ad,Dr,Sc'],
        ['Ad,Dr,Sc', 'An,Dr,Fa'],
        ['An,Dr,Fa', 'Bi,Co,Dr'],
        ['Bi,Co,Dr', 'Dr,Mu'],
        ['Dr,Mu', 'Ac,Co,My'],
        ['Ac,Co,My', 'Co,Dr'],
        ['Co,Dr', 'Ac,Ad,Sc'],
        ['Ac,Ad,Sc', 'An,Ac,Ad'],
        ['An,Ac,Ad', 'Ac,Cr,Dr'],
    ]
    # if lists contain more than one element
    occurrence_count_complex = Counter(chain(*map(lambda x: chain(*map(partial(str.split, sep=','), x)), mylist_complex)))
    items = list(occurrence_count_complex.keys())  # items, with no repetitions
    all_items = list(occurrence_count_complex.elements())  # all items
    ac_occurrences = occurrence_count_complex['Ac']  # occurrences of 'Ac'
    print(f"Unique items: {items}")
    print(f"All list elements: {all_items}")
    print(f"Occurrences of 'Ac': {ac_occurrences}")
And this is what you get in this case:
Unique items: ['Ac', 'Cr', 'Dr', 'Ad', 'Sc', 'Bi', 'An', 'Fa', 'Co', 'Mu', 'My']
All list elements: ['Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Ac', 'Cr', 'Cr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Dr', 'Ad', 'Ad', 'Ad', 'Ad', 'Ad', 'Ad', 'Ad', 'Ad', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Bi', 'Bi', 'Bi', 'Bi', 'An', 'An', 'An', 'An', 'Fa', 'Fa', 'Co', 'Co', 'Co', 'Co', 'Co', 'Co', 'Mu', 'Mu', 'My', 'My']
Occurrences of 'Ac': 12
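The two one-liners above can also be written as a single nested generator expression that covers both cases (one string per sublist, or several). This is only a sketch of an alternative spelling, not the answer's own code; count_items and its sep parameter are hypothetical names introduced for illustration:

```python
from collections import Counter

def count_items(nested, sep=','):
    """Count separator-delimited tokens across every string of every sublist."""
    return Counter(
        token
        for sublist in nested      # each inner list
        for text in sublist        # each string inside it (one or many)
        for token in text.split(sep)
    )

single = [['Ac,Cr,Dr'], ['Ac,Ad,Sc'], ['Dr,Mu']]
double = [['Ac,Cr,Dr', 'Ac,Ad,Sc'], ['Dr,Mu', 'Co,Dr']]

print(count_items(single))
print(count_items(double))
```

The nested loops read in the same order as the equivalent for statements would, which may be easier to follow than chained map and partial calls.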

Try below:
from itertools import chain
mylist = [['Ac,Cr,Dr'],
['Ac,Ad,Sc'],
['Ac,Bi,Dr'],
['Ad,Dr,Sc'],
['An,Dr,Fa'],
['Bi,Co,Dr'],
['Dr,Mu'],
['Ac,Co,My'],
['Co,Dr'],
['Ac,Ad,Sc'],
['An,Ac,Ad']
]
flat_list = list(chain.from_iterable(mylist))
unique_list = set(','.join(flat_list).split(','))

Related

How to create a list containing random pairs from an original list?

I have a list:
lst = ['ab', 'cd','ef', 'gh', 'ij', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']
I would like to take 2 random values from this list and put them in to a new list as pairs until the original list is empty.
For example:
new_list = [('ab', 'ef'), ('ij', 'yz'), etc.]
lst = []
How can I do this using a while and for loop?
I've tried using this method to generate a random pair from the list:
random_lst = random.randint(0,len(lst)-1)
However, I'm not sure how to remove the values from the original list and then add them to the new list as pairs.
I'm sure there are lots of ways. Here's a simple one.
import random
lst = ['ab', 'cd','ef', 'gh', 'ij', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']
result = []
random.shuffle(lst)
# After shuffling, neighbouring elements form random pairs
for i in range(0, len(lst), 2):
    result.append((lst[i], lst[i+1]))
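The shuffle-then-pair idea can also be written without indexing and without touching the original list. A minimal sketch, assuming an even number of items; random_pairs and its rng parameter are hypothetical names, and unlike the loop requested in the question this version leaves the input list intact rather than emptying it:

```python
import random

def random_pairs(items, rng=random):
    """Return random pairs from items without modifying the original list."""
    # sample() with k == len(items) returns a shuffled copy
    shuffled = rng.sample(items, len(items))
    # Pair element 0 with 1, 2 with 3, ...; zip silently drops a trailing odd element
    return list(zip(shuffled[::2], shuffled[1::2]))

lst = ['ab', 'cd', 'ef', 'gh', 'ij', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']
pairs = random_pairs(lst)
print(pairs)
```

Passing a seeded random.Random instance as rng makes the pairing reproducible, which can help when testing.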
Try this
import random
lst = ['ab', 'cd','ef', 'gh', 'ij', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']
new_list = []
for i in range(len(lst)//2):
    # Get a random index in the current list
    idx = random.randint(0,len(lst)-1)
    # Remove the respective element and store it
    element1 = lst.pop(idx)
    # Get another random index in the current list
    idx = random.randint(0,len(lst)-1)
    # Remove and store that element as well
    element2 = lst.pop(idx)
    # Create an entry (tuple) in the final list
    new_list.append((element1, element2))
print(new_list)
The output for me is [('yz', 'ij'), ('wx', 'ab'), ('st', 'cd'), ('gh', 'uv'), ('qr', 'ef'), ('op', 'mn')]
The sample() function from the random module is ideal for this
from random import sample as SAMPLE
lst = ['ab', 'cd','ef', 'gh', 'ij', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']
output = []
while lst:
    output.append(SAMPLE(lst, min(2, len(lst))))
    for e in output[-1]:
        lst.remove(e)
print(output)
Note:
min() function is essential for the case where the input list contains an odd number of elements
There are a lot of approaches.
With your method, you could try:
import random
lst = ['ab', 'cd','ef', 'gh', 'ij', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']
random_lst = []
while lst:
    random_lst.append((lst.pop(random.randint(0,len(lst)-1)), lst.pop(random.randint(0,len(lst)-1))))
print(random_lst)
A slight modification to be a bit faster would be:
import random
lst = ['ab', 'cd','ef', 'gh', 'ij', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']
random_lst = []
random.shuffle(lst)
# Requires an even number of elements; pop() on an empty list raises IndexError
while lst:
    random_lst.append((lst.pop(), lst.pop()))
print(random_lst)
This example accounts for odd length of lists:
import random
lst = ['ab', 'cd','ef', 'gh', 'ij', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz', 'xr']
random_lst = []
random.shuffle(lst)
while lst:
    if len(lst) == 1:
        # Odd number of elements: the last one has no partner
        random_lst.append(lst.pop())
        break
    random_lst.append((lst.pop(), lst.pop()))
print(random_lst)
Check out the results: https://colab.research.google.com/drive/19gwP5zPGPXXjGx1AR6VjNkXIIqa16oep#scrollTo=ySBobtc3rtsj
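If you want to work with larger lists, I'd suggest checking out numpy.
You can try something using np.random.choice:
import numpy as np
lst = ['ab', 'cd','ef', 'gh', 'ij', 'mn', 'op', 'qr', 'st', 'uv', 'wx', 'yz']
new_list = []
while lst:
    # replace=False so the same element is never drawn twice in one pair
    tmp = np.random.choice(lst, 2, replace=False).tolist()
    for t in tmp:
        lst.remove(t)
    new_list.append(tmp)
print(new_list)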
[['op', 'wx'], ['yz', 'cd'], ['ef', 'qr'], ['mn', 'ij'], ['uv', 'gh'], ['ab', 'st']]
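Odd-length input, which some of the answers above handle with a special case, can also be covered by itertools.zip_longest. A rough sketch under the assumption that padding the last pair with a placeholder is acceptable; random_pairs_with_leftover and fill are hypothetical names introduced here:

```python
import random
from itertools import zip_longest

def random_pairs_with_leftover(items, fill=None):
    """Pair items at random; an unpaired last item is padded with fill."""
    shuffled = random.sample(items, len(items))  # shuffled copy, input untouched
    it = iter(shuffled)
    # Both arguments are the same iterator, so each tuple consumes two consecutive items
    return list(zip_longest(it, it, fillvalue=fill))

odd = ['ab', 'cd', 'ef', 'gh', 'ij']
result = random_pairs_with_leftover(odd)
print(result)
```

With five items this yields two full pairs and a final tuple whose second slot is the fill value.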

Generating a pair of letter from a given sequence

I have a problem to solve and would appreciate any help. I want to generate all possible two-letter strings from a given sequence. For example, from the string 'ACCG', I want to generate the list [AA, CC, GG, AC, CA, AG, GA, CG, GC].
Does anyone have an idea how I can do that?
An efficient solution can be coded using the itertools module
CODE
import itertools
string = 'ACCG'
num = 2
combinations = list(itertools.product(string, repeat=num))
result = [*set([''.join(tup) for tup in combinations])]
print(result)
OUTPUT
['CG', 'GG', 'GC', 'GA', 'AG', 'AA', 'CC', 'AC', 'CA']
If you want a one-liner (using product from itertools) then try this:
from itertools import product
out = [''.join(p) for p in set(product('ACCG', repeat=2))]
print(out)
Output:
['AA', 'GG', 'CC', 'GA', 'AC', 'CG', 'GC', 'CA', 'AG']
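Both outputs above come from a set, so their order is arbitrary and may differ between runs. A small sketch of a variant that removes duplicate letters before calling product, which gives a stable, sorted result; two_letter_strings and its length parameter are illustrative names, not part of either answer:

```python
from itertools import product

def two_letter_strings(sequence, length=2):
    """All strings of the given length built from the distinct letters of sequence."""
    # Deduplicate and sort the letters first, so product() never emits repeats
    letters = sorted(set(sequence))
    return [''.join(p) for p in product(letters, repeat=length)]

out = two_letter_strings('ACCG')
print(out)
```

For 'ACCG' the distinct letters are A, C and G, so the result has 3 * 3 = 9 strings, matching the list in the question.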

Create DNA Sequences of length n

How can we use recursion to calculate all DNA sequences of length n in a function?
For instance if the function is given 2, it returns ['AA', 'AC', 'AT', 'AG', 'CA', 'CC', 'CT', 'CG', 'TA', 'TC', 'TT', 'TG', 'GA', 'GC', 'GT', 'GG']
etc...
itertools.permutations will give all orderings of a given iterable; the second argument r is the length of the tuples returned. Note that it never reuses an element, so repeated-letter sequences such as 'AA' are not produced.
itertools.permutations('ACGT', length)
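To see how this relates to the output the question asks for, here is a short comparison of itertools.permutations and itertools.product on the same four letters; the variable names perms and prods are only for illustration:

```python
from itertools import permutations, product

# permutations never reuses a position, so 'AA', 'CC', ... are missing
perms = [''.join(p) for p in permutations('ACGT', 2)]
# product allows repeats, giving every sequence of length 2
prods = [''.join(p) for p in product('ACGT', repeat=2)]

print(len(perms), 'AA' in perms)
print(len(prods), 'AA' in prods)
```

permutations yields 4 * 3 = 12 results, while product yields 4 * 4 = 16.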
Here is one way:
def all_seq(n, curr, e, ways):
    """All possible sequences of size n given elements e.
    ARGS
    n: size of sequence
    curr: a list used for constructing sequences
    e: the list of possible elements (could have been a global list instead)
    ways: the final list of sequences
    """
    if len(curr) == n:
        ways.append(''.join(curr))
        return
    for element in e:
        all_seq(n, list(curr) + [element], e, ways)
perms = []
all_seq(2, [], ['A', 'C', 'T', 'G'], perms)
print(perms)
The output:
['AA', 'AC', 'AT', 'AG', 'CA', 'CC', 'CT', 'CG', 'TA', 'TC', 'TT', 'TG', 'GA', 'GC', 'GT', 'GG']
You actually want itertools.product('ACGT', repeat=n). Note that this will grow enormously fast (4^n sequences, each of length n).
If your assignment is to do it recursively, consider how you would get all n+1-length options that start with an n-length prefix. The naive recursive option might be rather slow compared to itertools, if you need to use it in anger.
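The prefix hint can be sketched as follows: every sequence of length n is a sequence of length n-1 with one more base appended. This is one possible recursive formulation, not necessarily what the assignment expects; dna_sequences and the bases parameter are hypothetical names, and the base order 'ACTG' is chosen only to match the example output in the question:

```python
from itertools import product

def dna_sequences(n, bases='ACTG'):
    """Return all sequences of length n over the given bases, built recursively."""
    if n == 0:
        return ['']  # exactly one sequence of length zero: the empty string
    shorter = dna_sequences(n - 1, bases)
    # Extend every (n-1)-length prefix with each possible base
    return [prefix + base for prefix in shorter for base in bases]

print(dna_sequences(2))
# Cross-check against itertools.product
print(dna_sequences(2) == [''.join(p) for p in product('ACTG', repeat=2)])
```

The result list holds 4^n strings, so memory use grows quickly with n in either approach.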

I find it hard implementing a function which replaces the 'chars' with the alphabetically corresponding 'periodic table' element

from random import choice, choices, randint
def periodic_table_word_char(word):
    chars = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    periodic_table = ['Ac','Ag','Al','Am','Ar','As','At','Au','B','Ba','Be','Bh','Bi','Bk','Br','C','Ca','Cd','Ce','Cf','Cl','Cm','Cn','Co','Cr','Cs','Cu','Db','Ds','Dy','Er','Es','Eu','F','Fe','Fl','Fm','Fr','Ga','Gd','Ge','H','He','Hf','Hg','Ho','Hs','I','In','Ir','K','Kr','La','Li','Lr','Lu','Lv','Mc','Md','Mg','Mn','Mo','Mt','N','Na','Nb','Nd','Ne','Nh','Ni','No','Np','O','Og','Os','P','Pa','Pb','Pd','Pm','Po','Pr','Pt','Pu','Ra','Rb','Re','Rf','Rg','Rh','Rn','Ru','S','Sb','Sc','Se','Sg','Si','Sm','Sn','Sr','Ta','Tb','Tc','Te','Th','Ti','Tl','Tm','Ts','U','V','W','Xe','Y','Yb','Zn','Zr']
    #word_modified = word.replace(chars, periodic_table)
The above is a rough idea I had, but I don't know how to implement it correctly so that each character is replaced by a 'periodic_table' element starting with the same letter.
    print(word_modified)
if __name__ == '__main__':
    w = input("Enter any word:")
    periodic_table_word_char(w)
You need a one-to-many mapping, so use a dictionary that maps each character to all the elements that start with that character.
Then you can make a random choice among the values a character maps to.
Example:
import random
table = {'a': (1,2,3), 'b': (4,5), 'c': (6,7,8)}
for c in "aaabbbc":
    print(random.choice(table[c]))
You could do something like this:
import random
random.seed(42)
periodic_table = ['Ac', 'Ag', 'Al', 'Am', 'Ar', 'As', 'At', 'Au', 'B', 'Ba', 'Be', 'Bh', 'Bi', 'Bk', 'Br', 'C',
                  'Ca', 'Cd', 'Ce', 'Cf', 'Cl', 'Cm', 'Cn', 'Co', 'Cr', 'Cs', 'Cu', 'Db', 'Ds', 'Dy', 'Er', 'Es',
                  'Eu', 'F', 'Fe', 'Fl', 'Fm', 'Fr', 'Ga', 'Gd', 'Ge', 'H', 'He', 'Hf', 'Hg', 'Ho', 'Hs', 'I',
                  'In', 'Ir', 'K', 'Kr', 'La', 'Li', 'Lr', 'Lu', 'Lv', 'Mc', 'Md', 'Mg', 'Mn', 'Mo', 'Mt', 'N',
                  'Na', 'Nb', 'Nd', 'Ne', 'Nh', 'Ni', 'No', 'Np', 'O', 'Og', 'Os', 'P', 'Pa', 'Pb', 'Pd', 'Pm',
                  'Po', 'Pr', 'Pt', 'Pu', 'Ra', 'Rb', 'Re', 'Rf', 'Rg', 'Rh', 'Rn', 'Ru', 'S', 'Sb', 'Sc', 'Se',
                  'Sg', 'Si', 'Sm', 'Sn', 'Sr', 'Ta', 'Tb', 'Tc', 'Te', 'Th', 'Ti', 'Tl', 'Tm', 'Ts', 'U', 'V',
                  'W', 'Xe', 'Y', 'Yb', 'Zn', 'Zr']
# Map each lowercase first letter to the elements that start with it
to_element = {}
for element in periodic_table:
    to_element.setdefault(element[0].lower(), []).append(element)
def periodic_table_word_char(word):
    # Characters with no matching element are kept unchanged
    return ''.join(random.choice(to_element.get(c, [c])) for c in word)
result = periodic_table_word_char('agriculture')
print(result)
Output
AgGaRgICeULaTsURnEr
The idea is that to_element is a mapping from letters to elements, then you can use that mapping to choose the corresponding elements for each letter.
Building on molbdnilo's answer:
import random
table = {'a': (1,2,3), 'b': (4,5), 'c': (6,7,8)}
ns = ''
for c in "aaabbbc":
    ns += str(random.choice(table[c]))
print(ns)
So, I think the best way to do this is to actually create a list of lists, where each sublist has elements that start with the same letter. As in:
periodic_table = [ ["Ac", "Ag", "Al", "Am", "Ar", "As", "At", "Au"],
["B", "Ba", "Be", "Bh", "Bi", "Bk", "Br"],
... ]
Since you have a mapping of letters to first letters of elements already, you know how to index into the outer list (alphabetical ordering). So, if you wanted all elements that started with "A", you could do
elementsStartingWithA = periodic_table[0]
Then you need to generate a random index between 0 and the length of your smaller list (exclusive), and exchange the letter you're looking up from your function's input with the corresponding random string value from the smaller list. So, if your input word was the article a and your randomly generated index into elementsStartingWithA was the integer 3, you'd return the string Am.
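A rough sketch of this list-of-lists approach, under some assumptions: the symbols list below is a small illustrative subset rather than the full table, the outer list has one slot per letter a to z (left empty when no symbol starts with that letter), and the names grouped and replace_letters are hypothetical:

```python
import random
import string

# Illustrative subset only; a real table would list every element symbol
symbols = ['Ac', 'Ag', 'Al', 'Am', 'B', 'Ba', 'Be', 'C', 'Ca', 'Cd']

# One sublist per letter a-z; letters with no symbol keep an empty list
grouped = [[] for _ in string.ascii_lowercase]
for symbol in symbols:
    grouped[ord(symbol[0].lower()) - ord('a')].append(symbol)

def replace_letters(word):
    """Swap each letter for a random symbol starting with that letter."""
    parts = []
    for ch in word.lower():
        index = ord(ch) - ord('a')  # alphabetical position: 'a' -> 0, 'b' -> 1, ...
        candidates = grouped[index] if 0 <= index < 26 else []
        # Keep the original character when no symbol starts with it
        parts.append(random.choice(candidates) if candidates else ch)
    return ''.join(parts)

print(grouped[0])
print(replace_letters('cab'))
```

Compared with the dictionary mapping in the earlier answers, this trades key lookups for index arithmetic, and it needs the explicit range check to cope with characters outside a to z.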

Python Slicing List

I have a 2d List
a= [['Turn', 'the', 'light', 'out', 'now'], ['Now', ',', 'I', 'll', 'take', 'you', 'by', 'the', 'hand'], ['Hand', 'you', 'anoth', 'drink'], ['Drink', 'it', 'if', 'you', 'can'], ['Can', 'you', 'spend', 'a', 'littl', 'time'], ['Time', 'is', 'slip', 'away'], ['Away', 'from', 'us', ',', 'stay'], ['Stay', 'with', 'me', 'I', 'can', 'make'], ['Make', 'you', 'glad', 'you', 'came']]
Is there an easy way to get a range of values from this list?
If I write:
print(a[0][0])
I get 'Turn', but is there a way to slice multiple things out of the list and make a new list? If I have a value i=0 and n=2, I am looking to make a new list that starts at a[i][n] and contains the list items on either side of that index.
Thanks for the help
Maybe you mean
print(a[0][0:2])
or
print([d[0] for d in a])
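If the goal is the item at a[i][n] together with its neighbours on either side, a slice centred on n would do it. A minimal sketch of that reading of the question; the function name neighbours and the width parameter are assumptions made for illustration, and the data is a shortened copy of the list in the question:

```python
a = [['Turn', 'the', 'light', 'out', 'now'],
     ['Hand', 'you', 'anoth', 'drink']]

def neighbours(rows, i, n, width=1):
    """Return rows[i][n] plus up to `width` items on each side of it."""
    # Clamp the start so a negative index does not wrap around to the end
    start = max(n - width, 0)
    # Slices stop quietly at the end of the list, so no upper clamp is needed
    return rows[i][start:n + width + 1]

print(neighbours(a, 0, 2))
print(neighbours(a, 0, 0))
```

With i=0 and n=2 this returns ['the', 'light', 'out']; at the start of a row it returns only the items that exist.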
