Looking for unique elements in chemical species - Python

Determining unique elements: Write a function which, when given a list of species,
will return an alphabetized list of the unique elements contained in the set of species.
Make use of the parser from the previous step. Example: calling your function with
an input of ['CO', 'H2O', 'CO2', 'CH4'] should return an output of ['C', 'H', 'O'].
This is part of a larger project that I am doing.
The problem I am having is how to look at the individual characters of each element. Once I have this, I should be able to check whether it's unique or not. I know this is not right; it's just a rough idea of what I am thinking.
def unique_elements(x):
    if x in y
    else
        y.append(x)
    return y

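>>> import re, string
>>> from itertools import chain
>>> def sanitize(compound):
...     # Strip the digits (Python 3 form of str.translate)
...     return compound.translate(str.maketrans('', '', string.digits))
...
>>> def elementize(compound):
...     return re.findall("([A-Z][a-z]*)", compound)
...
>>> sorted(set(chain(*(elementize(sanitize(s)) for s in species))))
['Au', 'C', 'H', 'O']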
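For the exact example in the assignment, the same idea can be sketched as a single function. This is a minimal sketch, not the assignment's official solution: the "parser from the previous step" is not shown in the question, so `parse_elements` below is a hypothetical stand-in that assumes an element symbol is one uppercase letter followed by optional lowercase letters.

```python
import re

def parse_elements(species):
    """Hypothetical stand-in for the parser: 'CO2' -> ['C', 'O']."""
    # Assumption: a symbol is an uppercase letter plus optional lowercase letters.
    return re.findall(r"[A-Z][a-z]*", species)

def unique_elements(species_list):
    """Return the alphabetized list of unique elements across all species."""
    elements = set()
    for species in species_list:
        elements.update(parse_elements(species))
    return sorted(elements)

print(unique_elements(['CO', 'H2O', 'CO2', 'CH4']))  # ['C', 'H', 'O']
```

A set takes care of uniqueness, and `sorted` gives the alphabetized list the assignment asks for.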


Getting Nth permutation of a sequence and getting back original using its index and modified sequence

I know the most popular permutation algorithms (thanks to the wonderful questions and answers on SO and related sites such as Wikipedia), but I recently wanted to see if I could get the Nth permutation without exhausting the whole permutation space.
Factorials come to mind, so I ended up looking at posts such as this one, which implements the unrank and rank algorithms, as well as many others. (By "post" I also mean pages on other sites.)
I stumbled upon this ActiveState recipe, which seemed to fit what I wanted to do, but it doesn't support the reverse (taking the result of the function and the same index and getting back the original sequence/order).
I also found a similar and related answer on SO: https://stackoverflow.com/a/38166666/12349101
But it has the same problem as above.
I tried and made different versions of the unrank/rank implementations above, but they require that the sorted sequence be passed along with the index given by the rank function. If a random index (even one within the total permutation count) is given, it won't work (most of the times I tried, at least).
I don't know how to implement this, and I don't think I have seen anyone on SO do it yet. Is there an existing algorithm or approach for this?
To make things clearer:
Here is the ActiveState recipe I posted above (at least the one posted in the comment):
from functools import reduce
def NPerms (seq):
    "computes the factorial of the length of <seq>"
    return reduce(lambda x, y: x * y, range (1, len (seq) + 1), 1)
def PermN (seq, index):
    "Returns the <index>th permutation of <seq> (in proper order)"
    seqc = list (seq [:])
    result = []
    fact = NPerms (seq)
    index %= fact
    while seqc:
        fact = fact // len (seqc)
        choice, index = index // fact, index % fact
        result += [seqc.pop (choice)]
    return result
As mentioned, this handles doing part of what I mentioned in the title, but I don't know how to get back the original sequence/order using both the result of that function + the same index used.
Say I use the above on a string such as hello world inside a list:
print(PermN(list("hello world"), 20))
This outputs: ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'd', 'r', 'o', 'l']
Now to see if this can go back to the original using the same index + result of the above:
print(PermN(['h', 'e', 'l', 'l', 'o', ' ', 'w', 'd', 'r', 'o', 'l'], 20))
Output: ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'l', 'r', 'd', 'o']
I think this does what you want, and has the benefit that it doesn't matter what the algorithm behind PermN is:
def NmreP(seq,index):
    # Copied from PermN
    seqc = list (seq [:])
    result = []
    fact = NPerms (seq)
    index %= fact
    seq2 = list(range(len(seq))) # Make a list of seq indices
    fwd = PermN(seq2,index) # Arrange them as PermN would
    result = [0]*len(seqc) # Make an array to fill with results
    # For each position, find the element in seqc in the position this started from
    for i,j in enumerate(fwd):
        result[j] = seqc[i]
    return result
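To check the round trip, here is a self-contained sketch of the same idea. The names `perm_n` and `inverse_perm_n` are illustrative stand-ins for PermN and NmreP, and `math.factorial` is swapped in for the `reduce`-based NPerms; the logic is otherwise meant to mirror the code above.

```python
from math import factorial

def perm_n(seq, index):
    """Return the index-th permutation of seq (same logic as PermN)."""
    items = list(seq)
    fact = factorial(len(items))
    index %= fact
    result = []
    while items:
        fact //= len(items)
        choice, index = divmod(index, fact)
        result.append(items.pop(choice))
    return result

def inverse_perm_n(permuted, index):
    """Recover the original order from the permuted sequence and the index."""
    # Permute the positions 0..n-1 the same way: fwd[i] is the original
    # position of the element that ended up at position i.
    fwd = perm_n(range(len(permuted)), index)
    result = [None] * len(permuted)
    for i, j in enumerate(fwd):
        result[j] = permuted[i]
    return result

original = list("hello world")
shuffled = perm_n(original, 20)
restored = inverse_perm_n(shuffled, 20)
print("".join(shuffled))  # hello wdrol
print("".join(restored))  # hello world
```

Because the inverse only permutes positions, it works even when the sequence contains repeated elements, as "hello world" does.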

How do I do the checksum for Singapore car license plate recognition with Python

I am able to recognize car license plates and extract the plate values. Sometimes the results are inaccurate because I am using OCR to do the recognition, so I use a checksum to ensure that only correct results are printed and viewed. After calculating the checksum, I need to use another formula to get the last letter of the plate, matching the result against these 19 letters: A=0, Z=1, Y=2, X=3, U=4, T=5, S=6, R=7, P=8, M=9, L=10, K=11, J=12, H=13, G=14, E=15, D=16, C=17, B=18. Is there any way I can use a loop to declare the values of these letters instead of doing it one by one manually? Please help. Thank you.
You can use a list and perform the lookups according to your needs.
The list looks like this:
plate_letter_list = ['A', 'Z', 'Y', 'X', 'U', 'T', 'S', 'R', 'P', 'M', 'L', 'K', 'J', 'H', 'G', 'E', 'D', 'C', 'B']
Case 1: Lookup value from letter
If you need to find the numeric value associated with a letter, use the index method:
letter = 'T'
print(plate_letter_list.index(letter))
>> 5
Case 2: Lookup letter from value
If you need to find the letter associated with a numeric value, use it as index:
value = 13
print(plate_letter_list[value])
>> H
There are two ways to look up these values. One is to use the list datatype, as @sal suggested. The other is to use the dictionary datatype.
**Solution using List Datatype**
pl_vals_list = ['A', 'Z', 'Y', 'X', 'U', 'T', 'S', 'R', 'P', 'M', 'L', 'K', 'J', 'H', 'G', 'E', 'D', 'C', 'B']
You can then do a lookup either by position or by value.
Looking up by position gives you the letter assigned to that number:
print(pl_vals_list[0])
This will result in A.
Alternatively, you can look up by the letter itself. In this case, you have to use the index() method:
print(pl_vals_list.index('A'))
This will result in the number you assigned to the letter, here 0.
This gives you a means to look up by letter or by value.
You can also check whether a letter is in the list using:
if 'A' in pl_vals_list:
    pass  # then do something for value A
elif 'B' in pl_vals_list:
    pass  # then do something for value B
else:
    pass  # then do something
You can also iterate through the list using a for loop and enumerate. However, I don't know if that will be of value to you.
for i, v in enumerate(pl_vals_list):
    # do something with each value in the list
    # here i will store the index, and v will have the letter - A, Z, Y, etc
    pass
You can get the value of each of these and determine what you want to do.
**Solution using Dictionary Datatype**
Similarly, you can do the same using a dictionary.
pl_vals_dict = {'A':0, 'Z':1, 'Y':2, 'X':3, 'U':4, 'T':5, 'S':6, 'R':7, 'P':8, 'M':9, 'L':10, 'K':11, 'J':12, 'H':13, 'G':14, 'E':15, 'D':16, 'C':17, 'B':18}
To check for letters within a dictionary, you can use
if 'A' in pl_vals_dict.keys():
    pass  # then do something for value A
elif 'B' in pl_vals_dict.keys():
    pass  # then do something for value B
else:
    pass  # do something else
An alternate way to check for something would be:
x = True if 'F' in pl_vals_dict.keys() else False
In this case x will have a value of False
You can also use the get() method to get the value.
x = pl_vals_dict.get('A') # OR
print (pl_vals_dict.get('A'))
The simplest way to look up a dictionary value is:
print (pl_vals_dict['A'])
which will result in 0
However, you have to be careful: if you try to print the value of 'F', it will throw an error, as 'F' is not a key in the dictionary.
print (pl_vals_dict['F'])
This will give you the following error:
Traceback (most recent call last):
  File "<pyshell#48>", line 1, in <module>
    pl_vals_dict['F']
KeyError: 'F'
Similar to a list, you can also iterate through the dictionary for keys and values. Not sure if you will need to use this, but here's an example for you.
for k, v in pl_vals_dict.items():
    # do something with each pair of key and value
    # here k will have the keys A Z Y X ....
    # and v will have the values 0, 1, 2, 3, ....
    pass
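Since the question asks for a loop rather than typing each pair by hand, here is a minimal sketch of one way to build the mapping. The names `PLATE_LETTERS` and `letter_to_value` are illustrative; the letter order is copied from the question, and the checksum calculation itself is not shown here.

```python
# Letters in checksum order, taken from the question: A=0, Z=1, ..., B=18.
PLATE_LETTERS = "AZYXUTSRPMLKJHGEDCB"

# Build the letter -> value mapping in a loop instead of typing each pair.
letter_to_value = {}
for value, letter in enumerate(PLATE_LETTERS):
    letter_to_value[letter] = value

print(letter_to_value['T'])          # 5
print(PLATE_LETTERS[13])             # H
print(letter_to_value.get('F', -1))  # -1, since 'F' is not a checksum letter
```

The string itself already serves as the value-to-letter lookup, so only the reverse direction needs a dictionary.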

What is the pop order for set?

Since set('ate')==set('aet') is True, why do the results differ as shown below?
Input: list(set('ate'))
Output: ['e', 'a', 't']
Input: list(set('aet'))
Output: ['a', 't', 'e']
I want an explanation of how the output is produced. To me, the order of the elements in the output looks random.
I have also tried:
x = set('ate')
x.pop()
# 'e'
x.pop()
# 'a'
x.pop()
# 't'
Same problem; the order confuses me.
Sets are unordered collections. A set is equal to another set if it contains the same elements, regardless of order.
A list, however, is an ordered collection. Lists are equal if and only if they contain the same elements in the same order, so list(set('ate')) and list(set('aet')) can differ even though the two sets are equal.
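As for where the order comes from: a set yields its elements in whatever order they sit in its internal hash table, which is an implementation detail. For strings in CPython it can also change between runs, because string hashing is randomized by default. A small sketch of comparisons that do not depend on that order:

```python
a = set('ate')
b = set('aet')

print(a == b)                  # True: same elements, order is irrelevant
print(sorted(a) == sorted(b))  # True: sorting gives a deterministic order
print(sorted(a))               # ['a', 'e', 't']

# list(a) and list(b) hold the same elements, but their order is unspecified,
# so compare them as sets (or as sorted lists) rather than element by element.
print(set(list(a)) == set(list(b)))  # True
```

If a predictable order matters, sort the elements or keep them in a list instead of relying on set iteration or pop().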

Generator not closing over data as expected

Sorry if the title is poorly worded, I'm not sure how to phrase it. I have a function that basically iterates over the 2nd dimension of a 2 dimensional iterable. The following is a simple reproduction:
words = ['ACGT', 'TCGA']
def make_lists():
    for i in range(len(words[0])):
        iter_ = iter([word[i] for word in words])
        yield iter_
lists = list(make_lists())
for list_ in lists:
    print(list(list_))
Running this outputs:
['A', 'T']
['C', 'C']
['G', 'G']
['T', 'A']
I would like to yield generators instead of having to evaluate words, in case words is very long, so I tried the following:
words = ['ACGT', 'TCGA']
def make_generators():
    for i in range(len(words[0])):
        gen = (word[i] for word in words)
        yield gen
generators = list(make_generators())
for gen in generators:
    print(list(gen))
However, running outputs:
['T', 'A']
['T', 'A']
['T', 'A']
['T', 'A']
I'm not sure exactly what's happening. I suspect it has something to do with the generator comprehension not closing over its scope when yielded, so they're all sharing. If I create the generators inside a separate function and yield the return from that function it seems to work.
i is a free variable for those generators, so they all use its last value, i.e. 3. In simple terms, they know where to fetch the value of i from, but they are not aware of the value i had when they were created. So, something like this:
def make_iterator():
    for i in range(len(words[0])):
        gen = (word[i] for word in words)
        yield gen
    i = 0 # Modified the value of i
will result in:
['A', 'T']
['A', 'T']
['A', 'T']
['A', 'T']
Generator expressions are implemented with their own function scope and are evaluated lazily. A list comprehension, on the other hand, runs right away and can fetch the value of i during that iteration itself. (List comprehensions are also implemented with a function scope in Python 3, but the difference is that they are not lazy.)
A fix is to use an inner function that captures the actual value of i in each loop iteration using a default argument value:
words = ['ACGT', 'TCGA']
def make_iterator():
    for i in range(len(words[0])):
        # default argument value is calculated at the time of
        # function creation, hence for each generator it is going
        # to be the value at the time of that particular iteration
        def inner(i=i):
            return (word[i] for word in words)
        yield inner()
generators = list(make_iterator())
for gen in generators:
    print(list(gen))
You may also want to read:
What do (lambda) function closures capture?
Python internals: Symbol tables, part 1
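The question mentions that creating the generators in a separate function also seems to work. Here is a minimal sketch of that variant; `column` is a hypothetical helper name. It works because i becomes a parameter of `column`, so each call gets its own binding instead of sharing the loop variable.

```python
words = ['ACGT', 'TCGA']

def column(i):
    # i is a parameter here, so each generator closes over its own i
    return (word[i] for word in words)

def make_generators():
    for i in range(len(words[0])):
        yield column(i)

generators = list(make_generators())
result = [list(gen) for gen in generators]
print(result)
# [['A', 'T'], ['C', 'C'], ['G', 'G'], ['T', 'A']]
```

This has the same effect as the default-argument fix, just with the captured value passed explicitly.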

Python function - Global variable not defined

I joined a course to learn programming with Python. For a certain assignment we had to write the code which I have pasted below.
This part of the code consists of two functions, the first being make_str_from_row and the second being contains_word_in_row. As you might have noticed, the second function reuses the first. I have already passed the first function, but I cannot pass the second one: when it reuses the first function, it gives an error about it, which is confusing because I did not get any errors for my first function. It says that global variable row_index is not defined.
By the way, the second function was given in the starter code, so it cannot be wrong. I don't know what's wrong, especially because the code that presumably has to be wrong is the code I have already passed.
I tried asking the team for feedback in case it was an error in the grader, but it has been a week and I have had no reply, while the deadline is 2 days away. I am not asking for answers here; I would only like someone to explain the given error so I can figure out a solution myself. I would really appreciate the help.
def makestrfromrow(board, rowindex):
    """ (list of list of str, int) -> str
    Return the characters from the row of the board with index row_index
    as a single string.
    >>> make_str_from_row([['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']], 0)
    'ANTT'
    """
    string = ''
    for i in board[row_index]:
        string = string + i
    return string
def boardcontainswordinrow(board, word):
    """ (list of list of str, str) -> bool
    Return True if and only if one or more of the rows of the board contains
    word.
    Precondition: board has at least one row and one column, and word is a
    valid word.
    >>> board_contains_word_in_row([['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']], 'SOB')
    True
    """
    for row_index in range(len(board)):
        if word in make_str_from_row(board, row_index):
            return True
    return False
You named the argument rowindex but use the name row_index in the function body.
Fix one or the other.
Demo, fixing the name used in the body of the function to match the argument:
>>> def makestrfromrow(board, rowindex):
...     string = ''
...     for i in board[rowindex]:
...         string = string + i
...     return string
...
>>> makestrfromrow([['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']], 0)
'ANTT'
Do note that both this function and boardcontainswordinrow are not consistent with the docstring; there they are named as make_str_from_row and board_contains_word_in_row. Your boardcontainswordinrow function uses make_str_from_row, not makestrfromrow, so you'll have to correct that as well; one direction or the other.
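Putting both corrections together, a consistent version could look like the sketch below. It uses the underscore names from the docstrings for both the functions and the parameter; `''.join` is swapped in for the character-by-character concatenation, which is a stylistic choice rather than part of the fix.

```python
def make_str_from_row(board, row_index):
    """Return the characters from the row of board at row_index as one string."""
    return ''.join(board[row_index])

def board_contains_word_in_row(board, word):
    """Return True if and only if at least one row of board contains word."""
    for row_index in range(len(board)):
        if word in make_str_from_row(board, row_index):
            return True
    return False

board = [['A', 'N', 'T', 'T'], ['X', 'S', 'O', 'B']]
print(make_str_from_row(board, 0))               # ANTT
print(board_contains_word_in_row(board, 'SOB'))  # True
```

With the parameter and the name used in the body matching, the lookup no longer falls through to a global row_index.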
