I have this dictionary, where the keys represent atom types and the values represent the atomic masses:
mass = {'H': 1.007825, 'C': 12.01, 'O': 15.9994, 'N': 14.0067, 'S': 31.972071,
'P': 30.973762}
What I want to do is create a function that, given a molecule such as 'H2-N-C6-H4-C-O-2H', looks up each atom in the mass dictionary and calculates the total mass of the molecule. The mass of each atom must be multiplied by the number that comes right after the atom type: H2 = H.value * 2
I know that I must first isolate the atoms of the given molecule; for this I could use string.split('-'). Then I think I could use an if block to check whether each atom of the molecule is in the dictionary. But after that I'm lost about how to find the mass for each key of the dictionary.
The expected result should be something like:
mass_counter('H2-N15-P3')
out[0] 39351.14
How could I do this?
EDIT:
This is what I've tried so far
# Atomic masses
mass = {'H': 1.007825, 'C': 12.01, 'O': 15.9994, 'N': 14.0067, 'S': 31.972071,
'P': 30.973762}
def calculate_atomic_mass(molecule):
    """
    Calculate the atomic mass of a given molecule
    """
    mass = 0.0
    mol = molecule.split('-')
    for key in mass:
        if key in mol:
            atom = key
    return mass
print calculate_atomic_mass('H2-O')
print calculate_atomic_mass('H2-S-O4')
print calculate_atomic_mass('C2-H5-O-H')
print calculate_atomic_mass('H2-N-C6-H4-C-O-2H')
Given that all components have the shape Aa123, it might be easier here to identify the parts with a regex, for example:
import re
srch = re.compile(r'([A-Za-z]+)(\d*)')
mass = {'H': 1.007825, 'C': 12.01, 'O': 15.9994, 'N': 14.0067, 'S': 31.972071, 'P': 30.973762}
def calculate_atomic_mass(molecule):
    return sum(mass[a[1]]*int(a[2] or '1') for a in srch.finditer(molecule))
Here the regular expression captures a sequence of letters ([A-Za-z]+) followed by a (possibly empty) sequence of digits (\d*). These are the first and second capture groups respectively, so for a match a they can be obtained with a[1] and a[2] (Python 3.6+; use a.group(1) and a.group(2) on older versions).
This then yields:
>>> print(calculate_atomic_mass('H2-O'))
18.01505
>>> print(calculate_atomic_mass('H2-S-O4'))
97.985321
>>> print(calculate_atomic_mass('C2-H5-O-H'))
46.06635
>>> print(calculate_atomic_mass('H2-N-C6-H4-C-O-2H'))
121.130875
>>> print(calculate_atomic_mass('H2-N15-P3'))
305.037436
We thus take the sum of the mass[..] of the first capture group (the name of the atom) times the number at the end, and we use '1' in case no such number can be found.
Or we can first split the data, and then look for an atom part and a number part in each piece:
import re
srch = re.compile(r'^([A-Za-z]+)(\d*)$')
def calculate_atomic_mass(molecule):
    """
    Calculate the atomic mass of a given molecule
    """
    result = 0.0
    mol = molecule.split('-')
    for atm in mol:
        # each piece must look like Aa123; for a piece such as '2H' match() returns None
        c = srch.match(atm)
        result += mass[c[1]] * int(c[2] or '1')
    return result
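To see what the first regular expression in this answer actually captures, here is a small sketch that lists the (element, count) pairs it finds. The helper name tokens is made up for illustration and is not part of the answer. It also shows why 'H2-N-C6-H4-C-O-2H' gives 121.130875: the leading 2 in '2H' is skipped, so that piece counts as a single H.

```python
import re

# Same pattern as in the answer above: letters, then optional digits.
srch = re.compile(r'([A-Za-z]+)(\d*)')

def tokens(molecule):
    """Return the (element, count) pairs as the regex sees them (hypothetical helper)."""
    return [(m.group(1), int(m.group(2) or '1')) for m in srch.finditer(molecule)]

pairs = tokens('H2-N-C6-H4-C-O-2H')
print(pairs)
```

If counts written before the element ('2H') should be honoured, the pattern would need an extra optional digit group in front; as written it only reads digits that follow the letters.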
Here is an answer without regex:
import string
# Atomic masses
masses = {'H': 1.007825, 'C': 12.01, 'O': 15.9994, 'N': 14.0067, 'S': 31.972071,
'P': 30.973762}
def calculate_atomic_mass(molecule):
    """
    Calculate the atomic mass of a given molecule
    """
    mass = 0.0
    for key in molecule.split('-'):
        # check if any number is available
        if key[-1] not in string.digits:
            el, n = key, 1
        # check length of element label (1 or 2)
        elif key[1] in string.digits:
            el, n = key[:1], int(key[1:])
        else:
            el, n = key[:2], int(key[2:])
        mass += masses[el]*n
    return mass
print(calculate_atomic_mass('H2-O'))
print(calculate_atomic_mass('H2-S-O4'))
print(calculate_atomic_mass('C2-H5-O-H'))
print(calculate_atomic_mass('H2-N-C6-H4-C-O-H2'))
Here's how I would do it. You don't really need to iterate over the dictionary. Instead you need to iterate over the atoms in the molecule and look each one up by key in the dictionary.
Here's an example of doing that, which assumes that every count is a single digit (at most 9) and that each element's name is only one letter long.
# Atomic masses.
MASSES = {'H': 1.007825, 'C': 12.01, 'O': 15.9994, 'N': 14.0067, 'S': 31.972071,
'P': 30.973762}
def calculate_atomic_mass(molecule):
    """ Calculate the atomic mass of a given molecule. """
    mass = 0.0
    for atom in molecule.split('-'):
        if len(atom) == 1:
            mass += MASSES[atom]
        else:
            atom, count = atom[0], atom[1]
            mass += MASSES[atom] * int(count)
    return mass
print(calculate_atomic_mass('H2-O')) # -> 18.01505
print(calculate_atomic_mass('H2-S-O4')) # -> 97.985321
print(calculate_atomic_mass('C2-H5-O-H')) # -> 46.06635
print(calculate_atomic_mass('H2-N-C6-H4-C-O-H2')) # -> 122.1387
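The two assumptions in this answer (single-digit counts, one-letter element names) can be relaxed without a regex. The following is only a sketch of one way to do it, not the answerer's code: it strips the trailing digits with str.rstrip and treats whatever is left as the element label. It still assumes the count always follows the label, so a piece such as '2H' raises a KeyError.

```python
# Atomic masses, copied from the question.
MASSES = {'H': 1.007825, 'C': 12.01, 'O': 15.9994, 'N': 14.0067,
          'S': 31.972071, 'P': 30.973762}

def calculate_atomic_mass(molecule):
    """Sum the masses of all 'Label' or 'Label<count>' pieces separated by '-'."""
    total = 0.0
    for part in molecule.split('-'):
        element = part.rstrip('0123456789')   # label without its trailing digits
        digits = part[len(element):]          # the trailing digits, possibly empty
        total += MASSES[element] * int(digits or '1')
    return total

print(calculate_atomic_mass('H2-N15-P3'))
```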
Implement the function most_popular_character(my_string), which gets the string argument my_string and returns its most frequent letter. In case of a tie, break it by returning the letter of smaller ASCII value.
Note that lowercase and uppercase letters are considered different (e.g., ‘A’ < ‘a’). You may assume my_string consists of English letters only, and is not empty.
Example 1: >>> most_popular_character("HelloWorld") >>> 'l'
Example 2: >>> most_popular_character("gggcccbb") >>> 'c'
Explanation: cee and gee appear three times each (and bee twice), but cee precedes gee lexicographically.
Hints (you may ignore these):
Build a dictionary mapping letters to their frequency;
Find the largest frequency;
Find the smallest letter having that frequency.
def most_popular_character(my_string):
    char_count = {} # define dictionary
    for c in my_string:
        if c in char_count: #if c is in the dictionary:
            char_count[c] = 1
        else: # if c isn't in the dictionary - create it and put 1
            char_count[c] = 1
    sorted_chars = sorted(char_count) # sort the dictionary
    char_count = char_count.keys() # place the dictionary in a list
    max_per = 0
    for i in range(len(sorted_chars) - 1):
        if sorted_chars[i] >= sorted_chars[i+1]:
            max_per = sorted_chars[i]
            break
    return max_per
My function returns 0 right now, and I think the problem is in the last for loop and if statement, but I can't figure out what the problem is.
If you have any suggestions on how to adjust the code it would be very appreciated!
Your dictionary didn't get off to a good start: you forgot to add 1 to the character count, and instead reset it to 1 each time.
Have a look here to get the gist of getting the maximum value from a dict: https://datagy.io/python-get-dictionary-key-with-max-value/
def most_popular_character(my_string):
    # NOTE: you might want to convert the entire string to upper or lower case first, depending on the use
    # e.g. my_string = my_string.lower()
    char_count = {} # define dictionary
    for c in my_string:
        if c in char_count: # if c is in the dictionary:
            char_count[c] += 1 # add 1 to it
        else: # if c isn't in the dictionary - create it and put 1
            char_count[c] = 1
    # Never underestimate the power of print in debugging
    print(char_count)
    # max(char_count.values()) will give the highest value
    max_count = max(char_count.values())
    # But there may be more than 1 item with the highest count, so get them all
    max_keys = [key for key, value in char_count.items() if value == max_count]
    # Choose the lowest by sorting them and pick the first item
    low_item = sorted(max_keys)[0]
    return low_item, max_count
print(most_popular_character("HelloWorld"))
print(most_popular_character("gggcccbb"))
print(most_popular_character("gggHHHAAAAaaaccccbb 12 3"))
Result:
{'H': 1, 'e': 1, 'l': 3, 'o': 2, 'W': 1, 'r': 1, 'd': 1}
('l', 3)
{'g': 3, 'c': 3, 'b': 2}
('c', 3)
{'g': 3, 'H': 3, 'A': 4, 'a': 3, 'c': 4, 'b': 2, ' ': 2, '1': 1, '2': 1, '3': 1}
('A', 4)
So: l and 3, c and 3, A and 4
def most_popular_character(my_string):
    history_l = [l for l in my_string] # each letter in string
    char_dict = {} # creating dict
    for item in history_l: # for each letter in string
        char_dict[item] = history_l.count(item)
    return [max(char_dict.values()), min(char_dict.values())]
I didn't understand the last part about the smallest letter, so I made this function return the maximum frequency and the minimum frequency as a list!
Use a Counter to count the characters, and use the max function to select the "biggest" character according to your two criteria.
>>> from collections import Counter
>>> def most_popular_character(my_string):
...     chars = Counter(my_string)
...     return max(chars, key=lambda c: (chars[c], -ord(c)))
...
>>> most_popular_character("HelloWorld")
'l'
>>> most_popular_character("gggcccbb")
'c'
Note that using max is more efficient than sorting the entire dictionary, because it only needs to iterate over the dictionary once and find the single largest item, as opposed to sorting every item relative to every other item.
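The same two-part criterion can also be written with min, which avoids the -ord(c) trick. This is just an equivalent sketch of the idea above, not a different algorithm: the key puts the higher count first (by negating it) and then the smaller character.

```python
from collections import Counter

def most_popular_character(my_string):
    """Return the most frequent character; ties go to the smaller character."""
    counts = Counter(my_string)
    # (-count, char): the smallest tuple has the largest count, then the lowest char
    return min(counts, key=lambda c: (-counts[c], c))

print(most_popular_character("HelloWorld"))
print(most_popular_character("gggcccbb"))
```

Like the max version, this makes a single pass over the distinct characters after counting.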
I've written a very naive token string search matcher. It's a little too naive though: with the following code, it brings back every artist in the artists list, because of how 'a r i z o n a' is tokenised.
import collections
import re
def __tokenised_match(artist, search_artist):
    matches = []
    if len(re.split(r'[\\\s/-]', search_artist)) > 1:
        # artist is a dict, so the sanitised name is read by key
        a = [artist['sanitisedOne'], search_artist]
        bag_of_words = [collections.Counter(re.findall(r'\w+', words)) for words in a]
        sumbags = sum(bag_of_words, collections.Counter())
        print(sumbags)
        for key, value in sumbags.items():
            if len(re.findall(r'\b({k})\b'.format(k=key), search_artist)) > 0 and value > 1:
                matches.append(artist)
    if len(matches):
        return matches
artists = [
    { 'artist': 'A R I Z O N A', 'sanitisedOne': 'a r i z o n a'},
    { 'artist': 'Wutang Clan', 'sanitisedOne': 'wutang clan'}
]
search_artist = 'a r i z o n a'
for artist in artists:
    print(__tokenised_match(artist, search_artist))
This creates a sumbags for each artist like this:
Counter({'a': 4, 'r': 2, 'i': 2, 'z': 2, 'o': 2, 'n': 2})
Counter({'a': 2, 'wutang': 1, 'clan': 1, 'r': 1, 'i': 1, 'z': 1, 'o': 1, 'n': 1})
This is something of an edge case, but I wonder how I can tighten things up against it. It would be fine for 'wutang clang' to match, but with single letters like this it's a bit much: it brings back every artist because 'a' matches twice.
The basic problem is that you return success on only a single match. This will kill your accuracy for any artist with an easily matched token in the name. We could tune your algorithm for matching a certain percentage of words, or for doing a bag-of-letters, intersection-over-union ratio, but ...
I recommend that you use something a bit stronger, such as a string-similarity measure, for which ready-made Python code is easy to find. Being already packaged, it's much easier to use than coding your own solution.
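As one possible illustration of the string-similarity suggestion, the standard library's difflib.SequenceMatcher gives a ratio between 0 and 1 for two strings. The answer does not name a specific library, so this choice, the helper names similarity and best_matches, and the 0.8 threshold are all assumptions made for the sketch.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()

def best_matches(artists, search_artist, threshold=0.8):
    """Return the artists whose sanitised name is close enough to the query.

    The threshold of 0.8 is an arbitrary starting point, not a tuned value.
    """
    return [artist for artist in artists
            if similarity(artist['sanitisedOne'], search_artist) >= threshold]

artists = [
    {'artist': 'A R I Z O N A', 'sanitisedOne': 'a r i z o n a'},
    {'artist': 'Wutang Clan', 'sanitisedOne': 'wutang clan'},
]

print(best_matches(artists, 'a r i z o n a'))
```

Because the whole name is compared, a single shared token such as 'a' no longer makes every artist match.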
I am wondering how I can go about condensing these elif statements into a method of some sorts. I also don't know how to go about storing a chosen coordinate so that I can perform checks of surrounding coordinates. I know my code is nooby, but so am I, I learn better starting with the long way :)
Below is how I'm going about storing a coordinate inside a variable. (Not sure this is even the right way to do it yet...)
grab = board[x][y]
if(SjumpX == 'A1'):
    grab = [0][0]
elif(SjumpX == 'A2'):
    grab = [0][1]
elif(SjumpX == 'A3'):
    grab = [0][2]
elif(SjumpX == 'A4'):
    grab = [0][3]
elif(SjumpX == 'B1'):
    grab = [1][0]
elif(SjumpX == 'B2'):
    grab = [1][1]
elif(SjumpX == 'B3'):
    grab = [1][2]
elif(SjumpX == 'B4'):
    grab = [1][3]
elif(SjumpX == 'C1'):
    grab = [2][0]
elif(SjumpX == 'C2'):
    grab = [2][1]
elif(SjumpX == 'C3'):
    grab = [2][2]
elif(SjumpX == 'C4'):
    grab = [2][3]
SjumpX is the coordinate of the piece my player wants to grab, and DjumpX is the coordinate of the destination. My logic is that if the player enters a coordinate (e.g. A1, B2, C3, ...), I can store that coordinate in the variable 'grab', then use that variable to test whether the destination coordinate is empty and whether the coordinate between the two holds an opposing player's piece.
Here is the board:
  1 2 3 4
A - X O X
B X O - O
C O X O X
This is where I'm checking that the "jumpable" destination coordinates are empty, based on the current coordinates of my 'grab' variable. In this case 'A3' <==> grab = [0][2]
if ((grab[x][y-2] == '-' or grab[x][y+2] == '-' or grab[x-2][y] == '-' or grab[x+2][y] == '-') and
    (grab[x][y-1] == 'X' or grab[x][y+1] == 'X' or grab[x-1][y] == 'X' or grab[x+1][y] == 'X')):
My main Questions Are:
1- How do I condense my huge elif statement list?
2- What is the correct format/process to store a coordinate to perform checks on surrounding coordinate content?
3- How can I condense my if statement that checks to see if the destination coordinate is empty('-').
We can make a map and then use it to initialize grab, i.e.:
field_map = {'A1':(0,0),'A2':(0,1)......}
if SjumpX in field_map:
    x,y = field_map[SjumpX]
    grab = board[x][y]
I think that helps.
Assuming you want to grab the board position corresponding to the value of SjumpX, the following one-liner would do the job.
grab = board[ord(SjumpX[0]) - 65][int(SjumpX[1]) - 1]
This converts the first letter of SjumpX (A, B, C, ...) to its ASCII ordinal value (65, 66, 67, ...). Since the offset is 65, subtracting it from the ordinal gives the row index you need (0, 1, 2, ...). The digit is converted with int, and 1 is subtracted to get a zero-based column index.
On the other hand, you could go for the direct method suggested in @khachik's comment.
grab = board[{'A':0, 'B':1, 'C':2}[SjumpX[0]]][int(SjumpX[1]) - 1]
This directly maps (A, B, C) to (0, 1, 2), although this statement would grow longer for larger boards (D, E, and so on).
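Wrapped in a small function, the ord-based conversion above might look like the sketch below. The function name parse_coord is invented for the example, and the board is copied from the question. It returns the (row, col) pair rather than the cell content, so the same pair can be reused to inspect neighbouring cells.

```python
def parse_coord(label):
    """Convert a label such as 'B3' into a zero-based (row, col) pair."""
    row = ord(label[0].upper()) - ord('A')   # 'A' -> 0, 'B' -> 1, ...
    col = int(label[1:]) - 1                 # '1' -> 0, '2' -> 1, ...
    return row, col

# Board from the question: rows A-C, columns 1-4.
board = [['-', 'X', 'O', 'X'],
         ['X', 'O', '-', 'O'],
         ['O', 'X', 'O', 'X']]

row, col = parse_coord('A3')
print(row, col, board[row][col])
```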
I have two suggestions:
First: Keep either an adjacency list or a matrix representation of the board (which one depends on your design; I personally like adjacency lists better).
# Adding only some of the values here
# (note: the name map shadows the built-in map())
map = {'A1': ['A2','B1'], 'A2': ['A1','A3', 'B2'], 'B1': ['A1','B2','C1']}
val_map = {'A1': '-', 'B1': 'X'}
grab = SjumpX
nearby_ele = {}
# You can also get the values by iterating over the list from next statement
nearby_ele[grab] = map[grab]
Second: Store the mapping from label to (row, col) in a dict, e.g. {'A1': (0,0), 'A2': (0,1)}. A dict lookup takes constant time, so you get the coordinate directly and quickly. Use a matrix representation for the board, as in:
map = {'A1': (0,0), 'A2': (0,1), 'A3': (0,2), 'A4': (0,3),
       'B1': (1,0), 'B2': (1,1), 'B3': (1,2), 'B4': (1,3),
       'C1': (2,0), 'C2': (2,1), 'C3': (2,2), 'C4': (2,3),
      }
val_map = [['-', 'X', 'O', 'X'], ['X', 'O', '-', 'O'],['O','X','O','X']]
grab = map[SjumpX]
nearby_ele = {}
# the four orthogonal neighbours; these are not checked against the board edges
nearby_ele[grab] = [(grab[0]-1,grab[1]), (grab[0]+1,grab[1]),
                    (grab[0],grab[1]-1), (grab[0],grab[1]+1)]
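For the third question (condensing the long if statement), one option is to loop over the four directions and test each jump separately, including a bounds check so that indices never run off the board. This is a sketch under the assumption, taken from the if statement in the question, that jumps are orthogonal, go over exactly one opposing piece, and land on an empty '-' cell. The names DIRECTIONS, in_bounds and possible_jumps are made up for the example.

```python
# Board from the question: rows A-C, columns 1-4.
board = [['-', 'X', 'O', 'X'],
         ['X', 'O', '-', 'O'],
         ['O', 'X', 'O', 'X']]

# (row step, col step) for up, down, left, right
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def in_bounds(board, r, c):
    """True if (r, c) is a valid index into the board."""
    return 0 <= r < len(board) and 0 <= c < len(board[0])

def possible_jumps(board, row, col, opponent='X'):
    """Return the (row, col) destinations reachable by jumping one opponent piece."""
    jumps = []
    for dr, dc in DIRECTIONS:
        mid_r, mid_c = row + dr, col + dc            # cell being jumped over
        dst_r, dst_c = row + 2 * dr, col + 2 * dc    # landing cell
        if (in_bounds(board, dst_r, dst_c)
                and board[mid_r][mid_c] == opponent
                and board[dst_r][dst_c] == '-'):
            jumps.append((dst_r, dst_c))
    return jumps

# Piece at A3, i.e. row 0, column 2
print(possible_jumps(board, 0, 2))
```

Checking each direction on its own also avoids a flaw in one big or/and expression, which can pair an empty destination in one direction with an opposing piece in another.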
I am a beginner in Python and, while trying to solve a coding problem, I got this error and don't understand why. I went through a couple of Q&As here, but they don't seem to solve my problem. Essentially, I am trying to iterate over the characters of a string and fill a dictionary with them, the characters being the keys and the values being the number of times each character appeared. So I'm trying the following:
def myfunc(mystring):
    for i in mystring:
        if charCounter[i]:
            charCounter[i] += 1
        charCounter[i] = 1
mystring = "hello! how are you ?"
myfunc(mystring)
and I'm getting the following error:
  File "xyq.py", line 3, in myfunc
    if charCounter[i]:
KeyError: 'h'
Can someone please suggest, where am I going wrong ? And if possible how can I improve the code ?
Thanks
You need to check if i is in charCounter before you try to retrieve it:
if i in charCounter:
    charCounter[i] += 1
else:
    charCounter[i] = 1
Or alternatively:
if charCounter.get(i):
...
if charCounter[i]:
throws KeyError if the key does not exist. What you want to do is use if i in charCounter: instead (written here with a snake_case name):
if i in char_counter:
    char_counter[i] += 1
else:
    char_counter[i] = 1
Alternatively, you could use get, which returns the value if the key exists, or the second (optional) argument if it doesn't:
char_counter[i] = char_counter.get(i, 0) + 1
However this counting pattern is so popular that a whole class exists for it: collections.Counter:
from collections import Counter
def my_func(my_string):
    return Counter(my_string)
Example:
>>> counts = my_func('hello! how are you ?')
>>> counts
Counter({' ': 4, 'o': 3, 'h': 2, 'l': 2, 'e': 2, '!': 1, 'r': 1, 'a': 1,
'?': 1, 'w': 1, 'u': 1, 'y': 1})
>>> counts[' ']
4
collections.Counter is a subclass of dict, so item access and the other dictionary operations work on it just as they do on an ordinary dictionary.
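To tie the variants in this answer together, here is a short sketch that counts the same string with dict.get, with collections.defaultdict (not mentioned above, added here as a further standard-library option) and with Counter, and checks that all three agree. The function names are made up for the example.

```python
from collections import Counter, defaultdict

def count_with_get(text):
    """Count characters with a plain dict and dict.get."""
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1   # missing keys start from 0
    return counts

def count_with_defaultdict(text):
    """Count characters with defaultdict(int), where missing keys default to 0."""
    counts = defaultdict(int)
    for ch in text:
        counts[ch] += 1
    return dict(counts)

text = 'hello! how are you ?'
by_get = count_with_get(text)
by_default = count_with_defaultdict(text)
by_counter = dict(Counter(text))

print(by_get == by_default == by_counter)
print(Counter(text).most_common(2))   # the two most frequent characters
```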
I have this kind of file (part):
H DX=615 DY=425 DZ=22.15 -AB C=0 T=0 R=999 *MM /"def" BX=2.5 BY=452.5 BZ=25 ;M20150710.
XBO X=100 Y=50 Z=5 V=1000 R=0 x=0 y=0 D=10 N="P" F=1 ;Test F1/10P.
...
which I want to convert to a new programming system. What I want to do is first read the header (H) and put the DX, DY and DZ values in respectively named variables. I managed to do this, but when I came to process my XBO line (a drilling, from which I need X, Y, Z, V, R, x, y, D, N, F and ;, also in separate variables) my code started looking very ugly very fast.
So I started over, and came up with this:
f = open("input.xxl") # open input file
for line in f:
    if Debug==1: print line
    for char in line:
        charbuffr=charbuffr+char
        if "H" in charbuffr:
            if Debug==1: print'HEADER found!'
            charbuffr=""
        if "XBO" in charbuffr:
            if Debug==1: print'XBO found!'
            charbuffr=""
This correctly identifies the separate commands H and XBO, but I'm kind of stuck now. I can use the same method to extract all the variables, from loops inside the H and XBO loops, but this does not seem like good coding...
Can anyone set me on the right foot please? I don't want a full solution, as I love coding (well my main job is coding for CNC machines, which seems easy now compared to Python), but would love to know which approach is best...
Instead of converting data types by hand, you could use ast.literal_eval. This helper function takes a list of the form ['a=2', 'b="abc"'] and converts it into a dictionary {'a': 2, 'b': 'abc'}:
import ast
def dict_from_row(row):
    """Convert a list of strings in the form 'name=value' into a dict."""
    res = []
    for entry in row:
        name, value = entry.split('=')
        res.append('"{name}": {value}'.format(name=name, value=value))
    dict_string = '{{{}}}'.format(', '.join(res))
    return ast.literal_eval(dict_string)
Now parsing the file becomes a bit simpler:
for line in f:
    row = line.split()
    if not row:
        continue
    if row[0] == 'H':
        header = dict_from_row(row[1:4])
    elif row[0] == 'XBO':  # compare the first token, not the first character of the line
        xbo = dict_from_row(row[1:11])
Results:
>>> header
{'DX': 615, 'DY': 425, 'DZ': 22.15}
>>> xbo
{'D': 10, 'F': 1, 'N': 'P', 'R': 0, 'V': 1000, 'X': 100, 'Y': 50, 'Z': 5, 'x': 0, 'y': 0}
As an inspiration, you can do something like this:
for raw_line in f:
    line = raw_line.split()
    if not line:
        continue
    if line[0] == 'H':
        header = {}
        for entry in line[1:4]:
            name, value = entry.split('=')
            header[name] = float(value)
    elif line[0] == 'XBO':
        xbo = {}
        for entry in line[1:11]:
            name, value = entry.split('=')
            try:
                xbo[name] = int(value)
            except ValueError:
                xbo[name] = value[1:-1] # strip the surrounding quotes
Now header contains the extents DX, DY and DZ:
{'DX': 615.0, 'DY': 425.0, 'DZ': 22.15}
and xbo the other values:
{'D': 10,
'F': 1,
'N': 'P',
'R': 0,
'V': 1000,
'X': 100,
'Y': 50,
'Z': 5,
'x': 0,
'y': 0}
Access the individual values in the dictionaries:
>>> header['DX']
615.0
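Both answers slice a fixed number of fields per command. If more commands need to be handled later, a generic line parser may scale better. The sketch below is one possible shape, not the answerers' code: the name parse_line is invented, it assumes that ';' only ever starts the trailing comment, that values contain no spaces, and it simply skips tokens without '=' (such as -AB or *MM), whose meaning is not given in the question.

```python
import ast

def parse_line(line):
    """Split one line into (command, parameters dict, comment)."""
    body, _, comment = line.partition(';')    # text after ';' is the comment
    tokens = body.split()
    if not tokens:
        return None, {}, comment.strip()
    command, params = tokens[0], {}
    for token in tokens[1:]:
        name, sep, value = token.partition('=')
        if not sep:
            continue                          # no '=': a flag, skipped in this sketch
        try:
            params[name] = ast.literal_eval(value)   # 615 -> int, 22.15 -> float, "P" -> str
        except (ValueError, SyntaxError):
            params[name] = value              # keep anything unparsable as raw text
    return command, params, comment.strip()

line = 'XBO X=100 Y=50 Z=5 V=1000 R=0 x=0 y=0 D=10 N="P" F=1 ;Test F1/10P.'
command, params, comment = parse_line(line)
print(command, params, comment)
```

The caller can then dispatch on command (for example with a dict mapping 'H' and 'XBO' to handler functions) instead of growing an if/elif chain.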