How to split up certain characters but not others? - python

I want to take an input of a string of elements and make one list with the atoms and the amount of that atom.
["H3", "He4"]
That section works; however, I also need to make a list of only the elements. It would look something like
["H", "He"]
However, when I try to split it into individual atoms, it comes out like this:
["H", "H", "He"]
Here is my current code for the function:
def molar_mass():
    nums = "0123456789"
    print("Please use the format H3 He4")
    elements = input("Please leaves spaces between elements and their multipliers: ")
    element_list = elements.split()
    print(element_list)
    elements_only_list = []
    for element_pair in element_list:
        for char in element_pair:
            if char not in nums:
                elements_only_list.append(char)
        test = element_pair.split()
        print(test)
    print(elements_only_list)
I'm aware that there is a library for something similar, however I don't wish to use it.

Your problem here is that you are appending each non-numeric character to elements_only_list, as a new element of that list. You want instead to get the portion of element_pair that contains non-numeric characters, and append that string to the list. A simple way to do this is to use the rstrip method to remove the numeric characters from the end of the string.
for element_pair in element_list:
    element_only = element_pair.rstrip(nums)
    elements_only_list.append(element_only)
It could also be done using regular expressions, but that's more complicated than you need right now.
FYI, you don't really need your nums variable. The string module contains constants for various standard groups of characters. In this case you could import string and use string.digits.
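Putting the two suggestions together, a minimal sketch might look like the following. The function name elements_only is just an illustration, and it assumes every pair is an element symbol followed by an optional run of digits, separated by whitespace.

```python
import string

def elements_only(formula):
    """Return the element symbols from a string such as 'H3 He4'."""
    # rstrip removes any trailing characters found in string.digits,
    # leaving only the symbol part of each pair.
    return [pair.rstrip(string.digits) for pair in formula.split()]

print(elements_only("H3 He4"))  # ['H', 'He']
```

Because rstrip only strips from the right-hand end, a symbol with no multiplier (for example "O") passes through unchanged.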

To my understanding, you will have user input such as H3 He4 and expect the output to be ['H','He']. I modified your function accordingly:
def molar_mass():
    print("Please use the format H3 He4")
    elements = input("Please leaves spaces between elements and their multipliers: ")
    element_list = elements.split() # splits text to a list
    print(element_list)
    results = []
    for elem in element_list: # loops over elements list
        # separate digits from characters in a list and replace digits with ''
        el1 = list(map(lambda x: x if not x.isdigit() else '' , elem))
        el2 = ''.join(el1)
        results.append(el2)
    return results
print(molar_mass())
Using this function with the input below:
H3 He4
output will be:
['H','He']

Related

Is there a way to detect words without searching for whitespace or underscores

I am trying to write a CLI for generating Python classes. Part of this requires validating the identifiers provided in user input, and for Python this means making sure that identifiers conform to the PEP 8 naming conventions: classes in CapsCase, fields in all_lowercase_with_underscores, packages and modules, and so on and so forth.
# it is easy to correct when there is an identifier
# with underscores or whitespace and correcting for a class
def package_correct_convention(item):
    return item.strip().lower().replace(" ","").replace("_","")
But when there is no whitespace or underscore between tokens, I'm not sure how to correctly capitalize the first letter of each word in an identifier. Is it possible to implement something like that without using AI or something similar?
say for example:
# providing "ClassA" returns "classa" because there is no delimiter between "class" and "a"
def class_correct_convention(item):
    if item.count(" ") or item.count("_"):
        # checking whether space or underscore was used as word delimiter.
        if item.count(" ") > item.count("_"):
            item = item.split(" ")
        elif item.count(" ") < item.count("_"):
            item = item.split("_")
        item = list(map(lambda x: x.title(), item))
        return ("".join(item)).replace("_", "").replace(" ","")
    # if there is no white space, best we can do is capitalize first letter
    return item[0].upper() + item[1:]
Well, an AI-based approach would be difficult, imperfect, and a lot of work. If that is not worth it, there may be a simpler approach that is probably comparably effective.
I understand the worst-case scenario is "todelineatewordsinastringlikethat".
I would recommend downloading a text file of English words, one word per line, and proceeding this way:
import re
string = "todelineatewordsinastringlikethat"
#with open("mydic.dat", "r") as msg:
#    lst = msg.read().splitlines()
lst = ['to','string','in'] #Let's say the dict contains 3 words
lst = sorted(lst, key=len, reverse = True)
replaced = []
for elem in lst:
    if elem in string: #Very fast
        replaced_str = " ".join(replaced) #Faster to check elem in a string than elem in a list
        capitalized = elem[0].upper()+elem[1:] #Prepare your capitalized word
        if elem not in replaced_str: #Check if elem could be a substring of something you replaced already
            string = re.sub(elem,capitalized,string)
        elif elem in replaced_str: #If elem is a sub of something you replaced, you'll protect
            protect_replaced = [item for item in replaced if elem in item] #Get the list of replaced items containing the substring elem
            for protect in protect_replaced: #Uppercase the whole word to protect, as we do a case sensitive re.sub()
                string = re.sub(protect,protect.upper(),string)
            string = re.sub(elem,capitalized,string)
            for protect in protect_replaced: #Deprotect by doing the reverse, full uppercase to capitalized
                string = re.sub(protect.upper(),protect,string)
        replaced.append(capitalized) #Append replaced element in the list
print(string)
Output:
TodelIneatewordsInaStringlikethat
You can see that String has been protected but delIneate has not, because "delineate" was not in our dict.
This is certainly not optimal, but it will probably perform comparably to AI on a problem that would not be presented to an AI in this form anyway (input preparation is very important in AI).
Note that it is important to sort the word list by length in descending order, because you want to detect full words first, not their substrings. For example, in "beforehand" you want the full word, not "before" or "and".
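As an alternative to repeated re.sub calls, the same word-list idea can be sketched with a small dynamic-programming segmentation. This is a different technique from the code above, not a rewrite of it. The names segment and to_caps_case and the tiny word list are made up for illustration, and a real run would load a full dictionary file.

```python
def segment(text, words):
    """Split text into dictionary words; return None if no full split exists."""
    words = set(words)
    max_len = max(map(len, words), default=0)
    # best[i] holds one segmentation of text[:i], or None if none was found
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        # Try the earliest start first, so longer words win over their substrings
        for start in range(max(0, end - max_len), end):
            if best[start] is not None and text[start:end] in words:
                best[end] = best[start] + [text[start:end]]
                break
    return best[len(text)]

def to_caps_case(text, words):
    """Capitalize each detected word; fall back to capitalizing the first letter."""
    parts = segment(text.lower(), words)
    if parts is None:
        return text[:1].upper() + text[1:]
    return "".join(part.capitalize() for part in parts)

print(to_caps_case("classa", ["class", "a"]))  # ClassA
```

Unlike the substitution loop, this only accepts a split when every character belongs to some dictionary word, so a word missing from the list leaves the identifier untouched apart from its first letter.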

How to make shortcut of first letters of any text?

I need to write a function that returns the first letters (and make it uppercase) of any text like:
shortened = shorten("Don't repeat yourself")
print(shortened)
Expected output:
DRY
and:
shortened = shorten("All terrain armoured transport")
print(shortened)
Expected output:
ATAT
Use a list comprehension and join:
shortened = "".join([x[0] for x in text.title().split(' ') if x])
Using regex you can match all characters except the first letter of each word, replace them with an empty string to remove them, then capitalize the resulting string:
import re
def shorten(sentence):
    return re.sub(r"\B[\S]+\s*","",sentence).upper()
print(shorten("Don't repeat yourself"))
Output:
DRY
text = 'this is a test'
output = ''.join(char[0] for char in text.title().split(' '))
print(output)
TIAT
Let me explain how this works.
My first step is to capitalize the first letter of each word:
text.title()
Next I separate the words at the spaces between them, which gives a list:
text.title().split(' ')
That gives me ['This', 'Is', 'A', 'Test'], so now I only want the first character of each word in the list:
for word in text.title().split(' '):
    print(word[0]) # T I A T
Now I can lump all that into something called a list comprehension:
output = [char[0] for char in text.title().split(' ')]
# ['T','I','A','T']
Finally I can use ''.join() to combine them. I don't need the [] brackets anymore because join accepts any iterable, so it doesn't need to be a list:
output = ''.join(char[0] for char in text.title().split(' '))
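Wrapped up as the shorten function the question asks for, a minimal sketch could look like this. It assumes words are separated by whitespace, and it differs slightly from the one-liners above: split() with no argument is used so that repeated or leading spaces never produce empty words, and upper() is applied to each first letter instead of calling title() on the whole text.

```python
def shorten(text):
    """Return the uppercased first letter of every whitespace-separated word."""
    return "".join(word[0].upper() for word in text.split())

print(shorten("Don't repeat yourself"))           # DRY
print(shorten("All terrain armoured transport"))  # ATAT
```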

how to split String with a "pattern" from list

I've got a small problem with finding part of string in a list with python.
I load the string from a file and the value is one of the following: (none, 1 from list, 2 from list, 3 from list or more...)
I need to perform different actions depending on whether the String equals "", the String equals 1 element from list, or if the String is for 2 or more elements. For Example:
List = [ 'Aaron', 'Albert', 'Arcady', 'Leo', 'John' ... ]
String = "" #this is just example
String = "Aaron" #this is just example
String = "AaronAlbert" #this is just example
String = "LeoJohnAaron" #this is just example
I created something like this:
if String == "": #this works well on Strings with 0 values
    print("something")
elif String in List: #this works well on Strings with 1 value
    print("something else")
elif ... #dont know what now
The best way would be to split this String with a pattern from a list. I was trying:
String.Find(x) #failed.
I tried to find similar posts but couldn't.
if String == "": #this works well on Strings with 0 values
    print("something")
elif String in List: #this works well on Strings with 1 value
    print("something else")
elif len([1 for x in List if x in String]) == 2:
    ...
The expression inside len() is called a list comprehension. It goes through the list and collects a 1 for every list element that occurs as a substring of the string at hand, and len() then counts them.
Note that there may be some issues if you have names like "Ann" and "Anna": the name "Anna" in the string will get counted twice. If you need a solution that accounts for that, I would suggest splitting the string on capital letters to explicitly separate it into individual names (if you want, I can update this solution to show how to do that with regex).
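For the capital-letter split mentioned above, one possible sketch uses re.findall. It assumes every name starts with a single ASCII capital followed only by lowercase letters, and the helper names split_names and count_known_names are made up for this example.

```python
import re

def split_names(text):
    """Split a run-together string such as 'LeoJohnAaron' at capital letters."""
    return re.findall(r"[A-Z][a-z]*", text)

def count_known_names(text, names):
    """Count how many of the split-out parts are exact members of names."""
    known = set(names)
    return sum(1 for part in split_names(text) if part in known)

names = ['Aaron', 'Albert', 'Arcady', 'Leo', 'John']
print(split_names("LeoJohnAaron"))               # ['Leo', 'John', 'Aaron']
print(count_known_names("LeoJohnAaron", names))  # 3
```

Because each part is compared as a whole word, "Anna" is no longer counted as both "Ann" and "Anna".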
I think the most straightforward approach would be to loop over the list of names and, for each of them, check if it is in your string.
for name in List:
    if name in String:
        print("do something here")
So, you want to find whether some string contains any members of the given list.
Iterate over the list and check whether the string contains the current item:
for data in List:
    if data in String:
        print("Found it!")

Optionally replacing a substring python

My list of replacement is in the following format.
lstrep = [('A',('aa','aA','Aa','AA')),('I',('ii','iI','Ii','II')),.....]
What I want to achieve is optionally change the occurrence of the letter by all the possible replacements. The input word should also be a member of the list.
e.g.
input - DArA
Expected output -
['DArA','DaarA','Daaraa','DAraa','DaArA','DAraA','DaAraA','DAarA','DAarAa', 'DArAa','DAArA','DAArAA','DArAA']
My try was
lstrep = [('A',('aa','aA','Aa','AA'))]
def alte(word,lstrep):
    output = [word]
    for (a,b) in lstrep:
        for bb in b:
            output.append(word.replace(a,bb))
    return output
print(alte('DArA',lstrep))
The output I received was ['DArA', 'Daaraa', 'DaAraA', 'DAarAa', 'DAArAA'] i.e. All occurrences of 'A' were replaced by 'aa','aA','Aa' and 'AA' respectively. What I want is that it should give all permutations of optional replacements.
itertools.product will give all of the combinations (the Cartesian product of the choices at each position). You can build up a list of substitutions and then let it handle the combinations.
import itertools
lstrep = [('A',('aa','aA','Aa','AA')),('I',('ii','iI','Ii','II'))]
input_str = 'DArA'
# make substitution list a dict for easy lookup
lstrep_map = dict(lstrep)
# a substitution is an index plus a string to substitute. build
# list of subs [[(index1, sub1), (index1, sub2)], ...] for all
# characters in lstrep_map.
subs = []
for i, c in enumerate(input_str):
    if c in lstrep_map:
        subs.append([(i, sub) for sub in lstrep_map[c]])
# build output by applying each sub recorded
out = [input_str]
for sub in itertools.product(*subs):
    # make input a list for easy substitution
    input_list = list(input_str)
    for i, cc in sub:
        input_list[i] = cc
    out.append(''.join(input_list))
print(out)
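Note that the product above substitutes every replaceable character in each result, so strings such as 'DaarA', where only one A changes, never appear. If each occurrence should be optional, as the question asks, one possible variation is to include the original character among the choices at every position. This is only a sketch, and the function name optional_replacements is made up for illustration.

```python
import itertools

def optional_replacements(word, lstrep):
    """Return every variant of word where each replaceable character is
    either kept as-is or swapped for one of its replacements."""
    options = dict(lstrep)
    # Per character: the character itself first, then its replacements (if any)
    choices = [(c,) + tuple(options.get(c, ())) for c in word]
    return ["".join(combo) for combo in itertools.product(*choices)]

result = optional_replacements('DArA', [('A', ('aa', 'aA', 'Aa', 'AA'))])
print(result[0])    # DArA
print(len(result))  # 25
```

With two replaceable positions and five choices each (the original plus four replacements) this yields 25 strings, and the unchanged input comes first because the original character is listed first at every position.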
Try constructing tuples of all possible combinations based on the replaceable characters that occur. One way to achieve this is recursion.
The reason recursion helps is that, written as plain loops, you would need a variable number of nested loops to achieve this.
For your example "DArA" (2 replaceable characters, "A" and "A"):
replaceSet = set()
replacements = {'A': ('aa','aA','Aa','AA'), 'I': ('ii','iI','Ii','II')}  # ..... more entries
for replacement1 in replacements["A"]:
    for replacement2 in replacements["A"]:
        replaceSet.add((replacement1, replacement2))
You see you need two loops for two replaceables, and n loops for n replaceables.
Think of a way you could use recursion to solve this problem. It will likely involve creating all permutations for a substring that contains n-1 replaceables (if you had n in your original string).
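One way the recursive idea could be sketched: handle the first character, recurse on the rest of the string, and combine the two. The function name expand is hypothetical, and the sketch assumes the replacements are stored in a dict keyed by single characters, with the original character always kept as one of the options.

```python
def expand(word, replacements):
    """Recursively list every variant of word with optional replacements."""
    if not word:
        return [""]  # base case: one way to build the empty string
    head, tail = word[0], word[1:]
    # All variants of the remainder, which has one fewer character to decide
    tails = expand(tail, replacements)
    # The first character may stay as it is or take any of its replacements
    heads = (head,) + tuple(replacements.get(head, ()))
    return [h + t for h in heads for t in tails]

variants = expand("DArA", {'A': ('aa', 'aA', 'Aa', 'AA')})
print(len(variants))  # 25
```

Each call decides one character and delegates the rest, which is what stands in for the variable number of nested loops.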

How to replace a list of words with a string and keep the formatting in python?

I have a list containing the lines of a file.
list1[0]="this is the first line"
list1[1]="this is the second line"
I also have a string.
example="TTTTTTTaaaaaaaaaabcccddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeefffff"
I want to replace list1[0] with the string (example). However, I want to keep the word lengths. For example the new list1[0] should be "TTTT TT TTa aaaaa aaaa". The only solution I could come up with was to turn the string example into a list and use a for loop to read letter by letter from the string list into the original list.
for line in open(input, 'r'):
    list1[i] = listString[i]
    i=i+1
However this does not work from what I understand because Python strings are immutable? What's a good way for a beginner to approach this problem?
I'd probably do something like:
orig = "this is the first line"
repl = "TTTTTTTaaaaaaaaaabcccddeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeefffff"
def replace(orig, repl):
    r = iter(repl)
    result = ''.join([' ' if ch.isspace() else next(r) for ch in orig])
    return result
If repl could be shorter than orig, consider r = itertools.cycle(repl) (this needs import itertools).
This works by creating an iterator out of the replacement string, then iterating over the original string, keeping the spaces, but using the next character from the replacement string instead of any non-space characters.
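As a sketch of the itertools.cycle variant, assuming it is acceptable for the replacement text to wrap around when it runs out: the function name replace_keep_spacing is illustrative, and unlike the version above it keeps the original whitespace character rather than always emitting ' '.

```python
import itertools

def replace_keep_spacing(orig, repl):
    """Fill the non-space positions of orig with characters from repl,
    cycling through repl again if it is shorter than needed."""
    if not repl:
        raise ValueError("repl must not be empty")
    chars = itertools.cycle(repl)
    return "".join(ch if ch.isspace() else next(chars) for ch in orig)

print(replace_keep_spacing("this is the first line", "ab"))
# abab ab aba babab abab
```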
The other approach you could take would be to note the indexes of the spaces in one pass through orig, then insert them at those indexes in a pass of repl and return a slice of the result
def replace(orig, repl):
    spaces = [idx for idx,ch in enumerate(orig) if ch.isspace()]
    repl = list(repl)
    for idx in spaces:
        # add a space before that index
        repl.insert(idx, " ")
    return ''.join(repl[:len(orig)])
However, I can't imagine the second approach being any faster, it is certain to be less memory-efficient, and I don't find it easier to read (in fact I find it HARDER to read!). It also doesn't have a simple workaround if repl is shorter than orig (I guess you could do repl *= 2, but that's uglier than sin and still doesn't guarantee it'll work).
