I have a homework problem that confuses me. Here it is:
The input is a string containing a linear equation of the form "A + B = C",
but part of one of A, B or C is illegible.
For example:
"1# + 24 = 34" or "5131 + #251 = 76382"
Note that only one of A, B or C is affected, and '#' can stand for more than one digit.
(If the input is "10# + 50 = 10052", the output should be "10002 + 50 = 10052".)
So here is my question: how can I highlight or select the part of this string that contains '#'?
I searched on RegExr but couldn't find a pattern that matches my situation.
This retrieves the part of the string that contains '#':
import re
textExample = "5131 + #251 = 76382"
re.findall(r'[^ ]*#[^ ]*',textExample)
In case the expression does not always separate operators and numbers with spaces, you should search for a preceding or subsequent digit around the pound sign:
import re
equation = "5131 + #251 = 76382"
r = re.findall(r"((?<=\d)#|#(?=\d))",equation)
If you only intend to replace the pound sign with some digits, you don't need to find or highlight it. Simply use the built-in string replace method:
equality = equation.replace("#","71") #==> '5131 + 71251 = 76382'
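Since the underlying task is to recover the hidden digits, the masked operand can also be located by splitting on the operators, and its value computed from the other two numbers. The following is only a sketch: it assumes the equation always has the shape "A + B = C" with non-negative integers and exactly one masked operand, and the function name solve_masked is made up for illustration.

```python
import re

def solve_masked(equation):
    """Fill in the '#' part of an 'A + B = C' string (hypothetical helper)."""
    # Split on the two operators and trim the surrounding spaces.
    a, b, c = (part.strip() for part in re.split(r"[+=]", equation))
    if "#" in a:
        masked, value = a, int(c) - int(b)
    elif "#" in b:
        masked, value = b, int(c) - int(a)
    else:
        masked, value = c, int(a) + int(b)
    # Each '#' may hide one or more digits; the visible digits must still match.
    shape = "".join(r"\d+" if ch == "#" else ch for ch in masked)
    if value < 0 or not re.fullmatch(shape, str(value)):
        raise ValueError("no digits make this equation true: " + equation)
    return equation.replace(masked, str(value), 1)

print(solve_masked("10# + 50 = 10052"))     # 10002 + 50 = 10052
print(solve_masked("5131 + #251 = 76382"))  # 5131 + 71251 = 76382
```

The check against shape is there so that an input such as "9# + 1 = 5" raises an error instead of silently producing a number that contradicts the visible digits.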
I am trying to pull a substring out of a function result, but I'm having trouble figuring out the best way to strip the necessary string out using Python.
Output Example:
[<THIS STRING-STRING-STRING THAT THESE THOSE>]
In this example, I would like to grab "STRING-STRING-STRING" and throw away all the rest of the output. In this example, "[<THIS " &" THAT THESE THOSE>]" are static.
There are many ways to solve this. Here are two examples:
First one is a simple replacement of your unwanted characters.
targetstring = '[<THIS STRING-STRING-STRING THAT THESE THOSE>]'
#ALTERNATIVE 1
newstring = targetstring.replace(r" THAT THESE THOSE>]", '').replace(r"[<THIS ", '')
print(newstring)
and this drops everything except your target pattern:
#ALTERNATIVE 2
match = "STRING-STRING-STRING"
start = targetstring.find(match)
stop = len(match)
targetstring[start:start+stop]
These can be shortened, but I thought it might be useful for the OP to have them written out.
I found this extremely useful, might be of help to you as well: https://www.computerhope.com/issues/ch001721.htm
If by '"[<THIS " &" THAT THESE THOSE>]" are static' you mean that they are always the exact same string, then:
s = "[<THIS STRING-STRING-STRING THAT THESE THOSE>]"
before = len("[<THIS ")
after = len(" THAT THESE THOSE>]")
s[before:-after]
# 'STRING-STRING-STRING'
Like so (as long as the position of the characters in the string doesn't change):
myString = "[<THIS STRING-STRING-STRING THAT THESE THOSE>]"
myString = myString[7:27]
Another alternative:
import re
my_str = "[<THIS STRING-STRING-STRING THAT THESE THOSE>]"
string_pos = [(m.start(), m.end()) for m in re.finditer('STRING-STRING-STRING', my_str)]
start, end = string_pos[0]
print(my_str[start:end])  # m.end() is already exclusive, so no "+ 1" is needed
STRING-STRING-STRING
If STRING-STRING-STRING occurs multiple times in the string, string_pos holds the start and end indexes of each occurrence as tuples.
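When the wanted text is not known in advance and only the surrounding parts are static, a capture group between the two escaped delimiters is another option. This is a minimal sketch; the function name extract_between and its default arguments are assumptions taken from the example string in the question.

```python
import re

def extract_between(text, prefix="[<THIS ", suffix=" THAT THESE THOSE>]"):
    """Return the text between the static prefix and suffix, or None."""
    # re.escape makes '[' and ']' match literally instead of opening a character class.
    pattern = re.escape(prefix) + r"(.*)" + re.escape(suffix)
    match = re.fullmatch(pattern, text)
    return match.group(1) if match else None

print(extract_between("[<THIS STRING-STRING-STRING THAT THESE THOSE>]"))
# STRING-STRING-STRING
```

Unlike fixed slicing, this returns None when the input does not have the expected wrapper, which makes malformed output easier to notice.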
The transform_comments function converts comments in a Python script into those usable by a C compiler. This means looking for text that begins with a hash mark (#) and replacing it with double slashes (//), which is the C single-line comment indicator. For the purpose of this exercise, we'll ignore the possibility of a hash mark embedded inside of a Python command, and assume that it's only used to indicate a comment. We also want to treat repetitive hash marks (##), (###), etc., as a single comment indicator, to be replaced with just (//) and not (#//) or (//#). Fill in the parameters of the substitution method to complete this function.
This is my try:
import re
def transform_comments(line_of_code):
    result = re.sub(r'###',r'//', line_of_code)
    return result
print(transform_comments("### Start of program"))
# Should be "// Start of program"
print(transform_comments(" number = 0 ## Initialize the variable"))
# Should be " number = 0 // Initialize the variable"
print(transform_comments(" number += 1 # Increment the variable"))
# Should be " number += 1 // Increment the variable"
print(transform_comments(" return(number)"))
# Should be " return(number)"
Use the * regex operator:
def transform_comments(line_of_code):
    result = re.sub(r'##*',r'//', line_of_code)
    return result
From the re library docs:
* Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match a, ab, or a followed by any number of bs.
We could use the + to indicate one or more occurrences of #
result = re.sub(r"#+",r"//",line_of_code)
import re
def transform_comments(line_of_code):
    result = re.sub(r"#{1,}",r"//", line_of_code)
    return result
The code below works:
result = re.sub(r"[#]+","//",line_of_code)
import re
def transform_comments(line_of_code):
    result = re.sub(r'(#*#) ',r'// ',line_of_code)
    return result
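Any of the patterns above can be checked against the expected outputs listed in the question. The sketch below does this for the "#+" variant; the fifth case (a hash mark with no space after it) is not from the exercise and is added here only to show where the variants differ, since a pattern that requires a trailing space, such as '(#*#) ', would leave that line unchanged.

```python
import re

def transform_comments(line_of_code):
    # '#+' matches one or more consecutive hash marks as a single unit.
    return re.sub(r"#+", "//", line_of_code)

expected = {
    "### Start of program": "// Start of program",
    " number = 0 ## Initialize the variable": " number = 0 // Initialize the variable",
    " number += 1 # Increment the variable": " number += 1 // Increment the variable",
    " return(number)": " return(number)",
    # Extra case, not part of the original exercise:
    "#no space after the hash": "//no space after the hash",
}

for source, wanted in expected.items():
    assert transform_comments(source) == wanted, source
print("all cases pass")
```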
I have this string:
-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)
but actually I have many strings like this:
a*p**(-1.0) + b*p**(c)
where a, b and c are doubles. I would like to extract a, b and c from such a string. How can I do this using Python?
import re
s = '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'
pattern = r'-?\d+\.\d*'
a,_,b,c = re.findall(pattern,s)
print(a, b, c)
Output (Python 3)
-1007.88670550662 67293.8347365694 -0.416543501823503
s is the test string and pattern is the regex pattern, which looks for floats. findall() returns every match, and we unpack the matches into a, b and c, discarding the second one (the fixed exponent -1.0) with _.
Note that this method works only if your strings follow the format you've given; otherwise, adjust the pattern to match what you want.
Edit: as several people stated in the comments, if positive numbers may carry a leading +, you can use the pattern r'[-+]?\d+\.\d*'
Using the regular expression
(-?\d+\.?\d*)\*p\*\*\(-1\.0\)\s*\+\s*(-?\d+\.?\d*)\*p\*\*\((-?\d+\.?\d*)\)
We can do
import re
pat = r'(-?\d+\.?\d*)\*p\*\*\(-1\.0\)\s*\+\s*(-?\d+\.?\d*)\*p\*\*\((-?\d+\.?\d*)\)'
regex = re.compile(pat)
print(regex.findall('-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'))
This prints [('-1007.88670550662', '67293.8347365694', '-0.416543501823503')]
If your formats are consistent, and you don't want to deep dive into regex (check out regex101 for this, btw) you could just split your way through it.
Here's a start:
>>> s= "-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)"
>>> a, buf, c = s.split("*p**")
>>> b = buf.split()[-1]
>>> a,b,c
('-1007.88670550662', '67293.8347365694', '(-0.416543501823503)')
>>> [float(x.strip("()")) for x in (a,b,c)]
[-1007.88670550662, 67293.8347365694, -0.416543501823503]
The re module can certainly be made to work for this, although as some of the comments on the other answers have pointed out, the corner cases can be interesting -- decimal points, plus and minus signs, etc. It could be even more interesting; e.g. can one of your numbers be imaginary?
Anyway, if your string is always a valid Python expression, you can use Python's built-in tools to process it, such as the ast module's NodeVisitor class. Using it for your example is quite simple:
import ast
x = "-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)"
def getnums(s):
    result = []
    class GetNums(ast.NodeVisitor):
        # ast.Constant replaces ast.Num, which has been deprecated since Python 3.8
        def visit_Constant(self, node):
            result.append(node.value)
        def visit_UnaryOp(self, node):
            # a negative literal is parsed as a unary minus applied to a positive constant
            if (isinstance(node.op, ast.USub) and
                    isinstance(node.operand, ast.Constant)):
                result.append(-node.operand.value)
            else:
                self.generic_visit(node)
    GetNums().visit(ast.parse(s))
    return result
print(getnums(x))
This will return a list with all the numbers in your expression:
[-1007.88670550662, -1.0, 67293.8347365694, -0.416543501823503]
The visit_UnaryOp method is only required for Python 3.x.
You can use something like:
import re
subject = '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'
a,_,b,c = re.findall(r"[\d\-.]+", subject)
print(a,b,c)
While I prefer MooingRawr's answer as it is simple, I would extend it a bit to cover more situations.
A floating point number can be converted to a string in a surprising variety of formats:
Exponential format (e.g. 2.0e+07)
Without leading digit (e.g. .5, which is equal to 0.5)
Without trailing digit (e.g. 5., which is equal to 5)
Positive numbers with plus sign (e.g. +5, which is equal to 5)
Numbers without decimal part (integers) (e.g. 0 or 5)
Script
import re
test_values = [
    '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)',
    '-2.000e+07*p**(-1.0) + 1.23e+07*p**(-5e+07)',
    '+2.*p**(-1.0) + -1.*p**(5)',
    '0*p**(-1.0) + .123*p**(7.89)'
]
pattern = r'([-+]?\.?\d+\.?\d*(?:[eE][-+]?\d+)?)'
for value in test_values:
    print("Test with '%s':" % value)
    matches = re.findall(pattern, value)
    del matches[1]  # drop the fixed exponent -1.0
    print(matches, end='\n\n')
Output:
Test with '-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)':
['-1007.88670550662', '67293.8347365694', '-0.416543501823503']
Test with '-2.000e+07*p**(-1.0) + 1.23e+07*p**(-5e+07)':
['-2.000e+07', '1.23e+07', '-5e+07']
Test with '+2.*p**(-1.0) + -1.*p**(5)':
['+2.', '-1.', '5']
Test with '0*p**(-1.0) + .123*p**(7.89)':
['0', '.123', '7.89']
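The answers above pick a, b and c by position in the list of matches, which breaks silently if the string has a different number of terms. A possible alternative is to capture each coefficient together with its exponent, so every number stays attached to its term. This is a sketch under the assumption that every term has the form k*p**(e); the names TERM and coefficients are made up for illustration.

```python
import re

# One term: a coefficient, the literal "*p**(", an exponent, and ")".
TERM = re.compile(
    r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"  # coefficient, e.g. -2.000e+07 or .5
    r"\*p\*\*\(([^)]*)\)"                           # *p**( exponent )
)

def coefficients(expr):
    """Return (a, b, c) as floats for 'a*p**(-1.0) + b*p**(c)' (hypothetical helper)."""
    terms = [(float(k), float(e)) for k, e in TERM.findall(expr)]
    if len(terms) != 2:
        raise ValueError("expected exactly two terms in: " + expr)
    (a, _), (b, c) = terms
    return a, b, c

print(coefficients('-1007.88670550662*p**(-1.0) + 67293.8347365694*p**(-0.416543501823503)'))
# (-1007.88670550662, 67293.8347365694, -0.416543501823503)
```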
So I'm kind of new to Python. Right now I'm making a chemical equation balancer, and I'm stuck: when I receive a compound in parentheses with a subscript outside, like (NaCl)2, I want to expand it to the form Na2Cl2 (and also get rid of the parentheses). So far I've only managed to get rid of the parentheses with this code:
import string
import re
linealEquation = ""
def linealEq(equation):
    # missing code
    # string.ascii_letters and print() are the Python 3 spellings
    allow = string.ascii_letters + string.digits + '+' + '-' + '>'
    linealEquation = re.sub('[^%s]' % allow, '', equation)
    print(linealEquation)
linealEq("(CrNa)2 -> Cr+Na")
But how can I walk through the string and apply the subscript outside the parentheses to each element inside?
I know how to iterate over a string, but I can't work out how to multiply the subscripts specifically.
Thanks for the help.
Probably not the shortest solution and won't work in all cases, but works for your example:
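import re
equation = "(CrNa)2 -> Cr+Na"
left, right = equation.split('->')
exp = left.strip()[-1]                    # the subscript: '2'
inside = left[1:-3]                       # text between the parentheses: 'CrNa'
c2 = re.findall('[A-Z][^A-Z]*', inside)   # split before each capital: ['Cr', 'Na']
l = [s + exp for s in c2]
res =''.join(l)                           # 'Cr2Na2'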
N.B. you can add print statements to better understand each step...
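A somewhat more general approach is to let re.sub call a function for every parenthesised group, so the group can sit anywhere in the equation and the elements inside may already carry their own subscripts. This is only a sketch: it assumes element symbols are one capital letter followed by optional lowercase letters, it does not handle nested parentheses, and the names GROUP, ELEMENT and expand_groups are made up for illustration.

```python
import re

GROUP = re.compile(r"\(([^()]+)\)(\d+)")    # "(...)" followed by a subscript
ELEMENT = re.compile(r"([A-Z][a-z]*)(\d*)")  # element symbol plus optional count

def expand_groups(formula):
    """Multiply the subscript after each parenthesised group into the group."""
    def expand(match):
        inner, factor = match.group(1), int(match.group(2))
        # A missing count means 1, so "Na" inside "(NaCl)2" becomes "Na2".
        return "".join(
            f"{symbol}{int(count or 1) * factor}"
            for symbol, count in ELEMENT.findall(inner)
        )
    return GROUP.sub(expand, formula)

print(expand_groups("(CrNa)2 -> Cr+Na"))  # Cr2Na2 -> Cr+Na
print(expand_groups("Al2(SO4)3"))         # Al2S3O12
```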
I have a number of codes which I need to process, and these come through in a number of different formats which I need to manipulate first to get them in the right format:
Examples of codes:
ABC1.12 - correct format
ABC 1.22 - space between letters and numbers
ABC1.12/13 - 2 codes joined together and leading 1. missing from 13, should be ABC1.12 and ABC1.13
ABC 1.12 / 1.13 - codes joined together and spaces
I know how to remove the spaces but am not sure how to handle the codes which have been split. I know I can use the split function to create 2 codes but not sure how I can then append the letters (and first number part) to the second code. This is the 3rd and 4th example in the list above.
WHAT I HAVE SO FAR
val = "ABC1.12/13"  # one of the example codes above
retList = [val]
if "/" in val:
    (code1, code2) = session_codes = val.split("/", 1)
    (initial_letters, numbers) = code1.split(".", 1)
    if initial_letters not in code2:
        code2 = initial_letters + '.' + code2
    # reset list so that it returns both values
    retList = [code1, code2]
This doesn't handle the fourth example, because the prefix is prepended to a second code that already contains "1.", so code2 ends up as ABC1.1.13.
You can use a regex for this purpose.
A possible implementation would be as follows:
>>> import re
>>> def foo(st):
        parts=st.replace(' ','').split("/")
        parts=list(re.findall("^([A-Za-z]+)(.*)$",parts[0])[0])+parts[1:]
        parts=parts[0:1]+[x.split('.') for x in parts[1:]]
        parts=parts[0:1]+['.'.join(x) if len(x) > 1 else '.'.join([parts[1][0],x[0]]) for x in parts[1:]]
        return [parts[0]+p for p in parts[1:]]
>>> foo('ABC1.12')
['ABC1.12']
>>> foo('ABC 1.22')
['ABC1.22']
>>> foo('ABC1.12/13')
['ABC1.12', 'ABC1.13']
>>> foo('ABC 1.12 / 1.13')
['ABC1.12', 'ABC1.13']
>>>
Are you familiar with regex? That would be an angle worth exploring here. Also, consider splitting on the space character, not just the slash and decimal.
I suggest you write a regular expression for each code pattern and then form a larger regular expression which is the union of the individual ones.
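As one way of following this suggestion, a single pattern can describe the letters, the first full number and any number of "/"-separated follow-ups, each of which may or may not repeat the leading "1.". This is a sketch that assumes the four formats listed in the question are the only ones; the names CODE and expand_codes are made up for illustration.

```python
import re

CODE = re.compile(
    r"\s*([A-Za-z]+)\s*"               # letters, e.g. ABC
    r"(\d+)\.(\d+)"                    # first number, e.g. 1.12
    r"((?:\s*/\s*(?:\d+\.)?\d+)*)\s*"  # optional "/13" or "/ 1.13" parts
)

def expand_codes(raw):
    """Return the list of normalised codes contained in one raw code string."""
    match = CODE.fullmatch(raw)
    if match is None:
        raise ValueError("unrecognised code format: " + raw)
    letters, major, minor, rest = match.groups()
    codes = [f"{letters}{major}.{minor}"]
    for part in re.findall(r"/\s*((?:\d+\.)?\d+)", rest):
        if "." not in part:
            # Short form such as "13": reuse the leading number of the first code.
            part = f"{major}.{part}"
        codes.append(letters + part)
    return codes

for raw in ["ABC1.12", "ABC 1.22", "ABC1.12/13", "ABC 1.12 / 1.13"]:
    print(expand_codes(raw))
```

Raising ValueError on anything that does not match keeps unexpected formats from slipping through unnoticed.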
Using PyParsing
The answer by @Abhijit is good, and for this simple problem a regex may be the way to go. However, when dealing with parsing problems, you'll often need a more extensible solution that can grow with your problem. I've found that pyparsing is great for that: you write the grammar, and it does the parsing:
from pyparsing import *
index = Combine(Word(alphas))
# Define what a number is and convert it to a float
number = Combine(Word(nums)+Optional('.'+Optional(Word(nums))))
number.setParseAction(lambda x: float(x[0]))
# What do extra numbers look like?
marker = Word('/').suppress()
extra_numbers = marker + number
# Define what a possible line could be
line_code = Group(index + number + ZeroOrMore(extra_numbers))
grammar = OneOrMore(line_code)
From this definition we can parse the string:
S = '''ABC1.12
ABC 1.22
XXX1.12/13/77/32.
XYZ 1.12 / 1.13
'''
print(grammar.parseString(S))
Giving:
[['ABC', 1.12], ['ABC', 1.22], ['XXX', 1.12, 13.0, 77.0, 32.0], ['XYZ', 1.12, 1.13]]
Advantages:
The numbers are now in the correct format, since we cast them to floats during parsing. Many more "numbers" are handled: look at the index "XXX", where numbers of the form 1.12, 13 and 32. are all parsed, regardless of the decimal point.
Take a look at this method. It might be the simplest and best way to do it.
val = input()  # Python 3; the original used unicode(raw_input())
# find the position of the first digit, which separates letters from numbers
for aChar in val:
    if aChar.isnumeric():
        lastIndex = val.index(aChar)
        break
part1 = val[:lastIndex].strip()
part2 = val[lastIndex:]
if "/" not in part2:
    print(part1+part2)
else:
    if " " not in part2:
        # short form such as 1.12/13: reuse the part before the dot
        codes = []
        divPart2 = part2.split(".")
        partCodes = divPart2[1].split("/")
        for aPart in partCodes:
            codes.append(part1+divPart2[0]+"."+aPart)
        print(codes)
    else:
        # long form such as 1.12 / 1.13: every part is already complete
        codes = []
        divPart2 = part2.split("/")
        for aPart in divPart2:
            aPart = aPart.strip()
            codes.append(part1+aPart)
        print(codes)