I am currently working on a bot for Discord, and in one of its functions I need to extract integers and floats from the message content while ignoring everything else. What would be the best way to do this while still handling mathematical operations (e.g. if a message said "27+3 whats good dude", I would expect the variable to hold 30)? Currently I am just taking the whole message content and using it as my input. However, if someone puts any text after the numbers (e.g. "594 what's up man"), it throws off the rest of the function, which requires a bare integer or float value.
Here is a sample of the code that currently gets thrown off whenever a non-numeric input is entered,
and here are both examples of what I want the bot to accept as correct inputs:
Expecting 373:
Expecting 613:
I think the easiest way to extract numbers from the string is to use a regular expression (regex).
You can use Pythex to build and test your regex, but I believe the pattern that best suits your request is this one:
import re
string = "602,11 Are the numbers I would like to extract"
a = [int(i) for i in re.findall(r'\d+', string)]
print(a)  # [602, 11]
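The pattern above only finds whole non-negative integers. Since the question also asks for floats, here is a minimal sketch that extends it to decimals and negative numbers; the helper name extract_numbers is hypothetical. Note that it only extracts numbers: "27+3" gives [27, 3], not 30, and in "5-3" the minus sign is read as part of -3.

```python
import re


def extract_numbers(text):
    """Return every integer or decimal number found in text, as int or float."""
    numbers = []
    # Optional leading minus, one or more digits, then an optional fractional part.
    for token in re.findall(r'-?\d+(?:\.\d+)?', text):
        numbers.append(float(token) if '.' in token else int(token))
    return numbers


print(extract_numbers("594 what's up man"))       # [594]
print(extract_numbers("3.5 and -2 are numbers"))  # [3.5, -2]
```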
Note, however, that regular expressions can only recognize regular languages, and the language of mathematical expressions is not regular (a regex cannot match arbitrarily nested parentheses).
So, to extract and evaluate a math expression such as "27+3" from a string, you'll need to implement an actual parser.
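As a sketch of what such a parser could look like, here is a regex tokenizer followed by a small recursive descent parser. It assumes the expression appears at the start of the message and uses only + - * /, unary minus and parentheses; tokenizing stops at the first other character, so the trailing text is ignored. The names tokenize, Parser and evaluate_leading_expression are hypothetical, not from any library, and a message that does not start with an expression raises ValueError.

```python
import re

# One token: a number, one of + - * / ( ), or any other single non-space character.
TOKEN_RE = re.compile(r'\s*(?:(\d+(?:\.\d+)?)|([-+*/()])|(\S))')


def tokenize(text):
    """Return the math tokens at the start of text, stopping at the first other character."""
    tokens = []
    for number, symbol, other in TOKEN_RE.findall(text):
        if other:  # e.g. the "w" of "whats": the rest is ordinary text
            break
        if number:
            tokens.append(float(number) if '.' in number else int(number))
        else:
            tokens.append(symbol)
    return tokens


class Parser:
    """Recursive descent parser; each method handles one grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ('+', '-'):
            if self.take() == '+':
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ('*', '/'):
            if self.take() == '*':
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor := '-' factor | '(' expr ')' | number
        token = self.take()
        if token == '-':
            return -self.factor()
        if token == '(':
            value = self.expr()
            if self.take() != ')':
                raise ValueError("missing ')'")
            return value
        if isinstance(token, (int, float)):
            return token
        raise ValueError(f"expected a number, got {token!r}")


def evaluate_leading_expression(text):
    """Evaluate the arithmetic expression at the start of a message."""
    return Parser(tokenize(text)).expr()


print(evaluate_leading_expression("27+3 whats good dude"))  # 30
print(evaluate_leading_expression("594 what's up man"))     # 594
```

Precedence comes from the grammar itself: expr is built from terms and term from factors, so * and / bind tighter than + and -.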
Related
I am currently using a Roblox API which returns a very LARGE JSON response, but I am only looking for specific data inside it. The data I need looks something like this:
gameinstanceId=f4beb4fc-82d1-4573-82f1-dd94c13a94eb
I only want the data after the "=", and I want to save every match it finds into separate variables; basically, I just need to find ALL of them.
I don't know how to go about doing this. I thought of using substrings, but again I have no idea how to do it.
Any pointers would be helpful.
If I've understood correctly, the regular expressions module (re) is your friend here. The following returns all instances found in a long string.
PS: Building regular expressions can be tedious and I always forget the notation, so I always go to https://pythex.org/ to build my expressions.
import re
longstring = 'gameinstanceId=f4beb4fc-82d1-4573-82f1-dd94c13a94eb\ngameinstanceId=f4beb4fc-82d1-4573-82f1-dd94c13a94eb\n'
re.findall(r'gameinstanceId=([\w-]*)', longstring)
This code returns a list with all matches:
['f4beb4fc-82d1-4573-82f1-dd94c13a94eb',
'f4beb4fc-82d1-4573-82f1-dd94c13a94eb']
With further feedback and a URL, this approach is probably what you want:
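import re
import requests
resp = requests.get('https://rankbotddtgrcm.glitch.me/gameInstances?Place=2679871702')
# IGNORECASE matches both "gameinstanceId" (as in the question) and "gameInstanceId"
re.findall(r'gameInstanceId=([\w-]*)', resp.text, flags=re.IGNORECASE)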
I use list comprehensions for this sort of thing:
mylist = [line.split("=",1)[1] for line in resp.text.splitlines() if line.startswith("gameinstanceId=")]
I typed this in on the fly, but it should be close.
I have researched a lot of questions about this on Stack Overflow and Google, but none of them solved my problem.
Let's say I have a string of irregular length (like a phone number or a country-specific document number), for example:
docA="A123B456"
docB="A123B456CD"
docC="6CD"
I'm writing a function to print documents, but they don't follow a definite pattern. My approach was to use a default argument with the most common pattern and leave the corner cases to the programmer.
For example:
def printDoc(text, pattern="{}{}{}{}#{}{}{}-{}"):
    print(pattern.format(*text))
But it would be much cleaner and more explicit if there were a way to simplify the pattern, like:
def printDoc(text, pattern="{0:3}#{4:-1}-{-1}"):
    print(pattern.format(*text))
Then I could use it like this:
printDoc(docA)
printDoc(docB)
printDoc(docC, "{0:1}-{2}")
But that's not valid syntax. Is there a way to do this properly?
If my approach is wrong, is there a better way of doing this?
You could use a regular expression to parse the indices/slices from the format string and use them to index the given text. You'd also have to remove the indices from the format string before using it with str.format. The only tricky part is actually getting the format parameters out of text, but if you consider eval acceptable, you could do the following:
import re
def printDoc(text, pattern="{0:3}#{4:-1}-{-1}"):
    params = []
    # Find each '{...}' field in the format string and extract its index or slice
    for m in re.finditer(r'\{([-:\d]+)\}', pattern):
        params.append(eval('text[{}]'.format(m.group(1))))
    # Remove the indices from the format string
    pattern = re.sub(r'\{([-:\d]+)\}', '{}', pattern)
    print(pattern.format(*params))
printDoc('A123B456')
Output:
A12#B45-6
Note that using eval is generally considered bad and unsafe practice. Although the potential risks are limited here because of the restricted character set passed to eval, you might want to consider alternatives unless you're the one who controls the format strings.
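One such alternative, sketched below, avoids eval by converting each field into an int or a slice object by hand and letting re.sub call a function for every field. The names FIELD_RE, to_index, format_doc and print_doc are hypothetical; unlike str.format, this sketch supports no other format specifiers.

```python
import re

# A field is {index} or {start:stop}; each part is an optional signed integer.
FIELD_RE = re.compile(r'\{(-?\d+|-?\d*:-?\d*)\}')


def to_index(spec):
    """Turn '3' into 3 and '0:3' into slice(0, 3) without calling eval."""
    if ':' not in spec:
        return int(spec)
    start, stop = (int(part) if part else None for part in spec.split(':'))
    return slice(start, stop)


def format_doc(text, pattern="{0:3}#{4:-1}-{-1}"):
    # Replace every field with the part of text that it selects.
    return FIELD_RE.sub(lambda m: text[to_index(m.group(1))], pattern)


def print_doc(text, pattern="{0:3}#{4:-1}-{-1}"):
    print(format_doc(text, pattern))


print_doc("A123B456")         # A12#B45-6
print_doc("6CD", "{0:1}-{2}")  # 6-D
```

Because the slice bounds are parsed with int(), only integers can ever reach the indexing step, which removes the need to trust the format strings.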
I am trying to match a string with a regular expression but it is not working.
What I am trying to do is simple: it is the typical situation where a user enters a range of pages, or single pages. I am reading the string and checking whether it is correct.
Expressions I expect for a range of pages look like: 1-3, 5-6, 12-67
Expressions I expect for single pages look like: 1,5,6,9,10,12
This is what I have done so far:
pagesOption1 = re.compile(r'\b\d\-\d{1,10}\b')
pagesOption2 = re.compile(r'\b\d\,{1,10}\b')
It seems like the first expression works, but not the second.
Also, would it be possible to merge both of them into one single regular expression, so that if the user enters either something like 1-2, 7-10 or something like 3,5,6,7, the input is recognized as valid?
Simpler is better
Matching the entire input isn't simple, as the proposed solutions show; at least, it is not as simple as it could or should be. Such a regex quickly becomes write-only code, and anyone who isn't regex-savvy will probably replace it with a simpler, more explicit solution as soon as they need to modify it.
Simplest
First split the entire string into individual entries with .split(","); you need these anyway to extract the usable numbers.
Then each entry needs only a very simple test:
^(\d+)(?:-(\d+))?$
It says that the entry must start with one or more digits, optionally followed by a single - and one or more digits, and must then end. Strip the whitespace from each entry before testing it, since splitting "1-3, 5-6" on commas leaves a leading space.
This makes your logic as simple and maintainable as possible. You also get the benefit of knowing exactly what part of the input is wrong and why so you can report it back to the user.
The capturing groups are there because you are going to need the input parsed out to actually use it; this way you get the numbers as soon as an entry matches, without extra code to parse them again.
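The split-then-test approach described above can be sketched as follows. The function name parse_pages and the choice to represent a single page as the pair (n, n) are assumptions for illustration; the error message names the offending entry so it can be reported back to the user.

```python
import re

# One entry: a page number, optionally followed by "-" and a second page number.
ENTRY_RE = re.compile(r'^(\d+)(?:-(\d+))?$')


def parse_pages(text):
    """Return a list of (first, last) page pairs, or raise ValueError naming the bad entry."""
    pages = []
    for entry in text.split(','):
        entry = entry.strip()  # allow "1-3, 5-6" with spaces after the commas
        match = ENTRY_RE.match(entry)
        if match is None:
            raise ValueError(f"invalid page entry: {entry!r}")
        first = int(match.group(1))
        # A single page such as "5" becomes the range (5, 5).
        last = int(match.group(2)) if match.group(2) else first
        pages.append((first, last))
    return pages


print(parse_pages("1-3, 5-6, 12-67"))  # [(1, 3), (5, 6), (12, 67)]
```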
This regex should work -
^(?:(\d+\-\d+)|(\d+))(?:\,[ ]*(?:(\d+\-\d+)|(\d+)))*$
Testing this -
import re
test_vals = [
    '1-3, 5-6, 12-67',
    '1,5,6,9,10,12',
    '1-3,1,2,4.5',
    'abcd',
]
regex = re.compile(r'^(?:(\d+\-\d+)|(\d+))(?:\,[ ]*(?:(\d+\-\d+)|(\d+)))*$')
for val in test_vals:
    print(val)
    if regex.match(val) is None:
        print("Fail")
    else:
        print("Pass")
1-3, 5-6, 12-67
Pass
1,5,6,9,10,12
Pass
1-3,1,2,4.5
Fail
abcd
Fail
I'm trying to create a calculator program in which the user can type an equation and get an answer. I don't want the full code for this; I just need help with a specific part.
The approach I am trying to take is to have the user input the equation as a string (raw_input) and then convert the numbers in their input to integers. After that, I need to know how to make the operators do what I want, depending on which operator the user uses and where it is in the equation.
What are some methods I might use to accomplish this task?
Here is basically what I have right now:
equation_number = raw_input("\nEnter your equation now: ")
[int(d) for d in equation_number if d.isdigit()]
Those lines are just for collecting input and attempting to convert the numbers into integers. Unfortunately, it does not seem to be working very well and .isdigit will only work for positive numbers anyway.
Edit: aong152 mentioned recursive parsing, which I looked into, and it appears to give the results I want:
http://blog.erezsh.com/how-to-write-a-calculator-in-70-python-lines-by-writing-a-recursive-descent-parser/
However, I do not understand the code that the author of this post is using, could anyone familiarize me with the basics of recursive parsing?
The type of program you are trying to make is probably more complicated than you think.
The first step would be splitting the string into its individual parts (tokens).
Let's say that the user inputs:
1+2.0+3+4
Before you can even convert to ints, you are going to need to split the string up into its components:
1
+
2.0
+
3
+
4
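Splitting the string into tokens is the job of a tokenizer (lexer); turning those tokens into a result with the correct precedence then requires a parser, such as a recursive descent parser, which (seeing as you are new to Python) may be a bit of a hurdle.
Assuming that you now have each part separately as a string: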
float("2.0")  # returns 2.0
int(2.0)      # returns 2
Here is a helper function:
def num(s):
    try:
        return int(s)
    except ValueError:
        # s is not an integer literal (e.g. "2.0"), so go through float; this truncates "2.5" to 2
        return int(float(s))
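In Python 2 you could use input instead of raw_input: raw_input returns a string, while input evaluates what the user types as a Python expression, so typing 2+3 gives 5. Be aware that this is equivalent to eval(raw_input()), which is unsafe with untrusted input, and that in Python 3 input() returns a string, like raw_input did.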
This is a very simple calculator (written for Python 3, where input() returns a string, so eval is called explicitly):
def calculate():
    x = eval(input("Equation: "))
    print(x)
while True:
    calculate()
The function reads an equation, evaluates it and prints the result; the while loop then runs it again.
I'm not sure if this is what you want, but here you go. You should also add a way to end the loop.
After using raw_input() you can use eval() on the result to compute the value of this string. eval() evaluates any valid Python expression and returns the outcome.
But I think this is not to your liking. You probably want to do more by yourself.
So I think you should have a look at the re module to split the input into tokens (such as numbers and operators) using regular expressions. After this, you should write a parser that takes the token stream as input. You should decide whether this parser just returns the computed value (e.g. a number) or an abstract syntax tree (AST), i.e. a data structure that represents the expression in an object-oriented (instead of character-oriented) way. Such an AST could then be evaluated to get the final result.
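The tokenizing step with the re module could look roughly like this minimal sketch. It assumes only non-negative decimal numbers, the four basic operators and parentheses; unary minus is left to the parser. The names TOKEN_RE, check_gap and tokenize are hypothetical. Any non-whitespace text between two tokens raises ValueError instead of being silently skipped.

```python
import re

# A number with an optional fractional part, or a single operator/parenthesis.
TOKEN_RE = re.compile(r'\d+(?:\.\d+)?|[-+*/()]')


def check_gap(gap):
    # Only whitespace may appear between two tokens.
    if gap.strip():
        raise ValueError(f"unexpected input: {gap.strip()!r}")


def tokenize(expression):
    """Split an expression string into floats and operator strings."""
    tokens = []
    position = 0
    for match in TOKEN_RE.finditer(expression):
        check_gap(expression[position:match.start()])
        text = match.group()
        tokens.append(float(text) if text[0].isdigit() else text)
        position = match.end()
    check_gap(expression[position:])
    return tokens


print(tokenize("1+2.0+3+4"))  # [1.0, '+', 2.0, '+', 3.0, '+', 4.0]
```

The resulting token list is what the parser would then consume.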
Are you familiar with regular expressions? If not, it's probably a good idea to first learn about them. They are the weak, non-recursive cousin of parsing. Don't go deep, just understand the building blocks — A then B, A many times, A or B.
The blog post you found is hard because it implements the parsing by hand. It's using recursive descent, which is the only way to write a parser by hand and keep your sanity, but it's still tricky.
What people do most of the time is only write a high level grammar and use a library (or code generator) to do the hard work of parsing.
Indeed he had an earlier post where he uses a library:
http://blog.erezsh.com/how-to-write-a-calculator-in-50-python-lines-without-eval/
At least the beginning should be very easy. Things to pay attention to:
How precedence arises from the structure of the grammar — add consists of muls, not vice versa.
The moment he adds a rule for parentheses:
atom: neg | number | '(' add ')';
This is where it really becomes recursive!
6-2-1 should parse as (6-2)-1, not 6-(2-1). He doesn't discuss it, but if you look carefully, it also arises from the structure of the grammar. Don't waste time on this; just know for future reference that this is called associativity.
The result of parsing is a tree. You can then compute its value in a bottom-up manner.
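To make the bottom-up idea concrete, here is a small sketch that assumes a tree made of nested tuples (op, left, right) with numbers as leaves. This representation is an assumption for illustration, not the one used in the blog post; the function name evaluate is likewise hypothetical.

```python
import operator

# Map each operator symbol to the function that applies it.
OPERATORS = {'+': operator.add, '-': operator.sub,
             '*': operator.mul, '/': operator.truediv}


def evaluate(tree):
    """Compute a parse tree bottom-up: leaves are numbers, inner nodes are (op, left, right)."""
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    # Evaluate both children first, then combine their values at this node.
    return OPERATORS[op](evaluate(left), evaluate(right))


# 6-2-1 parsed left-associatively, as (6-2)-1
print(evaluate(('-', ('-', 6, 2), 1)))  # 3
# 2*(3+4): the parentheses only affect the shape of the tree
print(evaluate(('*', 2, ('+', 3, 4))))  # 14
```

The two trees for 6-2-1 show why associativity matters: the right-associative tree ('-', 6, ('-', 2, 1)) would give 5 instead.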
In the "Calculating!" chapter he does that, but in a somewhat magical way.
Don't worry about that.
To build a calculator yourself, I suggest you strip the problem as much as possible.
Recognizing where numbers end etc. is a bit messy. It could be part of the grammar, or done by a separate pass called lexer or tokenizer.
I suggest you skip it — require the user to type spaces around all operators and parens. Or just assume you're already given a list of the form [2.0, "*", "(", 3.0, "+", -1.0, ")"].
Start with a trivial parser(tokens) function that only handles 3-element expressions — [number, op, number].
Return a single number, the result of the computation. (I previously said parsers output a tree which is processed later. Don't worry about that, returning a number is simpler.)
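A first version of that trivial parser might look like the sketch below, assuming tokens are already a list such as [2.0, "*", 3.0] and only the four basic operators occur.

```python
def parser(tokens):
    """Handle only [number, op, number] and return the computed number."""
    left, op, right = tokens
    if op == '+':
        return left + right
    if op == '-':
        return left - right
    if op == '*':
        return left * right
    if op == '/':
        return left / right
    raise ValueError(f"unknown operator {op!r}")


print(parser([2.0, '*', 3.0]))  # 6.0
```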
Write a function that expects either a number or parentheses; in the latter case it calls parser().
>>> number_or_expr([1.0, "rest..."])
(1.0, ["rest..."])
>>> number_or_expr(["(", 2.0, "+", 2.0, ")", "rest..."])
(4.0, ["rest..."])
Note that I'm now returning a second value - the remaining part of the input. Change parser() to also use this convention.
Now rewrite parser() to call number_or_expr() instead of directly assuming that tokens[0] and tokens[2] are numbers.
Voilà! You now have a (mutually) recursive calculator that can compute anything; it just has to be written in a verbose style with parentheses around everything.
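Following the steps above, the mutually recursive pair could be sketched like this, using the (value, remaining tokens) return convention. It assumes the token list format described earlier and fully parenthesized input; for something like [1.0, "+", 2.0, "+", 3.0] it returns (3.0, ['+', 3.0]), and the leftover tokens signal input this version does not support yet.

```python
import operator

# Each operator symbol mapped to the function that applies it.
OPERATORS = {'+': operator.add, '-': operator.sub,
             '*': operator.mul, '/': operator.truediv}


def number_or_expr(tokens):
    """Return (value, remaining tokens) for a number or a parenthesized expression."""
    if tokens[0] == '(':
        value, rest = parser(tokens[1:])  # recurse into the parentheses
        if not rest or rest[0] != ')':
            raise ValueError("expected ')'")
        return value, rest[1:]  # drop the closing ')'
    return tokens[0], tokens[1:]


def parser(tokens):
    """Parse <number_or_expr> <op> <number_or_expr>; return (value, remaining tokens)."""
    left, rest = number_or_expr(tokens)
    op, rest = rest[0], rest[1:]
    right, rest = number_or_expr(rest)
    return OPERATORS[op](left, right), rest


print(number_or_expr(["(", 2.0, "+", 2.0, ")", "rest..."]))  # (4.0, ['rest...'])
print(parser(["(", 1.0, "+", 2.0, ")", "*", 3.0]))            # (9.0, [])
```

Each function corresponds to one grammar rule, which is the 1:1 correspondence between code and grammar that recursive descent gives you.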
Now stop and admire your code, for at least a day :-) It's still simple but has the essential recursive nature of parsing. And the code structure reflects the grammar 1:1 (which is the nice property of recursive descent; you don't want to know what the other algorithms look like).
From here, many improvements are possible (supporting 2+2+2, allowing (1), precedence...), but there are two ways to go about it:
Improve your code step by step. You'll have to refactor a lot.
Stop working hard and use a parsing library, e.g. pyparsing.
This will allow you to experiment with grammar changes faster.
I cannot find a way to serially search a string and append replacements. Let's say I am implementing a templating language. A simplified template looks something like this:
Hello words on #DATE# in #COUNTRY# on this beautiful day.
Imagine a very long template, with many #SOMETHING# tags. Now I want to use regex to parse through this, and every time I found #SOMETHING#, do some python logic, replace it with some string, append it, and continue. All I found is that I can break the string up into tokens and matches and then reassemble it. Is there something better, without generating all those string chunks? Maybe I am trying to optimize too early, but in Java, we have the
appendReplacement(StringBuffer,String) and appendTail(StringBuffer)
methods and I was wondering if something similar can be done in Python.
See http://docs.oracle.com/javase/tutorial/essential/regex/matcher.html
You can use a function as the "replacement" in re.sub. Then re.sub will invoke your function for every match in the string, and the return value of the function will be the replacement in the string.
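A minimal sketch of this for the template in the question, assuming tags look like #NAME# with word characters only; the render function and the sample values are hypothetical. re.sub builds the result in a single pass, much like Java's appendReplacement/appendTail loop, so you never manage the string chunks yourself.

```python
import re

TAG_RE = re.compile(r'#(\w+)#')


def render(template, values):
    """Replace each #NAME# tag with values[NAME]; leave unknown tags untouched."""
    def replace(match):
        # Arbitrary Python logic can run here for every match.
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return TAG_RE.sub(replace, template)


template = "Hello words on #DATE# in #COUNTRY# on this beautiful day."
print(render(template, {"DATE": "2024-01-01", "COUNTRY": "Norway"}))
# Hello words on 2024-01-01 in Norway on this beautiful day.
```

The string returned by the function is inserted literally, so backslashes in it are not treated as group references the way they would be in a replacement string.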