Regex python help for coordinate format - python

I'm working on a program that takes input in a particular format,
for example "(1,2)(2,3)(4,3)". These are coordinates, and there can be arbitrarily many of them: "(1,2)(2,3)(4,3)...(a,b)". I'm writing a function "checkFormat(str)" that returns true if the input matches this format. I tried writing it without regex, but it proved too difficult. I need help with the regular expression.

Use ^ and $ to match the whole input. In between is one or more groups of the form (...) containing digits.
Assuming the coordinates are integers with no extra spaces in between:
^((\((\d)+\,(\d)+)\))+$
If a leading + or - is allowed, but 0 takes no sign and leading zeros are rejected (00 and 01 are not accepted):
^(\(([-\+]?[1-9]\d*|0)\,(([-\+]?[1-9]\d*)|0)\))+$
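If decimal numbers are included (note that a signed value with a zero integer part, such as -0.5, is still rejected by this pattern):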
^(\(([-\+]?[1-9]\d*|0)([.]\d+)?\,(([-\+]?[1-9]\d*)|0)([.]\d+)?\))+$
To check whether the input matches:
import re
pattern = r'^(\(([-\+]?[1-9]\d*|0)([.]\d+)?\,(([-\+]?[1-9]\d*)|0)([.]\d+)?\))+$'
text = '(0,2)(1,2)'  # named text to avoid shadowing the built-in input()
result = bool(re.match(pattern, text))  # True when the whole string matches
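The patterns above can be wrapped into the function the question asks for. This is a minimal sketch, assuming integer coordinates with an optional sign and no whitespace. The name `check_format`, the helper constants, and the use of `re.fullmatch` in place of `^...$` with `re.match` are choices made for this example, not part of the original answer.

```python
import re

# A number is an optional sign followed by digits with no leading zero,
# or a bare 0 (assumption: integers only, no whitespace).
_NUMBER = r'(?:[-+]?[1-9]\d*|0)'
# One coordinate pair: "(" number "," number ")".
_PAIR = r'\(' + _NUMBER + ',' + _NUMBER + r'\)'
# One or more pairs back to back.
_COORDS = re.compile('(?:' + _PAIR + ')+')


def check_format(text):
    """Return True if text is one or more (a,b) pairs and nothing else."""
    return _COORDS.fullmatch(text) is not None


print(check_format("(1,2)(2,3)(4,3)"))
print(check_format("(1,2)(2,3"))
print(check_format("(01,2)"))
```

Unlike `$`, which also matches just before a trailing newline, `re.fullmatch` requires the pattern to cover the entire string, so an input ending in "\n" is rejected here.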

Related

Regex range of hex number with exclusions

In a program I am writing, I need to check whether or not a certain number in hexadecimal is in a given range.
I have it all figured out, except for a single problem I am stuck on:
Let's say I receive the following range: 52-71
I need to check if a given number is within that range, for example: 6e
How can I write a regular expression that supports that?
Writing a regular expression that detects 50-7f is easy, since every number in that range can be generated by [5-7][0-9a-fA-F].
The problem is that the range cannot be simplified that way, because the pattern must accept 6e, 53, and 71 but reject 51 and 72.
Is there a clever way of excluding the ranges 50-51 and 72-7f from the expression mentioned before?
[5-7][0-9a-fA-F]
Thank you very much,
By the way, I am working with python.
One approach is to partition the range of interest and build an alternation from regexes that match those partitions.
Addressing your sample range (52-71):
(5[2-9a-f]|6[0-9a-f]|7[01])
Use the case-insensitive matching of your regex engine. If it is not available, add the respective uppercase ranges to the character classes.
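One way to gain confidence in a partitioned pattern like this is to compare it against a plain numeric comparison for every two-digit hex value. The sketch below does that in Python. The names `HEX_52_71` and `in_range_regex` are made up for this example, and `re.fullmatch` is used so that longer strings such as "523" are not accepted by accident.

```python
import re

# The alternation from the answer, compiled case-insensitively.
HEX_52_71 = re.compile(r'(5[2-9a-f]|6[0-9a-f]|7[01])', re.IGNORECASE)


def in_range_regex(s):
    """Return True if s is a two-digit hex string in 52..71."""
    return HEX_52_71.fullmatch(s) is not None


# Exhaustive check: every value 00..ff, regex verdict vs. integer comparison.
mismatches = [
    format(n, '02x')
    for n in range(0x100)
    if in_range_regex(format(n, '02x')) != (0x52 <= n <= 0x71)
]
print(mismatches)  # an empty list means the pattern agrees with the range
```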
It would be simpler to convert the string (since you are using regex I assume you receive the value as a string) into an int and evaluate with the normal int operators.
Using regex for this job will only make everything more complex, since regular expressions match patterns and have no concept of numeric value. If you insist on doing that, this should do the job (but remember that every range you exclude will make it even more complex):
5[2-9a-fA-F]|6[0-9a-fA-F]|7[0-1]
You can see it with test cases and explanation here
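The integer-conversion approach suggested in this answer could look like the sketch below. The function name `hex_in_range` and its string parameters are assumptions for illustration. `int(value, 16)` is the standard way to parse a hexadecimal string in Python.

```python
def hex_in_range(value, low, high):
    """Return True if the hex string value lies in [low, high], inclusive."""
    try:
        number = int(value, 16)
    except ValueError:
        return False  # value is not a valid hexadecimal string
    return int(low, 16) <= number <= int(high, 16)


for candidate in ["6e", "53", "71", "51", "72"]:
    print(candidate, hex_in_range(candidate, "52", "71"))
```

The same function works for any range without changes, which is the main advantage over building a new alternation for each pair of bounds.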

Python "SyntaxError: invalid token" on numbers starting with 0 (zeroes)

I know someone might think this question has been answered here but it doesn't have answer to what I want to achieve.
I have a very large list of phone numbers. A whole lot of them start with 08, and there is a lot of duplication, which is what I am trying to remove. Now I need to put them in a list or set so that I can use them in my program, but it returns "invalid token" as shown in the picture below:
Python treats any number that starts with 0 as octal. How do I devise a means to bypass this and get these numbers into a list and then into a set?
Read your phone input file and save each phone number as a string in a set. Duplicates are removed because a set holds only unique elements, and you can then do further work on them.
def get_unique_phones_set():
    phones_set = set()
    with open("/path/to/your/duplicated_phone_file", "r") as inputs:
        for phone in inputs:
            # phone is read as a string
            phones_set.add(phone.strip())
    return phones_set
If you need to keep the leading 08, use strings instead of ints.
a = ["08123","08234","08123"]
a = list(set(a)) # now holds only "08123" and "08234" (a set does not guarantee order)
Since (as you say) you don't have an easy way of surrounding the numerous numbers with quotes, go to http://www.regexr.com/ and enter the following:
Expression: ([0-9]+)
Text: Your numbers
Substitution (expandable pane at the bottom of the screen): "$&"
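The same extraction can be done directly in Python without quoting the numbers by hand. This is a small sketch, assuming the numbers are available as raw text separated by commas, spaces, or newlines. The sample text in `raw` is invented for the example.

```python
import re

# Hypothetical raw text; in practice this would come from a file or paste.
raw = "08123, 08234\n08123 08999"

# findall returns every digit run as a str, so the leading zero survives.
phones = re.findall(r'[0-9]+', raw)

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_phones = list(dict.fromkeys(phones))
print(unique_phones)  # ['08123', '08234', '08999']
```

Using `dict.fromkeys` instead of `set` is a design choice here: it keeps the numbers in the order they first appeared, which a plain set does not promise.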

Extracting a Floating point exponential formatted number from a text file

I'm trying to extract a floating-point number in exponential notation from a number of .txt files. The number is located by searching for a phrase. For example, I have a .txt file that looks like this:
FEA Results:
Tip rotation (deg) =, 7.107927E-18
Tip displacement =, 3.997556E-07
And I'm extracting the tip rotation data using the following script:
import re
rotations = []
regexp = re.compile(r'Tip rotation .*?([0-9.-]+)')
with open(fileName) as f:
    for line in f:
        match = regexp.match(line)
        if match:
            rotations.append(float(match.group(1)))
The problem is it only returns the first part of the floating point exponential (i.e. 7.107927 instead of 7.107927E-18). Any idea on how I could correct it?
Your regex has this:
([0-9.-]+)
It's missing the E, so add it inside the brackets (at the front or the back; it doesn't matter). A hyphen at the end of a character class is already literal, but moving it to the front makes it obvious that it is not part of a range. Like this:
([-0-9.E]+)
Your regular expression doesn't allow for E-18. Specifically, E isn't mentioned.
See this question for better regexps: How to detect a floating point number using a regular expression
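A character class such as [-0-9.E] also accepts strings that are not numbers (for example "--..E"), so a more structured pattern can be safer. The following sketch is one possible stricter version, not the pattern from the linked question. The names `FLOAT` and `TIP_ROTATION` and the in-memory `sample` text are assumptions for the example.

```python
import re

# Optional sign, digits with an optional fraction (or a bare fraction),
# then an optional exponent such as E-18 or e+02.
FLOAT = r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?'
# Anchor the number at the end of the line so the whole value is captured.
TIP_ROTATION = re.compile(r'Tip rotation .*?(' + FLOAT + r')\s*$')

sample = """FEA Results:
Tip rotation (deg) =, 7.107927E-18
Tip displacement =, 3.997556E-07
"""

rotations = []
for line in sample.splitlines():
    match = TIP_ROTATION.match(line)
    if match:
        rotations.append(float(match.group(1)))

print(rotations)
```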

Implementing error message for code in Python

def parse(expression):
    operators= set("*/+-")
    numbers= set("0123456789")#not used anywhere as of now
    opExtrapolate= []
    numExtrapolate= []
    buff=[]
    for i in expression:
        if i in operators:
            if len(buff) >0: #prevents buff if multiple operators
                numExtrapolate.append(''.join(buff))
            buff= []
            opExtrapolate.append(i)
            opExtrapolation=opExtrapolate
        else:
            buff.append(i)
    numExtrapolate.append(''.join(buff))
    numExtrapolation=numExtrapolate
    print(numExtrapolation)
    print("z:", len(opExtrapolation))
    return numExtrapolation, opExtrapolation
def errors():
    numExtrapolation,opExtrapolation=parse(expression)
    #Error for multiple operators
    if (len(numExtrapolation) ==3) and (len(opExtrapolation) !=2):
        print("Bad1")
    if (len(numExtrapolation) ==2) and (len(opExtrapolation) !=1):
        print("Bad2")
    #
I posted similar code in an older question; however, the premise of this question is different.
The code above takes a mathematical expression entered by the user in the variable expression and splits it into operands and operators. The errors function will later print errors if the input is incorrect.
The operators can only come from set("*/+-") and the operands are real numbers, so an example input would be 45/23+233.
With the help of an SO user I was able to get one of the errors to work (the error for multiple operators), but I am having trouble implementing a few more error messages.
1) If the input contains items that are not numbers or allowed operators, an error message is displayed.
2) If a user enters a number such as .23 or 554., where there is no digit before or after the decimal point, a different error is displayed (note that a number like 0.23 is fine).
3) If the user attempts to divide by zero, an error is displayed.
What I have tried:
In the else branch of parse(), I tried to put a condition on buff.append(i) so that it would only run if buff.isdigit()==true, but I got errors saying that there were no digits in buff. I also tried creating a set called "numbers" (in the code above) and limiting buff.append(i) to that set through a for statement similar to the initial for statement. Unfortunately, nothing worked. Any and all help would be appreciated.
Please don't introduce large amounts of code more advanced than the code below. I am trying to fix a problem, not completely change my code. Thanks for all your help.
You can use regular expressions to do these checks:
If the input contains items that are not numbers or not the allowed operators then an error message is displayed
if not re.match(r'[\d.*/+\- ]+$', expression):
    print("Bad3") # Characters exist that are not allowed
Explanation: [\d.*/+\- ] will only match digits, your operators, and spaces, the + means to allow one or more of those characters, and the $ matches at the very end of the string. re.match() starts at the beginning of the string so this means that only those characters are allowed.
If a user enters a number such as .23 or something like 554. where there is no number before the decimal place or after the decimal place then a different error is displayed.(note that a number like 0.23 is fine).
if re.search(r'(?<!\d)\.|\.(?!\d)', expression):
    print("Bad4") # There is a '.' without a digit before or after it
Explanation: \. in a regex matches a literal '.' character. The | in the middle is an alternation, so the regex will match if the expression on either side of it matches. (?<!\d) means that the previous character is not a digit, and (?!\d) means that the next character is not a digit, so this regex means "match a '.' that is not preceded by a digit OR match a '.' that is not followed by a digit".
If the user attempts to divide by zero an error is displayed.
if re.search(r'/ *[0.]+(?![.\d])', expression):
    print("Bad5") # Division by 0
Explanation: This matches / followed by any number of spaces, then one or more 0 or . characters, so this will match if anywhere in expression you have something like / 0, / 0.0, or / 0.00. The (?![.\d]) means that the next character can't be a digit or ., which will prevent you from matching something like / 0.4.
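The three checks can be collected into a single helper so they are easy to try on sample input. This is only a sketch: the function name `find_errors` and the choice to return a list of labels instead of printing are assumptions made for the example, while the regular expressions are the ones given above.

```python
import re


def find_errors(expression):
    """Return a list of error labels for expression (empty if none apply)."""
    problems = []
    if not re.match(r'[\d.*/+\- ]+$', expression):
        problems.append("Bad3")  # characters that are not allowed
    if re.search(r'(?<!\d)\.|\.(?!\d)', expression):
        problems.append("Bad4")  # a '.' missing a digit before or after it
    if re.search(r'/ *[0.]+(?![.\d])', expression):
        problems.append("Bad5")  # division by zero
    return problems


for sample in ["45/23+233", "45/2a", ".23+1", "554.+1", "4/0", "4/0.4"]:
    print(sample, find_errors(sample))
```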
I will give you some indications, but not solve it for you :).
If you need more, ask a precise question and I'll answer it.
The answers I give you are NOT directly related with your code.
You can test whether a string variable can be an integer by trying to convert it:
try:
    var2 = int(var)
I'll let you find out which error it raises.
For a version that doesn't use try, you can look at the isdigit method.
You can see whether a string variable is one of your operators by checking it:
if var in ["+", "-", "/", "*"]:
To check even more, you can look at the variable's length first:
if len(var) != and ... see above
To check whether a user inputs something like .543 and refuse it, you can look at the first element of your string variable:
if myvar[0] == ".":
To check whether your user wants to divide by 0, you can simply check whether the last number is equal to 0:
if int(myvar) == 0:
All these expect you to be able to get operators and numbers first though.
The other solution would be to use regular expressions to perform these checks before parsing your numbers and operators.
That seems quite complex for this exercise, though, since it is homework. It might be a good idea to look at them anyway.
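For reference, the hints in this answer could be written as small helpers like the ones below. This is a sketch under the assumption that the expression has already been split into string tokens. All function names are invented for the example, and `float` is used for the zero check so that "0.0" is also caught.

```python
OPERATORS = {"+", "-", "*", "/"}


def is_integer(token):
    """Return True if token can be converted with int()."""
    try:
        int(token)
    except ValueError:
        return False
    return True


def is_operator(token):
    """Return True if token is exactly one of the allowed operators."""
    return token in OPERATORS


def has_bare_decimal_point(token):
    """Return True for tokens such as '.23' or '554.'."""
    return token.startswith(".") or token.endswith(".")


def is_zero(token):
    """Return True if token is a number equal to zero, e.g. '0' or '0.0'."""
    try:
        return float(token) == 0
    except ValueError:
        return False


print(is_integer("45"), is_operator("+"), has_bare_decimal_point(".23"), is_zero("0.0"))
```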

How to go through data files with integers and scientific notation using regex?

The data files which I have look like:
Title
10000XX 1.09876543e+02
There are many lines in this form with the column 1 values ranging from 1000000-2000099 and with column 2 values ranging from -9000 to 9000 including some values with negative exponents. I am very new to regex so any help would be useful. The rest of my program is written in python so I am using:
re.search()
Some help with this syntax would be great.
Thanks
As Robert says, you can just use the split() function.
Assuming the separator is spaces like you have in the question, you can run the code below to give a list of values, then do with that as you will:
>>> line = "10000XX 1.09876543e+02"
>>> line.split()
['10000XX', '1.09876543e+02']
You can convert the second item to a floating point number with float(). e.g. float('1.09876543e+02')
Just iterate over your lines and ignore any that don't start with a number.
Regular expressions are a bit more fiddly.
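Putting the split() suggestion together, a loop over the lines might look like the sketch below. The function name `parse_lines` and the sample lines are assumptions for illustration. It skips any line that does not have exactly two columns or whose first column does not start with a digit.

```python
def parse_lines(lines):
    """Yield (label, value) pairs from two-column data lines."""
    for line in lines:
        parts = line.split()
        # Skip blank lines, titles, and anything that is not two columns
        # with a first column starting with a digit.
        if len(parts) != 2 or not parts[0][0].isdigit():
            continue
        try:
            value = float(parts[1])  # handles forms like 1.09876543e+02
        except ValueError:
            continue  # second column is not a number
        yield parts[0], value


# Hypothetical sample; with a real file, pass the open file object instead.
sample = ["Title", "10000XX 1.09876543e+02", "1000001 -3.5e-03", ""]
print(list(parse_lines(sample)))
```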
