Extracting a Floating point exponential formatted number from a text file - python

I'm trying to extract a floating-point number in exponential notation from several .txt files. The number is located by searching for a phrase on the same line. For example, I have a .txt file that looks like this:
FEA Results:
Tip rotation (deg) =, 7.107927E-18
Tip displacement =, 3.997556E-07
And I'm extracting the tip rotation data using the following script:
regexp = re.compile(r'Tip rotation .*?([0-9.-]+)')
with open(fileName) as f:
    for line in f:
        match = regexp.match(line)
        if match:
            rotations.append(float((match.group(1))))
The problem is it only returns the first part of the floating point exponential (i.e. 7.107927 instead of 7.107927E-18). Any idea on how I could correct it?

Your regex has this:
([0-9.-]+)
It's missing the E, so the match stops at the first character outside the class. Add E inside the brackets. If you add it at the back, the minus sign has to move so that it isn't read as part of a range (.-E); putting the minus first keeps it literal. Like this:
([-0-9.E]+)
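A minimal sketch of the corrected script, assuming the file contents from the question are held in a list of lines rather than read from disk (the `lines` list is only for illustration):

```python
import re

# The character class now includes E; the minus sign comes first so it is literal.
regexp = re.compile(r'Tip rotation .*?([-0-9.E]+)')

# Sample lines copied from the question, standing in for the .txt file.
lines = [
    "FEA Results:",
    "Tip rotation (deg) =, 7.107927E-18",
    "Tip displacement =, 3.997556E-07",
]

rotations = []
for line in lines:
    match = regexp.match(line)
    if match:
        # float() understands exponential notation directly.
        rotations.append(float(match.group(1)))

print(rotations)
```

Only the "Tip rotation" line matches, so the list ends up with a single value that includes the exponent.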

Your regular expression doesn't allow for E-18. Specifically, E isn't mentioned.
See this question for better regexps: How to detect a floating point number using a regular expression
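As a rough illustration of what a stricter pattern could look like, here is a sketch that spells out the parts of a float (sign, digits, optional fraction, optional exponent) instead of using one loose character class. The `FLOAT` name and the sample text are assumptions made for this example, and this is only one of several reasonable patterns:

```python
import re

# Optional sign, digits with an optional fraction (or a bare fraction),
# then an optional exponent such as E-18 or e+02.
FLOAT = r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?'
regexp = re.compile(r'Tip rotation .*?(' + FLOAT + r')')

text = """FEA Results:
Tip rotation (deg) =, 7.107927E-18
Tip displacement =, 3.997556E-07"""

# Try the pattern against each line and keep only the lines that match.
rotations = [float(m.group(1))
             for m in map(regexp.match, text.splitlines()) if m]

print(rotations)
```

Unlike a plain character class, this will not accept strings such as "E-." that merely contain the right characters.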

Related

Regex python help for coordinate format

I'm working on a program that takes input in a particular format, for example "(1,2)(2,3)(4,3)". These are coordinates, and there can be arbitrarily many of them: "(1,2)(2,3)(4,3)...(a,b)". I'm writing a function "checkFormat(str)" that returns true if the input matches this format. I tried writing it without regex, but that proved too difficult. I need help with the regular expression.
Use ^ and $ to match the whole input. In between, match one or more (...) groups filled with digits.
Assuming coordinates are integer and no extra space in between:
^((\((\d)+\,(\d)+)\))+$
If +/- is allowed, 0 takes no sign, and leading zeros are not accepted (00 and 01 are rejected):
^(\(([-\+]?[1-9]\d*|0)\,(([-\+]?[1-9]\d*)|0)\))+$
If decimal numbers are included:
^(\(([-\+]?[1-9]\d*|0)([.]\d+)?\,(([-\+]?[1-9]\d*)|0)([.]\d+)?\))+$
To check whether the input matches:
import re
pattern=r'^(\(([-\+]?[1-9]\d*|0)([.]\d+)?\,(([-\+]?[1-9]\d*)|0)([.]\d+)?\))+$'
text='(0,2)(1,2)'  # avoid naming this "input", which would shadow the built-in
result=bool(re.match(pattern,text))
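The check can be wrapped into the function the question asks for. This is a sketch using the integer-only variant (no signs, no decimals); the name `check_format` and the use of `re.fullmatch` in place of the ^ and $ anchors are choices made for this example:

```python
import re

# Integer-only variant: one or more "(digits,digits)" groups and nothing else.
COORDS = re.compile(r'(?:\(\d+,\d+\))+')

def check_format(text):
    """Return True if text consists solely of (a,b) integer pairs."""
    # fullmatch requires the whole string to match, like ^...$ with match.
    return COORDS.fullmatch(text) is not None

print(check_format("(1,2)(2,3)(4,3)"))
print(check_format("(1,2)(2,3"))
```

Swapping in one of the longer patterns above would extend it to signed or decimal coordinates.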

Python Regular Expression Clobbering Text Between Multiple Latex Expression Matches

I am trying to clean conversational text from a StackExchange corpus which contains sentences which may have Latex expressions inside. Latex expressions are delimited by the $ sign: For instance $y = ax + b$
Here is a line of example text from the data containing multiple Latex expressions:
#Gruber - this is another example, when applied like so: $\mathrm{Var} \left(X^2\right) = 4 X^2 \mathrm{Var} (X)$ doesn't make any sense, on the left side you have a constant and on the right a random variable. Did you mean $4E(X)^2 Var(X)$ bless those that take the road less travelled. Another exception in your theory is $4E(X)^2 Var(X)$. What were you thinking? :)
Here is what I have so far: It seems to clobber text between each Latex Expression match and gives one huge match which is incorrect.
([\$](.*)[\$]){1,3}?
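I don't understand why you put {1,3} at the end or what you were trying to achieve with it. Anyway, the mistake is the greedy (.*): it runs from the first $ on the line to the last one, swallowing everything in between. (In Python, [\$] is just a literal dollar sign, so the brackets are harmless but unnecessary.) I suggest you use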
\$([^$]*)\$
and replace each match with an empty string.
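A short sketch of that replacement with `re.sub`, run on a shortened sample sentence made up for this example:

```python
import re

# Shortened sample with two separate LaTeX expressions.
text = "Did you mean $4E(X)^2 Var(X)$ or not? Another is $y = ax + b$. Thanks"

# [^$]* cannot cross a dollar sign, so each expression is matched on its own
# and the prose between expressions is left alone.
cleaned = re.sub(r'\$[^$]*\$', '', text)

print(cleaned)
```

The text between the two expressions survives; only the delimited spans are removed. Any leftover double spaces would need a separate clean-up step.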

Extracting 4 characters out of an HTML Array, Python

I am working on scraping a betting website for odds as my first web-scraping project. I have successfully scraped what I want so far and now have an array like this
[<b>+5\xbd\xa0-110</b>, <b>-5\xbd\xa0-110</b>]
[<b>+6\xa0-115</b>, <b>-6\xa0-105</b>]
[<b>+6\xa0-115</b>, <b>-6\xa0-105</b>]
Is there a way I can just pull out the -105/110/115? The numbers I am looking for are those 3 to the left of the </b> and I also need to include the positive or negative sign to the left of the three numbers. Do I need to use a regular expression?
Thanks a lot!
Weston
A regex will work, as long as this is the only format the numbers appear in.
Also, do you know if the positive sign is shown or it only shows negative?
If it does show positive...
([+-][\d]{3})<\/b>
If it doesn't show positive use...
([+-]?[\d]{3})<\/b>
http://regexr.com/3h08d
You should be able to extract the contents inside the round brackets.
Edit: you probably want to do something like below. This code will get each string from the list and then do a regex search on the string. It will append the result to the nums list. The result will be a 3 digit number with the sign in front, since it extracts the first group inside the round brackets.
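import re
nums = []
for line in odds:
    # str() covers the case where the items are tag objects rather than strings
    result = re.search(r'([+-]\d{3})</b>', str(line))
    if result:  # skip entries with no match instead of failing on None
        nums.append(result.group(1))
print(nums)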
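For a self-contained run, here is a sketch with made-up sample data shaped like the output in the question. It assumes each row is a list of strings (or objects whose `str()` gives the HTML), and it converts the results to integers, which is an extra step not in the answer above:

```python
import re

# Hypothetical sample rows mimicking the scraped output shown in the question.
odds = [
    ['<b>+5\xbd\xa0-110</b>', '<b>-5\xbd\xa0-110</b>'],
    ['<b>+6\xa0-115</b>', '<b>-6\xa0-105</b>'],
]

# A sign followed by exactly three digits, immediately before the closing tag.
pattern = re.compile(r'([+-]\d{3})</b>')

nums = []
for row in odds:
    for cell in row:
        found = pattern.search(str(cell))
        if found:
            # int() accepts a leading + or - sign.
            nums.append(int(found.group(1)))

print(nums)
```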

Using Regular Expressions to extract numerical quantities from a file and find the sum

I am a beginner and learning python. The problem is that I have to extract numbers from a file (in which numbers can be anywhere. can be multiple times in the same line. some lines may not have numbers and some lines may be new lines) and find their sum. I did know how to solve it, and this was my code
import re
new=[]
s=0
fhand=open("sampledata.txt")
for line in fhand:
    if re.search('^.+',line): #to exclude lines which have nothing
        y=re.findall('([0-9]*)',line) #this part is supposed to extract only the
        for i in range(len(y)): #the numerical part, but it extracts all the words. why?
            try:
                y[i]=float(y[i])
            except:
                y[i]=0
        s=s+sum(y)
print(s)
The code works, but it is not a pythonic way to do it. Why is the ([0-9]*) extracting all the words instead of only numbers?
What is the pythonic way to do it?
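Your regular expression has ([0-9]*), which matches zero or more digits, so it also matches the empty string at every position that isn't a digit. That is where all the extra entries come from. You probably want ([0-9]+) instead.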
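Hello, the mistake in the regular expression is the "*". Use "+" instead, so that each match is a whole run of digits rather than an empty string or a single digit. Something like this should work:
y=re.findall('([0-9]+)',line)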
Expanding on wind85's answer, you might want to fine tune your regular expression depending on what kind of numbers you expect to find in your file. For example, if your numbers might have a decimal point in them, then you might want something like [0-9]+(?:\.[0-9]+)? (one or more digits optionally followed by a period and one or more digits).
As for making it more pythonic, here's how I'd probably write it:
import re
s=0
for line in open("sampledata.txt"):
    s += sum(float(y) for y in re.findall(r'[0-9]+',line))
print(s)
If you want to get really fancy, you can make it a one-liner:
print(sum(float(y) for line in open('sampledata.txt')
          for y in re.findall(r'[0-9]+',line)))
but personally I find that kind of thing hard to read.
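Combining the decimal-aware pattern mentioned above with the generator approach, a sketch might look like this. It uses a list of sample lines in place of sampledata.txt, and `sum_numbers` is a name invented for the example:

```python
import re

# One or more digits, optionally followed by a period and more digits.
NUMBER = re.compile(r'[0-9]+(?:\.[0-9]+)?')

def sum_numbers(lines):
    """Add up every number found anywhere in the given lines."""
    return sum(float(tok) for line in lines for tok in NUMBER.findall(line))

# Sample lines: numbers mid-line, an empty line, and a line with no numbers.
sample = ["abc 12 def 3.5", "", "no numbers here", "7 and 0.25"]
total = sum_numbers(sample)

print(total)
```

Passing an open file object instead of the list works the same way, since both yield lines when iterated.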

How to go through data files with integers and scientific notation using regex?

The data files which I have look like:
Title
10000XX 1.09876543e+02
There are many lines in this form with the column 1 values ranging from 1000000-2000099 and with column 2 values ranging from -9000 to 9000 including some values with negative exponents. I am very new to regex so any help would be useful. The rest of my program is written in python so I am using:
re.search()
Some help with this syntax would be great.
Thanks
As Robert says, you can just use the split() function.
Assuming the separator is spaces like you have in the question, you can run the code below to give a list of values, then do with that as you will:
>>> line = "10000XX 1.09876543e+02"
>>> line.split()
['10000XX', '1.09876543e+02']
You can convert the second item to a floating point number with float(). e.g. float('1.09876543e+02')
Just iterate over your lines and ignore any that don't start with a number.
Regular expressions are a bit more fiddly.
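The split-based approach could be sketched as follows, assuming two whitespace-separated columns per data line. The sample lines and the `parse_lines` name are made up for illustration, and the first column is kept as a string because the question shows it with placeholder characters:

```python
def parse_lines(lines):
    """Return (id, value) pairs from lines that start with a digit."""
    rows = []
    for line in lines:
        parts = line.split()
        # Skip titles, blank lines, and anything without exactly two columns.
        if len(parts) != 2 or not parts[0][:1].isdigit():
            continue
        # float() handles scientific notation, including negative exponents.
        rows.append((parts[0], float(parts[1])))
    return rows

sample = ["Title", "1000001 1.09876543e+02", "1000002 -9.0e+03", "2000099 4.5e-03"]
rows = parse_lines(sample)

print(rows)
```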
