I have a list containing floating numbers (positive or negative) which are separated by a hyphen. I would like to split them.
For example:
input: -76.833-106.954, -76.833--108.954
output: -76.833,106.954,-76.833,-108.954
I've tried re.split with the pattern r"([-+]?\d*\.)-", but it doesn't work: I get an "invalid literal for int()" error.
Please let me know what code you would recommend. Thank you!
Completing #PyHunterMan's answer:
You want at most one hyphen directly before the number to count as its negative sign:
import re
target = '-76.833-106.954, -76.833--108.954, 83.4, -92, 76.833-106.954, 76.833--108.954'
pattern = r'(-?\d+\.\d+)' # Find all float patterns with at most one optional hyphen at the beginning (any extra hyphens are ignored)
match = re.findall(pattern, target)
numbers = [float(item) for item in match]
print(numbers)
>>> [-76.833, -106.954, -76.833, -108.954, 83.4, 76.833, -106.954, 76.833, -108.954]
Note that this does not catch -92: it is a real number, but it is not written with a decimal point.
To also catch integers such as -92, make the fractional part optional:
import re
input_ = '-76.833-106.954, -76.833--108.954, 83.4, -92, 76.833-106.954, 76.833--108.954'
pattern = r'(-?\d+(\.\d+)?)'
match = re.findall(pattern, input_)
print(match)
result = [float(item[0]) for item in match]
print(result)
>>> [-76.833, -106.954, -76.833, -108.954, 83.4, -92.0, 76.833, -106.954, 76.833, -108.954]
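Note that the expected output in the question treats a hyphen that directly follows a digit as a separator rather than a sign (-76.833-106.954 should give -76.833 and 106.954), whereas the patterns above return -106.954. A minimal sketch of that reading, assuming a negative lookbehind is acceptable; the names SIGNED_NUMBER and split_numbers are made up for illustration:

```python
import re

# A minus counts as a sign only when it is NOT directly preceded by a digit
# or a dot; otherwise it is treated as the separator between two numbers.
SIGNED_NUMBER = re.compile(r'(?<![\d.])-?\d+(?:\.\d+)?')

def split_numbers(text):
    """Return every number found in text as a float."""
    return [float(tok) for tok in SIGNED_NUMBER.findall(text)]

print(split_numbers('-76.833-106.954, -76.833--108.954'))
# [-76.833, 106.954, -76.833, -108.954]
```

With this reading, a second hyphen (as in --108.954) is the sign of the following number, and integers such as -92 are matched as well.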
Related
I would like to split two different strings in Python into lists of the same length, 5. For example,
string1= '21007250.12000 -18047085.73200 1604.90200 59.10000 21007239.94800'
string2= '24784864.18300-318969464.50000 -1543.53600 34.48000 24784864.9700'
string1_final = ['21007250.12000','-18047085.73200','1604.90200','59.10000','21007239.94800']
string2_final = ['24784864.18300','-318969464.50000','-1543.53600','34.48000','24784864.9700']
Notice the separation of the white space and separating the two numbers while keeping the minus sign. I've tried using string2.split() and string2.split('-'), but it removes the minus. Any help would be greatly appreciated.
You can use code similar to the answer to this question:
import re
string1 = '21007250.12000 -18047085.73200 1604.90200 59.10000 21007239.94800'
string2 = '24784864.18300-318969464.50000 -1543.53600 34.48000 24784864.9700'
def process_string(string):
    string_spaces_added = re.sub('-', ' -', string)
    string_spaces_removed = re.sub(' +', ' ', string_spaces_added)
    return string_spaces_removed.split()
print(process_string(string1))
print(process_string(string2))
Output:
['21007250.12000', '-18047085.73200', '1604.90200', '59.10000', '21007239.94800']
['24784864.18300', '-318969464.50000', '-1543.53600', '34.48000', '24784864.9700']
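Since str.split() with no arguments already collapses runs of whitespace, the same idea can probably be written more compactly. The sketch below is one possible variant, not the answer's code: it adds a space only before a minus that directly follows a digit, and the function name split_keep_sign is an assumption made for illustration.

```python
import re

def split_keep_sign(line):
    # Insert a space before any minus sign that directly follows a digit,
    # then let str.split() break the line on runs of whitespace.
    return re.sub(r'(?<=\d)-', ' -', line).split()

string2 = '24784864.18300-318969464.50000 -1543.53600 34.48000 24784864.9700'
print(split_keep_sign(string2))
# ['24784864.18300', '-318969464.50000', '-1543.53600', '34.48000', '24784864.9700']
```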
You could try something like this:
string1 = '21007250.12000 -18047085.73200 1604.90200 59.10000 21007239.94800'
string2 = '24784864.18300-318969464.50000 -1543.53600 34.48000 24784864.9700'
def splitter(string_to_split: str) -> list:
    out = []
    for item in string_to_split.split():
        if "-" in item and not item.startswith("-"):
            out.extend(item.replace("-", " -").split())
        else:
            out.append(item)
    return out
for s in [string1, string2]:
    print(splitter(s))
Output:
['21007250.12000', '-18047085.73200', '1604.90200', '59.10000', '21007239.94800']
['24784864.18300', '-318969464.50000', '-1543.53600', '34.48000', '24784864.9700']
Well, it looks like you want to extract the numbers from the strings rather than split on variable delimiters; i.e. it's not a string like "123 -abc def ghi", it's always a string of numbers.
So use a simple regex that identifies an optional negative sign, some digits, and optionally a decimal point followed by decimal digits (assuming there will always be digits after the decimal point, unlike numpy's representation of numbers like 2. == 2.0).
import re
numbers = re.compile(r'(-?\d+(?:\.\d+)?)')
string1 = numbers.findall(string1)
string1 == string1_final
# True
string2 = numbers.findall(string2)
string2 == string2_final
# True
# also works for these:
string3 = '123 21007250.12000 -5000 -67.89 200-300.4-7'
numbers.findall(string3)
# ['123', '21007250.12000', '-5000', '-67.89', '200', '-300.4', '-7']
In Python 3, \d also matches decimal digits from other scripts in Unicode (for example Arabic-Indic digits). If you want to accept only the ASCII digits 0-9, change each \d in the regex to [0-9], or compile the pattern with the re.ASCII flag.
Note: this regex doesn't include the possibility for exponents, complex numbers, powers, etc.
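If exponents do turn up in the data, one possible extension is an optional exponent suffix. This is only a sketch under that assumption; the name NUMBER_WITH_EXPONENT and the sample string are not from the question.

```python
import re

# The pattern from the answer, plus an optional exponent such as e-3 or E+10.
NUMBER_WITH_EXPONENT = re.compile(r'-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?')

print(NUMBER_WITH_EXPONENT.findall('1.5e-3-2E4 7 -0.25'))
# ['1.5e-3', '-2E4', '7', '-0.25']
```

Every matched string can be passed to float() as is, since float() accepts the same exponent notation.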
I am trying to find all numbers in text and return them in a list of floats.
In the text:
Commas are used to separate thousands
Several consecutive numbers are separated by a comma and a space
Numbers can be attached to words
My code seems to extract numbers separated with a comma and space and numbers attached to words.
However, it extracts numbers separated by commas as separate numbers
text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"
list(map(int, re.findall('\d+', text)))
The suggestions below work beautifully
Unfortunately, the code below returns a list of strings:
nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)
print(nums)
I need to return the output as a list of floats, with commas between but no speech marks.
Eg.
extract_numbers("1, 2, 3, un pasito pa'lante Maria")
is [1.0, 2.0, 3.0]
Unfortunately, I have not yet been successful in my attempts. Currently, my code reads
def extract_numbers(text):
    nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)
    return "[{0}]".format(', '.join(map(str, nums)))
extract_numbers(TEXT_SAMPLE)
You may try doing a regex re.findall search on the following pattern:
\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)
Sample script:
import re
text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"
nums = re.findall(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)', text)
print(nums)
This prints:
['30', '10', '1', '2', '137', '40', '2,137,040']
Here is an explanation of the regex pattern:
\b word boundary
\d{1,3} match 1 to 3 leading digits
(?:,\d{3})* followed by zero or more thousands terms
(?:\.\d+)? match an optional decimal component
(?!\d) assert the "end" of the number by checking for a following non digit
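To get a list of floats rather than strings, as the edited question asks, the thousands separators can be stripped before calling float(). A minimal sketch, assuming the pattern explained above is the one in use; NUMBER is a made-up name, and extract_numbers is the function name from the question:

```python
import re

# Pattern from the answer: 1-3 leading digits, optional ",ddd" groups,
# optional decimal part, and no digit immediately afterwards.
NUMBER = re.compile(r'\b\d{1,3}(?:,\d{3})*(?:\.\d+)?(?!\d)')

def extract_numbers(text):
    """Return all numbers in text as floats, ignoring thousands commas."""
    return [float(match.replace(',', '')) for match in NUMBER.findall(text)]

print(extract_numbers("1, 2, 3, un pasito pa'lante Maria"))
# [1.0, 2.0, 3.0]
```

Returning the list itself, instead of a formatted string, is what makes the result print without quotation marks.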
Use a character class [] that matches both digits and commas, then strip the commas before converting:
Code:
import re
text = "30feet is about 10metre but that's 1 rough estimate several numbers are like 2, 137, and 40 or something big numbers are like 2,137,040 or something"
out = [
    int(match.replace(',', ''))
    for match in re.findall(r'[\d,]+', text)
]
print(out)
Output
[30, 10, 1, 2, 137, 40, 2137040]
You need to match the commas as well, then strip them before converting to an integer:
list(map(lambda n: int(n.replace(',', '')), re.findall(r'[\d,]+', text)))
Also, a list comprehension is usually more readable than map with a lambda:
[int(n.replace(',', '')) for n in re.findall(r'[\d,]+', text)]
Why not use this?
array = re.findall(r'[0-9]+', str)
I have text with values like:
this is a value £28.99 (0.28/ml)
I want to remove everything to return the price only so it returns:
£28.99
There could be any number of digits between the £ and the decimal point.
I think
r"£[0-9]*\.[0-9]{2}"
matches the pattern I want to keep, but I'm unsure how to remove everything else and keep the pattern, instead of replacing the pattern as in the usual re.sub() cases.
Why not extract the price directly instead of removing everything around it?
import re
s = "this is a value £28.99 (0.28/ml)"
m = re.search(r"£\d*(\.\d+)?", s)
if m:
    print(m.group(0))
To find several occurrences, use finditer instead of search. (With this pattern, findall would return only the contents of the capturing group, so make the group non-capturing with (?:...) before using findall.)
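For a string with more than one price, a sketch using finditer might look like this. The sample string is invented for illustration, and the pattern is a slight variation (non-capturing group, at least one digit after the £), not the answer's exact regex:

```python
import re

s = "small £28.99 (0.28/ml), large £105 or £7.5"

# finditer yields match objects, so group(0) is always the whole match,
# regardless of any groups inside the pattern.
prices = [m.group(0) for m in re.finditer(r"£\d+(?:\.\d+)?", s)]
print(prices)
# ['£28.99', '£105', '£7.5']
```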
You don't care how many digits are before the decimal, so using the zero-or-more matcher was correct. However, you could just rely on the digit class (\d) to provide that more succinctly.
The same is true of after the decimal. You only need two so your limiting the matches to 2 is correct.
The issue then comes in with how you actually capture the value. You can use a capturing group to be sure that you only ever get the value you care about.
Complete regex:
(£\d*\.\d{2})
Sample code:
import re
r = re.compile(r"(£\d*\.\d{2})")
match = r.findall("this is a value £28.99 (0.28/ml)")
if match:  # findall may return an empty list; check for that here
    print(match[0])  # the first match; prints £28.99
If it's a string, you can do something like this:
x = "this is a value £28.99 (0.28/ml)"
x_list = x.split()
for i in x_list:
    if "£" in i:  # or if i.startswith("£") Credit – Jean-François Fabre
        value = i
print(value)
>>>£28.99
You can try:
import re
t = "this is a value £28.99 (0.28/ml)"
r = re.sub(r".*(£[\d.]+).*", r"\1", t)
print(r)
Output:
£28.99
I would like to construct a regular expression pattern for the following string, and use Python to extract:
str = "hello w0rld how 34 ar3 44 you\n welcome 200 stack000verflow\n"
What I want to do is extract the independent number values and add them, which should give 278. My preliminary Python code is:
import re
x = re.findall('([0-9]+)', str)
The problem with the above code is that numbers within a char substring like 'ar3' would show up. Any idea how to solve this?
Why not try something simpler like this?:
str = "hello w0rld how 34 ar3 44 you\n welcome 200 stack000verflow\n"
print(sum(int(s) for s in str.split() if s.isdigit()))
# 278
s = re.findall(r"\s\d+\s", str)  # \s matches the whitespace before and after the number
print(sum(map(int, s)))  # print the sum of all matches
\d+ matches one or more digits. This gives the exact expected output:
278
How about this?
x = re.findall(r'\s([0-9]+)\s', str)
The solutions posted so far only work (if at all) for numbers that are preceded and followed by whitespace. They will fail if a number occurs at the very start or end of the string, or if a number appears at the end of a sentence, for example. This can be avoided using word boundary anchors:
s = "100 bottles of beer on the wall (ignore the 1000s!), now 99, now only 98"
nums = re.findall(r"\b\d+\b", s)  # \b matches at the start/end of an alphanumeric sequence
print(sum(map(int, nums)))
Result: 297
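To avoid a partial match, anchor the pattern at both ends (and apply it to each whitespace-separated token rather than to the whole string):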
'^[0-9]*$'
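An anchored pattern only makes sense when it is tested against one token at a time. The following is a minimal sketch of that approach using re.fullmatch, which anchors at both ends implicitly; the variable names are assumptions, and the string is the one from the question:

```python
import re

text = "hello w0rld how 34 ar3 44 you\n welcome 200 stack000verflow\n"

# Keep only tokens that consist entirely of digits, then add them up.
tokens = text.split()
total = sum(int(tok) for tok in tokens if re.fullmatch(r"[0-9]+", tok))
print(total)
# 278
```

Unlike the word-boundary version, this still skips a number that has punctuation attached, such as "99,".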
I want to check whether a string ends with a decimal number whose digits vary. After searching for a while, the closest solution I found was to put the possible endings into a tuple and pass that to endswith(). Is there a shorter way than listing every possible combination?
I tried hard-coding the end condition, but that breaks as soon as new elements are added to the list. I also tried a regex, but it returned other elements together with the decimal ones. Any help would be appreciated.
list1 = ["abcd 1.01", "zyx 22.98", "efgh 3.0", "qwe -70"]
for e in list1:
    if e.endswith('.0') or e.endswith('.98'):
        print('pass')
Edit: Sorry, I should have specified that I do not want 'qwe -70' to be accepted; only elements with a decimal point should be accepted.
I'd like to propose another solution: using regular expressions to search for an ending decimal.
You can define a regular expression for an ending decimal with the following regex [-+]?[0-9]*\.[0-9]+$.
The regex broken apart:
[-+]?: optional - or + symbol at the beginning
[0-9]*: zero or more digits
\.: required dot
[0-9]+: one or more digits
$: must be at the end of the line
Then we can test the regular expression to see if it matches any of the members in the list:
import re
regex = re.compile(r'[-+]?[0-9]*\.[0-9]+$')
list1 = ["abcd 1.01", "zyx 22.98", "efgh 3.0", "qwe -70", "test"]
for e in list1:
    if regex.search(e) is not None:
        print(e + " passes")
    else:
        print(e + " does not pass")
The output for the previous script is the following:
abcd 1.01 passes
zyx 22.98 passes
efgh 3.0 passes
qwe -70 does not pass
test does not pass
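If the goal is to keep the matching elements rather than print a verdict for each, the same regex can drive a list comprehension. A small sketch under that assumption; ENDS_WITH_DECIMAL and accepted are names chosen for illustration, and the rstrip() call is an added guard against trailing whitespace:

```python
import re

# Optional sign, optional integer digits, a required dot, one or more
# fractional digits, all at the very end of the string.
ENDS_WITH_DECIMAL = re.compile(r'[-+]?[0-9]*\.[0-9]+$')

list1 = ["abcd 1.01", "zyx 22.98", "efgh 3.0", "qwe -70", "test"]
accepted = [e for e in list1 if ENDS_WITH_DECIMAL.search(e.rstrip())]
print(accepted)
# ['abcd 1.01', 'zyx 22.98', 'efgh 3.0']
```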
Your example data leaves many possibilities open:
Last character is a digit:
e[-1].isdigit()
Everything after the last space is a number:
try:
    float(e.rsplit(None, 1)[-1])
except ValueError:
    # no number
    pass
else:
    print("number")
Using regular expressions:
re.search(r'[.0-9]$', e)
suspects = [x.split() for x in list1]  # split on the space in between; the second item is the candidate number
# try to cast it to float; if that fails, float() raises ValueError
for x in suspects:
    try:
        float(x[1])
        print("{} - ends with float".format(" ".join(x)))
    except ValueError:
        print("{} - does not end with float".format(" ".join(x)))
Output:
abcd 1.01 - ends with float
zyx 22.98 - ends with float
efgh 3.0 - ends with float
qwe -70 - ends with float
I think this will work for this case:
import re
regex = r"([0-9]+\.[0-9]+)"
list1 = ["abcd 1.01", "zyx 22.98", "efgh 3.0", "qwe -70"]
for e in list1:
    token = e.split(' ')[1]  # the part after the space
    if re.search(regex, token):
        print(True)  # Code for yes condition
    else:
        print(False)  # Code for no condition
As you correctly guessed, endswith() is not a good fit here, since the number of possible endings is effectively infinite. The way to go, as many have suggested, is a regular expression that requires the string to end with digits, a decimal point, and one or more further digits. Beyond that, keep the code simple and readable. The strip() is there in case one of the input strings has trailing whitespace, which would otherwise complicate the regex.
You can see this in action at: https://eval.in/649155
import re
regex = r"[0-9]+\.[0-9]+$"
list1 = ["abcd 1.01", "zyx 22.98", "efgh 3.0", "qwe -70"]
for e in list1:
    if re.search(regex, e.strip()):
        print(e, 'pass')
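The following may help:
import re
reg = re.compile(r'^[a-z]+ \-?[0-9]+\.[0-9]+$')
the_string = "abcd 1.01"  # example input
if reg.match(the_string):
    print("ends with a decimal")  # do something...
else:
    print("no decimal at the end")  # do other...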