Closed. This question needs debugging details. It is not currently accepting answers.
Closed 1 year ago.
How can I verify that a string represents a valid US currency value?
The string might be composed strictly of digits, or optionally have a dollar sign, commas, etc.
Context: I wish to verify a string is a proper dollar value, and then convert it to a number after removing or otherwise handling the non-numeric characters.
Regular Expression
Example: \$?(-?(\d+[,.])*\d+)
import re
re.match(r"\$?(-?(\d+[,.])*\d+)", "$-12,000.01") # match
re.match(r"\$?(-?(\d+[,.])*\d+)", "$-12,000.01").group(1) # extract matched value
>>> '-12,000.01'
re.sub('[,$]', '', '$-12,000.01') # remove comma and dollar sign
>>> '-12000.01'
float(re.sub('[,$]', '', '$-12,000.01')) # convert to float if the result doesn't contain any special character such as comma
>>> -12000.01
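Add more cases to the regular expression if your dataset needs them.
There can be many invalid edge cases, such as 13.000,000, and the pattern above accepts that one.
This regular expression is stricter about where commas and the decimal point may appear: \$?(-?\d*(\d+,)*\.?\d+). Note that re.match only anchors at the start of the string, so an invalid string can still produce a match on its leading part; use re.fullmatch when the whole string has to be valid. As written, this stricter pattern cannot match a value that has both comma groups and a decimal part (such as 12,000.01) in full, so it needs further adjustment for that case.
So add as many cases as you need.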
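As one possible way to tighten the validation described above, here is a minimal sketch. The pattern, the name US_CURRENCY, and the helper parse_usd are assumptions made for illustration, not part of the original answer. It assumes thousands are grouped in threes, cents are exactly two digits when present, and the minus sign follows the dollar sign as in the example above.

```python
import re
from decimal import Decimal

# Hypothetical pattern: optional "$", optional "-", then either comma-grouped
# thousands or plain digits, then an optional two-digit cents part.
US_CURRENCY = re.compile(r"\$?(-?)(\d{1,3}(?:,\d{3})+|\d+)(\.\d{2})?")

def parse_usd(text):
    """Return the amount as a Decimal, or None if text is not a valid value."""
    m = US_CURRENCY.fullmatch(text)  # fullmatch: the whole string must fit
    if m is None:
        return None
    sign, whole, cents = m.groups()
    # Drop the thousands separators before converting.
    return Decimal(sign + whole.replace(",", "") + (cents or ""))

print(parse_usd("$-12,000.01"))
print(parse_usd("13.000,000"))
```

Decimal is used here instead of float so that cents are not subject to binary rounding; float(...) would work the same way if that is not a concern.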
Closed. This question needs debugging details. It is not currently accepting answers.
Closed last year.
I have converted an integer value to a string, but it is not printed in quotes the way strings are.
print(str(31))
31 #I think it should give result as "31" because now it is string
When a string is printed, quotes are not added. If you want to see the quotes, use the repr() function. For example,
print(repr("31"))
print() writes only the value, so the quotes are not shown. The result is still a string; you can verify this with concatenation using + or with string multiplication: str(32)*3 gives 323232.
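The points above can be checked in a few lines. This is only an illustrative sketch; the variable name value is an assumption.

```python
value = str(31)

print(value)                 # the quotes are not part of the string, so none are printed
print(repr(value))           # repr() shows the string the way a literal would be written
print(type(value).__name__)  # confirms that the object is a str
print(value * 3)             # string multiplication repeats the text
```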
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a bunch of strings that look like this:
7EE1,
4NF1,
5NF4a,
8F1
They all start with a number, followed by a few characters, then another number, then another few characters. There is no limit on how many chunks there can be, and no limit on the number of consecutive characters. What I am trying to do is add "." to the string wherever it changes from character to number or vice versa. For example, the desired output is:
7.EE.1,
4.NF.1,
5.NF.4.a,
8.F.1
I think it can be solved with regular expression, but I haven't learned it before. I am working on creating a regex for this. Any tips would be appreciated!
Here is a very compact way of doing this using regular expressions:
import re
inp = ["7EE1", "4NF1", "5NF4a", "8F1"]
output = [re.sub(r'(\d+(?=\D)|\D+(?=\d))', r'\1.', x) for x in inp]
print(output) # ['7.EE.1', '4.NF.1', '5.NF.4.a', '8.F.1']
The regex works by matching (and capturing) a run of either all digit characters or all non-digit characters, which in turn is followed by a character of the opposite class. It then replaces the match with whatever was captured, followed by a dot separator. Here is an explanation:
(             match AND capture:
    \d+       one or more digits
    (?=\D)    followed by a non-digit character
    |         OR
    \D+       one or more non-digits
    (?=\d)    followed by a digit character
)             stop capture
Note that the lookaheads used above are zero width, so nothing is captured from them.
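Because lookarounds are zero width, a variant of the same idea is to match only the empty boundary between a digit and a non-digit and insert the dot there. This is a sketch of an alternative, not the answer's own pattern; the helper name add_dots is an assumption.

```python
import re

# Matches the empty position where a digit is followed by a non-digit,
# or a non-digit is followed by a digit. Nothing is consumed.
BOUNDARY = re.compile(r"(?<=\d)(?=\D)|(?<=\D)(?=\d)")

def add_dots(text):
    """Insert '.' at every digit/non-digit boundary."""
    return BOUNDARY.sub(".", text)

print([add_dots(x) for x in ["7EE1", "4NF1", "5NF4a", "8F1"]])
```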
One way without using re:
from itertools import groupby
inp = ["7EE1", "4NF1", "5NF4a", "8F1"]
def add_dot(string):
    return ".".join(["".join(g)
                     for k, g in groupby(string, key=str.isdigit)])
[add_dot(i) for i in inp]
Output:
['7.EE.1', '4.NF.1', '5.NF.4.a', '8.F.1']
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I have a string I am trying to create a regex for in order to extract everything inside the brackets. An example of such a string is as follows
[-At(A),+CarAt(B),-CarAt(A),-InCar]
The current regex I'm using is re.search(r'\[.*?\]', string), but this only returns -At(A),-InCar instead of -At(A),+CarAt(B),-CarAt(A),-InCar
I am not sure why it's matching one set of parentheses in -At(A); I thought the regex I had would work because it would match everything between the brackets.
How can I get everything inside the brackets of my original string?
I think the problem is with the question mark. When a question mark comes after a quantifier, it makes that quantifier 'lazy', so it matches as few characters as possible.
So try using:
r'\[.*\]'
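To see the difference between the lazy and greedy forms, and to pull out only the text inside the brackets, a small sketch may help. The capture group and the second test string two are assumptions added for illustration.

```python
import re

s = "[-At(A),+CarAt(B),-CarAt(A),-InCar]"

# A capture group returns the text between the brackets without the brackets.
inner = re.search(r"\[(.*)\]", s).group(1)
print(inner)

# The two forms only differ when there is more than one closing bracket.
two = "[a][b]"
lazy = re.search(r"\[.*?\]", two).group()    # stops at the first "]"
greedy = re.search(r"\[.*\]", two).group()   # runs to the last "]"
print(lazy, greedy)
```

With a single closing bracket, as in s, both forms match the same span.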
You didn't say you wanted the individual members, but I suspect that will eventually be the case.
To get them, I've found it better to slice or .strip() the brackets off and then .split() this sort of string into its members before doing further validation:
>>> s = "[-At(A),+CarAt(B),-CarAt(A),-InCar]"
>>> s = s.strip('[]')
>>> s
'-At(A),+CarAt(B),-CarAt(A),-InCar'
>>> values = s.split(',')
>>> values
['-At(A)', '+CarAt(B)', '-CarAt(A)', '-InCar']
Using a regex to validate the individual results of this is often
- easier to write and explain
- better at highlighting mismatches than re.findall(), which will silently omit them
- much more computationally efficient (though it may not be for your case) than trying to do the operation in a single step
>>> import re
>>> RE_wanted = re.compile(r"[+-](At|Car|In){1,2}(\([A-Z]\))?")
>>> all((RE_wanted.match(a) for a in values))
True
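Building on the split-then-validate approach above, the following sketch reports which members fail instead of returning a single True or False. The names ITEM and split_and_check are assumptions, and fullmatch is used here (rather than match) so that trailing junk after a valid prefix is also flagged.

```python
import re

# Same pattern as RE_wanted above, under a hypothetical name.
ITEM = re.compile(r"[+-](At|Car|In){1,2}(\([A-Z]\))?")

def split_and_check(text):
    """Return (members, members that do not fit the expected shape)."""
    items = text.strip("[]").split(",")
    bad = [item for item in items if ITEM.fullmatch(item) is None]
    return items, bad

items, bad = split_and_check("[-At(A),+CarAt(B),-CarAt(A),-InCar]")
print(items)
print(bad)
```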
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I am new to regex and encountered a problem. I need to parse a list of last names and first names to use in a url and fetch an html page. In my last names or first names, if it's something like "John, Jr" then it should only return John but if it's something like "J.T.R", it should return "JTR" to make the url work. Here is the code I wrote but it doesn't capture "JTR".
import re
last_names_parsed = []
for ln in last_names:
    L_name = re.match(r'\w+', ln)
    last_names_parsed.append(L_name[0])
However, this will not capture J.T.R properly. How should I modify the code to properly handle both?
You can add \. to the regular expression:
import re
final_data = [re.sub(r'\.', '', re.findall(r'(?<=^)[a-zA-Z\.]+', i)[0]) for i in last_names]
Regex explanation:
(?<=^): positive lookbehind; ensures that the ensuing regex only registers a match if the match is found at the beginning of the string
[a-zA-Z\.]: matches any alphabetical character, [a-zA-Z], or a period .
+: repeats the preceding character class ([a-zA-Z\.]) one or more times, so matching continues for as long as a period or alphabetic character is found. For instance, in "John, Jr", only John will be matched, because the comma , is not included in the character class [a-zA-Z\.], thus halting the match.
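The same idea can be written as a small helper. This is a sketch under the assumption that names contain only ASCII letters and periods; the function name url_name is hypothetical. re.match already anchors at the start of the string, so no lookbehind is needed in this form.

```python
import re

def url_name(name):
    """Keep the leading run of letters and periods, then drop the periods."""
    m = re.match(r"[A-Za-z.]+", name)  # stops at the first comma, space, etc.
    return m.group().replace(".", "") if m else ""

print(url_name("John, Jr"))
print(url_name("J.T.R"))
```

Returning an empty string for names with no leading letters is a design choice for this sketch; raising an error may suit real data better.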
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
What is the fastest way to convert a string to a float if it doesn't have a standard format?
In my special case I need to read these strings and convert them to floats
-7.5-4
7.5-5
that correspond to the numbers -7.5E-4 and 7.5E-5
I need the fastest way because I'm loading large files.
Thanks
This lambda works with your test cases (also with a leading '+'):
to_num = lambda s: ((1, -1)[s[0] == '-'] *
                    float(s.lstrip('-+').replace('-', 'E-').replace('+', 'E+')))
The opening (1,-1)[s[0]=='-'] takes care of multiplying by -1 if there is a leading '-', then the float conversion strips leading '+' and '-' signs, and replaces embedded '+' and '-' with 'E+' and 'E-', permitting a valid conversion to float.
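An alternative sketch of the same conversion uses a single regex substitution: a sign that directly follows a digit is treated as the start of the exponent. This is a different technique from the lambda above, the names EXPONENT_SIGN and to_float are assumptions, and no claim is made here about which version is faster; that would need to be measured on the real files.

```python
import re

# A "+" or "-" that comes right after a digit marks the exponent;
# a leading sign has no digit before it and is left alone.
EXPONENT_SIGN = re.compile(r"(?<=\d)([+-])")

def to_float(text):
    """Convert strings such as '-7.5-4' to the float -7.5E-4."""
    return float(EXPONENT_SIGN.sub(r"E\1", text))

print(to_float("-7.5-4"))
print(to_float("7.5-5"))
```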