Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 2 years ago.
I have a string and am trying to write a regex that extracts everything inside the brackets. An example of such a string is as follows:
[-At(A),+CarAt(B),-CarAt(A),-InCar]
The current regex I'm using is re.search(r'\[.*?\]', string), but this only returns -At(A),-InCar instead of -At(A),+CarAt(B),-CarAt(A),-InCar
I am not sure why it's matching one set of parentheses in -At(A); I thought the regex I had would work because it would match everything between the brackets.
How can I get everything inside the brackets of my original string?
I think the problem is with the question mark: when a question mark follows a quantifier, it makes that quantifier 'lazy'.
So try to use:
r'\[.*\]'
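If only the text between the brackets is wanted (without the brackets themselves), one option is a capturing group. A minimal sketch using the example string from the question; the names s, match and inside are just for illustration:

```python
import re

s = "[-At(A),+CarAt(B),-CarAt(A),-InCar]"

# The parentheses form a capturing group around everything between
# the first '[' and the last ']' (the quantifier here is greedy).
match = re.search(r'\[(.*)\]', s)

# group(0) would be the whole match including brackets;
# group(1) is only the captured contents.
inside = match.group(1) if match else None
print(inside)
```

Checking for None before calling group() avoids an AttributeError when the string has no bracketed part.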
You didn't say you wanted the contained members, but I suspect that is the eventual goal.
For that, I've found it better to slice or .strip() the brackets off and then .split() this sort of string to get its members before doing further validation:
>>> s = "[-At(A),+CarAt(B),-CarAt(A),-InCar]"
>>> s = s.strip('[]')
>>> s
'-At(A),+CarAt(B),-CarAt(A),-InCar'
>>> values = s.split(',')
>>> values
['-At(A)', '+CarAt(B)', '-CarAt(A)', '-InCar']
Using a regex to validate the individual results is often
easier to write and explain
better at highlighting mismatches than re.findall(), which silently omits them
sometimes much more computationally efficient (though it may not be in your case) than trying to do the whole operation in a single step
>>> import re
>>> RE_wanted = re.compile(r"[+-](At|Car|In){1,2}(\([A-Z]\))?")
>>> all(RE_wanted.fullmatch(a) for a in values)
True
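To illustrate the point about highlighting mismatches, here is a rough sketch that reports which members fail validation instead of only returning True or False. The helper name find_invalid and the malformed sample input are made up for this example; the pattern is the one from the answer above.

```python
import re

RE_WANTED = re.compile(r"[+-](At|Car|In){1,2}(\([A-Z]\))?")

def find_invalid(raw):
    """Return the members of a bracketed, comma-separated string
    that do not fully match the expected pattern."""
    values = raw.strip('[]').split(',')
    # fullmatch requires the whole member to match, not just a prefix
    return [v for v in values if not RE_WANTED.fullmatch(v)]

bad = find_invalid("[-At(A),+CarAt(B),oops,-InCar]")
print(bad)
```

With re.findall() over the whole string, the malformed member would simply be missing from the result rather than being reported.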
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 1 year ago.
I am trying to detect whether a string starts with any element of an array. That is, I want to know if the string (msg) begins with any element from the array (prefixes).
I want this because I am making a Discord bot with multiple prefixes. Here's my garbage if statement:
if msg.startswith(prefixes[any]):
The existing answers show two ways of doing a linear search, and this is probably your best choice.
If you need something more scalable (i.e., you have a lot of potential prefixes, they're very long, and/or you need to scan very frequently), then you could write a prefix tree. That's the canonical search structure for this problem, but it's obviously a lot more work, and you still need to profile to see whether it's really worthwhile for your data.
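For reference, a prefix tree can be sketched with nested dictionaries. This is only a minimal illustration, not a tuned implementation; the class name PrefixTrie, its match method and the sample prefixes are all invented for this example.

```python
class PrefixTrie:
    """Minimal prefix tree built from nested dicts (illustrative only)."""

    def __init__(self, prefixes):
        self.root = {}
        for prefix in prefixes:
            node = self.root
            for ch in prefix:
                node = node.setdefault(ch, {})
            node[None] = True  # None key marks the end of a stored prefix

    def match(self, text):
        """Return the shortest stored prefix that text starts with, or None."""
        node = self.root
        for i, ch in enumerate(text):
            if None in node:
                return text[:i]
            node = node.get(ch)
            if node is None:
                return None
        # Ran out of text: it matches only if it is itself a stored prefix
        return text if None in node else None

trie = PrefixTrie(['!', '?', 'bot '])
print(trie.match('!help'))
print(trie.match('hello'))
```

Each lookup walks at most len(text) characters regardless of how many prefixes are stored, which is where the scalability comes from. For a handful of short prefixes, the linear search is likely simpler and fast enough.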
Try something like this:
prefixes = ('a','b','i')
if msg.startswith(prefixes):
The prefixes must be a tuple because the startswith function does not accept a list as a parameter.
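If the prefixes already live in a list, converting with tuple() at the call site is enough. A small sketch, with a made-up msg and prefix list:

```python
prefixes = ['!', '?', '$']  # a list, e.g. loaded from configuration
msg = '!help'

# Passing the list directly raises TypeError
try:
    msg.startswith(prefixes)
    list_accepted = True
except TypeError:
    list_accepted = False

# Converting to a tuple works
starts = msg.startswith(tuple(prefixes))
print(list_accepted, starts)
```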
There are dedicated algorithms for such a search; however, a functional implementation in Python may look like this:
prefixes = ['foo', 'bar']
string = 'foobar'
result = any(map(lambda x: string.startswith(x), prefixes))
If you search for x at any position in string, then change string.startswith(x) to x in string.
UPDATE
According to @MisterMiyagi in the comments, the following is a more readable (and possibly more efficient) statement:
result = any(string.startswith(prefix) for prefix in prefixes)
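For a bot, it is often useful to know which prefix matched so that it can be stripped from the message. One possible sketch using next() with a default; the function name split_prefix and the sample prefixes are assumptions for illustration:

```python
def split_prefix(msg, prefixes):
    """Return (matched_prefix, rest_of_message), or (None, msg) if no prefix matches."""
    # next() yields the first prefix that matches, or None if none do
    matched = next((p for p in prefixes if msg.startswith(p)), None)
    if matched is None:
        return None, msg
    return matched, msg[len(matched):]

prefix, command = split_prefix('!help', ['!', '?'])
print(prefix, command)
```

Prefixes are tried in list order, so if one prefix is itself the start of another, the order of the list decides which one wins.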
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I am trying to use regex to see if the given string is an IPv4 address. I want to return a boolean value True/False depending on the string. This is my code:
import re
def isIPv4Address(inputString):
    pattern = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s')
    return pattern.match(inputString)
The value is null. At this point, I can tell that the function does not return a boolean value. However, all the questions I see about regex and IP addresses are about writing the pattern rather than a full implementation. I know that the actual implementation shouldn't be any longer than this because it just takes the input and compares it against the regex.
match returns a re.Match object, or None if the expression doesn't match. If you want to return a boolean indicating whether the regex matches, you probably want to use pattern.match(inputString) is not None
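Putting that together, a full boolean-returning function might look like the sketch below. Two things here go beyond the answer above and are assumptions on my part: the trailing \s from the question's pattern is dropped (it requires a whitespace character after the address, so a bare address never matches), and each octet is additionally checked to be at most 255.

```python
import re

# Four dot-separated groups of one to three ASCII digits
IPV4_SHAPE = re.compile(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')

def is_ipv4_address(input_string):
    """Return True if input_string looks like a dotted-quad IPv4 address."""
    # fullmatch returns a Match object or None, so compare against None
    if IPV4_SHAPE.fullmatch(input_string) is None:
        return False
    # The regex only checks the shape; the numeric range is checked here
    return all(int(octet) <= 255 for octet in input_string.split('.'))

print(is_ipv4_address('192.168.0.1'))
print(is_ipv4_address('256.1.1.1'))
```

If a regex is not a requirement, the standard library's ipaddress module is another way to validate addresses.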
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I am new to regex and encountered a problem. I need to parse a list of last names and first names to use in a url and fetch an html page. In my last names or first names, if it's something like "John, Jr" then it should only return John but if it's something like "J.T.R", it should return "JTR" to make the url work. Here is the code I wrote but it doesn't capture "JTR".
import re
last_names_parsed = []
for ln in last_names:
    # match the leading run of word characters
    L_name = re.match(r'\w+', ln)
    last_names_parsed.append(L_name[0])
However, this will not capture J.T.R properly. How should I modify the code to properly handle both?
You can add \. to the regular expression:
import re
final_data = [re.sub(r'\.', '', re.findall(r'(?<=^)[a-zA-Z\.]+', i)[0]) for i in last_names]
Regex explanation:
(?<=^): positive lookbehind; ensures that the ensuing regex only registers a match if the match is found at the beginning of the string
[a-zA-Z\.]: matches any alphabetical character ([a-zA-Z]) or a period (.)
+: repeats the preceding character class ([a-zA-Z\.]) one or more times, for as long as a period or alphabetic character is found. For instance, in "John, Jr", only John will be matched, because the comma is not included in the character class [a-zA-Z\.], which halts the match.
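The same idea can be written without the lookbehind, since re.match already anchors at the start of the string. A small sketch under that assumption; the function name clean_name and the empty-string fallback are choices made for this example:

```python
import re

def clean_name(name):
    """Keep the leading run of letters and periods, then drop the periods."""
    # re.match only matches at the beginning of the string
    m = re.match(r'[a-zA-Z.]+', name)
    return m.group(0).replace('.', '') if m else ''

last_names = ["John, Jr", "J.T.R"]
parsed = [clean_name(n) for n in last_names]
print(parsed)
```

Returning an empty string for names that start with some other character avoids the IndexError that indexing [0] on an empty findall result would raise.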
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I am having a hard time understanding str.partition() function in python. I have read the definition of the function and searched online without finding an explanation that makes sense to me.
I have some code that uses it pretty heavily and have been trying to understand it. I could post the code if it would help but it is a pretty precise code segment that would probably complicate things.
Need in-depth, probably low-level, explanation of str.partition() function in python.
The docs are pretty clear ...
Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.
So ...
>>> 'test'.partition('s')
('te', 's', 't')
>>> 'test'.partition('a')
('test', '', '')
You either get the front, splitter character, and tail, or you get the full string and two blank strings (depending on whether or not the partition character is present).
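A common use is splitting a key=value string, where only the first separator should count. The sketch below also shows str.rpartition(), which splits at the last occurrence instead; the sample strings are made up:

```python
line = 'key=a=b'

# partition splits at the FIRST '='
head, sep, tail = line.partition('=')

# rpartition splits at the LAST '='
rhead, rsep, rtail = line.rpartition('=')

# No separator present: the original string plus two empty strings
missing = 'novalue'.partition('=')

print((head, sep, tail), (rhead, rsep, rtail), missing)
```

Because the result always has exactly three elements, it can be unpacked safely even when the separator is absent, unlike split().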
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I am using Python to do some text comparison. The text format is like 44=100. Say I have two texts, 44=100 and 44=3001. I call the string to the left of = the tag and the string to the right the value. I need to compare both the tag and the value. The tags must be identical (44 equals 44), but the values only need to share the same format. For example, 100 and 3001 have the same format (plain digits), but 1.0E+7 in 44=1.0E+7 is different. The point is the value comparison: if I write a script comp.py, running comp.py 2000 30010 should output true, while running comp.py 100000 1.0E+8 should output false. How can I do this? I am thinking about converting the value into a regular expression and comparing it with the other one.
pseudo code:
rex1 = '100000'.getRegrex(), rex2 = '1.0E+8'.getRegrex(), rex1.compare(rex2)
Is this a feasible approach? Any advice?
rex1 = '100000'.getRegrex(), rex2 = '1.0E+8'.getRegrex(), rex1.compare(rex2)
Your approach is wrong. It is not only difficult but also illogical to "deduce" a regexp from a given string. What you would do is:
Define your types. With each type you would have a corresponding regexp.
Compare your input text against all your defined types and check which type it is of.
Compare the two types.
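The three steps above could be sketched roughly as follows. The type names and their patterns are assumptions chosen to fit the examples in the question; real data would likely need more types or looser patterns.

```python
import re

# Step 1: define the types, each with a corresponding regexp (hypothetical set)
VALUE_TYPES = {
    'integer': re.compile(r'[0-9]+'),
    'scientific': re.compile(r'[0-9]+\.[0-9]+E[+-][0-9]+'),
}

def classify(value):
    """Step 2: return the name of the first type whose regexp fully matches."""
    for name, pattern in VALUE_TYPES.items():
        if pattern.fullmatch(value):
            return name
    return None

def same_format(a, b):
    """Step 3: two values share a format if they classify to the same known type."""
    kind = classify(a)
    return kind is not None and kind == classify(b)

def same_field(a, b):
    """Compare two tag=value strings: tags must be equal, values share a format."""
    tag_a, _, value_a = a.partition('=')
    tag_b, _, value_b = b.partition('=')
    return tag_a == tag_b and same_format(value_a, value_b)

print(same_format('2000', '30010'))
print(same_format('100000', '1.0E+8'))
```

Values that match none of the defined types are treated as not comparable here, which is one possible design choice among several.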
Actually, your idea of rex1 = '100000'.getRegrex() could be done with re.compile:
rex1 = re.compile('100000')
But as Thustmaster pointed out, you may want to define the regular expression so that it describes the general pattern of your data rather than one literal value.
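One caveat when compiling a literal value directly: characters such as . and + are regex metacharacters, so a value like 1.0E+8 does not even match itself unless it is escaped first. A short sketch of the difference, using re.escape:

```python
import re

literal = '1.0E+8'

# Compiled as-is, '.' means "any character" and 'E+' means "one or more E"
naive = re.compile(literal)

# re.escape backslash-escapes the metacharacters so the text matches literally
escaped = re.compile(re.escape(literal))

naive_result = naive.fullmatch(literal)
escaped_result = escaped.fullmatch(literal)
print(naive_result, escaped_result)
```

This is another reason a literal-derived regex is of limited use here: once escaped, it matches only that exact string and says nothing about its format.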