Shorten the regex usage syntax in Python [closed] - python

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I apologize because I am coming from Perl and I am new to Python.
The following example looks very long to me:
#!/usr/bin/python
import re
r = re.compile('(?i)m(|[ldf])(\d+)')
m = r.match(text)
if m:
    print m.group(2)
In Perl for example it is only one line and it's pretty readable.
#!/usr/bin/perl
print $2 if /m(|[ldf])(\d+)/i
How can I rewrite my Python example to be simpler, ideally as light as it is in Perl?
I am planning to write plenty of tests, and to keep my code readable I would like to avoid lines that do not help people understand my program. I guess that something like the following would be more readable than my first solution:
r = R()
if r.exec('(?i)m(|[ldf])(\d+)', text): print r.group(2)
if r.exec('(?i)k(|[rhe])(\d{2})', text): print r.group(2)
Unfortunately in this case I have to write a class for this.

The Python way values clarity over brevity, so things are generally going to be more verbose than they are in Perl. That said, the re.compile step is optional.
m = re.match('(?i)m(|[ldf])(\d+)', text)
if m:
    print m.group(2)
In Python, assignment statements are not expressions; they can't be used as values. So, at least before Python 3.8 added the := assignment expression, there's no way to skip the separate assignment statement (m = ...) or combine it with the if. And if you want to refer to the match object later, you do need an explicit assignment: there's no global implicit state analogous to the Perl $n variables that stores the capture groups automatically.
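On Python 3.8 or later, an assignment expression (:=) lets the assignment and the test share one line. The following is only a minimal sketch; the value of text is a hypothetical sample input, not something from the question.

```python
import re

text = "ML42"  # hypothetical sample input

# Python 3.8+: bind the match object and test it in the same expression.
if (m := re.match(r'(?i)m(|[ldf])(\d+)', text)):
    print(m.group(2))
```

The match object still has to be named explicitly; there is no implicit equivalent of Perl's $2.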

Related

Structural pattern matching in Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am trying to parse some open-source Python code to check whether the source contains some specific patterns.
For example:
for i in range...:
    if(i == 2):
        .......
I might want to find out whether the source code contains a pattern like the one above: an if statement inside a for loop. I know about regular-expression pattern matching, but it does not work for this case.
Does anyone know how to find this kind of pattern automatically? Is there a useful tool?
Use ast.parse().
import ast
code = '''
for i in range(1, 10):
    if (i == 2):
        print(i)
'''
parsed = ast.parse(code)
for stmt in parsed.body:
    if isinstance(stmt, ast.For):
        for stmt2 in stmt.body:
            if isinstance(stmt2, ast.If):
                print("Found if in for")
                break
This example is very simple: it only looks for a for statement at the top level of the code and an if statement directly inside its body. You should be able to extend it to a recursive solution that searches for nested constructs.
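One possible way to make the search recursive is ast.walk, which visits a node and all of its descendants. The sketch below is an illustration only; the function name has_if_inside_for is made up for this example.

```python
import ast


def has_if_inside_for(source):
    """Return True if any for loop in source contains an if at any depth."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.For):
            # ast.walk(node) yields the loop itself and every descendant,
            # so nested ifs (inside while loops, try blocks, ...) are found.
            if any(isinstance(child, ast.If) for child in ast.walk(node)):
                return True
    return False


nested = """
def f():
    for i in range(3):
        while True:
            if i == 2:
                break
"""
print(has_if_inside_for(nested))
```

Note that this counts an if anywhere under the loop, including inside a function defined in the loop body; tighten the check if that is too permissive for your use.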

Newbie need Help python regex [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have content like this:
aid: "1168577519", cmt_id = 1168594403;
Now I want to get all the number sequences:
1168577519
1168594403
by regex.
I have never dealt with regex before, but this time I need it for a parsing job.
For now I can only get the sequences after "aid" and "cmt_id" separately. I don't know how to merge the two patterns into one regex.
My current progress:
pattern = re.compile('(?<=aid: ").*?(?=",)')
print pattern.findall(s)
and
pattern = re.compile('(?<=cmt_id = ).*?(?=;)')
print pattern.findall(s)
There are many different approaches to designing a suitable regular expression which depend on the range of possible inputs you are likely to encounter.
The following would solve your exact question but could fail given different styled input. You need to provide more details, but this would be a start.
re_content = re.search("aid\: \"([0-9]*?)\",\W*cmt_id = ([0-9]*?);", input)
print re_content.groups()
This gives the following output:
('1168577519', '1168594403')
This example assumes that there might be other numbers in your input, and you are trying to extract just the aid and cmt_id values.
The simplest solution is to use re.findall.
Example
>>> import re
>>> string = 'aid: "1168577519", cmt_id = 1168594403;'
>>> re.findall(r'\d+', string)
['1168577519', '1168594403']
>>>
\d+ matches one or more digits.
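If other numbers may appear in the input and only the aid and cmt_id values are wanted, the two lookbehind patterns from the question can be joined with | into one pattern. This is a rough sketch that assumes the input always has exactly the spacing and quoting shown in the question.

```python
import re

s = 'aid: "1168577519", cmt_id = 1168594403;'

# Each alternative keeps its own fixed-width lookbehind, so only digits
# that directly follow 'aid: "' or 'cmt_id = ' are returned.
pattern = re.compile(r'(?<=aid: ")\d+|(?<=cmt_id = )\d+')
numbers = pattern.findall(s)
print(numbers)
```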

(Python) Coding style parentheses in ifs, loops, etc [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
Just your personal preference, which do you prefer?
if filename in filesAndFoldersList:
    while a != "TEST":
        a = input("Input: ")
Or
if(filename in filesAndFoldersList):
    while(a != "TEST"):
        a = input("Input: ")
Either one works, so I think this is just personal preference. The second one is more similar to Java/C++. But which do you prefer, and why?
You should never use parentheses directly after the keyword of a statement, as you do in your second style. It confuses the reader by making the statements look like function calls. All you are doing is grouping the expression in parentheses; Python ignores them, and all you have achieved is removing the space after the keyword.
Nor can you use this style with all compound statements: it does not work with a for loop, and (at least before Python 3.10 allowed parenthesized context managers) it does not work with a with statement that includes the as <target> clause.
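The difference between if/while and for can be checked without running any real code by handing small snippets to the built-in compile(). This is only an illustrative sketch; the helper name compiles and the snippet contents are made up for the example.

```python
def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Parentheses around an if or while condition are just a grouped expression.
print(compiles("if(x in y):\n    pass\n"))
print(compiles("while(a != 1):\n    pass\n"))

# A for loop is not "keyword + expression", so the same style is rejected.
print(compiles("for(x in y):\n    pass\n"))
```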
The Python Style Guide makes no mention of the second (parenthesized) style, at all; it makes the assumption that no-one would use it.
Note that this is separate from using parentheses around long expressions, where you use (...) around the if condition expression if it is otherwise too long to fit on a single line. In such a situation you want to put a space between the opening ( and the if keyword:
if (
    this_is_one_thing and
    that_is_another_thing or
    (more_conditions and such_things)
):
    do_something()

Is there a Python convention to avoid long lines of code? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Should I create variables just to avoid having long lines of code? For example, in the code below the variable stream_records is only used once after it's set.
stream_records = stream.get_latest_records( num_records_to_correlate ).values('value')
stream_values = [float(record['value']) for record in stream_records]
Should I have done this instead?
stream_values = [float(record['value']) for record in stream.get_latest_records( num_records_to_correlate ).values('value')]
I am trying to optimize for readability. I'd love some opinions on whether it's harder to have to remember lots of variable names or harder to read long lines of code.
EDIT:
Another interesting option to consider for readability (thanks to John Smith Optional):
stream_values = [
    float(record['value'])
    for record in stream.get_latest_records(
        num_records_to_correlate
    ).values('value')
]
PEP 8 is Python's long-standing style guide, and it recommends limiting source lines to 79 characters.
Using a variable for intermediate results also performs a kind of documentation, if you do your variable naming well.
Newlines inside parentheses are treated as ordinary whitespace.
The same applies to newlines inside brackets, as pointed out by Bas Swinckels.
So you could do something like this:
stream_values = [
    float(record['value'])
    for record in stream.get_latest_records(
        num_records_to_correlate
    ).values('value')
]
You can also use \ to continue a statement on the following line.
For example:
long_variable_name = object1.attribute1.method1(arg1, arg2, arg3) + \
    object2.attribute2.method2(arg1, arg2, arg3)
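PEP 8 prefers implied continuation inside parentheses over a trailing backslash, so the same statement can usually be wrapped without \. A small sketch, using hypothetical integer values in place of the method calls above:

```python
# Hypothetical values standing in for the two method-call results.
first_part = 10
second_part = 32

# Backslash continuation: the backslash must be the last character on the line.
with_backslash = first_part + \
    second_part

# Implied continuation: wrapping the expression in parentheses is enough.
with_parens = (
    first_part
    + second_part
)

print(with_backslash == with_parens)
```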
The first one is definitely easier to read, so if there are no performance concerns I would go for more variables, as long as they have self-explanatory names. It is also much easier to debug if somebody other than you has to do it.
The first one is indeed more readable. Creating a variable doesn't cost you much here, so you can keep it in your code. Plus, if you are debugging, it will be much easier to see whether the error comes from the call to get_latest_records or from your list comprehension.
However, another nice way would be to construct your list with the map function:
stream_values = map(lambda x: float(x['value']),
                    stream.get_latest_records(num_records_to_correlate)
                    .values('value'))
The lambda is necessary here to apply float to the correct value of each dictionary. As you can see, I have limited the number of characters per line to respect the PEP 8 convention, which can help make the code readable too. Note that in Python 3, map returns an iterator, so wrap the call in list() if you need an actual list.

Regex not working in python script [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
For some reason, when I use a regex to get the number I need, it returns None.
But when I run it here http://regexr.com/38n3o it works.
The regex is designed to get the last number of the IP so it can be removed.
lanip=74.125.224.72
notorm=re.search("/([1-9])\w+$/g", lanip)
That is not how you define a regular expression in Python. The correct way would be:
import re
lanip="74.125.224.72"
notorm=re.search("([1-9])\w+$", lanip)
print notorm
Output:
<_sre.SRE_Match object at 0x10131df30>
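You were using JavaScript regex literal syntax: the surrounding slashes and the trailing g flag are not part of the pattern in Python. To learn more about the correct Python syntax, read the documentation.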
If you want to match the last number of an IP use:
import re
lanip="74.125.224.72"
notorm=re.search("(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)", lanip)
print notorm.group(4)
Output:
72
Regex used from http://www.regular-expressions.info/examples.html
Your example did work in this scenario, but would match a lot of false positives.
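If the goal is only to find the last number and strip it, a much shorter pattern anchored at the end of the string may be enough. This is a minimal sketch that assumes lanip already holds a well-formed dotted address; it does not validate the octets.

```python
import re

lanip = "74.125.224.72"

# Capture the digits after the final dot; $ anchors at the end of the string.
match = re.search(r"\.(\d{1,3})$", lanip)
last_octet = match.group(1) if match else None

# Remove the final dot and number, leaving the first three octets.
prefix = re.sub(r"\.\d{1,3}$", "", lanip)

print(last_octet)
print(prefix)
```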
What is lanip's type? That can't run.
It needs to be a string, i.e.
lanip = "74.125.224.72"
Also, your RE syntax looks strange; make sure you've read the documentation on Python's RE syntax.
