Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am trying to parse some open-source Python code to check whether the source contains specific patterns.
For example:
for i in range...:
    if(i == 2):
        .......
I might want to check whether the source code contains a pattern like the one above: an if statement inside a for loop. I know about regular-expression pattern matching, but it does not work for this case.
Does anyone know how to find this kind of pattern automatically? Is there a useful tool?
Use ast.parse().
import ast
code = '''
for i in range(1, 10):
    if (i == 2):
        print(i)
'''
parsed = ast.parse(code)
for stmt in parsed.body:
    if isinstance(stmt, ast.For):
        for stmt2 in stmt.body:
            if isinstance(stmt2, ast.If):
                print("Found if in for")
                break
This example is very simple: it only looks for a for loop at the top level of the code and an if directly inside that loop's body. You should be able to extend it to a recursive solution that searches for nested constructs.
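One way to get the recursive behaviour is to let ast.walk do the traversal. The following is only a sketch: has_if_inside_for is a hypothetical helper name, and it assumes "inside" means anywhere within the loop body at any depth (the loop's else clause is ignored).

```python
import ast

def has_if_inside_for(source):
    """Return True if an `if` statement appears anywhere inside a `for` body."""
    tree = ast.parse(source)
    # ast.walk visits every node in the tree, so nested loops are found too.
    for node in ast.walk(tree):
        if isinstance(node, ast.For):
            for stmt in node.body:
                # Walk each body statement, including the statement itself.
                for inner in ast.walk(stmt):
                    if isinstance(inner, ast.If):
                        return True
    return False

sample = """
def f(items):
    for item in items:
        while item:
            if item == 2:
                print(item)
            item -= 1
"""
print(has_if_inside_for(sample))
```

Because both the outer and the inner search use ast.walk, the for loop may sit inside a function or class, and the if may be nested several levels below the loop.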
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed last year.
I am trying to match the following pattern
sn,n+1
where n is an integer. Examples include
s1,2 s3,4
Right now I know how to match the following: sn,n
(for example s1,1) with regex. The syntax I use for this is s(\d+),\1
Is it possible to do something like the following? s(\d+),\1+1
More detail on my specific problem (not necessarily relevant to the solution, but included anyway): I am using CST, an electromagnetic simulator that supports regex for sorting S-parameter data. With a high port count, it is cumbersome to select the isolation between ports manually, so I want to use a regex in the way described above.
Regex cannot do arithmetic, but Python can. You can find all candidate matches in your string and then filter them with Python. This might help:
Regex
import re
string = 's1,2 test s2,3 , test test s12,12'
pattern = r's\d+,\d+'
pattern_list = re.findall(pattern, string)
print(pattern_list)
['s1,2', 's2,3', 's12,12']
Filtering with Python
result_list = [i for i in pattern_list if int(i.split(',')[0].split('s')[1]) + 1 == int(i.split(',')[1]) ]
print(result_list)
['s1,2', 's2,3']
Basically, the regex finds every token of the form sn,m, and then Python keeps only the ones that fit sn,n+1.
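The same find-then-filter idea can be written with capture groups, which avoids the chained split calls. This is just a sketch; find_consecutive_pairs is a hypothetical helper name.

```python
import re

def find_consecutive_pairs(text):
    """Return every 's<n>,<m>' token in text where m == n + 1."""
    pattern = re.compile(r's(\d+),(\d+)')
    # The regex captures both numbers; the arithmetic check is done in Python.
    return [m.group(0) for m in pattern.finditer(text)
            if int(m.group(1)) + 1 == int(m.group(2))]

print(find_consecutive_pairs('s1,2 test s2,3 , test test s12,12'))
# ['s1,2', 's2,3']
```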
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I apologize because I am coming from Perl and I am new to Python.
The following example looks very long to me:
#!/usr/bin/python
import re
r = re.compile(r'(?i)m(|[ldf])(\d+)')
m = r.match(text)
if m:
    print(m.group(2))
In Perl for example it is only one line and it's pretty readable.
#!/usr/bin/perl
print $2 if /m(|[ldf])(\d+)/i
How can I rewrite my Python example to be simpler, ideally as light as it is in Perl?
I am planning to write plenty of tests, and to keep my code readable I would like to avoid spending lines that do not help people understand my program. I guess that something like the following would be more readable than my first solution:
r = R()
if r.exec('(?i)m(|[ldf])(\d+)', text): print r.group(2)
if r.exec('(?i)k(|[rhe])(\d{2})', text): print r.group(2)
Unfortunately in this case I have to write a class for this.
The Python way values clarity over brevity, so things are generally going to be more verbose than they are in Perl. That said, the re.compile step is optional.
m = re.match(r'(?i)m(|[ldf])(\d+)', text)
if m:
    print(m.group(2))
In Python, an assignment statement is not an expression, so it can't be used as a value. On Python versions before 3.8 there is therefore no way to skip the separate assignment statement (m = ...) or combine it with the if; Python 3.8 added the assignment expression operator := for this case. Either way, if you want to refer to the match object later, you do need an explicit assignment: there's no global implicit state analogous to the Perl $n variables that stores the capture groups automatically.
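On Python 3.8 or later, the assignment expression operator := lets the match and the test share one line. A minimal sketch, in which second_group is a hypothetical helper name and 'ML42' is made-up sample text:

```python
import re

def second_group(pattern, text):
    """Return capture group 2 of pattern matched at the start of text, or None."""
    # := binds the match object and tests it in the same expression (Python 3.8+).
    if (m := re.match(pattern, text)):
        return m.group(2)
    return None

print(second_group(r'(?i)m(|[ldf])(\d+)', 'ML42'))
# 42
```

The match object is still bound to an explicit name, so this shortens the code without introducing any hidden global state.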
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have content like this:
aid: "1168577519", cmt_id = 1168594403;
Now I want to get all the number sequences:
1168577519
1168594403
by regex.
I have never worked with regex before, but this time I need it for a parsing job.
So far I can only get the sequences after "aid" and "cmt_id" separately. I don't know how to merge them into one regex.
My current progress:
pattern = re.compile(r'(?<=aid: ").*?(?=",)')
print(pattern.findall(s))
and
pattern = re.compile(r'(?<=cmt_id = ).*?(?=;)')
print(pattern.findall(s))
There are many different approaches to designing a suitable regular expression which depend on the range of possible inputs you are likely to encounter.
The following would solve your exact question but could fail on differently styled input. You need to provide more details, but this would be a start.
re_content = re.search(r'aid: "([0-9]*?)",\W*cmt_id = ([0-9]*?);', s)
print(re_content.groups())
This gives the following output:
('1168577519', '1168594403')
This example assumes that there might be other numbers in your input, and you are trying to extract just the aid and cmt_id values.
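If the goal is a single regex that covers both keys, one option is to capture the key name alongside its number. This is a rough sketch, assuming the keys are literally aid and cmt_id and that only non-word characters separate a key from its digits; extract_ids is a hypothetical helper name.

```python
import re

def extract_ids(text):
    """Map each known key ('aid', 'cmt_id') to the digits that follow it."""
    # \W+ skips separators such as ': "' or ' = ' between the key and its number.
    pattern = re.compile(r'\b(aid|cmt_id)\W+(\d+)')
    return {key: value for key, value in pattern.findall(text)}

print(extract_ids('aid: "1168577519", cmt_id = 1168594403;'))
# {'aid': '1168577519', 'cmt_id': '1168594403'}
```

Unlike a bare \d+ search, this ignores numbers that are not attached to one of the two keys.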
The simplest solution is to use re.findall
Example
>>> import re
>>> string = 'aid: "1168577519", cmt_id = 1168594403;'
>>> re.findall(r'\d+', string)
['1168577519', '1168594403']
>>>
\d+ matches one or more digits.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
For some reason, when I use a regex to get the number I need, it returns None.
But when I run it here http://regexr.com/38n3o it works.
The regex was designed to get the last number of the IP so it can be removed.
lanip=74.125.224.72
notorm=re.search("/([1-9])\w+$/g", lanip)
That is not how you define a regular expression in Python. The correct way would be:
import re
lanip="74.125.224.72"
notorm = re.search(r"([1-9])\w+$", lanip)
print(notorm)
Output:
<_sre.SRE_Match object at 0x10131df30>
You were using JavaScript-style regex syntax (the /.../g delimiters and flag). To learn more about the correct Python syntax, read the documentation.
If you want to match the last number of an IP use:
import re
lanip="74.125.224.72"
notorm = re.search(r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)", lanip)
print(notorm.group(4))
Output:
72
Regex used from http://www.regular-expressions.info/examples.html
Your example did work in this scenario, but would match a lot of false positives.
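Since the question mentions wanting the last number so that it can be removed, a shorter pattern may be enough when the input is already known to be an IPv4 address. The sketch below makes that assumption and does not validate octet ranges; split_last_octet is a hypothetical helper name.

```python
import re

def split_last_octet(ip):
    """Return (prefix, last_octet) for a dotted address, or None if it doesn't fit."""
    # Greedy .* backtracks to the final dot, so group 2 is the last number.
    m = re.search(r'^(.*)\.(\d{1,3})$', ip)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(split_last_octet("74.125.224.72"))
# ('74.125.224', '72')
```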
What is lanip's type? That can't run.
It needs to be a string, i.e.
lanip = "74.125.224.72"
Also, your RE syntax looks strange; make sure you've read the documentation on Python's RE syntax.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
So I want to do something like this:
"(1.0)" which returns ["1","0"]
and similarly "((1.0).1)" which returns ["(1.0)", "1"]
How do I do this in Python? Thanks for the help.
So basically I want to break the string "(1.0)" into a list [1,0], where the dot is the separator.
Some examples:
((1.0).(2.0)) -> [(1.0), (2.0)]
(((1.0).(2.0)).1) -> [((1.0).(2.0)), 1]
I hope this is more clear.
Here is my version:
def countPar(s):
    s=s[1:-1]
    openPar=0
    for (i,c) in enumerate(s):
        if c=="(":
            openPar+=1
        elif c==")":
            openPar-=1
        if openPar==0:
            break
    return [s[0:i+1],s[i+2:]]
You'll need to build a little parser. Iterate through the characters of the string, keeping track of the current nesting level of parentheses. Then you can detect the . you care about by checking that first, the character is a ., and second, there's only one level of parentheses open at that point. Then just place the characters in one buffer or another depending on whether you've reached that . or not.
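The approach described above might look roughly like this. It is a sketch that uses slicing instead of two character buffers, and it assumes the whole string is wrapped in one outer pair of parentheses; split_pair is a hypothetical helper name.

```python
def split_pair(s):
    """Split '(left.right)' at the dot that sits at nesting depth 1."""
    depth = 0
    for i, c in enumerate(s):
        if c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
        elif c == '.' and depth == 1:
            # Drop the outer parentheses and cut at this dot.
            return [s[1:i], s[i + 1:-1]]
    raise ValueError("no top-level '.' found in %r" % s)

print(split_pair("(1.0)"))
# ['1', '0']
print(split_pair("(((1.0).(2.0)).1)"))
# ['((1.0).(2.0))', '1']
```

Because the split point is found by depth rather than by position, operands longer than one character, such as "(12.34)", are handled as well.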