About python re raw pattern search [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I want to perform re.search using the pattern as a raw string like below.
m=re.search(r'pattern',string)
But if I have the pattern in a variable, like pat='pattern', how do I perform a raw search?

You declare the pattern string as a raw string:
regexpattern = r'pattern'
m=re.search(regexpattern,string)
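To make the point concrete, here is a minimal sketch showing that the r prefix only affects how a literal is typed in source code, so a pattern already stored in a variable needs no extra "raw" step. The sample text and the names pat_raw, pat_plain and needle are made up for illustration. The second half assumes you want the variable's contents matched literally, in which case re.escape() is the relevant tool.

```python
import re

# The r prefix only changes how the literal is written in source code.
# Both spellings produce the same ordinary str object.
pat_raw = r'\d+\.\d+'
pat_plain = '\\d+\\.\\d+'
assert pat_raw == pat_plain

text = "price: 3.14 (approx.)"  # hypothetical sample text

# A pattern held in a variable is passed to re.search unchanged.
pat = pat_raw
m = re.search(pat, text)
print(m.group(0))

# If the variable holds text that should be matched literally,
# escape its regex metacharacters first.
needle = "(approx.)"
m2 = re.search(re.escape(needle), text)
print(m2.group(0))
```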

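You can build the pattern from a variable this way, where test is the string variable. Triple quotes do not make a string raw; the pattern is an ordinary string either way. If test should be matched literally, pass it through re.escape() first.
pat = """pat%s""" % test
pattern = re.compile(pat, re.I | re.M)
match = pattern.search(l)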

Related

How to match the pattern using re using the commas? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
How do I match the following pattern using re?
2016-02-13 02:00:00.0,3525,http://www.heatherllindsey.com/2016/02/my-husband-left-his-9-5-job-for-good-it.html,158,0,2584490
I used Python's split() function to separate the attributes, but because the data is huge, the process gets killed with memory errors.
It would help if you posted a longer sample of the data.
In the meantime, here is one way to do it:
import re
line = "2016-02-13 02:00:00.0,3525,http://www.heatherllindsey.com/2016/02/my-husband-left-his-9-5-job-for-good-it.html,158,0,2584490"
# Capture every field that is followed by a comma.
pattern = re.compile(r"(.*?),", re.DOTALL)  # re.DOTALL lets a field span line breaks
result = pattern.findall(line)
# The last field (2584490) has no trailing comma, so the pattern misses it; take it separately.
result.append(line.rsplit(",", 1)[-1])
print(result)
It works...
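Since the question mentions memory errors on huge data, a streaming approach may help more than a different regex. The sketch below assumes the records arrive one per line and uses the standard csv module to split each line as it is read. The helper name iter_rows and the in-memory sample are made up for illustration; with real data you would pass an open file handle instead.

```python
import csv
import io

# Stand-in for a large file; with real data, pass open(path, newline="") instead.
sample = io.StringIO(
    "2016-02-13 02:00:00.0,3525,"
    "http://www.heatherllindsey.com/2016/02/my-husband-left-his-9-5-job-for-good-it.html,"
    "158,0,2584490\n"
)

def iter_rows(handle):
    """Yield one list of fields per line without loading the whole input."""
    for row in csv.reader(handle):
        if row:  # skip blank lines
            yield row

# list() is used only for this demo; iterate lazily over real data.
rows = list(iter_rows(sample))
print(rows[0])
```

Because the generator holds only one row at a time, memory use stays roughly constant regardless of input size, as long as the caller does not collect all rows into a list.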

Newbie need Help python regex [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have content like this:
aid: "1168577519", cmt_id = 1168594403;
Now I want to get all the number sequences:
1168577519
1168594403
using a regex.
I have never worked with regex before, but this time I need it for a parsing job.
So far I can only get the sequences after "aid" and "cmt_id" separately. I don't know how to merge the two patterns into one regex.
My current progress:
pattern = re.compile('(?<=aid: ").*?(?=",)')
print pattern.findall(s)
and
pattern = re.compile('(?<=cmt_id = ).*?(?=;)')
print pattern.findall(s)
There are many ways to design a suitable regular expression; which one fits depends on the range of inputs you are likely to encounter.
The following solves your exact question but could fail on differently styled input. You need to provide more details, but this is a start.
re_content = re.search(r'aid: "([0-9]*?)",\W*cmt_id = ([0-9]*?);', s)
print(re_content.groups())
This gives the following output:
('1168577519', '1168594403')
This example assumes that there might be other numbers in your input, and you are trying to extract just the aid and cmt_id values.
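As a sketch of how the two patterns from the question could be merged, here are two options, both assuming the input looks exactly like the sample line. The first joins the asker's two lookbehinds with alternation (narrowed to \d+); the second uses named groups so each number keeps its label. The group names aid and cmt_id are chosen here for illustration.

```python
import re

s = 'aid: "1168577519", cmt_id = 1168594403;'

# Option 1: join the two lookbehind patterns with alternation.
merged = re.compile(r'(?<=aid: ")\d+|(?<=cmt_id = )\d+')
numbers = merged.findall(s)
print(numbers)

# Option 2: one pattern with named groups, so each value keeps its label.
labelled = re.compile(r'aid: "(?P<aid>\d+)",\s*cmt_id = (?P<cmt_id>\d+);')
match = labelled.search(s)
ids = match.groupdict() if match else {}
print(ids)
```

Option 1 finds the values wherever they appear and in any order; option 2 requires both fields to appear together in the order shown.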
The simplest solution is to use re.findall
Example
>>> import re
>>> string = 'aid: "1168577519", cmt_id = 1168594403;'
>>> re.findall(r'\d+', string)
['1168577519', '1168594403']
>>>
\d+ matches one or more digits.

python regular expression grammar [closed]

Closed 9 years ago.
... html ...
[{"url":"/test/test/url","id":"111111"},{"url":"/test/test/url","id":"111111"}, {"url":"/test/test/url","id":"1111"}]
.... html ...
I have a JSON-style string embedded in some HTML.
How do I write a regular expression that extracts values such as
"/test/test/url" and the "1111" that comes after "id":?
Thanks in advance,
Don't use regular expressions here, use the json module. This is what it's designed for.
import json
mylist = json.loads(html)  # html must hold only the JSON text, not the surrounding markup
for subdict in mylist:
    print(subdict['url'])
    print(subdict['id'])
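json.loads() only accepts the JSON text itself, so when the array sits inside surrounding HTML it has to be cut out first. The sketch below assumes the page contains exactly one array of objects and that it sits on a single line; the sample html string is made up to mirror the question.

```python
import json
import re

# Hypothetical page content modelled on the question.
html = (
    '<p>... html ...</p>\n'
    '[{"url":"/test/test/url","id":"111111"}, {"url":"/test/test/url","id":"1111"}]\n'
    '<p>.... html ...</p>'
)

# Assumption: one JSON array of objects, on one line ('.' does not cross newlines here).
found = re.search(r'\[\{.*\}\]', html)
records = json.loads(found.group(0)) if found else []

pairs = [(rec["url"], rec["id"]) for rec in records]
print(pairs)
```

Here the regex only locates the JSON text; the actual parsing is still left to the json module.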
You should go with @Haidro's answer on this, but if you want to use a regex, or see how you would, then here's some sample code:
import re
regex = re.compile(r'"url":("[^"]+"),"id":("[^"]+")')
for m in regex.finditer(yourString):
    print(m.group(1), m.group(2))
[^"] is a character class that matches any character other than ".
EDIT:
I love how I recommend the other answer, but explain how to do it if one really wants to know, yet I somehow still get downvoted.

Finding Zip number from String [closed]

Closed 9 years ago.
I have a string like
x = '''
Anrede:*
Herr
*Name:*
Tobias
*Firma:*
*Strasse/Nr:*
feringerweg
*PLZ/Ort:*
72531
*Mail:*
tovoe#gmeex.de [1]
'''
The string contains a zip code under PLZ/Ort:. I want to extract that zip code from the whole string. Regex seems like the way to do it, but I don't know regex.
Assuming the input in your example is a file with multiple lines, you can try something like this:
import re
matchPattern = r"^(\d{5})$"
with open(filename, 'r') as f:
    for line in f:
        match = re.match(matchPattern, line)
        if match:  # most lines will not match, so check before using the result
            print(match.group(0))  # the whole match
If this is just one long string, use re.search with the same pattern minus the ^ (line start) and $ (line end) anchors: (\d{5})
I'm assuming that the Postleitzahl always follows the line that looks like *PLZ/Ort:* (possibly after blank lines), and that it's the only text on its line. If that's the case, then you can use something like:
import re
m = re.search(r'^\*PLZ/Ort:\*\s*\n(\d{5})$', x, re.M)
if m:
    print(m.group(1))
You can try this regex:
(?<=PLZ\/Ort)[\s\S]+?([a-zA-Z0-9\- ]{3,9})
It also supports alphanumeric postal codes. You can see postal code lengths and formats here.
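As a quick check, the regex above can be run against the sample string from the question. This is only a sketch: it assumes the postal code is the first run of 3 to 9 letters, digits, hyphens or spaces after the PLZ/Ort label.

```python
import re

x = '''
Anrede:*
Herr
*Name:*
Tobias
*Firma:*
*Strasse/Nr:*
feringerweg
*PLZ/Ort:*
72531
*Mail:*
tovoe#gmeex.de [1]
'''

# Lazily skip past the label's trailing characters, then capture the code.
m = re.search(r'(?<=PLZ/Ort)[\s\S]+?([a-zA-Z0-9\- ]{3,9})', x)
zip_code = m.group(1) if m else None
print(zip_code)
```

Note that if the field after the label is empty, this pattern captures text from the following label instead, so it is worth validating the result before using it.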

Parsing a string to extract a delimited unit having an alphabetic starting character and an unknown length [closed]

Closed 9 years ago.
I'm new to Python regular expressions, so any help will be appreciated. Thanks in advance.
I have this
string = "Restaurant_Review-g503927-d3864736-Reviews"
I would like to extract 'g503927' and 'd3864736' from it.
I know you can use re.match(pattern, string, flags=0)
but I'm not sure how to write the regex for it. Please help.
Using re.findall:
>>> import re
>>> s = "Restaurant_Review-g503927-d3864736-Reviews"
>>> re.findall(r'[a-z]\d+', s)
['g503927', 'd3864736']
[a-z]\d+ matches a lowercase letter followed by one or more digits.
This should work:
import re
pattern = re.compile("[a-z][0-9]+")
print(pattern.findall(string))  # string is the variable from the question
A non-regex solution is also possible, but it depends on what delimits the units. Here I assume it's a -:
s = "Restaurant_Review-g503927-d3864736-Reviews"
# i[:1] instead of i[0] avoids an IndexError on empty pieces (e.g. from "--")
outputs = [i for i in s.split('-') if i[:1].isalpha() and i[1:].isdigit()]
There is no need to use a regex; use the split() method:
s = "Restaurant_Review-g503927-d3864736-Reviews"
print(s.split('-'))
print(s.split('-')[1])
print(s.split('-')[2])
more info here: http://docs.python.org/2/library/stdtypes.html#str.split
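If the units must really be delimited, the regex can be anchored on the delimiter so that a letter-plus-digits run inside a longer token is not picked up. A minimal sketch, assuming - is the delimiter and a unit is exactly one letter followed by digits:

```python
import re

s = "Restaurant_Review-g503927-d3864736-Reviews"

# A unit is one letter followed by digits, bounded by '-' or the string edges.
unit = re.compile(r'(?:(?<=-)|^)[A-Za-z]\d+(?=-|$)')

units = unit.findall(s)
print(units)
```

Unlike the plain [a-z]\d+ pattern, this one would skip the b12 inside a token such as ab12, because the letter must directly follow a delimiter or the start of the string.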
