python regular expression grammar [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
... html ...
[{"url":"/test/test/url","id":"111111"},{"url":"/test/test/url","id":"111111"}, {"url":"/test/test/url","id":"1111"}]
.... html ...
I have a JSON-like string embedded in HTML.
How can I write a regular expression that extracts the values
"/test/test/url" and the "1111" that comes after "id":?
Thanks in advance.

Don't use regular expressions here, use the json module. This is what it's designed for.
import json
# html must hold only the JSON text, not the surrounding markup
mylist = json.loads(html)
for subdict in mylist:
    print(subdict['url'])
    print(subdict['id'])
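Since json.loads needs the JSON text on its own, an array embedded in markup has to be cut out first. Here is a minimal sketch, assuming the page contains exactly one JSON array of objects. The html value and the bracket-matching pattern are illustrative assumptions, not part of the answer above:

```python
import json
import re

# Hypothetical page content: the JSON array from the question wrapped in markup.
html = (
    '<div>... html ...</div>'
    '[{"url":"/test/test/url","id":"111111"},{"url":"/test/test/url","id":"111111"}, '
    '{"url":"/test/test/url","id":"1111"}]'
    '<p>.... html ...</p>'
)

# Assumption: the only "[{ ... }]" run in the page is the JSON array.
match = re.search(r'\[\s*\{.*\}\s*\]', html, re.DOTALL)
items = json.loads(match.group(0)) if match else []

# Collect (url, id) pairs from the parsed list of dicts.
pairs = [(item["url"], item["id"]) for item in items]
print(pairs)
```

The regex here only locates the array; the actual parsing is still left to the json module.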

You should go with @Haidro's answer on this, but if you want to use a regex, or see how you would, then here's some sample code:
import re
regex = re.compile(r'"url":("[^"]+"),"id":("[^"]+")')
for m in regex.finditer(yourString):
    print(m.group(1), m.group(2))
[^"] is a negated character class that matches any character other than ".
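If the surrounding quotes are not wanted in the result, the capturing groups can be moved inside them. This is a sketch under the assumption that "url" always comes directly before "id" in each object. The text value and the optional-whitespace handling are illustrative additions:

```python
import re

# Sample input shaped like the string in the question.
text = '[{"url":"/test/test/url","id":"111111"}, {"url":"/test/test/url","id":"1111"}]'

# Groups sit inside the quotes, so only the values are captured.
# \s* tolerates optional whitespace around ':' and ','.
pattern = re.compile(r'"url"\s*:\s*"([^"]+)"\s*,\s*"id"\s*:\s*"([^"]+)"')

# findall returns one (url, id) tuple per match when there are two groups.
pairs = pattern.findall(text)
print(pairs)
```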
EDIT:
I love how I recommend the other answer, but explain how to do it if one really wants to know, yet I somehow still get downvoted.

Related

Removing links from a reddit comments using python and regex [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
I want to remove links in the format Reddit uses
comment = "Hello this is my [website](https://www.google.com)"
no_links = RemoveLinks(comment)
# no_links == "Hello this is my website"
I found a similar question about the same thing, but I don't know how to translate it to python.
I am not that familiar with regex so I would appreciate it if you explained what's happening.
You could do the following:
import re
pattern = re.compile(r'\[(.*?)\]\(.*?\)')
comment = "Hello this is my [website](https://www.google.com)"
print(pattern.sub(r'\1', comment))
The line:
pattern = re.compile(r'\[(.*?)\]\(.*?\)')
creates a regex pattern that matches any text surrounded by square brackets, immediately followed by any text surrounded by parentheses. The '?' after each '.*' makes the match non-greedy, so it consumes as little text as possible.
The call pattern.sub(r'\1', comment) replaces each match with its first capturing group, which in this case is the text inside the brackets.
For more information about regex I suggest you read this.
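The same pattern can be wrapped in a function with the shape the question asked for. This is a minimal sketch; the name remove_links and the extra sample strings are assumptions made for illustration:

```python
import re

# [text](url): group 1 captures the link text, the URL part is discarded.
LINK_PATTERN = re.compile(r'\[(.*?)\]\(.*?\)')

def remove_links(comment):
    """Replace every [text](url) link with just its text."""
    return LINK_PATTERN.sub(r'\1', comment)

comment = "Hello this is my [website](https://www.google.com)"
no_links = remove_links(comment)
print(no_links)

# Several links in one comment are handled independently.
two_links = remove_links("[a](x) and [b](y)")
# Brackets that are not followed by (...) are left alone.
plain = remove_links("see [1] for details")
```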

Regex not working in python script [closed]

Closed 8 years ago.
For some reason, when I run my regex to get the number I need, it returns None.
But when I run it here, http://regexr.com/38n3o, it works.
The regex is meant to match the last number of the IP so it can be removed.
lanip=74.125.224.72
notorm=re.search("/([1-9])\w+$/g", lanip)
That is not how you define a regular expression in Python. The correct way would be:
import re
lanip = "74.125.224.72"
notorm = re.search(r"([1-9])\w+$", lanip)
print(notorm)
Output:
<re.Match object; span=(11, 13), match='72'>
You were using JavaScript regex literal syntax (/.../g). In Python the pattern is a plain string without the surrounding slashes or the trailing g flag. To read more on the correct Python syntax, read the documentation.
If you want to match the last number of an IP address, use:
import re
lanip = "74.125.224.72"
# Each group matches one octet in the range 0-255
notorm = re.search(r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)", lanip)
print(notorm.group(4))
Output:
72
Regex used from http://www.regular-expressions.info/examples.html
Your example did work in this scenario, but would match a lot of false positives.
What is lanip's type? That can't run.
It needs to be a string, i.e.
lanip = "74.125.224.72"
Also, your RE syntax looks strange. Make sure you've read the documentation on Python's RE syntax.
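Since the stated goal was to remove the last number, a shorter route may be enough when the input is already known to be a dotted IPv4 string. A rough sketch under that assumption, with no 0-255 range validation; the variable names are illustrative:

```python
import re

lanip = "74.125.224.72"  # must be a string, as noted above

# Capture the digits after the final dot.
match = re.search(r"\.(\d{1,3})$", lanip)
last_octet = match.group(1) if match else None

# Remove the final ".NN" with a regex ...
prefix = re.sub(r"\.\d{1,3}$", "", lanip)
# ... or without one, by splitting once from the right.
prefix_alt = lanip.rsplit(".", 1)[0]

print(last_octet, prefix, prefix_alt)
```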

About python re raw pattern search [closed]

I want to perform re.search using the pattern as a raw string like below.
m=re.search(r'pattern',string)
But what if I have the pattern in a variable, like pat='pattern'? How do I perform a raw search then?
You declare the pattern string as a raw string:
regexpattern = r'pattern'
m=re.search(regexpattern,string)
You can also build the pattern from a variable this way. Here test is a string variable and l is the string being searched.
pat = """pat%s""" % test
pattern = re.compile(pat, re.I | re.M)
match = pattern.search(l)
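It may help to see that "raw" is a property of the string literal in source code, not of the resulting string, so a pattern held in a variable needs no special treatment. A short sketch; the sample strings are made up for illustration, and re.escape is shown for the case where the variable's contents should be matched literally:

```python
import re

# The r prefix only changes how the literal is written in source code.
# Both spellings produce the same ordinary str object.
assert r"\d+" == "\\d+"

pat = r"\d+"                      # pattern stored in a variable
m = re.search(pat, "order 66")    # pass the variable directly
number = m.group(0)

# Hypothetical user-supplied text that should be matched literally.
literal = "1+1=2"
haystack = "so 1+1=2 holds"
unescaped = re.search(literal, haystack)            # '+' acts as a quantifier
escaped = re.search(re.escape(literal), haystack)   # '+' is matched literally

print(number, unescaped, escaped.group(0))
```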

How do i make Python count how many letters are in a word? [closed]

I have a lab for my programming class and I need to know how to make Python count how many letters are in a word, excluding spaces. Can anyone help me out? This is what I have so far.
def Na():
    Name = raw_input("What is your name? : ")
    count [char]
Na()
sum(c.isalpha() for c in Name)
Why not use len?
len([ltr for ltr in Name if ltr.isalpha()])
How about using string.ascii_letters (this needs import string):
len([c for c in Name if c in string.ascii_letters])
(see @abarnert's comment below about this option)
Or maybe use regular expressions (this needs import re):
len(re.findall('[a-zA-Z]', s))
You could try splitting the string on spaces, joining the resulting list, and taking the length of the joined string. Note that this counts every non-space character, not only letters:
len("".join(name.split(" ")))
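The approaches above do not all count the same thing, which a side-by-side run makes visible. A small sketch; the function name count_letters and the sample names are illustrative assumptions:

```python
import re
import string

def count_letters(name):
    """Count alphabetic characters; spaces, digits and punctuation are skipped."""
    return sum(c.isalpha() for c in name)

name = "Ada Lovelace 2"

by_isalpha = count_letters(name)
by_ascii = len([c for c in name if c in string.ascii_letters])
by_regex = len(re.findall('[a-zA-Z]', name))
# Removing spaces only: the digit '2' is still counted here.
by_split = len("".join(name.split(" ")))

# str.isalpha accepts non-ASCII letters, string.ascii_letters does not.
accented = "Jos\u00e9"
accented_isalpha = count_letters(accented)
accented_ascii = sum(c in string.ascii_letters for c in accented)

print(by_isalpha, by_ascii, by_regex, by_split)
```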

Parsing a string to extract a delimited unit having an alphabetic starting character and an unknown length [closed]

I'm new to Python regular expressions, so any help will be appreciated. Thanks in advance.
I have this
string = "Restaurant_Review-g503927-d3864736-Reviews"
I would like to extract 'g503927' and 'd3864736' from it.
I know you can use re.match(pattern, string, flags=0),
but I'm not sure how to write the regex for it. Please help.
Using re.findall:
>>> s = "Restaurant_Review-g503927-d3864736-Reviews"
>>> re.findall(r'[a-z]\d+', s)
['g503927', 'd3864736']
[a-z]\d+ matches a lowercase letter followed by one or more digits.
This should work:
import re
pattern = re.compile("[a-z][0-9]+")
print(pattern.findall("Restaurant_Review-g503927-d3864736-Reviews"))
A non-regex solution is also possible, but it depends on what delimits the units; here I assume it's a -:
s = "Restaurant_Review-g503927-d3864736-Reviews"
# i[:1] instead of i[0] avoids an IndexError on empty pieces
outputs = [i for i in s.split('-') if i[:1].isalpha() and i[1:].isdigit()]
There is no need to use a regex; use the split() method:
s = "Restaurant_Review-g503927-d3864736-Reviews"
print(s.split('-'))
print(s.split('-')[1])
print(s.split('-')[2])
more info here: http://docs.python.org/2/library/stdtypes.html#str.split
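The split and regex ideas above can be combined so that a unit is accepted only when a whole delimited piece is one letter followed by digits. A sketch, assuming '-' is the delimiter; extract_units and the second sample string are hypothetical names and data chosen for illustration:

```python
import re

# One letter followed by one or more digits, nothing else.
UNIT = re.compile(r"[A-Za-z]\d+")

def extract_units(text, delimiter="-"):
    """Return the delimited pieces that consist of a letter plus digits."""
    return [part for part in text.split(delimiter) if UNIT.fullmatch(part)]

s = "Restaurant_Review-g503927-d3864736-Reviews"
units = extract_units(s)
print(units)

# A bare findall also picks up partial pieces such as 'y22' inside 'y22z'.
tricky = "x1--y22z-z9"
strict = extract_units(tricky)
loose = re.findall(r"[a-z]\d+", tricky)
```

Which behaviour is right depends on whether partial matches inside a longer piece should count.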
