Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
... html ...
[{"url":"/test/test/url","id":"111111"},{"url":"/test/test/url","id":"111111"}, {"url":"/test/test/url","id":"1111"}]
.... html ...
I have a JSON-style string embedded in some HTML.
How can I write a regular expression to extract values such as
"/test/test/url" and the "1111" that comes after "id":?
Thanks in advance.
Don't use regular expressions here, use the json module. This is what it's designed for.
import json
# html must contain only the JSON text, not the surrounding markup
mylist = json.loads(html)
for subdict in mylist:
    print(subdict['url'])
    print(subdict['id'])
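If the JSON array is still wrapped in other markup, as in the question, json.loads will fail on the whole page. A minimal sketch of one way around that, assuming the page contains exactly one "[{...}]" array and that no string value contains "}]" (extract_records is a hypothetical helper name, and the sample html value is made up for illustration):

```python
import json
import re

def extract_records(html):
    # Locate the first "[{ ... }]" span; assumes there is only one such array
    match = re.search(r'\[\s*\{.*?\}\s*\]', html, re.DOTALL)
    if match is None:
        return []
    # Hand the isolated text to the JSON parser
    return json.loads(match.group(0))

# Illustrative input modelled on the question
html = ('<p>before</p>[{"url":"/test/test/url","id":"111111"}, '
        '{"url":"/test/test/url","id":"1111"}]<p>after</p>')
records = extract_records(html)
pairs = [(r["url"], r["id"]) for r in records]
print(pairs)
```

The regex is only used to find where the array starts and ends; the actual parsing is still left to the json module.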
You should go with @Haidro's answer on this, but if you want to use a regex, or want to see how you would, here's some sample code:
import re
regex = re.compile(r'\"url\":("[^"]+"),\"id\":("[^"]+")')
match = re.finditer(regex, yourString)
for m in match:
    print(m.group(1), m.group(2))
[^"] is a negated character class: it matches any single character other than ".
EDIT:
I love how I recommend the other answer, but explain how to do it if one really wants to know, yet I somehow still get downvoted.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
I want to remove links in the format Reddit uses
comment = "Hello this is my [website](https://www.google.com)"
no_links = RemoveLinks(comment)
# no_links == "Hello this is my website"
I found a similar question about the same thing, but I don't know how to translate it to Python.
I am not that familiar with regex, so I would appreciate it if you explained what's happening.
You could do the following:
import re
pattern = re.compile(r'\[(.*?)\]\(.*?\)')
comment = "Hello this is my [website](https://www.google.com)"
print(pattern.sub(r'\1', comment))
The line:
pattern = re.compile(r'\[(.*?)\]\(.*?\)')
creates a regex pattern that searches for any text surrounded by square brackets, followed by any text surrounded by parentheses. The '?' after each '.*' makes the match non-greedy, so it consumes as little text as possible.
The call sub(r'\1', comment) replaces each match with the first capturing group, in this case the text inside the brackets.
For more information about regex I suggest you read this.
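Wrapped up as the function the question asked for, this could look roughly like the sketch below. The names remove_links and LINK_PATTERN are illustrative, and the pattern assumes link text and URLs contain no nested brackets or parentheses:

```python
import re

# [text](url): capture the text, discard the URL
LINK_PATTERN = re.compile(r'\[(.*?)\]\(.*?\)')

def remove_links(comment):
    # Replace every [text](url) with just its text
    return LINK_PATTERN.sub(r'\1', comment)

no_links = remove_links("Hello this is my [website](https://www.google.com)")
two_links = remove_links("[a](x) and [b](y)")
print(no_links)
print(two_links)
```

Because both quantifiers are non-greedy, two links in the same comment are replaced separately rather than being merged into one match.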
Closed 8 years ago.
For some reason, when I use a regex to get the number I need, it returns None.
But when I run it here, http://regexr.com/38n3o, it works.
The regex was designed to get the last number of the IP so it can be removed.
lanip=74.125.224.72
notorm=re.search("/([1-9])\w+$/g", lanip)
That is not how you define a regular expression in Python. The correct way would be:
import re
lanip = "74.125.224.72"
notorm = re.search(r"([1-9])\w+$", lanip)
print(notorm)
Output:
<_sre.SRE_Match object at 0x10131df30>
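You were using JavaScript regex literal syntax. In Python the pattern is a plain string, so the leading and trailing / and the g flag are treated as literal characters to match. To read more on the correct Python syntax, read the documentation.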
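A small sketch of the difference, reusing the lanip value from the question. The names js_style and py_style are only illustrative:

```python
import re

lanip = "74.125.224.72"

# With the JavaScript delimiters left in, "/" and "g" must appear literally
# in the input, so the search finds nothing
js_style = re.search(r"/([1-9])\w+$/g", lanip)

# With the delimiters removed, the pattern matches the trailing "72"
py_style = re.search(r"([1-9])\w+$", lanip)

print(js_style)
print(py_style.group(0))
```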
If you want to match the last number of an IP, use:
import re
lanip = "74.125.224.72"
notorm = re.search(r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)", lanip)
print(notorm.group(4))
Output:
72
Regex taken from http://www.regular-expressions.info/examples.html
Your example did work in this scenario, but it would also match a lot of false positives.
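The question's stated goal was to remove the last number. One possible sketch, assuming the input is already known to be a dotted-quad address (no validation is done here, and strip_last_octet is a hypothetical helper name):

```python
import re

def strip_last_octet(ip):
    # Drop the final ".<1-3 digits>" group at the end of the string
    return re.sub(r"\.\d{1,3}$", "", ip)

lanip = "74.125.224.72"
prefix = strip_last_octet(lanip)
# The last number itself, if it is needed separately
last = re.search(r"(\d{1,3})$", lanip).group(1)
print(prefix, last)
```

If the input might not be a valid address, the stricter four-group pattern above is the safer starting point.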
What is lanip's type? That can't run.
It needs to be a string, i.e.
lanip = "74.125.224.72"
Also, your RE syntax looks strange; make sure you've read the documentation on Python's RE syntax.
Closed 9 years ago.
I want to perform re.search using the pattern as a raw string like below.
m=re.search(r'pattern',string)
But if I have the pattern in a variable, like pat='pattern', how do I perform a raw search?
You declare the pattern string as a raw string:
regexpattern = r'pattern'
m=re.search(regexpattern,string)
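It may help to see that "raw" describes how the literal is written in source code, not a property of the resulting string, so a pattern held in a variable needs no conversion. A short sketch (the variable names and the sample text are illustrative):

```python
import re

# The r prefix only changes how the literal is parsed; both are the same str
raw = r'\d+'
escaped = '\\d+'
same = raw == escaped

# A pattern stored in a variable is passed to re.search unchanged
pat = r'\d+'
m = re.search(pat, "order 42")
print(same, m.group(0))
```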
You can build the pattern from a variable this way, where test is the string variable:
pat = """pat%s""" % test
pattern = re.compile(pat, re.I | re.M)
match = pattern.search(l)
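When the variable should be matched literally rather than interpreted as regex syntax, re.escape is one option. A minimal sketch, with a made-up value for test and made-up input strings:

```python
import re

test = "a.b"  # hypothetical value containing a regex metacharacter

# re.escape backslash-escapes the ".", so it only matches a literal dot
pat = "pat" + re.escape(test)
pattern = re.compile(pat, re.I | re.M)

hit = pattern.search("PATa.b")   # matches, ignoring case
miss = pattern.search("pataxb")  # "x" is not a literal "."
print(hit is not None, miss is None)
```

Without re.escape, the "." in test would match any character, and "pataxb" would match as well.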
Closed 9 years ago.
I have a lab for my programming class and I need to know how to make Python count how many letters are in a word, excluding spaces. Can anyone help me out? This is what I have so far.
def Na():
    Name = raw_input("What is your name? : ")
    count [char]
Na()
sum(c.isalpha() for c in Name)
Why not use len?
len([ltr for ltr in Name if ltr.isalpha()])
How about using string.ascii_letters:
import string
len([c for c in Name if c in string.ascii_letters])
(see @abarnert's comment below about the option)
Or maybe use regular expressions:
import re
len(re.findall('[a-zA-Z]', Name))
You could try splitting the string on spaces, joining the resulting list, and taking the length of the joined string, like so (note that this counts every non-space character, not only letters):
len("".join(Name.split(" ")))
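The approaches above do not all count the same thing. A small side-by-side sketch, using hypothetical function names and a fixed sample string instead of keyboard input:

```python
def count_letters(name):
    # Counts only alphabetic characters
    return sum(c.isalpha() for c in name)

def count_non_spaces(name):
    # Counts everything except the space character
    return len("".join(name.split(" ")))

sample = "R2-D2 x"
letters = count_letters(sample)
non_spaces = count_non_spaces(sample)
print(letters, non_spaces)
```

For a plain name such as "Ada Lovelace" both functions return 11; they only diverge once digits or punctuation appear.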
Closed 9 years ago.
I'm new to Python regular expressions, so any help will be appreciated. Thanks in advance.
I have this:
string = "Restaurant_Review-g503927-d3864736-Reviews"
I would like to extract 'g503927' and 'd3864736' from it.
I know you can use re.match(pattern, string, flags=0),
but I'm not sure how to write the regex for it. Please help.
Using re.findall:
>>> s = "Restaurant_Review-g503927-d3864736-Reviews"
>>> re.findall(r'[a-z]\d+', s)
['g503927', 'd3864736']
[a-z]\d+ matches a lowercase letter followed by one or more digits.
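If the two IDs need to be told apart rather than returned as a flat list, capturing groups are one option. A sketch that assumes the first ID always starts with g and the second with d, as in the sample string (the names geo_id and detail_id are made up for illustration):

```python
import re

s = "Restaurant_Review-g503927-d3864736-Reviews"

# Anchor on the surrounding hyphens and capture each ID in its own group
match = re.search(r'-(g\d+)-(d\d+)-', s)
geo_id, detail_id = match.groups()
print(geo_id, detail_id)
```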
This should work:
import re
pattern = re.compile("[a-z][0-9]+")
s = "Restaurant_Review-g503927-d3864736-Reviews"
print(pattern.findall(s))
A non-regex solution, though it depends on what delimits the units; here I assume it's a -:
s = "Restaurant_Review-g503927-d3864736-Reviews"
outputs = [i for i in s.split('-') if i[:1].isalpha() and i[1:].isdigit()]
No need to use a regex; use the split() method:
s = "Restaurant_Review-g503927-d3864736-Reviews"
print(s.split('-'))
print(s.split('-')[1])
print(s.split('-')[2])
more info here: http://docs.python.org/2/library/stdtypes.html#str.split