python regex match querystring path - python

I'm trying to write a regex that matches any path containing /? to determine whether it has a query string.
A sample string to be matched would be: /mysite/path/to/whatever/?page=1
So far I thought this would match: re.match(r'/\?', '/mysite/path/to/whatever/?page=1')
but it doesn't seem to match.

This code is already written for you. No need to reinvent the wheel:
import urlparse
print urlparse.urlparse('/mysite/path/to/whatever/?page=1')
http://docs.python.org/library/urlparse.html#module-urlparse
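For readers on Python 3, where the urlparse module was folded into urllib.parse, a minimal sketch of the same idea follows. The helper name has_query_string is hypothetical and only used for illustration.

```python
from urllib.parse import urlparse


def has_query_string(path):
    """Return True if the path carries a non-empty query string.

    has_query_string is a hypothetical helper, not a standard library function.
    """
    return bool(urlparse(path).query)


parts = urlparse('/mysite/path/to/whatever/?page=1')
print(parts.path)   # /mysite/path/to/whatever/
print(parts.query)  # page=1
```

Note that this checks for a non-empty query, so a path ending in a bare ? is treated as having no query string, which differs slightly from a plain '/?' substring test.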

Your problem is that you're using re.match. That function looks for matches at the beginning of the string. So, either you change your regexp to '.*/\?', or use re.search instead.

You don't need a regular expression here. Just use the in operator: '/?' in the_string.
The problem is that re.match only looks at the beginning of the string.
You could use re.search instead, if you need the power of REs.
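A short sketch comparing the three approaches mentioned in the answers, using the sample path from the question (Python 3 syntax assumed):

```python
import re

path = '/mysite/path/to/whatever/?page=1'

# re.match only tries the pattern at position 0, so this finds nothing.
at_start = re.match(r'/\?', path)

# re.search scans the whole string and finds '/?' further in.
anywhere = re.search(r'/\?', path)

# For a fixed substring, the in operator needs no regex at all.
contains = '/?' in path

print(at_start)          # None
print(anywhere.group())  # /?
print(contains)          # True
```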

Related

python regex match string does not start with

I want to match any string that does not start with 4321
I came about it with the positive condition: match any string that starts with 4321:
^4321.*
Now I want to reverse that condition, for example:
1234555 passes
12322222 passes
None passes
4321ZZZ does not pass
43211111 does not pass
Please help me find the simplest regex as possible that accomplishes this.
I am using a mongo regex, but the regex object is built in Python, so please no Python code here (like startswith).
You could use a negative look-ahead (needs a multiline modifier):
^(?!4321).*
You can also use a negative look-behind (doesn't match empty string for now):
(^.{1,3}$|^.{4}(?<!4321).*)
Note: like another answer stated, regex is not required (but is given since this was the question verbatim) -> instead just use if not mystring.startswith('4321').
Edit: I see you are explicitly asking for a regex now so take my first one it's the shortest I could come up with ;)
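To check the look-ahead pattern against the samples from the question, here is a rough sketch in plain Python (not Mongo). The helper name passes is hypothetical, and treating None as passing is an assumption taken from the question's examples.

```python
import re

# Pattern taken from the answer above: the anchor plus a negative
# look-ahead rejects any string whose first four characters are 4321.
NOT_4321 = re.compile(r'^(?!4321).*')


def passes(value):
    """Hypothetical helper: True when value does not start with 4321.

    None is treated as passing, as in the question's examples; re cannot
    match against None, so it is handled separately.
    """
    if value is None:
        return True
    return NOT_4321.match(value) is not None


for sample in ['1234555', '12322222', None, '4321ZZZ', '43211111']:
    print(sample, passes(sample))
```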
You don't need a regex for that. Just use not and the startswith() method:
if not mystring.startswith('4321'):
You can even just slice it and compare equality:
if mystring[:4] != '4321':
Why don't you match the string, and negate the boolean value using not:
import re
result = re.match('^4321.*', value)
if not result:
    print('no match!')
Thanks, @idos. For a complete answer I used Mongo's $or operator:
mongo_filter = {'$or': [{'db_field': re.compile("^(?!4321).*$")}, {'db_field1': {'$exists': False}}]}
This covers not only strings that do not start with 4321 but also the case where the field does not exist or is None.

Extract string using regex in Python

I'm struggling a bit on how to extract (i.e. assign to variable) a string based on a regex. I have the regex worked out -- I tested on regexpal. But I'm lost on how I actually implement that in Python. My regex string is:
http://jenkins.mycompany.com/job/[^\s]+
What I want to do is take string and if there's a pattern in there that matches the regex, put that entire "pattern" into a variable. So for example, given the following string:
There is a problem with http://jenkins.mycompany.com/job/app/4567. We should fix this.
I want to extract http://jenkins.mycompany.com/job/app/4567 and assign it to a variable. I know I'm supposed to use re, but I'm not sure whether I want re.match or re.search, or how to get what I want. Any help or pointers would be greatly appreciated.
import re
p = re.compile(r'http://jenkins.mycompany.com/job/[^\s]+')
line = 'There is a problem with http://jenkins.mycompany.com/job/app/4567. We should fix this.'
result = p.search(line)
print result.group(0)
Output:
http://jenkins.mycompany.com/job/app/4567.
To get the first match in the string, use re.findall and take the first element of the result:
re.findall(pattern_string, input_string)[0] # pick the first match that is found
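Both answers assume a match exists, and the output above shows the sentence's trailing period being captured as part of the URL. A minimal sketch that handles both points, assuming Python 3: the helper name extract_job_url is hypothetical, and stripping the trailing period is an assumption about what the asker wants.

```python
import re

# Same pattern as in the question, with the dots escaped and \S used
# as shorthand for [^\s].
JOB_URL = re.compile(r'http://jenkins\.mycompany\.com/job/\S+')


def extract_job_url(text):
    """Hypothetical helper: return the first job URL in text, or None."""
    found = JOB_URL.search(text)
    if found is None:
        return None
    # Assumption: a trailing period belongs to the sentence, not the URL.
    return found.group(0).rstrip('.')


line = ('There is a problem with '
        'http://jenkins.mycompany.com/job/app/4567. We should fix this.')
print(extract_job_url(line))  # http://jenkins.mycompany.com/job/app/4567
```

re.search is the right call here rather than re.match, because the URL sits in the middle of the line.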

Python - regex to match url with mongo object id

I am trying to write a regex that matches a url of the following format:
/api/v1/users/<mongo_object_id>/submissions
Where an example of a mongo_object_id is 556b352f87d4693546d31185.
I have cooked up the following pattern, but it does not seem to work.
/api/v1/users\\/(?=[a-f\\d]{24}$)(\\d+[a-f]|[a-f]+\\d)\\/submissions
Any help is appreciated.
This will do (assuming 24 hex chars). It uses a raw string (the r prefix), so there is no need to double the backslashes:
r'\/api\/v1\/users\/([a-f\d]{24})\/submissions'
Python console:
>>> re.findall(r'\/api\/v1\/users\/([a-f\d]{24})\/submissions','/api/v1/users/556b352f87d4693546d31185/submissions')
['556b352f87d4693546d31185']
It looks like an object's ID is a hexadecimal number, which means that it's matched by something as simple as this:
[0-9a-f]+
If you want to make sure it's always 24 characters:
[0-9a-f]{24}
Toss that between the slashes:
/api/v1/users/([0-9a-f]{24})/submissions
And it should work.
Note: In Python, forward slashes have no special meaning in a regex, so they don't need escaping. If I remember right, you can do this:
import re
re.findall(r'/api/v1/users/([0-9a-f]{24})/submissions', url)
or
re.findall(r'/api/v1/users/([0-9a-f]{24})/submissions', url, re.I)
if you wanna make the whole thing case-insensitive.
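If the goal is to validate a whole URL rather than find the pattern somewhere inside a longer string, one option is re.fullmatch (Python 3.4+) in place of the re.findall used above. A rough sketch, with user_id_from_url as a hypothetical helper name:

```python
import re

SUBMISSIONS_URL = re.compile(r'/api/v1/users/([0-9a-f]{24})/submissions')


def user_id_from_url(url):
    """Hypothetical helper: return the 24-character id, or None."""
    # fullmatch requires the pattern to cover the entire string.
    found = SUBMISSIONS_URL.fullmatch(url)
    return found.group(1) if found else None


print(user_id_from_url('/api/v1/users/556b352f87d4693546d31185/submissions'))
# 556b352f87d4693546d31185
```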

python re.match() returns true when part of the string matches

This might be trivial for those in this forum, but I ended up debugging this for the entire day. I have a python script running, but the issue boils down to this:
import re
spice="IN_N1"
rtl="IN_N13"
re.match(spice,rtl)
This returns a match object. Python seems to match the string IN_N1 anywhere in the second string and returns a match. I want it to compare the entire string and return a no match for this case. In other words, I want the above to be a match only if spice="IN_N13". It would be great if someone can suggest a solution.
Thanks!
re.match only anchors at the start of the string, so your pattern effectively behaves like IN_N1.*
Change your pattern to IN_N1$ and it should work.
However, consider Aleksander's comment ;)
//Edit: fixed regexp to consider comment
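On Python 3.4 and later, re.fullmatch is another way to require that the whole string matches, without editing the pattern. A small sketch using the variables from the question:

```python
import re

spice = "IN_N1"
rtl = "IN_N13"

# re.match only anchors at the start, so the prefix IN_N1 is enough.
print(bool(re.match(spice, rtl)))      # True

# re.fullmatch requires the pattern to consume the whole string.
print(bool(re.fullmatch(spice, rtl)))  # False

# When spice is a literal name rather than a pattern, escaping it
# (or simply comparing with ==) avoids surprises from metacharacters.
print(bool(re.fullmatch(re.escape(spice), "IN_N1")))  # True
```

If both values are plain strings and no pattern features are needed, spice == rtl does the same job with no regex at all.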

python and regex

#!/usr/bin/python
import re
str = raw_input("String containing email...\t")
match = re.search(r'[\w.-]+@[\w.-]+', str)
if match:
    print match.group()
It's not the most complicated code, and I'm looking for a way to get ALL of the matches, if that's possible.
It sounds like you want re.findall():
findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.
If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern
has more than one group.
Empty matches are included in the result.
As far as the actual regular expression for identifying email addresses goes... See this question.
Also, be careful using str as a variable name. This will hide the str built-in.
I guess that re.findall is what you're looking for.
You should give findall() or finditer() a try.
findall() matches all occurrences of a
pattern, not just the first one as
search() does. For example, if one was
a writer and wanted to find all of the
adverbs in some text, he or she might
use findall()
http://docs.python.org/library/re.html#finding-all-adverbs
raw_input isn't meant to be used the way you used it. Just use raw_input to read the input from the console.
Don't shadow built-ins such as str. Use a meaningful name and assign the whole string to it.
It is also often a good idea to compile your pattern into a regex object and match the string against that (illustrated in the code).
I just realized that a complete regex to match an email address exactly as per RFC 822 could fill a page; otherwise, this snippet should be useful.
import re
inputstr = "something@exmaple.com, 121@airtelnet.com, ra@g.net, etc etc\t"
mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
matches = mailsrch.findall(inputstr)
print matches
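For reference, a Python 3 sketch of the same idea using the simpler pattern from the question. The sample text and addresses are made up for illustration, and the pattern is deliberately loose rather than RFC-compliant.

```python
import re

# Loose pattern, as in the question; not an RFC-compliant email matcher.
EMAIL = re.compile(r'[\w.-]+@[\w.-]+')

text = 'Contact alice@example.com or bob@example.org for details'
addresses = EMAIL.findall(text)
print(addresses)  # ['alice@example.com', 'bob@example.org']

# finditer yields match objects, which also carry the match positions.
for found in EMAIL.finditer(text):
    print(found.start(), found.group())
```

Because the character class includes the dot, an address followed directly by a sentence-ending period would capture that period too, so real input may need some trimming.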
