python regex match string does not start with - python

I want to match any string that does not start with 4321
I came up with the positive condition, which matches any string that starts with 4321:
^4321.*
Now I want to reverse that condition, for example:
1234555 passes
12322222 passes
None passes
4321ZZZ does not pass
43211111 does not pass
Please help me find the simplest possible regex that accomplishes this.
I am using a mongo regex, but the regex object is built in Python, so please no Python code here (like startswith).

You could use a negative look-ahead (add the multiline modifier if you need to test each line of a multi-line input):
^(?!4321).*
You can also use a negative look-behind (doesn't match empty string for now):
(^.{1,3}$|^.{4}(?<!4321).*)
Note: as another answer states, a regex is not required (one is given because the question asks for it verbatim); you could instead just use if not mystring.startswith('4321').
Edit: I see you are explicitly asking for a regex now, so take my first one; it's the shortest I could come up with ;)
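A quick way to check the look-ahead pattern against the sample values from the question is sketched below. Python's re module is used only as a test harness, and the helper name passes is made up for illustration; the pattern string itself is what would be handed to Mongo.

```python
import re

# Negative look-ahead: succeed only if the string does not begin with "4321".
NOT_4321 = re.compile(r"^(?!4321).*")

def passes(value):
    """Return True if value does not start with 4321 (hypothetical helper)."""
    return NOT_4321.match(value) is not None

for sample in ["1234555", "12322222", "", "4321ZZZ", "43211111"]:
    print(repr(sample), passes(sample))
```

Note that the empty string passes here, unlike with the look-behind variant.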

You don't need a regex for that. Just use not and the startswith() method:
if not mystring.startswith('4321'):
You can even just slice it and compare equality:
if mystring[:4] != '4321':

Why don't you match the string, and negate the boolean value using not:
import re
result = re.match('^4321.*', value)
if not result:
    print('no match!')

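Thanks, @idos. For a complete answer I used Mongo's $or operator:
mongo_filter = {'$or': [{'db_field': re.compile("^(?!4321).*$")}, {'db_field1': {'$exists': False}}]}
This matches not only strings that do not start with 4321 but also documents where the field does not exist. (A field that exists with the value None is not covered by $exists: False; a clause such as {'db_field': None} would be needed for that.)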
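The filter above could be built for any field and prefix along the following lines. This is only a sketch: the helper name not_starting_with is hypothetical, it assumes both clauses are meant to refer to the same field (the original snippet uses db_field and db_field1), and it only constructs the dictionary without talking to a database.

```python
import re

def not_starting_with(field, prefix):
    """Build a filter for documents whose field does not start with prefix,
    or that lack the field entirely (hypothetical helper)."""
    # re.escape keeps any regex metacharacters in the prefix literal.
    pattern = re.compile("^(?!" + re.escape(prefix) + ").*$")
    return {"$or": [{field: pattern}, {field: {"$exists": False}}]}

mongo_filter = not_starting_with("db_field", "4321")
print(mongo_filter["$or"][0]["db_field"].pattern)  # ^(?!4321).*$
```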

Related

Pandas: find substring in a column

I need to find some strings in a dataframe column:
url
003.ru/*/mobilnyj_telefon_bq_phoenix*
003.ru/*/mobilnyj_telefon_fly_*
003.ru/*mobile*
003.ru/telefony_i_smartfony/mobilnye_telefony_smartfony
003.ru/telefony_i_smartfony/mobilnye_telefony_smartfony/%brands%5D%5Bbr_23%
1click.ru/*iphone*
1click.ru/catalogue/chasy-motorola
The problem is this: when I use
df_update = df[df['url'].str.contains(substr.url)]
it returns an error, because some URLs contain *.
How can I fix that problem?
Try:
df[df['url'].str.contains(substr.url, regex=False)]
You have to specify whether your pattern should be interpreted as a regular expression or as a plain string. In this case, set the regex argument to False, because it defaults to True. That way, the asterisks in your pattern won't be interpreted as regular-expression operators.
I hope this helps.
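The difference between the two interpretations can be seen without pandas. The sketch below uses one pattern from the question and two made-up URLs (both are assumptions for illustration): a literal substring test, which is what regex=False asks for, and a regex search, where * means "zero or more of the preceding character".

```python
import re

pattern = "1click.ru/*iphone*"
url_with_stars = "1click.ru/*iphone*"   # contains real asterisks
url_plain = "1click.ru/iphone"          # no asterisks at all

# Literal substring test: the asterisks must appear in the URL.
print(pattern in url_with_stars)   # True
print(pattern in url_plain)        # False

# Regex interpretation: "/*" matches zero or more slashes, "e*" zero or more e's.
print(re.search(pattern, url_plain) is not None)        # True
print(re.search(pattern, url_with_stars) is not None)   # False

# re.escape turns the pattern back into a literal for regex-based APIs.
print(re.search(re.escape(pattern), url_with_stars) is not None)  # True
```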

python re.match() returns true when part of the string matches

This might be trivial for those in this forum, but I ended up debugging this for the entire day. I have a python script running, but the issue boils down to this:
import re
spice="IN_N1"
rtl="IN_N13"
re.match(spice,rtl)
This returns a match object. Python seems to match the string IN_N1 anywhere in the second string and returns a match. I want it to compare the entire string and return a no match for this case. In other words, I want the above to be a match only if spice="IN_N13". It would be great if someone can suggest a solution.
Thanks!
re.match only anchors at the start of the string, so your pattern effectively behaves like IN_N1.*
Change your pattern to IN_N1$ and it should work.
However, consider Aleksander's comment ;)
//Edit: fixed regexp to consider comment
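The options can be compared side by side with the values from the question. This is a small sketch rather than the answer's own code; re.fullmatch is an addition here and requires Python 3.4 or later.

```python
import re

spice = "IN_N1"
rtl = "IN_N13"

# re.match only anchors at the start, so a prefix match succeeds.
print(re.match(spice, rtl) is not None)          # True

# Anchoring the end of the pattern rejects the longer string.
print(re.match(spice + "$", rtl) is not None)    # False

# re.fullmatch requires the whole string to match the pattern.
print(re.fullmatch(spice, rtl) is not None)      # False
print(re.fullmatch(spice, "IN_N1") is not None)  # True

# When the names are plain text rather than patterns, == is enough.
print(spice == rtl)                              # False
```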

Python regex - Ignore parenthesis as indexing?

I've currently written a nooby regex pattern which involves excessive use of the "(" and ")" characters, but I'm using them for 'or' operators, such as (A|B|C) meaning A or B or C.
I need to find every match of the pattern in a string.
Trying to use the re.findall(pattern, text) method is no good, since it interprets the parenthesis characters as indexing signifiers (or whatever the correct jargon be), and so each element of the produced List is not a string showing the matched text sections, but instead is a tuple (which contain very ugly snippets of pattern match).
Is there an argument I can pass to findall to ignore parentheses as indexing?
Or will I have to use a very ugly combination of re.search and re.sub?
(This is the only solution I can think of; Find the index of the re.search, add the matched section of text to the List then remove it from the original string {by using ugly index tricks}, continuing this until there's no more matches. Obviously, this is horrible and undesirable).
Thanks!
Yes, add ?: to a group to make it non-capturing.
import re
print re.findall('(.(foo))', "Xfoo") # [('Xfoo', 'foo')]
print re.findall('(.(?:foo))', "Xfoo") # ['Xfoo']
See re syntax for more information.
re.findall(r"(?:A|B|C)D", "BDE")
or
re.findall(r"((?:A|B|C)D)", "BDE")
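If rewriting every group as non-capturing is inconvenient, another option is re.finditer, which yields match objects whose group(0) is always the whole match. The pattern and sample text below are made up for illustration.

```python
import re

pattern = r"(A|B|C)D"      # capturing group used only for alternation
text = "AD xx BD yy CE"

# findall returns the group contents when the pattern has one group.
print(re.findall(pattern, text))                          # ['A', 'B']

# finditer yields match objects; group(0) is the full matched text.
print([m.group(0) for m in re.finditer(pattern, text)])   # ['AD', 'BD']
```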

python regex match querystring path

I'm trying to write a regex to match any path that contains /? to determine whether it is a querystring or not.
a sample string to be matched would be this: /mysite/path/to/whatever/?page=1
so far I thought this would match re.match(r'/\?', '/mysite/path/to/whatever/?page=1')
but it doesn't seem to be matching
This code is already written for you. No need to reinvent the wheel:
import urlparse  # Python 2 module; in Python 3 use: from urllib.parse import urlparse
print urlparse.urlparse('/mysite/path/to/whatever/?page=1')
http://docs.python.org/library/urlparse.html#module-urlparse
Your problem is that you're using re.match. That function looks for matches at the beginning of the string. So, either you change your regexp to '.*/\?', or use re.search instead.
You don't need a regular expression here. Just use the in operator: '/?' in the_string.
The problem is that re.match only looks at the beginning of the string.
You could use re.search instead, if you need the power of REs.
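The suggestions above can be checked against the sample path from the question. The sketch uses the Python 3 location of the URL parser (urllib.parse); everything else works the same in Python 2.

```python
import re
from urllib.parse import urlsplit  # Python 3 home of the urlparse functions

path = "/mysite/path/to/whatever/?page=1"

# re.match tries only at position 0, where "/" is followed by "m", not "?".
print(re.match(r"/\?", path))                 # None

# re.search scans the whole string.
print(re.search(r"/\?", path) is not None)    # True

# A plain substring test needs no regex at all.
print("/?" in path)                           # True

# The URL parser exposes the query string directly.
print(urlsplit(path).query)                   # page=1
```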

python and regex

#!/usr/bin/python
import re
str = raw_input("String containing email...\t")
match = re.search(r'[\w.-]+@[\w.-]+', str)
if match:
    print match.group()
It's not the most complicated code, and I'm looking for a way to get ALL of the matches, if possible.
It sounds like you want re.findall():
findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.
If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern
has more than one group.
Empty matches are included in the result.
As far as the actual regular expression for identifying email addresses goes... See this question.
Also, be careful using str as a variable name. This will hide the str built-in.
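A minimal sketch of the difference, using an @-separated version of the question's pattern and a made-up input string (the addresses are placeholders, and the pattern is not a full email validator):

```python
import re

# The variable is named text rather than str to avoid hiding the built-in.
text = "contact alice@example.com or bob.smith@mail.example.org today"

EMAIL = re.compile(r"[\w.-]+@[\w.-]+")

# search stops at the first match.
match = EMAIL.search(text)
print(match.group())        # alice@example.com

# findall returns every non-overlapping match as a list of strings.
print(EMAIL.findall(text))  # ['alice@example.com', 'bob.smith@mail.example.org']
```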
I guess that re.findall is what you're looking for.
You should give findall() a try:
findall() matches all occurrences of a pattern, not just the first one as search() does. For example, if one was a writer and wanted to find all of the adverbs in some text, he or she might use findall()
http://docs.python.org/library/re.html#finding-all-adverbs
raw_input is not meant to be used the way you used it. Use it only to read input from the console.
Don't override built-ins such as str. Use a meaningful name and assign the whole string value to it.
It is also often a good idea to compile your pattern into a regex object and match the string against that (illustrated in the code).
I just realized that a complete regex to match an email address exactly as per RFC 822 could fill a page; otherwise this snippet should be useful.
import re
inputstr = "something@exmaple.com, 121@airtelnet.com, ra@g.net, etc etc\t"
mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
matches = mailsrch.findall(inputstr)
print matches
