python regex: pattern not found

python regex: pattern not found - python

I have a pattern compiled as
pattern_strings = ['\xc2d', '\xa0', '\xe7', '\xc3\ufffdd', '\xc2\xa0', '\xc3\xa7', '\xa0\xa0', '\xc2', '\xe9']
join_pattern = '|'.join(pattern_strings)
pattern = re.compile(join_pattern)
and then I find pattern in file as
def find_pattern(path):
with open(path, 'r') as f:
for line in f:
print line
found = pattern.search(line)
if found:
print dir(found)
logging.info('found - ' + found)
and my input as path file is
\xc2d
d\xa0
\xe7
\xc3\ufffdd
\xc3\ufffdd
\xc2\xa0
\xc3\xa7
\xa0\xa0
'619d813\xa03697'
When I run this program, nothing happens.
I it not able to catch these patterns, what is am I doing wrong here?
Desired output
- each line because each line has one or the other matching pattern
Update
After changing the regex to
pattern_strings = ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']
It is still the same, no output
UPDATE
after making regex to
pattern_strings = ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']
join_pattern = '[' + '|'.join(pattern_strings) + ']'
pattern = re.compile(join_pattern)
Things started to work, but partially, the patterns still not caught are for line
\xc2\xa0
\xc3\xa7
\xa0\xa0
for which my pattern string is ['\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0']

escape the \ in the search patterns
either with r"\xa0" or as "\\xa0"
do this ....
['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']
like everyones been saying to do except the one guy you listened too...

Does your file actually contain \xc2d --- that is, five characters: a backslash followed by c, then 2, then d? If so, your regex won't match it. Each of your regexes will match one or two characters with certain character codes. If you want to match the string \xc2d your regex needs to be \\xc2d.

Related

Implement regular expression in Python to replace every occurence of "meshname = x" in a text file

I want to replace every line in a textfile with " " which starts with "meshname = " and ends with any letter/number and underscore combination. I used regex's in CS but I never really understood the different notations in Python. Can you help me with that?
Is this the right regex for my problem and how would i transform that into a Python regex?
m.e.s.h.n.a.m.e.' '.=.' '.{{_}*,{0,...,9}*,{a,...,z}*,{A,...,Z}*}*
x.y = Concatenation of x and y
' ' = whitespace
{x} = set containing x
x* = x.x.x. ... .x or empty word
What would the script look like in order to replace every string/line in a file containing meshname = ... with the Python regex? Something like this?
fin = open("test.txt", 'r')
data = fin.read()
data = data.replace("^meshname = [[a-z]*[A-Z]*[0-9]*[_]*]+", "")
fin.close()
fin = open("test.txt", 'w')
fin.write(data)
fin.close()
or is this completely wrong? I've tried to get it working with this approach, but somehow it never matched the right string: How to input a regex in string.replace?

Following the current code logic, you can use
data = re.sub(r'^meshname = .*\w$', ' ', data, flags=re.M)
The re.sub will replace with a space any line that matches
^ - line start (note the flags=re.M argument that makes sure the multiline mode is on)
meshname - a meshname word
= - a = string
.* - any zero or more chars other than line break chars as many as possible
\w - a letter/digit/_
$ - line end.

how to avoid regex matching "Revert "Revert"

I have the following code which matches both line1 and line2,I ONLY want to match line2 but not line1, can anyone provide guidance on how to do this?
import re
line1 = '''Revert "Revert <change://problem/47614601> [tech feature][V3 Driver] Last data path activity timestamp update is required for feature""'''
line2 = '''Revert <change://problem/47614601> [tech feature][V3 Driver] Last data path activity timestamp update is required for feature"'''
if re.findall(".*?(?:Revert|revert)\s*\S*(?:change:\/\/problem\/)(\d{8})", line2):
match = re.findall(".*?(?:Revert|revert)\s*\S*(?:change:\/\/problem\/)(\d{8})", line2)
print "Revert radar match%s"%match
revert_radar = True
print revert_radar

Something like this should do what you want:
>>> regex = "(?!:(?:R|r)evert.*)(?:Revert|revert)\s*\S*(?:change:\/\/problem\/)(\d{8})"
>>> re.match(regex, line1) is None
True
>>> re.match(regex, line2).groups()
('47614601',)

Negative look behind: word NOT proceeded by sameword-space-doublequote
r'''(?<![Rr]evert ")[Rr]evert\s<change:[/][/]problem[/]\d{8}.*"'''
Negative look ahead: pattern you want NOT followed by a double doublequote
r'''[Rr]evert\s<change:[/][/]problem[/]\d{8}.*?\w"(?!")'''
this doesn't work if the unwanted line immediately precedes the wanted line without an intervening newline character.
it does work if the unwanted line follows the wanted line.
If looking at individual lines, look for the pattern at the start of string - of the three this is the least work, most efficient for the regex engine
r'''^[Rr]evert\s<change:[/][/]problem[/]\d{8}.*$'''
If the line you are searching for is embedded in a long string but there is a newline character before that line you can use the previous pattern with the multiline flag.

extract a line that matches a string with IP address

I am working on python to find a line that matches a particular pattern of an IP address.
f = open('file.txt','r')
for line in f:
if line.find("N2:42.61.0.69")
print line
The pattern that I am trying to match is "node number":"IP address", where
"node number" is the alphabet 'N' followed by a 'number'. Such as N23, N456, N98765, etc.
I used the pattern N2:42\.61\.0\.69, but it didn't yield any result.
Most of the examples talks about regex to match a particular pattern such as for an IP address "^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$".
But here I want to match a particular string with IP address.
Thanks for the help!

The function find():
Returns the lowest index in s where the substring sub is found such that sub is wholly contained in s[start:end]. Return -1 on failure. Defaults for start and end and interpretation of negative values is the same as for slices.
So if you want to find lines containing the given string( "N2:42.61.0.69" in your case), the condition should be:
if line.find("N2:42.61.0.69") != -1:

If I understand your question, the value of the node may change, but the value of the IP address is known and fixed. Since the dot has a special meaning in regular expressions, you must place a backslash before.
import re
REGEX = re.compile( r"^N\d+:42\.61\.0\.69$" )
f = open('file.txt','r')
for line in f:
line = line.strip()
if REGEX.match(line):
print line
f.close()
Notice the use of a raw string (r" ..."), without it all the backslashes would need to appear twice (\\). Also, the ^ at the beginning and the $ at the end are used to match the whole line. If other text can appear in the lines, you can remove them and use search instead of match. The rstrip() is used to remove end-of-line characters (\n for example).
When used with the following data (contents of file.txt)
N1:42.61.0.69
N2:10.0.0.1
N123456:42.61.0.69
it prints only the first and third lines

Are you looking for this:
(?:^|\s*)N\d+:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]).){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])(?:$|\s+)
REGEX 101 DEMO
In python 2.7, to use the pattern in your example, you could:
import re
p = '(?:^|\s*)N\d+:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9]).){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])(?:$|\s+)'
with open('file.txt', 'r') as f:
print [line.strip() for line in f.readlines() if re.findall(p, line) ]

If you want to process file.txt a line at a time, the following should work:
import re
with open('file.txt') as f_input:
for line in f_input:
match = re.search(r'(.*?(N\d+):42\.61\.0\.69 .*?)', line)
if match:
line, node_number = match.groups()
print "{} found in {}".format(node_number, line)
Given file.txt contains the following:
link L518523: N1:42.61.0.69 N248066
non matching line
link L518533: N2:42.61.0.69 N248066
link L518553: N3:43.61.0.69 N248066
link L518553: N4:42.61.0.69 N248066
You would get the following output:
N1 found in link L518523: N1:42.61.0.69
N2 found in link L518533: N2:42.61.0.69
N4 found in link L518553: N4:42.61.0.69

Split group of special characters from string

In test.txt:
quiet confidence^_^
want:P
(:let's start
Codes:
import re
file = open('test.txt').read()
for line in file.split('\n'):
line = re.findall(r"[^\w\s$]+|[a-zA-z]+|[^\w\s$]+", line)
print " ".join(line)
Results showed:
quiet confidence^_^
want : P
(: let ' s start
I tried to separate group of special characters from string but still incorrect.
Any suggestion?
Expected results:
quiet confidence ^_^
want :P
(: let's start

as #interjay said, you must define what you consider a word and what is "special characters". Still I would use 2 separate regexes to find what a word is and what is not.
word = re.compile("[a-zA-Z\']+")
not_word = re.compile("[^a-zA-Z\']+")
for line in file.split('\n'):
matched_words = re.findall(word, line)
non_matching_words = re.findall(not_word, line)
print " ".join(matched_words)
print " ".join(non_matching_words)
Have in mind that spaces \s+ will be grouped as non words.

python regular expression to match strings

I want to parse a string, such as:
package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'
I want to get:
string1: jp.tjkapp.droidllwp`
string2: 1.1
Because there are multiple uses-permission, I want to get permission as a list, contains:
WRITE_APN_SETTINGS, RECEIVE_BOOT_COMPLETED and ACCESS_NETWORK_STATE.
Could you help me write the python regular expression to get the strings I want?
Thanks.

Assuming the code block you provided is one long string, here stored in a variable called input_string:
name = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", input_string).group(0)
versionName = re.search(r"(?<=versionName\=\')\d+?\.\d+?(?=\')", input_string).group(0)
permissions = re.findall(r'(?<=android\.permission\.)[A-Z_]+(?=\')', input_string)
Explanation:
name
(?<=name\=\'): check ahead of the main string in order to return only strings that are preceded by name='. The \ in front of = and ' serve to escape them so that the regex knows we're talking about the = string and not a regex command. name=' is not also returned when we get the result, we just know that the results we get are all preceded by it.
[\w\.]+?: This is the main string we're searching for. \w means any alphanumeric character and underscore. \. is an escaped period, so the regex knows we mean . and not the regex command represented by an unescaped period. Putting these in [] means we're okay with anything we've stuck in brackets, so we're saying that we'll accept any alphanumeric character, _, or .. + afterwords means at least one of the previous thing, meaning at least one (but possibly more) of [\w\.]. Finally, the ? means don't be greedy--we're telling the regex to get the smallest possible group that meets these specifications, since + could go on for an unlimited number of repeats of anything matched by [\w\.].
(?=\'): check behind the main string in order to return only strings that are followed by '. The \ is also an escape, since otherwise regex or Python's string execution might misinterpret '. This final ' is not returned with our results, we just know that in the original string, it followed any result we do end up getting.

You can do this without regex by reading the file content line by line.
>>> def split_string(s):
... if s.startswith('package'):
... return [i.split('=')[1] for i in s.split() if "=" in i]
... elif s.startswith('uses-permission'):
... return s.split('.')[-1]
...
>>> split_string("package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'")
["'jp.tjkapp.droid1lwp'", "'2'", "'1.1'"]
>>> split_string("uses-permission:'android.permission.WRITE_APN_SETTINGS'")
"WRITE_APN_SETTINGS'"
>>> split_string("uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'")
"RECEIVE_BOOT_COMPLETED'"
>>> split_string("uses-permission:'android.permission.ACCESS_NETWORK_STATE'")
"ACCESS_NETWORK_STATE'"
>>>

Here is one example code
#!/usr/bin/env python
inputFile = open("test.txt", "r").readlines()
for line in inputFile:
if line.startswith("package"):
words = line.split()
string1 = words[1].split("=")[1].replace("'","")
string2 = words[3].split("=")[1].replace("'","")
test.txt file contains input data you mentioned earlier..

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

python regex: pattern not found - python

escape the \ in the search patterns either with r"\xa0" or as "\\xa0" do this .... ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9'] like everyones been saying to do except the one guy you listened too...

Does your file actually contain \xc2d --- that is, five characters: a backslash followed by c, then 2, then d? If so, your regex won't match it. Each of your regexes will match one or two characters with certain character codes. If you want to match the string \xc2d your regex needs to be \\xc2d.

Related

Implement regular expression in Python to replace every occurence of "meshname = x" in a text file

how to avoid regex matching "Revert "Revert"

extract a line that matches a string with IP address

Split group of special characters from string

python regular expression to match strings

Categories

Resources