I have to parse some numbers from file names that have no common logic. I want to use the Python way of "try and thou shall be forgiven", i.e. the try-except structure, but now I have to add more than two cases. What is the correct way of doing this? I am currently thinking either nested trys, or try-except-pass, try-except-pass, ... Which one would be better, or is there something else? A factory method perhaps (and if so, how)?
This has to be easily expandable in the future as there will be many more cases.
Below is what I want (it does not work, because a try can only have one chain of excepts):
try:
    # first try
    imNo = int(imBN.split('S0001')[-1].replace('.tif', ''))
except:
    # second try
    imNo = int(imBN.split('S0001')[-1].replace('.tiff', ''))
except:
    # final try
    imNo = int(imBN.split('_0_')[-1].replace('.tif', ''))
Edit:
Wow, thanks for the answers, but no pattern matching please. My bad: I originally wrote "some common logic" at the beginning (now changed to "no common logic", sorry about that). In the cases above the patterns are pretty similar... let me add something completely different to make the point.
except:
    if imBN.find('first') > 0: imNo = 1
    if imBN.find('second') > 0: imNo = 2
    if imBN.find('third') > 0: imNo = 3
    ...
You can extract the common structure and make a list of possible parameters:
tries = [
    ('S0001', '.tif'),
    ('S0001', '.tiff'),
    ('_0_', '.tif'),
]

for sep, subst in tries:
    num = imBN.split(sep)[-1].replace(subst, '')
    try:
        imNo = int(num)
        break
    except ValueError:
        pass
else:
    # the for loop's else clause only runs if no break happened,
    # i.e. if none of the patterns matched
    raise ValueError("String doesn't match any of the possible patterns")
Update in reaction to question edit
This technique can easily be adapted to arbitrary expressions by making use of lambdas:
def custom_func(imBN):
    if 'first' in imBN: return 1
    if 'second' in imBN: return 2
    # raise so the loop below moves on instead of keeping None as a result
    raise ValueError('no keyword found')

tries = [
    lambda: int(imBN.split('S0001')[-1].replace('.tif','')),
    lambda: int(imBN.split('S0001')[-1].replace('.tiff','')),
    lambda: int(imBN.split('_0_')[-1].replace('.tif','')),
    lambda: custom_func(imBN),
]

for expr in tries:
    try:
        result = expr()
        break
    except:
        pass
else:
    raise ValueError("None of the patterns matched %r" % imBN)
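To keep this easily expandable, the whole thing can also be wrapped in a function that takes the list of parsers as an argument. A minimal sketch (the function name parse_image_number and the sample file name are made up for illustration):
def parse_image_number(imBN, parsers):
    """Try each parser in order and return the first result; raise if none fit."""
    for parse in parsers:
        try:
            return parse(imBN)
        except (ValueError, IndexError):
            pass  # this parser didn't fit -- try the next one
    raise ValueError('no parser matched %r' % imBN)

# New cases are added by appending one more function to the list.
parsers = [
    lambda s: int(s.split('S0001')[-1].replace('.tif', '')),
    lambda s: int(s.split('_0_')[-1].replace('.tif', '')),
]
print(parse_image_number('imgS00010042.tif', parsers))  # 42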
In your specific case, a regular expression will get rid of the need to do these try-except blocks. Something like this might catch your cases:
>>> import re
>>> re.match('.*(S0001|_0_)([0-9]+)\..*$', 'something_0_1234.tiff').groups()
('_0_', '1234')
>>> re.match('.*(S0001|_0_)([0-9]+)\..*$', 'somethingS00011234.tif').groups()
('S0001', '1234')
>>> re.match('.*(S0001|_0_)([0-9]+)\..*$', 'somethingS00011234.tiff').groups()
('S0001', '1234')
For your question about the serial try-except blocks, Niklas B.'s answer is obviously a great one.
Edit:
What you are doing is called pattern matching, so why not use a pattern matching library? If the regex string is bothering you, there are cleaner ways to do it:
import re

matchers = []
sep = ['S0001', '_0_']
matchers.append(re.compile(r'^.*(' + '|'.join(sep) + r')(\d+)\..*$'))
matchers.append(some_other_regex_for_other_cases)

for matcher in matchers:
    match = matcher.match(yourstring)
    if match:
        print match.groups()[-1]
Another, more generic way which is compatible with custom functions:
import re

matchers = []

simple_sep = ['S0001', '_0_']
simple_re = re.compile(r'^.*(' + '|'.join(simple_sep) + r')(\d+)\..*$')

def simple_matcher(s):
    m = simple_re.match(s)
    if m:
        return m.groups()[-1]

def other_matcher(s):
    if s[3:].isdigit():
        return s[3:]

matchers.append(simple_matcher)
matchers.append(other_matcher)

for matcher in matchers:
    match = matcher('yourstring')
    if match:
        print int(match)
Related
I have two different kinds of URLs in a list:
The first kind looks like this and starts with the word 'meldung':
meldung/xxxxx.html
The other kind starts with 'artikel':
artikel/xxxxx.html
I want to detect if a URL starts with 'meldung' or 'artikel' and then do different operations based on that. To achieve this I tried to use a loop with if and else conditions:
for line in r:
    if re.match(r'^meldung/', line):
        print('je')
    else:
        print('ne')
I also tried this with line.startswith():
for line in r:
    if line.startswith('meldung/'):
        print('je')
    else:
        print('ne')
But both methods don't work, since the strings I am checking don't have any whitespace.
How can I do this correctly?
You can just use the following, if the links are stored as strings within the list (note that this matches 'meldung' anywhere in the string, not just at the start):
for line in r:
    if 'meldung' in line:
        print('je')
    else:
        print('ne')
What about this:
r = ['http://example.com/meldung/page1.html', 'http://example.com/artikel/page2.html']
for line in r:
url_tokens = line.split('/')
if url_tokens[-2] == 'meldung':
print(url_tokens[-1]) # the xxxxx.html part
elif url_tokens[-2] == 'artikel':
print('ne')
else:
print('something else')
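If the list actually holds full URLs (which would also explain why startswith('meldung/') never matched), another option is to parse the path out first. A sketch assuming Python 3 (on Python 2 the module is called urlparse):
from urllib.parse import urlparse

for line in ['http://example.com/meldung/xxxxx.html',
             'http://example.com/artikel/xxxxx.html']:
    # take the first path segment, e.g. 'meldung' or 'artikel'
    first_segment = urlparse(line).path.lstrip('/').split('/')[0]
    if first_segment == 'meldung':
        print('je')
    else:
        print('ne')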
You can do it using a regex:
import re

def check(string):
    # group the alternatives -- without the parentheses the pattern would
    # mean "^meldung" or "artike" followed by any number of "l"s, anywhere
    if re.search('^(meldung|artikel)', string):
        print("je")
    else:
        print("ne")

for line in r:
    check(line)
I'm using Python to search a large text file for a certain string; below the string is the data that I am interested in performing data analysis on.
def my_function(filename, variable2, variable3, variable4):
    array1 = []
    with open(filename) as a:
        special_string = str('info %d info =*' % variable3)
        for line in a:
            if special_string == array1:
                array1 = [next(a) for i in range(9)]
                line = next(a)
                break
            elif special_string != c:
                c = line.strip()
In the special_string variable, whatever comes after info = can vary, so I am trying to put a wildcard operator as seen above. The only way I can get the function to run though is if I put in the exact string I want to search for, including everything after the equals sign as follows:
special_string = str('info %d info = more_stuff' %variable3)
How can I assign a wildcard operator to the rest of the string to make my function more robust?
If your special string always occurs at the start of a line, then you can use the below check (where special_string does not have the * at the end):
line.startswith(special_string)
Otherwise, please do look at the module re in the standard library for working with regular expressions.
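For example, a minimal sketch of the re approach, assuming the line layout from the question (the sample line and the variable3 value below are made up):
import re

variable3 = 100000
pattern = re.compile(r'info %d info =(.*)' % variable3)

line = 'info 100000 info = more_stuff\n'
m = pattern.search(line)
if m:
    print(m.group(1).strip())  # prints: more_stuff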
Have you thought about using something like this?
Based on your input, I'm assuming the following:
variable3 = 100000
special_string = str('info %d info = more_stuff' %variable3)
import re
pattern = re.compile(r'(info\s*\d+\s*info\s*=\s*)(.*)')
output = pattern.findall(special_string)
print(output[0][1])
Which would return:
more_stuff
Please excuse me for posting this again, but I think I really screwed up my previous thread. Because comment blocks only allow so many characters, I could not explain myself well, and I did not see a way to reply that would give me more room. So if nobody minds, let me try explaining everything that I need. Basically, I need to flip the names of 3D objects that have a prefix or a suffix of "L" or "R":
1: "L" with "R",
2: "R" with "L", or
3: don't change.
This is for a script in Maya, in order to duplicate selected objects and flip their names. I've got the duplicating part down, and now it is about flipping the names of the duplicated objects based on 5 possibilities. Starting with the first 2 prefixes, the duplicated objects need to start with either
"L_" or "R_" (case doesn't matter).
The next 2, the suffixes, need to be either:
"_L" or "_R" with a possible extra character "_", such as "Finger_L_001".
Now, in a search on this forum, I think I found something close to what I am looking for. I copied the syntax and replaced the user's search characters with mine, "L_" and "L", just to see if it would work, but without much expectation. Since I only know the basics of regular expressions, such as that "L.*" will find L_Finger_001, I really do not understand the lines of syntax below, or why the second one is not leaving the name as L_Finger.
So maybe this is not what I need, or is it? And can someone explain this? I tried searching for keywords such as (?P<prefix>...) and (?P<key>\S+), but I did not find anything. So without further ado, here is the syntax....
>>> x = re.sub(r'(?P<prefix>_L)(?P<key>\S+)(?(prefix)|L_)','\g<key>',"L_Finger")
>>> x
'L_Finger'
>>> x = re.sub(r'(?P<prefix>L_)?(?P<key>\S+)(?(prefix)|_L)','\g<key>',"L_Finger")
>>> x
'Finger'
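(As far as I can tell, (?(prefix)yes|no) is a conditional: if the group named prefix participated in the match, the yes branch is tried, otherwise the no branch. So the pattern either strips a matched L_ prefix, or requires an _L later in the string. A quick check of the suffix case:)
>>> re.sub(r'(?P<prefix>L_)?(?P<key>\S+)(?(prefix)|_L)', r'\g<key>', 'Finger_L')
'Finger'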
Updated 11/10/13 3:52 PM ET
OK, so I have tweaked the code a bit, but I like where this is going. Actually, my original idea was to use dictionaries, but I couldn't figure out how to search. With kobejohn steering me in the right direction by defining all the possibilities, this is starting to make sense. Here is a WIP:
samples = ('L_Arm',
           'R_Arm',
           'Arm_L',
           'Arm_R',
           'IndexFinger_L_001',
           'IndexFinger_R_001',
           '_LArm')

prefix_l, prefix_r = 'L_', 'R_'
suffix_l, suffix_lIndex, suffix_r, suffix_rIndex = '_L', '_L_', '_R', '_R_'
prefix_replace = {prefix_l: prefix_r, prefix_r: prefix_l}
suffix_replace = {suffix_l: suffix_r, suffix_r: suffix_l}
suffixIndex_replace = {suffix_lIndex: suffix_rIndex, suffix_rIndex: suffix_lIndex}

results = dict()
for sample in samples:
    # Default value is no modification - may be replaced below
    results[sample] = sample
    # Handle prefixes
    prefix = prefix_replace.get(sample[:2].upper())
    if prefix:
        result = prefix + sample[:2]
    else:
        # handle the suffixes
        suffix_partition = sample.rpartition("_")
        result = suffix_partition[0] if suffix_partition[2].isdigit() else sample
        suffix = suffix_replace.get(result[-2:])
    print("Before: %s --> After: %s" % (sample, suffix))
OK, I guess multiple regular expressions are valid too. Here is a way using re patterns similar to the ones you found. It assumes real prefixes (nothing before the prefix) and pseudo-suffixes (anywhere except the first characters). Below that is a parsing solution with the same assumptions.
import re

samples = ('L_Arm',
           'R_Arm',
           'Arm_L',
           'Arm_R',
           'IndexFinger_L_001',
           'IndexFinger_R_001',
           '_LArm')

re_with_subs = ((r'(?P<prefix>L_)(?P<poststring>\S+)',
                 r'R_\g<poststring>'),
                (r'(?P<prefix>R_)(?P<poststring>\S+)',
                 r'L_\g<poststring>'),
                (r'(?P<prestring>\S+)(?P<suffix>_L)(?P<poststring>\S*)',
                 r'\g<prestring>_R\g<poststring>'),
                (r'(?P<prestring>\S+)(?P<suffix>_R)(?P<poststring>\S*)',
                 r'\g<prestring>_L\g<poststring>'))

results_re = dict()
for sample in samples:
    # Default value is no modification - may be replaced below
    results_re[sample] = sample
    for pattern, substitution in re_with_subs:
        result = re.sub(pattern, substitution, sample)
        if result != sample:
            results_re[sample] = result
            break  # only allow one substitution per string

for original, result in results_re.items():
    print('{0} --> {1}'.format(original, result))
Here is the parsing solution.
samples = ('L_Arm',
           'R_Arm',
           'Arm_L',
           'Arm_R',
           'IndexFinger_L_001',
           'IndexFinger_R_001',
           '_LArm')

prefix_l, prefix_r = 'L_', 'R_'
suffix_l, suffix_r = '_L', '_R'
prefix_replacement = {prefix_l: prefix_r,
                      prefix_r: prefix_l}
suffix_replacement = {suffix_l: suffix_r,
                      suffix_r: suffix_l}

results = dict()
for sample in samples:
    # Default value is no modification - may be replaced below
    results[sample] = sample
    # Handle prefixes
    prefix = sample[:2].upper()
    try:
        results[sample] = prefix_replacement[prefix] + sample[2:]
        continue  # assume no suffixes if a prefix found
    except KeyError:
        pass  # no valid prefix
    # Handle pseudo-suffixes
    start = None
    for valid_suffix in (suffix_l, suffix_r):
        try:
            start = sample.upper().rindex(valid_suffix, 1)
            break  # stop if valid suffix found
        except ValueError:
            pass
    if start is not None:
        suffix = sample[start: start + 2].upper()
        new_suffix = suffix_replacement[suffix]
        results[sample] = sample[:start] + new_suffix + sample[start + 2:]

for original, result in results.items():
    print('{0} --> {1}'.format(original, result))
gives the result:
L_Arm --> R_Arm
R_Arm --> L_Arm
IndexFinger_L_001 --> IndexFinger_R_001
Arm_L --> Arm_R
_LArm --> _LArm
IndexFinger_R_001 --> IndexFinger_L_001
Arm_R --> Arm_L
You can do it with this tricky regex pattern:
if re.search('(^[LR]_|_[LR](_|$))', str):
    str = re.sub(r'(^[LR](?=_)|(?<=_)[LR](?=(?:_|...$)))(.*)(?=.*\1(.))...$',
                 r'\3\2', str + "LRL")
Alternatively, you can do it as a one-liner:
str = re.sub(r'(^[LR](?=_)|(?<=_)[LR](?=(?:_|...$)))(.*)(?=.*\1(.))...$', r'\3\2', str+"LRL")[:len(str)]
I'm trying to build a list of domain names from an Enom API call. I get back a lot of information and need to locate the domain name related lines, and then join them together.
The string that comes back from Enom looks somewhat like this:
SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1
I'd like to build a list from that which looks like this:
[domain1.com, domain2.org, domain3.co.uk, domain4.net]
To find the different domain name components I've tried the following (where "enom" is the string above) but have only been able to get the SLD and TLD matches.
re.findall("^.*(SLD|TLD).*$", enom, re.M)
Edit:
Every time I see a question asking for a regular expression solution, I have this bizarre urge to try to solve it without regular expressions. Most of the time it's more efficient than using a regex, and I encourage the OP to test which of the solutions is most efficient.
Here is the naive approach:
a = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""
b = a.split("\n")
c = [x.split("=")[1] for x in b if x != 'TLDOverride=1']
for x in range(0, len(c), 2):
    print ".".join(c[x:x+2])
>> domain1.com
>> domain2.org
>> domain3.co.uk
>> domain4.net
You have a capturing group in your expression. re.findall documentation says:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
That's why only the content of the capturing group is returned.
Try:
re.findall(r"^.*((?:SLD|TLD)\d*)=(.*)$", enom, re.M)
This would return a list of tuples:
[('SLD1', 'domain1'), ('TLD1', 'com'), ('SLD2', 'domain2'), ('TLD2', 'org'), ('SLD3', 'domain3'), ('TLD4', 'co.uk'), ('SLD5', 'domain4'), ('TLD5', 'net')]
Combining SLDs and TLDs is then up to you.
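For instance, one possible pairing loop (a sketch, assuming an SLD always precedes its TLD as in the sample, and reusing re and the enom string from the question):
pairs = re.findall(r"^.*((?:SLD|TLD)\d*)=(.*)$", enom, re.M)
domains = []
for name, value in pairs:
    if name.startswith('SLD'):
        domains.append(value)        # an SLD starts a new domain
    elif domains:
        domains[-1] += '.' + value   # a TLD completes the last one
print(domains)  # ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']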
This works for your example:
>>> sld_list = re.findall("^.*SLD[0-9]*?=(.*?)$", enom, re.M)
>>> tld_list = re.findall("^.*TLD[0-9]*?=(.*?)$", enom, re.M)
>>> map(lambda x: x[0] + '.' + x[1], zip(sld_list, tld_list))
['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
I'm not sure why you are talking about regular expressions. I mean, why don't you just run a for loop?
A famous quote seems to be appropriate here:
Some people, when confronted with a problem, think “I know, I'll use
regular expressions.” Now they have two problems.
domains = []
components = []
for line in enom.split('\n'):
    k, v = line.split('=')
    if k == 'TLDOverride':
        continue
    components.append(v)
    if k.startswith('TLD'):
        domains.append('.'.join(components))
        components = []
P.S. I'm not sure what this TLDOverride is, so the code just ignores it.
Here's one way:
import re
print map('.'.join, zip(*[iter(re.findall(r'^(?:S|T)LD\d+=(.*)$', text, re.M))]*2))
# ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
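The zip(*[iter(...)]*2) part is the standard idiom for grouping a flat list into consecutive pairs. Unrolled for readability (with a shortened sample string, and wrapped in list() so it also prints on Python 3):
import re

text = "SLD1=domain1\nTLD1=com\nSLD2=domain2\nTLD2=org"

values = re.findall(r'^(?:S|T)LD\d+=(.*)$', text, re.M)
it = iter(values)                  # one iterator, referenced twice
pairs = zip(*[it, it])             # zip pulls alternately -> ('domain1', 'com'), ...
print(list(map('.'.join, pairs)))  # ['domain1.com', 'domain2.org']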
Just for fun, map -> filter -> map:
input = """
SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
"""
splited = map(lambda x: x.split("="), input.split())
slds = filter(lambda x: x[1][0].startswith('SLD'), enumerate(splited))
print map(lambda x: '.'.join([x[1][1], splited[x[0] + 1][1], ]), slds)
>>> ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']
This appears to do what you want:
domains = re.findall('SLD\d+=(.+)', re.sub(r'\nTLD\d+=', '.', enom))
It assumes that the lines are sorted and an SLD always comes before its TLD. If that might not be the case, try this slightly more verbose code without regexes (note that it pairs entries by number, so the SLD3/TLD4 mismatch in the sample would not pair up):
d = dict(x.split('=') for x in enom.strip().splitlines())
domains = [
    d[key] + '.' + d.get('T' + key[1:], '')
    for key in d if key.startswith('SLD')
]
You need to use a multiline regex for this.
data = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""
domain_seq = re.compile(r"SLD\d=(\w+)\nTLD\d=([\w.]+)", re.M)  # [\w.] so TLDs like co.uk match
for item in domain_seq.finditer(data):
    domain, tld = item.group(1), item.group(2)
    print "%s.%s" % (domain, tld)
As some other answers already said, there's no need to use a regular expression here. A simple split and some filtering will do nicely:
lines = data.split("\n")  # assuming data contains your input string
sld, tld = [[x.split("=")[1] for x in lines if x[:3] == t and x[3:4].isdigit()]
            for t in ("SLD", "TLD")]  # the isdigit() check skips the TLDOverride lines
result = [x + '.' + y for x, y in zip(sld, tld)]
def handler_users_answ(coze, res, type, source):
    if res:
        if res.getType() == 'result':
            aa = res.getQueryChildren()
            if aa:
                print 'workz1'
                for x in aa:
                    m = x.getAttr('jid')
                    if m:
                        print m
So this code returns values like this:
roomname#domain.com/nickname1
roomname#domain.com/nickname2
and so on, but I want it to print only the value after the '/',
like:
nickname1
nickname2
Thanks in advance.
You can use rpartition to get the part after the last '/' in the string:
a = 'roomname#domain.com/nickname1'
c = a.rpartition('/')[2]  # 'nickname1'
You can use rsplit, which will do the splitting from the right:
a = 'roomname#domain.com/nickname1'
try:
    print a.rsplit('/', 1)[1]
except IndexError:
    print "No username was found"
I think that this is efficient and readable. If you really need it to be fast you can use rfind:
a = 'roomname#domain.com/nickname1'
index = a.rfind('/')
if index != -1:
    print a[index+1:]
else:
    print "No username was found"
To fully parse and validate the JID correctly, see this answer. There's a bunch of odd little edge cases that you might not expect.
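For the simple case in this question, a minimal sketch: partition('/') splits at the first slash, which is the right separator here because a resource part may itself contain slashes (full validation, as noted, has more edge cases than this):
def jid_resource(jid):
    """Return the resource part of a JID ('' if there is none). Naive split only."""
    bare, sep, resource = jid.partition('/')
    return resource

print(jid_resource('roomname#domain.com/nickname1'))  # nickname1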