Is there a better way to write something like this?
if 'legal' in href_link or 'disclaimer' in href_link or 'contact' in href_link or 'faq' in href_link or 'terms' in href_link or 'log' in href_link:
    continue
Preferably in a single line. Where do I look?
Use the built-in any:
items = ('legal', 'disclaimer', 'contact', 'faq', 'terms', 'log')
if any(x in href_link for x in items):
    continue
You could put the tuple directly inside any() to get a true one-liner, but it's more readable this way.
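To show the any() version in context, here is a minimal sketch that assumes the check sits inside a loop over links, as the `continue` in the question suggests. The keep_links() function and the sample links are hypothetical:

```python
# Keywords from the question; links containing any of them are skipped.
SKIP_WORDS = ('legal', 'disclaimer', 'contact', 'faq', 'terms', 'log')


def keep_links(links):
    """Return the links that contain none of SKIP_WORDS (hypothetical helper)."""
    kept = []
    for href_link in links:
        # any() stops at the first keyword found in href_link.
        if any(word in href_link for word in SKIP_WORDS):
            continue
        kept.append(href_link)
    return kept


print(keep_links(['/about', '/legal/privacy', '/blog', '/faq', '/products']))
# ['/about', '/products']
```

Because `in` is a plain substring test, 'log' also matches '/blog' (and would match '/catalog'). If that is not intended, the keywords need to be compared against path segments or matched with word boundaries instead.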
You could build a regular expression. I'm not sure about efficiency; you'd have to compare it with @MosesKoledoye's nice answer.
To match against alternatives you use the pipe |. You'd need something like legal|disclaimer|contact|faq|terms|log as a pattern.
You can build that by joining the values with the string '|':
>>> values = {'legal', 'disclaimer', 'contact', 'faq', 'terms', 'log'}
>>> pattern = '|'.join(values)
>>> pattern
'terms|log|faq|legal|contact|disclaimer'
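Because values is a set, the order of the alternatives (and so the string shown above) may differ from run to run; that does not affect whether a link matches. Using the re (regular expression) module: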
>>> import re
>>> href_link = 'link_to_disclaimer.html'
>>> if re.search(pattern, href_link):
...     print('matches')
matches
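Since the efficiency question is left open, here is a rough sketch for measuring it yourself with the standard timeit module. The sample link is made up, and the timings will depend on your machine, your Python version and your real data, so no numbers are claimed here:

```python
import re
import timeit

items = ('legal', 'disclaimer', 'contact', 'faq', 'terms', 'log')
# Compile the alternation once; re.escape() guards against metacharacters.
pattern = re.compile('|'.join(map(re.escape, items)))
href_link = 'https://example.com/products/widget.html'  # hypothetical input


def with_any():
    return any(x in href_link for x in items)


def with_regex():
    return pattern.search(href_link) is not None


for func in (with_any, with_regex):
    # Total seconds for 100,000 calls of each function.
    print(func.__name__, timeit.timeit(func, number=100_000))
```

Both functions must agree on every input before their timings are worth comparing; it is worth checking that on a sample of real links too.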
@MosesKoledoye's answer is probably the best one for you: condensing the six uniform tests into one iteration certainly makes for much better code.
But you might instead have been asking "How can I break a long conditional to fit into 79 characters?" In other words, you might have been asking about code formatting rather than how to code. In that case, my preferred answer is to format it something like this:
if (a in b or
        c in d or
        e not in f or
        g not in h):
    continue
Related
First of all, sorry if the title isn't very explicit; it's hard for me to phrase it properly. That's also why I couldn't find out whether the question has already been asked.
So, I have a list of strings, and I want to perform a "procedural" search in which every * in my search string can stand for any possible substring.
Here is an example:
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor('mesh_*')
# should return: ['mesh_1_TMP', 'mesh_2_TMP']
In this case, where there is just one *, I split the search string on * and use startswith() and/or endswith(), so that's OK.
But I don't know how to do the same thing if there are multiple * in the search string.
So my question is: how do I search a list of strings with any number of unknown substrings in place of *?
For example:
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor('*_1_*')
# should return: ['obj_1_mesh', 'mesh_1_TMP']
Hope everything is clear enough. Thanks.
Consider using fnmatch, which provides Unix shell-style wildcard matching. More info here: http://docs.python.org/2/library/fnmatch.html
from fnmatch import fnmatch
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor = '*_1_*'
resultSubList = [x for x in strList if fnmatch(x, searchFor)]
This should do the trick.
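The standard fnmatch module also has a filter() helper that does the list comprehension for you. One caveat: fnmatch() normalises case with os.path.normcase(), so on Windows it is case-insensitive, whereas fnmatchcase() always compares case-sensitively. A small sketch (filter is imported under the alias fn_filter only to avoid shadowing the built-in):

```python
from fnmatch import filter as fn_filter, fnmatchcase

strList = ['obj_1_mesh',
           'obj_2_mesh',
           'obj_TMP',
           'mesh_1_TMP',
           'mesh_2_TMP',
           'meshTMP']

# filter() returns the names that match the pattern, in their original order.
print(fn_filter(strList, '*_1_*'))   # ['obj_1_mesh', 'mesh_1_TMP']
print(fn_filter(strList, 'mesh_*'))  # ['mesh_1_TMP', 'mesh_2_TMP']

# fnmatchcase() never folds case, so '*_tmp' finds nothing here on any OS.
print([s for s in strList if fnmatchcase(s, '*_tmp')])  # []
```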
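I would use the regular expression module, re, for this if I were you. You'll have to learn a little regex to write correct search queries, but it's not too bad. '.+' plays a role similar to '*' here, except that '.+' needs at least one character, whereas the shell-style * also matches an empty string ('.*' is the exact equivalent).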
import re

def search_strings(str_list, search_query):
    regex = re.compile(search_query)
    result = []
    for string in str_list:
        # match() only checks for a match at the start of the string.
        match = regex.match(string)
        if match is not None:
            result.append(match.group())
    return result

strList = ['obj_1_mesh',
           'obj_2_mesh',
           'obj_TMP',
           'mesh_1_TMP',
           'mesh_2_TMP',
           'meshTMP']
print(search_strings(strList, '.+_1_.+'))
This should print ['obj_1_mesh', 'mesh_1_TMP']. I tried to replicate the '*_1_*' case. For 'mesh_*' you could make the search_query 'mesh_.+'. Here is the link to the Python regex API: https://docs.python.org/2/library/re.html
The simplest way to do this is to use fnmatch, as shown in ma3oun's answer. But here's a way to do it using regular expressions, aka regex.
First we transform your searchFor pattern so it uses '.+?' as the "wildcard" instead of '*'. Then we compile the result into a regex pattern object so we can use it efficiently for multiple tests.
For an explanation of regex syntax, please see the docs. But briefly, the dot means any character except a newline, the + means one or more of them, and the ? makes the match non-greedy, i.e., it matches the shortest string that conforms to the pattern rather than the longest (which is what greedy matching does).
import re
strList = ['obj_1_mesh',
'obj_2_mesh',
'obj_TMP',
'mesh_1_TMP',
'mesh_2_TMP',
'meshTMP']
searchFor = '*_1_*'
pat = re.compile(searchFor.replace('*', '.+?'))
result = [s for s in strList if pat.match(s)]
print(result)
Output:
['obj_1_mesh', 'mesh_1_TMP']
If we use searchFor = 'mesh_*', the result is:
['mesh_1_TMP', 'mesh_2_TMP']
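Please note that this solution is not robust. If searchFor contains other characters that have a special meaning in a regex (such as . or +), they need to be escaped. Also, pat.match() only anchors the pattern at the start of the string, so a pattern that ends with literal text would also accept longer strings; pat.fullmatch() (Python 3.4+) anchors at both ends. Actually, rather than doing that searchFor.replace transformation, it would be cleaner to write the pattern using regex syntax in the first place.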
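A sketch of a more robust version, assuming Python 3.4+ for fullmatch(): each literal piece between the * characters goes through re.escape(), the pieces are joined with '.*' (zero or more characters, like the shell-style *), and fullmatch() anchors the pattern at both ends. The function names wildcard_to_regex and search_for are hypothetical:

```python
import re


def wildcard_to_regex(pattern):
    # Escape each literal piece so characters such as '.', '+' or '(' lose
    # their regex meaning, then join the pieces with '.*' (zero or more
    # characters, like the shell-style '*').
    pieces = pattern.split('*')
    return re.compile('.*'.join(re.escape(piece) for piece in pieces))


def search_for(strings, pattern):
    regex = wildcard_to_regex(pattern)
    # fullmatch() anchors at both ends, unlike match(), which only anchors
    # at the start of the string.
    return [s for s in strings if regex.fullmatch(s)]


strList = ['obj_1_mesh', 'obj_2_mesh', 'obj_TMP',
           'mesh_1_TMP', 'mesh_2_TMP', 'meshTMP']

print(search_for(strList, '*_1_*'))   # ['obj_1_mesh', 'mesh_1_TMP']
print(search_for(strList, 'mesh_*'))  # ['mesh_1_TMP', 'mesh_2_TMP']
```

The standard library's fnmatch.translate() performs a similar conversion (and also handles ? and [...]), so compiling its output is another option.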
If the string you are looking for is always a plain string (with no *), you can just use the find method; you'll get something like:
for s in strList:
    if s.find(searchFor) != -1:
        do_something()
If you have more than one string to look for (like abc*123*test), you'll need to look for each piece in turn: find the first one, then search for the second one in the same string, starting at the index where you found the first plus its length, and so on.
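Here is a minimal sketch of that find()-based idea, assuming the usual shell-style meaning of * (it may also match an empty string). The helper name wildcard_match is hypothetical. The text before the first * must be a prefix and the text after the last * a suffix; the pieces in between are located left to right with str.find():

```python
def wildcard_match(text, pattern):
    # Split the pattern on '*' into the fixed pieces that must appear in order.
    pieces = pattern.split('*')
    if len(pieces) == 1:
        # No '*' at all: only an exact match will do.
        return text == pattern
    first, last, middle = pieces[0], pieces[-1], pieces[1:-1]
    # The piece before the first '*' must start the text, and the piece
    # after the last '*' must end it.
    if not text.startswith(first) or not text.endswith(last):
        return False
    pos = len(first)
    end = len(text) - len(last)
    for piece in middle:
        # Search only between the prefix and the suffix, starting right
        # after the previous piece (index of the previous hit + its length).
        idx = text.find(piece, pos, end)
        if idx == -1:
            return False
        pos = idx + len(piece)
    # The prefix and suffix must not overlap.
    return pos <= end


strList = ['obj_1_mesh', 'obj_2_mesh', 'obj_TMP',
           'mesh_1_TMP', 'mesh_2_TMP', 'meshTMP']
print([s for s in strList if wildcard_match(s, '*_1_*')])  # ['obj_1_mesh', 'mesh_1_TMP']
```

Taking the leftmost occurrence of each middle piece is safe here: it leaves the most room for the pieces that follow.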
I'm trying to find all links on a webpage in the form of "http://something" or "https://something". I made a regex and it works:
L = re.findall(r"http://[^/\"]+/|https://[^/\"]+/", site_str)
But is there a shorter way to write this? I'm repeating ://[^/\"]+/ twice, probably without any need. I tried various things, but none of them work. I tried:
L = re.findall(r"http|https(://[^/\"]+/)", site_str)
L = re.findall(r"(http|https)://[^/\"]+/", site_str)
L = re.findall(r"(http|https)(://[^/\"]+/)", site_str)
Obviously I'm missing something here, or I just don't understand Python regexes well enough.
You are using capturing groups, and .findall() alters behaviour when you use those (it'll only return the contents of capturing groups). Your regex can be simplified, but your versions will work if you use non-capturing groups instead:
L = re.findall(r"(?:http|https)://[^/\"]+/", site_str)
You don't need to escape the double quote if you use single quotes around the expression, and you only need to vary the s in the expression, so s? would work too:
L = re.findall(r'https?://[^/"]+/', site_str)
Demo:
>>> import re
>>> example = '''
... "http://someserver.com/"
... "https://anotherserver.com/with/path"
... '''
>>> re.findall(r'https?://[^/"]+/', example)
['http://someserver.com/', 'https://anotherserver.com/']
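To make the capturing-group behaviour concrete, here is a small sketch (the sample site_str is made up) that runs two of the patterns from the question, plus re.finditer(), which sidesteps the issue by yielding Match objects:

```python
import re

site_str = '"http://someserver.com/" "https://anotherserver.com/with/path"'

# One capturing group: findall() returns only that group's text.
one_group = re.findall(r'(http|https)://[^/"]+/', site_str)
# Two capturing groups: findall() returns a tuple of groups for each match.
two_groups = re.findall(r'(http|https)(://[^/"]+/)', site_str)
# finditer() yields Match objects; group(0) is always the whole match,
# however many capturing groups the pattern has.
whole = [m.group(0) for m in re.finditer(r'(http|https)://[^/"]+/', site_str)]

print(one_group)   # ['http', 'https']
print(two_groups)  # [('http', '://someserver.com/'), ('https', '://anotherserver.com/')]
print(whole)       # ['http://someserver.com/', 'https://anotherserver.com/']
```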
I have a bunch of mathematical expressions stored as strings. Here's a short one:
stringy = "((2+2)-(3+5)-6)"
I want to break this string up into a list that contains ONLY the information in each "sub-parenthetical phrase" (I'm sure there's a better way to phrase that). So my result would be:
['2+2','3+5']
I have a couple of ideas about how to do this, but I keep running into an "okay, now what" issue.
For example:
for x in stringy:
    substring = stringy[stringy.find('('+1 : stringy.find(')')+1]
    stringlist.append(substring)
Works just peachy to return 2+2, but that's about as far as it goes, and I am completely blanking on how to move through the remainder...
One way using regex:
import re
stringy = "((2+2)-(3+5)-6)"
for exp in re.findall(r"\(([\s\d+*/-]+)\)", stringy):
    print(exp)
Output
2+2
3+5
You could use regular expressions like the following:
import re
x = "((2+2)-(3+5)-6)"
re.findall(r"(?<=\()[0-9+/*-]+(?=\))", x)
Result:
['2+2', '3+5']
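For comparison with the find()-based loop in the question, here is a sketch without regex that scans the string once and keeps a stack of open-parenthesis positions. It assumes the parentheses are balanced and returns only the innermost groups (those with no nested parentheses); the function name innermost_groups is hypothetical:

```python
def innermost_groups(expr):
    groups = []
    stack = []  # positions of '(' that have not been closed yet
    for i, ch in enumerate(expr):
        if ch == '(':
            stack.append(i)
        elif ch == ')' and stack:
            # The most recent unclosed '(' pairs with this ')'.
            start = stack.pop()
            inner = expr[start + 1:i]
            # Keep the group only if it contains no nested parentheses.
            if '(' not in inner:
                groups.append(inner)
    return groups


print(innermost_groups("((2+2)-(3+5)-6)"))  # ['2+2', '3+5']
```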
Is this even possible?
Basically, I want to turn these two calls to sub into a single call:
mystring = re.sub(r'\bAword\b', 'Bword', mystring)
mystring = re.sub(r'\baword\b', 'bword', mystring)
What I'd really like is some sort of conditional substitution notation like:
re.sub(r'\b([Aa])word\b', '(?1=A:B,a:b)word')
I only care about the capitalization of the first character. None of the others.
You can pass a function that is called with every match object and returns the replacement:
>>> import re
>>> def f(match):
...     return chr(ord(match.group(0)[0]) + 1) + match.group(0)[1:]
...
>>> re.sub(r'\b[aA]word\b', f, 'aword Aword')
'bword Bword'
OK, here's the solution I came up with, thanks to the suggestions to use a replace function.
re.sub(r'\b[Aa]word\b', lambda x: ('B' if x.group()[0].isupper() else 'b') + 'word', 'Aword aword.')
You can pass a lambda that takes the Match object as its argument as the replacement function:
import re
re.sub(r'\baword\b',
       lambda m: 'bword' if m.group(0)[0].islower() else 'Bword',
       'Aword aword',
       flags=re.I)
# returns: 'Bword bword'
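Use capture groups (r'\1'):
re.sub(r'\b([Aa])word\b', r'\1word', "hello Aword")
Note that \1 reinserts the captured letter unchanged, so this call returns "hello Aword": it preserves the case of the first letter, but on its own it does not turn A into B.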
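Combining a capture group with a replacement function gets close to the conditional notation the question asks for: a dictionary keyed on the captured letter. A minimal sketch, assuming the mapping only needs to cover the letters the pattern can capture (FIRST_LETTER and replace_word are hypothetical names):

```python
import re

# Hypothetical mapping from the captured first letter to its replacement.
FIRST_LETTER = {'A': 'B', 'a': 'b'}


def replace_word(text):
    # Group 1 captures the first letter; the lambda looks it up in the mapping,
    # so the replacement keeps the original capitalisation.
    return re.sub(r'\b([Aa])word\b',
                  lambda m: FIRST_LETTER[m.group(1)] + 'word',
                  text)


print(replace_word('Aword aword.'))  # Bword bword.
```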
I have to check whether a lot of words are in a string. The code looks like this:
if "string_1" in var_string or "string_2" in var_string or "string_3" in var_string or "string_n" in var_string:
    do_something()
How can I make it more readable and clearer?
This is one way:
words = ['string_1', 'string_2', ...]
if any(word in var_string for word in words):
    do_something()
Reference: any()
Update:
For completeness, if you want to execute the function only if all words are contained in the string, you can use all() instead of any().
Also note that this construct won't do any unnecessary work: any() returns as soon as it encounters a true value, and because the Boolean values are produced by a generator expression, the remaining ones are never computed. So you get the same short-circuit evaluation you normally have with a chain of or expressions.
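To see the short-circuiting described above, this sketch records which words are actually tested; the contains() helper and the sample data are hypothetical:

```python
checked = []


def contains(word, text):
    # Record every word that is actually tested, to make short-circuiting visible.
    checked.append(word)
    return word in text


words = ['string_1', 'string_2', 'string_3']
var_string = 'xx string_1 yy'

print(any(contains(w, var_string) for w in words))  # True
print(checked)  # ['string_1']: any() stopped at the first hit

checked.clear()
print(all(contains(w, var_string) for w in words))  # False
print(checked)  # ['string_1', 'string_2']: all() stopped at the first miss
```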
import re
if re.search("string_1|string_2|string_n", var_string):
    print(True)
The beauty of Python regexes is that re.search() returns either a match object (which gives information on what matched) or None, which can be used as a false value in a test.
With regex that would be:
import re
words = ['string_1', 'string_2', ...]
if re.search('|'.join([re.escape(w) for w in words]), var_string):
    blahblah
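Here is a minimal sketch of the re.escape() version wrapped in a function (contains_any is a hypothetical name), showing why the escaping matters when the words may contain regex metacharacters:

```python
import re


def contains_any(text, words):
    # re.escape() neutralises regex metacharacters in each word, so 'a.b'
    # only matches the literal text 'a.b', not 'axb'.
    pattern = re.compile('|'.join(re.escape(w) for w in words))
    return pattern.search(text) is not None


print(contains_any('hello string_1', ['string_1', 'string_2']))  # True
print(contains_any('axb', ['a.b']))                               # False
print(bool(re.search('a.b', 'axb')))  # True: unescaped, '.' matches any character
```

One difference from any(): with an empty word list the joined pattern is the empty string, which matches every text, so contains_any('x', []) is True whereas any() over an empty list is False. Guard against an empty list if that case can occur.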
Have you looked at filter?
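filter(lambda x: x in var_string, ["myString", "nextString"])
which can then be combined with map to get this (in Python 3, map() and filter() return lazy iterators, so the outer list() is needed to actually call doSomething):
list(map(doSomething, filter(lambda x: x in var_string, ["myString", "nextString"])))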
EDIT:
Of course, that doesn't do what you want. Go with the any() solution. For some reason I thought you wanted it done once for every matching string instead of just once.
>>> import re
>>> string="word1testword2andword3last"
>>> c=re.compile("word1|word2|word3")
>>> c.search(string)
<_sre.SRE_Match object at 0xb7715d40>
>>> string="blahblah"
>>> c.search(string)
>>>
One more way to achieve this:
check = lambda a: any('string_%d' % x in a for x in range(10))
print(check('hello string_1'))