I want to test whether certain substrings appear in a line of text. The condition is simple, but there are many substrings to test.
Currently I am using \ line continuations for easier reading, but it feels clumsy. What's a way to make these lines look nicer?
text = "Tel+971-2526-821 Fax:+971-2526-821"
if "971" in text or \
"(84)" in text or \
"+66" in text or \
"(452)" in text or \
"19 " in text:
print "foreign"
Why not extract the phone numbers from the string and run your tests on those?
text = "Tel:+971-2526-821 Fax:+971-2526-821"
tel, fax = text.split()
tel_prefix, *_ = tel.split(':')[-1].lstrip('+').split('-')  # '971'
fax_prefix, *_ = fax.split(':')[-1].lstrip('+').split('-')  # '971'
if tel_prefix in ("971", "(84)"):
    print("Foreigner")
For Python 2.x (which has no starred assignment):
tel_prefix = tel.split(':')[-1].lstrip('+').split('-')[0]
fax_prefix = fax.split(':')[-1].lstrip('+').split('-')[0]
As @Patrick Haugh pointed out in the comments, we can also do:
text = "Tel+971-2526-821 Fax:+971-2526-821"
if any(x in text for x in ("971", "(84)", "+66", "(452)", "19 ")):
print "foreign"
You can use the any() builtin function to check whether any one of the tokens exists in the text. If you would like to check that all of the tokens exist in the string, replace any below with the all() function. Cheers!
text = 'Hello your number is 19 '
tokens = ('971', '(84)', '+66', '(452)', '19 ')
if any(token in text for token in tokens):
    print('Foreign')
Output:
Foreign
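If you instead needed every token to be present, the all() variant looks the same; here's a minimal sketch (the example text is made up so that both tokens appear):
text = 'Hello your number is 19 (84)'
tokens = ('19 ', '(84)')
if all(token in text for token in tokens):
    print('All tokens found')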
Existing comments point out that chaining that many or clauses is unwieldy, but using generators/comprehensions and the any() function you can come up with a serviceable option, such as the snippet if any(x in text for x in ('971', '(84)', '+66', '(452)', '19 ')): that @Patrick Haugh recommended.
I would recommend using regular expressions instead as a more versatile and efficient way of solving the problem. You could either generate the pattern dynamically, or for the purpose of this problem, the following snippet would work (don't forget to escape parentheses):
import re
text = 'Tel:+971-2526-821 Fax:+971-2526-821'
pattern = r'(971|\(84\)|66|\(452\)|19)'
prog = re.compile(pattern)
if prog.search(text):
    print 'foreign'
If you are searching many lines of text or large bodies of text for multiple possible substrings, this approach will be faster and more reusable. You only have to compile prog once, and then you can use it as often as you'd like.
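For instance, a minimal sketch of reusing the compiled prog over several lines (the lines list here is made up purely for illustration):
lines = [
    'Tel:+971-2526-821 Fax:+971-2526-821',
    'Tel:+1-555-0100 Fax:+1-555-0101',
]
for line in lines:
    if prog.search(line):
        print 'foreign'  # only the first line matches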
As far as dynamic generation of a pattern is concerned, a naive implementation might do something like this:
match_list = ['971', '(84)', '66', '(452)', '19']
pattern = '|'.join(map(lambda s: s.replace('(', '\(').replace(')', '\)'), match_list)).join(['(', ')'])
The variable match_list could then be updated and modified as needed. There is a slight inefficiency in running two passes of replace(), and @Andrew Clark has a good trick for fixing that here, but I don't want this answer to be too long and cumbersome.
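For what it's worth, the standard library's re.escape handles the escaping in a single pass; a minimal sketch of building the same pattern with it:
import re

match_list = ['971', '(84)', '66', '(452)', '19']
# re.escape escapes every regex metacharacter, not just parentheses
pattern = '({})'.format('|'.join(re.escape(s) for s in match_list))
prog = re.compile(pattern)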
You can construct a lambda function that checks if a value is in the text, and then map this function to all of the values:
text = "Tel:+971-2526-821 Fax:+971-2526-821"
print any(map((lambda x: x in text), ["971", "(84)", "+66", "(452)", "19 "]))
The result is True, which means at least one of the values is in text.
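The snippet above uses the Python 2 print statement; in Python 3, map returns a lazy iterator, but any() consumes it the same way. A minimal sketch:
text = "Tel:+971-2526-821 Fax:+971-2526-821"
print(any(map(lambda x: x in text, ["971", "(84)", "+66", "(452)", "19 "])))  # True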
Related
I want to find something after a certain character. I am aware of how to find something before it using rfind, but I'm not sure of the syntax to find something after it. Here is an example:
text = 'Hello.world'
#to find something before
print(text[:text.rfind('.')])
# out :
Hello
# to find something after, I tried this, but of course it's incorrect
print(text[text:.rfind('.')])
Any ideas on how to find something after the character?
print(text[text.rfind('.')+1:])
Two other methods you might try include splitting the string and doing a regex substitution to isolate the substring you want:
import re

text = 'Hello.world'
print(text.split('.')[1])
print(re.sub(r'^.*\.', '', text))
Splitting would probably outperform re.sub here, so I recommend split() first.
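If you want to check that claim on your own data, a rough sketch with timeit (exact numbers will vary by machine and input):
import timeit

print(timeit.timeit("text.split('.')[1]",
                    setup="text = 'Hello.world'"))
print(timeit.timeit(r"re.sub(r'^.*\.', '', text)",
                    setup="import re; text = 'Hello.world'"))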
print(text[text:.rfind('.')]) => print(text[text.find('.')+1:])
Here's another way to do it, using str.partition and str.rpartition:
def find(text, sep=' ', right=False):
    if not (text and sep) or sep not in text:
        return None
    return text.rpartition(sep)[2] if right else text.partition(sep)[0]
find('Hello.Word', '.') # 'Hello'
find('Hello.Word', '.', True) # 'Word'
What are the most efficient ways to extract text from a string? Are there available functions, regular expressions, or some other way?
For example, my string is below and I want to extract the IDs as well as the ScreenNames, separately.
[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490, ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]
Thank you!
Edit: These are the text strings that I want to pull. I want them to be in a list.
Target_IDs = 1234567890, 233323490, 4459284
Target_ScreenNames = RandomNameHere, AnotherRandomName, YetAnotherName
import re
s = '[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490, ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]'
print 'Target IDs = ' + ','.join(re.findall(r'ID=(\d+)', s))
print 'Target ScreenNames = ' + ','.join(re.findall(r' ScreenName=(\w+)', s))
Output :
Target IDs = 1234567890,233323490,4459284
Target ScreenNames = RandomNameHere,AnotherRandomName,YetAnotherName
It depends. Assuming that all your text comes in the form of
TagName = TagValue1, TagValue2, ...
You need just two calls to split:
tag, value_string = string.split('=')
values = value_string.split(',')
Remove the excess whitespace (probably a couple of rstrip()/lstrip() calls will suffice) and you are done. Or you can use regexes; they are slightly more powerful, but in this case I think it's a matter of personal taste.
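For example, a minimal sketch of that split approach, using the 'TagName = values' shape from the question's edit:
line = 'Target_ScreenNames = RandomNameHere, AnotherRandomName, YetAnotherName'
tag, value_string = line.split('=')
tag = tag.strip()                                       # 'Target_ScreenNames'
values = [v.strip() for v in value_string.split(',')]   # ['RandomNameHere', 'AnotherRandomName', 'YetAnotherName']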
If you want more complex syntax with nonterminals, terminals and all that, you'll need lex/yacc, which will require some background in parsers. A rather interesting thing to play with, but not something you'll want to use for storing program options and such.
The regex I'd use would be:
(?:ID=|ScreenName=)+(\d+|[\w\d]+)
However, this assumes that ID is only digits (\d) and usernames are only letters or numbers ([\w\d]).
This regex (when combined with re.findall) would return a list of matches that could be iterated through and sorted in some fashion like so:
import re
s = "[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490, ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]"
pattern = re.compile(r'(?:ID=|ScreenName=)+(\d+|[\w\d]+)')
ids = []
names = []
for p in re.findall(pattern, s):
    if p.isnumeric():
        ids.append(p)
    else:
        names.append(p)

print(ids, names)
This is how the string splitting works for me right now:
output = string.encode('UTF8').split('}/n}')[0]
output += '}/n}'
But I am wondering if there is a more Pythonic way to do it.
The goal is to get everything before this '}/n}' including '}/n}'.
This might be a good use of str.partition.
string = '012za}/n}ddfsdfk'
parts = string.partition('}/n}')
# ('012za', '}/n}', 'ddfsdfk')
''.join(parts[:-1])
# 012za}/n}
Or, you can find it explicitly with str.index.
repl = '}/n}'
string[:string.index(repl) + len(repl)]
# 012za}/n}
This is probably better than using str.find since an exception will be raised if the substring isn't found, rather than producing nonsensical results.
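For instance, a small sketch of how the two fail differently when the marker is missing (illustrative only):
repl = '}/n}'
try:
    output = string[:string.index(repl) + len(repl)]
except ValueError:
    output = None  # str.index raises when the marker is absent
# str.find would instead return -1, so the slice would silently become string[:3]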
It seems like anything "more elegant" would require regular expressions.
import re
re.search('(.*?}/n})', string).group(0)
# 012za}/n}
It can be done with re.split() -- the key is putting parens around the split pattern to preserve what you split on:
import re
output = "".join(re.split(r'(}/n})', string.encode('UTF8'))[:2])
However, I doubt that this is either the most efficient or the most Pythonic way to achieve what you want; i.e., I don't think this is naturally a split sort of problem. For example:
tag = '}/n}'
encoded = string.encode('UTF8')
output = encoded[:encoded.index(tag)] + tag
or if you insist on a one-liner:
output = (lambda string, tag: string[:string.index(tag)] + tag)(string.encode('UTF8'), '}/n}')
or returning to regex:
output = re.match(r".*}/n}", string.encode('UTF8')).group(0)
>>> string_to_split = 'first item{\n{second item'
>>> sep = '{\n{'
>>> output = [item + sep for item in string_to_split.split(sep)]
NOTE: output = ['first item{\n{', 'second item{\n{']
then you can use the result:
for item_with_delimiter in output:
    ...
It might be useful to look up os.linesep if you're not sure what the line ending will be. os.linesep is whatever the line ending is under your current OS, so '\r\n' under Windows or '\n' under Linux or Mac. It depends on where the input data comes from and how flexible your code needs to be across environments.
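A quick way to check it on your own machine:
import os

print(repr(os.linesep))  # '\r\n' on Windows, '\n' on Linux and macOS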
Adapted from Slice a string after a certain phrase?, you can combine find and slice to get the first part of the string and retain }/n}.
str = "012za}/n}ddfsdfk"
str[:str.find("}/n}")+4]
Will result in 012za}/n}
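A slightly more general sketch of the same idea, which avoids hard-coding the length 4:
marker = "}/n}"
s = "012za}/n}ddfsdfk"
s[:s.find(marker) + len(marker)]  # '012za}/n}'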
I'm looking for a regular expression, implemented in Python, that will match on this text
WHERE PolicyGUID = '531B2310-403A-13DA-5964-E2EFA56B0753'
but will not match on this text
WHERE AsPolicy.PolicyGUID = '531B2310-403A-13DA-5964-E2EFA56B0753'
I'm doing this to find places in a large piece of SQL where the developer did not explicitly reference the table name. All I want to do is print the offending lines (the first WHERE clause above). I have all of the code done except for the regex.
re.compile('''WHERE [^.]+ =''')
Here, the [] indicates "match a set of characters," the ^ means "not" and the dot is a literal period. The + means "one or more."
Was that what you were looking for?
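For example, a minimal sketch of printing only the offending lines (the sql_lines list here is made up for illustration):
import re

pattern = re.compile(r'WHERE [^.]+ =')
sql_lines = [
    "WHERE PolicyGUID = '531B2310-403A-13DA-5964-E2EFA56B0753'",
    "WHERE AsPolicy.PolicyGUID = '531B2310-403A-13DA-5964-E2EFA56B0753'",
]
for line in sql_lines:
    if pattern.search(line):
        print(line)  # only the unqualified WHERE clause is printed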
something like
WHERE .*\..* = .*
Not sure how accurate that can be; it depends on how your data looks. If you provide a bigger sample it can be refined.
Something like this would work in Java, C#, or JavaScript; I suppose you can adapt it to Python:
/WHERE +[^\.]+ *\=/
>>> l
["WHERE PolicyGUID = '531B2310-403A-13DA-5964-E2EFA56B0753' ", "WHERE AsPolicy.P
olicyGUID = '531B2310-403A-13DA-5964-E2EFA56B0753' "]
>>> [line for line in l if re.match('WHERE [^.]+ =', line)]
["WHERE PolicyGUID = '531B2310-403A-13DA-5964-E2EFA56B0753' "]
How do I find values in a string, add a specific amount to each of them, and replace them in the output string?
import re
def _replace(content):
    #x = float(content.group(4)) + 20
    #y = float(content.group(6)) + 20
    return content.group(6)
print re.sub('<g(\s)transform="matrix\((.*)(\s)(.*)(\s)(.*)\)\"', _replace, '<g transform="matrix(0.412445 -0.0982513 0.0982513 0.412445 -5.77618 67.0025)">')
First off, I should repeat the usual warning about not parsing XML with regexes. It's a bad idea, and it will never work for all cases. If you're actually trying to parse the full XML document, use an XML parser.
That having been said, I'm guilty of doing quick and dirty stuff like this all the time. If you really just need a one-off solution, a simple regex can often get the job done. Just be aware that it will come back to haunt you as soon as you run into something more complex!
Next, I confess to not being much of a regex wiz, but here's how I'd modify your code snippet:
import re
def _replace(content):
    values = [float(val) for val in content.group(2).split()]
    values[3] += 20
    values[5] += 100
    values = ['{0}'.format(val) for val in values]
    return content.group(1) + ' '.join(values) + content.group(3)
test_string = '<g transform="matrix(0.412445 -0.0982513 0.0982513 0.412445 -5.77618 67.0025)">'
pattern = r'(transform=\"matrix\()(.*?)(\))'
print test_string
print re.sub(pattern, _replace, test_string)