Check if none of the multiple chars appears in string A? - python

I have a string, A = "abcdef", and several chars "a", "f" and "m". I want a condition to make sure none of the chars appears in A, i.e.,
if "a" not in A and "f" not in A and "m" not in A:
    # do something
Is there a better way to do this? Thanks!

Sets are useful for this -- see the isdisjoint() method:
Return True if the set has no elements in common with other.
Sets are disjoint if and only if their intersection is the empty set.
New in version 2.6.
>>> a = "abcde"
>>> b = "ace"
>>> c = "xyz"
>>> set(a).isdisjoint(set(b))
False
>>> set(a).isdisjoint(set(c))
True
Edit after comment:
Sets are still your friend. If I'm following you better now, you want this (or something close to it).
We'll set everything up as sets to begin with, for clarity:
>>> a = set('abcde')
>>> b = set('ace')
>>> c = set('acx')
If all of the characters in your set are in the string, this happens:
>>> a.intersection(b) == b
True
If any of those characters are not present in your string, this happens:
>>> a.intersection(c) == c
False
Closer to what you need?

True in [i in 'abcdef' for i in 'afm']
gives True
and
True in [i in 'nopqrst' for i in 'afm']
gives False
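Note that the list-based check above answers "does any character appear?", which is the opposite of the condition in the question. A minimal sketch of the "none appear" test follows; the helper names none_present and none_present_set are made up for illustration. any() with a generator stops at the first hit, and isdisjoint() accepts any iterable, so the string does not need to be converted to a set first.

```python
def none_present(chars, text):
    """Return True if no character in `chars` occurs in `text`."""
    # any() short-circuits as soon as one character is found
    return not any(c in text for c in chars)


def none_present_set(chars, text):
    """Same test, using set.isdisjoint(), which accepts any iterable."""
    return set(chars).isdisjoint(text)


A = "abcdef"
print(none_present("afm", A))       # "a" and "f" are in A
print(none_present("xyz", A))       # none of these are in A
print(none_present_set("xyz", A))
```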

Related

Check if string contains substring at index

In Python 3.5, given this string:
"rsFooBargrdshtrshFooBargreshyershytreBarFootrhj"
and the index 17 -- so, the F at the start of the second occurrence of FooBar -- how can I check that "FooBar" exists? In this case, it should return True, while if I gave it the index 13 it should return false.
There's actually a very simple way to do this without using any additional memory:
>>> s = "rsFooBargrdshtrshFooBargreshyershytreBarFootrhj"
>>> s.startswith("FooBar", 17)
True
>>>
The optional second argument to startswith tells it to start the check at offset 17 (rather than the default 0). In this example, a value of 2 will also return True, and all other values will return False.
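As a quick sanity check of the claim about offsets 2 and 17, the sketch below scans every offset with str.startswith. The helper name contains_at_offset is hypothetical and only wraps the call for readability.

```python
s = "rsFooBargrdshtrshFooBargreshyershytreBarFootrhj"


def contains_at_offset(text, word, index):
    """Return True if `word` occurs in `text` starting exactly at `index`."""
    # startswith() with a start argument does not build a slice of `text`
    return text.startswith(word, index)


# Collect every offset at which "FooBar" starts
matches = [i for i in range(len(s)) if contains_at_offset(s, "FooBar", i)]
print(matches)

# An offset past the end of the string is not an error; it simply fails to match
print(contains_at_offset(s, "FooBar", 1000))
```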
You can slice your original string to the length of the substring and compare the two values. For example:
>>> my_str = "rsFooBargrdshtrshFooBargreshyershytreBarFootrhj"
>>> word_to_check, index_at = "FooBar", 17
>>> word_to_check == my_str[index_at:len(word_to_check)+index_at]
True
>>> word_to_check, index_at = "FooBar", 13
>>> word_to_check == my_str[index_at:len(word_to_check)+index_at]
False
print("rsFooBargrdshtrshFooBargreshyershytreBarFootrhj"[17:].startswith('Foo')) # True
or, in general:
my_string[start_index:].startswith(string_to_check)
Using Tom Karzes's approach, as a function:
def contains_at(in_str, idx, word):
    return in_str[idx:idx+len(word)] == word
>>> contains_at(s, 17, "FooBar")
True
Try this:
def checkIfPresent(strng1, strng2, index):
    a = len(strng2) + index
    if a > len(strng1):
        return False  # not enough characters left to match
    b = 0
    for i in range(index, a):
        if strng2[b] != strng1[i]:
            return False
        b = b + 1
    return True
s = "rsFooBargrdshtrshFooBargreshyershytreBarFootrhj"
check = checkIfPresent(s, "FooBar", 17)
print(check)

Comparing two strings in Python 2.7

I have two strings that I compare, but I am not getting the result I want. Here's how I do it, with Python 2.7:
str1 = '0000644'
str2 = '0000644'
if str1 == str2:
    print 'true!'
else:
    print 'false'
I have also tried with the is comparison:
if str1 is str2:
    print 'true'
else:
    print 'false'
Can someone explain why I am not printing true when I do this? I come from C#, and if you do it like this you should print the true value.
The code you posted is not valid Python.
This will do:
str1 = '0000644'
str2 = '0000644'
if str1 == str2:
    print True
else:
    print False
To elaborate:
booleans start with capital letters: True and False (not sure why you had the exclamation)
blocks need to be consistently indented (unlike C# where you separate them with {})
else needs to finish with a colon
Edit: my answer was based on the OP's original code, which was not valid Python. I can't help it if someone then changes the code into valid code afterwards.
is returns True if two variables point to the same object; == returns True if the objects referred to by the variables are equal.
>>> a = [17,27,37]
>>> b = a
>>> b is a
True
>>> b == a
True
>>> b = a[:] #shallow copy of a
>>> b is a
False
>>> b == a
True
In Python, the booleans True and False must be capitalized at the 'T' and 'F', respectively. Also, when printing a literal string, the text needs to be surrounded by double or single quotes.
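To tie the is versus == distinction back to the strings in the question, here is a small sketch in Python 3 syntax (the question itself targets 2.7). Building str2 at runtime with join is an assumption made only so that the two names are less likely to refer to one shared object. Whether is returns True for two equal strings is an implementation detail, so no result is claimed for it here.

```python
str1 = "0000644"
# Assumption for illustration: build the second string at runtime
str2 = "".join(["0000", "644"])

values_equal = (str1 == str2)  # compares the contents of the two strings
same_object = (str1 is str2)   # compares identity; implementation-dependent for strings

print(values_equal)
print(same_object)
```

For comparing string contents, == is the operator to rely on.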

How to Check if String Has the same characters in Python [duplicate]

What is the shortest way to check if a given string has the same characters?
For example if you have name = 'aaaaa' or surname = 'bbbb' or underscores = '___' or p = '++++', how do you check to know the characters are the same?
An option is to check whether the set of its characters has length 1:
>>> len(set("aaaa")) == 1
True
Or use all(), which can be faster when the strings are very long and rarely consist of a single repeated character, because it stops at the first mismatch (though the regex is good in that case too):
>>> s = "aaaaa"
>>> s0 = s[0]
>>> all(c == s0 for c in s[1:])
True
You can use regex for this:
import re
p = re.compile(r'^(.)\1*$')
re.search(p, "aaaa") # returns a match object
re.search(p, "bbbb") # returns a match object
re.search(p, "aaab") # returns None
Here's an explanation of what this regex pattern means: https://regexper.com/#%5E(.)%5C1*%24
Also possible:
s = "aaaaa"
s.count(s[0]) == len(s)
compare = name == len(name) * name[0]
if compare:
    pass  # all characters are the same
else:
    pass  # the characters are not all the same
Here are a couple of ways.
def all_match0(s):
    head, tail = s[0], s[1:]
    return tail == head * len(tail)
def all_match1(s):
    head, tail = s[0], s[1:]
    return all(c == head for c in tail)
all_match = all_match0
data = [
    'aaaaa',
    'bbbb',
    '___',
    '++++',
    'q',
    'aaaaaz',
    'bbbBb',
    '_---',
]
for s in data:
    print(s, all_match(s))
Output:
aaaaa True
bbbb True
___ True
++++ True
q True
aaaaaz False
bbbBb False
_--- False
all_match0 will be faster unless the string is very long, because its testing loop runs at C speed, but it uses more RAM because it constructs a duplicate string. For very long strings, the time taken to construct the duplicate string becomes significant, and of course it can't do any testing until it creates that duplicate string.
all_match1 should only be slightly slower, even for short strings, and because it stops testing as soon as it finds a mismatch it may even be faster than all_match0, if the mismatch occurs early enough in the string.
Try using Counter from the collections module (listed in the docs under "High-performance container datatypes"):
>>> from collections import Counter
>>> s = 'aaaaaaaaa'
>>> c = Counter(s)
>>> len(c) == 1
True
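One edge case the answers above handle differently is the empty string: the versions that index s[0] raise IndexError, while len(set(s)) == 1 returns False. The sketch below makes that choice explicit. The name all_same is hypothetical, and treating an empty string as "not all the same" is an assumption, not something the question specifies.

```python
def all_same(s):
    """Return True if s is non-empty and every character equals the first."""
    # The length check short-circuits, so s[0] is never evaluated for ""
    return len(s) > 0 and s == s[0] * len(s)


for sample in ["aaaaa", "___", "q", "aaab", ""]:
    print(repr(sample), all_same(sample))
```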

Difference between: IF IN and IF == python

I wanted to know which condition is better to use for the following code:
Here are my two lists:
Matrix = ['kys_q1a1','kys_q1a2','kys_q1a3','kys_q1a4','kys_q1a5','kys_q1a6']
fixedlist = ['kys_q1a2', 'kys_q1a5']
Option 1:
for i, topmember in enumerate(Matrix):
    for fixedcol in fixedlist:
        if topmember in fixedcol:
            print i
OR
Option 2:
for i, topmember in enumerate(Matrix):
    for fixedcol in fixedlist:
        if topmember == fixedcol:
            print i
I understand that the comparison operator is matching strings, but isn't 'in' doing the same?
Thanks
topmember in fixedcol
tests if the string topmember is contained within fixedcol.
topmember == fixedcol
tests if the string topmember is equal to fixedcol.
So, 'a' in 'ab' would evaluate True. But 'a' == 'ab' would evaluate False.
I wanted to know which condition is better to use.
Since the two variants perform different operations, we cannot answer that. You need to choose the option that does the operation that you require.
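To make the difference concrete, here is a small sketch in Python 3 syntax. The entry 'kys_q1a10' is a made-up name, added only because it contains 'kys_q1a1' as a substring; with the lists from the question the two options happen to print the same indices.

```python
# Hypothetical data: 'kys_q1a10' contains 'kys_q1a1' as a substring
matrix = ['kys_q1a1', 'kys_q1a2', 'kys_q1a10']
fixedlist = ['kys_q1a10']

# Option 1: substring test between two strings
substring_hits = [i for i, m in enumerate(matrix) for f in fixedlist if m in f]

# Option 2: exact equality between two strings
equality_hits = [i for i, m in enumerate(matrix) for f in fixedlist if m == f]

print(substring_hits)
print(equality_hits)
```

The substring test also reports index 0, because 'kys_q1a1' occurs inside 'kys_q1a10'; the equality test reports only the exact match.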
Your code could be simplified quite a bit. The second option could be reduced to:
for i, topmember in enumerate(Matrix):
    if topmember in fixedlist:
        print i
You could also use a list comprehension to find the matching indices:
[i for i, x in enumerate(Matrix) if x in fixedlist]
If you just have to print the indices rather than store them in a list you can write it like this:
print '\n'.join([str(i) for i, x in enumerate(Matrix) if x in fixedlist])
It's a matter of taste whether you prefer the dense list comprehension one-liner, or the rather more verbose version above.
The in operator is used for membership testing, and the == operator is used for equality testing.
Generally, in is used for membership testing on sequence objects. It also works on dictionaries, sets, tuples, lists, strings and so on, but it behaves differently depending on the object type.
Dictionary:
It checks whether the key exists.
>>> d = {'key' : 'value'}
>>> 'key' in d
True
>>> 'k' in d
False
>>>
Set:
Under the hood it checks whether the key exists; a set is implemented much like a dictionary with dummy values.
>>> s = set(range(10))
>>> 1 in s
True
>>>
List and Tuple:
For the list and tuple types, x in y is true if and only if there exists an index i such that x == y[i] is true.
>>> l = range(10)
>>> 3 in l
True
>>>
String:
Checks whether the substring is present inside the string, i.e. x in y is true if and only if x is a substring of y. An equivalent test is y.find(x) != -1.
User-defined data type:
For user-defined classes which define the __contains__() method, x in y is true if and only if y.__contains__(x) is true.
class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __contains__(self, arg):
        # membership here means "is arg the name of an instance attribute?"
        return arg in self.__dict__
obj_p = Person('Jeff', 90)
print 'Jeff', 'Jeff' in obj_p
print 'age', 'age' in obj_p
print 'name', 'name' in obj_p
I hope this clarifies how in is used.
Let's rewrite your snippet:
>>> Matrix = ['kys_q1a1','kys_q1a2','kys_q1a3','kys_q1a4','kys_q1a5','kys_q1a6']
>>> fixedlist = ['kys_q1a2', 'kys_q1a5']
>>> for i in fixedlist:
...     print i, i in Matrix
...
kys_q1a2 True
kys_q1a5 True
>>>
And finally, let's look at some equality tests with ==:
>>> 'a' == 'b'
False
>>> 'a' == 'a'
True
>>> 'a' == 'ab'
False
>>> '' in 'ab' # empty string is treated as a sub-string for any string
True
>>> '' == 'ab' # False as they are having different values
False
>>>
>>> 1 == 'ab'
False
>>> 1 == 1
True
>>>
Use == if you want to match the exact string.

elegant way to match two wildcarded strings

I'm OCRing some text from two different sources. They can each make mistakes in different places, where they won't recognize a letter/group of letters. If they don't recognize something, it's replaced with a ?. For example, if the word is Roflcopter, one source might return Ro?copter, while another, Roflcop?er. I'd like a function that returns whether two matches might be equivalent, allowing for multiple ?s. Example:
match("Ro?copter", "Roflcop?er") --> True
match("Ro?copter", "Roflcopter") --> True
match("Roflcopter", "Roflcop?er") --> True
match("Ro?co?er", "Roflcop?er") --> True
So far I can match one OCR with a perfect one by using regular expressions:
>>> def match(tn1, tn2):
...     tn1re = tn1.replace("?", ".{0,4}")
...     tn2re = tn2.replace("?", ".{0,4}")
...     return bool(re.match(tn1re, tn2) or re.match(tn2re, tn1))
>>> match("Roflcopter", "Roflcop?er")
True
>>> match("R??lcopter", "Roflcopter")
True
But this doesn't work when they both have ?s in different places:
>>> match("R??lcopter", "Roflcop?er")
False
As long as one ? corresponds to exactly one character, I can suggest a fast and reasonably compact method:
def match(str1, str2):
    if len(str1) != len(str2): return False
    for index, ch1 in enumerate(str1):
        ch2 = str2[index]
        if ch1 == '?' or ch2 == '?': continue
        if ch1 != ch2: return False
    return True
>>> match("Roflcopter", "Roflcop?er")
True
>>> match("R??lcopter", "Roflcopter")
True
>>>
>>> match("R??lcopter", "Roflcop?er")
True
>>>
Edit: Part B), brain-fart free now.
def sets_match(set1, set2):
    return any(match(str1, str2) for str1 in set1 for str2 in set2)
>>> s1 = set(['a?', 'fg'])
>>> s2 = set(['?x'])
>>> sets_match(s1, s2) # 'a?' matches '?x'
True
>>>
Thanks to Hamish Grubijan for this idea. Every ? in my ocr'd names can be anywhere from 0 to 3 letters. What I do is expand each string to a list of possible expansions:
>>> list(expQuestions("?flcopt?"))
['flcopt', 'flcopt#', 'flcopt##', 'flcopt###', '#flcopt', '#flcopt#', '#flcopt##', '#flcopt###', '##flcopt', '##flcopt#', '##flcopt##', '##flcopt###', '###flcopt', '###flcopt#', '###flcopt##', '###flcopt###']
then I expand both and use his matching function, which I called matchats:
def matchOCR(l, r):
    for expl in expQuestions(l):
        for expr in expQuestions(r):
            if matchats(expl, expr):
                return True
    return False
Works as desired:
>>> matchOCR("Ro?co?er", "?flcopt?")
True
>>> matchOCR("Ro?co?er", "?flcopt?z")
False
>>> matchOCR("Ro?co?er", "?flc?pt?")
True
>>> matchOCR("Ro?co?e?", "?flc?pt?")
True
The matching function:
def matchats(l, r):
    """Match two strings with # representing exactly 1 char"""
    if len(l) != len(r): return False
    for i, c1 in enumerate(l):
        c2 = r[i]
        if c1 == "#" or c2 == "#": continue
        if c1 != c2: return False
    return True
and the expanding function, where cartesian_product does just that:
def expQuestions(s):
    """For OCR w/ a questionmark in them, expand questions with
    #s for all possibilities"""
    numqs = s.count("?")
    blah = list(s)
    for expqs in cartesian_product([(0,1,2,3)]*numqs):
        newblah = blah[:]
        qi = 0
        for i,c in enumerate(newblah):
            if newblah[i] == '?':
                newblah[i] = '#'*expqs[qi]
                qi += 1
        yield "".join(newblah)
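The cartesian_product helper is not shown in the answer. A self-contained sketch of the same expansion can be written with itertools.product from the standard library, assuming that is an acceptable stand-in. The name expand_questions and the max_len parameter are hypothetical; max_len=3 mirrors the "0 to 3 letters" stated above.

```python
from itertools import product


def expand_questions(s, max_len=3):
    """Yield every expansion of s where each ? becomes 0..max_len '#' characters."""
    parts = s.split("?")  # the literal pieces between the question marks
    # One counter per question mark, each ranging over 0..max_len
    for counts in product(range(max_len + 1), repeat=len(parts) - 1):
        out = [parts[0]]
        for n, part in zip(counts, parts[1:]):
            out.append("#" * n)
            out.append(part)
        yield "".join(out)


expansions = list(expand_questions("?flcopt?"))
print(len(expansions))
print(expansions[:5])
```

With two question marks and four choices each, this produces 16 strings in the same order as the listing shown earlier.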
Using the Levenshtein distance may be useful. It will give a value of how similar the strings are to each other. This will work if they are different lengths, too. The linked page has some pseudocode to get you started.
You'll end up with something like this:
>>> match("Roflcopter", "Roflcop?er")
1
>>> match("R??lcopter", "Roflcopter")
2
>>> match("R?lcopter", "Roflcop?er")
3
So you could have a maximum threshold below which you say they may match.
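For reference, a minimal sketch of the standard dynamic-programming Levenshtein distance follows. It treats ? as an ordinary character, which appears to be what the numbers above assume; the function name levenshtein is chosen here for illustration.

```python
def levenshtein(a, b):
    """Return the edit distance (insert, delete, substitute) between a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


print(levenshtein("Roflcopter", "Roflcop?er"))
print(levenshtein("R??lcopter", "Roflcopter"))
print(levenshtein("R?lcopter", "Roflcop?er"))
```

A threshold on this value then decides whether two OCR results are treated as a possible match.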
This might not be the most Pythonic of options, but if a ? is allowed to match any number of characters, then the following backtracking search does the trick:
def match(a,b):
    def matcher(i,j):
        if i == len(a) and j == len(b):
            return True
        elif i < len(a) and a[i] == '?' \
          or j < len(b) and b[j] == '?':
            return i < len(a) and matcher(i+1,j) \
                or j < len(b) and matcher(i,j+1)
        elif i == len(a) or j == len(b):
            return False
        else:
            return a[i] == b[j] and matcher(i+1,j+1)
    return matcher(0,0)
This may be adapted to be more stringent in what to match. Also, to save stack space, the final case (i+1,j+1) may be transformed into a non-recursive solution.
Edit: some more clarification in response to the reactions below. This is an adaptation of a naive matching algorithm for simplified regexes/NFAs (see Kernighan's contrib to Beautiful Code, O'Reilly 2007 or Jurafsky & Martin, Speech and Language Processing, Prentice Hall 2009).
How it works: the matcher function recursively walks through both strings/patterns, starting at (0,0). It succeeds when it reaches the end of both strings (len(a),len(b)); it fails when it encounters two unequal characters or the end of one string while there are still characters to match in the other string.
When matcher encounters a variable (?) in either string (say a), it can do two things: either skip over the variable (matching zero characters), or skip over the next character in b but keep pointing to the variable in a, allowing it to match more characters.
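Because the same (i, j) pair can be reached along many different paths, the plain recursion may repeat work on inputs with several ?s. One possible variation, not part of the original answer, is to cache results with functools.lru_cache. The sketch below keeps the same logic and only adds memoization; the name match_any_length is hypothetical.

```python
from functools import lru_cache


def match_any_length(a, b):
    """Backtracking match where each ? may stand for any number of characters."""

    @lru_cache(maxsize=None)  # each (i, j) state is evaluated at most once
    def matcher(i, j):
        if i == len(a) and j == len(b):
            return True  # both strings consumed
        a_wild = i < len(a) and a[i] == "?"
        b_wild = j < len(b) and b[j] == "?"
        if a_wild or b_wild:
            # Advance in a or advance in b; a wildcard is either skipped
            # or absorbs one more character of the other string
            return (i < len(a) and matcher(i + 1, j)) or \
                   (j < len(b) and matcher(i, j + 1))
        if i == len(a) or j == len(b):
            return False  # one string ended while the other has characters left
        return a[i] == b[j] and matcher(i + 1, j + 1)

    return matcher(0, 0)


print(match_any_length("Ro?copter", "Roflcop?er"))
print(match_any_length("Ro?co?er", "Roflcop?er"))
print(match_any_length("Roflcopter", "Roflcopxer"))
```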
