How can I check if any of the strings in an array exists in another string?
For example:
a = ['a', 'b', 'c']
s = "a123"
if a in s:
    print("some of the strings found in s")
else:
    print("no strings found in s")
How can I replace the if a in s: line to get the appropriate result?
You can use any:
a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]
if any(x in a_string for x in matches):
    print("at least one match found")
Similarly to check if all the strings from the list are found, use all instead of any.
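To make the difference between any and all concrete, here is a small sketch that reuses a_string and matches from the answer above. The names found_any and found_all are only illustrative.

```python
a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]

# any() stops at the first substring that is found
found_any = any(x in a_string for x in matches)
# all() requires every substring to be present
found_all = all(x in a_string for x in matches)

print(found_any)  # True: "more" occurs in a_string
print(found_all)  # False: "wholesome" and "milk" do not
```

Passing a generator expression instead of a list lets any and all short-circuit without building the whole list first.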
any() is by far the best approach if all you want is True or False, but if you want to know specifically which string or strings match, you have a few options.
If you want the first match (with False as a default):
match = next((x for x in a if x in s), False)
If you want to get all matches (including duplicates):
matches = [x for x in a if x in s]
If you want to get all non-duplicate matches (disregarding order):
matches = {x for x in a if x in s}
If you want to get all non-duplicate matches in the right order:
matches = []
for x in a:
    if x in s and x not in matches:
        matches.append(x)
You should be careful if the strings in a or s get longer. The straightforward solutions take O(S*(A^2)), where S is the length of s and A is the sum of the lengths of all strings in a. For a faster solution, look at the Aho-Corasick algorithm for string matching, which runs in linear time O(S+A).
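For readers who want to see the idea, here is a minimal pure-Python sketch of Aho-Corasick. It is an illustration under assumptions, not a tuned implementation: the function names build_automaton and find_all are made up for this example, and the patterns are assumed to be non-empty strings.

```python
from collections import deque

def build_automaton(patterns):
    # Node i is described by goto[i] (child transitions), fail[i] (failure
    # link) and out[i] (patterns that end at this node). Node 0 is the root.
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)
    # Breadth-first pass: compute failure links level by level
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, nxt in goto[node].items():
            queue.append(nxt)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, patterns):
    # Return (start_index, pattern) for every occurrence in text
    goto, fail, out = build_automaton(patterns)
    node, found = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            found.append((i - len(pat) + 1, pat))
    return found

print(find_all("a123", ["a", "b", "c"]))                 # [(0, 'a')]
print(find_all("ushers", ["he", "she", "his", "hers"]))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

To answer the original yes/no question, bool(find_all(s, a)) is enough. For short inputs the any() approach is simpler and likely just as fast.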
Just to add some diversity with regex:
import re
if any(re.findall(r'a|b|c', s, re.IGNORECASE)):
    print('possible matches thanks to regex')
else:
    print('no matches')
or, if your list is too long to write out by hand: any(re.findall('|'.join(map(re.escape, a)), s, re.IGNORECASE))
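Here is a short sketch of the joined-pattern variant. The sample values in a and s are made up for illustration; re.escape is used so that characters such as . or + in the list are matched literally rather than as regex syntax.

```python
import re

a = ['a', 'b', 'c.d', 'x+y']
s = "A123 with c.d inside"

# Build one alternation from the list, escaping regex metacharacters
pattern = re.compile('|'.join(re.escape(item) for item in a), re.IGNORECASE)

print(bool(pattern.search(s)))  # True
print(pattern.findall(s))       # ['A', 'c.d']
```

If only a yes/no answer is needed, pattern.search stops at the first hit, whereas findall scans the whole string.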
A surprisingly fast approach is to use set:
a = ['a', 'b', 'c']
s = "a123"
if set(a) & set(s):
    print("some of the strings found in s")
else:
    print("no strings found in s")
This works only if a contains no multi-character values (if it does, use any as listed above). When every value is a single character, it's simpler to specify a as a string: a = 'abc'.
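A quick sketch of why the set approach is limited to single characters: set(s) contains only the individual characters of s, so longer substrings can never be members. The list a_multi is a made-up example.

```python
a = ['a', 'b', 'c']
s = "a123"

# set(s) is the set of single characters in s
single_hits = set(a) & set(s)
print(single_hits)  # {'a'}

# With multi-character entries the set approach misses the match
a_multi = ['12', 'zz']
set_result = bool(set(a_multi) & set(s))
any_result = any(x in s for x in a_multi)
print(set_result)  # False, although "12" occurs in s
print(any_result)  # True
```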
You need to iterate over the elements of a.
a = ['a', 'b', 'c']
s = "a123"
found_a_string = False
for item in a:
    if item in s:
        found_a_string = True
if found_a_string:
    print("found a match")
else:
    print("no match found")
a = ['a', 'b', 'c']
s = "a123"
a_match = [True for match in a if match in s]
if True in a_match:
    print("some of the strings found in s")
else:
    print("no strings found in s")
jbernadas already mentioned the Aho-Corasick-Algorithm in order to reduce complexity.
Here is one way to use it in Python:
Download aho_corasick.py from here
Put it in the same directory as your main Python file and name it aho_corasick.py
Try the algorithm with the following code:
from aho_corasick import aho_corasick #(string, keywords)
print(aho_corasick(string, ["keyword1", "keyword2"]))
Note that the search is case-sensitive
The third-party regex module, which the Python docs recommend, supports this through named lists:
import regex
words = {'he', 'or', 'low'}
p = regex.compile(r"\L<name>", name=words)
m = p.findall('helloworld')
print(m)
output:
['he', 'low', 'or']
Some details on implementation: link
A compact way to find multiple strings in another list of strings is to use set.intersection. This executes much faster than list comprehension in large sets or lists.
>>> astring = ['abc','def','ghi','jkl','mno']
>>> bstring = ['def', 'jkl']
>>> a_set = set(astring) # convert list to set
>>> b_set = set(bstring)
>>> matches = a_set.intersection(b_set)
>>> matches
{'def', 'jkl'}
>>> list(matches) # if you want a list instead of a set
['def', 'jkl']
>>>
Just some more info on how to get all the list elements that are available in the string:
a = ['a', 'b', 'c']
s = "a123"
list(filter(lambda x: x in s, a))
It depends on the context.
If you want to check for a single literal (any single character such as a, e, w, etc.), in is enough:
original_word = "hackerearcth"
if 'h' in original_word:
    print("YES")
If you want to check whether any of the characters of original_word occurs in your input, use any:
if any(your_required in yourinput for your_required in original_word):
    print("yes")
If you want all of the characters of original_word to occur in your input, use all:
original_word = ['h', 'a', 'c', 'k', 'e', 'r', 'e', 'a', 'r', 't', 'h']
yourinput = str(input()).lower()
if all(requested_word in yourinput for requested_word in original_word):
    print("yes")
with open('test.txt', 'r') as flog:
    flogLines = flog.readlines()
strlist = ['SUCCESS', 'Done', 'SUCCESSFUL']
res = False
for line in flogLines:
    for fstr in strlist:
        if line.find(fstr) != -1:
            print('found')
            res = True
if res:
    print('res true')
else:
    print('res false')
I would use this kind of function for speed:
def check_string(string, substring_list):
    for substring in substring_list:
        if substring in string:
            return True
    return False
Yet another solution with sets, using set intersection as a one-liner:
subset = {"some", "words"}
text = "some words to be searched here"
if len(subset & set(text.split())) == len(subset):
    print("All values present in text")
if subset & set(text.split()):
    print("At least one value present in text")
If you want exact matches of words then consider word tokenizing the target string. I use the recommended word_tokenize from nltk:
from nltk.tokenize import word_tokenize
Here is the tokenized string from the accepted answer:
a_string = "A string is more than its parts!"
tokens = word_tokenize(a_string)
tokens
Out[46]: ['A', 'string', 'is', 'more', 'than', 'its', 'parts', '!']
The accepted answer gets modified as follows:
matches_1 = ["more", "wholesome", "milk"]
[x in tokens for x in matches_1]
Out[42]: [True, False, False]
As in the accepted answer, the word "more" is still matched. If "mo" becomes a match string, however, the accepted answer still finds a match. That is a behavior I did not want.
matches_2 = ["mo", "wholesome", "milk"]
[x in a_string for x in matches_2]
Out[43]: [True, False, False]
Using word tokenization, "mo" is no longer matched:
[x in tokens for x in matches_2]
Out[44]: [False, False, False]
That is the additional behavior that I wanted. This answer also responds to the duplicate question here.
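If installing nltk is not an option, a similar whole-word check can be sketched with the standard library by wrapping each search term in \b word boundaries. The helper name contains_word is hypothetical, and the approach assumes each term starts and ends with a word character (a letter, digit or underscore).

```python
import re

a_string = "A string is more than its parts!"

def contains_word(word, text):
    # \b matches at the edge of a word, so "mo" does not match inside "more"
    return re.search(r'\b' + re.escape(word) + r'\b', text) is not None

print([contains_word(w, a_string) for w in ["more", "wholesome", "milk"]])  # [True, False, False]
print([contains_word(w, a_string) for w in ["mo", "wholesome", "milk"]])    # [False, False, False]
```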
data = "firstName and favoriteFood"
mandatory_fields = ['firstName', 'lastName', 'age']
# for each
for field in mandatory_fields:
    if field not in data:
        print("Error, missing req field {0}".format(field))
# still fine, multiple if statements
if ('firstName' not in data or
        'lastName' not in data or
        'age' not in data):
    print("Error, missing a req field")
# not very readable, list comprehension
missing_fields = [x for x in mandatory_fields if x not in data]
if len(missing_fields) > 0:
    print("Error, missing fields {0}".format(", ".join(missing_fields)))
I have a string with the following structure.
string = "[abcd, abc, a, b, abc]"
I would like to convert that into an array. I keep using the split function in Python but I get spaces and the brackets on the start and the end of my new array. I tried working around it with some if statements but I keep missing letters in the end from some words.
Keep in mind that I don't know the length of the elements in the string. It could be 1, 2, 3 etc.
Assuming your elements never end or start with spaces or square brackets, you could just strip them out (the bracket can be stripped out before splitting):
arr = [ x.strip() for x in string.strip('[]').split(',') ]
It gives as expected
print (arr)
['abcd', 'abc', 'a', 'b', 'abc']
The nice part with strip is that it leaves all inner characters untouched. With:
string = "[ab cd, a[b]c, a, b, abc]"
You get: ['ab cd', 'a[b]c', 'a', 'b', 'abc']
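The strip-then-split approach can be wrapped in a small helper. The name parse_bracketed_list is made up for this sketch, and it relies on the same assumption as above: elements never start or end with spaces or square brackets and never contain commas.

```python
def parse_bracketed_list(text):
    # Remove surrounding whitespace, then the outer brackets
    inner = text.strip().strip('[]')
    if not inner.strip():
        return []  # "[]" or an empty string yields an empty list
    # Split on commas and trim the whitespace around each element
    return [item.strip() for item in inner.split(',')]

print(parse_bracketed_list("[abcd, abc, a, b, abc]"))     # ['abcd', 'abc', 'a', 'b', 'abc']
print(parse_bracketed_list("[ab cd, a[b]c, a, b, abc]"))  # ['ab cd', 'a[b]c', 'a', 'b', 'abc']
print(parse_bracketed_list("[]"))                         # []
```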
You can also do this
>>> s = string[1:-1].split(", ")
>>> s
['abcd', 'abc', 'a', 'b', 'abc']
If the values in this list are variables themselves (it looks like it, because they're not quoted), the easiest way to convert this string to the equivalent list is
string = eval(string)
Caution: If the values in your list should be strings, this will not work. Also, eval executes arbitrary code, so only use it on input you trust.
Another way to solve this problem:
string = "[abcd, abc, a, b, abc]"
result = string[1:-1].split(", ")
print(result)
Hope this helps
First remove [ and ] from your string, then split on commas, then remove spaces from resulting items (using strip).
If you do not want to use strip, it can be done in the following, rather clumsy, way:
arr = [e[1:] for e in string.split(',')]
arr[-1] = arr[-1].replace(']', '')
print(arr)
['abcd', 'abc', 'a', 'b', 'abc']
I would suggest the following:
[list_element.strip() for list_element in string.strip("[]").split(",")]
First remove brackets and then split it accordingly.
I'm trying to catch any letter that appears twice in a row in a string using RegEx (or maybe there's a better way?). For example, my string is:
ugknbfddgicrmopn
The output would be:
dd
However, I've tried something like:
re.findall('[a-z]{2}', 'ugknbfddgicrmopn')
but in this case, it returns:
['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn'] # the expected output is `['dd']`
I also have a way to get the expected output:
>>> l = []
>>> tmp = None
>>> for i in 'ugknbfddgicrmopn':
...     if tmp != i:
...         tmp = i
...         continue
...     l.append(i*2)
...
>>> l
['dd']
But that's too complex...
If it's 'abbbcppq', then only catch:
abbbcppq
 ^^  ^^
So the output is:
['bb', 'pp']
Then, if it's 'abbbbcppq', catch bb twice:
abbbbcppq
 ^^^^ ^^
So the output is:
['bb', 'bb', 'pp']
You need to use a capturing-group-based regex and define your regex as a raw string.
>>> re.search(r'([a-z])\1', 'ugknbfddgicrmopn').group()
'dd'
>>> [i+i for i in re.findall(r'([a-z])\1', 'abbbbcppq')]
['bb', 'bb', 'pp']
or
>>> [i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]
['bb', 'bb', 'pp']
Note that re.findall here returns a list of tuples, with the characters matched by the first group as the first element and those matched by the second group as the second element. In our case, the characters matched by the first group are enough, hence i[0].
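The way re.findall changes its return shape with the number of capturing groups can be seen side by side in this small sketch, which uses the same input string as above. The variable names are only illustrative.

```python
import re

s = 'abbbbcppq'

# No capturing group: findall returns the whole matches
no_group = re.findall(r'[a-z]{2}', s)
# One capturing group: findall returns what the group captured
one_group = re.findall(r'([a-z])\1', s)
# Two capturing groups: findall returns one tuple per match
two_groups = re.findall(r'(([a-z])\2)', s)

print(no_group)    # ['ab', 'bb', 'bc', 'pp']
print(one_group)   # ['b', 'b', 'p']
print(two_groups)  # [('bb', 'b'), ('bb', 'b'), ('pp', 'p')]
```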
As a Pythonic way, you can use the zip function within a list comprehension:
>>> s = 'abbbcppq'
>>>
>>> [i+j for i,j in zip(s,s[1:]) if i==j]
['bb', 'bb', 'pp']
If you are dealing with a large string, you can use iter() to convert the string to an iterator and itertools.tee() to create two independent iterators. Calling next() on the second iterator consumes its first item, after which you can pass both iterators to zip (in Python 2.X, use itertools.izip(), which returns an iterator).
>>> from itertools import tee
>>> first = iter(s)
>>> second, first = tee(first)
>>> next(second)
'a'
>>> [i+j for i,j in zip(first,second) if i==j]
['bb', 'bb', 'pp']
Benchmark with RegEx recipe:
# ZIP
~ $ python -m timeit --setup "s='abbbcppq'" "[i+j for i,j in zip(s,s[1:]) if i==j]"
1000000 loops, best of 3: 1.56 usec per loop
# REGEX
~ $ python -m timeit --setup "s='abbbcppq';import re" "[i[0] for i in re.findall(r'(([a-z])\2)', 'abbbbcppq')]"
100000 loops, best of 3: 3.21 usec per loop
After your last edit, as mentioned in the comments: if you want to match only one pair of b in strings like "abbbcppq", you can use finditer(), which returns an iterator of match objects, and extract the result with the group() method:
>>> import re
>>>
>>> s = "abbbcppq"
>>> [item.group(0) for item in re.finditer(r'([a-z])\1',s,re.I)]
['bb', 'pp']
Note that re.I is the IGNORECASE flag which makes the RegEx match the uppercase letters too.
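The same non-overlapping pairs can also be produced without regex. The sketch below uses itertools.groupby to measure each run of identical characters and emits one pair for every two characters in the run. The function name doubled_letters is made up, and unlike the [a-z] pattern it treats any repeated character as a match, not only letters.

```python
from itertools import groupby

def doubled_letters(s):
    pairs = []
    for letter, run in groupby(s):
        run_length = sum(1 for _ in run)
        # A run of n identical characters holds n // 2 non-overlapping pairs
        pairs.extend([letter * 2] * (run_length // 2))
    return pairs

print(doubled_letters('ugknbfddgicrmopn'))  # ['dd']
print(doubled_letters('abbbcppq'))          # ['bb', 'pp']
print(doubled_letters('abbbbcppq'))         # ['bb', 'bb', 'pp']
```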
Using a back reference, it is very easy:
import re
p = re.compile(r'([a-z])\1{1,}')
re.findall(p, "ugknbfddgicrmopn")
#output: ['d']
re.findall(p, "abbbcppq")
#output: ['b', 'p']
For more details, you can refer to a similar question in perl: Regular expression to match any character being repeated more than 10 times
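It is pretty easy without regular expressions. Note that this counts letters that occur exactly twice anywhere in the string, not only adjacent pairs:
In [3]: import collections
In [4]: [k for k, v in collections.Counter("abracadabra").items() if v==2]
Out[4]: ['b', 'r']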
Maybe you can use a generator to achieve this:
def adj(s):
    last_c = None
    for c in s:
        if c == last_c:
            yield c * 2
        last_c = c
s = 'ugknbfddgicrmopn'
v = list(adj(s))
print(v)
# output: ['dd']
"or maybe there's some better ways"
Since regex is often misunderstood by the next developer to encounter your code (who may even be you), and since simpler != shorter, how about the following pseudo-code:
function findMultipleLetters(inputString) {
    foreach (letter in inputString) {
        dictionaryOfLettersOccurrance[letter]++;
        if (dictionaryOfLettersOccurrance[letter] == 2) {
            multipleLetters.add(letter);
        }
    }
    return multipleLetters;
}
multipleLetters = findMultipleLetters("ugknbfddgicrmopn");
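A direct Python rendering of that pseudo-code might look like the sketch below. The snake_case names are my own translation. Note that, as written, the pseudo-code counts every occurrence of a letter, so it also reports letters that repeat without being adjacent.

```python
def find_multiple_letters(input_string):
    counts = {}            # letter -> number of occurrences seen so far
    multiple_letters = []  # letters in the order they reach a count of 2
    for letter in input_string:
        counts[letter] = counts.get(letter, 0) + 1
        if counts[letter] == 2:
            multiple_letters.append(letter)
    return multiple_letters

print(find_multiple_letters("ugknbfddgicrmopn"))  # ['d', 'g', 'n']
```

Here 'g' and 'n' show up because each occurs twice in the string, even though only 'dd' is an adjacent pair.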
A1 = "abcdededdssffffccfxx"
print(A1[1])
for i in range(len(A1)-1):
    if A1[i+1] == A1[i]:
        if not A1[i+1] == A1[i-1]:
            print(A1[i] * 2)
>>> l = ['ug', 'kn', 'bf', 'dd', 'gi', 'cr', 'mo', 'pn']
>>> import re
>>> newList = [item for item in l if re.search(r"([a-z]{1})\1", item)]
>>> newList
['dd']
How do you convert a string into a list?
Say the string is like text = "a,b,c". After the conversion, text == ['a', 'b', 'c'] and hopefully text[0] == 'a', text[1] == 'b'?
Like this:
>>> text = 'a,b,c'
>>> text = text.split(',')
>>> text
['a', 'b', 'c']
Just to add on to the existing answers: hopefully, you'll encounter something more like this in the future:
>>> word = 'abc'
>>> L = list(word)
>>> L
['a', 'b', 'c']
>>> ''.join(L)
'abc'
But for what you're dealing with right now, go with @Cameron's answer.
>>> word = 'a,b,c'
>>> L = word.split(',')
>>> L
['a', 'b', 'c']
>>> ','.join(L)
'a,b,c'
The following Python code will turn your string into a list of strings:
import ast
teststr = "['aaa','bbb','ccc']"
testarray = ast.literal_eval(teststr)
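ast.literal_eval raises an exception when the string is not a valid Python literal, so it can help to wrap the call. The helper below is a sketch with a made-up name, parse_list_literal, and it returns None for anything that does not evaluate to a list.

```python
import ast

def parse_list_literal(text):
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None  # not a valid Python literal
    # Accept only lists; other literals (numbers, tuples, ...) are rejected
    return value if isinstance(value, list) else None

print(parse_list_literal("['aaa','bbb','ccc']"))  # ['aaa', 'bbb', 'ccc']
print(parse_list_literal("[abcd, abc]"))          # None (bare names are not literals)
print(parse_list_literal("'just a string'"))      # None (a literal, but not a list)
```

Unlike eval, literal_eval only accepts literals such as strings, numbers, lists, tuples and dicts, so it does not run arbitrary code from the input.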
I don't think you need to
In Python, you seldom need to convert a string to a list, because strings and lists are very similar.
Changing the type
If you really have a string which should be a character array, do this:
In [1]: x = "foobar"
In [2]: list(x)
Out[2]: ['f', 'o', 'o', 'b', 'a', 'r']
Not changing the type
Note that strings are very much like lists in Python.
Strings have accessors, like lists
In [3]: x[0]
Out[3]: 'f'
Strings are iterable, like lists
In [4]: for i in range(len(x)):
   ...:     print(x[i])
   ...:
f
o
o
b
a
r
TLDR
Strings are lists. Almost.
In case you want to split by spaces, you can just use .split():
a = 'mary had a little lamb'
z = a.split()
print(z)
Output:
['mary', 'had', 'a', 'little', 'lamb']
If you actually want arrays (Python 2 only: the 'c' typecode was removed in Python 3):
>>> from array import array
>>> text = "a,b,c"
>>> text = text.replace(',', '')
>>> myarray = array('c', text)
>>> myarray
array('c', 'abc')
>>> myarray[0]
'a'
>>> myarray[1]
'b'
If you do not need arrays and only want to look up your characters by index, remember that a string is a sequence, just like a list, except that it is immutable:
>>> text = "a,b,c"
>>> text = text.replace(',', '')
>>> text[0]
'a'
m = '[[1,2,3],[4,5,6],[7,8,9]]'
m= eval(m.split()[0])
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
All the answers are good; another way of doing it is a list comprehension, as in the solution below.
u = "UUUDDD"
lst = [x for x in u]
For a comma-separated list, do the following:
u = "U,U,U,D,D,D"
lst = [x for x in u.split(',')]
I usually use:
l = [ word.strip() for word in text.split(',') ]
strip removes the whitespace around each word.
To convert a string of the form a = "[[1, 3], [2, -6]]", I wrote this not-yet-optimized code:
matrixAr = []
mystring = "[[1, 3], [2, -4], [19, -15]]"
b = mystring.replace("[[", "").replace("]]", "")  # remove the leading [[ and trailing ]]
for line in b.split('], ['):
    row = list(map(int, line.split(',')))  # map converts each number (some with a leading space) from string to int
    matrixAr.append(row)
print(matrixAr)
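For well-formed input like this, ast.literal_eval from the standard library may be a simpler alternative, since a nested list of integers is already a valid Python literal. A minimal sketch with the same sample string:

```python
import ast

mystring = "[[1, 3], [2, -4], [19, -15]]"

# literal_eval parses nested list literals, including negative numbers
matrix = ast.literal_eval(mystring)

print(matrix)        # [[1, 3], [2, -4], [19, -15]]
print(matrix[1][1])  # -4
```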
split() is your friend here. I will cover a few aspects of split() that are not covered by other answers.
If no arguments are passed to split(), it would split the string based on whitespace characters (space, tab, and newline). Leading and trailing whitespace is ignored. Also, consecutive whitespaces are treated as a single delimiter.
Example:
>>> " \t\t\none two three\t\t\tfour\nfive\n\n".split()
['one', 'two', 'three', 'four', 'five']
When a single character delimiter is passed, split() behaves quite differently from its default behavior. In this case, leading/trailing delimiters are not ignored, repeating delimiters are not "coalesced" into one either.
Example:
>>> ",,one,two,three,,\n four\tfive".split(',')
['', '', 'one', 'two', 'three', '', '\n four\tfive']
So, if stripping of whitespaces is desired while splitting a string based on a non-whitespace delimiter, use this construct:
words = [item.strip() for item in string.split(',')]
When a multi-character string is passed as the delimiter, it is taken as a single delimiter and not as a character class or a set of delimiters.
Example:
>>> "one,two,three,,four".split(',,')
['one,two,three', 'four']
To coalesce multiple delimiters into one, you would need to use the re.split(regex, string) approach. See the related posts below.
Related
string.split() - Python documentation
re.split() - Python documentation
Split string based on regex
Split string based on a regular expression
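As a small illustration of coalescing delimiters with re.split, here is a sketch. The sample strings are made up, apart from the first, which reuses the ',,' example above.

```python
import re

# One or more commas count as a single delimiter
coalesced = re.split(r',+', "one,two,three,,four")
# Any run of commas and/or whitespace counts as a single delimiter
mixed = re.split(r'[,\s]+', "one, two,,three\tfour")
# Leading and trailing delimiters still produce empty strings
edges = re.split(r',+', ",,one,two,,")

print(coalesced)  # ['one', 'two', 'three', 'four']
print(mixed)      # ['one', 'two', 'three', 'four']
print(edges)      # ['', 'one', 'two', '']
```

The empty strings at the edges can be avoided by calling .strip(',') on the input before splitting.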
# to strip `,` and `.` from a string ->
>>> 'a,b,c.'.translate(str.maketrans('', '', ',.'))
'abc'
You should use the built-in translate method for strings. (In Python 2, the call was 'a,b,c.'.translate(None, ',.').)
Type help('abc'.translate) at the Python shell for more info.
Using functional Python:
text = list(filter(lambda x: x != ',', map(str, text)))
Example 1
>>> email= "myemailid#gmail.com"
>>> email.split()
#OUTPUT
["myemailid#gmail.com"]
Example 2
>>> email= "myemailid#gmail.com, someonsemailid#gmail.com"
>>> email.split(',')
#OUTPUT
["myemailid#gmail.com", "someonsemailid#gmail.com"]