How can I check if any of the strings in an array exists in another string?
For example:
a = ['a', 'b', 'c']
s = "a123"
if a in s:
    print("some of the strings found in s")
else:
    print("no strings found in s")
How can I replace the if a in s: line to get the appropriate result?
You can use any:
a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]
if any(x in a_string for x in matches):
    print("at least one of the strings was found")
Similarly to check if all the strings from the list are found, use all instead of any.
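Applied to the question's own data, a minimal sketch looks like this. Passing a generator expression (rather than a list) lets any() stop at the first hit; the message for the all() case is just illustrative.

```python
a = ['a', 'b', 'c']
s = "a123"

# any() stops at the first element for which the test is True
if any(x in s for x in a):
    print("some of the strings found in s")
else:
    print("no strings found in s")

# all() is True only if every element of a occurs in s
print(all(x in s for x in a))  # False: 'b' and 'c' are not in "a123"
```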
any() is by far the best approach if all you want is True or False, but if you want to know specifically which string or strings match, you can use a few other approaches.
If you want the first match (with False as a default):
match = next((x for x in a if x in s), False)
If you want to get all matches (including duplicates):
matches = [x for x in a if x in s]
If you want to get all non-duplicate matches (disregarding order):
matches = {x for x in a if x in s}
If you want to get all non-duplicate matches in the right order:
matches = []
for x in a:
    if x in s and x not in matches:
        matches.append(x)
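For the ordered, duplicate-free case, another option is to collect the matches as dictionary keys, which avoids rescanning the matches list for every element. This is a sketch with made-up sample data; it relies on dicts preserving insertion order, which Python guarantees since 3.7.

```python
a = ['a', 'b', 'a', 'x']
s = "a1b2"

# dict keys are unique and keep insertion order,
# so this drops duplicates while preserving the order of a
ordered_unique = list(dict.fromkeys(x for x in a if x in s))
print(ordered_unique)  # ['a', 'b']
```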
You should be careful if the strings in a or s get longer. The straightforward solutions above test each string in a separately, so in the worst case they can take on the order of S*A steps, where S is the length of s and A is the sum of the lengths of all strings in a. For a faster solution, look at the Aho-Corasick algorithm for string matching, which runs in linear time: O(S + A), plus the number of matches it reports.
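Just to add some diversity, with regex:
import re
if re.search(r'a|b|c', s, re.IGNORECASE):
    print('possible matches thanks to regex')
else:
    print('no matches')
or, if your list is long, build the pattern from it, escaping each string so regex metacharacters are matched literally: re.search('|'.join(map(re.escape, a)), s, re.IGNORECASE). Note that re.IGNORECASE makes the match case-insensitive, unlike the in operator.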
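Escaping matters as soon as one of the strings contains a regex metacharacter. In this small sketch (the sample strings are made up), the unescaped '.' matches any character, so '1.5' wrongly matches "125":

```python
import re

a = ['1.5', 'x']
s = "A125"

unescaped = '|'.join(a)                # here '.' means "any character"
escaped = '|'.join(map(re.escape, a))  # here '.' only matches a literal dot

print(bool(re.search(unescaped, s, re.IGNORECASE)))  # True: "125" matches 1.5
print(bool(re.search(escaped, s, re.IGNORECASE)))    # False
```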
A surprisingly fast approach is to use set:
a = ['a', 'b', 'c']
s = "a123"
if set(a) & set(s):
    print("some of the strings found in s")
else:
    print("no strings found in s")
This only works if every element of a is a single character; if a contains any multiple-character strings, use any as shown above. In the single-character case, it's simpler to specify a as a string: a = 'abc'.
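To see why multiple-character strings break the set approach, here is a small sketch with made-up data: set(s) contains only single characters, so 'ab' can never be in the intersection even though it is a substring of s.

```python
a = ['ab', 'c']
s = "xab1"

print(set(a) & set(s))         # set(): 'ab' is never a single character of s
print(any(x in s for x in a))  # True: 'ab' is a substring of s
```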
You need to iterate over the elements of a.
a = ['a', 'b', 'c']
s = "a123"
found_a_string = False
for item in a:
    if item in s:
        found_a_string = True
        break  # no need to keep looking after the first match
if found_a_string:
    print("found a match")
else:
    print("no match found")
a = ['a', 'b', 'c']
s = "a123"
a_match = [True for match in a if match in s]
if True in a_match:
    print("some of the strings found in s")
else:
    print("no strings found in s")
jbernadas already mentioned the Aho-Corasick algorithm as a way to reduce the complexity.
Here is one way to use it in Python:
Download aho_corasick.py from here
Put it in the same directory as your main Python file and name it aho_corasick.py
Try the algorithm with the following code:
from aho_corasick import aho_corasick #(string, keywords)
print(aho_corasick(string, ["keyword1", "keyword2"]))
Note that the search is case-sensitive.
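The downloadable aho_corasick.py is not reproduced here. As a rough illustration of what such a module does, here is a minimal pure-Python sketch of the algorithm. The names build_automaton and find_all are hypothetical and are not the API of that file. It builds a trie of the keywords, adds failure links with a breadth-first pass, and then scans the text once.

```python
from collections import deque


def build_automaton(keywords):
    """Build the goto/fail/output tables of an Aho-Corasick automaton."""
    goto = [{}]    # goto[state][char] -> next state (the keyword trie)
    fail = [0]     # failure link for each state
    output = [[]]  # keywords recognised when reaching each state

    # 1. Insert every keyword into the trie.
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)

    # 2. Breadth-first pass to compute failure links
    #    (depth-1 states keep their failure link to the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Also report keywords that end at the failure state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, keywords):
    """Return (start_index, keyword) for every occurrence, in one pass over text."""
    goto, fail, output = build_automaton(keywords)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            matches.append((i - len(word) + 1, word))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Like the module above, this sketch is case-sensitive; lower-casing both the text and the keywords first is one way around that.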
The third-party regex module, which the Python docs for re point to, supports this with named lists:
import regex
words = {'he', 'or', 'low'}
p = regex.compile(r"\L<name>", name=words)
m = p.findall('helloworld')
print(m)
output:
['he', 'low', 'or']
Some details on implementation: link
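If you would rather stick to the standard re module, which does not support \L<name> named lists, a similar effect can be approximated with an alternation built from the set. This is a sketch, not an exact reimplementation of the regex module's matching: it puts longer words first so a longer word wins at the same position, and escapes each word.

```python
import re

words = {'he', 'or', 'low'}

# Longest alternatives first; re.escape keeps any metacharacters literal.
pattern = re.compile('|'.join(map(re.escape, sorted(words, key=len, reverse=True))))
print(pattern.findall('helloworld'))  # ['he', 'low', 'or']
```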
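A compact way to find which strings of one list also appear in another list of strings is to use set.intersection. Note that this compares whole strings, not substrings. Because set lookups take constant time on average, this is much faster than a list comprehension on large lists.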
>>> astring = ['abc','def','ghi','jkl','mno']
>>> bstring = ['def', 'jkl']
>>> a_set = set(astring) # convert list to set
>>> b_set = set(bstring)
>>> matches = a_set.intersection(b_set)
>>> matches
{'def', 'jkl'}
>>> list(matches) # if you want a list instead of a set
['def', 'jkl']
>>>
Just some more info on how to get all list elements that are contained in a string:
a = ['a', 'b', 'c']
s = "a123"
list(filter(lambda x: x in s, a))  # ['a']
It depends on the context.
If you want to check for a single literal (any single character such as a, e, w, etc.), in is enough:
original_word = "hackerearth"
if 'h' in original_word:
    print("YES")
If you want to check whether any of the characters of original_word occur in your input, use any:
if any(your_required in yourinput for your_required in original_word):
    print("YES")
If you want all of the characters of original_word to occur in your input, use all:
original_word = ['h', 'a', 'c', 'k', 'e', 'r', 'e', 'a', 'r', 't', 'h']
yourinput = input().lower()
if all(requested_word in yourinput for requested_word in original_word):
    print("yes")
strlist = ['SUCCESS', 'Done', 'SUCCESSFUL']
res = False
# "with" closes the file automatically; iterating over it reads one line at a time
with open('test.txt', 'r') as flog:
    for line in flog:
        for fstr in strlist:
            if fstr in line:
                print('found')
                res = True
if res:
    print('res true')
else:
    print('res false')
I would use this kind of function for speed; it returns as soon as it finds a match instead of checking the remaining substrings:
def check_string(string, substring_list):
    for substring in substring_list:
        if substring in string:
            return True
    return False
Yet another solution with set, using set intersection, for a one-liner. It matches whole words, since the text is split on whitespace:
subset = {"some", "words"}
text = "some words to be searched here"
if len(subset & set(text.split())) == len(subset):
    print("All values present in text")
if subset & set(text.split()):
    print("At least one value present in text")
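The same two checks can also be written with the built-in set comparisons: <= is the subset test and isdisjoint() reports whether the sets share nothing. A short sketch reusing the example's data:

```python
subset = {"some", "words"}
text = "some words to be searched here"
text_words = set(text.split())

# Every element of subset appears in the text
print(subset <= text_words)               # True
# At least one element of subset appears in the text
print(not subset.isdisjoint(text_words))  # True
```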
If you want exact matches of words, then consider word-tokenizing the target string. I use the recommended word_tokenize from nltk:
from nltk.tokenize import word_tokenize
Here is the tokenized string from the accepted answer:
a_string = "A string is more than its parts!"
tokens = word_tokenize(a_string)
tokens
Out[46]: ['A', 'string', 'is', 'more', 'than', 'its', 'parts', '!']
The accepted answer gets modified as follows:
matches_1 = ["more", "wholesome", "milk"]
[x in tokens for x in matches_1]
Out[42]: [True, False, False]
As in the accepted answer, the word "more" is still matched. If "mo" becomes a match string, however, the accepted answer still finds a match. That is a behavior I did not want.
matches_2 = ["mo", "wholesome", "milk"]
[x in a_string for x in matches_2]
Out[43]: [True, False, False]
Using word tokenization, "mo" is no longer matched:
[x in tokens for x in matches_2]
Out[44]: [False, False, False]
That is the additional behavior that I wanted. This answer also responds to the duplicate question here.
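If installing nltk (and its tokenizer data) is not an option, a rough standard-library approximation uses regex word boundaries. The helper name whole_word_in is hypothetical, and \b follows re's definition of word characters, which can differ from nltk's tokenizer on punctuation-heavy text.

```python
import re

a_string = "A string is more than its parts!"
matches_2 = ["mo", "wholesome", "milk"]

def whole_word_in(word, text):
    # \b anchors the match at word boundaries, so "mo" does not match inside "more"
    return re.search(r'\b' + re.escape(word) + r'\b', text) is not None

print([whole_word_in(x, a_string) for x in matches_2])  # [False, False, False]
print(whole_word_in("more", a_string))                  # True
```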
data = "firstName and favoriteFood"
mandatory_fields = ['firstName', 'lastName', 'age']
# for each
for field in mandatory_fields:
    if field not in data:
        print("Error, missing req field {0}".format(field))
# still fine, multiple if statements
if ('firstName' not in data or
        'lastName' not in data or
        'age' not in data):
    print("Error, missing a req field")
# not very readable, list comprehension
missing_fields = [x for x in mandatory_fields if x not in data]
if missing_fields:
    print("Error, missing fields {0}".format(", ".join(missing_fields)))
I need to create a function where, if I give an input like 999933, it should give "ze" as output. It basically works like a numeric mobile phone keypad. How can I do this? I have searched the internet for an example, but everything I found does the opposite: you give text as input and get the numbers back. I couldn't work out the exact flow of how to achieve this. Please let me know how I can do it.
def number_to_text(val):
    pass
Create a mapping from pad number to letters.
Use itertools.groupby to iterate over runs of consecutive presses of the same key and work out which letter each run produces.
import itertools
letters_by_pad_number = {"3": "def", "9": "wxyz"}
def number_to_text(val):
    message = ""
    # change val to string, so we can iterate over digits
    digits = str(val)
    # group consecutive digits: itertools.groupby("2244") yields ('2', <iterator over '2', '2'>), ('4', <iterator over '4', '4'>)
    for digit, group in itertools.groupby(digits):
        # get the pad letters, i.e. "def" for "3" pad
        letters = letters_by_pad_number[digit]
        # get how many consecutive times it was pressed
        presses_number = len(list(group))
        # calculate the index of the letter, cycling back to the start
        # if the key was pressed more times than it has letters
        letter_index = (presses_number - 1) % len(letters)
        message += letters[letter_index]
    return message
print(number_to_text(999933))
# ze
And a hardcore one-liner, just for fun:
import itertools
letters = {"3": "def", "9": "wxyz"}
def number_to_text(val):
    return "".join(letters[d][(len(list(g)) - 1) % len(letters[d])] for d, g in itertools.groupby(str(val)))
print(number_to_text(999933))
# ze
The other answers are correct, but I tried to write a less terse, more real-world explanation (including doctests) of how the previous results work:
dialpad_text.py:
# Import the groupby function from itertools;
# it takes any iterable and yields (key, group) pairs for runs of equal consecutive items
from itertools import groupby
# Use a dictionary as a lookup table
dialpad = {
    '2': ['a', 'b', 'c'],
    '3': ['d', 'e', 'f'],
    '4': ['g', 'h', 'i'],
    '5': ['j', 'k', 'l'],
    '6': ['m', 'n', 'o'],
    '7': ['p', 'q', 'r', 's'],
    '8': ['t', 'u', 'v'],
    '9': ['w', 'x', 'y', 'z'],
}
def dialpad_text(numbers):
    """
    Takes in either a number or a string of numbers and creates
    a string of characters just like a Nokia without T9 support

    Default usage:
    >>> dialpad_text(2555222)
    'alc'

    Handle string inputs:
    >>> dialpad_text('2555222')
    'alc'

    Handle wrapped groups:
    >>> dialpad_text(2222555222)
    'alc'

    Throw an error if an invalid input is given
    >>> dialpad_text('1BROKEN')
    Traceback (most recent call last):
    ...
    ValueError: Unrecognized input "1"
    """
    # Convert to string if given a number
    if isinstance(numbers, int):
        numbers = str(numbers)
    # Create our string output for the dialed numbers
    output = ''
    # Group each run of repeated digits in the order
    # they appear and iterate over the groups.
    # (e.g. '222556' gives ('2', ['2', '2', '2']), ('5', ['5', '5']), ('6', ['6'])
    # once each group is turned into a list)
    # The length of each group gives us our index into
    # the list of letters for that key!
    for number, letters in groupby(numbers):
        # Convert the groupby group iterator into a list and
        # get the offset into our list at the specified key
        offset = len(list(letters)) - 1
        # Check if the number is a valid dialpad key (e.g. 1 isn't)
        if number in dialpad:
            # Add the character to our output string and wrap
            # if the offset is greater than the length of the character list
            output += dialpad[number][offset % len(dialpad[number])]
        else:
            raise ValueError(f'Unrecognized input "{number}"')
    return output
Hope this helps you understand what's going on at a lower level! Also, if you don't trust my code, just save it to a file and run python -m doctest dialpad_text.py; it runs the doctests in the module, and they should pass.
(Note: without the -v flag it won't output anything; silence is golden!)
You need to:
group the same digits together with the regex (\d)\1*, which captures a digit followed by any number of repeats of that same digit
use the digit of each group to select the key's letters
use the length of the group to select the letter
import re
phone_letters = ["", "", "abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"]
def number_to_text(val):
    groups = [match.group() for match in re.finditer(r'(\d)\1*', val)]
    result = ""
    for group in groups:
        keynumber = int(group[0])
        count = len(group)
        result += phone_letters[keynumber][count - 1]
    return result
print(number_to_text("999933"))  # ze
Or, more compactly, with a generator expression:
def number_to_text(val):
    groups = [match.group() for match in re.finditer(r'(\d)\1*', val)]
    return "".join(phone_letters[int(group[0])][len(group) - 1] for group in groups)
A slightly modified version of RaFalS's answer, without using itertools (note that it counts every occurrence of a digit rather than consecutive presses, so it only handles digits that appear in a single run, as in 999933):
from collections import defaultdict
letters_by_pad_number = {"3": "def", "9": "wxyz"}
val = 999933
message = ""
digits = str(val)
num_group = defaultdict(int)
for digit in digits:
    num_group[digit] += 1
for num in num_group.keys():
    message += letters_by_pad_number[num][num_group[num]-1]
print(message)
# ze
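Counting totals per digit merges runs that are not adjacent: 9339 would come out as 'xe' rather than 'wew'. A sketch that still avoids itertools but counts consecutive runs by hand (and wraps around like the groupby answer) could look like this:

```python
letters_by_pad_number = {"3": "def", "9": "wxyz"}

def number_to_text(val):
    message = ""
    digits = str(val)
    i = 0
    while i < len(digits):
        digit = digits[i]
        # Count how many times this digit repeats consecutively
        run = 1
        while i + run < len(digits) and digits[i + run] == digit:
            run += 1
        letters = letters_by_pad_number[digit]
        # Wrap around if the key was pressed more times than it has letters
        message += letters[(run - 1) % len(letters)]
        i += run
    return message

print(number_to_text(999933))  # ze
print(number_to_text(9339))    # wew
```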
I wanted to know how I could split a text into the different letters it contains in Python, without saving the same letter twice. So the output for a text like "hello" would be {'h', 'e', 'l', 'o'}, counting the letter l only once.
As the comments say, put your word in a set to remove duplicates:
>>> set("hello")
{'h', 'e', 'l', 'o'}
Iterate through it (sets don't have a defined order, so the order you see may differ; don't count on it):
>>> h = set("hello")
>>> for c in h:
...     print(c)
...
h
e
l
o
Test if a character is in it:
>>> 'e' in h
True
>>> 'x' in h
False
There are a few ways to do this...
word = set('hello')
Or the following, which also keeps the letters in the order they first appear:
letters = []
for letter in "hello":
    if letter not in letters:
        letters.append(letter)
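A shorter way to get the same order-preserving result is to use the letters as dictionary keys. This is a sketch relying on dicts keeping insertion order, which Python guarantees since 3.7:

```python
word = "hello"

# Dict keys are unique, so repeated letters collapse into one entry,
# and they stay in the order they were first seen.
unique_in_order = list(dict.fromkeys(word))
print(unique_in_order)  # ['h', 'e', 'l', 'o']
```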
I am passing a list of strings to a function. I want this function to iterate through each of those strings, performing one of two actions depending on the letters found, and then return the changed strings as separate items in a new list.
Specifically, when the program iterates through a string and finds a consonant, it should write that consonant, in the order it was found, into the new string. If it finds a vowel, it should write 'xy' before the vowel, then the vowel itself.
As an example:
if the user inputs "how now brown cow", the output of the function should be "hxyow nxyow brxyown cxyow". I've tried nested for loops, nested while loops, and variations in between. What's the best way to accomplish this? Cheers!
For every character in the old string, check whether it is a vowel or a consonant and build the new string accordingly.
old = "how now brown cow"
new = ""
for character in old:
    if character in ('a', 'e', 'i', 'o', 'u'):
        new = new + "xy" + character
    else:
        new = new + character
print(new)
I gave you the idea; now I leave it as an exercise to make it work for a list of strings. Also make appropriate changes if you are using Python 2.
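One way to do that exercise is a sketch that wraps the same loop in a function over a list. The names add_xy and VOWELS are hypothetical, and, like the loop above, it only treats lowercase a, e, i, o, u as vowels.

```python
VOWELS = "aeiou"

def add_xy(strings):
    """Return a new list where every vowel in each string is prefixed with 'xy'."""
    result = []
    for old in strings:
        new = ""
        for character in old:
            if character in VOWELS:
                new += "xy" + character
            else:
                new += character
        result.append(new)
    return result

print(add_xy(["how now brown cow", "apple"]))
# ['hxyow nxyow brxyown cxyow', 'xyapplxye']
```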
>>> def xy(st):
...     my_list, st1 = [], ''
...     for x in st:
...         if x in 'aeiou':
...             st1 += 'xy' + x
...         elif x in 'cbdgfhkjmlnqpsrtwvyxz':
...             my_list.append(x)
...             st1 += x
...         else:
...             st1 += x
...     return my_list, st1
...
>>> my_string = "how now brown cow"
>>> xy(my_string)
(['h', 'w', 'n', 'w', 'b', 'r', 'w', 'n', 'c', 'w'], 'hxyow nxyow brxyown cxyow')
The function above iterates through the string: when it finds a vowel, it appends 'xy' plus the vowel to the new string; when it finds a consonant, it appends it to both the list and the string; any other character (such as a space) is copied to the string unchanged. Finally, it returns the list and the string.
A simple way to do this using a list comprehension:
old_str = "how now brown cow"
new_str = ''.join(["xy" + c if c in "aeiou" else c for c in old_str])
print(new_str)
But if you're processing a lot of data, it may be more efficient to use a set of vowels, e.g.
vowels = set("aeiou")
old_str = "how now brown cow"
new_str = ''.join(["xy" + c if c in vowels else c for c in old_str])
print(new_str)
Note that these programs simply copy all characters that aren't vowels, i.e., spaces, numbers and punctuation get treated as if they were consonants.
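These versions also skip uppercase vowels. If that matters, a regex substitution is one alternative; this sketch uses a made-up input string and re.IGNORECASE so that 'E' counts as a vowel too:

```python
import re

old_str = "How now Ed"
# Insert "xy" before every vowel; \g<0> in the replacement is the matched vowel itself.
new_str = re.sub(r'[aeiou]', r'xy\g<0>', old_str, flags=re.IGNORECASE)
print(new_str)  # Hxyow nxyow xyEd
```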