How to check for multi strings in a line [duplicate] - python

How can I check if any of the strings in an array exists in another string?
For example:
a = ['a', 'b', 'c']
s = "a123"
if a in s:
    print("some of the strings found in s")
else:
    print("no strings found in s")
How can I replace the if a in s: line to get the appropriate result?

You can use any:
a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]
if any(x in a_string for x in matches):
    print("found a match")
Similarly, to check whether all the strings from the list are found, use all instead of any.
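As a minimal sketch of the any/all pattern described above, the two checks can be wrapped in small helpers. The names contains_any and contains_all are illustrative only and do not come from the original answer.

```python
def contains_any(text, needles):
    """Return True if at least one needle is a substring of text."""
    return any(needle in text for needle in needles)


def contains_all(text, needles):
    """Return True only if every needle is a substring of text."""
    return all(needle in text for needle in needles)


a_string = "A string is more than its parts!"
print(contains_any(a_string, ["more", "wholesome", "milk"]))  # True
print(contains_all(a_string, ["more", "wholesome", "milk"]))  # False
print(contains_all(a_string, ["more", "parts"]))              # True
```

Passing a generator expression rather than a list lets any stop at the first hit and all stop at the first miss.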

any() is by far the best approach if all you want is True or False, but if you want to know specifically which strings match, there are a few options. (Here s is the string being searched; avoid naming it str, which shadows the built-in type.)
If you want the first match (with False as a default):
match = next((x for x in a if x in s), False)
If you want all matches (including duplicates):
matches = [x for x in a if x in s]
If you want all non-duplicate matches (disregarding order):
matches = {x for x in a if x in s}
If you want all non-duplicate matches in their original order:
matches = []
for x in a:
    if x in s and x not in matches:
        matches.append(x)

Be careful if the strings in a or s get longer. The straightforward solutions take roughly O(S*A) time in the worst case, where S is the length of s and A is the sum of the lengths of all strings in a. For a faster solution, look at the Aho-Corasick algorithm for string matching, which runs in linear time O(S+A) plus the number of matches reported.
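For readers who want to see the idea, here is a compact sketch of Aho-Corasick in pure Python. It is an illustration written for this page, not a tuned implementation; the names build_automaton and find_keywords are hypothetical, and a maintained library is likely a better choice for production use.

```python
from collections import deque


def build_automaton(keywords):
    """Build the trie (goto), failure links (fail) and outputs (out)."""
    goto, fail, out = [{}], [0], [[]]
    for word in keywords:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)
    # Breadth-first pass: depth-1 nodes keep fail == 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure target.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_keywords(text, keywords):
    """Return the set of keywords that occur somewhere in text."""
    goto, fail, out = build_automaton(keywords)
    node, found = 0, set()
    for ch in text:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        found.update(out[node])
    return found


print(sorted(find_keywords("ushers", ["he", "she", "his", "hers"])))
# ['he', 'hers', 'she']
```

The text is scanned once; failure links let the scan fall back to the longest suffix that is still a prefix of some keyword instead of restarting.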

Just to add some diversity with regex:
import re
if re.search(r'a|b|c', s, re.IGNORECASE):
    print('possible matches thanks to regex')
else:
    print('no matches')
or, if your list is too long to type out, build the pattern from it (escaping each item so regex metacharacters are matched literally): re.search('|'.join(map(re.escape, a)), s, re.IGNORECASE)
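A small sketch of the joined-pattern idea, assuming the search terms are plain strings rather than regex fragments. The helper name build_pattern is made up for this example; re.escape keeps characters such as "." or "*" from being read as metacharacters.

```python
import re


def build_pattern(needles, ignore_case=True):
    """Compile one alternation that matches any of the literal needles."""
    escaped = (re.escape(needle) for needle in needles)
    flags = re.IGNORECASE if ignore_case else 0
    # Note: an empty needles list yields an empty pattern, which matches everything.
    return re.compile("|".join(escaped), flags)


pattern = build_pattern(["a", "b", "c.d"])
print(bool(pattern.search("A123")))  # True, case is ignored
print(bool(pattern.search("cxd")))   # False, the dot is matched literally
```

Compiling once and reusing the pattern pays off when many strings are checked against the same list.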

A surprisingly fast approach is to use set:
a = ['a', 'b', 'c']
s = "a123"
if set(a) & set(s):
    print("some of the strings found in s")
else:
    print("no strings found in s")
This works only if a contains single-character values; for multi-character values, use any as shown above. When every value is a single character, it's simpler to specify a as a string: a = 'abc'.

You need to iterate over the elements of a.
a = ['a', 'b', 'c']
s = "a123"
found_a_string = False
for item in a:
    if item in s:
        found_a_string = True
if found_a_string:
    print("found a match")
else:
    print("no match found")

a = ['a', 'b', 'c']
s = "a123"
a_match = [True for match in a if match in s]
if True in a_match:
    print("some of the strings found in s")
else:
    print("no strings found in s")

jbernadas already mentioned the Aho-Corasick algorithm as a way to reduce complexity.
Here is one way to use it in Python:
Download aho_corasick.py from here
Put it in the same directory as your main Python file and name it aho_corasick.py
Try the algorithm with the following code:
from aho_corasick import aho_corasick  # called as aho_corasick(string, keywords)
print(aho_corasick(string, ["keyword1", "keyword2"]))
Note that the search is case-sensitive.

The third-party regex module, mentioned in the Python docs, supports this with named lists:
import regex
words = {'he', 'or', 'low'}
p = regex.compile(r"\L<name>", name=words)
m = p.findall('helloworld')
print(m)
output:
['he', 'low', 'or']
Some details on implementation: link

A compact way to find multiple strings in another list of strings is to use set.intersection. This executes much faster than list comprehension in large sets or lists.
>>> astring = ['abc','def','ghi','jkl','mno']
>>> bstring = ['def', 'jkl']
>>> a_set = set(astring) # convert list to set
>>> b_set = set(bstring)
>>> matches = a_set.intersection(b_set)
>>> matches
{'def', 'jkl'}
>>> list(matches) # if you want a list instead of a set
['def', 'jkl']

Just some more info on how to get all the list elements that are present in the string:
a = ['a', 'b', 'c']
s = "a123"
list(filter(lambda x: x in s, a))

It depends on the context.
If you want to check for a single literal (any single character such as a, e, w, etc.), in is enough:
original_word = "hackerearth"
if 'h' in original_word:
    print("YES")
If you want to check whether any character of original_word appears in the input, use any:
if any(your_required in yourinput for your_required in original_word):
    print("yes")
If you want every character of original_word to appear in the input, use all:
original_word = ['h', 'a', 'c', 'k', 'e', 'r', 'e', 'a', 'r', 't', 'h']
yourinput = str(input()).lower()
if all(requested_word in yourinput for requested_word in original_word):
    print("yes")

strlist = ['SUCCESS', 'Done', 'SUCCESSFUL']
res = False
# Read the file line by line; the with block closes it afterwards.
with open('test.txt', 'r') as flog:
    for line in flog:
        for fstr in strlist:
            if line.find(fstr) != -1:
                print('found')
                res = True
if res:
    print('res true')
else:
    print('res false')

I would use this kind of function for speed:
def check_string(string, substring_list):
    for substring in substring_list:
        if substring in string:
            return True
    return False

Yet another solution with set, using set intersection. Each check is a one-liner:
subset = {"some", "words"}
text = "some words to be searched here"
if len(subset & set(text.split())) == len(subset):
    print("All values present in text")
if subset & set(text.split()):
    print("At least one value present in text")

If you want exact matches of words then consider word tokenizing the target string. I use the recommended word_tokenize from nltk:
from nltk.tokenize import word_tokenize
Here is the tokenized string from the accepted answer:
a_string = "A string is more than its parts!"
tokens = word_tokenize(a_string)
tokens
Out[46]: ['A', 'string', 'is', 'more', 'than', 'its', 'parts', '!']
The accepted answer gets modified as follows:
matches_1 = ["more", "wholesome", "milk"]
[x in tokens for x in matches_1]
Out[42]: [True, False, False]
As in the accepted answer, the word "more" is still matched. If "mo" becomes a match string, however, the accepted answer still finds a match. That is a behavior I did not want.
matches_2 = ["mo", "wholesome", "milk"]
[x in a_string for x in matches_2]
Out[43]: [True, False, False]
Using word tokenization, "mo" is no longer matched:
[x in tokens for x in matches_2]
Out[44]: [False, False, False]
That is the additional behavior that I wanted. This answer also responds to the duplicate question here.
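If installing nltk is not an option, a similar whole-word check can be sketched with the standard re module. This assumes each search word begins and ends with a letter, digit, or underscore, so that the \b word boundary behaves as expected; the helper name whole_word_matches is hypothetical.

```python
import re


def whole_word_matches(text, words):
    """Return the words that occur in text as whole words, not fragments."""
    return [
        word for word in words
        if re.search(r"\b" + re.escape(word) + r"\b", text)
    ]


a_string = "A string is more than its parts!"
print(whole_word_matches(a_string, ["more", "wholesome", "milk"]))  # ['more']
print(whole_word_matches(a_string, ["mo", "wholesome", "milk"]))    # []
```

A tokenizer copes better with contractions and punctuation, so treat this as a lightweight approximation rather than a replacement.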

data = "firstName and favoriteFood"
mandatory_fields = ['firstName', 'lastName', 'age']
# for each
for field in mandatory_fields:
    if field not in data:
        print("Error, missing req field {0}".format(field))
# still fine, multiple if statements
if ('firstName' not in data or
        'lastName' not in data or
        'age' not in data):
    print("Error, missing a req field")
# not very readable, list comprehension
missing_fields = [x for x in mandatory_fields if x not in data]
if len(missing_fields) > 0:
    print("Error, missing fields {0}".format(", ".join(missing_fields)))

Related

Removing specific set of characters in a list of strings

I have a list of strings and want to use another list of strings to remove every occurrence of the unwanted substrings from my list, so that the output for the example below would be foo, bar, foobar, foofoo. I have tried a few things, for example:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
for remove in remove_list:
    for strings in mylist:
        strings = strings.replace(remove, ' ')
The above code doesn't work. At one point I assigned the result to a new variable and appended that afterwards, but that wasn't working well because if there were two issues in a string it would be appended twice.
You changed the temporary variable, not the original list. Instead, assign the result back into mylist:
for bad in remove_list:
    for pos, string in enumerate(mylist):
        mylist[pos] = string.replace(bad, ' ')
Try this:
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
bads = ['\\n', '!', '*', '?', ':']
result = []
for s in mylist:
    # s is a temporary copy
    for bad in bads:
        s = s.replace(bad, '')  # remove every occurrence of bad
    result.append(s)
print(result)
This could be implemented more concisely, but this way it's more understandable.
I had a hard time interpreting the question, but I see you have the desired result at the top of your question.
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = ['\\n', '!', '*', '?', ':']
output = []
for strings in mylist:
    for remove in remove_list:
        strings = strings.replace(remove, '')
    output.append(strings)
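import re
regex = re.compile(r'[^a-zA-Z]')  # assumed pattern; the original answer did not show how regex was defined
for list1 in mylist:
    t = regex.sub('', list1)
    print(t)
If you just want to get rid of non-letter characters, do this. It works a lot better than comparing two separate lists. Note that with this assumed pattern 'bar\\n' becomes 'barn', because only the backslash is removed.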
Why not have regex do the work for you? No nested loops this way (just make sure to escape correctly):
import re
mylist = ['foo!', 'bar\\n', 'foobar!!??!!', 'foofoo::!*']
remove_list = [r'\\n', r'\!', r'\*', r'\?', ':']
removals = re.compile('|'.join(remove_list))
print([removals.sub('', s) for s in mylist])
['foo', 'bar', 'foobar', 'foofoo']
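Another solution is a list comprehension that applies every removal to each word in turn. (Replacing one bad string per (word, bad) pair and then deduplicating, as originally posted, leaves partially cleaned strings in the result.)
from functools import reduce
list_good = [reduce(lambda w, bad: w.replace(bad, ''), remove_list, word) for word in mylist]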
my_list = ["foo!", "bar\\n", "foobar!!??!!", "foofoo::*!"]
to_remove = ["!", "\\n", "?", ":", "*"]
for index, item in enumerate(my_list):
    for char in to_remove:
        if char in item:
            item = item.replace(char, "")
    my_list[index] = item
print(my_list)  # outputs ['foo', 'bar', 'foobar', 'foofoo']
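When every unwanted item is a single character, str.translate offers a loop-free alternative. The sketch below assumes exactly that; a two-character item such as '\\n' (a backslash followed by n) would still need replace or a regex. The helper name remove_chars is illustrative.

```python
def remove_chars(strings, chars):
    """Delete every character listed in chars from each string."""
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans("", "", chars)
    return [s.translate(table) for s in strings]


print(remove_chars(['foo!', 'foobar!!??!!', 'foofoo::!*'], "!*?:"))
# ['foo', 'foobar', 'foofoo']
```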

Splitting a string with numbers and letters [duplicate]

I'd like to split strings like these
'foofo21'
'bar432'
'foobar12345'
into
['foofo', '21']
['bar', '432']
['foobar', '12345']
Does somebody know an easy and simple way to do this in python?
I would approach this by using re.match in the following way:
import re
match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
    print(items)
# prints ('foofo', '21')
def mysplit(s):
    head = s.rstrip('0123456789')
    tail = s[len(head):]
    return head, tail
>>> [mysplit(s) for s in ['foofo21', 'bar432', 'foobar12345']]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
Yet Another Option:
>>> [re.split(r'(\d+)', s) for s in ('foofo21', 'bar432', 'foobar12345')]
[['foofo', '21', ''], ['bar', '432', ''], ['foobar', '12345', '']]
>>> r = re.compile("([a-zA-Z]+)([0-9]+)")
>>> m = r.match("foobar12345")
>>> m.group(1)
'foobar'
>>> m.group(2)
'12345'
So, if you have a list of strings with that format:
import re
r = re.compile("([a-zA-Z]+)([0-9]+)")
strings = ['foofo21', 'bar432', 'foobar12345']
print([r.match(string).groups() for string in strings])
Output:
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
I'm always the one to bring up findall() =)
>>> strings = ['foofo21', 'bar432', 'foobar12345']
>>> [re.findall(r'(\w+?)(\d+)', s)[0] for s in strings]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
Note that I'm using a simpler (less to type) regex than most of the previous answers.
Here is a simple function to separate multiple words and numbers from a string of any length; the re approaches above only separate the first word and number. I hope this helps others in the future.
def seperate_string_number(string):
    previous_character = string[0]
    groups = []
    newword = string[0]
    for x, i in enumerate(string[1:]):
        if i.isalpha() and previous_character.isalpha():
            newword += i
        elif i.isnumeric() and previous_character.isnumeric():
            newword += i
        else:
            groups.append(newword)
            newword = i
        previous_character = i
        if x == len(string) - 2:
            groups.append(newword)
            newword = ''
    return groups
print(seperate_string_number('10in20ft10400bg'))
# outputs : ['10', 'in', '20', 'ft', '10400', 'bg']
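The same run-splitting can be sketched more briefly with itertools.groupby, which groups consecutive characters that share a key. Here the key is str.isdigit, so every non-digit run (letters, punctuation, spaces) is kept together as one group. The name split_runs is made up for this example.

```python
from itertools import groupby


def split_runs(s):
    """Split s into alternating runs of digits and non-digits."""
    return ["".join(group) for _, group in groupby(s, key=str.isdigit)]


print(split_runs("10in20ft10400bg"))  # ['10', 'in', '20', 'ft', '10400', 'bg']
print(split_runs("foobar12345"))      # ['foobar', '12345']
```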
import re
s = input()
m = re.match(r"([a-zA-Z]+)([0-9]+)", s)
print(m.group(0))
print(m.group(1))
print(m.group(2))
Without regex, using the built-in str.isdigit() method; this only works if the leading part is text and the trailing part is a number:
def text_num_split(item):
    for index, letter in enumerate(item, 0):
        if letter.isdigit():
            return [item[:index], item[index:]]
print(text_num_split("foobar12345"))
OUTPUT :
['foobar', '12345']
This is a little longer, but more versatile for cases where there are multiple, randomly placed, numbers in the string. Also, it requires no imports.
def getNumbers( input ):
    # Collect Info
    compile = ""
    complete = []
    for letter in input:
        # If compiled string
        if compile:
            # If compiled and letter are same type, append letter
            if compile.isdigit() == letter.isdigit():
                compile += letter
            # If compiled and letter are different types, append compiled string, and begin with letter
            else:
                complete.append( compile )
                compile = letter
        # If no compiled string, begin with letter
        else:
            compile = letter
    # Append leftover compiled string
    if compile:
        complete.append( compile )
    # Return numbers only
    numbers = [ word for word in complete if word.isdigit() ]
    return numbers
Here is a simple solution for that problem, no need for regex:
user = input('Input: ')  # user = 'foobar12345'
int_list, str_list = [], []
for item in user:
    try:
        item = int(item)  # searching for integers in your string
    except ValueError:
        str_list.append(item)
    else:  # digits are stored as str, because join only works with str
        int_list.append(str(item))
string = ''.join(str_list)
integer = int(''.join(int_list))  # if you want it to be a string, just use ''.join(int_list)
final = [string, integer]  # you can also add it to a dictionary: d = {string: integer}
print(final)
In addition to the answer of @Evan:
If the incoming string follows the pattern 21foofo, then the re.match pattern would look like this:
import re
match = re.match(r"([0-9]+)([a-z]+)", '21foofo', re.I)
if match:
    items = match.groups()
    print(items)
# prints ('21', 'foofo')
With the original letters-first pattern, re.match returns None for '21foofo', so items is never assigned and any later use of it fails with a NameError (an UnboundLocalError: local variable 'items' referenced before assignment when the code runs inside a function).
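One way to avoid the unassigned-variable problem is to handle both orders in one pattern and return None explicitly when nothing matches. The following is a sketch under the assumption that the input is either letters followed by digits or digits followed by letters; the name split_alnum is hypothetical.

```python
import re

# Either letters-then-digits or digits-then-letters, matched against the whole string.
_PATTERN = re.compile(r"([a-z]+)([0-9]+)|([0-9]+)([a-z]+)", re.I)


def split_alnum(s):
    """Return the (first, second) parts, or None if s fits neither shape."""
    m = _PATTERN.fullmatch(s)
    if m is None:
        return None
    # Only the groups from the alternative that matched are not None.
    return tuple(g for g in m.groups() if g is not None)


print(split_alnum("foofo21"))  # ('foofo', '21')
print(split_alnum("21foofo"))  # ('21', 'foofo')
print(split_alnum("foofo"))    # None
```

Callers then test the return value for None instead of relying on a variable that may never have been set.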

Making sure only a particular group of characters are in a string

Is there any way to make sure that only the characters 'm' 'c' 'b' are in a string without resorting to regex?
For instance, if the user inputs 'm', the program will print 'Major'. If the user inputs 'mc', the program will print 'Major, Critical'.
So I want to make sure that if the user inputs something like 'mca', the program will print 'Not applicable'.
try:
    if 'a' in args.findbugs:
        if len(args.findbugs) > 1:
            print('findbugs: Not an applicable argument.')
        else:
            print('FINDBUGS:ALL')
    else:
        if 'm' in args.findbugs:
            print('FINDBUGS:MAJOR')
        if 'c' in args.findbugs:
            print('FINDBUGS:CRITICAL')
        if 'b' in args.findbugs:
            print('FINDBUGS:BLOCKER')
except TypeError:
    print("FINDBUGS: NONE")
Well, the simplest way from what you've described would be:
some_string = 'mca'
if set(some_string) <= {'m', 'c', 'b'}:
    print("only 'm', 'c', or 'b'")  # every character is one of 'm', 'c', 'b'
else:
    print('Not applicable')  # 'mca' ends up here because of 'a'
Or, if you intend to require at least one of m, c, or b:
some_string = 'mca'
if set(some_string) & {'m', 'c', 'b'}:
    print('match')  # the string contains 'm', 'c', or 'b', so 'mca' matches
NOTE: As pointed out by bgporter, the set literal notation is not available in Python versions less than 2.7. If support for those is required, use set(('m', 'c', 'b')).
This is a way to check it in time linear in the length of s:
s = "blabla"
l = 'mcb'
print(all(x in l for x in s))
Crude, but this would return what you need (it only accepts the letters in the order m, c, b, for example 'mc' but not 'cm'):
import itertools
user_input not in {''.join(c) for n in (1, 2, 3) for c in itertools.combinations('mcb', n)}
arg_dict = {"m": 'FINDBUGS:MAJOR', "c": 'FINDBUGS:CRITICAL', "b": 'FINDBUGS:BLOCKER'}
accepted = ["m", "c", "b"]
user_args = "bccm"
if all(x in accepted for x in user_args):
    for x in set(user_args):
        print(arg_dict.get(x), end=' ')
else:
    print("FINDBUGS: NONE")
Output (set iteration order is arbitrary, so the order may differ):
FINDBUGS:CRITICAL FINDBUGS:BLOCKER FINDBUGS:MAJOR
If you want them in order, sort the input:
accepted = ["m", "c", "b"]
user_args = "bcm"
if all(x in accepted for x in user_args):
    user_args = sorted(set(user_args), key=lambda x: accepted.index(x))
    for x in user_args:
        print(arg_dict.get(x), end=' ')
else:
    print("FINDBUGS: NONE")
FINDBUGS:MAJOR FINDBUGS:CRITICAL FINDBUGS:BLOCKER
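Putting the subset test and the label lookup together, the behaviour asked for in the question ('m' gives Major, 'mc' gives Major and Critical, 'mca' is rejected) might be sketched as below. The names LABELS and describe_flags and the exact return strings are assumptions for illustration.

```python
LABELS = {
    "m": "FINDBUGS:MAJOR",
    "c": "FINDBUGS:CRITICAL",
    "b": "FINDBUGS:BLOCKER",
}


def describe_flags(flags):
    """Map a flag string such as 'mc' to its labels, rejecting other characters."""
    if not flags or not set(flags) <= set(LABELS):
        return "Not applicable"
    # Iterating over LABELS keeps a fixed m, c, b order and ignores repeats.
    return ", ".join(label for key, label in LABELS.items() if key in flags)


print(describe_flags("mc"))    # FINDBUGS:MAJOR, FINDBUGS:CRITICAL
print(describe_flags("mca"))   # Not applicable
print(describe_flags("bccm"))  # FINDBUGS:MAJOR, FINDBUGS:CRITICAL, FINDBUGS:BLOCKER
```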

How to convert a string with comma-delimited items to a list in Python?

How do you convert a string into a list?
Say the string is like text = "a,b,c". After the conversion, text == ['a', 'b', 'c'] and hopefully text[0] == 'a', text[1] == 'b'?
Like this:
>>> text = 'a,b,c'
>>> text = text.split(',')
>>> text
['a', 'b', 'c']
Just to add on to the existing answers: hopefully, you'll encounter something more like this in the future:
>>> word = 'abc'
>>> L = list(word)
>>> L
['a', 'b', 'c']
>>> ''.join(L)
'abc'
But for what you're dealing with right now, go with @Cameron's answer.
>>> word = 'a,b,c'
>>> L = word.split(',')
>>> L
['a', 'b', 'c']
>>> ','.join(L)
'a,b,c'
The following Python code will turn your string into a list of strings:
import ast
teststr = "['aaa','bbb','ccc']"
testarray = ast.literal_eval(teststr)
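Because ast.literal_eval raises an exception on input that is not a valid Python literal, it can help to wrap the call. The sketch below returns None for anything that does not parse to a list; the helper name parse_list_literal is an assumption for this example.

```python
import ast


def parse_list_literal(text):
    """Parse text such as "['aaa','bbb']" into a list, or return None."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a valid Python literal (for example "a,b,c" or unbalanced brackets).
        return None
    return value if isinstance(value, list) else None


print(parse_list_literal("['aaa','bbb','ccc']"))  # ['aaa', 'bbb', 'ccc']
print(parse_list_literal("a,b,c"))                # None
```

For a plain comma-separated string like "a,b,c", split(',') remains the right tool.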
I don't think you need to
In Python you seldom need to convert a string to a list, because strings and lists behave very similarly.
Changing the type
If you really have a string which should be a character array, do this:
In [1]: x = "foobar"
In [2]: list(x)
Out[2]: ['f', 'o', 'o', 'b', 'a', 'r']
Not changing the type
Note that Strings are very much like lists in python
Strings have accessors, like lists
In [3]: x[0]
Out[3]: 'f'
Strings are iterable, like lists
In [4]: for i in range(len(x)):
   ...:     print(x[i])
   ...:
f
o
o
b
a
r
TLDR
Strings are lists. Almost.
In case you want to split by spaces, you can just use .split():
a = 'mary had a little lamb'
z = a.split()
print(z)
Output:
['mary', 'had', 'a', 'little', 'lamb']
If you actually want arrays (Python 2 only; the 'c' typecode was removed in Python 3):
>>> from array import array
>>> text = "a,b,c"
>>> text = text.replace(',', '')
>>> myarray = array('c', text)
>>> myarray
array('c', 'abc')
>>> myarray[0]
'a'
>>> myarray[1]
'b'
If you do not need arrays and only want to look up characters by index, remember that a string is a sequence just like a list, except that it is immutable:
>>> text = "a,b,c"
>>> text = text.replace(',', '')
>>> text[0]
'a'
m = '[[1,2,3],[4,5,6],[7,8,9]]'
m= eval(m.split()[0])
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
All answers are good, there is another way of doing, which is list comprehension, see the solution below.
u = "UUUDDD"
lst = [x for x in u]
for comma separated list do the following
u = "U,U,U,D,D,D"
lst = [x for x in u.split(',')]
I usually use:
l = [word.strip() for word in text.split(',')]
The strip removes whitespace around each word.
To convert a string of the form a = "[[1, 3], [2, -6]]", I wrote this code, which is not yet optimized:
matrixAr = []
mystring = "[[1, 3], [2, -4], [19, -15]]"
b = mystring.replace("[[", "").replace("]]", "")  # remove the leading [[ and trailing ]]
for line in b.split('], ['):
    row = list(map(int, line.split(',')))  # int() tolerates the surrounding spaces
    matrixAr.append(row)
print(matrixAr)
split() is your friend here. I will cover a few aspects of split() that are not covered by other answers.
If no arguments are passed to split(), it would split the string based on whitespace characters (space, tab, and newline). Leading and trailing whitespace is ignored. Also, consecutive whitespaces are treated as a single delimiter.
Example:
>>> " \t\t\none two three\t\t\tfour\nfive\n\n".split()
['one', 'two', 'three', 'four', 'five']
When a single character delimiter is passed, split() behaves quite differently from its default behavior. In this case, leading/trailing delimiters are not ignored, repeating delimiters are not "coalesced" into one either.
Example:
>>> ",,one,two,three,,\n four\tfive".split(',')
['', '', 'one', 'two', 'three', '', '\n four\tfive']
So, if stripping of whitespaces is desired while splitting a string based on a non-whitespace delimiter, use this construct:
words = [item.strip() for item in string.split(',')]
When a multi-character string is passed as the delimiter, it is taken as a single delimiter and not as a character class or a set of delimiters.
Example:
>>> "one,two,three,,four".split(',,')
['one,two,three', 'four']
To coalesce multiple delimiters into one, you would need to use re.split(regex, string) approach. See the related posts below.
Related
string.split() - Python documentation
re.split() - Python documentation
Split string based on regex
Split string based on a regular expression
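As a sketch of the re.split approach mentioned above for coalescing repeated delimiters, the helper below treats any run of the given delimiter characters as a single separator and drops empty fields. The name split_fields and its delimiters parameter are assumptions for this example.

```python
import re


def split_fields(text, delimiters=","):
    """Split on runs of delimiter characters, stripping whitespace and empties."""
    # Build a character class such as [,;]+ from the delimiter characters.
    pattern = "[" + re.escape(delimiters) + "]+"
    return [field.strip() for field in re.split(pattern, text) if field.strip()]


print(split_fields("one,two,three,,four"))            # ['one', 'two', 'three', 'four']
print(split_fields(",,a ;; b,;c", delimiters=",;"))   # ['a', 'b', 'c']
```

Unlike str.split(','), this never yields empty strings for leading, trailing, or doubled delimiters.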
# to strip `,` and `.` from a string ->
>>> 'a,b,c.'.translate(str.maketrans('', '', ',.'))
'abc'
You should use the built-in translate method for strings. (In Python 2 the equivalent call was 'a,b,c.'.translate(None, ',.').)
Type help('abc'.translate) at the Python shell for more info.
Using functional Python (in Python 3, filter returns an iterator, so wrap it in list):
text = list(filter(lambda x: x != ',', map(str, text)))
Example 1
>>> email = "myemailid@gmail.com"
>>> email.split()
#OUTPUT
['myemailid@gmail.com']
Example 2
>>> email = "myemailid@gmail.com, someonsemailid@gmail.com"
>>> email.split(',')
#OUTPUT
['myemailid@gmail.com', ' someonsemailid@gmail.com']
