I've split a string by [ and ] but I want these characters to still appear. How do I do this?
words = [beginning for ending in x.split('[') for beginning in ending.split(']')]
I think you need re.split to do this easily:
>>> import re
>>> s = 'Hello, my name is [name] and I am [age] years old'
>>> re.split(r'(\[|\])', s)
['Hello, my name is ', '[', 'name', ']', ' and I am ', '[', 'age', ']', ' years old']
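The capturing group in the pattern is what makes re.split keep the delimiters. A minimal sketch comparing the two forms (the variable names without_group and with_group are only illustrative):

```python
import re

s = 'Hello, my name is [name] and I am [age] years old'

# Without a capturing group, the matched delimiters are discarded.
without_group = re.split(r'\[|\]', s)

# With a capturing group, each matched delimiter is returned as its own item.
with_group = re.split(r'(\[|\])', s)

print(without_group)
print(with_group)
```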
I would need to know more about the context of your list and what x, beginning, and ending are, but here are some suggestions.
You can add [ and ] to each item in the list, and return a new list, like this:
["[%s]" % s for s in some_list]
Or, string.join will return a string from the items in a list joined by a given string:
"[".join(some_list)
I have a search string. If a character in the search string matches one of the special characters, I want to remove it.
sear = '!%'
special_characters = ['!', '"', '#', '$', '%','(',')']
for remove_char in special_characters:
    search_value = re.sub(remove_char, '', sear)
My output is an error.
Expected output is None
sear = 'ABC!%DEF'
Expected is 'ABCDEF'
sear = 'ABC,DEF'
Expected is 'ABC,DEF'
Just do a list comprehension and ''.join:
sear = '!%'
special_characters = ['!', '"', '#', '$', '%']
sear = ''.join([i for i in sear if i not in special_characters])
print(sear)
This code iterates over the string character by character and keeps each character that is not in the special_characters list. The comprehension only gives us a list of single-character strings, so we need ''.join to turn it back into a string.
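The same idea can be wrapped in a small helper. This is only a sketch: the name strip_special is hypothetical, and it assumes that "None" in the question means Python's None when nothing is left (rather than an empty string).

```python
def strip_special(text, special_characters):
    """Return text without the listed characters, or None if nothing remains."""
    blocked = set(special_characters)  # set lookup avoids scanning the list per character
    cleaned = ''.join(ch for ch in text if ch not in blocked)
    return cleaned or None  # an empty result becomes None (assumption, see above)

special_characters = ['!', '"', '#', '$', '%', '(', ')']
print(strip_special('!%', special_characters))
print(strip_special('ABC!%DEF', special_characters))
print(strip_special('ABC,DEF', special_characters))
```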
You can make a regex character class out of your special characters and then use a regex substitution to do the replacements. For longer strings or larger lists of special characters, you should find this runs 2-3x faster than the list comprehension solution.
import re
special_characters = ['!', '"', '#', '$', '%','(',')']
regex = re.compile('[' + ''.join(f'\\{c}' for c in special_characters) + ']')
sear = '!%'
search_value = regex.sub('', sear)
print(search_value)
sear = 'ABC!%DEF'
search_value = regex.sub('', sear)
print(search_value)
sear = 'ABC,DEF'
search_value = regex.sub('', sear)
print(search_value)
Output:
<blank line>
ABCDEF
ABC,DEF
Note I've prefixed all characters in the character class with \ so that you don't have to worry about using characters such as - and ] which have special meaning within a character class.
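As an alternative to prefixing every character by hand, re.escape can do the escaping. A minimal sketch, with - and ] added to the list purely to illustrate the point about characters that are special inside a character class:

```python
import re

# '-' and ']' are added here only to show that they are handled safely.
special_characters = ['!', '"', '#', '$', '%', '(', ')', '-', ']']

# re.escape adds a backslash only where one is needed.
pattern = re.compile('[' + ''.join(re.escape(c) for c in special_characters) + ']')

print(pattern.sub('', 'A-B]C!%DEF'))  # prints ABCDEF
```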
What is the best way to split a string like
text = "hello there how are you"
in Python?
So I'd end up with an array like such:
['hello there', 'there how', 'how are', 'are you']
I have tried this:
liste = re.findall(r'((\S+\W*){' + str(2) + '})', text)
for a in liste:
    print(a[0])
But I'm getting:
hello there
how are
you
How can I make the findall function move only one token when searching?
Here's a solution with re.findall:
>>> import re
>>> text = "hello there how are you"
>>> re.findall(r"(?=(?:(?:^|\W)(\S+\W\S+)(?:$|\W)))", text)
['hello there', 'there how', 'how are', 'are you']
Have a look at the Python docs for re: https://docs.python.org/3/library/re.html
(?=...) Lookahead assertion
(?:...) Non-capturing regular parentheses
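To see why the lookahead matters, here is a small sketch contrasting a consuming pattern with a zero-width one. The simplified patterns below are my own illustration, not the exact pattern from the answer above:

```python
import re

text = "hello there how are you"

# Consuming pattern: each match eats both words, so pairs cannot overlap.
consuming = re.findall(r"\S+\s+\S+", text)

# Lookahead: the pair is captured inside (?=...), which consumes nothing,
# so the next search can start at the following word.
overlapping = re.findall(r"\b(?=(\S+\s+\S+))", text)

print(consuming)
print(overlapping)
```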
If regex isn't required, you could do something like:
l = text.split(' ')
out = []
for i in range(len(l)):
    try:
        out.append(l[i] + ' ' + l[i+1])
    except IndexError:
        continue
Explanation:
First split the string on the space character. The result will be a list where each element is a word in the sentence. Instantiate an empty list to hold the result. Loop over the list of words, adding the two-word combinations separated by a space to the output list. This will throw an IndexError when accessing the last word in the list; just catch it and continue, since you don't seem to want that lone word in your result anyway.
I don't think you actually need regex for this.
I understand you want a list, in which each element contains two words, the latter also being the former of the following element. We can do this easily like this:
string = "Hello there how are you"
liste = string.split(" ")
for i in range(len(liste)-1):
    liste[i] = liste[i] + " " + liste[i+1]
# we remove the last element, as otherwise we'd have an element with only one word
liste.pop(-1)
I don't know if it's mandatory for you to use regex, but I'd do it this way.
First, you can get the list of words with the str.split() method.
>>> sentence = "hello there how are you"
>>> splited_sentence = sentence.split(" ")
>>> splited_sentence
['hello', 'there', 'how', 'are', 'you']
Then, you can make pairs.
>>> output = []
>>> for i in range(1, len(splited_sentence)):
...     output += [splited_sentence[i-1] + ' ' + splited_sentence[i]]
...
>>> output
['hello there', 'there how', 'how are', 'are you']
An alternative is just to split, zip, then join like so...
sentence = "Hello there how are you"
words = sentence.split()
[' '.join(i) for i in zip(words, words[1:])]
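The zip approach generalizes to groups of any size. A sketch, assuming a hypothetical helper name word_ngrams and a parameter n for the group size:

```python
def word_ngrams(sentence, n=2):
    """Return overlapping groups of n consecutive words."""
    words = sentence.split()
    # zip stops at the shortest slice, so no index checks are needed.
    shifted = (words[i:] for i in range(n))
    return [' '.join(group) for group in zip(*shifted)]

print(word_ngrams("hello there how are you"))
print(word_ngrams("hello there how are you", n=3))
```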
Another possible solution using findall.
>>> liste = list(map(''.join, re.findall(r'(\S+(?=(\s+\S+)))', text)))
>>> liste
['hello there', 'there how', 'how are', 'are you']
Currently I use this helper function to remove the empty entries.
Is there a built-in way for this?
def getNonEmptyList(str, splitSym):
    lst=str.split(splitSym)
    lst1=[]
    for entry in lst:
        if entry.strip() !='':
            lst1.append(entry)
    return lst1
str.split(sep=None, maxsplit=-1)
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
For example:
>>> '1 2 3'.split()
['1', '2', '3']
>>> '1 2 3'.split(maxsplit=1)
['1', '2 3']
>>> ' 1 2 3 '.split()
['1', '2', '3']
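Note that this behaviour applies only when sep is omitted. With an explicit separator, empty strings are kept, so some filtering is still needed. A small sketch (the sample string is made up for illustration):

```python
line = ' a,, b , c '

# No separator: runs of whitespace are collapsed, but commas are not split on.
by_whitespace = line.split()

# Explicit separator: empty entries and surrounding spaces are preserved.
by_comma = line.split(',')

# Filtering and stripping afterwards gives the non-empty entries.
non_empty = [part.strip() for part in by_comma if part.strip()]

print(by_whitespace)
print(by_comma)
print(non_empty)
```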
This split could be done more compactly with a comprehension like:
def getNonEmptyList(str, splitSym):
    return [s for s in str.split(splitSym) if s.strip() != '']
You could use filter:
def get_non_empty_list(s, delimiter):
    return list(filter(str.strip, s.split(delimiter)))
If you want to split text by newline and remove any empty lines, here's a one-liner:
lines = [l for l in text.split('\n') if l.strip()]
You can use a regex to capture the extra whitespace.
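import re
def getNonEmptyList(string, splitSym):
    # escape the separator so characters such as '|' or '.' are matched literally
    split_re = r'\s*{}\s*'.format(re.escape(splitSym))
    return re.split(split_re, string)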
I found that I can join them with '-'.join(name), but I don't want to add any character. Let's say I have
['stanje1', '|', 'st6', ',', 'stanje2', '|', '#']
and I want it to look like this
stanje1|st6,stanje2|#
Just omit the -:
''.join(name)
In that case, you can just do it as:
''.join(name)
>>> name = ['stanje1', '|', 'st6', ',', 'stanje2', '|', '#']
>>> print(''.join(name))
stanje1|st6,stanje2|#
This joins the strings with no separator between them.
Examples
>>> s = ['Hello', 'World']
>>> print(''.join(s))
HelloWorld
>>> print('-'.join(s))
Hello-World
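One thing to keep in mind: str.join only accepts strings and raises a TypeError for other types. If the list may contain numbers, converting each item first is a common workaround. A sketch with a made-up mixed list:

```python
# Hypothetical list mixing strings and integers.
items = ['stanje', 1, '|', 'st', 6]

# Convert every item to str before joining with an empty separator.
joined = ''.join(str(item) for item in items)

print(joined)  # prints stanje1|st6
```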
I'm a programming newbie trying to improve the procedure below so that when I pass it these arguments: split_string("After the flood ... all the colors came out."," .") it returns this:
['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
and not this:
['After', 'the', 'flood', '', '', '', '', 'all', 'the', 'colors', 'came', 'out', '']
Any hint on how to do this? (I could just iterate over the list again and delete the '' elements, but I wanted a more elegant solution.)
This is the procedure:
def split_string(source, separatorList):
    splited = [source]
    for separator in separatorList:
        source = splited
        splited = []
        print('separator= ', separator)
        for sequence in source:
            print('sequence = ', sequence)
            if sequence not in separatorList and sequence != ' ':
                splited = splited + sequence.split(separator)
    return splited
print(split_string("This is a test-of the,string separation-code!", " ,!-"))
print()
print(split_string("After the flood ... all the colors came out."," ."))
You can filter out the empty strings in the return statement:
return [x for x in splited if x]
As a side note, I think it would be easier to write your function based on re.split():
import re
def split_string(s, separators):
    pattern = "|".join(re.escape(sep) for sep in separators)
    return [x for x in re.split(pattern, s) if x]
print(re.split('[. ]+', 'After the flood ... all the colors came out.'))
or, better, the other way round:
print(re.findall('[^. ]+', 'After the flood ... all the colors came out.'))
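The reason the second form is preferable shows up at the end of the string: splitting on the separators leaves an empty string after the final period, while matching the non-separator runs does not. A short sketch comparing both (the variable names are illustrative):

```python
import re

sentence = 'After the flood ... all the colors came out.'

# Splitting on runs of separators leaves '' after the trailing '.'.
split_result = re.split('[. ]+', sentence)

# Matching runs of non-separator characters never yields empty strings.
findall_result = re.findall('[^. ]+', sentence)

print(split_result)
print(findall_result)
```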
First, let's see where the empty strings come from. Try executing this in a shell:
>>> 'After  the'.split(' ')
result:
['After', '', 'the']
This happens because when the split method reaches the two consecutive spaces in the string, it finds nothing but '' between them.
So the solution is simple: check the boolean value of every item returned by .split().
def split_string(source, separatorList):
    splited = [source]
    for separator in separatorList:
        # swapping two variables on one line makes the code clearer
        source, splited = splited, []
        for sequence in source:
            # there's no need to check `sequence` in advance, just split it
            # if sequence not in separatorList and sequence != ' ':
            #     splited = splited + sequence.split(separator)
            # the `if` check in the list comprehension prevents `''` from appearing
            # `+=` is equivalent to `= splited +`
            splited += [i for i in sequence.split(separator) if i]
    return splited
For more details about [i for i in a_list if i], see PEP 202.
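As a quick check, here is a condensed Python 3 sketch of the same approach run against the two inputs from the question. The names separator_list and pieces are my own renamings, not part of the original answer:

```python
def split_string(source, separator_list):
    """Split source on every character in separator_list, dropping empty pieces."""
    pieces = [source]
    for separator in separator_list:
        source, pieces = pieces, []
        for sequence in source:
            # The `if item` filter discards the '' entries produced by split.
            pieces += [item for item in sequence.split(separator) if item]
    return pieces

print(split_string("This is a test-of the,string separation-code!", " ,!-"))
print(split_string("After the flood ... all the colors came out.", " ."))
```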