python - splitting a string without removing delimiters

I'm trying to split a string without removing the delimiter and having trouble doing so. The string I want to split is:
'+ {- 9 4} {+ 3 2}'
and I want to end up with
['+', '{- 9 4}', '{+ 3 2}']
yet nothing I've tried has worked. I searched Google and looked through this Stack Overflow post for answers: Python split() without removing the delimiter
Thanks!

re.split will keep the delimiters when they are captured, i.e., enclosed in parentheses:
import re
s = '+ {- 9 4} {+ 3 2}'
# In Python 3, filter() returns a lazy iterator, so build the list directly
p = [x for x in re.split(r"([+{} -])", s) if x.strip()]
will give you
['+', '{', '-', '9', '4', '}', '{', '+', '3', '2', '}']
which, IMO, is what you need to handle nested expressions
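If the goal is the exact list from the question rather than individual tokens, matching the pieces can be simpler than splitting. The following is a minimal sketch that assumes the braces are never nested; the pattern and the variable name pieces are illustrative and not part of the answer above.

```python
import re

s = '+ {- 9 4} {+ 3 2}'

# Match either a complete {...} group (no nested braces assumed)
# or a run of characters that are neither whitespace nor braces.
pieces = re.findall(r'\{[^{}]*\}|[^\s{}]+', s)
print(pieces)  # ['+', '{- 9 4}', '{+ 3 2}']
```

For nested braces a regular expression is probably not enough, and the token list from the answer above is a better starting point for a small parser.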

Related

Creating a list given an equation with no spaces

I want to create a list given a string such as 'b123+xyz=1+z1$' so that the list equals ['b123', '+', 'xyz', '=', '1', '+', 'z1', '$']
Without spaces or a single repeating pattern, I do not know how to split the string into a list.
I tried creating if statements in a for loop to append the string when it reaches a character that is not a digit or letter through isdigit and isalpha but could not differentiate between variables and digits.
You can use a regular expression to split your string. This works by using positive lookaheads and lookbehinds for non-word characters. Note that splitting on empty matches like this requires Python 3.7 or later.
import re
sample = "b123+xyz=1+z1$"
split_sample = re.split(r"(?=\W)|(?:(?<=\W)(?!$))", sample)
print(split_sample)
OUTPUT
['b123', '+', 'xyz', '=', '1', '+', 'z1', '$']
Another regex approach giving the same result is:
split_sample = re.split(r"(\+|=|\$)", sample)[:-1]
The [:-1] is to remove the final empty string.
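Slicing with [:-1] relies on the string ending with a delimiter. As a hedged alternative, filtering out empty strings works whether or not a delimiter sits at either end. The second input below is made up for illustration.

```python
import re

pattern = r"(\+|=|\$)"

with_trailing = re.split(pattern, "b123+xyz=1+z1$")
without_trailing = re.split(pattern, "b123+xyz=1+z1")  # hypothetical input, no final '$'

# Drop empty strings instead of assuming where they occur.
tokens_a = [part for part in with_trailing if part]
tokens_b = [part for part in without_trailing if part]

print(tokens_a)  # ['b123', '+', 'xyz', '=', '1', '+', 'z1', '$']
print(tokens_b)  # ['b123', '+', 'xyz', '=', '1', '+', 'z1']
```

With the second input, [:-1] would have dropped 'z1' instead of an empty string.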
"""
Given the equation b123+xyz=1+z1$, break it down
into a list of variables and operators
"""
operators = ['+', '-', '/', '*', '=']
equation = 'b123+xyz=1+z1$'
equation_by_variable_and_operator = []
text = ''
for character in equation:
    if character not in operators:
        text = text + character
    elif character in operators and len(text):
        equation_by_variable_and_operator.append(text)
        equation_by_variable_and_operator.append(character)
        text = ''
# For the final variable
equation_by_variable_and_operator.append(text)
print(equation_by_variable_and_operator)
Output
['b123', '+', 'xyz', '=', '1', '+', 'z1$']
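Note that the loop above leaves '$' attached to 'z1' because '$' is not in the operators list. Here is a sketch that avoids a hard-coded operator list, assuming every non-alphanumeric character should become its own token. The function name tokenize is hypothetical.

```python
from itertools import groupby


def tokenize(equation):
    """Split into alphanumeric runs and single-character symbols."""
    tokens = []
    # groupby yields consecutive runs that share the same isalnum() result
    for is_alnum, chars in groupby(equation, key=str.isalnum):
        chunk = ''.join(chars)
        if is_alnum:
            tokens.append(chunk)   # keep names and numbers whole
        else:
            tokens.extend(chunk)   # one token per symbol character
    return tokens


print(tokenize('b123+xyz=1+z1$'))
# ['b123', '+', 'xyz', '=', '1', '+', 'z1', '$']
```

Because variables and numbers are both alphanumeric, this sidesteps the isdigit/isalpha distinction mentioned in the question.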
A straightforward regex solution is:
import re
equation = "b123+xyz=1+z1$"
equation_list = re.findall(r'\W+|\w+', equation)
print(equation_list)
This would also work with strings such as -b**10.
Using re.split() returns empty strings at the start and end of the string from the delimiters at the start and end of the string (see this question). To remove them, they can be filtered out, or otherwise look-behind or look-ahead conditions can be used which add to the pattern's complexity, as earlier answers to this question demonstrate.
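To make the difference concrete, here is a small comparison on the -b**10 example. It is only a sketch, and the variable names are illustrative.

```python
import re

expr = "-b**10"

# A captured split yields an empty string before the leading delimiter.
split_tokens = re.split(r"(\W+)", expr)
print(split_tokens)  # ['', '-', 'b', '**', '10']

# Filtering removes it; findall never produces it in the first place.
filtered = [tok for tok in split_tokens if tok]
found = re.findall(r"\W+|\w+", expr)
print(filtered == found)  # True
```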
My answer may not be the simplest of them all, but I hope it helps.
data: str = "b123+xyz=1+z1$"
symbols: str = "+=$"
merge_text: str = ""
for char in data:
    if char not in symbols:
        merge_text += char
    else:
        # surround each symbol with a separator (assumes ',' never occurs in data)
        merge_text += ","
        merge_text += char
        merge_text += ","
# drop the empty string left behind by a trailing separator
final_result: list = [part for part in merge_text.split(",") if part]
print(final_result)

How to keep the white space when splitting sentence to a list?

Below is the code that splits the sentence "s".
s = "1 a 3 bb  b8"
b = s.split()
print(b)
The output from the above code is ['1', 'a', '3', 'bb', 'b8'].
The desired output is ['1', 'a', '3', 'bb', ' b8']. Be aware that there is only one white space in the last field.
The code is not the best and not very efficient, but it works. It distinguishes spaces that act as field separators from spaces that are data by replacing the latter with a special string (e.g. $KEEP_THAT_SPACE$). In the next step the string is split on the spaces that act as field separators. Then the special string is replaced with a blank again in every element.
#!/usr/bin/env python3
s = "1 a 3 bb  b8"
# assume that extra spaces only occur as runs of exactly two
keep_placeholder = '$KEEP_THAT_SPACE$'
s = s.replace('  ', f' {keep_placeholder}')
b = s.split()
# put the kept spaces back into each element
for index, element in enumerate(b):
    while keep_placeholder in element:
        element = element.replace(keep_placeholder, ' ')
    b[index] = element
print(b)
The output is ['1', 'a', '3', 'bb', ' b8']; note that there is only one blank space at the beginning of the last field.
The code can easily be adapted if you have fields with more than two blank spaces.
That is a tricky one: it is hard to do with a generic function and thus requires some custom code.
I took s = "1 a 3 bb   b8" with 3 white spaces before b8 to make it more fun :)
So the first thing you can do is specify the delimiter explicitly in your split:
s.split(' ')
This gives the following result: ['1', 'a', '3', 'bb', '', '', 'b8']
Now you have to interpret each '' as a ' ' that needs to be added to the next non-empty string. The following for loop implements the "business rules" that put the white spaces in the expected place.
s = "1 a 3 bb   b8"  # three spaces before b8
temp_split = s.split(' ')
split_list = []
buffer = ''
for elt in temp_split:
    if elt != "":
        split_list.append(buffer + elt)
        buffer = ''
    else:
        buffer += ' '
print(split_list)
And the result is: ['1', 'a', '3', 'bb', '  b8']
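Both answers treat the first space after a field as the separator and keep any further spaces as data. Under that same assumption, a lookbehind can express the rule in a single split. This is only a sketch, and the function name split_keep_extra_spaces is hypothetical.

```python
import re


def split_keep_extra_spaces(s):
    # Split only on a space that directly follows a non-space character;
    # any further spaces stay attached to the next field.
    return re.split(r'(?<=\S) ', s)


print(split_keep_extra_spaces("1 a 3 bb  b8"))   # two spaces before b8
print(split_keep_extra_spaces("1 a 3 bb   b8"))  # three spaces before b8
```

The first call keeps one leading space on the last field and the second keeps two, so no placeholder string or manual buffer is needed.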

Python regular expression retrieving numbers between two different delimiters

I have the following string
"h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
I would like to use regular expressions to extract the groups:
group1 56,7,1
group2 88,9,1
group3 58,8,1
group4 45
group5 100
group6 null
My ultimate goal is to have tuples such as (group1, group2), (group3, group4), (group5, group6). I am not sure if this all can be accomplished with regular expressions.
I have the following regular expression, which gives me partial results:
(?<=h=|d=)(.*?)(?=h=|d=)
The matches have an extra comma at the end like 56,7,1, which I would like to remove and d=, is not returning a null.
You likely do not need to use regex. A list comprehension and .split() can likely do what you need like:
Code:
def split_it(a_string):
    if not a_string.endswith(','):
        a_string += ','
    return [x.split(',')[:-1] for x in a_string.split('=') if len(x)][1:]
Test Code:
tests = (
    "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,",
    "h=56,7,1,d=88,9,1,d=,h=58,8,1,d=45,h=100",
)
for test in tests:
    print(split_it(test))
Results:
[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], ['']]
[['56', '7', '1'], ['88', '9', '1'], [''], ['58', '8', '1'], ['45'], ['100']]
You could match rather than split using the expression
[dh]=([\d,]*),
and grab the first group, see a demo on regex101.com.
That is
[dh]=       # d or h, followed by =
([\d,]*)    # capture digits and commas, 0+ times
,           # require a comma afterwards
In Python:
import re
rx = re.compile(r'[dh]=([\d,]*),')
string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
numbers = [m.group(1) for m in rx.finditer(string)]
print(numbers)
Which yields
['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
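The question's ultimate goal was tuples such as (group1, group2). Assuming the h= and d= entries strictly alternate, the matches can be paired by slicing. This is a sketch that builds on the pattern above; the names values and pairs are illustrative.

```python
import re

rx = re.compile(r'[dh]=([\d,]*),')
string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

# Empty captures become None to mirror the "null" in the question.
values = [m.group(1) or None for m in rx.finditer(string)]

# Pair consecutive items: (1st, 2nd), (3rd, 4th), ...
pairs = list(zip(values[0::2], values[1::2]))
print(pairs)
# [('56,7,1', '88,9,1'), ('58,8,1', '45'), ('100', None)]
```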
You can use ([a-z]=)([0-9,]+)(,)?
and then select the group you need by its index.
You could use $ in positive lookahead to match against the end of the string:
import re
input_str = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
groups = []
for x in re.findall(r'(?<=h=|d=)(.*?)(?=d=|h=|$)', input_str):
    m = x.strip(',')
    if m:
        groups.append(m.split(','))
    else:
        groups.append(None)
print(groups)
Output:
[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], None]
Here, I have assumed that parameters will only have numerical values. If it is so, then you can try this.
(?<=h=|d=)([0-9,]*)
Hope it helps.
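As a quick check of that last pattern, here is a sketch of how it might be applied. The matches still include the trailing comma, so this example strips it afterwards; the variable names are illustrative.

```python
import re

input_str = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

raw = re.findall(r'(?<=h=|d=)([0-9,]*)', input_str)
print(raw)      # each match still carries its trailing comma

cleaned = [value.rstrip(',') for value in raw]
print(cleaned)  # ['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
```

The lookbehind works in Python's re module here because both alternatives, h= and d=, have the same fixed width.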

How to remove whitespace in a list

I can't remove my whitespace in my list.
invoer = "5-9-7-1-7-8-3-2-4-8-7-9"
cijferlijst = []
for cijfer in invoer:
    cijferlijst.append(cijfer.strip('-'))
I tried the following, but it doesn't work. I already made a list from my string and separated everything, but each "-" is now a "".
filter(lambda x: x.strip(), cijferlijst)
filter(str.strip, cijferlijst)
filter(None, cijferlijst)
abc = [x.replace(' ', '') for x in cijferlijst]
Try this:
>>> ''.join(invoer.split('-'))
'597178324879'
If you want the numbers in string without -, use .replace() as:
>>> string_list = "5-9-7-1-7-8-3-2-4-8-7-9"
>>> string_list.replace('-', '')
'597178324879'
If you want the numbers as a list (of strings), use .split():
>>> string_list.split('-')
['5', '9', '7', '1', '7', '8', '3', '2', '4', '8', '7', '9']
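For context, the empty strings in the question come from iterating over the string one character at a time, and the filter() calls there build a new object rather than changing cijferlijst in place. A short sketch of both points, with integer conversion added as an optional extra step:

```python
invoer = "5-9-7-1-7-8-3-2-4-8-7-9"

# Iterating over a string yields single characters, so each '-' is
# visited on its own and '-'.strip('-') leaves an empty string behind.
per_character = [cijfer.strip('-') for cijfer in invoer]
print(per_character[:3])  # ['5', '', '9']

# Splitting on '-' never produces those empty entries.
cijferlijst = [int(cijfer) for cijfer in invoer.split('-')]
print(cijferlijst)  # [5, 9, 7, 1, 7, 8, 3, 2, 4, 8, 7, 9]
```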
This looks a lot like the following question:
Python: Removing spaces from list objects
The answer there is to use strip instead of replace. Have you tried:
abc = [x.strip(' ') for x in cijferlijst]

How to remove whitespace from the string of list

I would like to remove the whitespace from the strings in a list, as follows:
original = ['16', '0000D1AE18', '1', '1', '1', 'S O S .jpg', '0']
After removing the whitespace:
['16', '0000D1AE18', '1', '1', '1', 'SOS.jpg', '0']
Use str.translate() on each element in a list comprehension. In Python 3, build the deletion table with str.maketrans():
remove_spaces = str.maketrans('', '', ' ')
[v.translate(remove_spaces) for v in original]
Here the two empty strings mean that no characters are replaced with other characters, and the third argument lists the characters to remove altogether. (In Python 2 the equivalent call was v.translate(None, ' ').) This produces a new list to replace the original.
The above removes only spaces. To remove all whitespace (newlines, tabs, form feeds, etc.), simply expand the set of characters to remove:
remove_whitespace = str.maketrans('', '', ' \t\r\n\f\x0b')
[v.translate(remove_whitespace) for v in original]
str.translate() is a fast option for removing several different characters from text in one pass.
Demo:
>>> original = ['16', '0000D1AE18', '1', '1', '1', 'S O S .jpg', '0']
>>> remove_whitespace = str.maketrans('', '', ' \t\r\n\f\x0b')
>>> [v.translate(remove_whitespace) for v in original]
['16', '0000D1AE18', '1', '1', '1', 'SOS.jpg', '0']
If you want to remove any whitespace (i.e. space, tab, CR and newline), use this:
import re
without_spaces = [re.sub(r'\s+', '', item) for item in original]
If you only need to remove regular spaces, use str.replace():
without_spaces = [item.replace(' ', '') for item in original]
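Another option that needs no import is to split each string on whitespace and join the pieces again. This is a minimal sketch; the variable name without_whitespace is illustrative.

```python
original = ['16', '0000D1AE18', '1', '1', '1', 'S O S .jpg', '0']

# str.split() with no arguments splits on any run of whitespace and drops
# empty pieces, so joining the pieces removes all whitespace.
without_whitespace = [''.join(item.split()) for item in original]
print(without_whitespace)
# ['16', '0000D1AE18', '1', '1', '1', 'SOS.jpg', '0']
```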
You can use
k=[]
for i in original:
    j = i.replace(' ', '')
    k.append(j)
