I've got a list here that represents a line in a file after I split it:
['[0.111,', '-0.222]', '1', '2', '3']
and I'm trying to trim off the "[" and the "," in the first element and the "]" in the second element. How would I do that? I've started my thought process here, but this code doesn't work:
for line in file:
    line = line.split()
    line[0] = line[1:-1]
    line[1] = line[0:-1]
    print(line2)
You can use re.sub:
from re import sub
s = '[0.111, -0.222] 1 2 3'
s = sub(r'[\[\]]', '', s)  # remove every '[' and ']'
print(s.split())
Output:
['0.111,', '-0.222', '1', '2', '3']
If you would also like to remove the comma, add it to the character class:
from re import sub
s = '[0.111, -0.222] 1 2 3'
s = sub(r'[\[\],]', '', s)  # remove every '[', ']' and ','
print(s.split())
Output:
['0.111', '-0.222', '1', '2', '3']
You can use replace to remove the brackets:
lst = ['[0.111,', '-0.222]', '1', '2', '3']
lst2 = [x.replace('[','').replace(']','') for x in lst]
print(lst2)
Output:
['0.111,', '-0.222', '1', '2', '3']
You can also be more specific:
lst2 = [x[1:] if x[0] == '[' else x[:-1] if x[-1] == ']' else x for x in lst]
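The loop in the question does not work because `line[1:-1]` slices the list itself rather than the string stored in `line[0]`, and `line2` is never defined. Below is a minimal corrected sketch. It uses `io.StringIO` as a stand-in for the real file, which is an assumption made only so the example runs on its own:

```python
import io

# Hypothetical stand-in for the opened file; each line looks like the example
file = io.StringIO("[0.111, -0.222] 1 2 3\n")

for line in file:
    line = line.split()
    # Slice the *strings* stored in the list, not the list itself
    line[0] = line[0][1:-1]  # '[0.111,' -> '0.111'
    line[1] = line[1][:-1]   # '-0.222]' -> '-0.222'
    print(line)              # ['0.111', '-0.222', '1', '2', '3']
```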
You could filter only the numeric part of each string with a function like:
def clean(strings):
    def onlyNumeric(c):
        return c.isdigit() or c == '-' or c == '.'
    return list(map(lambda s: "".join(filter(onlyNumeric, s)), strings))
This handles your example (and many other oddities):
>>> clean(['[0.111,', '-0.222]', '1', '2', '3'])
['0.111', '-0.222', '1', '2', '3']
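None of the answers above use `str.strip`, but it also works here. It removes any of the given characters from both ends of each element. The sketch below also converts the cleaned strings with `float`, which assumes every element is numeric once the brackets and comma are gone:

```python
lst = ['[0.111,', '-0.222]', '1', '2', '3']

# strip('[],') removes any leading/trailing '[', ']' or ',' characters
cleaned = [x.strip('[],') for x in lst]
print(cleaned)  # ['0.111', '-0.222', '1', '2', '3']

# Optional: convert to numbers once the punctuation is gone
numbers = [float(x) for x in cleaned]
print(numbers)  # [0.111, -0.222, 1.0, 2.0, 3.0]
```

`replace` removes a character wherever it occurs. `strip` only touches the ends of each string, so a comma in the middle of an element would survive.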
Related
I have an assignment for which my script should be able to receive a string as input (e.g. "c27bdj3jddj45g") and extract the numbers into a list (not just the digits; it should be able to detect full numbers).
I am not allowed to use regex at all, only simple methods like split, count and append.
Any ideas? (Using python)
Expected output for the example string I gave:
['27','3', '45']
Nothing I have tried so far is worth mentioning here, I am pretty lost on which approach to take here without re.findall, which I cannot use.
One way to solve it is to use groupby from the itertools module:
from itertools import groupby
s = 'c27bdj3jdj45g11'  # the last number is 11
ans = []
for k, g in groupby(s, lambda x: x.isdigit()):
    if k:  # True if this run of characters consists of digits
        ans.append(''.join(g))
print(ans)
Output:
['27', '3', '45', '11']
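The same `groupby` idea also fits in a single list comprehension. `str.isdigit` can be passed directly as the key function instead of a lambda. A short sketch:

```python
from itertools import groupby

s = 'c27bdj3jdj45g11'

# groupby yields (key, group) pairs for each run of consecutive characters
# that share the same isdigit() result; keep only the digit runs
numbers = [''.join(g) for is_digit, g in groupby(s, str.isdigit) if is_digit]
print(numbers)  # ['27', '3', '45', '11']
```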
A second solution uses regex. The OP has ruled it out, but it is included here for reference, to show how much easier this kind of puzzle is with the re module when there is no restriction:
import re
s = 'c27bdj3jddj45g'
re.findall(r'\d+', s)  # match runs of one or more digits
['27', '3', '45']
# or, to get integers:
list(map(int, re.findall(r'\d+', s)))
[27, 3, 45]
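You can do this with a for loop that collects consecutive digits. When you reach a non-digit character, append the collected digits (if any) to the result and reset the string. After the loop, append whatever is left over.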
s = 'g38ff11'
prv = ''
res = []
for c in s:
    if c.isdigit():
        prv += c
    else:
        if prv != '': res.append(prv)
        prv = ''
if prv != '': res.append(prv)
print(res)
Output:
['38', '11']
You can also write a lambda to check and append:
s = 'g38ff11'
prv = ''
res = []
append_dgt = lambda prv, res: res.append(prv) if prv != "" else None
for c in s:
    if c.isdigit():
        prv += c
    else:
        append_dgt(prv, res)
        prv = ''
append_dgt(prv, res)
print(res)
s = 'c27bdj3jddj45g'
lst = []
for x in s:
    if x.isdigit():
        lst.append(x)
    else:
        lst.append('$')  # '$' is a placeholder so that consecutive digits stay together
Now, lst becomes:
['$', '2', '7', '$', '$', '$', '3', '$', '$', '$', '$', '4', '5', '$']
''.join(lst).split('$') becomes:
['', '27', '', '', '3', '', '', '', '45', '']
Finally, use a list comprehension to extract the numbers:
[x for x in ''.join(lst).split('$') if x.isdigit()]
['27', '3', '45']
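The placeholder idea above can be shortened by using a space as the placeholder. `str.split()` with no argument treats any run of whitespace as a single separator and drops empty strings, so no final filtering step is needed. It still uses only simple string methods. A sketch:

```python
s = 'c27bdj3jddj45g'

# Replace every non-digit character with a space...
spaced = ''.join(c if c.isdigit() else ' ' for c in s)
# ...then split on runs of whitespace; empty strings are discarded
numbers = spaced.split()
print(numbers)  # ['27', '3', '45']
```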
string = 'c27bdj3jddj45g'
lst = []
for i in string:
    if i.isdigit():
        lst.append(i)
    else:
        lst.append('$')
print([int(i) for i in ''.join(lst).split('$') if i.isdigit()])
I am trying to convert a string like 'a3b4' to 'aaabbbb'.
How can this be done without additional modules? So far my code looks like this:
s = 'a3b4'
n = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
a = []
b = []
for i in range(len(s)):
    if s[i] in n:
        a.append(s[i])
    if s[i] not in n:
        b.append(s[i])
for i in range(len(b)):
    print(b[i])
This works when every letter is followed by a single digit:
letters = list(s[::2])
nums = list(s[1::2])
res = ''.join([a*int(b) for a, b in zip(letters, nums)])
>>> res
'aaabbbb'
EDIT:
If you want to match any run of lowercase letters followed by any number of digits, you should use regex:
import re
letters = re.findall(r'[a-z]+', s)
nums = re.findall(r'\d+', s)
res = ''.join([a*int(b) for a, b in zip(letters, nums)])
For:
s = 'a10b3'
the output is:
>>> res
'aaaaaaaaaabbb'
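The slicing version above only works when every letter is followed by a single digit. The regex version needs the `re` module, which the question hoped to avoid. A plain loop can handle multi-digit counts such as 'a10b3' without any imports. The sketch below assumes that each non-digit character is followed by its count. It also keeps a character that has no count once, which is a further assumption of this sketch. `expand_counts` is a hypothetical name:

```python
def expand_counts(s):
    """Expand strings like 'a3b4' into 'aaabbbb' without importing anything.

    Assumes each non-digit character is followed by its repeat count,
    which may have several digits (e.g. 'a10b3').
    """
    result = []
    letter = ''  # the character currently being repeated
    count = ''   # its repeat count, built up one digit at a time
    for c in s:
        if c.isdigit():
            count += c
        else:
            if letter:
                # Assumption: a character with no count is kept once
                result.append(letter * int(count or '1'))
            letter = c
            count = ''
    if letter:
        result.append(letter * int(count or '1'))
    return ''.join(result)


print(expand_counts('a3b4'))   # aaabbbb
print(expand_counts('a10b3'))  # ten a's followed by three b's
```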
I currently have a group of strings that look like this:
[58729 58708]
[58729]
[58708]
[58729]
I need to turn them into a list, but when I use list(), I get:
['[', '5', '8', '7', '2', '9', ']']
['[', '5', '8', '7', '0', '8', ']']
['[', '5', '8', '7', '2', '9', ']']
['[', '5', '8', '7', '2', '9', ' ', '5', '8', '7', '0', '8', ']']
How do I group them so that they don't get separated out into individual characters? So, something like this:
['58729', '58708']
['58729']
['58708']
['58729']
Let's say your input string is assigned to a variable foo.
foo = '[58729 58708]'
First, you want to use list slicing to get rid of the brackets at the start and end of the string:
foo = foo[1:-1]
Now you can use the string method split() to turn the string into a list. The argument to split() is the separator at which the string is split; in your case, that is a single space character:
foo.split(' ')
This returns:
['58729', '58708']
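The same idea can be applied to all of the strings at once. `str.strip('[]')` removes the brackets from both ends. `split()` with no argument splits on any run of whitespace, which is slightly more forgiving than `split(' ')` if a line ever contains two spaces in a row. The sketch assumes the strings arrive as lines of a single block of text:

```python
data = """
[58729 58708]
[58729]
[58708]
[58729]
"""

result = [
    line.strip().strip('[]').split()  # drop whitespace, then brackets, then split
    for line in data.splitlines()
    if line.strip()                   # skip blank lines
]
for lst in result:
    print(lst)
```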
You can use regex to extract the values between the square brackets, then split the values into a list.
The code:
import re
s = '[58729 58708]'
result = re.search(r'\[(.*)\]', s).group(1).split()
print(result)
print(type(result))
The result:
['58729', '58708']
<class 'list'>
In my opinion, the most robust approach is to combine a regex with a small parser (this uses the third-party parsimonious package):
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor
import re

data = """
[58729 58708]
[58729]
[58708]
[58729]
"""

# outer expression
rx = re.compile(r'\[[^\[\]]+\]')

# nodevisitor class
class StringVisitor(NodeVisitor):
    grammar = Grammar(
        r"""
        list = lpar content+ rpar
        content = item ws?
        item = ~"[^\[\]\s]+"
        ws = ~"\s+"
        lpar = "["
        rpar = "]"
        """
    )

    def generic_visit(self, node, visited_children):
        return visited_children or node

    def visit_content(self, node, visited_children):
        item, _ = visited_children
        return item.text

    def visit_list(self, node, visited_children):
        _, content, _ = visited_children
        return [item for item in content]

sv = StringVisitor()
for lst in rx.finditer(data):
    real_list = sv.parse(lst.group(0))
    print(real_list)
This yields:
['58729', '58708']
['58729']
['58708']
['58729']
Example using the ast module:
import ast
data_str = '[58729 58708]'
data_str = data_str.replace(' ', ',')  # make it '[58729,58708]'
x = ast.literal_eval(data_str)
print(x)        # [58729, 58708]
print(x[0])     # 58729
print(type(x))  # <class 'list'>
# and if you want exactly a list of strings:
print([str(s) for s in x])  # ['58729', '58708']
I can't remove the whitespace from my list.
invoer = "5-9-7-1-7-8-3-2-4-8-7-9"
cijferlijst = []
for cijfer in invoer:
    cijferlijst.append(cijfer.strip('-'))
I already made a list from my string and separated everything, but each "-" is now an empty string "". I tried the following, but it doesn't work:
filter(lambda x: x.strip(), cijferlijst)
filter(str.strip, cijferlijst)
filter(None, cijferlijst)
abc = [x.replace(' ', '') for x in cijferlijst]
Try this:
>>> ''.join(invoer.split('-'))
'597178324879'
If you want the digits as a single string without the dashes, use .replace():
>>> string_list = "5-9-7-1-7-8-3-2-4-8-7-9"
>>> string_list.replace('-', '')
'597178324879'
If you want the numbers as a list of strings, use .split():
>>> string_list.split('-')
['5', '9', '7', '1', '7', '8', '3', '2', '4', '8', '7', '9']
This looks a lot like the following question:
Python: Removing spaces from list objects
The answer there is to use strip instead of replace. Have you tried
abc = [x.strip(' ') for x in cijferlijst]
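The three `filter` calls in the question probably seem to do nothing because, in Python 3, `filter` returns a lazy iterator and none of those calls stores its result. Storing the result and converting it to a list removes the empty strings. A list comprehension does the same job. A sketch (`non_empty` and `non_empty_lc` are illustrative names):

```python
invoer = "5-9-7-1-7-8-3-2-4-8-7-9"

# Rebuild the list from the question: each '-' becomes an empty string ''
cijferlijst = [cijfer.strip('-') for cijfer in invoer]

# filter(None, ...) keeps only truthy items; list() consumes the iterator
non_empty = list(filter(None, cijferlijst))

# Equivalent list comprehension: an empty string is falsy
non_empty_lc = [c for c in cijferlijst if c]

print(non_empty)  # ['5', '9', '7', '1', '7', '8', '3', '2', '4', '8', '7', '9']
```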
I'm looking for a way to expand numbers that are separated by slashes. In addition to the slashes, parentheses may be used around some (or all) numbers to indicate a "group" which may be repeated (by the number of times directly following the parentheses) or repeated in reverse (followed by 's' as shown in the second set of examples). Some examples are:
1 -> ['1'] -> No slashes, no parentheses
1/2/3/4 -> ['1', '2', '3', '4'] -> basic example with slashes
1/(2)4/3 -> ['1', '2', '2', '2', '2', '3'] -> 2 gets repeated 4 times
1/(2/3)2/4 -> ['1', '2', '3', '2', '3', '4'] -> 2/3 is repeated 2 times
(1/2/3)2 -> ['1', '2', '3', '1', '2', '3'] -> Entire sequence is repeated twice
(1/2/3)s -> ['1', '2', '3', '3', '2', '1'] -> Entire sequence is repeated in reverse
1/(2/3)s/4 -> ['1', '2', '3', '3', '2', '4'] -> 2/3 is repeated in reverse
In the most general case, there could even be nested parentheses, which I know generally make the use of regex impossible. In the current set of data I need to process, there are no nested sets like this, but I could see potential use for it in the future. For example:
1/(2/(3)2/4)s/5 -> 1/(2/3/3/4)s/5
-> 1/2/3/3/4/4/3/3/2/5
-> ['1', '2', '3', '3', '4', '4', '3', '3', '2', '5']
I know, of course, that regex cannot do all of this (especially the repeating/reversing sets of parentheses). But if I can get a regex that at least separates the strings in parentheses from those not in parentheses, then I could probably write a loop pretty easily to take care of the rest. So, the regex I'd be looking for would do something like:
1 -> ['1']
1/2/3/4 -> ['1', '2', '3', '4']
1/(2)4/3 -> ['1', '(2)4', '3']
1/(2/3)2/4 -> ['1', '(2/3)2', '4']
1/(2/(3)2/4)s/5 -> ['1', '(2/(3)2/4)s', '5']
And then I could loop on this result and continue expanding any parentheses until I have only digits.
EDIT
I wasn't totally clear in my original post. In my attempt to make the examples as simple as possible, I perhaps oversimplified them. This needs to work for numbers >= 10 as well as negative numbers.
For example:
1/(15/-23)s/4 -> ['1', '(15/-23)s', '4']
-> ['1', '15', '-23', '-23', '15', '4']
Since you are dealing with nested parentheses, regex can't help you much here. It cannot easily convert the string to the list you want at the end.
You are better off parsing the string yourself. You can try the following code to get the result you asked for at the end of your question:
Parsing the string into a list without losing the parentheses:
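def parse(s):
    li = []
    open = 0           # current parenthesis nesting depth
    closed = False     # True right after a top-level group has closed
    start_index = -1   # index of the '(' that opened the current top-level group
    number = ''        # characters of a top-level number such as '15' or '-23'
    for index, c in enumerate(s):
        if c == '(':
            if open == 0:
                start_index = index
            open += 1
        elif c == ')':
            open -= 1
            if open == 0:
                closed = True
        elif closed:
            # the character right after ')' is the repeat count or 's'
            li.append(s[start_index: index + 1])
            closed = False
        elif open == 0:
            if c == '/':
                if number:
                    li.append(number)
                number = ''
            else:
                number += c
    if number:
        li.append(number)
    return li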
This will give you for the string '1/(2/(3)2/4)s/5' the following list:
['1', '(2/(3)2/4)s', '5']
and for the string '1/(15/-23)s/4', as per your changed requirement, this gives:
['1', '(15/-23)s', '4']
Next, you need to break the parenthesized strings down further to get the individual list elements.
Expanding the strings with parentheses into individual list elements:
Here you can use a regex, by dealing with only the innermost parentheses at a time:
import re

def expand(s):
    ''' Group 1 contains the string inside the parentheses
        Group 2 contains the digit or character `s` after the closing parenthesis
    '''
    match = re.search(r'\(([^()]*)\)(\d|s)', s)
    if match:
        group0 = match.group()
        group1 = match.group(1)
        group2 = match.group(2)
        if group2.isdigit():
            # A digit after the closing parenthesis. Repeat the string inside
            s = s.replace(group0, ((group1 + '/') * int(group2))[:-1])
        else:
            # 's' after the closing parenthesis: the sequence, then its reverse
            s = s.replace(group0, '/'.join(group1.split('/') + group1.split('/')[::-1]))
        if '(' in s:
            return expand(s)
    return s
import itertools

li = parse('1/(15/-23)s/4')
for index, s in enumerate(li):
    if '(' in s:
        s = expand(s)
    li[index] = s.split('/')

print(list(itertools.chain(*li)))
This will give you the required result:
['1', '15', '-23', '-23', '15', '4']
The code above iterates over the list generated by parse(), and for each element it recursively expands the innermost parentheses.
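The asker's current data has no nested groups. In that case the splitting step can be done with a single `re.findall` instead of `parse()`. The sketch below relies on that no-nesting assumption. The first alternative matches a parenthesized group plus its repeat count or `s`, and the second matches a possibly negative integer. With nested parentheses it returns wrong results, so the recursive expand() above is still needed in general. `split_flat` is a hypothetical name:

```python
import re

# A group with no nested parentheses followed by a repeat count or 's',
# OR a (possibly negative) integer outside any group
TOKEN = re.compile(r'\([^()]*\)(?:\d+|s)|-?\d+')

def split_flat(s):
    """Split a non-nested expression into its top-level items."""
    return TOKEN.findall(s)

print(split_flat('1/(2/3)2/4'))     # ['1', '(2/3)2', '4']
print(split_flat('1/(15/-23)s/4'))  # ['1', '(15/-23)s', '4']
```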
Here is another way to do the top-level split. It keeps each parenthesized group, including its suffix, as a single item:
def expand(string):
    level = 0
    buffer = ""
    container = []
    for char in string:
        if char == "/":
            if level == 0:
                container.append(buffer)
                buffer = ""
            else:
                buffer += char
        elif char == "(":
            level += 1
            buffer += char
        elif char == ")":
            level -= 1
            buffer += char
        else:
            buffer += char
    if buffer != "":
        container.append(buffer)
    return container
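The `expand` function above only splits at top-level slashes. For '1/(2/(3)2/4)s/5' it returns ['1', '(2/(3)2/4)s', '5']. One way to finish the job is to apply the same split recursively to the inside of each group. The sketch below assumes well-formed input in which every group is followed by a repeat count or `s`. `split_top_level` and `expand_all` are hypothetical names:

```python
def split_top_level(string):
    """Split on '/' characters that are not inside any parentheses."""
    level = 0
    buffer = ''
    parts = []
    for char in string:
        if char == '/' and level == 0:
            parts.append(buffer)
            buffer = ''
            continue
        if char == '(':
            level += 1
        elif char == ')':
            level -= 1
        buffer += char
    if buffer:
        parts.append(buffer)
    return parts


def expand_all(string):
    """Recursively expand groups like '(2/3)2' and '(2/3)s' into a flat list."""
    result = []
    for part in split_top_level(string):
        if part.startswith('('):
            close = part.rindex(')')           # the group's own closing ')'
            inner = expand_all(part[1:close])  # expand nested groups first
            suffix = part[close + 1:]
            if suffix == 's':
                result.extend(inner + inner[::-1])  # sequence, then reversed
            else:
                result.extend(inner * int(suffix))  # repeat the sequence
        else:
            result.append(part)
    return result


print(expand_all('1/(2/(3)2/4)s/5'))
# ['1', '2', '3', '3', '4', '4', '3', '3', '2', '5']
print(expand_all('1/(15/-23)s/4'))
# ['1', '15', '-23', '-23', '15', '4']
```

Because numbers are collected character by character until the next top-level slash, multi-digit and negative numbers, as well as multi-digit repeat counts, come through intact.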
Regular expressions are completely the wrong tool for this job. There's a long, drawn-out explanation as to why regular expressions are not appropriate (if you want to know why, here's an online course). A simple recursive parser is easy enough to write that you'd probably be done with it well before you finished debugging your regular expression.
It's a slow day, so I took it upon myself to write one, complete with doctests.
def parse(s):
    """
    >>> parse('1')
    ['1']
    >>> parse('1/2/3/4')
    ['1', '2', '3', '4']
    >>> parse('1/(2)4/3')
    ['1', '2', '2', '2', '2', '3']
    >>> parse('1/(2/3)2/4')
    ['1', '2', '3', '2', '3', '4']
    >>> parse('(1/2/3)2')
    ['1', '2', '3', '1', '2', '3']
    >>> parse('1/(2/3)s/4')
    ['1', '2', '3', '3', '2', '4']
    >>> parse('(1/2/3)s')
    ['1', '2', '3', '3', '2', '1']
    >>> parse('1/(2/(3)2/4)s/5')
    ['1', '2', '3', '3', '4', '4', '3', '3', '2', '5']
    >>> parse('1/(15/-23)s/4')
    ['1', '15', '-23', '-23', '15', '4']
    """
    return _parse(list(s))

def _parse(chars):
    output = []
    while chars:
        c = chars.pop(0)
        if c == '/':
            continue
        elif c == '(':
            sub = _parse(chars)
            nextC = chars.pop(0)
            if nextC.isdigit():
                n = int(nextC)
                sub = n * sub
                output.extend(sub)
            elif nextC == 's':
                output.extend(sub)
                output.extend(reversed(sub))
        elif c == ')':
            return output
        else:
            # read the whole number (e.g. '15' or '-23'), not just one character
            num = c
            while chars and chars[0] not in '/()':
                num += chars.pop(0)
            output.append(num)
    return output

if __name__ == "__main__":
    import doctest
    doctest.testmod()