Get inside of square brackets in Python

I have this string:
"ascascascasc[xx]asdasdasdasd[yy]qweqweqwe"
I want to get the strings inside the brackets, like this:
"xx", "yy"
I have tried this, but it did not work:
a = "ascascascasc[xx]asdasdasdasd[yy]qweqweqwe"
listinside = []
for i in range(a.count("[")):
    listinside.append(a[a.index("["):a.index("]")])
print(listinside)
Output:
['[xx', '[xx']

You don't need count; a regex can do it. re.findall() returns every match:
>>> s="ascascascasc[xx]asdasdasdasd[yy]qweqweqwe"
>>> import re
>>> re.findall(r'\[(.*?)\]',s)
['xx', 'yy']
\[ matches the character [ literally
(.*?) captures any characters, between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\] matches the character ] literally
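For comparison, the loop in the question returns '[xx' twice because a.index() always searches from the start of the string, and the slice starts at the opening bracket itself. A minimal non-regex sketch, assuming the brackets are not nested (the helper name bracket_contents is made up for illustration):

```python
def bracket_contents(text):
    """Collect the substrings found between each '[' and the next ']'."""
    found = []
    pos = 0
    while True:
        start = text.find("[", pos)       # search from where we left off
        if start == -1:
            break
        end = text.find("]", start + 1)
        if end == -1:                     # unmatched '[': stop quietly
            break
        found.append(text[start + 1:end])  # skip the '[' itself
        pos = end + 1
    return found


print(bracket_contents("ascascascasc[xx]asdasdasdasd[yy]qweqweqwe"))
```

The key difference from the original attempt is the pos argument, which moves the search past the bracket pair that was just handled.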

Related

Regex to ignore data between brackets

I remove the characters ", { and } from a string using the code below:
import re
s = "\":{},"
print(s)
print(re.sub(r'\"|{|}', "", s))
prints:
":{},
:,
which is expected.
I'm attempting to modify the regex to ignore everything between open and closed brackets. So for the string "\":{},[test,test2]" just :,[test,test2] should be returned.
How can I modify the regex so that it is not applied to data between [ and ]?
I tried using:
s = "\":{},[test1, test2]"
print(s)
print(re.sub(r'[^a-zA-Z {}]+\"|{|}' , "",s))
(src: How to let regex ignore everything between brackets?)
None of the , characters are replaced.

Assuming your brackets are balanced/unescaped, you may use this regex with a negative lookahead to assert that matched character is not inside [...]:
>>> import re
>>> s = "\":{},[test1,test2]"
>>> print (re.sub(r'[{}",](?![^[]*\])', '', s))
:[test1,test2]
RegEx details:
[{}",]: Match one of the characters {, }, " or ,
(?![^[]*\]): Negative lookahead asserting that no ] lies ahead without a [ in between; in other words, the matched character is not inside [...]
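To see the lookahead at work on more than one bracket group, here is a small sketch with the same pattern written in verbose form. The second input string and the name OUTSIDE_BRACKETS are made up for illustration, and balanced, unnested brackets are assumed as above:

```python
import re

# Same pattern as above, spelled out with comments (re.VERBOSE ignores
# the layout whitespace outside character classes).
OUTSIDE_BRACKETS = re.compile(r'''
    [{}",]          # one of the characters to remove
    (?![^\[]*\])    # ...unless a ] comes before the next [
''', re.VERBOSE)

for sample in ('":{},[test1,test2]', '{"a":1},[x,y],{b},[z]'):
    print(OUTSIDE_BRACKETS.sub('', sample))
```

The commas inside [x,y] survive because a ] follows them before any [ does; every other listed character is removed.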

If you want to remove every {, }, comma and " that is not inside square brackets, you can use
re.sub(r'(\[[^][]*])|[{}",]', r'\1', s)
Note that you can add more characters to the character set [{}",]. If you need to add a hyphen, make sure it is the last character in the set. Escape \, escape ] unless it comes first (right after the opening [), and escape ^ if it comes first (right after the opening [).
Details:
(\[[^][]*]) - Capturing group 1: a [...] substring
| - or
[{}",] - a {, }, , or " char.
See a Python demo using your sample input:
import re
s = "\":{},[test1, test2]"
print( re.sub(r'(\[[^][]*])|[{}",]', r'\1', s) )
## => :[test1, test2]
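The same "match what you want to keep, then put it back" idea can also be written with a replacement function instead of \1. This is only a sketch; the function name drop_outside_brackets is hypothetical:

```python
import re

def drop_outside_brackets(text):
    """Remove {, }, comma and double quote unless inside [...]."""
    def replace(match):
        token = match.group(0)
        # A whole [...] group is returned unchanged; a single listed
        # character is replaced with nothing.
        return token if token.startswith('[') else ''
    return re.sub(r'\[[^][]*]|[{}",]', replace, text)


print(drop_outside_brackets('":{},[test1, test2]'))
```

A function makes it easier to add other rules later, at the cost of being a little slower than a plain replacement string.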

How to match and replace this pattern in Python RE?

s = "[abc]abx[abc]b"
s = re.sub(r"\[([^\]]*)\]a", "ABC", s)
'ABCbx[abc]b'
In the string s, I want to match 'abc' when it is enclosed in [] and followed by an 'a'. So in that string, the first [abc] should be replaced and the second should not.
The pattern I wrote above matches:
a '[', followed by any number of characters that are not ']', then a ']', followed by the character 'a'.
However, after the replacement I want the string to look like this:
[ABC]abx[abc]b   // NOT ABCbx[abc]b
Namely, I don't want the whole match to be replaced, only the text within the brackets []. How can I achieve that?
match.group(1) returns the content in []. But how can I take advantage of this in re.sub?

Why not simply include [ and ] in the substitution?
s = re.sub(r"\[([^\]]*)\]a", "[ABC]a", s)

There is more than one method; one of them is exploiting groups.
import re
s = "[abc]abx[abc]b"
out = re.sub(r'(\[)([^\]]*)(\]a)', r'\1ABC\3', s)
print(out)
Output:
[ABC]abx[abc]b
Note that there are 3 groups (enclosed in parentheses) in the first argument of re.sub. I refer to the 1st and 3rd (group indexing starts at 1) so they remain unchanged, and put ABC in place of the 2nd group. The second argument of re.sub is a raw string, so I do not need to escape the \.
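On the question of how to use match.group(1) inside re.sub: the replacement may be a function that receives the match object. A minimal sketch, assuming the goal is to upper-case whatever is inside the brackets (the helper name upper_in_brackets is made up for illustration):

```python
import re

def upper_in_brackets(match):
    # group(1) is the text between [ and ]; the brackets and the
    # trailing 'a' are re-added so only the inner text changes.
    return '[' + match.group(1).upper() + ']a'


s = "[abc]abx[abc]b"
out = re.sub(r'\[([^\]]*)\]a', upper_in_brackets, s)
print(out)
```

This is handy when the replacement depends on the matched text rather than being a fixed string such as ABC.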

This regex uses lookarounds for the prefix/suffix assertions, so that the match text itself is only "abc":
(?<=\[)[^]]*(?=\]a)
Example: https://regex101.com/r/NDlhZf/1
So that's:
(?<=\[) - positive look-behind, asserting that a literal [ is directly before the start of the match
[^]]* - any number of non-] characters (the actual match)
(?=\]a) - positive look-ahead, asserting that the text ]a directly follows the match text.
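Plugged into re.sub, the lookaround pattern could be used like this (a sketch using the sample string from the question):

```python
import re

s = "[abc]abx[abc]b"
# Only the text between [ and ]a is part of the match, so only that
# text is replaced; the brackets and the 'a' stay untouched.
inner = re.compile(r'(?<=\[)[^]]*(?=\]a)')
result = inner.sub("ABC", s)
print(result)
```

Because the lookarounds consume nothing, no groups or back-references are needed in the replacement.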

About how to find all desired format in a str

I have text in this format:
s = '[aaa]foo[bbb]bar[ccc]foobar'
The actual text is a Chinese car review like this:
【最满意】整车都很满意，最满意就是性价比，...【空间】空间真的超乎想象，毫不夸张，...【内饰】内饰还可以吧，没有多少可以说的...
Now I want to split it into these parts:
[aaa]foo
[bbb]bar
[ccc]foobar
First I tried
>>> re.findall(r'\[.*?\].*?',s)
['[aaa]', '[bbb]', '[ccc]']
That only got the first half.
Then I tried
>>> re.findall(r'(\[.*?\].*?)\[?',s)
['[aaa]', '[bbb]', '[ccc]']
That still only got the first half.
In the end I had to get the two parts separately and then zip them:
>>> re.findall(r'\[.*?\]',s)
['[aaa]', '[bbb]', '[ccc]']
>>> re.split(r'\[.*?\]',s)
['', 'foo', 'bar', 'foobar']
>>> for t in zip(re.findall(r'\[.*?\]',s),[e for e in re.split(r'\[.*?\]',s) if e]):
...     print(''.join(t))
...
[aaa]foo
[bbb]bar
[ccc]foobar
So I want to know whether some regex could split it into these parts directly.

One of the approaches:
import re
s = '[aaa]foo[bbb]bar[ccc]foobar'
result = re.findall(r'\[[^]]+\][^\[\]]+', s)
print(result)
The output:
['[aaa]foo', '[bbb]bar', '[ccc]foobar']
\[ or \] - matches the bracket literally
[^]]+ - matches one or more characters except ]
[^\[\]]+ - matches any character(s) except brackets \[\]
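Since the real text uses the full-width brackets 【 and 】 and contains punctuation, the same "bracket part, then everything up to the next bracket" idea might be adapted as follows. This is a sketch on a shortened version of the sample review, assuming the brackets never nest:

```python
import re

# Shortened sample in the same shape as the review text in the question.
text = '【最满意】整车都很满意，最满意就是性价比【空间】空间真的超乎想象【内饰】内饰还可以吧'

# 【...】 followed by everything that is not a full-width bracket.
sections = re.findall(r'【[^】]+】[^【】]+', text)
for section in sections:
    print(section)
```

Using negated character classes instead of \w keeps punctuation such as ， inside each section.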

I think this could work:
r'\[.+?\]\w+'

Here it is:
>>> re.findall(r"(\[\w*\]\w+)",s)
['[aaa]foo', '[bbb]bar', '[ccc]foobar']
Explanation:
The parentheses mark the group to search for. That group:
should start with a bracket \[ followed by some word characters \w*
then the matching closing bracket \] followed by more word characters \w+
Notice that you have to escape the brackets with \.

I think that if the input string format is "strict enough", it's possible to try something without regexps. It may look like a micro-optimisation, but it could be interesting as a challenge.
result = map(lambda x: '[' + x, s[1:].split("["))
So I checked the performance over 1 million iterations; here are my results (in seconds):
result = map(lambda x: '[' + x, s[1:].split("[")) # 0.89862203598
result = re.findall(r'\[[^]]+\][^\[\]]+', s) # 1.48306798935
result = re.findall(r'\[.+?\]\w+', s) # 1.47224497795
result = re.findall(r'(\[\w*\]\w+)', s) # 1.47370815277
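If you want to repeat the measurement yourself, a rough sketch with timeit might look like the following. Note that in Python 3 map() returns a lazy iterator, so a bare map(...) call does almost no work; the sketch uses a list comprehension so that both variants build a full list. The function names are made up, and the numbers will differ by machine and Python version:

```python
import re
import timeit

s = '[aaa]foo[bbb]bar[ccc]foobar'

def split_plain(text):
    # Assumes the text starts with '[' and that '[' only ever
    # opens a new section.
    return ['[' + part for part in text[1:].split('[')]

def split_regex(text):
    return re.findall(r'\[[^]]+\][^\[\]]+', text)

# Both variants should agree before their speed is compared.
assert split_plain(s) == split_regex(s)

for func in (split_plain, split_regex):
    seconds = timeit.timeit(lambda: func(s), number=100_000)
    print(f'{func.__name__}: {seconds:.3f} s')
```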

\[.*?\][a-zA-Z]*
This regex should capture anything that starts with [something here] followed by any letters from a to z, in either case.
You can try out different patterns on regex101; it makes it easy to build your own regex.

All you need is findall and a very simple pattern:
import re
print(re.findall(r'\[\w+\]\w+','[aaa]foo[bbb]bar[ccc]foobar'))
output:
['[aaa]foo', '[bbb]bar', '[ccc]foobar']
Detailed solution:
import re
string_1='[aaa]foo[bbb]bar[ccc]foobar'
pattern=r'\[\w+\]\w+'
print(re.findall(pattern,string_1))
explanation:
\[\w+\]\w+
\[ matches the character [ literally (case sensitive)
\w matches any word character ([a-zA-Z0-9_]; with Python 3 str patterns it also matches other Unicode letters and digits)
+ quantifier: matches between one and unlimited times, as many times as possible, giving back as needed

python split a string by comma not inside matrix expression

I want to split a string on commas that are not inside a matrix expression.
For example:
input:
value = 'MA[1,2],MA[1,3],der(x),x,y'
expected output:
['MA[1,2]','MA[1,3]','der(x)','x','y']
I tried value.split(','), but it also splits inside []. I also tried a regular expression to extract the text inside []:
import re
re.split(r'\[(.*?)\]', value)
I am not good at regular expressions. Any suggestions would be helpful.

You can use a negative lookbehind:
>>> import re
>>> value1 = 'MA[1,2],MA[1,3],der(x),x,y'
>>> value2 = 'M[a,b],x1,M[1,2],der(x),y1,y2,der(a,b)'
>>> pat = re.compile(r'(?<![[()][\d\w]),')
>>> pat.split(value1)
['MA[1,2]', 'MA[1,3]', 'der(x)', 'x', 'y']
>>> pat.split(value2)
['M[a,b]', 'x1', 'M[1,2]', 'der(x)', 'y1', 'y2', 'der(a,b)']
Explanation:
(?<![[()][\d\w]),
(?<![[()][\d\w]) negative lookbehind: asserts that the text just before the current position does not match the pattern below
    [[()] matches a single character from the list: [ or (
    [\d\w] matches a single character from the list below
        \d matches a digit [0-9]
        \w matches any word character [a-zA-Z0-9_]
, matches the character , literally
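The lookbehind only checks the two characters before each comma, so it appears to rely on the first element inside the brackets being a single character (something like MA[10,20] would still be split). As an alternative sketch without regex, you could track the bracket depth; the function name split_top_level is hypothetical and balanced brackets are assumed:

```python
def split_top_level(text, sep=','):
    """Split on sep, but only where no [ or ( is currently open."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in '[(':
            depth += 1
        elif ch in '])':
            depth -= 1
        if ch == sep and depth == 0:
            parts.append(''.join(current))  # close the current piece
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))          # last piece after the final sep
    return parts


print(split_top_level('MA[1,2],MA[1,3],der(x),x,y'))
print(split_top_level('MA[10,20],f(a,b,c),z'))
```

This also copes with nested expressions such as f(g(a,b),c), which a fixed-width lookbehind cannot express.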

python: remove the farthest left instance matching regex

I have a string like
xp = /dir/dir/dir[2]/dir/dir[5]/dir
I want
xp = /dir/dir/dir[2]/dir/dir/dir
xp.replace(r'\[([^]]*)\]', '') removes all the square brackets; I just want to remove the last one (the one farthest to the right).
It should also completely ignore square brackets containing not(random_number_of_characters).
For example, /dir/dir/dir[2]/dir/dir[5]/dir[1][not(random_number_of_characters)]
should yield /dir/dir/dir[2]/dir/dir[5]/dir[not(random_number_of_characters)]
and /dir/dir/dir[2]/dir/dir[5]/dir[not(random_number_of_characters)]
should yield /dir/dir/dir[2]/dir/dir/dir[not(random_number_of_characters)]

Make the first group greedy and replace the match with the captured groups.
(.*)\[[^]]*\](.*)
(.*) is the greedy group; \[[^]]*\] then matches the last bracket pair, from [ to ].
Replacement: \1\2 in Python ($1$2 in some other regex flavours)
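sample code:
import re
p = re.compile(r'(.*)\[[^]]*\](.*)')
test_str = "xp = /dir/dir/dir[2]/dir/dir[5]/dir"
subst = r"\1\2"  # Python's re module uses \1, not $1, for group references
result = p.sub(subst, test_str)
print(result)  # xp = /dir/dir/dir[2]/dir/dir/dir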

This code removes the last pair of square brackets:
>>> import re
>>> xp = "/dir/dir/dir[2]/dir/dir[5]/dir"
>>> m = re.sub(r'\[[^\]]*\](?=[^\[\]]*$)', r'', xp)
>>> m
'/dir/dir/dir[2]/dir/dir/dir'
The lookahead checks that the square brackets are followed only by characters other than [ and ], zero or more times, up to the end of the line, so only the last [...] pair matches. Replacing the match with an empty string then removes that last pair entirely.
UPDATE:
You could also try the regex below:
\[[^\]]*\](?=(?:[^\[\]]*\[not\(.*?\)\]$))
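Both cases from the question (with and without a trailing [not(...)] part) could be folded into one pattern. The following is only a sketch, not the pattern from the answer above: it assumes the text inside not(...) contains no ], and the names LAST_INDEX and strip_last_index are made up for illustration:

```python
import re

LAST_INDEX = re.compile(r'''
    \[(?!not\()[^\]]*\]       # a [...] group that is not a [not(...)] group
    (?=                       # followed, up to the end of the string, by
        (?: [^\[\]]           #   non-bracket characters
          | \[not\([^\]]*\)\] #   or whole [not(...)] groups
        )*$
    )
''', re.VERBOSE)

def strip_last_index(xp):
    """Remove the rightmost [...] group, skipping [not(...)] groups."""
    return LAST_INDEX.sub('', xp)


for xp in ('/dir/dir/dir[2]/dir/dir[5]/dir',
           '/dir/dir/dir[2]/dir/dir[5]/dir[1][not(random_number_of_characters)]',
           '/dir/dir/dir[2]/dir/dir[5]/dir[not(random_number_of_characters)]'):
    print(strip_last_index(xp))
```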
