Replace commas enclosed in curly braces - python

I am trying to replace commas enclosed in curly braces with semicolons.
Sample string:
text = "a,b,{'c','d','e','f'},g,h"
I am aware that it comes down to lookbehinds and lookaheads, but somehow it won't work like I want it to:
substr = re.sub(r"(?<=\{)(.+?)(,)(?=.+\})",r"\1;", text)
It returns:
a,b,{'c';'d','e','f'},g,h
However, I am aiming for the following:
a,b,{'c';'d';'e';'f'},g,h
Any idea how I can achieve this?
Any help much appreciated :)

You can match the whole {...} block (with {[^{}]+}) and use a lambda to replace the commas inside it only:
import re
text = "a,b,{'c','d','e','f'},g,h"
print(re.sub(r"{[^{}]+}", lambda x: x.group(0).replace(",", ";"), text))
Output: a,b,{'c';'d';'e';'f'},g,h
By declaring lambda x we get access to each match object, and x.group(0) gives us the whole match value. Then all we need to do is replace each comma with a semicolon.
This regex does not handle nested braces, because Python's re module does not support recursive patterns. For a recursive pattern you need the PyPI regex module. Something like m = regex.sub(r"\{(?:[^{}]|(?R))*}", lambda x: x.group(0).replace(",", ";"), text) should work.
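To illustrate why nesting matters, here is a small check with the standard re module (semicolons_in_braces is just a hypothetical wrapper around the one-liner above). Because {[^{}]+} only matches a block that contains no other braces, on nested input only the innermost block is rewritten:
```python
import re
def semicolons_in_braces(text):
    # {[^{}]+} matches a {...} block with no braces inside it,
    # so with nesting only the innermost block is rewritten.
    return re.sub(r"{[^{}]+}", lambda m: m.group(0).replace(",", ";"), text)
print(semicolons_in_braces("a,b,{'c','d','e','f'},g,h"))  # a,b,{'c';'d';'e';'f'},g,h
print(semicolons_in_braces("a,{b,{c,d},e},f"))            # a,{b,{c;d},e},f
```
For nested input, either the recursive regex pattern or a stack-based approach is needed.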

Below I have posted a solution that does not rely on a regular expression. It uses a stack (list) to determine whether a character is inside curly brackets {}. Regular expressions are more elegant; however, they can be harder to modify when requirements change. Please note that the example below also works for nested brackets.
text = "a,b,{'c','d','e','f'},g,h"
output = ''
stack = []
for char in text:
    if char == '{':
        stack.append(char)
    elif char == '}':
        stack.pop()
    # Check if we are inside a curly bracket
    if len(stack) > 0 and char == ',':
        output += ';'
    else:
        output += char
print(output)
This gives:
a,b,{'c';'d';'e';'f'},g,h
You can also rewrite this with map if you use a global variable for the stack:
stack = []
def replace_comma_in_curly_brackets(char):
    if char == '{':
        stack.append(char)
    elif char == '}':
        stack.pop()
    # Check if we are inside a curly bracket
    if len(stack) > 0 and char == ',':
        return ';'
    return char
text = "a,b,{'c','d','e','f'},g,h"
print(''.join(map(str, map(replace_comma_in_curly_brackets, text))))
Regarding performance, when running the above two methods and the regular expression solution proposed by @stribizhev on the test string at the end of this post, I get the following timings:
Regular expression (@stribizhev): 0.38 seconds
Map function: 26.3 seconds
For loop: 251 seconds
This is the test string, which is 55,300,000 characters long:
text = "a,able,about,across,after,all,almost,{also,am,among,an,and,any,are,as,at,be,because},been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your" * 100000
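A minimal variant of the stack idea, sketched under the assumption that only the nesting depth matters: an integer counter replaces the list, characters are collected in a list and joined once, and a stray } is ignored instead of raising IndexError from stack.pop() on an empty list. replace_commas_in_braces is a hypothetical name, and this version has not been timed against the numbers above.
```python
def replace_commas_in_braces(text, old=",", new=";"):
    depth = 0   # number of '{' currently open
    out = []    # pieces of the result, joined once at the end
    for char in text:
        if char == "{":
            depth += 1
        elif char == "}" and depth > 0:
            depth -= 1  # a stray '}' is ignored rather than raising IndexError
        out.append(new if depth > 0 and char == old else char)
    return "".join(out)
print(replace_commas_in_braces("a,b,{'c','d','e','f'},g,h"))  # a,b,{'c';'d';'e';'f'},g,h
print(replace_commas_in_braces("a,{b,{c,d},e},f"))            # a,{b;{c;d};e},f
```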

If you don't have nested braces, it may be enough to look ahead at each , and check whether there is a closing } ahead without any opening { in between. Search for
,(?=[^{]*})
and replace with ;
, match a comma literally
(?=...) the lookahead to check
whether there is [^{]* any number of characters that are not {
followed by a closing curly brace }
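With Python's re module the lookahead approach is a one-liner. A minimal sketch; the second call shows what the "no nested braces" assumption means in practice, since with nesting the comma before the inner { is kept while the one after the inner block is replaced:
```python
import re
text = "a,b,{'c','d','e','f'},g,h"
# Replace a comma only if a '}' follows before any '{'.
print(re.sub(r",(?=[^{]*})", ";", text))               # a,b,{'c';'d';'e';'f'},g,h
# With nesting the result is inconsistent.
print(re.sub(r",(?=[^{]*})", ";", "a,{b,{c,d},e},f"))  # a,{b,{c;d};e},f
```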

Related

RegEx removing paired parentheses only

How can I remove only the matched pairs of parentheses from this expression? I tried many ways before deciding to post my question here...
Code:
expression = ')([()])('
pattern = r'[(\(.*\)]'
nothing = ''
print(re.sub(pattern, nothing, expression)) # Expected to be ')[]('
Other expressions to validate:
// True
<s>HTML is a programming language</s>
(1+2) * (3+4) / 4!
{1, 2, 3, ..., 10}
([{<>}])
// False
<}>
)[}({>]<
<[{}]><
As you may guess, I want to solve a classic problem in a new way... Not only parentheses but also other punctuation marks such as brackets, angle brackets, and braces should be removed. (Use re.sub(r'[^\(\)\[\]\{\}\<\>]', '', expr) to remove every other character first.)
I want to drop them in one step, but any answer is welcome...
Based on How to remove all text between the outer parentheses in a string?:
import re
def rem_parens(text):
    n = 1  # run at least once
    while n:
        text, n = re.subn(r'\(([^()]*)\)', r'\1', text)
    return text
print(rem_parens(")([()])("))
Results: )[](
How to extend to accept more bracket types
Add alternatives to the expression and backreferences to the replacement:
re.subn(r'\(([^()]*)\)|\[([^][]*)]|<([^<>]*)>|\{([^{}]*)}', r'\1\2\3\4', text)
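One caveat with the combined pattern: each alternative only excludes its own bracket type from the contents, so interleaved input such as ([)] collapses to an empty string (\(([^()]*)\) first matches "([)"). A stricter variant, sketched here as an assumption rather than the answer's own code, only removes innermost pairs whose contents contain no bracket characters of any kind; is_balanced is a hypothetical helper that applies it after the cleanup re.sub from the question.
```python
import re
# Content of an innermost pair: no bracket characters of any of the four kinds.
_INNER = r'([^()\[\]{}<>]*)'
_INNERMOST = re.compile(r'\(' + _INNER + r'\)|\[' + _INNER + r'\]|<' + _INNER + r'>|\{' + _INNER + r'\}')
def remove_bracket_pairs(text):
    n = 1  # run at least once
    while n:
        # Groups of alternatives that did not match are substituted as '' (Python 3.5+).
        text, n = _INNERMOST.subn(r'\1\2\3\4', text)
    return text
def is_balanced(expr):
    # Keep only bracket characters, then check that every pair was removed.
    return remove_bracket_pairs(re.sub(r'[^()\[\]{}<>]', '', expr)) == ''
print(remove_bracket_pairs(")([()])("))              # )(
print(is_balanced("([{<>}])"), is_balanced("([)]"))  # True False
```
Note that remove_bracket_pairs removes all four kinds of brackets, so the question's example becomes )( rather than )[](.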

Python regex match anything enclosed in quotation marks, brackets, braces, or parentheses

UPDATE
This is still not a complete solution: it only works for repeated closing characters at the end (e.g. )), ]], }}). I'm still looking for a way to capture the enclosed contents and will update this.
Code:
>>> import re
>>> re.search(r'(\(.+?[?<!)]\))', '((x(y)z))', re.DOTALL).groups()
('((x(y)z))',)
Details:
r'(\(.+?[?<!)]\))'
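( ) - A capturing group.
\( and \) - The opening and closing characters (swap in the other delimiters as needed, e.g. ', ", (), {}, [])
.+? - Lazily match any characters (use the re.DOTALL flag so that . also matches newlines)
[?<!)] - Intended as a negative lookbehind for ), but inside square brackets this is a character class matching a single ?, <, ! or ). Combined with the final \), the lazy .+? stops at the first place where one of those characters is followed by ), which is why this only works for repeated closing characters such as )).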
I was trying to parse something like a variable assignment statement for this lexer thing I'm working with, just trying to get the basic logic behind interpreters/compilers.
Here's the basic assignment statements and literals I'm dealing with:
az = none
az_ = true
az09 = false
az09_ = +0.9
az_09 = 'az09_'
_az09 = "az09_"
_az = [
    "az",
    0.9
]
_09 = {
    0: az
    1: 0.9
}
_ = (
    true
)
Somehow, I managed to parse simple assignments like none, true, false, and numeric literals. Here's where I'm currently stuck:
import sys
import re
# validate command-line arguments
if (len(sys.argv) != 2): raise ValueError('usage: parse <script>')
# parse the variable name and its value
def handle_assignment(index, source):
    # TODO: handle quotations, brackets, braces, and parenthesis values
    variable = re.search(r'[\S\D]([\w]+)\s+?=\s+?(none|true|false|[-+]?\d+\.?\d+|[\'\"].*[\'\"])', source[index:])
    if variable is not None:
        print('{}={}'.format(variable.group(1), variable.group(2)))
        index += source[index:].index(variable.group(2))
    return index
# parse through the source element by element
with open(sys.argv[1]) as file:
    source = file.read()
index = 0
while index < len(source):
    # checks if the line matches a variable assignment statement
    if re.match(r'[\S\D][\w]+\s+?=', source[index:]):
        index = handle_assignment(index, source)
    index += 1
I'm looking for a way to capture values enclosed in quotation marks, brackets, braces, and parentheses.
I will probably update this post if I find an answer.
Use a regexp with one alternative for each matching pair.
re.match(r'\'.*?\'|".*?"|\(.*?\)|\[.*?\]|\{.*?\}', s)
Note, however, that if there are nested brackets, the match will stop at the first closing bracket, e.g. if the input is
(words (and some more words))
the result will be
(words (and some more words)
Regular expressions are not appropriate for matching nested structures; you should use a more powerful parsing technique.
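Since the question is about a lexer, one such parsing technique is a small stack-based scanner. This is a sketch under stated assumptions, not code from either answer: read_enclosed is a hypothetical helper that returns the value opening at a given index, including its delimiters; it skips over quoted strings so brackets inside them are ignored, and it does not handle escape sequences.
```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
QUOTES = ("'", '"')
def read_enclosed(s, start=0):
    """Return the quoted or bracketed value that opens at s[start], or None."""
    if start >= len(s):
        return None
    opener = s[start]
    if opener in QUOTES:
        end = s.find(opener, start + 1)  # no escape handling in this sketch
        return s[start:end + 1] if end != -1 else None
    if opener not in PAIRS:
        return None
    stack = []  # closers we still expect, innermost last
    i = start
    while i < len(s):
        ch = s[i]
        if ch in QUOTES:
            # Skip a quoted string so brackets inside it are not counted.
            end = s.find(ch, i + 1)
            if end == -1:
                return None
            i = end + 1
            continue
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in PAIRS.values():
            if not stack or stack.pop() != ch:
                return None  # closer that does not match the innermost open bracket
            if not stack:
                return s[start:i + 1]
        i += 1
    return None  # ran out of input before the value was closed
print(read_enclosed("(words (and some more words)) tail"))
```
In the lexer, the index just after the = sign (once whitespace is skipped) could be passed as start, and the returned string's length tells you how far to advance.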
Solution for @Barmar's point about nested (recursive) brackets, using the third-party regex module:
pip install regex
python3
>>> import regex
>>> recurParentheses = regex.compile(r'[(](?:[^()]|(?R))*[)]')
>>> recurParentheses.findall('(z(x(y)z)x) ((x)(y)(z))')
['(z(x(y)z)x)', '((x)(y)(z))']
>>> recurCurlyBraces = regex.compile(r'[{](?:[^{}]|(?R))*[}]')
>>> recurCurlyBraces.findall('{z{x{y}z}x} {{x}{y}{z}}')
['{z{x{y}z}x}', '{{x}{y}{z}}']
>>> recurSquareBrackets = regex.compile(r'[[](?:[^][]|(?R))*[]]')
>>> recurSquareBrackets.findall('[z[x[y]z]x] [[x][y][z]]')
['[z[x[y]z]x]', '[[x][y][z]]']
For recursion with string literals, I suggest taking a look at this.

Automatically add quotation marks inside a string in Python

I am working in Python and I want to add quotation marks inside a string. Specifically, I have the following string:
'{name:robert,surname:paul}'
And I want to programmatically get the following from the first one:
'{name:"robert",surname:"paul"}'
Is there any efficient way to perform this?
Use a regex to match the word \w* after : and replace it using the backreference \1:
Prefix your regex string with r (raw string) so that backslashes such as the one in \1 reach the regex engine instead of being treated as Python string escapes.
https://repl.it/Nh29/1
import re
input_str = '{name:robert,surname:paul}'
output_str = re.sub(r':(\w*)', r':"\1"', input_str)
print(output_str)
will produce
{name:"robert",surname:"paul"}
def literalize(string):
    string = string[1:-1].split(',')
    string = map(lambda s: str.split(s, ':'), string)
    return_string = ''
    for item in string:
        return_string += '%s: "%s", ' % tuple(item)
    return "{%s}" % return_string
I wouldn't ever consider this a masterpiece, but I tried not to use a regex for this; however, it ended up messy and bodgy, and it could obviously be condensed with list comprehensions and whatnot.
Some implementation details: it won't work very well when a value contains a comma or colon, and tuple(item) can be replaced by (*item,) if you prefer a more Python 3 style.
>>> literalize('{name:robert,surname:paul}')
'{name: "robert", surname: "paul", }'
Note: the redundant , at the end (and the unquoted keys) mean the result is not valid JSON, so something like json.loads(...) would reject it as is.
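If the goal is to parse the string afterwards with json.loads, the keys need quotes as well and there must be no trailing comma. A minimal regex sketch, assuming keys and values are plain word tokens (\w+) with no commas, colons or quotes inside:
```python
import json
import re
raw = '{name:robert,surname:paul}'
# Wrap every run of word characters (keys and values alike) in double quotes.
as_json = re.sub(r'(\w+)', r'"\1"', raw)
print(as_json)  # {"name":"robert","surname":"paul"}
data = json.loads(as_json)
print(data)     # {'name': 'robert', 'surname': 'paul'}
```
With this approach numeric values would also end up as strings.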

How to replace a word which occurs before another word in python

I want to replace (re-spell) a word A in a text string with another word B if A occurs right before an operator. A can be any word.
E.g.:
Hi I am Not == you
Since "Not" occurs before the operator "==", I want to replace it with alist["Not"]
So the sentence above should be changed to
Hi I am alist["Not"] == you
Another example
My height > your height
should become
My alist["height"] > your height
Edit:
At @Paul's suggestion, here is the code I wrote myself.
It works, but it's too bulky and I am not happy with it.
text = "My height > your height"  # sample input
operators = ["==", ">", "<", "!="]
text_list = text.split(" ")
for index in range(len(text_list)):
    if text_list[index] in operators:
        prev = text_list[index - 1]
        if "." in prev:
            tokens = prev.split(".")
            prev = "alist"
            for token in tokens:
                prev = "%s[\"%s\"]" % (prev, token)
        else:
            prev = "alist[\"%s\"]" % prev
        text_list[index - 1] = prev
text = " ".join(text_list)
This can be done using regular expressions
import re
...
def replacement(match):
    return "alist[\"{}\"]".format(match.group(0))
...
re.sub(r"[^ ]+(?= +==)", replacement, s)
If the space between the word and the "==" is optional in your case, the last line becomes:
re.sub(r"[^ ]+(?= *==)", replacement, s)
I'd highly recommend looking into regular expressions and Python's implementation of them, as they are really useful.
Explanation for my solution:
re.sub(pattern, replacement, s) replaces occurrences of pattern, given as a regular expression, with a given string or with the output of a function.
I use a function that wraps the whole match in the 'alist["..."]' construct (match.group(0) returns the whole match).
[^ ] match anything but space.
+ match the preceding subpattern as often as possible, but at least once.
* match the preceding subpattern as often as possible, including zero times.
(?=...) is a lookahead. It checks whether the text after the current position matches the pattern inside the parentheses, but doesn't include it in the final match (at least not in .group(0); if you have groups inside a lookahead, those are retrievable with .group(index)).
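Putting the pieces together, here is a sketch that uses the operators listed in the question (==, !=, >, <) in the lookahead and, like the question's own code, turns a dotted name such as a.b into alist["a"]["b"]. wrap_before_operator is a hypothetical name, and words are assumed to be separated by spaces.
```python
import re
# A run of non-space characters followed (after optional spaces) by an operator.
PATTERN = re.compile(r"[^ ]+(?= *(?:==|!=|<|>))")
def replacement(match):
    # a.b -> alist["a"]["b"], mirroring the dotted-name handling in the question
    return "alist" + "".join('["{}"]'.format(part) for part in match.group(0).split("."))
def wrap_before_operator(s):
    return PATTERN.sub(replacement, s)
print(wrap_before_operator("Hi I am Not == you"))       # Hi I am alist["Not"] == you
print(wrap_before_operator("My height > your height"))  # My alist["height"] > your height
```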
str = "Hi I am Not == you"
s = str.split()
y = ''
str2 = ''
for x in s:
    if x == "==":
        # note: replace() rewrites every occurrence of y, not only the one before "=="
        str2 = str.replace(y, 'alist["' + y + '"]')
        break
    y = x
print(str2)
You could try using the regular expression library; I was able to create a simple solution to your problem, shown here:
import re
data = "Hi I am Not == You"
x = re.search(r'(\w+) ==', data)
print(x.groups())
In this code, re.search looks for one or more word characters followed by the operator (" ==") and stores the resulting match object in x; the whole match is "Not ==", and x.groups() returns ('Not',).
Then, for the swapping, you could use the re.sub() method that CodenameLambda suggested.
I'd also recommend learning how to use regular expressions, as they are useful for solving many different problems and are similar across programming languages.

How to use regular expression to detect parenthesis at the end of a string?

I am using if/elif statements in Python to match some strings, but I need help matching one particular type of string. I want all strings that have parentheses '()' at the end to match the same if condition. For example, string = "Tennis (5.5)" or string = "Football (6.3)".
def method(string):
if (string has parenthesis in the end):
Can I use some regular expression for this ? I am not sure how to go about it.
I think you mean this:
if re.search(r'(?m)\([^()]*\)$', line):
$ asserts that we are at the end of a line.
If you'd prefer a regex, this is probably the simplest solution. It checks whether there is a closing parenthesis at the end of the line, regardless of trailing blanks (they are removed with strip() before matching):
"\)$"
For example:
import re
test1 = "Tennis (5.5) "
test2 = "Football (6.3)"
res1 = bool(re.search(r"\)$", test1.strip()))
res2 = bool(re.search(r"\)$", test2.strip()))
print(res1, res2, sep='\n')
This prints:
True
True
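The two answers check slightly different things: \)$ accepts any string ending in ), while \([^()]*\)$ requires a complete (...) group. A sketch combining the stricter pattern with \s* for trailing whitespace (so no strip() is needed), which also captures the text inside the parentheses; trailing_parens_content is a hypothetical helper name.
```python
import re
# A complete "(...)" group without nested parentheses, then optional trailing whitespace.
TRAILING_PARENS = re.compile(r"\(([^()]*)\)\s*$")
def trailing_parens_content(s):
    """Return the text inside a trailing (...) group, or None if there is none."""
    m = TRAILING_PARENS.search(s)
    return m.group(1) if m else None
for s in ["Tennis (5.5)", "Football (6.3) ", "Tennis", "score 5)"]:
    print(repr(s), trailing_parens_content(s))
```
In the if/elif chain, the condition would then be trailing_parens_content(string) is not None.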
