String substitution by regular expression while excluding quoted strings

I searched a bit but couldn't find any questions addressing my problem; sorry if this is a repeat. I'm trying to edit Python code, say to add spaces around all -/+/= operators that don't have whitespace on either side.
string = 'new_str=str+"this is a quoted string-having some operators+=- within the code."'
I would use '([^\s])(=|\+|-)([^\s])' to find such operators. The problem is that I want to exclude matches inside the quoted string. Is there any way to do this with a regular expression substitution?
The output I'm trying to get is:
edited_string = 'new_str = str + "this is a quoted string-having some operators+=- within the code."'
This example is just to help to understand the issue. I'm looking for an answer working on general cases.

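You can do it in two steps: first add a space before each operator that has no space in front of it, then add a space after each operator that has no space behind it. Note that this handles only + and =, and it leaves the quoted part alone only because the operators there sit next to each other; the patterns themselves are not quote-aware:
import re
string = 'new_str=str+"this is a quoted string-having some operators+=- within the code."'
# Step 1: insert a space before an operator that is not preceded by whitespace
new_string = re.sub(r"(?<!\s)(\+|=)[^+=-]", r" \g<0>", string)
# Step 2: insert a space after an operator that is not followed by whitespace
new_string = re.sub(r"(\+|=)(?=[^\s=-])", r"\g<0> ", new_string)
print(new_string)
new_str = str + "this is a quoted string-having some operators+=- within the code."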
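For the general case, one common technique is to let the pattern match quoted strings as well as operators, and to pass re.sub a function that returns quoted strings unchanged. What follows is only a minimal sketch under stated assumptions: the helper name space_operators and the operator list are illustrative choices, not part of the question, and the sketch does not handle triple-quoted strings, comments, or unary minus.

```python
import re

# A single- or double-quoted string literal, allowing backslash escapes.
QUOTED = r'"(?:\\.|[^"\\])*"' + "|" + r"'(?:\\.|[^'\\])*'"
# Operators to pad; two-character forms are listed first so they win.
# (Assumed list for illustration; extend as needed.)
OPERATOR = r"[+\-]=|==|[+\-=]"

# Group 1 captures a quoted string, group 2 captures an operator
# together with any whitespace already around it.
TOKEN = re.compile(rf"({QUOTED})|\s*({OPERATOR})\s*")

def space_operators(code):
    """Pad operators with single spaces, leaving quoted strings untouched."""
    def repl(match):
        if match.group(1) is not None:
            return match.group(1)      # quoted string: return as-is
        return f" {match.group(2)} "   # operator: normalise the spacing
    return TOKEN.sub(repl, code)

string = 'new_str=str+"this is a quoted string-having some operators+=- within the code."'
print(space_operators(string))
# new_str = str + "this is a quoted string-having some operators+=- within the code."
```

Because the quoted-string alternative is tried first at every position, the operators inside the quotes are consumed as part of the string and never reach the operator branch.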

Related

how to find unknown string from a known pattern ? python re.findall

I have an html text with strings such as
sentence-transformers/paraphrase-MiniLM-L6-v2
I want to extract all the strings that appear after "sentence-transformers/".
I tried models = re.findall("sentence-transformers/"+"(\w+)", text) but it only outputs the first word (paraphrase), while I want the full "paraphrase-MiniLM-L6-v2".
Also, I don't know the length of "paraphrase-MiniLM-L6-v2" a priori.
How can I extract the full string?
Many thanks,
Ele

The problem with your regex is that - is not a word character, and \w+ matches only word characters. Adding - to the character class makes the regex work on your example:
import re
text = 'sentence-transformers/paraphrase-MiniLM-L6-v2'
models = re.findall(r'sentence-transformers/([\w-]+)', text)
assert models[0] == 'paraphrase-MiniLM-L6-v2'
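Since the question mentions an HTML text with several such strings, here is a small sketch of the same pattern applied to more than one occurrence. The HTML snippet and the second model name are made up for illustration.

```python
import re

# Hypothetical HTML fragment containing two model links.
html = (
    '<a href="/sentence-transformers/paraphrase-MiniLM-L6-v2">first</a> '
    '<a href="/sentence-transformers/all-mpnet-base-v2">second</a>'
)

# findall returns the captured group for every match, in order.
models = re.findall(r"sentence-transformers/([\w-]+)", html)
print(models)
# ['paraphrase-MiniLM-L6-v2', 'all-mpnet-base-v2']
```

The match stops at the closing double quote because it is neither a word character nor a hyphen, so the length of each name does not need to be known in advance.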

How can I output a string excluding ALL whitespaces? [duplicate]

This question already has answers here:
How to strip all whitespace from string
(14 answers)
Closed 4 years ago.
Basically, I'm trying to do a code in Python where a user inputs a sentence. However, I need my code to remove ALL whitespaces (e.g. tabs, space, index, etc.) and print it out.
This is what I have so far:
def output_without_whitespace(text):
    newText = text.split("")
    print('String with no whitespaces: '.join(newText))
I'm clear that I'm doing a lot wrong here and I'm missing plenty, but, I haven't been able to thoroughly go over splitting and joining strings yet, so it'd be great if someone explained it to me.
This is the whole code that I have so far:
text = input(str('Enter a sentence: '))
print(f'You entered: {text}')
def get_num_of_characters(text):
    result = 0
    for char in text:
        result += 1
    return result
print('Number of characters: ', get_num_of_characters(text))
def output_without_whitespace(text):
    newtext = "".join(text.split())
    print(f'String without whitespaces: {newtext}')
I FIGURED OUT MY PROBLEM!
I realize that this line of code:
print(f'String without whitespaces: {newtext}')
is supposed to be:
print('String without whitespaces: ', output_without_whitespace(text))
The sentence without whitespace was not being printed back to me because I was never calling my function!

You have the right idea, but here's how to implement it with split and join:
def output_without_whitespace(text):
    return ''.join(text.split())
so that:
output_without_whitespace(' this\t is a\n test..\n ')
would return:
thisisatest..

A trivial solution is to just use split and rejoin (similar to what you are doing):
def output_without_whitespace(text):
    return ''.join(text.split())
First we split the initial string to a list of words, then we join them all together.
So to think about it a bit:
text.split()
will give us a list of words (split by any whitespace). So for example:
'hello world'.split() -> ['hello', 'world']
And finally
''.join(<result of text.split()>)
joins all of the words in the given list to a single string. So:
''.join(['hello', 'world']) -> 'helloworld'
See Remove all whitespace in a string in Python for more ways to do it.

Get the input, split it, and join the pieces:
s = ''.join(input('Enter string: ').split())
print(s)
Enter string: vash the stampede
vashthestampede

There are a few different ways to do this, but this seems the most obvious one to me. It is simple and efficient.
>>> with_spaces = ' The quick brown fox '
>>> list_no_spaces = with_spaces.split()
>>> ''.join(list_no_spaces)
'Thequickbrownfox'
.split() with no parameter splits a string into a list wherever there are one or more whitespace characters, leaving out the whitespace.
''.join(list_no_spaces) joins the elements of the list into a string with nothing between the elements, which is what you want here: 'Thequickbrownfox'.
If you had used ','.join(list_no_spaces) you'd get 'The,quick,brown,fox'.
Experienced Python programmers tend to use regular expressions sparingly. Often it's better to use tools like .split() and .join() to do the work, and keep regular expressions for where there is no alternative.
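To illustrate why split and join suits "ALL whitespace", here is a short comparison with str.replace, which removes only the character it is given. The sample text is the one used in an earlier answer; the variable names are arbitrary.

```python
text = ' this\t is a\n test..\n '

# replace(' ', '') removes spaces only; tabs and newlines survive.
only_spaces_removed = text.replace(' ', '')
print(repr(only_spaces_removed))
# 'this\tisa\ntest..\n'

# split() with no argument splits on any run of whitespace.
all_whitespace_removed = ''.join(text.split())
print(repr(all_whitespace_removed))
# 'thisisatest..'
```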

Python Parser-Regular Expression

I have two strings in Python:
String1 = "1.451E1^^http://www.test.org/Schema#double"
String2 = "http://www.test.org/m3-lite#AirTemperature"
From String1 I want to extract the number 1.451E1, meaning the field from the start of the string up to the ^ symbol.
From String2 I want to extract the field AirTemperature, meaning everything after the # symbol up to the end of the string.
Can anyone help me with the regular expressions for the parser?

If your strings have such clear separators, maybe a simple split is enough?
value = String1.split("^^")[0]
measurement = String2.split("#")[-1]
If regular expressions are really what you want, ^([0-9E.]+)\^ and #(\w+)$ are an OK start.
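Both suggestions can be tried side by side. This is a sketch on the two strings from the question only; the variable names value_re and measurement_re are illustrative, and real input may need a stricter number pattern.

```python
import re

String1 = "1.451E1^^http://www.test.org/Schema#double"
String2 = "http://www.test.org/m3-lite#AirTemperature"

# Plain string methods: the part before "^^" and the part after the last "#".
value = String1.split("^^")[0]
measurement = String2.split("#")[-1]

# The same fields using the regular expressions suggested above.
# re.match anchors at the start of the string, so no leading ^ is needed.
value_re = re.match(r"([0-9E.]+)\^", String1).group(1)
measurement_re = re.search(r"#(\w+)$", String2).group(1)

print(value, measurement)   # 1.451E1 AirTemperature
print(float(value))         # 14.51
```

Both approaches give the same result here; the split version has the advantage that it does not need to know which characters the number may contain.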

Apply multiple regular expressions to a text at same time in python

Assume I have a text containing '.' and ',', each of which is sometimes followed by spaces. I need a regular expression that replaces ['.' and space] or [',' and space] with a single space throughout the text. These are the regular expressions I have:
text = re.sub('[.]+[ ]+', " ", text)
text = re.sub('[,]+[ ]+', " ", text)
Here I am applying multiple patterns to the string in multiple passes. Is there an efficient way to do this in one pass? Also, the output is stored in the same variable. Is this efficient, or is a copy created in this case? Kindly let me know.
Thanks.

You are already using character sets, so put both . and , into one:
text = re.sub('[.,]+ +', " ", text)

Since you only need to replace a . or , that is followed by a space, you could use the following (note that \s matches any whitespace character, not just a space):
text = re.sub('[.,]\s', " ", text)
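The two one-pass patterns above are not equivalent, which a small comparison makes visible. The sample sentence is invented for illustration.

```python
import re

text = "One, two.  Three... four,five"

# Runs of '.' or ',' followed by a run of spaces collapse to one space.
merged = re.sub(r"[.,]+ +", " ", text)
print(merged)
# One two Three four,five

# Exactly one '.' or ',' plus exactly one whitespace character is replaced,
# so extra dots and extra spaces are left behind.
single = re.sub(r"[.,]\s", " ", text)
print(single)
# One two  Three.. four,five
```

On the question about copies: Python strings are immutable, so re.sub always returns a new string, and assigning it back to text simply rebinds the name.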

Replace the single quote (') character from a string

I need to strip the character "'" from a string in Python. How do I do this?
I know there is a simple answer. What I am really looking for is how to write ' in my code, in the way that \n stands for a newline.

As for how to represent a single apostrophe as a string in Python, you can simply surround it with double quotes ("'") or you can escape it inside single quotes ('\'').
To remove apostrophes from a string, a simple approach is to just replace the apostrophe character with an empty string:
>>> "didn't".replace("'", "")
'didnt'

Here are a few ways of removing a single ' from a string in python.
str.replace
replace is usually used to return a string with all the instances of the substring replaced.
"A single ' char".replace("'","")
str.translate
In Python 2
To remove characters, pass None as the first argument (the translation table) and the characters to delete as the second:
"A single ' char".translate(None,"'")
In Python 3
You will have to use str.maketrans
"A single ' char".translate(str.maketrans({"'":None}))
re.sub
Regular expressions via re are more powerful (but slower) and can replace anything that matches a pattern rather than only a fixed substring.
re.sub("'","","A single ' char")
Other Ways
There are a few other ways that work but are not recommended; they are shown only for learning. Here the given string is held in a variable named string.
Using list comprehension
''.join([c for c in string if c != "'"])
Using generator Expression
''.join(c for c in string if c != "'")
One final method (again not recommended: it removes only the first occurrence and raises ValueError if there is none)
Using a list call along with remove and join.
x = list(string)
x.remove("'")
''.join(x)

Do you mean like this?
>>> mystring = "This isn't the right place to have \"'\" (single quotes)"
>>> mystring
'This isn\'t the right place to have "\'" (single quotes)'
>>> newstring = mystring.replace("'", "")
>>> newstring
'This isnt the right place to have "" (single quotes)'

You can escape the apostrophe with a \ character as well:
mystring.replace('\'', '')

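I met that problem on Codewars, where str.title() capitalizes the letter after an apostrophe, so I created a temporary solution: swap the apostrophe for a placeholder, apply title(), then swap it back.
pred = "aren't"
pred = pred.replace("'", "99o")
pred = pred.title()
pred = pred.replace("99O", "'")
print(pred)
You can use another character combination as the placeholder, such as 123456k, but its last character should be a letter.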
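If the goal is only to capitalize words that contain apostrophes, a placeholder may not be needed. As a hedged alternative (not the poster's method), the standard library's string.capwords splits on whitespace and capitalizes each piece, so the letter after the apostrophe stays lowercase. One caveat: with the default separator it also collapses runs of whitespace into single spaces.

```python
import string

pred = "they aren't here"

# str.title() treats the apostrophe as a word boundary.
titled = pred.title()
print(titled)
# They Aren'T Here

# string.capwords only splits on whitespace.
capped = string.capwords(pred)
print(capped)
# They Aren't Here
```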
