Match everything after a slash, without the slash [duplicate] - python

This question already has answers here:
Split a string by backslash in python
(6 answers)
Closed 6 years ago.
I am working with regular expressions using Python's re module. I am supposed to match everything before a slash and put the match in a variable, then match everything after the slash and put it in another variable.
For example:
for the string
"NlaIII/Csp6I"
I would like to match NlaIII and store it in one variable, and match Csp6I and store it in another variable:
variable_1 = "NlaIII"
variable_2 = "Csp6I"
Using Python's re module, I have been able to match everything before the slash with the following regular expression:
first_enzyme = re.compile(r'.+?(?=\W+)')
But I am completely unable to match everything after the slash without including the slash.
Thank you very much for your help!

You don't need a regex for that at all.
s = "NlaIII/Csp6I"
variable_1, variable_2 = s.split('/')
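For comparison, a minimal Python 3 sketch of three ways to get both halves; the names first_enzyme, second_enzyme and after_slash are only illustrative. The lookbehind (?<=/) answers the title directly: it requires a slash before the match without including the slash in the result.

```python
import re

s = "NlaIII/Csp6I"

# Option 1: str.split; maxsplit=1 keeps any further slashes in the second part
variable_1, variable_2 = s.split('/', 1)

# Option 2: one regex with a capturing group on each side of the slash
m = re.fullmatch(r'([^/]+)/(.+)', s)
first_enzyme, second_enzyme = m.groups()

# Option 3: a lookbehind matches what follows the slash without consuming it
after_slash = re.search(r'(?<=/).+', s).group()

print(variable_1, variable_2)  # NlaIII Csp6I
print(after_slash)             # Csp6I
```

The split version is the simplest when the input always has exactly one separator; the regex versions are useful if the pattern needs to grow (for example, validating the enzyme names).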

Related

How does this regex pattern for removing punctuation work? [duplicate]

This question already has answers here:
Carets in Regular Expressions
(2 answers)
Closed 11 months ago.
I'm currently learning a bit of regex in Python in an online course, and I'm struggling to understand a particular expression. I've been searching the Python re docs, and I'm not sure why I'm getting back the non-punctuation elements rather than the punctuation.
The code is:
import re
test_phrase = "This is a sentence, with! unnecessary: punctuation."
punc_remove = re.findall(r'[^,!:]+', test_phrase)
print(punc_remove)
OUTPUT: ['This is a sentence', ' with', ' unnecessary', ' punctuation.']
I think I understand what each character does, i.e. [] is a character set, and ^ means "starts with". So anything starting with ,!: will be returned? (Or at least that's how I'm probably mistakenly interpreting it.) And the + will return one or more of the pattern. But why is the output not something like:
OUTPUT: [', with','! unnecessary',': punctuation.']
Any explanation really appreciated!
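Inside a character class, a leading ^ does not mean ‘start with’: it means ‘not’. So the regex matches runs of one or more characters other than the comma, exclamation mark, or colon. (Outside a character class, ^ is an anchor that matches the start of the string.)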
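A short sketch contrasting the negated and non-negated character classes on the question's own test_phrase; the variable names are illustrative. The third pattern is one way to get the output the question expected, and the last line shows how the punctuation could actually be removed.

```python
import re

test_phrase = "This is a sentence, with! unnecessary: punctuation."

# Leading ^ inside [...] negates the set: runs of characters that are NOT , ! or :
non_punct = re.findall(r'[^,!:]+', test_phrase)

# Without the ^, the class matches the punctuation characters themselves
punct_only = re.findall(r'[,!:]', test_phrase)

# One punctuation mark followed by everything up to the next one
with_leading_punct = re.findall(r'[,!:][^,!:]*', test_phrase)

# Outside a class, ^ anchors the match to the start of the string
starts_with_this = re.findall(r'^This', test_phrase)

# To strip the punctuation, substitute it with an empty string
cleaned = re.sub(r'[,!:]', '', test_phrase)

print(with_leading_punct)  # [', with', '! unnecessary', ': punctuation.']
print(cleaned)             # This is a sentence with unnecessary punctuation.
```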

Is there any way to account for all delimiters in a string in Python? [duplicate]

This question already has answers here:
Python - How to split a string by non alpha characters
(8 answers)
Closed 2 years ago.
I'm trying to create a word count for a book (.txt file) and I'm trying to split each line into its separate words using this:
temp = re.split('[; |, |\*|\n| |\|:|.|’|"|&|#|$|(|)|]|//|'']', line)
However, this isn't working because every time I run the program, I have to add another delimiter to the list. This time I have to add '-' and '%'. I remember doing something similar in Java where I could specify a 'range' of delimiters and when I tried the same thing here, it didn't seem to work.
Is there any better way to do this and make sure I just get the word and nothing else?
I think you're looking for \W, the set of all non-word characters, i.e. not a letter, digit, or underscore.
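For example:
temp = re.split(r'\W+', line)
By the way, characters inside a regex character set are mostly literal; for instance, each | inside your brackets is just a literal pipe, not alternation. Once the adjacent string literals are joined, yours boils down to this (any one of those characters, or the sequence //, or a ]):
[; |,*\n:.’"&#$()]|//|]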
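A minimal word-count sketch along these lines, using a made-up sample line in place of a line from the book file. It shows that re.split(r'\W+', ...) can leave an empty string when a line starts or ends with a delimiter, whereas re.findall(r'\w+', ...) returns only the words; collections.Counter then tallies them.

```python
import re
from collections import Counter

# Hypothetical sample line standing in for one line of the book
line = "Hello, world -- 100% done!"

# Splitting on runs of non-word characters leaves a trailing '' here,
# because the line ends with a delimiter ('!')
parts = re.split(r'\W+', line)
words = [w for w in parts if w]  # drop empty strings

# findall with \w+ returns just the words, so nothing needs filtering
words_alt = re.findall(r'\w+', line)

# Tally words case-insensitively
counts = Counter(w.lower() for w in words_alt)

print(words_alt)        # ['Hello', 'world', '100', 'done']
print(counts['hello'])  # 1
```

One caveat: because an apostrophe counts as a non-word character, a contraction such as don't is split into don and t; if that matters, the pattern needs to allow apostrophes inside words.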

How to use escape characters in Python? [duplicate]

This question already has answers here:
How to escape “\” characters in python
(4 answers)
Closed 4 years ago.
I am totally confused by escape characters in Python. Sometimes I expect it to output a single '\', but it prints '\\'; sometimes using '\\' in place of '\' works, but other times it doesn't, and so on.
Some examples:
print('\\hello') #output --> \hello
print(['\\hello']) #output --> ['\\hello']
So how should I understand '\hello', as '\hello' or '\\hello'? Can anyone explain the mechanism of escape characters more generally?
Firstly there is the question of getting the right characters into your strings. Then there is the question of how Python displays your string. The same string can be displayed in two different ways.
>>> s = '\\asd'
>>> s
'\\asd'
>>> print(s)
\asd
In this example the string has only one backslash. We use two backslashes to create it, but that results in a string with one backslash. We can see that there's only one backslash when we print the string.
But when we display the string simply by typing s, we see two backslashes. Why is that? In that situation the interpreter shows the repr of the string. That is, it shows us the code that would be needed to make the string: we need quotes and two backslashes in our code to make a string that then has one backslash (as s does).
When you print a list with a string as an element, you will see the repr of the string inside the list:
>>> print([s])
['\\asd']
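A small sketch that makes the difference visible by counting characters instead of relying on how the string is displayed. It also shows a raw string (r'...'), which keeps backslashes literally, and why '\asd' written without r or a doubled backslash is a different string: \a is itself an escape sequence (the ASCII bell character).

```python
s = '\\asd'     # escaped backslash in the source code
raw = r'\asd'   # raw string: the backslash is kept as-is
plain = '\asd'  # \a is an escape sequence (BEL), not backslash + 'a'

print(len(s), len(raw), len(plain))  # 4 4 3
print(s == raw)                      # True
print(s)                             # \asd   (the characters themselves)
print(repr(s))                       # '\\asd' (code that recreates the string)
print(list(s))                       # ['\\', 'a', 's', 'd']
```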

Replace as raw string in Python [duplicate]

This question already has answers here:
Escaping regex string
(4 answers)
Closed 6 years ago.
I am replacing string content as:
re.sub(all, val, parsedData['outData'])
where all contains some parentheses and might contain other special characters.
>>> print all
PICDSPVERS="DspFw:1.0008(1.0008),Fpga1:2.0925(2.0925),Fpga2:1.0404(1.0404),Mcu:1.0000(1.0000)"
Because of this, the match fails. The pattern comes from some interface, so I don't want to put \\ in the data.
I also tried a raw string (r'...') and the re.U flag, but the match still fails.
re.search('PICDSPVERS="DspFw:1.0008(1.0008)', parsedData['outData'])
How can we direct Python to treat a matching pattern as a string?
I am using Python 2.x.
If you don't want the matching pattern to be treated as a regular expression, then don't use re.sub. For plain strings, use str.replace(), like so:
new_outData = parsedData['outData'].replace(all, val)
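If a regex is still wanted (for example, to keep using re.search as in the question), the usual alternative is re.escape, which backslash-escapes the special characters so the pattern is matched literally. A minimal Python 3 sketch; pattern_text, replacement and out_data are hypothetical stand-ins for the question's all, val and parsedData['outData'] (the name all is avoided here because it shadows a built-in). re.escape exists in Python 2 as well.

```python
import re

# Hypothetical stand-ins for the question's all, val and parsedData['outData']
pattern_text = 'PICDSPVERS="DspFw:1.0008(1.0008)'
replacement = 'PICDSPVERS="DspFw:2.0000(2.0000)'
out_data = 'header PICDSPVERS="DspFw:1.0008(1.0008),Fpga1:2.0925(2.0925)"'

# Option 1: plain substring replacement, no regex involved
new_1 = out_data.replace(pattern_text, replacement)

# Option 2: keep re.sub, but escape the pattern so ( ) . are literal;
# passing a function as the replacement stops re.sub from
# interpreting backslashes in the replacement text
new_2 = re.sub(re.escape(pattern_text), lambda m: replacement, out_data)

# re.search also succeeds once the pattern is escaped
found = re.search(re.escape(pattern_text), out_data) is not None
```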

Regex not working to get a string between two strings. Python 2.7 [duplicate]

This question already has answers here:
How do I match any character across multiple lines in a regular expression?
(26 answers)
Closed 3 years ago.
From the page source of this URL: view-source:https://www.amazon.com/dp/073532753X?smid=A3P5ROKL5A1OLE
I want to get the string between "var iframeContent =" and "obj.onloadCallback = onloadCallback;".
I have this regex iframeContent(.*?)obj.onloadCallback = onloadCallback;
But it does not work. I am not good at regex so please pardon my lack of knowledge.
I even tried iframeContent(.*?)obj.onloadCallback but it does not work.
It looks like you just want that giant encoded string. I believe yours is failing for two reasons. First, you're not running in DOTALL mode, which means your . won't match across multiple lines. Second, your regex is failing because of catastrophic backtracking, which can happen when you have a very long variable-length match that matches the same characters as the ones following it.
This should get what you want:
m = re.search(r'var iframeContent = \"([^"]+)\"', html_source)
print m.group(1)
The regex is just looking for any characters except double quotes [^"] in between two double quotes. Because the variable length match and the match immediately after it don't match any of the same characters, you don't run into the catastrophic backtracking issue.
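A small sketch reproducing both points on a made-up two-line snippet (html_source here is hypothetical, not the real Amazon page source). The lazy pattern from the question finds nothing until re.DOTALL is added, while the [^"]+ pattern works either way, because a negated character class also matches newlines. The dot in obj.onloadCallback is escaped here so it only matches a literal dot.

```python
import re

# Hypothetical stand-in for the downloaded page source
html_source = (
    'var iframeContent = "%3Chtml%3E%3Cbody%3E";\n'
    'obj.onloadCallback = onloadCallback;\n'
)

pattern = r'iframeContent(.*?)obj\.onloadCallback = onloadCallback;'

# Without DOTALL, . stops at the newline, so there is no match
no_dotall = re.search(pattern, html_source)

# With DOTALL, . also matches the newline between the two markers
with_dotall = re.search(pattern, html_source, re.DOTALL)

# The answer's approach: capture everything between the double quotes
quoted = re.search(r'var iframeContent = "([^"]+)"', html_source)

print(no_dotall)        # None
print(quoted.group(1))  # %3Chtml%3E%3Cbody%3E
```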
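I suspect that the input string spans multiple lines. Try adding re.S (re.DOTALL) to the search call, e.g. re.findall('someString', text_Holder, re.S). Note that re.M (MULTILINE) only changes how ^ and $ behave; it does not make . match newlines.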
You could try this regex too
(?<=iframeContent =)(.*)(?=obj.onloadCallback = onloadCallback)
You can check the test at this site.
It is very important that you use DOTALL mode (what some regex tools call single-line mode), so that . also matches newlines.
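To see why the flag matters, a sketch using the lookaround pattern from this answer on a hypothetical two-line snippet (not the real page source): re.M (MULTILINE) still finds nothing because it only changes ^ and $, while re.S (DOTALL) lets (.*) span the newline.

```python
import re

# Hypothetical stand-in for the downloaded page source
html_source = (
    'var iframeContent = "%3Chtml%3E%3Cbody%3E";\n'
    'obj.onloadCallback = onloadCallback;\n'
)

# Lookbehind/lookahead keep the two markers out of the match
pattern = r'(?<=iframeContent =)(.*)(?=obj\.onloadCallback = onloadCallback)'

# MULTILINE affects ^ and $ only; . still refuses to match '\n'
multiline = re.search(pattern, html_source, re.M)

# DOTALL lets . match '\n', so the match can cross the line break
dotall = re.search(pattern, html_source, re.S)

print(multiline)  # None
```

Because (.*) is greedy, with re.S it runs to the last occurrence of the closing marker; (.*?) would stop at the first one if the marker appeared more than once.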
