Regular Expression in Python for (\xa0) and (<a).*(>).*(</a>) [duplicate] - python

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
Just reading through some code for pre-processing text data, I came across these regexes and am struggling to figure out what they mean.
ReviewText = ReviewText.str.replace('(<a).*(>).*(</a>)', '')
ReviewText = ReviewText.str.replace('(\xa0)', ' ')

Well, it looks like they are processing HTML with regular expressions. People generally frown on that, but since you are only reading this code rather than writing it, we'll ignore that issue for now.
The first line looks for an HTML anchor element (for example, a link whose text reads "Visit W3Schools.com!"), matching from the opening <a through the closing </a>, and replaces it with an empty string, i.e. deletes it. Because .* is greedy, a string containing several links is matched from the first <a to the last </a>.
The second one replaces each non-breaking space character (\xa0) with a regular space.
As the person above stated, a regex only does something once it is applied to input. Once you have both the regex and some sample input, I recommend experimenting with them in a regex tester such as https://pythex.org/ (or an equivalent).
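To make the two replacements concrete, here is a minimal sketch using the standard-library re module on made-up strings (the sample text and URLs are assumptions, not data from the question). It also contrasts the greedy pattern with a lazy variant, which is not part of the original code. In pandas, the same patterns would be passed to Series.str.replace; recent pandas versions may need regex=True for the pattern to be treated as a regex.

```python
import re

# Hypothetical review text: one link and one non-breaking space (\xa0).
review = 'Great product!\xa0See <a href="https://example.com">this link</a> for details.'

# First replacement from the question: delete everything from "<a" to "</a>".
without_link = re.sub(r'(<a).*(>).*(</a>)', '', review)

# Second replacement: turn each non-breaking space into a regular space.
cleaned = re.sub('(\xa0)', ' ', without_link)

# With two links, the greedy ".*" spans from the first "<a" to the last "</a>",
# so the text between the links is deleted as well.
two_links = 'A <a href="x">one</a> keep me <a href="y">two</a> B'
greedy = re.sub(r'(<a).*(>).*(</a>)', '', two_links)

# A lazy ".*?" (an illustrative alternative) removes each link separately.
lazy = re.sub(r'<a.*?>.*?</a>', '', two_links)

print(repr(cleaned))
print(repr(greedy))
print(repr(lazy))
```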

Related

Split python string in a specific way [duplicate]

This question already has answers here:
Split a string by a delimiter in python
(5 answers)
Match text between two strings with regular expression
(3 answers)
Closed 5 months ago.
I have a string like a = 'This is an example string that has a code !3377! this is the code I want to extract'.
How can I extract 3377 from this string, i.e., the part surrounded by !?
There are multiple ways of doing what you are looking for, but a regular expression is probably the most convenient.
For example, in the case you gave:
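import re

def subtract_code_from(sentence: str) -> str:
    # Capture the digits enclosed by the two exclamation marks.
    m = re.search(r'!(\d+)!', sentence)
    # group(1) is the captured code; group(0) would also include the '!' delimiters.
    return m.group(1) if m else ''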
Keep in mind that this is a quick and loose solution that I put together in five minutes. I don't know what other cases your sentences may contain, so it is up to you to adapt the regex to cover them.
I encourage you to follow this tutorial. And you can use this website to build your regexes.
Good luck.
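For comparison, here is a small sketch of two other ways to pull out the code, using the example string from the question. The variable names are illustrative, and both options assume the code is the only text enclosed in ! characters.

```python
import re

a = 'This is an example string that has a code !3377! this is the code I want to extract'

# Option 1: split on the delimiter; the code is the piece between the two '!'.
parts = a.split('!')
code_by_split = parts[1] if len(parts) >= 3 else None

# Option 2: collect every run of digits enclosed in '!' (handles several codes).
codes_by_regex = re.findall(r'!(\d+)!', a)

print(code_by_split)
print(codes_by_regex)
```

The split version is the simplest when the string contains exactly one delimited code; findall is more forgiving when there may be zero or several.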

Extract a part of a string using Regex in Python Pandas [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I'm a student working on a data science project and I need to extract a part from one column of my dataframe.
The dataframe looks like this: (image of the column not preserved)
I want to extract the part HOTHOTVIDEO from a string like "HOTHOTVIDEOHOT0501005107FilmVidéoClub"
So I wrote this instruction using a regex:
facturation['annotation'] = facturation['annotation'].str.findall(r'([A-Z0-9]{3}\d+)').apply(''.join)
It extracts everything correctly, except when I have strings like "CTVCANALVODCTV0200052670CTV0200052670": it returns CTV0200052670CTV0200052670, but I only want the first occurrence, CTV0200052670.
Can someone help me fix this issue, please? :)
I think the problem is in the combination of findall with apply + join: the pattern matches twice in your data, and you then join both matches. findall returns a list, and you need only its first item, not all of them.
Well, thanks to everyone who helped me :) I found the answer:
facturation['annotation'] = facturation['annotation'].str.findall(r'([A-Z0-9]{3}\d+)').apply(''.join)
facturation['annotation'] = facturation['annotation'].str.extract('(.{0,13})')
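As an alternative to truncating the joined result to 13 characters, the "first item only" idea can be sketched with plain re on the strings from the question. The helper name first_code is hypothetical. In pandas, Series.str.extract with the same pattern should behave similarly, since it returns the first match per row, but that is not shown here.

```python
import re

# Same pattern as in the question: three uppercase letters/digits followed by digits.
PATTERN = re.compile(r'[A-Z0-9]{3}\d+')

def first_code(text):
    """Return only the first match of PATTERN, or '' if there is none."""
    m = PATTERN.search(text)
    return m.group(0) if m else ''

sample = 'CTVCANALVODCTV0200052670CTV0200052670'

# findall returns every match, which is why joining them duplicates the code.
all_matches = PATTERN.findall(sample)

print(all_matches)
print(first_code(sample))
```

Taking the first match avoids relying on every code being exactly 13 characters long.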

get strings between 2 delimiter in python [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I would like to get, from the following string "/path/to/%directory_1%/%directory_2%.csv",
the following list: [directory_1, directory_2]. I would like to avoid splitting my string on "%". I was hoping to find a regex that could help me, but I cannot find the correct one.
For now, I have the following:
re.findall('%(.*)%', dirty_arg)
which outputs ["directory_1%/%directory_2"]
Do you have any recommendation about that?
Thank you very much for your help.
Try this:
import re
regex = r"%(.*?)%"
dirty_arg = "/path/to/%directory_1%/%directory_2%.csv"
print(re.findall(regex, dirty_arg))
I've added ? to your regex, which makes the quantifier lazy so that it matches as few characters as possible. The output of this code is ['directory_1', 'directory_2']
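Another common way to stop the match at the next delimiter is a negated character class, which cannot cross a % at all. The sketch below compares it with the greedy and lazy patterns on the string from the question; the variable names are illustrative.

```python
import re

dirty_arg = "/path/to/%directory_1%/%directory_2%.csv"

# Greedy: ".*" runs to the last "%", so both names end up in one match.
greedy = re.findall(r'%(.*)%', dirty_arg)

# Lazy: ".*?" stops at the first "%" that lets the pattern succeed.
lazy = re.findall(r'%(.*?)%', dirty_arg)

# Negated class: "[^%]+" matches any run of characters that are not "%".
negated = re.findall(r'%([^%]+)%', dirty_arg)

print(greedy)
print(lazy)
print(negated)
```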

unable to match this regular expression in python [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I am trying to match a regular expression using Python in this code.
CDS_REGEX = re.compile(r'\+CDS:\s*"([^"]+)",\s*(\d+)$')
cdsiMatch = allLinesMatchingPattern(self.CDS_REGEX, notificationLine)
print cdsiMatch
Matching String:
['+CDS: 24', '079119890400202306A00AA17909913764514010106115225140101061452200']
Please help me; I am not able to find my mistake.
As @Blckknght said, are you sure you really want to match that string?
What is ([^"]+) supposed to match?
You're looking for " instead of ' (you probably want ['"]).
You're only checking for digits here: (\d+), but your long string clearly contains A's.
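The mismatch can be checked directly. The sketch below applies the pattern from the question to the two strings shown, then tries two relaxed patterns; RELAXED_HEADER and HEX_PAYLOAD are hypothetical names and guesses at the intended format, not a confirmed description of what the device sends.

```python
import re

# Pattern from the question: expects +CDS: "<quoted text>", <digits>
CDS_REGEX = re.compile(r'\+CDS:\s*"([^"]+)",\s*(\d+)$')

lines = ['+CDS: 24',
         '079119890400202306A00AA17909913764514010106115225140101061452200']

# Neither line contains a double-quoted field, so the original pattern fails on both.
original_matches = [CDS_REGEX.match(line) for line in lines]

# Hypothetical relaxed patterns: a bare number after "+CDS:", and a hex payload line.
RELAXED_HEADER = re.compile(r'\+CDS:\s*(\d+)$')
HEX_PAYLOAD = re.compile(r'[0-9A-Fa-f]+$')

header = RELAXED_HEADER.match(lines[0])
payload = HEX_PAYLOAD.match(lines[1])

print(original_matches)
print(header.group(1))
print(payload is not None)
```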

Splitting an input in to smaller fragments [duplicate]

This question already has an answer here:
Splitting an input in to fragments (Python)
(1 answer)
Closed 9 years ago.
I need to somehow convert a mathematical input(str) to a number,
e.g.
4-3*2-1+5 = ((((4-3)*2)-1)+5).
Current code looks like this:
Answer = input ('Put your answer here: ')
4-3*2-1+5
Somehow, I need to break the string into smaller fragments so that it is read from left to right, and to convert the numbers into integers, but I have no idea how to do it.
I tried doing
Answer.split('+','-','*','/')
But it says TypeError: split() takes at most 2 arguments (4 given)
Also tried adding the answer to a list to see if that helped me at all:
li.append(Answer)
(li = ['4-3*2-1+5'])
But I don't see anything beneficial in that.
Please help!
(I'm new to SOF, so if there's any information that's missing, please tell me what and I will try to correct it).
What you need to write is a parser and a simple evaluator for simple expressions.
I would start by reading any of the following:
http://pyparsing.wikispaces.com/HowToUsePyparsing
http://pyparsing.wikispaces.com/Examples
http://kmkeen.com/funcparserlib/
There are many other parser libraries; these are just a couple.
You could also use the rply library: its PyPI page has an example that implements a simple expression parser and evaluator, much like what you describe in your question.
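For the strictly left-to-right reading described in the question, a full parser library may be more than is needed. Below is a minimal sketch using only the standard library; the function name is hypothetical, and it assumes well-formed input made of non-negative integers and the operators + - * /, with no parentheses and no operator precedence.

```python
import operator
import re

# Map each operator symbol to the function that applies it.
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def evaluate_left_to_right(expression):
    """Evaluate e.g. '4-3*2-1+5' as ((((4-3)*2)-1)+5), ignoring precedence."""
    # Tokenize into numbers and operators: ['4', '-', '3', '*', '2', ...]
    tokens = re.findall(r'\d+|[-+*/]', expression)
    result = int(tokens[0])
    # Operators sit at odd positions, their right-hand operands at even positions.
    for symbol, number in zip(tokens[1::2], tokens[2::2]):
        result = OPS[symbol](result, int(number))
    return result

print(evaluate_left_to_right('4-3*2-1+5'))
```

The re.findall call replaces the failed Answer.split('+','-','*','/') attempt: str.split accepts only one separator, whereas a regex can describe all four operators at once.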
