Finding latex class name using python regex [duplicate] - python

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I want to use python regex to find the document class in a latex document.
A latex file contains \documentclass{myclass} somewhere near the top. I want to find myclass using regex.
This is what I've tried so far:
latex_text = "blank /documentclass{myclass} words, more text /documentclassdoc{11} more words"
s=re.search(r'/documentclass{(?P<class_name>.*)}', latex_text)
It matches: myclass} words, more text /documentclassdoc{11
How can I change it to match only myclass? It should also stop searching after it finds a match, as the document can get quite long.
I know the file should only have one documentclass, but I want to handle the case where there is more than 1 as well.

import re
latex_text = "blank /documentclass{myclass} words, more text /documentclassdoc{11} more words"
# group(1) is the captured class name; group() would return the whole match
print(re.search(r'/documentclass\{(.*?)\}', latex_text).group(1))
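To see why the lazy quantifier helps, here is a small sketch comparing three variants on the string from the question. Note that re.search scans left to right and returns only the first match, so it does not keep searching the rest of a long document. The last pattern (a real backslash and an optional [options] argument) is an assumption about what an actual LaTeX file looks like; it is not part of the original question.

```python
import re

latex_text = "blank /documentclass{myclass} words, more text /documentclassdoc{11} more words"

# Greedy: .* runs on to the LAST "}" in the string
greedy = re.search(r'/documentclass\{(.*)\}', latex_text).group(1)

# Lazy: .*? stops at the FIRST "}"
lazy = re.search(r'/documentclass\{(.*?)\}', latex_text).group(1)

# Negated class: [^}]* can never cross a "}", so no backtracking surprises
negated = re.search(r'/documentclass\{([^}]*)\}', latex_text).group(1)

# Assumed real-world form: literal backslash, optional [options] before the braces
real_line = r'\documentclass[12pt]{article}'
real = re.search(r'\\documentclass(?:\[[^\]]*\])?\{([^}]*)\}', real_line).group(1)

print(greedy)   # myclass} words, more text /documentclassdoc{11
print(lazy)     # myclass
print(negated)  # myclass
print(real)     # article
```

The negated character class is often preferred over .*? because it states the intent directly: the class name is everything up to the next closing brace.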

Related

Split python string in a specific way [duplicate]

This question already has answers here:
Split a string by a delimiter in python
(5 answers)
Match text between two strings with regular expression
(3 answers)
Closed 5 months ago.
I have a string like a = 'This is an example string that has a code !3377! this is the code I want to extract'.
How can I extract 3377 from this string, i.e., the part surrounded by !?
There are several ways to do this, but a regular expression is probably the most direct.
For example, in the case you gave:
import re
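def subtract_code_from(sentence: str) -> str:
    # Capture the digits between the two "!" delimiters
    m = re.search(r'!(\d+)!', sentence)
    # group(1) is the captured code; group(0) would also include the "!" characters
    return m.group(1)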
Keep in mind that what I've done is a very quick and loose solution I implemented in five minutes. I don't know what other types of particular cases you could encounter for each sentence. So it is your job to implement the proper regex to match all the cases.
I encourage you to follow this tutorial. And you can use this website to build your regexes.
Good luck.
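As a rough sketch of two of the "multiple ways" mentioned above, the snippet below extracts the code once with a regex and once with str.split. The helper name extract_code is hypothetical, and returning None when no code is present is an assumption about what the caller wants.

```python
import re
from typing import Optional

a = 'This is an example string that has a code !3377! this is the code I want to extract'


def extract_code(sentence: str) -> Optional[str]:
    """Return the digits between two '!' characters, or None if there are none."""
    m = re.search(r'!(\d+)!', sentence)
    return m.group(1) if m else None


# Regex approach: only accepts digits between the delimiters
via_regex = extract_code(a)

# Split approach: takes whatever sits between the first two "!" characters
via_split = a.split('!')[1]

print(via_regex)  # 3377
print(via_split)  # 3377
```

The split version is shorter but raises IndexError when the string contains no "!", and it does not check that the extracted part is numeric.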

Python regex find everything between substring and first space [duplicate]

This question already has answers here:
Python non-greedy regexes
(7 answers)
Closed 2 years ago.
I have a string string = "radios label="Does the command/question above meet the Rules to the left?" name="tq_utt_test" validates="required" gold="true" aggregation="agg"" and I want to be able to extract the substring within the "name". So in this case I want to extract "tq_utt_test" because it is the substring inside name.
I've tried regex re.findall('name=(.*)\s', string) which I thought would extract everything after the substring name= and before the first space. But after running that regex, it actually returned "tq_utt_test" validates="required" gold="true". So seems like it's returning everything between name= and the last space, instead of everything between name= and first space.
Is there a way to twist this regex so that it returns everything after name= and before the first space?
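The .* is greedy, so it runs to the last space. I would just use a negated character class that cannot cross a space: re.findall(r'name=([^ ]*)\s', string)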
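A quick sketch of the difference, using the string from the question (written with single quotes on the outside so it is valid Python). The third pattern, which strips the surrounding double quotes, is an extra variant that the original answer does not mention.

```python
import re

string = ('radios label="Does the command/question above meet the Rules to the left?" '
          'name="tq_utt_test" validates="required" gold="true" aggregation="agg"')

# Greedy .* backtracks only as far as the LAST whitespace character
greedy = re.findall(r'name=(.*)\s', string)

# [^ ]* cannot contain a space, so it stops at the FIRST one
first_space = re.findall(r'name=([^ ]*)\s', string)

# Variant: capture only what is inside the quotes
unquoted = re.findall(r'name="([^"]*)"', string)

print(greedy)       # ['"tq_utt_test" validates="required" gold="true"']
print(first_space)  # ['"tq_utt_test"']
print(unquoted)     # ['tq_utt_test']
```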

Regex working in text editor(sublime) but not in python [duplicate]

This question already has answers here:
Case insensitive regular expression without re.compile?
(10 answers)
Closed 2 years ago.
I want to extract the line using regex.
The line that I want to extract from document is:
":method":"POST",":path":"/api/browser/projects/8bd4d1d3-0b69-515e-8e15-e9c49992f7d5/buckets/b-ao-mock-testing/copy
The regex I am using is:
":method"[:"a-z,/\d-]{20,1000}/copy
The code for the same in python is:
re.findall('":method"[:"a-z,/\d-]{20,1000}/copy', str(s), re.MULTILINE)
It is working perfectly fine in sublime text but not in python. It is returning an empty list in python. How to resolve this?
You need the i (case-insensitive) modifier; in Python that is the re.IGNORECASE flag (re.I for short).
Without it, [a-z] only matches lowercase letters, so how would POST match?
Alternatively, add A-Z to the character class: ":method"[:"a-zA-Z,/\d-]{20,1000}/copy
See demo
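A minimal sketch of both fixes in Python, using the line from the question as the input. The variable names are illustrative only.

```python
import re

s = ('":method":"POST",":path":"/api/browser/projects/'
     '8bd4d1d3-0b69-515e-8e15-e9c49992f7d5/buckets/b-ao-mock-testing/copy')

pattern = r'":method"[:"a-z,/\d-]{20,1000}/copy'

# Case-sensitive: the uppercase "POST" is not in [a-z], so nothing matches
case_sensitive = re.findall(pattern, s)

# Fix 1: pass re.IGNORECASE so [a-z] also matches uppercase letters
case_insensitive = re.findall(pattern, s, re.IGNORECASE)

# Fix 2: widen the character class instead of using a flag
widened = re.findall(r'":method"[:"a-zA-Z,/\d-]{20,1000}/copy', s)

print(case_sensitive)         # []
print(case_insensitive == [s])  # True
print(widened == [s])           # True
```

Flags can be combined with the | operator, for example re.MULTILINE | re.IGNORECASE, if the multiline behaviour from the original call is still needed.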

Date regex in a sentence [duplicate]

This question already has answers here:
How to match a whole word with a regular expression?
(4 answers)
Closed 4 years ago.
I'm trying to use the date regex from this post:
^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec)))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)(?:0?2|(?:Feb))\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec)))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$
However, I want to find all matches that are also wrapped around white spaces.
For example in this sentence:
I went to Disney World on 11/11/1989 and once more on 12/12/2009
I want to get back:
11/11/1989
12/12/2009
How do I accomplish this? I'm using Python3 regex module if it matters.
If you want to tweak the regex you linked so that it works inside a longer string like that, replace each of the three ^ and $ anchor pairs with word boundaries (\b):
\b(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec)))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})\b|\b(?:29(\/|-|\.)(?:0?2|(?:Feb))\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))\b|\b(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec)))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})\b
https://regex101.com/r/WX5Itv/1
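The effect of swapping anchors for word boundaries can be sketched with a much simpler stand-in pattern. The pattern below is an assumption for illustration only: it does not validate days, months, or leap years the way the full regex does. It also shows a practical detail: because the pattern contains a capturing group, re.findall returns the groups rather than the whole match, so re.finditer with group(0) is used instead.

```python
import re

sentence = "I went to Disney World on 11/11/1989 and once more on 12/12/2009"

# Simplified stand-in for the full date regex (no calendar validation)
anchored = re.compile(r'^\d{1,2}([/.-])\d{1,2}\1\d{4}$')
bounded = re.compile(r'\b\d{1,2}([/.-])\d{1,2}\1\d{4}\b')

# ^ and $ require the date to be the entire string, so this finds nothing
no_match = anchored.search(sentence)

# \b lets the date sit anywhere in the sentence; group(0) is the full match
dates = [m.group(0) for m in bounded.finditer(sentence)]

# findall returns only the captured separator when the pattern has one group
groups_only = bounded.findall(sentence)

print(no_match)     # None
print(dates)        # ['11/11/1989', '12/12/2009']
print(groups_only)  # ['/', '/']
```

The same finditer approach applies to the full pattern, which has four capturing groups and would otherwise make findall return tuples.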

Python: Replacing strings [duplicate]

This question already has answers here:
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 9 years ago.
I'm iterating through pages and I'd like to modify lines containing
<span class="font16"></span>
How can I correct the code below?
text = re.sub(r'<span class="font(.*)"></span><span', r'<span class="font\1"> </span><span', text)
The pattern .* is greedy: it matches as much of the line as it can, backing off only as far as the last "></span><span, so the captured group can look like this:
16"></span>....
which isn't what you want. Use a pattern that stops at the first " (a literal " is not allowed inside an attribute value that is quoted with "):
r'<span class="font([^"]+)"></span><span'
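A short sketch of the difference on a made-up line with two empty spans; the sample text is an assumption, since the question does not show a full page. As the duplicate link suggests, a proper HTML parser is the more robust choice for anything beyond a simple one-off substitution like this.

```python
import re

# Hypothetical input line with two empty font spans
text = ('<span class="font16"></span><span>a</span> '
        '<span class="font9"></span><span>b</span>')

replacement = r'<span class="font\1"> </span><span'

# Greedy .* swallows everything up to the LAST '"></span><span'
greedy_group = re.search(r'<span class="font(.*)"></span><span', text).group(1)
greedy_result = re.sub(r'<span class="font(.*)"></span><span', replacement, text)

# [^"]+ stops at the first double quote, so each span is handled separately
fixed_result = re.sub(r'<span class="font([^"]+)"></span><span', replacement, text)

print(greedy_group)
print(greedy_result)  # only the last span gets the space
print(fixed_result)   # both spans get the space
```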
