Python using AND with regex - python

So I have looked around the internet for like 20 minutes and I haven't been able to figure it out. Is it possible to use AND in regex, or something similar (I've just started learning about regex)?
For example, I have the string "finksdssfsk32residogs" and I want to get the output: "32 dogs". I've tried using re.search, re.match, and re.findall but I haven't had any luck. I've tried things like:
re.findall(r"(\d{2})(dogs)", str)
re.search(r"(\d{2})(dogs)", str)
And I've tried a few combinations of each. And I know I can do this with multiple lines but the goal is to get "32 dogs" from "finksdssfsk32residogs" with only one line. Any help is appreciated, thanks.

You've almost got it. You just need some space between the numbers and the dogs.
Can you just match anything? How about (\d{2}).*(dogs)? Then you can replace the middle part with a space using join:
>>> print(' '.join(re.search(r'(\d{2}).*(dogs)', 'finksdssfsk32residogs').groups()))
32 dogs
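One thing the one-liner above does not handle is the case where nothing matches: re.search then returns None and calling .groups() raises an AttributeError. Here is a minimal sketch of a guarded version. The helper name number_and_word and its word parameter are made up for illustration, and it uses the lazy .*? so the match stops at the first occurrence of the word.

```python
import re

def number_and_word(text, word="dogs"):
    """Return '<two digits> <word>' if both appear in that order, else None.

    Hypothetical helper, not part of the original answer.
    """
    # (\d{2})  two digits, captured
    # .*?      anything in between, as little as possible
    # (word)   the literal word, captured (escaped in case it has regex metacharacters)
    match = re.search(r"(\d{2}).*?(" + re.escape(word) + ")", text)
    if match is None:
        return None
    return " ".join(match.groups())

print(number_and_word("finksdssfsk32residogs"))
print(number_and_word("no digits here dogs"))
```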

Related

Extract values in name=value lines with regex

I'm really sorry for asking, because there are several questions like this around, but I can't adapt their answers to my problem.
These are the input lines (e.g. from a config file):
profile2.name=share2
profile8.name=share8
profile4.name=shareSSH
profile9.name=share9
I just want to extract the values after the = sign with a regex in Python 3.9.
I tried this on regex101.
^profile[0-9]\.name=(.*?)
But this gives me the variable name including the = sign as the result, e.g. profile2.name=. I want exactly the opposite.
The expected results (what Python's re.findall() should return) are
['share2', 'share8', 'shareSSH', 'share9']
Try the pattern profile\d+\.name=(.*) (see the Regex 101 example):
import re
# Raw string, so that \d and \. reach the regex engine unchanged
re.findall(r'profile\d+\.name=(.*)', txt)
# output
['share2', 'share8', 'shareSSH', 'share9']
But this problem doesn't necessarily need regex, split should work absolutely fine:
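The code for the split approach did not survive in this copy of the answer, so the following is only a sketch of what it could look like. It assumes the input lines sit in a multi-line string called txt (the same name used in the findall call above; the sample contents are copied from the question).

```python
# Assumed input: the config lines from the question in one string
txt = """profile2.name=share2
profile8.name=share8
profile4.name=shareSSH
profile9.name=share9"""

# Split each line once at the first "=" and keep the right-hand side.
# Lines without "=" are skipped so they cannot raise an IndexError.
values = [line.split("=", 1)[1] for line in txt.splitlines() if "=" in line]
print(values)
```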
Try removing the ? quantifier. It makes your capture group lazy, so it matches an empty string.
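To see the effect of the lazy quantifier, here is a small comparison. It is a sketch that assumes the asker's pattern is applied with re.MULTILINE (so ^ and $ work per line) to a shortened, two-line version of the input.

```python
import re

txt = "profile2.name=share2\nprofile8.name=share8"

# Lazy group with nothing after it: it is satisfied with zero characters
lazy = re.findall(r"^profile[0-9]\.name=(.*?)", txt, flags=re.MULTILINE)

# Greedy group: takes everything up to the end of the line
greedy = re.findall(r"^profile[0-9]\.name=(.*)", txt, flags=re.MULTILINE)

# Lazy group followed by $: forced to expand until the end of the line
lazy_anchored = re.findall(r"^profile[0-9]\.name=(.*?)$", txt, flags=re.MULTILINE)

print(lazy)
print(greedy)
print(lazy_anchored)
```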

How to get python to search for whole numbers in a string-not just digits

Okay please do not close this and send me to a similar question because I have been looking for hours at similar questions with no luck.
Python can search for a single digit using re.search with the pattern [0-9].
However, I want to search for any whole number. It could be 547 or 2 or 16589425. I don't know how many digits there are going to be in each whole number.
Furthermore I need it to specifically find and match numbers that are going to take a form similar to this: 1005.2.15 or 100.25.1 or 5.5.72 or 1102.170.24 etc.
It may be that there isn't a way to do this using re.search but any info on what identifier I could use would be amazing.
Just use
import re
your_string = 'this is 125.156.56.531 and this is 0540505050.5 !'
result = re.findall(r'\d[\d\.]*', your_string)
print(result)
output
['125.156.56.531', '0540505050.5']
Assuming that you're looking for whole numbers only, try re.search(r"[0-9]+", your_string)
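If a trailing dot must not be swallowed (the character class [\d\.]* above would turn "5." into a match that includes the dot), a slightly stricter pattern is possible. This is a sketch and the sample string is invented: \d+ matches one whole number of any length, and (?:\.\d+)* allows any number of further ".digits" parts.

```python
import re

# Hypothetical sample text, not from the question
sample = "versions 1005.2.15 and 5.5.72, plus a plain 547."

# One or more digits, optionally followed by ".digits" groups
any_dotted = re.findall(r"\d+(?:\.\d+)*", sample)

# Exactly three dot-separated whole numbers, like 1005.2.15
three_part = re.findall(r"\b\d+\.\d+\.\d+\b", sample)

print(any_dotted)
print(three_part)
```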

Replace inner space in Python

I am new to Python, and I need to remove the space between a string and a digit only, not between two strings.
eg:
Input : Paragraph 25 is in documents and paragraph number in another file.
Output : Paragraph25 is in documents and paragraph number in another file.
How can this be done in Python? I tried a regex:
re.sub("paragraph\s[a-z]", "paragraph[a-z]", Input)
But it's not working.
>>> re.sub(r'\s+(\d+)', r'\1', 'Program 25 is fun')
'Program25 is fun'
That might work in a pinch. I'm not the most familiar with regexes, so hopefully someone who is can chime in with something more robust.
Basically we match whitespace followed by digits and remove the whitespace.
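The substitution above removes whitespace in front of any digits, including the gap between two numbers. If the space should only go when a letter comes before it and a digit after it, lookarounds are one option. This is a sketch under that reading of the question; the variable names are made up.

```python
import re

text = "Paragraph 25 is in documents and paragraph number in another file."

# (?<=[A-Za-z])  the character before the whitespace must be a letter
# \s+            the whitespace to delete
# (?=\d)         the character after it must be a digit
result = re.sub(r"(?<=[A-Za-z])\s+(?=\d)", "", text)
print(result)
```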

How to apply string method on regular expression in Python

I have a markdown file which is a little bit broken: the links and images that are too long have line breaks in them. I would like to remove those line breaks.
Example:
from:
See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to
![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)
_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_
to:
See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to
![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)
_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_
As you can see in this snippet, I managed to match all the links and images with the right pattern: https://regex101.com/r/uL8pO4/2
But now, what is the syntax in Python to apply a string method like string.trim() to what I have captured with a regular expression?
For the moment, I'm stuck with this:
fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
# Capture the links and remove line-breaks from their urls
# Something like r'[\1](\2)'.trim() ??
post['content'] = fix_newlines.sub(r'[\1](\2)', post['content'])
Edit: I updated the example to be more explicit about my problem.
Thank you for your answer
strip works similarly to trim. Since you need to trim the newlines, use strip('\n'):
fin.readline().strip('\n')
This will work also:
>>> s = """
... ![https://diasporafoundation.org/assets/pages/about/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png)
... """
>>> new_s = "".join(s.strip().split('\n'))
>>> new_s
'![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)'
>>>
Often built-in string functions will do, and they are easier to read than regexes. In this case strip removes leading and trailing whitespace, then split returns a list of the pieces between newlines, and join puts them back together into a single string.
Alright, I finally found what I was looking for. With the snippet below, I can capture strings with a regex and then apply the treatment to each match.
def remove_newlines(match):
    return "".join(match.group().strip().split('\n'))
links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')
post['content'] = links_pattern.sub(remove_newlines, post['content'])
Thank you for your answers and sorry if my question wasn't explicit enough.
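For anyone who wants to try the callback approach in isolation, here is a self-contained sketch. The sample text and the example.org URL are invented; the pattern is the one from the snippet above. The callback uses str.replace instead of strip/split/join, which gives the same result here because a match always starts with "[" and ends with ")". Note that removing the newline outright also glues together words in a link text that was wrapped between two words, so that case may need a space instead.

```python
import re

# Pattern from the snippet above: [text](target)
links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')

def remove_newlines(match):
    """Return the whole matched link with its line breaks removed."""
    return match.group().replace('\n', '')

# Hypothetical sample: the link target is broken across two lines
content = "See [docs](https://example.org/some-\npath) for details.\nNext line."

# re.sub calls remove_newlines once per match and inserts its return value
fixed = links_pattern.sub(remove_newlines, content)
print(fixed)
```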

Can someone translate this code into Pseudo Code or something I can understand? [duplicate]

I am not very familiar with regexes and would like somebody to put this into terms I can understand, i.e. outline what each part of the regex is doing:
re.compile(r'ATG((?:[ACTG]{3})+?)(?:TAG|TAA|TGA)')
So far, this is what I have come up with:
re.compile is a regex method... or something along those lines
r' is simply needed in regex
After that, I'm not too sure...
Searches for a piece in the string ATG
?:[ACTG]{3} searches for a piece of the string containing the characters A C T G within the string (does the order of these matter?) that is {3} three characters long.
+? something about going at least once, but minimal times...? What part of the code would be going at least once, but minimal times?
?: searches for TAG|TAA|TGA within the string. Once it finds these, what happens?
Would I be able to do something like
key_words = "TAG TAA TGA".replace(" ", "|") so that I can have a whole long list without having to type | a bunch of times if I have over 100 substrings?
I would then format this to something like this:
...(?:key_words)')
Examples and simple explanations always work wonders - thanks!
You can use regex101 to have it explained step by step.
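As a rough walk-through of the pattern in the question: ATG is matched literally; (?:[ACTG]{3}) is a non-capturing group that matches any three letters from A, C, T, G (one codon, in any order); +? repeats that group at least once but as few times as possible; the outer parentheses capture those codons as group 1; and (?:TAG|TAA|TGA) requires one of the three alternatives to follow without capturing it. The sketch below builds the same pattern from a list, which is one way to avoid typing | many times. The names stop_codons and orf_pattern and the sample sequence are made up for illustration.

```python
import re

# The alternatives to look for; this list could be much longer
stop_codons = ["TAG", "TAA", "TGA"]

# "TAG|TAA|TGA"; re.escape guards against regex metacharacters in the items
alternation = "|".join(re.escape(codon) for codon in stop_codons)

# ATG              literal start
# ((?:[ACTG]{3})+?) group 1: one or more 3-letter chunks, as few as possible
# (?:...)          one of the alternatives must follow, not captured
orf_pattern = re.compile(r"ATG((?:[ACTG]{3})+?)(?:" + alternation + ")")

match = orf_pattern.search("CCATGAAACCCTAGGG")
print(match.group(1))
```

Because the repetition is lazy, the match ends at the first stop codon that lines up with the three-letter frame rather than the last one.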
