How to find a string between to special characters in python? - python

I have a set of strings like this:
uc001acu.2;C1orf159;chr1:1046736-1056736;uc001act.2;C1orf159;
I need to extract the sub-string between two semicolons and I only need the first occurrence.
The result should be: C1orf159
I have tried this code, but it does not work:
import re
info = "uc001acu.2;C1orf159;chr1:1046736-1056736;uc001act.2;C1orf159;"
name = re.search(r'\;(.*)\;', info)
print name.group()
Please help me.
Thanks

You can split the string and limit it to two splits.
x = info.split(';',2)[1]

import re
pattern=re.compile(r".*?;([a-zA-Z0-9]+);.*")
print pattern.match(info).groups()
This looks for first ; eating up non greedily through .*? .Then it captures the alpha numeric string until next ; is found.Then it eats up the rest of the string.Match captured though .groups()

Related

trim a string of text, after a hyphen media in python

I just started with python, now I see myself needing the following, I have the following string:
1184-7380501-2023-183229
what i need is to trim this string and get only the following characters after the first hyphen. it should be as follows:
1184-738
how can i do this?
s = "1184-7380501-2023-183229"
print(s[:8])
Or perhaps
import re
pattern = re.compile(r'^\d+-...')
m = pattern.search(s)
print(m[0])
which accommodates variable length numeric prefixes.
You could (you can do this a lot of different ways) use partition() and join()...
"".join([token[:3] if idx == 2 else token for idx, token in enumerate("1184-7380501-2023-183229".partition("-"))])

How to extract function name python regex

Hello I am trying to extract the function name in python using Regex however I am new to Python and nothing seems to be working for me. For example: if i have a string "def myFunction(s): ...." I want to just return myFunction
import re
def extractName(s):
string = []
regexp = re.compile(r"\s*(def)\s+\([^\)]*\)\s*{?\s*")
for m in regexp.finditer(s):
string += [m.group()]
return string
Assumption: You want the name myFunction from "...def myFunction(s):..."
I find something missing in your regex and the way it is structured.
\s*(def)\s+\([^\)]*\)\s*{?\s*
Lets look at it step by step:
\s*: match to zero or more white spaces.
(def): match to the word def.
\s+: match to one or more white spaces.
\([^\)]*\): match to balanced ()
\s*: match to zero or more white spaces.
After that pretty much doesn't matter if you are going for just the name of the function. You are not matching the exact thing you want out of the regex.
You can try this regex if you are interested in doing it by regex:
\s*(def)\s([a-zA-Z]*)\([a-zA-z]*\)
Now the way I have structured the regex, you will get def myFunction(s) in group0, def in group1 and myFunction in group2. So you can use the following code to get you result:
import re
def extractName(s):
string = ""
regexp = re.compile(r"(def)\s([a-zA-Z]*)\([a-zA-z]*\)")
for m in regexp.finditer(s):
string += m.group(2)
return string
You can check your regex live by going on this site.
Hope it helps!

re.compile() python: issue in getting particular pattern

I have been trying to extract particular pattern, which looks like (PSSA) or (FJFD10) in a string.
In a string like this, I want to extract for instance something inside that parentheses (PNDM) in this case. However, I wanted to print it without parentheses.
eg_string = """DAAAAAAJFF: Hellllllllo (PNDM)
CC [MIM:606176]: Blalblablalbalbl. {CCO:0000069|Pubd:160,
CC ECO:0000269|PubMed:18162506}. Note=elllelefjfjfjf HAahndfd
"""
What I did was:
patti = re.compile(r'([A-Z]+)')
www = patti.findall(eg_string)
However, this was giving me more than I needed. It did include PNDM, but it also included like DAAAJFF, ECO
Another thing I tried was r'(^[A-Z]+) I knew it was going to print out DAAAAAJFF only. I want to know how to print (PNDM) which is in the middle of the string.
Use the regex: r"\([A-Z]+\)" to get text results for including ().
Demo: https://regex101.com/r/e2gyly/1
Explanation:
\( - will look for opening brace (
[A-Z] - any char between range A to Z
\) - closing brace )
Here ([A-Z]+) is consider as pattern like A-Z any number of times but you need to change it as \(([A-Z]+)\)
Your Code will be like
import re
eg_string = """DAAAAAAJFF: Hellllllllo (PNDM)
CC [MIM:606176]: Blalblablalbalbl. {CCO:0000069|Pubd:160,
CC ECO:0000269|PubMed:18162506}. Note=elllelefjfjfjf HAahndfd
"""
patti = re.compile(r'\(([A-Z]+)\)')
www = patti.findall(eg_string)
print(www)
#Output : ['PNDM']
Hope this will Help...

Python replace regex

I have a string in which there are some attributes that may be empty:
[attribute1=value1, attribute2=, attribute3=value3, attribute4=]
With python I need to sobstitute the empty values with the value 'None'. I know I can use the string.replace('=,','=None,').replace('=]','=None]') for the string but I'm wondering if there is a way to do it using a regex, maybe with the ?P<name> option.
You can use
import re
s = '[attribute1=value1, attribute2=, attribute3=value3, attribute4=]'
re.sub(r'=(,|])', r'=None\1', s)
\1 is the match in parenthesis.
With python's re module, you can do something like this:
# import it first
import re
# your code
re.sub(r'=([,\]])', '=None\1', your_string)
You can use
s = '[attribute1=value1, attribute2=, attribute3=value3, attribute4=]'
re.sub(r'=(?!\w)', r'=None', s)
This works because the negative lookahead (?!\w) checks if the = character is not followed by a 'word' character. The definition of "word character", in regular expressions, is usually something like "a to z, 0 to 9, plus underscore" (case insensitive).
From your example data it seems all attribute values match this. It will not work if the values may start with something like a comma (unlikely), may be quoted, or may start with anything else. If so, you need a more fool proof setup, such as parse from the start: skipping the attribute name by locating the first = character.
Be specific and use a character class:
import re
string = "[attribute1=value1, attribute2=, attribute3=value3, attribute4=]"
rx = r'\w+=(?=[,\]])'
string = re.sub(rx, '\g<0>None', string)
print string
# [attribute1=value1, attribute2=None, attribute3=value3, attribute4=None]

What should I use the Non-greedy match in this case

Assume I have a string which includes some data fields that are separated by "|", like
|1|2|3|4|5|6|7|8|
My purpose is to get the 8th field. This is what I'm doing:
pattern = re.compile(r'^\s+(\|.*?\|){8}')
match = pattern.match(test_line)
if match:
print:match.group(8)
But looks like it can not match. I know in this case I need to use ? for non-greedy match, but why I can not get the 8th field?
Thanks
Regex might be complicating this problem rather than simplifying it. A simple way to get an eighth item from a | delimited string is using split():
a = '|here|is|some|data|separated|by|bars|hooray!|'
print a.split('|')[8]
RETURNS
hooray!
Using regex, one way to get it would be:
import re
a = '|here|is|some|data|separated|by|bars|hooray!|'
pattern = re.compile(r'([^\|]+)')
match = pattern.findall(a)
print match[7]
RETURNS
hooray!

Categories