What would be the regular expression for such data?
I would like to get these:
It looks like you're parsing paths, in which case you should really be using os.path instead of regex:
from os.path import basename
print(basename("/home//Desktop/3A5F.py"))  # 3A5F.py
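Here is a small sketch applying the same idea to several paths at once. It reuses the sample paths that appear in the answers below. The `pathlib` variant is a swapped-in alternative, not part of the original answer. `PurePosixPath` is assumed because these are '/'-separated paths.

```python
import os.path
from pathlib import PurePosixPath

# Sample paths taken from the answers in this thread.
paths = ['/home//Desktop/3A5F.py', 'path/sth/R67G.py', 'a/b/c/d/t/6UY7.py']

# os.path.basename returns the final component of each path.
names = [os.path.basename(p) for p in paths]
print(names)  # ['3A5F.py', 'R67G.py', '6UY7.py']

# pathlib equivalent; PurePosixPath always treats '/' as the separator.
names_pathlib = [PurePosixPath(p).name for p in paths]
print(names_pathlib)  # ['3A5F.py', 'R67G.py', '6UY7.py']
```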
It is a simple split, no regex needed:
>>> "/home//Desktop/3A5F.py".split("/")[-1]
As an alternative, you can get the same result without regexps:
lines = ['/home//Desktop/3A5F.py', 'path/sth/R67G.py', 'a/b/c/d/t/6UY7.py']
result = [l.split('/')[-1] for l in lines]
print(result)
# ['3A5F.py', 'R67G.py', '6UY7.py']
Use: [^\/]*\.py$
But this is a bad question. You need to show what you have tried. We are not here to do your work for you.
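Here is a minimal sketch that applies that pattern to the sample paths from this thread. It assumes every path ends in `.py`. Otherwise `re.search` returns `None` and `.group()` would fail.

```python
import re

paths = ['/home//Desktop/3A5F.py', 'path/sth/R67G.py', 'a/b/c/d/t/6UY7.py']

# [^/]* = a run of non-slash characters; \.py$ = a literal ".py" at the end.
pattern = re.compile(r'[^/]*\.py$')

names = [pattern.search(p).group() for p in paths]
print(names)  # ['3A5F.py', 'R67G.py', '6UY7.py']
```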
You can use this:
import re
pattern = r".*/(.*$)"
mystring = "/home//Desktop/3A5F.py"
print(re.findall(pattern, mystring))  # ['3A5F.py']
You can also use os.path.split(mystring); the second element of the returned (head, tail) tuple is the file name.
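Here is a side-by-side sketch of the two approaches from this answer. The greedy `.*` consumes everything up to the last '/', so the group captures only the file name. `os.path.split` gives the same tail.

```python
import os.path
import re

mystring = "/home//Desktop/3A5F.py"

# Greedy .* runs to the last '/', so the capture group holds only the file name.
print(re.findall(r".*/(.*$)", mystring))  # ['3A5F.py']

# os.path.split returns a (head, tail) tuple; the tail is the file name.
head, tail = os.path.split(mystring)
print(tail)  # 3A5F.py
```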
I'm trying to remove the dot (.) before specific words such as com and org when cleaning text in Python, e.g.
Input: cnnindonesia.com liputan.org
Output: cnnindonesiacom liputanorg
Does anybody have an idea, using regex or iteration? Thank you.
You can use .replace() and a list comprehension; regular expressions aren't necessary here:
data = ["cnnindonesia.com", "liputan.org"]
print([url.replace(".com", "com").replace(".org", "org") for url in data])
Try this (note that it removes every dot in the string, not only those before com or org):
text = "cnnindonesia.com liputan.org"
output = text.replace(".", "")
print(output)  # cnnindonesiacom liputanorg
You can split on the '.' and then join the pieces (again, this removes every dot):
text = "cnnindonesia.com liputan.org"
output = text.split(".")
output = "".join(output)
If you have multiple patterns, re would be useful:
import re
s = "cnnindonesia.com liputan.org example.net twitch.tv"
output = re.sub(r"\.(com|org|net|tv)", r"\1", s)
print(output) # cnnindonesiacom liputanorg examplenet twitchtv
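Here is a variant of that `re.sub` call, offered only as a sketch. It uses a lookahead plus `\b` so that a dot is removed only when `com` or `org` follows it as a whole word. The extra `example.community` input is a hypothetical case showing a dot that is left alone.

```python
import re

s = "cnnindonesia.com liputan.org example.community"

# Remove a dot only when it is directly followed by "com" or "org" as a whole word.
# (?=...) is a lookahead, so the suffix itself is not consumed or rewritten.
output = re.sub(r"\.(?=(?:com|org)\b)", "", s)
print(output)  # cnnindonesiacom liputanorg example.community
```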
How can I get the word example from a string like this:
str = "http://test-example:123/wd/hub"
I wrote something like this,
but it doesn't work correctly if the string looks like
You can use this regex, which uses lookarounds to capture the value preceded by '-' and followed by ':'.
Python code:
import re
str = "http://test-example:123/wd/hub"
print(re.search(r'(?<=-).+(?=:)', str).group())
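`re.search` returns `None` when nothing matches, so calling `.group()` directly will raise on inputs without a '-' ... ':' section. Here is a sketch that guards against that. `word_between` is a hypothetical helper name built on the answer's pattern.

```python
import re

def word_between(text):
    """Return the text between a '-' and a later ':' (greedy, as in the
    answer's pattern), or None if there is no match.

    Hypothetical helper wrapping the lookaround regex from the answer above.
    """
    m = re.search(r'(?<=-).+(?=:)', text)
    return m.group() if m else None

print(word_between("http://test-example:123/wd/hub"))  # example
print(word_between("http://localhost/wd/hub"))          # None
```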
A non-regex way to get the same result is to use two splits:
str = "http://test-example:123/wd/hub"
print(str.split('-')[1].split(':')[0])  # example
You can use the following non-regex approach, since you know example is a 7-letter word:
For an arbitrary word, that would change to:
There are many ways.
Using splitting:
example_str = str.split('-')[-1].split(':')[0]
This is fragile, and could break if there are more hyphens or colons in the string.
Using regex:
import re
pattern = re.compile(r'-(.*):')
example_str = pattern.search(str).group(1)
This still expects a particular format, but is more easily adaptable (if you know how to write regexes).
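Since the input is a URL, another option is to let the standard library parse it first. This is a swapped-in technique, not one from the answers above. The sketch assumes the wanted word is the last hyphen-separated part of the host name.

```python
from urllib.parse import urlsplit

url = "http://test-example:123/wd/hub"

# urlsplit separates scheme, host, port and path; .hostname drops the port.
host = urlsplit(url).hostname  # 'test-example'

# Assumption: the wanted word is the last hyphen-separated part of the host.
word = host.rsplit('-', 1)[-1]
print(word)  # example
```

Hyphens or colons in the path no longer interfere, because only the host name is split.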
I am not sure why you want to get a particular word from a string. I guess you want to check whether this word is present in the given string.
If that is the case, the code below can be used.
import re
str1 = "http://tests-example:123/wd/hub"
matched = re.findall('example', str1)
print(matched)  # ['example']; an empty list means the word is absent
Split on the '-', and then on ':':
s = "http://test-example:123/wd/hub"
print(s.split('-')[1].split(':')[0])  # example
Using re:
import re
text = "http://test-example:123/wd/hub"
m = re.search('(?<=-).+(?=:)', text)
if m:
    print(m.group())  # example
Python strings have a built-in method, find:
It will return:
This is the index of the found substring; if it equals -1, the substring was not found in the string. You can also use the in keyword:
'example' in 'http://test-example:123/wd/hub'
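Here is a short sketch of both checks on the URL from the question. `find` gives the starting index, or -1 when the substring is absent. `in` gives a plain boolean.

```python
url = 'http://test-example:123/wd/hub'

print(url.find('example'))   # 12 (index where the substring starts)
print(url.find('missing'))   # -1 (not found)
print('example' in url)      # True
```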
I have a regex "value=4020a345-f646-4984-a848-3f7f5cb51f21"
if re.search( "value=\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*", x ):
x = re.search( "value=\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*\-\w*|\d*", x )
m = x.group(1)
m only gives me 4020a345; I'm not sure why it does not give me the entire "4020a345-f646-4984-a848-3f7f5cb51f21".
Can anyone tell me what I am doing wrong?
Try this regex; it looks like you are trying to match a GUID:
This should match what you want, if all the strings are of the form you've shown:
You can also use this website to validate your regular expressions:
The regex below works as you expect.
You are trying to match hex numbers, which is why this regex is more correct than using [\w\d]:
pattern = "value=([0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})"
data = "value=4020a345-f646-4984-a848-3f7f5cb51f21"
res = re.search(pattern, data)
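The standard `uuid` module can also validate the value. This is a swapped-in alternative to the regex. Note that `uuid.UUID` is more lenient than the pattern above: it also accepts, for example, the 32-hex-digit form without hyphens.

```python
import uuid

data = "value=4020a345-f646-4984-a848-3f7f5cb51f21"
candidate = data.split("=", 1)[1]

try:
    # uuid.UUID raises ValueError if the string is not a well-formed UUID.
    parsed = uuid.UUID(candidate)
    print(str(parsed))  # 4020a345-f646-4984-a848-3f7f5cb51f21
except ValueError:
    print("not a valid UUID")
```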
If you don't care about validation (that is, checking that it really is valid hex), there is no reason not to use simple string manipulation, as shown below.
>>> data = "value=4020a345-f646-4984-a848-3f7f5cb51f21"
>>> print(data[7:])
>>> # or maybe
>>> print(data[7:].replace('-',''))
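Hard-coded slice offsets are easy to get wrong. This sketch uses `str.removeprefix` instead, which assumes Python 3.9 or later. It strips `value=` only if the string actually starts with it.

```python
data = "value=4020a345-f646-4984-a848-3f7f5cb51f21"

# str.removeprefix (Python 3.9+) strips "value=" only if it is actually there,
# so there is no slice offset to miscount.
value = data.removeprefix("value=")
print(value)                   # 4020a345-f646-4984-a848-3f7f5cb51f21
print(value.replace("-", ""))  # 4020a345f6464984a8483f7f5cb51f21
```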
You can get the subparts of the value as a list:
import re
txt = "value=4020a345-f646-4984-a848-3f7f5cb51f21"
parts = re.findall(r'\w+', txt)[1:]
# parts is ['4020a345', 'f646', '4984', 'a848', '3f7f5cb51f21']
If you really want the entire string:
full = "-".join(parts)
A simpler way:
full = re.findall(r"[\w-]+", txt)[-1]
# full is 4020a345-f646-4984-a848-3f7f5cb51f21
Try this, and grab the capture group. Your regex was not returning the whole value because you used the | operator: once the alternative on the left side of | matches, the regex engine does not try the alternatives on the right.
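This sketch shows the alternation problem on a shortened version of the question's pattern (two alternatives instead of five). It also shows a fix that captures the whole hyphenated value in one group.

```python
import re

x = "value=4020a345-f646-4984-a848-3f7f5cb51f21"

# '|' has the lowest precedence, so this means "value=\w*" OR "\d*-\w*".
# The first alternative already matches at position 0 and stops at the first '-'.
print(re.search(r"value=\w*|\d*-\w*", x).group())  # value=4020a345

# Capturing a run of word characters and hyphens returns the whole value.
m = re.search(r"value=([\w-]+)", x)
print(m.group(1))  # 4020a345-f646-4984-a848-3f7f5cb51f21
```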
I have a string in Python in this format:
How could I trim that down to just 5+5?
I forgot to mention: basically, I just need to find the first non-numeric character after the operator and crop everything off from that point on.
This is a simple regular expression:
import re
s = "5+5.[)]1"
s = re.search("\d+\+\d+", s).group()
print(s) # 5+5
This should work.
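Here is a sketch that generalizes the pattern to the four arithmetic operators and handles inputs that do not match. `leading_expression` is a hypothetical helper name. It uses `re.match`, which anchors at the start of the string, whereas the answer above uses `re.search`.

```python
import re

def leading_expression(s):
    """Return the leading 'number operator number' part of s, or None.

    Hypothetical helper generalizing the answer's pattern to + - * /.
    """
    m = re.match(r"\d+[-+*/]\d+", s)
    return m.group() if m else None

print(leading_expression("5+5.[)]1"))  # 5+5
print(leading_expression("12*34abc"))  # 12*34
print(leading_expression("abc"))       # None
```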
Using re in Python, I would like to return all of the characters in a string that precede the first appearance of an underscore. In addition, I would like the returned string to be in all uppercase and without any non-alphanumeric characters.
For example:
AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2 = TLAV1
I'm pretty sure I know how to uppercase a string with str.upper(), but I'm sure there are several efficient ways to remove the '.'. Any help would be greatly appreciated. I am still learning regular expressions slowly but surely. Each tip gets added to my notes for future use.
To further clarify, my above examples aren't the actual strings. The actual string would look like:
With my desired output looking like:
And the next example would be the same. String:
Desired output:
Again, thanks all for the help!
Even without re:
text.split('_', 1)[0].replace('.', '').upper()
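The question asks to drop *any* non-alphanumeric character, while `.replace('.', '')` removes only dots. This sketch covers the broader requirement. `prefix_code` is a hypothetical name, and "alphanumeric" is assumed to mean ASCII letters and digits.

```python
import re

def prefix_code(text):
    """Uppercase the part before the first '_' and drop non-alphanumerics.

    Hypothetical helper; [^A-Za-z0-9] removes any non-alphanumeric ASCII
    character, not only '.'.
    """
    head = text.split('_', 1)[0]
    return re.sub(r'[^A-Za-z0-9]', '', head).upper()

print(prefix_code('AG.av08_binloop_v6'))  # AGAV08
print(prefix_code('TL.av1_binloopv2'))    # TLAV1
```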
Try this:
re.sub("[^A-Z\d]", "", re.search("^[^_]*", str).group(0).upper())
Since everyone is giving their favorite implementation, here's mine that doesn't use re:
>>> for s in ('AG.av08_binloop_v6', 'TL.av1_binloopv2'):
...     print(''.join(c for c in s.split('_', 1)[0] if c.isalnum()).upper())
...
AGAV08
TLAV1
I put .upper() on the outside of the generator so it is only called once.
You don't have to use re for this. Simple string operations would be enough based on your requirements:
tests = """
AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2 = TLAV1
"""
for t in tests.strip().splitlines():
    print(t[:t.find('_')].replace('.', '').upper())
# Returns:
# AGAV08
# TLAV1
Or if you absolutely must use re:
import re
pat = r'([a-zA-Z0-9.]+)_.*'
pat_re = re.compile(pat)
for t in tests.strip().splitlines():
    print(re.sub(r'\.', '', pat_re.findall(t)[0]).upper())
# Returns:
# AGAV08
# TLAV1
Hey, just for fun, another option for getting the text before the first underscore is:
before_underscore, sep, after_underscore = str.partition('_')
So all in one line could be:
re.sub("[^A-Z\d]", "", str.partition('_')[0].upper())
import re
re.sub("[^A-Z\d]", "", yourstr.split('_',1)[0].upper())