I need to extract only a particular pattern from a string using Python

I need to extract only a particular pattern from a string:
import re
string='/x/eng/wcov/Job148666--rollup_generic/Job148674--ncov_aggregate/Job148678--run_command/Job148678.info: devN_180107_2035'
line2=re.findall(r'(?:/\w*)' ,string)
print(line2)
I'm getting the following output:
['/x', '/eng', '/wcov', '/Job148666', '/Job148674', '/Job148678', '/Job148678']
But the output I actually need is:
/x/eng/wcov/Job148666--rollup_generic/Job148674--ncov_aggregate/Job148678--run_command/Job148678.info

Try using the split() function:
string='/x/eng/wcov/Job148666--rollup_generic/Job148674--ncov_aggregate/Job148678--run_command/Job148678.info: devN_180107_2035'
sp=string.split(':')[0]
print(sp)

Is the part you want always followed by a ':'? Then use this:
string.split(":", 1)[0]
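The original regex approach can also work once the character class is widened: `\w` matches only letters, digits and underscores, so each match in the question stopped at the first `-` or `.`. A minimal sketch, assuming every path component contains only word characters, `-` and `.`:

```python
import re

string = '/x/eng/wcov/Job148666--rollup_generic/Job148674--ncov_aggregate/Job148678--run_command/Job148678.info: devN_180107_2035'

# One or more "/component" groups, where a component may contain word characters, '.' or '-'.
# Matching stops at the ':' because ':' is not in the character class.
match = re.match(r'(?:/[\w.-]+)+', string)
path = match.group() if match else None
print(path)
```

Unlike `findall` with the original pattern, this returns the whole path as a single match instead of one item per component.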

Related

Python Regex - How do I fetch a word after a specific word in a string using python regex?

I need to fetch the value of "repo-name", which is "sonar-repo", from the multi-line commit string below. Can this be achieved with regex? Expected output: sonar-repo
Here is the string I need to read using regex:
commit_message = """repo-name=sonar-repo;repo-title=Sonar;repo-description=A little demo;repo-requester=Jack
"""
You should be able to use a regex that looks for repo-name= and then for the ; right after it, and takes what's in between. Something like this:
(?<=repo-name=).*?(?=;)
I tested it on regex101.
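In Python, the lookaround pattern above can be used directly with `re.search`; a small sketch using the commit string from the question (the `if match` guard is an addition for when the key is missing):

```python
import re

commit_message = """repo-name=sonar-repo;repo-title=Sonar;repo-description=A little demo;repo-requester=Jack
"""

# The lookbehind requires "repo-name=" before the match and the lookahead requires ";" after it.
# Neither is part of the matched text, so group() returns only the value.
match = re.search(r'(?<=repo-name=).*?(?=;)', commit_message)
repo_name = match.group() if match else None
print(repo_name)  # sonar-repo
```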
Try this:
import re
commit_message= 'repo-name=sonar-repo;repo-title=Sonar;repo-description=A little demo;repo-requester=Jack'
print(re.search(r'repo-name=(.*?);', commit_message).group(1))
Output:
sonar-repo
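If more than one field is needed, it may be simpler to skip regex and parse every key=value pair into a dict. This is a sketch under the assumption that fields are separated by ; and that no key or value contains a ; (a piece without = would raise a ValueError):

```python
commit_message = """repo-name=sonar-repo;repo-title=Sonar;repo-description=A little demo;repo-requester=Jack
"""

# strip() removes the trailing newline; split(';') yields "key=value" pieces.
# maxsplit=1 keeps any '=' inside a value intact.
fields = dict(
    pair.split('=', 1)
    for pair in commit_message.strip().split(';')
    if pair  # skip empty pieces, e.g. from a trailing ';'
)
print(fields['repo-name'])       # sonar-repo
print(fields['repo-requester'])  # Jack
```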

Python String .strip() function returning wrong output

I have the following string:
filepath = 'data/imagery/256:0:10.0:34:26:-1478/256:0:10.0:34:26:-1478_B02_10m.tif'
I am trying to get 256:0:10.0:34:26:-1478_B02_10m.tif from the string above,
but if I run
os.path.splitext(filepath.strip('data/imagery/256:0:10.0:34:26:-1478'))[0]
It outputs '_B02_10m'
Same with filepath.rstrip('data/imagery/256:0:10.0:34:26:-1478')
Assuming you want everything after the last /, you can use str.split. It splits your string into a list of substrings on the given separator; then you only need the last item of that list.
string_var.split("/")[-1]
See the official Python documentation for str.split.
Python's strip doesn't remove its argument as a substring; it treats it as a set of characters and removes any of them from both ends of the original string. See: https://docs.python.org/3/library/stdtypes.html#str.strip
EDIT: This doesn't provide a meaningful solution, see accepted answer.
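To make the character-set behaviour of strip concrete, here is a small sketch using the question's path. `str.removeprefix` is a swapped-in alternative not mentioned in the answers and needs Python 3.9 or later; `rsplit('/', 1)` is a variant of the split approach that splits only once, from the right.

```python
filepath = 'data/imagery/256:0:10.0:34:26:-1478/256:0:10.0:34:26:-1478_B02_10m.tif'
prefix = 'data/imagery/256:0:10.0:34:26:-1478/'

# strip() treats its argument as a *set* of characters and removes any of them
# from both ends, so it also eats the second "256:0:..." and stops at "_".
stripped = filepath.strip('data/imagery/256:0:10.0:34:26:-1478')
print(stripped)  # _B02_10m.tif

# removeprefix() removes the exact substring, and only if the string starts with it.
without_prefix = filepath.removeprefix(prefix)
print(without_prefix)  # 256:0:10.0:34:26:-1478_B02_10m.tif

# Taking everything after the last '/' does not require knowing the prefix at all.
last_component = filepath.rsplit('/', 1)[-1]
print(last_component)  # 256:0:10.0:34:26:-1478_B02_10m.tif
```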
Instead of using strip, you should use str.split().
The following code gets you the required substring:
filepath = "data/imagery/256:0:10.0:34:26:-1478/256:0:10.0:34:26:-1478_B02_10m.tif"
print(filepath.split('/')[-1])
Output:
256:0:10.0:34:26:-1478_B02_10m.tif

How to get everything after string x in python

I have a string:
s3://tester/test.pdf
I want to exclude s3://tester/, so that even if I have s3://tester/folder/anotherone/test.pdf, I get the entire path after s3://tester/.
I have tried the split and partition methods, but I can't get it to work.
Currently I am trying:
string.partition('/')[3]
But I get an error saying that the index is out of range.
EDIT: I should have specified that the bucket name will not always be the same, so I want to make sure it only grabs everything after the third '/'.
You can use str.split():
path = 's3://tester/test.pdf'
print(path.split('/', 3)[-1])
Output:
test.pdf
UPDATE: With regex:
import re
path = 's3://tester/test.pdf'
print(re.split('/', path, maxsplit=3)[-1])
Output:
test.pdf
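Another option, not used in the answers here, is `urllib.parse.urlparse`, which puts the bucket in `netloc` and the key in `path` regardless of the bucket's name. A minimal sketch; the helper name `key_after_bucket` is hypothetical, and it assumes keys contain no `?` or `#` (urlparse would treat those as a query or fragment):

```python
from urllib.parse import urlparse

def key_after_bucket(uri):
    # For "s3://bucket/key", urlparse gives netloc="bucket" and path="/key".
    # Drop the leading '/' so only the key remains.
    return urlparse(uri).path.lstrip('/')

print(key_after_bucket('s3://tester/test.pdf'))                    # test.pdf
print(key_after_bucket('s3://tester/folder/anotherone/test.pdf'))  # folder/anotherone/test.pdf
print(key_after_bucket('s3://other-bucket/folder/test.pdf'))       # folder/test.pdf
```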
Have you tried .replace?
You could do:
string = "s3://tester/test.pdf"
string = string.replace("s3://tester/", "")
print(string)
This will replace "s3://tester/" with the empty string ""
Alternatively, you could use .split rather than .partition.
You could also try:
string = "s3://tester/test.pdf"
string = "/".join(string.split("/")[3:])
print(string)
To answer "How to get everything after x amount of characters in python"
string[x:]
PLEASE SEE UPDATE
ORIGINAL
Using the builtin re module.
import re
p = re.search(r'(?<=s3://tester/).+', s).group()
The pattern uses a lookbehind to skip over the part you wish to ignore and matches any and all characters following it until the entire string is consumed, returning the matched group to the p variable for further processing.
This code will work for any length path following the explicit s3://tester/ schema you provided in your question.
UPDATE
Just saw the updates, duh.
Got the wrong end of the stick on this one, my bad.
The re method below should work no matter the S3 bucket name, returning everything after the third / in the string.
p = ''.join(re.findall(r'/[^/]+', s)[1:])[1:]

How to get string before hyphen

I have the following filename:
pagecounts-20150802-000000
I want to extract the date, 20150802, from it.
I am using the code below, but it's not working:
import os
print(os.path.splitext("pagecounts-20150802-000000")[0])
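The methods in os.path are meant for path manipulation: os.path.splitext only splits off a file extension (the part after the last '.'), and this name has none, so [0] returns the whole string unchanged. You want plain string splitting instead:
print('pagecounts-20150802-000000'.split('-')[1])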
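If the date is needed as a value rather than a string, `datetime.strptime` can parse the part after the first hyphen. This sketch assumes the filename always follows the `prefix-YYYYMMDD-HHMMSS` layout shown in the question:

```python
from datetime import datetime

name = 'pagecounts-20150802-000000'

# Everything after the first '-' is "YYYYMMDD-HHMMSS"; parse it into a datetime.
stamp = datetime.strptime(name.split('-', 1)[1], '%Y%m%d-%H%M%S')
print(stamp)         # 2015-08-02 00:00:00
print(stamp.date())  # 2015-08-02
```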

Regex to grab number in line

I am reading the line below from an HTML file. I would like to use a regex to grab only the number that appears after the ':' and before the ','. Thanks in advance!
"totalPages":15,"bloodhoundHtml"
"totalPages":([0-9]*),
The Python code is then:
import re
p = re.compile('"totalPages":([0-9]*),')
print(p.findall('"totalPages":15,"bloodhoundHtml"'))
You can try :\d+, to match ':15,',
then trim the leading ':' and the trailing ',' to get the bare number.
I don't know whether Python supports named groups in a regex (I'm a C# programmer). In C#, I can use :(?<id>\d+), to match this string and get the number directly with result.Groups["id"].Value.
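Python does support named groups, using the `(?P<name>...)` syntax rather than C#'s `(?<name>...)`. A small sketch of the same idea on the line from the question:

```python
import re

line = '"totalPages":15,"bloodhoundHtml"'

# (?P<id>...) is Python's named-group syntax; group('id') returns just the digits.
match = re.search(r':(?P<id>\d+),', line)
total_pages = int(match.group('id')) if match else None
print(total_pages)  # 15
```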
:\d{1,},
This also works for parsing the line you gave. Be aware, though, that parsing HTML with regular expressions in general can run into trouble.
