How to get string before hyphen - python

I have below filename:
pagecounts-20150802-000000
I want to extract the date, 20150802, from it.
I am using the code below, but it's not working:
print os.path.splitext("pagecounts-20150802-000000")[0]

The methods in os.path are mainly used for path string manipulation. You want to use string splitting:
print 'pagecounts-20150802-000000'.split('-')[1]
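If the date is needed as a real date rather than a string, the pieces of the split can be unpacked and parsed. This is a minimal sketch, assuming the filename always has the three-part prefix-date-time shape shown in the question; the variable names are illustrative.

```python
from datetime import datetime

filename = "pagecounts-20150802-000000"

# Unpacking raises ValueError if the name does not have exactly three parts.
prefix, date_part, time_part = filename.split("-")

# Optional: turn the YYYYMMDD text into a date object.
parsed_date = datetime.strptime(date_part, "%Y%m%d").date()

print(date_part)    # 20150802
print(parsed_date)  # 2015-08-02
```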

Related

How to get everything after string x in python

I have a string:
s3://tester/test.pdf
I want to exclude s3://tester/, so that even if I have s3://tester/folder/anotherone/test.pdf I get the entire path after s3://tester/
I have attempted to use the split & partition method but I can't seem to get it.
Currently am trying:
string.partition('/')[3]
But I get an error saying that the index is out of range.
EDIT: I should have specified that the name of the bucket will not always be the same so I want to make sure that it is only grabbing anything after the 3rd '/'.
You can use str.split():
path = 's3://tester/test.pdf'
print(path.split('/', 3)[-1])
Output:
test.pdf
UPDATE: With regex:
import re
path = 's3://tester/test.pdf'
print(re.split('/',path,3)[-1])
Output:
test.pdf
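As for the IndexError in the question: str.partition always returns a 3-tuple (the text before the separator, the separator itself, and the rest), so index 3 never exists. A short sketch contrasting it with split and a maxsplit argument, using the longer path from the question:

```python
path = "s3://tester/folder/anotherone/test.pdf"

# partition splits once, on the first "/", and always yields three items.
head, sep, tail = path.partition("/")
print((head, sep, tail))  # ('s3:', '/', '/tester/folder/anotherone/test.pdf')

# split with maxsplit=3 cuts at the first three "/" and leaves the rest intact.
key = path.split("/", 3)[-1]
print(key)  # folder/anotherone/test.pdf
```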
Have you tried .replace?
You could do:
string = "s3://tester/test.pdf"
string = string.replace("s3://tester/", "")
print(string)
This will replace "s3://tester/" with the empty string ""
Alternatively, you could use .split rather than .partition
You could also try:
string = "s3://tester/test.pdf"
string = "/".join(string.split("/")[3:])
print(string)
To answer "How to get everything after x amount of characters in python"
string[x:]
PLEASE SEE UPDATE
ORIGINAL
Using the builtin re module.
p = re.search(r'(?<=s3:\/\/tester\/).+', s).group()
The pattern uses a lookbehind to skip over the part you wish to ignore and matches any and all characters following it until the entire string is consumed, returning the matched group to the p variable for further processing.
This code will work for any length path following the explicit s3://tester/ schema you provided in your question.
UPDATE
Just saw updates duh.
Got the wrong end of the stick on this one, my bad.
The re method below should work regardless of the bucket name, returning everything after the third / in the string.
p = ''.join(re.findall(r'\/[^\/]+', s)[1:])[1:]
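A different technique from the ones above, offered only as a sketch: the standard library's urllib.parse.urlsplit treats the bucket as the network location and everything after it as the path, so no slash counting is needed. This assumes Python 3 and a well-formed s3://bucket/key string; split_s3_uri is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def split_s3_uri(uri):
    """Return (bucket, key) for a string shaped like s3://bucket/key."""
    parts = urlsplit(uri)
    # netloc is the bucket; path carries a leading "/" that we drop.
    return parts.netloc, parts.path.lstrip("/")

bucket, key = split_s3_uri("s3://tester/folder/anotherone/test.pdf")
print(bucket)  # tester
print(key)     # folder/anotherone/test.pdf
```

Keys that contain ? or # would be split off as a query or fragment, so this only suits simple keys.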

I need to extract only particular pattern from a string using python

As i need to extract only particular pattern from string:
import re
string='/x/eng/wcov/Job148666--rollup_generic/Job148674--ncov_aggregate/Job148678--run_command/Job148678.info: devN_180107_2035'
line2=re.findall(r'(?:/\w*)' ,string)
print(line2)
I'm getting output as below:
['/x', '/eng', '/wcov', '/Job148666', '/Job148674', '/Job148678', '/Job148678']
But actual output i required is:
/x/eng/wcov/Job148666--rollup_generic/Job148674--ncov_aggregate/Job148678--run_command/Job148678.info
Try using split() function
string='/x/eng/wcov/Job148666--rollup_generic/Job148674--ncov_aggregate/Job148678--run_command/Job148678.info: devN_180107_2035'
sp=string.split(':')[0]
Is the path always followed by a colon? Then use this:
string.split(":", 1)[0]
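If a regex is still preferred, note that the pattern in the question stops early because \w matches neither - nor . in the path. Here is a sketch that widens the character class, assuming path components contain only word characters, dots and hyphens:

```python
import re

string = '/x/eng/wcov/Job148666--rollup_generic/Job148674--ncov_aggregate/Job148678--run_command/Job148678.info: devN_180107_2035'

# One or more "/component" groups; matching stops at the first character
# outside the class, which here is the colon.
match = re.match(r'(?:/[\w.-]+)+', string)
path = match.group() if match else None
print(path)
```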

In my date time value I want to use regex to strip out the slash and colon from time and replace it with underscore

I am using Python, Webdriver for my automated test. My scenario is on the Admin page of our website I click Add project button and i enter a project name.
Project Name I enter is in the format of LADEMO_IE_05/20/1515:11:38
It is a date and time at the end.
Using a regex, I would like to find the / and : characters
and replace them with an underscore _
I have worked out the regex expression:
[0-9]{2}[/][0-9]{2}[/][0-9]{4}:[0-9]{2}[:][0-9]{2}
This finds 2 digits then / followed by 2 digits then / and so on.
I would like to replace / and : with _.
Can I do this in Python using import re? I need some help with the syntax please.
My method which returns the date is:
def get_datetime_now(self):
    dateTime_now = datetime.datetime.now().strftime("%x%X")
    print dateTime_now #prints e.g. 05/20/1515:11:38
    return dateTime_now
My code snippet for entering the project name into the text field is:
project_name_textfield.send_keys('LADEMO_IE_' + self.get_datetime_now())
The Output is e.g.
LADEMO_IE_05/20/1515:11:38
I would like the Output to be:
LADEMO_IE_05_20_1515_11_38
Just format the datetime using strftime() into the desired format:
>>> datetime.datetime.now().strftime("%m_%d_%y%H_%M_%S")
'05_20_1517_20_16'
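Applied to the method from the question, this might look like the sketch below. It is written as a standalone function with an optional `now` parameter, which is an assumption added here so the result can be checked against a fixed time.

```python
import datetime

def get_datetime_now(now=None):
    # `now` is a hypothetical parameter; by default the current time is used.
    now = now or datetime.datetime.now()
    return now.strftime("%m_%d_%y%H_%M_%S")

fixed = datetime.datetime(2015, 5, 20, 15, 11, 38)
project_name = "LADEMO_IE_" + get_datetime_now(fixed)
print(project_name)  # LADEMO_IE_05_20_1515_11_38
```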
Another simple option is just using string replace :
s = "your time string"
s = s.replace("/", "_").replace(":", "_")
Two ways:
i) use strftime with the format (note that this variant also puts an underscore between the date and the time):
strftime("%m_%d_%y_%H_%M_%S")
ii) simply use the replace() method of strings to replace '/' and ':' with '_'
Basically, you want to replace every unwanted character with an underscore. Instead of a regex, you could simply use the str.replace method. For example:
out_string = in_string.replace('/', '_').replace(':', '_')
In this example, the first replace returns a string with all the slashes replaced, and the second call replaces the colons. I think it's the simplest way to replace one or two characters. But if you want your program to be able to evolve, I advise using re.sub, as follows:
# first we compile the regex, for speed's sake
# this regex matches any one of the bad characters, and it's modular: just add another alternative if needed
bad_characters = re.compile(r'/|:')
# your code
# replacement
out_string = re.sub(bad_characters, '_', in_string)
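For a concrete check against the project name in the question, here is a minimal sketch that uses a character class, which for single characters is equivalent to the alternation above:

```python
import re

# Any one of the characters inside the brackets counts as "bad".
bad_characters = re.compile(r"[/:]")

in_string = "LADEMO_IE_05/20/1515:11:38"
out_string = bad_characters.sub("_", in_string)
print(out_string)  # LADEMO_IE_05_20_1515_11_38
```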

Python replace with re-using unknown strings

I have an XML in which I'd like to rename one of the tag groups like this:
<string>ABC</string>
<string>unknown string</string>
should be
<xyz>ABC</xyz>
<xyz>unknown string</xyz>
ABC is always the same, so that's no issue. However, "unknown string" is always different, but since I need this information extracted, I also want to keep the same string in the replacement.
Here's what I got so far:
import re
#open the xml file for reading:
file = open('path/file','r+')
#convert to string:
data = file.read()
file.write(re.sub("<string>ABC</string>(\s+)<string>(.*)</string>","<xyz>ABC</xyz>[\1]<xyz>[\2]</xyz>",data))
print (data)
file.close()
I tried to use capture groups, but didn't do it correctly. The string is replaced with weird symbols in my XML. Plus, it's printed twice. I have both the unchanged and the changed version in my XML, which I don't want.
The problem you're experiencing is not due to your regex pattern. The backslashes (\) in your non-raw strings escape the characters that follow them, so \1 and \2 become control characters instead of group references; those are the weird symbols you see.
>>> print "hello\1world"
helloworld
>>> print r"hello\1world"
hello\1world
Always use the raw string notation to define your re patterns.
>>> data = """
... <string>ABC</string>
... <string>unknown string</string>
... """
>>> print re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data)
<xyz>ABC</xyz>
<xyz>unknown string</xyz>
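The duplicated content in the file is a separate issue: after file.read() the position is at the end of the file, so file.write() appends the new text instead of replacing the old. Below is a sketch of one way to overwrite in place, assuming the file is small enough to read into memory; rename_tags and the temporary demo file are illustrative only.

```python
import os
import re
import tempfile

PATTERN = re.compile(r"<string>ABC</string>(\s+)<string>(.*)</string>")

def rename_tags(path):
    """Rewrite the file at `path` in place and return the new text."""
    with open(path, "r+") as handle:
        data = handle.read()
        updated = PATTERN.sub(r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data)
        handle.seek(0)     # go back to the start before writing
        handle.write(updated)
        handle.truncate()  # drop leftover bytes if the new text is shorter
    return updated

# Demo on a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
    tmp.write("<string>ABC</string>\n<string>unknown string</string>\n")

rename_tags(tmp.name)
with open(tmp.name) as handle:
    on_disk = handle.read()
os.remove(tmp.name)
print(on_disk)
```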
Why are you including the content in your replacement operation? All you need to do is:
Replace <string> by <xyz>.
Replace </string> by </xyz>.
It would take two operations but the intent of your code would be clear and you don't need to know what unknown string is.
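A minimal sketch of those two operations on sample data. Note the assumption: this renames every <string> element in the text, not only the pair that follows ABC.

```python
data = "<string>ABC</string>\n<string>unknown string</string>\n"

# Rename opening tags first, then closing tags; the content is left untouched.
renamed = data.replace("<string>", "<xyz>").replace("</string>", "</xyz>")
print(renamed)
```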

Compare & manipulate strings with python

I've written an XML parser in Python and have just added functionality to read a further script from a different directory.
I've got two args, first is the path where I'm parsing XML. Second is a string in another XML file which I want to match with the first path;
arg1 = \work\parser\main\tools\app\shared\xml\calculators\2012\example\calculator
path = calculators/2012/example/calculator
How can I compare the two strings to match identify that they're both referencing the same thing and also, how can I strip calculator from either string so I can store that & use it?
edit
Just had a thought. I have used a Regex to get the year out of the path already with year = re.findall(r"\.(\d{4})\.", path) following a problem Python has with numbers when converting the path to an import statement.
I could obviously split the strings and use a regex to match the path as a pattern in arg1 but this seems a long way round. Surely there's a better method?
Here I am assuming you are actually talking about strings, and not file paths, for which @mgilson's suggestion is better.
How can I compare the two strings to match identify that they're both
referencing the same thing
Well first you need to identify what you mean by "the same thing"
At first glance it seems that if the first string ends with the second string with its slashes reversed, you have a match.
arg1 = r'\work\parser\main\tools\app\shared\xml\calculators\2012\example\calculator'
arg2 = r'calculators/2012/example/calculator'
>>> arg1.endswith(arg2.replace('/','\\'))
True
and also, how can I strip calculator from
either string so I can store that & use it?
You also need to decide if you want to strip the first calculator, the last calculator or any occurrence of calculator in the string.
If you just want to remove the last string after the separator, then it's simply:
>>> arg2.split('/')[-1]
'calculator'
Now to get the original string back, without the last bit:
>>> '/'.join(arg2.split('/')[:-1])
'calculators/2012/example'
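Both pieces can also be obtained in one step with str.rsplit, which splits from the right. A small sketch, assuming the string contains at least one /:

```python
arg2 = 'calculators/2012/example/calculator'

# maxsplit=1 from the right separates the last component from everything before it.
head, tail = arg2.rsplit('/', 1)
print(head)  # calculators/2012/example
print(tail)  # calculator
```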
check out os.path.samefile:
http://docs.python.org/library/os.path.html#os.path.samefile
and os.path.dirname:
http://docs.python.org/library/os.path.html#os.path.dirname
or maybe os.path.basename (I'm not sure what part of the string you want to keep).
Here, try this:
# raw string, so that \t, \a, \x and \201 are not treated as escape sequences
arg1 = r"\work\parser\main\tools\app\shared\xml\calculators\2012\example\calculator"
path = "calculators/2012/example/calculator"
# normalise both strings to backslashes before comparing
arg1 = arg1.replace("/", "\\")
path = path.replace("/", "\\")
if arg1.endswith(path) or path.endswith(arg1):
    print("Match")
That should work for your needs. Cheers :)
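A plain endswith can also match in the middle of a component (for example a directory named mycalculators). Here is a sketch that compares whole path components with pathlib's pure path classes instead, assuming Python 3 and that arg1 uses Windows separators while path uses forward slashes; refers_to_same_tail is a hypothetical helper.

```python
from pathlib import PurePosixPath, PureWindowsPath

def refers_to_same_tail(windows_path, relative_path):
    """True if the components of relative_path are the last components of windows_path."""
    full = PureWindowsPath(windows_path).parts
    tail = PurePosixPath(relative_path).parts
    return bool(tail) and len(tail) <= len(full) and full[-len(tail):] == tail

arg1 = r"\work\parser\main\tools\app\shared\xml\calculators\2012\example\calculator"
path = "calculators/2012/example/calculator"

print(refers_to_same_tail(arg1, path))  # True
print(PurePosixPath(path).name)         # calculator
print(PurePosixPath(path).parent)       # calculators/2012/example
```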
