How to get everything after string x in python - python

I have a string:
s3://tester/test.pdf
I want to exclude s3://tester/ so even if i have s3://tester/folder/anotherone/test.pdf I am getting the entire path after s3://tester/
I have attempted to use the split & partition method but I can't seem to get it.
Currently am trying:
string.partition('/')[3]
But i get an error saying that it out of index.
EDIT: I should have specified that the name of the bucket will not always be the same so I want to make sure that it is only grabbing anything after the 3rd '/'.

You can use str.split():
path = 's3://tester/test.pdf'
print(path.split('/', 3)[-1])
Output:
test.pdf
UPDATE: With regex:
import re
path = 's3://tester/test.pdf'
print(re.split('/',path,3)[-1])
Output:
test.pdf

Have you tried .replace?
You could do:
string = "s3://tester/test.pdf"
string = string.replace("s3://tester/", "")
print(string)
This will replace "s3://tester/" with the empty string ""
Alternatively, you could use .split rather than .partition
You could also try:
string = "s3://tester/test.pdf"
string = "/".join(string.split("/")[3:])
print(string)

To answer "How to get everything after x amount of characters in python"
string[x:]

PLEASE SEE UPDATE
ORIGINAL
Using the builtin re module.
p = re.search(r'(?<=s3:\/\/tester\/).+', s).group()
The pattern uses a lookbehind to skip over the part you wish to ignore and matches any and all characters following it until the entire string is consumed, returning the matched group to the p variable for further processing.
This code will work for any length path following the explicit s3://tester/ schema you provided in your question.
UPDATE
Just saw updates duh.
Got the wrong end of the stick on this one, my bad.
Below re method should work no matter S3 variable, returning all after third / in string.
p = ''.join(re.findall(r'\/[^\/]+', s)[1:])[1:]

Related

Python String .strip() function returning wrong output

I have the following string
'file path = data/imagery/256:0:10.0:34:26:-1478/256:0:10.0:34:26:-1478_B02_10m.tif'
I am trying to get 256:0:10.0:34:26:-1478_B02_10m.tif from the string above
but if I run
os.path.splitext(filepath.strip('data/imagery/256:0:10.0:34:26:-1478'))[0]
It outputs '_B02_10m'
Same with filepath.rstrip('data/imagery/256:0:10.0:34:26:-1478')
Assuming you want all the string data after the / you can always use string.split. This spits your string into a list of strings split on the split string. Then you would only need the final item of this list.
string_var.split("/")[:-1]
See more official python docs on string.split here.
Python's strip doesn't strip the string in the argument but uses it as a list of characters to remove from the original string see: https://docs.python.org/3/library/stdtypes.html#str.strip
EDIT: This doesn't provide a meaningful solution, see accepted answer.
Instead of using strip you should use string.split()
Following piece of code gets you the required substring:
filepath = "data/imagery/256:0:10.0:34:26:-1478/256:0:10.0:34:26:-1478_B02_10m.tif"
print(filepath.split('/')[-1])
Output:
256:0:10.0:34:26:-1478_B02_10m.tif

Is there a way to strip the end of a string until a certain character is reached?

I'm working on a side project for myself and have stumbled on an issue that I'm not sure how to solve for. I have a url, for arguments sake let's say https://stackoverflow.com/xyz/abc. I'm attempting to strip the the end of the url so that I am only left with https://stackoverflow.com/xyz/.
Initially I tried to use the strip function and specify a length/position to remove up to, but realized for other url's I'm working with, it is not the same length. (i.e. URL 1 = /xyz/abc, URL 2 = /xyz/abcd))
Is there any advice for achieving this, I looked into using the regular expression operations in Python, but was unsure how to apply it to this use case. Ideally I would like to write a function that would start from the end of the string and strip away all characters till the first '/' is reached. Any advice would be appreciated.
Thanks
Why not just use rfind, which starts from the end?
>>> string = 'https://stackoverflow.com/xyz/abc'
>>> string = string[:string.rfind('/')+1]
>>> print(string)
'https://stackoverflow.com/xyz/'
And if you don't want the character either (the / in this case), simply remove the +1.
Keep in mind however that this only works if the string actually contains the character you are looking for.
If you want to protect against this, you will have to use the following:
string = 'https://stackoverflow.com/xyz/abc'
idx = string.rfind('/')
if(idx != -1):
string = string[:idx+1]
Unless, obviously, you do want to end up with an empty string in case the character is not found.
Then the first example works just fine.
if yo dont want to use regex, you can combine both the split and join().
lol = 'https://stackoverflow.com/xyz/abc'
splt= lol.split('/')[:-1]
'/'.join(splt)
output
'https://stackoverflow.com/xyz'

In my date time value I want to use regex to strip out the slash and colon from time and replace it with underscore

I am using Python, Webdriver for my automated test. My scenario is on the Admin page of our website I click Add project button and i enter a project name.
Project Name I enter is in the format of LADEMO_IE_05/20/1515:11:38
It is a date and time at the end.
What I would like to do is using a regex I would like to find the / and :
and replace them with an underscore _
I have worked out the regex expression:
[0-9]{2}[/][0-9]{2}[/][0-9]{4}:[0-9]{2}[:][0-9]{2}
This finds 2 digits then / followed by 2 digits then / and so on.
I would like to replace / and : with _.
Can I do this in Python using import re? I need some help with the syntax please.
My method which returns the date is:
def get_datetime_now(self):
dateTime_now = datetime.datetime.now().strftime("%x%X")
print dateTime_now #prints e.g. 05/20/1515:11:38
return dateTime_now
My code snippet for entering the project name into the text field is:
project_name_textfield.send_keys('LADEMO_IE_' + self.get_datetime_now())
The Output is e.g.
LADEMO_IE_05/20/1515:11:38
I would like the Output to be:
LADEMO_IE_05_20_1515_11_38
Just format the datetime using strftime() into the desired format:
>>> datetime.datetime.now().strftime("%m_%d_%y%H_%M_%S")
'05_20_1517_20_16'
Another simple option is just using string replace :
s = "your time string"
s = s.replace("/", "_").replace(":", "_")
Two ways:
i) use strftime with the format:
strftime("%m_%d_%y_%H_%M_%S")
ii) simply use replace() method of strings to replace '/' and ':' to '_'
Basically, you want ton replace every unadvised character by an underscore. To do it, instead of using regex, you could simply use the str.replace method. For example:
out_string = in_string.replace('/', '_').replace(':', '_')
In this example, the first replace returns a string with all the slash replaced, and the second call replace the colons. I think it's the simplest way for replacing one or two characters. But, if you want your program to be able to evolve, I advise you using re.sub, as follows:
# first we compile the regex, for speed sake
# this regex match every one of the bad characters, and it's modular: just add one, in case
bad_characters = re.compile(r'/|:')
# your code
# replacement
out_string = re.sub(bad_characters, '_', in_string)

Python regex to extract substring at start and end of string

I am looking for a regex that will extract everything up to the first . (period) in a string, and everything including and after the last . (period)
For example:
my_file.10.4.5.6.csv
myfile2.56.3.9.txt
Ideally the regex when run against these strings would return:
my_file.csv
myfile2.txt
The numeric stamp in the file will be different each time the script is run, so I am looking essentially to exclude it.
The following prints out the string up to the first . (period)
print re.search("^[^.]*", data_file).group(0)
I am having trouble though getting it to also return the the last period and string after it.
Sorry just to update this based upon feedback and comments below:
This does need to be a regex. The regex will be passed into the program from a configuration file. The user will not have access to the source code as it will be packaged.
The user may need to change the regex based upon some arbitrary criteria, so they will need to update the config file, rather than edit the application and re-build the package.
Thanks
You don’t need a regular expression!
parts = data_file.split(".")
print parts[0] + "." + parts[-1]
Instead of regular expressions, I would suggest using str.split. For example:
>>> data_file = 'my_file.10.4.5.6.csv'
>>> parts = data_file.split('.')
>>> print parts[0] + '.' + parts[-1]
my_file.csv
However if you insist on regular expressions, here is one approach:
>>> print re.sub(r'\..*\.', '.', data_file)
my_file.csv
You don't need a regex.
tokens = expanded_name.split('.')
compressed_name = '.'.join((tokens[0], tokens[-1]))
If you are concerned about performance, you could use a length limit and rsplit() to only chop up the string as much as you need.
compressed_name = expanded_name.split('.', 1)[0] + '.' + expanded_name.rsplit('.', 1)[1]
Do you need a regex here?
>>> address = "my_file.10.4.5.6.csv"
>>> split_by_periods = address.split(".")
>>> "{}.{}".format(address[0], address[-1])
>>> "my_file.csv"

Print raw string from variable? (not getting the answers)

I'm trying to find a way to print a string in raw form from a variable. For instance, if I add an environment variable to Windows for a path, which might look like 'C:\\Windows\Users\alexb\', I know I can do:
print(r'C:\\Windows\Users\alexb\')
But I cant put an r in front of a variable.... for instance:
test = 'C:\\Windows\Users\alexb\'
print(rtest)
Clearly would just try to print rtest.
I also know there's
test = 'C:\\Windows\Users\alexb\'
print(repr(test))
But this returns 'C:\\Windows\\Users\x07lexb'
as does
test = 'C:\\Windows\Users\alexb\'
print(test.encode('string-escape'))
So I'm wondering if there's any elegant way to make a variable holding that path print RAW, still using test? It would be nice if it was just
print(raw(test))
But its not
I had a similar problem and stumbled upon this question, and know thanks to Nick Olson-Harris' answer that the solution lies with changing the string.
Two ways of solving it:
Get the path you want using native python functions, e.g.:
test = os.getcwd() # In case the path in question is your current directory
print(repr(test))
This makes it platform independent and it now works with .encode. If this is an option for you, it's the more elegant solution.
If your string is not a path, define it in a way compatible with python strings, in this case by escaping your backslashes:
test = 'C:\\Windows\\Users\\alexb\\'
print(repr(test))
In general, to make a raw string out of a string variable, I use this:
string = "C:\\Windows\Users\alexb"
raw_string = r"{}".format(string)
output:
'C:\\\\Windows\\Users\\alexb'
You can't turn an existing string "raw". The r prefix on literals is understood by the parser; it tells it to ignore escape sequences in the string. However, once a string literal has been parsed, there's no difference between a raw string and a "regular" one. If you have a string that contains a newline, for instance, there's no way to tell at runtime whether that newline came from the escape sequence \n, from a literal newline in a triple-quoted string (perhaps even a raw one!), from calling chr(10), by reading it from a file, or whatever else you might be able to come up with. The actual string object constructed from any of those methods looks the same.
I know i'm too late for the answer but for people reading this I found a much easier way for doing it
myVariable = 'This string is supposed to be raw \'
print(r'%s' %myVariable)
try this. Based on what type of output you want. sometime you may not need single quote around printed string.
test = "qweqwe\n1212as\t121\\2asas"
print(repr(test)) # output: 'qweqwe\n1212as\t121\\2asas'
print( repr(test).strip("'")) # output: qweqwe\n1212as\t121\\2asas
Get rid of the escape characters before storing or manipulating the raw string:
You could change any backslashes of the path '\' to forward slashes '/' before storing them in a variable. The forward slashes don't need to be escaped:
>>> mypath = os.getcwd().replace('\\','/')
>>> os.path.exists(mypath)
True
>>>
Just simply use r'string'. Hope this will help you as I see you haven't got your expected answer yet:
test = 'C:\\Windows\Users\alexb\'
rawtest = r'%s' %test
I have my variable assigned to big complex pattern string for using with re module and it is concatenated with few other strings and in the end I want to print it then copy and check on regex101.com.
But when I print it in the interactive mode I get double slash - '\\w'
as #Jimmynoarms said:
The Solution for python 3x:
print(r'%s' % your_variable_pattern_str)
Your particular string won't work as typed because of the escape characters at the end \", won't allow it to close on the quotation.
Maybe I'm just wrong on that one because I'm still very new to python so if so please correct me but, changing it slightly to adjust for that, the repr() function will do the job of reproducing any string stored in a variable as a raw string.
You can do it two ways:
>>>print("C:\\Windows\Users\alexb\\")
C:\Windows\Users\alexb\
>>>print(r"C:\\Windows\Users\alexb\\")
C:\\Windows\Users\alexb\\
Store it in a variable:
test = "C:\\Windows\Users\alexb\\"
Use repr():
>>>print(repr(test))
'C:\\Windows\Users\alexb\\'
or string replacement with %r
print("%r" %test)
'C:\\Windows\Users\alexb\\'
The string will be reproduced with single quotes though so you would need to strip those off afterwards.
To turn a variable to raw str, just use
rf"{var}"
r is raw and f is f-str; put them together and boom it works.
Replace back-slash with forward-slash using one of the below:
re.sub(r"\", "/", x)
re.sub(r"\", "/", x)
This does the trick
>>> repr(string)[1:-1]
Here is the proof
>>> repr("\n")[1:-1] == r"\n"
True
And it can be easily extrapolated into a function if need be
>>> raw = lambda string: repr(string)[1:-1]
>>> raw("\n")
'\\n'
i wrote a small function.. but works for me
def conv(strng):
k=strng
k=k.replace('\a','\\a')
k=k.replace('\b','\\b')
k=k.replace('\f','\\f')
k=k.replace('\n','\\n')
k=k.replace('\r','\\r')
k=k.replace('\t','\\t')
k=k.replace('\v','\\v')
return k
Here is a straightforward solution.
address = 'C:\Windows\Users\local'
directory ="r'"+ address +"'"
print(directory)
"r'C:\\Windows\\Users\\local'"

Categories