I can't properly use the replace command in Python - python

I'm new to the platform.
I have a question about the use of the replace command.
Here is my code and what I need to do:
I need to remove a substring from a string.
For example:
from Hitacoworld to Hiworld.
I start from a dictionary that holds the string and the substrings, and I don't know how to get the string without the substring part. I use the replace command this way:
ntdna = ''
ntdna += string.replace(seqs[element],'',)
Here 'string' is my string and 'seqs[element]' holds the substrings that I want to remove from it. When I look at the outcome, though, I'm not getting the right string, so I think the problem is in how I use replace. Any hint, or something else I could use? Thanks in advance, I'm lost.
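One likely cause, assuming `seqs` is a dictionary whose values are the substrings to remove and the snippet runs inside a loop over its keys: `str.replace` returns a new string and leaves `string` untouched, so each pass appends another almost complete copy of the original to `ntdna`. A minimal sketch that reassigns the result instead (the function name `remove_substrings` and the sample dictionary are made up for illustration):

```python
def remove_substrings(text, seqs):
    """Return `text` with every substring stored in `seqs` removed."""
    result = text
    for sub in seqs.values():
        # replace() returns a new string, so the result must be reassigned
        result = result.replace(sub, '')
    return result


# Hypothetical data shaped like the question's dictionary
seqs = {'first': 'taco'}
print(remove_substrings('Hitacoworld', seqs))  # Hiworld
```

Carrying the partially cleaned string from one iteration to the next is what keeps the removals cumulative.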

Related

Python regex. Get the last word from a sequence

I have a line like this:
jsdata="l7Bhpb;_;CJWKh4 cECq7c;_;CJWKiA" data-ved="2ahUKEwjxq7L29Yr7AhWM7qQKHRABDVEQ2esEegQIGxAE">
I need to get the word CJWKiA.
But I don't understand how to write it in the regex language.
My failed attempt:
jsdata=\".+?;.+?\"
This returns the entire string, including the word I need :(
I don't understand how to get only the word CJWKiA. I need a pattern something like this:
jsdata=\"l7Bhpb;_;CJWKh4 cECq7c;_;(CJWKiA)\"
The words may differ; I only need to get the last one.
/jsdata="[^"]*;([^;"]*)"/gm
This works because an attribute value can't contain double quotes, so [^"]* never runs past the closing quote.
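For reference, the same pattern can be tried from Python's `re` module by dropping the `/.../gm` delimiters. This is only a sketch; the helper name `last_jsdata_word` is an assumption for illustration.

```python
import re

line = ('jsdata="l7Bhpb;_;CJWKh4 cECq7c;_;CJWKiA" '
        'data-ved="2ahUKEwjxq7L29Yr7AhWM7qQKHRABDVEQ2esEegQIGxAE">')


def last_jsdata_word(text):
    """Return the text after the last ';' inside the jsdata attribute."""
    # [^"]* is greedy, so it backtracks to the LAST ';' before the closing quote
    match = re.search(r'jsdata="[^"]*;([^;"]*)"', text)
    return match.group(1) if match else None


print(last_jsdata_word(line))  # CJWKiA
```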

Python String .strip() function returning wrong output

I have the following string
'file path = data/imagery/256:0:10.0:34:26:-1478/256:0:10.0:34:26:-1478_B02_10m.tif'
I am trying to get 256:0:10.0:34:26:-1478_B02_10m.tif from the string above
but if I run
os.path.splitext(filepath.strip('data/imagery/256:0:10.0:34:26:-1478'))[0]
It outputs '_B02_10m'
Same with filepath.rstrip('data/imagery/256:0:10.0:34:26:-1478')
Assuming you want all the string data after the last /, you can always use str.split. This splits your string into a list of strings, using the separator you pass in. Then you only need the final item of this list.
string_var.split("/")[-1]
See the official Python docs on str.split for more.
Python's strip doesn't remove the string given as its argument; it treats the argument as a set of characters to remove from both ends of the original string. See: https://docs.python.org/3/library/stdtypes.html#str.strip
EDIT: This doesn't provide a meaningful solution, see accepted answer.
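To illustrate the character-set behaviour of `strip` on the question's own string, here is a small sketch. The variable names `prefix`, `stripped` and `remainder` are assumptions for illustration; on Python 3.9+ `str.removeprefix` does the same job as the slicing shown here.

```python
filepath = 'data/imagery/256:0:10.0:34:26:-1478/256:0:10.0:34:26:-1478_B02_10m.tif'
prefix = 'data/imagery/256:0:10.0:34:26:-1478'

# strip() removes any leading/trailing character that occurs in `prefix`,
# so it keeps eating until it meets '_' (which is not in the set).
stripped = filepath.strip(prefix)
print(stripped)

# Removing a literal prefix: check for it, then slice it (and the '/') off.
full_prefix = prefix + '/'
remainder = filepath[len(full_prefix):] if filepath.startswith(full_prefix) else filepath
print(remainder)
```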
Instead of using strip, you should use str.split().
The following piece of code gets you the required substring:
filepath = "data/imagery/256:0:10.0:34:26:-1478/256:0:10.0:34:26:-1478_B02_10m.tif"
print(filepath.split('/')[-1])
Output:
256:0:10.0:34:26:-1478_B02_10m.tif

Regex to remove strings from list that do not match given prefix

I have a string that includes multiple comma-separated lists of values, always embedded between <mks:Field name="MyField"> and </mks:Field>.
For example:
<mks:Field name="MyField">X001_ABC</mks:Field><mks:Field name="AnotherField">X002_XYZ</mks:Field><mks:Field name="MyField"></mks:Field><mks:Field name="MyField">X000_Test1,X000_Test2</mks:Field><mks:Field name="MyField">X001_ABC,X000_Test1</mks:Field><mks:Field name="MyField">X000_Test1,X000_Test2,X002_XYZ</mks:Field>
In this example I have the following values to work with:
X001_ABC
(empty)
X000_Test1,X000_Test2
X001_ABC,X000_Test1
X000_Test1,X000_Test2,X002_XYZ
Now I want to remove all the values that do not start with the prefix "X000_", including any needless commas, so that my result looks like this:
<mks:Field name="MyField"></mks:Field><mks:Field name="AnotherField">X002_XYZ</mks:Field><mks:Field name="MyField"></mks:Field><mks:Field name="MyField">X000_Test1,X000_Test2</mks:Field><mks:Field name="MyField">X000_Test1</mks:Field><mks:Field name="MyField">X000_Test1,X000_Test2</mks:Field>
I have tried the following regex, but it does not work properly when a field contains only a single value that does not match. I also do not want to have to change my regex when a new value matching my prefix is introduced (e.g. X000_Test3).
Search: (?<=name="MyField">)[^<>](?:.*?(X000_Test1,X000_Test2|X000_Test1|X000_Test2))?.*?(?=</mks:Field>)
Replace: \1
This gives me the following result that does not match the expected output:
<mks:Field name="MyField">X000_Test1,X000_Test2</mks:Field><mks:Field name="MyField">X000_Test1</mks:Field><mks:Field name="MyField">X000_Test2</mks:Field>
Unfortunately I cannot simply parse the string with something else - I only have the option of a regex search/replace in this case.
Thank you in advance, any help would be appreciated.
If you are using JavaScript, use this:
prefix='X000';
let pattern= new RegExp(`((?<=>)|,)((?!${prefix}|[>\<,]).)*(,|(?=\<))`, 'g');
For any other language, use this:
'/((?<=>)|,)((?!X000|[>\<,]).)*(,|(?=\<))/';
Here X000 is the prefix you want to keep; replace each match with an empty string.
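The asker is limited to a pure regex search/replace, so the following is not a substitute for the patterns above. For readers who do have Python available, it is a sketch of a different technique (a regex that only locates the MyField elements, plus a callback that filters the values) which can be used to cross-check a pure-regex result. The names `FIELD` and `keep_prefixed` are assumptions for illustration.

```python
import re

# Capture the opening tag, the comma-separated values and the closing tag
FIELD = re.compile(r'(<mks:Field name="MyField">)([^<]*)(</mks:Field>)')


def keep_prefixed(xml, prefix='X000_'):
    """Keep only the values starting with `prefix` inside MyField elements."""
    def fix(match):
        values = match.group(2).split(',')
        kept = [v for v in values if v.startswith(prefix)]
        # Re-joining the survivors avoids leftover commas
        return match.group(1) + ','.join(kept) + match.group(3)
    return FIELD.sub(fix, xml)


sample = ('<mks:Field name="MyField">X001_ABC,X000_Test1</mks:Field>'
          '<mks:Field name="AnotherField">X002_XYZ</mks:Field>')
print(keep_prefixed(sample))
```

Because the pattern names MyField explicitly, elements such as AnotherField are left untouched.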

How to get everything after string x in python

I have a string:
s3://tester/test.pdf
I want to exclude s3://tester/, so that even if I have s3://tester/folder/anotherone/test.pdf I get the entire path after s3://tester/.
I have attempted to use the split and partition methods but I can't seem to get it.
Currently am trying:
string.partition('/')[3]
But I get an error saying that the index is out of range.
EDIT: I should have specified that the name of the bucket will not always be the same so I want to make sure that it is only grabbing anything after the 3rd '/'.
You can use str.split():
path = 's3://tester/test.pdf'
print(path.split('/', 3)[-1])
Output:
test.pdf
UPDATE: With regex:
import re
path = 's3://tester/test.pdf'
print(re.split('/',path,3)[-1])
Output:
test.pdf
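As a sketch of why the `partition` attempt in the question fails, and of how the `maxsplit` argument copes with deeper paths and other bucket names: `str.partition` always returns a 3-tuple (head, separator, tail), so index 3 does not exist. The helper name `key_after_bucket` and the second bucket name below are illustrative assumptions, not from the question.

```python
def key_after_bucket(uri):
    """Return everything after the third '/' of an s3:// style URI."""
    # maxsplit=3 yields at most four pieces: 's3:', '', bucket, remainder
    parts = uri.split('/', 3)
    return parts[3] if len(parts) > 3 else ''


print('s3://tester/test.pdf'.partition('/'))
# ('s3:', '/', '/tester/test.pdf')  -> only indices 0..2 exist
print(key_after_bucket('s3://tester/test.pdf'))
# test.pdf
print(key_after_bucket('s3://other-bucket/folder/anotherone/test.pdf'))
# folder/anotherone/test.pdf
```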
Have you tried .replace?
You could do:
string = "s3://tester/test.pdf"
string = string.replace("s3://tester/", "")
print(string)
This will replace "s3://tester/" with the empty string ""
Alternatively, you could use .split rather than .partition
You could also try:
string = "s3://tester/test.pdf"
string = "/".join(string.split("/")[3:])
print(string)
To answer "How to get everything after x amount of characters in python"
string[x:]
PLEASE SEE UPDATE
ORIGINAL
Using the builtin re module.
import re
p = re.search(r'(?<=s3:\/\/tester\/).+', s).group()
The pattern uses a lookbehind to skip over the part you wish to ignore and matches any and all characters following it until the entire string is consumed, returning the matched group to the p variable for further processing.
This code will work for any length path following the explicit s3://tester/ schema you provided in your question.
UPDATE
Just saw the updates, duh.
Got the wrong end of the stick on this one, my bad.
The re method below should work whatever the S3 bucket name is, returning everything after the third / in the string.
p = ''.join(re.findall(r'\/[^\/]+', s)[1:])[1:]

Compare & manipulate strings with python

I've written an XML parser in Python and have just added functionality to read a further script from a different directory.
I've got two args: the first is the path where I'm parsing XML; the second is a string in another XML file which I want to match with the first path:
arg1 = \work\parser\main\tools\app\shared\xml\calculators\2012\example\calculator
path = calculators/2012/example/calculator
How can I compare the two strings to identify that they're both referencing the same thing? Also, how can I strip calculator from either string so I can store that and use it?
edit
Just had a thought. I have used a Regex to get the year out of the path already with year = re.findall(r"\.(\d{4})\.", path) following a problem Python has with numbers when converting the path to an import statement.
I could obviously split the strings and use a regex to match the path as a pattern in arg1 but this seems a long way round. Surely there's a better method?
Here I am assuming you are actually talking about strings, and not file paths, for which @mgilson's suggestion is better.
How can I compare the two strings to match identify that they're both
referencing the same thing
Well, first you need to define what you mean by "the same thing".
At first glance it seems that if the first string ends with the second string, with its slashes reversed, you have a match.
arg1 = r'\work\parser\main\tools\app\shared\xml\calculators\2012\example\calculator'
arg2 = r'calculators/2012/example/calculator'
>>> arg1.endswith(arg2.replace('/','\\'))
True
and also, how can I strip calculator from
either string so I can store that & use it?
You also need to decide if you want to strip the first calculator, the last calculator or any occurrence of calculator in the string.
If you just want to take the last string after the separator, then it's simply:
>>> arg2.split('/')[-1]
'calculator'
Now to get the original string back, without the last bit:
>>> '/'.join(arg2.split('/')[:-1])
'calculators/2012/example'
check out os.path.samefile:
http://docs.python.org/library/os.path.html#os.path.samefile
and os.path.dirname:
http://docs.python.org/library/os.path.html#os.path.dirname
or maybe os.path.basename (I'm not sure what part of the string you want to keep).
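Since `os.path` follows the conventions of whichever platform the script runs on, one way to experiment with both path styles on any machine is `pathlib`'s pure path classes. This swaps in `pathlib` for the `os.path` functions named above, purely as a sketch; the helper name `ends_with_path` is an assumption for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

arg1 = r'\work\parser\main\tools\app\shared\xml\calculators\2012\example\calculator'
path = 'calculators/2012/example/calculator'


def ends_with_path(full, relative):
    """True if the trailing components of `full` equal those of `relative`."""
    full_parts = PureWindowsPath(full).parts
    rel_parts = PurePosixPath(relative).parts
    # Comparing whole components avoids partial matches inside a folder name
    return len(rel_parts) > 0 and full_parts[-len(rel_parts):] == rel_parts


print(ends_with_path(arg1, path))         # True
print(PurePosixPath(path).name)           # calculator
print(str(PurePosixPath(path).parent))    # calculators/2012/example
```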
Here, try this:
# Raw string, so that \t, \a and \x are not treated as escape sequences
arg1 = r"\work\parser\main\tools\app\shared\xml\calculators\2012\example\calculator"
path = "calculators/2012/example/calculator"
# Normalise both strings to backslash separators before comparing
arg1 = arg1.replace("/", "\\")
path = path.replace("/", "\\")
if arg1.endswith(path) or path.endswith(arg1):
    print("Match")
That should work for your needs. Cheers :)
