I am trying to slice URLs from the last symbol "/".
For example, I have a URL http://google.com/images/54152352.
Now I need the last part of that URL, which is 54152352.
I understand that I could simply slice from a fixed position, but I have a list of URLs and each of them is different.
Other examples of URLs:
https://google.uk/images/kfakp3ok2 #I would need kfakp3ok2
bing.com/img/3525236236 #I would need 3525236236
wwww.google.com/img/1osdkg23 #I would need 1osdkg23
Is there a way to slice the characters from the last character "/" in a string in Python3? Each part from a different URL has a different length.
All the help will be appreciated.
target=url.split("/")[-1]
The split method returns a list of the substrings separated by the separator given as its argument,
and [-1] selects the last element of that list.
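Applied to the whole list from the question, the same expression can go into a list comprehension. This is only a minimal sketch; the names urls and last_parts are made up for illustration:

```python
# URLs taken from the question; the variable names are illustrative only.
urls = [
    "http://google.com/images/54152352",
    "https://google.uk/images/kfakp3ok2",
    "bing.com/img/3525236236",
    "wwww.google.com/img/1osdkg23",
]

# split("/") breaks each URL at every slash; [-1] keeps the final piece.
last_parts = [url.split("/")[-1] for url in urls]
print(last_parts)
```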
You can use the rsplit() functionality.
Syntax:
string.rsplit(separator, maxsplit)
Reference
https://www.w3schools.com/python/ref_string_rsplit.asp
rsplit() splits the string from the right using the given separator. With maxsplit set to 1 it splits only once, which can be slightly cheaper than split() because you don't need more than one split.
>>> url = 'https://google.uk/images/kfakp3ok2'
>>> url.rsplit('/', 1)[-1]
'kfakp3ok2'
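To make the effect of maxsplit visible, here is a small sketch that keeps both halves of the split. The trailing-slash case is an assumption on my part (the question does not show such URLs), and the variable names are illustrative:

```python
url = "https://google.uk/images/kfakp3ok2"

# With maxsplit=1 only the last "/" is used, so the result has two items.
head, tail = url.rsplit("/", 1)
print(head)
print(tail)

# Assumption: a URL might end in "/", which would make the last piece empty,
# so trailing slashes are stripped before splitting.
url_with_slash = "https://google.uk/images/kfakp3ok2/"
tail_stripped = url_with_slash.rstrip("/").rsplit("/", 1)[-1]
print(tail_stripped)
```

Note that the two-name unpacking only works when the string contains at least one "/"; indexing with [-1] is safe either way.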
I have a list of strings, all of which have a common property, they all go like this "pp:actual_string". I do not know for sure what the substring "pp:" will be, basically : acts as a delimiter; everything before : shouldn't be included in the result.
I have solved the problem using the brute force approach, but I would like to see a clever method, maybe something like regex.
Note : Some strings might not have this "pp:string" format, and could be already a perfect string, i.e. without the delimiter.
This is my current solution:
ll = ["pp17:gaurav","pp17:sauarv","pp17:there","pp17:someone"]
res=[]
for i in ll:
    g=""
    for j in range(len(i)):
        if i[j] == ':':
            index=j+1
    res.append(i[index:len(i)])
print(res)
Is there a way that I can do it without creating an extra list ?
Whilst regex is an incredibly powerful tool with a lot of capabilities, using a "clever method" is not necessarily the best idea if you are unfamiliar with its principles.
Your problem is one that can be solved without regex by splitting on the : character using the str.split() method, and just returning the last part by using the [-1] index value to represent the last (or only) string that results from the split. This will work even if there isn't a :.
list_with_prefixes = ["pp:actual_string", "perfect_string", "frog:actual_string"]
cleaned_list = [x.split(':')[-1] for x in list_with_prefixes]
print(cleaned_list)
This is a list comprehension that takes each of the strings in turn (x) and splits it on the : character. The split returns a list containing the prefix (if it exists) and the suffix, and the comprehension builds a new list with only the suffix (i.e. item [-1] in the list that results from the split). In this example, it returns:
['actual_string', 'perfect_string', 'actual_string']
Here are a few options, based upon different assumptions.
Most explicit
if s.startswith('pp:'):
    s = s[len('pp:'):] # aka 3
If you want to remove anything before the first :
s = s.split(':', 1)[-1]
Regular expressions:
Same as startswith
s = re.sub('^pp:', '', s)
Same as split, but more careful with 'pp:' and slower
s = re.match('(?:^pp:)?(.*)', s).group(1)
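The options differ once a string contains more than one colon or no "pp:" prefix at all. A quick sketch to compare them side by side; the sample strings and the names samples and results are my own, not from the question:

```python
import re

# Hypothetical inputs: a prefixed string, a plain one, and one with two colons.
samples = ["pp:gaurav", "plain", "pp:a:b"]

results = {}
for s in samples:
    after_first = s.split(":", 1)[-1]     # everything after the first ":"
    after_last = s.split(":")[-1]         # everything after the last ":"
    no_prefix = re.sub(r"^pp:", "", s)    # drops only a literal leading "pp:"
    results[s] = (after_first, after_last, no_prefix)
    print(s, results[s])
```

Only the third sample shows a difference: splitting once keeps "a:b", while taking the last piece of a full split keeps just "b".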
Consider the following example
a= 'Apple'
b = a.split(',')
print(b)
Output is ['Apple'].
I don't understand why it returns a list even when there is no ',' character in Apple.
There may be cases where we call split expecting more than one element in the list, but because the separator is not present in the string we get only one element. Wouldn't it be better if this mistake were caught by the split method itself?
The behaviour of a.split(',') when no commas are present in a is perfectly consistent with the way it behaves when there are a positive number of commas in a.
a.split(',') says to split string a into a list of substrings that are delimited by ',' in a; the delimiter is not preserved in the substrings.
If 1 comma is found you get 2 substrings in the list, if 2 commas are found you get 3 substrings in the list, and in general, if n commas are found you get n+1 substrings in the list. So if 0 commas are found you get 1 substring in the list.
If you want 0 substrings in the list, then you'll need to supply a string with -1 commas in it. Good luck with that. :)
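The "n commas give n+1 substrings" rule is easy to check directly. A small sketch with made-up inputs (the name pieces is illustrative):

```python
# For each sample, the number of pieces equals the number of commas plus one.
pieces = {}
for text in ["Apple", "Apple,Pear", "Apple,Pear,Plum", ""]:
    parts = text.split(",")
    pieces[text] = parts
    print(repr(text), text.count(","), "commas ->", parts)
```

Even the empty string follows the rule: with an explicit separator it yields a list holding one empty string.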
The docstring of that method says:
Return a list of the words in the string S, using sep as the delimiter string.
The delimiter is used to separate multiple parts of the string; having only one part is not an error.
That's the way the split() function works. If you do not want that behaviour, you can implement your own my_split() function as follows:
def my_split(s, d=' '):
    return s.split(d) if d in s else s
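Be aware that my_split returns a list when the delimiter is present and a plain string when it is not, so callers have to handle two types. If the goal is to have the missing delimiter caught as a mistake, one possible alternative is to raise an exception. This is only a sketch; strict_split is a hypothetical name, not a built-in:

```python
def strict_split(s, sep=","):
    """Split s on sep, but raise if sep does not occur in s at all."""
    if sep not in s:
        raise ValueError("separator %r not found in %r" % (sep, s))
    return s.split(sep)


print(strict_split("Apple,Pear"))
try:
    strict_split("Apple")
except ValueError as exc:
    print("error:", exc)
```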
Is there a way to manipulate a string in Python using the following ways?
For any string that is stored in dot notation, for example:
s = "classes.students.grades"
Is there a way to change the string to the following:
"classes.students"
Basically, remove everything up to and including the last period. So "restaurants.spanish.food.salty" would become "restaurants.spanish.food".
Additionally, is there any way to identify what comes after the last period? The reason I want to do this is I want to use isdigit().
So, if it was classes.students.grades.0 could I grab the 0 somehow, so I could use an if statement with isdigit, and say if the part of the string after the last period (so 0 in this case) is a digit, remove it, otherwise, leave it.
You can use split and join together:
s = "classes.students.grades"
print('.'.join(s.split('.')[:-1]))
Splitting the string on . gives you a list of strings; joining the list elements with . turns them back into a single string.
[:-1] picks all the elements of the list except the last one.
To check what comes after the last .:
s.split('.')[-1]
Another way is to use rsplit. It works the same way as split, but if you provide the maxsplit parameter it splits the string starting from the end:
rest, last = s.rsplit('.', 1)
# rest is 'classes.students'
# last is 'grades'
You can also use re.sub to substitute the part after the last . with an empty string:
re.sub(r'\.[^.]+$', '', s)
As for the last part of your question, wrapping the words in [], I would recommend using format and a generator expression:
''.join("[{}]".format(e) for e in s.split('.'))
It'll give you the desired output:
[classes][students][grades]
The best way to do this is using the rsplit method and pass in the maxsplit argument.
>>> s = "classes.students.grades"
>>> before, after = s.rsplit('.', maxsplit=1) # in Python 2, pass it positionally: rsplit('.', 1)
>>> before
'classes.students'
>>> after
'grades'
You can also use the rfind() method with normal slice operation.
To get everything before last .:
>>> s = "classes.students.grades"
>>> last_index = s.rfind('.')
>>> s[:last_index]
'classes.students'
Then everything after last .
>>> s[last_index + 1:]
'grades'
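One thing to watch with rfind(): it returns -1 when there is no dot, and s[:-1] would then silently drop the last character. A hedged sketch that guards against this; the helper name split_last and its "no dot" return value are my own choices:

```python
def split_last(s, sep="."):
    """Return (before, after) around the last sep, or (s, '') if sep is absent."""
    last_index = s.rfind(sep)
    if last_index == -1:
        # Without this check, s[:last_index] would be s[:-1].
        return s, ""
    return s[:last_index], s[last_index + len(sep):]


print(split_last("classes.students.grades"))
print(split_last("classes"))
```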
If '.' is in s, s.rpartition('.') finds the last dot in s
and returns (before_last_dot, dot, after_last_dot):
s = "classes.students.grades"
s.rpartition('.')[0]
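For completeness, a short sketch of what rpartition returns in both cases, with and without a dot (variable names are illustrative):

```python
s = "classes.students.grades"
before, dot, after = s.rpartition(".")
print(before)
print(after)

# When there is no dot, the whole string ends up in the last slot
# and the first two slots are empty strings.
no_dot = "classes".rpartition(".")
print(no_dot)
```

So when the string has no dot, index [0] gives an empty string rather than the original string, which is why the "if '.' in s" condition matters.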
If your goal is to get rid of a final component that's just a single digit, start and end with re.sub():
s = re.sub(r"\.\d$", "", s)
This will do the job, and leave other strings alone. No need to mess with anything else.
If you do want to know about the general case (separate out the last component, no matter what it is), then use rsplit to split your string once:
>>> "hel.lo.there".rsplit(".", 1)
['hel.lo', 'there']
If there's no dot in the string you'll just get one element in your list, the entire string.
You can do it very simply with rsplit (str.rsplit([sep[, maxsplit]])), which returns a list by breaking the string at the given separator, starting from the right.
You can also specify how many splits should be performed:
>>> s = "res.spa.f.sal.786423"
>>> s.rsplit('.',1)
['res.spa.f.sal', '786423']
So the final function that you describe is:
def dimimak_cool_function(s):
    if '.' not in s: return s
    start, end = s.rsplit('.', 1)
    return start if end.isdigit() else s
>>> dimimak_cool_function("res.spa.f.sal.786423")
'res.spa.f.sal'
>>> dimimak_cool_function("res.spa.f.sal")
'res.spa.f.sal'
I want to do the following split:
input: 0x0000007c9226fc output: 7c9226fc
input: 0x000000007c90e8ab output: 7c90e8ab
input: 0x000000007c9220fc output: 7c9220fc
I use the following line of code to do this but it does not work!
split = element.rpartition('0')
I got these outputs which are wrong!
input: 0x000000007c90e8ab output: e8ab
input: 0x000000007c9220fc output: fc
what is the fastest way to do this kind of split?
The only idea for me right now is to make a loop and perform checking but it is a little time consuming.
I should mention that the number of zeros in input is not fixed.
Each string can be converted to an integer using int() with a base of 16. Then convert back to a string.
for s in '0x000000007c9226fc', '0x000000007c90e8ab', '0x000000007c9220fc':
    print('%x' % int(s, 16))
Output
7c9226fc
7c90e8ab
7c9220fc
input[2:].lstrip('0')
That should do it. The [2:] skips over the leading 0x (which I assume is always there), then the lstrip('0') removes all the zeros from the left side.
In fact, we can use lstrip's ability to remove more than one kind of leading character to simplify:
input.lstrip('x0')
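Keep in mind that lstrip('x0') treats its argument as a set of characters, not as a literal prefix. A sketch under the assumption that every input starts with "0x"; the helper name strip_hex_prefix and the "0" fallback for an all-zero value are my additions:

```python
def strip_hex_prefix(s):
    # lstrip("x0") removes every leading character that is "x" or "0",
    # not the literal prefix "x0". Assumes s always starts with "0x".
    stripped = s.lstrip("x0")
    # An all-zero input would otherwise collapse to the empty string.
    return stripped or "0"


for s in ["0x0000007c9226fc", "0x000000007c90e8ab", "0x00000000"]:
    print(s, "->", strip_hex_prefix(s))
```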
format is handy for this:
>>> print('{:x}'.format(0x000000007c90e8ab))
7c90e8ab
>>> print('{:x}'.format(0x000000007c9220fc))
7c9220fc
In this particular case you can just do
your_input[10:]
You'll most likely want to properly parse this; your idea of splitting on separation of non-zero does not seem safe at all.
Seems to be the XY problem.
If the number of characters in a string is constant then you can use the following code.
input = "0x000000007c9226fc"
output = input[10:]
Documentation
Also, you are using rpartition, which is defined as
str.rpartition(sep)
Split the string at the last occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself.
Since your input can contain multiple 0's and rpartition splits only at the last occurrence, your code cuts the string in the wrong place.
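To see why the question's output was e8ab, it helps to print all three parts of the tuple. A small sketch using one of the inputs from the question:

```python
element = "0x000000007c90e8ab"

# rpartition looks for the LAST "0", which sits inside the value itself.
before, sep, after = element.rpartition("0")
print(before)
print(sep)
print(after)
```

The last "0" is the one between "7c9" and "e8ab", so everything to its left, including the digits you wanted to keep, ends up in the first part.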
The regular expression for 0x followed by a run of zeros is (0x[0]+); replace the match with an empty string.
import re
st="0x000007c922433434000fc"
reg='(0x[0]+)'
rep=re.sub(reg, '',st)
print(rep)
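A slightly stricter variant, as a sketch: anchor the pattern to the start of the string and allow zero leading zeros, so that a value like "0x7c" also loses its prefix. The function name strip_leading_hex_zeros is hypothetical, and this assumes each string holds exactly one hex value:

```python
import re


def strip_leading_hex_zeros(st):
    # ^ anchors the match to the start of the string;
    # 0* matches any run of zeros (possibly none) right after "0x".
    return re.sub(r"^0x0*", "", st)


print(strip_leading_hex_zeros("0x000007c922433434000fc"))
print(strip_leading_hex_zeros("0x7c"))
```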
I have a list of strings where all of the strings roughly follow the format 'foo\tbar\tfoo\n' in that there are three segments of variable length that are separated by two tabs (\t) and with a newline indicator at the end (\n).
I want to remove everything except for the text before the first \t, so that it would return 'foo'. Given that the first segment is of variable length, I'm not sure how I can do that.
Use str.split():
>>> string = 'foo\tbar\tfoo\n'
>>> string.split('\t', 1)[0]
'foo'
This splits the string by the first occurrence of the '\t' tab character, which returns a list with two elements. The [0] selects the first element in the list, which is the part of the string before the first '\t' occurrence.
Just search for the first \t character, and get everything before it. Slicing makes this easy.
newstr = oldstr[:oldstr.find("\t")]
Try with:
t = 'foo\tbar\tfoo\n'
t[:t.index("\t")]
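The find() and index() versions behave differently when a string contains no tab at all: find() returns -1, so the slice drops the last character, while index() raises ValueError. If such strings can occur (an assumption, the question does not say), str.partition is one way to sidestep both. A sketch with a hypothetical helper name first_field:

```python
def first_field(line):
    """Return the text before the first tab, or the whole line if there is no tab."""
    return line.partition("\t")[0]


print(repr(first_field("foo\tbar\tfoo\n")))
print(repr(first_field("foobar")))

# For comparison: find() returns -1 here, so the slice loses the final "r".
no_tab = "foobar"
print(repr(no_tab[:no_tab.find("\t")]))
```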