Suppose that I have the following string:
http://www.domain.com/?s=some&two=20
How can I remove everything from the & onward, including the & itself, so that I am left with this string:
http://www.domain.com/?s=some
Well, to answer the immediate question:
>>> s = "http://www.domain.com/?s=some&two=20"
The rfind method returns the index of the right-most occurrence of the substring:
>>> s.rfind("&")
29
You can take all characters up to a given index with the slicing operator:
>>> "foobar"[:4]
'foob'
Putting the two together:
>>> s[:s.rfind("&")]
'http://www.domain.com/?s=some'
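One caveat worth noting: rfind returns -1 when the substring is absent, and s[:-1] would then silently drop the last character. A minimal sketch that guards against that (the helper name strip_after_amp is made up for illustration):
>>> def strip_after_amp(url):
...     pos = url.rfind("&")
...     return url[:pos] if pos != -1 else url
...
>>> strip_after_amp("http://www.domain.com/?s=some&two=20")
'http://www.domain.com/?s=some'
>>> strip_after_amp("http://www.domain.com/?s=some")
'http://www.domain.com/?s=some'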
If you are dealing with URLs in particular, you might want to use built-in libraries that deal with URLs. If, for example, you wanted to remove two from the above query string:
First, parse the URL as a whole:
>>> import urlparse, urllib
>>> parse_result = urlparse.urlsplit("http://www.domain.com/?s=some&two=20")
>>> parse_result
SplitResult(scheme='http', netloc='www.domain.com', path='/', query='s=some&two=20', fragment='')
Take out just the query string:
>>> query_s = parse_result.query
>>> query_s
's=some&two=20'
Turn it into a dict:
>>> query_d = urlparse.parse_qs(parse_result.query)
>>> query_d
{'s': ['some'], 'two': ['20']}
>>> query_d['s']
['some']
>>> query_d['two']
['20']
Remove the 'two' key from the dict:
>>> del query_d['two']
>>> query_d
{'s': ['some']}
Put it back into a query string:
>>> new_query_s = urllib.urlencode(query_d, True)
>>> new_query_s
's=some'
And now stitch the URL back together:
>>> result = urlparse.urlunsplit((
...     parse_result.scheme, parse_result.netloc,
...     parse_result.path, new_query_s, parse_result.fragment))
>>> result
'http://www.domain.com/?s=some'
The benefit of this is that you have more control over the URL. For example, if you always wanted to remove the two argument, even if it appeared earlier in the query string ("two=20&s=some"), this would still do the right thing. It might be overkill depending on what you want to do.
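If you are on Python 3, the same pieces live in urllib.parse; here is the same round trip as a single sketch (assuming Python 3):
>>> from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode
>>> parts = urlsplit("http://www.domain.com/?s=some&two=20")
>>> query_d = parse_qs(parts.query)    # {'s': ['some'], 'two': ['20']}
>>> del query_d['two']                 # drop the unwanted parameter
>>> urlunsplit((parts.scheme, parts.netloc, parts.path,
...             urlencode(query_d, doseq=True), parts.fragment))
'http://www.domain.com/?s=some'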
You need to split the string:
>>> s = 'http://www.domain.com/?s=some&two=20'
>>> s.split('&')
['http://www.domain.com/?s=some', 'two=20']
That will return a list, as you can see, so you can do:
>>> s2 = s.split('&')[0]
>>> print s2
http://www.domain.com/?s=some
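If the query string might contain several & characters and you only care about what comes before the first one, you can limit the split with the maxsplit argument; a small sketch:
>>> s = 'http://www.domain.com/?s=some&two=20&three=30'
>>> s.split('&', 1)[0]    # split at most once, at the first '&'
'http://www.domain.com/?s=some'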
string = 'http://www.domain.com/?s=some&two=20'
cut_string = string.split('&')
new_string = cut_string[0]
print(new_string)
You can use find()
>>> s = 'http://www.domain.com/?s=some&two=20'
>>> s[:s.find('&')]
'http://www.domain.com/?s=some'
Of course, if there is a chance that the searched-for text will not be present, then you need slightly more code:
pos = s.find('&')
if pos != -1:
    s = s[:pos]
Whilst you can make some progress using code like this, more complex situations demand a true URL parser.
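A compact alternative that avoids the -1 check, offered here as a sketch, is str.partition, which always returns a 3-tuple and leaves the string untouched when the separator is missing:
>>> s = 'http://www.domain.com/?s=some&two=20'
>>> s.partition('&')[0]    # everything before the first '&'
'http://www.domain.com/?s=some'
>>> 'http://www.domain.com/?s=some'.partition('&')[0]    # no '&': the whole string comes back
'http://www.domain.com/?s=some'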
>>> s = "http://www.domain.com/?s=some&two=20"
>>> s.split("&")
['http://www.domain.com/?s=some', 'two=20']
s[0:"s".index("&")]
what does this do:
take a slice from the string starting at index 0, up to, but not including the index of &in the string.
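A quick check of that in the REPL (note that index raises a ValueError if the string contains no &):
>>> s = "http://www.domain.com/?s=some&two=20"
>>> s[0:s.index("&")]
'http://www.domain.com/?s=some'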
Related
So, I have this URL: https://www.last.fm/music/Limp+Bizkit/Significant+Other
I want to split it, to only keep the Limp+Bizkit and Significant+Other part of the URL. These are variables, and can be different each time. These are needed to create a new URL (which I know how to do).
I want the Limp+Bizkit and Significant+Other to be two different variables. How do I do this?
You can use the str.split method with the forward slash as the separator.
>>> url = "https://www.last.fm/music/Limp+Bizkit/Significant+Other"
>>> *_, a, b = url.split("/")
>>> a
'Limp+Bizkit'
>>> b
'Significant+Other'
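If you would rather work on just the path component, for example in case a query string or fragment is ever appended to the URL, a sketch using urllib.parse (Python 3 names; the variable names artist and album are made up):
>>> from urllib.parse import urlsplit
>>> url = "https://www.last.fm/music/Limp+Bizkit/Significant+Other"
>>> artist, album = urlsplit(url).path.split("/")[-2:]    # last two path segments
>>> artist
'Limp+Bizkit'
>>> album
'Significant+Other'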
You can replace https://www.last.fm/music/ in the URL with an empty string to get just Limp+Bizkit/Significant+Other. Then you can split it at the / character to break it into two strings. url will then be a list, and you can access its elements with url[0] and url[1]:
>>> url = "https://www.last.fm/music/Limp+Bizkit/Significant+Other"
>>> url = url.replace("https://www.last.fm/music/",'').split('/')
>>> first_value = url[0]
>>> second_value = url[1]
>>> first_value
'Limp+Bizkit'
>>> second_value
'Significant+Other'
You can use regular expressions to achieve this.
import re

url = "https://www.last.fm/music/Limp+Bizkit/Significant+Other"
match = re.match(r"^.*//.*/.*/(.*)/(.*)", url)
print(match.group(1))  # Limp+Bizkit
print(match.group(2))  # Significant+Other
I want to remove the [' characters from the start and the '] characters from the end of a string.
This is my text:
"['45453656565']"
I need to have this text:
"45453656565"
I've tried to use str.replace
text = text.replace("['","");
but it does not work.
You need to strip your text by passing the unwanted characters to the str.strip() method:
>>> s = "['45453656565']"
>>>
>>> s.strip("[']")
'45453656565'
Or, if you want to convert it to an integer, you can simply pass the stripped result to the int function:
>>> try:
...     val = int(s.strip("[']"))
... except ValueError:
...     print("Invalid string")
...
>>> val
45453656565
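Since the text looks like the repr of a Python list, another option, shown as a sketch and assuming the input really is a well-formed list literal, is ast.literal_eval:
>>> import ast
>>> s = "['45453656565']"
>>> ast.literal_eval(s)[0]    # parse the text as a Python literal, then take the first element
'45453656565'
>>> int(ast.literal_eval(s)[0])
45453656565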
Using re.sub:
>>> my_str = "['45453656565']"
>>> import re
>>> re.sub(r"['\[\]]", "", my_str)
'45453656565'
You could loop over the characters, keeping only those that are digits:
>>> number_array = "['34325235235']"
>>> int(''.join(c for c in number_array if c.isdigit()))
34325235235
This solution works for both "['34325235235']" and '["34325235235"]', and for any other combination of digits and surrounding characters.
You can also use the re module and a regular expression to get it:
>>> import re
>>> theString = "['34325235235']"
>>> int(re.sub(r'\D', '', theString))  # remove every non-digit, then optionally parse to int
34325235235
Instead of hacking your data by stripping brackets, you should edit the script that created it to print out just the numbers. E.g., instead of lazily doing
output.write(str(mylist))
you can write
for elt in mylist:
    output.write(elt + "\n")
Then when you read your data back in, it'll contain the numbers (as strings) without any quotes, commas or brackets.
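A minimal sketch of that round trip (the file name numbers.txt and the sample list are made up for illustration):
mylist = ['45453656565', '34325235235']

# Write one number per line instead of dumping the list's repr.
with open("numbers.txt", "w") as output:
    for elt in mylist:
        output.write(elt + "\n")

# Reading it back gives plain strings with no brackets or quotes to strip.
with open("numbers.txt") as f:
    numbers = [line.strip() for line in f]

print(numbers)    # ['45453656565', '34325235235']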
I have a list of strings,
list_of_strings
They look like this:
'/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
I want to cut this string down to:
/folder1/folder2/folder3/folder4/folder5/exp-* and put that into a new list.
I thought of doing something like this, but I am lacking the right snippet to do what I want:
list_of_stringparts = []
for string in sorted(list_of_strings):
    part = string.split('/')[7]  # or whatever returns the first part of my string
    list_of_stringparts.append(part)
Does anyone have an idea? Do I need a regex?
You are using subscription, which extracts a single (the eighth) element. To get the first seven elements, you need slicing, [N:M:S], like this:
>>> l = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> l.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
In our case N is omitted (it defaults to 0) and S is the step, which defaults to 1, so you'll get elements 0 through 6 of the result of split.
To construct your string back, use join():
>>> '/'.join(l.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'
I would do it like this:
>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> s.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
>>> '/'.join(s.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'
Using re.match:
>>> import re
>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> re.match(r'.*?\*', s).group()
'/folder1/folder2/folder3/folder4/folder5/exp-*'
Your example suggests that you want to partition the strings at the first * character. This can be done with str.partition():
list_of_stringparts = []
list_of_strings = ['/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder1/exp-*/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder/blah/pow']
for s in sorted(list_of_strings):
    head, sep, tail = s.partition('*')
    list_of_stringparts.append(head + sep)
>>> list_of_stringparts
['/folder/blah/pow', '/folder1/exp-*', '/folder1/folder2/folder3/folder4/folder5/exp-*']
Or this equivalent list comprehension:
list_of_stringparts = [''.join(s.partition('*')[:2]) for s in sorted(list_of_strings)]
This will retain any string that does not contain a * - not sure from your question if that is desired.
svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz
From the following string I need to fetch Rev1223. So I was wondering whether there is a regular expression to do that. I would like to do something like string.search("Rev" up until the next /).
So far I have split it using split:
s1, s2, s3, s4, s5 = string.split("/", 4)
You don't need a regex to do this. It is as simple as:
s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
s.split('/')[-2]
Here is a quick Python example:
>>> import re
>>> s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
>>> p = re.compile(r'.*/(Rev\d+)/.*')
>>> p.match(s).groups()[0]
'Rev1223'
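If the revision segment might be missing from some strings, a slightly more defensive sketch using re.search (no need to anchor with .* here):
>>> import re
>>> s = 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz'
>>> m = re.search(r'/(Rev\d+)/', s)
>>> m.group(1) if m else None
'Rev1223'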
Find the second part from the end using a regex, if preferred:
/(Rev\d+)/[^/]+$
http://regex101.com/r/cC6fO3/1
>>> import re
>>> m = re.search(r'/(Rev\d+)/[^/]+$', 'svn-backup-test,2014/09/24/18/Rev1223/FullSvnCheckout.tgz')
>>> m.groups()[0]
'Rev1223'
I have a regex match object in Python. I want to get the text it matched. Say if the pattern is '1.3', and the search string is 'abc123xyz', I want to get '123'. How can I do that?
I know I can use match.string[match.start():match.end()], but I find that to be quite cumbersome (and in some cases wasteful) for such a basic query.
Is there a simpler way?
You can simply use the match object's group function, like:
import re

match = re.search(r"1.3", "abc123xyz")
if match:
    doSomethingWith(match.group(0))
to get the entire match. EDIT: as thg435 points out, you can also omit the 0 and just call match.group().
Additional note: if your pattern contains parentheses (groups), you can also get those submatches by passing 1, 2 and so on to group().
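A small illustration of that note (the pattern here is made up just to show two groups):
>>> import re
>>> m = re.search(r"(1.)(3)", "abc123xyz")
>>> m.group(0)    # the whole match
'123'
>>> m.group(1), m.group(2)    # the submatches
('12', '3')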
You need to put that part of the regex inside parentheses "()" (a capturing group) to be able to extract just that part:
>>> var = 'abc123xyz'
>>> exp = re.compile(".*(1.3).*")
>>> exp.match(var)
<_sre.SRE_Match object at 0x691738>
>>> exp.match(var).groups()
('123',)
>>> exp.match(var).group(0)
'abc123xyz'
>>> exp.match(var).group(1)
'123'
Otherwise (without the leading .*) match() will not return anything, because it only matches at the beginning of the string:
>>> var = 'abc123xyz'
>>> exp = re.compile("1.3")
>>> print exp.match(var)
None
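For completeness, a short sketch of why that returns None: match() only matches at the beginning of the string, whereas search() scans the whole string (the Match repr below is from a recent Python 3):
>>> import re
>>> var = 'abc123xyz'
>>> exp = re.compile("1.3")
>>> print(exp.match(var))    # anchored at the start, so no match
None
>>> exp.search(var)          # scans the whole string
<re.Match object; span=(3, 6), match='123'>
>>> exp.search(var).group(0)
'123'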