Split a string in Python

>>> a = "aaaa#b:c:"
>>> for i in a.split(":"):
...     print(i)
...     if "#" in i:  # here i == "aaaa#b"
...         pass  # how do I print only "b" here?
Inside the if block, when i is "aaaa#b", how do I get the value after the hash? Should I use rsplit to get it?

The following can replace your if statement:
for i in a.split(':'):
    print(i.partition('#')[2])
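A minimal sketch of the same partition idea, assuming you only want output for the pieces that actually contain a '#'. partition returns an empty separator when '#' is missing, which makes the check cheap. The list name values is just for illustration.

```python
a = "aaaa#b:c:"

values = []
for i in a.split(":"):
    # partition always returns a 3-tuple: (before, separator, after).
    head, sep, tail = i.partition("#")
    if sep:  # sep is '' when '#' does not occur in i
        values.append(tail)

print(values)  # ['b']
```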

>>> a="aaaa#b:c:"
>>> a.split(":",2)[0].split("#")[-1]
'b'

a = "aaaa#b:c:"
print(a.split(":")[0].split("#")[1])

I'd suggest reading this, from the Python docs:
str.rsplit([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter
string. If maxsplit is given, at most maxsplit splits are done, the
rightmost ones. If sep is not specified or None, any whitespace string
is a separator. Except for splitting from the right, rsplit() behaves
like split() which is described in detail below.
So, to answer your question: yes.
EDIT:
It also depends on how you wish to index the result. rsplit splits from the right, so if the value you want is always the rightmost piece you can take it with index -1 every time (Python lists are zero-indexed, and -1 refers to the last element), rather than having to check the size of the returned list.
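As a rough illustration of the difference between split and rsplit with a maxsplit of 1 (the sample string with two '#' characters is made up for this sketch and is not from the question):

```python
# Hypothetical sample with two '#' characters, to show where each method cuts.
s = "aa#bb#cc"

# split with maxsplit=1 cuts at the first '#', counting from the left.
left_first = s.split("#", 1)     # ['aa', 'bb#cc']

# rsplit with maxsplit=1 cuts at the last '#', counting from the right.
right_first = s.rsplit("#", 1)   # ['aa#bb', 'cc']

# Index -1 refers to the last element of either list.
print(left_first[-1])    # bb#cc
print(right_first[-1])   # cc
```

With only one '#' in the string, as in the question, both calls give the same result.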

Do you really need to use split? split creates a list, which is unnecessary work if you only want one value.
What about something like this:
>>> a = "aaaa#b:c:"
>>> a[a.find('#') + 1]
'b'
Note that this returns only the single character after the first '#'. If you need a particular occurrence, use a regex instead.
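A hedged sketch of the find-based approach that handles values longer than one character and a missing '#'. Treating a missing '#' as an empty value is an assumption made for this example, not something the question specifies.

```python
a = "aaaa#b:c:"

start = a.find("#")  # index of the first '#', or -1 if there is none
if start == -1:
    value = ""  # assumption: a missing '#' means "no value"
else:
    end = a.find(":", start)  # first ':' after the '#', or -1 if there is none
    value = a[start + 1:] if end == -1 else a[start + 1:end]

print(value)  # b
```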

split would do the job nicely. Use rsplit only if you need to split from the last '#'.
>>> a = "aaaa#b:c:"
>>> for i in a.split(":"):
...     print(i)
...     b = i.split('#', 1)
...     if len(b) == 2:
...         print(b[1])

Related

During string concatenation, how to add delimiter only if variable is set?

How can I add the delimiter only if the variable has a value? In the code below I am trying to avoid two consecutive underscores, as in foo_bar__baz. a, b and d will always be set; only c is optional. Is there a more Pythonic way?
>>> a_must='foo'
>>> b_must='bar'
>>> c_optional=''
>>> d_must='baz'
>>>
>>> f'{a_must}_{b_must}_{c_optional}_{d_must}' if c_optional else f'{a_must}_{b_must}_{d_must}'
'foo_bar_baz'
This is in Python 3.6.
You can write the conditional inside the f-string itself:
f'{a_must}_{b_must}_{c_optional+"_" if c_optional else ""}{d_must}'
Output:
'foo_bar_baz'
To be a little more flexible, something like this would work:
variables = [a_must, b_must, c_optional, d_must]
'_'.join([x for x in variables if x])
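The filter-and-join idea can be wrapped in a small helper. This is only a sketch: the name join_nonempty is hypothetical, and note that it would also silently drop a, b or d if one of them were ever empty.

```python
def join_nonempty(parts, sep="_"):
    """Join only the non-empty parts with sep."""
    return sep.join(p for p in parts if p)


# c_optional is empty, so no doubled underscore appears.
print(join_nonempty(["foo", "bar", "", "baz"]))     # foo_bar_baz

# c_optional is set, so it is included like any other part.
print(join_nonempty(["foo", "bar", "qux", "baz"]))  # foo_bar_qux_baz
```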
You can build a list of tokens and use str.join to join the list into a string with _ as the delimiter:
tokens = [a_must, b_must]
if c_optional:
    tokens.append(c_optional)
tokens.append(d_must)
print('_'.join(tokens))
Your solution works fine, it just needed a little formatting. I added the print statement for testing.
a_must='foo'
b_must='bar'
c_optional=''
d_must='baz'
if c_optional:
    result = f'{a_must}_{b_must}_{c_optional}_{d_must}'
else:
    result = f'{a_must}_{b_must}_{d_must}'
print(result)

How to escape null characters, i.e. [''], while using the regex split function? [duplicate]

I have the following file names that exhibit this pattern:
000014_L_20111007T084734-20111008T023142.txt
000014_U_20111007T084734-20111008T023142.txt
...
I want to extract the middle two time stamp parts after the second underscore '_' and before '.txt'. So I used the following Python regex string split:
time_info = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
But this gives me two extra empty strings in the returned list:
time_info=['', '20111007T084734', '20111008T023142', '']
How do I get only the two timestamps? I.e. I want:
time_info=['20111007T084734', '20111008T023142']
I'm no Python expert, but maybe you could just remove the empty strings from your list?
str_list = re.split(r'^[0-9]+_[LU]_|-|\.txt$', f)
time_info = list(filter(None, str_list))
Don't use re.split(), use the groups() method of regex Match/SRE_Match objects.
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> time_info = re.search(r'[LU]_(\w+)-(\w+)\.', f).groups()
>>> time_info
('20111007T084734', '20111008T023142')
You can even name the capturing groups and retrieve them in a dict, though you use groupdict() rather than groups() for that. (The regex pattern for such a case would be something like r'[LU]_(?P<groupA>\w+)-(?P<groupB>\w+)\.')
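A short sketch of the named-group variant. The group names start and end are illustrative choices, not taken from the question.

```python
import re

f = "000014_L_20111007T084734-20111008T023142.txt"

# 'start' and 'end' are hypothetical group names chosen for this example.
match = re.search(r"[LU]_(?P<start>\w+)-(?P<end>\w+)\.", f)
if match:
    info = match.groupdict()
    print(info["start"])  # 20111007T084734
    print(info["end"])    # 20111008T023142
```

re.search returns None when nothing matches, so the if guard avoids an AttributeError on unexpected file names.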
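If the timestamps always come after the second _, you can use str.split after slicing off the extension (str.strip(".txt") would strip any of the characters '.', 't' and 'x' from both ends rather than the literal suffix, so slicing is safer):
>>> strs = "000014_L_20111007T084734-20111008T023142.txt"
>>> strs[:-len(".txt")].split("_", 2)[-1].split("-")
['20111007T084734', '20111008T023142']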
Since this came up on Google, and for completeness: try using re.findall as an alternative!
This does require a little rethinking, but it still returns a list of matches like split does. This makes it a nice drop-in replacement for some existing code and gets rid of the unwanted text. Pair it with lookaheads and/or lookbehinds and you get very similar behavior.
Yes, this is a bit of a "you're asking the wrong question" answer and doesn't use re.split(). It does solve the underlying issue: your list of matches suddenly has zero-length strings in it and you don't want that.
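A minimal re.findall sketch for the file names in the question. The pattern assumes every timestamp is eight digits, a literal T, then six digits, which matches the two samples shown but is otherwise an assumption.

```python
import re

f = "000014_L_20111007T084734-20111008T023142.txt"

# Match the timestamps themselves instead of splitting on what surrounds them.
# Assumed format: 8 digits, 'T', 6 digits.
time_info = re.findall(r"\d{8}T\d{6}", f)

print(time_info)  # ['20111007T084734', '20111008T023142']
```

Because findall returns only what the pattern matches, there are no empty strings to filter out afterwards.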
>>> f = '000014_L_20111007T084734-20111008T023142.txt'
>>> f[9:-4].split('-')
['20111007T084734', '20111008T023142']
or, somewhat more general:
>>> f[f.rfind('_')+1:-4].split('-')
['20111007T084734', '20111008T023142']

Split Sentences of a String

Hi, I am trying to split a text.
For example, given
'C:/bye1.txt'
I would want 'C:/bye1.txt' only. Given
'C:/bye1.txt C:/hello1.txt'
I would want C:/hello1.txt only. Given
'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
I would want C:/bye3, and so on.
This is the code I tried. The problem is that it only prints y.
x = "Hello i am boey"
Last = x[len(x)-1]
print(Last)
Look at this:
>>> x = 'C:/bye1.txt'
>>> x.split()[-1]
'C:/bye1.txt'
>>> y = 'C:/bye1.txt C:/hello1.txt'
>>> y.split()[-1]
'C:/hello1.txt'
>>> z = 'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
>>> z.split()[-1]
'C:/bye3'
>>>
Basically, you split the strings using str.split and then get the last item (that's what [-1] does).
x = "Hello i am boey".split()
Last = x[len(x)-1]
print(Last)
Though more Pythonic:
x = "Hello i am boey".split()
print(x[-1])
Try:
>>> x = 'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
>>> x.split()[-1]
'C:/bye3'
>>>
x[len(x)-1] returns the last character of the string (a string is indexed character by character). It is the same as x[-1].
To get the output you want, do this:
x.split()[-1]
If your sentences are separated by other delimiters, you can specify the delimiter to split on like so:
delimiter = ','
item = x.split(delimiter)[-1]  # Split the string on the delimiter and get the last item
In addition, the lstrip(), rstrip(), and strip() methods might be useful for removing unnecessary characters from the ends of your string. Look them up in the Python documentation.
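A brief sketch of how the strip family behaves, since the argument is easy to misread. The sample strings are made up for illustration.

```python
# With no argument, strip() removes leading and trailing whitespace.
padded = "  C:/bye3 \n".strip()
print(padded)  # C:/bye3

# With an argument, the characters are treated as a set, not as a substring:
# rstrip keeps removing trailing characters while they are '.', 't' or 'x'.
surprise = "text.txt".rstrip(".txt")
print(surprise)  # te

# To drop a literal suffix, check for it and slice it off instead.
name = "text.txt"
suffix = ".txt"
stem = name[:-len(suffix)] if name.endswith(suffix) else name
print(stem)  # text
```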
Answer:
'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'.split()[-1]
would give you 'C:/bye3'.
Details:
The split method without any argument splits on runs of whitespace. In the example above, it returns the following list:
['C:/bye1.txt', 'C:/hello1.txt', 'C:/bye2', 'C:/bye3']
An index of -1 selects the last item of that list (negative indices count from the end).
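The details above can be sketched as a small helper. The name last_word is hypothetical, and returning an empty string for blank input is an assumption; without the guard, indexing an empty list with [-1] raises IndexError.

```python
def last_word(text):
    """Return the last whitespace-separated item of text, or '' if there is none."""
    parts = text.split()  # no argument: split on any run of whitespace
    return parts[-1] if parts else ""


print(last_word("C:/bye1.txt"))                                # C:/bye1.txt
print(last_word("C:/bye1.txt C:/hello1.txt"))                  # C:/hello1.txt
print(last_word("C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3"))  # C:/bye3

# rsplit with maxsplit=1 splits only once, from the right, which avoids
# building the full list when only the last item is needed.
print("C:/bye1.txt C:/hello1.txt".rsplit(None, 1)[-1])         # C:/hello1.txt
```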

removing part of a string (up to but not including) in python

I'm trying to strip off part of a string.
For example, I want to turn
a = 'xyz-abc'
into
a = '-abc'
I would usually use lstrip e.g.
a.lstrip('xyz')
but in this case I don't know what xyz is going to be, so I need a way to just strip everything to the left of '-'.
Is it possible to set that option with lstrip or do I have to go about it a different way?
Thanks.
If there's only one - character, this will work:
'xyz-abc'.split('-')[1]
If you want the '-' in there, you have to reattach it:
>>> '-' + 'xyz-abc'.split('-')[1]
'-abc'
There's also a maxsplit parameter that allows you to split only at the first - character.
>>> '-' + 'xyz-ab-c'.split('-', 1)[1]
'-ab-c'
partition is also potentially useful:
>>> 'xyz-abc'.partition('-')
('xyz', '-', 'abc')
It splits at the first occurrence of the separator:
>>> ''.join('xyz-ab-c'.partition('-')[1:])
'-ab-c'
>>> a = 'xyz-abc'
>>> a.find('-')  # return the index of the first instance of '-'
3
>>> a[a.find('-'):]  # return everything from that index onward
'-abc'
You could use a combination of .find and slicing.
If there is no guarantee that the text to the left of the final - doesn't contain dashes of its own, rfind, the right-to-left counterpart of find, is even more useful:
>>> s = "xyv-er-hdgcfh-abc"
>>> print(s[s.rfind("-"):])
-abc
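One case the snippets above do not cover is a string with no '-' at all. The sketch below compares partition and find there; the helper name from_first_dash is hypothetical.

```python
def from_first_dash(s):
    """Return s from the first '-' onward, or '' if s contains no '-'."""
    head, sep, tail = s.partition("-")
    return sep + tail  # both are '' when there is no '-'


print(from_first_dash("xyz-abc"))     # -abc
print(from_first_dash("xyz-ab-c"))    # -ab-c
print(repr(from_first_dash("xyz")))   # ''

# Pitfall with find: it returns -1 when '-' is missing, and a slice
# starting at -1 yields the last character rather than an empty string.
s = "xyz"
no_dash = s[s.find("-"):]
print(no_dash)  # z
```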

Regex to Split 1st Colon

I have a time in ISO 8601 format (2009-11-19T19:55:00) which is paired with a name, commence. I'm trying to parse this into two parts. I'm currently up to here:
import re
sColon = re.compile('[:]')
aString = sColon.split("commence:2009-11-19T19:55:00")
Obviously this returns:
>>> aString
['commence', '2009-11-19T19', '55', '00']
What I'd like it to return is this:
>>> aString
['commence', '2009-11-19T19:55:00']
How would I go about doing this in the original creation of sColon? Also, can you recommend any regular expression links or books that you have found useful? I can see myself needing them in the future!
EDIT:
To clarify: I need a regular expression that splits only at the very first instance of :. Is this possible? The text (commence) before the colon can change, yes.
>>> first, colon, rest = "commence:2009-11-19T19:55:00".partition(':')
>>> (first, colon, rest)
('commence', ':', '2009-11-19T19:55:00')
You could pass the maxsplit parameter to the split function:
>>> "commence:2009-11-19T19:55:00".split(":",1)
['commence', '2009-11-19T19:55:00']
Official Docs
S.split([sep [,maxsplit]]) -> list of strings
Return a list of the words in the string S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are removed
from the result.
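If you would rather keep the compiled pattern from the question, the split method of a compiled pattern accepts a maxsplit argument too. A minimal sketch reusing the question's sColon and aString names:

```python
import re

sColon = re.compile(":")

# maxsplit=1 stops after the first colon, leaving the rest of the string intact.
aString = sColon.split("commence:2009-11-19T19:55:00", maxsplit=1)

print(aString)  # ['commence', '2009-11-19T19:55:00']
```

For a fixed single-character delimiter like this, str.split(":", 1) gives the same result without a regex.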
Looks like you need the equivalent of .IndexOf(":") and then .Substring()? In Python that would be str.find followed by slicing.
@OP, don't do the unnecessary. A regex is not needed for what you are doing. Python has very good string manipulation methods that you can use. All you need is split() and slicing. Those are the very basics of Python.
>>> "commence:2009-11-19T19:55:00".split(":",1)
['commence', '2009-11-19T19:55:00']
>>>
