Split Sentences of a String - python

Hi, I am trying to split a text and keep only the last part.
For example, given
'C:/bye1.txt'
I would want 'C:/bye1.txt' only. Given
'C:/bye1.txt C:/hello1.txt'
I would want C:/hello1.txt only. Given
'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
I would want C:/bye3, and so on.
Here is the code I tried. The problem is that it only prints y.
x = "Hello i am boey"
Last = x[len(x)-1]
print Last

Look at this:
>>> x = 'C:/bye1.txt'
>>> x.split()[-1]
'C:/bye1.txt'
>>> y = 'C:/bye1.txt C:/hello1.txt'
>>> y.split()[-1]
'C:/hello1.txt'
>>> z = 'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
>>> z.split()[-1]
'C:/bye3'
>>>
Basically, you split the strings using str.split and then get the last item (that's what [-1] does).

x = "Hello i am boey".split()
Last = x[len(x)-1]
print Last
Though more Pythonic:
x = "Hello i am boey".split()
print x[-1]

Try:
>>> x = 'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'
>>> x.split()[-1]
'C:/bye3'
>>>

x[len(x)-1] returns the last character of the string (a string behaves like a sequence of characters). It is the same as x[-1].
To get the output you want, do this:
x.split()[-1]
If your sentences are separated by another delimiter, you can pass that delimiter to split like so:
delimiter = ','
item = x.split(delimiter)[-1] # Split the string on the delimiter and get the last item
In addition, the lstrip(), rstrip(), and strip() methods might be useful for removing unnecessary characters from the ends of your string. Look them up in the Python documentation.
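A small sketch of combining a custom delimiter with the strip methods. It uses Python 3 print() syntax, and the sample strings and the helper name last_item are made up for illustration:

```python
def last_item(text, delimiter=","):
    """Return the last delimiter-separated item, without surrounding whitespace."""
    return text.split(delimiter)[-1].strip()


row = "C:/bye1.txt, C:/hello1.txt, C:/bye3 \n"
print(repr(last_item(row)))            # 'C:/bye3'
print(repr("  padded  ".lstrip()))     # 'padded  '
print(repr("  padded  ".rstrip()))     # '  padded'
print(repr("xxpaddedxx".strip("x")))   # 'padded'
```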

Answer:
'C:/bye1.txt C:/hello1.txt C:/bye2 C:/bye3'.split()[-1]
would give you 'C:/bye3'.
Details:
The split method without any argument splits on runs of whitespace (spaces, tabs, newlines). In the example above, it returns the following list:
['C:/bye1.txt', 'C:/hello1.txt', 'C:/bye2', 'C:/bye3']
An index of -1 takes the last element of that list (negative indices count from the end).
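A minimal sketch of both points, in Python 3 print() syntax (the answers in this thread use Python 2). The helper name last_token is hypothetical, and returning an empty string for input with no tokens is an assumption about what you would want:

```python
def last_token(text):
    """Return the last whitespace-separated token of text, or '' if there is none."""
    tokens = text.split()  # no argument: split on runs of any whitespace
    return tokens[-1] if tokens else ""


paths = "C:/bye1.txt  C:/hello1.txt\tC:/bye3\n"
print(paths.split())             # ['C:/bye1.txt', 'C:/hello1.txt', 'C:/bye3']
print(paths.split(" "))          # ['C:/bye1.txt', '', 'C:/hello1.txt\tC:/bye3\n']
print(last_token(paths))         # C:/bye3
print(repr(last_token("   ")))   # ''
```

Indexing an empty list with [-1] raises IndexError, which is why the helper checks for tokens first.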

Related

python parsing a string

I have a list with strings.
list_of_strings
They look like that:
'/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
I want to part this string into:
/folder1/folder2/folder3/folder4/folder5/exp-* and put this into a new list.
I thought to do something like that, but I am lacking the right snippet to do what I want:
list_of_stringparts = []
for string in sorted(list_of_strings):
    part = string.split('/')[7] # or whatever returns the first part of my string
    list_of_stringparts.append(part)
Does anyone have an idea? Do I need a regex?
You are using subscription, which extracts a single element (the eighth). To get the first seven elements, you need a slice [N:M:S] like this:
>>> l = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> l.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
In our case N is omitted (0 by default) and S, the step, defaults to 1, so you'll get elements 0-6 (seven elements) from the result of split.
To construct your string back, use join():
>>> '/'.join(l.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'
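To make the difference between subscription and slicing concrete, here is a short sketch in Python 3 syntax. The variable names path and parts are only for illustration:

```python
path = "/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file"
parts = path.split("/")

print(parts[7])             # exp-*  (a single element, the eighth)
print(parts[:7])            # ['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
print(parts[1:7:2])         # ['folder1', 'folder3', 'folder5']  (step of 2)
print("/".join(parts[:7]))  # /folder1/folder2/folder3/folder4/folder5/exp-*
```

The leading empty string in parts[:7] comes from the path starting with '/', and it is what makes join() put the leading slash back.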
I would do it like this:
>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> s.split('/')[:7]
['', 'folder1', 'folder2', 'folder3', 'folder4', 'folder5', 'exp-*']
>>> '/'.join(s.split('/')[:7])
'/folder1/folder2/folder3/folder4/folder5/exp-*'
Using re.match:
>>> import re
>>> s = '/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file'
>>> re.match(r'.*?\*', s).group()
'/folder1/folder2/folder3/folder4/folder5/exp-*'
Your example suggests that you want to partition the strings at the first * character. This can be done with str.partition():
list_of_stringparts = []
list_of_strings = ['/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder1/exp-*/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file', '/folder/blah/pow']
for s in sorted(list_of_strings):
    head, sep, tail = s.partition('*')
    list_of_stringparts.append(head + sep)
>>> list_of_stringparts
['/folder/blah/pow', '/folder1/exp-*', '/folder1/folder2/folder3/folder4/folder5/exp-*']
Or this equivalent list comprehension:
list_of_stringparts = [''.join(s.partition('*')[:2]) for s in sorted(list_of_strings)]
This will retain any string that does not contain a * - not sure from your question if that is desired.
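If strings without a * should be dropped instead, one possible variation (a sketch in Python 3 syntax, assuming that skipping them is the behaviour you want) is to check the separator that partition() returns:

```python
list_of_strings = [
    "/folder1/folder2/folder3/folder4/folder5/exp-*/exp-*/otherfolder/file",
    "/folder/blah/pow",
]

# partition() returns an empty separator when '*' is not found,
# so `sep` doubles as a "was it found?" flag.
list_of_stringparts = []
for s in sorted(list_of_strings):
    head, sep, tail = s.partition("*")
    if sep:
        list_of_stringparts.append(head + sep)

print(list_of_stringparts)  # ['/folder1/folder2/folder3/folder4/folder5/exp-*']
```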

How to extract number from end of the string

I have a string like "Titile Something/17". I need to cut out the "/NNN" part, where NNN is a 1-, 2-, or 3-digit number and may not be present at all.
How do you do this in Python? Thanks.
\d{0,3} matches from zero up to three digits. $ asserts that we are at the end of the string (or just before a trailing newline).
re.search(r'/\d{0,3}$', st).group()
Example:
>>> import re
>>> re.search(r'/\d{0,3}$', 'Titile Something/17').group()
'/17'
>>> re.search(r'/\d{0,3}$', 'Titile Something/1').group()
'/1'
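Since the question says the "/NNN" part may be missing, note that re.search returns None when nothing matches, and calling .group() on None raises AttributeError. A hedged sketch that handles this case follows (Python 3 syntax). The helper name split_suffix is hypothetical, and it uses \d{1,3} rather than \d{0,3} on the assumption that a bare trailing "/" should not count as a number:

```python
import re


def split_suffix(text):
    """Return (title, number); number is None when no trailing '/NNN' exists."""
    match = re.search(r"/(\d{1,3})$", text)
    if match is None:
        return text, None
    # Everything before the matched '/' is the title.
    return text[:match.start()], match.group(1)


print(split_suffix("Titile Something/17"))  # ('Titile Something', '17')
print(split_suffix("Titile Something"))     # ('Titile Something', None)
```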
You don't need RegEx here, simply use the built-in str.rindex function and slicing, like this
>>> data = "Titile Something/17"
>>> data[:data.rindex("/")]
'Titile Something'
>>> data[data.rindex("/") + 1:]
'17'
Or you can use str.rpartition, like this
>>> data.rpartition('/')[0]
'Titile Something'
>>> data.rpartition('/')[2]
'17'
>>>
Note: This will get any string after the last /. Use it with caution.
If you want to make sure that the part after the last / consists only of digits, you can use the str.isdigit method, like this:
>>> data[data.rindex("/") + 1:].isdigit()
True
>>> data.rpartition('/')[2].isdigit()
True
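Putting rpartition and isdigit together, a possible helper could look like the sketch below (Python 3 syntax). The name strip_number and the three-digit limit are assumptions taken from the wording of the question:

```python
def strip_number(data):
    """Remove a trailing '/NNN' (1 to 3 digits) from data, if present."""
    head, sep, tail = data.rpartition("/")
    # sep is '' when there is no '/', and ''.isdigit() is False,
    # so strings without a numeric suffix are returned unchanged.
    if sep and tail.isdigit() and len(tail) <= 3:
        return head
    return data


print(strip_number("Titile Something/17"))  # Titile Something
print(strip_number("Titile Something"))     # Titile Something
print(strip_number("a/b"))                  # a/b
```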
data = "Titile Something/17"
print data.split("/")[0]
Titile Something
print data.split("/")[-1] # last part of the string, after the separator /
17
or
print data.split("/")[1] # the part after the first separator; in this case it is the same
17
When you want to add this to a list, use strip() to remove a trailing newline "\n":
print data.split("/")[-1].strip()
17
I need to cut out "/NNN"
import re
x = "Titile Something/17"
print re.sub(r"/.*$","",x) # removes the first / and everything after it
print re.sub(r"^.*?/","",x) # removes everything up to and including the first /
Using re.sub you can do what you want.

Strip in Python

I have a question regarding strip() in Python. I am trying to strip a semi-colon from a string, I know how to do this when the semi-colon is at the end of the string, but how would I do it if it is not the last element, but say the second to last element.
eg:
1;2;3;4;\n
I would like to strip that last semi-colon.
Strip the other characters as well.
>>> '1;2;3;4;\n'.strip('\n;')
'1;2;3;4'
>>> "".join("1;2;3;4;\n".rpartition(";")[::2])
'1;2;3;4\n'
How about replace?
string1='1;2;3;4;\n'
string2=string1.replace(";\n","\n")
>>> string = "1;2;3;4;\n"
>>> string.strip().strip(";")
'1;2;3;4'
This will first strip any leading or trailing white space, and then remove any leading or trailing semicolon.
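A quick sketch of what that means in practice (Python 3 syntax; the sample strings are made up). It also shows rstrip(), which only touches the right-hand side:

```python
line = ";1;2;3;4;\n"

# strip() works on both ends, so the leading ';' is removed as well.
print(repr(line.strip().strip(";")))    # '1;2;3;4'
# rstrip() only touches the right-hand side, so the leading ';' survives.
print(repr(line.rstrip().rstrip(";")))  # ';1;2;3;4'
# strip(chars) treats its argument as a set of characters, not a substring.
print(repr(";;1;2;;\n".strip(";\n")))   # '1;2'
```

Note that all three variants also drop the trailing newline.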
Try this:
def remove_last(string):
    index = string.rfind(';')
    if index == -1:
        # Semi-colon doesn't exist
        return string
    return string[:index] + string[index+1:]
This should be able to remove the last semicolon of the line, regardless of what characters come after it.
>>> remove_last('Test')
'Test'
>>> remove_last('Test;abc')
'Testabc'
>>> remove_last(';test;abc;foobar;\n')
';test;abc;foobar\n'
>>> remove_last(';asdf;asdf;asdf;asdf')
';asdf;asdf;asdfasdf'
The other answers provided are probably faster since they're tailored to your specific example, but this one is a bit more flexible.
You could split the string on semicolons and then join the non-empty parts back together using ; as the separator:
parts = '1;2;3;4;\n'.split(';')
non_empty_parts = []
for s in parts:
    if s.strip() != "":
        non_empty_parts.append(s.strip())
print ";".join(non_empty_parts)
If you only want to use the strip function, this is one method:
Using slice notation, you can limit the strip() function's scope to one part of the string and append the "\n" at the end:
# create a variable for later (named line to avoid shadowing the built-in str)
line = "1;2;3;4;\n"
# format and assign to newstr
newstr = line[:8].strip(';') + line[8:]
Using the rfind() method (similar to Micheal0x2a's solution) you can make the statement applicable to many strings:
# create a variable for later
line = "1;2;3;4;\n"
# format and assign to newstr
newstr = line[:line.rfind(';') + 1].strip(';') + line[line.rfind(';') + 1:]
re.sub(r';(\W*$)', r'\1', '1;2;3;4;\n') -> '1;2;3;4\n'

removing part of a string (up to but not including) in python

I'm trying to strip off part of a string.
e.g. Strip:-
a = xyz-abc
to leave:-
a = -abc
I would usually use lstrip e.g.
a.lstrip('xyz')
but in this case I don't know what xyz is going to be, so I need a way to just strip everything to the left of '-'.
Is it possible to set that option with lstrip or do I have to go about it a different way?
Thanks.
If there's only one - character, this will work:
'xyz-abc'.split('-')[1]
If you want the '-' in there, you have to reattach it:
>>> '-' + 'xyz-abc'.split('-')[1]
'-abc'
There's also a maxsplit parameter that allows you to split only at the first - character.
>>> '-' + 'xyz-ab-c'.split('-', 1)[1]
'-ab-c'
partition is also potentially useful:
>>> 'xyz-abc'.partition('-')
('xyz', '-', 'abc')
It splits at the first occurrence of the separator:
>>> ''.join('xyz-ab-c'.partition('-')[1:])
'-ab-c'
>>> a = 'xyz-abc'
>>> a.find('-') # return the index of the first instance of '-'
3
>>> a[a.find('-'):] # return the string of everything past that index
'-abc'
You could use a combination of .find and slicing.
If there is no guarantee that the text to the left of - doesn't contain dashes of its own, the reversed version of find called rfind is even more useful:
>>> s = "xyv-er-hdgcfh-abc"
>>> print s[s.rfind("-"):]
-abc
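One caveat with the find/rfind approach: both return -1 when there is no dash, and slicing from -1 silently yields the last character. A defensive sketch follows (Python 3 syntax). The helper name from_dash and the choice to return dash-free input unchanged are assumptions:

```python
def from_dash(text, last=False):
    """Return text from the first (or last) '-' onward, dash included."""
    index = text.rfind("-") if last else text.find("-")
    if index == -1:
        return text  # assumption: leave dash-free input unchanged
    return text[index:]


print(from_dash("xyz-abc"))                       # -abc
print(from_dash("xyz-ab-c"))                      # -ab-c
print(from_dash("xyv-er-hdgcfh-abc", last=True))  # -abc
print("nodash"["nodash".find("-"):])              # h  (the pitfall)
print(from_dash("nodash"))                        # nodash
```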

Split a string in python

a="aaaa#b:c:"
>>> for i in a.split(":"):
... print i
... if ("#" in i): //i=aaaa#b
... print only b
Inside the if block, when i is "aaaa#b", how do I get the value after the hash? Should we use rsplit to get the value?
The following can replace your if statement.
for i in a.split(':'):
    print i.partition('#')[2]
>>> a="aaaa#b:c:"
>>> a.split(":",2)[0].split("#")[-1]
'b'
a = "aaaa#b:c:"
print(a.split(":")[0].split("#")[1])
I'd suggest this, from the Python docs:
str.rsplit([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter
string. If maxsplit is given, at most maxsplit splits are done, the
rightmost ones. If sep is not specified or None, any whitespace string
is a separator. Except for splitting from the right, rsplit() behaves
like split() which is described in detail below.
So, to answer your question: yes.
EDIT:
It also depends on how you wish to index the result. rsplit splits from the right but still returns the parts in left-to-right order, so if your data is always the rightmost part you can index with -1 every time (Python indexes from 0, and negative indices count from the end), rather than having to check the size of the returned list.
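A short comparison of split and rsplit with maxsplit, as a sketch in Python 3 syntax (the sample strings are illustrative only):

```python
text = "a#b#c"

print(text.split("#", 1))           # ['a', 'b#c']  (splits at the first '#')
print(text.rsplit("#", 1))          # ['a#b', 'c']  (splits at the last '#')
print(text.rsplit("#", 1)[-1])      # c
print("aaaa#b".rsplit("#", 1)[-1])  # b
print("nohash".rsplit("#", 1)[-1])  # nohash  (no '#': the whole string comes back)
```

Indexing with [-1] works whether or not the separator was found, so no length check is needed. Whether returning the whole string for input without a '#' is acceptable depends on your data.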
Do you really need to use split? split creates a list, so it isn't that efficient...
What about something like this (it returns the single character after the first #):
>>> a = "aaaa#b:c:"
>>> a[a.find('#') + 1]
'b'
Or, if you need a particular occurrence, use a regex instead...
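If the value after the hash can be longer than one character, the same find-based idea can be extended with a slice up to the next colon. This is a sketch in Python 3 syntax. The helper name value_after_hash and returning None when there is no '#' are assumptions:

```python
def value_after_hash(text):
    """Return the text between the first '#' and the next ':' (or the end)."""
    start = text.find("#")
    if start == -1:
        return None  # assumption: signal "no hash" with None
    end = text.find(":", start)
    if end == -1:
        end = len(text)  # no colon after the hash: take the rest
    return text[start + 1:end]


print(value_after_hash("aaaa#b:c:"))    # b
print(value_after_hash("aaaa#bcd:c:"))  # bcd
print(value_after_hash("x#yz"))         # yz
print(value_after_hash("no hash"))      # None
```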
split would do the job nicely. Use rsplit only if you need to split from the last '#'.
>>> a = "aaaa#b:c:"
>>> for i in a.split(":"):
...     print i
...     b = i.split('#', 1)
...     if len(b) == 2:
...         print b[1]
