How can I split and parse a string in Python? [duplicate]

How can I split and parse a string in Python? [duplicate] - python

This question already has answers here:
Splitting on first occurrence
(5 answers)
Split a string by a delimiter in python
(5 answers)
Closed last month.
I am trying to split this string in python: 2.7.0_bf4fda703454
I want to split that string on the underscore _ so that I can use the value on the left side.

"2.7.0_bf4fda703454".split("_") gives a list of strings:
In [1]: "2.7.0_bf4fda703454".split("_")
Out[1]: ['2.7.0', 'bf4fda703454']
This splits the string at every underscore. If you want it to stop after the first split, use "2.7.0_bf4fda703454".split("_", 1).
If you know for a fact that the string contains an underscore, you can even unpack the LHS and RHS into separate variables:
In [8]: lhs, rhs = "2.7.0_bf4fda703454".split("_", 1)
In [9]: lhs
Out[9]: '2.7.0'
In [10]: rhs
Out[10]: 'bf4fda703454'
An alternative is to use partition(). The usage is similar to the last example, except that it returns three components instead of two. The principal advantage is that this method doesn't fail if the string doesn't contain the separator.

Python string parsing walkthrough
Split a string on space, get a list, show its type, print it out:
el#apollo:~/foo$ python
>>> mystring = "What does the fox say?"
>>> mylist = mystring.split(" ")
>>> print type(mylist)
<type 'list'>
>>> print mylist
['What', 'does', 'the', 'fox', 'say?']
If you have two delimiters next to each other, empty string is assumed:
el#apollo:~/foo$ python
>>> mystring = "its so fluffy im gonna DIE!!!"
>>> print mystring.split(" ")
['its', '', 'so', '', '', 'fluffy', '', '', 'im', 'gonna', '', '', '', 'DIE!!!']
Split a string on underscore and grab the 5th item in the list:
el#apollo:~/foo$ python
>>> mystring = "Time_to_fire_up_Kowalski's_Nuclear_reactor."
>>> mystring.split("_")[4]
"Kowalski's"
Collapse multiple spaces into one
el#apollo:~/foo$ python
>>> mystring = 'collapse these spaces'
>>> mycollapsedstring = ' '.join(mystring.split())
>>> print mycollapsedstring.split(' ')
['collapse', 'these', 'spaces']
When you pass no parameter to Python's split method, the documentation states: "runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace".
Hold onto your hats boys, parse on a regular expression:
el#apollo:~/foo$ python
>>> mystring = 'zzzzzzabczzzzzzdefzzzzzzzzzghizzzzzzzzzzzz'
>>> import re
>>> mylist = re.split("[a-m]+", mystring)
>>> print mylist
['zzzzzz', 'zzzzzz', 'zzzzzzzzz', 'zzzzzzzzzzzz']
The regular expression "[a-m]+" means the lowercase letters a through m that occur one or more times are matched as a delimiter. re is a library to be imported.
Or if you want to chomp the items one at a time:
el#apollo:~/foo$ python
>>> mystring = "theres coffee in that nebula"
>>> mytuple = mystring.partition(" ")
>>> print type(mytuple)
<type 'tuple'>
>>> print mytuple
('theres', ' ', 'coffee in that nebula')
>>> print mytuple[0]
theres
>>> print mytuple[2]
coffee in that nebula

If it's always going to be an even LHS/RHS split, you can also use the partition method that's built into strings. It returns a 3-tuple as (LHS, separator, RHS) if the separator is found, and (original_string, '', '') if the separator wasn't present:
>>> "2.7.0_bf4fda703454".partition('_')
('2.7.0', '_', 'bf4fda703454')
>>> "shazam".partition("_")
('shazam', '', '')

Related

Python proper way to split multiple delimiters [duplicate]

This question already has answers here:
Split Strings into words with multiple word boundary delimiters
(31 answers)
Closed 8 years ago.
I found some answers online, but I have no experience with regular expressions, which I believe is what is needed here.
I have a string that needs to be split by either a ';' or ', '
That is, it has to be either a semicolon or a comma followed by a space. Individual commas without trailing spaces should be left untouched
Example string:
"b-staged divinylsiloxane-bis-benzocyclobutene [124221-30-3], mesitylene [000108-67-8]; polymerized 1,2-dihydro-2,2,4- trimethyl quinoline [026780-96-1]"
should be split into a list containing the following:
('b-staged divinylsiloxane-bis-benzocyclobutene [124221-30-3]' , 'mesitylene [000108-67-8]', 'polymerized 1,2-dihydro-2,2,4- trimethyl quinoline [026780-96-1]')

Luckily, Python has this built-in :)
import re
re.split('; |, ', string_to_split)
Update:Following your comment:
>>> a='Beautiful, is; better*than\nugly'
>>> import re
>>> re.split('; |, |\*|\n',a)
['Beautiful', 'is', 'better', 'than', 'ugly']

Do a str.replace('; ', ', ') and then a str.split(', ')

Here's a safe way for any iterable of delimiters, using regular expressions:
>>> import re
>>> delimiters = "a", "...", "(c)"
>>> example = "stackoverflow (c) is awesome... isn't it?"
>>> regex_pattern = '|'.join(map(re.escape, delimiters))
>>> regex_pattern
'a|\\.\\.\\.|\\(c\\)'
>>> re.split(regex_pattern, example)
['st', 'ckoverflow ', ' is ', 'wesome', " isn't it?"]
re.escape allows to build the pattern automatically and have the delimiters escaped nicely.
Here's this solution as a function for your copy-pasting pleasure:
def split(delimiters, string, maxsplit=0):
import re
regex_pattern = '|'.join(map(re.escape, delimiters))
return re.split(regex_pattern, string, maxsplit)
If you're going to split often using the same delimiters, compile your regular expression beforehand like described and use RegexObject.split.
If you'd like to leave the original delimiters in the string, you can change the regex to use a lookbehind assertion instead:
>>> import re
>>> delimiters = "a", "...", "(c)"
>>> example = "stackoverflow (c) is awesome... isn't it?"
>>> regex_pattern = '|'.join('(?<={})'.format(re.escape(delim)) for delim in delimiters)
>>> regex_pattern
'(?<=a)|(?<=\\.\\.\\.)|(?<=\\(c\\))'
>>> re.split(regex_pattern, example)
['sta', 'ckoverflow (c)', ' is a', 'wesome...', " isn't it?"]
(replace ?<= with ?= to attach the delimiters to the righthand side, instead of left)

In response to Jonathan's answer above, this only seems to work for certain delimiters. For example:
>>> a='Beautiful, is; better*than\nugly'
>>> import re
>>> re.split('; |, |\*|\n',a)
['Beautiful', 'is', 'better', 'than', 'ugly']
>>> b='1999-05-03 10:37:00'
>>> re.split('- :', b)
['1999-05-03 10:37:00']
By putting the delimiters in square brackets it seems to work more effectively.
>>> re.split('[- :]', b)
['1999', '05', '03', '10', '37', '00']

This is how the regex look like:
import re
# "semicolon or (a comma followed by a space)"
pattern = re.compile(r";|, ")
# "(semicolon or a comma) followed by a space"
pattern = re.compile(r"[;,] ")
print pattern.split(text)

How to parse string to list without eval or ast.literal_eval

Is there any way to convert a list containing unicode strings to a proper list without using eval() or ast.literal_eval() in Python?
For example:
"[u'hello', u'hi']"
to
['hello', 'hi']

Could this be what you are looking for?
a = "[u'hello', u'hi']".translate(None, "[]u'' ")
a = a.split(',')
print(a) #['hello', 'hi']
Seems to fail when you have 'u' in string so you can go with:
a = "[u'hello', u'hi', u'uyou']".translate(None, "[]' ")
a = [item[1:] for item in a.split(',')]

It depends a bit on the formatting of your input:
it only contains "strings"
if there are always u in front of the strings,
each string is inside single quotations '
there is always no whitespace before the , but one after.
there are no whitespaces before or after the [ and ]
you could simply strip the left [u' and the right '] (for convenience I just slice it from the fourth element to the second to last element), then split at ', u' and you're done:
>>> s = "[u'hello', u'hi']"
>>> s[3:-2].split("', u'")
['hello', 'hi']

Python - defining string split delimiter?

How could I define string delimiter for splitting in most efficient way? I mean to not need to use many if's etc?
I have strings that need to be splited strictly into two element lists. The problem is those strings have different symbols by which I can split them. For example:
'Hello: test1'. This one has split delimiter ': '. The other example would be:
'Hello - test1'. So this one would be ' - '. Also split delimiter could be ' -' or '- '. So if I know all variations of delimiters, how could I define them most efficiently?
First I did something like this:
strings = ['Hello - test', 'Hello- test', 'Hello -test']
for s in strings:
delim = ' - '
if len(s.split('- ', 1)) == 2:
delim = '- '
elif len(s.split(' -', 1)) == 2:
delim = ' -'
print s.split(delim, 1)[1])
But then I got new strings that had another unexpected delimiters. So doing this way I should add even more ifs to check other delimiters like ': '. But then I wondered if there is some better way to define them (there is not problem if I should need to include new delimiters in some kind of list if I would need to later on). Maybe regex would help or some other tool?

Put all the delimiters inside re.split function like below using logical OR | operator.
re.split(r': | - | -|- ', string)
Add maxsplit=1, if you want to do an one time split.
re.split(r': | - | -|- ', string, maxsplit=1)

You can use the split function of the re module
>>> strings = ['Hello1 - test1', 'Hello2- test2', 'Hello3 -test3', 'Hello4 :test4', 'Hello5 : test5']
>>> for s in strings:
... re.split(" *[:-] *",s)
...
['Hello1', 'test1']
['Hello2', 'test2']
['Hello3', 'test3']
['Hello4', 'test4']
['Hello5', 'test5']
Where between [] you put all the possible delimiters. The * indicates that some spaces can be put before or after.

\s*[:-]\s*
You can split by this.Use re.split(r"\s*[:-]\s*",string).See demo.
https://regex101.com/r/nL5yL3/14
You should use this if you can have delimiters like - or - or -.wherein you have can have multiple spaces.

This isn't the best way, but if you want to avoid using re for some (or no) reason, this is what I would do:
>>> strings = ['Hello - test', 'Hello- test', 'Hello -test', 'Hello : test']
>>> delims = [':', '-'] # all possible delimiters; don't worry about spaces.
>>>
>>> for string in strings:
... delim = next((d for d in delims if d in string), None) # finds the first delimiter in delims that's present in the string (if there is one)
... if not delim:
... continue # No delimiter! (I don't know how you want to handle this possibility; this code will simply skip the string all together.)
... print [s.strip() for s in string.split(delim, 1)] # assuming you want them in list form.
['Hello', 'test']
['Hello', 'test']
['Hello', 'test']
['Hello', 'test']
This uses Python's native .split() to break the string at the delimiter, and then .strip() to trim the white space off the results, if there is any. I've used next to find the appropriate delimiter, but there are plenty of things you can swap that out with (especially if you like for blocks).
If you're certain that each string will contain at least one of the delimiters (preferably exactly one), then you can shave it down to this:
## with strings and delims defined...
>>> for string in strings:
... delim = next(d for d in delims if d in string) # raises StopIteration at this line if there is no delimiter in the string.
... print [s.strip() for s in string.split(delim, 1)]
I'm not sure if this is the most elegant solution, but it uses fewer if blocks, and you won't have to import anything to do it.

Extracting alphanumeric substring from a string in python

i have a string in python
text = '(b)'
i want to extract the 'b'. I could strip the first and the last letter of the string but the reason i wont do that is because the text string may contain '(a)', (iii), 'i)', '(1' or '(2)'. Some times they contain no parenthesis at all. but they will always contain an alphanumeric values. But i equally want to retrieve the alphanumeric values there.
this feat will have to be accomplished in a one line code or block of code that returns justthe value as it will be used in an iteratively on a multiple situations
what is the best way to do that in python,

I don't think Regex is needed here. You can just strip off any parenthesis with str.strip:
>>> text = '(b)'
>>> text.strip('()')
'b'
>>> text = '(iii)'
>>> text.strip('()')
'iii'
>>> text = 'i)'
>>> text.strip('()')
'i'
>>> text = '(1'
>>> text.strip('()')
'1'
>>> text = '(2)'
>>> text.strip('()')
'2'
>>> text = 'a'
>>> text.strip('()')
'a'
>>>
Regarding #MikeMcKerns' comment, a more robust solution would be to pass string.punctuation to str.strip:
>>> from string import punctuation
>>> punctuation # Just to demonstrate
'!"#$%&\'()*+,-./:;<=>?#[\\]^_`{|}~'
>>>
>>> text = '*(ab2**)'
>>> text.strip(punctuation)
'ab2'
>>>

You could do this through python's re module,
>>> import re
>>> text = '(5a)'
>>> match = re.search(r'\(?([0-9A-Za-z]+)\)?', text)
>>> match.group(1)
'5a'
>>> text = '*(ab2**)'
>>> match = re.search(r'\(?([0-9A-Za-z]+)\)?', text)
>>> match.group(1)
'ab2'

Not fancy, but this is pretty generic
>>> import string
>>> ''.join(i for i in text if i in string.ascii_letters+'0123456789')
This works for all sorts of combinations of parenthesis in the middle of the string, and also if you have other non-alphanumeric characters (aside from the parenthesis) present.

re.match(r'\(?([a-zA-Z0-9]+)', text).group(1)
for your input provided by exmple it would be:
>>> a=['(a)', '(iii)', 'i)', '(1' , '(2)']
>>> [ re.match(r'\(?([a-zA-Z0-9]+)', text).group(1) for text in a ]
['a', 'iii', 'i', '1', '2']

python string pattern matching

new_str="##2##*##1"
new_str1="##3##*##5##7"
How to split the above string in python
for val in new_str.split("##*"):
logging.debug("=======")
logging.debug(val[2:]) // will give
for st in val.split("##*"):
//how to get the values after ## in new_str and new_str1

I don't understand the question.
Are you trying to split a string by a delimiter? Then use split:
>>> a = "##2##*##1"
>>> b = "##3##*##5##7"
>>>
>>> a.split("##*")
['##2', '##1']
>>> b.split("##*")
['##3', '##5##7']
Are you trying to strip extraneous characters from a string? Then use strip:
>>> c = b.split("##*")[1]
>>> c
'##5##7'
>>> c.strip("#")
'5##7'
Are you trying to remove all the hashes (#) from a string? Then use replace:
>>> c.replace("#","")
'57'
Are you trying to find all the characters after "##"? Then use rsplit with its optional argument to split only once:
>>> a.rsplit("##",1)
['##2##*', '1']

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How can I split and parse a string in Python? [duplicate] - python

Related

Python proper way to split multiple delimiters [duplicate]

How to parse string to list without eval or ast.literal_eval

Python - defining string split delimiter?

Extracting alphanumeric substring from a string in python

python string pattern matching

Categories

Resources