how to parse this string in Python? [duplicate] - python

This question already has answers here:
How can I split and parse a string in Python? [duplicate]
(3 answers)
Closed 8 years ago.
I have a file in which each line has the following format:
"('-1259656819525938837', 598679497)\t0.036787946" # "\t" within the string is the tab character
I need to extract the components:
-1259656819525938837 #string, it is the content within ' '
598679497 # long
0.036787946 # float
Python 2.6

You can use regular expressions from re module:
import re
s = "('-1259656819525938837', 598679497)\t0.036787946"
re.findall(r'[-+]?[0-9]*\.?[0-9]+', s)
# gives: ['-1259656819525938837', '598679497', '0.036787946']
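The regex returns all three values as strings. If typed values are needed (the quoted key as a string, the second item as an integer, and the value after the tab as a float), one possible sketch is to split on the tab and let ast.literal_eval read the tuple. The helper name parse_line is made up for illustration, and the sketch assumes every line has exactly the format shown in the question.

```python
import ast

def parse_line(line):
    # parse_line is a hypothetical helper, not part of any library.
    # Assumption: exactly one tab separates the tuple text from the float.
    pair_text, value_text = line.rstrip("\n").split("\t", 1)
    # literal_eval only evaluates Python literals, so it is safer than eval.
    key, number = ast.literal_eval(pair_text)
    return key, number, float(value_text)

key, number, value = parse_line("('-1259656819525938837', 598679497)\t0.036787946")
print(type(key).__name__, type(number).__name__, type(value).__name__)
```

On Python 2.6 the second item may come back as int or long depending on its size; on Python 3 it is always int.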

"2.7.0_bf4fda703454".split("_") gives a list of strings:
In [1]: "2.7.0_bf4fda703454".split("_")
Out[1]: ['2.7.0', 'bf4fda703454']
This splits the string at every underscore. If you want it to stop after the first split, use "2.7.0_bf4fda703454".split("_", 1).
If you know for a fact that the string contains an underscore, you can even unpack the LHS and RHS into separate variables:
In [8]: lhs, rhs = "2.7.0_bf4fda703454".split("_", 1)
In [9]: lhs
Out[9]: '2.7.0'
In [10]: rhs
Out[10]: 'bf4fda703454'
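If the underscore might be missing, the two-name unpacking above raises ValueError. A minimal sketch of one way around that uses str.partition, which always returns three parts. The function name split_version is an assumption made for this example.

```python
def split_version(tag):
    # split_version is a hypothetical helper name.
    # partition returns (head, separator, tail); the separator is ''
    # when the delimiter does not occur in the string.
    lhs, sep, rhs = tag.partition("_")
    return lhs, (rhs if sep else None)

print(split_version("2.7.0_bf4fda703454"))
print(split_version("2.7.0"))
```

Returning None for the missing right-hand side is a design choice; an empty string would work as well if the caller does not need to tell the two cases apart.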

You can use a regex to extract the integer and float values from the string:
>>> import re
>>> a = "('-1259656819525938837', 598679497)\t0.036787946"
>>> re.findall(r'-?\d+(?:\.\d+)?', a)
['-1259656819525938837', '598679497', '0.036787946']

Related

Split string with escaped delimiter using a delimiter [duplicate]

This question already has answers here:
Python split string without splitting escaped character
(10 answers)
Closed 5 years ago.
Is there a better way to split a string that contains an escaped delimiter?
string = "fir\&st_part&secon\&d_part"
print(string.split('&'))
# is giving me
>>> ['fir\\', 'st_part', 'secon\\', 'd_part']
# but not
>>> ['fir&st_part', 'secon&d_part']
I added an escape character \ before the & in fir&st_part and secon&d_part, hoping that split would treat the escaped & as a literal character rather than as a delimiter.
Is there a better way to do this, if not with str.split?
You can use a regular expression!
Split on an ampersand (&) only when it is not preceded by a backslash. (?<!\\) is a negative lookbehind, and the backslash is doubled to escape it inside the pattern.
>>> import re
>>> re.split(r'(?<!\\)&', string)
['fir\\&st_part', 'secon\\&d_part']
With the resulting list, you can iterate and replace the escaped '\&' with '&' if necessary!
>>> import re
>>> print([each.replace("\\&", "&") for each in re.split(r'(?<!\\)&', string)])
['fir&st_part', 'secon&d_part']
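It's possible using a regular expression with a negative lookbehind. (A pattern such as [^\\]& also consumes the character before each &, so 'st_part' would come back as 'st_par'.)
import re
string = "fir\&st_part&secon\&d_part"
re.split(r'(?<!\\)&', string)
# ['fir\\&st_part', 'secon\\&d_part']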
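The regex answers split first and unescape afterwards, and a lookbehind cannot tell an escaped delimiter from a delimiter that follows an escaped backslash. As a rough alternative, a single pass over the characters can do both steps at once. The function name split_escaped and its parameters are invented for this sketch.

```python
def split_escaped(text, delimiter="&", escape="\\"):
    # split_escaped is a hypothetical helper, not a standard library function.
    parts, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == escape:
            # Take the next character literally; a trailing escape
            # character with nothing after it is kept as-is.
            current.append(next(chars, escape))
        elif ch == delimiter:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

print(split_escaped(r"fir\&st_part&secon\&d_part"))
```

With this approach an escaped backslash followed by a real delimiter (for example r"a\\&b") still splits, which the lookbehind pattern would not do.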

Splitting string with lookahead/lookbehind assertions for empty string match [duplicate]

This question already has an answer here:
python re.split lookahead pattern
(1 answer)
Closed 6 years ago.
I'm trying to split and rename some ugly-looking variable names (as an example):
In[1]: import re
ugly_names = ['some-Ugly-Name', 'ugly:Case:Style', 'uglyNamedFunction']
new_names = []
In[2]: patt = re.compile(r'(?<=[a-z])[\-:]?(?=[A-Z])')
In[3]: for name in ugly_names:
    loc_name = patt.split(name)
    new_names.append("_".join(s.lower() for s in loc_name))
print(new_names)
Out[3]: ['some_ugly_name', 'ugly_case_style', 'uglynamedfunction']
What's wrong with my pattern? Why doesn't it split on the empty match, or am I missing something?
P.S.: Is it possible with Python's regex to split on empty strings, or should I use some other functions and .groups()?
Not a direct answer to the question, but just an alternative way - use the inflection library (have to handle : separately though):
>>> import inflection
>>>
>>> [inflection.underscore(name.replace(":", "_")) for name in ugly_names]
['some_ugly_name', 'ugly_case_style', 'ugly_named_function']
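As for why the question's pattern left 'uglynamedfunction' unsplit: before Python 3.7, re.split ignored matches of the empty string, so the boundary between a lowercase and an uppercase letter was never used as a split point. From Python 3.7 on, re.split does split on empty matches. A sketch that sidesteps the version difference is to substitute instead of split, since re.sub replaces empty matches too. The helper name to_snake is made up for this example.

```python
import re

# Same lookaround pattern as in the question: an optional '-' or ':'
# sitting between a lowercase letter and an uppercase letter.
BOUNDARY = re.compile(r'(?<=[a-z])[\-:]?(?=[A-Z])')

def to_snake(name):
    # to_snake is a hypothetical helper name.
    # Insert '_' at every boundary (empty or not), then lowercase.
    return BOUNDARY.sub('_', name).lower()

ugly_names = ['some-Ugly-Name', 'ugly:Case:Style', 'uglyNamedFunction']
new_names = [to_snake(name) for name in ugly_names]
print(new_names)
```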

Python - Most elegant way to extract a substring, being given left and right borders [duplicate]

This question already has answers here:
How to extract the substring between two markers?
(22 answers)
Closed 4 years ago.
I have a string in Python:
string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
The expected output is:
"Atlantis-GPS-coordinates"
I know that the expected output is ALWAYS surrounded by "/bar/" on the left and "/" on the right:
"/bar/Atlantis-GPS-coordinates/"
My current solution looks like this:
a = string.find("/bar/")
b = string.find("/", a + 5)
output = string[a + 5:b]
This works, but I don't like it.
Does anyone know a more elegant function or idiom?
You can use split:
>>> string.split("/bar/")[1].split("/")[0]
'Atlantis-GPS-coordinates'
Adding a maxsplit of 1 saves some unnecessary splitting:
>>> string.split("/bar/", 1)[1].split("/", 1)[0]
'Atlantis-GPS-coordinates'
Or use partition:
>>> string.partition("/bar/")[2].partition("/")[0]
'Atlantis-GPS-coordinates'
Or a regex:
>>> re.search(r'/bar/([^/]+)', string).group(1)
'Atlantis-GPS-coordinates'
Depends on what speaks to you and your data.
What you have isn't all that bad. I'd write it as:
start = string.find('/bar/') + 5
end = string.find('/', start)
output = string[start:end]
as long as you know that /bar/WHAT-YOU-WANT/ is always going to be present. Otherwise, I would reach for the regular expression knife:
>>> import re
>>> PATTERN = re.compile('^.*/bar/([^/]*)/.*$')
>>> s = '/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/'
>>> match = PATTERN.match(s)
>>> match.group(1)
'Atlantis-GPS-coordinates'
import re
pattern = '(?<=/bar/).+?/'
string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
result = re.search(pattern, string)
print string[result.start():result.end() - 1]
# "Atlantis-GPS-coordinates"
That is a Python 2.x example. Here is what the pattern does:
1. (?<=/bar/) is a lookbehind: the rest of the pattern matches only where /bar/ comes immediately before it
2. '.+?/' matches as few characters as possible up to and including the next '/' char, which is why the print statement drops the last character
Hope that helps some.
If you need to run this kind of search many times, it is better to compile the pattern with re.compile for performance; if you only need it once, don't bother.
Using re (slower than other solutions):
>>> import re
>>> string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
>>> re.search(r'(?<=/bar/)[^/]+(?=/)', string).group()
'Atlantis-GPS-coordinates'
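Most of the one-liners above raise an exception (IndexError or AttributeError) when a marker is missing, and the find-based slicing silently returns the wrong substring. If that case matters, the partition variant can be wrapped in a small function that reports it explicitly. The name between and the choice to return None are assumptions made for this sketch.

```python
def between(text, left, right):
    # between is a hypothetical helper name.
    # Everything after the first occurrence of the left marker.
    _, found, rest = text.partition(left)
    if not found:
        return None
    # Everything before the next occurrence of the right marker.
    middle, found, _ = rest.partition(right)
    return middle if found else None

string = "/foo13546897/bar/Atlantis-GPS-coordinates/bar457822368/foo/"
print(between(string, "/bar/", "/"))
```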

separate string into substring python [duplicate]

This question already has answers here:
Split a string to even sized chunks
(9 answers)
Closed 8 years ago.
How can I separate a string "Blahblahblahblah" into "Blah" "blah" "blah" "blah" in Python? I've tried the following:
str = "Blahblahblahblah"
for letter[0:3] on str
How can I do it?
If you don't mind using the re library: in this example, the regex .{4} matches any four consecutive characters other than \n.
import re
str = "Blahblahblahblah"
print re.findall(".{4}", str)
output:
['Blah', 'blah', 'blah', 'blah']
Note: str is not a good name for a variable, because it shadows the built-in str type, which converts a given value into a string.
Try:
>>> SUBSTR_LEN = 4
>>> string = "bla1bla2bla3bla4"
>>> [string[n:n + SUBSTR_LEN] for n in range(0, len(string), SUBSTR_LEN)]
['bla1', 'bla2', 'bla3', 'bla4']
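The two answers behave differently when the length is not a multiple of the chunk size: the .{4} regex drops the leftover characters, while slicing keeps a shorter final chunk. A small comparison, using a made-up 14-character input:

```python
import re

text = "Blahblahblahbl"  # 14 characters, so the last chunk is incomplete

# Exactly four characters per match: the trailing "bl" is dropped.
strict = re.findall(r".{4}", text)

# One to four characters per match; DOTALL lets '.' match newlines too.
loose = re.findall(r".{1,4}", text, flags=re.DOTALL)

# Slicing keeps the short tail without any regex.
sliced = [text[i:i + 4] for i in range(0, len(text), 4)]

print(strict)
print(loose)
print(sliced)
```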

long hex string to integer in python [duplicate]

This question already has answers here:
Python Trailing L Problem
(5 answers)
Closed 9 years ago.
I receive from a module a string that is a representation of a long int
>>> str = hex(5L)
>>> str
'0x5L'
What I now want is to convert the string str back to a number (integer)
int(str,16) does not work because of the L.
Is there a way to do this without stripping the trailing L from the string? The string may also contain a hex value without the L.
Use str.rstrip; it works for both cases:
>>> int('0x5L'.rstrip('L'),16)
5
>>> int('0x5'.rstrip('L'),16)
5
Or generate the string this way:
>>> s = '{:#x}'.format(5L) # remove '#' if you don't want '0x'
>>> s
'0x5'
>>> int(s, 16)
5
You could even just use:
>>> str = hex(5L)
>>> long(str,16)
5L
>>> int(long(str,16))
5
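The snippets above are Python 2 only, since the 5L literal and long() do not exist in Python 3, where hex() never appends an L. For code that runs on Python 3 but may still receive strings produced by Python 2, a minimal sketch could look like this. The function name parse_hex is an assumption for illustration.

```python
def parse_hex(text):
    # parse_hex is a hypothetical helper name.
    # 'L' is not a hex digit, so stripping it from the right end cannot
    # remove part of the number; int() accepts the '0x' prefix in base 16.
    return int(text.rstrip("Ll"), 16)

print(parse_hex("0x5L"))
print(parse_hex("0x5"))
```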
