Convert a string containing integers into an integer - python

I am trying to convert the digits contained in a string like "15m" into an integer.
With the code below I can achieve what I want. But I am wondering if there is a better solution for this, or a function I'm not aware of which already implements this.
s = "15m"
s_result = ""
for char in s:
    try:
        int(char)  # raises ValueError for non-digit characters
        s_result = s_result + char
    except ValueError:
        pass
result = int(s_result)
print(result)
This code outputs the following result:
>>>
15
Maybe there is no such "better" solution but I would like to see other solutions, like using regex maybe.

I found a good solution using regex.
import re
result = int(re.sub(r'[^0-9]', '', s))
print(result)
Which results in:
>>>
15
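For comparison, here is a minimal sketch of the same digit filtering without a regex, written for Python 3. The helper name digits_to_int is made up for illustration, and it assumes the string contains at least one digit, since int("") raises ValueError.

```python
def digits_to_int(text):
    """Keep only the ASCII digits of text and parse them as one integer."""
    # Restrict to 0-9, mirroring the [^0-9] pattern used with re.sub.
    digits = "".join(ch for ch in text if ch in "0123456789")
    return int(digits)  # raises ValueError if text contains no digits


print(digits_to_int("15m"))
```

Like the re.sub version, this glues together every digit in the string, so "a1b2" becomes 12.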

You could also match one or more digits at the start of the string with ^\d+:
import re
regex = r"^\d+"
test_str = "15m"
match = re.search(regex, test_str)
if match:
    print(int(match.group()))
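The two regex approaches are not interchangeable, which may matter depending on the input. The sketch below (Python 3, with hypothetical helper names all_digits and leading_digits) shows how they differ once digits appear in more than one place.

```python
import re


def all_digits(text):
    # Remove every non-digit, then parse whatever digits remain.
    return int(re.sub(r"[^0-9]", "", text))


def leading_digits(text):
    # Only accept a run of digits at the very start of the string.
    match = re.search(r"^\d+", text)
    return int(match.group()) if match else None


for sample in ("15m", "15m30s"):
    print(sample, all_digits(sample), leading_digits(sample))
```

For "15m" both return 15, but for "15m30s" the first returns 1530 while the second returns 15.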

Related

Python regex to get returned value from a string

Perl has a simple, convenient way to assign the captured value to a variable:
if ($string =~ /<(\w+)>/) {
    $name = $1;
}
This is what I tried in Python, and it works, but is there an alternative way of doing this?
if re.match(r'\s*<\w+>.+', string):
    var = re.findall(r'>(\w+)<', string)
Hope this is what you are looking for:
import re
string = "id: 10"
match = re.search(r"id: (\d+)", string)
if match:
    id = match.group(1)
    print(id)
Whatever you need, you can probably find it in the Python re documentation.
You don't need to do the match followed by findall; findall will return an empty list when there's no match:
>>> string = 'sdafasdf asdfas '
>>> var = re.findall('>(\w+)<', string)
>>> var
[]
So, you can translate your Perl example like this:
try: name = re.findall('>(\w+)<', string)[0]
except IndexError: name = 'unknown'
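A closer analogue of the Perl idiom is probably re.search plus match.group(1), which avoids both the list indexing and the exception. This is only a sketch for Python 3; first_group is a hypothetical helper name and the sample strings are made up.

```python
import re


def first_group(pattern, text, default=None):
    """Return capture group 1 of the first match, or default if none."""
    match = re.search(pattern, text)
    return match.group(1) if match else default


name = first_group(r"<(\w+)>", "say <hello> there", default="unknown")
print(name)
```

When nothing matches, the helper returns the default instead of raising.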
I don't think your regex will match anything. The two patterns contradict each other.
This is how you would do a match in Python:
import re
string = "string"
matches = re.match(r'(\w+)', string)
print(matches.group())

Find specific string sections in python

I want to be able to grab sections of strings with a function. Here is an example:
def get_sec(s1, s2, first='{', last='}'):
    start = s2.index(first)
    end = -(len(s2) - s2.index(last)) + 1
    a = "".join(s2.split(first + last))
    b = s1[:start] + s1[end:]
    print(a)
    print(b)
    if a == b:
        return s1[start:end]
    else:
        print("The strings did not match up")
string = 'contentonemore'
finder = 'content{}more'
print(get_sec(string, finder))
#'one'
So that example works. My issue is that I want multiple sections, not just one, so my function needs to work for any number of sections. For example:
test_str = 'contwotentonemorethree'
test_find = 'con{}tent{}more{}'
print(get_sec(test_str, test_find))
#['one','two','three']
Any ideas on how I can make that function work for an arbitrary number of replacements?
You probably want to use the standard Python regex library:
import re
a = re.search('con(.*)tent(.*)more(.*)', 'contwotentonemorethree')
print(a.groups())
# ('two', 'one', 'three')
or
print(re.findall('con(.*)tent(.*)more(.*)', 'contwotentonemorethree'))
# [('two', 'one', 'three')]
Edit:
You can escape special characters in a string using
re.escape(str)
Example:
part1 = re.escape('con(')
part2 = re.escape(')tent')
print(re.findall(part1 + '(.*)' + part2, 'con(two)tent'))
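Putting re.escape together with the '{}' template from the question, one possible way to handle an arbitrary number of sections is to build the pattern from the template. This is a rough sketch for Python 3, not a drop-in replacement for get_sec: the name get_sections, the non-greedy (.*?) groups, and the use of re.fullmatch are all choices made here for illustration.

```python
import re


def get_sections(text, template, placeholder="{}"):
    """Return the pieces of text that fill each placeholder in template.

    Returns None when text does not fit the template.
    """
    # Escape the literal parts so characters like '(' or '.' match literally.
    literal_parts = [re.escape(part) for part in template.split(placeholder)]
    # Put one non-greedy capture group where each placeholder was.
    pattern = "(.*?)".join(literal_parts)
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    return list(match.groups()) if match else None


print(get_sections("contentonemore", "content{}more"))
print(get_sections("contwotentonemorethree", "con{}tent{}more{}"))
```

The sections come back in the order they appear in the text, so the second call gives two, one, three. Non-greedy groups make each section stop at the first occurrence of the next literal part, which may or may not be what you want for ambiguous inputs.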
It is not just "use regex": you are actually trying to implement regex yourself. The easiest way to get that behavior is, of course, to use the re library.
ummm use regex?
import re
re.findall("con(.*)tent(.*)more(.*)",my_string)
Looks like you want something with regular expressions.
Here's python's page about regular expressions: http://docs.python.org/2/library/re.html
For example, if you knew that the string would only be broken into the segments "con", "tent", and "more", you could have:
import re
regex = re.compile(r"(con).*(tent).*(more).*")
s = 'conxxxxtentxxxxxmore'
match = regex.match(s)
Then find the indices of the matches with:
index1 = s.index(match.group(1))
index2 = s.index(match.group(2))
index3 = s.index(match.group(3))
Or if you wanted to find the locations of the other characters (.*):
regex = re.compile(r"con(.*)tent(.*)more(.*)")
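If what you need is the positions rather than the text, match objects already record them, so there is no need to search the string again with s.index (which could also find an earlier, unrelated occurrence). A small sketch for Python 3, reusing the sample string above:

```python
import re

regex = re.compile(r"con(.*)tent(.*)more(.*)")
s = "conxxxxtentxxxxxmore"
match = regex.match(s)

# span(i) gives the (start, end) indices of capture group i in s.
spans = [match.span(i) for i in range(1, regex.groups + 1)]
print(spans)

# Slicing with a span gives back exactly what the group captured.
start, end = spans[0]
print(s[start:end])
```

The last group is empty here because nothing follows "more", so its start and end are both the length of the string.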

Python splitting string to find specific content

I am trying to split a string in Python to extract a particular part. I am able to get the part of the string before the symbol <, but how do I get the bit after it, e.g. the emailaddress part?
>>> s = 'texttexttextblahblah <emailaddress>'
>>> s = s[:s.find('<')]
>>> print(s)
The code above outputs texttexttextblahblah
s = s[s.find('<')+1:-1]
or
s = s.split('<')[1][:-1]
cha0site's and ig0774's answers are pretty straightforward for this case, but it would probably help you to learn regular expressions for times when it's not so simple.
import re
fullString = 'texttexttextblahblah <emailaddress>'
m = re.match(r'(\S+) <(\S+)>', fullString)
part1 = m.group(1)
part2 = m.group(2)
Perhaps being a bit more explicit with a regex isn't a bad idea in this case:
import re
match = re.search("""
    (?<=<)  # Make sure the match starts after a <
    [^<>]*  # Match any number of characters except angle brackets""",
    subject, re.VERBOSE)
if match:
    result = match.group()
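If a regex feels like too much for this, str.partition can split out both parts without index arithmetic. The sketch below is for Python 3; split_name_and_address is a hypothetical name, and it assumes at most one <...> section in the input.

```python
def split_name_and_address(s):
    """Split 'some text <address>' into (text, address)."""
    before, sep, after = s.partition('<')
    if not sep:
        # No '<' at all: treat the whole string as text.
        return s.strip(), None
    # Keep everything up to the closing '>' (or the rest if it is missing).
    address = after.partition('>')[0]
    return before.strip(), address


print(split_name_and_address('texttexttextblahblah <emailaddress>'))
```

Unlike the slice with -1, this does not depend on '>' being the last character of the string.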

Simple python regex, match after colon

I have a simple regex question that's driving me crazy.
I have a variable x = "field1: XXXX field2: YYYY".
I want to retrieve YYYY (note that this is an example value).
My approach was as follows:
values = re.match('field2:\s(.*)', x)
print(values.groups())
It's not matching anything. Can I get some help with this? Thanks!
Your regex is good:
field2:\s(.*)
Try this code:
match = re.search(r"field2:\s(.*)", x)
if match:
    result = match.group(1)
else:
    result = ""
re.match() only matches at the start of the string. You want to use re.search() instead.
Also, you should use a raw string:
>>> values = re.search(r'field2:\s(.*)', x)
>>> print(values.groups())
('YYYY',)
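To see the difference between the two functions side by side, here is a short Python 3 sketch using the same string and pattern; the variable names anchored and scanning are just for illustration.

```python
import re

x = "field1: XXXX field2: YYYY"
pattern = r"field2:\s(.*)"

anchored = re.match(pattern, x)   # only tries to match at position 0
scanning = re.search(pattern, x)  # tries every position in the string

print(anchored)           # None, because x starts with "field1"
print(scanning.group(1))  # YYYY
```

Checking the result for None before calling group() or groups() avoids an AttributeError when nothing matches.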

Regex: Replace one pattern with another

I am trying to replace one regex pattern with another regex pattern.
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
pattern = re.compile('\d+x\d+') # for st_srt
re.sub(pattern, 'S\1E\2',st_srt)
I know the use of S\1E\2 is wrong here. I am using \1 and \2 to capture the values 01 and 02 and reuse them in S\1E\2.
My desired output is:
st_srt = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt'
So, what is the correct way to achieve this?
You need to capture what you're trying to preserve. Try this:
pattern = re.compile(r'(\d+)x(\d+)') # for st_srt
st_srt = re.sub(pattern, r'S\1E\2', st_srt)
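Two things make this work: the parentheses create the groups, and the r prefix keeps \1 and \2 as backslash sequences for the regex engine. A variation with named groups can be easier to read later; the sketch below is for Python 3, and the group names season and episode are my own choice.

```python
import re

st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'

# Named groups document what each number means.
pattern = re.compile(r'(?P<season>\d+)x(?P<episode>\d+)')

# \g<name> refers to a named group in the replacement string.
renamed = pattern.sub(r'S\g<season>E\g<episode>', st_srt)
print(renamed)

# Without the r prefix, '\1' is the single control character chr(1),
# so the regex engine never sees a backreference.
print('\1' == chr(1))
```

That last line is why the original 'S\1E\2' replacement inserted control characters instead of the captured digits.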
Well, it looks like you already accepted an answer, but I think this is what you said you're trying to do, which is get the replace string from 'st_mkv', then use it in 'st_srt':
import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
replace_pattern = re.compile(r'Awake\.([^.]+)\.')
m = replace_pattern.match(st_mkv)
replace_string = m.group(1)
new_srt = re.sub(r'^Awake\.[^.]+\.', 'Awake.{0}.'.format(replace_string), st_srt)
print(new_srt)
Try using this regex:
([\w+\.]+){5}\-\w+
Copy the strings into here: http://www.gskinner.com/RegExr/
and paste the regex at the top.
It captures the names of each string, leaving out the extension.
You can then go ahead and append the extension you want, to the string you want.
EDIT:
Here's what I used to do what you're after:
import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'  # don't actually need this one
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
replace_pattern = re.compile(r'([\w+\.]+){5}\-\w+')
m = replace_pattern.match(st_mkv)
new_string = m.group(0)
new_string += '.srt'
>>> new_string
'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt'
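If the goal is only to reuse the .mkv name with a different extension, the standard library can do that without a regex. This is a sketch of that alternative for Python 3, not the approach used above:

```python
import os.path

st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'

# splitext splits at the last dot, so earlier dots in the name are kept.
stem, extension = os.path.splitext(st_mkv)
new_string = stem + '.srt'
print(new_string)
```

This gives the same new_string as the regex version for this filename.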
import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
pattern = re.compile(r'(\d+)x(\d+)')
st_srt_new = re.sub(pattern, r'S\1E\2', st_srt)
print(st_srt_new)
