Grabbing a part of a string using regex in Python 3.x

What I am trying to do is take an input string named a, then use a regex that checks a for 'i am (something)' (note that something could be a chain of words) and prints How long have you been (something)?
This is my code so far:
if re.findall(r"i am", a):
    print('How long have you been {}'.format(re.findall(r"i am", a)))
But this prints the list ['i am'], not the (something). How do I get it to return the (something)?
Thanks,
A n00b at Python

Do you mean something like this?
>>> import re
>>> a = "I am a programmer"
>>> reg = re.compile(r'I am (.*?)$')
>>> print('How long have you been {}'.format(*reg.findall(a)))
How long have you been a programmer
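r'I am (.*?)$' matches I am and then captures everything else up to the end of the string. Because the pattern has one group, findall returns a list of the captured text, and the * unpacks that list into format(). Note that the pattern is case-sensitive, so it expects a capital I.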
To match one word after, you can do:
>>> a = "I am an apple"
>>> reg = re.compile(r'I am (\w+).*?$')
>>> print('How long have you been {}'.format(*reg.findall(a)))
How long have you been an
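For the lowercase 'i am' input from the question, a minimal sketch along the same lines could use re.search with re.IGNORECASE and read the captured group directly. The function name how_long is made up for illustration, and the sketch assumes the (something) runs to the end of the line:

```python
import re

def how_long(text):
    # Look for "i am <something>" anywhere in the text, ignoring case.
    match = re.search(r"\bi am (.+)", text, flags=re.IGNORECASE)
    if match is None:
        # No "i am ..." phrase found, so there is nothing to ask about.
        return None
    # group(1) is the text captured by the parentheses.
    return "How long have you been {}?".format(match.group(1).strip())

print(how_long("i am a programmer"))
```

Returning None when nothing matches avoids the IndexError that format() raises when findall comes back empty.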

Maybe just a simple solution that avoids the weight and cost of a regex:
>>> a = "i am foxmask"
>>> print(a[5:])
foxmask
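The fixed slice only works when the input really starts with 'i am '. A small sketch that checks this first and derives the offset from the prefix length (strip_prefix is a hypothetical helper name, not something from the answer):

```python
def strip_prefix(text, prefix="i am "):
    # Only slice when the text really starts with the prefix (case-insensitive).
    if text.lower().startswith(prefix):
        # len(prefix) replaces the hard-coded 5.
        return text[len(prefix):]
    return None

print(strip_prefix("i am foxmask"))
```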

Related

Python: How to slice string using string?

Assuming that the user entered:
"i like eating big apples"
Want to remove "eating" and "apples" together with whatever is in between these two words. Output in this case
"i like"
In another case, if the user entered:
"i like eating apples very much"
Expected output:
"i like very much"
And I want to slice the input starting from "eating" to "apples"
(However, the index cannot be used as you are unsure how long the user is going to type, but it is guaranteed that "eating" and "apples" will be entered)
So, is there any way to slice without using the index, and instead indicate the start and end of the slice with another string?
Slicing a string in python is like this:
mystr = "i like eating big apples"
print(mystr[10:20])
This takes the characters from index 10 up to, but not including, index 20. So it prints: ing big ap.
Now the question is how to find out at which index 'eating' starts and 'apple' ends.
Use the .index method to find the beginning of something in a string.
mystr.index('eating') returns 7, so if you print mystr[7:] (which means from the 7th index till the last of the string) you'll have 'eating big apples'.
The second part is a little tricky. If you use mystr.index('apple'), you'll get the beginning of apple, (18), so mystr[7:18] will give you 'eating big '.
In fact you should go some characters further to include the apple word too, which is 5 chars exactly, and this number will be returned by len('apple'). So the final result is:
start = mystr.index('eating')
stop = mystr.index('apple') + len('apple')
print(mystr[start:stop])
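The same idea can be wrapped in a small helper. This is only a sketch: slice_between is a made-up name, and it assumes the end word appears somewhere after the start word, so the second search begins where the first word ends:

```python
def slice_between(text, start_word, end_word):
    # str.index raises ValueError if either word is missing.
    start = text.index(start_word)
    # Search for the end word only after the start word.
    stop = text.index(end_word, start + len(start_word)) + len(end_word)
    return text[start:stop]

print(slice_between("i like eating big apples", "eating", "apples"))
```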
You can do the following:
s = "i like eating big apples"
start_ = s.find("eating")
end_ = s.find("apples") + len("apples")
s[start_:end_] # 'eating big apples'
Use find() to get the starting index of each word in the string, then adjust start_/end_ to your needs.
To remove the substring:
s[:start_] + s[end_:] # 'i like ' (note the trailing space)
And for:
s = "i like eating apples very much"
end_ = s.find("apples") + len("apples")
start_ = s.find("eating")
s[:start_] + s[end_:] # 'i like  very much' (two spaces remain where the words were cut)
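To get exactly the outputs the question asks for, the leftover spaces need tidying. A rough sketch, assuming it is acceptable to collapse every run of whitespace into a single space (remove_between is a hypothetical name):

```python
def remove_between(text, start_word, end_word):
    start = text.find(start_word)
    # Only look for the end word after the start word.
    end = text.find(end_word, start + len(start_word)) if start != -1 else -1
    if start == -1 or end == -1:
        # find() returns -1 when a word is missing; leave the text alone.
        return text
    remainder = text[:start] + text[end + len(end_word):]
    # split() + join() collapses repeated spaces and trims the ends.
    return " ".join(remainder.split())

print(remove_between("i like eating big apples", "eating", "apples"))
print(remove_between("i like eating apples very much", "eating", "apples"))
```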
maybe you can use this:
txt = "Hello, welcome to my world."
x = txt.find("welcome")
print(x)
Which outputs: 7
To find "eating" and "apple"
S = "i like eating big apples"
Index = S.find("eating")
output = S[Index:-1] # 'eating big apple' (the slice stops before the last character)
Use the find() or rfind() method to get the index of a substring's occurrence, then put the results into a slice:
s = "i like eating big apples"
substr = s[s.rfind("eating"):s.rfind("apples")] # 'eating big ' (stops where "apples" begins)
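One thing to watch with find() and rfind(): they return -1 when the substring is absent, and -1 is a valid slice index, so a missing word silently gives a wrong slice. A short sketch of the pitfall and a guarded version (safe_slice_from is an illustrative name):

```python
s = "i like eating big apples"

# find() returns -1 for a missing substring, so this slices from the last character.
wrong = s[s.find("pears"):]   # 's'

def safe_slice_from(text, word):
    """Return text from `word` to the end, or None if `word` is absent."""
    position = text.find(word)
    if position == -1:
        return None
    return text[position:]

print(safe_slice_from(s, "eating"))
print(safe_slice_from(s, "pears"))
```

str.index() behaves like find() but raises ValueError instead of returning -1, which may be preferable when a missing word should be treated as an error.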
You can use str.partition to split string into three parts.
In [112]: s = "i like eating apples very much"
In [113]: h, _, t = s.partition('eating')
In [114]: _, _, t = t.partition('apples')
In [115]: h + t
Out[115]: 'i like  very much'
In [116]: s = "i like eating big apples"
In [117]: h, _, t = s.partition('eating')
In [118]: _, _, t = t.partition('apples')
In [119]: h + t
Out[119]: 'i like '
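partition returns the separator as its middle element, and an empty string there means the separator was not found. A sketch that uses this to leave the text untouched when either word is missing (cut_out is a hypothetical name):

```python
def cut_out(text, start_word, end_word):
    head, found_start, tail = text.partition(start_word)
    if not found_start:
        # Start word absent: partition put the whole text in head.
        return text
    _, found_end, rest = tail.partition(end_word)
    if not found_end:
        # End word absent after the start word: change nothing.
        return text
    return head + rest

print(repr(cut_out("i like eating big apples", "eating", "apples")))
```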

How to return a word in a string if it starts with a certain character? (Python)

I'm building a reddit bot for practice that converts US dollars into other commonly used currencies, and I've managed to get the conversion part working fine, but now I'm a bit stuck trying to pass the characters that directly follow a dollar sign to the converter.
This is sort of how I want it to work:
def run_bot():
    subreddit = r.get_subreddit("randomsubreddit")
    comments = subreddit.get_comments(limit=25)
    for comment in comments:
        comment_text = comment.body
        # If comment contains a string that starts with '$'
        # Pass the rest of the 'word' to a variable
So for example, if it were going over a comment like this:
"I bought a boat for $5000 and it's awesome"
It would assign '5000' to a variable that I would then put through my converter
What would be the best way to do this?
(Hopefully that's enough information to go off, but if people are confused I'll add more)
You could use re.findall function.
>>> import re
>>> re.findall(r'\$(\d+)', "I bought a boat for $5000 and it's awesome")
['5000']
>>> re.findall(r'\$(\d+(?:\.\d+)?)', "I bought two boats for $5000 $5000.45")
['5000', '5000.45']
OR
>>> s = "I bought a boat for $5000 and it's awesome"
>>> [i[1:] for i in s.split() if i.startswith('$')]
['5000']
If you are dealing with prices as floating-point numbers, you can use this (note the raw strings, so that \$ and \d reach the regex engine unchanged):
import re
s = "I bought a boat for $5000 and it's awesome"
matches = re.findall(r"\$(\d*\.\d+|\d+)", s)
print(matches) # ['5000']
s2 = "I bought a boat for $5000.52 and it's awesome"
matches = re.findall(r"\$(\d*\.\d+|\d+)", s2)
print(matches) # ['5000.52']
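findall returns strings, so the converter probably needs them turned into numbers first. A minimal sketch, assuming amounts look like $5000 or $5000.45 and using Decimal to avoid float rounding on money; DOLLAR_AMOUNT and extract_amounts are made-up names, not part of the bot in the question:

```python
import re
from decimal import Decimal

# A dollar sign followed by digits, with an optional decimal part.
DOLLAR_AMOUNT = re.compile(r"\$(\d+(?:\.\d+)?)")

def extract_amounts(comment_text):
    # findall returns only the captured group, i.e. the number without '$'.
    return [Decimal(amount) for amount in DOLLAR_AMOUNT.findall(comment_text)]

print(extract_amounts("I bought two boats for $5000 $5000.45"))
```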

How to separate the first and last part of a string in a list

I have the following:
a = ['hello there good friend']
I need the following:
a = ['hello', 'there good', 'friend']
Basically, I need the first and last words to become separate list items, while everything in between stays a single string. I've tried using a for loop in my function, but it turned into something really messy, which I think is counterproductive.
You should really just be splitting this using the split() function and then slicing your results. There might be slightly cleaner ways, but the easiest way I can think of is the following:
test = a[0].split()
result = [test[0], " ".join(test[1:-1]), test[-1]]
where -1 represents the last entry of the list.
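The snippet above assumes the string has at least three words. A sketch of the same approach as a function that also copes with shorter input (split_ends is a hypothetical name, and returning the words unchanged for short input is just one possible choice):

```python
def split_ends(s):
    words = s.split()
    if len(words) < 3:
        # Nothing sits between the first and last word, so return them as-is.
        return words
    # First word, everything in between joined back together, last word.
    return [words[0], " ".join(words[1:-1]), words[-1]]

a = ['hello there good friend']
print(split_ends(a[0]))
```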
You could alternately do it in a single line (similar to inspectorG4dget's solution), but it means you're splitting your string three times instead of once.
[a[0].split()[0], " ".join(a[0].split()[1:-1]), a[0].split()[-1]]
Alternately, if you think that the slicing is a little over the top (which I do), you could use a regular expression instead, which is arguably a much better solution than anything above:
>>> import re
>>> a = 'hello there good friend'
>>> re.split(' (.*) ', a)
['hello', 'there good', 'friend']
As Ord mentions, there's some ambiguity in the question, but for the sample case this should work just fine.
As far as performance goes, gnibbler was right: the regex is in fact slower by about a factor of two. Both operations are O(n), so if performance is your goal you're better off choosing his, but I still think the regex solution is (in a rare win for regex) more readable than the alternatives. Here are the direct timing results:
# gnibbler's tuple solution
>>> timeit.timeit("s='hello there good friend';i1=s.find(' ');i2=s.rfind(' ');s[:i1], s[i1+1:i2], s[i2+1:]", number=100000)
0.0976870059967041
# gnibbler's list solution
>>> timeit.timeit("s='hello there good friend';i1=s.find(' ');i2=s.rfind(' ');[s[:i1], s[i1+1:i2], s[i2+1:]]", number=100000)
0.10682892799377441
# my first solution
>>> timeit.timeit("a='hello there good friend'.split();[a[0], ' '.join(a[1:-1]), a[-1]]", number=100000)
0.12330794334411621
# regex solution
>>> timeit.timeit("re.split(' (.*) ', 'hello there good friend')", "import re", number=100000)
0.27667903900146484
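To repeat the comparison on your own machine, something like the following sketch should do. It passes callables to timeit.timeit instead of statement strings; the function names are made up, and the absolute numbers will differ from the ones above:

```python
import re
import timeit

text = "hello there good friend"

def with_find(s=text):
    # Locate the first and last space, then slice around them.
    i1 = s.find(" ")
    i2 = s.rfind(" ")
    return [s[:i1], s[i1 + 1:i2], s[i2 + 1:]]

def with_split(s=text):
    words = s.split()
    return [words[0], " ".join(words[1:-1]), words[-1]]

def with_regex(s=text):
    return re.split(" (.*) ", s)

for func in (with_find, with_split, with_regex):
    seconds = timeit.timeit(func, number=100000)
    print(f"{func.__name__}: {seconds:.4f}s")
```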
>>> [a[0].split(None, 1)[0]] + [a[0].split(None, 1)[-1].rsplit(None, 1)[0]] + [a[0].rsplit(None, 1)[-1]]
['hello', 'there good', 'friend']
Minimising the creation of temporary strings.
>>> a = ['hello there good friend']
>>> s = a[0]
>>> i1 = s.find(' ')
>>> i2 = s.rfind(' ')
>>> s[:i1], s[i1+1:i2], s[i2+1:]
('hello', 'there good', 'friend') # as a tuple
>>> [s[:i1], s[i1+1:i2], s[i2+1:]]
['hello', 'there good', 'friend'] # as a list

Shorten the url to domain name in a string in python 2.7

Suppose I have a string:
This is a good doll http://www.google.com/a/bs/jdd/etc/etc/a.py
I would like to get something like this:
This is a good doll www.google.com
I tried print re.sub(r'(http://|https://)', "", a) in Python, but I was only able to remove the http:// part. Any ideas on how I could achieve this in Python 2.7?
>>> import re
>>> s = 'This is a good doll http://www.google.com/a/bs/jdd/etc/etc/a.py'
>>> re.sub(r'(?:https?://)([^/]+)(?:\S+)', r"\1", s)
'This is a good doll www.google.com'
If you want to use regexes, then you could do something like this:
>>> import re
>>> the_string = "This is a good doll http://www.google.com/a/bs/jdd/etc/etc/a.py"
>>> def replacement(match):
...     return match.group(2)
...
>>> re.sub(r"(http://|https://)(.*?)/\S+", replacement, the_string)
'This is a good doll www.google.com'
>>> string = "This is a good doll http://www.google.com/a/bs/jdd/etc/etc/a.py"
>>> print string.replace('http://', '').split('/')[0]
This is a good doll www.google.com
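Another option is to let the standard library pull out the host instead of doing it in the pattern. The sketch below is written for Python 3, where urlparse lives in urllib.parse (in Python 2.7 it is in the urlparse module); shorten_urls is a hypothetical name, and a URL is assumed to be any run of non-space characters starting with http:// or https://:

```python
import re
from urllib.parse import urlparse

# Anything from the scheme up to the next whitespace is treated as one URL.
URL = re.compile(r"https?://\S+")

def shorten_urls(text):
    # Replace each URL with its network location (the domain part).
    return URL.sub(lambda match: urlparse(match.group(0)).netloc, text)

print(shorten_urls("This is a good doll http://www.google.com/a/bs/jdd/etc/etc/a.py"))
```

This also handles several URLs in one string and URLs without a path.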

Can you have variables within triple quotes? If so, how?

This is probably a very simple question for some, but it has me stumped. Can you use variables within python's triple-quotes?
In the following example, how do you use variables in the text?
wash_clothes = 'tuesdays'
clean_dishes = 'never'
mystring =""" I like to wash clothes on %wash_clothes
I like to clean dishes %clean_dishes
"""
print(mystring)
I would like it to result in:
I like to wash clothes on tuesdays
I like to clean dishes never
If not what is the best way to handle large chunks of text where you need a couple variables, and there is a ton of text and special characters?
The preferred way of doing this is using str.format() rather than the method using %:
This method of string formatting is the new standard in Python 3.0, and should be preferred to the % formatting described in String Formatting Operations in new code.
Example:
wash_clothes = 'tuesdays'
clean_dishes = 'never'
mystring =""" I like to wash clothes on {0}
I like to clean dishes {1}
"""
print(mystring.format(wash_clothes, clean_dishes))
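With a large block of text, named fields tend to be easier to follow than positional ones. A small sketch of the same example using keyword arguments (the field names wash and dishes are arbitrary):

```python
wash_clothes = 'tuesdays'
clean_dishes = 'never'

# Named replacement fields instead of {0} and {1}.
template = """I like to wash clothes on {wash}
I like to clean dishes {dishes}
"""

mystring = template.format(wash=wash_clothes, dishes=clean_dishes)
print(mystring)
```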
Yes! Starting from Python 3.6 you can use f-strings for this. They're interpolated in place, so mystring has the desired value right after the mystring = ... line:
wash_clothes = 'tuesdays'
clean_dishes = 'never'
mystring = f"""I like to wash clothes on {wash_clothes}
I like to clean dishes {clean_dishes}
"""
print(mystring)
Should you need to add a literal { or } in the string, you would just double it:
if use_squiggly:
    kind = 'squiggly'
else:
    kind = 'curly'
print(f"""The {kind} brackets are:
- '{{', or the left {kind} bracket
- '}}', or the right {kind} bracket
""")
would print, depending on the value of use_squiggly, either
The squiggly brackets are:
- '{', or the left squiggly bracket
- '}', or the right squiggly bracket
or
The curly brackets are:
- '{', or the left curly bracket
- '}', or the right curly bracket
One way to do it in Python 2:
>>> mystring =""" I like to wash clothes on %s
... I like to clean dishes %s
... """
>>> wash_clothes = 'tuesdays'
>>> clean_dishes = 'never'
>>>
>>> print mystring % (wash_clothes, clean_dishes)
I like to wash clothes on tuesdays
I like to clean dishes never
Also look at string formatting
http://docs.python.org/library/string.html#string-formatting
Yes. I believe this will work.
do_stuff = "Tuesday"
mystring = """I like to do stuff on %(tue)s""" % {'tue': do_stuff}
EDIT: forgot an 's' in the format specifier.
I think the simplest way is str.format() as others have said.
However, I thought I'd mention that Python has had a string.Template class since Python 2.4.
Here's an example from the docs.
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
One of the reasons I like this is the use of a mapping instead of positional arguments.
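Applied to the triple-quoted text from the question, a sketch might look like this. The extra "It costs" line is only there to show that $$ produces a literal dollar sign, and safe_substitute leaves unknown placeholders in place instead of raising KeyError:

```python
from string import Template

template = Template("""I like to wash clothes on $wash_clothes
I like to clean dishes $clean_dishes
It costs $$5 either way
""")

values = {'wash_clothes': 'tuesdays', 'clean_dishes': 'never'}

# substitute() needs every placeholder to be present in the mapping.
mystring = template.substitute(values)
print(mystring)

# safe_substitute() fills in what it can and keeps the rest untouched.
partial = template.safe_substitute(wash_clothes='tuesdays')
print(partial)
```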
Also note that you don't need the intermediate variable:
name = "Alain"
print """
Hello %s
""" % (name)
You can pass multiple arguments in a simple way:
wash_clothes = 'tuesdays'
clean_dishes = 'never'
a=""" I like to wash clothes on %s I like to clean dishes %s"""%(wash_clothes,clean_dishes)
print(a)
