Need Python regex to extract last two words of URL [closed]


Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 2 years ago.
Please recommend a regular expression to extract the last two words from a URL, for example:
INPUT --> OUTPUT
'www.abcd.google.com' --> 'google.com'
'www.xyz.stackoverflow.com' --> 'stackoverflow.com'

Use this regex with the negative lookahead feature:
import re
for url in ['www.abcd.google.com', 'www.xyz.stackoverflow.com']:
    # match the last dot (one not followed by another dot) and the words around it
    print(re.search(r'\w*\.(?!\w*\.)\w*', url)[0])
Output:
google.com
stackoverflow.com
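To see what the negative lookahead is doing, it can help to compare it with a pattern anchored to the end of the string. The following is a minimal sketch, not part of the original answer: the helper name `last_two_labels`, the `ANCHORED` pattern, and the extra test hosts are assumptions made for illustration.

```python
import re

# The pattern from the answer: a dot that is NOT followed by
# "more word characters and another dot", plus the words on either side.
LOOKAHEAD = re.compile(r'\w*\.(?!\w*\.)\w*')

# Illustrative alternative: two dot-free runs at the very end of the string.
ANCHORED = re.compile(r'[^.]+\.[^.]+$')


def last_two_labels(host, pattern=LOOKAHEAD):
    """Return the matched text, or None when the host contains no dot."""
    match = pattern.search(host)
    return match.group(0) if match else None


for host in ['www.abcd.google.com', 'www.my-site.com', 'localhost']:
    print(host, '->', last_two_labels(host), '|', last_two_labels(host, ANCHORED))
```

Because `\w` does not match a hyphen, the lookahead pattern returns 'www.my' for 'www.my-site.com', while the anchored pattern returns 'my-site.com'. Which behavior is acceptable depends on the hostnames in your data.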

Use split with . as the delimiter:
url_strings = 'www.xyz.stackoverflow.com'
s = '.'.join(url_strings.split('.')[-2:])
print(s)
# stackoverflow.com
If input validation is required:
url_strings = 'www.xyz.stackoverflow.com'
def return_last_words(url_string, last_words_count=2):
    parts = url_string.split('.')
    # only trim when there are more parts than requested
    if last_words_count < len(parts):
        return '.'.join(parts[-last_words_count:])
    return url_string
print(return_last_words(url_strings))
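The same idea can be written with `str.rsplit`, which splits from the right and stops early, so short inputs need no separate length check. This is a sketch under assumptions: the function name `last_labels` and the sample hosts are hypothetical.

```python
def last_labels(host, count=2):
    """Keep at most `count` trailing dot-separated labels of `host`."""
    # rsplit(sep, maxsplit) splits from the right, so only the last
    # `count` dots are considered; slicing then drops the leading remainder.
    return '.'.join(host.rsplit('.', count)[-count:])


print(last_labels('www.xyz.stackoverflow.com'))
print(last_labels('localhost'))
print(last_labels('www.example.co.uk'))
```

Note that the last call yields 'co.uk': the last two labels are not always the registered domain, so this approach only fits inputs like those in the question.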

Related

Python match pattern to rename file [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
I have a couple of files named like this: llm_rc_v3212.xml, llm_ds_v3232.xml.
The names can be anything; the common part is a suffix such as _v3212. I want to match this number and replace it (ideally renaming the file).
How can I match this pattern with a regex? I am trying to use re.sub but have not figured it out yet.
Any help would be appreciated.

Here is a working example. Depending on what your other filenames look like, the regex might need to be changed.
import re
# an underscore, a "v", then one or more digits
FILENAME_VERSION_REGEX = re.compile(r'_v\d+')
def rename(filename, replacement):
    full_replacement = '_v{}'.format(replacement)
    new_filename = FILENAME_VERSION_REGEX.sub(full_replacement, filename)
    return new_filename
Tested with the filenames you gave:
>>> rename('llm_rc_v3212.xml', 1)
'llm_rc_v1.xml'
>>> rename('llm_ds_v3232.xml', 2)
'llm_ds_v2.xml'
>>> rename('llm_v232_uc.xml', 3)
'llm_v3_uc.xml'
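The function above only computes the new name. If the file should also be renamed on disk, one possible approach is `pathlib`. This is a hedged sketch: `rename_version` and `VERSION_RE` are hypothetical names, and the demo runs inside a temporary directory so it does not touch real files.

```python
import re
import tempfile
from pathlib import Path

# Same idea as the answer: "_v" followed by one or more digits.
VERSION_RE = re.compile(r'_v\d+')


def rename_version(path, new_version):
    """Rename `path` on disk, replacing its first _v<digits> part.

    Returns the new Path.
    """
    new_name = VERSION_RE.sub('_v{}'.format(new_version), path.name, count=1)
    target = path.with_name(new_name)
    path.rename(target)
    return target


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / 'llm_rc_v3212.xml'
    original.touch()
    renamed = rename_version(original, 1)
    renamed_exists = renamed.exists()
    print(renamed.name, renamed_exists)
```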

Web Scraping - How to get a specific part of a weblink [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
I have the following link:
https://webcache.googleusercontent.com/search?q=cache:jAc7OJyyQboJ:https://cooking.nytimes.com/learn-to-cook+&cd=5&hl=en&ct=clnk
I have multiple links in a dataset, and each link follows the same pattern. I want to extract a specific part of each link: the text starting from the second http up to, but not including, the first + sign.
I don't know how to do this using regex. I am working in Python. Kindly help me out.

If each link has the same pattern, you do not need regex. You can use str.find() and string slicing:
link = "https://webcache.googleusercontent.com/search?q=cache:jAc7OJyyQboJ:https://cooking.nytimes.com/learn-to-cook+&cd=5&hl=en&ct=clnk"
# This finds the second occurrence of "https://" and returns the position
second_https = link.find("https://", link.find("https://")+1)
# Index of the end of the link
end_of_link = link.find("+")
new_link = link[second_https:end_of_link]
print(new_link)
This prints https://cooking.nytimes.com/learn-to-cook and works as long as the link follows the pattern described (the target starts at the second https:// in the link and ends at the first + sign).

I'd go with urlparse (Python 2) or urllib.parse (Python 3) and a little bit of regex:
import re
from urllib.parse import urlparse  # on Python 2: from urlparse import urlparse
url_example = "https://webcache.googleusercontent.com/search?q=cache:jAc7OJyyQboJ:https://cooking.nytimes.com/learn-to-cook+&cd=5&hl=en&ct=clnk"
parsed = urlparse(url_example)
# take everything from the first http(s) in the query string, then cut at the first +
result = re.findall(r'https?.*', parsed.query)[0].split('+')[0]
print(result)
Output:
https://cooking.nytimes.com/learn-to-cook
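A variant without any regex is to let `urllib.parse.parse_qs` decode the query string and then split the `q` value. This is a sketch under an assumption inferred from the single example link: that `q` always has the form `cache:<id>:<url>`. The function name `cached_target` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def cached_target(link):
    """Extract the cached page URL from a webcache-style link."""
    query = parse_qs(urlparse(link).query)
    cache_value = query['q'][0]
    # Assumed layout: "cache:<id>:<url>". Split on the first two colons only,
    # so the colon inside "https://" is left alone.
    target = cache_value.split(':', 2)[2]
    # parse_qs decodes the trailing "+" as a space, so strip it.
    return target.strip()


link = ("https://webcache.googleusercontent.com/search?q=cache:jAc7OJyyQboJ:"
        "https://cooking.nytimes.com/learn-to-cook+&cd=5&hl=en&ct=clnk")
print(cached_target(link))
```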

Newbie need Help python regex [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
I have content like this:
aid: "1168577519", cmt_id = 1168594403;
Now I want to get all the number sequences:
1168577519
1168594403
using a regex.
I have never dealt with regex before, but this time I need it for a parsing job.
So far I can only get the sequences after "aid" and "cmt_id" separately. I don't know how to merge them into one regex.
My current progress:
pattern = re.compile(r'(?<=aid: ").*?(?=",)')
print(pattern.findall(s))
and
pattern = re.compile(r'(?<=cmt_id = ).*?(?=;)')
print(pattern.findall(s))

There are many ways to design a suitable regular expression; the right one depends on the range of inputs you are likely to encounter.
The following solves your exact question but could fail on differently styled input. You need to provide more details, but this would be a start.
import re
s = 'aid: "1168577519", cmt_id = 1168594403;'
re_content = re.search(r'aid: "([0-9]*?)",\W*cmt_id = ([0-9]*?);', s)
print(re_content.groups())
This gives the following output:
('1168577519', '1168594403')
This example assumes that there might be other numbers in your input, and you are trying to extract just the aid and cmt_id values.

The simplest solution is to use re.findall.
Example:
>>> import re
>>> string = 'aid: "1168577519", cmt_id = 1168594403;'
>>> re.findall(r'\d+', string)
['1168577519', '1168594403']
\d+ matches one or more digits.
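If the input may contain other numbers that should be ignored, the two lookbehind patterns from the question can be merged with alternation (`|`). A minimal sketch; the extra `other = 42;` field in the sample string is a made-up addition to show that unrelated numbers are skipped.

```python
import re

# Sample input; "other = 42;" is hypothetical and should NOT be matched.
s = 'aid: "1168577519", cmt_id = 1168594403; other = 42;'

# Each alternative has its own fixed-width lookbehind, followed by digits.
pattern = re.compile(r'(?<=aid: ")\d+|(?<=cmt_id = )\d+')

numbers = pattern.findall(s)
print(numbers)
```

By contrast, `re.findall(r'\d+', s)` on the same string would also return '42'.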

Regex not working in python script [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
For some reason, when I run my regex to get the number I need, it returns None.
But when I run it here, http://regexr.com/38n3o, it works.
The regex is meant to get the last number of the IP so it can be removed.
lanip=74.125.224.72
notorm=re.search("/([1-9])\w+$/g", lanip)

That is not how you define a regular expression in Python. The correct way would be:
import re
lanip = "74.125.224.72"
notorm = re.search(r"([1-9])\w+$", lanip)
print(notorm)
Output (from Python 2; the exact repr of the match object varies by Python version):
<_sre.SRE_Match object at 0x10131df30>
You were using JavaScript regex syntax (the /.../g delimiters and flag). To read more on the correct Python syntax, read the documentation.
If you want to match the last number of an IP, use:
import re
lanip = "74.125.224.72"
# four dot-separated octets, each in the range 0-255
octet = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
notorm = re.search(r"\.".join([octet] * 4), lanip)
print(notorm.group(4))
Output:
72
Regex used from http://www.regular-expressions.info/examples.html
Your pattern does work in this scenario once the JavaScript delimiters are removed, but it would match a lot of false positives.

What is lanip's type? As written, that line can't run.
It needs to be a string, i.e.
lanip = "74.125.224.72"
Also, your regex syntax looks strange; make sure you've read the documentation on Python's re syntax.
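Since the stated goal is to remove the last number of the IP, a regex is not strictly required. The sketch below validates the address with the standard-library `ipaddress` module and splits off the last octet with `str.rsplit`; the function name `split_last_octet` is a hypothetical choice.

```python
import ipaddress


def split_last_octet(ip):
    """Return (prefix, last_octet) for a dotted IPv4 string.

    Raises ValueError if `ip` is not a valid IPv4 address.
    """
    ipaddress.IPv4Address(ip)  # validation only; raises on bad input
    prefix, last = ip.rsplit('.', 1)
    return prefix, last


prefix, last = split_last_octet("74.125.224.72")
print(prefix, last)
```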

About python re raw pattern search [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I want to perform re.search using the pattern as a raw string, like below.
m = re.search(r'pattern', string)
But if I have the pattern in a variable, like pat = 'pattern', how do I perform a raw search?

Declare the pattern string as a raw string when you assign it:
regexpattern = r'pattern'
m = re.search(regexpattern, string)

You can build the pattern this way; test is the string variable and l is the text to search.
pat = """pat%s""" % test
pattern = re.compile(pat, re.I | re.M)
match = pattern.search(l)
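It may help to remember that the `r` prefix only changes how a string literal is written in source code; once the string is stored in a variable there is nothing "raw" left about it. The sketch below illustrates this, and also shows `re.escape` for the case where the variable holds text that should be matched literally. The sample strings are made up for the example.

```python
import re

# A raw literal and an escaped regular literal produce the same string.
same = (r'\d+' == '\\d+')
print(same)

# A pattern held in a variable is passed to re.search like any other string.
pat = r'\d+\.\d+'
found = re.search(pat, 'version 3.14').group(0)
print(found)

# If the variable holds plain text (not a regex), escape it first so that
# characters such as "+" are matched literally.
user_text = '1+1'
literal = re.search(re.escape(user_text), 'what is 1+1?').group(0)
print(literal)
```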

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
