I have a variable key.links.self (from JSON output) in a template, and it holds a URL:
https://ahostnamea.net:666/api/v1/
What I would like to do is render only ahostnamea from this variable in the template.
I know it is possible to cut off a fixed number of characters, but that only works for the prefix, which always has the same length (https:// = 8 characters). The rest of the URL varies in length.
Is there any way to split/cut the string from / to . ? Or any other way?
You could use a pattern with a capturing group and a negated character class [^.]+ matching any char except a dot.
https?://([^.]+)
For example
import re
regex = r"https?://([^.]+)"
test_str = "https://ahostnamea.net:666/api/v1/"
matches = re.search(regex, test_str)
if matches:
    print(matches.group(1))
Result
ahostnamea
Edit
As suggested you could also use urllib.parse to get the hostname.
from urllib.parse import urlparse
o = urlparse("https://ahostnamea.net:666/api/v1/")
Then you could get the first part by, for example, splitting on a dot:
s = o.hostname.split('.', 1)[0]
print(s)
Result
ahostnamea
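Wrapped up as a small helper, the urlparse approach above might look like the sketch below. The function name first_host_label is hypothetical, and it assumes the URL includes a scheme so that urlparse can find the hostname.

```python
from urllib.parse import urlparse


def first_host_label(url):
    """Return the part of the hostname before the first dot, or None."""
    hostname = urlparse(url).hostname  # None when the URL has no network location
    if hostname is None:
        return None
    return hostname.split(".", 1)[0]


print(first_host_label("https://ahostnamea.net:666/api/v1/"))  # ahostnamea
```

Returning None instead of raising keeps the helper safe to call on values that are not full URLs.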
A proper solution would be {{ request.META.HTTP_HOST }}
I'm having a hard time grasping regex no matter how much documentation I read. I'm trying to match everything between a string and the first occurrence of &. This is what I have:
link = "group.do?sys_id=69adb887157e450051e85118b6ff533c&&"
rex = re.compile("group\.do\?sys_id=(.?)&")
sysid = rex.search(link).groups()[0]
I'm using https://regex101.com/#python to help me validate my regex, and I can kind of get rex = re.compile("user_group.do?sys_id=(.*)&") to work, but the .* is greedy and matches up to the last &, and I'm looking to match up to the first &.
I thought .? matches zero or one time.
You don't necessarily need regular expressions here. Use urlparse instead:
>>> from urlparse import urlparse, parse_qs
>>> parse_qs(urlparse(link).query)['sys_id'][0]
'69adb887157e450051e85118b6ff533c'
In case of Python 3 change the import to:
from urllib.parse import urlparse, parse_qs
You can anchor the pattern on the literal && that follows the value, so the greedy .* stops there:
import re
link = "user_group.do?sys_id=69adb887157e450051e85118b6ff533c&&"
rex = re.compile(r"user_group\.do\?sys_id=(.*)&&")
sysid = rex.search(link).groups()[0]
print(sysid)
.* is greedy, but .*? is not: it is the lazy (non-greedy) form.
.? only matches any character zero or one time, while .*? matches as few characters as possible and stops at the earliest point where the rest of the pattern can match. I hope that explains it.
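The difference between these quantifiers can be checked directly on the link from the question. This is only a sketch; the variable names are made up for illustration, and the [^&]* variant is an extra alternative that is not part of the answer above.

```python
import re

link = "group.do?sys_id=69adb887157e450051e85118b6ff533c&&"

greedy = re.search(r"sys_id=(.*)&", link).group(1)      # runs to the last &
lazy = re.search(r"sys_id=(.*?)&", link).group(1)       # stops at the first &
negated = re.search(r"sys_id=([^&]*)&", link).group(1)  # cannot cross an & at all
optional = re.search(r"sys_id=(.?)&", link)             # at most one character before &

print(greedy)    # 69adb887157e450051e85118b6ff533c&
print(lazy)      # 69adb887157e450051e85118b6ff533c
print(negated)   # 69adb887157e450051e85118b6ff533c
print(optional)  # None
```

The .? pattern finds no match at all here, because more than one character sits between sys_id= and the first &.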
I am an absolute noob at regex (I kind of know the basics) and need help matching a word or a phrase. If it is a phrase, each word is separated by a hyphen (-).
This is my current regex, which only matches one word:
r'^streams/search/(?P<stream_query>\w+)/$'
The ?P<stream_query> part names the group, which lets the URL take a parameter.
Extra note: I am using python re module with the Django urls.py
Any suggestions?
Here are some examples:
game
gsl
starcraft-2014
final-fantasy-iv
word1-word2-word-3
Updated explanation:
I basically need a regular expression that expands the current one, all inside the same regex, not a separate one:
r'^streams/search/(?P<stream_query>\w+)/$'
So the new regex should go INSIDE this one, where (?P<stream_query>\w+) is any word that Django treats as a parameter (and passes into a function).
URL definition, which includes the regex:
url(r'^streams/search/(?P<stream_query>\w+)/$', 'stream_search', name='stream_search')
Then, Django passes that parameter into the stream_search function, which takes that parameter:
def stream_search(request, stream_query):
    # here I manipulate the stream_query string, i.e. removing the hyphens
    pass
So, once again, I need a regex to match a word or phrase that is passed into the stream_query parameter (or, if necessary, a second one).
So, what I want stream_query to have is:
word1
or
word1-word2-word3
If I understand your question correctly, then you might not have to use regexes at all.
Based on your example:
example.com/streams/search/rocket-league-fsdfs-fsdfs
It seems that the term you want to deal with is always found after the last /. So you can rsplit and then check for -. Here is an example:
url = "example.com/streams/search/rocket-league-fsdfs-fsdfs"
result = url.rsplit("/", 1)[-1]
# result == "rocket-league-fsdfs-fsdfs"
if "-" in result:
    pass  # phrase: do whatever you want with the string
else:
    pass  # single word: do whatever you want with the string
Or, a regex that would match either word or word-word-word would be: [\w-]+
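Dropped into the pattern from the question, [\w-]+ can be tried out with plain re before touching urls.py. This is a minimal sketch; the sample paths are assumptions, written without a leading slash in the way Django hands paths to URL patterns.

```python
import re

# Same pattern as in the question, with \w+ widened to [\w-]+.
pattern = re.compile(r"^streams/search/(?P<stream_query>[\w-]+)/$")

for path in ["streams/search/game/", "streams/search/final-fantasy-iv/"]:
    match = pattern.match(path)
    print(match.group("stream_query"))

# A space is not in [\w-], so this path is rejected.
print(pattern.match("streams/search/two words/"))  # None
```

The named group still arrives in the view as stream_query, so the view signature does not need to change.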
Try this,
import re
url = "http://example.com/something?id=123&action=yes"
regex = r"(\w+)=(\w+)"
re.findall(regex, url)
You can also use Python's urlparse library,
from urlparse import urlparse
result = urlparse("http://example.com/something?id=123&action=yes")
Calling urlparse returns
ParseResult(scheme='http', netloc='example.com', path='/something', params='', query='id=123&action=yes', fragment='')
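Once the URL is split into a ParseResult, the query field can be turned into a dictionary with parse_qs. The sketch below assumes Python 3, where both functions live in urllib.parse; the variable names are illustrative.

```python
from urllib.parse import urlparse, parse_qs

parts = urlparse("http://example.com/something?id=123&action=yes")
params = parse_qs(parts.query)  # each key maps to a list of values

print(params)           # {'id': ['123'], 'action': ['yes']}
print(params["id"][0])  # 123
```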
I'm struggling with Python's re. I don't know how to solve the following problem in a clean way.
I want to extract a part of a URL.
What I tried so far:
url = 'http://www.example.com/this-2-me-4/123456-subj'
m = re.search('/[0-9]+-', url)
m = m.group(0).rstrip('-')
m = m.lstrip('/')
This leaves me with the desired output 123456, but I feel this is not the proper way to extract the slug.
How can I solve this quicker and cleaner?
Use a capturing group by putting parentheses around the part of the regex that you want to capture (...). You can get the contents of a capturing group by passing in its number as an argument to m.group():
>>> m = re.search('/([0-9]+)-', url)
>>> m.group(1)
'123456'
From the docs:
(...)
Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].
You may want to use urllib.parse combined with a capturing group for mildly cleaner code.
import urllib.parse, re
url = 'http://www.example.com/this-2-me-4/123456-subj'
parsed = urllib.parse.urlparse(url)
path = parsed.path
slug = re.search(r'/([\d]+)-', path).group(1)
print(slug)
Result:
123456
In Python 2, use urlparse instead of urllib.parse.
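One thing to watch with this approach is that re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError. A guarded version might look like the sketch below; the function name extract_id and the second URL are made up for illustration.

```python
import re
from urllib.parse import urlparse


def extract_id(url):
    """Return the digits that directly follow a slash and precede a hyphen, or None."""
    match = re.search(r"/(\d+)-", urlparse(url).path)
    return match.group(1) if match else None


print(extract_id("http://www.example.com/this-2-me-4/123456-subj"))  # 123456
print(extract_id("http://www.example.com/no-id-here"))               # None
```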
If you want to find all the slugs in a URL, you can use this code (slugify comes from a third-party package such as python-slugify).
from slugify import slugify
url = "https://www.allrecipes.com/recipe/79300/real-poutine?search=random/some-name/".split("/")
for i in url:
    i = i.split("?")[0] if "?" in i else i
    if "-" in i and slugify(i) == i:
        print(i)
This will produce the output
real-poutine
some-name
I have the following URL:
http://google.com/sadfasdfsd$AA=mytag&SS=sdfsdf
What is the best way in Python to get mytag from the string ~$AA=mytag&~?
Try this,
>>> import re
>>> str = 'http://google.com/sadfasdfsd$AA=mytag&SS=sdfsdf'
>>> m = re.search(r'.*\$AA=([^&]*)\&.*', str)
>>> m.group(1)
'mytag'
$ has a special meaning in regex (it anchors the end of the string), so you have to escape it to tell the regex engine that you mean a literal $. & has no special meaning, so the backslash in front of it is harmless but not required.
Use this regex =(.+)&
import re
regex = r"=(.+)&"
print(re.findall(regex, "http://google.com/sadfasdfsd$AA=mytag&SS=sdfsdf")[0])
To retrieve the mytag that comes after $AA=, you can use this simple regex:
(?<=\$AA=)[^&]+
In Python:
match = re.search(r"(?<=\$AA=)[^&]+", subject)
Regex explanation:
(?<=      # look behind to see if there is:
  \$      #   '$'
  AA=     #   'AA='
)         # end of look-behind
[^&]+     # any character except '&' (1 or more times,
          # matching as many as possible)
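To see how the lookbehind compares with a plain capturing group, both can be run on the URL from the question. This is a small sketch with illustrative variable names.

```python
import re

subject = "http://google.com/sadfasdfsd$AA=mytag&SS=sdfsdf"

# Lookbehind: the match itself is only the value, so use group(0).
lookbehind = re.search(r"(?<=\$AA=)[^&]+", subject)
# Capturing group: the match includes "$AA=", and the value is group(1).
captured = re.search(r"\$AA=([^&]+)", subject)

print(lookbehind.group(0))  # mytag
print(captured.group(0))    # $AA=mytag
print(captured.group(1))    # mytag
```

Both give the same value; the lookbehind simply keeps the prefix out of the match.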
I'm just going to throw this one out there to show there are other ways of doing this:
import urlparse
url = "http://google.com/sadfasdfsd?AA=mytag&SS=sdfsdf"
query = urlparse.urlparse(url).query # Extract the query string from the full URL
parsed_query = urlparse.parse_qs(query) # Parses the query string into a dict
print parsed_query["AA"][0]
# mytag
See here: https://docs.python.org/2/library/urlparse.html for documentation on the urlparse module.
NB parse_qs maps each key to a list of values, so we use [0] to get the first result.
Also, I have assumed the question has a typo and have amended the url so that it represents a traditional query string.
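If the $ in the original URL is not a typo, one possible workaround is to split on the $ first and feed the remainder to parse_qs. This is only a sketch under that assumption, written for Python 3 (urllib.parse), and it treats everything after the first $ as the query string.

```python
from urllib.parse import parse_qs

url = "http://google.com/sadfasdfsd$AA=mytag&SS=sdfsdf"

# Everything after the first "$" is treated as the query string.
_, _, query = url.partition("$")
parsed_query = parse_qs(query)

print(parsed_query["AA"][0])  # mytag
```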
When I try to extract this video ID (AIiMa2Fe-ZQ) with a regular expression, I can't get the dash and all the letters after it.
>>> id = re.search('(?<=\?v\=)\w+', 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ')
>>> print id.group(0)
AIiMa2Fe
Instead of \w+, use the pattern below. The word character class (\w) doesn't include a dash; it only includes [a-zA-Z_0-9].
[\w-]+
I don't know the pattern for YouTube hashes, but just include the "-" in the character class, as it is not considered a word character:
import re
id = re.search('(?<=\?v\=)[\w-]+', 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ')
print id.group(0)
I have edited the above because as it turns out:
>>> re.search("[\w|-]", "|").group(0)
'|'
The "|" in the character definition does not act as a special character but does indeed match the "|" pipe. My apologies.
>>> re.search('(?<=v=)[\w-]+', 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ').group()
'AIiMa2Fe-ZQ'
\w is shorthand for [a-zA-Z0-9_] in Python 2.x; in Python 3, \w in a str pattern also matches Unicode word characters unless you pass the re.A (re.ASCII) flag. Your video ID clearly contains an additional character, the hyphen. I've also removed the redundant escape backslashes from the lookbehind.
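The effect of re.A on \w in Python 3 can be seen with a short check. This is a sketch only; the sample word is an assumption chosen because it ends in a non-ASCII letter.

```python
import re

word = "caf\u00e9"  # "café", which ends in a non-ASCII letter

unicode_match = re.fullmatch(r"\w+", word)     # Unicode-aware by default in Python 3
ascii_match = re.fullmatch(r"\w+", word, re.A)  # ASCII-only: the last letter is rejected

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False

# The video ID only uses ASCII characters, so re.A does not change the result.
video_id = re.search(r"(?<=v=)[\w-]+", "http://www.youtube.com/watch?v=AIiMa2Fe-ZQ", re.A).group()
print(video_id)  # AIiMa2Fe-ZQ
```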
Use the urlparse module instead of regex for this kind of thing.
import urlparse
parsed_url = urlparse.urlparse(url)
if parsed_url.netloc.find('youtube.com') != -1 and parsed_url.path == '/watch':
    video = urlparse.parse_qs(parsed_url.query).get('v', None)
    if video is None:
        video = urlparse.parse_qs(parsed_url.fragment.strip('!')).get('v', None)
    if video is not None:
        print(video[0])
EDIT: Updated for the upcoming new youtube url format.
/(?:/v/|/watch\?v=|/watch#!v=)([A-Za-z0-9_-]+)/
Explain the RE
There are three alternate YouTube formats: /v/[ID], watch?v=, and the new AJAX watch#!v=. This RE captures all three. There is also a new YouTube URL for user pages of the form /user/[user]?content={complex URI}, which is not captured by this regex.
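In Python the same pattern can be used without the surrounding / delimiters. The sketch below runs it against one URL per format; the /v/ and watch#!v= URLs are constructed for illustration from the video ID in the question.

```python
import re

# The pattern from above without the surrounding / delimiters, which Python does not use.
YOUTUBE_ID = re.compile(r"(?:/v/|/watch\?v=|/watch#!v=)([A-Za-z0-9_-]+)")

urls = [
    "http://www.youtube.com/v/AIiMa2Fe-ZQ",
    "http://www.youtube.com/watch?v=AIiMa2Fe-ZQ",
    "http://www.youtube.com/watch#!v=AIiMa2Fe-ZQ",
]
ids = [YOUTUBE_ID.search(u).group(1) for u in urls]
print(ids)  # ['AIiMa2Fe-ZQ', 'AIiMa2Fe-ZQ', 'AIiMa2Fe-ZQ']
```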
I'd try this:
>>> import re
>>> a = re.compile(r'.*(\-\w+)$')
>>> a.search('http://www.youtube.com/watch?v=AIiMa2Fe-ZQ').group(1)
'-ZQ'