I'm trying to just input a URL and have Python pull the data the script needs from it.
I only really need to isolate 1.2.3 into a variable, 7/8/9 into a variable, and 1.2.3/4/5/6/7 into a variable. The URL changes, and I want to be able to update my script easily.
I'm not really sure if that's possible.
Url=https://Thedog.big.com//red/house/large/1.2.3/4/5/6/7/8/9
x = 1.2.3
y = 7/8/9
z = 1.2.3/4/5/6/7
That really depends on what kind of URLs you have; the idea is just to find a regular pattern.
If all URLs have the same structure and only the values change, you can just take the parts out of the URL string by index:
a = str(url)[start:stop]
Otherwise you can use str(url).split('/') to split the URL into a list of pieces separated by /.
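For the example URL above, a minimal sketch of the split approach might look like this; the indices are assumptions that only hold while the path layout stays the same:

url = "https://Thedog.big.com//red/house/large/1.2.3/4/5/6/7/8/9"
parts = url.split('/')     # ['https:', '', 'Thedog.big.com', '', 'red', 'house', 'large', '1.2.3', '4', '5', '6', '7', '8', '9']
x = parts[7]               # '1.2.3'
y = '/'.join(parts[-3:])   # '7/8/9'
z = '/'.join(parts[7:12])  # '1.2.3/4/5/6/7'
print(x, y, z)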
I'm using the click package to get input for one or more variables, which get loaded in as a combined dictionary. Each entry is then joined, and the combined string is added to the end of a base URL and sent through the requests package to receive some XML data.
Earlier I had an issue with one of the variables that lets you search through a range, such as
[value1, value2]
Python added double quotes around it, which kept the search function from working correctly, so I used
.replace('"', '')
on the joined string before combining it with the base URL, and that seemed to fix the problem. The issue now is that an individual input containing more than one word no longer produces the same output as the actual search engine online. I have to use quotes when I enter the information to keep it as a single argument, but then the quotes get removed by the replace above, and I believe that is what is causing the issue.
I think that if I have a way to access individual entries of this dictionary and remove the double quotes from only certain entries, then that should get the job done. But if I am overlooking something, please let me know.
Help is appreciated.
Code added below:
import click
import requests

@click.command()
@click.option('--variable1')
@click.option('--variable2')
def search(variable1, variable2):
    query_list = [variable1, variable2]
    query = ''.join(query_list)
    base_url = "abc.com...."
    response = requests.get(base_url, params=query)

if __name__ == '__main__':
    search()
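For reference, a minimal sketch of the idea described in the question, stripping the double quotes from only certain entries before joining (the dictionary keys and values here are hypothetical):

options = {"range_field": '["value1", "value2"]', "phrase_field": '"two words"'}
# Strip the quotes only from the range-style entry; leave quoted phrases intact.
cleaned = [value.replace('"', '') if key == "range_field" else value
           for key, value in options.items()]
query = ''.join(cleaned)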
I am using an API I found online for one of my scripts, and I am wondering if I can change one word from its response to something else. My code is:
import requests
people = requests.get('https://insult.mattbas.org/api/insult')
print("Welcome to the insult machine!\nType somebody you want to insult!")
b = input()
print(people.replace("You", b))
Is replace not a valid method here? If so, what package and/or methods would I need to use instead? Thanks!
The value returned from requests.get isn't a string; it's a Response object, and that class has no replace method.
Have a look at the structure of that class. For example, you can do r = requests.get(...) and r.text.replace(...).
In other words, you need to operate on the text part of the response object.
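Applied to the code in the question, a minimal sketch of that fix might look like this:

import requests

people = requests.get('https://insult.mattbas.org/api/insult')
print("Welcome to the insult machine!\nType somebody you want to insult!")
b = input()
print(people.text.replace("You", b))  # .text is the response body as a string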
I want to crawl a webpage for some information. What I've done so far is working, but I need to make a request to another URL from the website, and when I try to format it, it doesn't work. This is what I have so far:
import requests
from lxml import html

name = input("> ")
page = requests.get("http://www.mobafire.com/league-of-legends/champions")
tree = html.fromstring(page.content)

for index, champ in enumerate(champ_list):  # champ_list is defined elsewhere in the script
    if name == champ:
        y = tree.xpath(".//*[@id='browse-build']/a[{}]/@href".format(index + 1))
        print(y)
        guide = requests.get("http://www.mobafire.com{}".format(y))
        builds = html.fromstring(guide.content)
        print(builds)
        for title in builds.xpath(".//table[@class='browse-table']/tr[2]/td[2]/div[1]/a/text()"):
            print(title)
From the input, the user enters a name; if the name matches one from a list (champ_list), it prints a URL, and from there it's formatted into the guide variable to fetch more information, but I'm getting errors such as "invalid IPv6 URL".
This is the output URL (one of them, but they're all similar anyway): ['/league-of-legends/champion/ivern-133']
I tried slicing, but it doesn't do anything; I'm probably using it wrong, or it doesn't work in this case. I tried replace as well, but it doesn't work on lists. I tried
y = [y.replace("'", "") for y in y]
so I could see if it at least removed the quotes, but that didn't work either. What would be another approach to format this properly?
I take it y is the list you want to insert into the string?
Try this:
"http://www.mobafire.com{}".format('/'.join(y))
I have a large set of URLs. Some are similar to each other, i.e. they represent similar sets of pages.
For example,
http://example.com/product/1/
http://example.com/product/2/
http://example.com/product/40/
http://example.com/product/33/
are similar. Similarly
http://example.com/showitem/apple/
http://example.com/showitem/banana/
http://example.com/showitem/grapes/
are also similar. So I need to represent them as http://example.com/product/(Integers)/,
where (Integers) = 1, 2, 40, 33, and as http://example.com/showitem/(strings)/, where strings = apple, banana, grapes ... and so on.
Is there any built-in function or library in Python to find these similar URLs in a large set of mixed URLs? How can this be done efficiently? Please suggest. Thanks in advance.
Use a string to store the first part of the URL and just handle the IDs. For example:
In [1]: PRODUCT_URL = 'http://example.com/product/%(id)s/'
In [2]: _ids = '1 2 40 33'.split()  # split string into a list of IDs
In [3]: for id in _ids:
   ...:     print(PRODUCT_URL % {'id': id})
   ...:
http://example.com/product/1/
http://example.com/product/2/
http://example.com/product/40/
http://example.com/product/33/
The statement print(PRODUCT_URL % {'id': id}) uses Python string formatting to build the product URL from the id variable passed in.
UPDATE:
I see you've changed your question. The solution to your problem is quite domain-specific and depends on your data set. There are several approaches, some more manual than others. One such approach is to extract the top-level part of each URL, i.e. the domain name:
In [7]: _url = 'http://example.com/product/33/' # url we're testing with
In [8]: ('/').join(_url.split('/')[:3]) # get domain
Out[8]: 'http://example.com'
In [9]: ('/').join(_url.split('/')[:4]) # get domain + first URL sub-part
Out[9]: 'http://example.com/product'
[:3] and [:4] above just slice the list resulting from split('/').
You can set each result as a key in a dict and keep a count of how many times you encounter that URL part, and move on from there. Again, the solution depends on your data. If it gets more complex than the above, I suggest you look into regex, as the other answers suggest.
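A minimal sketch of that counting idea, using the example URLs from the question (the [:4] prefix is an assumption that the domain plus the first path segment is enough to group them):

from collections import Counter

urls = [
    'http://example.com/product/1/',
    'http://example.com/product/2/',
    'http://example.com/product/40/',
    'http://example.com/showitem/apple/',
    'http://example.com/showitem/banana/',
]
# Count how many URLs share each domain + first path segment.
groups = Counter('/'.join(u.split('/')[:4]) for u in urls)
print(groups)  # Counter({'http://example.com/product': 3, 'http://example.com/showitem': 2})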
You can use regular expressions to handle those cases. You can look at the Python documentation for the re module to see how this is handled.
You can also look at how Django implements this in its routing system.
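As a rough illustration of the regex suggestion (the patterns below are assumptions based on the two URL shapes in the question):

import re

def classify(url):
    # Map a concrete URL onto the generalised pattern the question asks for.
    if re.fullmatch(r'http://example\.com/product/\d+/', url):
        return 'http://example.com/product/(Integers)/'
    if re.fullmatch(r'http://example\.com/showitem/[A-Za-z]+/', url):
        return 'http://example.com/showitem/(strings)/'
    return url

print(classify('http://example.com/product/40/'))      # http://example.com/product/(Integers)/
print(classify('http://example.com/showitem/apple/'))  # http://example.com/showitem/(strings)/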
I'm not exactly sure what you are specifically looking for. It sounds to me like you want something to match URLs. If that is indeed what you want, then I suggest you use something built on regular expressions. One example can be found here.
I also suggest you take a look at Django and its routing system.
Not in Python, but I've created a Ruby library (and an accompanying app):
https://rubygems.org/gems/LinkGrouper
It works on all links (doesn't need to know any pattern).
I have an app that will show images from reddit. Some images come like this: http://imgur.com/Cuv9oau, when I need them to look like this: http://i.imgur.com/Cuv9oau.jpg. That is, just add "i." at the beginning of the host and ".jpg" at the end.
You can use a string replace:
s = "http://imgur.com/Cuv9oau"
s = s.replace("//imgur", "//i.imgur")+(".jpg" if not s.endswith(".jpg") else "")
This sets s to:
'http://i.imgur.com/Cuv9oau.jpg'
This function should do what you need. I expanded on @jh314's response, made the code a little less compact, and added a check that the URL starts with http://imgur.com, as that code would cause issues with other URLs, like the Google search I included. It also only replaces the first instance, since replacing later occurrences could cause issues.
def fixImgurLinks(url):
    if url.lower().startswith("http://imgur.com"):
        url = url.replace("http://imgur", "http://i.imgur", 1)  # Only replace the first instance.
        if not url.endswith(".jpg"):
            url += ".jpg"
    return url

for u in ["http://imgur.com/Cuv9oau", "http://www.google.com/search?q=http://imgur"]:
    print(fixImgurLinks(u))
Gives:
>>> http://i.imgur.com/Cuv9oau.jpg
>>> http://www.google.com/search?q=http://imgur
You could use Python's regular expressions to insert the i. As for the .jpg, you can just append it.
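A quick sketch of that regex approach (the pattern is an assumption that only plain http://imgur.com links need fixing):

import re

url = "http://imgur.com/Cuv9oau"
# Insert "i." into the host, then append ".jpg" if it isn't there already.
fixed = re.sub(r'^http://imgur\.com', 'http://i.imgur.com', url)
if not fixed.endswith('.jpg'):
    fixed += '.jpg'
print(fixed)  # http://i.imgur.com/Cuv9oau.jpg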