Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 days ago.
I need your help: I want to download the invoices for my Amazon orders from previous years. I've done it with this simple script using BeautifulSoup and requests:
import requests

url = 'https://www.amazon.fr/documents/download/4a70c9ee-36bb-4278-ad4b-2da1a456525e/invoice.pdf'
response = requests.get(url)
if response.status_code == 200:
    print("GET request successful")
    with open("invoice.pdf", "wb") as f:
        f.write(response.content)
else:
    print('No connection')
The script downloads the file, but I cannot open it; it gives me the error "amazon/400". I have tried with Selenium, urllib, and wget, and it is always the same: the script connects and downloads the file, but the file cannot be opened.
Any idea what could be happening and how to fix it?
Your input is much appreciated.
Tried different modules such as Selenium, urllib, requests, and wget, to no avail.
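A quick way to see what actually got downloaded is to check the file's first bytes: a real PDF starts with the magic bytes %PDF-, while an Amazon error response is an HTML or JSON document saved under a .pdf name. A minimal sketch (the dummy payloads below are made up for illustration):

```python
# Check whether downloaded bytes are really a PDF or an error page.
# Every valid PDF file starts with the magic bytes b"%PDF-".
def looks_like_pdf(data: bytes) -> bool:
    return data.startswith(b"%PDF-")

# Dummy payloads standing in for response.content:
error_page = b"<html><body>amazon/400</body></html>"
pdf_bytes = b"%PDF-1.7 ..."

print(looks_like_pdf(error_page))  # False: the server sent an error document
print(looks_like_pdf(pdf_bytes))   # True
```

If the check fails, the request most likely reached Amazon without the session cookies your logged-in browser has, so the server returned an error document instead of the invoice.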
I'm using the below code to extract full HTML:
cont = await page1.content()
The website I intend to extract from is:
https://www.mohmal.com/en
which is a website for making temporary email accounts. The exact thing I want to do is read the content of received emails, but with the above code I could not extract the inner frame's HTML, where the received emails' contents are placed. How can I do so?
Did you try using urllib?
You can use the urllib module to read HTML pages.
from urllib.request import urlopen
f = urlopen("https://www.google.com")
print(f.read())
f.close()
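Note that page.content() only returns the top-level document; an <iframe> is a separate document that has to be fetched (or queried through the library's frame objects) separately. As a rough sketch of the first step, the iframe URLs can be pulled out of the outer HTML with the standard library (the sample HTML below is made up):

```python
from html.parser import HTMLParser

class IframeFinder(HTMLParser):
    """Collect the src attribute of every <iframe> in a document."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

# Made-up outer HTML standing in for the page content:
outer_html = '<html><body><iframe src="/en/mail/read/123"></iframe></body></html>'
finder = IframeFinder()
finder.feed(outer_html)
print(finder.sources)  # ['/en/mail/read/123']
```

With a browser-automation library you would instead ask the page object for its frames and read each frame's content directly; the exact call depends on which library is behind page1.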
For example, I tried getting Python to read the following filtered page
http://www.hearthpwn.com/cards?filter-attack-val=1&filter-attack-op=1&display=1
but Python only gets the unfiltered page http://www.hearthpwn.com/cards instead.
The standard library urllib2 normally follows redirects. If retrieving this URL used to work without being redirected, then the site has changed.
Although you can prevent following the redirect within urllib2 (by providing an alternative HTTP handler), I recommend using requests, where you can do:
import requests
r = requests.get('http://www.hearthpwn.com/cards?filter-attack-val=1'
                 '&filter-attack-op=1&display=1', allow_redirects=False)
print(r)
giving you:
<Response [302]>
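For completeness, the "alternative HTTP handler" route in modern urllib.request could be sketched like this (the class name is illustrative):

```python
import urllib.request

class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects: returning None makes urllib raise
    an HTTPError for 3xx responses instead of following them."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

# An opener built with this handler surfaces a 302 as an HTTPError,
# letting you inspect the Location header yourself.
opener = urllib.request.build_opener(NoRedirectHandler)
```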
I want to extract the HTML from a webpage:
import urllib2
req = urllib2.Request('https://www.example.com')
response = urllib2.urlopen(req)
fullhtml = response.read()
I tried with urllib2, but since the page is built dynamically, the HTML content is empty.
Is there a way to wait for the javascript to load?
Take a look at PhantomJS: http://phantomjs.org/. Many websites are JavaScript-based, and plain PHP or Python cannot execute that JavaScript. I think this library will be the best you can get.
I am new to Python, and I want to use the Reddit API to retrieve the top 10 headlines on the front page of Reddit. I tried to read the API documentation, but I am not able to understand how to proceed.
It would be great if someone can give me an example.
Thanks
Here's a quick example of how to download the JSON data you want. Basically, open the URL, download the data in JSON format, and use json.loads() to load it into a dictionary.
try:
    from urllib.request import urlopen
except ImportError:  # Python 2
    from urllib2 import urlopen
import json

url = 'http://www.reddit.com/r/python/.json?limit=10'
jsonDownload = urlopen(url)
jsonData = json.loads(jsonDownload.read())
From there, you can print out 'jsonData', write it to a file, parse it, whatever.
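The dictionary follows Reddit's listing shape: the posts sit under data -> children, and each child's title is under its own data key. A sketch with a made-up response in that shape (so it runs without hitting the network):

```python
import json

# Made-up sample in the same shape as Reddit's listing JSON.
raw = json.dumps({
    "data": {
        "children": [
            {"data": {"title": "First headline"}},
            {"data": {"title": "Second headline"}},
        ]
    }
})

jsonData = json.loads(raw)
# Pull out up to 10 post titles:
titles = [child["data"]["title"] for child in jsonData["data"]["children"][:10]]
print(titles)  # ['First headline', 'Second headline']
```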
So let's say I have this URL: https://www.python.org/
and I want to download the page's source into a .txt file named python_source.txt
how would I do that?
Use urllib2. Here's how it's done:

import urllib2

url = 'https://www.python.org/'
response = urllib2.urlopen(url)
content = response.read()
Now you can save the content in any text file.
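Saving that content to the python_source.txt file the question asks for is then just a file write (shown here with placeholder bytes instead of a live download):

```python
# Placeholder for the bytes returned by response.read():
content = b"<!doctype html><title>Python</title>"

# urllib returns bytes, so write in binary mode...
with open("python_source.txt", "wb") as f:
    f.write(content)

# ...and read it back as text to check.
with open("python_source.txt") as f:
    print(f.read())  # <!doctype html><title>Python</title>
```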
The Python standard-library module urllib does just this. The documentation gives a very clear example of what you want to do.

import urllib.request

local_filename, headers = urllib.request.urlretrieve('http://python.org/')
with open(local_filename) as f:
    html = f.read()