Accessing data from the internet - Python

I want to access the file automatically using Python 3. The website is https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls
When I enter the URL manually into Internet Explorer it asks me to download the file, but I want to do this in Python automatically and load the data as a DataFrame.
I get the error below:
from urllib.request import urlretrieve
import pandas as pd
# Assign url of file: url
url = 'https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls'
# Save file locally
urlretrieve(url, 'my-sheet.xls')
# Read file into a DataFrame and print its head
df = pd.read_excel('my-sheet.xls')
print(df.head())
URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

$ curl https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>307 Temporary Redirect</title>
</head><body>
<h1>Temporary Redirect</h1>
<p>The document has moved here.</p>
</body></html>
You are just getting redirected. There are ways to handle this in code, but I would simply change the URL to "https://www.dax-indices.com/document/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls".
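If you would rather resolve the redirect in code instead of hard-coding the new URL, one option (a sketch, not part of the original answer) is to let requests follow the redirect and inspect the hops:
import requests
url = 'https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls'
# requests follows redirects by default; resp.history lists the intermediate responses
resp = requests.get(url, allow_redirects=True)
resp.raise_for_status()
for hop in resp.history:
    print(hop.status_code, hop.headers.get('Location'))
print('Final URL:', resp.url)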

I ran your code in a Jupyter environment and it worked. No error was raised, but the DataFrame has only NaN values. I checked the .xls file you are trying to read, and it does not seem to contain any data...
There are other ways to retrieve xls data, for example: downloading an excel file from the web in python
import requests
import pandas as pd
url = 'https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls'
# Download the file and save it locally
resp = requests.get(url)
with open('my-sheet.xls', 'wb') as output:
    output.write(resp.content)
# Read the saved file into a DataFrame
df = pd.read_excel('my-sheet.xls')
print(df.head())

You can do it directly with pandas and the .read_excel method:
df = pd.read_excel("https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls", sheet_name='Data', skiprows=5)
df.head(1)

Sorry mate, it works on my PC (not a very helpful comment, to be honest). Here's a list of things you can do (a short sketch follows the list) ->
Fetch the URL and check the status code of the response (2xx or 3xx means that everything is good; anything else has a different meaning)
Check whether the link has bot access blocked (certain sites do that)
If bot access is blocked, use Selenium for Python
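A minimal sketch of the first two checks, assuming requests is available; the User-Agent string is only an example:
import requests
url = 'https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls'
# 1) Check the status code: 2xx/3xx is fine, 4xx/5xx points at the problem
resp = requests.get(url, timeout=30)
print(resp.status_code)
# 2) Some sites block obvious non-browser clients; retrying with a browser-like
#    User-Agent header is a quick way to test for that
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
resp = requests.get(url, headers=headers, timeout=30)
print(resp.status_code, len(resp.content))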

Related

Use Python to download a file from ServiceNow using URL

I have multiple reports that I need to run and export from ServiceNow. I have been using a URL with filtering to get the reports downloaded directly into the Downloads folder this way, but would like to create a Python 3.10.5 script that would do all this for me and save everything to a specified location. I would then extend this code to do some data merging and add extra columns to the CSV. Is this even possible given that ServiceNow requires credentials?
from urllib.request import urlopen
# Download from URL.
with urlopen('MyURL') as download:
    content = download.read()
# Save to file.
with open('output.html', 'wb') as out_file:
    out_file.write(content)
I am getting a heap of errors, most notably HTTP Error 401: Unauthorized.
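One direction that might work (a sketch only, with placeholder URL and credentials; it assumes your instance allows basic authentication and that the list-export URL returns CSV):
import requests
# Placeholder: a ServiceNow list-export URL with the same filters you use manually
url = 'https://yourinstance.service-now.com/incident_list.do?CSV&sysparm_query=active=true'
# Pass your ServiceNow credentials (or a service account) as basic auth
resp = requests.get(url, auth=('username', 'password'), timeout=60)
resp.raise_for_status()  # raises immediately on 401/403 instead of failing silently
with open(r'C:\reports\incidents.csv', 'wb') as fh:
    fh.write(resp.content)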

Not able to download Excel spreadsheet from a given URL

I have already written a script that takes an .xlsm file and updates it based on some other file, then plots a graph according to the updated data. The script is working fine. But now, as this script is contributing to the automation of a process, it needs to get the Excel (.xlsm) file from a URL, update the file, and maybe save it back to the same URL.
I have tried downloading the file into a local copy using the code below:
import requests
url = 'https://sharepoint.amr.ith.intel.com/sites/SKX/patchboard/Shared%20Documents/Forms/AllItems.aspx?RootFolder=%2Fsites%2FSKX%2Fpatchboard%2FShared%20Documents%2FReleaseInfo&FolderCTID=0x0120004C1C8CCA66D8D94FB4D7A0D2F56A8DB7&View={859827EF-6A11-4AD6-BD42-23F385D43AD6}/Copy of Patch_Release_Utilization'
r = requests.get(url)
open('Excel.xlsm', 'wb').write(r.content)
Doing this, I get the error:
Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)'),)
What I understand so far is that the server is not sending the complete certificate chain to the client for verification.
I also tried:
r=requests.get(url,verify=False)
Doing this, the error is gone, but the file created is empty. When I checked the status code for the connection using
r=requests.get(url,verify=False).status_code
I got the code "401", which means an authorization error. I have tried providing authentication like this:
resp = requests.get(url,auth=HTTPBasicAuth('username', 'password'),verify=False)
and
resp = requests.get(url,auth=HTTPBasicAuth('username', 'password'))
I have tried both of the lines above, but the status code remained the same.
Then I came across an article - Python requests SSL error - certificate verify failed - where the author suggests adding the missing certificates to a .pem file and then using that file. But how do I know which certificates are missing? So I did not get any help from there either.
Can somebody please help me with this if you have already dealt with this problem? It would be a great help. I am using Python 3.6.3 and requests version 2.18.4.
NOTE: When I use the link manually in Internet Explorer I am able to download the file.
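One thing that may be worth trying (a sketch only, with placeholder URL and paths; not a confirmed fix for this server) is to export the site's full certificate chain from the browser into a .pem file and point requests at it via verify, so certificate checking stays on. Note that SharePoint sites often use NTLM or Kerberos rather than basic auth, in which case a helper such as requests_ntlm would be needed instead of HTTPBasicAuth.
import requests
from requests.auth import HTTPBasicAuth
url = 'https://sharepoint.example.com/path/to/Patch_Release_Utilization.xlsm'  # placeholder for the real document URL
ca_bundle = r'C:\certs\sharepoint_chain.pem'  # .pem with the server, intermediate and root certificates exported from the browser
resp = requests.get(url, auth=HTTPBasicAuth('username', 'password'), verify=ca_bundle)
print(resp.status_code)  # a 401 here means the credentials/auth scheme is the issue, not SSL
if resp.ok:
    with open('Excel.xlsm', 'wb') as fh:
        fh.write(resp.content)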

When saving FogBugz attachment, server always returns empty response (with some headers)

I am trying to get a case attachment and save it to a local folder. I have a problem using the attachment URL to download it; each time the server returns an empty result with status code 200.
This is a sample URL I use (host and token changed):
https://example.fogbugz.com/default.asp?pg=pgDownload&pgType=pgFile&ixBugEvent=385319&ixAttachment=56220&sFileName=Log.7z&sTicket=&sToken=1234567890627ama72kaors2grlgsk
I have tried using token instead of sToken, but it made no difference. If I copy the above URL into Chrome it won't work either, but if I log in to FogBugz (Manuscript) and then try the URL again, it works. So I suppose there are some security issues here.
By the way, I use the Python FogBugz API for this and save the URL using urllib: urllib.request.urlretrieve(url, "fb/" + file_name)
The solution I have found is to use cookies from the web browser where I have previously logged in to the FogBugz account I use. So it does look like a security issue.
For that I used pycookiecheat (for Windows see my fork: https://github.com/luskan/pycookiecheat). For the full code see here: https://gist.github.com/luskan/66ffb8f82afb96d29d3f56a730340adc
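A minimal sketch of that approach, assuming the logged-in session lives in a Chrome profile that pycookiecheat can read (the directory and file name below are taken from the question):
import requests
from pycookiecheat import chrome_cookies
url = 'https://example.fogbugz.com/default.asp?pg=pgDownload&pgType=pgFile&ixBugEvent=385319&ixAttachment=56220&sFileName=Log.7z&sTicket=&sToken=1234567890627ama72kaors2grlgsk'
# Reuse the session cookies from a browser profile that is already logged in
cookies = chrome_cookies(url)
resp = requests.get(url, cookies=cookies)
resp.raise_for_status()
with open('fb/Log.7z', 'wb') as fh:
    fh.write(resp.content)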

How to read JSON from URL in Python?

I am trying to use Python to get a JSON file from the web. If I open the URL in my browser (Mozilla or Chromium) I do see the JSON. But when I do the following in Python:
import urllib2
import json
response = urllib2.urlopen(url)  # url is the address of the JSON resource
data = json.loads(response.read())
I get an error message that tells me the following (translated into English): Errno 10060, a connection attempt failed because the server did not respond after a period of time, or the established connection failed because the connected host did not respond.
ADDED
It looks like many people have faced the described problem. There are also some answers to similar (or the same) questions. For example, here we can see the following solution:
import requests
r = requests.get("http://www.google.com", proxies={"http": "http://61.233.25.166:80"})
print(r.text)
It is already a step forward for me (I think it is very likely that the proxy is the reason for the problem). However, I still have not got it working, since I do not know the URL of my proxy and I will probably need a user name and password. How can I find them? How is it that my browsers have them and I do not?
ADDED 2
I think I am now one step further. I have used this site to find out what my proxy is: http://www.whatismyproxy.com/
Then I have used the following code:
proxies = {'http':'my_proxy.blabla.com/'}
r = requests.get(url, proxies = proxies)
print r
As a result I get
<Response [404]>
Looks not so good, but at least I think that my proxy is correct, because when I randomly change the address of the proxy I get another error:
Cannot connect to proxy
So, I can connect to proxy but something is not found.
I think there might be something wrong when you're trying to get the JSON from the online source (URL). Just to make things clear, here is a small code snippet:
#!/usr/bin/env python
try:
    # For Python 3+
    from urllib.request import urlopen
except ImportError:
    # For Python 2
    from urllib2 import urlopen
import json
def get_jsonparsed_data(url):
    response = urlopen(url)
    data = response.read().decode('utf-8')  # decode the bytes before parsing
    return json.loads(data)
If you still get a connection error, You can try a couple of steps:
Try to urlopen() a random site from the interpreter (interactive mode). If you are able to grab the source code, you're good. If not, check your internet connection or try the requests module. Check here
Check whether the JSON at the URL has correct syntax. For sample JSON syntax check here
Try the simplejson module.
Edit 1:
If you want to access websites through a system-wide proxy, you will have to use a proxy handler that connects to that proxy via the loopback address (localhost). A code sample is shown below.
proxy = urllib2.ProxyHandler({
    'http': '127.0.0.1',
    'https': '127.0.0.1'
})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
# this way you can send both http and https requests using proxies
urllib2.urlopen('http://www.google.com')
urllib2.urlopen('https://www.google.com')
I have not worked a lot with ProxyHandler; I just know the theory and the code. I am sure there are better ways to access websites through proxies, ones that do not involve installing the opener every time you run the program, but hopefully this will point you in the right direction.
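For completeness, the same idea with requests when the proxy itself needs credentials (host, port, user name and password below are placeholders you would get from your browser's proxy settings or your IT department):
import requests
# Placeholders: replace with your actual proxy host, port and credentials
proxies = {
    'http': 'http://username:password@my_proxy.blabla.com:8080',
    'https': 'http://username:password@my_proxy.blabla.com:8080',
}
r = requests.get('http://www.google.com', proxies=proxies, timeout=30)
print(r.status_code)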

SUDS python connection

I'm trying to build a client for a web service in Python with suds. I used the tutorial
on this site: http://www.jansipke.nl/python-soap-client-with-suds. It works with my own web service and WSDL, but not with the WSDL file I was given. That WSDL file works in soapUI; I can send requests and get an answer. So the problem is, I think, in how suds parses the WSDL file. I get the following error:
urllib2.URLError: <urlopen error [Errno -2] Name or service not known>
Any ideas how to fix that? If you need more information, please ask. Thank you!
The error you have given us seems to imply that the URL you are using to access the WSDL is not correct. Could you show us a bit more of your code, for example the client instantiation and the URL to the WSDL? That might allow others to actually help you.
Olly
# SUDS is primarily built for Python 2.6/2.7 (lightweight SOAP client)
# SUDS does not work properly with other versions; there is no support for 3.x
# Test your code with Python 2.7.12 (which I am using)
from suds.client import Client
from suds.sax.text import Raw
# Use your tested URL in the same format ending with '?wsdl'; check it once in SOAP-UI. Below is a dummy.
# Make sure to use the same method name in 'client.service.MethodName' below.
url = 'http://localhost:8080/your/path/MethodName?wsdl'
# Use your request XML in the format xml = Raw('xml_text'); below is a dummy
xml = Raw('<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:diag=" </soapenv:Body></soapenv:Envelope>')
def GetResCode(url, xml):
    client = Client(url)
    xml_response = client.service.MethodName(__inject={'msg': xml})
    return xml_response
print(GetResCode(url, xml))
