How to download NLTK data? - python

I am new to data science and am currently learning NLP. I need to download NLTK data, so I ran this code:
import nltk
nltk.download("punkt")
I get this error:
[nltk_data] Error loading punkt: <urlopen error [WinError 10061]
[nltk_data] No connection could be made because the target machine
[nltk_data] actively refused it>
I searched the internet but could not find any useful information. What does WinError 10061 mean? Please help me. Thanks.

WinError 10061 means the connection was refused: your machine tried to open a connection to a remote address and port, and the other end actively rejected it (or nothing was listening there). In the context of your code, it means that NLTK could not connect to the server that hosts the "punkt" data package in order to download it. This can be caused by a few different things:
Network connectivity issues: There may be a problem with your internet connection, or the server hosting the data package may be down.
Firewall settings: Your firewall settings may be blocking the connection. You can try temporarily disabling your firewall to see if that resolves the issue.
Proxy settings: If you are behind a proxy server, your connection may be blocked by the proxy settings. You can try specifying the proxy in your code (NLTK provides nltk.set_proxy() for this), or disabling the proxy temporarily.
Network permissions: You may not have the necessary permissions to access the server. Check your network settings and ensure that the appropriate permissions are set.
Try to check your internet connection and try again.
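To narrow down which of the causes above applies, it can help to test the raw TCP connection outside of NLTK. The following is a minimal sketch using only the standard library; the function name can_connect is made up for this illustration.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections (WinError 10061 on Windows),
        # timeouts, and name-resolution failures.
        return False
```

For example, can_connect("raw.githubusercontent.com", 443) checks the host that, as far as I know, serves the default NLTK download index; treat that hostname as an assumption and compare with a site you know is reachable. If the call returns False while a browser on the same machine works, a proxy or firewall is the likely culprit.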
If the problem persists, you can download the data package manually from the NLTK website (through a browser or on another machine, if necessary) and unpack it locally so that the package ends up in a tokenizers/punkt folder inside your data directory. Note that nltk.download() always fetches from the server, so passing download_dir only changes where the files are saved and does not avoid the failing connection. Instead, tell NLTK where the unpacked data lives:
import nltk
nltk.data.path.append('path/to/local/data')
NLTK will then load the package from that location, and nltk.download() is no longer needed for it.
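As a rough sketch of the manual route, the helpers below check that a directory has the layout NLTK expects for punkt and expose that directory through the NLTK_DATA environment variable, which NLTK reads when it is imported. The names has_punkt and register_data_dir are hypothetical and exist only for this example.

```python
import os
from pathlib import Path

def has_punkt(data_dir):
    """Return True if data_dir holds an unpacked punkt package."""
    # NLTK looks for tokenizer packages under <data_dir>/tokenizers/<name>
    return (Path(data_dir) / "tokenizers" / "punkt").is_dir()

def register_data_dir(data_dir):
    """Put data_dir first in NLTK_DATA and return the resulting entries."""
    existing = os.environ.get("NLTK_DATA", "")
    parts = [p for p in existing.split(os.pathsep) if p]
    if str(data_dir) not in parts:
        parts.insert(0, str(data_dir))
    os.environ["NLTK_DATA"] = os.pathsep.join(parts)
    return parts
```

Call register_data_dir() before the first import nltk in the process, since the variable is only read at import time; after the import, appending to nltk.data.path has the same effect.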

Connecting to Elasticsearch via python

I am running elasticsearch-8.6.1 with default settings on an Azure VM, with port 5601 open. This is a dev server with only one cluster. I am able to start Elasticsearch, Kibana and Logstash services and view them via a browser.
I have some Python code that tries to connect to Elasticsearch using the recommended approach of verifying HTTPS with the CA certificate, as per https://www.elastic.co/guide/en/elasticsearch/client/python-api/master/connecting.html
I have copied the http_ca.crt file from the VM onto my local machine and made it accessible.
es = Elasticsearch('https://localhost:9200',
                   ca_certs=CA_CERT,
                   basic_auth=(USER_ID, ELASTIC_PASSWORD))
Elasticsearch.yml has the following enabled
network.host: 0.0.0.0
http.host: 0.0.0.0
xpack.security.enabled: true
I appreciate that I can turn off security, but this isn't a sustainable approach moving forward.
The error I am getting is
elastic_transport.ConnectionError: Connection error caused by:
ConnectionError(Connection error caused by:
NewConnectionError(<urllib3.connection.HTTPSConnection object at
0x000001890CEF3730>: Failed to establish a new connection: [WinError
10061] No connection could be made because the target machine actively
refused it))
I suspect there is some configuration setting that I am missing somewhere.
Thanks in advance for any advice or pointers that can be offered.
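The error message means that the Python code could not open a TCP connection to Elasticsearch on the specified host and port. WinError 10061 is raised before any TLS handshake takes place, so the certificate is unlikely to be the cause at this stage; a wrong address, a blocked port, or a service that is not listening is more likely. Note also that 'https://localhost:9200' refers to the machine running the Python code, so if the script runs on your local machine while Elasticsearch runs on the Azure VM, the URL needs the VM's address instead.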
Here are some things you could try to troubleshoot the issue:
Check that Elasticsearch is running and listening on the correct host and port. You can use the curl command to test the connection:
curl -k https://localhost:9200
If Elasticsearch is running, you should see a JSON response with information about the cluster.
Check that the SSL/TLS certificate is valid and trusted by the Python client. You can use the openssl command to check the certificate:
openssl x509 -in http_ca.crt -text -noout
This will display detailed information about the certificate. Make sure that the Issuer and Subject fields match and that the Validity dates are correct.
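Check that the firewall on the Azure VM is not blocking incoming traffic on port 9200, which is the default port for the Elasticsearch HTTP API (5601, the port mentioned in the question, is Kibana's default). You can use the ufw command to check the firewall rules: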
sudo ufw status
If port 9200 is not listed as "ALLOW", you can add a new rule:
sudo ufw allow 9200/tcp
Check that the Python client is using the correct ca_certs file. Make sure that the CA_CERT variable in your code points to the correct file location.
Check the Elasticsearch logs for any error messages that might indicate the cause of the connection problem. The logs are usually located in the logs directory of the Elasticsearch installation.
Hopefully, one of these steps will help you resolve the issue. Good luck!
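Before working through the checks above, it may be worth confirming which machine the connection URL actually targets. This is a small sketch using only the standard library; points_to_this_machine is a hypothetical helper name, and it only performs an IPv4 lookup.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

def points_to_this_machine(url):
    """Return True if the URL's host resolves to a loopback address."""
    host = urlsplit(url).hostname
    address = socket.gethostbyname(host)  # IPv4 lookup only
    return ipaddress.ip_address(address).is_loopback
```

If this returns True for the URL passed to Elasticsearch() while the server runs on a remote VM, the client is trying to reach a service on the local machine, which would explain a refused connection.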

Python API Max retries exceeded [duplicate]

File "C:\Python27\lib\socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
gaierror: [Errno 11004] getaddrinfo failed
Getting this error when launching the hello world sample from here:
http://bottlepy.org/docs/dev/
It most likely means the hostname can't be resolved.
import socket
socket.getaddrinfo('localhost', 8080)
If it doesn't work there, it's not going to work in the Bottle example. You can try '127.0.0.1' instead of 'localhost' in case that's the problem.
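The same check can be wrapped so that a failed lookup is reported instead of raising. This is only a sketch; the helper name resolve is made up for the example.

```python
import socket

def resolve(host, port):
    """Return the sorted addresses that host resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # This is the "getaddrinfo failed" case from the traceback.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})
```

Comparing resolve('localhost', 8080) with resolve('127.0.0.1', 8080) shows whether the name itself is the problem: an empty list for the first and a result for the second points at the hosts file or DNS configuration.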
The problem, in my case, was that some install at some point defined an environment variable http_proxy on my machine when I had no proxy.
Removing the http_proxy environment variable fixed the problem.
The problem in my case was that I needed to add environment variables for http_proxy and https_proxy.
E.g.,
http_proxy=http://your_proxy:your_port
https_proxy=https://your_proxy:your_port
To set these environment variables in Windows, see the answers to this question.
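Since the two answers above point in opposite directions (one removes the variable, the other adds it), a reasonable first step is to see what is currently set. A minimal sketch, where proxy_settings and PROXY_VARS are names invented for this example:

```python
import os

# Both spellings are listed because tools differ in which one they read.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")

def proxy_settings(environ=os.environ):
    """Return the proxy-related variables that are set and non-empty."""
    return {name: environ[name] for name in PROXY_VARS if environ.get(name)}
```

Calling proxy_settings() with no arguments inspects the current process environment. The standard library's urllib.request.getproxies() gives a related view that also takes system settings into account on some platforms.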
Make sure you pass the proxy option in your command, for example:
pip install --proxy=http://proxyhost:proxyport pixiedust
Use a proxy port that allows a direct connection (with or without a password); your corporate IT administrator can tell you which one. A quick way to find it is to look at the network settings of an application that already connects successfully, such as Eclipse.
You will encounter this issue often if you work behind a corporate firewall. Check the settings under Internet Explorer > Internet Options > Connections > LAN settings:
Uncheck "Use automatic configuration script".
Check "Use a proxy server for your LAN" and make sure the address and port are correct.
Click OK.
Return to the Anaconda terminal and retry the install commands.
Maybe this will help someone. I had the proxy set up in my Python script but kept getting the error mentioned in the question.
Below is the block that installs the proxy; my username and password are defined as constants at the beginning of the script.
import urllib.request as req  # assumption: the original does not show how req is imported
if use_proxy:
    # Route HTTPS traffic through the proxy and install the opener globally
    proxy = req.ProxyHandler({'https': proxy_url})
    auth = req.HTTPBasicAuthHandler()
    opener = req.build_opener(proxy, auth, req.HTTPHandler)
    req.install_opener(opener)
If you are using a corporate laptop and are not connected to DirectAccess or the office VPN, the above block will throw the error. All you need to do is connect to your organization's VPN and then run your Python script.
Thanks
I spent a good few hours on this, but the solution turned out to be really simple. My FTP server address started with ftp://. I removed the prefix and the code started working.
FTP address before:
ftp_css_address = "ftp://science-xyz.xyz.xyz.int"
Changed it to:
ftp_css_address = "science-xyz.xyz.xyz.int"
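The fix above works because ftplib.FTP expects a bare hostname rather than a URL, so the ftp:// prefix ends up being treated as part of the name to resolve. If addresses may arrive in either form, they can be normalized first. A small sketch, with ftp_host as a hypothetical helper name:

```python
from urllib.parse import urlsplit

def ftp_host(address):
    """Return the bare hostname from an address with or without a scheme."""
    # urlsplit only treats the text as a host when it follows "//"
    if "://" not in address:
        address = "//" + address
    return urlsplit(address).hostname
```

Both ftp_host("ftp://science-xyz.xyz.xyz.int") and ftp_host("science-xyz.xyz.xyz.int") return the plain hostname, which can then be passed to ftplib.FTP.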

Python SSL: CERTIFICATE_VERIFY_FAILED

I'm getting an error when connecting to www.mydomain.com using Python 2.7.12, on a fairly new machine that runs Windows 8.1. The error is SSL: CERTIFICATE_VERIFY_FAILED on the ssl_sock.connect line of the code below. The code wraps an SSL connection in a context and specifies that I don't want to carry out certificate verification:
ssl._create_default_https_context = ssl._create_unverified_context
s_ = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
context.verify_mode = ssl.CERT_NONE
context.check_hostname = True
context.load_default_certs()
ssl_sock = context.wrap_socket(s_, server_hostname=myurl)
ssl_sock.connect((myurl, int(myportno)))
I've tried adding the plain text version of the security certificate from the server I'm trying to connect to, to the default certificate file that Python references - that didn't work (in any case, it doesn't make sense that I should need to do this)
When I browse to the domain I'm trying to connect to, the browser also doesn't trust the remote server certificate, however I've examined the certificate that's bound to the domain and it's validating fine. What could be causing the mistrust? (I'm currently investigating removal of a Windows security patch from the machine where I'm getting the error, to see if that could be the cause)
(this issue has occurred on other computers using the same code, however it seems to resolve after Windows retrieves a full set of updates. The machine where the problem is persisting also has a full set of updates however)
I resolved this issue, which seems to be related to a Windows security update released after August 29th, 2016 that causes problems with certificate verification when using the TLS 1.0 protocol. Re-installing Windows without the security update at least allows things to work for now. I also didn't get this issue when running under Windows 10.
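As a side note on the snippet in the question: it sets verify_mode to CERT_NONE but check_hostname to True, and hostname checking depends on certificate verification being enabled, so the two settings pull in opposite directions. If the intent really is to skip verification while debugging, a consistent context could be built roughly as follows. This is a sketch, not the asker's code, and unverified_context is a made-up name.

```python
import ssl

def unverified_context():
    """Build a client context that skips certificate and hostname checks."""
    context = ssl.create_default_context()
    # Order matters: hostname checking must be off before verification is.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context
```

The returned context can be used with wrap_socket() in the same way as the one in the question. It also lets the library negotiate the protocol version instead of pinning TLS 1.0.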

Python requests - get request to secured resource - SSL3_GET_RECORD:decryption failed or bad record mac

I am using python-requests to perform get requests to some resources.
In the staging and production environments things work fine, but in the test environment, which has a slightly different setup, I receive the message below when trying to perform the request:
requests.exceptions.SSLError: [Errno 1] _ssl.c:510: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac
I have tried using an adapter as specified here: SSL Error on Python GET Request
I have tried all the protocols in the list. Still no luck.
I have tried mounting both the complete url and the domain url. No difference.
Any ideas on what to try next?
