Unzipping tokenizers\punkt.zip in nltk.download('punkt') - python

I have integrated nltk into my Python project, but after installing punkt with nltk.download('punkt')
it is showing
Unzipping tokenizers\punkt.zip.
I have checked the nltk_data download location for confirmation, but nothing happened.

Your question is not clear, but try restarting your terminal and running the following commands:
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')

Related

SSL: CERTIFICATE_VERIFY_FAILED in python while installing package [duplicate]

I get the following error when trying to install Punkt for nltk:
nltk.download('punkt')
[nltk_data] Error loading Punkt: <urlopen error [SSL:
[nltk_data] CERTIFICATE_VERIFY_FAILED] certificate verify failed
[nltk_data] (_ssl.c:590)>
False
TLDR: Here is a better solution: https://github.com/gunthercox/ChatterBot/issues/930#issuecomment-322111087
Note that when you run nltk.download() without arguments, a window will pop up and let you select which packages to download (the download does not start automatically).
To complement the accepted answer, the following is a complete list of directories that will be searched on Mac (not limited to the one mentioned in the accepted answer):
- '/Users/YOUR_USERNAME/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/Users/YOUR_USERNAME/YOUR_VIRTUAL_ENV_DIRECTORY/nltk_data'
- '/Users/YOUR_USERNAME/YOUR_VIRTUAL_ENV_DIRECTORY/share/nltk_data'
- '/Users/YOUR_USERNAME/YOUR_VIRTUAL_ENV_DIRECTORY/lib/nltk_data'
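As a rough illustration, the list above can be rebuilt from two inputs: the home directory and the active environment's prefix (sys.prefix, which points at the virtual environment when one is active). This is a sketch that mirrors the list, not NLTK's own code, and candidate_nltk_data_dirs and existing_dirs are hypothetical helper names. Inside Python, import nltk; print(nltk.data.path) shows the exact directories your installation searches.

```python
import os
import sys


def candidate_nltk_data_dirs(home=None, prefix=None):
    """Approximate the macOS search list quoted above (illustrative only)."""
    home = home or os.path.expanduser("~")
    prefix = prefix or sys.prefix  # the virtualenv directory when one is active
    return [
        os.path.join(home, "nltk_data"),
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
        "/usr/lib/nltk_data",
        "/usr/local/lib/nltk_data",
        os.path.join(prefix, "nltk_data"),
        os.path.join(prefix, "share", "nltk_data"),
        os.path.join(prefix, "lib", "nltk_data"),
    ]


def existing_dirs(paths):
    """Keep only the candidate directories that exist on this machine."""
    return [p for p in paths if os.path.isdir(p)]


# Print the candidates that exist here (may print nothing).
for path in existing_dirs(candidate_nltk_data_dirs()):
    print(path)
```

If none of the printed folders contains tokenizers/punkt, the data was downloaded somewhere NLTK does not look.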
In case the link above dies, here is the solution pasted in its entirety:
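import nltk
import ssl

try:
    # Older Pythons lack this helper; skip the workaround there.
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    # Make urllib (used by the NLTK downloader) skip certificate verification.
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download()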
Run the above code in your favourite Python IDE or via the command line.
This works by disabling the SSL certificate check.
Run the Python interpreter and type the commands:
import nltk
nltk.download()
From here: http://www.nltk.org/data.html
If you get an SSL/certificate error, run the following command:
bash "/Applications/Python 3.6/Install Certificates.command"
From here: ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)
Search for 'Install Certificates.command' in Finder and open it.
Then run the following in the terminal:
python3
import nltk
nltk.download()
The downloader script is broken. As a temporary workaround, you can manually download the punkt tokenizer from here and then place the unzipped folder in the corresponding location. The default folders for each OS are:
Windows: C:\nltk_data\tokenizers
OSX: /usr/local/share/nltk_data/tokenizers
Unix: /usr/share/nltk_data/tokenizers
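The three defaults above can be captured in a small lookup keyed by platform.system() names ('Windows', 'Darwin' for OSX, 'Linux' for Unix). This is a minimal sketch: the helper name default_tokenizer_dir is hypothetical, and treating any other system as Unix is an assumption, not something the answer states.

```python
import platform

# Default tokenizer folders quoted in the answer above.
DEFAULT_TOKENIZER_DIRS = {
    "Windows": r"C:\nltk_data\tokenizers",
    "Darwin": "/usr/local/share/nltk_data/tokenizers",  # OSX
    "Linux": "/usr/share/nltk_data/tokenizers",  # Unix
}


def default_tokenizer_dir(system=None):
    """Return where the unzipped punkt folder would go on this OS.

    Assumption: any system not listed is treated like Unix/Linux.
    """
    system = system or platform.system()
    return DEFAULT_TOKENIZER_DIRS.get(system, DEFAULT_TOKENIZER_DIRS["Linux"])


print(default_tokenizer_dir())
```

These are system-wide locations, so copying files there may require administrator rights.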
This is how I solved it on macOS.
Initially, after installing nltk, I was getting the SSL error.
Solution:
Go to the Python folder:
cd /Applications/Python\ 3.8
Run the command
./Install\ Certificates.command
Now if you try again, it should work!
Thanks a lot to this article!
You just need to install the certificates with this simple step:
In the Python application folder, double-click the file 'Install Certificates.command'.
This opens a terminal window and automatically installs the certificates for you. Close the window and try again.
My solution:
Download punkt.zip from here and unzip it.
Create the folders nltk_data/tokenizers under your home folder.
Put the punkt folder inside the tokenizers folder.
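The same three steps can be scripted with the standard library. This is a sketch under two assumptions: the archive was already downloaded to a local path, and it contains a top-level punkt/ folder (as the answers here describe). install_downloaded_zip is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def install_downloaded_zip(zip_path, category="tokenizers", data_root=None):
    """Unzip a manually downloaded NLTK package into <data_root>/<category>/.

    data_root defaults to ~/nltk_data, the home-folder location used above.
    """
    root = Path(data_root) if data_root else Path.home() / "nltk_data"
    target = root / category
    target.mkdir(parents=True, exist_ok=True)  # e.g. ~/nltk_data/tokenizers
    with zipfile.ZipFile(zip_path) as zf:
        # Assumed: the archive holds a top-level punkt/ folder.
        zf.extractall(target)
    return target


# Usage (hypothetical path): install_downloaded_zip("punkt.zip")
```

Only use this with archives from the official nltk_data site, since extractall writes whatever paths the archive contains.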
For anyone else coming across this problem recently (e.g. in 2019) on macOS, there is a very simple fix, described in the official bug report at https://bugs.python.org/issue28150:
...there is a simple double-clickable or command-line-runnable script ("/Applications/Python 3.6/Install Certificates.command") that does two things: 1. uses pip to install certifi and 2. creates a symlink in the OpenSSL directory to certifi's installed bundle location.
Simply running the "Install Certificates.command" script worked for me on macOS (10.15 beta as of this writing), and I was off and running.
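To see which CA bundle your Python's OpenSSL expects before and after running the script, ssl.get_default_verify_paths() from the standard library reports it. This is a diagnostic sketch; describe_ca_setup is a hypothetical name, and reading a missing openssl_cafile as "the script has not been run" is an interpretation of the bug report above, not a guarantee.

```python
import os
import ssl


def describe_ca_setup():
    """Report where this Python's OpenSSL looks for CA certificates by default."""
    paths = ssl.get_default_verify_paths()
    return {
        # Compiled-in CA bundle location inside the OpenSSL directory.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_cafile_exists": os.path.exists(paths.openssl_cafile or ""),
        # Environment variable that can override it (usually SSL_CERT_FILE).
        "override_var": paths.openssl_cafile_env,
        "override_value": os.environ.get(paths.openssl_cafile_env),
    }


for key, value in describe_ca_setup().items():
    print(f"{key}: {value}")
```

If openssl_cafile_exists is False and no override is set, certificate verification failures like the one in the question are to be expected.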
Nothing else worked for me, so I navigated via the GUI to the Python 3.7 folder and opened the 'Certificates.command' file in Terminal; the SSL issue was immediately resolved.
A bit late to the party but I just entered Certificates.command into Spotlight which found it and ran it. All fixed in seconds.
I'm running macOS Catalina with Python 3.7 installed by Homebrew.
The error means that you are not using HTTPS consistently with the other runtime dependencies of Python, etc.
If you are using Linux (Ubuntu), running
~$ sudo apt-get install ca-certificates
should solve the issue.
If you are running this in a script built from a Dockerfile, make sure the Dockerfile installs the ca-certificates package.
For Mac users,
just copy and paste the following into the terminal:
/Applications/Python\ 3.10/Install\ Certificates.command ; exit;
First go to /Applications/Python 3.6/ and run
Install Certificates.command
You will need admin rights for this.
If you are unable to download it, then, as other answers suggest, you can download the data directly and place it yourself. It needs to go in the following directory structure:
> nltk_data
    > corpora
        > brown
        > conll2000
        > movie_reviews
        > wordnet
    > taggers
        > averaged_perceptron_tagger
    > tokenizers
        > punkt
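After copying the folders by hand, a quick check can list which of the packages in this tree are still missing. This sketch only looks for unzipped folders laid out as shown above; missing_packages and EXPECTED_LAYOUT are hypothetical names.

```python
from pathlib import Path

# Layout from the tree above: category folder -> package folders.
EXPECTED_LAYOUT = {
    "corpora": ["brown", "conll2000", "movie_reviews", "wordnet"],
    "taggers": ["averaged_perceptron_tagger"],
    "tokenizers": ["punkt"],
}


def missing_packages(nltk_data_root, layout=EXPECTED_LAYOUT):
    """Return 'category/package' entries not present as folders under the root."""
    root = Path(nltk_data_root)
    return [
        f"{category}/{package}"
        for category, packages in layout.items()
        for package in packages
        if not (root / category / package).is_dir()
    ]


# Usage: print(missing_packages(Path.home() / "nltk_data"))
```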
Updating the python certificates worked for me.
At the top of your script, keep:
import nltk
nltk.download('punkt')
In a separate terminal, run (Mac):
bash "/Applications/Python <version>/Install Certificates.command"
For me, the solution was much simpler: I was still connected to my corporate network/VPN which blocks certain types of downloads. Switching the network made the SSL error disappear.

Not able to install nltk supporting packages and getting noaddrinfo error with nltk.download()

After importing nltk on my system, I tried to download the packages using nltk.download(),
but it shows the following error.
I have tried changing the proxy, but it's not working (my laptop is not using any proxy, but it is connected to a VPN).
I tried to manually download required packages from http://www.nltk.org/nltk_data/ but still not going anywhere.
I have also tried every solution from "error installing nltk supporting packages : nltk.download()", but the problem remains the same.

NLTK not getting imported in VS Code

I have just started learning NLP, and for that purpose I installed the nltk package using pip install nltk in the cmd terminal of VS Code. After installing it, I tried importing it in the command line itself and succeeded, but in the main editor window where we write the code, I tried from nltk.tokenize import sent_tokenize and it showed the following error:
Then I tried a plain import nltk, but that didn't work either, and it showed the following error:
I also tried restarting VS Code, but it was all in vain. How can I resolve this issue?

What should I do to fix the error when importing nltk, given that I have already installed it?

I have installed nltk via pip, but it's not working, and I need help figuring out what's wrong.
It shows this error:
Please use the NLTK Downloader to obtain the resource
How can I solve this?
You might want to try this.
import nltk
nltk.download()
Installing nltk gives you access to the code, but nltk relies on internal datasets, which you need to download separately.
You need to run nltk.download() to get access to them.
More information here.
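Since the data lives separately from the code, one common pattern is to download a package only when NLTK cannot already find it. This is a sketch, not part of NLTK: ensure_resource is a hypothetical helper built on nltk.data.find (which raises LookupError for a missing resource) and nltk.download (which returns False when a download fails, as in the SSL question above). The find and download parameters exist only so the logic can be exercised without network access.

```python
def ensure_resource(resource_path, package_id, find=None, download=None):
    """Download an NLTK data package only if its resource cannot be found.

    Returns True if a download was needed and succeeded; False if the
    resource was already present or the download failed.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)  # raises LookupError when the resource is missing
        return False
    except LookupError:
        return bool(download(package_id))


# Usage: ensure_resource("tokenizers/punkt", "punkt")
```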

downloading error using nltk.download()

I am experimenting with the NLTK package in Python. I tried to download NLTK data using nltk.download() and got this error message. How can I solve this problem? Thanks.
The system I used is Ubuntu installed under VMware. The IDE is Spyder.
After running nltk.download('all'), it downloads some packages, but it gives an error message when downloading oanc_masc.
To download a particular dataset or model, use the nltk.download() function; e.g., if you are looking to download the punkt sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
If you're unsure of which data/model you need, you can start out with the basic list of data + models with:
>>> import nltk
>>> nltk.download('popular')
It will download a list of "popular" resources.
Make sure you have the latest version of NLTK, since it is constantly improving and actively maintained:
$ pip install --upgrade nltk
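To confirm which NLTK version the current interpreter actually sees (useful when pip and your IDE point at different Pythons), the standard library's importlib.metadata (Python 3.8+) can report it. A small sketch; installed_version is a hypothetical name.

```python
from importlib import metadata


def installed_version(distribution="nltk"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("nltk", installed_version() or "is not installed in this interpreter")
```

Compare the printed version with the one pip reports after upgrading; if they differ, pip and the script are using different interpreters.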
EDITED
If you want to avoid errors when downloading the larger datasets from nltk, here is a workaround from https://stackoverflow.com/a/38135306/610569:
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')
And if anyone wants to find nltk_data directory, see https://stackoverflow.com/a/36383314/610569
And to config nltk_data path, see https://stackoverflow.com/a/22987374/610569
From command line, after importing nltk, try
nltk.download('popular', halt_on_error=False)
After an error, it will ask whether to retry the broken package; just decline with n and it will continue with the remaining packages.
a) On OS X, either run
sudo /Applications/Python\ 3.6/Install\ Certificates.command
b) switch to admin user (the one you have set up with administrator privileges)
and type at command line:
/Applications/Python\ 3.6/Install\ Certificates.command
Notes:
The "\" characters are necessary because they escape the spaces in the file names.
This procedure works if you have Python 3.6 installed; otherwise, change the path to match your installed Python version. To find it, run:
ls /Applications
and look at the name of the Python directory there.
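Instead of reading the ls output by eye, a short glob can list every "Install Certificates.command" under /Applications, one per installed "Python 3.x" folder. This sketch assumes the python.org installer layout described in these answers; find_install_certificates is a hypothetical name.

```python
import glob
import os


def find_install_certificates(applications_dir="/Applications"):
    """List 'Install Certificates.command' scripts in 'Python *' folders."""
    pattern = os.path.join(
        glob.escape(applications_dir), "Python *", "Install Certificates.command"
    )
    return sorted(glob.glob(pattern))


for script in find_install_certificates():
    print(script)  # run it with: bash "<printed path>"
```

An empty result suggests Python was not installed from the python.org installer (for example, a Homebrew install), so the script is not there.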
An easy (if laborious) way to get past this error is to do the process manually. Just go to https://www.nltk.org/nltk_data/, download the required zip file, and extract its contents.
On Windows, go to user/AppData/local/Programs/Python/Python(version)/lib and create a folder named nltk_data. Then create the matching category folder inside it. For example, for 'punkt', create a folder named tokenizers and copy the 'punkt' folder from the extracted archive into it. The error message in the terminal usually lists the locations NLTK searched.
Run your program. Cheers!
EDIT 1: Of course, downloading all files manually can be time-consuming, but it's the only option if the "urlopen error" persists.
EDIT 2: Often it is your router or network that prevents you from downloading the NLTK files. Try switching networks; that should help.
I had this error:
Resource punkt not found. Please use the NLTK Downloader to obtain the resource: import nltk nltk.download('punkt')
When I tried to solve by writing:
import nltk
nltk.download()
my computer shut down suddenly and Anaconda also closed. Now when I try to open it, it always shows an error.
I solved the problem by writing:
import nltk
nltk.download('punkt')
