downloading error using nltk.download()


I am experimenting with the NLTK package in Python. I tried to download the NLTK data using nltk.download() and got this kind of error message. How can I solve this problem? Thanks.
The system I use is Ubuntu running under VMware. The IDE is Spyder.
With nltk.download('all'), some packages download, but I get an error message when it reaches oanc_masc.

To download a particular dataset/models, use the nltk.download() function, e.g. if you are looking to download the punkt sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
If you're unsure of which data/model you need, you can start out with the basic list of data + models with:
>>> import nltk
>>> nltk.download('popular')
It will download a list of "popular" resources.
Make sure you have the latest version of NLTK, because it is actively maintained and constantly improving:
$ pip install --upgrade nltk
EDITED
In case anyone is trying to avoid errors when downloading larger datasets from nltk, from https://stackoverflow.com/a/38135306/610569
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as already installed.
>>> dler.download('popular')
And if anyone wants to find nltk_data directory, see https://stackoverflow.com/a/36383314/610569
And to config nltk_data path, see https://stackoverflow.com/a/22987374/610569

From the Python command line, after importing nltk, try:
nltk.download('popular', halt_on_error=False)
After an error it will ask whether to retry the broken package; just decline with n and it will continue with the remaining packages.
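A minimal sketch of the same idea done one package at a time, so that a single broken package does not stop the rest. The helper name download_each and the package names in the usage comment are illustrative, not part of NLTK; the sketch only assumes that the downloader callable returns a truthy value on success, as nltk.download() does.

```python
def download_each(names, download):
    """Try each package separately and return the names that failed.

    `download` is any callable that takes a package name and returns a
    truthy value on success, for example nltk.download.
    """
    failed = []
    for name in names:
        try:
            ok = download(name)
        except Exception:  # network or SSL errors raised by the downloader
            ok = False
        if not ok:
            failed.append(name)
    return failed


# Usage (requires nltk and network access):
#   import nltk
#   failed = download_each(["punkt", "stopwords", "wordnet"], nltk.download)
#   print("failed:", failed)
```

Anything returned in the list can then be retried later or installed manually.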

a) On OSX, either run
sudo /Applications/Python\ 3.6/Install\ Certificates.command
b) or switch to an admin user (the one you have set up with administrator privileges)
and type at the command line:
/Applications/Python\ 3.6/Install\ Certificates.command
Notes:
The "\" characters are necessary because they escape the spaces in the file names.
This procedure works if you have Python 3.6 installed; otherwise, change the path to match your installed Python version. To find it, execute:
ls /Applications
and look at the name of the Python directory you have there.

An easy (if tedious) way to get past this error is to do the process manually. Go to the website https://www.nltk.org/nltk_data/, download the required zip file and extract the contents.
In Windows, go to user/AppData/local/Programs/Python/Python(version)/lib and create a folder nltk_data. Then create the folder for the resource's category inside it. For example, for 'punkt', create the folder tokenizers and copy the 'punkt' folder from the extracted archive into it. The error message in the terminal usually lists the locations that are searched.
Run your program. Cheers!
EDIT 1: Of course, downloading all the files by hand can be time-consuming, but it's the only option if the "urlopen error" persists.
EDIT 2: Often it is your router or network that prevents the nltk files from downloading. Try changing your network; that should help.

I had this error:
Resource punkt not found. Please use the NLTK Downloader to obtain the resource: import nltk nltk.download('punkt')
When I tried to solve by writing:
import nltk
nltk.download()
my computer shut down suddenly and Anaconda also closed. Now when I try to open it, it always shows an error.
I solved the problem by writing:
import nltk
nltk.download('punkt')

Related

SSL: CERTIFICATE_VERIFY_FAILED in python while installing package [duplicate]

I get the following error when trying to install Punkt for nltk:
nltk.download('punkt')
[nltk_data] Error loading Punkt: <urlopen error [SSL:
[nltk_data] CERTIFICATE_VERIFY_FAILED] certificate verify failed
[nltk_data] (_ssl.c:590)>
False

TLDR: Here is a better solution: https://github.com/gunthercox/ChatterBot/issues/930#issuecomment-322111087
Note that when you run nltk.download(), a window will pop up and let you select which packages to download (Download is not automatically started right away).
To complement the accepted answer, the following is a complete list of directories that will be searched on Mac (not limited to the one mentioned in the accepted answer):
- '/Users/YOUR_USERNAME/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/Users/YOUR_USERNAME/YOUR_VIRTUAL_ENV_DIRECTORY/nltk_data'
- '/Users/YOUR_USERNAME/YOUR_VIRTUAL_ENV_DIRECTORY/share/nltk_data'
- '/Users/YOUR_USERNAME/YOUR_VIRTUAL_ENV_DIRECTORY/lib/nltk_data'
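The exact list depends on the installation; NLTK exposes the directories it searches as nltk.data.path. Below is a small sketch, using only the standard library, that filters any such list down to the directories that actually exist. The name existing_data_dirs is an illustrative helper, not an NLTK function.

```python
import os


def existing_data_dirs(candidates):
    """Return the candidate directories that exist, keeping search order."""
    expanded = [os.path.expanduser(path) for path in candidates]
    return [path for path in expanded if os.path.isdir(path)]


# Usage (requires nltk):
#   import nltk
#   print(existing_data_dirs(nltk.data.path))
```

If the result is empty, no nltk_data folder has been created yet in any searched location.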
In case the link above dies, here is the solution pasted in its entirety:
import nltk
import ssl
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
nltk.download()
Run the above code in your favourite Python IDE or via the command line.

This works by disabling SSL check!
import nltk
import ssl
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
nltk.download()

Run the Python interpreter and type the commands:
import nltk
nltk.download()
from here: http://www.nltk.org/data.html
If you get an SSL/certificate error, run the following command (the quotes are needed because the path contains spaces):
bash "/Applications/Python 3.6/Install Certificates.command"
from here: ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)

Search for 'Install Certificates.command' in Finder and open it.
Then do the following steps in the terminal:
python3
import nltk
nltk.download()

The downloader script is broken. As a temporary workaround, you can manually download the punkt tokenizer from here and then place the unzipped folder in the corresponding location. The default folders for each OS are:
Windows: C:\nltk_data\tokenizers
OSX: /usr/local/share/nltk_data/tokenizers
Unix: /usr/share/nltk_data/tokenizers

This is how I solved it for MAC OS.
Initially after installing nltk, I was getting the SSL error.
Solution:
Go to
cd /Applications/Python\ 3.8
Run the command
./Install\ Certificates.command
Now if you try again, it should work!
Thanks a lot to this article!

You just need to install the certificates with this simple step:
In the Python application folder, double-click the file 'Install Certificates.command'.
A terminal window will open and install the certificates for you automatically; close this window and try again.

My solution is:
Download punkt.zip from here and unzip
Create nltk_data/tokenizers folders under home folder
Put punkt folder under tokenizers folder
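The same steps can be sketched in Python. This assumes the downloaded archive contains a single top-level folder named after the resource (for example punkt/) and that the data belongs under nltk_data in the home folder; install_resource_zip is an illustrative helper, not part of NLTK.

```python
import zipfile
from pathlib import Path


def install_resource_zip(zip_path, category, data_dir=None):
    """Extract a manually downloaded zip into <data_dir>/<category>/.

    data_dir defaults to ~/nltk_data; category is e.g. "tokenizers".
    """
    data_dir = Path(data_dir) if data_dir else Path.home() / "nltk_data"
    target = data_dir / category
    target.mkdir(parents=True, exist_ok=True)  # create nltk_data/tokenizers
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)  # yields nltk_data/tokenizers/punkt/...
    return target


# Usage, after downloading punkt.zip by hand:
#   install_resource_zip("punkt.zip", "tokenizers")
```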

There is a very simple way to fix all of this as written in the formal bug report for anyone else coming across this problem recently (e.g. 2019) and using MacOS. From the bug report at https://bugs.python.org/issue28150:
...there is a simple double-clickable or command-line-runnable script ("/Applications/Python 3.6/Install Certificates.command") that does two things: 1. uses pip to install certifi and 2. creates a symlink in the OpenSSL directory to certifi's installed bundle location.
Simply running the "Install Certificates.command" script worked for me on MacOS (10.15 beta as of this writing) and I was off and running.
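To see whether a given Python build can find a CA bundle at all, the standard library's ssl.get_default_verify_paths() reports where OpenSSL looks. A rough diagnostic sketch follows; the helper name ca_bundle_status and the dictionary keys are illustrative choices.

```python
import ssl


def ca_bundle_status():
    """Report where this Python's OpenSSL looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        # Compiled-in default location of the CA file.
        "openssl_cafile": paths.openssl_cafile,
        # Resolved CA file / directory, or None when it does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        "has_bundle": paths.cafile is not None or paths.capath is not None,
    }


print(ca_bundle_status())
```

If neither cafile nor capath resolves, a missing certificate bundle is a plausible cause of CERTIFICATE_VERIFY_FAILED, which is the situation the script described above is meant to repair.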

My solution after nothing worked. I navigated, via the GUI to the Python 3.7 folder, opened the 'Certificates.command' file in terminal and the SSL issue was immediately resolved.

A bit late to the party but I just entered Certificates.command into Spotlight which found it and ran it. All fixed in seconds.
I'm running mac Catalina and using python 3.7 installed by Homebrew

It means that HTTPS certificate verification is not set up consistently with Python's other runtime dependencies, e.g. the system CA certificates are missing.
If you are using Linux (Ubuntu)
~$ sudo apt-get install ca-certificates
Should solve the issue.
If you are running this in a script built from a Dockerfile, make sure the ca-certificates package is installed in your Dockerfile.

For mac users,
just copy paste the following in the terminal:
/Applications/Python\ 3.10/Install\ Certificates.command ; exit;

First go to the path /Applications/Python 3.6/ and run
Install Certificates.command
You will need admin rights for this.
If you are still unable to download, then, as other answers suggest, you can download the files directly and put them in place. You need to place them in the following directory structure.
> nltk_data
  > corpora
    > brown
    > conll2000
    > movie_reviews
    > wordnet
  > taggers
    > averaged_perceptron_tagger
  > tokenizers
    > punkt
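A quick way to check that a manually populated folder matches this layout is sketched below. The EXPECTED mapping simply mirrors the structure listed above, and missing_resources is an illustrative helper; it only looks for unpacked folders.

```python
from pathlib import Path

# Mirrors the directory structure listed above.
EXPECTED = {
    "corpora": ["brown", "conll2000", "movie_reviews", "wordnet"],
    "taggers": ["averaged_perceptron_tagger"],
    "tokenizers": ["punkt"],
}


def missing_resources(data_dir, expected=EXPECTED):
    """Return 'category/name' entries that are absent under data_dir."""
    root = Path(data_dir)
    return [
        f"{category}/{name}"
        for category, names in expected.items()
        for name in names
        if not (root / category / name).exists()
    ]


# Usage:
#   print(missing_resources(Path.home() / "nltk_data"))
```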

Updating the python certificates worked for me.
At the top of your script, keep:
import nltk
nltk.download('punkt')
In a separate terminal run (Mac):
bash "/Applications/Python <version>/Install Certificates.command"

For me, the solution was much simpler: I was still connected to my corporate network/VPN which blocks certain types of downloads. Switching the network made the SSL error disappear.

Biopython is successfully installed in anaconda 3 but the module fails to be imported

A variation of this problem was asked before but I'm forced to ask because the solutions given there didn't work for me.
I'm using Jupyter in Anaconda 3. First I installed biopython using !pip install biopython. It was successfully installed, but when I tried to 'import Bio' it returned a ModuleNotFoundError.
Later I used conda install -c anaconda jupyter and conda install -c anaconda biopython in the Anaconda prompt to install biopython. They were successfully installed but the same problem remains.
However, if I type 'import bio' with a small b, then the module error doesn't show up, but I still can't call any function within the module. Here is an example of the error I'm facing. This is my very first post, so I don't have enough reputation to embed images. I will post external links here.
https://imgur.com/yydzI0y
So I checked whether the folder name in the directory was uppercase or lowercase. It was uppercase, so I thought maybe it should be lowercase and changed it. Still the same problem.
Here is my PATH and it seems to include the anaconda directory.
https://imgur.com/v7VeC1f
I really need to use biopython, so please help.

Bio is the correct reference to use.
Try this:
First ensure the name of the package folder is Bio as this is the correct name
dir C:\Users\Asus\anaconda3\Lib\site-packages\*io*
...
06/10/2020 01:12 PM <DIR> Bio
...
Then try this code:
import Bio
from Bio.Blast import NCBIWWW
f=open('smn.fasta').read()
result=NCBIWWW.qblast("blastn","nt",f)
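When a package installs successfully but still fails to import, it can help to check which interpreter the notebook is running and where (if anywhere) that interpreter finds the module. A small sketch using only the standard library; locate_module is an illustrative name.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


print(sys.executable)        # the interpreter actually running this code
print(locate_module("Bio"))  # None means this interpreter cannot see Bio
```

If the interpreter printed here is not the one inside the Anaconda environment where biopython was installed, the package was installed for a different Python.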

AWS Lambda -- Unable to import srsly.ujson.ujson for SpaCy

I am trying to add SpaCy as a dependency to my Python Lambda. I am doing this by installing SpaCy as a standalone dependency inside a directory named dependencies using pip3 install spacy --no-deps -t . I do it this way because I can't load the entire SpaCy dependency inside the /tmp directory of my Lambda.
I am able to successfully upload the folder to S3 and download it during the Lambda invocation. When I try to import spacy, I get this error: [ERROR] Runtime.ImportModuleError: Unable to import module : No module named 'srsly.ujson.ujson'.
I manually installed srsly inside dependencies\ and I have all the files that are listed as per this link. This was referenced by this link. One of the responses says, "it seems like Python can't load it, because it's not compiled?". How would I compile a dependency which has a .c file in it?
One other question which I found on SO is this question, but I have already manually installed srsly. How do I import the module? Thanks.
I manually check in my code for the presence of ujson before importing spacy like this:
if os.path.exists('/tmp/dependencies/srsly/ujson/ujson.c'):
    print('ujson exists')
and the print statement gets printed.
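Note that the check above only confirms that the C source file is present; Python cannot import a .c file directly and needs a compiled extension built for the running interpreter and platform. A rough sketch of a check for such a binary, using the suffixes the current interpreter accepts (has_compiled_extension is an illustrative helper; the path is the one from the question):

```python
import importlib.machinery
from pathlib import Path


def has_compiled_extension(package_dir, module_name):
    """True if package_dir holds a binary build of module_name for this Python."""
    folder = Path(package_dir)
    # EXTENSION_SUFFIXES lists endings such as '.so' or '.pyd' that this
    # interpreter is able to load as extension modules.
    return any(
        (folder / (module_name + suffix)).exists()
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
    )


print(has_compiled_extension("/tmp/dependencies/srsly/ujson", "ujson"))
```

If this prints False inside the Lambda, the package was probably installed without a binary that matches the Lambda runtime.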

For me, uninstalling srsly with pip and installing it again worked fine. Sometimes it's just a compatibility issue with your Python version, so make sure matching Python and srsly versions are installed.

Well, it is a bit strange, but my solution for this problem was to create an additional "ujson" folder inside the srsly folder and then move all the generated ujson code into that new "ujson" folder.

How to install a module on Python?

Okay, so, I'm actually a beginner in programming Python, and I only found out yesterday that you are supposed to enter pip install ModuleName in the command line and not in the interactive shell. I'm trying to download a lot of modules, such as the Send2Trash module, Pyperclip, Requests, Beautiful Soup, and Selenium.
Before I checked the forums about installing modules, I found out how we needed to have the pip tool. I'm a Windows user, but for some reason, I didn't have the 'Scripts' folder installed when I downloaded Python. I didn't know we needed it, so I used raw scripts from GitHub, setup.py, and copy pasted the script into the File Editor in Python, ran it in the interactive shell, and tried to import the module I needed. It worked for the Pyperclip and the Requests module; no errors popped up after I imported them using import pyperclip or import requests, but when I tried the same procedure for the rest of the modules I needed, there were some errors.
Also, when I tried to download the modules on pypi.python.org, I tried to open it using the interactive shell, but then something pops up, 'The file's encoding is invalid for Python3.x...', and when I click 'OK', it's going to say 'Failed to Decode', and close everything.
So, after reading forum after forum, I found out how to download pip, and was also able to download setuptools and wheel. I'm not sure if it's really already downloaded, but I was able to get the 'Scripts' folder that wasn't there before, so I guess so. I also already went into my PATH using the edit environment for your account thing, and I edited the Path variable so its value would lead to my 'Scripts' folder. Please do tell me if I did the right thing here.
So, following the advice of the forums, I tried to install the modules I needed by typing pip install ModuleName in the Python command line instead of the interactive shell, but it still gave me a Syntax Error. I also tried it in Command Prompt, typing the same code pip install ModuleName, but when I pressed Enter, nothing happened; no errors or anything. It seemed like my install was accepted, but when I tried importing the module in the interactive shell, it still gave an Import Error.
Please tell me what I did wrong throughout my process, and how to properly install the modules I need. I would include pictures into this, but it seems I can only add two before my reputation becomes 10, and I'm pretty new here, so... If there's anything I need to elaborate on about my problem, don't hesitate to ask, and I'll try my best.

You say you use Windows, so you need to understand pip.
pip is a program that installs Python modules. You can even use easy_install instead of pip.
Some pip commands:
pip list -- lists the modules already installed.
pip search <module name> -- searches for new modules (no longer supported by PyPI).
pip -h -- shows the other pip commands.
pip installs modules from the CMD prompt, not from the Python shell.
Even after installation, some modules cannot be used with a plain import module;
their functions need to be imported as from module import function.
Refer to the pip help command and install your modules.
DO NOT SAVE SCRIPT FILES IN THE PYTHON ROOT FOLDER; YOU MAY FACE SOME PROBLEMS.
Happy Programming!!!

After a whole lot of searching and trying out, I found the solution to my problem. For future Python users who encounter the same thing: always install your modules in the root folder.
In my case, my Command Prompt was automatically inside the C:\Users folder, which caused some problems because I couldn't download my module in there. Once I typed in cd C:\Python34, which was my root folder, I could successfully download the modules I needed using pip install ModuleName.
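An approach that does not depend on the current folder or on PATH is to run pip through the interpreter itself (python -m pip), which installs into the same Python that will later import the module. A minimal sketch that builds such a command; pip_install_command is an illustrative helper and pyperclip is just the example package from the question.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


print(" ".join(pip_install_command("pyperclip")))

# To actually run it (requires network access):
#   import subprocess
#   subprocess.check_call(pip_install_command("pyperclip"))
```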

How to solve "Unresolved import: HTML" in Python Development?

I'm starting to learn Python development in a new project.
I got almost everything set up right; only this import HTML keeps giving me an error that I don't know how to solve.
import web
import json
from WebServer.forms import mainPageForm, addBugForm, addProblemForm, addProblemTypeForm, versionsDropdownForm,\
severitiesDropdownForm, problemTypesDropdownForm, problemsDropdownForm
import BugRecorderCore.controller as ctrl
import BugRecorderCore.validators as vdt
import datetime
import os
from BugRecorderCore.utils import concatenateString
import HTML
//...
I already tried to install HTML.py, but still no success so far.
Any ideas or advice about this issue?
UPDATE
Following the suggestions from the answers below I got this message:

It looks like you are using Anaconda; have you tried installing it the Anaconda way?
conda install HTML
Also, do you by any chance have two versions of Python on your system?
If the package is unavailable, you'll have to use pip. If you don't have pip, run this from your command line:
python get-pip.py
pip install HTML

Looking at the given screenshots and tags, I suppose you are using Anaconda (which I have no experience with, but it is still Python anyway) and the IDE is not resolving the import.
Make sure you have installed/updated HTML.py:
conda install HTML
In your IDE, go to Window > Preferences > Python Interpreter.
On the Libraries tab, make sure you have the following folders added to your PYTHONPATH:
C:\Anaconda\Lib
C:\Anaconda\Lib\site-packages
C:\Anaconda\Scripts
That should do the trick.
Important: always try to install your libraries through conda (or pip when using Python directly). It will install things where they should be. ;)
