I'm trying to load the English model for StanfordNLP (Python) from my local machine, but am unable to find the proper import statements to do so. What commands can be used? Is there a pip installation available to load the English model?
I have tried using the download command to do so; however, my machine requires all files to be added locally. I downloaded the English jar files from https://stanfordnlp.github.io/CoreNLP/ but am unsure if I need both the English and the English KBP versions.
directory set for model download is /home/sf
pip install stanfordnlp  # install stanfordnlp
import stanfordnlp
stanfordnlp.download("en")  # after answering 'Y' here you can set a custom directory path
local_dir_store_model = "/home/sf"
english_model_dir = "/home/sf/en_ewt_models"
tokenizer_en_pt_file = "/home/sf/en_ewt_models/en_ewt_tokenizer.pt"
nlp = stanfordnlp.Pipeline(models_dir=local_dir_store_model, processors='tokenize,mwt,lemma,pos')
doc = nlp("""One of the most wonderful things in life is to wake up and enjoy a cuddle with somebody; unless you are in prison""")
doc.sentences[0].print_tokens()
I am not quite clear on what you want to do.
If you want to run the all-Python pipeline, you can download the files and run them in Python code by specifying the paths for each annotator as in this example.
import stanfordnlp
config = {
    'processors': 'tokenize,mwt,pos,lemma,depparse',  # Comma-separated list of processors to use
    'lang': 'fr',  # Language code for the language to build the Pipeline in
    'tokenize_model_path': './fr_gsd_models/fr_gsd_tokenizer.pt',  # Processor-specific arguments are set with keys "{processor_name}_{argument_name}"
    'mwt_model_path': './fr_gsd_models/fr_gsd_mwt_expander.pt',
    'pos_model_path': './fr_gsd_models/fr_gsd_tagger.pt',
    'pos_pretrain_path': './fr_gsd_models/fr_gsd.pretrain.pt',
    'lemma_model_path': './fr_gsd_models/fr_gsd_lemmatizer.pt',
    'depparse_model_path': './fr_gsd_models/fr_gsd_parser.pt',
    'depparse_pretrain_path': './fr_gsd_models/fr_gsd.pretrain.pt'
}
nlp = stanfordnlp.Pipeline(**config) # Initialize the pipeline using a configuration dict
doc = nlp("Van Gogh grandit au sein d'une famille de l'ancienne bourgeoisie.") # Run the pipeline on input text
doc.sentences[0].print_tokens()
If you want to run the Java server with the Python interface, you need to download the Java jar files and start the server. Full info here: https://stanfordnlp.github.io/CoreNLP/corenlp-server.html
Then you can access the server with the Python interface. Full info here: https://stanfordnlp.github.io/stanfordnlp/corenlp_client.html
But just to be clear, the jar files should not be used with the pure Python pipeline. Those are for running the Java server.
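For that route, a minimal sketch using the CoreNLPClient from the stanfordnlp package might look like this; the CORENLP_HOME path below is just an example, so point it at wherever you unzipped the CoreNLP jars:
import os
from stanfordnlp.server import CoreNLPClient

# Tell the client where the CoreNLP jars live (path is an example)
os.environ['CORENLP_HOME'] = '/home/sf/stanford-corenlp-full-2018-10-05'

# The client starts the Java server, annotates the text, and stops the server on exit
with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'lemma'], timeout=30000, memory='4G') as client:
    ann = client.annotate('One of the most wonderful things in life is to wake up and enjoy a cuddle with somebody.')
    print(ann.sentence[0].token[0].word)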
Related
I have installed the NLTK package and other dependencies and set the environment variables as follows:
STANFORD_MODELS=/mnt/d/stanford-ner/stanford-ner-2018-10-16/classifiers/english.all.3class.distsim.crf.ser.gz:/mnt/d/stanford-ner/stanford-ner-2018-10-16/classifiers/english.muc.7class.distsim.crf.ser.gz:/mnt/d/stanford-ner/stanford-ner-2018-10-16/classifiers/english.conll.4class.distsim.crf.ser.gz
CLASSPATH=/mnt/d/stanford-ner/stanford-ner-2018-10-16/stanford-ner.jar
When I try to access the classifier as shown below:
import os
from nltk.tag import StanfordNERTagger

stanford_classifier = os.environ.get('STANFORD_MODELS').split(':')[0]
stanford_ner_path = os.environ.get('CLASSPATH').split(':')[0]
st = StanfordNERTagger(stanford_classifier, stanford_ner_path, encoding='utf-8')
I get the following error, but I don't understand what is causing it.
Error: Could not find or load main class edu.stanford.nlp.ie.crf.CRFClassifier
OSError: Java command failed : ['/mnt/c/Program Files (x86)/Common Files/Oracle/Java/javapath_target_1133041234/java.exe', '-mx1000m', '-cp', '/mnt/d/stanford-ner/stanford-ner-2018-10-16/stanford-ner.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier', '-loadClassifier', '/mnt/d/stanford-ner/stanford-ner-2018-10-16/classifiers/english.all.3class.distsim.crf.ser.gz', '-textFile', '/tmp/tmpaiqclf_d', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf8']
I found the answer to this issue. I am using NLTK == 3.4. From NLTK == 3.3 onward, the Stanford NLP tools (POS, NER, tokenizer) are no longer loaded from nltk.tag but through nltk.parse.corenlp.CoreNLPParser. The Stack Overflow answer is available at stackoverflow.com/questions/13883277/stanford-parser-and-nltk/… and the GitHub link for the official documentation is github.com/nltk/nltk/wiki/Stanford-CoreNLP-API-in-NLTK.
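For reference, here is a minimal sketch of NER tagging through the new interface. It assumes a CoreNLP server is already running locally on port 9000, e.g. started from the CoreNLP distribution directory with java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000; the sample sentence is just an illustration.
from nltk.parse.corenlp import CoreNLPParser

# Connect to the running CoreNLP server and ask for NER tags
ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')

tokens = 'Stanford University is located in California'.split()
print(ner_tagger.tag(tokens))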
Additional information: if you are facing a timeout issue from the NER tagger or any other CoreNLP API parser, increase the timeout limit as described in https://github.com/nltk/nltk/wiki/Stanford-CoreNLP-API-in-NLTK/_compare/3d64e56bede5e6d93502360f2fcd286b633cbdb9...f33be8b06094dae21f1437a6cb634f86ad7d83f7 by dimazest.
Dear all, I am running the first Azure tutorial for the MNIST dataset.
It says that utils.py should be in the same folder as the code. I tried to install python-utils in my conda environment, but that did not solve the problem. After using pip install utils I only made things worse :-(
It is probably simple but I am stuck.
How would you do that in a notebook running:
locally
in an Azure notebook
I use Anaconda with a separate environment running the Azure SDK and Python 3.6.
According to your description, I think the first Azure tutorial for the MNIST dataset is Tutorial: Train an image classification model with Azure Machine Learning service.
You can find all of the source code via the link inside the tutorial, as quoted below.
Get the notebook
For your convenience, this tutorial is available as a Jupyter notebook. Run the tutorials/img-classification-part1-training.ipynb notebook either in Azure Notebooks or in your own Jupyter notebook server.
Here is the source code of utils.py.
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
import gzip
import numpy as np
import struct
# load compressed MNIST gz files and return numpy arrays
def load_data(filename, label=False):
    with gzip.open(filename) as gz:
        struct.unpack('I', gz.read(4))
        n_items = struct.unpack('>I', gz.read(4))
        if not label:
            n_rows = struct.unpack('>I', gz.read(4))[0]
            n_cols = struct.unpack('>I', gz.read(4))[0]
            res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
            res = res.reshape(n_items[0], n_rows * n_cols)
        else:
            res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
            res = res.reshape(n_items[0], 1)
    return res

# one-hot encode a 1-D array
def one_hot_encode(array, num_of_classes):
    return np.eye(num_of_classes)[array.reshape(-1)]
If you want to import it in an Azure Jupyter Notebook, please see my steps below.
Move into your project page, click the New button, and select Blank File.
Then name the file utils.py and press the Enter key.
Select the file and click Edit File.
Copy and paste the content of utils.py from the tutorial Github repo, and click Save File.
Create a notebook to test import utils; it works.
So the comment # make sure utils.py is in the same directory as this code simply means that utils.py must sit in the same folder as the notebook, as set up in the steps above.
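Once utils.py sits next to your notebook, a minimal usage sketch might look like the following; the .gz file names under data/ are placeholders for wherever you downloaded the MNIST files:
from utils import load_data, one_hot_encode

# Paths below are placeholders; point them at your downloaded MNIST files
X_train = load_data('data/train-images.gz', label=False) / 255.0
y_train = load_data('data/train-labels.gz', label=True).reshape(-1)

print(X_train.shape, y_train.shape)
print(one_hot_encode(y_train[:5], 10))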
In Python 3.5, to download some papers from PubMed based on their DOI, I have been using this project on GitHub: https://github.com/antiufo/scihub.py
First of all I installed all of the packages, and then I copied this class https://github.com/antiufo/scihub.py/blob/master/scihub/scihub.py into my project. After that, in a new file alongside scihub.py, I created an object of the SciHub class for fetching and downloading a paper by its DOI, as below:
In this link: https://www.ncbi.nlm.nih.gov/pubmed/28440475 the DOI is 10.3892/or.2017.5600, and that is the paper I want to download.
from scihub import SciHub
sh = SciHub()
result = sh.fetch(identifier='10.3892/or.2017.5600')
print(result)
result = sh.download(identifier='10.3892/or.2017.5600', destination=r'D:\me', path='myPdfFile.pdf')
but nothing happened.
How can I solve this issue?
I've tried it directly in the terminal, using the command
python2 scihub.py -d https://doi.org/10.1103/PhysRevE.76.036111
In this case I have to put https://doi.org/ before the DOI, and it works perfectly. If you use the SciHub API as a package, you could try using the identifier with the https://doi.org/ prefix.
I hope this helps.
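For example, a quick sketch of the same idea with the package interface; the destination path and file name are just placeholders, and the download arguments follow the ones used in the question:
from scihub import SciHub

sh = SciHub()
# Prefix the bare DOI with https://doi.org/ before passing it as the identifier
result = sh.download(identifier='https://doi.org/10.3892/or.2017.5600',
                     destination=r'D:\me', path='myPdfFile.pdf')
print(result)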
I have installed nlpnet (http://nilc.icmc.usp.br/nlpnet/), but I can't locate the metadata_pos.pickle file it needs to run a part-of-speech tagger. This file does not appear to be on my machine, and is not included in the current GitHub repository.
Any suggestions?
You need to download the nlpnet data (models for POS, SRL and dependency parsing). It is available at http://nilc.icmc.usp.br/nlpnet/models.html. The POS model file metadata_pos.pickle is included in http://nilc.icmc.usp.br/nlpnet/data/pos-pt.tgz
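A quick sketch of fetching and unpacking that archive from Python; the target folder './nlpnet-data' is just an example:
import tarfile
import urllib.request

# Download the Portuguese POS model archive and extract it
url = 'http://nilc.icmc.usp.br/nlpnet/data/pos-pt.tgz'
urllib.request.urlretrieve(url, 'pos-pt.tgz')
with tarfile.open('pos-pt.tgz', 'r:gz') as tar:
    tar.extractall('./nlpnet-data')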
You need to download the models from this page http://nilc.icmc.usp.br/nlpnet/models.html (either POS or SRL)
Decompress the file into some folder, let's say '/Users/Downloads', then set the data directory in your code like this:
import nlpnet
nlpnet.set_data_dir('/Users/Downloads/pos-pt')
# Now you can start using it
tagger = nlpnet.POSTagger()
op = tagger.tag('texto em portugues')
To train the model, you'll need examples with one sentence per line, having tokens and tags concatenated by an underscore character:
This_DT is_VBZ an_DT example_NN
Using this command with your corpus, you'll generate the data needed to use the POS tagger (including metadata_pos.pickle):
nlpnet-train.py pos --gold /path/to/training-data.txt
If you want to use an already trained model, they have one here. It was trained and evaluated on the Mac-Morpho corpus, a Brazilian Portuguese news corpus, so it probably won't work with other languages.
I have been trying to figure out how to write my own bake script for Substance materials to files in Maya, or to find some documentation that gives me the commands and the format they should be used in. Has anyone made a script using the Substance commands that I could look at for reference? All I have found is this list of commands in the Substance plugin information:
sbs_IsSubstanceRelocalized()
sbs_SetBakeFormat()
sbs_GetGlobalTextureHeight()
sbs_GetEditionModeScale()
sbs_GetChannelsNamesFromSubstanceNode()
sbs_AffectTheseAttributes()
sbs_GetSubstanceBuildVersion()
sbs_SetEditionModeScale()
sbs_GetBakeFormat()
sbs_GetEngine()
sbs_GetGlobalTextureWidth()
sbs_GoToMarketPlace()
sbs_GetGraphsNamesFromSubstanceNode()
sbs_GetAllInputsFromSubstanceNode()
sbs_AffectedByAllInputs()
sbs_EditSubstance()
sbs_GetPackageFullPathNameFromSubstanceNode()
sbs_SetGlobalTextureWidth()
sbs_SetEngine()
sbs_SetGlobalTextureHeight()
Please help!
Here is a short script I use to load a substance in MEL:
// load the plugin (the plugin name should be "substance" on Windows)
loadPlugin "libSubstance";
// create sbs node
string $sbsnode = `shadingNode -asTexture substance`;
// load sbsar with absolute path
setAttr ($sbsnode+".package") -type "string" "/usr/autodesk/maya/substance/substances/Aircraft_Metal.sbsar";
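If you prefer Python over MEL, the same load-and-assign steps via maya.cmds might look like the sketch below; the plugin name and the .sbsar path are assumptions that depend on your platform and install location. The sbs_* bake commands from the list above are exposed as MEL commands, and their exact arguments still need to be checked against the plugin documentation, so treat this as a starting point only.
import maya.cmds as cmds

# Load the Substance plugin (name may be "substance" or "libSubstance" depending on platform)
cmds.loadPlugin('libSubstance')

# Create the substance texture node
sbs_node = cmds.shadingNode('substance', asTexture=True)

# Point it at an .sbsar package (absolute path; this one is just an example)
cmds.setAttr(sbs_node + '.package',
             '/usr/autodesk/maya/substance/substances/Aircraft_Metal.sbsar',
             type='string')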