I followed this tutorial to do sentiment inference. I could run the code with a list of sentences (in the section Aspect Sentiment Inference of the Colab Notebook). However, I don't know how to modify the following code (in the section Batch Sentiment Inference) to infer sentiment for a file of my own (containing just two lines, each with two sentences).
# inference_sets = ABSADatasetList.Phone  # original code
inference_sets = 'test.dat.apc'  # my own file; I want to infer the sentiment of each sentence
results = sent_classifier.batch_infer(target_file=inference_sets,
                                      print_result=True,
                                      save_result=True,
                                      ignore_error=False,
                                      )
Running the modified code caused the following error:
RuntimeError Traceback (most recent call last)
Input In [56], in <cell line: 2>()
1 test = 'test.dat.apc'
----> 2 results = sent_classifier.batch_infer(target_file=test,
3 print_result=False,
4 save_result=True,
5 ignore_error=False,
6 )
File ~\Anaconda3\envs\spacy\lib\site-packages\pyabsa-1.16.15-py3.9.egg\pyabsa\core\apc\prediction\sentiment_classifier.py:197, in SentimentClassifier.batch_infer(self, target_file, print_result, save_result, ignore_error, clear_input_samples)
193 self.clear_input_samples()
195 save_path = os.path.join(os.getcwd(), 'apc_inference.result.json')
--> 197 target_file = detect_infer_dataset(target_file, task='apc')
198 if not target_file:
199 raise FileNotFoundError('Can not find inference datasets!')
File ~\Anaconda3\envs\spacy\lib\site-packages\pyabsa-1.16.15-py3.9.egg\pyabsa\functional\dataset\dataset_manager.py:302, in detect_infer_dataset(dataset_path, task)
300 if os.path.isdir(dataset_path.dataset_name):
301 print('No inference set found from: {}, unrecognized files: {}'.format(dataset_path, ', '.join(os.listdir(dataset_path.dataset_name))))
--> 302 raise RuntimeError(
303 'Fail to locate dataset: {}. If you are using your own dataset, you may need rename your dataset according to {}'.format(
304 dataset_path,
305 'https://github.com/yangheng95/ABSADatasets#important-rename-your-dataset-filename-before-use-it-in-pyabsa')
306 )
307 if len(dataset_path) > 1:
308 print(colored('Please DO NOT mix datasets with different sentiment labels for training & inference !', 'yellow'))
RuntimeError: Fail to locate dataset: ['test.dat.apc']. If you are using your own dataset, you may need rename your dataset according to https://github.com/yangheng95/ABSADatasets#important-rename-your-dataset-filename-before-use-it-in-pyabsa
What did I do wrong?
You need to rename your dataset file so that its name ends with .inference.
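For example (a minimal sketch; the new filename is only illustrative and assumes test.dat.apc already follows the APC inference line format):

import os

# Give the file a name PyABSA's dataset detector recognises
# (any name ending in .inference works; this one is illustrative).
os.rename('test.dat.apc', 'test.dat.apc.inference')

results = sent_classifier.batch_infer(target_file='test.dat.apc.inference',
                                      print_result=True,
                                      save_result=True,
                                      ignore_error=False)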
I installed esmpy using conda install -c conda-forge esmpy but am unable to get it to create a mesh from an existing NetCDF file using something like this:
mesh = ESMF.Mesh(filename='myfile.nc', filetype=ESMF.FileFormat.ESMFMESH)
My input file is the output from the CAM-SE global atmospheric model at ne120 resolution. This model returns unstructured output. I get an error message saying that NetCDF should be built as a shared library. I know the NetCDF libraries exist because I use xarray all the time to read and process them. But how does one link those libraries during the installation step for esmpy using conda? Do I need to build esmpy from source to be able to do this?
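For reference, a quick way to check whether the installed esmpy build reports NetCDF support (a sketch only; _ESMF_NETCDF is the internal flag that appears in the traceback below):

from ESMF.api.constants import _ESMF_NETCDF  # internal flag, see traceback below

# False means the underlying ESMF library was built without NetCDF support,
# and Mesh(filename=..., filetype=...) will refuse to read the file.
print(_ESMF_NETCDF)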
Update: I have added the full traceback below. I installed netcdf using conda-forge within the conda environment I was using, and it appears that was not the source of the error, as the error message remains unchanged. I am now wondering whether my mesh-generation command will even work when fed a CAM-SE output file directly. The file does not really have any information about the number of elements, etc. Also, should I rename the dimensions to some common form expected by ESMF? How will ESMF know which dimension represents the number of nodes, etc.? Here is the list of dimensions from one of the output files (followed by the traceback):
dimensions:
time = UNLIMITED ; // (292 currently)
lev = 30 ;
ilev = 31 ;
ncol = 777602 ;
nbnd = 2 ;
chars = 8 ;
string1 = 1 ;
----------Traceback------------------------
ArgumentError Traceback (most recent call last)
Input In [13], in <cell line: 5>()
----> 5 grid = ESMF.Mesh(filename='B1850.ne120_t12.cam.h2.0338-01-01-21600.nc',
filetype=ESMF.FileFormat.ESMFMESH)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/util/decorators.py:81, in initialize.<locals>.new_func(*args, **kwargs)
78 from ESMF.api import esmpymanager
80 esmp = esmpymanager.Manager(debug = False)
---> 81 return func(*args, **kwargs)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/api/mesh.py:198, in Mesh.__init__(self, parametric_dim, spatial_dim, coord_sys, filename, filetype, convert_to_dual, add_user_area, meshname, mask_flag, varname)
195 self._coord_sys = coord_sys
196 else:
197 # call into ctypes layer
--> 198 self._struct = ESMP_MeshCreateFromFile(filename, filetype,
199 convert_to_dual,
200 add_user_area, meshname,
201 mask_flag, varname)
202 # get the sizes
203 self._size[node] = ESMP_MeshGetLocalNodeCount(self)
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/util/decorators.py:93, in netcdf.<locals>.new_func(*args, **kwargs)
90 from ESMF.api.constants import _ESMF_NETCDF
92 if _ESMF_NETCDF:
---> 93 return func(*args, **kwargs)
94 else:
95 raise NetCDFMissing("This function requires ESMF to have been built with NetCDF.")
File /software/conda/envs/dask_Jul23_2022/lib/python3.10/site-packages/ESMF/interface/cbindings.py:1218, in ESMP_MeshCreateFromFile(filename, fileTypeFlag, convertToDual, addUserArea, meshname, maskFlag, varname)
1197 """
1198 Preconditions: ESMP has been initialized.\n
1199 Postconditions: An ESMP_Mesh has been created.\n
(...)
1215 string (optional) :: varname\n
1216 """
1217 lrc = ct.c_int(0)
-> 1218 mesh = _ESMF.ESMC_MeshCreateFromFile(filename, fileTypeFlag,
1219 convertToDual, addUserArea,
1220 meshname, maskFlag, varname,
1221 ct.byref(lrc))
1222 rc = lrc.value
1223 if rc != constants._ESMP_SUCCESS:
ArgumentError: argument 1: <class 'AttributeError'>: 'list' object has no attribute 'encode'
I was just trying to run a fake news detection program.
This is my code (only the part with the error):
#DataFlair - Initialize a TfidfVectorizer
tfidf_vectorizer=TfidfVectorizer(stop_words='english', max_df=0.7)
#DataFlair - Fit and transform train set, transform test set
tfidf_train=tfidf_vectorizer.fit_transform(x_train)
tfidf_test=tfidf_vectorizer.transform(x_test)
and I am getting this error:
ValueError Traceback (most recent call last)
<ipython-input-19-bd6e732b0b7b> in <module>()
3
4 #DataFlair - Fit and transform train set, transform test set
----> 5 tfidf_train=tfidf_vectorizer.fit_transform(x_train)
6 tfidf_test=tfidf_vectorizer.transform(x_test)
4 frames
/usr/local/lib/python3.7/dist-packages/sklearn/feature_extraction/text.py in decode(self, doc)
225 if doc is np.nan:
226 raise ValueError(
--> 227 "np.nan is an invalid document, expected byte or unicode string."
228 )
229
ValueError: np.nan is an invalid document, expected byte or unicode string.
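One common cause (a sketch, not necessarily the tutorial's intended fix): the data contains rows with missing text, which pandas loads as NaN, and the vectorizer rejects them. Assuming x_train and x_test are pandas Series, the missing entries can be replaced with empty strings before vectorizing:

# Replace missing documents with empty strings so every entry handed to the
# vectorizer is a string (this keeps the rows aligned with the labels).
x_train = x_train.fillna('')
x_test = x_test.fillna('')

tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)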
I use the pyLDAvis library to visualize my LDA topics. It worked fine before, but I get this error when I download the saved model files from SageMaker to my local computer. I don't know why this happens. Is it related to SageMaker?
If I run everything locally, save the model locally, and then run pyLDAvis, it works fine.
KeyError Traceback (most recent call last)
in ()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyLDAvis\gensim.py in prepare(topic_model, corpus, dictionary, doc_topic_dist, **kwargs)
116 See pyLDAvis.prepare for **kwargs.
117 """
--> 118 opts = fp.merge(_extract_data(topic_model, corpus, dictionary, doc_topic_dist), kwargs)
119 return vis_prepare(**opts)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyLDAvis\gensim.py in _extract_data(topic_model, corpus, dictionary, doc_topic_dists)
46 gamma = topic_model.inference(corpus)
47 else:
---> 48 gamma, _ = topic_model.inference(corpus)
49 doc_topic_dists = gamma / gamma.sum(axis=1)[:, None]
50 else:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\ldamodel.py in inference(self, chunk, collect_sstats)
665 # phinorm is the normalizer.
666 # TODO treat zeros explicitly, instead of adding epsilon?
--> 667 eps = DTYPE_TO_EPS[self.dtype]
668 phinorm = np.dot(expElogthetad, expElogbetad) + eps
669
KeyError: dtype('float32')
I know this is late, but I just fixed a similar problem by updating my gensim library from 3.4 to the current version, which for me is 3.8.
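The exact command depends on how the environment is managed; a pip sketch for a notebook cell:

!pip install --upgrade gensim   # or the equivalent conda command

After upgrading, restart the kernel and load the model again before calling pyLDAvis.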
I am studying the Google Machine Learning Crash Course.
I am having trouble with the chapter “Feature Crosses”:
https://developers.google.com/machine-learning/crash-course/feature-crosses/programming-exercise
I tried to get the weights of the crossed feature from linear_regressor.
# here I change _ to linear_model
linear_model = train_model(
    learning_rate=1.0,
    steps=500,
    batch_size=100,
    feature_columns=construct_feature_columns(),
    training_examples=training_examples,
    training_targets=training_targets,
    validation_examples=validation_examples,
    validation_targets=validation_targets)
Weight_bucketized_longitude= linear_model.get_variable_value('linear/linear_model/bucketized_longitude/weights')
print(Weight_bucketized_longitude)
However, I got the error message below:
Error Message:
NotFoundError: Key linear/linear_model/bucketized_longitude/weights
not found in checkpoint
It looks like the path is wrong.
The path works for numeric_column, but it doesn’t for bucketized_column.
Could you point me to the correct path?
Thanks.
Update: I tried Geeocode's method, but I still got an error message:
Weight_bucketized_longitude= linear_model.get_variable_value('linear/linear_model/bucketized_longitude/weights')
AttributeError Traceback (most recent call last)
in ()
----> 1 Weight_bucketized_longitude= linear_model.get_variable_value(["linear", "linear_model", "bucketized_longitude", "weights"])
/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.pyc in get_variable_value(self, name)
252 _check_checkpoint_available(self.model_dir)
253 with context.graph_mode():
--> 254 return training.load_variable(self.model_dir, name)
255
256 def get_variable_names(self):
/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/checkpoint_utils.pyc in load_variable(ckpt_dir_or_file, name)
77 """
78 # TODO(b/29227106): Fix this in the right place and remove this.
---> 79 if name.endswith(":0"):
80 name = name[:-2]
81 reader = load_checkpoint(ckpt_dir_or_file)
AttributeError: 'list' object has no attribute 'endswith'
The problem is that linear_model.get_variable_value() has to be passed a list of strings giving the variable's name. From the documentation:
get_variable_value

get_variable_value(name)

Returns value of the variable given by name.

Args:
    name: string or a list of string, name of the tensor.

Returns:
    Numpy array - value of the tensor.

Raises:
    ValueError: If the Estimator has not produced a checkpoint yet.
Thus your code should change as follows:
Weight_bucketized_longitude= linear_model.get_variable_value(["linear", "linear_model", "bucketized_longitude", "weights"])
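If that still raises an error (as in the update above), one way to track down the exact checkpoint key is to list the variable names first; get_variable_names() is the Estimator method visible in the traceback above, and the substring filter below is only illustrative:

# Print every variable stored in the checkpoint and keep the ones that
# mention the bucketized longitude column.
for name in linear_model.get_variable_names():
    if 'bucketized_longitude' in name:
        print(name)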
I am trying to convert files from one format to another in Python. The current format is DAQ (data acquisition format), which is read in first. Then I use the undaqTools module to write the files to HDF5 format.
import glob
ctnames = glob.glob('*.daq')
Following are a few of the filenames (there are 100 in total):
ctnames
['Cars_20160601_01.daq',
'Cars_20160601_02.daq',
'Cars_20160601_03.daq',
'Cars_20160601_04.daq',
'Cars_20160601_05.daq',
'Cars_20160601_06.daq',
'Cars_20160601_07.daq',
.
.
.
## Importing undaqTools:
from undaqTools import Daq

daq = Daq()  # create a Daq instance (needed before calling daq.read below)
Reading the DAQ files and writing them to HDF5:
for n in ctnames:
    x = daq.read(n)
    daq.write_hd5(x)
Following is the error I got:
C:\Anaconda3\envs\py27\lib\site-packages\undaqtools-0.2.3-py2.7.egg\undaqTools\daq.py:405: RuntimeWarning: Failed loading file on frame 46970. (stopped reading file)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-17-6fe7a8c9496d> in <module>()
1 for n in ctnames:
----> 2 x = daq.read(n)
3 daq.write_hd5(x)
C:\Anaconda3\envs\py27\lib\site-packages\undaqtools-0.2.3-py2.7.egg\undaqTools\daq.pyc in read_daq(self, filename, elemlist, loaddata, process_dynobjs, interpolate_missing_frames)
272
273 if loaddata:
--> 274 self._loaddata()
275 self._unwrap_lane_deviation()
276
C:\Anaconda3\envs\py27\lib\site-packages\undaqtools-0.2.3-py2.7.egg\undaqTools\daq.pyc in _loaddata(self)
449 assert tmpdata[name].shape[0] == frame.frame.shape[0]
450 else:
--> 451 assert tmpdata[name].shape[1] == frame.frame.shape[0]
452
453 # cast as Element objects
AssertionError:
Questions
I have 2 questions:
1. How do I know which of the 100 files is throwing the error?
2. How do I skip the files if they throw the error?
Wrap the read() call in a try/except block. If you get an exception, print the current filename and skip to the next one.
for n in ctnames:
    try:
        x = daq.read(n)
    except AssertionError:
        print 'Could not process file %s. Skipping.' % n
        continue
    daq.write_hd5(x)
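A slight variant (just a sketch) that also answers the first question by collecting the names of the files that fail, so you can inspect them afterwards:

failed = []

for n in ctnames:
    try:
        x = daq.read(n)
    except AssertionError:
        failed.append(n)  # remember which file could not be parsed
        continue
    daq.write_hd5(x)

print 'Files that failed to load: %s' % ', '.join(failed)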