Unable to serialize dictionary to a JSON file in Python

Apologies, I'm pretty new to Python and I'm not 100% sure why this is failing, since all the example code I see is really similar.
import io
import json
import argparse
from object_detection.helpers import average_bbox

ap = argparse.ArgumentParser()
ap.add_argument("-o","--output",required=True,help="Output file name.")
ap.add_argument("-c","--class",help="Object class name")
ap.add_argument("-a","--annotations",required=True,help="File path annotations are located in")
args = vars(ap.parse_args())

(avgW,avgH) = average_bbox(args["annotations"])

if args["class"] is None:
    name = args["annotations"].split("/")[-1]
else:
    name = args["class"]

with io.open(args["output"],'w') as f:
    o = {}
    o["class"] = name
    o["avgWidth"] = avgW
    o["avgHeight"] = avgH
    f.write(json.dumps(o,f))
name, avgW and avgH all hold valid values: avgW and avgH are numbers and name is a string. The output argument looks like a valid path for creating a file.
The error I get is:
Traceback (most recent call last):
File "compute_average_bbox.py", line 19, in <module>
with io.open(argparse["output"],'w') as f:
TypeError: 'module' object has no attribute '__getitem__'
Any help would be appreciated.
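Note that the traceback references io.open(argparse["output"],'w'), while the code above uses args["output"]; subscripting the argparse module itself is what raises TypeError: 'module' object has no attribute '__getitem__'. As a minimal sketch (not from the original post), assuming avgW and avgH are plain numbers and name is a string as stated, the write step could use the parsed args dict and json.dump directly:
import json

o = {"class": name, "avgWidth": avgW, "avgHeight": avgH}

# Use the parsed-arguments dict (args), not the argparse module, and let
# json.dump write straight to the file handle.
with open(args["output"], "w") as f:
    json.dump(o, f)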

Related

Python: create jsonpickle from a class and unpack, error AttributeError: type object 'files' has no attribute 'decode'

So, I have a class that I use in a Flask app. I use this class on multiple pages, which is why I would like to save the created class object with jsonpickle and unpack it when I need it again. It just keeps giving me errors. I have a class that looks similar to this:
class files(name):
    def __init__(self, name):
        self.name = name
        self.settings = Settings()
        self.files_directory = self.settings.files_directory
        self.files = self.create_list()

    def store_files_from_folder(self):
        loaded_files = []
        files = list_files()
        for file in files:
            file_path = os.path.join(self.files_directory, file)
            print('Loading file: {}'.format(file))
            loaded_file = function_reads_in_files_from_folder(file_path, self.name)
            loaded_files.append(loaded_file)
        print('Loaded {} files'.format(len(loaded_files)))
and I'm trying to create the jsonpickle like this:
creates_class = files("Mario")
jsonpickle_test = jsonpickle.encode(creates_class, unpicklable=False)
result = jsonpickle.decode(jsonpickle_test, files)
But I get the following error:
Traceback (most recent call last):
File "C:\Users\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-8-23e9b5d176ac>", line 1, in <module>
result = jsonpickle.decode(jsonpickle_test, files)
File "C:\Users\lib\site-packages\jsonpickle\unpickler.py", line 41, in decode
data = backend.decode(string)
AttributeError: type object 'files' has no attribute 'decode'
And I can't seem to resolve it. Could someone help me?
The problem is in the passed argument unpicklable=False:
unpicklable – If set to False then the output will not contain the information necessary to turn the JSON data back into Python objects, but a simpler JSON stream is produced.
You can avoid unpicklable=False, or load the produced data with json.loads into a dict and then use the kwargs arguments for object creation:
import json

creates_class = files("Mario")
jsonpickle_test = jsonpickle.encode(creates_class, unpicklable=False)
result_dict = json.loads(jsonpickle_test)
create_class = files(**result_dict)
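The first option mentioned above (leaving unpicklable at its default of True) might look like the following sketch; this is not from the original answer, and it assumes the attributes set in __init__ are themselves serializable by jsonpickle:
import jsonpickle

creates_class = files("Mario")
# With unpicklable left at its default (True), the payload keeps the type
# information, so decode() can rebuild a files instance without extra arguments.
payload = jsonpickle.encode(creates_class)
restored = jsonpickle.decode(payload)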

Python OpenCV LoadDatasetList, what goes into last two parameters?

I am currently trying to train a dataset using OpenCV 4.2.2. I scoured the web, but there are only examples with 2 parameters. OpenCV 4.2.2's loadDatasetList requires 4 parameters, and I did my best to work around that with the following. I tried with an array at first, but loadDatasetList complained that the array was not iterable, so I then proceeded to the code below, with no luck. Any help is appreciated; thank you for your time, and I hope everyone is staying safe and well.
The prior error, passing in an array without iter():
PS E:\MTCNN> python kazemi-train.py
No valid input file was given, please check the given filename.
Traceback (most recent call last):
File "kazemi-train.py", line 35, in
status, images_train, landmarks_train = cv2.face.loadDatasetList(args.training_images,args.training_annotations, imageFiles, annotationFiles)
TypeError: cannot unpack non-iterable bool object
The current error is:
PS E:\MTCNN> python kazemi-train.py
Traceback (most recent call last):
File "kazemi-train.py", line 35, in
status, images_train, landmarks_train = cv2.face.loadDatasetList(args.training_images,args.training_annotations, iter(imageFiles), iter(annotationFiles))
SystemError: returned NULL without setting an error
import os
import time
import cv2
import numpy as np
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Training of kazemi facial landmark algorithm.')
    parser.add_argument('--face_cascade', type=str, help="Path to the cascade model file for the face detector",
                        default=os.path.join(os.path.dirname(os.path.realpath(__file__)),'models','haarcascade_frontalface_alt2.xml'))
    parser.add_argument('--kazemi_model', type=str, help="Path to save the kazemi trained model file",
                        default=os.path.join(os.path.dirname(os.path.realpath(__file__)),'models','face_landmark_model.dat'))
    parser.add_argument('--kazemi_config', type=str, help="Path to the config file for training",
                        default=os.path.join(os.path.dirname(os.path.realpath(__file__)),'models','config.xml'))
    parser.add_argument('--training_images', type=str, help="Path of a text file contains the list of paths to all training images",
                        default=os.path.join(os.path.dirname(os.path.realpath(__file__)),'train','images_train.txt'))
    parser.add_argument('--training_annotations', type=str, help="Path of a text file contains the list of paths to all training annotation files",
                        default=os.path.join(os.path.dirname(os.path.realpath(__file__)),'train','points_train.txt'))
    parser.add_argument('--verbose', action='store_true')
    args = parser.parse_args()

    start = time.time()
    facemark = cv2.face.createFacemarkKazemi()
    if args.verbose:
        print("Creating the facemark took {} seconds".format(time.time()-start))

    start = time.time()
    imageFiles = []
    annotationFiles = []
    for file in os.listdir("./AppendInfo"):
        if file.endswith(".jpg"):
            imageFiles.append(file)
        if file.endswith(".txt"):
            annotationFiles.append(file)
    status, images_train, landmarks_train = cv2.face.loadDatasetList(args.training_images,args.training_annotations, iter(imageFiles), iter(annotationFiles))
    assert(status == True)
    if args.verbose:
        print("Loading the dataset took {} seconds".format(time.time()-start))

    scale = np.array([460.0, 460.0])
    facemark.setParams(args.face_cascade,args.kazemi_model,args.kazemi_config,scale)

    for i in range(len(images_train)):
        start = time.time()
        img = cv2.imread(images_train[i])
        if args.verbose:
            print("Loading the image took {} seconds".format(time.time()-start))
        start = time.time()
        status, facial_points = cv2.face.loadFacePoints(landmarks_train[i])
        assert(status == True)
        if args.verbose:
            print("Loading the facepoints took {} seconds".format(time.time()-start))
        start = time.time()
        facemark.addTrainingSample(img,facial_points)
        assert(status == True)
        if args.verbose:
            print("Adding the training sample took {} seconds".format(time.time()-start))

    start = time.time()
    facemark.training()
    if args.verbose:
        print("Training took {} seconds".format(time.time()-start))
If I only use 2 parameters this error is raised
File "kazemi-train.py", line 37, in status, images_train, landmarks_train = cv2.face.loadDatasetList(args.training_images,args.training_annotations) TypeError: loadDatasetList() missing required argument 'images' (pos 3)
If I try to use 3 parameters this error is raised
Traceback (most recent call last):
File "kazemi-train.py", line 37, in
status, images_train, landmarks_train = cv2.face.loadDatasetList(args.training_images,args.training_annotations, iter(imagePaths))
TypeError: loadDatasetList() missing required argument 'annotations' (pos 4)
Documentation on loadDatasetList
The figure you provided refers to the C++ API of loadDatasetList(), whose parameters often cannot be mapped directly to those of the Python API. One reason is that a Python function can return multiple values while a C++ function cannot. In the C++ API, the 3rd and 4th parameters are output parameters: they receive the paths of the images read from the text file at imageList, and the paths of the annotations read from the text file at annotationList, respectively.
Going back to your question, I cannot find any reference for that function in Python, and I believe the API changed in OpenCV 4. After multiple trials, I am sure cv2.face.loadDatasetList returns only one Boolean value rather than a tuple. That's why you encountered the first error, TypeError: cannot unpack non-iterable bool object, even though you filled in four parameters.
There is no doubt that cv2.face.loadDatasetList should produce two lists of file paths. Therefore, the code for the first part should look something like this:
images_train = []
landmarks_train = []
status = cv2.face.loadDatasetList(args.training_images, args.training_annotations, images_train, landmarks_train)
I expected images_train and landmarks_train to contain the file paths of the images and landmark annotations, but it does not work as expected.
After understanding the whole program, I wrote a new function my_loadDatasetList to replace the (broken) cv2.face.loadDatasetList.
def my_loadDatasetList(text_file_images, text_file_annotations):
    status = False
    image_paths, annotation_paths = [], []
    with open(text_file_images, "r") as a_file:
        for line in a_file:
            line = line.strip()
            if line != "":
                image_paths.append(line)
    with open(text_file_annotations, "r") as a_file:
        for line in a_file:
            line = line.strip()
            if line != "":
                annotation_paths.append(line)
    status = len(image_paths) == len(annotation_paths)
    return status, image_paths, annotation_paths
You can now replace
status, images_train, landmarks_train = cv2.face.loadDatasetList(args.training_images,args.training_annotations, iter(imageFiles), iter(annotationFiles))
by
status, images_train, landmarks_train = my_loadDatasetList(args.training_images, args.training_annotations)
I have tested that images_train and landmarks_train can be loaded by cv2.imread and cv2.face.loadFacePoints respectively using the data from here.
From the documentation, I can see that cv2.face.loadDatasetList returns only a boolean value. Secondly, remove iter() from the parameters: loadDatasetList accepts a list as the 3rd and the 4th parameter.
So please make these changes in your code:
From:
status, images_train, landmarks_train = cv2.face.loadDatasetList(args.training_images,args.training_annotations, iter(imageFiles), iter(annotationFiles))
To:
status = cv2.face.loadDatasetList(args.training_images,args.training_annotations, imageFiles, annotationFiles)

How to load a .json file with python nltk

I'm trying to load a .json file output by an application so I can feed it into different machine learning algorithms to classify the text. The problem is I can't seem to figure out why NLTK is not loading my .json file; even when I try it with their own .json file, it doesn't seem to work. From what I gather based on the book, I should only need to import nltk and I can use the load function from nltk.data. Can somebody help me realise what I am doing wrong?
Below is the code I used to try loading the file with nltk.
import nltk
nltk.data.load('corpora/twitter_samples/negative_tweets.json')
After trying that out, I got this error:
C:\Python34\python.exe "C:/Users/JarvinLi/PycharmProjects/ThesisTrial1/Trial Loading.py"
Traceback (most recent call last):
File "C:/Users/JarvinLi/PycharmProjects/ThesisTrial1/Trial Loading.py", line 7, in <module>
nltk.data.load('corpora/twitter_samples/negative_tweets.json')
File "C:\Python34\lib\site-packages\nltk\data.py", line 810, in load
resource_val = json.load(opened_resource)
File "C:\Python34\lib\json\__init__.py", line 268, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "C:\Python34\lib\json\__init__.py", line 312, in loads
s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'
Process finished with exit code 1
EDIT #1: I'm using Python 3.4.1 and NLTK 3.
EDIT #2: Below is another attempt, this time using json.load().
import json
json.load('corpora/twitter_samples/negative_tweets.json')
But I encountered a similar error
C:\Python34\python.exe "C:/Users/JarvinLi/PycharmProjects/ThesisTrial1/Trial Loading.py"
Traceback (most recent call last):
File "C:/Users/JarvinLi/PycharmProjects/ThesisTrial1/Trial Loading.py", line 5, in <module>
json.load('corpora/twitter_samples/quotefileNeg.json')
File "C:\Python34\lib\json\__init__.py", line 265, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
Process finished with exit code 1
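As a side note (a sketch, not part of the original question or answers): the second traceback arises because json.load expects an open file object rather than a path string. Assuming the corpus file is line-delimited JSON, as the NLTK twitter_samples files are, it can be read directly like this:
import json

# Hypothetical path; substitute the real location of the corpus file.
path = 'corpora/twitter_samples/negative_tweets.json'

# json.load() wants a file object, not a path string; the twitter_samples
# files hold one JSON object per line, so parse them line by line.
with open(path, encoding='utf-8') as f:
    tweets = [json.loads(line) for line in f if line.strip()]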
If you want to access a new corpus with a specific format, you can extend the NLTK CorpusReader class as follows,
import json
import os

from nltk.corpus.reader.api import CorpusReader
from nltk.corpus.reader.util import StreamBackedCorpusView, concat, ZipFilePathPointer

class StoryCorpusReader(CorpusReader):
    corpus_view = StreamBackedCorpusView

    def __init__(self, word_tokenizer=StoryTokenizer(), encoding="utf8"):
        CorpusReader.__init__(
            self, <folder_path>, <file_name>, encoding
        )
        for path in self.abspaths(self._fileids):
            if isinstance(path, ZipFilePathPointer):
                pass
            elif os.path.getsize(path) == 0:
                raise ValueError(f"File {path} is empty")
        self._word_tokenizer = word_tokenizer

    def docs(self, fileids=None):
        return concat(
            [
                self.corpus_view(path, self._read_stories, encoding=enc)
                for (path, enc, fileid) in self.abspaths(fileids, True, True)
            ]
        )

    def titles(self):
        titles = self.docs()
        standards_list = []
        for jsono in titles:
            text = jsono["title"]
            if isinstance(text, bytes):
                text = text.decode(self.encoding)
            standards_list.append(text)
        return standards_list

    def _read_stories(self, stream):
        stories = []
        for i in range(10):
            line = stream.readline()
            if not line:
                return stories
            story = json.loads(line)
            stories.append(story)
        return stories
with a specific Tokenizer
import re
import string
import typing

from nltk.tokenize.api import TokenizerI
from nltk.tokenize.casual import _replace_html_entities

REGEXPS = (
    # HTML tags:
    r"""<[^<>]+>""",
    # email addresses
    r"""[\w.+-]+@[\w-]+\.(?:[\w-]\.?)+[\w-]""",
)

class StoryTokenizer(TokenizerI):
    _WORD_RE = None

    def tokenize(self, text: str) -> typing.List[str]:
        # Fix HTML character entities:
        safe_text = _replace_html_entities(text)
        # Tokenize
        words = self.WORD_RE.findall(safe_text)
        # Remove punctuation
        words = [
            word
            for word in words
            if re.match(f"[{re.escape(string.punctuation)}——–’‘“”×]", word.casefold())
            == None
        ]
        return words

    @property
    def WORD_RE(self) -> "re.Pattern":
        # Compiles the regex for this and all future instantiations of TweetTokenizer.
        if not type(self)._WORD_RE:
            type(self)._WORD_RE = re.compile(
                f"({'|'.join(REGEXPS)})",
                re.VERBOSE | re.I | re.UNICODE,
            )
        return type(self)._WORD_RE
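A possible usage sketch (not part of the original answer), assuming the <folder_path> and <file_name> placeholders in StoryCorpusReader.__init__ have been replaced with a real corpus location:
reader = StoryCorpusReader()
# docs() streams the JSON records; titles() pulls out the "title" field of each.
for title in reader.titles():
    print(title)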

Parsing large XML file using 'xmltodict' module results in OverflowError

I have a fairly large XML file, about 3 GB in size, that I want to parse in streaming mode using the 'xmltodict' utility. The code I have iterates through each item, forms a dictionary item and appends it to a list in memory, eventually to be dumped as JSON to a file.
I have the following working perfectly on a small xml data set:
import xmltodict, json
import io

output = []

def handle(path, item):
    #do stuff
    return

doc_file = open("affiliate_partner_feeds.xml","r")
doc = doc_file.read()
xmltodict.parse(doc, item_depth=2, item_callback=handle)

f = open('jbtest.json', 'w')
json.dump(output,f)
On a large file, I get the following:
Traceback (most recent call last):
File "jbparser.py", line 125, in <module>
xmltodict.parse(doc, item_depth=2, item_callback=handle)
File "/usr/lib/python2.7/site-packages/xmltodict.py", line 248, in parse
parser.Parse(xml_input, True)
OverflowError: size does not fit in an int
The exact location of the exception inside xmltodict.py is:
def parse(xml_input, encoding=None, expat=expat, process_namespaces=False,
          namespace_separator=':', **kwargs):
    handler = _DictSAXHandler(namespace_separator=namespace_separator,
                              **kwargs)
    if isinstance(xml_input, _unicode):
        if not encoding:
            encoding = 'utf-8'
        xml_input = xml_input.encode(encoding)
    if not process_namespaces:
        namespace_separator = None
    parser = expat.ParserCreate(
        encoding,
        namespace_separator
    )
    try:
        parser.ordered_attributes = True
    except AttributeError:
        # Jython's expat does not support ordered_attributes
        pass
    parser.StartElementHandler = handler.startElement
    parser.EndElementHandler = handler.endElement
    parser.CharacterDataHandler = handler.characters
    parser.buffer_text = True
    try:
        parser.ParseFile(xml_input)
    except (TypeError, AttributeError):
        parser.Parse(xml_input, True)
    return handler.item
Any way to get around this? AFAIK, the xmlparser object is not exposed for me to play around and change 'int' to 'long'. More importantly, what is really going on here?
Would really appreciate any leads on this. Thanks!
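In the source quoted above, parser.ParseFile(xml_input) is tried first and the code only falls back to parser.Parse(xml_input, True) when the input is not a file-like object; it is that fallback which chokes on a string of roughly 3 GB. Below is a sketch (not from the original post) of passing the open file object instead of the read-in string, which keeps parsing in the ParseFile path and avoids holding the whole document in memory:
import json
import xmltodict

output = []

def handle(path, item):
    # Hypothetical callback body: collect whatever fields are needed from each item.
    output.append(item)
    return True  # returning a falsy value aborts the streaming parse

# Pass the file object itself (opened in binary mode), not doc_file.read(),
# so expat's ParseFile is used instead of Parse.
with open("affiliate_partner_feeds.xml", "rb") as doc_file:
    xmltodict.parse(doc_file, item_depth=2, item_callback=handle)

with open("jbtest.json", "w") as f:
    json.dump(output, f)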
Try to use marshal.load(file) or marshal.load(sys.stdin) in order to deserialize the file (or to use it as a stream), instead of reading the whole file into memory and then parsing it as a whole.
Here is an example:
>>> def handle_artist(_, artist):
... print artist['name']
... return True
>>>
>>> xmltodict.parse(GzipFile('discogs_artists.xml.gz'),
... item_depth=2, item_callback=handle_artist)
A Perfect Circle
Fantômas
King Crimson
Chris Potter
...
STDIN:
import sys, marshal

while True:
    _, article = marshal.load(sys.stdin)
    print article['title']

Deleting attribute from Dictionary

I need to delete an attribute from a Dictionary object. I am trying to do this with "del," but it is not working for me.
from suds.client import Client
from sys import argv
cmserver = '***my-server-host-name***'
cmport = '8443'
wsdl = 'file:///code/AXL/axlsqltoolkit/schema/10.5/AXLAPI.wsdl'
location = 'https://' + cmserver + ':' + cmport + '/axl/'
username = argv[1]
password = argv[2]
client = Client(url=wsdl,location=location, username=username, password=password)
result = client.service.getPhone(name='SEP64AE0CF74D0A')
del result['_uuid']
The code fails with:
Traceback (most recent call last):
File "AXL-Get-Phone.py", line 27, in <module>
del result['_uuid']
AttributeError: __delitem__
Sample [print(str(result))] output of the object I am trying to delete '_uuid' from:
(reply){
   return =
      (return){
         phone =
            (RPhone){
               _uuid = "{D1246CFA-E02D-0731-826F-4B043CD529F1}"
First, you'll need to convert the result into a dict. There is a suds.client.Client class method, dict, which will do this for you. See the documentation for suds.client.Client.
result = Client.dict(client.service.getPhone(name='SEP64AE0CF74D0A'))
del result['_uuid']
Also, you may simply be able to delete the _uuid attribute, for example:
result = client.service.getPhone(name='SEP64AE0CF74D0A')
del result._uuid
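A small follow-up sketch (not from the original answer): delattr is the built-in equivalent of the del statement on an attribute, useful when the attribute name is only known at runtime; whether the suds result object permits deleting _uuid this way is an assumption here.
result = client.service.getPhone(name='SEP64AE0CF74D0A')
# Programmatic form of `del result._uuid`; assumes the suds result object
# allows attribute deletion.
delattr(result, '_uuid')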
