DecodeJpeg/Contents:0 refers to a tensor that does not exist - python

After retraining my model in TensorFlow by following the method in this tutorial video by Siraj Raval:
https://www.youtube.com/watch?v=QfNvhPx5Px8
I encountered the errors below when I finally tested my test image. As seen in the screenshot, there are two errors, a TypeError and a KeyError, and the root cause of both is probably DecodeJpeg/Contents:0.
If anyone can explain these errors and how to resolve them, it would be really helpful.

DecodeJpeg/Contents:0 is supposed to be a tensor, and you want to feed data to it, so you are treating it as an input. The problem is that it doesn't exist, which probably means you made a small mistake in the name.
Run this before the sess.run(something, {"DecodeJpeg/Contents:0": something}):
tf.summary.FileWriter("name_of_a_folder", sess.graph)
This will generate a log file in that folder. Then run in the CLI:
tensorboard --logdir /path/to/that/folder/
and open your browser at the link provided in the CLI. Now you can see the graph and check the real name of the tensor. If you still have problems, feel free to share the graph image, or ask away.
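For what it's worth, in the Inception retrain graph the input tensor is usually named DecodeJpeg/contents:0 with a lowercase "c", and tensor names are case-sensitive. A minimal sketch of how the naming works (the placeholder below is a hand-built stand-in mirroring that graph, not the retrained graph itself):

```python
import tensorflow.compat.v1 as tf  # the retrain tutorial uses the TF 1.x API

graph = tf.Graph()
with graph.as_default():
    # Stand-in for the retrained Inception input op: in that graph it is
    # named "DecodeJpeg/contents" (lowercase "c"), not "DecodeJpeg/Contents".
    tf.placeholder(tf.string, name="DecodeJpeg/contents")

# A feedable tensor name is always "<op_name>:<output_index>" and it is
# case-sensitive; listing the op names shows what the graph really calls it.
op_names = [op.name for op in graph.get_operations()]
print(op_names)  # ['DecodeJpeg/contents']
```

So feeding {"DecodeJpeg/contents:0": image_data} would match this graph, while the capitalized variant raises exactly the "tensor does not exist" KeyError.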

Related

Convert downloaded .pth checkpoint file for use in browser (TensorFlow)

Overall goal:
I used some Python code from mmpose which can identify animals in a picture, and then deduce their pose. Great. Now, my goal is to be able to bring this to the browser with TensorFlow.js. I understand this question might require many steps.
What I've managed so far:
I used the file top_down_img_demo_with_mmdet.py which came in the demo/ folder of mmpose. Detecting objects works like a charm, the key line being mmdet_results = inference_detector(det_model, image_name) (from mmdet.apis) which returns bounding boxes of what's found. Next, it runs inference_top_down_pose_model (from mmpose.apis) which returns an array of all the coordinates of key points on the animal. Perfect. From there, it draws out to a file. Now, shifting over to TensorFlow.js, I've included their COCO-SSD model, so I can get bounding boxes of animals. Works fine.
What I need help with:
As I understand it, to use the (big) .pth file from the animal pose identification, it must be ported to another format (.pt, maybe with an intermediate ONNX step) and then loaded as a model in TensorFlow.js, where it can run its pose-detection magic in-browser. Two problems: 1) most instructions seem to expect me to know details about the model, which I don't. Kernel size? Stride? Do I need this info? If so, how do I get it? 2) it's honestly not clear what my real end goal should be. If I end up with a .pt file, is it a simple few lines to load it as a model in TensorFlow.js and run an image through it?
TL;DR: I've got a working Python program that finds animal pose using a big .pth file. How do I achieve the same in-browser (e.g. TensorFlow.js)?
What didn't work
This top answer does not run, since "model" is not defined. Adding model = torch.load('./hrnet_w32_animalpose_256x256-1aa7f075_20210426.pth') still failed with AttributeError: 'dict' object has no attribute 'training'
This GitHub project spits out a tiny saved_model.pb file, less than 0.1% the size of the .pth file, so that can't be right.
This answer produced a huge wall of text, with array values running off my screen, which it said were weights anyway, not a new model file.
This article expects me to know the structure of the model.
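On the AttributeError in the first bullet: torch.load on a checkpoint like this typically returns a plain dict of weights (a state_dict, often nested under a "state_dict" key), not a model object, which is why treating it as a model fails. A minimal sketch with a hypothetical stand-in module (nn.Linear is not the real HRNet; the actual architecture has to be built from the mmpose config that produced the checkpoint):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for HRNet: the real architecture must be constructed
# (e.g. via mmpose's init_pose_model) before the weights can go anywhere.
model = nn.Linear(4, 2)
torch.save({"state_dict": model.state_dict()}, "ckpt.pth")

ckpt = torch.load("ckpt.pth", map_location="cpu")
# torch.load hands back the dict itself -- hence
# "AttributeError: 'dict' object has no attribute 'training'"
# when the dict is treated as a model.
print(type(ckpt))
model.load_state_dict(ckpt["state_dict"])  # weights load into a built model
# Only architecture + weights together can then be exported (e.g. to ONNX).
```

In other words, the .pth alone is not enough for export; the model definition has to come from the mmpose codebase first.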
Thank you all. Honestly, even comments about apparent misunderstandings I have about the process would be very valuable to me. Cheers.
Chris

AllenNLP predictor object loading before it is called -- rest of the script hangs

Background
I am working on a project where I need to do coreference resolution on a lot of text. In doing so I've dipped my toe into the NLP world and found AllenNLP's coref model.
In general, I have a script where I use pandas to load a dataset of "articles" to be resolved and pass those articles to a predictor created with Predictor.from_path() to be resolved. Because of the large number of articles that I want to resolve, I'm running this on a remote cluster (though I don't believe that is the source of this problem, as it also occurs when I run the script locally). That is, my script looks something like this:
from allennlp.predictors.predictor import Predictor
import allennlp_models.tagging
import pandas as pd

print("HERE TEST")

def predictorFunc(article):
    predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2021.03.10.tar.gz")
    resolved_object = predictor.predict(document=article)
    ### Some other interrogation of the predicted clusters ###
    return resolved_object['document']

df = pd.read_csv('articles.csv')
### Some pandas magic ###
resolved_text = predictorFunc(article_pre_resolved)
The Problem
When I execute the script the following message is printed to my .log file before anything else (for example the print("HERE TEST") that I included) -- even before the predictor object itself is called:
Some weights of BertModel were not initialized from the model checkpoint at SpanBERT/spanbert-large-cased and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
I understand that this message itself is to be expected as I'm using a pre-trained model, but when this message appears it completely locks up the .log file (nothing else gets printed until the script ends and everything gets printed at once). This has been deeply problematic for me as it makes it almost impossible to debug other parts of my script in a meaningful way. (It will also make tracking the final script's progress on a large dataset very difficult... :'( ) Also, I would very much like to know why the predictor object appears to be loading even before it gets called.
Though I can't tell for sure, I also think that whatever is causing this is also causing runaway memory use (even for toy examples of just a single 'article' (a couple hundred words as a string)).
Has anyone else had this problem/know why this happens? Thanks very much in advance!
I think I figured out two competing and unrelated problems in what I was doing. First, the unordered printing was due to SLURM; using the --unbuffered option fixed the printing problem and made diagnosis much easier. The second problem (which looked like runaway memory usage) was due to a very long article (approx. 10,000 words) that was just over the max length the Predictor object accepts. I'm going to close this question now!
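For anyone hitting the same buffered-log symptom: Python buffers stdout whenever it is not attached to a terminal, so the fix can be reproduced outside SLURM too. A small sketch (the srun line is shown as a comment since it only exists on SLURM clusters):

```shell
# On a SLURM cluster, request unbuffered output for the job step:
#   srun --unbuffered python coref_script.py
# The same effect at the Python level, handy for local debugging:
export PYTHONUNBUFFERED=1
python3 -u -c 'print("HERE TEST")'   # appears immediately, not at job exit
```

Either mechanism makes print() lines land in the .log in real time instead of all at once when the script ends.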

RaggedTensor on TPU

I am trying to train a neural network with TensorFlow that takes a RaggedTensor as input (tf.keras.layers.Input). It works fine on CPU and GPU, but I am really struggling to make it work on TPU. I wanted to know if some of you have managed to make it work (not necessarily looking for a direct solution, though that would be great; some tips would already help!). So far the error messages were explicit enough for me to make progress, but I am now unsure how to go further.
What I did so far:
I am using tf.data.Dataset to read data from TFRecords, but I needed to explicitly transform it into a DistributedDataset to disable prefetching:
dist_dataset = strategy.experimental_distribute_dataset(
    dataset,
    tf.distribute.InputOptions(
        experimental_prefetch_to_device=False
    )
)
I got Compilation failure: Detected unsupported operations when trying to compile graph ... on XLA_TPU_JIT: RaggedTensorToTensor, which could be (sort of) fixed by allowing soft device placement:
tf.config.set_soft_device_placement(True)
I am now stuck with Compilation failure: Input 1 to node '.../RaggedReduceSum/RaggedReduce/RaggedSplitsToSegmentIds/Repeat/SequenceMask/Range' with op Range must be a compile-time constant. I totally understand why I get this error; I am aware of which ops are available on TPU, and in particular that most dynamic shapes must be determined at compile time to run on TPU. But I can't see how I could use these ragged tensors on TPU then...
Any idea would be appreciated :)
P.S.: I haven't seen much news from the TensorFlow team on RaggedTensors on TPU since this answer back in July 2020, but I may have missed a bunch about it... Pointing to the relevant git threads would already be great, so I can investigate further.
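For reference, the kind of static-shape workaround the error points toward can be sketched like this (an illustration only, not a confirmed TPU fix; MAX_LEN is an assumed upper bound on row length): pad each ragged batch to a dense tensor of fixed width inside the tf.data pipeline, before it reaches the TPU, and carry a mask alongside it.

```python
import tensorflow as tf

MAX_LEN = 8  # assumed fixed upper bound on row length

# Dense padding in the input pipeline: the model then sees a static
# [batch, MAX_LEN] shape that XLA can compile, plus a padding mask.
rt = tf.ragged.constant([[1, 2, 3], [4]])
dense = rt.to_tensor(default_value=0, shape=[None, MAX_LEN])
mask = tf.sequence_mask(rt.row_lengths(), maxlen=MAX_LEN)
print(dense.shape)  # (2, 8)
```

The ragged-specific ops then never appear in the TPU-compiled graph, at the cost of wasted compute on the padding.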

Need fill_in_blank_1000_from_test_score.pkl file for NGNN implementation. Can someone help me with it or an alternative?

While replicating the implementation of the NGNN paper ("Dressing as a Whole"), I am stuck on one pickle file that is required to progress further, namely fill_in_blank_1000_from_test_score.pkl.
Can someone help by sharing this file, or an alternative?
The GitHub implementation doesn't contain it:
https://github.com/CRIPAC-DIG/NGNN
You're not supposed to use main_score.py (which requires the fill_in_blank_1000_from_test_score.pkl pickle); it's obsolete - only the authors fail to mention this in the README. The problem was raised in this issue. Long story short: use another "main", main_multi_modal.py.
One of the comments explains in detail how to proceed; I will copy it here so that it does not get lost:
1. Download the pre-processed dataset from the authors (the one on Google Drive, around 7 GB).
2. Download the "normal" dataset in order to get the text for the images (the one on GitHub, only a few MBs).
3. Change all the folder paths in the files to your corresponding ones.
4. Run onehot_embedding.py to create the textual features (the rest of the pre-processing was already done by the authors).
5. Run main_multi_modal.py to train. At the end of the file you can adjust the network config (Beta, d, T, etc.), so the file Config.py is useless here.
6. If you want to train several instances in the for-loop, you need to reset the graph at the beginning of training: just add tf.reset_default_graph() at the start of the function cm_ggnn().
With this setup, I could reproduce the results fairly well, with the same accuracy as in the paper.
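The graph-reset step in the quoted comment can be sketched like this (TF 1.x-style API, matching the NGNN codebase; the function body is a hypothetical stand-in for the real cm_ggnn()):

```python
import tensorflow.compat.v1 as tf  # NGNN targets the TF 1.x graph API
tf.disable_eager_execution()

def cm_ggnn():
    # First line of the training function, per the GitHub issue's advice:
    # start from a fresh graph so looped runs don't accumulate old ops.
    tf.reset_default_graph()
    # ... build the CM-GGNN model and train here (omitted) ...
    return tf.get_default_graph()

g = cm_ggnn()
_ = tf.constant(1.0)   # adds an op to the current default graph
g2 = cm_ggnn()         # second loop iteration: the new graph starts empty
print(len(g2.get_operations()))  # 0
```

Without the reset, each iteration of the training for-loop would keep adding ops (and memory) to the same default graph.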

OpenCV: AssertionError: Image is not a np.ndarray

Sometimes, without a specific pattern - meaning it sometimes happens, sometimes not, using the same .jpg pictures as input - the following error is raised:
AssertionError: Image is not a np.ndarray
The error is raised after loading pictures normally as:
imgcv = cv2.imread(image_path)
and then simply trying to make predictions with a pre-trained model, or plotting the image.
Specifically, the picture is not loaded as an np.ndarray with three dimensions, e.g. (700, 700, 3). Instead, the variable holds a NoneType object from the builtins module.
What could be the reason for this error?
I am currently using:
print(cv2.__version__)
'4.0.0'
Best guess: file system issue. cv2.imread(fn) returns None when the file is not found.
I have analysis code that sometimes fails when analyzing videos stored on Synology boxes (i.e., NAS) that tend to go into sleep mode and then wake up too slowly, giving a "file not found" when I first run the analysis; when I re-run it, things work fine. Similar problems are less likely on local disks or SSDs, but I would not be surprised to see them on VMs, highly loaded machines, or in case a disk is going bad...
