I'm fighting with TensorRT (TensorRT 4 for python right now) since several weeks. I passed a lot of problems to get TensorRT running. The example code from NVIDIA works well for me :
TensorRT MNIST example
Now, i created my own network in tensorflow (a very simple one) for upscaling images, let's say (in HWC) 320x240x3 into 640x480x3 .The usual way by creating a frozen-graph and running an inferencer just based on Tensorflow gave me expected results but not by using TensorRT.
I have a strange feeling about that i made something wrong by feeding the images into the GPU-memory (This would be probably an issue about pycuda and/or TensorRT).
The worst case scenario would be that TensorRT destroys my network by the optimization process.
I hope someone has just a little idea for saving my life.
This is my Tensorflow-model (i just wrapped the functions):
net = conv2d(input,
64,
k_size=3,
activation=tf.nn.relu,
name='conv1')
net = deconv2d(net,
3,
k_size=5,
activation=tf.tanh,
stride=self.params.resize_factor,
scale=self.params.resize_factor,
name='deconv')
This is the important snippet of my inferencer:
import tensorrt as trt
import uff
from tensorrt.parsers import uffparser
import pycuda.driver as cuda
import numpy as np
...
def _init_infer(self, uff_model):
g_logger = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
parser = uffparser.create_uff_parser()
parser.register_input(self.input_node, (self.channels, self.height, self.width), 0)
parser.register_output(self.output_node)
self.engine = trt.utils.uff_to_trt_engine(g_logger, uff_model, parser, self.max_batch_size,
self.max_workspace_size)
parser.destroy()
self.runtime = trt.infer.create_infer_runtime(g_logger)
self.context = self.engine.create_execution_context()
self.output = np.empty(self.output_size, dtype=self.dtype)
# create CUDA stream
self.stream = cuda.Stream()
# allocate device memory
self.d_input = cuda.mem_alloc(self.channels * self.max_batch_size * self.width *
self.height * self.output.dtype.itemsize)
self.d_output = cuda.mem_alloc(self.output_size * self.output.dtype.itemsize)
self.bindings = [int(self.d_input), int(self.d_output)]
def infer(self, input_batch, batch_size=1):
# transfer input data to device
cuda.memcpy_htod_async(self.d_input, input_batch, self.stream)
# execute model
self.context.enqueue(batch_size, self.bindings, self.stream.handle, None)
# transfer predictions back
cuda.memcpy_dtoh_async(self.output, self.d_output, self.stream)
# synchronize threads
self.stream.synchronize()
return self.output
And the executable snippet:
...
# create trt inferencer
trt_inferencer = TensorRTInferencer(params=params)
img = [misc.imread('./test_images/lion.png')]
img[0] = normalize(img[0])
img = img[0]
# inferencing method
result = trt_inferencer.infer(img)
result = inormalize(result, dtype=np.uint8)
result = result.reshape(1, params.height * 2, params.width * 2, 3)
...
And the weird result by comparison :(
upscaled lion TensorRT, Tensorflow, Original
I got it now, finally. The problem was a wrong dimension and order of the input images and output. And for everyone who run into the same problem, this is the adopted executable snippet, dependent on my initialization:
...
# create trt inferencer
trt_inferencer = TensorRTInferencer(params=params)
img = [misc.imread('./test_images/lion.png')]
img[0] = normalize(img[0])
img = img[0]
img = np.transpose(img, (2, 0, 1))
img = img.ravel()
# inferencing method
result = trt_inferencer.infer(img)
result = inormalize(result, dtype=np.uint8)
result = np.reshape(result, newshape=[3, params.height * 2, params.width * 2])
result = np.transpose(result, (1, 2, 0))
...
Related
I'm new to tensorflow and object detetion, and any help would be greatly appreciated! I got a database of 50 photos, used this video to get me started, and it DID work with Google's Sample Model (I'm using a RPi4B with 8 GB of RAM), then I wanted to create my own model. I tried a couple of options, but ultimately failed since the type of files I needed were a .TFLITE and a .txt one with the labels. I only managed to get a .LITE file which from what I tested didn't work
I tried his google collab sheet but the terminal got stuck at step 5 when I pressed the button to train the model, so I tried Edge Impulse but the output models were all in a .LITE file, and didn't provide a labelmap.txt file for the code. I tried manually changing the extension from .LITE to .TFLITE since according to this thread it was supposed to work, but it didn't!
I need this to be ready in 3 days from now... Isn't there a more beginner-friendly way to do this? How can I get a valid .TFLITE model to work with my RPI4? If I have to, I will change the code for this to work. Here's the code the tutorial provided:
######## Webcam Object Detection Using Tensorflow-trained Classifier #########
#
# Author: Evan Juras
# Date: 10/27/19
# Description:
# This program uses a TensorFlow Lite model to perform object detection on a live webcam
# feed. It draws boxes and scores around the objects of interest in each frame from the
# webcam. To improve FPS, the webcam object runs in a separate thread from the main program.
# This script will work with either a Picamera or regular USB webcam.
#
# This code is based off the TensorFlow Lite image classification example at:
# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/python/label_image.py
#
# I added my own method of drawing boxes and labels using OpenCV.
# Import packages
import os
import argparse
import cv2
import numpy as np
import sys
import time
from threading import Thread
import importlib.util
# Define VideoStream class to handle streaming of video from webcam in separate processing thread
# Source - Adrian Rosebrock, PyImageSearch: https://www.pyimagesearch.com/2015/12/28/increasing-raspberry-pi-fps-with-python-and-opencv/
class VideoStream:
"""Camera object that controls video streaming from the Picamera"""
def _init_(self,resolution=(640,480),framerate=30):
# Initialize the PiCamera and the camera image stream
self.stream = cv2.VideoCapture(0)
ret = self.stream.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*'MJPG'))
ret = self.stream.set(3,resolution[0])
ret = self.stream.set(4,resolution[1])
# Read first frame from the stream
(self.grabbed, self.frame) = self.stream.read()
# Variable to control when the camera is stopped
self.stopped = False
def start(self):
# Start the thread that reads frames from the video stream
Thread(target=self.update,args=()).start()
return self
def update(self):
# Keep looping indefinitely until the thread is stopped
while True:
# If the camera is stopped, stop the thread
if self.stopped:
# Close camera resources
self.stream.release()
return
# Otherwise, grab the next frame from the stream
(self.grabbed, self.frame) = self.stream.read()
def read(self):
# Return the most recent frame
return self.frame
def stop(self):
# Indicate that the camera and thread should be stopped
self.stopped = True
# Define and parse input arguments
parser = argparse.ArgumentParser()
parser.add_argument('--modeldir', help='Folder the .tflite file is located in',
required=True)
parser.add_argument('--graph', help='Name of the .tflite file, if different than detect.tflite',
default='detect.lite')
parser.add_argument('--labels', help='Name of the labelmap file, if different than labelmap.txt',
default='labelmap.txt')
parser.add_argument('--threshold', help='Minimum confidence threshold for displaying detected objects',
default=0.5)
parser.add_argument('--resolution', help='Desired webcam resolution in WxH. If the webcam does not support the resolution entered, errors may occur.',
default='1280x720')
parser.add_argument('--edgetpu', help='Use Coral Edge TPU Accelerator to speed up detection',
action='store_true')
args = parser.parse_args()
MODEL_NAME = args.modeldir
GRAPH_NAME = args.graph
LABELMAP_NAME = args.labels
min_conf_threshold = float(args.threshold)
resW, resH = args.resolution.split('x')
imW, imH = int(resW), int(resH)
use_TPU = args.edgetpu
# Import TensorFlow libraries
# If tflite_runtime is installed, import interpreter from tflite_runtime, else import from regular tensorflow
# If using Coral Edge TPU, import the load_delegate library
pkg = importlib.util.find_spec('tflite_runtime')
if pkg:
from tflite_runtime.interpreter import Interpreter
if use_TPU:
from tflite_runtime.interpreter import load_delegate
else:
from tensorflow.lite.python.interpreter import Interpreter
if use_TPU:
from tensorflow.lite.python.interpreter import load_delegate
# If using Edge TPU, assign filename for Edge TPU model
if use_TPU:
# If user has specified the name of the .tflite file, use that name, otherwise use default 'edgetpu.tflite'
if (GRAPH_NAME == 'detect.lite'):
GRAPH_NAME = 'edgetpu.tflite'
# Get path to current working directory
CWD_PATH = os.getcwd()
# Path to .tflite file, which contains the model that is used for object detection
PATH_TO_CKPT = os.path.join(CWD_PATH,MODEL_NAME,GRAPH_NAME)
# Path to label map file
PATH_TO_LABELS = os.path.join(CWD_PATH,MODEL_NAME,LABELMAP_NAME)
# Load the label map
with open(PATH_TO_LABELS, 'r') as f:
labels = [line.strip() for line in f.readlines()]
# Have to do a weird fix for label map if using the COCO "starter model" from
# https://www.tensorflow.org/lite/models/object_detection/overview
# First label is '???', which has to be removed.
if labels[0] == '???':
del(labels[0])
# Load the Tensorflow Lite model.
# If using Edge TPU, use special load_delegate argument
if use_TPU:
interpreter = Interpreter(model_path=PATH_TO_CKPT,
experimental_delegates=[load_delegate('libedgetpu.so.1.0')])
print(PATH_TO_CKPT)
else:
interpreter = Interpreter(model_path=PATH_TO_CKPT)
interpreter.allocate_tensors()
# Get model details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
height = input_details[0]['shape'][1]
width = input_details[0]['shape'][2]
floating_model = (input_details[0]['dtype'] == np.float32)
input_mean = 127.5
input_std = 127.5
# Check output layer name to determine if this model was created with TF2 or TF1,
# because outputs are ordered differently for TF2 and TF1 models
outname = output_details[0]['name']
if ('StatefulPartitionedCall' in outname): # This is a TF2 model
boxes_idx, classes_idx, scores_idx = 1, 3, 0
else: # This is a TF1 model
boxes_idx, classes_idx, scores_idx = 0, 1, 2
# Initialize frame rate calculation
frame_rate_calc = 1
freq = cv2.getTickFrequency()
# Initialize video stream
videostream = VideoStream(resolution=(imW,imH),framerate=30).start()
time.sleep(1)
#for frame1 in camera.capture_continuous(rawCapture, format="bgr",use_video_port=True):
while True:
# Start timer (for calculating frame rate)
t1 = cv2.getTickCount()
# Grab frame from video stream
frame1 = videostream.read()
# Acquire frame and resize to expected shape [1xHxWx3]
frame = frame1.copy()
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
frame_resized = cv2.resize(frame_rgb, (width, height))
input_data = np.expand_dims(frame_resized, axis=0)
# Normalize pixel values if using a floating model (i.e. if model is non-quantized)
if floating_model:
input_data = (np.float32(input_data) - input_mean) / input_std
# Perform the actual detection by running the model with the image as input
interpreter.set_tensor(input_details[0]['index'],input_data)
interpreter.invoke()
# Retrieve detection results
boxes = interpreter.get_tensor(output_details[boxes_idx]['index'])[0] # Bounding box coordinates of detected objects
classes = interpreter.get_tensor(output_details[classes_idx]['index'])[0] # Class index of detected objects
scores = interpreter.get_tensor(output_details[scores_idx]['index'])[0] # Confidence of detected objects
# Loop over all detections and draw detection box if confidence is above minimum threshold
for i in range(len(scores)):
if ((scores[i] > min_conf_threshold) and (scores[i] <= 1.0)):
# Get bounding box coordinates and draw box
# Interpreter can return coordinates that are outside of image dimensions, need to force them to be within image using max() and min()
ymin = int(max(1,(boxes[i][0] * imH)))
xmin = int(max(1,(boxes[i][1] * imW)))
ymax = int(min(imH,(boxes[i][2] * imH)))
xmax = int(min(imW,(boxes[i][3] * imW)))
cv2.rectangle(frame, (xmin,ymin), (xmax,ymax), (10, 255, 0), 2)
# Draw label
object_name = labels[int(classes[i])] # Look up object name from "labels" array using class index
label = '%s: %d%%' % (object_name, int(scores[i]*100)) # Example: 'person: 72%'
if object_name=='person' and int(scores[i]*100)>65:
print("YES")
else:
print("NO")
labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.7, 2) # Get font size
label_ymin = max(ymin, labelSize[1] + 10) # Make sure not to draw label too close to top of window
cv2.rectangle(frame, (xmin, label_ymin-labelSize[1]-10), (xmin+labelSize[0], label_ymin+baseLine-10), (255, 255, 255), cv2.FILLED) # Draw white box to put label text in
cv2.putText(frame, label, (xmin, label_ymin-7), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 0), 2) # Draw label text
# Draw framerate in corner of frame
cv2.putText(frame,'FPS: {0:.2f}'.format(frame_rate_calc),(30,50),cv2.FONT_HERSHEY_SIMPLEX,1,(255,255,0),2,cv2.LINE_AA)
# All the results have been drawn on the frame, so it's time to display it.
cv2.imshow('Object detector', frame)
# Calculate framerate
t2 = cv2.getTickCount()
time1 = (t2-t1)/freq
frame_rate_calc= 1/time1
# Press 'q' to quit
if cv2.waitKey(1) == ord('q'):
break
# Clean up
cv2.destroyAllWindows()
videostream.stop()
```
Easy, just downgrade to OpenCV version 3.4.16, and use Tensorflow 1.0 instead of 2.0 and that should solve all your problems. That will allow the use of .LITE files, as well that of .TFLITE
Also, try increasing the resolution to a 720x1280, most likely that can cause a ton of errors as well when working with tensorflow
Take a look here: https://www.tensorflow.org/tutorials/images/classification
This notebook sets up a new classification model, and ends with "Convert the Keras Sequential model to a TensorFlow Lite model"
https://www.tensorflow.org/tutorials/images/classification#convert_the_keras_sequential_model_to_a_tensorflow_lite_model
# Convert the model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
# Save the model.
with open('model.tflite', 'wb') as f:
f.write(tflite_model)
This reliably produces a tflite model from a standard tf model.
I have a python script that do deepfake stuff, and I need to execute that script into a UI program, I've tried to write it as a program, and have some issues
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using IronPython.Hosting;
using Microsoft.Scripting.Hosting;
namespace DeeepSliz
{
public static class Program
{
[STAThread]
static void Main()
{
Application.SetHighDpiMode(HighDpiMode.SystemAware);
Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);
Application.Run(new Form1());
}
public static void Swagger()
{
var engine = Python.CreateEngine();
var script = #"C:\sliz\demo.py";
var sourse = engine.CreateScriptSourceFromFile(script);
var argv = new List<string>();
argv.Add("");
argv.Add("--lol");
engine.GetSysModule().SetVariable("argv", argv);
///
var eIO = engine.Runtime.IO;
///
var errors = new MemoryStream();
eIO.SetErrorOutput(errors, Encoding.Default);
var result = new MemoryStream();
eIO.SetOutput(errors, Encoding.Default);
///
var scope = engine.CreateScope();
sourse.Execute(scope);
///
string str(byte[] x) => Encoding.Default.GetString(x);
Console.WriteLine("ERRORS:");
Console.WriteLine(str(errors.ToArray()));
Console.WriteLine();
Console.WriteLine("Results;");
Console.WriteLine(str(result.ToArray()));
}
}
}
This is how it looks like, and i wrote a button to execute that code
private void button3_Click(object sender, EventArgs e)
{
Program.Swagger();
}
and when i start the program, and click "button3" this happend, and tihs
and ofc the python script (that works normaly)
import matplotlib
matplotlib.use('Agg')
import os, sys
import yaml
import eel
from argparse import ArgumentParser
from tqdm import tqdm
import imageio
import numpy as np
from skimage.transform import resize
from skimage import img_as_ubyte
import torch
from sync_batchnorm import DataParallelWithCallback
from modules.generator import OcclusionAwareGenerator
from modules.keypoint_detector import KPDetector
from animate import normalize_kp
from scipy.spatial import ConvexHull
'''eel.init('web')
eel.start('main.html', size=(700, 700))'''
if sys.version_info[0] < 3:
raise Exception("You must use Python 3 or higher. Recommended version is Python 3.7")
def load_checkpoints(config_path, checkpoint_path, cpu=False):
with open(config_path) as f:
config = yaml.load(f)
generator = OcclusionAwareGenerator(**config['model_params']['generator_params'],
**config['model_params']['common_params'])
if not cpu:
generator.cuda()
kp_detector = KPDetector(**config['model_params']['kp_detector_params'],
**config['model_params']['common_params'])
if not cpu:
kp_detector.cuda()
if cpu:
checkpoint = torch.load(checkpoint_path, map_location=torch.device('cpu'))
else:
checkpoint = torch.load(checkpoint_path)
generator.load_state_dict(checkpoint['generator'])
kp_detector.load_state_dict(checkpoint['kp_detector'])
if not cpu:
generator = DataParallelWithCallback(generator)
kp_detector = DataParallelWithCallback(kp_detector)
generator.eval()
kp_detector.eval()
return generator, kp_detector
def make_animation(source_image, driving_video, generator, kp_detector, relative=True,
adapt_movement_scale=True, cpu=False):
with torch.no_grad():
predictions = []
source = torch.tensor(source_image[np.newaxis].astype(np.float32)).permute(0, 3, 1, 2)
if not cpu:
source = source.cuda()
driving = torch.tensor(np.array(driving_video)[np.newaxis].astype(np.float32)).permute(0,
4, 1, 2, 3)
kp_source = kp_detector(source)
kp_driving_initial = kp_detector(driving[:, :, 0])
for frame_idx in tqdm(range(driving.shape[2])):
driving_frame = driving[:, :, frame_idx]
if not cpu:
driving_frame = driving_frame.cuda()
kp_driving = kp_detector(driving_frame)
kp_norm = normalize_kp(kp_source=kp_source, kp_driving=kp_driving,
kp_driving_initial=kp_driving_initial,
use_relative_movement=relative,
use_relative_jacobian=relative,
adapt_movement_scale=adapt_movement_scale)
out = generator(source, kp_source=kp_source, kp_driving=kp_norm)
predictions.append(np.transpose(out['prediction'].data.cpu().numpy(), [0, 2, 3, 1])
[0])
return predictions
def find_best_frame(source, driving, cpu=False):
import face_alignment
def normalize_kp(kp):
kp = kp - kp.mean(axis=0, keepdims=True)
area = ConvexHull(kp[:, :2]).volume
area = np.sqrt(area)
kp[:, :2] = kp[:, :2] / area
return kp
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=True,
device='cpu' if cpu else 'cuda')
kp_source = fa.get_landmarks(255 * source)[0]
kp_source = normalize_kp(kp_source)
norm = float('inf')
frame_num = 0
for i, image in tqdm(enumerate(driving)):
kp_driving = fa.get_landmarks(255 * image)[0]
kp_driving = normalize_kp(kp_driving)
new_norm = (np.abs(kp_source - kp_driving) ** 2).sum()
if new_norm < norm:
norm = new_norm
frame_num = i
return frame_num
if __name__ == "__main__":
parser = ArgumentParser()
parser.add_argument("--config", required=True, help="path to config")
parser.add_argument("--checkpoint", default='vox-cpk.pth.tar', help="path to checkpoint to
restore")
parser.add_argument("--source_image", default='sup-mat/source.png', help="path to source
image")
parser.add_argument("--driving_video", default='sup-mat/source.png', help="path to driving
video")
parser.add_argument("--result_video", default='result.mp4', help="path to output")
parser.add_argument("--relative", dest="relative", action="store_true", help="use relative
or absolute keypoint coordinates")
parser.add_argument("--adapt_scale", dest="adapt_scale", action="store_true", help="adapt
movement scale based on convex hull of keypoints")
parser.add_argument("--find_best_frame", dest="find_best_frame", action="store_true",
help="Generate from the frame that is the most aligned with source. (Only
for faces, requires face_alignment lib)")
parser.add_argument("--best_frame", dest="best_frame", type=int, default=None,
help="Set frame to start from.")
parser.add_argument("--cpu", dest="cpu", action="store_true", help="cpu mode.")
parser.set_defaults(relative=False)
parser.set_defaults(adapt_scale=False)
opt = parser.parse_args()
source_image = imageio.imread(opt.source_image)
reader = imageio.get_reader(opt.driving_video)
fps = reader.get_meta_data()['fps']
driving_video = []
try:
for im in reader:
driving_video.append(im)
except RuntimeError:
pass
reader.close()
source_image = resize(source_image, (256, 256))[..., :3]
driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]
generator, kp_detector = load_checkpoints(config_path=opt.config,
checkpoint_path=opt.checkpoint, cpu=opt.cpu)
if opt.find_best_frame or opt.best_frame is not None:
i = opt.best_frame if opt.best_frame is not None else find_best_frame(source_image,
driving_video, cpu=opt.cpu)
print ("Best frame: " + str(i))
driving_forward = driving_video[i:]
driving_backward = driving_video[:(i+1)][::-1]
predictions_forward = make_animation(source_image, driving_forward, generator,
kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
predictions_backward = make_animation(source_image, driving_backward, generator,
kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
predictions = predictions_backward[::-1] + predictions_forward[1:]
else:
predictions = make_animation(source_image, driving_video, generator, kp_detector,
relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
imageio.mimsave(opt.result_video, [img_as_ubyte(frame) for frame in predictions], fps=fps)
idk how that fix, pls help.
Caveat: I haven't done this myself, my answer is entirely from Googling.
The error is saying that only one keyword argument is allowed.
This leads me to think that
OcclusionAwareGenerator(**config['model_params']['generator_params'], **config['model_params']['common_params'])
and
KPDetector(**config['model_params']['kp_detector_params'], **config['model_params']['common_params'])
are not valid in IronPython. You may need to merge the dictionaries and pass the combined one in with the ** syntax.
Using the second case as the basis for example purposes, you can use the copy package to create a new dictionary and then populate it with the values of the two existing ones:
import copy
params = copy.deepcopy(config['model_params']['kp_detector_params'])
params.update(config['model_params']['common_params'])
KPDetector(**params)
The deep copy is usually the safest, but copy.copy is an option as well and (based on a few assumptions) will likely not cause any issues.
Another, possibly simpler, option is to use collections.ChainMap to provide a combined view of the two dictionaries:
from collections import ChainMap
KPDetector(**ChainMap(config['model_params']['kp_detector_params'], config['model_params']['common_params']))
While running the following python code in C++ using pybind11, pytorch 1.6.0, I get "Invalid Pointer" error. In python, the code runs successfully without any error. Whats the reason? How can I solve this problem?
import torch
import torch.nn.functional as F
import numpy as np
import cv2
import torchvision
import eval_widerface
import torchvision_model
def resize(image, size):
image = F.interpolate(image.unsqueeze(0), size=size, mode="nearest").squeeze(0)
return image
# define constants
model_path = '/path/to/model.pt'
image_path = '/path/to/image_pad.jpg'
scale = 1.0 #Image resize scale (2 for half size)
font = cv2.FONT_HERSHEY_SIMPLEX
MIN_SCORE = 0.9
image_bgr = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)#skimage.io.imread(args.image_path)
cv2.imshow("input image",image_bgr)
cv2.waitKey()
cv2.destroyAllWindows()
# load pre-trained model
return_layers = {'layer2':1,'layer3':2,'layer4':3}
RetinaFace = torchvision_model.create_retinaface(return_layers)
print('RetinaFace.state_dict().')
retina_dict = RetinaFace.state_dict()
the following function generates error.
def create_retinaface(return_layers,backbone_name='resnet50',anchors_num=3,pretrained=True):
print('In create_retinaface.')
print(resnet.__dict__)
backbone = resnet.__dict__[backbone_name](pretrained=pretrained)
print('backbone.')
# freeze layer1
for name, parameter in backbone.named_parameters():
print('freeze layer 1.');
# if 'layer2' not in name and 'layer3' not in name and 'layer4' not in name:
# parameter.requires_grad_(False)
if name == 'conv1.weight':
# print('freeze first conv layer...')
parameter.requires_grad_(False)
model = RetinaFace(backbone,return_layers,anchor_nums=3)
return model
The statement backbone = resnet.__dict__ [backbone_name](pretrained=pretrained) generated error that looks like
*** Error in `./p': munmap_chunk(): invalid pointer: 0x00007f4461866db0 ***
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x7f3e4)[0x7f44736b43e4]
/usr/local/lib64/libopencv_gapi.so.4.1(_ZNSt10_HashtableISsSsSaISsENSt8__detail9_IdentityESt8equal_toISsESt4hashISsENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb1ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeISsLb1EEE+0xc9)[0x7f4483dee1a9]
/home/20face/.virtualenvs/torch/lib64/python3.6/site-packages/torch/lib/libtorch_python.so(+0x4403b5)[0x7f4460bb73b5]
/home/20face/.virtualenvs/torch/lib64/python3.6/site-packages/torch/lib/libtorch_python.so(+0x44570a)[0x7f4460bbc70a]
/home/20face/.virtualenvs/torch/lib64/python3.6/site-packages/torch/lib/libtorch_python.so(+0x275b20)[0x7f44609ecb20]
/usr/lib64/libpython3.6m.so.1.0(_PyCFunction_FastCallDict+0x147)[0x7f4474307167]
/usr/lib64/libpython3.6m.so.1.0(+0x1507df)[0x7f44743727df]
/usr/lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x3a7)[0x7f44743670f7]
/usr/lib64/libpython3.6m.so.1.0(+0x1505ca)[0x7f44743725ca]
/usr/lib64/libpython3.6m.so.1.0(+0x150903)[0x7f4474372903]
/usr/lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x3a7)[0x7f44743670f7]
/usr/lib64/libpython3.6m.so.1.0(+0x14fb69)[0x7f4474371b69]
/usr/lib64/libpython3.6m.so.1.0(_PyFunction_FastCallDict+0x24f)[0x7f44743739ff]
/usr/lib64/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x10e)[0x7f44742ca1de]
/usr/lib64/libpython3.6m.so.1.0(_PyObject_Call_Prepend+0x61)[0x7f44742ca2f1]
/usr/lib64/libpython3.6m.so.1.0(PyObject_Call+0x43)[0x7f44742c9f63]
/usr/lib64/libpython3.6m.so.1.0(+0xfa7e5)[0x7f447431c7e5]
/usr/lib64/libpython3.6m.so.1.0(+0xf71e2)[0x7f44743191e2]
/usr/lib64/libpython3.6m.so.1.0(PyObject_Call+0x43)[0x7f44742c9f63]
/usr/lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x2067)[0x7f4474368db7]
/usr/lib64/libpython3.6m.so.1.0(PyEval_EvalCodeEx+0x24f)[0x7f4474372c9f]
This line is causing the error because it assumes __dict__ has a backbone_name element:
backbone = resnet.__dict__[backbone_name](pretrained=pretrained)
When that isn't the case, it basically tries to access invalid memory. Check __dict__ first with an if statement or make sure that it has the backbone_name element before trying to use it.
I have 2 GPUs and when I am working with pytorch code, only one GPU is used. I tried CUDA_VISIBLE_DEVICES=0,1 python xxx.py, but occurs
'CUDA_VISIBLE_DEVICES: command not found'
problems. I have also tried to add the following lines in object py file:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
but still only one GPU is utilized.
You need to parallelize the training data to each GPU seperatly. Data Parallelism is implemented using torch.nn.DataParallel. An example from the pytorch documentation:
import torch
import torch.nn as nn
class DataParallelModel(nn.Module):
def __init__(self):
super().__init__()
self.block1 = nn.Linear(10, 20)
# wrap block2 in DataParallel
self.block2 = nn.Linear(20, 20)
self.block2 = nn.DataParallel(self.block2)
self.block3 = nn.Linear(20, 20)
def forward(self, x):
x = self.block1(x)
x = self.block2(x)
x = self.block3(x)
return x
I am using Caffe to do image classification, can I am using MAC OS X, Pyhton.
Right now I know how to classify a list of images using Caffe with Spark python, but if I want to make it faster, I want to use Spark.
Therefore, I tried to apply the image classification on each element of an RDD, the RDD created from a list of image_path. However, Spark does not allow me to do so.
Here is my code:
This is the code for image classification:
# display image name, class number, predicted label
def classify_image(image_path, transformer, net):
image = caffe.io.load_image(image_path)
transformed_image = transformer.preprocess('data', image)
net.blobs['data'].data[...] = transformed_image
output = net.forward()
output_prob = output['prob'][0]
pred = output_prob.argmax()
labels_file = caffe_root + 'data/ilsvrc12/synset_words.txt'
labels = np.loadtxt(labels_file, str, delimiter='\t')
lb = labels[pred]
image_name = image_path.split(images_folder_path)[1]
result_str = 'image: '+image_name+' prediction: '+str(pred)+' label: '+lb
return result_str
This this the code generates Caffe parameters and apply the classify_image method on each element of the RDD:
def main():
sys.path.insert(0, caffe_root + 'python')
caffe.set_mode_cpu()
model_def = caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt'
model_weights = caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'
net = caffe.Net(model_def,
model_weights,
caffe.TEST)
mu = np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy')
mu = mu.mean(1).mean(1)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_mean('data', mu)
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2,1,0))
net.blobs['data'].reshape(50,
3,
227, 227)
image_list= []
for image_path in glob.glob(images_folder_path+'*.jpg'):
image_list.append(image_path)
images_rdd = sc.parallelize(image_list)
transformer_bc = sc.broadcast(transformer)
net_bc = sc.broadcast(net)
image_predictions = images_rdd.map(lambda image_path: classify_image(image_path, transformer_bc, net_bc))
print image_predictions
if __name__ == '__main__':
main()
As you can see, here I tried to broadcast the caffe parameters, transformer_bc = sc.broadcast(transformer), net_bc = sc.broadcast(net)
The error is:
RuntimeError: Pickling of "caffe._caffe.Net" instances is not enabled
Before I am doing the broadcast, the error was :
Driver stacktrace.... Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):....
So, do you know, is there any way I can classify images using Caffe and Spark but also take advantage of Spark?
When you work with complex, non-native objects initialization has to moved directly to the workers for example with singleton module:
net_builder.py:
import cafe
net = None
def build_net(*args, **kwargs):
... # Initialize net here
return net
def get_net(*args, **kwargs):
global net
if net is None:
net = build_net(*args, **kwargs)
return net
main.py:
import net_builder
sc.addPyFile("net_builder.py")
def classify_image(image_path, transformer, *args, **kwargs):
net = net_builder.get_net(*args, **kwargs)
It means you'll have to distribute all required files as well. It can be done either manually or using SparkFiles mechanism.
On a side note you should take a look at the SparkNet package.