Tuesday, September 26, 2017

Pass A Video into Tensorflow Object Detection API

To get video into the Tensorflow Object Detection API, you will need to convert the video to images, pass those images into the Tensorflow Object Detection API, which creates new images with the detected objects drawn on them, and then convert those images back into a video.

It's a pretty simple process.  The most difficult part is just installing all the dependencies.



Example Videos Produced

It is kinda funny to see all the dogs and how they are labeled as cats, birds and cows.

Produced using the ssd_mobilenet_v1_coco model:



Produced using the faster_rcnn_inception_resnet_v2_atrous_coco model:





Original Video


Install Dependencies

I followed the instructions from the Tensorflow Object Detection API website.  All of this was done on macOS on a MacBook, but on Linux (or Windows 10 with the Linux subsystem) you can use apt-get to install everything as well.

You can find the source code here:

git clone https://github.com/ricorx7/tensorflow_object_detection/


First, let's check out the code from GitHub:

git clone https://github.com/tensorflow/models/

Now let's create a virtualenv and activate it.  Make sure you have Python 3.5 installed on your computer.  If you do not, there are many ways to install a specific Python version.
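For example, one option on macOS (this assumes you have Homebrew; pyenv is just one of many ways to do it) is:

# Install a specific Python version with pyenv (one option among many)
brew install pyenv
pyenv install 3.5.4
pyenv local 3.5.4

With Python 3.5 available, create and activate the virtualenv: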

virtualenv env -p python3.5
source env/bin/activate


This will create a folder named env, and then we activate the virtualenv.  Now let's install the dependencies into the virtualenv.


brew install protobuf

# Since the virtualenv is active, plain pip installs into it (no sudo needed)
pip install tensorflow
# or, if your computer has a supported video card, the GPU version:
pip install tensorflow-gpu

pip install pillow
pip install lxml
pip install jupyter
pip install matplotlib

# Video import and export
brew install ffmpeg
brew install opencv
pip install opencv-python
pip install moviepy
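You can quickly confirm that Tensorflow installed into the virtualenv with an import check:

python -c "import tensorflow as tf; print(tf.__version__)"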


We now need to run the protoc compile command given in the Tensorflow Object Detection API instructions, and set the PYTHONPATH to include the correct folders.


cd models/research/

# From tensorflow/models/research/
protoc object_detection/protos/*.proto --python_out=.

# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
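
The installation instructions also include a test script you can run to verify the Object Detection API is set up correctly:

# From tensorflow/models/research/
python object_detection/builders/model_builder_test.py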

Code written to run with the Tensorflow Object Detection API will be placed in models/research/object_detection.  This way all the libraries are there.

Create a folder in the top level named video_output.  Within that folder, create a folder named output.

mkdir video_output
mkdir video_output/output

Folder Structure

models
  - research
    + object_detection
        - main.py
        - ....
    + ....
video_output
  - output
convert_video_to_images.py
convert_images_to_video.py
ssd_mobilenet_v1_coco_11_06_2017.tar.gz
faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017.tar.gz
...

The tar.gz files are the models that you can run.  There are other models, and they each have different trade-offs: some detect more accurately but run more slowly, and some run fast but are less accurate.  In main.py you can change which model to use.

The detection model zoo page (linked in the code comments below) has an explanation of all the models along with download links.
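If you would rather download the model files by hand than uncomment the download code in main.py, they come from the same DOWNLOAD_BASE URL used in the code.  For example:

curl -O http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
curl -O http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017.tar.gz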

Convert Video to Images


This will convert the video dog_video.mp4 to images and put them all in video_output.  You can select any video you like, and it does not have to be an .mp4 file.  Make sure there is enough zero-padding in the image file names or your images will be loaded out of order.

# Convert the video to images and store them in video_output
import cv2

vc = cv2.VideoCapture("dog_video.mp4")
c = 1

# Read the first frame if the video opened successfully
if vc.isOpened():
    rval, frame = vc.read()
else:
    rval = False

while rval:
    # Zero-pad the frame number so the files sort in the right order
    cv2.imwrite('video_output/' + str(c).zfill(7) + '.jpg', frame)
    c = c + 1
    rval, frame = vc.read()

vc.release()
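
If you want the final exported video to play at the original speed, you can also read the source frame rate here and reuse it in the moviepy step at the end.  A minimal sketch:

# Read the source frame rate so the export step can match it
import cv2
vc = cv2.VideoCapture("dog_video.mp4")
fps = vc.get(cv2.CAP_PROP_FPS)
print("Source FPS:", fps)
vc.release()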


Pass Video Images into Tensorflow Object Detection API

This source code was adapted from the object_detection_tutorial.ipynb notebook in the Tensorflow models repository.

I commented out downloading the model, but you can uncomment those lines if you would like the model downloaded automatically.

I also changed it so it is not hard-coded to look for 2 images.  It now looks in the video_output folder for all .jpg files and adds them to the list to read in.


import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import glob

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md
# https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

# This would be needed to display the images in a Jupyter notebook.
#%matplotlib inline

# This is needed since the script is run from the object_detection folder.
sys.path.append("..")

from utils import label_map_util
from utils import visualization_utils as vis_util

# What model to download.
#MODEL_NAME = '../../../ssd_mobilenet_v1_coco_11_06_2017'                           # Fast
MODEL_NAME = '../../../faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017'      # Slow Best results
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

NUM_CLASSES = 90

#opener = urllib.request.URLopener()
#opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())


detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')


label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

# Find all the .jpg images in the video_output folder
PATH_TO_TEST_IMAGES_DIR = '../../../video_output'
PATH_TO_OUTPUT_IMAGES_DIR = PATH_TO_TEST_IMAGES_DIR + "/output"
file_list = sorted(glob.glob(PATH_TO_TEST_IMAGES_DIR + os.sep + '*.jpg'))  # Get all the jpgs, sorted so the frames stay in order
TEST_IMAGE_PATHS = file_list
print("Test Images")
print(TEST_IMAGE_PATHS)


# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)

with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    # Define the input and output Tensors for detection_graph
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')

    # Each box represents a part of the image where a particular object was detected.
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')

    # Each score represents the level of confidence for each of the objects.
    # Score is shown on the result image, together with the class label.
    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    img_idx = 0

    for image_path in TEST_IMAGE_PATHS:
      # Open the image file
      image = Image.open(image_path)

      # the array based representation of the image will be used later in order to prepare the
      # result image with boxes and labels on it.
      image_np = load_image_into_numpy_array(image)

      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)

      # Actual detection.
      (boxes, scores, classes, num) = sess.run(
          [detection_boxes, detection_scores, detection_classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      # Visualization of the results of a detection.
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)
      plt.figure(figsize=IMAGE_SIZE)
      print("Show Image")
      #plt.imshow(image_np)
      #plt.show()
      im = Image.fromarray(image_np)
      im.save(PATH_TO_OUTPUT_IMAGES_DIR + "/" + str(img_idx).zfill(7) + ".jpg")
      img_idx += 1


Convert Images Back to Video

This will convert the images produced by the Tensorflow Object Detection API back into an MP4 video.  The new video will be placed at video_output/output/output.mp4.

Note:
Make sure your output folder does not contain a .DS_Store file or this code will not work.
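One simple way to clear it out from the shell:

find video_output/output -name .DS_Store -delete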

from moviepy.editor import ImageSequenceClip
clip = ImageSequenceClip("video_output/output", fps=2)
clip.write_videofile("video_output/output/output.mp4", fps=2)  # many options available
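
Note that fps=2 plays the frames back at 2 frames per second, which will be much slower than the original video.  To keep the original playback speed, pass the frame rate you read with cv2.CAP_PROP_FPS (see the sketch in the Convert Video to Images section) as the fps argument instead.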
