Mask R-CNN with deep mask heads

This project brings insights from the DeepMAC model into the Mask R-CNN architecture. For more details, see the paper "The surprising impact of mask-head architecture on novel class segmentation".

Code structure

Prerequisites

Prepare dataset

Use create_coco_tf_record.py to create the COCO dataset. The data needs to be stored in a Google Cloud Storage bucket so that it can be accessed by the TPU.

Start a TPU v3-32 instance

See TPU Quickstart for instructions. An example command would look like:

ctpu up --name <tpu-name> --zone <zone> --tpu-size=v3-32 --tf-version nightly

This model requires TF version >= 2.5. Currently, that is only available via a nightly build on Cloud.
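Since the requirement is a version floor, it can be worth checking the installed version before launching a job. The following is a minimal sketch and not part of this project: `version_ge` is a hypothetical helper name, and it assumes GNU `sort` with the `-V` (version sort) flag, which is available on typical Linux hosts.

```shell
# Hypothetical helper (not part of this repository): succeeds when
# version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V flag orders version strings.
version_ge() {
  local have="$1" need="$2"
  # The smaller of the two versions sorts first; if that is the required
  # version, the installed one is at least as new.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Example with literal version strings.
if version_ge "2.6.0" "2.5"; then
  echo "TensorFlow version is new enough"
else
  echo "TensorFlow version is too old"
fi
```

On the TPU host, the installed version could be obtained with `python3 -c 'import tensorflow as tf; print(tf.__version__)'` and passed as the first argument.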

Install requirements

SSH into the TPU host with gcloud compute ssh <tpu-name> and execute the following.

$ git clone https://github.com/tensorflow/models.git
$ cd models
$ pip3 install -r official/requirements.txt

Training Models

The configurations can be found in the configs/experiments directory. You can launch a training job by executing the following commands.

$ export CONFIG=./official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r50.yaml
$ export MODEL_DIR="gs://<path-for-checkpoints>"
$ export ANNOTATION_FILE="gs://<path-to-coco-annotation-json>"
$ export TRAIN_DATA="gs://<path-to-train-data>"
$ export EVAL_DATA="gs://<path-to-eval-data>"
# Overrides to access data. These can also be changed in the config file.
$ export OVERRIDES="task.validation_data.input_path=${EVAL_DATA},\
task.train_data.input_path=${TRAIN_DATA},\
task.annotation_file=${ANNOTATION_FILE},\
runtime.distribution_strategy=tpu"

$ python3 -m official.projects.deepmac_maskrcnn.train \
  --logtostderr \
  --mode=train_and_eval \
  --experiment=deep_mask_head_rcnn_resnetfpn_coco \
  --model_dir=$MODEL_DIR \
  --config_file=$CONFIG \
  --params_override=$OVERRIDES \
  --tpu=<tpu-name>

CONFIG can be any file in the configs/experiments directory. When using SpineNet models, specify --experiment=deep_mask_head_rcnn_spinenet_coco.
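Because --params_override receives the overrides as one comma-separated string, a stray space or line break inside the value is easy to miss. As a rough sketch, the string can be assembled and inspected before launching a job. The bucket paths below are placeholders invented for illustration, not real locations.

```shell
# Placeholder paths, invented for illustration only.
TRAIN_DATA="gs://example-bucket/train"
EVAL_DATA="gs://example-bucket/eval"
ANNOTATION_FILE="gs://example-bucket/annotations.json"

# Inside double quotes, a backslash at the end of a line joins it with the
# next one, so the result is a single line with no embedded whitespace.
OVERRIDES="task.validation_data.input_path=${EVAL_DATA},\
task.train_data.input_path=${TRAIN_DATA},\
task.annotation_file=${ANNOTATION_FILE},\
runtime.distribution_strategy=tpu"

# Print one override per line to make the string easy to review.
printf '%s\n' "$OVERRIDES" | tr ',' '\n'

# Warn if whitespace slipped into the string.
case "$OVERRIDES" in
  *[[:space:]]*) echo "warning: OVERRIDES contains whitespace" >&2 ;;
  *) echo "OVERRIDES looks well-formed" ;;
esac
```

Keeping the continuation lines flush with the left margin matters here: any indentation inside the quotes would become part of the override string.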

Note: The default eval batch size of 32 discards some samples during validation. For accurate validation statistics, launch a dedicated eval job on a TPU v3-8 and set the batch size to 8.
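The arithmetic behind this note can be checked directly. The sketch below assumes that the discarded samples come from dropping the final partial batch, and uses the 5,000 images of COCO val2017. The override name task.validation_data.global_batch_size is an assumption based on the data config fields used elsewhere in the TensorFlow Model Garden; verify it against your config file before relying on it.

```shell
# COCO val2017 contains 5,000 images.
NUM_EVAL_IMAGES=5000

# Images left over (and, under the assumption above, discarded) per batch size.
echo "batch size 32 leaves $((NUM_EVAL_IMAGES % 32)) images over"  # prints 8
echo "batch size 8 leaves $((NUM_EVAL_IMAGES % 8)) images over"    # prints 0

# Assumed override for a dedicated eval job, to be appended to OVERRIDES.
EVAL_OVERRIDES="task.validation_data.global_batch_size=8"
echo "extra override: ${EVAL_OVERRIDES}"
```

A dedicated eval job would then presumably reuse the training command with --mode=eval and this override added to OVERRIDES.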

Configurations

In the following table, we report the Mask mAP of our models on the non-VOC classes when training with masks for only the VOC classes. Performance is measured on the COCO val2017 set.

| Backbone     | Mask head    | Config name                                   | Mask mAP |
|--------------|--------------|-----------------------------------------------|----------|
| ResNet-50    | Default      | deep_mask_head_rcnn_voc_r50.yaml              | 25.9     |
| ResNet-50    | Hourglass-52 | deep_mask_head_rcnn_voc_r50_hg52.yaml         | 33.1     |
| ResNet-101   | Hourglass-52 | deep_mask_head_rcnn_voc_r101_hg52.yaml        | 34.4     |
| SpineNet-143 | Hourglass-52 | deep_mask_head_rcnn_voc_spinenet143_hg52.yaml | 38.7     |

Checkpoints

This model takes an image and boxes as input and produces per-box instance masks as output.

See also

Citation

@misc{birodkar2021surprising,
      title={The surprising impact of mask-head architecture on novel class segmentation}, 
      author={Vighnesh Birodkar and Zhichao Lu and Siyang Li and Vivek Rathod and Jonathan Huang},
      year={2021},
      eprint={2104.00613},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}