mdl

YOLO Object Detectors, You Only Look Once

This repository contains the implementation of the following papers.

YOLOv3: An Incremental Improvement: Paper
YOLOv4: Optimal Speed and Accuracy of Object Detection: Paper
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors: Paper

Description

YOLO v1 the original implementation was released in 2015 providing a ground-breaking algorithm that would quickly process images and locate objects in a single pass through the detector. The original implementation used a backbone derived from state of the art object classifiers of the time, like GoogLeNet and VGG. More attention was given to the novel YOLO Detection head that allowed for Object Detection with a single pass of an image. Though limited, the network could predict up to 90 bounding boxes per image, and was tested for about 80 classes per box. Also, the model can only make predictions at one scale. These attributes caused YOLO v1 to be more limited and less versatile, so as the year passed, the Developers continued to update and develop this model.

In 2020, YOLO v3 and v4 serve as the upgrades of the YOLO network group. The model uses a custom backbone called Darknet53 that uses knowledge gained from the ResNet paper to improve its predictions. The new backbone also allows for objects to be detected at multiple scales. As for the new detection head, the model now predicts the bounding boxes using a set of anchor box priors (Anchor Boxes) as suggestions. Multiscale predictions in combination with Anchor boxes allow for the network to make up to 1000 object predictions on a single image. Finally, the new loss function forces the network to make better predictions by using Intersection over Union (IoU) to inform the model’s confidence rather than relying on the mean squared error for the entire output.

As of 2023, YOLOv7 further improves the previous versions of YOLOs by introducing ELAN and E-ELAN structures. These new architectures are designed to diversify the gradients (Designing Network Design Strategies Through Gradient Path Analysis) so that the learned models are more expressive. In addition, YOLOv7 introduces auxiliary losses to enhance training, as well as re-parameterization to improve inference speed. Apart from what is mentioning in the paper, YOLOv7 also uses OTA loss (OTA: Optimal Transport Assignment for Object Detection) which gives more gains on mAP.

Authors

YOLOv3 & v4

Vishnu Samardh Banna (@GitHub vishnubanna)
Anirudh Vegesana (@GitHub anivegesana)
Akhil Chinnakotla (@GitHub The-Indian-Chinna)
Tristan Yan (@GitHub Tyan3001)
Naveen Vivek (@GitHub naveen-vivek)
Jacob Zietek (@GitHub jacob-zietek)

YOLOv7

Jiageng Zhang (@Github Zarjagen)

Our Goal
Models in the library
References

Our Goal

Our goal with this model conversion is to provide implementation of the Backbone and YOLO Head. We have built the model in such a way that the YOLO head could be connected to a new, more powerful backbone if a person chose to.

Models in the library

Object Detectors	Classifiers
Yolo-v3	Darknet53
Yolo-v3 tiny	CSPDarknet53
Yolo-v3 spp
Yolo-v4
Yolo-v4 tiny
Yolo-v4 csp
Yolo-v4 large
Yolo-v7
Yolo-v7-tiny
Yolo-v7X

Requirements

This site is open source. Improve this page.

mdl