Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources

Adrian Bulat and Georgios Tzimiropoulos


This work addresses landmark localization using binarized approximations of Convolutional Neural Networks (CNNs). Our goal is to design architectures that retain the groundbreaking performance of CNNs for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) We are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation and face alignment. We exhaustively evaluate various design choices, identify performance bottlenecks, and, more importantly, propose multiple orthogonal ways to boost performance. (b) Based on our analysis, we propose a novel hierarchical, parallel and multi-scale residual block architecture that yields a large performance improvement over the standard bottleneck block with the same number of parameters, thus narrowing the gap between the original network and its binarized counterpart. (c) We also show that the performance boost offered by the proposed architecture is not limited to binary networks but also generalizes to real-valued weights and activations. (d) We perform a large number of ablation studies that shed light on the properties and the performance of the proposed block. (e) We present results for experiments on the most challenging datasets for human pose estimation and face alignment, reporting in many cases state-of-the-art performance.
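To make the idea of a "binarized approximation" concrete, the following is a minimal sketch in plain Python. It assumes one common scheme in which a real-valued weight vector is approximated by its signs multiplied by a single scaling factor equal to the mean absolute weight. The text above does not specify the exact binarization scheme, so the function names binarize and binary_dot and the choice of scaling factor are illustrative assumptions, not the authors' implementation.

```python
def binarize(weights):
    """Approximate real-valued weights by alpha * sign(w).

    Assumption: alpha is the mean absolute value of the weights,
    a common choice for this kind of approximation.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    alpha = sum(abs(w) for w in weights) / len(weights)
    # Map every weight to +1 or -1 (zero is treated as +1).
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def binary_dot(alpha, signs, inputs):
    """Dot product using only the signs plus one real-valued scale."""
    return alpha * sum(s * x for s, x in zip(signs, inputs))


# Hypothetical example values, chosen only for illustration.
weights = [0.5, -1.5, 1.0, -1.0]
inputs = [1.0, 1.0, 1.0, 1.0]

alpha, signs = binarize(weights)
approx = binary_dot(alpha, signs, inputs)
exact = sum(w * x for w, x in zip(weights, inputs))
print(alpha, signs, approx, exact)
```

Each weight is then stored as a single sign plus one shared scale per vector, which is where the reduction in model size comes from; the gap between approx and exact illustrates the approximation error that architectural changes aim to reduce.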

Image Predictions

Paper and code

Paper: [arxiv] [pdf]


Download models for Human Pose Estimation and Face Alignment:

Dataset used    Model size    Result
MPII            1.3MB         76.9 (PCK metric, higher is better)
AFLW2000-3D     1.4MB         3.26 (NME metric, lower is better)
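
For readers unfamiliar with the two metrics in the table, the sketch below shows how PCK (Percentage of Correct Keypoints) and NME (Normalized Mean Error) are typically computed for 2D landmarks. The reference length, the threshold of 0.5, the normalization length and the choice to report NME as a percentage are assumptions made for illustration; the exact evaluation protocol behind the numbers above is not stated on this page.

```python
import math


def _distance(p, g):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - g[0], p[1] - g[1])


def pck(pred, gt, ref_length, threshold=0.5):
    """Percentage of keypoints within threshold * ref_length of the ground truth.

    Assumption: ref_length is a per-sample reference size (for example a
    head or torso size) and threshold is a fraction of it.
    """
    hits = sum(1 for p, g in zip(pred, gt)
               if _distance(p, g) <= threshold * ref_length)
    return 100.0 * hits / len(gt)


def nme(pred, gt, norm_length):
    """Mean point-to-point error divided by norm_length, as a percentage.

    Assumption: norm_length is a per-sample normalizer such as the
    bounding-box size.
    """
    total = sum(_distance(p, g) for p, g in zip(pred, gt))
    return 100.0 * total / (len(gt) * norm_length)


# Hypothetical landmarks, chosen only for illustration.
gt = [(0, 0), (10, 0), (0, 10), (10, 10)]
pred = [(3, 4), (10, 0), (0, 10), (10, 16)]

print(pck(pred, gt, ref_length=10))   # 3 of 4 points fall within 5 pixels
print(nme(pred, gt, norm_length=10))  # mean error of 2.75 pixels over length 10
```

PCK counts how many landmarks land close enough to the ground truth, so higher values are better, while NME averages the actual distances, so lower values are better.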

Note: More models will be added soon.