Publications

  • New VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
    PDF
  • New Restore, Assess, Repeat: A Unified Framework for Iterative Image Restoration

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
    PDF Code
  • New Compress & Cache: Vision token compression for efficient generation and retrieval

    Advances in Neural Information Processing Systems (NeurIPS), 2025
    PDF
  • New Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions

    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
    PDF Code
  • New VladVA: Discriminative Fine-tuning of LVLMs

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
    PDF
  • New FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
    PDF
  • QBB: Quantization with Binary Bases for LLMs

    Advances in Neural Information Processing Systems (NeurIPS), 2024
    PDF
  • Efficient Vision-Language pre-training via domain-specific learning for human activities

    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
    PDF
  • Knowledge Distillation Meets Open-Set Semi-Supervised Learning

    International Journal of Computer Vision (IJCV), 2024
    PDF Code
  • New CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs

    European Conference on Computer Vision (ECCV), 2024
    PDF
  • New You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation

    European Conference on Computer Vision (ECCV), 2024
    PDF
  • New FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
    PDF
  • ReGen: A good Generative zero-shot video classifier should be Rewarded

    International Conference on Computer Vision (ICCV), 2023
    PDF
  • Black Box Few-Shot Adaptation for Vision-Language models

    International Conference on Computer Vision (ICCV), 2023
    PDF Code
  • FS-DETR: Few-Shot DEtection TRansformer with prompting and without re-training

    International Conference on Computer Vision (ICCV), 2023
    PDF
  • Bayesian Prompt Learning for Image-Language Model Generalization

    International Conference on Computer Vision (ICCV), 2023
    PDF Code
  • LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
    PDF Code
  • Pre-training strategies and datasets for facial representation learning

    European Conference on Computer Vision (ECCV), 2022
    PDF Code
  • EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers

    European Conference on Computer Vision (ECCV), 2022
    PDF
  • Space-time Mixing Attention for Video Transformer

    Advances in Neural Information Processing Systems (NeurIPS), 2021
    PDF Code
  • Bit-Mixer: Mixed-precision networks with runtime bit-width selection

    International Conference on Computer Vision (ICCV), 2021
    PDF
  • High-Capacity Expert Binary Networks

    International Conference on Learning Representations (ICLR), 2021
    PDF Code
  • Improving memory banks for unsupervised learning with large mini-batch, consistency and hard negative mining

    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
    PDF
  • Knowledge Distillation via Softmax Regression Representation Learning

    International Conference on Learning Representations (ICLR), 2021
    PDF Code
  • Subpixel Heatmap Regression for Facial Landmark Localization

    British Machine Vision Conference (BMVC), 2021
    PDF Code
  • Estimation of continuous valence and arousal levels from faces in naturalistic conditions

    Nature Machine Intelligence, 2021
    PDF
  • BATS: Binary ArchitecTure Search

    European Conference on Computer Vision (ECCV), 2020
    PDF
  • Training binary neural networks with real-to-binary convolutions

    International Conference on Learning Representations (ICLR), 2020
    PDF
  • Toward fast and accurate human pose estimation via soft-gated skip connections

    IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2020 (ORAL)
    PDF
  • Semi-supervised AU Intensity Estimation with Contrastive Learning

    Asian Conference on Computer Vision (ACCV), 2020
    PDF
  • Incremental multi-domain learning with network latent tensor factorization

    AAAI Conference on Artificial Intelligence, 2020
    PDF
  • FAN-Face: a Simple Orthogonal Improvement to Deep Face Recognition

    AAAI Conference on Artificial Intelligence, 2020
    PDF
  • Factorized Higher-Order CNNs with an Application to Spatio-Temporal Emotion Estimation

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020
    PDF
  • XNOR-Net++: Improved binary neural networks

    British Machine Vision Conference (BMVC), 2019
    PDF Code
  • T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor

    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019
    PDF
  • To learn image super-resolution, use a GAN to learn how to do image degradation first

    European Conference on Computer Vision (ECCV), 2018
    PDF Code
  • Super-FAN: Integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with GANs

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (SPOTLIGHT)
    PDF
  • Hierarchical binary CNNs for landmark localization with limited resources

    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018 - Best of ICCV17 SI
    PDF Code
  • Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression

    International Conference on Computer Vision (ICCV), 2017
    PDF
  • How far are we from solving the 2D & 3D Face Alignment problem? (and a dataset of 230,000 3D facial landmarks)

    International Conference on Computer Vision (ICCV), 2017
    PDF Code
  • Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources

    International Conference on Computer Vision (ICCV), 2017 (ORAL)
    PDF Code
  • Two-stage convolutional part heatmap regression for the 1st 3D face alignment in the wild (3DFAW) challenge

    European Conference on Computer Vision Workshop (ECCV-W), 2016 (Challenge Winners)
    PDF
  • Human pose estimation via convolutional part heatmap regression

    European Conference on Computer Vision (ECCV), 2016
    PDF Code
  • Convolutional aggregation of local evidence for large pose face alignment

    British Machine Vision Conference (BMVC), 2016
    PDF