Bidirectional Awareness Induction in Autoregressive Sequence-to-Sequence Models
Keywords:
Image processing & computer vision, Natural language processing, Artificial intelligence

Abstract
Autoregressive Sequence-to-Sequence (Seq2Seq) models are the foundation of many Deep Learning achievements in major research fields such as Vision and Natural Language Processing. However, their limitations have motivated researchers to explore alternative architectures and methodologies toward bidirectional solutions. In this work, we introduce Bidirectional Awareness Induction (BAI), a flexible training method that, through bidirectional loss terms, enhances the information retained in a subset of the network's results, which we call the pivot. Our method yields improvements across multiple architectures, including the Transformer, ExpansionNet v2, Flan-T5-Small, GPT-2, and mBART, and across tasks such as Neural Machine Translation, Image Captioning, and Text Summarization, with observed improvements of 4.96 BLEU, 2.4 CIDEr-D, and 1.16 ROUGE, respectively. Compared to existing methods, BAI requires no architectural modifications; it is flexible, efficient, and can be applied to pre-trained models.
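To make the core idea concrete, the following is a minimal PyTorch sketch of one way a bidirectional auxiliary loss on a pivot could look. The class name `BidirectionalLossWrapper`, the reversed-sequence target for the backward term, and the `beta` weighting are illustrative assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch of a BAI-style training objective, assuming the "pivot"
# is a tensor of decoder hidden states. Hypothetical names and weighting;
# this is not the paper's exact formulation.
import torch
import torch.nn as nn


class BidirectionalLossWrapper(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int, beta: float = 0.5):
        super().__init__()
        # Standard left-to-right head plus an auxiliary right-to-left head,
        # both applied to the same pivot states.
        self.fwd_head = nn.Linear(hidden_dim, vocab_size)
        self.bwd_head = nn.Linear(hidden_dim, vocab_size)
        self.beta = beta  # weight of the auxiliary bidirectional term
        self.ce = nn.CrossEntropyLoss()

    def forward(self, pivot: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # pivot:   (batch, seq_len, hidden_dim) decoder states
        # targets: (batch, seq_len) token labels
        fwd_logits = self.fwd_head(pivot)
        # Auxiliary term: predict the time-reversed sequence, so the pivot
        # is encouraged to also encode information about future tokens.
        bwd_logits = self.bwd_head(pivot)
        fwd_loss = self.ce(fwd_logits.flatten(0, 1), targets.flatten())
        bwd_loss = self.ce(bwd_logits.flatten(0, 1), targets.flip(1).flatten())
        return fwd_loss + self.beta * bwd_loss


# Toy usage with random pivot states and labels:
loss_fn = BidirectionalLossWrapper(hidden_dim=512, vocab_size=1000)
pivot = torch.randn(2, 7, 512)
targets = torch.randint(0, 1000, (2, 7))
loss = loss_fn(pivot, targets)
loss.backward()
```

Because the extra terms only add output heads and loss computations, a wrapper of this kind is consistent with the abstract's claim that no architectural modification is needed and that pre-trained models can be fine-tuned with it directly.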
License
Copyright (c) 2025 The Author(s)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.