Link: Learning Spatial Pyramid Attentive Pooling in Image Synthesis and Image-to-Image Translation
Overview:
Image synthesis and image-to-image translation are two important generative learning tasks. Remarkable progress has been made by learning Generative Adversarial Networks (GANs) \cite{goodfellow2014generative} and cycle-consistent GANs (CycleGANs) \cite{zhu2017unpaired}, respectively. This paper presents a method of learning Spatial Pyramid Attentive Pooling (SPAP), a novel architectural unit that can be easily integrated into both generators and discriminators in GANs and CycleGANs. The proposed SPAP integrates atrous spatial pyramid pooling (ASPP) \cite{chen2018deeplab}, a proposed cascade attention mechanism, and residual connections \cite{he2016deep}. It leverages the advantages of the three components to facilitate effective end-to-end generative learning: (i) the capability of fusing multi-scale information by ASPP; (ii) the capability of capturing the relative importance across spatial locations (especially multi-scale context) and feature channels by attention; (iii) the capability of preserving information and enhancing optimization feasibility by residual connections. Coarse-to-fine and fine-to-coarse SPAP are studied and intriguing attention maps are observed in both tasks. In experiments, the proposed SPAP is tested in GANs on the CelebA-HQ-128 dataset \cite{karras2017progressive}, and tested in CycleGANs on image-to-image translation datasets including the Cityscapes dataset \cite{cordts2016cityscapes} and the Facades and Aerial Maps datasets \cite{zhu2017unpaired}, obtaining better performance on both tasks.
This paper proposes a learnable Spatial Pyramid Attentive Pooling (SPAP) method, a novel architectural unit that can easily be integrated into the generators and discriminators of GANs and CycleGANs. SPAP combines an atrous spatial pyramid, a cascade attention mechanism, and residual connections, leveraging the strengths of these three components to facilitate end-to-end generative learning: (i) fusing multi-scale information with ASPP; (ii) capturing the relative importance across spatial locations (especially multi-scale context) and feature channels with attention; (iii) preserving information and improving optimization feasibility with residual connections.
It can be seen that SPAP is a module rather than an entire network architecture: both its input and output are feature maps, so it can easily be dropped into existing networks as a plug-in (or, as the paper puts it, used as a new form of pooling). The multi-scale features produced by the atrous pyramid are aggregated level by level by the cascade attention, and the final output combines the attentively fused features with the input through a residual connection.
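As a rough illustration of how such a unit could be wired up, here is a minimal PyTorch sketch assuming a gated, pairwise cascade over the pyramid levels; the layer widths, attention form, and fusion recurrence below are illustrative assumptions, not the authors' implementation.

# Illustrative SPAP-like module, based only on the components named in the paper
# (atrous spatial pyramid, cascade attention, residual connection). All concrete
# design choices here (gate form, channel widths, rates) are assumptions.
import torch
import torch.nn as nn


class SPAPSketch(nn.Module):
    def __init__(self, channels, rates=(1, 2, 4, 8), coarse_to_fine=True):
        super().__init__()
        # Atrous spatial pyramid: one dilated conv per rate, same channel width.
        self.pyramid = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        # One attention head per fusion step: predicts a per-pixel gate in [0, 1].
        self.attention = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(2 * channels, channels, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, 1, kernel_size=1),
                nn.Sigmoid(),
            )
            for _ in rates[1:]
        )
        self.coarse_to_fine = coarse_to_fine

    def forward(self, x):
        # Multi-scale features from the atrous pyramid (fine -> coarse order).
        feats = [conv(x) for conv in self.pyramid]
        if self.coarse_to_fine:
            feats = feats[::-1]  # start the cascade from the coarsest level
        # Cascade attention: gate each new level against the running fusion.
        fused = feats[0]
        for feat, attn in zip(feats[1:], self.attention):
            gate = attn(torch.cat([fused, feat], dim=1))
            fused = gate * feat + (1.0 - gate) * fused
        # Residual connection preserves the input signal.
        return x + fused


if __name__ == "__main__":
    spap = SPAPSketch(channels=64)
    out = spap(torch.randn(2, 64, 32, 32))
    print(out.shape)  # torch.Size([2, 64, 32, 32])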
As for the order in which features at different scales are fused, the authors experimented with different schemes. Larger atrous (dilation) rates produce coarser features, so the authors ran experiments in both the coarse-to-fine and the fine-to-coarse direction.
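Under the sketch above, switching between the two orders simply amounts to flipping the direction of the cascade (again, how the ordering is realized here is an assumption):

# Coarse-to-fine: start the cascade from the largest atrous rate.
spap_c2f = SPAPSketch(channels=64, coarse_to_fine=True)
# Fine-to-coarse: start from the smallest rate instead.
spap_f2c = SPAPSketch(channels=64, coarse_to_fine=False)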
Summary:
The proposed SPAP is a simple yet effective module that harnesses the advantages of three ubiquitous components in neural architecture design in a novel way: the atrous spatial pyramid for capturing multi-scale information, a cascade attention scheme for aggregating information between different levels in the pyramid, and residual connections. The proposed SPAP module is complementary to many existing efforts towards building more accurate and more robust GAN-based generative learning models.
The proposed SPAP complements existing GANs, helping to build more accurate and more robust GAN-based models.