{"id":477,"date":"2022-07-03T08:55:12","date_gmt":"2022-07-03T08:55:12","guid":{"rendered":"https:\/\/blog.liguanxin.cn\/?p=477"},"modified":"2022-07-13T10:25:08","modified_gmt":"2022-07-13T10:25:08","slug":"%e8%ae%ba%e6%96%87%e7%ac%94%e8%ae%b0-cvpr-2022vision-transformer-with-deformable-attention","status":"publish","type":"post","link":"https:\/\/blog.liguanxin.cn\/index.php\/2022\/07\/03\/%e8%ae%ba%e6%96%87%e7%ac%94%e8%ae%b0-cvpr-2022vision-transformer-with-deformable-attention\/","title":{"rendered":"Paper Notes: [CVPR 2022] Vision Transformer with Deformable Attention"},"content":{"rendered":"<p><strong>Contributions:<br \/>\n\u2460 Selects the K and V pairs used in self-attention in a data-dependent way.<br \/>\n\u2461 Learns offset coordinates with a small network and samples the shifted positions by bilinear interpolation.<\/strong><\/p>\n<h2>Comparison with Other Networks (Attention Modules)<\/h2>\n<p><img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/07\/\u5fae\u4fe1\u622a\u56fe_20220702214659.png\" alt=\"\" \/><br \/>\n(a) Full self-attention over all patches (enormous compute and memory cost)<br \/>\n(b) Self-attention within partitioned windows (sparse attention; patches cannot attend to more distant data, and the windows are data-agnostic)<br \/>\n(c) Deformable convolution, which learns which region to convolve (but the convolution region cannot be enlarged)<br \/>\n(d) The method in this paper, which learns its own attention regions<\/p>\n<p><strong>Deformable Convolution (DCN):<\/strong><br \/>\n<img 
src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/07\/\u5fae\u4fe1\u622a\u56fe_20220702215246.png\" alt=\"\" \/><br \/>\nAssume a 3\u00d73 convolution region. A 3\u00d73 conv is first applied to the whole feature map to produce an h\u00d7w\u00d7(2\u00d7k\u00d7k) offset matrix, so every point obtains 2\u00d7k\u00d7k position offsets (the fractional offset positions are resolved to concrete values by bilinear interpolation); an ordinary convolution is then applied at the shifted positions.<br \/>\n<img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/07\/v2-0e97bd304ed01fbad6b345975949624f_r.jpg\" alt=\"\" \/><\/p>\n<h2>Attention Structure<\/h2>\n<p><img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/07\/\u5fae\u4fe1\u622a\u56fe_20220702222840.png\" alt=\"\" \/><br \/>\n(a) Reference points are placed uniformly, one set per group. An Offset network learns an offset for each reference point, and the shifted coordinates serve two purposes:<\/p>\n<ul>\n<li>find the four nearest grid points and compute the sampled patch by bilinear interpolation<\/li>\n<li>compute the relative position bias B<\/li>\n<\/ul>\n<p>(b) The Offset network consists of a 5\u00d75 depthwise convolution (stride=r), a GELU activation, and a 1\u00d71 convolution (out_channels=2, no bias).<\/p>\n<h2>Bilinear Interpolation<\/h2>\n<p>The bilinearly interpolated x is given by:<br \/>\n<img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/07\/\u5fae\u4fe1\u622a\u56fe_20220703150732.png\" alt=\"\" 
\/><\/p>\n<p>where <img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/07\/\u5fae\u4fe1\u622a\u56fe_20220703150754.png\" alt=\"\" \/> picks out the grid points within one unit of the shifted point; anything farther than one unit contributes 0. rx and ry range over all grid coordinates.<br \/>\nBilinear interpolation can be understood with the figure below:<br \/>\n<img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/07\/\u5fae\u4fe1\u622a\u56fe_20220703150621.png\" alt=\"\" \/><\/p>\n<h2>Relative Position Bias<\/h2>\n<p>Since the feature map is full-size H\u00d7W, the relative position bias table should also be full-size: <span class=\"katex-eq\" data-katex-display=\"false\">\\hat{B} \\in \\mathbb{R}^{(2H-1) \\times (2W-1)}<\/span><\/p>\n<h2>Overall Network Architecture<\/h2>\n<p><img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/07\/\u5fae\u4fe1\u622a\u56fe_20220703152712.png\" alt=\"\" \/><br \/>\nThe authors verify experimentally that using Swin attention in the first two stages and DAT in the last two stages gives the best results.<\/p>\n<p>Per-stage parameters:<br \/>\n<img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/07\/\u5fae\u4fe1\u622a\u56fe_20220703152925.png\" alt=\"\" \/><\/p>\n<h2>CODE<\/h2>\n<p>The DAttention code<\/p>\n<pre><code class=\"language-python\">    def forward(self, x):\n\n        B, C, H, W = x.size()\n        dtype, device = x.dtype, x.device\n\n        # project the input with a 1x1 convolution to get q\n        q = self.proj_q(x)\n        # split q into g groups\n        q_off = einops.rearrange(q, &#039;b (g c) h w -&gt; (b g) c h w&#039;, g=self.n_groups, 
c=self.n_group_channels)\n        # run the grouped queries through the offset network\n        offset = self.conv_offset(q_off) # B * g 2 Hg Wg\n        Hk, Wk = offset.size(2), offset.size(3)\n        # number of sampled positions\n        n_sample = Hk * Wk\n\n        # compute the offsets: squash to (-1, 1) with tanh, divide by the grid height\/width,\n        # then scale by the maximum offset offset_range_factor\n        if self.offset_range_factor &gt; 0:\n            offset_range = torch.tensor([1.0 \/ Hk, 1.0 \/ Wk], device=device).reshape(1, 2, 1, 1)\n            offset = offset.tanh().mul(offset_range).mul(self.offset_range_factor)\n\n        offset = einops.rearrange(offset, &#039;b p h w -&gt; b h w p&#039;)\n        # uniformly spaced reference points\n        reference = self._get_ref_points(Hk, Wk, B, dtype, device)\n\n        if self.no_off:\n            offset = offset.fill_(0.0)\n\n        if self.offset_range_factor &gt;= 0:\n            # add the offsets, bounded to (-offset_range_factor, +offset_range_factor), to the reference points\n            pos = offset + reference\n        else:\n            pos = (offset + reference).tanh()\n\n        # grid_sample takes an input and a grid; for each grid location it reads the given\n        # coordinates (pixel coordinates into input) and writes the corresponding input pixel\n        # to the output, bilinearly interpolating at non-integer positions\n        x_sampled = F.grid_sample(\n            input=x.reshape(B * self.n_groups, self.n_group_channels, H, W), \n            grid=pos[..., (1, 0)], # y, x -&gt; 
x, y\n            mode=&#039;bilinear&#039;, align_corners=True) # B * g, Cg, Hg, Wg\n\n        x_sampled = x_sampled.reshape(B, C, 1, n_sample)\n\n        # q is already computed; k and v come from the sampled features x_sampled\n        q = q.reshape(B * self.n_heads, self.n_head_channels, H * W)\n        k = self.proj_k(x_sampled).reshape(B * self.n_heads, self.n_head_channels, n_sample)\n        v = self.proj_v(x_sampled).reshape(B * self.n_heads, self.n_head_channels, n_sample)\n\n        # q times k\n        attn = torch.einsum(&#039;b c m, b c n -&gt; b m n&#039;, q, k) # B * h, HW, Ns\n        # apply the attention scale\n        attn = attn.mul(self.scale)\n\n        # positional bias\n        if self.use_pe:\n\n            if self.dwc_pe:\n                # depthwise-conv positional encoding, added only to the final output\n                residual_lepe = self.rpe_table(q.reshape(B, C, H, W)).reshape(B * self.n_heads, self.n_head_channels, H * W)\n            elif self.fixed_pe:\n                # fixed bias (a learned bias for every absolute q\/k position pair)\n                rpe_table = self.rpe_table\n                attn_bias = rpe_table[None, ...].expand(B, -1, -1, -1)\n                attn = attn + attn_bias.reshape(B * self.n_heads, H * W, self.n_sample)\n            else:\n                # relative bias (a learned bias for every relative q\/k displacement)\n                rpe_table = self.rpe_table\n                rpe_bias = rpe_table[None, ...].expand(B, -1, -1, -1)\n\n                q_grid = self._get_ref_points(H, W, B, dtype, device)\n\n                displacement = (q_grid.reshape(B * self.n_groups, H * W, 2).unsqueeze(2) - pos.reshape(B * self.n_groups, n_sample, 2).unsqueeze(1)).mul(0.5)\n\n                attn_bias = F.grid_sample(\n                    input=rpe_bias.reshape(B * self.n_groups, 
self.n_group_heads, 2 * H - 1, 2 * W - 1),\n                    grid=displacement[..., (1, 0)],\n                    mode=&#039;bilinear&#039;, align_corners=True\n                ) # B * g, h_g, HW, Ns\n\n                attn_bias = attn_bias.reshape(B * self.n_heads, H * W, n_sample)\n\n                # add the positional bias to the qk logits\n                attn = attn + attn_bias\n\n        attn = F.softmax(attn, dim=2)\n        attn = self.attn_drop(attn)\n\n        # softmax over the qk logits, then weight v\n        out = torch.einsum(&#039;b m n, b c n -&gt; b c m&#039;, attn, v)\n\n        if self.use_pe and self.dwc_pe:\n            out = out + residual_lepe\n        out = out.reshape(B, C, H, W)\n\n        # final 1x1 output projection\n        y = self.proj_drop(self.proj_out(out))\n\n        return y, pos.reshape(B, self.n_groups, Hk, Wk, 2), reference.reshape(B, self.n_groups, Hk, Wk, 2)<\/code><\/pre>\n<hr \/>\n<p>The MLP layer<\/p>\n<pre><code class=\"language-python\">class TransformerMLPWithConv(nn.Module):\n\n    def __init__(self, channels, expansion, drop):\n\n        super().__init__()\n\n        self.dim1 = channels\n        self.dim2 = channels * expansion\n        self.linear1 = nn.Conv2d(self.dim1, self.dim2, 1, 1, 0)\n        self.drop1 = nn.Dropout(drop, inplace=True)\n        self.act = nn.GELU()\n        self.linear2 = nn.Conv2d(self.dim2, self.dim1, 1, 1, 0) \n        self.drop2 = nn.Dropout(drop, inplace=True)\n        self.dwc = nn.Conv2d(self.dim2, self.dim2, 3, 1, 1, groups=self.dim2)\n\n    def forward(self, x):\n\n        x = self.drop1(self.act(self.dwc(self.linear1(x))))\n        x = self.drop2(self.linear2(x))\n\n        return x<\/code><\/pre>\n<p>Both dropout probabilities default to 0.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Contributions: \u2460 Selects the K and V pairs used in self-attention in a data-dependent way. 
\u2461 Learns offset coordinates with a small network, using bilinear interp [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[6],"tags":[13,17,11],"_links":{"self":[{"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/posts\/477"}],"collection":[{"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/comments?post=477"}],"version-history":[{"count":0,"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/posts\/477\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/media?parent=477"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/categories?post=477"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/tags?post=477"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}