{"id":197,"date":"2022-03-16T02:36:11","date_gmt":"2022-03-16T02:36:11","guid":{"rendered":"https:\/\/blog.liguanxin.cn\/?p=197"},"modified":"2022-03-16T10:30:14","modified_gmt":"2022-03-16T10:30:14","slug":"%e8%ae%ba%e6%96%87%e7%ac%94%e8%ae%b0-swin-transformer-hierarchical-vision-transformer-using-shifted-windows","status":"publish","type":"post","link":"https:\/\/blog.liguanxin.cn\/index.php\/2022\/03\/16\/%e8%ae%ba%e6%96%87%e7%ac%94%e8%ae%b0-swin-transformer-hierarchical-vision-transformer-using-shifted-windows\/","title":{"rendered":"Paper Notes on Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"},"content":{"rendered":"<p><strong>Key contributions:<br \/>\n\u2460 Adapts the Transformer into a general-purpose, hierarchical backbone for computer vision<br \/>\n\u2461 Replaces the global multi-head self-attention (MSA) module of the Transformer with a module based on shifted windows<\/strong><\/p>\n<h3>Shifted Window Mechanism<\/h3>\n<p><img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/03\/\u5fae\u4fe1\u622a\u56fe_20220315160619.png\" alt=\"\" \/><br \/>\nEach red box is a window, and self-attention is computed only among the gray patches inside the same window. If a window is W patches wide and W patches high, the windows in the next layer are shifted by W\/2 patches, and self-attention is again computed within each shifted window. This lets patches that fell into different windows in the previous layer exchange information.<\/p>\n<h3>Overall Architecture<\/h3>\n<p><img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/03\/\u5fae\u4fe1\u622a\u56fe_20220315162000.png\" alt=\"\" \/><br \/>\n\u2460 Split the input image into non-overlapping patches of size <span class=\"katex-eq\" 
data-katex-display=\"false\">4 \\times 4 \\times 3<\/span>, and concatenate the pixel values of each patch into a single 48-dimensional token, which is then linearly projected into a C-dimensional feature space.<br \/>\n\u2461 The resulting <span class=\"katex-eq\" data-katex-display=\"false\">H\/4 \\times W\/4<\/span> tokens are passed through Swin Transformer blocks. Going deeper, to build hierarchical features, each group of <span class=\"katex-eq\" data-katex-display=\"false\">2 \\times 2<\/span> neighboring patches is merged (concatenated along the channel dimension to 4C, then linearly projected), which downsamples the resolution by a factor of 2 and produces 2C output channels.<br \/>\n\u2462 Computation inside the Swin Transformer blocks:<img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/03\/\u5fae\u4fe1\u622a\u56fe_20220315164638.png\" alt=\"\" \/><\/p>\n<h3>Problem Introduced by Shifted Windows<\/h3>\n<p>After the windows are shifted, the image is partitioned into more windows than before (compare the red boxes in the first figure), and some of them are smaller than a full window. To avoid this, the small windows are assembled back into full-size windows with a cyclic shift, so the number of windows stays the same.<img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/03\/\u5fae\u4fe1\u622a\u56fe_20220315164921.png\" alt=\"\" \/><br \/>\nThe masked MSA prevents patches that were not adjacent in the original feature map (but end up in the same window after the cyclic shift) from attending to each other. The masked attention scores are set to a large negative value (the code below uses -100), so they become approximately 0 after softmax.<\/p>\n<h3>Complexity<\/h3>\n<p>Suppose each window contains <span class=\"katex-eq\" data-katex-display=\"false\">M \\times M<\/span> patches and the image contains <span 
class=\"katex-eq\" data-katex-display=\"false\">h \\times w<\/span> patches in total, where C is the dimension of each patch token. The computational complexities are:<img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/03\/\u5fae\u4fe1\u622a\u56fe_20220315163729.png\" alt=\"\" \/><br \/>\nThat is, global MSA is quadratic in the number of patches hw, while window-based MSA is linear in hw when M is fixed. Derivation of the complexity: <a href=\"https:\/\/blog.csdn.net\/weixin_43135178\/article\/details\/120611131\">https:\/\/blog.csdn.net\/weixin_43135178\/article\/details\/120611131<\/a><\/p>\n<h3>Relative Position Bias<\/h3>\n<p><img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/03\/\u5fae\u4fe1\u622a\u56fe_20220315170609.png\" alt=\"\" \/><br \/>\n<a href=\"https:\/\/blog.csdn.net\/qq_34914551\/article\/details\/119866975\">https:\/\/blog.csdn.net\/qq_34914551\/article\/details\/119866975<\/a><br \/>\nTo identify the relative position of a query and a key with a single integer, and use that integer to look up the corresponding (learnable) bias, we build a matrix whose entry (i, j) is the index, in the bias table, of the relative offset between the i-th query and the j-th key.<\/p>\n<pre><code class=\"language-python\">        # define a parameter table of relative position bias (one learnable value per relative offset and per head)\n        self.relative_position_bias_table = nn.Parameter(\n            torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads))  # 2*Wh-1 * 2*Ww-1, nH\n        # get pair-wise relative position index for each token inside the window\n        # build a 1D coordinate tensor [0, 1, ..., window_size - 1] for each axis\n        # example: with self.window_size[0] = self.window_size[1] = 2,\n        # both tensors are [0, 1]\n        coords_h = torch.arange(self.window_size[0])\n        coords_w = 
torch.arange(self.window_size[1])\n        # meshgrid builds two matrices, [[0, 0], [1, 1]] and [[0, 1], [0, 1]]\n        # stacking them gives [[[0, 0], [1, 1]], [[0, 1], [0, 1]]]\n        coords = torch.stack(torch.meshgrid([coords_h, coords_w]))  # 2, Wh, Ww\n        # flatten to [[0, 0, 1, 1], [0, 1, 0, 1]]: the (row, column) coordinate of every token\n        coords_flatten = torch.flatten(coords, 1)  # 2, Wh*Ww\n        # broadcasting: (array with a new axis at dim 2) minus (array with a new axis at dim 1)\n        # [[[0], [0], [1], [1]], [[0], [1], [0], [1]]] minus [[[0, 0, 1, 1]], [[0, 1, 0, 1]]]\n        # equals\n        # [[[ 0,  0, -1, -1],\n        #  [ 0,  0, -1, -1],\n        #  [ 1,  1,  0,  0],\n        #  [ 1,  1,  0,  0]],\n\n        # [[ 0, -1,  0, -1],\n        #  [ 1,  0,  1,  0],\n        #  [ 0, -1,  0, -1],\n        #  [ 1,  0,  1,  0]]]\n        relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :]  # 2, Wh*Ww, Wh*Ww\n        # move the coordinate axis last: entry [i, j] is the (row, column) offset of token i relative to token j\n        # [[[ 0,  0], [ 0, -1], [-1,  0], [-1, -1]],\n        # [[ 0,  1], [ 0,  0], [-1,  1], [-1,  0]],\n        # [[ 1,  0], [ 1, -1], [ 0,  0], [ 0, -1]],\n        # [[ 1,  1], [ 1,  0], [ 0,  1], [ 0,  0]]]\n        relative_coords = relative_coords.permute(1, 2, 0).contiguous()  # Wh*Ww, Wh*Ww, 2\n        # add (window_size - 1) so that all offsets become non-negative\n        # [[[1, 1], [1, 0], [0, 1], [0, 0]],\n        # [[1, 2], [1, 1], [0, 2], [0, 1]],\n        # [[2, 1], [2, 0], [1, 1], [1, 0]],\n        # [[2, 2], [2, 1], [1, 2], [1, 1]]]\n        relative_coords[:, :, 0] += self.window_size[0] - 1  # shift to start from 0\n        relative_coords[:, :, 1] += self.window_size[1] - 1\n        # multiply the row offset by (2 * Ww - 1) so that every (row, column) offset pair maps to a distinct index after summing\n        # [[[3, 1], [3, 0], [0, 1], [0, 0]],\n        # [[3, 2], [3, 1], [0, 2], [0, 1]],\n        # [[6, 
1], [6, 0], [3, 1], [3, 0]],\n        # [[6, 2], [6, 1], [3, 2], [3, 1]]]\n        relative_coords[:, :, 0] *= 2 * self.window_size[1] - 1\n        # finally, sum the two components to get a single index per (query, key) pair\n        # [[4, 3, 1, 0],\n        # [5, 4, 2, 1],\n        # [7, 6, 4, 3],\n        # [8, 7, 5, 4]]\n        relative_position_index = relative_coords.sum(-1)  # Wh*Ww, Wh*Ww\n        ...\n        # initialize the bias table from a truncated normal distribution\n        trunc_normal_(self.relative_position_bias_table, std=.02)<\/code><\/pre>\n<h3>Questions<\/h3>\n<p>Why does Swin Transformer reduce the computational complexity?<br \/>\nA: Because self-attention is computed only within local windows of fixed size, so the cost grows linearly with the number of patches instead of quadratically.<\/p>\n<p>Why does self-attention divide by <span class=\"katex-eq\" data-katex-display=\"false\">\\sqrt{d}<\/span> (where d is the dimension of q and k)?<br \/>\n<img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/03\/\u5fae\u4fe1\u622a\u56fe_20220315165623.png\" alt=\"\" \/><br \/>\nA: 1. The scores must be scaled down by some factor; otherwise the inputs to softmax become too large, softmax saturates, and its gradients approach 0.<br \/>\n2. <span class=\"katex-eq\" data-katex-display=\"false\">\\sqrt{d}<\/span> is chosen because, if the components of q and k are independent with mean 0 and variance 1, their dot product <span class=\"katex-eq\" data-katex-display=\"false\">qk^T<\/span> has mean 0 and variance d; dividing by <span class=\"katex-eq\" data-katex-display=\"false\">\\sqrt{d}<\/span> restores mean 0 and variance 1, similar to normalization.<\/p>\n<p>Why not design a dedicated gray patch (token) for communication between different windows?<\/p>\n<h3>Code<\/h3>\n<pre><code class=\"language-python\">class SwinTransformer(nn.Module):\n        ...\n        # split image into non-overlapping patches and linearly project each patch token to embed_dim (C) channels\n        self.patch_embed = PatchEmbed(\n            img_size=img_size, 
patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim,\n            norm_layer=norm_layer if self.patch_norm else None)\n        ...\n    def forward_features(self, x):\n        x = self.patch_embed(x)\n        if self.ape:\n            x = x + self.absolute_pos_embed  # optional absolute position embedding\n        x = self.pos_drop(x)  # dropout layer\n\n        for layer in self.layers:\n            x = layer(x)  # one stage (BasicLayer): several Swin Transformer blocks + optional PatchMerging\n\n        x = self.norm(x)  # B L C\n        x = self.avgpool(x.transpose(1, 2))  # B C 1\n        x = torch.flatten(x, 1)  # flatten to B C\n        return x\n\n    def forward(self, x):\n        x = self.forward_features(x)\n        x = self.head(x)  # linear classifier: project to num_classes logits\n        return x<\/code><\/pre>\n<pre><code class=\"language-python\">class BasicLayer(nn.Module):\n        ...\n        # build blocks: even-indexed blocks use regular windows (W-MSA, shift_size=0),\n        # odd-indexed blocks use shifted windows (SW-MSA, shift_size=window_size \/\/ 2)\n        self.blocks = nn.ModuleList([\n            SwinTransformerBlock(dim=dim, input_resolution=input_resolution,\n                                 num_heads=num_heads, window_size=window_size,\n                                 shift_size=0 if (i % 2 == 0) else window_size \/\/ 2,\n                                 mlp_ratio=mlp_ratio,\n                                 qkv_bias=qkv_bias, qk_scale=qk_scale,\n                                 drop=drop, attn_drop=attn_drop,\n                                 drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path,\n                                 norm_layer=norm_layer)\n            for i in range(depth)])\n        ...\n    def forward(self, x):\n        for blk in self.blocks:\n            if self.use_checkpoint:\n                x = checkpoint.checkpoint(blk, x)  # gradient checkpointing to save memory\n            else:\n                x = blk(x)\n        if self.downsample is not None:\n            x = self.downsample(x)  # PatchMerging: halve H and W, double the channel dim\n        return 
x<\/code><\/pre>\n<pre><code class=\"language-python\">class SwinTransformerBlock(nn.Module):\n        ...\n        # window-based self-attention (computed within each window, not between windows)\n        self.attn = WindowAttention(\n            dim, window_size=to_2tuple(self.window_size), num_heads=num_heads,\n            qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)\n\n        self.drop_path = DropPath(drop_path) if drop_path &gt; 0. else nn.Identity()\n        self.norm2 = norm_layer(dim)\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)  # MLP at the end of the block\n        if self.shift_size &gt; 0:  # shifted-window block\n            # calculate attention mask for SW-MSA\n            H, W = self.input_resolution\n            img_mask = torch.zeros((1, H, W, 1))  # 1 H W 1\n            h_slices = (slice(0, -self.window_size),\n                        slice(-self.window_size, -self.shift_size),\n                        slice(-self.shift_size, None))\n            w_slices = (slice(0, -self.window_size),\n                        slice(-self.window_size, -self.shift_size),\n                        slice(-self.shift_size, None))\n            cnt = 0\n            # label the 3 x 3 = 9 regions produced by the cyclic shift with ids 0..8\n            for h in h_slices:\n                for w in w_slices:\n                    img_mask[:, h, w, :] = cnt\n                    cnt += 1\n\n            mask_windows = window_partition(img_mask, self.window_size)  # nW, window_size, window_size, 1\n            mask_windows = mask_windows.view(-1, self.window_size * self.window_size)\n            # token pairs from different regions get a nonzero difference and must not attend to each other\n            attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)\n            attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0))  # -100 as the mask value: approximately 0 after softmax\n        else:\n            attn_mask = None\n\n     
   self.register_buffer(&quot;attn_mask&quot;, attn_mask)  # registered as a buffer: stored with the module but not updated by the optimizer\n\n    def forward(self, x):\n        H, W = self.input_resolution\n        B, L, C = x.shape\n        assert L == H * W, &quot;input feature has wrong size&quot;\n\n        shortcut = x\n        x = self.norm1(x)\n        x = x.view(B, H, W, C)  # reshape the token sequence back into a 2D feature map\n\n        # cyclic shift\n        if self.shift_size &gt; 0:  # shifted windows: roll the feature map by shift_size = window_size \/\/ 2 toward the top-left\n            shifted_x = torch.roll(x, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2))\n        else:\n            shifted_x = x\n\n        # partition windows\n        x_windows = window_partition(shifted_x, self.window_size)  # nW*B, window_size, window_size, C\n        x_windows = x_windows.view(-1, self.window_size * self.window_size, C)  # nW*B, window_size*window_size, C\n\n        # W-MSA\/SW-MSA (self-attention within regular or shifted windows; attn_mask is None for W-MSA)\n        attn_windows = self.attn(x_windows, mask=self.attn_mask)  # nW*B, window_size*window_size, C\n\n        # merge windows back into a feature map\n        attn_windows = attn_windows.view(-1, self.window_size, self.window_size, C)\n        shifted_x = window_reverse(attn_windows, self.window_size, H, W)  # B H&#039; W&#039; C\n\n        # reverse cyclic shift (undo the roll)\n        if self.shift_size &gt; 0:\n            x = torch.roll(shifted_x, shifts=(self.shift_size, self.shift_size), dims=(1, 2))\n        else:\n            x = shifted_x\n        x = x.view(B, H * W, C)\n\n        # residual connection around attention, then FFN (MLP) with its own residual\n        x = shortcut + self.drop_path(x)\n        x = x + self.drop_path(self.mlp(self.norm2(x)))\n\n        return x<\/code><\/pre>\n<h4>Patch 
Merging (halves H and W, doubles the number of channels)<\/h4>\n<p>This module downsamples the feature map before each of Stages 2 to 4 (in the BasicLayer code above it is the downsample step at the end of the preceding stage), reducing the resolution and adjusting the number of channels.<br \/>\n<img src=\"https:\/\/blog.liguanxin.cn\/wp-content\/uploads\/2022\/03\/v2-f9c4e3d69da7508562358f9c3f683c63_r-scaled.jpg\" alt=\"\" \/><br \/>\nFinally, after a LayerNorm, a fully connected (linear) layer reduces the concatenated 4C channels to 2C.<\/p>\n<pre><code class=\"language-python\">class PatchMerging(nn.Module):\n    r&quot;&quot;&quot; Patch Merging Layer.\n\n    Args:\n        input_resolution (tuple[int]): Resolution of input feature.\n        dim (int): Number of input channels.\n        norm_layer (nn.Module, optional): Normalization layer.  Default: nn.LayerNorm\n    &quot;&quot;&quot;\n\n    def __init__(self, input_resolution, dim, norm_layer=nn.LayerNorm):\n        super().__init__()\n        self.input_resolution = input_resolution\n        self.dim = dim\n        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)\n        self.norm = norm_layer(4 * dim)\n\n    def forward(self, x):\n        &quot;&quot;&quot;\n        x: B, H*W, C\n        &quot;&quot;&quot;\n        H, W = self.input_resolution\n        B, L, C = x.shape\n        assert L == H * W, &quot;input feature has wrong size&quot;\n        assert H % 2 == 0 and W % 2 == 0, f&quot;x size ({H}*{W}) are not even.&quot;\n\n        x = x.view(B, H, W, C)\n\n        # take the 4 pixels of every 2x2 neighborhood by strided slicing\n        x0 = x[:, 0::2, 0::2, :]  # B H\/2 W\/2 C\n        x1 = x[:, 1::2, 0::2, :]  # B H\/2 W\/2 C\n        x2 = x[:, 0::2, 1::2, :]  # B H\/2 W\/2 C\n        x3 = x[:, 1::2, 1::2, :]  # B H\/2 W\/2 C\n        x = torch.cat([x0, x1, x2, x3], -1)  # B H\/2 W\/2 4*C\n        x = x.view(B, -1, 4 * C)  # B H\/2*W\/2 4*C\n\n        x = self.norm(x)\n        x = self.reduction(x)  # linear layer: 4*C -&gt; 2*C\n\n        return 
x<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Key contributions: \u2460 Adapts the Transformer into a general-purpose, hierarchical backbone for computer vision \u2461 Replaces the global multi-head self-attention (MSA) module of the Transformer [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[6],"tags":[13,17,11,12],"_links":{"self":[{"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/posts\/197"}],"collection":[{"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/comments?post=197"}],"version-history":[{"count":7,"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/posts\/197\/revisions"}],"predecessor-version":[{"id":221,"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/posts\/197\/revisions\/221"}],"wp:attachment":[{"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/media?parent=197"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/categories?post=197"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.liguanxin.cn\/index.php\/wp-json\/wp\/v2\/tags?post=197"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}