{"id":7612,"date":"2024-07-12T23:01:01","date_gmt":"2024-07-12T15:01:01","guid":{"rendered":""},"modified":"2024-07-12T23:01:01","modified_gmt":"2024-07-12T15:01:01","slug":"\u5f3a\u5316\u5b66\u4e60\u5e94\u7528\u7b80\u8ff0---\u5f3a\u5316\u5b66\u4e60\u65b9\u5411\u4f18\u79c0\u79d1\u5b66\u5bb6\u674e\u7389\u559c\u535a\u58eb\u521b\u4f5c","status":"publish","type":"post","link":"https:\/\/mushiming.com\/7612.html","title":{"rendered":"\u5f3a\u5316\u5b66\u4e60\u5e94\u7528\u7b80\u8ff0---\u5f3a\u5316\u5b66\u4e60\u65b9\u5411\u4f18\u79c0\u79d1\u5b66\u5bb6\u674e\u7389\u559c\u535a\u58eb\u521b\u4f5c"},"content":{"rendered":"


Reinforcement learning (RL) has developed steadily over several decades of research and has recently produced a string of impressive results, with further progress likely to follow. It is widely applied in science, engineering, the arts, and other fields.<\/p>\n

This article briefly lists some RL success stories, then introduces RL and works through two examples, shortest path and Go. It goes on to discuss how to apply RL and some open problems with suggestions, introduces the Machine Learning journal special issue on RL applications and the workshop on RL applications, points to RL resources, reviews a brief history of RL, and closes with a short discussion of the outlook for RL.<\/p>\n

Contents<\/h3>\n

I. Success Stories
II. Reinforcement Learning and Related Disciplines
III. A Brief Introduction to Reinforcement Learning
IV. Examples
V. More on Reinforcement Learning
VI. Reinforcement Learning Glossary
VII. How to Apply Reinforcement Learning
VIII. Open Problems in Reinforcement Learning and Suggestions
IX. The Machine Learning Special Issue on Reinforcement Learning Applications
X. Workshop on Reinforcement Learning Applications
XI. Reinforcement Learning Resources
XII. A Brief History of Reinforcement Learning
XIII. The Era of Reinforcement Learning Is Arriving
XIV. Notes and References<\/p>\n

I. Success Stories<\/h3>\n

We have already witnessed several breakthroughs in RL, such as the Deep Q-Network (DQN) applied to Atari games, AlphaGo (along with AlphaGo Zero and AlphaZero), and DeepStack\/Libratus. Each represents a large class of problems and is likely to find many applications. DQN on Atari games represents single-player games, or more generally single-agent control problems, and it ignited the current wave of enthusiasm among researchers for deep RL. AlphaGo represents two-player, perfect-information, zero-sum games. Its world-renowned results on Go, an extremely hard problem, are a milestone for AI, and it showed the general public the power and appeal of AI, and of RL in particular. DeepStack\/Libratus represent two-player, imperfect-information, zero-sum games, a very hard class of problems, and they too achieved milestone results for AI.<\/p>\n

Google DeepMind's AlphaStar defeated top human StarCraft II players. DeepMind reached human-level play in a multiplayer Capture the Flag game. OpenAI Five defeated top human Dota 2 players. OpenAI trained Dactyl, a human-like robot hand, to manipulate physical objects dexterously. Google AI applied RL to data-center cooling, a practical production system. DeepMimic simulates humanoid characters that master highly demanding motor skills. RL has also been applied to chemical retrosynthesis and to the design of new drugs, among other areas.<\/p>\n

RL is also already used in products and services. Google Cloud AutoML offers automated neural network architecture design as a service. Facebook open-sourced Horizon, which it uses for features such as notification delivery and video-streaming bitrate optimization. Google developed an RL-based recommendation algorithm for YouTube videos. Amazon, in collaboration with Intel, released AWS DeepRacer, a physical test platform for RL. DiDi applies RL to order dispatching and related operations. Alibaba, JD.com, Kuaishou, and others apply RL to recommender systems.<\/p>\n

II. Reinforcement Learning and Related Disciplines<\/h3>\n

RL is generally regarded as a branch of machine learning. Machine learning learns from data to make predictions or decisions, and is usually divided into supervised learning, unsupervised learning, and reinforcement learning. In supervised learning the data are labeled; in unsupervised learning they are not. Classification and regression are the two kinds of supervised learning problems, with categorical and numerical outputs respectively. RL has evaluative feedback but no labeled data: unlike a label in supervised learning, evaluative feedback does not say whether a decision was correct. Compared with supervised learning, RL faces additional challenges such as credit assignment, stability, and the trade-off between exploration and exploitation. Deep learning, that is, learning with deep neural networks, can serve as or be used within any of these kinds of machine learning. Deep learning is part of machine learning, which in turn is part of AI. Deep RL is the combination of deep learning and RL. In the figure below, the left panel shows a taxonomy of machine learning, taken from Wikipedia; the right panel shows a taxonomy of AI, taken from the popular AI textbook by Russell & Norvig.
[Figure: taxonomy of machine learning (left) and taxonomy of artificial intelligence (right)]
In practice all of these fields keep evolving. Deep learning can work together with other machine learning and AI algorithms to accomplish a task. Deep learning and RL are working toward solving some classical AI problems, such as logic, reasoning, and knowledge representation. As the Russell & Norvig textbook puts it, RL can be considered to encompass all of AI: an agent placed in an environment must learn to behave successfully in it; and RL can be viewed as a microcosm of the entire AI problem. It should also be noted that supervised, unsupervised, and reinforcement learning overlap pairwise to some extent.<\/p>\n

As the figure below shows, RL has intrinsic connections with computer science, engineering, mathematics, economics, psychology, neuroscience, machine learning, optimal control, operations research, game theory, classical conditioning, and reward systems. The figure is a Chinese translation of a slide from David Silver's English-language RL course.<\/p>\n

RL\/AI, operations research, and optimal control are all built on applied mathematics, optimization, and statistics, and all provide tools for applications across science and engineering. Operations research and optimal control generally need a model; mathematical formulations such as mixed-integer programs and stochastic programs are embodiments of such models. Models are usually inaccurate or hard to measure precisely, and parameter estimates usually carry errors. RL can do without a model and train directly on data to reach optimal or near-optimal decisions. The data can come from a perfect model, an accurate simulator, or big data. RL can handle very complex problems, as AlphaGo convincingly demonstrated. Policy iteration provides a path to keep improving performance. RL\/AI, operations research, and optimal control advance one another, each borrowing the others' strengths. RL has drawn on animal learning, neuroscience, and psychology, for example reward systems and classical conditioning. Conversely, RL can explain mechanisms in neuroscience such as dopamine signaling. Although the figure does not show it, psychology and neuroscience also form a bridge linking RL\/AI with the social sciences and the arts.
[Figure: reinforcement learning and its related disciplines, after David Silver's course slides]<\/p>\n

III. A Brief Introduction to Reinforcement Learning<\/h3>\n

As the figure below shows, an RL agent interacts with an environment and, for a sequential decision-making problem, learns an optimal policy by trial and error. An RL problem is usually formalized as a Markov decision process (MDP). At each time step the agent receives a state, selects an action according to its policy, receives a reward, and then transitions to the next state according to the environment's dynamics. Here the policy describes the agent's behavior: it is a mapping from states to actions. In RL, experience refers to data in the form of (state, action, reward, next state) tuples. In an episodic environment this process continues until a terminal state is reached and then restarts; in a continuing environment there is no terminal state. A discount factor expresses how much future rewards count at present. The model refers to the state transition model and the reward function. RL applies very broadly: state and action spaces can be discrete or continuous, and RL problems can be deterministic, stochastic, dynamic, or adversarial as in some games.
[Figure: the agent-environment interaction loop]
The state value function and the action value function measure the value of each state and of each state-action pair, respectively. A value function is a prediction of the expected return, where the return is the long-run discounted cumulative reward. The action value function is also commonly called the Q function. The optimal value function is the best value function achievable by any policy, and a policy that attains it is an optimal policy. The optimal value function encodes global optimization information; an optimal policy can usually be derived fairly easily from the optimal state value function or the optimal action value function. The goal of RL is to achieve the optimal long-run return or to find an optimal policy.<\/p>\n
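The interaction loop and the discounted return described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Corridor environment, the random policy, and every function name are hypothetical and do not come from the article.<\/p>\n

```python
import random


# Hypothetical toy environment (not from the article): a corridor of cells 0..4.
# An episode starts in cell 0 and ends when the agent reaches cell 4.
class Corridor:
    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clip the move.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def random_policy(state, rng):
    # A policy maps a state to an action; this one ignores the state.
    return rng.choice([-1, 1])


def run_episode(env, policy, rng):
    # Collect experience as (state, action, reward, next_state) tuples.
    experience = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state, rng)
        next_state, reward, done = env.step(action)
        experience.append((state, action, reward, next_state))
        state = next_state
    return experience


def discounted_return(rewards, gamma):
    # G = r_1 + gamma * r_2 + gamma^2 * r_3 + ..., accumulated backwards.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rng = random.Random(0)
episode = run_episode(Corridor(), random_policy, rng)
rewards = [r for (_, _, r, _) in episode]
print(len(episode), discounted_return(rewards, gamma=0.9))
```

Because the only reward arrives at the end of the episode, a longer random walk yields a smaller discounted return, which is the sense in which the discount factor weighs future reward against the present.<\/p>\n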

IV. Examples<\/h3>\n

1. Shortest Path<\/strong>
Consider applying RL to the shortest path problem. The task is to find the shortest path from a start node to a terminal node, that is, to minimize the distance between them, measured as the sum of the lengths of all edges on the path. The problem is cast as an RL problem as follows. The state is the current node. At each node, an action is to follow an outgoing edge to a neighboring node. The transition model says that choosing an edge at a node leads to the corresponding neighbor, which becomes the new current state. The reward is the negative of the length of the edge just traversed. The episode ends when the terminal node is reached. The discount factor can be set to 1, so that the length of the next edge and the lengths of later edges count equally; this is permissible because the problem is episodic. The goal is to find a shortest path from the start node to the terminal node: maximizing the sum of the negative edge lengths along the path minimizes the total path length. At each node, the optimal policy chooses the best neighbor to move to, eventually completing a shortest path; and for each state (node), the optimal value function is the negative of the shortest-path distance from that node to the terminal node.<\/p>\n
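To make the last claim concrete, the sketch below computes the optimal value function of a small shortest-path MDP by repeated Bellman backups with discount factor 1. The graph and its edge lengths are invented for illustration (the article's figure is not reproduced here), and unlike the model-free methods discussed in the article this computation assumes the whole graph is known.<\/p>\n

```python
# Hypothetical graph: edge lengths are invented so that A is the nearest
# neighbor of S while S -> C -> F -> T is the shortest path.
GRAPH = {
    'S': {'A': 1, 'C': 2},
    'A': {'E': 4},
    'C': {'F': 2, 'T': 7},
    'E': {'T': 5},
    'F': {'T': 2},
    'T': {},
}


def optimal_values(graph, terminal):
    # Bellman backup with discount factor 1 and reward = -edge_length:
    # V(node) = max over neighbors of (-edge_length + V(neighbor)).
    # Assumes every node can reach the terminal node.
    values = {node: 0.0 for node in graph}
    changed = True
    while changed:
        changed = False
        for node, edges in graph.items():
            if node == terminal or not edges:
                continue
            best = max(-length + values[nxt] for nxt, length in edges.items())
            if best != values[node]:
                values[node] = best
                changed = True
    return values


def greedy_path(graph, values, start, terminal):
    # Follow the optimal policy: always move to the best-valued neighbor.
    path = [start]
    while path[-1] != terminal:
        edges = graph[path[-1]]
        path.append(max(edges, key=lambda nxt: -edges[nxt] + values[nxt]))
    return path


values = optimal_values(GRAPH, 'T')
print(values['S'], greedy_path(GRAPH, values, 'S', 'T'))
```

On this graph the value of S comes out as -6.0, the negative of the length of S, C, F, T, even though the first edge toward A is shorter.<\/p>\n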

The figure below gives a concrete example. The graph specifies nodes, (directed) edges, and edge lengths. We want the shortest path from node S to node T. The RL algorithm has no global knowledge of the graph. If at node S we choose the nearest neighbor, A, we can no longer find the shortest path S \u2192 C \u2192 F \u2192 T. The example shows that an algorithm that attends only to immediate gain, such as picking the nearest neighbor A at node S, may fail to find the optimal result. RL methods such as TD learning and Q-learning take long-term return into account and can find the optimal solution.
[Figure: example directed graph with edge lengths, start node S and terminal node T]
Some readers may ask: why not use Dijkstra's algorithm? Given global information about the graph, namely its nodes, edges, and edge lengths, Dijkstra's algorithm finds the shortest path efficiently. An RL algorithm can work without this global information: in a model-free fashion, algorithms such as TD learning and Q-learning repeatedly sample local information from the graph, update the value function, and eventually find the shortest path. Dijkstra's algorithm is an efficient shortest-path algorithm when the whole graph is known; RL does not depend on such global information, applies more broadly than Dijkstra's algorithm, and is a general framework for optimization.<\/p>\n
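As a rough sketch of the model-free approach just described, the following tabular Q-learning agent only ever looks at the outgoing edges of the node it is standing on. The graph, edge lengths, and hyperparameters are assumptions made up for this illustration; they are not taken from the article's figure.<\/p>\n

```python
import random

# Hypothetical graph: A is the nearest neighbor of S, but the shortest
# path to T runs through C and F.
GRAPH = {
    'S': {'A': 1, 'C': 2},
    'A': {'E': 4},
    'C': {'F': 2, 'T': 7},
    'E': {'T': 5},
    'F': {'T': 2},
    'T': {},
}


def q_learning(graph, start, terminal, episodes=2000, alpha=0.5,
               epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # One Q value per (node, neighbor) pair, all initialised to zero.
    q = {node: {nxt: 0.0 for nxt in edges} for node, edges in graph.items()}
    for _ in range(episodes):
        node = start
        while node != terminal:
            actions = list(graph[node])
            if epsilon > rng.random():
                action = rng.choice(actions)                     # explore
            else:
                action = max(actions, key=lambda a: q[node][a])  # exploit
            reward = -graph[node][action]
            next_node = action
            # Discount factor 1; a terminal node has no actions, so its value is 0.
            future = max(q[next_node].values(), default=0.0)
            q[node][action] += alpha * (reward + future - q[node][action])
            node = next_node
    return q


def greedy_path(q, start, terminal):
    path = [start]
    while path[-1] != terminal:
        choices = q[path[-1]]
        path.append(max(choices, key=choices.get))
    return path


q = q_learning(GRAPH, 'S', 'T')
print(greedy_path(q, 'S', 'T'), round(max(q['S'].values()), 3))
```

The learned greedy policy avoids the tempting short edge to A because the Q value of that edge also accounts for the long remainder of the route.<\/p>\n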

In addition, RL can handle shortest path problems with stochastic elements. Moreover, a current research focus is using machine learning or RL to learn how to solve a whole class of problems, so that when a new instance arrives the answer is obtained directly by inference.<\/p>\n

2. Go<\/strong>
In Go, the state is the current board position, including the locations of the black and white stones and of the empty points. The search space of Go is enormous: roughly 250^150 possible move sequences (a branching factor of about 250 over a game of about 150 moves), compared with roughly 35^80 for chess. To handle some complex situations, such as a large \u201cdragon\u201d group, the state should also include history information, which enlarges the state space considerably. An action is a point where a stone can currently be placed. At each move a player has at most 19x19=361 possible actions. The transition model describes how the board changes after the current player places a stone. In Go the transition model is deterministic; there is no randomness. The reward function gives the reward received after the current player's move: it is nonzero only when the game is decided, +1 for a win and -1 for a loss, and 0 otherwise. The goal of a Go AI is to find an optimal playing policy, in other words to win. Because Go has precise rules, the transition model and reward function are known perfectly. Because the search space is so large and the value of a position is very hard to estimate, Go was a long-standing challenge for AI. That is why the 2016 victory of Google DeepMind's AlphaGo over Lee Sedol, a top international professional, made headlines worldwide and demonstrated the power of deep RL.<\/p>\n
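To get a feel for the two figures quoted above, this small calculation converts them to powers of ten. It only restates the arithmetic; the helper name is made up for the illustration.<\/p>\n

```python
import math


def log10_power(base, exponent):
    # log10(base ** exponent) without building the huge number.
    return exponent * math.log10(base)


go = log10_power(250, 150)
chess = log10_power(35, 80)
print(round(go, 1), round(chess, 1))

# Python integers are exact, so the decimal digit counts can be checked directly.
print(len(str(250 ** 150)), len(str(35 ** 80)))
```

The first number has 360 decimal digits and the second 124, so the Go figure exceeds the chess figure by more than 230 orders of magnitude.<\/p>\n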

V. More on Reinforcement Learning<\/h3>\n

If a model of the system is available, we may be able to use dynamic programming: policy evaluation computes the state or action value function of a given policy, and value iteration or policy iteration finds an optimal policy, where policy iteration typically alternates between policy evaluation and policy improvement. Many problems we want to solve come with no ready-made model, and that is where RL comes into its own. RL can obtain the optimal value function and optimal policy without a model, that is, in a model-free manner. Model-free RL can learn online by interacting with the environment, or offline from historical data. Monte Carlo methods estimate values by averaging over samples, each sample being a complete trajectory of experience; they need no model of the system, but they apply only to episodic tasks.<\/p>\n
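For the model-based case, policy iteration can be sketched as alternating policy evaluation and policy improvement on a known model. The two-state MDP below, including its state names, actions, probabilities, and rewards, is entirely hypothetical and serves only to make the loop runnable.<\/p>\n

```python
# Hypothetical MDP: MODEL[state][action] lists (probability, next_state, reward).
MODEL = {
    'low': {
        'wait': [(1.0, 'low', 0.0)],
        'work': [(0.8, 'high', 1.0), (0.2, 'low', -1.0)],
    },
    'high': {
        'wait': [(0.9, 'high', 2.0), (0.1, 'low', 0.0)],
        'work': [(1.0, 'high', 1.0)],
    },
}
GAMMA = 0.9


def q_value(model, values, state, action, gamma):
    # Expected one-step reward plus discounted value of the next state.
    return sum(p * (r + gamma * values[s2]) for p, s2, r in model[state][action])


def evaluate(model, policy, gamma, tol=1e-10):
    # Iterative policy evaluation: sweep until the values stop changing.
    values = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s in model:
            new = q_value(model, values, s, policy[s], gamma)
            delta = max(delta, abs(new - values[s]))
            values[s] = new
        if tol > delta:
            return values


def improve(model, values, gamma):
    # Policy improvement: act greedily with respect to the current values.
    return {s: max(model[s], key=lambda a: q_value(model, values, s, a, gamma))
            for s in model}


def policy_iteration(model, gamma):
    policy = {s: next(iter(model[s])) for s in model}   # arbitrary start
    while True:
        values = evaluate(model, policy, gamma)
        new_policy = improve(model, values, gamma)
        if new_policy == policy:
            return policy, values
        policy = new_policy


policy, values = policy_iteration(MODEL, GAMMA)
print(policy)
```

The loop stops when improvement no longer changes the policy, at which point the policy is greedy with respect to its own value function.<\/p>\n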

Temporal difference (TD) learning is a core concept in RL. TD learning usually refers to the method for value function evaluation introduced in 1988 by Richard Sutton, a professor at the University of Alberta. TD learning learns a state value function directly from experience, using bootstrapping, in a model-free, online, and fully incremental way. Bootstrapping here means forming an estimate on the basis of other estimates. TD learning is an on-policy method: it evaluates the same policy that generates the samples. Q-learning is a TD control method that finds an optimal policy by learning the optimal action value function. Q-learning is an off-policy method: it learns from data generated by some behavior policy, which in general is not the target policy.<\/p>\n
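A minimal sketch of TD(0) policy evaluation follows, assuming a simple five-state random walk as the task. The task, step size, and episode count are choices made for this illustration rather than anything specified in the article.<\/p>\n

```python
import random


# Hypothetical task: states 1..5 are non-terminal, 0 and 6 are terminal.
# Reaching state 6 pays reward 1; every other transition pays 0. Under the
# uniformly random policy the true value of state i is i / 6.
def td0_random_walk(episodes=10000, alpha=0.01, gamma=1.0, seed=0):
    rng = random.Random(seed)
    values = [0.0] * 7          # values[0] and values[6] stay 0 (terminal)
    for _ in range(episodes):
        state = 3
        while state not in (0, 6):
            next_state = state + rng.choice([-1, 1])
            reward = 1.0 if next_state == 6 else 0.0
            # TD(0): move V(state) toward the bootstrapped target
            # reward + gamma * V(next_state), one step at a time.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values[1:6]


estimates = td0_random_walk()
print([round(v, 2) for v in estimates])
```

Each update uses only the current transition and the current estimate of the next state, which is what makes the method online, incremental, and model-free.<\/p>\n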

TD learning and Q-learning estimate state or action value functions and are therefore value-based methods. Policy-based methods, by contrast, optimize the policy directly; policy gradient methods are an example. Actor-critic algorithms update both a value function and a policy.<\/p>\n

In the tabular case, the value function and policy are stored as tables. When the state and action spaces are large or continuous, function approximation is needed in order to generalize. Function approximation is a machine learning concept: the aim is to generalize from a subset of samples so as to approximate the whole function. Linear function approximation is a common choice, partly because it has good theoretical properties. In linear function approximation, a function is approximated by a linear combination of basis functions, and the coefficients of the combination are determined by a learning algorithm.<\/p>\n
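The idea of fitting the coefficients of a linear combination of basis functions can be sketched as below. The two basis functions, the target values, and the plain stochastic gradient descent learner are all assumptions chosen to keep the example short.<\/p>\n

```python
def features(x):
    # Basis functions chosen for illustration: a constant and the raw input.
    return [1.0, x]


def predict(weights, x):
    # Linear combination of the basis functions.
    return sum(w * f for w, f in zip(weights, features(x)))


def fit(samples, learning_rate=0.1, epochs=2000):
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, x)
            # Gradient step on squared error; for a linear model the gradient
            # with respect to each weight is the matching feature.
            phi = features(x)
            weights = [w + learning_rate * error * f
                       for w, f in zip(weights, phi)]
    return weights


# Hypothetical targets: the value of position x on a line is minus the
# remaining distance to a goal at x = 1, scaled by 10.
samples = [(x / 10, -10.0 * (1 - x / 10)) for x in range(0, 11, 2)]
weights = fit(samples)
print(predict(weights, 0.5))
```

The position 0.5 is not among the training samples, so the prediction there (close to -5) is an instance of the generalization that function approximation is meant to provide.<\/p>\n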

We can also use nonlinear function approximation, in particular deep neural networks, the network architectures behind the recent popularity of deep learning. Combining deep learning with RL, using deep neural networks to represent states, value functions, policies, models, and so on, gives deep reinforcement learning (deep RL). The parameters of the deep neural network are determined by a learning algorithm. Deep RL has recently attracted wide attention and produced many remarkable results. That said, deep RL achieved good results long ago, for example TD-Gammon, applied to backgammon in 1992. Influential deep RL algorithms include DQN, mentioned above, as well as Asynchronous Advantage Actor-Critic (A3C), Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). The Notes and References section at the end briefly introduces these algorithms.<\/p>\n

A fundamental problem in RL is the tension between exploration and exploitation. On the one hand the agent needs to exploit the best policy found so far in order to obtain the best return; on the other hand that policy is not necessarily optimal, so the agent needs to explore alternatives. The agent must balance the two. A simple exploration scheme is epsilon-greedy: with probability 1 - epsilon choose the action currently estimated to be best, otherwise choose an action at random. Upper Confidence Bound (UCB) algorithms are another family of exploration methods; they take into account both the action value estimate and the variance of that estimate. Applying UCB to tree search yields the UCT algorithm, which played an important role in AlphaGo.<\/p>\n
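The two exploration schemes can be sketched as action-selection functions. Note the substitution: the UCB variant shown here is the count-based UCB1 rule, whose bonus shrinks as an action is tried more often, standing in for an explicit variance estimate. The function names and the constant c are assumptions of this sketch.<\/p>\n

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    # With probability epsilon explore uniformly, otherwise exploit.
    if epsilon > rng.random():
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def ucb_select(q_values, counts, c=2.0):
    # Try every action once, then pick the largest optimistic estimate:
    # value estimate plus a bonus that is big for rarely tried actions.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    total = sum(counts)
    return max(
        range(len(q_values)),
        key=lambda a: q_values[a] + c * math.sqrt(math.log(total) / counts[a]),
    )


rng = random.Random(0)
print(epsilon_greedy([0.1, 0.9, 0.3], 0.1, rng))
print(ucb_select([0.5, 0.4], [100, 1]))
```

In the second call the rarely tried action wins despite its lower estimate, because its uncertainty bonus is much larger.<\/p>\n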

VI. Reinforcement Learning Glossary<\/h3>\n

This section collects some RL terms for easy reference.<\/p>\n

Prediction, or policy evaluation, computes the state or action value function of a given policy. Control finds an optimal policy. Planning uses a model to find a value function or a policy.<\/p>\n

A behaviour policy generates the sample data, while the goal is to evaluate a target policy. In on-policy learning, the behaviour policy that generates the samples is the same as the target policy being evaluated. For example, TD learning evaluates the current policy; in other words, it performs policy evaluation with samples generated by that same policy. In off-policy learning, the behaviour policy that generates the samples generally differs from the target policy being evaluated. For example, Q-learning aims to learn the action value function of the optimal policy, while the samples it learns from are generally not generated by the optimal policy.<\/p>\n
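The difference between on-policy and off-policy learning shows up in the update target. The sketch below contrasts the two; SARSA is used here as a concrete on-policy TD control method, which is a choice made for this example (the text only mentions TD learning in general), and the function names are hypothetical.<\/p>\n

```python
def sarsa_target(reward, next_q, next_action, gamma):
    # On-policy: bootstrap from the action the behaviour policy
    # actually takes in the next state.
    return reward + gamma * next_q[next_action]


def q_learning_target(reward, next_q, gamma):
    # Off-policy: bootstrap from the greedy action in the next state,
    # regardless of what the behaviour policy does there.
    return reward + gamma * max(next_q)


def td_update(q_sa, target, alpha):
    # Move the current estimate a step of size alpha toward the target.
    return q_sa + alpha * (target - q_sa)


if __name__ == '__main__':
    next_q = [1.0, 3.0]  # hypothetical action values in the next state
    print(sarsa_target(1.0, next_q, next_action=0, gamma=0.5))
    print(q_learning_target(1.0, next_q, gamma=0.5))
```

When the behaviour policy happens to pick the greedy action in the next state, the two targets coincide; they differ exactly when the behaviour policy explores.<\/p>\n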

The exploration-exploitation tradeoff refers to the following tension: the agent needs to exploit the best policy found so far in order to maximize reward, and at the same time it needs to explore the environment to discover better policies, especially when the current policy is not yet optimal or the environment is non-stationary.<\/p>\n

In model-free reinforcement learning methods, the agent does not know the state transition model or the reward model, and learns directly by trial and error from its experience of interacting with the environment. Model-based reinforcement learning methods, by contrast, make use of a model. The model may be given, such as the perfect model obtained from the rules of the game in computer Go, or it may be learned from data.<\/p>\n
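When a model is given, planning can be sketched with value iteration, which repeatedly backs up values through the known transitions and rewards. The three-state model below is hypothetical and chosen only to keep the example small; value iteration is one standard planning method, not one the text singles out.<\/p>\n

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    # transitions[s][a] is a list of (probability, next_state, reward).
    # A state with no actions is treated as terminal with value 0.
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


# Hypothetical model: reaching 'goal' from 'mid' pays reward 1.
MODEL = {
    'start': {'go': [(1.0, 'mid', 0.0)], 'stay': [(1.0, 'start', 0.0)]},
    'mid': {'go': [(1.0, 'goal', 1.0)]},
    'goal': {},
}

if __name__ == '__main__':
    print(value_iteration(MODEL))
```

A model-free method would have to estimate the same values from sampled transitions, without access to the MODEL table.<\/p>\n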

Online algorithms train on a sequential stream of data, without storing the data or reusing it. Offline, or batch mode, algorithms train on a fixed set of data.<\/p>\n

In bootstrapping, the value estimate of a state or action is obtained from the value estimates of other states or actions.<\/p>\n
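One way to see what bootstrapping means is to compare a Monte Carlo target, which uses only observed rewards, with a TD(0) target, which plugs in the current estimate of the next state's value. This is a minimal sketch with hypothetical function names and numbers.<\/p>\n

```python
def monte_carlo_return(rewards, gamma):
    # No bootstrapping: the target is the full discounted return
    # computed from the rewards observed until the episode ends.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td0_target(reward, next_value, gamma):
    # Bootstrapping: one observed reward plus the current estimate
    # of the value of the next state.
    return reward + gamma * next_value


if __name__ == '__main__':
    print(monte_carlo_return([1.0, 1.0, 1.0], gamma=0.5))
    print(td0_target(1.0, next_value=2.0, gamma=0.5))
```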

VII. How to Apply Reinforcement Learning<\/h3>\n

To apply reinforcement learning to a real-world scenario, first define the reinforcement learning problem clearly, including its core elements: environment, agent, states, actions, and rewards. Sometimes the state transition model is also known. Check whether supervised learning or contextual bandits would suit the problem better; if so, reinforcement learning is not the best solution. Applying reinforcement learning generally requires certain resources, including talent, computing power, and large amounts of data.<\/p>\n

Successful reinforcement learning applications to date generally need sufficient training data, which may come from a perfect model, from a simulator that is very close to the real system, or from large amounts of data collected by interacting with the environment. The collected data is then processed as the problem requires.<\/p>\n

A model or a good simulator can generate enough data for training. For some problems, such as those in healthcare, education, and autonomous driving, collecting data for every situation may be difficult, infeasible, or unethical. In such cases, off-policy techniques can learn a target policy from data generated by a behaviour policy. There has been encouraging recent progress in transferring policies learned in simulation to real-world settings, especially in robotics. Some problems may require massive computation. For example, several factors were important to the success of AlphaGo: a perfect model obtained from the rules of the game, which generated large amounts of training data; Google-scale computing power for large-scale training; and the exceptional research and engineering abilities of the team.<\/p>\n

Feature engineering generally requires a great deal of manual work combined with substantial domain knowledge. With the end-to-end learning paradigm that rose with deep learning, manual feature engineering may be used very little, or not at all. In practical problems, however, feature engineering is often unavoidable, and it may be a critical factor in achieving good performance.<\/p>\n

The representation also needs to be considered: for example, whether neural networks are needed to represent the value function and the policy, and if so what kind; whether a linear model would do; and, for problems that are not large, whether even a tabular representation is an option.<\/p>\n

With data, features, and a representation in hand, the next question is which algorithm to use to compute the optimal value function and optimal policy. There are many reinforcement learning algorithms to choose from: online or offline, on-policy or off-policy, model-free or model-based, and so on. A common practice is to pick several algorithms that fit the specifics of the problem and then keep the one that performs best.<\/p>\n

Run experiments, tune parameters, and compare the performance of the algorithms. Reinforcement learning should be compared against the current state of the art, which may be another reinforcement learning algorithm, or supervised learning, contextual bandits, or some traditional algorithm. Tuning the algorithm may require iterating over the previous steps several times.<\/p>\n

Once the trained reinforcement learning algorithm performs well enough, deploy it to the real system, monitor its performance, and keep tuning the algorithm. Tuning system performance may require iterating over the previous steps several times.
[Figure: workflow for applying reinforcement learning]
The figure above describes the workflow for applying reinforcement learning, summarized briefly below.<\/p>\n

Step 1: Define the reinforcement learning problem. Define the core elements: environment, agent, states, actions, and rewards.<\/p>\n

Step 2: Prepare the data. Collect the data and preprocess it.<\/p>\n

Step 3: Feature engineering. Features are usually built by hand from domain knowledge, but may also be produced automatically in an end-to-end fashion.<\/p>\n

Step 4: Choose the representation. Options include deep neural networks, other nonlinear representations, linear representations, and even tables.<\/p>\n

Step 5: Choose the algorithms. Select several algorithms that suit the problem.<\/p>\n

Step 6: Experiment and tune the system. This may require iterating over the previous steps several times.<\/p>\n

Step 7: Deploy and tune the system. This may require iterating over the previous steps several times.<\/p>\n

VIII. Open Problems in Reinforcement Learning and Recommendations<\/h3>\n

Although reinforcement learning has achieved many impressive results, quite a few problems remain. Combining reinforcement learning with function approximation, and with deep learning in particular, runs into the so-called deadly triad: when off-policy learning, function approximation, and bootstrapping are used together, training may become unstable or diverge. Sample efficiency, sparse rewards, credit assignment, exploration-exploitation, and representation are common problems. Deep RL also has a reproducibility problem: experimental results can be affected by network architecture, reward scale, random seeds, the number of random trials, the environment, the code implementation, and so on. Reinforcement learning also shares a number of problems with machine learning in general, such as time efficiency, space efficiency, interpretability, safety, scalability, robustness, and simplicity. On the positive side, researchers have been working hard on all of these. Later sections discuss them further.<\/p>\n

Despite all these problems, reinforcement learning can provide effective solutions to many problems. Professor Dimitri Bertsekas of the Massachusetts Institute of Technology (MIT) is an influential researcher in reinforcement learning, and he is cautiously optimistic about its applications. He points out that, on the one hand, no reinforcement learning method can solve all or even most problems; on the other hand, there are enough methods to try that there is a reasonably good chance of success on most problems, including deterministic problems, stochastic problems, dynamic problems, discrete or continuous problems, and games of all kinds. In his words, we are beginning to solve problems of unimaginable difficulty with reinforcement learning, and the reinforcement learning journey ahead of us is exciting.<\/p>\n

The following discusses several topics: the challenges reinforcement learning faces in the real world, the foundations of efficient robot learning, guidelines for applying reinforcement learning to health, applying machine learning responsibly in healthcare, and AI startups. Although some of these topics concern artificial intelligence or machine learning more broadly, they are also relevant to reinforcement learning and its applications.<\/p>\n

1. Challenges of Reinforcement Learning in the Real World<\/strong>
Google DeepMind and Google Research jointly published a paper studying why reinforcement learning, despite its great success in games and similar problems, has still not been applied at scale in the real world. They discuss nine challenges: 1) learning on the live system from limited samples; 2) handling unknown and potentially large delays in the system's actuators, sensors, or rewards; 3) learning and acting in high-dimensional state and action spaces; 4) satisfying system constraints that should never, or only rarely, be violated; 5) interacting with partially observable systems, which can be viewed as non-stationary or stochastic; 6) learning from multi-objective or poorly specified reward functions; 7) providing actions in real time, especially for systems with high control frequencies; 8) learning offline from fixed logs generated by an external behaviour policy; 9) providing system operators with explainable policies. They identify and define these challenges, design and analyze experiments for each one, design and implement baseline tasks that incorporate them, and open-source the software package.<\/p>\n

2. Foundations of Efficient Robot Learning<\/strong>
With current deep learning, reinforcement learning, and machine learning, a key factor in the success of robot learning algorithms is the need for large amounts of real-world data. A general-purpose robot has to face all kinds of situations, so acquiring that much training data is very costly. The following properties therefore become critical: 1) sample efficient, needing relatively little training data; 2) generalizable, so that the trained robot works not only in the situations it was trained on but also extends to many others; 3) compositional, so that behaviour can be composed from previously acquired knowledge; 4) incremental, so that new knowledge and new abilities can be added gradually. Current deep RL can learn many new abilities, but it generally needs a lot of data, generalizes poorly, and is not trained or executed in a compositional or incremental way. To generalize, and at the same time to improve sample efficiency, a learning algorithm needs inductive bias in the form of built-in knowledge or structure. Compositionality and incrementality can be obtained through specific structured inductive biases: if the learned knowledge is decomposed into semantically independent factors, those factors can be combined to solve more problems.<\/p>\n

Building prior knowledge or structure into learning algorithms is somewhat controversial. Richard Sutton, often called the father of reinforcement learning, argues that no prior knowledge should be built into a learning system, because the history of artificial intelligence shows that every time we tried to build something in by hand, it turned out to be a mistake. MIT professor Rodney Brooks responded strongly in a blog post. This kind of academic debate is valuable because it helps clarify the question we face when designing learning systems: what kind of inductive bias should be built into a learning system so that it helps learn generalizable knowledge from a moderate amount of data, without making the system inaccurate or overly constrained?<\/p>\n

There are two ways to find suitable inductive biases. One is meta-learning. Structure, algorithms, and prior knowledge are learned offline at system design time, so that once the system is deployed it can learn online efficiently in new environments. At design time, meta-learning uses large amounts of training data from tasks the system might encounter after deployment to learn a learning algorithm that learns as efficiently as possible when it meets a new task, rather than learning an algorithm that is good for one environment or trying to learn one that is good for all environments. Meta-learning forms prior knowledge or inductive bias by learning what the training tasks have in common, so that on a new task the system mainly has to learn what is different.<\/p>\n

Other possible directions include having humans teach robots, learning cooperatively with other robots, and changing the robot's hardware along with its software. Insights from computer science and engineering and from cognitive neuroscience can also help in designing machine learning algorithms and architectures. Convolutional neural networks are a good example. Convolution exploits translation invariance, meaning that an object looks essentially the same wherever it appears in an image, and spatial locality, meaning that a group of neighbouring pixels jointly carries information about the image. With convolution as an inductive bias, the number of network parameters drops sharply, and so does the amount of training required. The mammalian visual cortex presumably performs a computation similar to convolution. Robotics and reinforcement learning need similar insights to design more efficient algorithms.<\/p>\n
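To give a sense of how much the convolutional inductive bias reduces the parameter count, the following sketch compares a fully connected layer with a convolutional layer that produces an output of the same shape. The layer sizes (a 32x32 RGB input, 16 output feature maps, a 3x3 kernel) are hypothetical numbers picked for the arithmetic.<\/p>\n

```python
def dense_params(in_units, out_units):
    # One weight per input-output pair, plus one bias per output unit.
    return in_units * out_units + out_units


def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    # One small kernel per (input channel, output channel) pair, shared
    # across all image positions, plus one bias per output channel.
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


if __name__ == '__main__':
    in_units = 32 * 32 * 3    # flattened 32x32 RGB image
    out_units = 32 * 32 * 16  # 16 feature maps of the same spatial size
    print(dense_params(in_units, out_units))
    print(conv2d_params(3, 3, 3, 16))
```

Under these assumed sizes the dense layer needs about 50 million parameters, while the convolutional layer needs 448, because the same kernel is reused at every position.<\/p>\n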

3. Guidelines for Applying Reinforcement Learning to Health<\/strong>
A recent short commentary in Nature Medicine discusses several guidelines to consider when applying reinforcement learning to health problems. First, the reinforcement learning algorithm should ideally have access to all the data that influences decisions; it needs the information that is available to the clinician. Second, the effective sample size depends on how similar the learned policy is to the clinicians' policy: the higher the similarity, the larger the effective sample size. The more decisions there are in a sequence, the more likely it is that the new policy differs from the policy that generated the data. Third, the learned policy needs to be reviewed to make sure it behaves sensibly. This includes checking whether the problem formulation is appropriate, for example how the reward function is defined, whether data recording and processing introduce errors, and what the policy's range of applicability is.<\/p>\n
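The second guideline can be made concrete with importance weights. The sketch below uses Kish's effective sample size formula on per-trajectory importance weights; this particular formula is an assumption made for illustration (the commentary summarized here does not state one), and the function names and probabilities are hypothetical.<\/p>\n

```python
def trajectory_weight(target_probs, behaviour_probs):
    # Product over the decisions in one trajectory of
    # P(action | target policy) / P(action | behaviour policy).
    w = 1.0
    for p_target, p_behaviour in zip(target_probs, behaviour_probs):
        w *= p_target / p_behaviour
    return w


def effective_sample_size(weights):
    # Kish's formula: (sum of weights)^2 / (sum of squared weights).
    total = sum(weights)
    if total == 0:
        return 0.0
    return total * total / sum(w * w for w in weights)


if __name__ == '__main__':
    # Identical policies: every weight is 1, so all 4 trajectories count.
    print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))
    # Very different policies: one trajectory dominates the estimate.
    print(effective_sample_size([4.0, 0.0, 0.0, 0.0]))
```

Because each weight is a product over the decisions in a trajectory, longer sequences give the ratios more chances to drift away from 1, which is one way to read the remark that more decisions make the learned policy more likely to differ from the data-generating one.<\/p>\n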

4. Applying Machine Learning Responsibly in Healthcare<\/strong><\/p>\n

Nature Medicine recently published a perspective article that discusses why machine learning has not been widely adopted in medicine and proposes a roadmap for successful, responsible development.<\/p>\n

First, choose the right problem. Make sure the problem being worked on is meaningful in healthcare, collect appropriate data, and define clearly what success means for the project. Bring stakeholders into the team early in the project: a) domain experts, including clinicians, machine learning researchers, healthcare information technology experts, and implementation experts; b) decision makers, including hospital administrators, research institution administrators, regulators, and government officials; c) users, including nurses, physicians, laboratory staff, patients, and their family and friends.<\/p>\n

Second, develop a useful solution. When predicting an outcome, it is essential to understand when and how the data was collected and for what purpose. The data must be representative of the environment in which the model will be applied. During model development, biases present in electronic health record data must be corrected; otherwise they will reduce the reliability of the model.<\/p>\n

Third, consider the ethical implications. Involve the relevant experts and correct biases in the data.<\/p>\n

Fourth, evaluate the model rigorously. Make sure no data leakage occurs while training and testing the model. Assess the conditions under which the model is likely to succeed or fail. Statistical analysis should take clinically relevant evaluation metrics into account. In addition, qualitative evaluation may reveal biases and confounding factors that quantitative methods miss.<\/p>\n
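One common source of data leakage in clinical data is having records from the same patient on both sides of the train/test split. The following minimal sketch splits by patient rather than by record. It illustrates only this one form of leakage, and the record layout (a patient_id field), the function name, and the parameters are assumptions made for the example.<\/p>\n

```python
import random


def split_by_patient(records, test_fraction, seed=0):
    # Assign whole patients, not individual records, to the test set,
    # so no patient contributes data to both training and testing.
    patients = sorted({r['patient_id'] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_ids = set(patients[:n_test])
    train = [r for r in records if r['patient_id'] not in test_ids]
    test = [r for r in records if r['patient_id'] in test_ids]
    return train, test


if __name__ == '__main__':
    # Hypothetical records: four patients with two visits each.
    records = [
        {'patient_id': p, 'visit': v}
        for p in ['a', 'b', 'c', 'd']
        for v in (1, 2)
    ]
    train, test = split_by_patient(records, test_fraction=0.25)
    print(len(train), len(test))
```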

Fifth, report thoughtfully. Describe in detail the data sources, participants, outcomes, predictor variables, and the model itself. Report the context in which the model was validated and is meant to be applied, and the assumptions or conditions that must hold. Share the code, software packages, and input data used to generate the results, along with supporting documentation. Provide information that helps weigh the tradeoff between two technical routes: simple, fast, interpretable models versus complex, slower, but more accurate models.<\/p>\n

Sixth, deploy responsibly. A learned model should first make predictions in real time so that clinical experts can assess its validity, and only then be used on patients. It is also important to understand how to integrate the intervention strategy into the care team's workflow. Patient populations and clinical protocols change often, so the model's reliability and errors should be monitored and evaluated regularly, and the model improved accordingly.<\/p>\n

Seventh, bring it to market. Machine learning tools for healthcare must meet the regulatory requirements of the country in which they are used.<\/p>\n

Developing and deploying effective machine learning systems in areas such as healthcare and autonomous driving involves many complex issues. The roadmap above can help address the issues in healthcare and is also relevant to other fields.<\/p>\n

Although machine learning is still a long way from large-scale use in healthcare, policy makers, healthcare administrators, researchers, and developers are working closely together. While pushing ahead with smart healthcare, we may sometimes need to slow down for a moment and revisit the Hippocratic oath, the physician's oath, whose first principle is: first, do no harm.<\/p>\n

5. AI Startups: AI Companies Represent a New Business Model<\/strong><\/p>\n

This part summarizes a blog post on the website of the investment firm Andreessen Horowitz. It argues that AI companies represent a new business model that differs from the traditional software business and looks more like a traditional software services company.<\/p>\n

The advantage of software is that it is produced once and can be sold many times. This brings recurring revenue streams, high margins, and sometimes superlinear scaling; in addition, the intellectual property, usually the code, can form a deep moat. In the software services business, each project needs dedicated developers and can be sold only once. As a result, revenue does not recur, gross margins are low, and the best case is linear growth. It is also hard to build a moat.<\/p>\n

AI companies have relatively low gross margins because they depend on cloud computing platforms and need ongoing human support. Scaling is challenging because they have to handle troublesome edge cases. Their moats are relatively weak because AI models are becoming commoditized, and because data, often assumed to be a competitive resource with runaway network effects, turns out not to work that way.<\/p>\n

Most AI applications look like software: they interact with users, manage data, integrate with other systems, and so on. At their core, however, is a set of trained data models, and maintaining these feels more like a software services business. AI companies look like some combination of a software company and a software services company, and in terms of gross margins, scaling, and defensibility they represent a new business model.<\/p>\n

On gross margins: for AI companies, cloud computing platforms bring substantial costs, including model training, model inference, handling rich media types, and complex cloud operations. AI applications also rely on humans as part of the system, either to clean and label large amounts of data or to provide real-time help, for example in cognitive reasoning tasks, in order to reach high accuracy. All of this lowers gross margins. As AI performance improves, human involvement will shrink, but it is unlikely to disappear entirely. On scaling: AI often faces long-tailed distributions, that is, it must frequently handle edge cases, which makes AI systems hard to scale. On defensibility: the playbook for defending AI businesses has not yet taken shape. AI products are not necessarily harder to defend than pure software products, but the moats of AI companies appear shallower than expected.<\/p>\n

Here is some practical advice for founders on building, scaling, and defending great AI companies. 1) Eliminate model complexity as much as possible. 2) Choose the problem domain carefully, usually a narrow one, to reduce data complexity and minimize the challenge of edge cases. 3) Plan for high variable costs. 4) Embrace services: the key to the long-term success of an AI company is combining the strengths of software and services. 5) Plan for the continual arrival of new technology. 6) Build defensibility the old-fashioned way: a good business always needs a good product and proprietary data.<\/p>\n

6. AI startups: bridging the gap between proof of concept and product<\/strong><\/p>\n

This part discusses a seminar that Dr. Andrew Ng gave in early October 2020 at the Stanford Institute for Human-Centered AI (HAI). The talk covered how to bridge the gap between proof of concept and product in AI, in three areas: 1) small data; 2) generalization and robustness; 3) change management.<\/p>\n

Small-data algorithms include synthetic data generation, for example with Generative Adversarial Networks (GANs), one-shot and few-shot learning, self-supervised learning, transfer learning, and anomaly detection.<\/p>\n

Whether a model trained on one dataset generalizes to other datasets is an open question. Models that work in papers often do not work in products. Beyond the machine learning code, an AI product project also involves label definition, ambiguity resolution, efficient batch data labeling, data generation for rare labels, data validation, data analysis, detection of environment changes, version control for data and models, process management tools, detection of model performance degradation, and more.<\/p>\n

Managing the change that the technology brings includes budgeting enough time, identifying all stakeholders, providing reassurance, explaining what is happening, and right-sizing the first project. The key technologies here are explainable AI and auditing.<\/p>\n

The machine learning project life cycle consists of: scoping, deciding what problem to solve; acquiring data for the model; building and training the model; and deployment, running the product to create value. These stages are discussed below in reverse order.<\/p>\n

Deployment comes first. The model is deployed on a cloud platform or on edge devices. An initial deployment makes it possible to analyze results and adjust parameters and the model. One option is shadow deployment, in which the model makes no real decisions and its performance is only monitored in real time. Another is to deploy at small scale first and then ramp up gradually. In the long run, the system stays under monitoring and maintenance.<\/p>\n
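
The shadow deployment idea can be illustrated with a minimal Python sketch. Everything named here (ShadowDeployment, current_rule, new_model, the thresholds) is a hypothetical stand-in and not taken from the talk: the production policy keeps making the real decision, while the candidate model only runs alongside it so that its agreement with production can be monitored.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ShadowDeployment:
    # production_policy makes the real decision; candidate_model only observes.
    production_policy: Callable[[float], int]
    candidate_model: Callable[[float], int]
    agreements: List[bool] = field(default_factory=list)

    def decide(self, x: float) -> int:
        served = self.production_policy(x)  # the decision actually acted on
        shadow = self.candidate_model(x)    # logged only, never acted on
        self.agreements.append(served == shadow)
        return served

    def agreement_rate(self) -> float:
        # Fraction of requests on which the shadow model matched production.
        if not self.agreements:
            return 0.0
        return sum(self.agreements) / len(self.agreements)


def current_rule(x: float) -> int:
    # Hypothetical existing production rule.
    return int(x > 0.5)


def new_model(x: float) -> int:
    # Hypothetical candidate model being evaluated in shadow mode.
    return int(x > 0.4)


shadow = ShadowDeployment(current_rule, new_model)
decisions = [shadow.decide(x) for x in (0.1, 0.45, 0.6, 0.9)]
print(decisions, shadow.agreement_rate())
```

In a real system the agreement log would more likely feed a monitoring dashboard, and the candidate would be promoted to a small-scale rollout only after its shadow metrics look acceptable.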

Building and training an AI model is a highly iterative process that cycles repeatedly through several stages: from the AI architecture, including algorithms and data, to code and training, and then to analysis. It is a development process, and even more a process of finding and fixing errors. A machine learning model is made up of training data, hyperparameters, and the algorithm or code. Practitioners usually try to improve performance by changing hyperparameters and the algorithm or code; they should also vary the training data to improve generalization.<\/p>\n

As for acquiring the data the model needs, there is no need to wait for sufficiently perfect data before starting the project, but the definition of the data must be made explicit: for example, how to define image bounding boxes, or how to handle disagreement among experts.<\/p>\n

As for scoping and choosing the problem, a team can brainstorm business problems and technical solutions, carry out due diligence on value and feasibility, allocate resources, make a plan, and set milestones. AI experts determine what can be done, domain experts determine what is valuable, and the problems worth solving lie in the intersection of the two.<\/p>\n

Within the machine learning project life cycle, scoping and choosing the problem calls for cross-functional brainstorming; acquiring data for the model calls for cross-functional execution; building and training the model calls for AI research; and deployment, running the product to create value, calls for machine learning engineering and software engineering.<\/p>\n

AI has already created value in the consumer internet. Outside the internet, a great many opportunities for AI remain untapped. Sectors such as retail, travel, transportation, logistics, automotive, materials, electronics and semiconductors, healthcare, high tech, telecommunications, energy, and agriculture are projected to create 13 trillion US dollars of value by 2030. In many other domains, however, the gap from proof of concept to product still has to be bridged. Academia and industry should work together to turn machine learning into a systems engineering discipline.<\/p>\n

IX. The Machine Learning Journal Special Issue on Reinforcement Learning Applications<\/h3>\n

Reinforcement learning is a general framework of methods for learning, prediction, and decision making, with broad applications in science, engineering, and the arts. It has achieved outstanding results in Atari games, AlphaGo, robotics, recommender systems, AutoML, and other areas. Applying reinforcement learning to real-world scenarios nevertheless remains challenging. It is natural to ask: what are the problems, and how can they be solved?<\/p>\n

The author, together with Alborz Geramifard (Facebook), Lihong Li (Google), Csaba Szepesvari (DeepMind & University of Alberta), and Tao Wang (Apple), serves as a guest editor of a special issue on reinforcement learning applications for Machine Learning, a leading journal in the field. The main goals of the special issue are: (1) to identify the key research problems whose solution would make reinforcement learning applications succeed; (2) to report progress on these key problems; and (3) to let domain experts share success stories of applying reinforcement learning to real-world scenarios, along with the insights gained in the process.<\/p>\n

The special issue invites submissions that address problems in bringing reinforcement learning into practice and that successfully apply reinforcement learning algorithms to real problems. The topics of interest are broad and include, but are not limited to, the following:<\/p>\n

    \n
  1. Practical reinforcement learning algorithms, covering all the challenges of reinforcement learning algorithms, especially those encountered in real applications;<\/li>\n
  2. Practical issues: generalization, sample\/time\/space efficiency, exploration versus exploitation, reward specification and shaping, scalability, model-based learning (model validation and model error estimation), prior knowledge, safety, accountability, interpretability, reproducibility, hyperparameter tuning, and so on;<\/li>\n
  3. Application areas: recommender systems, advertising, conversational systems, business, finance, healthcare, education, robotics, autonomous driving, transportation, energy, chemical synthesis, drug design, industrial control, fine art, music, and other problems in science, engineering, and the arts.
    The editing of the special issue is scheduled to be completed in early 2021; stay tuned.<\/li>\n<\/ol>\n

    X. Workshops on Reinforcement Learning Applications<\/h3>\n

    At the 2019 International Conference on Machine Learning (ICML), the author co-organized the Reinforcement Learning for Real Life (RL4RealLife) workshop with Alborz Geramifard (Facebook), Lihong Li (Google), Csaba Szepesvari (DeepMind & University of Alberta), and Tao Wang (Apple). Researchers and practitioners from industry and academia with an interest in reinforcement learning applications came together to explore how to apply reinforcement learning to real-world scenarios.<\/p>\n

    The workshop featured three first-rate invited talks:<\/p>\n

    AlphaStar: mastering StarCraft. Speaker: David Silver
    How to start a revolution in reinforcement learning applications? Speaker: John Langford
    Reinforcement learning in recommender systems. Speaker: Craig Boutilier
    Leading experts formed a panel: Craig Boutilier (Google Research), Emma Brunskill (Stanford University), Chelsea Finn (Google Research, Stanford University, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI Research), John Langford (Microsoft Research), David Silver (DeepMind), and Peter Stone (University of Texas at Austin, Cogitai). The panel discussed important questions such as: which directions in reinforcement learning are the most promising, and what are the general principles for applying reinforcement learning to real-world scenarios?<\/p>\n

    There were about 60 posters and papers, of which 4 were selected as best papers:<\/p>\n

    Chow et al. discussed safety in continuous-action problems
    Dulac-Arnold et al. discussed 9 challenges of applying reinforcement learning
    Gauci et al. discussed Horizon, Facebook's open-source applied reinforcement learning platform
    Mao et al. discussed Park, an open platform for learning-augmented computer systems
    The workshop website has links to videos of the invited talks, most of the papers, and some of the posters: https:\/\/sites.google.com\/view\/RL4RealLife2019.<\/p>\n

    In June 2020, the author co-organized an online workshop on reinforcement learning applications with Gabriel Dulac-Arnold (Google), Alborz Geramifard (Facebook), Omer Gottesman (Harvard University), Lihong Li (Google), Anusha Nagabandi (UC Berkeley), Zhiwei (Tony) Qin (DiDi), and Csaba Szepesvari (DeepMind & University of Alberta). Leading experts were invited to form two panels, one on reinforcement learning plus healthcare and one on general reinforcement learning; the workshop had more than 30 posters and papers.<\/p>\n

    The panel on reinforcement learning plus healthcare consisted of Finale Doshi-Velez (Harvard University), Niranjani Prasad (Princeton University), and Suchi Saria (Johns Hopkins University). It was moderated by Susan Murphy (Harvard University), with opening and closing remarks by Omer Gottesman (Harvard University).<\/p>\n

    The panel on general reinforcement learning consisted of Ed Chi (Google), Chelsea Finn (Stanford University), and Jason Gauci (Facebook). It was moderated by Peter Stone (University of Texas & Sony), with opening and closing remarks by Lihong Li (Google).<\/p>\n

    For more information, see the workshop website: https:\/\/sites.google.com\/view\/RL4RealLife.<\/p>\n

    XI. Reinforcement Learning Resources<\/h3>\n

    Among study materials for reinforcement learning, the textbook by Sutton & Barto is required reading, David Silver's UCL course is a classic, and the University of Alberta recently launched a reinforcement learning course on Coursera. Reinforcement learning involves quite a few concepts, so studying the fundamentals carefully pays off. Readers who already have some deep learning background might consider going straight to deep reinforcement learning: OpenAI Spinning Up is fairly concise, DeepMind and UCL jointly offer a course on deep learning and reinforcement learning, and UC Berkeley's deep reinforcement learning course is more advanced. These resources are listed below.<\/p>\n

    Sutton & Barto, reinforcement learning textbook: http:\/\/www.incompleteideas.net\/book\/the-book-2nd.html
    David Silver, reinforcement learning course: http:\/\/www0.cs.ucl.ac.uk\/staff\/D.Silver\/web\/Teaching.html
    University of Alberta, reinforcement learning course on Coursera: https:\/\/www.coursera.org\/specializations\/reinforcement-learning
    OpenAI Spinning Up: https:\/\/blog.openai.com\/spinning-up-in-deep-rl\/
    DeepMind & UCL, deep learning and reinforcement learning course: https:\/\/www.youtube.com\/playlist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs
    UC Berkeley, deep reinforcement learning course: http:\/\/rail.eecs.berkeley.edu\/deeprlcourse\/
    Studying reinforcement learning requires some familiarity with deep learning and machine learning. A few survey papers are recommended below.<\/p>\n

    LeCun, Bengio and Hinton, Deep Learning, Nature, May 2015
    Jordan and Mitchell, Machine learning: Trends, perspectives, and prospects, Science, July 2015
    Littman, Reinforcement learning improves behaviour from evaluative feedback, Nature, May 2015
    For a deeper treatment of deep learning and machine learning: Goodfellow et al. (2016) and Zhang et al. (2019) cover deep learning; Zhou Zhihua (2016) and Li Hang (2019) cover machine learning.<\/p>\n

    While learning the basic concepts, it helps to deepen understanding through programming. OpenAI Gym is widely used: https:\/\/gym.openai.com.<\/p>\n

    The following open-source GitHub repository implements the examples from Sutton & Barto's reinforcement learning book and also offers implementations of many deep reinforcement learning algorithms: https:\/\/github.com\/ShangtongZhang\/reinforcement-learning-an-introduction.<\/p>\n

    The author blogs occasionally at https:\/\/www.zhihu.com\/people\/yuxili99\/ and runs a reinforcement learning column on Zhihu: https:\/\/zhuanlan.zhihu.com\/c_. In that column, the post 'Reinforcement Learning Resources' collects many materials on reinforcement learning and related topics, and 'Reinforcement Learning Application Scenarios' collects many papers and materials on reinforcement learning applications.<\/p>\n

    XII. A Brief History of Reinforcement Learning<\/h3>\n

    Early reinforcement learning had two main threads, both long and rich. One grew out of trial-and-error learning in the study of animal learning; it ran through early work in artificial intelligence and drove the revival of reinforcement learning in the 1980s. The other concerned optimal control and its solution by value functions and dynamic programming; for the most part, optimal control did not involve learning. The two threads at first developed separately; by the 1980s, temporal-difference methods had emerged as a third thread. The threads then intertwined and merged into modern reinforcement learning.<\/p>\n

    Optimal control began in the 1950s as the design of controllers that optimize a measure of a dynamical system's behavior over time. Dynamic programming is one approach to optimal control, proposed by Richard Bellman and others, building on the earlier theory of Hamilton and Jacobi. It uses the state of the dynamical system and a value function, or optimal return function, to define an equation now called the Bellman equation. The class of methods that solve optimal control problems by solving this equation came to be known as dynamic programming. Bellman also introduced the discrete stochastic version of the optimal control problem, namely Markov decision processes (MDPs). In 1960 Ronald Howard devised the policy iteration method for MDPs. All of these are essential elements of the theory and algorithms of modern reinforcement learning.<\/p>\n
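
To make the Bellman equation and policy iteration more concrete, here is a minimal Python sketch on a tiny MDP. The two-state MDP, the data layout (P[s][a] as a list of (probability, next_state) pairs, R[s][a] as the expected reward), and the discount factor are all assumptions chosen for illustration; this is one textbook-style variant with iterative policy evaluation, not a reconstruction of Howard's original formulation.

```python
def policy_iteration(P, R, gamma=0.9, eval_tol=1e-10):
    # P[s][a] is a list of (probability, next_state); R[s][a] is the expected reward.
    n_states = len(P)
    n_actions = len(P[0])
    policy = [0] * n_states
    v = [0.0] * n_states

    def q_value(s, a):
        # One-step lookahead: the right-hand side of the Bellman equation.
        return R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])

    while True:
        # Policy evaluation: sweep until the value of the current policy settles.
        while True:
            delta = 0.0
            for s in range(n_states):
                new_v = q_value(s, policy[s])
                delta = max(delta, abs(new_v - v[s]))
                v[s] = new_v
            if delta < eval_tol:
                break
        # Policy improvement: act greedily with respect to the evaluated values.
        new_policy = [max(range(n_actions), key=lambda a: q_value(s, a))
                      for s in range(n_states)]
        if new_policy == policy:
            return policy, v
        policy = new_policy


# Hypothetical two-state MDP: in each state, action 0 stays put and action 1
# moves to the other state. Staying in state 1 pays 2, moving 0 -> 1 pays 1.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]
policy, values = policy_iteration(P, R, gamma=0.5)
print(policy, values)
```

With these assumed numbers, the greedy policy moves from state 0 to state 1 and then stays there, which matches solving the Bellman equation by hand: the value of state 1 is 2 / (1 - 0.5) = 4 and the value of state 0 is 1 + 0.5 * 4 = 3.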

    Dynamic programming is widely considered the only feasible way of solving general stochastic optimal control problems. It suffers from the 'curse of dimensionality': its computational requirements grow exponentially with the number of state variables. Even so, dynamic programming remains more efficient and more widely applicable than other general methods. It has been extended to partially observable MDPs (POMDPs), to asynchronous methods, and to a wide range of applications.<\/p>\n

    The connections among optimal control, dynamic programming, and learning were slow to be recognized. A likely reason is that these areas were developed by different disciplines with somewhat different goals. A prevailing view was that dynamic programming is an offline computation that depends on an accurate system model and on analytic solutions of the Bellman equation. In addition, the simplest form of dynamic programming computes backward in time, whereas learning proceeds forward, which made it hard to see how the two relate. In fact, some early work had already combined dynamic programming with learning. In 1989 Chris Watkins formulated the reinforcement learning problem as an MDP, fully integrating dynamic programming with online learning, and this treatment became widely adopted. The connection has been developed further since then. Dimitri Bertsekas and John Tsitsiklis of MIT coined the term neurodynamic programming for the combination of dynamic programming and neural networks. Another term still in use is approximate dynamic programming. Like reinforcement learning, these approaches address the classical difficulties of dynamic programming.<\/p>\n

    In a sense, work in optimal control is also work in reinforcement learning. Reinforcement learning problems are closely related to optimal control problems, especially stochastic optimal control problems formulated as MDPs. Accordingly, solution methods of optimal control, such as dynamic programming, are also reinforcement learning methods. Most conventional optimal control methods require complete knowledge of the system model, so it feels somewhat unnatural to call them reinforcement learning. On the other hand, many dynamic programming algorithms are incremental and iterative: like learning methods, they gradually reach the correct answer through successive approximations. These similarities run deep, and the theories and methods for the complete-knowledge and incomplete-knowledge cases are closely related as well.<\/p>\n

    Now consider the other early thread in the development of reinforcement learning: trial-and-error learning. The idea can be traced back as far as the 1850s. In 1911 Edward Thorndike stated the essence of trial-and-error learning succinctly as a principle of learning: of several responses made to the same situation, those accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that when the situation recurs, they are more likely to recur; those accompanied or closely followed by discomfort to the animal will have their connections with the situation weakened, so that when the situation recurs, they are less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond. Thorndike called this the Law of Effect, because it describes the effect of reinforcing events on the tendency to select actions; it has become a basic principle underlying much behavior.<\/p>\n

    The term 'reinforcement' appeared in the 1927 English translation of Pavlov's work on conditioned reflexes, after Thorndike's Law of Effect. Pavlov described reinforcement as the strengthening of a pattern of behavior that results when an animal receives a stimulus, a reinforcer, in an appropriate temporal relationship with another stimulus or with a response.<\/p>\n

    Implementing trial-and-error learning in a computer was among the earliest ideas in artificial intelligence. In 1948 Alan Turing described a 'pleasure-pain system' designed along the lines of the Law of Effect: when a system state is reached for which the action is undetermined, a random choice is made and recorded as a tentative entry. When a pain stimulus occurs, all tentative entries are cancelled; when a pleasure stimulus occurs, all tentative entries are made permanent.<\/p>\n

    In 1954 Marvin Minsky, later a Turing Award winner, discussed computational models of reinforcement learning in his PhD dissertation and described an analog machine he had built to mimic modifiable synaptic connections in the brain. In 1961 he published 'Steps Toward Artificial Intelligence', which discussed several issues relevant to trial-and-error learning, including prediction, expectation, and what he called the basic credit-assignment problem for complex reinforcement learning systems: how should credit for success be distributed among the many decisions that may have contributed to it? This remains a key problem in modern reinforcement learning.<\/p>\n

    Trial-and-error learning saw some development in the 1960s and 1970s. Harry Klopf made important contributions to reviving the trial-and-error thread of reinforcement learning within artificial intelligence. Klopf recognized that researchers who focused exclusively on supervised learning were missing some aspects of adaptive behavior. According to Klopf, what was being missed was the hedonic aspect of behavior: the drive to obtain results from the environment, steering it toward desired outcomes and away from undesired ones. This is the essential idea of trial-and-error learning. Klopf's ideas had a profound influence on Richard Sutton and Andrew Barto, the fathers of reinforcement learning, leading them to assess in depth the difference between supervised learning and reinforcement learning and eventually to focus on reinforcement learning, including how to design learning algorithms for multilayer neural networks.<\/p>\n

    We now turn to the third thread in the development of reinforcement learning: temporal-difference learning. Temporal-difference learning is based on temporally successive estimates of the same quantity, for example the probability of winning in the Go example. It is a new method that is unique to reinforcement learning.<\/p>\n
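To make the idea of comparing successive estimates concrete, here is a minimal sketch of a one-step temporal-difference update (often written TD(0)) on a table of state values. The function name `td0_update`, the dictionary representation, and the default step size and discount are assumptions made for illustration; the text above does not prescribe a specific algorithm.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=1.0, terminal=False):
    # The target combines the observed reward with the estimate one step later.
    # At a terminal transition there is no later estimate to bootstrap from.
    target = reward + (0.0 if terminal else gamma * values[next_state])
    # The TD error is the gap between the later and the earlier estimate.
    td_error = target - values[state]
    # Move the earlier estimate a small step toward the target.
    values[state] += alpha * td_error
    return td_error
```

For a game such as Go, `values` could hold an estimated probability of winning for each position, with a reward of 1 for a win delivered only on the terminal transition; every earlier position is then adjusted toward the estimate of the position that followed it.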

    Temporal-difference learning has part of its origin in the psychology of animal learning, in particular the notion of secondary reinforcers. A secondary reinforcer is a stimulus that has been paired with a primary reinforcer such as food or pain and has therefore taken on similar reinforcing properties. In 1954 Minsky realized that this psychological principle could be important for artificial learning systems; he may have been the first to do so. In 1959, Arthur Samuel, in his famous checkers program, was the first to propose and implement a learning method that included temporal-difference learning. Samuel was inspired by Claude Shannon's 1950 work, which suggested that a computer program could use an evaluation function to play chess and could improve its play by modifying that function online. In 1961 Minsky discussed in depth the connection between Samuel's method and secondary reinforcers. In 1972 Klopf linked trial-and-error learning with temporal-difference learning.<\/p>\n

    In 1978 Sutton developed Klopf's ideas further, particularly their links to animal learning, and defined learning rules driven by changes in temporally successive predictions. Sutton and Barto continued to refine these ideas and proposed a psychological model of classical conditioning based on temporal-difference learning. A good deal of related work appeared in the same period, and some neuroscience models can also be explained in terms of temporal-difference learning.<\/p>\n

    In 1981 Sutton and Barto proposed the actor-critic architecture, which combines temporal-difference learning with trial-and-error learning. Sutton's 1984 PhD dissertation studied this method in depth. In 1988 Sutton separated temporal-difference learning from control and treated it as a general prediction method. That paper also introduced multi-step temporal-difference learning algorithms.<\/p>\n

    In 1989, Chris Watkins proposed Q-learning, which fully brought together the three threads of temporal-difference learning, optimal control, and trial-and-error learning. From this point on, a large body of reinforcement learning research began to appear in machine learning and artificial intelligence.<\/p>\n

    In 1992, Gerry Tesauro successfully used reinforcement learning and neural networks to build TD-Gammon, a backgammon program, which further raised interest in reinforcement learning.<\/p>\n

    After Sutton and Barto published \u201cReinforcement Learning: An Introduction\u201d in 1998, a subfield of neuroscience came to focus on the relationship between reinforcement learning algorithms and reinforcement learning in the nervous system. This is due to the uncanny similarity between the behavior of temporal-difference algorithms and the activity of dopamine-producing neurons in the brain. Reinforcement learning has seen countless other advances as well.<\/p>\n

    More recently, with the arrival of the DQN algorithm and the great success of AlphaGo, reinforcement learning has advanced further, and the subfield of deep reinforcement learning has emerged. This is where the brief history of reinforcement learning connects with the earlier parts of this article.<\/p>\n

    13. The Era of Reinforcement Learning Is Arriving<\/h3>\n

    Reinforcement learning is a general framework for learning, prediction, and decision making. If a problem can be described as, or converted into, a sequential decision-making problem, and states, actions, and rewards can be defined for it, then reinforcement learning can very likely help solve it. Reinforcement learning may also help automate and optimize manually designed strategies.<\/p>\n

    Reinforcement learning deals with sequential problems; it is far-sighted and considers long-term return. Supervised learning, by contrast, usually deals with one-shot problems, is concerned with short-term benefit, and considers immediate reward. This far-sightedness is critical for finding optimal solutions to many problems. In the shortest-path example, for instance, considering only the nearest neighboring node may fail to find the shortest path.<\/p>\n
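The shortest-path remark can be illustrated with a small sketch that compares a myopic rule (always take the cheapest outgoing edge) with a cost-to-go computed by repeated Bellman backups. The graph, the node names, and both function names are hypothetical and chosen only so that the two answers differ.

```python
def greedy_path_cost(graph, start, goal):
    # Myopic rule: always follow the cheapest edge to an unvisited neighbor.
    cost, node, visited = 0, start, {start}
    while node != goal:
        options = {n: c for n, c in graph[node].items() if n not in visited}
        if not options:
            return None  # dead end: the myopic rule found no path
        nxt = min(options, key=options.get)
        cost += options[nxt]
        visited.add(nxt)
        node = nxt
    return cost


def optimal_cost_to_go(graph, goal):
    # Long-term view: value[n] is the cheapest total cost from n to the goal,
    # obtained by sweeping Bellman backups over all edges.
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs}
    value = {n: float('inf') for n in nodes}
    value[goal] = 0
    for _ in range(len(nodes) - 1):
        for node, nbrs in graph.items():
            for nbr, edge_cost in nbrs.items():
                value[node] = min(value[node], edge_cost + value[nbr])
    return value


# Hypothetical graph: the cheap first step A->B leads to an expensive edge.
graph = {'A': {'B': 1, 'C': 2}, 'B': {'D': 10}, 'C': {'D': 1}, 'D': {}}
```

On this graph the myopic rule follows A, B, D, while the cost-to-go table prefers A, C, D because it accounts for what each first step leads to.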

    Dr. David Silver, a core developer of AlphaGo, has put forward the hypothesis that artificial intelligence = reinforcement learning + deep learning. The classic artificial intelligence textbook by Russell and Norvig notes that reinforcement learning can be said to encompass all of artificial intelligence. Some research suggests that any computable problem in computer science can be expressed as a reinforcement learning problem.<\/p>\n

    Earlier parts of this book first introduced reinforcement learning and then presented some of its applications in games, recommender systems, computer systems, healthcare, education, finance, robotics, transportation, energy, manufacturing, and other areas. In each of these areas, much work and many directions were left undiscussed, and many other areas were not covered at all, so omissions are inevitable. The figure below shows the application areas and directions of reinforcement learning. The range of possible applications is very broad.
    [Figure: application areas and directions of reinforcement learning]
    Reinforcement learning is widely applied across computer systems: from chip design and hardware systems at the bottom, to software systems such as operating systems, compilers, and database management systems, to infrastructure such as cloud computing platforms and communication networks, to applications such as game engines and recommender systems, and to computer vision, natural language processing, and machine learning and artificial intelligence systems themselves.<\/p>\n

    This book touches on science, engineering, and art. Games, for example, involve psychology and design art, while robotics, transportation, energy, and manufacturing are closely tied to engineering. That said, the book's coverage is limited, both of the wide range of application scenarios for reinforcement learning in science, engineering, and art, and of what these fields contribute back to reinforcement learning.<\/p>\n

    Problems in natural science and engineering are usually fairly objective, have standard answers, and are easy to evaluate. If a model, a reasonably accurate simulator, or a large amount of data is available, reinforcement learning and machine learning have a good chance of solving the problem. AlphaGo is such a case. Combinatorial optimization, operations research, optimal control, pharmacy, chemistry, and genomics largely fit this description. Problems in social science and art usually involve human factors and are influenced by psychology, behavioral science, and the like; they tend to be subjective, do not necessarily have standard answers, and are not necessarily easy to evaluate. Game design and evaluation, education, and similar areas largely fit this description. Psychological concepts such as intrinsic motivation build a bridge between reinforcement learning and artificial intelligence on one side and social science and art on the other.<\/p>\n

    MIT Technology Review named deep learning and reinforcement learning among its 10 breakthrough technologies in 2013 and 2017, respectively. Deep learning is already widely applied. Reinforcement learning will play an increasingly important role in practical applications. It has been applied successfully in games and recommender systems, and possibly also in quantitative finance. At present, reinforcement learning may not yet be widely used in products and services in some scenarios, and different situations will likely call for different analyses. However, taking long-term return into account, now is very likely an excellent time to cultivate, educate, and lead the reinforcement learning market. We will see both deep learning and reinforcement learning flourish.<\/p>\n

    14. Notes and References<\/h3>\n

    Sutton and Barto (2018) is the textbook of choice for reinforcement learning and is written in a very intuitive way. Szepesvari (2010) discusses reinforcement learning algorithms. Bertsekas (2019) introduces reinforcement learning and optimal control. Bertsekas and Tsitsiklis (1996) discusses neuro-dynamic programming and is fairly theoretical. Powell (2011) discusses approximate dynamic programming and its applications in operations research. Powell (2019) and Recht (2019) discuss the relationship between reinforcement learning and optimal control. Botvinick et al. (2019) discusses the relationship between reinforcement learning and cognitive science, psychology, and neuroscience.<\/p>\n

    At the ACM KDD 2020 Deep Learning Day, Csaba Szepesvari gave a thorough, in-depth analysis of reinforcement learning and cleared up many misconceptions; see Szepesvari (2020). The author of this article wrote a bilingual commentary on that talk; see \u300a\u5f3a\u5316\u5b66\u4e60\u7684\u201c\u795e\u8bdd\u201d\u548c\u201c\u9b3c\u8bdd\u201d\u300b (Myths and Misconceptions in Reinforcement Learning), https:\/\/zhuanlan.zhihu.com\/p\/.<\/p>\n

    Goodfellow et al. (2016) and Zhang et al. (2020) introduce deep learning. Zhihua Zhou (\u5468\u5fd7\u534e) (2016) and Hang Li (\u674e\u822a) (2019) introduce machine learning. Russell and Norvig (2009) introduces artificial intelligence. Bo Zhang (\u5f20\u94b9) et al. (2020) discusses third-generation artificial intelligence.<\/p>\n

    Mnih et al. (2015) introduces the Deep Q-Network (DQN). Badia et al. (2020) discusses Agent57. Silver et al. (2016) introduces AlphaGo. Silver et al. (2017) introduces AlphaGo Zero, which masters Go without human knowledge and surpasses human-level play. Silver et al. (2018) introduces AlphaZero, which extends AlphaGo Zero to more games such as chess and shogi. Tian et al. (2019) reimplements and analyzes AlphaZero and provides open-source software. Moravcik et al. (2017) introduces DeepStack and Brown and Sandholm (2017) introduces Libratus; both are computer algorithms for heads-up no-limit Texas hold'em poker.<\/p>\n

    Vinyals et al. (2019) introduces AlphaStar, which defeated top human players at StarCraft. Jaderberg et al. (2018) introduces a capture-the-flag program that reaches human-level performance. OpenAI (2019) introduces OpenAI Five, which defeated top human players at Dota. Microsoft has made progress in mahjong (Suphx). Curling, sometimes called chess on ice, has also seen recent progress (Curly). These results in multiplayer games show that reinforcement learning has gained a certain grasp of tactics and strategy in team games.<\/p>\n

    OpenAI (2018) introduces Dactyl, a human-like robot hand that manipulates physical objects dexterously. Hwangbo et al. (2019) and Lee et al. (2020) introduce agile quadruped robots. Peng et al. (2018) introduces DeepMimic, in which simulated humanoid characters perform difficult, acrobatic movements. Lazic et al. (2018) studies data center cooling. Segler et al. (2018) applies reinforcement learning to retrosynthesis of chemical molecules. Popova et al. (2018) applies reinforcement learning to de novo drug design. There are many more examples.<\/p>\n

    DQN combines Q-learning with deep neural networks and uses experience replay and a target network to stabilize training. With experience replay, experiences are stored in a replay buffer and random samples are then drawn from it for learning. The target network keeps a separate copy of the network parameters, which is used to compute the targets during learning; it is updated periodically rather than at every training iteration. Mnih et al. (2016) introduces the Asynchronous Advantage Actor-Critic (A3C) algorithm, in which parallel actors use different exploration policies to stabilize training, without using experience replay. Deterministic policy gradients can help estimate policy gradients more efficiently. Silver et al. (2014) introduces the Deterministic Policy Gradient (DPG), and Lillicrap et al. (2016) extends it to the Deep Deterministic Policy Gradient (DDPG). Trust region methods place constraints on gradient updates in order to stabilize policy optimization. Schulman et al. (2015) introduces Trust Region Policy Optimization (TRPO), and Schulman et al. (2017) introduces Proximal Policy Optimization (PPO). Haarnoja et al. (2018) introduces the Soft Actor-Critic algorithm. In 2020, Google DeepMind designed the Agent57 algorithm, which achieves very good results on all 57 Atari games. Before that, results on a few games, such as Montezuma\u2019s Revenge, Pitfall, Solaris, and Skiing, had long been unsatisfactory. Agent57 integrates many advances made since DQN, including distributed learning, short-term memory, episodic memory, intrinsic-motivation methods that encourage directed exploration (seeking novelty on both long and short time scales), and a meta-controller that learns how to balance exploration and exploitation.<\/p>\n
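The two stabilizing techniques described at the start of the paragraph above, experience replay and a periodically updated target network, can be sketched without any deep learning library. This is a simplified illustration, not the DQN implementation of Mnih et al. (2015): the class names `ReplayBuffer` and `TargetSync`, the plain dictionary standing in for network parameters, and the `sync_every` setting are all assumptions made for this sketch.

```python
import random
from collections import deque


class ReplayBuffer:
    # Fixed-size store of transitions; the oldest entries are dropped when full.

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks up the correlation between
        # consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


class TargetSync:
    # Keeps a separate copy of the parameters and refreshes it only
    # every `sync_every` training steps (a hypothetical setting).

    def __init__(self, online_params, sync_every):
        self.online = online_params        # updated at every training step
        self.target = dict(online_params)  # frozen copy used for the targets
        self.sync_every = sync_every
        self.steps = 0

    def step(self):
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target = dict(self.online)
```

In a full agent, each training step would draw a batch with `sample`, compute targets from `target`, update `online`, and then call `step()`; here a dictionary merely stands in for the weights of a neural network.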

    It is worth following the reinforcement learning work of researchers such as Pieter Abbeel, Dimitri Bertsekas, Emma Brunskill, Chelsea Finn, Leslie Kaelbling, Lihong Li, Michael Littman, Joelle Pineau, Doina Precup, Juergen Schmidhuber, David Silver, Satinder Singh, Dale Schuurmans, Peter Stone, Rich Sutton, and Csaba Szepesvari, and of research institutions such as CMU, DeepMind, Facebook, Google, Microsoft, MIT, OpenAI, Stanford, University of Alberta, and UC Berkeley.<\/p>\n

    Amershi et al. (2019) discusses software engineering for machine learning, which is likely to be helpful for reinforcement learning as well. The authors present nine stages of the machine learning workflow: model requirements, data collection, data cleaning, data labeling, feature engineering, model training, model evaluation, model deployment, and model monitoring. The workflow contains many feedback loops, for example between model training and feature engineering, and model evaluation and model monitoring may loop back to any earlier stage. The authors also point out three ways in which software engineering for artificial intelligence differs from software engineering in earlier application domains: 1) discovering, managing, and versioning data is more complex and more difficult; 2) model customization and model reuse require different skills; 3) artificial intelligence components are less modular and become entangled in complex ways.<\/p>\n

    The discussion of the challenges facing reinforcement learning in the real world is based on Dulac-Arnold et al. (2020). The discussion of the foundation of efficient robot learning is based on Kaelbling (2020), which mentions two blog posts: \u201cThe bitter lesson\u201d by Sutton (2019) and \u201cA better lesson\u201d by Brooks (2019). The guidelines for applying reinforcement learning in healthcare are based on Gottesman et al. (2019). Wiens et al. (2019) discusses how to apply machine learning responsibly in healthcare. On artificial intelligence startups, the discussion of AI companies as a new kind of business model is based on Casado and Bornstein (2020), and the discussion of bridging the gap between proof of concept and production is based on Ng (2020). In addition, Alharin et al. (2020), Belle and Papantonis (2020), Lipton (2018), and others discuss interpretability.<\/p>\n

    Li (2017) is a survey of deep reinforcement learning that covers both the big picture and the details of the field and discusses recent advances in the context of its historical development. Li (2017) discusses six core elements: value function, policy, reward, model, exploration versus exploitation, and representation. It discusses six important mechanisms: attention and memory, unsupervised learning, hierarchical reinforcement learning, multi-agent reinforcement learning, relational reinforcement learning, and learning to learn. It also discusses twelve application scenarios: games, robotics, natural language processing, computer vision, finance, business management, healthcare, education, energy, transportation, computer systems, and science, engineering, and art.<\/p>\n

    References:<\/h3>\n

    Alharin, A., Doan, T.-N., and Sartipi, M. (2020). Reinforcement learning interpretation methods: A survey. IEEE Access, 8: \u2013 .<\/p>\n

    Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Kamar, E., Nagappan, N., Nushi, B., and Zimmermann, T. (2019). Software engineering for machine learning: A case study. In ICSE.<\/p>\n

    Badia, A. P., Piot, B., Kapturowski, S., Sprechmann, P., Vitvitskyi, A., and Guo, D. (2020). Agent57: Outperforming the atari human benchmark. ArXiv.<\/p>\n

    Belle, V. and Papantonis, I. (2020). Principles and practice of explainable machine learning. ArXiv.<\/p>\n

    Botvinick, M., Ritter, S., Wang, J. X., Kurth-Nelson, Z., Blundell, C., and Hassabis, D. (2019). Reinforcement learning, fast and slow. Trends in Cognitive Sciences, 23(5):408\u2013422.<\/p>\n

    Brooks, R. (2019). A better lesson. https:\/\/rodneybrooks.com\/a-better-lesson\/<\/p>\n

    Brown, N. and Sandholm, T. (2017). Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science.<\/p>\n

    Casado, M. and Bornstein, M. (2020). The new business of AI (and how it's different from traditional software). https:\/\/a16z.com\/2020\/02\/16\/the-new-business-of-ai-and-how-its-different-from-traditional-software\/.<\/p>\n

    Dulac-Arnold, G., Levine, N., Mankowitz, D. J., Li, J., Paduraru, C., Gowal, S., and Hester, T. (2020). An empirical investigation of the challenges of real-world reinforcement learning. ArXiv.<\/p>\n

    Gottesman, O., Johansson, F., Komorowski, M., Faisal, A., Sontag, D., Doshi-Velez, F., and Celi, L. A. (2019). Guidelines for reinforcement learning in healthcare. Nature Medicine, 25:14\u201318.<\/p>\n

    Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.<\/p>\n

    Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In ICML.<\/p>\n

    Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., and Hutter, M. (2019). Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26).<\/p>\n

    Kaelbling, L. P. (2020). The foundation of efficient robot learning. Science, 369(6506):915\u2013916.<\/p>\n

    Lazic, N., Boutilier, C., Lu, T., Wong, E., Roy, B., Ryu, M., and Imwalle, G. (2018). Data center cooling using model-predictive control. In NeurIPS.<\/p>\n

    Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., and Hutter, M. (2020). Learning quadrupedal locomotion over challenging terrain. Science Robotics.<\/p>\n

    Li, Y. (2017). Deep Reinforcement Learning: An Overview. ArXiv.<\/p>\n

    Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2016). Continuous control with deep reinforcement learning. In ICLR.<\/p>\n

    Lipton, Z. C. (2018). The mythos of model interpretability. ACM Queue, 16(3):31\u201357.<\/p>\n

    Jaderberg, M., Czarnecki, W. M., Dunning, I., Marris, L., Lever, G., Garcia Castaneda, A., Beattie, C., Rabinowitz, N. C., Morcos, A. S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J. Z., Silver, D., Hassabis, D., Kavukcuoglu, K., and Graepel, T. (2018). Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. ArXiv.<\/p>\n

    Mnih, V., Badia, A. P., Mirza, M., Graves, A., Harley, T., Lillicrap, T. P., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In ICML.<\/p>\n

    Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529\u2013533.<\/p>\n

    Moravcik, M., Schmid, M., Burch, N., Lisy, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., and Bowling, M. (2017). Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337):508\u2013513.<\/p>\n

    Ng, A. (2020). Bridging AI\u2019s proof-of-concept to production gap. https:\/\/www.youtube.com\/watch?v=tsPuVAMaADY.<\/p>\n

    OpenAI, Andrychowicz, M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., Schneider, J., Sidor, S., Tobin, J., Welinder, P., Weng, L., and Zaremba, W. (2018). Learning dexterous in-hand manipulation. ArXiv.<\/p>\n

    OpenAI, Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Jozefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., de Oliveira Pinto, H. P., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., and Zhang, S. (2019). Dota 2 with large scale deep reinforcement learning. ArXiv.<\/p>\n

    Peng, X. B., Abbeel, P., Levine, S., and van de Panne, M. (2018). Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. In SIGGRAPH.<\/p>\n

    Popova, M., Isayev, O., and Tropsha, A. (2018). Deep reinforcement learning for de novo drug design. Science Advances, 4(7).<\/p>\n

    Powell, W. B. (2011). Approximate Dynamic Programming: Solving the curses of dimensionality (2nd Edition). John Wiley and Sons.<\/p>\n

    Powell, W. B. (2019). From reinforcement learning to optimal control: A unified framework for sequential decisions. ArXiv.<\/p>\n

    Recht, B. (2019). A tour of reinforcement learning: The view from continuous control. Annual Review of Control, Robotics, and Autonomous Systems, 1:253\u2013279.<\/p>\n

    Russell, S. and Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd edition). Pearson.<\/p>\n

    Schulman, J., Levine, S., Moritz, P., Jordan, M. I., and Abbeel, P. (2015). Trust region policy optimization. In ICML.<\/p>\n

    Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal Policy Optimization Algorithms. ArXiv.<\/p>\n

    Segler, M. H. S., Preuss, M., and Waller, M. P. (2018). Planning chemical syntheses with deep neural networks and symbolic AI. Nature, 555:604\u2013610.<\/p>\n

    Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484\u2013489.<\/p>\n

    Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014). Deterministic policy gradient algorithms. In ICML.<\/p>\n

    Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., and Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140\u20131144.<\/p>\n

    Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., and Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550:354\u2013359.<\/p>\n

    Sutton, R. (2019). The bitter lesson. http:\/\/incompleteideas.net\/IncIdeas\/BitterLesson.html.<\/p>\n

    Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd Edition). MIT Press.<\/p>\n

    Szepesvari, C. (2010). Algorithms for Reinforcement Learning. Morgan & Claypool.<\/p>\n

    Szepesvari, C. (2020). Myths and misconceptions in RL. https:\/\/sites.ualberta.ca\/~szepesva\/talks.html. KDD 2020 Deep Learning Day.<\/p>\n

    Tian, Y., Ma, J., Gong, Q., Sengupta, S., Chen, Z., Pinkerton, J., and Zitnick, C. L. (2019). ELF OpenGo: An analysis and open reimplementation of AlphaZero. In ICML.<\/p>\n

    Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J. P., Jaderberg, M., Vezhnevets, A. S., Leblond, R., Pohlen, T., Dalibard, V., Budden, D., Sulsky, Y., Molloy, J., Paine, T. L., Gulcehre, C., Wang, Z., Pfaff, T., Wu, Y., Ring, R., Yogatama, D., Wunsch, D., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Kavukcuoglu, K., Hassabis, D., Apps, C., and Silver, D. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575:350\u2013354.<\/p>\n

    Wiens, J., Saria, S., Sendak, M., Ghassemi, M., Liu, V. X., Doshi-Velez, F., Jung, K., Heller, K., Kale, D., Saeed, M., Ossorio, P. N., Thadaney-Israni, S., and Goldenberg, A. (2019). Do no harm: a roadmap for responsible machine learning for health care. Nature Medicine, 25:1337\u20131340.<\/p>\n

    Zhang, A., Lipton, Z. C., Li, M., and Smola, A. J. (2020). Dive into Deep Learning. https:\/\/d2l.ai.<\/p>\n

    Li, H. (2019). Statistical Learning Methods (2nd Edition, in Chinese). Tsinghua University Press.<\/p>\n

    Zhang, B., Zhu, J., and Su, H. (2020). Toward the third generation of artificial intelligence (in Chinese). Scientia Sinica Informationis, 50:1281\u20131302. doi: 10.1360\/SSI-2020-0204.<\/p>\n

    Zhou, Z.-H. (2016). Machine Learning (in Chinese). Tsinghua University Press.<\/p>\n

    Note: Written by Dr. Yuxi Li and first published in the Zhihu column \u300a\u5f3a\u5316\u667a\u80fd(RLAI)\u300b, https:\/\/www.zhihu.com\/column\/c_<\/p>\n","protected":false},"excerpt":{"rendered":"\u5f3a\u5316\u5b66\u4e60\u5e94\u7528\u7b80\u8ff0---\u5f3a\u5316\u5b66\u4e60\u65b9\u5411\u4f18\u79c0\u79d1\u5b66\u5bb6\u674e\u7389\u559c\u535a\u58eb\u521b\u4f5c\u5f3a\u5316\u5b66\u4e60(reinforcementlearning)\u7ecf\u8fc7\u4e86\u51e0\u5341\u5e74\u7684\u7814\u53d1\uff0c\u5728\u4e00\u76f4\u7a33\u5b9a...","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"_links":{"self":[{"href":"https:\/\/mushiming.com\/wp-json\/wp\/v2\/posts\/7612"}],"collection":[{"href":"https:\/\/mushiming.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mushiming.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mushiming.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mushiming.com\/wp-json\/wp\/v2\/comments?post=7612"}],"version-history":[{"count":0,"href":"https:\/\/mushiming.com\/wp-json\/wp\/v2\/posts\/7612\/revisions"}],"wp:attachment":[{"href":"https:\/\/mushiming.com\/wp-json\/wp\/v2\/media?parent=7612"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mushiming.com\/wp-json\/wp\/v2\/categories?post=7612"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mushiming.com\/wp-json\/wp\/v2\/tags?post=7612"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}