Application areas: recommender systems, advertising, chatbots, business, finance, healthcare, education, robotics, autonomous driving, transportation, energy, chemical synthesis, drug design, industrial control, fine art, music, and other problems in science, engineering, and the arts.
The special issue will be completed in early 2021; stay tuned.
10. Workshops on Reinforcement Learning Applications
At the 2019 International Conference on Machine Learning (ICML), the author co-organized the Reinforcement Learning for Real Life (RL4RealLife) workshop with Alborz Geramifard (Facebook), Lihong Li (Google), Csaba Szepesvari (DeepMind & University of Alberta), and Tao Wang (Apple). The workshop brought together researchers and practitioners from industry and academia who are interested in RL applications to discuss how to apply reinforcement learning to real-world scenarios.
The workshop featured three excellent invited talks:
AlphaStar: Understanding StarCraft. Speaker: David Silver
How do we bring about the revolution in real-world RL applications? Speaker: John Langford
Reinforcement learning in recommender systems. Speaker: Craig Boutilier
A panel of leading experts, Craig Boutilier (Google Research), Emma Brunskill (Stanford University), Chelsea Finn (Google Research, Stanford University, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI Research), John Langford (Microsoft Research), David Silver (DeepMind), and Peter Stone (UT Austin, Cogitai), discussed important questions, such as: Which directions in RL are the most promising? What are the general principles for applying RL to real-world scenarios?
About 60 posters/papers were presented, and four were selected as best papers:
Chow et al. discussed safety in continuous-action problems
Dulac-Arnold et al. discussed nine challenges of real-world RL
Gauci et al. discussed Horizon, Facebook's open-source applied RL platform
Mao et al. discussed Park, an open platform for learning-augmented computer systems
The workshop website hosts video links for the invited talks, most of the papers, and some of the posters: https://sites.google.com/view/RL4RealLife2019.
In June 2020, the author co-organized an online RL4RealLife workshop with Gabriel Dulac-Arnold (Google), Alborz Geramifard (Facebook), Omer Gottesman (Harvard University), Lihong Li (Google), Anusha Nagabandi (UC Berkeley), Zhiwei (Tony) Qin (Didi), and Csaba Szepesvari (DeepMind & University of Alberta). The workshop invited leading experts to form two panels, one on "RL + healthcare" and one on "general RL", and featured more than 30 posters/papers.
The RL + healthcare panel consisted of Finale Doshi-Velez (Harvard University), Niranjani Prasad (Princeton University), and Suchi Saria (Johns Hopkins University). Susan Murphy (Harvard University) moderated, and Omer Gottesman (Harvard University) gave the opening and closing remarks.
The general RL panel consisted of Ed Chi (Google), Chelsea Finn (Stanford University), and Jason Gauci (Facebook). Peter Stone (University of Texas & Sony) moderated, and Lihong Li (Google) gave the opening and closing remarks.
For more information, see the workshop website: https://sites.google.com/view/RL4RealLife.
11. Reinforcement Learning Resources
Among RL learning resources, Sutton & Barto's textbook is a must-read, David Silver's UCL course is a classic, and the University of Alberta recently launched an RL specialization on Coursera. RL involves many concepts, so studying the fundamentals carefully helps a great deal. Readers with some deep learning background might consider going straight to deep RL. OpenAI Spinning Up is concise, DeepMind and UCL jointly offer a lecture series on deep learning and reinforcement learning, and UC Berkeley's deep RL course is advanced. These resources are listed below.
Sutton & Barto's reinforcement learning textbook, http://www.incompleteideas.net/book/the-book-2nd.html
David Silver's reinforcement learning course, http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html
University of Alberta's reinforcement learning specialization on Coursera, https://www.coursera.org/specializations/reinforcement-learning
OpenAI Spinning Up, https://blog.openai.com/spinning-up-in-deep-rl/
DeepMind & UCL lecture series on deep learning and reinforcement learning, https://www.youtube.com/playlist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs
UC Berkeley deep reinforcement learning course, http://rail.eecs.berkeley.edu/deeprlcourse/
Learning RL requires some understanding of deep learning and machine learning. Here are a few recommended survey papers:
LeCun, Bengio and Hinton, Deep Learning, Nature, May 2015
Jordan and Mitchell, Machine learning: Trends, perspectives, and prospects, Science, July 2015
Littman, Reinforcement learning improves behaviour from evaluative feedback, Nature, May 2015
For a deeper understanding, Goodfellow et al. (2016) and Zhang et al. (2019) cover deep learning, while Zhihua Zhou (周志华, 2016) and Hang Li (李航, 2019) cover machine learning.
While learning the basic concepts, deepen your understanding through programming. OpenAI Gym is widely used: https://gym.openai.com.
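Most RL programming exercises are built around the same agent-environment loop. Here is a minimal sketch of that loop in plain Python. `CoinFlipEnv` is a hypothetical toy environment, not part of OpenAI Gym. Its `reset()`/`step()` methods only loosely mimic the classic Gym interface, in which `step` returns `(observation, reward, done, info)`. The exact return signature differs between Gym versions, so check the documentation of the version you install.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment (not part of OpenAI Gym).

    reset()/step() loosely mimic the classic Gym interface, where step()
    returns (observation, reward, done, info).
    """

    def __init__(self, episode_length=10, seed=None):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # this toy task has a single, uninformative observation

    def step(self, action):
        coin = self.rng.randint(0, 1)            # the environment flips a fair coin
        reward = 1.0 if action == coin else 0.0  # reward 1 for guessing it correctly
        self.t += 1
        done = self.t >= self.episode_length
        return 0, reward, done, {}


def run_episode(env, policy):
    """The basic agent-environment loop: observe, act, receive a reward, repeat."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, _info = env.step(action)
        total += reward
    return total


agent_rng = random.Random(1)


def random_policy(obs):
    """A random agent, useful as a baseline."""
    return agent_rng.randint(0, 1)


print(run_episode(CoinFlipEnv(episode_length=10, seed=0), random_policy))
```

Replacing `random_policy` with a learning agent and `CoinFlipEnv` with a real environment gives the typical skeleton of an RL experiment.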
The following open-source GitHub repository implements the examples from Sutton & Barto's book, and many implementations of deep RL algorithms are available as well: https://github.com/ShangtongZhang/reinforcement-learning-an-introduction.
The author occasionally blogs at https://www.zhihu.com/people/yuxili99/ and runs an RL column on Zhihu: https://zhuanlan.zhihu.com/c_. In that column, "Reinforcement Learning Resources" collects many RL and related resources, and "Reinforcement Learning Application Scenarios" collects many papers and materials on RL applications.
12. A Brief History of Reinforcement Learning
Early RL had two main threads, both long and rich. One is trial-and-error learning, which originated in the psychology of animal learning, developed in early AI, and contributed to the revival of RL in the 1980s. The other is optimal control and its solution methods using value functions and dynamic programming. For the most part, optimal control did not involve learning. The two threads first developed largely independently. By the 1980s, temporal-difference (TD) methods had formed a third thread, and the threads then intertwined and merged into modern RL.
Optimal control emerged in the 1950s. It concerns designing a controller to optimize a measure of a dynamical system's behavior over time. Dynamic programming is one approach to solving optimal control problems. It was developed by Richard Bellman and others, building on the earlier theory of Hamilton and Jacobi. Dynamic programming uses the state of a dynamical system and a value function, or optimal return function, to define an equation now called the Bellman equation. The class of methods that solve this equation is called dynamic programming. Bellman also introduced the discrete stochastic version of the optimal control problem, namely Markov decision processes (MDPs). In 1960, Ronald Howard devised the policy iteration method for MDPs. All of these are essential elements of modern RL theory and algorithms.
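Howard's policy iteration alternates between two steps. First, it evaluates the current policy by solving that policy's Bellman equation. Second, it improves the policy greedily with respect to the resulting values. The following is a minimal pure-Python sketch on a made-up two-state MDP. The transition-table format `P[s][a] = [(prob, next_state, reward), ...]`, the function names, and the discount factor are illustrative assumptions, not taken from the original text.

```python
# Hypothetical two-state MDP: P[s][a] = list of (probability, next_state, reward).
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 0).
# State 1: action 0 stays (reward 1), action 1 moves back to state 0 (reward 0).
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],
    [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]],
]


def q_value(P, V, s, a, gamma):
    """Expected one-step return: sum over s' of p(s'|s,a) * [r + gamma * V(s')]."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(P, policy, gamma, theta=1e-10):
    """Iteratively solve the Bellman equation V = r_pi + gamma * P_pi V for a fixed policy."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update
        if delta < theta:
            return V


def policy_iteration(P, gamma=0.9):
    """Howard-style policy iteration: evaluate, then improve greedily, until stable."""
    policy = [0] * len(P)
    while True:
        V = policy_evaluation(P, policy, gamma)
        stable = True
        for s in range(len(P)):
            qs = [q_value(P, V, s, a, gamma) for a in range(len(P[s]))]
            best = max(range(len(qs)), key=qs.__getitem__)
            # Switch only on strict improvement, so ties cannot cause endless cycling.
            if qs[best] > qs[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration(P, gamma=0.9)
print(policy, [round(v, 3) for v in V])  # [1, 0] [9.0, 10.0]
```

Starting from "always stay", one improvement step switches state 0 to "move", after which the policy is stable. The optimal values are V(1) = 1/(1 - 0.9) = 10 and V(0) = 0.9 × 10 = 9.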
Dynamic programming is widely considered the only feasible way of solving general stochastic optimal control problems. It suffers from the "curse of dimensionality": its computational requirements grow exponentially with the number of state variables. Even so, it remains more efficient and more widely applicable than any other general method. Dynamic programming has been extended to partially observable MDPs (POMDPs), to asynchronous methods, and to many applications.
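A quick worked count shows why the growth is exponential. Suppose the state consists of n variables, and variable i can take k_i values. The number of states is the product below. One sweep of dynamic programming touches every state (and, for each state, every action and successor), so its cost per sweep grows at least as fast as |S|. The numbers are a simple illustration, not figures from the original text.

```latex
% n state variables; variable i takes k_i values
|\mathcal{S}| = \prod_{i=1}^{n} k_i = k^{n} \quad \text{when every } k_i = k
% e.g. binary state variables (k = 2):
2^{10} = 1\,024, \qquad 2^{20} = 1\,048\,576, \qquad 2^{30} = 1\,073\,741\,824
```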
The connections between optimal control, dynamic programming, and learning were recognized only slowly. A likely reason is that these fields developed in different disciplines with somewhat different goals. A prevalent view held that dynamic programming is an offline computation that requires an accurate system model and yields analytic solutions to the Bellman equation. Moreover, the simplest form of dynamic programming proceeds backward in time, whereas learning proceeds forward, which made the two hard to relate. In fact, some early research had already combined dynamic programming with learning. In 1989, Chris Watkins formulated RL problems as MDPs, fully integrating dynamic programming with online learning, and this treatment became widely accepted. These connections were developed further afterward. Dimitri Bertsekas and John Tsitsiklis of MIT coined the term "neuro-dynamic programming" for the combination of dynamic programming with neural networks. Another term still in use is "approximate dynamic programming". These approaches share with RL the aim of overcoming the classical shortcomings of dynamic programming.
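The "backward in time" character of the simplest dynamic programming is easy to see in finite-horizon backward induction. The values for step t are computed from the already known values for step t + 1. The sketch below assumes an undiscounted problem with a fixed horizon. The two-state MDP and the function name are hypothetical illustrations.

```python
# Hypothetical two-state MDP: P[s][a] = list of (probability, next_state, reward).
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],  # state 0: stay (reward 0) or move to 1 (reward 0)
    [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]],  # state 1: stay (reward 1) or move to 0 (reward 0)
]


def backward_induction(P, horizon):
    """Finite-horizon, undiscounted dynamic programming.

    V[t][s] is the optimal total reward from state s with (horizon - t) steps left.
    The loop runs backward in time, from the last step t = horizon - 1 down to t = 0.
    """
    n = len(P)
    V = [[0.0] * n for _ in range(horizon + 1)]  # V[horizon] holds terminal values (0)
    pi = [[0] * n for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n):
            qs = [sum(p * (r + V[t + 1][s2]) for p, s2, r in P[s][a]) for a in range(len(P[s]))]
            best = max(range(len(qs)), key=qs.__getitem__)
            pi[t][s], V[t][s] = best, qs[best]
    return V, pi


V, pi = backward_induction(P, horizon=3)
print(V[0], pi)  # [2.0, 3.0] [[1, 0], [1, 0], [0, 0]]
```

At the last step, moving from state 0 to state 1 earns nothing within the horizon, so the choice there is a tie. Each step's values depend on the step after it, which is why this computation runs opposite to the forward flow of an agent's experience.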
In a sense, optimal control is reinforcement learning. RL problems are closely related to optimal control problems, especially stochastic optimal control problems formulated as MDPs. Accordingly, solution methods for optimal control, such as dynamic programming, are also RL methods. Most conventional optimal control methods require complete knowledge of the system model, so calling them RL feels somewhat unnatural. However, many dynamic programming algorithms are incremental and iterative. Like learning methods, they reach the correct answer gradually through successive approximations. These similarities are far from superficial, and the theories and methods for the cases of complete and incomplete knowledge are closely related.
We now turn to the other thread in the early development of RL: trial-and-error learning. Its roots can be traced back to the 1850s. In 1911, Edward Thorndike succinctly stated trial-and-error learning as a principle of learning. Of several responses made to the same situation, those that are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that when the situation recurs, they will be more likely to recur. Those that are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the connection. Thorndike called this the "Law of Effect", because it describes the effect of reinforcing events on the tendency to select actions. It has come to be regarded as a basic principle underlying much behavior.
The term "reinforcement" appeared in the 1927 English translation of Pavlov's monograph on conditioned reflexes, later than Thorndike's Law of Effect. Pavlov described reinforcement as the strengthening of a pattern of behavior when an animal receives a stimulus, the reinforcer, in an appropriate temporal relationship with another stimulus or with a response.
Implementing trial-and-error learning in a computer was among the earliest ideas in AI. In 1948, Alan Turing described a "pleasure-pain system" designed along the lines of the Law of Effect. When the system reaches a state for which the action has not yet been determined, it picks an action at random and records it tentatively. When a pain stimulus occurs, all tentative entries are cancelled; when a pleasure stimulus occurs, all tentative entries become permanent.
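Turing's rule is concrete enough to write down directly. Here is a minimal sketch of the behavior described above. The class name, method names, and the toy task (a hypothetical "correct" action per state, with pleasure or pain delivered after every action) are assumptions for illustration, not Turing's own formulation.

```python
import random


class PleasurePainSystem:
    """Sketch of Turing's (1948) pleasure-pain system; names are hypothetical."""

    def __init__(self, actions, seed=None):
        self.actions = list(actions)
        self.permanent = {}  # state -> action, confirmed by pleasure
        self.tentative = {}  # state -> action, chosen at random, not yet confirmed
        self.rng = random.Random(seed)

    def act(self, state):
        if state in self.permanent:
            return self.permanent[state]
        if state not in self.tentative:
            # Action still undetermined: choose at random and record it tentatively.
            self.tentative[state] = self.rng.choice(self.actions)
        return self.tentative[state]

    def pain(self):
        self.tentative.clear()  # cancel all tentative entries

    def pleasure(self):
        self.permanent.update(self.tentative)  # make all tentative entries permanent
        self.tentative.clear()


# Toy task: a hypothetical "correct" action for each of five states.
target = {s: s % 2 for s in range(5)}
system = PleasurePainSystem(actions=[0, 1], seed=0)
for trial in range(200):
    s = trial % 5
    if system.act(s) == target[s]:
        system.pleasure()
    else:
        system.pain()
print(system.permanent)
```

Because feedback arrives after every action here, at most one tentative entry exists when pleasure or pain occurs. If feedback were delayed over several actions, all tentative entries would be confirmed or cancelled together, regardless of which ones actually mattered. That is a crude answer to the credit-assignment problem Minsky later raised.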
In 1954, Turing Award laureate Marvin Minsky discussed computational models of RL in his Ph.D. thesis and described an analog machine he built to mimic modifiable synaptic connections in the brain. In his 1961 paper "Steps Toward Artificial Intelligence", he discussed several issues related to trial-and-error learning. These included prediction, expectation, and what he called the basic credit-assignment problem for complex reinforcement learning systems: how do you distribute credit for success among the many decisions that may have been involved in producing it? This remains a key problem in modern RL.
Trial-and-error learning saw some development in the 1960s and 1970s. Harry Klopf made important contributions to reviving trial-and-error learning for RL within AI. Klopf recognized that essential aspects of adaptive behavior were being lost as researchers came to focus exclusively on supervised learning. According to Klopf, what was missing were the hedonic aspects of behavior. These are the drive to achieve some result from the environment, steering the environment toward desired outcomes and away from undesired ones. This is the essential idea of trial-and-error learning. Klopf's ideas profoundly influenced Richard Sutton and Andrew Barto, the founding fathers of RL. They led Sutton and Barto to examine the distinction between supervised learning and RL and, eventually, to focus on RL, including how to design learning algorithms for multilayer neural networks.
We now discuss the third thread in the development of RL: temporal-difference learning. TD learning is driven by the difference between temporally successive estimates of the same quantity, for example, the probability of winning a game of Go. TD learning is a novel and distinctive approach within RL.
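The simplest TD method, TD(0), nudges the current estimate V(s) toward the next estimate. The update is V(s) ← V(s) + α[r + γV(s′) − V(s)], where the bracketed "temporal difference" compares successive estimates. The sketch below uses the five-state random walk from Sutton & Barto's textbook as a stand-in for a game. The value of a state is exactly the probability of "winning" (exiting on the right), and the true values are 1/6, 2/6, ..., 5/6. The step size, episode count, and seed are arbitrary choices.

```python
import random


def td0_random_walk(episodes=10000, alpha=0.01, seed=0):
    """TD(0) prediction on a 5-state random walk.

    Non-terminal states are 1..5 and terminals are 0 and 6. Each episode starts
    in state 3, and each step moves left or right with equal probability. The
    reward is 1 only on reaching state 6, so V(s) estimates the probability of
    'winning' (exiting on the right) from s.
    """
    rng = random.Random(seed)
    V = [0.0] + [0.5] * 5 + [0.0]  # terminal values stay fixed at 0
    for _ in range(episodes):
        s = 3
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # TD(0) with gamma = 1: move V(s) toward the bootstrapped target r + V(s_next).
            V[s] += alpha * (r + V[s_next] - V[s])
            s = s_next
    return V[1:6]


print([round(v, 2) for v in td0_random_walk()])
```

Each update uses only one step of experience plus the current estimate of the next state. The method learns during the game rather than waiting for the final outcome.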
TD learning originated partly in animal learning psychology, in particular in the notion of secondary reinforcers. A secondary reinforcer is a stimulus that has been paired with a primary reinforcer, such as food or pain, and has thereby come to take on similar reinforcing properties. Minsky (1954) may have been the first to realize that this psychological principle could be important for artificial learning systems. In 1959, Arthur Samuel first proposed and implemented a learning method that included TD ideas, as part of his celebrated checkers program. Samuel was inspired by Claude Shannon's 1950 suggestion that a computer could be programmed to play chess with an evaluation function and might improve its play by modifying this function online. Minsky (1961) discussed the connection between Samuel's method and secondary reinforcement in depth. In 1972, Klopf linked trial-and-error learning with TD learning.
In 1978, Sutton developed Klopf's ideas further, particularly their links to animal learning, and described learning rules driven by changes in temporally successive predictions. Sutton and Barto refined these ideas and developed a psychological model of classical conditioning based on TD learning. Much related work appeared in the same period, and some neuroscience models can also be explained in terms of TD learning.
In 1981, Sutton and Barto proposed the actor-critic architecture, which combines TD learning with trial-and-error learning. Sutton's 1984 Ph.D. thesis examined this method in depth. In 1988, Sutton separated TD learning from control and treated it as a general prediction method. That paper also introduced multi-step TD learning algorithms, the TD(λ) family.
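The multi-step idea can be sketched with tabular TD(λ). An eligibility trace spreads each one-step TD error back over recently visited states. With λ = 0 this reduces to one-step TD(0), and as λ approaches 1 the updates approach Monte Carlo updates. The sketch reuses the five-state random walk (true values 1/6, ..., 5/6). The accumulating-trace variant, λ, step size, and seed are illustrative choices.

```python
import random


def td_lambda_random_walk(lam=0.5, episodes=20000, alpha=0.005, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces on the 5-state
    random walk (states 1..5, terminals 0 and 6, reward 1 on reaching 6)."""
    rng = random.Random(seed)
    V = [0.0] + [0.5] * 5 + [0.0]  # terminal values stay fixed at 0
    for _ in range(episodes):
        e = [0.0] * 7  # eligibility traces, reset at the start of each episode
        s = 3
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            delta = r + V[s_next] - V[s]  # one-step TD error (gamma = 1)
            e[s] += 1.0                   # accumulating trace for the visited state
            for i in range(1, 6):
                V[i] += alpha * delta * e[i]  # recently visited states share the error
                e[i] *= lam                   # traces decay by gamma * lambda (gamma = 1)
            s = s_next
    return V[1:6]


print([round(v, 2) for v in td_lambda_random_walk()])
```

The trace is what makes the method multi-step. A single reward at the end of an episode immediately updates every state that was visited recently, weighted by how recently.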
In 1989, Chris Watkins proposed Q-learning, fully bringing together the three threads of TD learning, optimal control, and trial-and-error learning. From then on, a large body of RL research began to appear in machine learning and AI. In 1992, Gerry Tesauro successfully used RL and a neural network to build TD-Gammon, a backgammon program, which further increased interest in RL.
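Watkins's Q-learning learns action values directly from experience with the update Q(s,a) ← Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)]. This combines a TD error with the max from the Bellman optimality equation. Below is a minimal tabular sketch on a hypothetical five-state chain. The environment, the ε-greedy exploration, and all parameter values are assumptions for illustration.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain.

    Actions: 0 = left (stays put at state 0), 1 = right. Reaching the rightmost
    state gives reward 1 and ends the episode; all other rewards are 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def greedy(s):
        best = max(Q[s])
        return rng.choice([a for a in (0, 1) if Q[s][a] == best])  # random tie-breaking

    for _ in range(episodes):
        s = 0
        for _step in range(100):  # cap the episode length
            a = rng.choice((0, 1)) if rng.random() < epsilon else greedy(s)  # epsilon-greedy
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == goal
            r = 1.0 if done else 0.0
            # The target uses the max over next actions (off-policy); terminal value is 0.
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            s = s_next
    return Q


Q = q_learning_chain()
print([round(q[1], 3) for q in Q[:4]])
```

The target uses the max over next actions rather than the action actually taken, so Q-learning is off-policy. It learns about the greedy policy while behaving ε-greedily. For this chain the values of "right" approach Q*(s, right) = 0.9^(3 - s), for example 0.729 in state 0.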
After Sutton and Barto published Reinforcement Learning: An Introduction in 1998, a subfield of neuroscience emerged that focuses on RL algorithms and reinforcement learning in the nervous system. This is largely owing to an uncanny similarity between the behavior of TD learning algorithms and the activity of dopamine-producing neurons in the brain. RL has also seen countless other advances.
More recently, with the advent of the DQN algorithm and the tremendous success of AlphaGo, RL has advanced further, and deep RL has emerged as a subfield. This brings the brief history up to the material introduced earlier.
13. The Era of Reinforcement Learning Is Coming
RL is a general framework for learning, prediction, and decision making. Suppose a problem can be described as, or transformed into, a sequential decision-making problem with well-defined states, actions, and rewards. Then RL is likely to help solve it. RL may also help automate and optimize strategies that are currently designed by hand.
RL addresses sequential problems and takes a long-term view, optimizing long-term return. Supervised learning, by contrast, generally addresses one-shot problems, focusing on short-term benefit and immediate feedback. This long-term view is critical to finding optimal solutions for many problems. For example, in a shortest-path problem, a method that only considers the nearest neighboring node may fail to find the shortest path.
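The shortest-path point can be made concrete with a tiny graph. A myopic rule that always steps to the cheapest neighbor is misled by a cheap first edge. Dynamic programming instead solves the Bellman equation J(s) = min over s′ of [c(s, s′) + J(s′)], with J(goal) = 0, and follows the long-term cost-to-go. The graph and function names below are hypothetical, and the relaxation loop assumes nonnegative edge costs.

```python
import math

# Hypothetical directed graph: node -> {neighbor: edge cost}
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"D": 10},
    "C": {"D": 1},
    "D": {},
}


def greedy_path(graph, start, goal):
    """Myopic rule: always move to the nearest (cheapest) neighbor."""
    path, cost, node = [start], 0, start
    while node != goal:
        if not graph[node] or len(path) > len(graph):
            return None  # dead end or cycling: the greedy rule gives up
        nxt, c = min(graph[node].items(), key=lambda kv: kv[1])
        path.append(nxt)
        cost += c
        node = nxt
    return path, cost


def dp_cost_to_go(graph, goal):
    """Solve J(s) = min_{s'} [c(s, s') + J(s')], J(goal) = 0, by repeated relaxation."""
    J = {s: math.inf for s in graph}
    J[goal] = 0
    for _ in range(len(graph) - 1):  # Bellman-Ford style sweeps
        for s, edges in graph.items():
            for s2, c in edges.items():
                J[s] = min(J[s], c + J[s2])
    return J


def dp_path(graph, start, goal):
    """Follow the action that minimizes edge cost plus cost-to-go."""
    J = dp_cost_to_go(graph, goal)
    path, node = [start], start
    while node != goal:
        node = min(graph[node], key=lambda s2: graph[node][s2] + J[s2])
        path.append(node)
    return path, J[start]


print(greedy_path(GRAPH, "A", "D"))  # (['A', 'B', 'D'], 11)
print(dp_path(GRAPH, "A", "D"))      # (['A', 'C', 'D'], 5)
```

The greedy rule takes the cheap edge A→B and is then stuck with the expensive B→D. The cost-to-go J accounts for everything that follows each choice, which is the long-term view described above.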
Dr. David Silver, a core developer of AlphaGo, has proposed the hypothesis that AI = RL + DL. Russell and Norvig's classic AI textbook states that reinforcement learning might be considered to encompass all of AI. Some research suggests that any computable problem in computer science can be formulated as an RL problem.
The earlier chapters of this book first introduced RL and then presented some of its applications in games, recommender systems, computer systems, healthcare, education, finance, robotics, transportation, energy, manufacturing, and other areas. Admittedly, each of these areas includes much work and many directions that were not discussed, and many other areas were not covered at all, so omissions are inevitable. The figure below depicts the application areas and directions of RL; the range of possible applications is vast.
RL has broad applications across every layer of computer systems. These range from low-level chip design and hardware systems, to software systems such as operating systems, compilers, and database management systems, and to infrastructure such as cloud computing platforms and communication networks. They extend further to applications such as game engines and recommender systems, and to computer vision, natural language processing, and machine learning and AI systems themselves.
The book touches on science, engineering, and the arts. Games, for example, involve psychology and design, while robotics, transportation, energy, and manufacturing are closely tied to engineering. Admittedly, the book's coverage is limited in two respects. It covers only part of RL's wide range of application scenarios in science, engineering, and the arts. It also covers little of how these fields in turn contribute back to RL.
Problems in the natural sciences and engineering are generally fairly objective, with standard answers that are easy to evaluate. Given a model, a reasonably accurate simulator, or a large amount of data, RL and machine learning have a good chance of solving such problems. AlphaGo is a case in point. Combinatorial optimization, operations research, optimal control, pharmacy, chemistry, genomics, and similar directions largely fit this description. Problems in the social sciences and the arts, by contrast, generally involve human factors and are influenced by psychology, behavioral science, and so on. They tend to be more subjective, may lack standard answers, and may not be easy to evaluate. Game design and evaluation, as well as education, largely fit this description. Psychological concepts such as intrinsic motivation build a bridge between RL and AI on one side and the social sciences and the arts on the other.
MIT Technology Review named deep learning and reinforcement learning among its 10 Breakthrough Technologies in 2013 and 2017, respectively. Deep learning is already widely deployed, and RL will play an increasingly important role in real-world applications. RL has been applied successfully to games, recommender systems, and other areas, and may already be in successful use in quantitative finance. RL may not yet be widely used in products and services for some scenarios, and different situations likely call for different analyses. Still, if we take the long-term return into account, now may well be an excellent time to cultivate, educate, and lead the market for RL. We expect to see both deep learning and RL flourish.
XIV. Notes and References
Sutton and Barto (2018) is the textbook of choice for RL, and it is written very intuitively. Szepesvari (2010) discusses RL algorithms. Bertsekas (2019) introduces RL and optimal control. Bertsekas and Tsitsiklis (1996) discuss neuro-dynamic programming, with a strong theoretical emphasis. Powell (2011) discusses approximate dynamic programming and its applications in operations research. Powell (2019) and Recht (2019) discuss the relationship between RL and optimal control. Botvinick et al. (2019) discuss the connections between RL and cognitive science, psychology, and neuroscience.
At the ACM KDD 2020 Deep Learning Day, Csaba Szepesvari gave a comprehensive, in-depth analysis of RL that cleared up many misconceptions; see Szepesvari (2020). The author wrote a bilingual commentary on the talk, titled (in translation) "Myths and Nonsense in Reinforcement Learning", https://zhuanlan.zhihu.com/p/.
Goodfellow et al. (2016) and Zhang et al. (2020) introduce deep learning. Zhou Zhihua (2016) and Li Hang (2019) introduce machine learning (both in Chinese). Russell and Norvig (2009) introduce artificial intelligence. Zhang Bo et al. (2020) discuss third-generation artificial intelligence.
Mnih et al. (2015) introduced the Deep Q-Network (DQN). Badia et al. (2020) presented Agent57. Silver et al. (2016) introduced AlphaGo. Silver et al. (2017) introduced AlphaGo Zero, which masters Go without human knowledge and surpasses human-level play. Silver et al. (2018) introduced AlphaZero, which extends AlphaGo Zero to more games, including chess and shogi. Tian et al. (2019) reimplemented and analyzed AlphaZero and released open-source software (ELF OpenGo). Moravcik et al. (2017) introduced DeepStack and Brown and Sandholm (2017) introduced Libratus, two computer programs for heads-up no-limit Texas hold'em poker.
Vinyals et al. (2019) introduced AlphaStar, which reached Grandmaster level in StarCraft II and defeated top human players. Jaderberg et al. (2018) introduced a capture-the-flag agent that reaches human-level performance in a first-person multiplayer game. OpenAI (2019) introduced OpenAI Five, which defeated top human Dota 2 players. Microsoft has made progress in mahjong (Suphx). Curling, often called "chess on ice", has also seen recent progress (Curly). These results in multiplayer games indicate that RL has acquired a degree of mastery of tactics and strategy in team games.
OpenAI (2018) introduced Dactyl, a humanlike robotic hand that dexterously manipulates physical objects. Hwangbo et al. (2019) and Lee et al. (2020) introduced agile quadruped robots. Peng et al. (2018) introduced DeepMimic, in which simulated humanoid characters perform difficult, acrobatic motions. Lazic et al. (2018) studied data center cooling using model-predictive control. Segler et al. (2018) combined deep neural networks with Monte Carlo tree search to plan chemical retrosynthesis. Popova et al. (2018) applied RL to de novo drug design. And so on.
DQN combines Q-learning with deep neural networks and uses two techniques, experience replay and a target network, to stabilize training. With experience replay, transitions are stored in a replay buffer, and random samples drawn from the buffer are used for learning. The target network keeps a separate copy of the network parameters that is used to compute the learning targets; it is updated periodically rather than at every training step. Mnih et al. (2016) introduced the Asynchronous Advantage Actor-Critic (A3C) algorithm, in which parallel actors use different exploration strategies to stabilize training without experience replay. Deterministic policy gradients can make policy-gradient estimation more efficient: Silver et al. (2014) introduced the Deterministic Policy Gradient (DPG), and Lillicrap et al. (2016) extended it to the Deep Deterministic Policy Gradient (DDPG). Trust-region methods constrain each gradient update in order to stabilize policy optimization: Schulman et al. (2015) introduced Trust Region Policy Optimization (TRPO), and Schulman et al. (2017) introduced Proximal Policy Optimization (PPO). Haarnoja et al. (2018) introduced the Soft Actor-Critic (SAC) algorithm. In 2020, Google DeepMind designed Agent57, which exceeds the human benchmark on all 57 Atari games, including Montezuma's Revenge, Pitfall, Solaris, and Skiing, on which earlier agents had consistently performed poorly. Agent57 combines many advances made since DQN: distributed learning, short-term memory, episodic memory, intrinsic-motivation methods that encourage directed exploration (seeking novelty on both long and short time scales), and a meta-controller that learns how to balance exploration and exploitation.
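The two DQN stabilization techniques described above can be illustrated with a minimal tabular sketch. It assumes a hypothetical five-state chain environment (`chain_step`) and replaces the deep network with a Q-table, so it shows only the mechanics: experience replay as uniform minibatches drawn from a bounded buffer, and a target copy that is synchronized every `target_sync` environment steps. All names and hyperparameters here are illustrative assumptions, not DQN's published settings.

```python
import random
from collections import deque


class ReplayBuffer:
    """Bounded FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # A uniform random minibatch breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def chain_step(state, action, n_states):
    """Hypothetical chain MDP: action 1 moves right, action 0 moves left.
    Reaching the rightmost state yields reward 1 and ends the episode."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q_row):
    best = max(q_row)
    return random.choice([a for a, v in enumerate(q_row) if v == best])  # random tie-break


def train(n_states=5, episodes=300, gamma=0.9, lr=0.1, eps=0.2,
          batch_size=16, target_sync=50, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # online values (a table stands in for the network)
    q_target = [row[:] for row in q]           # separate, frozen copy used for TD targets
    buf = ReplayBuffer(capacity=1000)
    steps_total = 0
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 50:
            # epsilon-greedy behaviour policy based on the online table
            a = random.randrange(2) if random.random() < eps else greedy(q[s])
            s_next, r, done = chain_step(s, a, n_states)
            buf.add(s, a, r, s_next, done)
            s, t, steps_total = s_next, t + 1, steps_total + 1
            for bs, ba, br, bs2, bdone in buf.sample(batch_size):
                # TD target is computed from the target table, not the online one
                target = br if bdone else br + gamma * max(q_target[bs2])
                q[bs][ba] += lr * (target - q[bs][ba])
            if steps_total % target_sync == 0:
                q_target = [row[:] for row in q]  # periodic sync, not every step
    return q
```

After `q = train()`, the greedy action should be "move right" in every non-terminal state. In a real DQN the table would be a neural network and the sync would copy its weights; the buffer and the delayed copy play the same roles.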
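As an illustration of how a trust-region idea can be approximated cheaply, the clipped-surrogate variant of PPO (Schulman et al., 2017) replaces TRPO's explicit constraint with the objective L = mean(min(r·A, clip(r, 1 − ε, 1 + ε)·A)), where r = π_new(a|s) / π_old(a|s) and A is the advantage. Taking the minimum means a large policy change earns no extra objective. The sketch below computes this quantity from log-probabilities; the function name and the default ε = 0.2 are assumptions for illustration.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized) over a batch."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                    # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        total += min(ratio * adv, clipped * adv)             # pessimistic of the two terms
    return total / len(advantages)
```

For example, a sample with r = 1.5 and A = 1 contributes the clipped value 1.2, while a sample with r = 0.5 and A = −1 contributes −0.8, so the mean over the two is 0.2. In practice this objective is maximized with a few epochs of minibatch gradient ascent on data collected by the old policy.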
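Agent57's meta-controller treats the choice among a family of exploration policies as a non-stationary multi-armed bandit. The sketch below is a simplified sliding-window UCB bandit in that spirit: each arm stands for one hypothetical exploration setting (for example, a pair of intrinsic-reward weight and discount factor), and the episode returns produced in `simulate` are synthetic. The window size, the bonus constant `c`, and the reward means are assumptions, not values from Badia et al. (2020).

```python
import math
import random
from collections import deque


class SlidingWindowUCB:
    """UCB bandit that only trusts the most recent `window` outcomes,
    so it can follow arms whose value drifts as the agent learns."""

    def __init__(self, n_arms, window=100, c=1.0):
        self.n_arms = n_arms
        self.c = c                           # exploration bonus scale (assumed)
        self.history = deque(maxlen=window)  # (arm, episode_return) pairs

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, ret in self.history:
            counts[arm] += 1
            sums[arm] += ret
        # Any arm with no data inside the window is tried first.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        total = len(self.history)

        def ucb(a):
            return sums[a] / counts[a] + self.c * math.sqrt(math.log(total) / counts[a])

        return max(range(self.n_arms), key=ucb)

    def update(self, arm, episode_return):
        self.history.append((arm, episode_return))


def simulate(means, rounds=2000, tail=500, seed=0):
    """Run the bandit on synthetic Gaussian returns and count how often
    each arm is picked during the last `tail` rounds."""
    rng = random.Random(seed)
    bandit = SlidingWindowUCB(len(means))
    picks = [0] * len(means)
    for t in range(rounds):
        arm = bandit.select()
        bandit.update(arm, rng.gauss(means[arm], 0.1))
        if t >= rounds - tail:
            picks[arm] += 1
    return picks
```

With `simulate([0.2, 0.5, 0.8])`, the arm with the highest mean return should be chosen most often, while the other arms still receive occasional trials. The sliding window is what lets such a controller adapt when the best exploration setting changes over the course of training.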
It is worth following the RL work of researchers such as Pieter Abbeel, Dimitri Bertsekas, Emma Brunskill, Chelsea Finn, Leslie Kaelbling, Lihong Li, Michael Littman, Joelle Pineau, Doina Precup, Juergen Schmidhuber, David Silver, Satinder Singh, Dale Schuurmans, Peter Stone, Rich Sutton, and Csaba Szepesvari, and of research institutions such as CMU, DeepMind, Facebook, Google, Microsoft, MIT, OpenAI, Stanford, the University of Alberta, and UC Berkeley.
Amershi et al. (2019) discuss software engineering for machine learning; the discussion is likely helpful for RL as well. The authors describe nine stages of a machine learning workflow: model requirements, data collection, data cleaning, data labeling, feature engineering, model training, model evaluation, model deployment, and model monitoring. The workflow contains many feedback loops, for example between model training and feature engineering, while model evaluation and model monitoring may loop back to any earlier stage. The authors also identify three ways in which software engineering for AI differs from software engineering for earlier applications: 1) discovering, managing, and versioning data is more complex and more difficult; 2) model customization and model reuse require different skills; 3) AI components are less modular and can be entangled in complex ways.
The discussion of the challenges RL faces in the real world is based on Dulac-Arnold et al. (2020). The discussion of the foundations of efficient robot learning is based on Kaelbling (2020), which mentions two blog posts: Sutton (2019), The Bitter Lesson, and Brooks (2019), A Better Lesson. The reference principles for applying RL to healthcare are based on Gottesman et al. (2019). Wiens et al. (2019) discuss how to apply machine learning responsibly in healthcare. The discussion "AI startups: AI companies represent a new business model" is based on Casado and Bornstein (2020). The discussion "AI startups: bridging the gap between proof of concept and production" is based on Ng (2020). In addition, Alharin et al. (2020), Belle and Papantonis (2020), and Lipton (2018), among others, discuss interpretability.
Li (2017) is a survey of deep RL that covers both the broad directions and the details of the field, discussing recent progress against the background of its historical development. It discusses six core elements: value function, policy, reward, model, exploration vs. exploitation, and representation; six important mechanisms: attention and memory, unsupervised learning, hierarchical RL, multi-agent RL, relational RL, and learning to learn; and 12 application areas: games, robotics, natural language processing, computer vision, finance, business management, healthcare, education, energy, transportation, computer systems, and science, engineering, and art.
References
Alharin, A., Doan, T.-N., and Sartipi, M. (2020). Reinforcement learning interpretation methods: A survey. IEEE Access, 8: –.
Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Kamar, E., Nagappan, N., Nushi, B., and Zimmermann, T. (2019). Software engineering for machine learning: A case study. In ICSE.
Badia, A. P., Piot, B., Kapturowski, S., Sprechmann, P., Vitvitskyi, A., and Guo, D. (2020). Agent57: Outperforming the Atari human benchmark. ArXiv.
Belle, V. and Papantonis, I. (2020). Principles and practice of explainable machine learning. ArXiv.
Botvinick, M., Ritter, S., Wang, J. X., Kurth-Nelson, Z., Blundell, C., and Hassabis, D. (2019). Reinforcement learning, fast and slow. Trends in Cognitive Sciences, 23(5):408–422.
Brooks, R. (2019). A better lesson. https://rodneybrooks.com/a-better-lesson/.
Brown, N. and Sandholm, T. (2017). Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science.
Casado, M. and Bornstein, M. (2020). The new business of AI (and how it's different from traditional software). https://a16z.com/2020/02/16/the-new-business-of-ai-and-how-its-different-from-traditional-software/.
Dulac-Arnold, G., Levine, N., Mankowitz, D. J., Li, J., Paduraru, C., Gowal, S., and Hester, T. (2020). An empirical investigation of the challenges of real-world reinforcement learning. ArXiv.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
Gottesman, O., Johansson, F., Komorowski, M., Faisal, A., Sontag, D., Doshi-Velez, F., and Celi, L. A. (2019). Guidelines for reinforcement learning in healthcare. Nature Medicine, 25:14–18.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In ICML.
Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., and Hutter, M. (2019). Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26).
Jaderberg, M., Czarnecki, W. M., Dunning, I., Marris, L., Lever, G., Garcia Castaneda, A., Beattie, C., Rabinowitz, N. C., Morcos, A. S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J. Z., Silver, D., Hassabis, D., Kavukcuoglu, K., and Graepel, T. (2018). Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. ArXiv.
Kaelbling, L. P. (2020). The foundation of efficient robot learning. Science, 369(6506):915–916.
Lazic, N., Boutilier, C., Lu, T., Wong, E., Roy, B., Ryu, M., and Imwalle, G. (2018). Data center cooling using model-predictive control. In NeurIPS.
Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., and Hutter, M. (2020). Learning quadrupedal locomotion over challenging terrain. Science Robotics.
Li, Y. (2017). Deep Reinforcement Learning: An Overview. ArXiv.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2016). Continuous control with deep reinforcement learning. In ICLR.
Lipton, Z. C. (2018). The mythos of model interpretability. ACM Queue, 16(3):31–57.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Harley, T., Lillicrap, T. P., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In ICML.
Moravcik, M., Schmid, M., Burch, N., Lisy, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., and Bowling, M. (2017). DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337):508–513.
Ng, A. (2020). Bridging AI's proof-of-concept to production gap. https://www.youtube.com/watch?v=tsPuVAMaADY.
OpenAI, Andrychowicz, M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., Schneider, J., Sidor, S., Tobin, J., Welinder, P., Weng, L., and Zaremba, W. (2018). Learning dexterous in-hand manipulation. ArXiv.
OpenAI, Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Jozefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., de Oliveira Pinto, H. P., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., and Zhang, S. (2019). Dota 2 with large scale deep reinforcement learning. ArXiv.
Peng, X. B., Abbeel, P., Levine, S., and van de Panne, M. (2018). DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. In SIGGRAPH.
Popova, M., Isayev, O., and Tropsha, A. (2018). Deep reinforcement learning for de novo drug design. Science Advances, 4(7).
Powell, W. B. (2011). Approximate Dynamic Programming: Solving the Curses of Dimensionality (2nd edition). John Wiley and Sons.
Powell, W. B. (2019). From reinforcement learning to optimal control: A unified framework for sequential decisions. ArXiv.
Recht, B. (2019). A tour of reinforcement learning: The view from continuous control. Annual Review of Control, Robotics, and Autonomous Systems, 2:253–279.
Russell, S. and Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd edition). Pearson.
Schulman, J., Levine, S., Moritz, P., Jordan, M. I., and Abbeel, P. (2015). Trust region policy optimization. In ICML.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. ArXiv.
Segler, M. H. S., Preuss, M., and Waller, M. P. (2018). Planning chemical syntheses with deep neural networks and symbolic AI. Nature, 555:604–610.
Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014). Deterministic policy gradient algorithms. In ICML.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., and Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550:354–359.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., and Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144.
Sutton, R. (2019). The bitter lesson. http://incompleteideas.net/IncIdeas/BitterLesson.html.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd edition). MIT Press.
Szepesvari, C. (2010). Algorithms for Reinforcement Learning. Morgan & Claypool.
Szepesvari, C. (2020). Myths and misconceptions in RL. https://sites.ualberta.ca/~szepesva/talks.html. KDD 2020 Deep Learning Day.
Tian, Y., Ma, J., Gong, Q., Sengupta, S., Chen, Z., Pinkerton, J., and Zitnick, C. L. (2019). ELF OpenGo: An analysis and open reimplementation of AlphaZero. In ICML.
Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J. P., Jaderberg, M., Vezhnevets, A. S., Leblond, R., Pohlen, T., Dalibard, V., Budden, D., Sulsky, Y., Molloy, J., Paine, T. L., Gulcehre, C., Wang, Z., Pfaff, T., Wu, Y., Ring, R., Yogatama, D., Wunsch, D., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Kavukcuoglu, K., Hassabis, D., Apps, C., and Silver, D. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575:350–354.
Wiens, J., Saria, S., Sendak, M., Ghassemi, M., Liu, V. X., Doshi-Velez, F., Jung, K., Heller, K., Kale, D., Saeed, M., Ossorio, P. N., Thadaney-Israni, S., and Goldenberg, A. (2019). Do no harm: a roadmap for responsible machine learning for health care. Nature Medicine, 25:1337–1340.
Zhang, A., Lipton, Z. C., Li, M., and Smola, A. J. (2020). Dive into Deep Learning. https://d2l.ai.
Li, H. (2019). Statistical Learning Methods (2nd edition, in Chinese). Tsinghua University Press.
Zhang, B., Zhu, J., and Su, H. (2020). Toward the third generation of artificial intelligence (in Chinese). Scientia Sinica Informationis, 50:1281–1302. doi:10.1360/SSI-2020-0204.
Zhou, Z.-H. (2016). Machine Learning (in Chinese). Tsinghua University Press.
Note: Written by Dr. Yuxi Li and first published in the Zhihu column "Reinforcement Intelligence (RLAI)", https://www.zhihu.com/column/c_