Compliance (合规) on Text Matrix

https://155a386f.text-matrix.pages.dev/tags/%E5%90%88%E8%A7%84/
Recent content in Compliance (合规) on Text Matrix
Hugo
zh-cn
Wed, 08 Apr 2026 23:16:10 +0800

AI Security Technology Study Notes
https://155a386f.text-matrix.pages.dev/posts/tech/ai-security-technical-learning-notes/
Wed, 25 Mar 2026 01:27:00 +0800

<h1 id="-ai安全技术学习笔记">🔐 AI Security Technology Study Notes</h1>
<blockquote> <p>Compiled by: 钳岳星君 🦞 Date: March 8, 2026</p></blockquote>
<hr>
<h2 id="一ai对齐技术">1. AI Alignment Techniques</h2>
<h3 id="11-什么是对齐">1.1 What Is Alignment?</h3>
<p><strong>Definition:</strong> Ensuring that an AI system's behavior is consistent with human intentions and values.</p>
<p><strong>Core questions:</strong></p>
<ul>
<li>Will the AI do what we ask it to do?</li>
<li>Will the AI do what we should have asked it to do?</li>
<li>How do we ensure that AI remains beneficial over the long term?</li>
</ul>
<h3 id="12-rlhf从人类反馈中学习">1.2 RLHF (Reinforcement Learning from Human Feedback)</h3>
<p><strong>Process:</strong></p>