Original post link

Hey r/LocalLLaMA! If you're running into infinite repetitions with QwQ-32B, you're not alone! I wrote a guide to help debug these issues, and I also uploaded dynamic 4-bit quants and other GGUFs! Guide: https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively

1. Using repetition penalties to counteract looping can itself cause looping!
2. The Qwen team confirmed that for long context (128K) you should use YaRN.
3. When using repetition penalties, add --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc" to stop infinite generations.
4. Using min_p = 0.1 helps remove low-probability tokens.
5. Try --repeat-penalty 1.1 --dry-multiplier 0.5 to reduce repetitions.
6. Use --temp 0.6 --top-k 40 --top-p 0.95, as suggested by the Qwen team.

For example, my settings in llama.cpp work well, using the DeepSeek R1 1.58-bit Flappy Bird test I introduced earlier here: https://www.reddit.com/r/LocalLLaMA/comments/1ibbloy/158bit_deepseek_r1_131gb_dynamic_gguf/

Command:

./llama.cpp/llama-cli \
    --model unsloth-QwQ-32B-GGUF/QwQ-32B-Q4_K_M.gguf \
    --threads 32 \
    --ctx-size 16384 \
    --n-gpu-layers 99 \
    --seed 3407 \
    --prio 2 \
    --temp 0.6 \
    --repeat-penalty 1.1 \
    --dry-multiplier 0.5 \
    --min-p 0.1 \
    --top-k 40 \
    --top-p 0.95 \
    -no-cnv \
    --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc" \
    --prompt "<|im_start|>user\nCreate a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.<|im_end|>\n<|im_start|>assistant\n<think>\n"

I also uploaded dynamic 4-bit quants for QwQ to https://huggingface.co/unsloth/QwQ-32B-unsloth-bnb-4bit, which are directly compatible with vLLM since version 0.7.3.

QwQ quantization errors: https://llminfo.image.fangd123.cn/images/w65lgkmh5ane1.png!/format/webp

Model links:

* [QwQ-32B GGUFs](https://huggingface.co/unsloth/QwQ-32B-GGUF)
* [QwQ-32B dynamic 4-bit](https://huggingface.co/unsloth/QwQ-32B-unsloth-bnb-4bit)
* [QwQ-32B bitsandbytes 4-bit](https://huggingface.co/unsloth/QwQ-32B-bnb-4bit)
* [QwQ-32B 16-bit](https://huggingface.co/unsloth/QwQ-32B)

I wrote up more details on my findings and made a guide at https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively Thanks!
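The llama.cpp settings from the post can also be assembled programmatically, which makes it easier to vary one flag at a time while debugging. The following is a minimal sketch in Python, assuming the same binary and model paths as in the command above. The helper name `build_llama_cli_args` and the short "Hello" prompt are hypothetical placeholders, not part of the original post.

```python
import shlex

# Sampling settings copied from the post; these are the post's values, not llama.cpp defaults.
SETTINGS = {
    "--threads": "32",
    "--ctx-size": "16384",
    "--n-gpu-layers": "99",
    "--seed": "3407",
    "--prio": "2",
    "--temp": "0.6",
    "--repeat-penalty": "1.1",
    "--dry-multiplier": "0.5",
    "--min-p": "0.1",
    "--top-k": "40",
    "--top-p": "0.95",
    "--samplers": "top_k;top_p;min_p;temperature;dry;typ_p;xtc",
}


def build_llama_cli_args(model_path, prompt, binary="./llama.cpp/llama-cli"):
    """Return the llama-cli argument list (hypothetical helper for illustration)."""
    args = [binary, "--model", model_path]
    for flag, value in SETTINGS.items():
        args += [flag, value]
    # -no-cnv and --prompt come last, mirroring the post's command.
    args += ["-no-cnv", "--prompt", prompt]
    return args


args = build_llama_cli_args(
    "unsloth-QwQ-32B-GGUF/QwQ-32B-Q4_K_M.gguf",
    # Placeholder prompt; substitute the full Flappy Bird prompt from the post.
    "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n<think>\n",
)
print(shlex.join(args))  # shell-quoted form, handy for logging
# To actually run it: import subprocess; subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with the semicolons in the sampler string and the special tokens in the prompt.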

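Discussion Summary

The original post covers fixes for QwQ-32B's infinite generations, along with best practices and bug fixes, and shares technical points such as repetition penalties and specific parameter settings. The comments range over technical how-tos, praise and thanks for the post, requests for help, and objections to parts of the post. The overall tone is positive and strongly technical.

Main Points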

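👍 The original poster's work is impressive and a major contribution to the open-source community
- Supporting reasons: The post provides useful information on QwQ-32B; commenters praise its insights and results and thank the author for the value brought to the open-source community.
- Opposing voices: None
🔥 Using the specific chat template matters when running QwQ-32B
- In favor: Following the template exactly helps QwQ-32B run properly.
- Against: None
💡 The fixes for QwQ-32B are complicated and should be handled by the responsible team
- Explanation: A commenter argues that users should not have to perform complex steps to fix model problems; the responsible team should resolve them before release.
👍 The post's content helps shorten the reasoning portion
- Supporting reasons: What the post provides makes responses more direct and avoids long-winded passages.
- Opposing voices: None
🔥 Disapproval of using token samplers to mask model failures
- In favor: Model failures should be faced directly rather than masked.
- Against: None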
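The chat template point above can be illustrated with a small helper that wraps a user message in the template quoted in the thread. This is a minimal sketch for a single-turn prompt only; the function name `format_qwq_prompt` is hypothetical, and multi-turn or system-prompt handling is not covered.

```python
def format_qwq_prompt(user_message):
    """Wrap one user message in the chat template quoted in the thread."""
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\n"  # the template ends by opening the reasoning block
    )


prompt = format_qwq_prompt("Create a Flappy Bird game in Python.")
print(repr(prompt))
```

Building the string in one place makes it harder to drop a newline or the trailing `<think>` tag by accident.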

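Notable Quotes and Interesting Comments

“😂 Amazing insight and work, thanks once more for giving the OSS community all this value and effort!”
- Highlight: High praise for the original poster, reflecting the post's value to the open-source community.
“🤔 Oh I forgot - remember to follow the chat template exactly: <|im_start|>user\nCreate a Flappy Bird game in Python.<|im_end|>\n<|im_start|>assistant\n<think>\n”
- Highlight: Stresses the importance of following the chat template.
“👀 if using llama-server, command line parameters are overriden by incoming http request params.”
- Highlight: Points out an important caveat about parameter settings in llama-server.
“😂 I have been having a lot of luck with those settings as well, i kind of like temp 0.4 tho just wanted to share my findings.”
- Highlight: Shares the commenter's experience with a different temperature value under the same settings.
“🤔 This seems so complicated.”
- Highlight: Expresses the immediate impression that the post's procedure is complicated.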
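The llama-server remark quoted above suggests sending the sampling settings with each request rather than relying on command-line flags. The following is a rough sketch, assuming a llama-server instance listening at `http://localhost:8080` and a `/completion` endpoint that accepts these JSON field names. Both the address and the field names are assumptions and should be checked against the server documentation for your build.

```python
import json
import urllib.request

# Assumed server address; llama-server must already be running there.
SERVER_URL = "http://localhost:8080/completion"


def build_payload(prompt):
    """Build a request body that repeats the post's sampling settings.

    Field names are assumed, not taken from the original post.
    """
    return {
        "prompt": prompt,
        "temperature": 0.6,
        "top_k": 40,
        "top_p": 0.95,
        "min_p": 0.1,
        "repeat_penalty": 1.1,
        "dry_multiplier": 0.5,
    }


def make_request(prompt, url=SERVER_URL):
    """Create (but do not send) the POST request."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder prompt using the chat template quoted in the thread.
request = make_request(
    "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n<think>\n"
)
# To send it: urllib.request.urlopen(request).read()
```

Many client front ends send their own sampler values with each request, so it is worth inspecting what your client actually transmits.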

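Sentiment Analysis

The overall sentiment is positive. The main points of disagreement are how complicated the fixes for QwQ-32B are, and whether token samplers should be used to mask model failures. The positive sentiment comes from the useful information in the post, which drew many thanks and compliments; the negative sentiment comes from some users' complaints about the complexity and their disapproval of this way of handling model failures.

Trends and Predictions

Emerging topics: the development of dynamic GGUFs, and which variants fit a given amount of GPU memory (e.g. 24 GB).
Potential impact: optimizations and improvements to QwQ-32B may affect the performance and efficiency of related models across different use cases, and may also benefit the open-source community by encouraging more people to discuss and solve similar technical problems.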

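Detailed Content:

Title: Discussion and Solutions for QwQ-32B's Infinite Generation Problem

On Reddit's LocalLLaMA board, a post about QwQ-32B's infinite repetition problem attracted wide attention. The post offered a series of debugging methods and settings and shared several links, such as https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively, and it drew a large number of comments.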

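The discussion centered on how to effectively resolve QwQ-32B's infinite generation problem and on the effects of different settings. Some argued that using repetition penalties to counteract looping can backfire and make looping worse, and that what really causes loops is truncation. For example, [-p-e-w-] pointed out that with adaptive truncation samplers such as Min-P, once the model starts repeating earlier input, a self-reinforcing lock-in can be triggered that leaves the model able to do nothing but loop.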
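The truncation argument above can be made concrete with a toy Min-P filter. This is a minimal sketch, assuming the common Min-P definition in which a token survives only if its probability is at least `min_p` times the top token's probability. The two distributions are made up for illustration and do not come from the thread.

```python
def min_p_filter(probs, min_p=0.1):
    """Keep tokens with probability >= min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Made-up next-token distributions, for illustration only.
uncertain = {"a": 0.40, "b": 0.30, "c": 0.20, "d": 0.07, "e": 0.03}
confident = {"a": 0.95, "b": 0.03, "c": 0.01, "d": 0.006, "e": 0.004}

print(sorted(min_p_filter(uncertain)))  # ['a', 'b', 'c', 'd']
print(min_p_filter(confident))          # {'a': 1.0}
```

When the model is already very confident in one token, as it tends to be once it starts copying earlier text, the filter leaves that token as the only candidate, so sampling cannot escape the repetition. That is one way to read the "self-reinforcing lock-in" described in the comment.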

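Others shared their own testing experience. [danielhanchen] tried a variety of settings, such as removing min-p entirely and adjusting the temperature and other parameters, but ran into various problems across the tests, including invalid Python syntax and repetition.

Another user, [zoydberg357], shared their experience with a hallucination benchmark, which measured how often the model produced hallucinated results under different prompts.

As for solutions, people offered a range of ideas. [-p-e-w-] suggested placing DRY at a specific position in the sampler chain and adjusting the Min-P value. [danielhanchen] kept adjusting the parameter settings based on test results and shared the findings.

In short, the discussion of QwQ-32B's infinite generation problem was lively and the views were varied. No full consensus was reached, but the thread produced many ideas and directions for solving the problem.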
