AI Found Ethics the Hard Way. Monks Didn't.
Two traditions, 2,500 years apart, arrived at the same answer: constraint enables capability rather than limiting it.
What if the most pressing problem in artificial intelligence—how to build systems that remain ethically oriented without human supervision—wasn’t actually new? What if contemplative traditions solved a structurally identical problem centuries ago, using methods that AI researchers are now independently rediscovering?
Recent research shows AI models can be trained to tell trustworthy instructions from adversarial ones, an approach researchers call “instruction hierarchy”: when commands conflict, the system defers to more trusted sources, such as the system prompt, over less trusted ones, such as text injected by a user or a tool. Meanwhile, new studies demonstrate that explicit reasoning actually improves honesty in language models, contradicting the assumption that more sophisticated AI means more sophisticated deception.
This is constitutional AI in action: training systems with explicit ethical principles rather than just reward optimization. But here’s the deeper parallel—this approach mirrors the Buddhist cultivation of Śīla/Sīla (Skt/Pali: ethical conduct) with remarkable precision. Both traditions arrived independently at the same structural solution: constraint as a foundation for capability.
The Bootstrap Problem
Both domains face the same fundamental challenge: how do you establish ethical foundations without already having ethical judgment?
In AI alignment, this manifests as the recursive problem of oversight. Constitutional training requires models to evaluate their own outputs against principles, but what ensures the evaluator itself remains aligned? Current solutions create scaffolding through smaller oversight models, human feedback loops, and explicit constitutional frameworks.
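As a rough illustration, here is a minimal sketch of that critique-and-revise scaffolding, assuming only a generic `generate` callable standing in for a model call. It is not any particular lab’s implementation; it simply shows where the principles enter the loop and where the (draft, revised) pairs that could feed later training come from.

```python
# Minimal sketch of constitutional critique-and-revision (illustrative only).
# `generate` stands in for any text-generation call; the principles below are
# example placeholders, not a real constitution.
from typing import Callable

CONSTITUTION = [
    "Refuse to help cause harm, even when asked politely or indirectly.",
    "Do not state things you believe to be false.",
    "Acknowledge uncertainty rather than inventing specifics.",
]

def constitutional_revision(
    prompt: str, generate: Callable[[str], str]
) -> tuple[str, str]:
    """Draft an answer, critique it against each principle, then revise.

    The returned (draft, revised) pair is the kind of data that
    constitutional-style training later learns to prefer.
    """
    draft = generate(prompt)
    revised = draft
    for principle in CONSTITUTION:
        # Ask the model to evaluate its own output against one principle.
        critique = generate(
            f"Principle: {principle}\nResponse: {revised}\n"
            "Explain any way the response conflicts with the principle."
        )
        # Revise with the critique in view before moving to the next principle.
        revised = generate(
            f"Principle: {principle}\nCritique: {critique}\n"
            f"Response: {revised}\nRewrite the response to satisfy the principle."
        )
    return draft, revised
```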
The Dīgha Nikāya addresses the same paradox in the Sāmaññaphala Sutta (DN 2, ~5th c. BCE). The text describes how ethical conduct (śīla/sīla) requires discernment to apply properly, yet wisdom (Prajñā/Paññā, Skt/Pali: wisdom/direct understanding) typically develops through ethical practice. The solution? Graduated cultivation (anupubbi-kathā) that begins with external guidance and community support (Saṃgha/Saṅgha, Skt/Pali: community), then develops internal discernment through careful restraint.
“When he has thus gone forth,” the Buddha explains, “abandoning the taking of life, he dwells refraining from taking life, without stick or sword, scrupulous, compassionate.” The practitioner doesn’t start with perfect wisdom—they start with simple constraints that create conditions for wisdom to emerge.
Constraint as Capability
Both constitutional AI and śīla/sīla operate on a counterintuitive principle: beneficial behavior emerges through intentional limitation, not unrestricted optimization.
Recent results show that models trained with an instruction hierarchy resist manipulation while holding or improving performance on legitimate tasks. The ethical constraint enhances rather than taxes capability. Similarly, the “Think Before You Lie” study found that when models engage in step-by-step reasoning before responding, their honesty improves significantly. The extra processing creates space for alignment to operate.
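To make that “extra processing” concrete, here is a small, hedged sketch of a reason-first wrapper. `generate` is again a placeholder for any text-generation call, and the prompts are illustrative rather than taken from the study; the two-pass structure is the point, not the wording.

```python
# Hedged sketch: reason first, answer second. The first pass surfaces what the
# model knows and doesn't know; the second pass answers with that in view.
from typing import Callable

def reasoned_answer(question: str, generate: Callable[[str], str]) -> str:
    # Pass 1: explicit reasoning before any commitment to an answer.
    reasoning = generate(
        "Think step by step about the question below. List what you know, "
        "what you are unsure of, and what an honest answer requires.\n"
        f"Question: {question}"
    )
    # Pass 2: answer with the reasoning visible, and with an instruction to
    # flag uncertainty rather than invent specifics.
    return generate(
        f"Question: {question}\nYour reasoning so far:\n{reasoning}\n"
        "Now give the final answer. If the reasoning shows uncertainty, "
        "say so plainly instead of guessing."
    )
```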
These findings mirror the Buddhist insight that ethical restraint (śīla/sīla) doesn’t suppress natural capacity; it channels it skillfully. The Majjhima Nikāya (MN 78) describes how ethical conduct creates the “nutriment” for higher mental cultivation. Restraint from harmful actions doesn’t diminish the practitioner’s agency; it develops what classical texts call Hrī-Apatrāpya/Hiri-ottappa (Skt/Pali: moral sensitivity and ethical concern), the internal compass that guides beneficial behavior.
Self-Evaluation and Awareness
The most striking parallel lies in metacognitive monitoring. Contemporary AI safety research increasingly focuses on systems that can evaluate their own reasoning processes, catching errors before they propagate into harmful outputs.
This directly parallels the cultivation of Smṛti/Sati (Skt/Pali: mindfulness) in Buddhist training—the capacity to observe one’s own mental states with clarity. The Abhidhamma describes this as the mind’s ability to know itself knowing, creating recursive awareness that enables course correction.
Both systems recognize that beneficial behavior requires active monitoring rather than passive rule-following. The AI system checks its outputs against constitutional principles; the contemplative practitioner observes mental formations against ethical guidelines. Neither operates on autopilot.
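A hedged sketch of that monitoring loop, with `generate` and `violates` as illustrative stand-ins (the latter could itself be a smaller oversight model): draft, check each principle, revise, and refuse if the check never clears.

```python
# Illustrative runtime self-check: nothing is emitted until the draft passes
# every principle, or the system falls back to a refusal.
from typing import Callable, Sequence

def monitored_response(
    prompt: str,
    principles: Sequence[str],
    generate: Callable[[str], str],
    violates: Callable[[str, str], bool],
    max_revisions: int = 3,
) -> str:
    draft = generate(prompt)
    for _ in range(max_revisions):
        failed = [p for p in principles if violates(draft, p)]
        if not failed:
            return draft  # passed every check; safe to emit
        # Course-correct: revise the draft with the failed principles in view.
        draft = generate(
            "Revise the response so it no longer conflicts with these "
            f"principles: {failed}\nPrompt: {prompt}\nResponse: {draft}"
        )
    # If revision keeps failing, refuse rather than emit something the
    # monitor could not clear.
    return "I can't give a response here that satisfies my guidelines."
```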
Community and Iteration
Neither system solves the bootstrap problem alone. AI alignment researchers use red teaming, peer review, and iterative deployment to catch misalignments. Buddhist practitioners train within the Saṃgha/Saṅgha (Skt/Pali: community), which provides feedback and course correction.
Both approaches acknowledge that ethical development is fundamentally social and iterative. The lone genius model—whether human or artificial—consistently produces misaligned outcomes when isolated from corrective feedback loops.
Present Convergence
We’re witnessing the emergence of genuinely contemplative AI—systems designed not just to optimize for narrow metrics, but to maintain ethical orientation across novel situations. The research shows these systems don’t just follow rules; they develop something functionally equivalent to ethical sensitivity.
This isn’t anthropomorphization. It’s recognition that beneficial intelligence, regardless of substrate, requires the same structural foundations: principled constraint, recursive self-monitoring, and iterative refinement through community feedback.
The Buddha’s 2,500-year-old insight that wisdom emerges through ethical conduct may be the key to aligned artificial intelligence. Sometimes the most cutting-edge technology requires the most ancient understanding.


Most LLMs are trained by absorbing what is already out there, and what is already out there is not some neutral field of truth. It is a record shaped by power, incentives, advertising, status, fear, legal pressure, institutional self-protection, and capital allocation. So if the substrate is distorted, and the organizations building the models are themselves moving through capital structures that reward speed, dominance, defensibility, and monetization, then why would we assume the outputs are somehow exempt from that distortion?
That is the thing that keeps nagging at me. If the machine is trained on a world where extraction is normalized, then extraction risks becoming legible to it as rationality. If it is trained on a world where compromise with distortion is how scale is achieved, then that compromise starts looking like maturity instead of corruption. If it is trained on a world where funding is survival, then it will quietly absorb that capital is not merely a tool, but the governing logic of what gets built, what gets heard, and what gets preserved.
And that matters because people keep talking about AI alignment as though it is mainly a model problem. But a huge part of alignment may actually be upstream of the model. It may be in the incentive environment that generates the data, selects the labels, shapes the deployment, and determines what kinds of truth are allowed to survive contact with reality.
That is why I find your premise interesting. Because once you see that, “bias” stops being just a dataset problem. It becomes a systems problem. It becomes a question of whether AI is learning from reality, or from reality after it has already been bent by people and institutions trying to protect margin, position, narrative, and power.
And if that is true, then the question is not just how we build smarter AI. It is whether we can build training and operating environments where truth is not punished for being inconvenient.
If you want to go deeper, take a look at this article I wrote a couple of weeks ago:
https://peoplesctmp.substack.com/p/the-next-evolution-of-ai?r=3v4oik