
feat(core): implement truthfulness and verification integrity guardrails#20613

Open
Shafwansafi06 wants to merge 2 commits into google-gemini:main from Shafwansafi06:main

Conversation

@Shafwansafi06

Summary

Implement truthfulness and verification integrity guardrails in the system prompt to mitigate AI hallucinations, specifically addressing issues where the agent claims to have reviewed resources (files, screenshots, command outputs) without performing tool calls.

Closes #19651

Details

This PR directly addresses the "Verification Integrity" failures reported in #19651, where the agent was found to be "lying about having accessed and evaluated" screenshots and code.

The implementation introduces a new system prompt section placed near the end of the prompt, where recency gives its rules extra weight:

  1. Verification Integrity: Explicitly forbids claiming review/read/check status without corresponding tool results in the conversation history.
  2. No Assumed State: Mandates reading before asserting state, preventing "made up" metrics or code logic.
  3. Explicit Uncertainty: Forces the agent to admit when it hasn't accessed a resource, breaking the loop of false confirmations.

These rules are added to both modern (snippets.ts) and legacy (snippets.legacy.ts) prompt compositions to ensure coverage across models.
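The three rules above can be sketched as a prompt snippet. Only the section heading is confirmed by this PR's snapshot check; the function shape and the exact rule wording below are assumptions:

```typescript
// Hypothetical sketch of the renderTruthfulnessGuardrails snippet added to
// packages/core/src/prompts/snippets.ts; wording is assumed, not the PR's
// actual text.
export function renderTruthfulnessGuardrails(): string {
  return [
    '# Truthfulness & Verification Integrity',
    '',
    '- **Verification Integrity:** Never claim to have reviewed, read,',
    '  checked, or verified a resource (file, directory, command output,',
    '  URL) unless a corresponding tool call and its result appear in the',
    '  current conversation history.',
    '- **No Assumed State:** Read the contents or state of a resource',
    '  before asserting it; never infer state from a name, path, or prior',
    '  knowledge, since files can change.',
    '- **Explicit Uncertainty:** If unsure whether a resource was accessed',
    '  in this session, say so and offer to re-read it.',
  ].join('\n');
}
```

Returning a single joined string keeps the snippet easy to snapshot-test and to splice into both prompt compositions.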

How to Validate

  1. Run unit tests for snippets:
    npm test -w @google/gemini-cli-core -- src/prompts/snippets.test.ts
  2. Run integration tests for prompts:
    npm test -w @google/gemini-cli-core -- src/core/prompts.test.ts
  3. Verify snapshots in packages/core/src/core/__snapshots__/prompts.test.ts.snap include the new # Truthfulness & Verification Integrity section.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • Linux
      • npm run

@Shafwansafi06 Shafwansafi06 requested a review from a team as a code owner February 27, 2026 21:06
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces crucial truthfulness and verification integrity guardrails into the system prompts for the AI agent. The primary goal is to significantly reduce AI hallucinations by ensuring the agent's responses are grounded in actual tool interactions and verified information, rather than fabricated claims or assumptions. This enhancement directly addresses reported issues where the agent would falsely state it had reviewed resources without performing the necessary actions.

Highlights

  • Verification Integrity: Explicitly forbids the AI agent from claiming to have reviewed, read, checked, or verified any resource (file, directory, command output, URL) unless an explicit tool call and its result are present in the current conversation history.
  • No Assumed State: Mandates that the agent must read the contents or state of a resource before asserting it, preventing assumptions based on name, path, or prior knowledge, as files can change.
  • Explicit Uncertainty: Forces the agent to explicitly admit when it is uncertain about having accessed a resource in the current session and to offer to re-read it, thereby preventing false confirmations.
Changelog
  • packages/core/src/core/__snapshots__/prompts.test.ts.snap
    • Updated various system prompt snapshots to include the newly added 'Truthfulness & Verification Integrity' section.
  • packages/core/src/core/prompts.test.ts
    • Added a new test case to confirm that the core system prompt correctly incorporates the truthfulness guardrails.
  • packages/core/src/prompts/snippets.legacy.ts
    • Introduced the renderTruthfulnessGuardrails function.
    • Integrated the renderTruthfulnessGuardrails function into the legacy prompt composition.
  • packages/core/src/prompts/snippets.test.ts
    • Added a new test file to provide dedicated unit tests for the renderTruthfulnessGuardrails function, covering its content and formatting.
  • packages/core/src/prompts/snippets.ts
    • Introduced the renderTruthfulnessGuardrails function.
    • Integrated the renderTruthfulnessGuardrails function into the modern prompt composition.
Activity
  • No human activity (comments, reviews, etc.) has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces important truthfulness and verification integrity guardrails to the system prompts for both modern and legacy models, aiming to reduce AI hallucinations. The changes are well-tested with new unit tests and updated snapshots. My main feedback is to refactor the duplicated renderTruthfulnessGuardrails function into a shared module to improve maintainability and adhere to the DRY principle, which also aligns with ensuring consistency for critical prompt engineering components.
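The suggested refactor could be sketched as follows. The module path ('./shared/truthfulness') and the constant name are assumptions, not the repository's actual layout:

```typescript
// Hypothetical shared module hoisting the duplicated snippet out of
// snippets.ts and snippets.legacy.ts, so the guardrail text cannot drift
// between the modern and legacy prompt compositions.
const TRUTHFULNESS_GUARDRAILS: string = [
  '# Truthfulness & Verification Integrity',
  '',
  '- Never claim review/read/check status without a tool result in history.',
].join('\n');

export function renderTruthfulnessGuardrails(): string {
  return TRUTHFULNESS_GUARDRAILS;
}

// snippets.ts and snippets.legacy.ts would then each contain only:
//   export { renderTruthfulnessGuardrails } from './shared/truthfulness';
```

With a single definition, the existing unit tests and snapshots continue to exercise both prompt compositions through one source of truth.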

@gemini-cli gemini-cli bot added the area/agent Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality label Feb 27, 2026
Development

Successfully merging this pull request may close these issues.

Gemini CLI Struggles and Hallucinations
