feat(core): implement truthfulness and verification integrity guardrails #20613
Shafwansafi06 wants to merge 2 commits into google-gemini:main from
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces crucial truthfulness and verification integrity guardrails into the system prompts for the AI agent. The primary goal is to significantly reduce AI hallucinations by ensuring the agent's responses are grounded in actual tool interactions and verified information, rather than fabricated claims or assumptions. This enhancement directly addresses reported issues where the agent would falsely state it had reviewed resources without performing the necessary actions.

Highlights
Changelog
Activity
Code Review
This pull request introduces important truthfulness and verification integrity guardrails to the system prompts for both modern and legacy models, aiming to reduce AI hallucinations. The changes are well-tested with new unit tests and updated snapshots. My main feedback is to refactor the duplicated renderTruthfulnessGuardrails function into a shared module to improve maintainability and adhere to the DRY principle, which also aligns with ensuring consistency for critical prompt engineering components.
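The suggested DRY refactor could look roughly like the following sketch. The function name `renderTruthfulnessGuardrails` comes from the review above, but the module path and the guardrail wording here are illustrative assumptions, not taken from the PR diff:

```typescript
// Hypothetical shared module (e.g. packages/core/src/prompts/shared.ts):
// both the modern and legacy snippet files would import this single
// implementation instead of each keeping its own duplicated copy.
export function renderTruthfulnessGuardrails(): string {
  // Placeholder wording; the actual guardrail text lives in the PR diff.
  return [
    '# Truthfulness & Verification Integrity',
    '- Never claim to have reviewed a file, screenshot, or command output',
    '  unless a corresponding tool call actually returned it.',
  ].join('\n');
}

// Usage in a prompt composition (the same call would appear in both
// snippets.ts and snippets.legacy.ts):
export const systemPrompt: string = [
  '...other prompt sections...',
  renderTruthfulnessGuardrails(),
].join('\n\n');
```

Centralizing the function this way means any future edit to the guardrail wording lands in both the modern and legacy prompts at once, which matters for a section whose whole purpose is consistency.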
Summary
Implement truthfulness and verification integrity guardrails in the system prompt to mitigate AI hallucinations, specifically addressing issues where the agent claims to have reviewed resources (files, screenshots, command outputs) without performing tool calls.
Closes #19651
Details
This PR directly addresses the "Verification Integrity" failures reported in #19651, where the agent was found to be "lying about having accessed and evaluated" screenshots and code.
The implementation introduces a high-recency system prompt section, `# Truthfulness & Verification Integrity`.
These rules are added to both the modern (`snippets.ts`) and legacy (`snippets.legacy.ts`) prompt compositions to ensure coverage across models.

How to Validate
- `npm test -w @google/gemini-cli-core -- src/prompts/snippets.test.ts`
- `npm test -w @google/gemini-cli-core -- src/core/prompts.test.ts`
- Confirm that the updated snapshots in `packages/core/src/core/__snapshots__/prompts.test.ts.snap` include the new `# Truthfulness & Verification Integrity` section.

Pre-Merge Checklist
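The snapshot check above amounts to asserting that the rendered system prompt contains the new header. A minimal sketch of that assertion, using a stand-in prompt composition (the real tests live in `snippets.test.ts` and `prompts.test.ts`; `composePrompt` and the section names here are hypothetical):

```typescript
// Stand-in for the real prompt composition in snippets.ts: sections are
// joined into a single system prompt string.
function composePrompt(sections: string[]): string {
  return sections.join('\n\n');
}

const prompt = composePrompt([
  '# Core Mandates',                          // stand-in for an existing section
  '# Truthfulness & Verification Integrity',  // the new guardrail section
]);

// The updated .snap files should show this header in the rendered prompt;
// a unit test can assert its presence directly.
const hasGuardrails = prompt.includes('# Truthfulness & Verification Integrity');
console.log(hasGuardrails); // true
```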