What happens to your code when a GitHub Action sends it to an AI?

May 2025

When a GitHub Action sends your code to an external service, what actually happens? Where does the code go? Who can see it? What happens if the service gets breached? These are reasonable questions for any team evaluating AI-powered CI/CD tools, and the answers matter more than most vendor marketing lets on.

What gets sent

The first question to ask about any GitHub Action that processes your code: what data actually leaves your runner?

For code review tools, the answer is typically the full diff and surrounding context — potentially large amounts of source code. For documentation tools, it varies significantly by implementation. Some send complete files; others send targeted diffs with relevant context.

DocDr sends two things: the git diff of the merged PR (filtered to remove lock files, generated code, and minified assets) and the current content of your documentation files. It does not send your full repository, build artifacts, or environment variables.
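To make the filtering step concrete, here is a minimal sketch of how a tool might decide which changed files to include in the payload. It is not DocDr's implementation; the pattern list, the directory names, and the function names `should_send` and `filter_changed_files` are all assumptions chosen for illustration.

```python
import fnmatch
from pathlib import PurePosixPath

# Hypothetical exclusion rules; a real tool's list will differ.
EXCLUDED_NAME_PATTERNS = [
    "package-lock.json",
    "*.lock",        # e.g. yarn.lock, Cargo.lock
    "*.min.js",      # minified assets
    "*.min.css",
    "*_pb2.py",      # one example of generated code
]
EXCLUDED_DIRS = {"dist", "build", "node_modules"}


def should_send(path: str) -> bool:
    """Return True if a changed file may be included in the outgoing diff."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    # Skip anything that lives under an excluded directory.
    if any(part in EXCLUDED_DIRS for part in parts[:-1]):
        return False
    # Skip files whose name matches an excluded pattern.
    name = parts[-1]
    return not any(fnmatch.fnmatch(name, pat) for pat in EXCLUDED_NAME_PATTERNS)


def filter_changed_files(paths):
    """Keep only the paths that pass the exclusion rules, preserving order."""
    return [p for p in paths if should_send(p)]
```

When evaluating a vendor, the useful question is whether a list like this exists, whether you can see it, and whether you can extend it.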

Secret scanning before transmission

The practical risk with any tool that reads your code is accidental credential exposure. A developer commits an API key in an example file. The key doesn't make it to your remote (git history is checked), but it's in the working diff when the CI job runs. Does the tool catch it before sending it upstream?

DocDr scans the diff for secrets before transmission using pattern matching against common credential formats: API keys, connection strings, private keys, tokens. If anything matches, the job fails with a clear error indicating what was found and where, rather than silently sending the credential to an external service.

This is defense-in-depth. Your pre-commit hooks and gitleaks scan should catch secrets before they're committed. But a CI-level check before external API calls is an additional safeguard.
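A pre-transmission check of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions, not DocDr's scanner: the four patterns are a tiny sample of what real tools cover, and `scan_diff` and `fail_if_secrets` are hypothetical names.

```python
import re
import sys

# Illustrative patterns only; production scanners ship much larger rule sets.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub personal access token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private key block": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    "connection string with password": re.compile(
        r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@/]+@"
    ),
}


def scan_diff(diff_text):
    """Return (line_number, label) pairs for added lines that look like secrets."""
    findings = []
    for line_no, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the "+++ b/file" header lines.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, label))
    return findings


def fail_if_secrets(diff_text):
    """Abort with a non-zero exit status before anything is sent upstream."""
    findings = scan_diff(diff_text)
    if findings:
        details = "\n".join(f"  diff line {n}: {label}" for n, label in findings)
        sys.exit(f"Possible secrets found, refusing to send diff:\n{details}")
```

The design point is the ordering: the scan runs on exactly the text that would be transmitted, and a match stops the job instead of redacting and continuing.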

Where the code goes

Most AI-powered CI tools route your code to an LLM provider API (OpenAI, Anthropic, Google). The data handling depends on which provider and which tier you're on.

For DocDr: your diff and doc content are sent to Google's Gemini API. Google does not use API inputs for model training by default — you can review their data processing terms for the specifics. The backend processes your data ephemerally; DocDr does not store your source code or diffs after generating a response.

Questions to ask any vendor:

  • Which LLM provider do you use, and under what data processing agreement?
  • Is my code used for model training?
  • How long is data retained after processing?
  • Do you store code diffs or source files?
  • What happens to data in the event of a breach?

The runner environment

By default, GitHub Actions jobs run on GitHub-hosted runners, which are ephemeral VMs. Each job gets a fresh environment that is destroyed afterward, so code processed by an Action is only accessible on the runner during the job execution window. There is no persistent environment where an attacker could sit and wait for code to arrive.

Self-hosted runners change this equation. If you're using self-hosted runners, the security properties of the runner environment are your responsibility, not GitHub's. Be careful about which Actions you allow to run in privileged self-hosted environments.

Minimum permissions

GitHub Actions have a permissions model that limits what a job can do. Follow the principle of least privilege: only request the permissions you need.

For a documentation maintenance job, the minimum required permissions are:

permissions:
  contents: write      # create a branch with doc changes
  pull-requests: write # open a draft PR

The job doesn't need long-lived repository secrets, doesn't need to interact with deployments, and doesn't need admin access. If a vendor's documentation asks for more permissions than this, ask why.

Auditing what you've approved

Review your .github/workflows/ directory periodically. Check what each Action is authorized to do, what data it processes, and whether the third-party Actions you use are pinned to full-length commit SHAs rather than mutable tags. A reference like uses: some-action@v1 can change what code runs without you noticing, because the tag can be moved to a different commit. A reference pinned to a commit SHA (abbreviated here as uses: some-action@abc123) cannot.

The GitHub marketplace has thousands of Actions, and not all of them have the same security practices. For Actions that process your source code, vet them the same way you'd vet any third-party dependency that handles sensitive data.
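The pinning part of that review is easy to automate. The sketch below scans workflow files line by line and reports every uses: reference that is not pinned to a 40-character commit SHA. It is a rough, text-based check and not a YAML parser, and the names `unpinned_actions` and `audit_directory` are hypothetical.

```python
import re
from pathlib import Path

# Matches "uses: owner/repo@ref" with or without a leading "- " and quotes.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)")
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return (line_number, reference) for uses: entries not pinned to a full SHA."""
    results = []
    for line_no, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group(1)
        # Local actions and docker:// images are out of scope for this sketch.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        _, _, version = ref.partition("@")
        if not FULL_SHA_RE.match(version):
            results.append((line_no, ref))
    return results


def audit_directory(root=".github/workflows"):
    """Map each workflow file under root to its unpinned references."""
    report = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return report
    for path in sorted(root_path.iterdir()):
        if path.suffix in {".yml", ".yaml"}:
            hits = unpinned_actions(path.read_text(encoding="utf-8"))
            if hits:
                report[str(path)] = hits
    return report
```

Running `audit_directory()` from the repository root gives a per-file list to work through. A SHA pin only tells you the code cannot change underneath you; you still have to vet what that commit does.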
