# Data and model boundaries

> Understand how CloudEval imports GitHub source, stores project snapshots, uses model context, and where team-controlled deployment options are headed.

CloudEval needs source and cloud evidence to build diagrams, reports, and review comments. This page explains what enters CloudEval today, how GitHub-backed projects are synced, and which data-control options are current versus planned.

## Current trust model

The current public product is a CloudEval-hosted service. For self-serve and Pro users, CloudEval imports source snapshots and generated evidence into CloudEval-managed project storage so the product can resolve templates, build graphs, run reports, and answer project-aware questions.

| Area                       | Current behavior                                                                                                                      |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| GitHub-backed IaC projects | CloudEval reads the selected repository, branch, and source root through the CloudEval GitHub App.                                    |
| Stored project state       | CloudEval stores a project snapshot with imported files, commit provenance, resolved topology, report outputs, and evidence metadata. |
| Source of truth            | GitHub remains the editable source of truth for GitHub-linked projects. CloudEval keeps a read-only review snapshot.                  |
| AI-assisted review         | CloudEval uses scoped project/report context to generate summaries and answers.                                                       |
| Team-controlled hosting    | Planned for Teams and Enterprise, not available in the self-serve Pro product today.                                                  |

<Note>
  Teams and Enterprise data-control options are on the roadmap and not yet available. Treat the public Pro plan as CloudEval-hosted unless a contract or deployment agreement says otherwise.
</Note>

## GitHub sync data flow

GitHub repository sync is designed to keep the repository as the source of truth while giving CloudEval a bounded review snapshot.

```mermaid
sequenceDiagram
    participant User as CloudEval user
    participant CE as CloudEval app
    participant GH as GitHub App
    participant Store as Project storage
    participant Jobs as Analysis jobs

    User->>CE: Select repository, branch, source root
    CE->>GH: Mint short-lived installation token
    GH-->>CE: Repository tree and UTF-8 file contents
    CE->>Store: Store source snapshot with commit provenance
    CE->>Jobs: Resolve templates, build graph, refresh reports
    Jobs->>Store: Save topology, reports, evidence, and history
```

When CloudEval syncs a GitHub project, it:

1. Resolves the selected branch or ref to a commit SHA.
2. Reads repository files through the CloudEval GitHub App installation.
3. Preserves repository-relative paths under the selected source root.
4. Skips excluded paths such as `.git`, `.github`, `node_modules`, `.terraform`, `.env`, Terraform state files, and CloudEval-generated folders.
5. Imports a bounded snapshot for analysis.
6. Stores commit provenance on the project so reports and PR comments can point back to the reviewed source.
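The path handling in steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not CloudEval's implementation: the names `EXCLUDED_DIRS`, `EXCLUDED_NAMES`, `EXCLUDED_SUFFIXES`, `is_excluded`, and `select_paths` are hypothetical, the rule set only covers the examples listed above, and CloudEval-generated folders are left out because their names are not given here.

```python
from pathlib import PurePosixPath

# Hypothetical exclusion rules modelled on the examples in the list above.
# The real rule set and matching logic may differ.
EXCLUDED_DIRS = {".git", ".github", "node_modules", ".terraform"}
EXCLUDED_NAMES = {".env"}
EXCLUDED_SUFFIXES = (".tfstate", ".tfstate.backup")


def is_excluded(repo_path: str) -> bool:
    """Return True if a repository-relative path should be skipped."""
    path = PurePosixPath(repo_path)
    # Skip anything that lives under (or is named as) an excluded directory.
    if any(part in EXCLUDED_DIRS for part in path.parts):
        return True
    # Skip secrets files and Terraform state files by name.
    return path.name in EXCLUDED_NAMES or path.name.endswith(EXCLUDED_SUFFIXES)


def select_paths(tree: list[str], source_root: str) -> list[str]:
    """Keep paths under source_root, minus excluded ones.

    Paths are returned unchanged, so they stay repository-relative.
    An empty source_root or "." selects the whole repository.
    """
    root = PurePosixPath(source_root)
    selected = []
    for repo_path in tree:
        if root not in PurePosixPath(repo_path).parents:
            continue  # outside the selected source root
        if not is_excluded(repo_path):
            selected.append(repo_path)
    return selected
```

For example, with a source root of `infra`, a tree containing `infra/main.tf`, `infra/.terraform/x.json`, and `docs/readme.md` would keep only `infra/main.tf`.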

Current sync limits are designed to keep repository imports focused:

| Limit                        | Current value    |
| ---------------------------- | ---------------- |
| Maximum synced file size     | 1 MB per file    |
| Maximum total synced content | 20 MB per sync   |
| Maximum imported files       | 500 files        |
| File encoding                | UTF-8 text files |
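As a rough illustration of how limits like these could be applied, the sketch below plans a snapshot from candidate files. It makes several assumptions that this page does not confirm: that 1 MB means 1,048,576 bytes, that files over a limit are skipped rather than failing the whole sync, and that files are considered in sorted path order. `SnapshotPlan` and `plan_snapshot` are hypothetical names.

```python
from dataclasses import dataclass

# Limits from the table above. Binary megabytes are an assumption.
MAX_FILE_BYTES = 1 * 1024 * 1024
MAX_TOTAL_BYTES = 20 * 1024 * 1024
MAX_FILES = 500


@dataclass
class SnapshotPlan:
    accepted: list[str]       # paths that fit within every limit
    skipped: dict[str, str]   # path -> reason it was left out


def plan_snapshot(files: dict[str, bytes]) -> SnapshotPlan:
    """Decide which candidate files fit in a bounded snapshot."""
    accepted: list[str] = []
    skipped: dict[str, str] = {}
    total = 0
    for path in sorted(files):
        data = files[path]
        if len(accepted) >= MAX_FILES:
            skipped[path] = "file count limit"
        elif len(data) > MAX_FILE_BYTES:
            skipped[path] = "file too large"
        elif total + len(data) > MAX_TOTAL_BYTES:
            skipped[path] = "total size limit"
        else:
            try:
                data.decode("utf-8")  # only UTF-8 text is imported
            except UnicodeDecodeError:
                skipped[path] = "not UTF-8 text"
                continue
            accepted.append(path)
            total += len(data)
    return SnapshotPlan(accepted, skipped)
```

Recording a reason for each skipped file makes it easier to explain why a path is missing from the review snapshot.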

## What CloudEval stores for GitHub projects

CloudEval stores project metadata and a source snapshot. Together these can include:

* repository owner/name, selected branch, source root, and commit SHA
* imported source files needed for IaC analysis
* `.cloudeval/config.yaml` when present
* resolved topology and dependency context
* report outputs, issue inventory, evidence metadata, and history
* generated artifacts used by diagrams, reports, exports, and review comments

CloudEval does not store long-lived GitHub access tokens. It stores the GitHub App installation ID and mints short-lived installation tokens server-side when it needs to list repositories, fetch files, sync a commit, or post an app-authored PR comment.
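The pattern described above, where only the installation ID is persisted and tokens are minted on demand, can be sketched as follows. This is an assumption-laden illustration rather than CloudEval's code: `InstallationToken`, `InstallationTokenProvider`, and the injected `mint` callable are hypothetical. In a real service, `mint` would typically call GitHub's `POST /app/installations/{installation_id}/access_tokens` endpoint, authenticated with the app's JWT.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InstallationToken:
    value: str
    expires_at: float  # Unix timestamp


class InstallationTokenProvider:
    """Holds an installation ID and mints short-lived tokens on demand.

    The token lives only in memory. Nothing here writes it to storage.
    """

    def __init__(
        self,
        installation_id: int,
        mint: Callable[[int], InstallationToken],  # hypothetical minting hook
        clock: Callable[[], float] = time.time,
        skew_seconds: float = 60.0,
    ) -> None:
        self.installation_id = installation_id
        self._mint = mint
        self._clock = clock
        self._skew = skew_seconds
        self._cached: Optional[InstallationToken] = None

    def get(self) -> str:
        """Return a usable token, minting a new one when close to expiry."""
        now = self._clock()
        if self._cached is None or self._cached.expires_at - self._skew <= now:
            self._cached = self._mint(self.installation_id)
        return self._cached.value
```

Injecting `mint` and `clock` keeps the sketch testable without network access. The skew margin avoids handing out a token that expires mid-request.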

## What stays outside CloudEval

For GitHub-linked projects:

* GitHub remains the editable source of truth.
* CloudEval does not push source changes back to your repository as part of sync.
* Source files are read-only in the CloudEval workspace.
* Pull-request comments and CI review are optional GitHub Actions workflows, not required for repository sync.

For Azure Cloud sync:

* CloudEval reads scoped Azure management-plane metadata and exported deployment/topology evidence according to the permissions you grant.
* It does not require broad contributor permissions for the documented least-privilege flow.
* See [Azure Cloud sync permissions](/reference/azure-live-sync-permissions) for the current permission model.

## AI and model context

CloudEval reports and review comments can include AI-generated summaries. The current public product uses CloudEval-managed model routing. The context sent for AI-assisted review is scoped to the active workflow, such as project metadata, selected source snippets, graph context, report evidence, and issue summaries.
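One way to picture workflow-scoped context is an allowlist per workflow, as in the sketch below. The workflow names, the field names, and the `build_model_context` helper are all hypothetical. They illustrate the scoping idea only and do not describe CloudEval's actual context schema.

```python
# Hypothetical mapping from a workflow to the context fields it may send.
WORKFLOW_SCOPES: dict[str, set[str]] = {
    "report_summary": {"project_metadata", "report_evidence", "issue_summaries"},
    "project_chat": {"project_metadata", "source_snippets", "graph_context"},
}


def build_model_context(workflow: str, available: dict[str, object]) -> dict[str, object]:
    """Return only the fields that the given workflow is allowed to send.

    Unknown workflows get an empty context, so nothing is sent by default.
    """
    allowed = WORKFLOW_SCOPES.get(workflow, set())
    return {key: value for key, value in available.items() if key in allowed}
```

Defaulting to an empty context for unrecognized workflows is a deliberate choice in this sketch: a new workflow sends nothing until someone defines its scope.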

Use deterministic outputs when you do not want model-generated prose:

* In GitHub Actions, set `ai_summary: "false"`.
* In the CLI, use `--no-ai-summary` for `cloudeval review`.
* Export JSON or Markdown when another tool should inspect deterministic report data directly.

## Future self-hosted and gateway pattern

For Teams and Enterprise, CloudEval is planning a customer-controlled deployment pattern where the CloudEval app, source snapshots, cloud evidence, and model routing can sit inside a customer-controlled boundary. The diagrams below split that future scope into two parts: the private data plane and model routing.

Color legend:

* Blue: CloudEval software running in the customer environment
* Green: customer-owned private infrastructure, storage, or connectors
* Amber: public source or cloud-provider endpoints outside the private boundary
* Purple: customer-approved model routing and model-provider boundary

### Future private data plane

```mermaid
flowchart LR
    subgraph Public["Public or external systems"]
        GitHub["GitHub repository"]
        Azure["Azure management plane"]
        InternalRepo["Internal source control"]
    end

    subgraph Private["Customer private network"]
        User["CloudEval users"]
        App["Self-hosted CloudEval app"]
        Jobs["CloudEval analysis workers"]
        Store["Customer-managed project storage"]
        Connector["Private cloud connector"]
        Policy["Customer network and IAM policy"]
    end

    User --> App
    GitHub --> App
    InternalRepo --> App
    App --> Store
    App --> Jobs
    Jobs --> Store
    Jobs --> Connector
    Connector --> Azure
    Policy --> App
    Policy --> Connector

    classDef cloudeval fill:#dbeafe,stroke:#2563eb,color:#172554
    classDef private fill:#dcfce7,stroke:#16a34a,color:#14532d
    classDef public fill:#fef3c7,stroke:#d97706,color:#78350f

    class App,Jobs cloudeval
    class User,Store,Connector,Policy private
    class GitHub,Azure,InternalRepo public
    style Private fill:#f0fdf4,stroke:#16a34a,stroke-width:2px
    style Public fill:#fffbeb,stroke:#d97706,stroke-width:2px
```

In this planned data-plane model:

* source snapshots and generated report artifacts stay in customer-managed storage
* cloud sync runs through private-network or customer-approved connectors
* CloudEval software runs inside the customer-controlled network boundary
* public endpoints remain source systems or cloud-provider APIs, not the storage boundary for CloudEval project data

### Future model routing

```mermaid
flowchart LR
    subgraph Private["Customer private network"]
        App["Self-hosted CloudEval app"]
        Jobs["CloudEval analysis workers"]
        Context["Scoped report and source context"]
        Gateway["Custom LLM gateway"]
        Audit["Customer logging and policy controls"]
        Key["Customer-owned model key or identity"]
    end

    subgraph Provider["Approved model boundary"]
        Model["Approved model provider"]
    end

    App --> Jobs
    Jobs --> Context
    Context --> Gateway
    Gateway --> Audit
    Gateway --> Key
    Gateway --> Model
    Model --> Gateway
    Gateway --> Jobs

    classDef cloudeval fill:#dbeafe,stroke:#2563eb,color:#172554
    classDef private fill:#dcfce7,stroke:#16a34a,color:#14532d
    classDef gateway fill:#f3e8ff,stroke:#9333ea,color:#581c87

    class App,Jobs cloudeval
    class Context,Audit,Key private
    class Gateway,Model gateway
    style Private fill:#f0fdf4,stroke:#16a34a,stroke-width:2px
    style Provider fill:#faf5ff,stroke:#9333ea,stroke-width:2px
```

In this planned model-routing path:

* AI summaries and chat route through an approved gateway instead of CloudEval-managed model routing
* the customer controls which model account, key, gateway policy, and logging boundary are used
* only scoped report, graph, issue, and source context needed for the active workflow is sent through the gateway

This is future scope. The current public Pro product remains CloudEval-hosted with CloudEval-managed storage and model routing.

## Planned team and enterprise controls

CloudEval is planning deeper data-control options for Teams and Enterprise. These are not self-serve Pro capabilities today.

| Capability                                | Status  | Intended use                                                                                          |
| ----------------------------------------- | ------- | ----------------------------------------------------------------------------------------------------- |
| Self-hosted or private-network deployment | Planned | Keep source snapshots, cloud sync data, and generated evidence in customer-controlled infrastructure. |
| Customer-managed object storage           | Planned | Store imported source snapshots and report artifacts in a customer-controlled storage boundary.       |
| Custom LLM gateway                        | Planned | Route AI-assisted review through an approved enterprise model gateway.                                |
| Bring your own model key                  | Planned | Let teams use an approved model account or gateway for AI summaries and chat.                         |
| Private cloud/environment connectors      | Planned | Support stricter network and identity boundaries for cloud sync and evidence collection.              |

These roadmap items are most relevant when your organization needs stricter data residency, private networking, internal model routing, or audit controls before connecting private infrastructure code or live cloud environments.

## Related pages

* [GitHub repository sync](/reference/github-repository-sync)
* [GitHub Actions integration](/workflows/github-actions)
* [Azure Cloud sync permissions](/reference/azure-live-sync-permissions)
* [Feature matrix](/feature-availability)
* [Capabilities map](/capabilities-map)
