Published on: May 4, 2026


Atlassian will train on your data: Opt out with GitLab

Learn why Atlassian's latest move is a threat to data governance and how GitLab's approach helps ensure your customers' data stays private and protected.

On August 17, 2026, Atlassian will begin collecting customer metadata and in-app content from Jira, Confluence, and other cloud products to train its AI offerings, including Rovo and Rovo Dev. This announcement comes after GitHub recently changed its Copilot data usage policy. Taken together, these changes suggest opt-out-by-default is becoming the industry norm. GitLab takes the opposite position: no data collection, no AI training on customer data, no matter what tier you're on.

Atlassian's change is enabled by default for all cloud customers and affects roughly 300,000 organizations. For customers on the Free, Standard, and Premium tiers, metadata collection is mandatory and cannot be turned off. Only Enterprise-tier customers have the option to opt out. If your engineering, IT, and program management teams run on Atlassian, this policy change deserves a close read: those teams are the most exposed by it, and the least likely to have been consulted before it happened.

Although the underlying governance questions are the same for Atlassian's and GitHub's changes, the data at risk is different. Where GitHub's change concerned source code and developer interactions, Atlassian's reaches into project plans, internal documentation, workflow configurations, and operational metadata across Jira, Confluence, and the broader Atlassian stack. For organizations that rely on these tools as their system of record for how work gets planned and delivered, the implications run deep.

What changed and what it means for your data

Atlassian will collect two categories of information:

  • Metadata: de-identified operational signals like story points, sprint dates, and SLA values, including data from its Teamwork Graph and connected third-party apps
  • In-app content: user-generated material such as Confluence page content, Jira issue titles, descriptions, and comments

Atlassian says it will apply de-identification and aggregation before training. Collected data may be retained for up to seven years, with in-app data removed within 30 days of opt-out and models retrained within 90 days.

There are some exclusions: Customers using customer-managed encryption keys, Atlassian Government Cloud, Isolated Cloud, or those with HIPAA requirements are carved out from collection. But for the vast majority of Atlassian's cloud customer base, data collection will start unless you pay for the Enterprise tier and actively flip the switch.

This reverses Atlassian's prior stated position that customer data would not be used to train or improve AI services. Organizations that adopted Jira and Confluence to manage their most sensitive planning workflows, sprint boards, security tickets, incident postmortems, and internal documentation will soon be contributing that content to Atlassian's AI training pipeline, without ever being asked.

The governance gap in "opt-out by default"

Opt-out-by-default data collection for AI training is an emerging pattern across the software industry. It raises the same set of questions every time: How does this interact with existing data processing agreements? Does the vendor's definition of "metadata" match what your legal and security teams would consider non-sensitive data?

For many organizations, the answer to these questions is "we don't know."

When a vendor changes its data practices through a terms-of-service update, the burden falls on the customer to notice, evaluate the implications, and act within the window the vendor provides.

The mandatory nature of metadata collection on Free, Standard, and Premium tiers makes this more acute. The only exit is upgrading to Enterprise, which requires a minimum of 801 users and custom pricing that would represent a significant cost jump for teams that aren't there yet. Data protection, in other words, is now a purchasing decision.

The tiered structure also introduces a subtler problem. Metadata like story points, sprint velocity, SLA metrics, and task classifications may seem innocuous in isolation, but in aggregate they reveal project structure, team performance patterns, and delivery cadence. For organizations in competitive industries, that operational intelligence has real value, and "de-identified" does not necessarily mean "non-sensitive" once patterns are reconstructable at scale.

Why this matters more for Atlassian-stack organizations

In Atlassian-based organizations, Jira is the center of how teams plan, track, and deliver work. It's the source of truth for sprint planning, bug tracking, release management, portfolio coordination, and cross-functional project execution.

In regulated industries like financial services, the public sector, and manufacturing, Jira and Confluence together hold sensitive operational data that may be subject to compliance requirements. The risk compounds for organizations that have expanded beyond Jira into the broader Atlassian ecosystem.

When you run Jira, Confluence, Bitbucket, and Bamboo together, the surface area of data now feeding into AI training spans your project plans, internal documentation, source code metadata, and CI/CD configurations — each of which security and compliance teams would want to review before sharing with a vendor's training pipeline.

Atlassian’s Teamwork Graph connectors add another dimension for customers who have integrated third-party tools, such as Slack, Figma, Google Drive, Salesforce, and ServiceNow, into their environment. Teamwork Graph connectors index relationship and activity signals from these connected apps, which means the metadata Atlassian collects will not be limited to what lives inside Atlassian products. For security and compliance teams accustomed to evaluating data flows on a per-vendor basis, this cross-platform reach complicates the assessment considerably.

Organizations that are already navigating Atlassian's push from Data Center and Server editions to the cloud face a compounding challenge. Adding default AI data collection to that migration path raises the stakes further: The question is no longer just "do we move to Atlassian Cloud?" but "do we move to Atlassian Cloud knowing our data will feed AI training unless we're on the most expensive tier?"

What regulated industries should be evaluating now

The compliance implications vary by sector, but the obligation to reassess is consistent.

In financial services, frameworks like SR 11-7 and DORA require documented, auditable oversight of third-party technology providers, including how those providers handle data. In the public sector, NIST SP 800-53 and FISMA make controlling where sensitive data flows a foundational requirement. In healthcare, HIPAA governs how patient-adjacent data is handled by third parties.

Across the board, a material change in a vendor's data practices, such as Atlassian moving from "we don't train on your data" to "we do, by default," triggers a documentation and risk reassessment obligation.

Institutions operating under the EU AI Act face an additional dimension: opt-out framing aligns with U.S. norms, while European regulators generally expect opt-in consent for data processing of this nature.

If your model risk or vendor management team documented Atlassian's data handling controls before this announcement, the question isn't whether this change triggers a reassessment obligation. It does. The question is whether your team can take action before August 17.

What to look for in your platform vendors

CTOs and CISOs across regulated industries need to adopt AI in a way they can explain to regulators, boards, and customers. With that in mind, GitLab operates according to the following principles:

Unconditional data commitments, not tier-dependent protections. Regulated organizations need to know, with specificity, what happens to their data. A commitment that varies by plan tier, or that requires action before a deadline, introduces exactly the kind of uncontrolled variable that keeps CISOs up at night.

Transparency and auditability. Model risk management frameworks require organizations to understand the AI systems they deploy, including the training data and third parties involved. Vendors who cannot clearly answer these questions create documentation risk.

Separation between customer data and vendor AI training. When a platform vendor trains models on customer usage data, workflows and operational patterns become inputs to a system that also serves competitors. For organizations where project structure or delivery cadence represents competitive advantage, that exposure matters.

How GitLab's approach differs

GitLab doesn't train on customer data — at any tier, full stop. AI vendors powering GitLab Duo features are contractually prohibited from using customer inputs or outputs for their own purposes, a commitment GitLab CEO Bill Staples has consistently reiterated.

GitLab's AI Transparency Center documents exactly which models power which features, how data is handled, and what vendor commitments are in place. GitLab's AI Continuity Plan documents how vendor changes are managed, including any material changes to how AI vendors treat customer data. For institutions managing third-party AI risk under DORA or similar frameworks, vendor continuity and concentration are active governance concerns, and having a documented plan for both is part of what responsible AI tooling looks like.

For organizations that require AI processing to stay within their own infrastructure, GitLab Duo Agent Platform is available with GitLab Self-Managed deployments, including support for integration with self-hosted AI models. This means prompts and code never leave the customer's environment. GitLab also provides IP indemnification for Duo-generated output, with no filters required and no activation steps needed. Where your data lives remains your choice, no matter your deployment model or subscription tier.

Whether your organization stays on Atlassian or begins evaluating alternatives, the conversation about who controls your data and how it gets used should be happening now. The August 17 deadline is approaching, but you still have time to try GitLab Ultimate with Duo Agent Platform for free today.

We want to hear from you

Enjoyed reading this blog post or have questions or feedback? Share your thoughts by creating a new topic in the GitLab community forum.

