Remediation Getting Started

This content has moved and will no longer be updated. Please go to https://docs.vmware.com/en/CloudHealth-Secure-State/ for the latest version. See the latest What's New for more details about the move.

Last updated on December 15, 2021

Introduction

CloudHealth Secure State provides a unique approach to improving cloud security by automating remediation across your AWS and Azure cloud environments. Customers can not only monitor their cloud for misconfigurations in real time, but also programmatically remediate findings from the Secure State console. Secure State's remediation service is designed around its cloud permissions control policy, which enables you to manage and remediate misconfigurations while still providing Secure State with read-only access (least privilege) to your cloud accounts. This document outlines how to deploy and configure the remediation service on your cloud accounts.

You can view the latest remediation job release versions on the GitHub repository.

Note: At this time, remediation is supported only for AWS and Azure environments.

Key Concepts

There are a few key concepts that you may want to be familiar with before getting started. A brief architectural overview is provided below for your reference.

Remediation Architecture

The diagram above presents the various components in the remediation workflow that interact with your cloud resources to improve the security posture.

There are two parts to the remediation framework:

  1. The Secure State platform that acts as the control plane for any actions.

  2. The worker group that is deployed in the customer’s cloud environment and managed by the customer.

Secure State requires read-only permissions to the customer’s account, while the worker group requires limited write permissions to a scoped set of accounts that customers enable.

Remediation

A remediation is an action configured for the desired criteria on findings. It defines the remediation job to run for a set of findings. Remediation criteria, which include a provider, rule, cloud accounts, tags, and regions, work as a filter on the findings to act upon. Any findings that match a set of criteria can be remediated either manually or automatically.
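To make the idea of criteria acting as a filter concrete, here is a minimal Python sketch. The Finding and Criteria classes, their field names, and the rule ID are hypothetical and do not reflect the Secure State data model or API. The sketch also assumes that an empty region set or tag map means "no restriction".

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """Hypothetical, simplified finding (not the Secure State schema)."""
    provider: str
    rule_id: str
    cloud_account: str
    region: str
    tags: dict = field(default_factory=dict)


@dataclass
class Criteria:
    """Hypothetical remediation criteria: provider, rule, accounts, regions, tags."""
    provider: str
    rule_id: str
    cloud_accounts: set
    regions: set = field(default_factory=set)  # assumption: empty set means any region
    tags: dict = field(default_factory=dict)   # assumption: empty dict means any tags

    def matches(self, finding: Finding) -> bool:
        """Return True when the finding satisfies every configured criterion."""
        if finding.provider != self.provider or finding.rule_id != self.rule_id:
            return False
        if finding.cloud_account not in self.cloud_accounts:
            return False
        if self.regions and finding.region not in self.regions:
            return False
        # Every required tag must be present on the finding with the same value.
        return all(finding.tags.get(k) == v for k, v in self.tags.items())


def select_findings(criteria: Criteria, findings: list) -> list:
    """Filter a list of findings down to those the remediation would act on."""
    return [f for f in findings if criteria.matches(f)]
```

Only the findings returned by select_findings would be candidates for a manual or automatic remediation run in this model.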

Remediation Worker

A remediation worker, a container hosting remediation scripts, must be deployed in the customer’s environment in order to apply remediating configurations. This container is completely owned by the customer. The worker automatically registers with Secure State on activation and sends back health notifications and logs. The container image includes several out-of-the-box actions for common misconfigurations. Customers can modify any of these actions and author new ones.

Remediation Worker Group

A worker group is a set of remediation workers that all act upon the same logical group of resources. There can be multiple worker groups for the same cloud provider, and the same worker group can act on resources in multiple clouds. A typical example is to have a worker group per provider (AWS, Azure) and software environment (development, staging, production).

Remediation Job

A remediation job is a script that contains the code to fix a misconfiguration. This script is hosted on the worker and is automatically loaded into Secure State when the worker activates.

Remediation Action

A remediation action is the change (or changes) a remediation worker is configured to make to findings that match the selected criteria.

Remediation Action Criteria

Metadata (tags, regions, and so on) used to define the scope of a remediation action.

Remediation (Action) Run

In this context, a run is a single instance of a remediation action performed on a finding. Remediation run status can be Success, In Progress, or Pending.

Remediation Runs

A metric that tracks the status of individual remediation action runs.


Configure remediation framework

Before you can run remediations on findings in Secure State, you must set up the framework Secure State needs to run remediations in your cloud environment. This requires several steps in both your Secure State console and your cloud environment.

  1. Configure the correct permissions in your cloud accounts to authenticate the remediation worker to Secure State when it's deployed, and allow it to perform remediation actions.
  2. In Secure State, create a remediation worker group and associate it with the cloud accounts you want to remediate findings in.
  3. Deploy the remediation worker in your cloud environment.

The specific instructions for each of these steps differ based on your cloud provider. This section describes the end-to-end process for setting up a remediation worker in AWS and Azure.

This process only needs to be performed once for the environment you want to run remediations in. Some customers may only need to set up a single configuration for their entire organization, while others may need several to provide logical separation between different cloud providers or software environments (development, staging, production, and so on).

AWS

To correctly configure a remediation worker in AWS, you'll need to create IAM roles for the worker and each of the cloud accounts you plan to remediate in before taking any additional steps.

For a list of supported AWS remediation jobs and their minimum permissions when setting up IAM policies, refer to the Secure State Remediation GitHub repository.

Configure your account permissions

The AWS remediation framework is designed for an environment where a customer hosts the remediation worker in a central cloud account that can access and run remediation scripts on other cloud accounts. The directions are written with this model in mind, but you may have only a single cloud account to remediate. In that case, you can still follow these directions successfully as long as you keep in mind that you're deploying the worker and running the remediations on the same account.

To get started, you must create at least two IAM roles, each with a unique policy.

  • An IAM role associated with the cloud account you plan to run remediations on.
      • Attached policy: Grants the minimum permissions needed to perform the selected remediation actions on the cloud account.
  • An IAM role to serve as the instance profile for the EC2 instance the remediation worker runs on.
      • Attached policy: Allows the EC2 instance to assume the permissions set on the cloud account while performing a remediation action.

Create a cloud account IAM role

You can refer to the AWS documentation for specific instructions on creating an IAM role for your cloud account. When creating the role, you should configure an external ID to connect your cloud account to a remediation worker group later on.
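As an illustration of how an external ID fits into the role, the following Python sketch builds a standard AWS trust policy that lets principals in the worker's account assume the cloud account role only when they supply the matching external ID. The function name, the account ID, and the external ID value are placeholders. Trusting the whole worker account (the :root principal) is an assumption; you may prefer to trust only the worker's instance role.

```python
import json


def build_trust_policy(worker_account_id: str, external_id: str) -> dict:
    """Build a trust policy for the cloud account role (illustrative helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: the whole worker account is trusted.
                "Principal": {"AWS": f"arn:aws:iam::{worker_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must present this external ID when assuming the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own account ID and external ID.
    print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=4))
```

The printed JSON can be pasted into the trust relationship editor when you create the role.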

To create a policy to attach to the IAM role, refer to the minimum_policy.json files for the remediation jobs you want to run on your cloud account. This example shows a policy for remediating open PostgreSQL Server ports on security groups in your cloud account. The sample policy grants permission to remove inbound security group rules and was created with content from the associated job page.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:RevokeSecurityGroupIngress"
            ],
            "Resource": "*"
        }
    ]
}

You can add support for more remediation jobs by including their required permissions in the Action list. Observe least privilege principles by including only the permissions for the types of findings you plan to remediate.
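If you plan to enable several jobs, one way to assemble the combined policy is to merge the Action entries from each job's minimum_policy.json. The Python sketch below is one possible approach, not an official tool. It assumes each file follows the same shape as the sample above, and it keeps Resource set to "*" as the sample does. The second policy in the usage example is an illustrative stand-in for another job's file.

```python
import json


def merge_minimum_policies(policies: list) -> dict:
    """Combine the Allow actions of several policy documents into one policy."""
    actions = set()
    for policy in policies:
        for statement in policy.get("Statement", []):
            if statement.get("Effect") != "Allow":
                continue
            action = statement.get("Action", [])
            # IAM allows Action to be a single string or a list of strings.
            actions.update([action] if isinstance(action, str) else action)
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Assumption: Resource stays "*", mirroring the sample policy.
            {"Effect": "Allow", "Action": sorted(actions), "Resource": "*"}
        ],
    }


if __name__ == "__main__":
    postgres_job = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["ec2:RevokeSecurityGroupIngress"], "Resource": "*"}
        ],
    }
    # Illustrative second job; take the real actions from that job's minimum_policy.json.
    bucket_job = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:PutEncryptionConfiguration", "Resource": "*"}
        ],
    }
    print(json.dumps(merge_minimum_policies([postgres_job, bucket_job]), indent=4))
```

Duplicate actions are collapsed, so merging the same job twice leaves the policy unchanged.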

Once you've created the IAM role, copy the Role ARN and External ID for use in the next section.

AWS IAM Credentials

Repeat this process for as many cloud accounts as you plan to remediate.

Create an IAM instance profile for EC2

You can refer to the AWS documentation for specific instructions on creating an IAM instance profile for EC2.

You must create the IAM instance profile in the same cloud account you plan to host the remediation worker in (see the Deploy remediation worker section). This might be the same account you set up an IAM role for if you're remediating a single cloud account, or a separate one if you're remediating multiple cloud accounts.

The attached policy for this role uses sts:AssumeRole to allow the remediation worker to assume the cloud account role you created in the previous step. You can create the policy with this JSON:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "*"
        }
    ]
}
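The policy above lets the worker assume any role that trusts it. If you prefer a narrower scope, a possible variant lists only the cloud account role ARNs you created earlier in Resource. The following Python sketch generates that variant. The helper name and the ARNs are placeholders, and scoping the policy this way is a suggestion rather than a documented Secure State requirement.

```python
import json


def build_assume_role_policy(role_arns: list) -> dict:
    """Build an instance profile policy limited to specific role ARNs."""
    if not role_arns:
        raise ValueError("at least one role ARN is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # Only these roles can be assumed by the worker's EC2 instance.
                "Resource": sorted(set(role_arns)),
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARNs; use the Role ARNs you copied for each cloud account.
    arns = [
        "arn:aws:iam::111122223333:role/example-remediation-role",
        "arn:aws:iam::444455556666:role/example-remediation-role",
    ]
    print(json.dumps(build_assume_role_policy(arns), indent=4))
```

With this variant, remember to update the policy whenever you add a cloud account to the worker group.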

Once you've created the role and attached a policy, you can proceed to the next section.

Create the remediation worker group

Secure State uses remediation worker groups to issue commands to remediation workers after they've been deployed and configured with remediation jobs to perform. This section shows you how to create a worker group in Secure State and associate it with any cloud accounts you set up permissions for in the previous section.

  1. From your Secure State dashboard, navigate to Settings > Remediation worker groups.

  2. Enter a name for the worker group and an optional description.

Add worker group

  3. Click on Generate Deployment Info to get credentials for deploying the remediation worker. Make sure you copy and store the client secret in a safe place; for security purposes, it is displayed only once. If you lose the secret, you must delete the existing worker group and create a new one. You can also copy the code snippet here for use in the next section.

Worker credentials

  4. On the next screen, associate your cloud accounts with the worker group. Enter the Role ARN and External ID you copied in the previous section.

Associate AWS cloud account

Note: You are responsible for ensuring that the appropriate permissions are created in the cloud account for the desired remediation actions, whether native or custom. If this is not set up correctly, remediation actions will fail later and log the missing permissions. You can add or remove cloud accounts after creating the worker group (click the Options button from the worker group details page), but at least one cloud account must be associated during setup to create the worker group.

  5. Click on Save to create the worker group.

You should see the worker group displayed in your list of remediation worker groups now. Many users may find they only need a single worker group for their entire organization, but if you need logical separation between different groupings of resources (like cloud providers or software environments) then it may be useful to create multiple worker groups. A typical use case would be to create an individual worker group for development, staging, production, and so on.

Deploy the remediation worker

Now that you've configured permissions for your cloud accounts and associated them with a remediation worker group, you can create the remediation worker in your AWS environment. This section shows you how to provision an EC2 instance and deploy the Docker container that runs the worker. You must connect to the instance by SSH or other means to perform some of these steps.

  1. Provision an EC2 instance in the same cloud account you created an IAM instance profile in previously. The minimum specifications to host the remediation worker are 128 MB memory and 1/2 core CPU.

Note: Make sure you assign the instance profile you created in the first section when launching the instance, or it won't be able to access any cloud accounts for remediation.

  2. Install Docker on your EC2 instance. You can refer to the documentation at AWS or Docker for specific steps based on the image your instance is running.

  3. Connect to your EC2 instance and run the code snippet from the deployment info in the previous section to start the worker image.

docker run --name vss-remediation-{WORKER_GROUP_NAME} \
-e VSS_CLIENT_ID={ENTER CLIENT ID} \
-e VSS_CLIENT_SECRET={ENTER CLIENT SECRET} \
vmware/vss-remediation-worker:latest

Note: To ensure the worker container keeps running after a host reboot, consider configuring the Docker container with a restart policy.
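As a sketch of what a restart policy could look like, the Python helper below assembles the same docker run command with Docker's --restart flag added. The helper name is hypothetical. The unless-stopped default and the added -d flag (which runs the container in the background) are assumptions and are not part of the deployment snippet that Secure State generates.

```python
import shlex

# Restart policies accepted by "docker run --restart".
VALID_RESTART_POLICIES = {"no", "on-failure", "always", "unless-stopped"}


def build_worker_command(group_name: str, client_id: str, client_secret: str,
                         restart_policy: str = "unless-stopped") -> list:
    """Return the docker run argument list for the worker (illustrative helper)."""
    if restart_policy not in VALID_RESTART_POLICIES:
        raise ValueError(f"unsupported restart policy: {restart_policy}")
    return [
        "docker", "run",
        "-d",  # assumption: run detached so the container outlives the SSH session
        "--name", f"vss-remediation-{group_name}",
        "--restart", restart_policy,
        "-e", f"VSS_CLIENT_ID={client_id}",
        "-e", f"VSS_CLIENT_SECRET={client_secret}",
        "vmware/vss-remediation-worker:latest",
    ]


if __name__ == "__main__":
    # Placeholder values; use the credentials from Generate Deployment Info.
    command = build_worker_command("production", "EXAMPLE_ID", "EXAMPLE_SECRET")
    print(shlex.join(command))
```

The script only prints the command so that you can review it before running it on the instance.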

The worker should connect to the remediation worker group you created automatically. You can confirm by going to the details page for your worker group and selecting the Workers tab.

Now that your worker is deployed and connected to your remediation worker group, you're ready to start creating and configuring remediations on your associated cloud accounts. Refer to the Setting up remediations section for the next steps.

Troubleshooting

If the worker container exits with an error, ensure that no outbound networking rules or firewalls are blocking calls to Secure State. You may have to allow certain calls to Secure State so that the worker can communicate back to the product. You may also need to check that the worker container has the appropriate networking configuration. Run docker network ls to list the available Docker networks and their drivers, and make sure an appropriate network driver, such as bridge, is selected for the worker container. You can also pass the network to the docker run command with the argument --network=bridge.

Update the remediation worker

New remediation jobs are frequently added for additional AWS rules by the Secure State team. If you want to take advantage of new jobs, re-deploy the remediation worker to your EC2 instance (make sure you remove the old one), and then add the desired remediation jobs as described further in this guide.

Azure

To correctly configure a remediation worker in Azure, you'll need to create an Azure Active Directory application for authentication and custom IAM roles for each subscription you plan to remediate in before you take any additional steps.

For a list of supported Azure remediation jobs and their minimum permissions when setting up custom IAM roles, refer to the Secure State Remediation GitHub repository.

Create a remediation worker group

Secure State uses remediation worker groups to issue commands to remediation workers after they've been deployed and configured with remediation jobs to perform. This section shows you how to create a worker group in Secure State and associate it with the cloud accounts you want to remediate. You configure the permissions for those accounts in the next section.

  1. From your Secure State dashboard, navigate to Settings > Remediation worker groups.

  2. Enter a name for the worker group and an optional description.

Add worker group

  3. Click on Generate Deployment Info to get credentials for deploying the remediation worker (the client ID and client secret are the values you need). Make sure you copy and store the client secret in a safe place; for security purposes, it is displayed only once. If you lose the secret, you must delete the existing worker group and create a new one.

Get worker credentials

  4. On the next screen, choose the cloud accounts you want to associate with the worker group and click Next.

  5. Click on Save to create the worker group.

You should see the worker group displayed in your list of remediation worker groups now. Many users may find they only need a single worker group for their entire organization, but if you need logical separation between different groupings of resources (like cloud providers or software environments) then it may be useful to create multiple worker groups. A typical use case would be to create an individual worker group for development, staging, production, and so on.

Note: You are responsible for ensuring that the appropriate permissions are created in the cloud account for the desired remediation actions, whether native or custom. If this is not set up correctly, remediation actions will fail later and log the missing permissions. You can add or remove cloud accounts after creating the worker group (click the Options button from the worker group details page), but at least one cloud account must be associated during setup to create the worker group.

Configure account permissions

You must create a minimum of two entities:

  • An Azure Active Directory app registration to authenticate the worker.
  • An Access control (IAM) custom role to assign the permissions the worker needs to perform remediations.

Register an app in Azure Active Directory

You can refer to the Azure Documentation for instructions to create an application in Azure AD. This is similar to the process you'd follow in the Getting Started guide for onboarding Azure Cloud accounts in Secure State.

After you've created the app, note down the following credentials for later use:

  • Application (client) ID
  • Directory (tenant) ID
  • Client secret

You must create the client secret after registering the app; check the Azure documentation for specific steps.

Create a custom IAM role

To create a custom IAM role, open the subscription you plan to deploy the remediation worker in. You can refer to the Azure Documentation for specific instructions to create the role. Depending on your familiarity with Azure, the simplest way to add the permissions you want is to copy the contents of a minimum.json file from a remediation job page and add it to the JSON editor for the custom role.

{
  "properties": {
    "roleName": "remediation_min_perms",
    "description": "This role has the required permissions to run CHSS native remediation jobs.",
    "assignableScopes": [
    ],
    "permissions": [
      {
        "actions": [
          "Microsoft.Network/networkSecurityGroups/read",
          "Microsoft.Network/networkSecurityGroups/write"
        ],
        "notActions": [],
        "dataActions": [],
        "notDataActions": []
      }
    ]
  }
}

You must define the subscriptions you plan to remediate in assignableScopes. If you're managing a large number of subscriptions, you can enter the management group they belong to instead (if you have access). Azure supports only one management group per role, so you may need to create multiple roles to manage your subscriptions if they are spread across different management groups.
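To illustrate how assignableScopes might be filled in, the Python sketch below adds either subscription scopes or a single management group scope to the role definition shown above. The helper name, the subscription ID, and the management group name are placeholders. Treating subscriptions and a management group as mutually exclusive is a conservative assumption made for this sketch.

```python
import copy
import json

# Role definition copied from the example above, with an empty assignableScopes list.
ROLE_TEMPLATE = {
    "properties": {
        "roleName": "remediation_min_perms",
        "description": "This role has the required permissions to run CHSS native remediation jobs.",
        "assignableScopes": [],
        "permissions": [
            {
                "actions": [
                    "Microsoft.Network/networkSecurityGroups/read",
                    "Microsoft.Network/networkSecurityGroups/write",
                ],
                "notActions": [],
                "dataActions": [],
                "notDataActions": [],
            }
        ],
    }
}


def with_assignable_scopes(role: dict, subscription_ids=(), management_group=None) -> dict:
    """Return a copy of the role with assignableScopes populated."""
    if subscription_ids and management_group:
        # Assumption: keep the two scope types separate in this sketch.
        raise ValueError("pass subscription IDs or a management group, not both")
    if management_group:
        scopes = [f"/providers/Microsoft.Management/managementGroups/{management_group}"]
    else:
        scopes = [f"/subscriptions/{sub_id}" for sub_id in subscription_ids]
    if not scopes:
        raise ValueError("at least one scope is required")
    updated = copy.deepcopy(role)
    updated["properties"]["assignableScopes"] = scopes
    return updated


if __name__ == "__main__":
    # Placeholder subscription ID; replace with the subscriptions you plan to remediate.
    role = with_assignable_scopes(ROLE_TEMPLATE, ["00000000-0000-0000-0000-000000000000"])
    print(json.dumps(role, indent=2))
```

The printed JSON can be pasted into the JSON editor for the custom role.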

You can add support for more remediation jobs by including their required permissions in the actions list. Observe least privilege principles by including only the permissions for the types of findings you plan to remediate.

Assign permissions at the subscription scope

Once the application and custom role are defined, you must assign the role to the application at the subscription level.

For example, using an application named "Secure State Remediation", the role assignment would look like this:

Azure role assignment

You can review the Azure Documentation for specific steps to create a role assignment. You must perform this action in every subscription you want to perform remediation on.

Note: If your custom role isn't appearing in the role assignment drop-down, verify you've added the subscription to assignableScopes in the previous step.

Deploy the remediation worker

Now that you've configured permissions for your cloud accounts and associated them with a remediation worker group, you can create the remediation worker in your Azure environment. This section shows you how to provision a virtual machine and deploy the Docker container that runs the worker. You must connect to the virtual machine by SSH or other means to perform some of these steps.

  1. Provision a virtual machine on the subscription you created the IAM custom role in. The minimum specifications to host the remediation worker are 128 MB memory and 1/2 core CPU.

  2. Install Docker on your virtual machine. You can refer to the Docker documentation for specific steps based on the image your VM is running.

  3. Connect to your VM and run this docker command with the credentials from the remediation worker group and the Azure AD application from the previous steps.

docker run --name vss-{WORKER_GROUP_NAME} \
-e VSS_CLIENT_ID={ENTER CLIENT ID} \
-e VSS_CLIENT_SECRET={ENTER CLIENT SECRET} \
-e AZURE_CLIENT_ID={ENTER AZURE APP CLIENT ID} \
-e AZURE_CLIENT_SECRET={ENTER AZURE CLIENT SECRET} \
-e AZURE_TENANT_ID={ENTER AZURE TENANT ID} \
vmware/vss-remediation-worker:latest

Note: To ensure the worker container keeps running after a host reboot, consider configuring the Docker container with a restart policy.

The worker should connect to the remediation worker group you created automatically. You can confirm by going to the details page for your worker group and selecting the Workers tab.

Now that your worker is deployed and connected to your remediation worker group, you're ready to start creating and configuring remediations on your associated cloud accounts. Refer to the Setting up remediations section for the next steps.

Troubleshooting

If the worker container exits with an error, ensure that no outbound networking rules or firewalls are blocking calls to Secure State. You may have to allow certain calls to Secure State so that the worker can communicate back to the product. You may also need to check that the worker container has the appropriate networking configuration. Run docker network ls to list the available Docker networks and their drivers, and make sure an appropriate network driver, such as bridge, is selected for the worker container. You can also pass the network to the docker run command with the argument --network=bridge.

If a remediation job is failing without providing any logs, then it's likely a problem with worker configuration. To troubleshoot this, navigate to Settings > Remediation Worker Group, select your worker group, and see the output under the Logs tab. For example, you might see the following error appearing in the worker group logs, but not remediation job logs:

NoCredentialProviders: no valid providers in chain

This suggests that the IAM instance profile may not be configured correctly. Verify that you created the instance profile according to the directions, and that the worker is listed as a trusted entity for the account you're trying to remediate.

For example, if the remediation worker is running on account 342687210996, this account must be listed as a trusted entity for any other IAM role you remediate with. Double-check your policies and ensure the ARNs are entered correctly and contain no spaces.
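A quick way to double-check a trust relationship is to inspect the role's trust policy JSON. The Python sketch below is a hypothetical helper, not a Secure State or AWS tool. It reports whether an account appears as a trusted AWS principal for sts:AssumeRole. A principal ARN with a stray leading or trailing space does not match, which surfaces the spacing problem mentioned above.

```python
def trusts_account(trust_policy: dict, account_id: str) -> bool:
    """Return True if the trust policy allows the account to call sts:AssumeRole."""
    for statement in trust_policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # for example "*", which this sketch does not treat as a match
        aws_principals = principal.get("AWS", [])
        if isinstance(aws_principals, str):
            aws_principals = [aws_principals]
        for entry in aws_principals:
            # Accept a bare account ID or any IAM ARN that belongs to the account.
            if entry == account_id or entry.startswith(f"arn:aws:iam::{account_id}:"):
                return True
    return False


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "arn:aws:iam::342687210996:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    print(trusts_account(policy, "342687210996"))
```

You can obtain the trust policy JSON from the Trust relationships tab of the role in the IAM console.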

Update the remediation worker

New remediation jobs are frequently added for additional Azure rules by the Secure State team. If you want to take advantage of new jobs, re-deploy the remediation worker to your virtual machine (make sure you remove the old one), and then add the desired remediation jobs as described further in this guide.


Create remediation

A remediation must be configured to link remediating actions to findings. As part of a remediation, a group of findings based on a selection of criteria such as rule, cloud accounts, tags, region, and so on are wired up with a remediation job, the script that will apply the secure configuration.

Follow these steps to create a remediation:

  1. From your Secure State dashboard, navigate to Actions > Remediations and select Add New.

  2. Enter a name and description for the remediation. Select the provider and worker group to target, then click Next.

  3. Select the security rule and the cloud accounts for which you’d like to remediate findings. The criteria can be further scoped to findings with certain cloud tags or in specific regions. Once you're done, click Next.

Create Remediation Wizard

  4. Select the job to run on the selected findings criteria. You can review the code for each available job. A few jobs for common misconfigurations such as open SSH ports or unencrypted buckets are provided out-of-the-box in the worker image. All jobs are defined on the worker image and can be modified as needed. Click Next when you're ready to continue.

Create Remediation – Select Job

  5. Review the summary of the cloud accounts and findings that match the selection criteria, and choose whether you’d like to publish the remediation at this time. Once you publish, the remediation will become available in the list of findings as an action. The full list of matching findings will also be available in the details page once the creation flow completes. Click Save when done.

Create Remediation – Review Step

You should now see the remediation listed in Actions > Remediations. You can view and modify any of the configurations for a remediation after creation.


Run remediation on a finding

Once the worker group has been set up and the remediation has been created, you are ready to begin running remediations to remove misconfigurations from your cloud environment. Secure State supports a manual “Click to Fix” option and an automated “Auto-remediation” option for remediation runs.

Click to fix remediations

This section describes how you can utilize the “Click to Fix” workflow. Begin by clicking into the remediation you created. Review the properties of the remediation and switch over to the “Findings” tab. You will see all the matching findings that can be remediated using the configured job. They are displayed in the “Available” tab.

Remediation – Click to Fix

Select the Findings you’d like to remediate and select the “REMEDIATE” option right above the list to confirm. This will signal the deployed worker to execute the selected job to remove misconfigurations.

Note: Please carefully review the remediation job configured before running remediation. This action will change the specified configurations of the targeted resources.

You can now see the selected Findings in the “Submitted” tab. This tab displays all the Findings that you have submitted for remediation. The remediation status column describes the state of each remediation run for a finding: In Progress, Success, or Failure. Wait a few minutes for the remediation status to change to “Success.” The finding status column shows whether a finding is still detected in your environment. The finding status will change to “Resolved” once the applied configuration is detected by Secure State.

Remediation - Submitted Status

You can review the logs to track the progress of your remediation run. All worker logs are sent back to Secure State and are available in the “Logs” tab for the selected remediation.

Remediation Logs

If the remediation run for a finding fails, you can view the logs to debug the issue. Once you’ve fixed the issue causing the error, you can retry remediation on the finding by selecting “Remediate Again”.

Remediation Retry

Note: It may take several minutes for the finding to change to the “Resolved” status after the remediation has been successfully completed and the misconfiguration has been removed. Finding re-evaluations are based on change events (like AWS CloudWatch) that may be sent by the cloud provider several minutes after a change has been made. Wait for a few minutes and review the logs before re-running a remediation.

Once the remediation is published, this click to remediate capability is also available as an option on the list of findings and the finding details page.

Auto-remediations

To proactively address any new findings that match certain criteria, you can enable the auto-remediation capability for a remediation. This deploys the remediation as a guardrail so that any future misconfigurations are remediated in real-time as they’re detected by Secure State.

A remediation must be published before auto-remediation can be enabled. As new findings are detected, the configured remediation job is run on the worker group. Findings that have been remediated using this automatic process are also displayed in the submitted tab with the remediation statuses and execution logs.

To enable automatic remediation for any new findings, go to the “Properties” page of the remediation and “Turn On” auto-remediation.

Remediation - Auto-remediation

Note: Auto-remediation acts only on new findings that match the criteria. Existing findings must be manually selected and run for remediation.

Monitoring Remediation Actions and Worker Groups

You can see a snapshot of your remediation activity and run status on your dashboard.

Dashboard Remediation Monitoring

The Remediation widget in the main dashboard provides a quick summary of the worker groups you have categorized by status, and lists the total number of remediation actions created with unpublished or auto-remediation settings. The Remediation runs widget displays a simple graph to visualize the total success and failure rates of remediation runs alongside their percentage change over the past seven days.

Click into the Remediation runs chart to view the list of Remediation actions, then apply the Run status filter to review the Remediation action runs with Success, Failure, or In Progress statuses. The Remediation runs column in the Remediations page identifies the counts of the runs filtered by the selected status, allowing you to quickly narrow down to successful or failed actions.

Note: The Remediation runs graph may take a few minutes to update after a worker group or action is deleted.


Conclusion

Remediation enables customers to improve cloud security by automating actions across their AWS and Azure cloud environments, while maintaining complete control over the permissions boundary. Misconfigurations can be remediated manually after review or automatically by deploying remediation actions as security guardrails. Secure State not only detects misconfigurations in customers’ cloud environments in real time, but also removes them automatically before they can pose a risk to the organization.

If you would like to learn more about creating custom remediation jobs for your unique cloud environment, check out the Remediation Custom Jobs guide.