
Nomad on Azure: Revisiting Consul

Nomad On Azure - This article is part of a series.
Part 7: This Article

It has been some time since we talked about Consul. We introduced Consul to the Nomad setup back in part 2 to simplify forming the Nomad cluster.

Nomad on Azure: The one where we introduce Consul

Consul is an important part of a successful Nomad experience, beyond forming the initial Nomad cluster. In this part we will revisit Consul with the primary goal of strengthening the security of the Consul cluster.

Nomad has built-in support for service discovery. However, Consul has additional features (e.g. service mesh) which make it a great match with Nomad.

Apart from working on security concerns we will also add a DNS name for the Consul cluster similar to what we did for Nomad. Finally, we will make sure that Consul responds to DNS queries for the .consul domain in our virtual network. This is to make sure we can use the Consul service discovery feature even outside of the Nomad cluster.

As always, the source code for this part can be found in the accompanying GitHub repository:

mattias-fjellstrom/nomad-on-azure

Accompanying git repository for my blog series on “Nomad on Azure”


Strengthening the Consul cluster

There are two primary security features we would like to configure for our Consul cluster:

  • Enable mTLS (and add the DNS name consul.hashicorp.mattiasfjellstrom.com while we are at it)
  • Enable gossip encryption

These are similar to what we did for the Nomad cluster.

Note that Consul also has the concept of access-control lists (ACLs) exactly like Nomad. This feature will be enabled in a later blog post - there will be enough complexity with mTLS and gossip encryption!

Enable mTLS and add a DNS name

We need to generate the server certificates for our Consul cluster. This can be done using your established PKI solution (e.g. HashiCorp Vault), or you can use the Consul CLI.

Since we have not introduced any specific PKI solution (yet) in this series we will stick to using the Consul CLI.

There is a feature (called auto-encrypt) in Consul that allows the Consul client certificates to be generated on the fly and kept in memory. We will use this feature to simplify the setup a bit. We must still create the CA certificate and the server certificate (and a CLI certificate that we can use to connect to the cluster from our own laptop):

$ domain=consul.hashicorp.mattiasfjellstrom.com # use your own
$ consul tls ca create \
    -name-constraint="true" \
    -additional-name-constraint "$domain"
$ consul tls cert create \
    -server \
    -dc dc1 \
    -additional-dnsname "$domain"
$ consul tls cert create \
    -cli \
    -dc dc1 \
    -additional-dnsname "$domain"

The Consul documentation (and general best practices) recommends that you create individual server certificates for each server in the cluster. You could do that by simply repeating the consul tls cert create -server ... command above the desired number of times.
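
For a three-server cluster that could look something like the sketch below. Each invocation writes a new numbered certificate pair (dc1-server-consul-0.pem, dc1-server-consul-1.pem, and so on) that you would distribute to one server each, and you could add per-server DNS names with extra -additional-dnsname flags:

$ # one invocation per server; the CLI numbers the generated files
$ consul tls cert create -server -dc dc1 -additional-dnsname "$domain"   # dc1-server-consul-0.pem
$ consul tls cert create -server -dc dc1 -additional-dnsname "$domain"   # dc1-server-consul-1.pem
$ consul tls cert create -server -dc dc1 -additional-dnsname "$domain"   # dc1-server-consul-2.pem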

However … This would complicate the setup a bit, and also turn the Consul servers into pets instead of cattle. If we had a PKI solution set up we could automate the process, but for now we will share the same server certificate across all servers.

Add the server certificates in the cloudinit configuration:

locals {
  cloudinit_files = {
    write_files = [
      {
        path    = "/etc/consul.d/tls/consul-agent-ca.pem"
        content = file("../tls/consul-agent-ca.pem")
      },
      {
        path    = "/etc/consul.d/tls/dc1-server-consul-0.pem"
        content = file("../tls/dc1-server-consul-0.pem")
      },
      {
        path    = "/etc/consul.d/tls/dc1-server-consul-0-key.pem"
        content = file("../tls/dc1-server-consul-0-key.pem")
      },
      # ...
    ]
  }
}

All the certificates are placed in the /etc/consul.d/tls directory.

Update the server agent configuration file to use these certificates:

tls {
  defaults {
    verify_incoming = true
    verify_outgoing = true
    ca_file         = "/etc/consul.d/tls/consul-agent-ca.pem"
    cert_file       = "/etc/consul.d/tls/dc1-server-consul-0.pem"
    key_file        = "/etc/consul.d/tls/dc1-server-consul-0-key.pem"
  }
}

To enable the auto-encrypt feature to generate Consul client certificates on the fly, add the following stanza to the server configuration file:

auto_encrypt {
  allow_tls = true
}

The Consul client agents need to be configured with the same CA certificate, so add the certificate to the clients as well and update the configuration file for each client:

tls {
  defaults {
    ca_file = "/etc/consul.d/tls/consul-agent-ca.pem"
  }
}

auto_encrypt {
  tls = true
}

Consul uses different ports for encrypted and non-encrypted traffic. The encrypted traffic ports are not enabled by default. You can configure which ports should be in use in the ports stanza of the configuration. For the server agents the ports are configured like this:

ports {
  grpc     = -1
  grpc_tls = 8503
  http     = -1
  https    = 8501
  dns      = 8600
}

Setting a port to -1 disables it. The important takeaway from this is that HTTPS traffic is served on port 8501. This is the port that the Consul CLI and the Consul UI are available on.
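
As an illustration, once the DNS record described further below is in place you could connect from your laptop with the CLI certificate generated earlier, something like this (assuming the certificate files are in the current working directory; dc1-cli-consul-0.pem and dc1-cli-consul-0-key.pem are the default output names of consul tls cert create -cli):

$ export CONSUL_HTTP_ADDR="https://consul.hashicorp.mattiasfjellstrom.com:8501"
$ export CONSUL_CACERT="consul-agent-ca.pem"
$ export CONSUL_CLIENT_CERT="dc1-cli-consul-0.pem"
$ export CONSUL_CLIENT_KEY="dc1-cli-consul-0-key.pem"
$ consul members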

Currently, Nomad still expects to communicate with Consul on port 8500. We must update the Nomad agent configuration files to accommodate the changes we have introduced in Consul.

Update the consul stanza for all the Nomad servers and clients:

consul {
  ssl     = true
  address = "127.0.0.1:8501"
  ca_file = "/etc/consul.d/tls/consul-agent-ca.pem"
}

In the previous blog post we added a load balancer for the Consul cluster. We must update the health probe of this load balancer to use port 8501 instead of 8500. Then we add a DNS A-record consul.hashicorp.mattiasfjellstrom.com that points to the load balancer public IP. The code is similar to what we saw for Nomad in a previous post so we will not cover the details here.
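
For reference, a sketch of what those two changes could look like (the resource names, DNS zone, and public IP references are assumptions for illustration, not necessarily what the repository uses):

resource "azurerm_lb_probe" "consul" {
  name            = "consul-https"
  loadbalancer_id = azurerm_lb.public.id # assumed name of the public Consul load balancer
  protocol        = "Tcp"
  port            = 8501 # previously 8500
}

resource "azurerm_dns_a_record" "consul" {
  name                = "consul"
  zone_name           = "hashicorp.mattiasfjellstrom.com" # assumed existing DNS zone
  resource_group_name = azurerm_resource_group.dns.name   # assumed resource group
  ttl                 = 300
  records = [
    azurerm_public_ip.consul.ip_address, # assumed public IP of the load balancer
  ]
}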


There are more (but minor) configuration changes for both Consul clients and servers that have been left out of this walkthrough. See the accompanying source code repository for the full details. Note that your desired configuration might not match this demo code, and some of this configuration might be revised in a future blog post.

Enable gossip encryption

Consul servers and clients all participate in the gossip pool. This is different from Nomad where only the servers participate.

This means that all Consul servers and clients must have the same gossip encryption key.

We can create the encryption key either using Terraform like we did for Nomad, or using the Consul CLI. Since the Consul architecture is distributed across different Terraform configurations, using the Consul CLI is the simplest solution for now. If we were using HashiCorp Vault we could have placed (or generated) the gossip encryption key in Vault and fetched it from there during provisioning.

Generate the gossip encryption key and store it in a local file:

$ consul keygen > gossip.key

This file will be read in all the Consul agent deployments (servers and clients).

Update all the Consul agent configuration files by adding the encrypt argument with the contents of the gossip.key file (note the file is in a directory named gossip):

encrypt = "${trimspace(file("../gossip/gossip.key"))}"

# the following configuration is not strictly needed but can be
# useful when introducing gossip encryption in an existing cluster
encrypt_verify_incoming = true
encrypt_verify_outgoing = true

That finalizes the gossip encryption part.

Configure Azure Private DNS Resolver

We want to use Consul for DNS queries. When something running in our Azure environment tries to reach my.service.consul we want the Consul servers to respond with the IP address (and, for SRV lookups, the port) where this application can be reached. To achieve this we need to conditionally forward *.consul queries to the Consul servers.

We can do this by provisioning an Azure private DNS resolver in our virtual network. This is a shared resource, so it makes sense to place it in the shared platform part of our infrastructure.

The private DNS resolver requires dedicated (delegated) subnets in the virtual network. It needs a subnet for inbound endpoints, which receive DNS queries forwarded from outside the virtual network:

resource "azurerm_subnet" "inbound" {
  name                 = "snet-inbound"
  virtual_network_name = azurerm_virtual_network.default.name
  resource_group_name  = azurerm_resource_group.default.name
  address_prefixes = [
    cidrsubnet(var.vnet_cidr_range, 12, 0),
  ]

  delegation {
    name = "Microsoft.Network.dnsResolvers"
    service_delegation {
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
      name    = "Microsoft.Network/dnsResolvers"
    }
  }
}

Similarly, it needs a subnet for outbound endpoints, through which queries for the .consul domain will be forwarded to the Consul servers:

resource "azurerm_subnet" "outbound" {
  name                 = "snet-outbound"
  virtual_network_name = azurerm_virtual_network.default.name
  resource_group_name  = azurerm_resource_group.default.name
  address_prefixes = [
    cidrsubnet(var.vnet_cidr_range, 12, 1),
  ]

  delegation {
    name = "Microsoft.Network.dnsResolvers"
    service_delegation {
      actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
      name    = "Microsoft.Network/dnsResolvers"
    }
  }
}

Both of these subnets are /28 subnets.
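
Assuming the virtual network uses a /16 range (for example 10.0.0.0/16), the 12 additional bits given to cidrsubnet produce /28 prefixes. You can verify the math in terraform console:

$ terraform console
> cidrsubnet("10.0.0.0/16", 12, 0)
"10.0.0.0/28"
> cidrsubnet("10.0.0.0/16", 12, 1)
"10.0.0.16/28"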

The private DNS resolver resource is connected to the main shared platform virtual network:

resource "azurerm_private_dns_resolver" "default" {
  name                = "default"
  resource_group_name = azurerm_resource_group.default.name
  location            = azurerm_resource_group.default.location
  virtual_network_id  = azurerm_virtual_network.default.id
}

The inbound and outbound endpoints are established in the corresponding subnets we configured above:

resource "azurerm_private_dns_resolver_inbound_endpoint" "default" {
  name                    = "default"
  location                = azurerm_resource_group.default.location
  private_dns_resolver_id = azurerm_private_dns_resolver.default.id

  ip_configurations {
    private_ip_allocation_method = "Dynamic"
    subnet_id                    = azurerm_subnet.inbound.id
  }
}

resource "azurerm_private_dns_resolver_outbound_endpoint" "default" {
  name                    = "default"
  location                = azurerm_resource_group.default.location
  subnet_id               = azurerm_subnet.outbound.id
  private_dns_resolver_id = azurerm_private_dns_resolver.default.id
}

For the inbound endpoint we specify that a dynamic private IP address should be allocated.

The final piece to add in the shared platform infrastructure is a DNS forwarding ruleset (we will add rules to it later), which we also link to the virtual network:

resource "azurerm_private_dns_resolver_dns_forwarding_ruleset" "default" {
  name                = "default"
  resource_group_name = azurerm_resource_group.default.name
  location            = azurerm_resource_group.default.location
  private_dns_resolver_outbound_endpoint_ids = [
    azurerm_private_dns_resolver_outbound_endpoint.default.id
  ]
}

resource "azurerm_private_dns_resolver_virtual_network_link" "default" {
  name                      = "default"
  virtual_network_id        = azurerm_virtual_network.default.id
  dns_forwarding_ruleset_id = azurerm_private_dns_resolver_dns_forwarding_ruleset.default.id
}

On the Consul side we need to add a rule to the DNS forwarding ruleset that sends DNS queries destined for *.consul to the Consul servers. Unfortunately, a rule in a DNS forwarding ruleset does not support using custom ports. The Consul DNS service listens on port 8600, not on port 53. We could change the port to 53, but this requires root privileges for the Consul service, which is not ideal (it currently runs as a dedicated consul user).

Instead we can add an internal load balancer with the Consul servers as a backend pool. Then we add a load balancer rule that listens on port 53 (UDP) and forwards the traffic to port 8600 on the Consul servers.

Previously we created a public load balancer for the Consul servers. The main difference with a private load balancer is that you do not assign it a public IP; instead, you place its frontend in one of your subnets:

resource "azurerm_lb" "private" {
  name                = "lb-private-consul-servers"
  resource_group_name = azurerm_resource_group.default.name
  location            = azurerm_resource_group.default.location

  frontend_ip_configuration {
    name      = "private"
    subnet_id = azurerm_subnet.default.id
  }
}

The relevant load balancer rule forwarding traffic (UDP traffic to be specific) from the load balancer on port 53 to the Consul servers on port 8600 is configured like this:

resource "azurerm_lb_rule" "private_dns" {
  name                           = "private-dns"
  loadbalancer_id                = azurerm_lb.private.id
  protocol                       = "Udp"
  frontend_port                  = 53
  backend_port                   = 8600
  frontend_ip_configuration_name = azurerm_lb.private.frontend_ip_configuration[0].name
  probe_id                       = azurerm_lb_probe.private.id

  backend_address_pool_ids = [
    azurerm_lb_backend_address_pool.private.id,
  ]
}
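
The rule references a backend address pool and a health probe that are not shown here. A minimal sketch of those two resources (the names and the probe port are assumptions; check the repository for the actual configuration):

resource "azurerm_lb_backend_address_pool" "private" {
  name            = "consul-servers"
  loadbalancer_id = azurerm_lb.private.id
}

resource "azurerm_lb_probe" "private" {
  name            = "consul-dns"
  loadbalancer_id = azurerm_lb.private.id
  protocol        = "Tcp"
  port            = 8600 # Consul serves DNS on 8600 over TCP as well, and Azure probes cannot use UDP
}

The Consul server network interfaces (or scale set) must also be associated with the backend pool as part of the Consul server deployment.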

With the load balancer in place we can add a rule to the DNS forwarding ruleset:

resource "azurerm_private_dns_resolver_forwarding_rule" "consul" {
  name                      = "consul-dns"
  dns_forwarding_ruleset_id = "<id>"
  domain_name               = "consul."

  target_dns_servers {
    ip_address = azurerm_lb.private.frontend_ip_configuration[0].private_ip_address
    port       = 53
  }
}

Verification

Go through terraform init, terraform plan, and terraform apply to provision the Nomad cluster, Consul cluster, and the shared platform virtual network and private DNS resolver.
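
Since the infrastructure is split across several Terraform configurations, repeat the workflow in each of them, starting with the shared platform (the directory names here are placeholders):

$ cd platform && terraform init && terraform plan && terraform apply
$ cd ../consul && terraform init && terraform plan && terraform apply
$ cd ../nomad && terraform init && terraform plan && terraform apply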

Connect to one of the Consul servers (I’m using the accompanying justfile, see the previous blog post for details):

$ just connect-to-consul-servers

Verify that mTLS and gossip encryption are enabled by viewing the startup logs:

$ journalctl -u consul
Jul 13 05:39:55 consul-servers01VVZR systemd[1]: Started consul.service - "HashiCorp Consul - A service mesh solution".
Jul 13 05:39:55 consul-servers01VVZR consul[2136]: ==> Starting Consul agent...
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:                Version: '1.21.2'
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:             Build Date: '2025-06-18 08:16:39 +0000 UTC'
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:                Node ID: '81a1d3f3-91bd-6a1e-09cc-2d0f9eac47cc'
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:              Node name: 'consul-servers01VVZR'
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:             Datacenter: 'dc1' (Segment: '<all>')
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:                 Server: true (Bootstrap: false)
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:            Client Addr: [10.0.10.6] (HTTP: -1, HTTPS: 8501, gRPC: -1, gRPC-TL>
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:           Cluster Addr: 10.0.10.6 (LAN: 8301, WAN: 8302)
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:      Gossip Encryption: true
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:       Auto-Encrypt-TLS: true
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:            ACL Enabled: false
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:      Reporting Enabled: false
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:     ACL Default Policy: allow
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:              HTTPS TLS: Verify Incoming: true, Verify Outgoing: true, Min Ver>
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:               gRPC TLS: Verify Incoming: true, Min Version: TLSv1_2
Jul 13 05:39:55 consul-servers01VVZR consul[2136]:       Internal RPC TLS: Verify Incoming: true, Verify Outgoing: true (Verify >
...

We can see that the correct configuration has been read.

Next, connect to one of the Nomad servers:

$ just connect-to-nomad-servers

Verify that Consul responds to DNS queries for the .consul domain:

$ dig nomad-client.service.consul

# truncated ...

;; ANSWER SECTION:
nomad-client.service.consul. 0	IN	A	10.0.30.4
nomad-client.service.consul. 0	IN	A	10.0.30.5
nomad-client.service.consul. 0	IN	A	10.0.30.6

It seems like it is working!

Summary of Part 7

This part was devoted to Consul. We have a more secure Consul cluster running with mTLS and gossip encryption enabled. There is still the issue of enabling ACLs, but we’ll postpone that for a while.

We have added an Azure private DNS resolver to our architecture. When someone in the virtual network asks for the address to my.service.consul (or any *.consul address) our Consul cluster will respond. This is service discovery via DNS.

In the next part we will revise the infrastructure yet again, this time introducing HashiCorp Boundary to set up secure access to our Nomad cluster and the Consul cluster. Both clusters are currently exposed to the internet, which is less than ideal. It’s time to fix that.

Stay tuned!

Mattias Fjellström
Cloud architect · Author · HashiCorp Ambassador · Microsoft MVP