Validator Configuration as Code with Ansible

Distributed Validator Network

Managing validators and nodes in blockchain networks can be challenging, especially when maintaining multiple environments. As operators, we deploy to several distinct vendors and environments to increase the redundancy and availability of our validators. This also means managing multiple environments, vendors, and software versions. To avoid manual errors and ensure a consistent setup across all nodes, we decided to use Ansible to manage our validator infrastructure.

Key Benefits: Automation, Consistency, Versioning, Disaster Recovery, and Reduced Operational Risk

The Challenge

Managing configuration drift and ensuring consistency across multiple validator nodes manually is time-consuming and error-prone, especially in geographically distributed setups with frequently changing versions and nodes.

Before implementing our solution, each blockchain software update required 2-6 hours of manual work across all nodes, with a high risk of human error.
After implementing it, a validator software update takes 15 minutes, with minimal risk of human error.

The Solution

Leveraging Ansible, we developed reusable playbooks and roles to automate the configuration of:

  • System configuration (OS hardening, package installation)
  • Security configuration (SSH keys, firewall rules, certificates)
  • Validator configuration (binary installation)
  • Monitoring configuration (Prometheus, Grafana, Alertmanager, Node Exporter, Filebeat)

Architecture Overview

For this example we have a distributed validator network with 5 nodes in 2 different networks.

We can split the required functionality into separate playbooks and apply them to the corresponding groups.

Service              Software
Blockchain Service   Cosmos SDK based blockchain daemon
Signing Service      Horcrux remote signing service
Monitoring Services  Node Exporter
Monitoring Stack     Grafana, Prometheus, Alertmanager
Firewall             iptables
VPN-Network          WireGuard
Base packages        Fail2Ban, Unattended Upgrades, Chrony, SSH hardening
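
One way to wire this split together is a top-level playbook that imports one playbook per concern, where each imported playbook selects its target group through its own `hosts:` line. The sketch below is an assumption about the layout, not our exact repository: apart from `deploy-juno.yml`, every file name is hypothetical.

```yaml
# site.yml (hypothetical top-level playbook)
# Each imported playbook targets its own group via `hosts:`.
- name: Base packages and hardening
  ansible.builtin.import_playbook: base.yml          # hypothetical, hosts: all

- name: Firewall and VPN
  ansible.builtin.import_playbook: network.yml       # hypothetical, hosts: all

- name: Blockchain daemon
  ansible.builtin.import_playbook: deploy-juno.yml   # hosts: juno-1

- name: Monitoring stack
  ansible.builtin.import_playbook: monitoring.yml    # hypothetical, hosts: monitoring
```

Keeping one playbook per concern means a single service can be redeployed on its own, while the top-level file documents the full order for a fresh node.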

Defining Inventory, Groups and Variables

Defining Inventory

The inventory file defines all hosts and their groups:

inventory.yml
all:
  hosts:
    node-A:
    node-B:
    node-C:
    node-D:
    node-E:
  children:
    juno-1:
      hosts:
        node-A:
        node-B:
        node-C:
    monitoring:
      hosts:
        node-D:
        node-E:

Groups and Host Variables

Group-specific variables are defined in the group_vars directory:

group_vars/juno-1.yml
config_chain_id: "juno-1"
chain_network: mainnet
config_subnet: ##.##.##.0/24
config_persistent_peers: [####:26656,####:26656,####:26656]
config_unconditional_peer_ids: [####,####,####]
config_private_peer_ids: [####,####,####]
config_priv_validator_laddr: tcp://0.0.0.0:1234

Host-specific variables are defined in the host_vars directory:

host_vars/node-A.yml
# External Public IP
#ansible_host: ####::#
# Preferred Internal IP
ansible_host: ##.##.#.#
ansible_user: controller
ansible_become_pass: !vault ####
config_external_address: !vault ####
config_moniker: validator-alpha
config_pruning_interval: 13
config_max_num_outbound_peers: 100

Handling Secrets

Never store private keys or other sensitive information in plain text in your Ansible repository. Use Ansible Vault to encrypt it.

To ensure the deployment does not expose any secrets, we encrypt sensitive values such as passwords and sensitive environment variables with Ansible Vault.
Highly sensitive data such as signing keys is provisioned separately and is out of scope for the Ansible deployment.

To encrypt an arbitrary string with a strong vault password, we can use the following command:

Encrypting a string

$ ansible-vault encrypt_string -J -p
New Vault password:
Confirm New Vault password:
Variable name (enter for no name): ansible_become_pass
String to encrypt (hidden):
Encryption successful
ansible_become_pass: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          62623836616462376234386535343435306166366537663335376238656630376632313061643839
          6561336564303663613265343139306533333833643263320a373934313165353638666135626139
          30366633643634386163646162646237633734643534393437613564376631376236653539616232
          6637323337396631630a373535666535363633386139643532363165313364353635613434336564
          35346566633736323139323236363762373238383532363430333561306362653236

We can then copy the encrypted string into the host's variables file:

host_vars/node-A.yml
# External Public IP
#ansible_host: ####::#
# Preferred Internal IP
ansible_host: ##.##.#.#
ansible_user: controller
ansible_become_pass: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          62623836616462376234386535343435306166366537663335376238656630376632313061643839
          6561336564303663613265343139306533333833643263320a373934313165353638666135626139
          30366633643634386163646162646237633734643534393437613564376631376236653539616232
          6637323337396631630a373535666535363633386139643532363165313364353635613434336564
          35346566633736323139323236363762373238383532363430333561306362653236
config_external_address: !vault ####
config_moniker: validator-alpha
config_pruning_interval: 13
config_max_num_outbound_peers: 100

This encryption approach should be applied to all sensitive variables in your configuration. For enhanced security, we strongly recommend excluding the host_vars directory from version control systems and managing it as protected confidential information within your organization's secure secrets management workflow.
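
Vault protects values at rest, but a decrypted value can still appear in task output, for example when a task fails or when running with increased verbosity. A minimal sketch of a task that consumes a vaulted variable and suppresses its output with `no_log`; the variable `config_api_token` and the destination path are hypothetical and only illustrate the pattern.

```yaml
- name: write service environment file without logging the secret
  ansible.builtin.copy:
    content: "API_TOKEN={{ config_api_token }}\n"   # hypothetical vaulted variable
    dest: /etc/default/junod                        # hypothetical path
    owner: root
    group: root
    mode: '0600'
  no_log: true   # hide module arguments and results in Ansible's output
```

The trade-off is that a failing task with `no_log` shows no details, so it is best applied only to tasks that actually handle secrets.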

Putting it all together with templates

Templates generate the configuration files with the correct values for each host.

templates/juno/config.toml.j2
##
# Static config keys omitted.
# Default config.toml with variable overrides.
##
priv_validator_laddr = "{{ config_priv_validator_laddr | default('') }}"
seeds = "{{ config_seeds | default('######') }}"
external_address = "{{ config_external_address | default('') }}"
# Peer lists are YAML lists in group_vars; join them into the comma-separated strings config.toml expects.
persistent_peers = "{{ config_persistent_peers | default([]) | join(',') }}"
addr_book_strict = {{ config_addr_book_strict | default('true') }}
max_num_inbound_peers = {{ config_max_num_inbound_peers | default('40') }}
max_num_outbound_peers = {{ config_max_num_outbound_peers | default('40') }}
unconditional_peer_ids = "{{ config_unconditional_peer_ids | default([]) | join(',') }}"
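
Because every lookup in the template falls back to a default, a missing or misspelled variable renders silently instead of failing. One possible safeguard is a pre-flight check that runs before the template tasks. The sketch below uses `ansible.builtin.assert` with variables from the group and host files shown earlier; which variables count as mandatory is an assumption.

```yaml
- name: verify required variables before rendering config.toml
  ansible.builtin.assert:
    that:
      - config_chain_id is defined
      - config_moniker is defined and config_moniker | length > 0
      - config_persistent_peers is defined
    fail_msg: "Required variables are missing for {{ inventory_hostname }}"
    quiet: true
```

A failed assertion stops the play for that host before any file is written, so a half-configured node never gets restarted.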

Running the Playbook

This section demonstrates a typical playbook structure for deploying blockchain nodes. Key features include:

  • User and directory setup
  • System optimization (swap management)
  • Secure binary deployment
  • Configuration templating
  • Service management

This playbook handles the deployment of the blockchain daemon:

deploy-juno.yml
- name: Deploy Juno blockchain daemon
  hosts: juno-1
  become: true
  vars:
    description: "Juno chain service"
    home_path: /home/chainuser/.juno
    chain_user: chainuser
    chain_version: v21.0.1
  tasks:
    - name: ensure chainuser user exists
      user:
        name: chainuser
        shell: /bin/bash
        password: '!'
        state: present
    - name: Create chains directory
      file:
        path: /home/chainuser/chains
        owner: chainuser
        group: chainuser
        state: directory
    - name: ensure filehandle limits
      pam_limits:
        domain: chainuser
        limit_type: soft
        limit_item: nofile
        value: unlimited
    - name: Disable swap for current session
      command: swapoff -a
    - name: Disable swap permanently, persist reboots
      replace:
        path: /etc/fstab
        regexp: '^(\s*)([^#\n]+\s+)(\w+\s+)swap(\s+.*)$'
        replace: '#\1\2\3swap\4'
        backup: yes
    - name: make sure junod is offline
      systemd:
        name: junod
        state: stopped
      ignore_errors: yes
    - name: install junod
      copy:
        src: ./binaries/junod/{{ chain_version }}
        dest: /home/chainuser/chains/junod/
        owner: chainuser
        group: chainuser
        mode: '0740'
    - name: create junod service
      vars:
        exec_path: /home/chainuser/chains/junod/{{ chain_version }}/junod
      template:
        src: chain.service.j2
        dest: /etc/systemd/system/junod.service
        owner: root
        group: root
        mode: '0644'
      notify: reload systemd
    - name: ensure .juno exists
      file:
        path: /home/chainuser/.juno
        owner: chainuser
        group: chainuser
        mode: '0700'
        state: directory
    - name: ensure .juno/config exists
      file:
        path: /home/chainuser/.juno/config
        owner: chainuser
        group: chainuser
        mode: '0700'
        state: directory
    - name: copy juno config.toml
      template:
        src: juno/{{ chain_network }}/config.toml.j2
        dest: /home/chainuser/.juno/config/config.toml
        owner: chainuser
        group: chainuser
        mode: '0640'
    - name: copy juno app.toml
      template:
        src: juno/{{ chain_network }}/app.toml.j2
        dest: /home/chainuser/.juno/config/app.toml
        owner: chainuser
        group: chainuser
        mode: '0640'
    - name: copy juno client.toml
      template:
        src: juno/{{ chain_network }}/client.toml.j2
        dest: /home/chainuser/.juno/config/client.toml
        owner: chainuser
        group: chainuser
        mode: '0640'
    - name: copy juno genesis
      copy:
        src: ./configurations/juno/genesis/{{ chain_network }}.json
        dest: /home/chainuser/.juno/config/genesis.json
        owner: chainuser
        group: chainuser
        mode: '0640'
  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

Running the playbook

$ ansible-playbook deploy-juno.yml --ask-vault-pass
Vault password:

TASK deploy-juno : ensure chainuser user exists ****************************************
changed: node-A

TASK deploy-juno : Create chains directory *********************************************
changed: node-A

TASK deploy-juno : ensure filehandle limits ********************************************
changed: node-A

TASK deploy-juno : Disable swap for current session ************************************
changed: node-A

TASK deploy-juno : make sure junod is offline ******************************************
ok: node-A

TASK deploy-juno : install junod *******************************************************
changed: node-A

TASK deploy-juno : create junod service ************************************************
changed: node-A

TASK deploy-juno : ensure .juno exists *************************************************
changed: node-A

TASK deploy-juno : copy juno config.toml ***********************************************
changed: node-A

TASK deploy-juno : copy juno app.toml **************************************************
changed: node-A

TASK deploy-juno : copy juno client.toml ***********************************************
changed: node-A

TASK deploy-juno : copy juno genesis ***************************************************
changed: node-A

TASK deploy-juno : reload systemd ******************************************************
changed: node-A

TASK deploy-juno : start junod *********************************************************
changed: node-A

TASK deploy-juno : wait for junod to be ready ******************************************
changed: node-A
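
The run above lists `start junod` and `wait for junod to be ready`, which are not part of the playbook listing. A minimal sketch of what such closing tasks might look like, assuming the node serves its RPC endpoint on the CometBFT default address 127.0.0.1:26657 and that pending handlers should run before the service starts. The `enabled` flag and the timing values are placeholders.

```yaml
- name: run pending handlers so systemd sees the new unit file
  ansible.builtin.meta: flush_handlers

- name: start junod
  ansible.builtin.systemd:
    name: junod
    state: started
    enabled: true        # assumption: the service should also start on boot

- name: wait for junod to be ready
  ansible.builtin.wait_for:
    host: 127.0.0.1
    port: 26657          # assumed RPC port (CometBFT default)
    delay: 5             # seconds to wait before the first check
    timeout: 120         # seconds until the task fails
```

An open port only shows that the process is listening. A stricter readiness check could query the node's status endpoint and compare block heights, which is beyond this sketch.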

Results and Metrics

With playbooks in place for all required services, maintaining them takes far less time and effort, and changes are easier to roll out safely.
This becomes especially important when operating on multiple networks, each of which may need additional configuration for testnets.
Comparing the metrics before and after the rollout, we observed the following improvements:

Metric                Before          After
Deployment Time       4-6 hours       20 minutes
Configuration Errors  2-3 per month   None in 6 months
Recovery Time         4+ hours        30 minutes
Manual Steps          40+             4-5

Lessons Learned

  1. Start Simple: Begin with basic playbooks and iterate
  2. Repeatability: Make sure playbooks can be run many times safely
  3. Testing: Use staging environments before production
  4. Documentation: Maintain detailed documentation alongside code
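
As an illustration of the repeatability point: a bare `command` task reports `changed` on every run, which hides real changes in the output. A minimal sketch of a repeat-safe variant of the swap task from the playbook, assuming fact gathering is enabled so that the `swaptotal_mb` fact is available:

```yaml
- name: Disable swap for current session
  ansible.builtin.command: swapoff -a
  when: ansible_facts['swaptotal_mb'] | int > 0   # skip when no swap is active
  changed_when: true                              # swap was on, so this run changed state
```

On a second run the task is skipped, so a clean run with no `changed` entries becomes a useful signal that the node already matches the desired state.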

Conclusion

Our Ansible implementation transformed validator management from an error-prone, time-consuming process to a streamlined operation. The combination of:

  • Version-controlled infrastructure
  • Automated consistency checks
  • Encrypted secret management
  • Reusable network templates

has created a foundation for scalable validator operations.

The automated approach has allowed us to drastically reduce the time and effort required to maintain all services while being much more flexible and error resilient.
