Validator Configuration as Code with Ansible
Managing validators and nodes in blockchain networks can be challenging, especially when maintaining multiple environments. As an operator, we deploy to multiple distinct vendors and environments to increase the redundancy and availability of our validators. This also means we have to manage multiple environments, vendors, and software versions. To avoid manual errors and ensure a consistent setup across all nodes, we decided to use Ansible to manage our validator infrastructure.
Key Benefits: Automation, Consistency, Versioning, Disaster Recovery, and Reduced Operational Risk
Index
- The Challenge
- The Solution
- Inventory, Groups and Variables
- Handling Secrets
- Running the Playbook
- Results and Metrics
- Lessons Learned
- Conclusion
The Challenge
Managing configuration drift and ensuring consistency across multiple validator nodes manually is time-consuming and error-prone, especially in geographically distributed setups with frequently changing versions and nodes.
Before implementing our solution, each blockchain software update required 2-6 hours of manual work across all nodes, with high risk of human error.
After implementing our solution, the time to update the validator software was reduced to 15 minutes with minimal risk of human error.
The Solution
Leveraging Ansible, we developed reusable playbooks and roles to automate the configuration of:
- System configuration (OS hardening, package installation)
- Security configuration (SSH keys, firewall rules, certificates)
- Validator configuration (binary installation)
- Monitoring configuration (Prometheus, Grafana, Alertmanager, Node Exporter, Filebeat)
Architecture Overview
For this example we have a distributed validator network with 5 nodes in 2 different networks.
We can split the required functionality into separate playbooks and apply them to the corresponding groups.
Service | Software |
---|---|
Blockchain Service | Cosmos SDK based blockchain daemon. |
Signing Service | Horcrux remote signing service. |
Monitoring Services | Node Exporter |
Monitoring Stack | Grafana, Prometheus, Alertmanager |
Firewall | iptables |
VPN-Network | WireGuard |
Base packages | Fail2Ban, Unattended Upgrades, Chrony, SSH hardening |
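One way to wire this split together is a top-level playbook that imports one playbook per concern. This is only a sketch: the file names below (base.yml, wireguard.yml, and so on) are hypothetical and chosen for illustration; only deploy-juno.yml appears later in this article. Which hosts each playbook targets is decided by the hosts: line inside the imported playbook.

```yaml
# site.yml (hypothetical top-level playbook; all file names except
# deploy-juno.yml are illustrative assumptions)
- name: Base packages and SSH hardening
  ansible.builtin.import_playbook: base.yml

- name: WireGuard VPN network
  ansible.builtin.import_playbook: wireguard.yml

- name: Firewall rules
  ansible.builtin.import_playbook: firewall.yml

- name: Blockchain daemon
  ansible.builtin.import_playbook: deploy-juno.yml

- name: Monitoring stack and exporters
  ansible.builtin.import_playbook: monitoring.yml
```

Keeping each concern in its own file means a single service can still be deployed on its own, while the top-level playbook documents the intended order.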
Defining Inventory, Groups and Variables
Defining Inventory
The inventory file defines all hosts and their groups:
all:
  hosts:
    node-A:
    node-B:
    node-C:
    node-D:
    node-E:
  children:
    juno-1:
      hosts:
        node-A:
        node-B:
        node-C:
    monitoring:
      hosts:
        node-D:
        node-E:
Groups and Host Variables
Group-specific variables are defined in the group_vars directory:
config_chain_id: "juno-1"
chain_network: mainnet
config_subnet: ##.##.##.0/24
config_persistent_peers: [####:26656,####:26656,####:26656]
config_unconditional_peer_ids: [####,####,####]
config_private_peer_ids: [####,####,####]
config_priv_validator_laddr: tcp://0.0.0.0:1234
Host-specific variables are defined in the host_vars directory:
# External Public IP
#ansible_host: ####::#
# Preferred Internal IP
ansible_host: ##.##.#.#
ansible_user: controller
ansible_become_pass: !vault ####
config_external_address: !vault ####
config_moniker: validator-alpha
config_pruning_interval: 13
config_max_num_outbound_peers: 100
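Because a missing or misspelled variable often only shows up when a template renders, it can help to check the expected variables up front. The following is a minimal sketch, not part of our playbooks, that uses the built-in assert module with variable names from the examples above; the play name and messages are illustrative.

```yaml
# Illustrative pre-flight play; not taken from our repository.
- name: Verify required variables before deploying
  hosts: juno-1
  gather_facts: false
  tasks:
    - name: Check that group and host variables are defined
      ansible.builtin.assert:
        that:
          - config_chain_id is defined
          - chain_network is defined
          - config_moniker is defined
          - config_external_address is defined
        fail_msg: "A required variable is missing for {{ inventory_hostname }}"
        success_msg: "All required variables are defined"
```

Running a check like this first makes a failed deployment stop before any service is touched.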
Handling Secrets
Never store private keys or other sensitive information in plain text in your Ansible repository.
To ensure the deployment does not expose any secrets, we use Ansible Vault to encrypt sensitive values such as passwords and sensitive environment variables.
Highly sensitive data such as signing keys is provisioned separately and is out of scope of the Ansible deployment.
To encrypt an arbitrary string with a strong password, we can use the following command:
$ ansible-vault encrypt_string -J -p
New Vault password:
Confirm New Vault password:
Variable name (enter for no name): ansible_become_pass
String to encrypt (hidden):
Encryption successful
ansible_become_pass: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          62623836616462376234386535343435306166366537663335376238656630376632313061643839
          6561336564303663613265343139306533333833643263320a373934313165353638666135626139
          30366633643634386163646162646237633734643534393437613564376631376236653539616232
          6637323337396631630a373535666535363633386139643532363165313364353635613434336564
          35346566633736323139323236363762373238383532363430333561306362653236
We can copy the encrypted string into the host's variables file:
# External Public IP
#ansible_host: ####::#
# Preferred Internal IP
ansible_host: ##.##.#.#
ansible_user: controller
ansible_become_pass: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          62623836616462376234386535343435306166366537663335376238656630376632313061643839
          6561336564303663613265343139306533333833643263320a373934313165353638666135626139
          30366633643634386163646162646237633734643534393437613564376631376236653539616232
          6637323337396631630a373535666535363633386139643532363165313364353635613434336564
          35346566633736323139323236363762373238383532363430333561306362653236
config_external_address: !vault ####
config_moniker: validator-alpha
config_pruning_interval: 13
config_max_num_outbound_peers: 100
This encryption approach should be applied to all sensitive variables in your configuration. For enhanced security, we strongly recommend excluding the host_vars directory from version control systems and managing it as protected confidential information within your organization's secure secrets management workflow.
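Vault protects values at rest, but a decrypted value can still show up in task output, for example in verbose mode. A common safeguard is no_log: true on tasks that handle secrets. The sketch below is a hypothetical task; the template name and destination path are assumptions for illustration only.

```yaml
# Hypothetical task; file names and paths are illustrative assumptions.
- name: Write a file containing vaulted values
  ansible.builtin.template:
    src: secrets.env.j2               # assumed template that references vaulted variables
    dest: /etc/example/secrets.env    # assumed destination path
    owner: root
    group: root
    mode: '0600'
  no_log: true                        # keep the rendered secret out of logs and console output
```

The trade-off is that failures in such a task are harder to debug, so it is usually applied only to the few tasks that really touch secrets.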
Putting it all together with templates
Templates are used to generate the configuration files with the correct values for the corresponding host.
##
# Static config keys omitted.
# Default config.toml with variable overrides.
##
priv_validator_laddr = "{{ config_priv_validator_laddr | default('') }}"
seeds = "{{ config_seeds | default('######') }}"
external_address = "{{ config_external_address | default('') }}"
persistent_peers = "{{ config_persistent_peers | default([]) | join(',') }}"
addr_book_strict = {{ config_addr_book_strict | default('true') }}
max_num_inbound_peers = {{ config_max_num_inbound_peers | default('40') }}
max_num_outbound_peers = {{ config_max_num_outbound_peers | default('40') }}
unconditional_peer_ids = "{{ config_unconditional_peer_ids | default([]) | join(',') }}"
Running the Playbook
This section demonstrates a typical playbook structure for deploying blockchain nodes. Key features include:
- User and directory setup
- System optimization (swap management)
- Secure binary deployment
- Configuration templating
- Service management
This playbook handles the deployment of the blockchain daemon:
- name: Deploy Juno blockchain daemon
  hosts: juno-1
  become: true
  vars:
    description: "Juno chain service"
    home_path: /home/chainuser/.juno
    chain_user: chainuser
    chain_version: v21.0.1
  tasks:
    # User and directory setup
    - name: ensure chainuser user exists
      user:
        name: chainuser
        shell: /bin/bash
        password: '!'   # locked password, no password login
        state: present
    - name: Create chains directory
      file:
        path: /home/chainuser/chains
        owner: chainuser
        group: chainuser
        state: directory
    # System optimization
    - name: ensure filehandle limits
      pam_limits:
        domain: chainuser
        limit_type: soft
        limit_item: nofile
        value: unlimited
    - name: Disable swap for current session
      command: swapoff -a
    - name: Disable swap permanently, persist reboots
      replace:
        path: /etc/fstab
        regexp: '^(\s*)([^#\n]+\s+)(\w+\s+)swap(\s+.*)$'
        replace: '#\1\2\3swap\4'
        backup: yes
    # Binary deployment
    - name: make sure junod is offline
      systemd:
        name: junod
        state: stopped
      ignore_errors: yes   # the unit does not exist yet on a first run
    - name: install junod
      copy:
        src: ./binaries/junod/{{ chain_version }}
        dest: /home/chainuser/chains/junod/
        owner: chainuser
        group: chainuser
        mode: '0740'
    - name: create junod service
      vars:
        exec_path: /home/chainuser/chains/junod/{{ chain_version }}/junod
      template:
        src: chain.service.j2
        dest: /etc/systemd/system/junod.service
        owner: root
        group: root
        mode: '0644'
      notify: reload systemd
    # Configuration templating
    - name: ensure .juno exists
      file:
        path: /home/chainuser/.juno
        owner: chainuser
        group: chainuser
        mode: '0700'
        state: directory
    - name: ensure .juno/config exists
      file:
        path: /home/chainuser/.juno/config
        owner: chainuser
        group: chainuser
        mode: '0700'
        state: directory
    - name: copy juno config.toml
      template:
        src: juno/{{ chain_network }}/config.toml.j2
        dest: /home/chainuser/.juno/config/config.toml
        owner: chainuser
        group: chainuser
        mode: '0640'
    - name: copy juno app.toml
      template:
        src: juno/{{ chain_network }}/app.toml.j2
        dest: /home/chainuser/.juno/config/app.toml
        owner: chainuser
        group: chainuser
        mode: '0640'
    - name: copy juno client.toml
      template:
        src: juno/{{ chain_network }}/client.toml.j2
        dest: /home/chainuser/.juno/config/client.toml
        owner: chainuser
        group: chainuser
        mode: '0640'
    - name: copy juno genesis
      copy:
        src: ./configurations/juno/genesis/{{ chain_network }}.json
        dest: /home/chainuser/.juno/config/genesis.json
        owner: chainuser
        group: chainuser
        mode: '0640'
  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes
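To run the playbook, pass --ask-vault-pass so that Ansible prompts for the vault password and can decrypt the vaulted variables:
$ ansible-playbook deploy-juno.yml --ask-vault-pass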
Vault password:
TASK deploy-juno : ensure chainuser user exists **********************************************************************************
changed: node-A
TASK deploy-juno : Create chains directory ******************************************************************************************
changed: node-A
TASK deploy-juno : ensure filehandle limits ******************************************************************************************
changed: node-A
TASK deploy-juno : Disable swap for current session **********************************************************************************
changed: node-A
TASK deploy-juno : make sure junod is offline ****************************************************************************************
ok: node-A
TASK deploy-juno : install junod ***************************************************************************************************
changed: node-A
TASK deploy-juno : create junod service **********************************************************************************************
changed: node-A
TASK deploy-juno : ensure .juno exists **********************************************************************************************
changed: node-A
TASK deploy-juno : copy juno config.toml *********************************************************************************************
changed: node-A
TASK deploy-juno : copy juno app.toml **********************************************************************************************
changed: node-A
TASK deploy-juno : copy juno client.toml *********************************************************************************************
changed: node-A
TASK deploy-juno : copy juno genesis ************************************************************************************************
changed: node-A
TASK deploy-juno : reload systemd ***************************************************************************************************
changed: node-A
TASK deploy-juno : start junod ******************************************************************************************************
changed: node-A
TASK deploy-juno : wait for junod to be ready ****************************************************************************************
changed: node-A
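The run above ends with two tasks, start junod and wait for junod to be ready, that are not part of the playbook listing. A minimal sketch of what such tasks could look like follows. It is an assumption, not our actual implementation: it presumes the RPC listener uses the CometBFT default port 26657 on localhost, and the timeout is an arbitrary example value.

```yaml
# Sketch of the two closing tasks; port and timeout are assumptions.
- name: start junod
  ansible.builtin.systemd:
    name: junod
    state: started
    enabled: true
    daemon_reload: true   # pick up the unit file written earlier in the play

- name: wait for junod to be ready
  ansible.builtin.wait_for:
    host: 127.0.0.1
    port: 26657           # assumed: default CometBFT RPC port
    timeout: 300          # assumed: example value in seconds
```

Waiting on the port only shows that the process is listening; whether the node has caught up with the chain would need a separate check.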
Results and Metrics
After implementing Ansible across our validator network, we observed significant improvements in several key operational metrics.
With playbooks in place for all required services, maintaining them takes far less time and effort, and the setup is more flexible and more resilient to errors.
This becomes especially important when operating on multiple networks, each of which may have additional configurations for testnets.
Comparing the before and after metrics we can see the following improvements:
Metric | Before | After |
---|---|---|
Deployment Time | 4-6 hours | 20 minutes |
Configuration Errors | 2-3 per month | None in 6 months |
Recovery Time | 4+ hours | 30 minutes |
Manual Steps | 40+ | 4-5 |
Lessons Learned
- Start Simple: Begin with basic playbooks and iterate
- Repeatability: Make sure playbooks can be run many times safely
- Testing: Use staging environments before production
- Documentation: Maintain detailed documentation alongside code
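For the repeatability point, one concrete case from the playbook above is the swapoff task: a plain command task reports changed on every run, even when there is nothing to do. A hedged sketch of a guarded variant follows; it assumes fact gathering is enabled (the default) so that the swaptotal_mb fact is available.

```yaml
# Guarded variant of the swapoff task; relies on gathered facts.
- name: Disable swap for current session
  ansible.builtin.command: swapoff -a
  when: ansible_facts['swaptotal_mb'] | int > 0   # skip when no swap is active
```

With the guard in place, a second run on an already configured node skips the task instead of reporting a change, which makes real changes easier to spot in the output.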
Conclusion
Our Ansible implementation transformed validator management from an error-prone, time-consuming process to a streamlined operation. The combination of:
- Version-controlled infrastructure
- Automated consistency checks
- Encrypted secret management
- Reusable network templates
has created a foundation for scalable validator operations.
The automated approach has allowed us to drastically reduce the time and effort required to maintain all services while being much more flexible and error resilient.
Need help automating your validator infrastructure? Contact us for a consultation.