Learning Ansible for Network Automation


This project is mostly a learning exercise, and many parts exist so that I can reference them later. I also learn best by teaching and by repetition, so documenting my projects kills two birds with one stone.

This project treats the network as code. Device configurations, VLAN assignments, BGP policies, monitoring targets, and topology definitions are all version controlled, validated through a PR pipeline, and deployed automatically on merge.

This project is documented across 46 parts, starting from a bare control node and working through the full automation platform.


Network topology

All devices run as Containerlab nodes using vrnetlab on a dedicated Proxmox VM. The topology file is version controlled, validated by a three-stage pipeline, and deployed via a diff-aware Gitea Actions job that adds or removes only what changed.

topology


Infrastructure Stack

Each service runs as a Docker container or its own VM. Everything is self-hosted on Proxmox.

| Service | Purpose | Integration |
| --- | --- | --- |
| NetBox | Source of truth | Dynamic Ansible inventory, Zabbix and Prometheus auto-registration, Graylog IP enrichment |
| Gitea | Self-hosted Git with branch protection | GPG-signed commits enforced on main, Oxidized config backend, Gitea Actions runner |
| AWX | Automation controller (RBAC, job history, approval gates) | Webhook triggered by Graylog alerts, NetBox change events, and Gitea merges |
| Zabbix | Network monitoring via SNMP polling | Hosts auto-registered from NetBox; interface, BGP, and CPU/memory triggers |
| Prometheus + Grafana | Metrics scraping and dashboards | SNMP Exporter for devices, Node Exporter for VMs |
| Graylog + OpenSearch | Structured syslog (parsing, routing, alerting) | Pipeline rules parse IOS/NX-OS/PAN-OS formats; NetBox lookup table enriches source IPs |
| Oxidized | Config backup with Git diff history | Commits to Gitea on schedule and immediately on Graylog config change detection |
| Batfish | Pre-change network model analysis | Ingests Oxidized configs; interactive reachability queries before opening a PR |
| Ansible EDA | Event-driven automation | Real-time event processing; wired into Graylog alert stream |
| FreeRADIUS | 802.1X wired authentication | RADIUS server for campus access switch authentication |
| tac_plus | TACACS+ for AAA and command authorization | Privilege levels and command sets per device role |
| Netdisco | Layer 2/3 discovery | Reality check against NetBox intent; drift detection |

Key Capabilities

GitOps Pipeline

Pull requests trigger yamllint and ansible-lint gates, and merging is blocked if either fails. Merging to main auto-deploys via a self-hosted Gitea Actions runner. Scoped pipelines mean a campus VLAN change only triggers the campus workflow, not a full fabric run.
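
The scoping logic can be illustrated as a mapping from changed file paths to workflows. In Gitea Actions this is normally expressed with path filters in the workflow files themselves, so the Python below is only a sketch of the selection idea. The directory layout and workflow names are hypothetical and not taken from the project.

```python
from fnmatch import fnmatch

# Hypothetical path patterns per workflow; the real repository layout may differ.
WORKFLOW_SCOPES = {
    "campus": ["inventory/campus/*", "playbooks/campus_*.yml"],
    "fabric": ["inventory/fabric/*", "playbooks/fabric_*.yml"],
}


def workflows_for(changed_files):
    """Return the sorted names of workflows whose patterns match a changed file."""
    selected = set()
    for name, patterns in WORKFLOW_SCOPES.items():
        if any(fnmatch(path, pat) for path in changed_files for pat in patterns):
            selected.add(name)
    return sorted(selected)
```

With these assumed patterns, a PR that touches only `inventory/campus/vlans.yml` selects the campus workflow and leaves the fabric workflow idle.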

Self-healing workflows

Graylog detects config change syslog events and fires a webhook to AWX. An automated three-stage workflow backs up the device, collects BGP summary and interface states, and posts a full diagnosis report to Slack.
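
As a rough illustration of the detection step, the sketch below matches a Cisco IOS-style message body (`%FACILITY-SEVERITY-MNEMONIC: text`) and builds a payload for config-change events only. In the project this parsing happens in Graylog pipeline rules, not in Python. The field names `device_name`, `event_type`, and `cisco_mnemonic` follow the parsed fields the project describes; the `detail` key, the `config_change` label, and the payload shape are assumptions.

```python
import re

# Cisco IOS-style message body: %FACILITY-SEVERITY-MNEMONIC: text
CISCO_MSG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+): (?P<text>.*)"
)


def build_webhook_payload(device_name, message):
    """Return a payload dict for a config-change event, or None for other messages."""
    match = CISCO_MSG.search(message)
    if match is None or match.group("mnemonic") != "CONFIG_I":
        return None
    return {
        "device_name": device_name,
        "event_type": "config_change",  # hypothetical label
        "cisco_mnemonic": match.group("mnemonic"),
        "detail": match.group("text"),  # hypothetical extra field
    }
```

Returning `None` for everything except `CONFIG_I` keeps unrelated events, such as interface flaps, from triggering the workflow.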

Pre-change analysis

Before opening a PR, an engineer runs a Python script that loads the current Oxidized configs into Batfish and traces a traffic flow end to end. The output shows the full hop-by-hop forwarding path and which ACL line would block the traffic.
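
A script of this kind might look roughly like the following pybatfish sketch. It is not the project's actual script. It assumes a Batfish service reachable on localhost and a snapshot directory that already contains the device configs in the layout Batfish expects. The network name, snapshot name, and function name are hypothetical.

```python
def trace_flow(snapshot_dir, start_location, src_ip, dst_ip, dst_port):
    """Load a snapshot into Batfish and print every trace for one TCP flow."""
    # Imported here so the module can be loaded without pybatfish installed.
    from pybatfish.client.session import Session
    from pybatfish.datamodel.flow import HeaderConstraints

    bf = Session(host="localhost")  # assumes a Batfish service on this host
    bf.set_network("lab")  # hypothetical network name
    bf.init_snapshot(snapshot_dir, name="pre-change", overwrite=True)

    headers = HeaderConstraints(
        srcIps=src_ip, dstIps=dst_ip, dstPorts=str(dst_port), ipProtocols=["TCP"]
    )
    frame = (
        bf.q.traceroute(startLocation=start_location, headers=headers)
        .answer()
        .frame()
    )

    for trace in frame.iloc[0]["Traces"]:
        print("Disposition:", trace.disposition)
        for hop in trace.hops:
            print(" ", hop.node)
            # Steps include any filter (ACL) evaluation along the path.
            for step in hop.steps:
                print("    ", step)
```

A call such as `trace_flow("snapshots/current", "access-sw1", "10.10.10.5", "10.20.20.5", 443)` would use placeholder values; the device name and addresses here are illustrative only.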

NetBox as source of truth

No static inventory files. NetBox drives dynamic Ansible inventory, Zabbix host registration, Prometheus scrape targets, and Graylog IP enrichment. Adding a device to NetBox propagates to all monitoring tools automatically via webhook-triggered AWX jobs.
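
To show what a dynamic inventory boils down to, the sketch below turns a list of device records into the JSON layout that Ansible accepts from inventory scripts (groups with `hosts`, plus `_meta.hostvars`). It does not query NetBox, and the project may well rely on a NetBox inventory plugin instead. The record fields (`name`, `role`, `primary_ip`) are simplified, hypothetical stand-ins for what NetBox returns.

```python
from collections import defaultdict


def build_inventory(devices):
    """Group device records by role into Ansible's inventory-script JSON layout."""
    groups = defaultdict(lambda: {"hosts": []})
    hostvars = {}
    for dev in devices:
        groups[dev["role"]]["hosts"].append(dev["name"])
        hostvars[dev["name"]] = {"ansible_host": dev["primary_ip"]}
    inventory = dict(groups)
    # _meta.hostvars lets Ansible read all host variables in a single call.
    inventory["_meta"] = {"hostvars": hostvars}
    return inventory
```

Because the groups are derived from the records on every run, a device added to the source of truth appears in the inventory without anyone editing a file.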

Topology as code

The Containerlab topology file is version-controlled alongside playbooks. PRs trigger a three-stage validation: yamllint, a structural Python validator checking every node and link, then containerlab graph. Merges deploy with a diff-aware job that adds or removes only what changed.
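
A structural validator along these lines could be sketched as follows. It works on an already-parsed topology (the `topology.nodes` mapping and `topology.links` list with `endpoints` entries that Containerlab files use). The specific checks are assumptions; the rules in the project's own validator are not described here.

```python
def validate_topology(topo):
    """Return a list of error strings for a parsed Containerlab topology dict."""
    errors = []
    body = topo.get("topology", {})
    nodes = body.get("nodes", {})
    for name, attrs in nodes.items():
        # Assumed rule: every node declares its kind explicitly.
        if not (attrs or {}).get("kind"):
            errors.append(f"node {name}: missing kind")
    seen = set()
    for i, link in enumerate(body.get("links", [])):
        endpoints = link.get("endpoints", [])
        if len(endpoints) != 2:
            errors.append(f"link {i}: expected 2 endpoints")
            continue
        for ep in endpoints:
            node, _, iface = ep.partition(":")
            if node not in nodes:
                errors.append(f"link {i}: unknown node {node}")
            if not iface:
                errors.append(f"link {i}: endpoint {ep} has no interface")
            if ep in seen:
                errors.append(f"link {i}: endpoint {ep} used more than once")
            seen.add(ep)
    return errors
```

An empty list means the file passed; a CI step can print the errors and exit non-zero otherwise.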

Pipeline hardening

GPG-signed commits enforced on the main branch. detect-secrets pre-commit hooks block credentials before they reach Gitea. Ansible collections pinned with exact versions and SHA256 checksums, served from a private Gitea package registry mirror.
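
The checksum pinning amounts to comparing a downloaded artifact's SHA256 digest against a recorded value before it is used. A minimal sketch using only the standard library is shown below; the function names are hypothetical, and how the project stores and checks its pinned digests is not specified here.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """Raise ValueError when the file's digest differs from the pinned value."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
```

Failing loudly on a mismatch means a tampered or unexpectedly updated collection tarball stops the pipeline instead of being installed.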

GitOps Change Pipeline

Feature branch ➦ Pull request ➦ yamllint ➦ ansible-lint ➦ Batfish query ➦ Merge to main ➦ Gitea Actions deploy ➦ Oxidized backup

Self-healing loop

Config change on device ➦ Graylog syslog event ➦ Alert fires ➦ AWX webhook ➦ Stage 1: Backup ➦ Stage 2: Diagnose ➦ Stage 3: Slack report

Topology Change Pipeline

Edit topology-lab.yml ➦ Pull request ➦ yamllint ➦ Python validator ➦ containerlab graph ➦ Merge ➦ Diff-aware deploy ➦ Node reachability verify
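
The diff-aware step can be sketched as a set comparison between the previously deployed topology and the merged one. The sketch below works on parsed topology dicts and only reports what to add or remove; how the project's job applies those changes is not covered here, and the function name and result keys are hypothetical.

```python
def topology_diff(old, new):
    """Return which nodes and links to add or remove to turn old into new."""

    def nodes(topo):
        return set(topo.get("topology", {}).get("nodes", {}))

    def links(topo):
        # A link is identified by its unordered pair of endpoints.
        return {
            frozenset(link["endpoints"])
            for link in topo.get("topology", {}).get("links", [])
        }

    return {
        "add_nodes": sorted(nodes(new) - nodes(old)),
        "remove_nodes": sorted(nodes(old) - nodes(new)),
        "add_links": sorted(sorted(l) for l in links(new) - links(old)),
        "remove_links": sorted(sorted(l) for l in links(old) - links(new)),
    }
```

This version compares names only, so a node whose image or kind changed would not show up in the result; a fuller implementation would also compare node attributes.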

Security Practices
GPG-signed commits enforced on protected main branch
detect-secrets pre-commit hook; credentials blocked before reaching Git
Ansible collections pinned with SHA256 checksums and private mirror
Ansible Vault; separate vault IDs per trust boundary
Branch protection; merge requires lint pipeline to pass
TACACS+ with privilege levels and command authorization per role
SSH hardening, management ACLs, control plane policing on all devices
Secrets remediation process documented; git-filter-repo purge procedure

Screenshots
Grafana: Lab Operations Center

BGP session states, interface traffic, VM health, and Loki log stream in a single dashboard.

Graylog: Structured Syslog

IOS-XE config change event parsed into fields (device_name, event_type, cisco_mnemonic) and enriched from NetBox.

Gitea: PR Pipeline Passing

Yamllint and ansible-lint checks passing on a campus VLAN change PR before merge to main.

AWX: Self-healing Workflow

Three-stage remediation workflow: backup → diagnose → notify, triggered by a Graylog config-change alert.


Build Sequence
Phase 1

Foundation

Control node, Ansible fundamentals, Vault secrets management, Containerlab topology, all 7 virtual devices, base IOS-XE / NX-OS / PAN-OS automation, BGP fabric brought up end to end.

Parts 01–16
Phase 2

Infrastructure Platform

Proxmox VM provisioning, Gitea self-hosted Git, NetBox IPAM and dynamic inventory, AWX automation controller, infrastructure hardening.

Parts 17–20
Phase 3

Enterprise Network Services

Campus switching: VLANs, STP, LAG, port security, 802.1X, QoS. Routing: OSPF, BGP policy, HSRP, WAN. Firewall policies, NAT, IPsec VPN, AAA, NTP, SNMP.

Parts 21–34
Phase 4

Observability Stack

Zabbix SNMP monitoring, Prometheus + Grafana + Loki, Graylog + OpenSearch structured logging, stack integration: NetBox → Zabbix sync, unified dashboard, self-healing workflows.

Parts 35–41
Phase 5

GitOps, Hardening, and Automation

Oxidized config backup, GitOps pipeline, Batfish pre-change analysis, CI/CD hardening, topology as code, Netdisco discovery, NetBox reconciliation, AWX workflows, Ansible EDA.

Parts 42–52

Technologies

NETWORKING

Cisco IOS-XE
Cisco NX-OS
Palo Alto PAN-OS
FortiGate
BGP
OSPF
VLANs
STP / RSTP
LACP
HSRP
802.1X
ACLs
NAT
IPsec VPN
QoS
SNMP

AUTOMATION

Ansible
AWX
Ansible EDA
Jinja2
Ansible Vault
Python
pybatfish
Containerlab
vrnetlab

OBSERVABILITY

Prometheus
Grafana
Loki
Zabbix
Graylog
OpenSearch
Oxidized
Batfish
Netdisco
AlertManager

INFRASTRUCTURE

Proxmox
Gitea
Gitea Actions
Docker
NetBox
GPG
detect-secrets
FreeRADIUS
TACACS+

Last updated on • Ernesto Diaz