Weathering the storm - Managed IT Services

How to mitigate the impact of mass reboots

Planned or unplanned power events affecting your miners can wreak havoc on your network. Even locally isolated power events can have a negative impact on your seemingly healthy miners. When thousands of miners reboot or power up at the same time, the impact on L2 switch MAC address tables, ARP tables, and firewall NAT tables is significant. This is a very real operational problem in large-scale Bitcoin mining farms.

During a simultaneous reboot of thousands of devices, there is a cascade of effects at every layer of the network stack, particularly Layer 2 (MAC tables), Layer 3 (ARP tables), and firewall NAT tables. The result is long delays in contacting mining pools after a planned curtailment or unexpected power event. Sometimes the effects last longer, or seem permanent, because network devices can go into a death spiral of exchanging invalid MAC or ARP entries, and hashrate suffers as packets are dropped in the surge of broadcast traffic. Here’s how your network can be affected.

Layer 2 Switch MAC Address Tables

Effect

Each miner’s network interface sends gratuitous ARP / DHCP / broadcast traffic shortly after reboot. The switch must relearn thousands of MAC addresses almost at once. If your switch has a limited MAC address table size (e.g., 16 K or 32 K entries, typical for many enterprise L2 switches), it can overflow.

Impact

MAC flapping or aging-out churn: entries rapidly time out and relearn, consuming CPU.

Temporary flooding: until the MAC-to-port mapping is relearned, frames get flooded within the VLAN.

Increased control-plane load on stackable or fabric switches (especially when STP, LLDP, or IGMP snooping are active).

Mitigation

▪ Use access switches with sufficient CAM (MAC) table size.

▪ Segregate miners into multiple VLANs, each with a manageable number of MACs.

▪ Adjust mac-address-table aging-time so entries persist longer across short reboots.

▪ Disable unnecessary L2 protocols (LLDP, STP) on miner-facing ports.
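
As a rough illustration of the VLAN segregation point, the sketch below splits a fleet into evenly sized VLANs so that none exceeds a chosen MAC budget. The function name `plan_vlans` and the budget of 1,000 MACs per VLAN are assumptions made for this example, not vendor recommendations; pick a budget that fits your own switches.

```python
import math


def plan_vlans(miner_count: int, max_macs_per_vlan: int) -> list[int]:
    """Return the number of miners to place in each VLAN.

    Uses the smallest VLAN count that keeps every VLAN at or below
    max_macs_per_vlan, then spreads the miners as evenly as possible.
    """
    if miner_count < 0 or max_macs_per_vlan <= 0:
        raise ValueError("miner_count must be >= 0 and max_macs_per_vlan > 0")
    if miner_count == 0:
        return []
    vlan_count = math.ceil(miner_count / max_macs_per_vlan)
    base, extra = divmod(miner_count, vlan_count)
    # The first `extra` VLANs take one additional miner each.
    return [base + 1 if i < extra else base for i in range(vlan_count)]


if __name__ == "__main__":
    # Hypothetical farm: 4,500 miners, at most 1,000 MACs per VLAN.
    print(plan_vlans(4500, 1000))
```

Spreading miners evenly, rather than filling VLANs one by one, leaves the same headroom in every VLAN for management interfaces and other devices.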

Layer 3 ARP Tables on Routers / Gateways

Effect

Each miner sends a DHCP request and ARP probe for its assigned IP after boot. The gateway must resolve and cache thousands of ARP entries simultaneously.

Impact

ARP table exhaustion: many routers default to a few thousand entries; mining farms can exceed that easily.

Control-plane CPU spikes due to ARP resolution storms.

Packet loss during convergence if ARP cache limits are reached or aged entries are replaced too quickly.

Mitigation

▪ Use routers or L3 switches with large ARP table capacity (hundreds of thousands of entries).

▪ Stagger miner reboots (power-sequencing, PDUs, or firmware scripts).

▪ Increase ARP timeout values (e.g., from 4 h to 24 h) to reduce churn.

▪ Optionally use static ARP entries or DHCP bindings for critical infrastructure devices.
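
To get a feel for why a longer ARP timeout reduces churn, here is a back-of-the-envelope sketch. It assumes that, in steady state, each cached entry is re-resolved about once per timeout period. The function name and the entry count are hypothetical. Real gateways behave less smoothly: entries learned together after a mass reboot tend to expire together, so treat the result as an average rather than a peak.

```python
def arp_refreshes_per_minute(entries: int, timeout_hours: float) -> float:
    """Average ARP re-resolutions per minute, assuming one refresh
    per entry per timeout period (a simplifying assumption)."""
    if entries < 0 or timeout_hours <= 0:
        raise ValueError("entries must be >= 0 and timeout_hours > 0")
    return entries / (timeout_hours * 60)


if __name__ == "__main__":
    entries = 10_000  # hypothetical number of miners behind one gateway
    for hours in (4, 24):
        rate = arp_refreshes_per_minute(entries, hours)
        print(f"timeout {hours:>2} h: about {rate:.1f} refreshes/minute")
```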

Firewall / NAT Tables

Effect

Thousands of outbound connections (to mining pools, telemetry, update servers) are re-established almost simultaneously. Each creates a NAT translation entry (and often TCP states).

If all miners use the same public IP (PAT), the firewall tracks tens of thousands of concurrent connections.

Impact

NAT table overflow → drops or slow connections.

Session tracking CPU spikes (especially with stateful inspection).

Uneven hash utilization in multi-core firewalls (one core overloaded).

Timeout churn when many miners reconnect repeatedly (e.g., after DHCP renewals).

Mitigation

▪ Use high-capacity firewalls (millions of NAT sessions supported).

▪ Increase NAT table size or connection limits per interface or rule.

▪ Aggregate miners behind multiple source IPs (e.g., NAT pools per VLAN).

▪ Consider stateless NAT or bypass firewall inspection entirely for mining pool traffic if the security model allows.

▪ Adjust NAT timeout values to avoid churn on long-lived TCP sessions.
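
The NAT pool suggestion can be sized with simple arithmetic. The sketch below is a conservative estimate that assumes each translation consumes one source port on a public IP, with ports 1024 to 65535 usable (64,512 per IP), and that you want to stay below a chosen fraction of that space. Many firewalls key translations on the full 5-tuple and can reuse ports across destinations, so real capacity may be higher. The function name, the headroom value, and the example figures are all assumptions.

```python
import math

# Assumption: ports 1024-65535 are available for PAT on each public IP.
USABLE_PORTS_PER_IP = 65535 - 1024 + 1  # 64,512


def public_ips_needed(
    miners: int,
    conns_per_miner: int,
    headroom: float = 0.5,
    ports_per_ip: int = USABLE_PORTS_PER_IP,
) -> int:
    """Estimate how many public IPs a NAT pool needs so that peak
    concurrent connections use at most `headroom` of the port space."""
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    peak_connections = miners * conns_per_miner
    port_budget_per_ip = ports_per_ip * headroom
    return max(1, math.ceil(peak_connections / port_budget_per_ip))


if __name__ == "__main__":
    # Hypothetical farm: 20,000 miners, 4 connections each
    # (pool, failover pool, telemetry, updates).
    print(public_ips_needed(20_000, 4))
```

Sizing against half the port space leaves room for reconnect bursts, when old translations have not yet timed out and new ones are already being created.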

Conclusion

Consider power sequencing: reboot miners in waves (e.g., 100–200 devices per minute) using PDUs or management scripts. Use DHCP reservations to avoid full rediscovery each time there is a large event.
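
A minimal sketch of wave-based power sequencing follows. It only computes the schedule: which miners belong to which wave and how long after the start each wave should power on. The miner identifiers and the function name `reboot_waves` are hypothetical, and the actual power-on call is left out because it depends on your PDU or management tooling.

```python
from typing import Iterator, Sequence


def reboot_waves(
    miner_ids: Sequence[str],
    per_wave: int = 150,
    interval_s: int = 60,
) -> Iterator[tuple[int, Sequence[str]]]:
    """Yield (delay_seconds, batch) pairs, one per wave.

    delay_seconds is the offset from the start of the sequence at
    which the batch should be powered on.
    """
    if per_wave <= 0 or interval_s < 0:
        raise ValueError("per_wave must be > 0 and interval_s >= 0")
    for start in range(0, len(miner_ids), per_wave):
        wave_index = start // per_wave
        yield wave_index * interval_s, miner_ids[start:start + per_wave]


if __name__ == "__main__":
    miners = [f"miner-{n:04d}" for n in range(400)]  # hypothetical IDs
    for delay, batch in reboot_waves(miners):
        print(f"t+{delay:>3}s: power on {len(batch)} miners "
              f"({batch[0]} .. {batch[-1]})")
```

At 150 devices per minute, a 3,000-miner site comes back over about 20 minutes, which is usually a small price compared with an hour of MAC and ARP churn.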

Monitor CAM/ARP/NAT utilization proactively — thresholds can predict overload.
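
One way to act on this is a simple threshold check over the table counters you already collect. The sketch below assumes you can obtain a (used, capacity) pair for each table by whatever means your equipment offers (SNMP, API, or CLI scraping, none of which is shown here). The table names, thresholds, and sample numbers are hypothetical.

```python
def utilization_alerts(
    tables: dict[str, tuple[int, int]],
    warn: float = 0.70,
    crit: float = 0.90,
) -> dict[str, str]:
    """Map each table name to 'warning' or 'critical' when its
    used/capacity ratio crosses the given thresholds.

    Tables below the warning threshold are omitted from the result.
    """
    alerts: dict[str, str] = {}
    for name, (used, capacity) in tables.items():
        if capacity <= 0:
            raise ValueError(f"capacity for {name!r} must be > 0")
        ratio = used / capacity
        if ratio >= crit:
            alerts[name] = "critical"
        elif ratio >= warn:
            alerts[name] = "warning"
    return alerts


if __name__ == "__main__":
    # Hypothetical readings: (entries in use, table capacity).
    sample = {
        "cam": (15_000, 16_384),
        "arp": (3_000, 8_192),
        "nat": (750_000, 1_000_000),
    }
    print(utilization_alerts(sample))
```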

Consider bypassing the firewall for non-critical miner traffic: use ACLs or upstream filtering instead, and serve DNS and NTP from local services.

Let Wekos help

Wekos can make your network industrial strength, strong enough to weather the storms that may be affecting you. We have the experience and expertise to tune your network and prevent hash-stealing network misconfigurations from negatively impacting your bottom line.
