The official documentation of Proxmox describes the procedure to remove a cluster node. This article includes some additional steps.
Prerequisites #
The node needs to:
- Be empty. That is, all containers and virtual machines need to be migrated to other nodes.
- Have no replication jobs. Go to the Replication menu entry of the node and make sure it is empty (see the sketch below).
- Optionally, be offline.¹
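To verify the first two points from the command line of the node to be removed, a minimal sketch; the guest IDs 100 and 101 and the migration target are hypothetical examples, and pvesr list is simply the CLI counterpart of the Replication menu:
# both lists should be empty on the node to be removed
qm list
pct list
# if anything is left, migrate it away, e.g. to proxmox1
qm migrate 100 proxmox1 --online
pct migrate 101 proxmox1 --restart
# no replication job should involve this node
pvesr list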
Removing the node #
Let’s assume we have a cluster with five nodes, named proxmox1 to proxmox5, and we want to remove the proxmox5 node. We will need to have SSH sessions open with both the node to be deleted (i.e., proxmox5) and any other node (e.g., proxmox1).
In the proxmox1 node, use the pvecm nodes command to list the nodes and identify the one to be removed.
# pvecm nodes
Membership information
----------------------
    Nodeid      Votes Name
         1          1 proxmox1 (local)
         2          1 proxmox2
         3          1 proxmox3
         4          1 proxmox4
         5          1 proxmox5
Remove the node from the cluster by issuing the pvecm delnode proxmox5 command.
# pvecm delnode proxmox5
Killing node 5
If the node was down when you issued the command, you will receive the message Could not kill node (error = CS_ERR_NOT_EXIST). This can be safely ignored: it does not report an actual failure to delete the node, only that corosync could not kill a node that was already offline.
Check the node list again using the pvecm nodes command.
# pvecm nodes
Membership information
----------------------
    Nodeid      Votes Name
         1          1 proxmox1 (local)
         2          1 proxmox2
         3          1 proxmox3
         4          1 proxmox4
Using the SSH session in the now-removed node (e.g., proxmox5), shut it down:
shutdown -h now
The official documentation emphasises that the node, once removed and shut down, should never come back online again in the same network. To make sure of that, you can:
- Remove the server from the virtual switches.
- Use Hetzner’s rescue system to wipe the disks.
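For the disk wipe, a minimal sketch of what to run from the rescue system so that the old installation can no longer boot; the device names are hypothetical and must be verified with lsblk first:
# identify the disks of the server
lsblk
# remove all filesystem, RAID and partition-table signatures
wipefs --all /dev/nvme0n1
wipefs --all /dev/nvme1n1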
Should you need to reinstall it and reuse the same hostname or IP address, make sure you have cleaned up first, as described in the next section.
Cleaning up #
Back in the proxmox1 SSH session, remove the configuration files of the node from the cluster. Because the /etc/pve directory is a cluster-wide filesystem, the removal of the node's configuration directory and fingerprint will automatically be replicated to the rest of the nodes.
rm --recursive --force /etc/pve/nodes/proxmox5
At this point, a number of SSH-related files still have the fingerprint of the just-deleted node (e.g., proxmox5 or 10.0.0.5):
| Path to file | Linked by |
|---|---|
| /etc/pve/priv/known_hosts | /etc/ssh/ssh_known_hosts |
| /etc/pve/priv/authorized_keys | /root/.ssh/authorized_keys |
| /root/.ssh/known_hosts | (not a link) |
To clean them up, execute the following commands in the SSH session of the proxmox1 node:
sed -i.bak '/proxmox5\|\b10\.0\.0\.5\b/d' /etc/pve/priv/known_hosts
sed -i.bak '/proxmox5\|\b10\.0\.0\.5\b/d' /etc/pve/priv/authorized_keys
ssh-keygen -f "/root/.ssh/known_hosts" -R "proxmox5"
ssh-keygen -f "/root/.ssh/known_hosts" -R "10.0.0.5"
Again, changes to the files inside /etc/pve/priv/ will be replicated across the cluster. However, the /root/.ssh/known_hosts file is local to each node, so its stale entries need to be removed on every remaining node as well.
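A minimal sketch of doing that from proxmox1, assuming root SSH access between the cluster nodes (which a standard Proxmox cluster provides):
# repeat the known_hosts cleanup on the other remaining nodes
for node in proxmox2 proxmox3 proxmox4; do
    ssh "$node" 'ssh-keygen -f /root/.ssh/known_hosts -R proxmox5; ssh-keygen -f /root/.ssh/known_hosts -R 10.0.0.5'
done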
We cannot use ssh-keygen -R on the files inside /etc/pve/priv/ because the pmxcfs filesystem mounted at /etc/pve does not support creating hard links, which ssh-keygen needs in order to keep a backup copy, so the command fails with the following error:
# ssh-keygen -f "/etc/pve/priv/known_hosts" -R "10.0.0.5"
# Host 10.0.0.5 found: line 25
link /etc/pve/priv/known_hosts to /etc/pve/priv/known_hosts.old: Function not implemented
Final notes #
In Proxmox VE 8, the /etc/ssh/ssh_known_hosts link is no longer used; if it still exists, it can be safely removed using pvecm updatecerts --unmerge-known-hosts.
If you receive an SSH error after rejoining a node with the same IP address or hostname, run pvecm updatecerts once on each node to update its fingerprint cluster-wide.
If the node that was deleted had a ZFS pool, you need to edit the /etc/pve/storage.cfg file and remove the node name from the nodes key of the zfspool entry.
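For example, assuming a hypothetical zfspool entry named local-zfs, the entry in /etc/pve/storage.cfg would look like this before the cleanup:
zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        nodes proxmox1,proxmox2,proxmox3,proxmox4,proxmox5
Removing proxmox5 from the nodes line is enough; the rest of the entry stays untouched.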
¹ This is how the official documentation recommends doing it.