Taming systemd Restart Policies to Prevent Service Chaos

Introduction to systemd Restart Policies

I’ve seen systemd restart policies go wrong when not properly configured, leading to a never-ending cycle of restarts without resolving the underlying issue. To avoid this chaos, it’s essential to understand how systemd manages system services, including starting, stopping, and restarting them as needed. The key to taming these restart policies lies in understanding how systemd service files work and how to configure them effectively.

Understanding systemd Service Files

Systemd service files, typically located in /etc/systemd/system/ or /usr/lib/systemd/system/, define the behavior of a service, including how it should be started, stopped, and restarted. The [Service] section is where you specify the restart policy using the Restart directive. For example:

[Service]
Restart=always

Avoid always unless you have a good reason: it restarts the service even after a clean exit, which can produce the restart loop described above. More selective options, covered in the next section, are usually a better fit.
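
For context, here is a minimal sketch of a complete unit file showing where the [Service] section sits relative to the others. The unit name myservice, its description, and the binary path are hypothetical placeholders, not taken from any real package.

```ini
# /etc/systemd/system/myservice.service (hypothetical unit)
[Unit]
Description=Example service

[Service]
# Hypothetical binary path; replace with the real command.
ExecStart=/usr/local/bin/myservice
# The restart policy always belongs in [Service].
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Units you write yourself normally go in /etc/systemd/system/, which takes precedence over the packaged copies in /usr/lib/systemd/system/.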

Configuring Restart Policies

Systemd provides several restart policies, each with its use case:

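always: Restart the service whenever it exits, whether cleanly, with an error, by a signal, or after a timeout (but not when you stop it with systemctl stop).
on-success: Restart the service only if it exits cleanly (exit code 0, or a clean signal such as SIGTERM).
on-failure: Restart the service if it exits with a non-zero exit code, is terminated by a signal (including a core dump), or hits an operation or watchdog timeout.
on-abnormal: Restart the service if it is terminated by a signal or hits an operation or watchdog timeout, but not if it merely returns a non-zero exit code.
on-abort: Restart the service only if it is terminated by an uncaught signal that does not count as a clean exit (for example, SIGSEGV or SIGABRT).
on-watchdog: Restart the service only if its watchdog timeout (WatchdogSec) expires.
no: Never restart the service. This is the default.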

To change a restart policy, edit the service file (or create a drop-in with systemctl edit), update the Restart directive, and run systemctl daemon-reload so systemd picks up the change. For instance, to set the restart policy to on-failure, use:

[Service]
Restart=on-failure

In practice, on-failure is a good default, as it balances reliability with the need to prevent unnecessary restarts.
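
A slightly fuller sketch of an on-failure policy might add a delay between attempts and exclude exit codes where restarting cannot help. The 5-second delay and the meaning given to exit code 2 are assumptions for illustration; check what your service's exit codes actually mean before copying this.

```ini
[Service]
Restart=on-failure
# Wait 5 seconds between restart attempts (the default is 100ms).
RestartSec=5s
# Assumption: this hypothetical service exits with code 2 on a bad
# configuration file, so restarting would only fail again.
RestartPreventExitStatus=2
```

A longer RestartSec gives a failing dependency time to recover and keeps a crash loop from flooding the journal.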

Troubleshooting Restart Issues

If a service is constantly being restarted, check the following:

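System logs: Look for error messages from the service with journalctl -u myservice. On distributions that also run a syslog daemon, the same messages may appear in /var/log/syslog.
Service status: Use systemctl status myservice to see the current state, the last exit code, and the most recent log lines.
Restart counters: Use systemctl show to read the NRestarts property, like this:

systemctl show --property=NRestarts myservice

On reasonably recent systemd versions, this prints how many times systemd has automatically restarted the service, so a steadily climbing count confirms a restart loop. RestartUSec, by contrast, only reports the configured delay between restarts (RestartSec), not when the last restart happened.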
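
The checks above can be bundled into one small shell helper. This is only a sketch: restart_report is a hypothetical function name, myservice is a placeholder unit, and the NRestarts property assumes a reasonably recent systemd.

```shell
# restart_report UNIT: print restart diagnostics for one unit.
# "restart_report" is a hypothetical helper name, not a systemd command.
restart_report() {
    unit="$1"
    # Current state, last exit code, and the most recent log lines.
    systemctl status --no-pager "$unit"
    # NRestarts: automatic restarts so far.
    # Result and ExecMainStatus: why the last run ended.
    systemctl show --property=NRestarts,Result,ExecMainStatus "$unit"
    # Last 50 journal entries for this unit from the current boot.
    journalctl --unit="$unit" --boot --lines=50 --no-pager
}
```

Call it as restart_report myservice. Reading the journal of a system service may require root or membership in the systemd-journal group.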

Security Considerations

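When configuring restart policies, consider the security implications. An attacker who can crash a service on demand can push a service configured to always restart into a tight restart loop, burning CPU and flooding the logs, which amounts to a denial-of-service (DoS) attack. To mitigate this risk, use the StartLimitBurst and StartLimitIntervalSec directives to limit how often the service may be started within a given time period. On current systemd versions these belong in the [Unit] section; older releases used StartLimitInterval in [Service], which is still accepted for compatibility. For example:

[Unit]
StartLimitIntervalSec=30s
StartLimitBurst=5

[Service]
Restart=always

This allows at most 5 starts within a 30-second interval. Once the limit is exceeded, systemd stops restarting the service and marks it as failed until the interval has passed or you run systemctl reset-failed.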
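
One pitfall is that a long restart delay can keep the limit from ever being reached. As a rough rule of thumb, assuming the service fails almost immediately after each start, the burst count multiplied by RestartSec has to be smaller than the start-limit interval. The hypothetical helper below sketches that arithmetic in whole seconds; it ignores the time the service spends running before it fails.

```shell
# limit_can_trigger RESTART_SEC BURST INTERVAL_SEC (whole seconds)
# Hypothetical helper: succeeds when a service that fails immediately
# would attempt more than BURST starts inside one interval.
limit_can_trigger() {
    restart_sec="$1"; burst="$2"; interval="$3"
    # BURST restarts spaced RESTART_SEC apart must fit inside the interval.
    [ $(( restart_sec * burst )) -lt "$interval" ]
}

limit_can_trigger 1 5 30 && echo "1s x 5 < 30s: limit can trip"
limit_can_trigger 10 5 30 || echo "10s x 5 >= 30s: limit never trips"
```

If the check fails for your values, either shorten RestartSec, raise the interval, or lower the burst count.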

Best Practices

To get the most out of systemd restart policies:

Monitor service logs: Regularly check service logs to detect potential issues before they become critical.
Test restart policies: Verify your restart policies are working as expected.
Use sane defaults: Use reasonable defaults, like on-failure, to prevent unnecessary restarts.
Limit restarts: Limit the number of restarts within a given time period to prevent DoS attacks.
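
To test a restart policy, one option is to crash the service on purpose and check whether the restart counter moves. The sketch below assumes a unit with Restart=on-failure (or always), a systemd version that exposes NRestarts, and root privileges; crash_test is a hypothetical helper name. Try it on a test machine rather than in production.

```shell
# crash_test UNIT [WAIT_SECONDS]: kill the unit's processes and report
# whether systemd restarted it. "crash_test" is a hypothetical helper name.
crash_test() {
    unit="$1"; wait_s="${2:-5}"
    before=$(systemctl show --property=NRestarts --value "$unit")
    # SIGKILL counts as an unclean exit, so Restart=on-failure should react.
    systemctl kill --signal=SIGKILL "$unit"
    # Allow for RestartSec plus the service's startup time.
    sleep "$wait_s"
    after=$(systemctl show --property=NRestarts --value "$unit")
    echo "NRestarts: $before -> $after"
    # Succeed only if the counter increased.
    [ "$after" -gt "$before" ]
}
```

Call it as crash_test myservice 10 if RestartSec is set to several seconds. An unchanged counter suggests the policy does not cover signal exits, or the start limit has already been reached.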

For more information on systemd, visit the systemd.io website or check out the freedesktop.org wiki.