Cluster Maintenance Mode
Maintenance mode is a safety feature in GreptimeDB that temporarily disables automatic cluster management operations.
This mode is particularly useful during:
- Cluster deployment
- Cluster upgrades
- Planned downtime
- Any operation that might temporarily affect cluster stability
When to Use Maintenance Mode
With GreptimeDB Operator
If you are upgrading a cluster using GreptimeDB Operator, you don't need to enable the maintenance mode manually. The operator handles this automatically.
Without GreptimeDB Operator
When deploying or upgrading a cluster without GreptimeDB Operator, you must manually enable Metasrv's maintenance mode before:
- Deploying a new cluster (enable maintenance mode once the Metasrv nodes are ready)
- Rolling upgrades of Datanode nodes
- Upgrades of Metasrv nodes
- Upgrades of Frontend nodes
- Any operation that might cause temporary node unavailability
After the cluster is deployed or upgraded, you can disable maintenance mode.
Impact of Maintenance Mode
When maintenance mode is enabled:
- Auto Balancing (if enabled) will be paused
- Region Failover (if enabled) will be paused
- Manual region operations are still possible
- Read and write operations continue to work normally
- Monitoring and metrics collection continue to function
Managing Maintenance Mode
Maintenance mode can be enabled and disabled through Metasrv's HTTP interface at http://{METASRV}:{RPC_PORT}/admin/maintenance/enable and http://{METASRV}:{RPC_PORT}/admin/maintenance/disable. Note that this interface listens on Metasrv's RPC_PORT, which defaults to 3002.
Enable Maintenance Mode
After calling a maintenance mode endpoint, check that the returned HTTP status code is 200 and that the response body matches the expected output. If the request fails or the response is not what you expect, proceed with caution and do not continue with high-risk operations such as cluster upgrades.
Enable maintenance mode by sending a POST request to the /admin/maintenance/enable endpoint.
curl -X POST 'http://localhost:3002/admin/maintenance/enable'
The expected output is:
{"enabled":true}
If you encounter any issues or unexpected behavior, do not proceed with maintenance operations.
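The check described above (status code 200 plus the expected body) can be scripted. The following is a minimal sketch, not an official tool: the function names check_maintenance_response and enable_maintenance are hypothetical, and it assumes Metasrv is reachable at localhost:3002 unless another address is passed.

```shell
# Hypothetical helper: succeed only when the status code is 200 and the body
# matches the expected JSON once whitespace is removed.
check_maintenance_response() {
  local http_code=$1
  local body=$2
  local expected=$3
  local compact
  if [ "$http_code" != "200" ]; then
    echo "unexpected HTTP status: $http_code" >&2
    return 1
  fi
  # Strip whitespace so '{"enabled": true}' and '{"enabled":true}' compare equal.
  compact=$(printf '%s' "$body" | tr -d ' \t\r\n')
  if [ "$compact" != "$expected" ]; then
    echo "unexpected response body: $body" >&2
    return 1
  fi
  echo "ok"
}

# Hypothetical wrapper around the curl call shown above.
# $1 is the Metasrv address (assumed default: localhost:3002).
enable_maintenance() {
  local metasrv=${1:-localhost:3002}
  local response http_code body
  # -w appends the HTTP status code on its own line after the response body.
  response=$(curl -sS -X POST -w '\n%{http_code}' \
    "http://$metasrv/admin/maintenance/enable") || return 1
  http_code=$(printf '%s\n' "$response" | tail -n 1)
  body=$(printf '%s\n' "$response" | sed '$d')
  check_maintenance_response "$http_code" "$body" '{"enabled":true}'
}
```

Calling enable_maintenance returns a non-zero exit status when either check fails, so an upgrade script can stop before touching any node.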
Disable Maintenance Mode
Before disabling maintenance mode, you must confirm that all components have returned to normal status.
Disable maintenance mode by sending a POST request to the /admin/maintenance/disable endpoint.
Before disabling maintenance mode:
- Ensure all components are healthy and operational
- Verify that all nodes are properly joined to the cluster
curl -X POST 'http://localhost:3002/admin/maintenance/disable'
The expected output is:
{"enabled":false}
Check Maintenance Mode Status
Check maintenance mode status by sending a GET request to the /admin/maintenance/status endpoint.
curl -X GET http://localhost:3002/admin/maintenance/status
The expected output is:
{"enabled":false}
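In a script it is often easier to work with a plain true or false than with the raw JSON. The sketch below assumes the response has the {"enabled":...} shape shown above; the function names parse_maintenance_enabled and maintenance_status are hypothetical, and the default address localhost:3002 is an assumption.

```shell
# Hypothetical helper: print "true" or "false" for a body like {"enabled":false}.
parse_maintenance_enabled() {
  local compact
  # Remove whitespace so spacing differences do not affect the match.
  compact=$(printf '%s' "$1" | tr -d ' \t\r\n')
  case $compact in
    *'"enabled":true'*)  echo true ;;
    *'"enabled":false'*) echo false ;;
    *) echo "cannot parse maintenance status: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper around the status request shown above.
# $1 is the Metasrv address (assumed default: localhost:3002).
maintenance_status() {
  local metasrv=${1:-localhost:3002}
  local body
  # -f makes curl exit non-zero on HTTP status codes of 400 and above.
  body=$(curl -sS -f -X GET "http://$metasrv/admin/maintenance/status") || return 1
  parse_maintenance_enabled "$body"
}
```

For example, a script could refuse to start an upgrade unless `[ "$(maintenance_status)" = true ]` holds.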
Troubleshooting
Common Issues
- Maintenance mode cannot be enabled
  - Verify Metasrv is running and accessible
  - Check if you have the correct permissions
  - Ensure the RPC port is correct
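When a request fails, curl's exit code usually indicates which of the checks above to start with. The sketch below maps a few standard curl exit codes to hints. The function names explain_curl_exit and probe_metasrv are hypothetical, the hint wording is illustrative only, and the default address localhost:3002 is an assumption.

```shell
# Hypothetical helper: map a curl exit code to a troubleshooting hint.
explain_curl_exit() {
  case $1 in
    0)  echo "request succeeded" ;;
    6)  echo "could not resolve host: check the Metasrv address" ;;
    7)  echo "connection failed: check that Metasrv is running and the RPC port is correct" ;;
    22) echo "HTTP error returned: check permissions and the endpoint path" ;;
    28) echo "timed out: check network access to Metasrv" ;;
    *)  echo "curl failed with exit code $1" ;;
  esac
}

# Hypothetical probe: query the status endpoint and explain the outcome.
# $1 is the Metasrv address (assumed default: localhost:3002).
probe_metasrv() {
  local metasrv=${1:-localhost:3002}
  local rc=0
  # -f turns HTTP status codes of 400 and above into exit code 22.
  curl -sS -f -o /dev/null --connect-timeout 5 \
    "http://$metasrv/admin/maintenance/status" || rc=$?
  explain_curl_exit "$rc"
  return "$rc"
}
```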
Best Practices
- Always verify the maintenance mode status before and after operations
- Have a rollback plan ready
- Monitor cluster health during maintenance
- Document all changes made during maintenance
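One way to apply these practices is to wrap the maintenance command so that the mode is enabled first, verified, and disabled only after the command succeeds. This is a sketch under stated assumptions, not a recommended implementation: the names maintenance_call and with_maintenance_mode are hypothetical, METASRV defaults to localhost:3002, and leaving the mode enabled after a failure is a design choice made here so that automatic operations stay paused while someone investigates.

```shell
# Assumed default address; override by exporting METASRV before use.
METASRV=${METASRV:-localhost:3002}

# Thin wrapper around the documented endpoints; $1 is enable, disable, or status.
maintenance_call() {
  local method=POST
  [ "$1" = status ] && method=GET
  # -f makes curl exit non-zero on HTTP status codes of 400 and above.
  curl -sS -f -X "$method" "http://$METASRV/admin/maintenance/$1"
}

# Hypothetical wrapper: run "$@" with maintenance mode enabled.
with_maintenance_mode() {
  local body
  body=$(maintenance_call enable) || return 1
  case $body in
    *'"enabled":true'*) ;;
    *) echo "enable returned unexpected body: $body" >&2; return 1 ;;
  esac

  # Run the maintenance command; on failure, keep maintenance mode enabled.
  if ! "$@"; then
    echo "maintenance command failed; maintenance mode left enabled" >&2
    return 1
  fi

  body=$(maintenance_call disable) || return 1
  case $body in
    *'"enabled":false'*) echo "maintenance mode disabled" ;;
    *) echo "disable returned unexpected body: $body" >&2; return 1 ;;
  esac
}
```

A call such as `with_maintenance_mode ./upgrade-datanodes.sh` (a placeholder script name) would then run the upgrade between the two requests. The wrapper does not check component health, so that still has to be confirmed before the disable step in a real procedure.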