Imran Hussain

Random Musings...

© 2020

Elasticsearch: Shard Allocation Failed

Elasticsearch is great: it handles (data) node failures really well, mostly automagically, depending on your index settings, replication and so on.

However, sometimes after a network blip or a hardware issue you can run into problems where the cluster only sort of recovers, for example leaving some shards unassigned.

Here are some helpful-ish commands you can run:

curl -s -XGET 'localhost:9200/_cluster/health?pretty'

This tells you the state of your cluster. The status should be green and there should be no unassigned shards.
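If the health output does report unassigned shards, it can help to see exactly which ones they are. Here is a rough sketch using the `_cat/shards` API. The helper names `only_unassigned` and `list_unassigned` and the `ES_URL` variable are made up for illustration, and the host defaults to the one used in the commands in this post.

```shell
#!/bin/sh
# Hypothetical helper: keep only the rows whose state column is UNASSIGNED.
# Expects _cat/shards output on stdin with the columns
# index, shard, prirep, state, unassigned.reason (state is column 4).
only_unassigned() {
  awk '$4 == "UNASSIGNED"'
}

# Hypothetical helper: ask the cluster for its shards and filter them.
# ES_URL is an assumption; it falls back to localhost:9200.
list_unassigned() {
  curl -s "${ES_URL:-localhost:9200}/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
    | only_unassigned
}

# Usage:
#   list_unassigned
```

Each line that comes back is one unassigned shard, with the reason Elasticsearch recorded for it in the last column.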

curl -s -XGET 'localhost:9200/_cluster/allocation/explain?pretty'

This tells you why a shard allocation has failed. You need to fix the underlying issues that it highlights.
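Called without a body, the explain API reports on an unassigned shard of its own choosing. To ask about one particular shard you can send a small JSON body instead. A minimal sketch, where `explain_body`, `explain_shard`, `ES_URL` and the index name `my-index` are all placeholders rather than anything from a real cluster:

```shell
#!/bin/sh
# Hypothetical helper: build the request body for one specific shard.
# Arguments: index name, shard number, "true" for the primary or "false" for a replica.
explain_body() {
  printf '{ "index": "%s", "shard": %s, "primary": %s }' "$1" "$2" "$3"
}

# Hypothetical helper: send that body to the allocation explain API.
# ES_URL is an assumption; it falls back to localhost:9200.
explain_shard() {
  curl -s -H 'Content-Type: application/json' -XGET \
    "${ES_URL:-localhost:9200}/_cluster/allocation/explain?pretty" \
    -d "$(explain_body "$1" "$2" "$3")"
}

# Usage (my-index is a placeholder index name):
#   explain_shard my-index 0 true
```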

curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings -d '{ "transient" :{ "cluster.routing.allocation.node_concurrent_recoveries" : 16 } }'

This makes recovery faster by letting each node handle more shard recoveries at the same time (the default is 2).
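Once the cluster is green again you probably want to put this setting back, since a high value puts more load on the nodes. Setting a transient value to `null` removes the override. A sketch under the same assumptions as the command above; `recoveries_body`, `set_concurrent_recoveries` and `ES_URL` are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: build the settings body for a given value.
# Pass a number to raise the limit, or the word null to drop the override.
recoveries_body() {
  printf '{ "transient": { "cluster.routing.allocation.node_concurrent_recoveries": %s } }' "$1"
}

# Hypothetical helper: apply the setting to the cluster.
# ES_URL is an assumption; it falls back to localhost:9200.
set_concurrent_recoveries() {
  curl -s -H 'Content-Type: application/json' -XPUT \
    "${ES_URL:-localhost:9200}/_cluster/settings" \
    -d "$(recoveries_body "$1")"
}

# Usage:
#   set_concurrent_recoveries 16     # speed things up during recovery
#   set_concurrent_recoveries null   # back to the default afterwards
```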

curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_cluster/reroute?retry_failed&pretty'

If you’ve sorted all your issues but there are still unassigned shards, it could be that you’ve hit the allocation retry limit (index.allocation.max_retries, default: 5), and you need to manually kick the cluster into action with the command above.
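To check whether the retry actually helped, you can trigger it and then read the unassigned shard count back out of the health output. This is only a sketch: `unassigned_count`, `retry_and_check` and `ES_URL` are made-up names, the 30 second wait is an arbitrary guess, and the JSON is picked apart with `sed` rather than a proper parser.

```shell
#!/bin/sh
# Hypothetical helper: pull the "unassigned_shards" number out of
# _cluster/health JSON on stdin. The leading quote in the pattern keeps
# it from matching "delayed_unassigned_shards".
unassigned_count() {
  sed -n 's/.*"unassigned_shards" *: *\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: retry failed allocations, wait, then print how many
# shards are still unassigned. Optional argument: seconds to wait (default 30).
# ES_URL is an assumption; it falls back to localhost:9200.
retry_and_check() {
  curl -s -XPOST "${ES_URL:-localhost:9200}/_cluster/reroute?retry_failed" > /dev/null
  sleep "${1:-30}"
  curl -s "${ES_URL:-localhost:9200}/_cluster/health?pretty" | unassigned_count
}

# Usage:
#   retry_and_check 60
```

If the number it prints is still above zero after a while, it is worth going back to the allocation explain output, because the underlying issue is probably not fixed yet.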