Random Musings...

Elasticsearch: Shard Allocation Failed

Elasticsearch is great, can handle (data) node failures really well depending on your index settings, replication etc… mostly automagically!

However sometimes when you have a network blip, or hardware issues you can run into problems and it sort of recovers.

Here’s some helpful-ish commands you can run:

  • curl -s -XGET 'localhost:9200/_cluster/health?pretty'
    

    Tells you the state of your cluster, it should be green and you should have no unassigned shards.

  • curl -s -XGET 'localhost:9200/_cluster/allocation/explain?pretty'
    

    Tells you why a shard allocation has failed. You need to sort the underlying issues that this highlights.

  • curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings -d '{ "transient" :{ "cluster.routing.allocation.node_concurrent_recoveries" : 16 } }'
    

    This makes recoveries go faster by letting the cluster do more recoveries simultaneously.

  • curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_cluster/reroute?retry_failed&pretty'
    

    If you’ve sorted all your issues but there’s still unassigned shards for some reason it could be that you’ve hit the max_retry limit (default: 5) and you need to manually kick it into action.