Google Summer of Code 2010 Apache Software Foundation

ZooKeeper Failure Detector Model

by Abmar Barros for Apache Software Foundation

ZooKeeper servers detect the failure of other servers and of clients by counting the number of 'ticks' during which they receive no heartbeat from the other machine. This is the 'timeout' method, and it works very well; however, it may be too aggressive, and it is not easily tuned for some of the more unusual ZooKeeper installations. This project's goals are to abstract the failure detector into a separate module, to implement several failure detectors, and to compare their suitability for ZooKeeper.
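
To make the idea of a separate failure detector module concrete, the following is a minimal Java sketch of the 'timeout' method behind a small interface. It is an illustration only, not ZooKeeper's actual code or the project's final design: the names FailureDetector, TickTimeoutDetector, heartbeat, isFailed and maxMissedTicks are all hypothetical, and time is modelled as a plain tick counter supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical abstraction; this is not an interface from the ZooKeeper code base. */
interface FailureDetector {
    /** Records that a heartbeat from peerId arrived at the given tick. */
    void heartbeat(String peerId, long tick);

    /** Returns true if peerId is suspected to have failed as of the given tick. */
    boolean isFailed(String peerId, long tick);
}

/**
 * Sketch of the fixed 'timeout' method: a peer is suspected once more than
 * maxMissedTicks ticks have passed since its last recorded heartbeat.
 */
final class TickTimeoutDetector implements FailureDetector {
    private final long maxMissedTicks; // assumed tuning parameter
    private final Map<String, Long> lastHeartbeatTick = new HashMap<>();

    TickTimeoutDetector(long maxMissedTicks) {
        if (maxMissedTicks <= 0) {
            throw new IllegalArgumentException("maxMissedTicks must be positive");
        }
        this.maxMissedTicks = maxMissedTicks;
    }

    @Override
    public void heartbeat(String peerId, long tick) {
        // Keep the latest tick seen, so a late, out-of-order heartbeat is ignored.
        lastHeartbeatTick.merge(peerId, tick, Long::max);
    }

    @Override
    public boolean isFailed(String peerId, long tick) {
        Long last = lastHeartbeatTick.get(peerId);
        if (last == null) {
            // Assumption: a peer that has never sent a heartbeat is not monitored yet.
            return false;
        }
        return tick - last > maxMissedTicks;
    }
}
```

With an interface of this shape, another detection strategy could be swapped in by providing a different implementation of FailureDetector, which is one way the detectors could be compared against each other.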