Tuesday, September 13, 2016

Zookeeper Practice



https://community.hortonworks.com/questions/12942/how-to-clean-up-files-in-zookeeper-directory.html

Ongoing Data Directory Cleanup

The ZooKeeper Data Directory contains files which are a persistent copy of the znodes stored by a particular serving ensemble: the snapshot and transaction log files. As changes are made to the znodes, they are appended to a transaction log; occasionally, when a log grows large, a snapshot of the current state of all znodes is written to the filesystem. This snapshot supersedes all previous logs.
A ZooKeeper server will not remove old snapshots and log files; this is the responsibility of the operator. Every serving environment is different, and therefore the requirements for managing these files may differ from install to install (backup, for example).
The PurgeTxnLog utility implements a simple retention policy that administrators can use. The API docs contain details on calling conventions (arguments, etc.).
In the following example the last <count> snapshots and their corresponding logs are retained and the others are deleted. The value of <count> should typically be greater than 3 (although not required, this provides 3 backups in the unlikely event a recent log has become corrupted). This can be run as a cron job on the ZooKeeper server machines to clean up the logs daily.

 java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count>

java -cp "*" org.apache.zookeeper.server.PurgeTxnLog  ~/src/apache/solr/solr-5.3.0/example/cloud/node1/solr/zoo_data/ ~/src/apache/solr/solr-5.3.0/example/cloud/node1/solr/zoo_data/ -n 3



The ZooKeeper server continually saves znode snapshot files and, optionally, transactional logs in a Data Directory to enable you to recover data. It's a good idea to back up the ZooKeeper Data Directory periodically. Although ZooKeeper is highly reliable because a persistent copy is replicated on each server, recovering from backups may be necessary if a catastrophic failure or user error occurs.
When you use the default configuration, the ZooKeeper server does not remove the snapshots and log files, so they will accumulate over time. You will need to clean up this directory occasionally, taking into account your backup schedules and processes. To automate the cleanup, a zkCleanup.sh script is provided in the bin directory of the ZooKeeper base package. Modify this script as necessary for your situation. In general, you want to run this as a cron task based on your backup schedule.
The data directory is specified by the dataDir parameter in the ZooKeeper configuration file, and the data log directory is specified by the dataLogDir parameter.
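For illustration, a minimal zoo.cfg sketch (the paths are hypothetical); the autopurge settings, available since ZooKeeper 3.4, automate the cleanup described above:

dataDir=/var/lib/zookeeper
dataLogDir=/var/lib/zookeeper/log
# since ZooKeeper 3.4: keep the last 3 snapshots, purge every 24 hours
autopurge.snapRetainCount=3
autopurge.purgeInterval=24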

The ZooKeeper API is built around a ZooKeeper handle that is passed to every API call. This handle represents a session with ZooKeeper.
When creating the handle, we also pass in an object that will receive session events: a Watcher.
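As a minimal sketch (the hostPort value and the 15-second session timeout are assumptions), creating a handle looks like this:

import java.io.IOException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class Master implements Watcher {
    ZooKeeper zk;
    String hostPort; // e.g. "localhost:2181" (assumed)

    Master(String hostPort) { this.hostPort = hostPort; }

    void startZK() throws IOException {
        // 15000 ms session timeout; `this` is the Watcher that receives session events
        zk = new ZooKeeper(hostPort, 15000, this);
    }

    public void process(WatchedEvent e) {
        System.out.println("Session event: " + e);
    }
}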

In this algorithm, all potential masters try to create the /master znode, but only one will succeed. The process that succeeds becomes the master.
We need two things to create /master: the initial data for the znode (usually some information, such as a server ID, about the process that is running for master) and an access control list (ACL).
public void runForMaster() {
    LOG.info("Running for master");
    // EPHEMERAL: the znode disappears if our session dies, releasing mastership.
    zk.create("/master",
            serverId.getBytes(),     // initial data: our server ID
            Ids.OPEN_ACL_UNSAFE,     // open ACL; fine for examples
            CreateMode.EPHEMERAL,
            masterCreateCallback,    // async create: the result goes to this callback
            null);
}

zk.create("/master", serverId.getBytes(),
OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

DataCallback masterCheckCallback = new DataCallback() {
    public void processResult(int rc, String path, Object ctx, byte[] data,
                              Stat stat) {
        switch (Code.get(rc)) {
        case CONNECTIONLOSS:
            checkMaster();   // connection loss: just try again
            return;
        case NONODE:
            runForMaster();  // no master znode yet: run for master
            return;
        }
    }
};

void checkMaster() {
  zk.getData("/master", false, masterCheckCallback, null);
}


void submitTask(String task, TaskObject taskCtx){
    taskCtx.setTask(task);
    // PERSISTENT_SEQUENTIAL appends an increasing suffix, producing paths
    // like /tasks/task-0000000001; taskCtx is handed through to the callback.
    zk.create("/tasks/task-",
            task.getBytes(),
            Ids.OPEN_ACL_UNSAFE,
            CreateMode.PERSISTENT_SEQUENTIAL,
            createTaskCallback,
            taskCtx);
}
First, the sequence number will indicate the order in which the tasks were queued. Second, the sequence number will create unique paths for tasks with minimal work.

ZooKeeper helps organize distributed state and provides a framework for handling failures; it doesn't make failures go away.

The primary mechanism ZooKeeper provides to deal with changes is watches. With watches, a client registers its request to receive a one-time notification of a change to a given znode. For example, we can have the primary master create an ephemeral znode representing the master lock, and the backup masters register a watch for the existence of the master lock. Once the backup masters receive their notifications, they can start a new master election by trying to create a new ephemeral znode to represent the master lock.
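A hedged sketch of that pattern, with the watcher and callback names (masterExistsWatcher, masterExistsCallback) assumed for illustration; exists() sets a one-time watch, and a NodeDeleted event means the master lock vanished:

Watcher masterExistsWatcher = new Watcher() {
    public void process(WatchedEvent e) {
        if (e.getType() == Event.EventType.NodeDeleted) {
            runForMaster(); // the master lock is gone; start a new election round
        }
    }
};

void masterExists() {
    zk.exists("/master", masterExistsWatcher, masterExistsCallback, null);
}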

Watches and notifications form a general mechanism that enables clients to observe changes by other clients without having to continually poll ZooKeeper.

The notification comes in the form of a callback to the application client.
Missing events is typically not a problem because any changes that have occurred during the period between receiving a notification and registering a new watch can be captured by reading the state of ZooKeeper directly.

Having one notification amortized across multiple events is a positive aspect: for applications with a high rate of updates, it makes the notification mechanism much more lightweight than sending a notification for every event.

There are two types of watches: data watches and child watches. Creating, deleting, or setting the data of a znode successfully triggers a data watch. Both exists and getData set data watches. Only getChildren sets child watches, which are triggered when a child znode is either created or deleted.
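For example, with the synchronous API (passing true registers the handle's default watcher):

// Which calls set which watch type (sketch):
Stat stat = zk.exists("/master", true);                 // sets a data watch
byte[] data = zk.getData("/master", true, stat);        // sets a data watch
List<String> children = zk.getChildren("/tasks", true); // sets a child watch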

The Stat structure contains metadata about the znode, such as the zxid of the change that last modified it, the timestamp of that change, and the number of children of the znode.

It is possible for the /master znode to be deleted between the execution of the create callback and the execution of the exists operation. Consequently, we need to check whether stat is null whenever the response from exists is positive (return code is OK); stat is null when the node does not exist.
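A sketch of that check in the exists callback (names assumed, consistent with the snippets above):

AsyncCallback.StatCallback masterExistsCallback = new AsyncCallback.StatCallback() {
    public void processResult(int rc, String path, Object ctx, Stat stat) {
        switch (Code.get(rc)) {
        case CONNECTIONLOSS:
            masterExists();     // retry setting the exists watch
            break;
        case OK:
            if (stat == null) { // /master was deleted before exists() ran
                runForMaster();
            }
            break;
        default:
            checkMaster();      // anything else: re-check who the master is
            break;
        }
    }
};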


Curator implements recipes for primitives such as locks, barriers, and caches. For ZooKeeper operations like create, delete, getData, etc., it streamlines programming by allowing us to chain calls, a programming style often called fluent. It also provides namespaces, automatic reconnection, and other facilities that make applications more robust.

zkc.create().withMode(CreateMode.PERSISTENT).forPath("/mypath", new byte[0]);
zkc.create().inBackground().withMode(CreateMode.PERSISTENT).forPath("/mypath",
    new byte[0]);
zkc.getData().inBackground().watched().forPath("/mypath");

client = CuratorFrameworkFactory.newClient(hostPort, retryPolicy);
client.getCuratorListenable().addListener(masterListener);

client.getUnhandledErrorListenable().addListener(errorsListener);
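For context, a minimal sketch of building and starting the client used above (the connection string is an assumption):

import org.apache.curator.RetryPolicy;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Retry with exponential backoff: 1s base sleep, at most 5 retries.
RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 5);
CuratorFramework client = CuratorFrameworkFactory.newClient("localhost:2181", retryPolicy);
client.start(); // required before issuing any operations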

Curator exposes a different set of states than ZooKeeper. It has, for example, a SUSPENDED state, and it uses LOST to represent session expiration.

When dealing with state changes, our recommendation is in general to halt all operations of the master because we do not know if the ZooKeeper client will be able to reconnect before the session expires, and even if it does, the client might not be the primary master any more. It is safer to play conservatively in the case of a disconnection.

Sequential znodes
CreateBuilder provides a withProtection call that tells the Curator client to prefix the sequential znode with a unique identifier. If the create fails, the client retries the operation, and as part of retrying it verifies whether there is already a znode with the unique identifier.
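A hedged sketch of a protected sequential create (the path and data are assumptions):

// If the create's response is lost to a connection problem, Curator retries;
// the unique prefix added by withProtection() lets it detect whether the
// first attempt already succeeded, avoiding orphan sequential znodes.
String path = client.create()
        .withProtection()
        .withMode(CreateMode.PERSISTENT_SEQUENTIAL)
        .forPath("/tasks/task-", "data".getBytes());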

The DeleteBuilder interface defines a guaranteed call.
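For example (path is an assumption):

// guaranteed(): if the delete fails with a connection problem, Curator keeps
// retrying it in the background until the znode is actually gone.
client.delete().guaranteed().forPath("/mypath");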
http://www.voidcn.com/blog/kiss_the_sun/article/p-4913080.html
Curator offers two leader-election recipes: Leader Latch and Leader Election (LeaderSelector).
Leader Latch
A very simple election algorithm: one of the candidates is picked as leader, effectively at random. Once chosen, that instance stays leader until it releases leadership itself by calling close(); until then, no other candidate can become leader.

Releasing leadership
Leadership can only be released via close(). Only when the leader releases leadership do the other candidates get a chance to be elected.

leaderLatch.close();
Error handling
LeaderLatch registers a ConnectionStateListener to watch for connection problems. On SUSPENDED or LOST, the leader reports that it is no longer the leader (and there is no leader until the connection is re-established). If a LOST connection is later re-established (RECONNECTED), the LeaderLatch deletes its previous znode and creates a new one.
It is strongly recommended to register a ConnectionStateListener when using LeaderLatch.

leaderLatch.start();
Once started, the LeaderLatch negotiates with all the other candidates that use the same latch path, and one of them is picked as leader. Call hasLeadership() to check whether a given instance is the leader.
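A minimal LeaderLatch sketch (the latch path and participant id are assumptions):

import org.apache.curator.framework.recipes.leader.LeaderLatch;

LeaderLatch leaderLatch = new LeaderLatch(client, "/leader/latch", "worker-1");
leaderLatch.start();                  // join the election
// ...
if (leaderLatch.hasLeadership()) {
    // this instance is currently the leader
}
leaderLatch.close();                  // relinquish leadership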


A LeaderSelectorListener gives you control over leadership: it can release leadership at the appropriate time so that every node gets a chance to lead. LeaderLatch, by contrast, is single-minded to the end; unless close() is called, it never releases leadership.
leaderSelector.start();
Once started, takeLeadership() is invoked when the instance acquires leadership, and it does not return until leadership is released.

Releasing
Call close() to release leadership.

leaderSelector.close();
Error handling
The LeaderSelectorListener class extends ConnectionStateListener. Once the LeaderSelector starts, it adds a listener to the Curator client, so when using LeaderSelector you must watch for connection changes. On a connection problem such as SUSPENDED, the Curator instance must assume it may no longer be the leader until it receives RECONNECTED. If LOST occurs, the instance is no longer the leader, and its takeLeadership() should exit immediately.
Important recommendation: if SUSPENDED or LOST occurs, the best practice is to throw CancelLeadershipException; the LeaderSelector instance will then try to interrupt and cancel the thread executing takeLeadership(). It is recommended to extend LeaderSelectorListenerAdapter, which already implements this handling.
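A minimal sketch using the recommended adapter (the election path is an assumption):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;

LeaderSelector selector = new LeaderSelector(client, "/leader/selector",
        new LeaderSelectorListenerAdapter() {
            @Override
            public void takeLeadership(CuratorFramework client) throws Exception {
                // This instance is leader until the method returns.
                System.out.println("took leadership");
                Thread.sleep(5000); // do leader work, then relinquish
            }
        });
selector.autoRequeue(); // rejoin the election after each relinquishment
selector.start();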

http://curator.apache.org/curator-recipes/distributed-id-queue.html
https://github.com/yiming187/curator-example/blob/master/src/main/java/com/ctrip/zk/curator/example/DistributedDelayQueueExample.java
https://github.com/yiming187/curator-example/blob/master/src/main/java/com/ctrip/zk/curator/example/LeaderSelectorExample.java

http://www.cnblogs.com/francisYoung/p/5458615.html
The native ZooKeeper CRUD API comes in synchronous and asynchronous flavors; the asynchronous APIs take an AsyncCallback. For the getData, getChildren, and exists APIs, a Watcher can also be set. How are these features exposed in Curator?
In Curator, asynchronous results can be obtained in three ways:
 1. inBackground() + CuratorListener
 2. inBackground(new BackgroundCallback(){ public void processResult(CuratorFramework client, CuratorEvent event){} })
 3. inBackground(new BackgroundCallback(){}, Executor)

Using the async API via inBackground() + CuratorListener looks like this:
client.getCuratorListenable().addListener(new CuratorListener() {
    @Override
    public void eventReceived(CuratorFramework client, CuratorEvent event) throws Exception {
        if (event.getType() == CuratorEventType.CREATE) {
            System.out.println("create path=" + event.getPath() + ",code=" + event.getResultCode());
        } else if (event.getType() == CuratorEventType.GET_DATA) {
            System.out.println("get path=" + event.getPath() + ",data=" + new String(event.getData()));
        } else if (event.getType() == CuratorEventType.SET_DATA) {
            System.out.println("set path=" + event.getPath()
                    + ",data=" + new String(client.getData().forPath(event.getPath())));
        } else if (event.getType() == CuratorEventType.DELETE) {
            System.out.println("delete path=" + event.getPath());
        }
    }
});

From then on, all of this client's calls made with inBackground() have their asynchronous results processed by this single CuratorListener.

In the second approach, inBackground(BackgroundCallback), it looks like this:
client.create()
      .creatingParentsIfNeeded()
      .withProtection()
      .withMode(CreateMode.EPHEMERAL)
      .inBackground(new BackgroundCallback() {
          @Override
          public void processResult(CuratorFramework client, CuratorEvent event) throws Exception {
              if (event.getType() == CuratorEventType.CREATE) {
                  System.out.println("code:" + event.getResultCode() + " path:" + event.getPath());
              }
          }
      })
      .forPath("/francis/tmp/a", "wbs".getBytes());



https://github.com/Netflix/curator/issues/297
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.zookeeper.server.quorum.flexible.QuorumMaj.<init>(Ljava/util/Map;)V
    at org.apache.curator.framework.imps.EnsembleTracker.<init>(EnsembleTracker.java:54)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.<init>(CuratorFrameworkImpl.java:156)
    at org.apache.curator.framework.CuratorFrameworkFactory$Builder.build(CuratorFrameworkFactory.java:136)
    at org.miaohong.jbfs.store.server.controller.Test.main(Test.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
For this particular issue, you're probably using Curator 3.0 which requires ZK 3.5.x. Use Curator 2.9.x instead.


https://github.com/apache/curator/tree/master/curator-examples/src/main/java/framework
http://www.cnblogs.com/zhwbqd/p/3969161.html
Setting the Java max heap size correctly is important to prevent swapping; frequent disk swaps will hurt performance badly. Because ZooKeeper processes requests in order, once one request hits the disk, all subsequent requests will too.
To prevent swapping, keep the heap size below the amount of physical memory, leaving some room for the OS and its caches. The best practice is to run performance tests; failing that, a rule of thumb is a 3 GB heap on a 4 GB machine, roughly 3/4 of physical memory.
http://www.xuehuile.com/blog/aec50c9a085b42f5a97144fd90551170.html

http://blog.5ibc.net/p/15486.html
List<String> children = curator.getChildren().usingWatcher(new ZKWatcher(parentPath, path)).forPath(path);
if (children != null && children.size() > 0) {
    for (String child : children) {
        String childPath = parentPath + "/" + child;
        byte[] b = curator.getData().usingWatcher(new ZKWatcher(parentPath, childPath))
                .forPath(childPath);
        String value = new String(b, "utf-8");
        if (StringUtils.isNotBlank(value)) {
            cacheMap.put(childPath, value);
        }
    }
}
curator.delete().inBackground().forPath(path);


http://qindongliang.iteye.com/blog/2122764
zkclient.setData().forPath(path, content.getBytes());
zkclient.delete().guaranteed().deletingChildrenIfNeeded().forPath(path);
if (zkclient.checkExists().forPath(path) == null) {
https://cyberroadie.wordpress.com/2011/11/24/implementing-leader-election-with-zookeeper/

http://www.cnblogs.com/good-temper/p/5656866.html
Distributed systems are everywhere now, and open-source frameworks such as Dubbo and Motan are widely used. For a distributed service, one of the most basic features is service registration and discovery, and ZooKeeper's EPHEMERAL nodes make this easy to implement. EPHEMERAL nodes are, as the name suggests, temporary: their lifecycle is bound to the client session, and when the session disconnects the node is deleted. Below we implement a simple distributed server (a code sketch follows the outline):
server:
  1. On startup, create a ZooKeeper connection and create a new node under the go_servers node named "ip:port", completing the service registration.
  2. On shutdown, the connection drops and the created node is deleted, so clients will no longer be routed to this server.
client:
  1. First fetch all children of the go_servers node from ZooKeeper, which yields every registered server.
  2. Pick one node from the server list (randomly here; real services usually offer several strategies) and connect to it for communication.
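A hedged Java/Curator sketch of the same idea (the address shown is an assumption; /go_servers is from the outline above):

import java.util.List;
import java.util.Random;
import org.apache.zookeeper.CreateMode;

// Registration (server side): an EPHEMERAL node under /go_servers that
// disappears automatically when this server's session ends.
client.create()
      .creatingParentsIfNeeded()
      .withMode(CreateMode.EPHEMERAL)
      .forPath("/go_servers/192.168.1.10:8080");

// Discovery (client side): list registered servers and pick one at random.
List<String> servers = client.getChildren().forPath("/go_servers");
String chosen = servers.get(new Random().nextInt(servers.size()));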
