Thursday, December 20, 2018

Ivy “Impossible to Acquire Lock”
An easy way to get Ivy to release all of its locks:
find ~/.ivy -name '*.lck' -exec rm -f {} \;
Java thread dump: WAITING (on object monitor) - what is it waiting on?

It's a peculiarity of the HotSpot JVM. When dumping a stack, the JVM recovers the wait object from the method's local variables. This info is available for interpreted methods, but not for compiled native wrappers.
When Object.wait is executed frequently enough, it gets JIT-compiled,
and after that there will be no "waiting on" line in a thread dump.
  1. Since wait() must be called on a synchronized object, the wait object is most often the last locked object in the stack trace. In this case it is
    - locked <0x00007f98cbebe5d8> (a com.tibco.tibjms.TibjmsxResponse)
  2. To prevent Object.wait from being JIT-compiled (and thus keep the wait info always available), exclude it from compilation with a JVM option, e.g. -XX:CompileCommand=exclude,java/lang/Object.wait
The same lock error can show up in the Gradle cache:
       :::: ERRORS
               impossible to acquire lock for org.grails#grails-docs;1.3.6
Find the stale lock files with:
find ~/.gradle/caches/artifacts/ -name "*.lck"
Removing them while a milestone-4 build was hanging let it continue shortly afterwards with no problems.
You can set ANT_OPTS parameters. For example, on Windows:
SET ANT_OPTS=-Xmx1024m -XX:MaxPermSize=256m
Or just use the maxmemory attribute of the junit Ant task:
<junit maxmemory="1024m" ...>
Or pass JVM arguments explicitly (this requires fork="yes"):
<junit printsummary="true" fork="yes" haltonfailure="false" failureproperty="junitsFailed" errorproperty="junitsFailed">
    <jvmarg value="-Xmx1024m"/>
    <jvmarg value="-Duser.timezone=GMT0"/>
    ...
</junit>

Tuesday, November 13, 2018

Service Discovery
  • As soon as a server comes up, it can start receiving requests for the service it belongs to
  • It can load-balance across the whole service
  • When a server goes bad, it automatically stops assigning requests to that server
  • Removing one server does not affect the service as a whole; this also improves debuggability
  • It provides observability, so engineers can see how the service and each server are doing
  • Changing the service's configuration, e.g. adding a service or adding a server, requires as little manual intervention as possible
  • No single point of failure




DNS

Partition services by subdomain. To add a server, just add a DNS record; when a request comes in, DNS can assign it to a random server under that name. Two problems with this approach:
  1. DNS resolutions are cached by clients. When the set of servers changes, clients have to refresh their caches, or they may keep sending requests to a server that is already down. This makes the service less predictable. Worse, any component anywhere in the architecture that caches the DNS resolution has the same problem, and it is very hard to debug.
  2. DNS's random assignment is also not the same as round robin: its load balancing makes no attempt to send requests to the relatively idle servers.
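The difference in item 2 can be seen in a toy simulation (server names and request counts here are made up):

```python
import random
from itertools import cycle

servers = ["s1", "s2", "s3", "s4", "s5"]

def distribute(pick, n=1000):
    """Send n requests through pick() and count where each one lands."""
    counts = {s: 0 for s in servers}
    for _ in range(n):
        counts[pick()] += 1
    return counts

# DNS-style assignment: each request goes to a uniformly random server.
rng = random.Random(42)
random_counts = distribute(lambda: rng.choice(servers))

# Round robin: strict rotation, so every server gets exactly 1000/5 = 200.
rr = cycle(servers)
rr_counts = distribute(lambda: next(rr))

print(sorted(random_counts.values()))  # typically uneven
print(sorted(rr_counts.values()))      # [200, 200, 200, 200, 200]
```

Random assignment evens out only in expectation; round robin is even by construction, and neither looks at how busy a server actually is.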

Load Balancer

What if we use a global load balancer, i.e. all requests between services are routed through it? Then each service only needs to be able to reach the load balancer. How do we achieve that? You might think of DNS, but that does not solve the problem: if the load balancer goes down, you still have to reconfigure DNS to route to another load balancer, and DNS's problems (such as random assignment) were covered above.

Service registration and discovery inside the app

A popular solution is to put the service registration and discovery logic into the app's own code. At Airbnb, some Java apps used to contain such registration logic, with Zookeeper managing all the services: Zookeeper holds a list of all backend services, and each service periodically asks Zookeeper for that list to learn about the others.
This approach has problems too. Zookeeper's support for non-JVM languages is poor; for those languages you have to write your own Zookeeper client. Sometimes you even have to support other teams' projects, i.e. code you did not write and only operate, which brings its own trouble.
Airbnb's SmartStack decouples service discovery from the app entirely. It consists of two parts: Nerve for service registration and Synapse for service discovery. These are two separate apps deployed alongside the main app.


Nerve reports its own app's information to Zookeeper, much as a Java Zookeeper client would.
Before registering itself, though, Nerve runs through a health checklist and registers only when every health check passes. This requires every service to implement a way to be health-checked.
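The register-only-when-healthy rule can be sketched in a few lines, with a plain dict standing in for Zookeeper (all names here are illustrative, not Nerve's actual API):

```python
# Hypothetical stand-in for Zookeeper: a dict of ephemeral registrations.
registry = {}

def health_checks_pass(checks):
    """Nerve's rule: register only if every health check passes."""
    return all(check() for check in checks)

def nerve_tick(name, address, checks):
    """One iteration of a Nerve-style loop: (de)register based on health."""
    if health_checks_pass(checks):
        registry[name] = address          # register / refresh
    else:
        registry.pop(name, None)          # deregister on failure

# Example: an app with one always-passing and one toggleable check.
port_open = lambda: True
app_ready = {"ok": True}
nerve_tick("orders", "10.0.0.5:8080", [port_open, lambda: app_ready["ok"]])
print(registry)   # {'orders': '10.0.0.5:8080'}

app_ready["ok"] = False
nerve_tick("orders", "10.0.0.5:8080", [port_open, lambda: app_ready["ok"]])
print(registry)   # {}
```

In the real system the registry entry is an ephemeral Zookeeper node, so it also disappears if the Nerve process itself dies.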


Synapse implements service discovery by fetching the other services' information from Zookeeper and using it to configure and run a local HAProxy; that HAProxy is what actually routes requests.
This effectively turns a centralized load balancer into a decentralized one. Each app's load balancer runs on localhost, so there is no way to lose track of it. Synapse itself is just an intermediary process, independent of HAProxy: if Synapse goes down, the app stops receiving updates about other services, but the previous information is still in HAProxy. Working this way also buys you HAProxy's strong logging, sophisticated routing and queueing algorithms, retries, and timeouts.
When another service's information changes, Synapse regenerates the HAProxy configuration file and reloads HAProxy. When another service goes down, Synapse puts the corresponding HAProxy backend into maintenance mode through the stats socket.
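The regeneration step is essentially templating: a discovered server list in, a config fragment out. A minimal sketch (backend/server names are made up):

```python
def render_haproxy_backend(service, servers):
    """Render a minimal HAProxy backend section from a discovered server list."""
    lines = [f"backend {service}", "    balance roundrobin"]
    for i, addr in enumerate(servers):
        lines.append(f"    server {service}-{i} {addr} check")
    return "\n".join(lines)

# Pretend Synapse just learned these servers from Zookeeper.
cfg = render_haproxy_backend("orders", ["10.0.0.5:8080", "10.0.0.6:8080"])
print(cfg)
```

On a real change Synapse would write this into haproxy.cfg and reload HAProxy; for a server that merely went down, it can instead flip that server to maintenance via the stats socket without touching the file.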

Benefits of SmartStack

  • Services are health-checked periodically; as soon as a service becomes available, Zookeeper notifies the Synapse on the other servers
  • If a server develops a problem, the next health check catches it
  • On an update from Zookeeper, Synapse regenerates the HAProxy configuration file and re-reads it without restarting HAProxy, so the process is very fast
  • To remove a server from the cluster, just shut down its Nerve
  • HAProxy's admin page shows the state of every server: observability
  • The architecture is built on Zookeeper, a robust distributed system

PostgreSQL Advanced Usage
Since we really only care about updating the cache when the database is updated, we can let the database itself update the caches by broadcasting when a change has been made. PostgreSQL provides publish-subscribe functionality called LISTEN/NOTIFY. Like any pub-sub implementation, LISTEN/NOTIFY lets you set channels on which the database can broadcast some text; others can then listen on those channels and receive the information asynchronously. PostgreSQL stores all the NOTIFY messages in a queue and drops them only when every registered listener has received them. Keep that in mind: the queue can fill up if a listener fails, which will cause an error in PostgreSQL on the next NOTIFY. Lastly, we can build a simple trigger that NOTIFYs on inserts to a table.

For example, let’s say we have an application that keeps track of employees and the departments they belong to. Each department has an employee designated as the manager of that department. For processing purposes, it’d be helpful if we kept a directory in memory of all the employees and who their department manager is.

CREATE OR REPLACE FUNCTION new_hire_notify() RETURNS trigger AS $$
DECLARE
    payload varchar;
    mid uuid;
BEGIN
    SELECT manager_id INTO mid FROM departments
    WHERE id = NEW.department;
    -- NEW.id is an assumption here (the first CAST argument was garbled);
    -- payload is 'employee_id, manager_id'
    payload = CAST(NEW.id AS text) ||
    ', ' || CAST(mid AS text);
    PERFORM pg_notify('new_hire', payload);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER value_insert
    AFTER INSERT ON employees
    FOR EACH ROW
    EXECUTE PROCEDURE new_hire_notify();

Then we create a new listener connection, which is a separate TCP connection to PostgreSQL. On that connection we can specify channels to listen to, and we can subscribe to multiple channels on the same listener by calling listener.Listen for as many channels as we need. Finally, we pass the listener to the Cache.Listen method and spin it off into a goroutine.
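The cache side of this can be sketched in Python. The comma-separated "employee_id, manager_id" payload format follows the trigger above; the handler and directory names are made up:

```python
# In-memory directory the prose describes: employee_id -> department manager_id.
directory = {}

def on_new_hire(payload):
    """Handle one NOTIFY payload of the assumed form 'employee_id, manager_id'."""
    employee_id, manager_id = (p.strip() for p in payload.split(","))
    directory[employee_id] = manager_id

# Simulate receiving one notification (ids are fake placeholders).
on_new_hire("emp-42, mgr-7")
print(directory)   # {'emp-42': 'mgr-7'}
```

In the real setup this function would run in the listener loop, invoked once per notification received on the new_hire channel.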
MySQL has a binlog replication protocol which is used for primary/secondary replication. This is essentially a replicated queue that has all the transactions recorded in-order as shown in Figure 4.
This isn't a popular solution, but I say, why not? It works very well. You can write an application that speaks the MySQL binlog replication protocol, consumes the binlog entries, and executes SET operations against the cache service(s). There are two ways you could consume the binlog data.
  • Interpret the raw SQL syntax and issue SET operations.
  • The web application embeds cache keys as a comment in the SQL.

Both of these options are good because you can even recover the transaction scope of each transaction from the binlog statements, if you need it and the target system supports atomic multi-set operations. I prefer the second option because it's easier to parse and the application already has this information in most cases.
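The second option is easy to parse because the keys arrive pre-computed. A sketch of the consumer side, where the "/* cache-keys: ... */" comment format is an assumption of this example, not a MySQL or application standard:

```python
import re

# The web app is assumed to append "/* cache-keys: k1, k2 */" to its SQL;
# the binlog consumer just extracts those keys and invalidates/sets them.
CACHE_KEY_RE = re.compile(r"/\*\s*cache-keys:\s*([^*]+?)\s*\*/")

def extract_cache_keys(sql):
    """Pull the embedded cache keys out of one binlog statement, if any."""
    m = CACHE_KEY_RE.search(sql)
    return [k.strip() for k in m.group(1).split(",")] if m else []

stmt = "UPDATE users SET name='x' WHERE id=42 /* cache-keys: user:42, user:42:profile */"
print(extract_cache_keys(stmt))   # ['user:42', 'user:42:profile']
```

Statements without the comment simply yield no keys, so DDL and untagged writes pass through harmlessly.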
For an actively connected client interested in cache invalidation techniques:
  • SQL Server supports Query Notifications
  • Oracle supports Continuous Query Notification
  • MySQL supports replication streams, but they are not the same as update notifications: they cannot be set up and torn down dynamically per client, and they do not work well for monitoring the individual rows a particular application is interested in
For a disconnected client interested in data sync, all vendors support some sort of replication.

Taobao Singles' Day

2. Rate limiting: a single container running a single service can only handle so much. Once load testing has established the single-machine ceiling, add rate limiting on top of it; Google's Guava library has a ready-made rate limiter you can use directly.
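The idea behind Guava's RateLimiter is a token bucket. A minimal sketch with an injectable clock so it is easy to test (names and numbers are illustrative, not Guava's API):

```python
class RateLimiter:
    """Toy token bucket: refill at `permits_per_second`, spend 1 per request."""

    def __init__(self, permits_per_second, clock):
        self.rate = permits_per_second
        self.clock = clock
        self.tokens = permits_per_second   # start with a full bucket
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at the bucket size.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False           # over the limit: reject (or queue) the request

t = [0.0]                                  # fake clock, seconds
limiter = RateLimiter(2, clock=lambda: t[0])   # allow 2 requests per second
print([limiter.try_acquire() for _ in range(3)])   # [True, True, False]
t[0] += 1.0                                        # one second later: refilled
print(limiter.try_acquire())                       # True
```

A rejected request can be failed fast or shunted to a queue, which is exactly what the single-machine ceiling from load testing tells you to configure.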

Monday, November 12, 2018

Spring Cloud
  • First, if you put the @FeignClient annotation on an interface, Feign creates a dynamic proxy for that interface
  • When you then call the interface, you are really calling the dynamic proxy Feign created; this is the core of the core
  • Feign's dynamic proxy dynamically constructs the address of the service you want to call from the annotations on the interface, such as @RequestMapping
  • Finally, it issues the request to that address and parses the response
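The proxy idea can be sketched outside Java: record routing metadata on methods ("annotations") and have a generic function turn a call into a concrete URL. Everything here (request_mapping, build_url, the service name) is illustrative, not Feign's API:

```python
def request_mapping(path):
    """Decorator standing in for @RequestMapping: attach a path template."""
    def mark(fn):
        fn._path = path
        return fn
    return mark

class InventoryClient:
    service = "inventory-service"

    @request_mapping("/stock/{item_id}")
    def get_stock(self, item_id): ...

def build_url(client, method_name, **params):
    """What the dynamic proxy does: annotation + chosen host -> concrete URL."""
    host = {"inventory-service": "192.168.169:9000"}[client.service]  # e.g. via Ribbon
    path = getattr(client, method_name)._path.format(**params)
    return f"http://{host}{path}"

print(build_url(InventoryClient(), "get_stock", item_id=42))
# http://192.168.169:9000/stock/42
```

The real proxy would then issue an HTTP request to that URL and decode the response into the method's declared return type.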
For example, suppose the service being called is deployed on five machines:
  • 192.168.169:9000
  • 192.168.170:9000
  • 192.168.171:9000
  • 192.168.172:9000
  • 192.168.173:9000

  • This is where Spring Cloud Ribbon comes in. Ribbon solves exactly this problem: it does load balancing, picking one machine for each request so that requests are spread evenly across the machines
  • Ribbon's load balancing defaults to the classic Round Robin algorithm. What is that? Simply put, if the order service makes 10 requests to the inventory service, it sends them to machine 1 first, then machine 2, machine 3, machine 4, machine 5, and then starts the cycle again: machine 1, machine 2, and so on

  • First, Ribbon gets the service registry from the Eureka Client, so it knows which machines every service is deployed on and which ports they listen on
  • Then Ribbon uses the default Round Robin algorithm to pick one of those machines
  • Feign then constructs and issues the request against that machine.
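The steps above can be sketched as a tiny rule object over the five machines listed earlier (the class name echoes Ribbon's RoundRobinRule, but this is a toy, not Ribbon's implementation):

```python
class RoundRobinRule:
    """Pick servers in strict rotation, like Ribbon's default rule."""

    def __init__(self, servers):
        self.servers = servers
        self.next_index = 0

    def choose(self):
        server = self.servers[self.next_index % len(self.servers)]
        self.next_index += 1
        return server

# The machines from the example list above.
rule = RoundRobinRule([f"192.168.{i}:9000" for i in range(169, 174)])
picks = [rule.choose() for _ in range(7)]
print(picks[0], picks[5])   # the 6th pick wraps back to the 1st machine
```

Feign would then take the chosen host, append the annotation-derived path, and fire the request.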
5. Spring Cloud Core Component: Hystrix

  1. Under high concurrency, with a flood of requests coming in, all 100 threads of the order service can end up stuck on calls to the points service, leaving the order service not a single thread to handle requests
  2. Then, when others call the order service, they find the order service is down too and no longer responds to anything

  • Look at it from the business side: to pay for an order, it is enough to decrement the inventory and notify the warehouse to ship
  • If the points service is down, worst case we patch the data up by hand after it recovers. Why should one dead points service drag the order service down with it? Unacceptable!
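Hystrix's fix is isolation plus fallback: each downstream service gets its own bounded thread pool, and a failure or timeout degrades gracefully instead of eating the caller's threads. A toy sketch (pool sizes, service names, and the fallback value are all made up):

```python
from concurrent.futures import ThreadPoolExecutor

# One bounded pool per downstream service: a slow points service can only
# exhaust its own 10 threads, never the inventory pool or the caller's threads.
pools = {
    "inventory": ThreadPoolExecutor(max_workers=10),
    "points": ThreadPoolExecutor(max_workers=10),
}

def call_with_isolation(service, fn, fallback, timeout=0.5):
    """Run fn on the service's own pool; on error or timeout, degrade."""
    future = pools[service].submit(fn)
    try:
        return future.result(timeout=timeout)
    except Exception:          # downstream error or timeout -> fallback
        return fallback

def points_service_call():
    raise RuntimeError("points service is down")

# The order flow survives a dead points service via the fallback.
result = call_with_isolation("points", points_service_call, fallback="skipped")
print(result)   # skipped
```

With the points call degraded, the order service still decrements inventory and notifies the warehouse, and the points data is repaired once that service recovers.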

6. Spring Cloud Core Component: Zuul

  • Eureka: when each service starts, its Eureka Client registers the service with the Eureka Server; the Eureka Client can also pull the registry back from the Eureka Server, so each service knows where the others are
  • Ribbon: when one service calls another, Ribbon does the load balancing, picking one of the target service's machines
  • Feign: Feign's dynamic proxy mechanism builds the request URL from the annotations and the chosen machine, and issues the request
  • Hystrix: requests are issued through Hystrix thread pools; different services use different pools, isolating calls to different services from each other and preventing cascading failures
  • Zuul: front-end and mobile clients call the backend through the Zuul gateway, which forwards each request to the appropriate service

