Tuesday, September 4, 2018

Java GC Troubleshooting



https://blog.takipi.com/5-tips-for-reducing-your-java-garbage-collection-overhead/
Tip #1: Predict Collection Capacities
All standard Java collections, as well as most custom and extended implementations (such as Trove and Google’s Guava), use underlying arrays (either primitive- or object-based). Because an array's length is fixed once it is allocated, adding items to a collection can force the old underlying array to be discarded in favor of a larger, newly allocated one.

Most collection implementations try to optimize this re-allocation process and keep it to an amortized minimum, even if the expected size of the collection is not provided. However, the best results can be achieved by providing the collection with its expected size upon construction.
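As a minimal sketch of the tip above (the class and method names are hypothetical, chosen only for illustration), the following shows an ArrayList and a HashMap constructed with their expected sizes. For HashMap the capacity is divided by the default load factor of 0.75, since the map resizes once its size exceeds capacity times load factor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not taken from the linked article.
class PresizedCollections {

    // Builds a list of n squares; the backing array is allocated once, up front.
    static List<Integer> squares(int n) {
        List<Integer> result = new ArrayList<>(n); // capacity hint avoids regrowth
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }

    // HashMap resizes when size exceeds capacity * loadFactor (0.75 by default),
    // so the initial capacity has to be larger than the expected entry count.
    static Map<Integer, String> labels(int expectedEntries) {
        int initialCapacity = (int) Math.ceil(expectedEntries / 0.75);
        Map<Integer, String> result = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedEntries; i++) {
            result.put(i, "item-" + i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(squares(5));
        System.out.println(labels(3).size());
    }
}
```

Whether pre-sizing pays off depends on how reliable the size estimate is; a large overestimate wastes memory instead of saving allocations.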
Tip #2: Process Streams Directly
Tip #5: Use Specialized Primitive Collections


https://mp.weixin.qq.com/s/STY6dQlW0XnBe_CHVgotOQ
After the previous young GC (YGC), from space was 12% used, yet by the time the next YGC was about to start, from space usage had become 99%.

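As you know, the Java heap consists mainly of the young generation and the old generation, and the young generation is in turn made up of eden + s0 (from space) + s1 (to space). Normally one of s0 and s1 is empty and serves as the copy target during GC.
When we create an object, memory is requested for it. That memory is mainly allocated in the young generation, specifically in eden, although in some special cases it can be allocated directly in the old generation. Under these rules, allocation should normally never reach from space, so nothing should be allocated in from space between the end of one GC and the start of the next.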
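To see this layout on a running JVM, one option is the standard java.lang.management API. The sketch below is only illustrative (the class name is hypothetical). It lists the heap memory pools and their usage. Pool names depend on the collector in use, and JMX reports a single survivor pool rather than from space and to space separately.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class HeapPools {

    // Returns one formatted line per heap memory pool (eden, survivor, old gen, ...).
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // getUsage() returns null when the pool is no longer valid
            }
            lines.add(String.format("%-30s used=%dK committed=%dK",
                    pool.getName(), usage.getUsed() / 1024, usage.getCommitted() / 1024));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```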

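After getting the GC log, the first thing I did was look at the surrounding context, to see whether this odd behavior was occasional or persistent. I searched the whole log for "from space 409600K,  99%" and located the first occurrence. The behavior was not there from the start; it began at some point and was concentrated in one stretch in the middle of the log. I then looked at the context of the first occurrence and found that a Full GC and a CMS GC had happened just before it.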
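The log search described above can be sketched roughly as follows. The class name, the regular expression, and the sample lines in main are assumptions made for illustration; the sample lines only imitate the "from space 409600K,  99%" text quoted above and are not taken from the real log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner for illustration only.
class FromSpaceScanner {

    // Matches text such as "from space 409600K,  99%" and captures size and percent.
    private static final Pattern FROM_SPACE =
            Pattern.compile("from space\\s+(\\d+)K,\\s+(\\d+)%");

    // Returns the 0-based indices of lines whose from-space usage is >= minPercent.
    static List<Integer> findHighUsage(List<String> logLines, int minPercent) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < logLines.size(); i++) {
            Matcher m = FROM_SPACE.matcher(logLines.get(i));
            if (m.find() && Integer.parseInt(m.group(2)) >= minPercent) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not real log output.
        List<String> sample = Arrays.asList(
                "  from space 409600K,  12% used",
                "  to   space 409600K,   0% used",
                "  from space 409600K,  99% used");
        System.out.println(findHighUsage(sample, 90));
    }
}
```

Collecting the line indices rather than just a yes/no answer makes it easy to see whether the hits cluster in one period of the log, which is what mattered in this investigation.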
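From the log we could first rule out GC Locker: if GC Locker were responsible, JDK 8 would print the corresponding cause by default, but the actual GC cause was an allocation failure. So attention turned to should_allocate_from_space: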
bool should_allocate_from_space() const {
    return _should_allocate_from_space;
}
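This flag is maintained in post-GC handling logic, and only on the path where full is true. That means set_should_allocate_from_space() can only be called after a Full GC, and clear_should_allocate_from_space() can likewise only be called after a Full GC, which basically matches what we observed.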
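Does this happen after every Full GC? Judging from the code, clearly not: it happens only when !collection_attempt_is_safe() && !_eden_space->is_empty() is true. Briefly, a likely scenario is this: an allocation forces a Full GC, the Full GC reclaims little, eden still holds objects, the old generation is nearly full, and the old generation's free memory cannot hold the objects left in eden. In that case the situation above occurs.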
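The rule described above can be modeled in a few lines of Java. This is a simplified toy model, not the HotSpot source: the class and method names are hypothetical, and clearing the flag whenever the condition is false after a Full GC is an assumption that follows the description above.

```java
// Toy model of the flag handling described in the text; not HotSpot code.
class FromSpaceFlagModel {

    private boolean shouldAllocateFromSpace;

    // Models the post-GC step: only a full collection may set or clear the flag.
    void gcEpilogue(boolean full, boolean collectionAttemptIsSafe, boolean edenIsEmpty) {
        if (!full) {
            return; // young collections leave the flag untouched
        }
        if (!collectionAttemptIsSafe && !edenIsEmpty) {
            shouldAllocateFromSpace = true;  // Full GC did not free enough space
        } else {
            shouldAllocateFromSpace = false; // assumed: a healthy Full GC clears it
        }
    }

    boolean shouldAllocateFromSpace() {
        return shouldAllocateFromSpace;
    }

    public static void main(String[] args) {
        FromSpaceFlagModel model = new FromSpaceFlagModel();
        model.gcEpilogue(true, false, false);  // poor Full GC: flag is set
        model.gcEpilogue(false, true, true);   // later young GCs do not reset it
        System.out.println(model.shouldAllocateFromSpace());
        model.gcEpilogue(true, true, true);    // next Full GC: flag is cleared
        System.out.println(model.shouldAllocateFromSpace());
    }
}
```

The model makes the sticky behavior visible: once a poor Full GC sets the flag, no number of young collections changes it.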
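Things may improve over time, though. For example, a CMS GC may free some old generation memory, and the heap as a whole is then back to normal. The catch is that allocation from from space keeps happening afterwards, which is the problem we hit here, and it persists until the next Full GC resets the flag. So even running jmap -histo:live once is enough to reset it, because the live option forces a Full GC.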
