Thursday, September 24, 2015

The Art of Readable Code



1. Code Should Be Easy to Understand

I. Surface-Level Improvements
2. Packing Information into Names
3. Names That Can’t Be Misconstrued
5. Knowing What to Comment
6. Making Comments Precise and Compact

II. Simplifying Loops and Logic
7. Making Control Flow Easy to Read
8. Breaking Down Giant Expressions
9. Variables and Readability

III. Reorganizing Your Code
10. Extracting Unrelated Subproblems
11. One Task at a Time
12. Turning Thoughts into Code
13. Writing Less Code
http://beiyuu.com/readable-code/
http://perthcharles.github.io/
Code should be written to minimize the time it would take for someone else to understand it.

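Use words with a clear, specific meaning; for example, download rather than get.

Avoid generic words

Names such as tmp and retval say nothing beyond "this is a temporary variable" or "this is the return value". Adding a meaningful word makes the purpose explicit:
  tmp_file = tempfile.NamedTemporaryFile()
  ...
  SaveData(tmp_file, ...)
Instead of retval, use a name that describes what the variable actually holds. With a descriptive name, a mistake like the following one stands out:
  sum_squares += v[i];  // Where's the "square" that we're summing? Bug!
The loop indices i and j cause the same kind of confusion in nested for loops. In the code below, members[] and users[] are indexed with the wrong variables, yet the bug is hard to spot:
  for (int i = 0; i < clubs.size(); i++)
      for (int j = 0; j < clubs[i].members.size(); j++)
          for (int k = 0; k < users.size(); k++)
              if (clubs[i].members[k] == users[j])
                  cout << "user[" << j << "] is in club[" << i << "]" << endl;
With more descriptive index names (ci, mi, ui), each index visibly matches its array:
  if (clubs[ci].members[mi] == users[ui])  // OK. First letters match.
So there should be a good reason whenever you do use a generic name.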
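As a rough illustration of the index-naming advice, here is a minimal self-contained C++ sketch. The Club struct and the CountMemberships function are hypothetical names made up for this note, not code from the book.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type used only for this illustration.
struct Club {
    std::vector<std::string> members;
};

// Counts how many (club, member) pairs name one of the given users.
// ci indexes clubs, mi indexes members, ui indexes users, so a swapped
// index such as members[ui] would look wrong at a glance.
int CountMemberships(const std::vector<Club>& clubs,
                     const std::vector<std::string>& users) {
    int matches = 0;
    for (std::size_t ci = 0; ci < clubs.size(); ci++)
        for (std::size_t mi = 0; mi < clubs[ci].members.size(); mi++)
            for (std::size_t ui = 0; ui < users.size(); ui++)
                if (clubs[ci].members[mi] == users[ui])
                    matches++;
    return matches;
}
```

The first letter of each index matches the first letter of the container it indexes, which is the whole trick.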
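Use concrete names

CanListenOnPort is a better name than ServerCanStart: "can start" is vague, whereas "listen on port" says exactly what the method is going to do.

Likewise, --run_locally is less explicit than --extra_logging.

Attach important details to names, such as a unit suffix like _ms, or _raw for an unprocessed string.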
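A small sketch of unit suffixes in C++, under the assumption that the values are durations. ElapsedMs and SecsFromMs are hypothetical helper names chosen for this note.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: the Ms suffix states the unit of the result.
long long ElapsedMs(Clock::time_point start, Clock::time_point end) {
    assert(end >= start);  // assumption: callers pass the timestamps in order
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

// The _ms and _secs suffixes make a unit mix-up visible where it happens.
double SecsFromMs(long long duration_ms) {
    const double duration_secs = duration_ms / 1000.0;
    return duration_secs;
}
```

A caller who writes SecsFromMs(timeout_secs) can see the mismatch without looking up any documentation.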

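Naming Boolean variables
bool read_password = true;
This is a dangerous name: does it mean the password still needs to be read, or that it has already been read? There is no way to tell, so a name such as user_is_authenticated is a better choice. In general, prefixing a Boolean name with is, has, can, or should makes its meaning clearer, for example: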

             SpaceLeft()  -->  hasSpaceLeft()
bool disable_ssl = false  -->  bool use_ssl = true
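1. Choose specific words
Example:
int size() => int Height()/NumNodes()/MemoryBytes()
A bare size() does not say what kind of size is being returned.
Where possible, a more specific word that fits the context improves readability considerably.

2. Avoid generic variable/function names such as tmp and retval
Unless there is a very good reason, such as tmp in the familiar swap function.

3. Prefer concrete names over abstract ones
An illustration in the book makes the point well: "animal" versus "dangerous animal".

4. Attach important details to variable names
For example, add units; Microsoft's "Hungarian notation" is a useful reference.

Matching expectations
In the book's example, the getMean method iterates over all the samples to compute its result, so it is not the lightweight getter that "get" usually implies; computeMean is a more fitting name.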
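The same idea as a minimal C++ sketch. The class layout below is an assumption made for illustration (the book's example is not reproduced in these notes), and returning 0 for an empty collector is also just a choice made here.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

class StatisticsCollector {
    std::vector<double> samples_;

  public:
    void AddSample(double x) { samples_.push_back(x); }

    // Walks every sample, so the cost grows with the number of samples.
    // "Compute" signals that cost; "Get" would suggest a cheap accessor.
    double ComputeMean() const {
        if (samples_.empty()) return 0.0;  // assumption: empty input yields 0
        const double total = std::accumulate(samples_.begin(), samples_.end(), 0.0);
        return total / samples_.size();
    }
};
```

A caller who sees ComputeMean() inside a loop is more likely to hoist it out than one who sees GetMean().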

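1. Prefer min and max for (inclusive) limits
2. Prefer first and last for inclusive ranges
3. Prefer begin and end for inclusive/exclusive ranges
4. Be careful about what users expect from particular words. For example, users expect get() or size() to be lightweight methods.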
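To make the difference between the two range conventions concrete, here is a short C++ sketch. SumInclusive and SumHalfOpen are hypothetical functions written for this note.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// first/last: an inclusive range, so v[last] IS part of the sum.
int SumInclusive(const std::vector<int>& v, std::size_t first, std::size_t last) {
    assert(last < v.size());
    int sum = 0;
    for (std::size_t i = first; i <= last; ++i) sum += v[i];
    return sum;
}

// begin/end: a half-open range, so v[end] is NOT part of the sum.
int SumHalfOpen(const std::vector<int>& v, std::size_t begin, std::size_t end) {
    assert(end <= v.size());
    int sum = 0;
    for (std::size_t i = begin; i < end; ++i) sum += v[i];
    return sum;
}
```

With the same arguments (1, 2) the two functions cover different elements, which is exactly why the parameter names need to announce the convention.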

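Comments
Don't comment bad names; fix the names instead
When most of a comment is spent explaining what a vague word such as "clean" means, it is better to pick an accurate name: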
void EnforceLimitsFromRequest(Request request, Reply reply);

Record your thoughts

Markers you can use:

TODO  : Stuff I haven't gotten around to yet
FIXME : Known-broken code here
HACK  : Admittedly inelegant solution to a problem
XXX   : Danger! Major problem here

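Put yourself in the reader's shoes
Whatever part of your code would make a reader stop and ask a question is the part you should comment.

Point out likely pitfalls

Summary comments
Keep comments compact
Use examples to illustrate corner cases
State the real intent of your code

Comments at function call sites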
Connect(timeout = 10, use_encryption = False)
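C++ has no named arguments, so the usual substitute is an inline comment next to each argument. The sketch below assumes a hypothetical Connect function that merely echoes its settings; the unit (seconds) in timeout_secs is also an assumption, since the call above does not state one.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a real connection call; it only reports its settings.
std::string Connect(int timeout_secs, bool use_encryption) {
    assert(timeout_secs > 0);
    return "timeout=" + std::to_string(timeout_secs) +
           (use_encryption ? " encrypted" : " plaintext");
}

std::string ConnectWithDefaults() {
    // Without the inline comments this would read Connect(10, false),
    // and the reader would have to look up what 10 and false mean.
    return Connect(/* timeout_secs = */ 10, /* use_encryption = */ false);
}
```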

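Simplifying loops and logic
Keep control flow simple
Left-hand side of a comparison: usually the expression being examined, whose value changes more often
Right-hand side of a comparison: usually the value it is compared against, which is closer to a constant
This explains why bytes_received < bytes_expected is easier to read than the reverse.

Order of if/else blocks
Handle the positive case first: if (debug) is better than if (!debug)
Handle the simpler case first, so that the if and the else fit on one screen
Handle the more interesting or conspicuous case first

The ternary operator can reduce the number of lines, but the real goal is to reduce the time it takes to read the code, so use it only for simple expressions.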
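A hedged sketch of where the line might be drawn. AmPmSuffix and ScaleByPowerOfTwo are hypothetical functions written for this note; the second one could be squeezed into a single ternary expression, but the if/else form is arguably quicker to read.

```cpp
#include <cassert>
#include <string>

// Simple choice between two values: a ternary reads fine.
std::string AmPmSuffix(int hour) {
    return (hour >= 12) ? "pm" : "am";
}

// Two different computations: an explicit if/else is easier to follow.
double ScaleByPowerOfTwo(double mantissa, int exponent) {
    assert(exponent > -31 && exponent < 31);  // assumption: the shift fits in an int
    if (exponent >= 0) {
        return mantissa * (1 << exponent);
    } else {
        return mantissa / (1 << -exponent);
    }
}
```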

Avoid do/while loops

Return early
Minimize nesting
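Early returns and shallow nesting usually go together. A minimal C++ sketch, assuming a hypothetical Contains helper that takes pointers so that "no string at all" is a possible input:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: does *str contain *substr?
bool Contains(const std::string* str, const std::string* substr) {
    // Guard clauses: dispose of the special cases first and return at once,
    // so the main logic below is not wrapped in extra levels of if.
    if (str == nullptr || substr == nullptr) return false;
    if (substr->empty()) return true;

    return str->find(*substr) != std::string::npos;
}
```

Each guard removes one case from the reader's mental stack before the real work starts.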

Breaking down giant expressions
Don't abuse logical expressions
Use variables to name subexpressions
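One way to read "use variables" is to give a subexpression a name that explains it. A small sketch under that assumption; Request, Document, and CanEdit are hypothetical names invented for this note.

```cpp
#include <cassert>

// Hypothetical types used only for this illustration.
struct Request {
    int user_id;
};
struct Document {
    int owner_id;
    bool is_locked;
};

bool CanEdit(const Request& request, const Document& document) {
    // The variable names the comparison, so the return statement
    // reads almost like a sentence.
    const bool user_owns_document = (request.user_id == document.owner_id);
    return user_owns_document && !document.is_locked;
}
```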

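Variables and readability
Eliminate variables
Useless temporary variables

Eliminate control-flow variables
Shrink the scope of variables
Prefer write-once variables
A variable that is assigned over and over makes the reader dizzy; the fewer times a variable changes, the more readable the code.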


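Reorganizing your code

Extracting unrelated subproblems
Look at a function or block of code and ask yourself: "What is the high-level goal of this code?"
For each line of code, ask: "Is it working directly toward that goal, or is it solving an unrelated subproblem?"
If enough lines are solving an unrelated subproblem, extract that code into a separate function.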
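The steps above can be sketched as follows. Point, Distance, and FindClosest are hypothetical names, and plain planar distance is used here as a simplifying assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical type used only for this illustration.
struct Point {
    double x;
    double y;
};

// The unrelated subproblem: pure geometry, reusable and testable on its own.
double Distance(const Point& a, const Point& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// The high-level goal: pick the candidate closest to the target.
Point FindClosest(const Point& target, const std::vector<Point>& candidates) {
    assert(!candidates.empty());
    Point closest = candidates.front();
    for (const Point& p : candidates) {
        if (Distance(target, p) < Distance(target, closest)) closest = p;
    }
    return closest;
}
```

With the geometry pulled out, FindClosest contains only lines that work directly toward its own goal.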

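Project-specific functionality
Functions unrelated to the goal can be extracted and reused; project-specific code can be extracted as well, to keep the code readable.

Simplifying an existing interface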

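Writing less code
The most readable code is no code at all!

Drop features that don't matter, and don't over-engineer
Rethink the requirements so that you solve the simplest version of the problem that still meets the overall goal
Stay familiar with the libraries you use, and periodically read through their APIs
1. Very little of the code written at the start of a project survives into the final version

2. Before writing code, question and scrutinize your requirements repeatedly
Many features you believe you need turn out not to be worth implementing.

3. Implement your requirements in the most suitable way
The book gives a caching example: if repeated accesses to the same data always happen in a row, there is no need to implement a complex LRU policy.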
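When repeated lookups always come back to back, remembering only the most recent item may be enough. A minimal C++ sketch under that assumption; OneItemCache and its SlowLoad stand-in are hypothetical, and load_count exists only so the effect can be observed.

```cpp
#include <cassert>
#include <string>

class OneItemCache {
    std::string last_key_;
    std::string last_value_;
    bool has_value_ = false;
    int load_count_ = 0;  // number of slow loads, kept for illustration only

    // Hypothetical stand-in for an expensive read (disk, network, ...).
    std::string SlowLoad(const std::string& key) {
        ++load_count_;
        return "value-of-" + key;
    }

  public:
    // Reloads only when the key differs from the previous lookup.
    const std::string& LookUp(const std::string& key) {
        if (!has_value_ || key != last_key_) {
            last_value_ = SlowLoad(key);
            last_key_ = key;
            has_value_ = true;
        }
        return last_value_;
    }

    int load_count() const { return load_count_; }
};
```

This is a handful of lines with no eviction policy to get wrong; it only pays off under the stated access pattern.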

4. Use UNIX tools instead of writing code

Designing and Implementing a “Minute/Hour Counter”
// Track the cumulative counts over the past minute and over the past hour.
// Useful, for example, to track recent bandwidth usage.
class MinuteHourCounter {
  public:
    // Add a new data point (count >= 0).
    // For the next minute, MinuteCount() will be larger by +count.
    // For the next hour, HourCount() will be larger by +count.
    void Add(int count);

    // Return the accumulated count over the past 60 seconds.
    int MinuteCount();

    // Return the accumulated count over the past 3600 seconds.
    int HourCount();
};

An Easier-to-Read Version
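#include <ctime>
#include <list>
using namespace std;

// Minimal Event definition assumed by this listing:
// 'count' recorded at 'time' (seconds since the epoch).
struct Event {
    Event(int count, time_t time) : count(count), time(time) {}
    int count;
    time_t time;
};

class MinuteHourCounter {
    list<Event> events;

    // Sum the counts of all events newer than 'cutoff', walking from newest to oldest.
    int CountSince(time_t cutoff) {
        int count = 0;
        for (list<Event>::reverse_iterator rit = events.rbegin();
             rit != events.rend(); ++rit) {
            if (rit->time <= cutoff) {
                break;
            }
            count += rit->count;
        }
        return count;
    }

  public:
    void Add(int count) {
        events.push_back(Event(count, time(nullptr)));
    }

    int MinuteCount() {
        return CountSince(time(nullptr) - 60);
    }

    int HourCount() {
        return CountSince(time(nullptr) - 3600);
    }
};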

Better to have all the difficult code confined to one place.

“Traditional” for loops of the form for(begin; end; advance) are the easiest to read: the reader can immediately understand them as “go through all the elements” and doesn’t have to think about it further.

Attempt 2: Conveyor Belt Design
class MinuteHourCounter {
    list<Event> minute_events;
    list<Event> hour_events;  // only contains elements NOT in minute_events

    int minute_count = 0;
    int hour_count = 0;  // counts ALL events over past hour, including past minute

    // Find and delete old events, and decrease hour_count and minute_count accordingly.
    void ShiftOldEvents(time_t now_secs) {
        const time_t minute_ago = now_secs - 60;
        const time_t hour_ago = now_secs - 3600;

        // Move events more than one minute old from 'minute_events' into 'hour_events'
        // (Events older than one hour will be removed in the second loop.)
        while (!minute_events.empty() && minute_events.front().time <= minute_ago) {
            hour_events.push_back(minute_events.front());

            minute_count -= minute_events.front().count;
            minute_events.pop_front();
        }

        // Remove events more than one hour old from 'hour_events'
        while (!hour_events.empty() && hour_events.front().time <= hour_ago) {
            hour_count -= hour_events.front().count;
            hour_events.pop_front();
        }
    }

  public:
    void Add(int count) {
        const time_t now_secs = time(nullptr);
        ShiftOldEvents(now_secs);

        // Feed into the minute list (not into the hour list--that will happen later)
        minute_events.push_back(Event(count, now_secs));

        minute_count += count;
        hour_count += count;
    }

    int MinuteCount() {
        ShiftOldEvents(time(nullptr));
        return minute_count;
    }

    int HourCount() {
        ShiftOldEvents(time(nullptr));
        return hour_count;
    }
};

Attempt 3: A Time-Bucketed Design
The key idea is to bucket all the events within a small time window together, and summarize those events with a single total. For instance, the events over the past minute could be inserted into 60 discrete buckets, each 1 second wide. The events over the past hour could also be inserted into 60 discrete buckets, each 1 minute wide.
This design has a fixed, predictable memory usage.

class MinuteHourCounter {
    TrailingBucketCounter minute_counts;
    TrailingBucketCounter hour_counts;

  public:
    MinuteHourCounter() :
        minute_counts(/* num_buckets = */ 60, /* secs_per_bucket = */ 1),
        hour_counts(  /* num_buckets = */ 60, /* secs_per_bucket = */ 60) {
    }

    void Add(int count) {
        time_t now = time(nullptr);
        minute_counts.Add(count, now);
        hour_counts.Add(count, now);
    }

    int MinuteCount() {
        time_t now = time(nullptr);
        return minute_counts.TrailingCount(now);
    }

    int HourCount() {
        time_t now = time(nullptr);
        return hour_counts.TrailingCount(now);
    }
};

class TrailingBucketCounter {
    ConveyorQueue buckets;
    const int secs_per_bucket;
    time_t last_update_time;  // the last time Update() was called

    // Calculate how many buckets of time have passed and Shift() accordingly.
    void Update(time_t now) {
        int current_bucket = now / secs_per_bucket;
        int last_update_bucket = last_update_time / secs_per_bucket;
 
        buckets.Shift(current_bucket - last_update_bucket);
        last_update_time = now;
    }

  public:
    TrailingBucketCounter(int num_buckets, int secs_per_bucket) :
        buckets(num_buckets),
        secs_per_bucket(secs_per_bucket),
        last_update_time(0) {  // start at the epoch; the first Update() just clears the empty queue
    }

    void Add(int count, time_t now) {
        Update(now);
        buckets.AddToBack(count);
    }

    int TrailingCount(time_t now) {
        Update(now);
        return buckets.TotalSum();
    }
};


// A queue with a maximum number of slots, where old data gets shifted off the end.
class ConveyorQueue {
    queue<int> q;
    int max_items;
    int total_sum;  // sum of all items in q

  public:
    ConveyorQueue(int max_items) : max_items(max_items), total_sum(0) {
    }

    int TotalSum() {
        return total_sum;
    }

    void Shift(int num_shifted) {
        // In case too many items shifted, just clear the queue.
        if (num_shifted >= max_items) {
            q = queue<int>();  // clear the queue
            total_sum = 0;
            return;
        }

        // Push all the needed zeros.
        while (num_shifted > 0) {
            q.push(0);
            num_shifted--;
        }

        // Let all the excess items fall off.
        while (q.size() > max_items) {
            total_sum -= q.front();
            q.pop();
        }
    }

    void AddToBack(int count) {
        if (q.empty()) Shift(1);  // Make sure q has at least 1 item.
        q.back() += count;
        total_sum += count;
    }
};
