Wednesday, September 30, 2015

Hashing Misc

Comparing Hashing strategies


String.hashCode() is a very poor hash, especially for the lower bits. It is standard and cannot be changed or break backward compatibility.  However, this doesn't have to be a problem as HashMap uses an agitate function which brings down some of the higher bits to randomise the lower ones.

int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
as is actually quite typical with strings, characters are biased towards certain bits being set. 

The trick of the string hash function is to try and iron out this bias by mixing bits with a biased probability with other bits, in particular those where the probability is already fairly unbiased.

This looks quite confusing to me because it has so many conflicts; although it's not required to be unique (we can still rely on the equals()), but less conflicts means better performance without visiting the entries in a linked list.
Suppose we've two characters, then so long as we can find two strings matching below equation, then we will have the same hashcode()
a * 31 +b = c * 31 +d
It will be easy to conclude that (a-c) * 31 = d-b take an easy example is make a-c = 1 and d-b = 31; so i wrote below codes for simple test
public void testHash() {
    System.out.println("A:" + (int)'A');
    System.out.println("B:" + (int)'B');
    System.out.println("a:" + (int)'a');

    System.out.println("Aa".hashCode() + "," + "BB".hashCode());
    System.out.println("Ba".hashCode() + "," + "CB".hashCode());
    System.out.println("Ca".hashCode() + "," + "DB".hashCode());
    System.out.println("Da".hashCode() + "," + "EB".hashCode());        
it will print below results which means all the strings have the same hashcode(), and it's easy to do it in a loop.
even worse, suppose we've 4 characters in the string, according to the algorithm, suppose the first 2 characters produce a2, the 2nd 2 characters produce b2; the hashcode will still be a2 * 31^2 + b2 thus, with a2 and b2 equal between 2 strings, we will get more strings with hashcode() conflict. such examples are "AaAa", "BBBB" and so on; then we will have 6 characters, 8 characters......
suppose most of the time we use characters in ascii table in a string which will be used in a hashmap or hashtable, then the choosen prime number 31 here is definitely too small;
  1. Nobody in their right mind would expect String.hashCode to be highly collision free. It is simply not designed with that in mind. (If you want highly collision free hashing, then use a crypto hash algorithm ... and pay the cost.) String.hashCode() is designed to be reasonably good across the domain of all Strings ... and fast.
Any server in the world running the same version of Ruby will get the same hash values. This means that I can send a specially crafted web request to your server, in which the request parameters contain lots of those strings with the same hash code. Your web framework will probably parse the parameters into a hash table, and they will all end up in the same hash bucket, no matter how big you make the hash table. Whenever you want to access the parameters, you now have to iterate over a long list of hash collisions, and your swift O(1) hash table lookup is suddenly a crawling slow O(n).
I just need to make a small number of these evil requests to your server and I’ve brought it to its knees. 
The solution is for the hash code to be consistent within one process, but to be different for different processes.

I understand the use of hashCode() — it’s very tempting. On strings, numbers and collection classes, hashCode() always returns a consistent value, apparently even across different JVM vendors. It’s like that despite the documentation for hashCode()explicitly not guaranteeing consistency across different processes:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
To reduce the impact of a poor hashing strategy, the HashMap uses an agitating function. In Java 8 it is fairly simple.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
This mixes the high bits of the hash code with the low bits to improve the randomness of the lower bits.

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            hash = h;
        return h;


I Just Need Unique Hash Codes, Don't I?
A common misconception is that to avoid collisions all you need is to have a unique hash code.  While having unique hash codes is highly desirable, that is not enough.

The number of buckets needs to be smaller and the hash codes need to be modulo-ed to the number of buckets. If the number of buckets is a power of 2, you can use a mask of the lowest bits.

The problem is that the size of a HashMap is not unlimited (or at least 2^32 in size)  This means the hashCode() number has to be reduced to a smaller number of bits.

A simple solution is to have a bucket turn into a tree instead of a linked list.  In Java 8, it will do this for String keys, but this could be done for all Comparable types AFAIK.

Another approach is to allow custom hashing strategies to allow the developer to avoid such problems, or to randomize the mutation on a per collection basis, amortizing the cost to the application.
mvn -version will output which java it's using. If JAVA_HOME is set to a valid JDK directory and Maven is using something else, then most likely someone has tampered with the way that Maven starts up.
有些人看到murmur就想到了陌陌就想到了别的,其实是 multiply and rotate的意思,因为算法的核心就是不断的"x *= m; x = rotate_left(x,r);"


那Java自己的String的hashCode()呢? 用的是Horner法则, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

for (int i = 0; i < str.length(); i++) { hash = 31*hash + str.charAt[i]; }
注意,int会溢出成负数,再溢成正数,比如"abcde"是 92599395, abcdef"是 -1424385949, "abcdefgh" 是 1259673732
31有个很好的特性,就是用移位和减法来代替乘法,可以得到更好的性能:31*i==(i<<5)-i。现在的VM可以自动完成这种优化。 Java自己的HashMap,会在调用Key的hashCode()得到一个数值后,用以下算法再hash一次,免得Key自己实现的hashCode()质量太差。
static int hash(int h) {
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
再猜原因之二它的变化不够激烈,比如"abc"是96354, "abd"就比它多1。而用 murmur"abc"是1118836419,"abd"是413429783。像在Jedis里,它的几个Shard节点的名字就叫做shard-1,shard-2,shard-3而已,hash这几个字符串,murmur的结果还能均匀的落在一致性哈希环上,用Honer法则就不行了。

97 Things Every Software Architect Should Know: Collective Wisdom from the Experts

97 Things Every Software Architect Should Know
1. Don’t Put Your Resume Ahead of the Requirements
2. Simplify Essential Complexity; Diminish Accidental Complexity
3. Chances Are, Your Biggest Problem Isn’t Technical
4. Communication Is King; Clarity and Leadership, Its Humble Servants
5. Application Architecture Determines Application Performance
6. Seek the Value in Requested Capabilities
7. Stand Up!
8. Everything Will Ultimately Fail
9. You’re Negotiating More Often Than You Think
10. Quantify
11. One Line of Working Code Is Worth 500 of Specification
12. There Is No One-Size-Fits-All Solution
13. It’s Never Too Early to Think About Performance
14. Architecting Is About Balancing
15. Commit-and-Run Is a Crime
16. There Can Be More Than One
17. Business Drives
18. Simplicity Before Generality, Use Before Reuse
19. Architects Must Be Hands On
20. Continuously Integrate
21. Avoid Scheduling Failures
22. Architectural Tradeoffs
23. Database As a Fortress
24. Use Uncertainty As a Driver
25. Warning: Problems in Mirror May Be Larger Than They Appear
26. Reuse Is About People and Education, Not Just Architecture
27. There Is No ‘I’ in Architecture
28. Get the 1,000-Foot View
29. Try Before Choosing
30. Understand the Business Domain
31. Programming Is an Act of Design
32. Give Developers Autonomy
33. Time Changes Everything
34. “Software Architect” Has Only Lowercase a’s; Deal with It
35. Scope Is the Enemy of Success
36. Value Stewardship Over Showmanship
37. Software Architecture Has Ethical Consequences
38. Skyscrapers Aren’t Scalable
39. Heterogeneity Wins
40. It’s All About Performance
41. Engineer in the White Spaces
42. Talk the Talk
43. Context Is King
44. Dwarves, Elves, Wizards, and Kings
45. Learn from Architects of Buildings
46. Fight Repetition
47. Welcome to the Real World
48. Don’t Control, but Observe
49. Janus the Architect
50. Architects’ Focus Is on the Boundaries and Interfaces
51. Empower Developers
52. Record Your Rationale
53. Challenge Assumptions—Especially Your Own
54. Share Your Knowledge and Experiences
55. Pattern Pathology
56. Don’t Stretch the Architecture Metaphors
57. Focus on Application Support and Maintenance
58. Prepare to Pick Two
59. Prefer Principles, Axioms, and Analogies to Opinion and Taste
60. Start with a Walking Skeleton
61. It Is All About The Data
62. Make Sure the Simple Stuff Is Simple
63. Before Anything, an Architect Is a Developer
64. The ROI Variable
65. Your System Is Legacy; Design for It
66. If There Is Only One Solution, Get a Second Opinion
67. Understand the Impact of Change
68. You Have to Understand Hardware, Too
69. Shortcuts Now Are Paid Back with Interest Later
70. “Perfect” Is the Enemy of “Good Enough”
71. Avoid “Good Ideas”
72. Great Content Creates Great Systems
73. The Business Versus the Angry Architect
74. Stretch Key Dimensions to See What Breaks
75. If You Design It, You Should Be Able to Code It
76. A Rose by Any Other Name Will End Up As a Cabbage
77. Stable Problems Get High-Quality Solutions
78. It Takes Diligence
79. Take Responsibility for Your Decisions
80. Don’t Be Clever
81. Choose Your Weapons Carefully, Relinquish Them Reluctantly
82. Your Customer Is Not Your Customer
83. It Will Never Look Like That
84. Choose Frameworks That Play Well with Others
85. Make a Strong Business Case
86. Control the Data, Not Just the Code
87. Pay Down Your Technical Debt
88. Don’t Be a Problem Solver
Instead of immediately working to solve the problem as presented, see if you can change the problem.
interrogate the problem
89. Build Systems to Be Zuhanden
90. Find and Retain Passionate Problem Solvers
91. Software Doesn’t Really Exist
92. Learn a New Language
93. You Can’t Future-Proof Solutions
94. The User Acceptance Problem
95. The Importance of Consommé
96. For the End User, the Interface Is the System
97. Great Software Is Not Built, It Is Grown

97 Things Every Programmer Should Know
1. Act with Prudence
2. Apply Functional Programming Principles
3. Ask, “What Would the User Do?” (You Are Not the User)
4. Automate Your Coding Standard
5. Beauty Is in Simplicity
6. Before You Refactor
7. Beware the Share
8. The Boy Scout Rule
9. Check Your Code First Before Looking to Blame Others
10. Choose Your Tools with Care
11. Code in the Language of the Domain
12. Code Is Design
13. Code Layout Matters
14. Code Reviews
15. Coding with Reason
If you need a nested section, make it a function.
Functions should have few parameters (four is a good upper bound). This does not restrict the data communicated to functions: grouping related parameters into a single object localizes object invariants, which simplifies reasoning with respect to their coherence and consistency.
More generally, each unit of code, from a block to a library, should have a narrow interface. Less communication reduces the reasoning required. This means that getters that return internal state are a liability—don’t ask an object for information to work with. Instead, ask the object to do the work with the information it already has. In other words, encapsulation is all—and only—about narrow interfaces.
In order to preserve class invariants, usage of setters should be discouraged. Setters tend to allow invariants that govern an object’s state to be broken.
As well as reasoning about its correctness, arguing about your code helps you better understand it. Communicate the insights you gain for everyone’s benefit.
16. A Comment on Comments
17. Comment Only What the Code Cannot Say
18. Continuous Learning
19. Convenience Is Not an -ility
20. Deploy Early and Often
21. Distinguish Business Exceptions from Technical
22. Do Lots of Deliberate Practice
23. Domain-Specific Languages
24. Don’t Be Afraid to Break Things
25. Don’t Be Cute with Your Test Data
26. Don’t Ignore That Error!
27. Don’t Just Learn the Language, Understand Its Culture
28. Don’t Nail Your Program into the Upright Position
29. Don’t Rely on “Magic Happens Here”
30. Don’t Repeat Yourself
31. Don’t Touch That Code!
32. Encapsulate Behavior, Not Just State
33. Floating-Point Numbers Aren’t Real
34. Fulfill Your Ambitions with Open Source
35. The Golden Rule of API Design
It’s not enough to write tests for an API you develop; you have to write unit tests for code that uses your API. When you follow this rule, you learn firsthand the hurdles that your users will have to overcome when they try to test their code independently.
36. The Guru Myth
37. Hard Work Does Not Pay Off
38. How to Use a Bug Tracker
39. Improve Code by Removing It
40. Install Me
41. Interprocess Communication Affects Application Response Time
42. Keep the Build Clean
43. Know How to Use Command-Line Tools
44. Know Well More Than Two Programming Languages
45. Know Your IDE
46. Know Your Limits
47. Know Your Next Commit
48. Large, Interconnected Data Belongs to a Database
49. Learn Foreign Languages
50. Learn to Estimate
51. Learn to Say, “Hello, World”
52. Let Your Project Speak for Itself
53. The Linker Is Not a Magical Program
54. The Longevity of Interim Solutions
55. Make Interfaces Easy to Use Correctly and Hard to Use Incorrectly
56. Make the Invisible More Visible
57. Message Passing Leads to Better Scalability in Parallel Systems
58. A Message to the Future
59. Missing Opportunities for Polymorphism
60. News of the Weird: Testers Are Your Friends
61. One Binary
62. Only the Code Tells the Truth
63. Own (and Refactor) the Build
64. Pair Program and Feel the Flow
65. Prefer Domain-Specific Types to Primitive Types
66. Prevent Errors
67. The Professional Programmer
68. Put Everything Under Version Control
69. Put the Mouse Down and Step Away from the Keyboard
70. Read Code
71. Read the Humanities
72. Reinvent the Wheel Often
73. Resist the Temptation of the Singleton Pattern
74. The Road to Performance Is Littered with Dirty Code Bombs
75. Simplicity Comes from Reduction
76. The Single Responsibility Principle
77. Start from Yes
78. Step Back and Automate, Automate, Automate
79. Take Advantage of Code Analysis Tools
80. Test for Required Behavior, Not Incidental Behavior
81. Test Precisely and Concretely
82. Test While You Sleep (and over Weekends)
83. Testing Is the Engineering Rigor of Software Development
84. Thinking in States
85. Two Heads Are Often Better Than One
86. Two Wrongs Can Make a Right (and Are Difficult to Fix)
87. Ubuntu Coding for Your Friends
88. The Unix Tools Are Your Friends
89. Use the Right Algorithm and Data Structure
90. Verbose Logging Will Disturb Your Sleep
91. WET Dilutes Performance Bottlenecks
92. When Programmers and Testers Collaborate
93. Write Code As If You Had to Support It for the Rest of Your Life
94. Write Small Functions Using Examples
95. Write Tests for People
96. You Gotta Care About the Code
97. Your Customers Do Not Mean What They Say
1. Get Users Involved As Early As Possible
2. Avoid Whack-a-Mole Development
3. A Word Can Make You Miss Your Deadline
4. Make Project Sponsors Write Their Own Requirements
5. Favor the Simple Over the Complex
6. Pay Your Debts
7. Add Talents, Not Skills, to Your Team
8. Keep It Simple, Simon
9. You Aren’t Special
10. Scrolling Through Time
11. Save Money on Your Issues
12. How to Spot a Good IT Developer
13. Developer Productivity: Skilled Versus Average
14. Size Matters
15. Document Your Process, Then Make Sure It Is Followed
16. Go Ahead, Throw That Practice Out
17. Requirement Specifications: An Oxymoron
18. Success Is Always Measured in Business Value
19. Don’t Skip Vacations for the Project
20. Provide Regular Time to Focus
21. Project Management Is Problem Management
22. Empowering Developers: A Man Named Tim
23. Clever Code Is Hard to Maintain
24. Managing Human Factors in IT Project Management
25. Use a Wiki
26. The Missing Link
27. Estimate, Estimate, Estimate
28. Developers Unite—PMOs Are Advancing
29. Value Results, Not Just Effort
30. Software Failure Is Organizational Failure
31. A Voice from the Other Side
32. Keep Your Perspective
33. How Do You Define “Finished”?
34. The 60/60 Rule
35. We Have Met the Enemy...and He Is Us
36. Work in Cycles
37. To Thine Own Self Be True
38. Meetings Don’t Write Code
Not getting in, getting done, and getting out.
Diving too deep.

Going off-topic.
Going over time.
Meeting too often.
Your point is well taken, but in the interest of time, we are going to need to move on” (or “hear from others,” or “come back to this point later if there is time”)
39. Chart a Course for Change
40. IT Program Management: Shared Vision
41. Planning for Reality
42. The Fallacy of Perfect Execution
43. Introduce a More Agile Communication System
44. Don’t Worship a Methodology
45. Don’t Throw Spreadsheets at People Issues
46. One Deliverable, One Person
47. The Fallacy of Perfect Knowledge
48. Build Teams to Run Marathons, Not Sprints
49. The Holy Trinity of Project Management
50. Roadmaps: What Have We Done for You Lately?
51. The Importance of the Project Scope Statement
52. Align Vision and Expected Outcome
53. Alice Doesn’t Live Here Anymore
54. Avoiding Contract Disputes
55. You Get What You Measure
56. Don’t Fall into the “Not Invented Here” Syndrome
57. Favor the Now Over the Soon
58. Speed Is Life; More Is Better
59. Building the Morale on Your Team
60. A Project Depends on Teamwork
61. Serve Your Team
62. The Fallacy of the Big Round Ball
63. Responding to a Crisis
64. Know Your Integration Points
65. Aggressively Promote Communication in Distributed Projects
66. Start with the End in Mind
67. Clear Terms, Long Friendship!
68. The Best Estimators: Those Who Do the Work
69. Communicating Is Key
70. A Project Is the Pursuit of a Solution
71. It’s the People, Stupid
72. Documents Are a Means, Not an End
73. Can Earned Value and Velocity Coexist on Reports?
74. Scope Change Happens; Get Used to It
75. Buying Ready-Made Software
76. Project Sponsors—Good, Bad, and Ugly
77. Should You Under-Promise, or Over-Deliver?
78. Every Project Manager Is a Contract Administrator
79. Important, but Not Urgent
80. Teach the Process
81. The Fallacy of Status
82. What Do They Want to Hear, Anyway?
83. Recognize the Value of Team Morale
84. Engage Stakeholders All Through Project Life
85. The Value of Planning
86. Don’t Always Be “The Messenger”
87. Effectively Manage the Deliverables
88. We Are Project Managers, Not Superheroes
89. Increase Communication: Hold Frequent, Instant Meetings
90. Flexibility Simplifies Project Management
91. The Web Points the Way, for Now
92. Developers Hate Status Reports, Managers Love Them
93. You Are Not in Control
94. Share the Vision
95. True Success Comes with a Supporting Organization
96. Establish Project Management Governance
97. 9.7 Reasons I Hate Your Website

Product Architecture Misc
To build an application, this would be:
  • Modular
  • Loosely Coupled
  • Highly Configurable
  • Highly Testable
  • Easily extensible

Practical API Design: Confessions of a Java Framework Architect


一.只公开你需要公开的内容。必须要限制对外公开的方法和属性的数量,包括public 和protected的,这些API一旦公布,就代表了对外的一种承诺,你永远不能去改变他们。否则就会带来框架版本的兼容问题。只有框架内部使用的公开方法,也不应该对外公开,给上一个“此方法仅框架使用,外部不可以使用”这样的文档注释是很糟糕的。但是原生提供的internal修饰符又只支持包内访问。针对这个问题,书里还特别提供了一种在Java里能巧妙的跨包访问又能对外屏蔽的方法


public final class Template extends Object {
  private final Class type;
  public Template(Class type) { this.type = type; }
  public Class getType() { return type; }
  public Template() { this(Object.class); }
public final class Template<T> extends Object {
  private final Class<T> type;
  public Template(Class<T> type) { this.type = type; }
  public Class<T> getType() { return type; }
  // now what!?
  public Template() { this(Object.class); }
public final class Template<T> extends Object {
  private final Class<T> type;
  public Template(Class<T> type) { this.type = type; }
  public Class<T> getType() { return type; }
  public Template() { this((Class<T>)Object.class); }
  public static Template<Object> create() {
    return new Template<Object>(Object.class);


public interface Scrambler {
  public String scramble(String word);
public interface WordLibrary {
  public String[] getWords();

public interface UI {
  public void display();

Word to API providers: Make it simple for developers
Annoyance the first – trsnss
Yes, name-ification of things is hard. Still, common sense could go a long way. You gotta hand it to the Unix folks for their heroic efforts to save a keystroke – why say dupe when we can save that valuable E and say dup?
Number two is related:
Annoyance the second – makingNamesMuchTooLongToBePractical
Apple, in contrast, gives us gems like

Vexation number three – Using synonyms
Oh, Android, I’m looking at you.
Apparently there’s a difference between a “BlankActivity” and an “EmptyActivity.” What’s the difference? I once knew, but now I’m drawing an empty – I mean a blank.
Android gives us tool bars, which at some point got renamed app bars, also known as action bars, giving us lovely code like:
   // Set the toolbar as my app bar
To abbreviate or not to abbreviate. That’s apparently not the question at the Googleplex. Por que no los dos? Cross referencegetExternalDirectory and getExternalFilesDir. These return two different locations, by the way, because sometimes in a directory you want to store files, and sometimes you want to store… I dunno, something else.
What’s the difference between a tap and a touch? I have no idea (at least not on a phone), and neither does the Android design team, who also sometimes calls it a click.
Hey, what version are you running? Oh, it’s KitKat. Or is it 4.1? No, maybe it’s SDK version 19… Oh wait, those all mean the same thing. Good luck keeping them straight.
And a final Android example. Want to make a project asset available by URL? Simply put it in the /assets/ directory in your source tree and construct a URL starting with “file:///android_asset/”. Yep, one is plural, one is singular. One has android_ in front, one doesn’t. Why? Because they’re Google and they can do whatever the hell they want, that’s why. What are you gonna do, use FirefoxOS?
Annoyance the fourth – Poor (or absence of) documentation
Speaking of asset URLs, you won’t find out about them by reading the documentation, aside from an offhand mention of it on theWebSettings.setAllowFileAccess method – a method which has zero affect on whether you can access files this way. Yes indeed, the old semi-documented feature.
Annoyance the 5.1beta3 – Version incompatibility

Annoyance the 6 – Overcomplimification (easy things made hard!)

7. XML


Review (572) System Design (334) System Design - Review (198) Java (189) Coding (75) Interview-System Design (65) Interview (63) Book Notes (59) Coding - Review (59) to-do (45) Linux (43) Knowledge (39) Interview-Java (35) Knowledge - Review (32) Database (31) Design Patterns (31) Big Data (29) Product Architecture (28) MultiThread (27) Soft Skills (27) Concurrency (26) Cracking Code Interview (26) Miscs (25) Distributed (24) OOD Design (24) Google (23) Career (22) Interview - Review (21) Java - Code (21) Operating System (21) Interview Q&A (20) System Design - Practice (20) Tips (19) Algorithm (17) Company - Facebook (17) Security (17) How to Ace Interview (16) Brain Teaser (14) Linux - Shell (14) Redis (14) Testing (14) Tools (14) Code Quality (13) Search (13) Spark (13) Spring (13) Company - LinkedIn (12) How to (12) Interview-Database (12) Interview-Operating System (12) Solr (12) Architecture Principles (11) Resource (10) Amazon (9) Cache (9) Git (9) Interview - MultiThread (9) Scalability (9) Trouble Shooting (9) Web Dev (9) Architecture Model (8) Better Programmer (8) Cassandra (8) Company - Uber (8) Java67 (8) Math (8) OO Design principles (8) SOLID (8) Design (7) Interview Corner (7) JVM (7) Java Basics (7) Kafka (7) Mac (7) Machine Learning (7) NoSQL (7) C++ (6) Chrome (6) File System (6) Highscalability (6) How to Better (6) Network (6) Restful (6) CareerCup (5) Code Review (5) Hash (5) How to Interview (5) JDK Source Code (5) JavaScript (5) Leetcode (5) Must Known (5) Python (5)

Popular Posts