Massive Technical Interviews Tips: Bugs Misc

http://www.multicians.org/thvv/threeq.html
Three Questions About Each Bug You Find
Is this mistake somewhere else also?
What next bug is hidden behind this one?
What should I do to prevent bugs like this?

Before you ask the three questions, you need to overcome your natural resistance to looking carefully at the bug. Look at the code and explain what went wrong. Start with the observable facts and work backwards, asking why repeatedly, until you can describe the pattern that underlies the bug. Often, you should do this with a colleague, because explaining what you think happened will force you to confront your assumptions about what the program is up to.

Make a habit of asking the three questions every time you find a bug. You don't even have to wait for a bug to use the three questions.

During design and implementation review, each comment you get can be treated with the three questions. Review comments are the result of an underlying communication process which you can improve. If you feel that a reader's comment on a specification is wrong, for example, you might ask what kept your document from being understood, and how you might communicate better with the reviewer.

Design and code inspections[2] are a powerful means of finding many bugs. You can ask the three questions about each defect discovered in the inspection process. The first two questions won't turn up many new bugs if the inspection process is thorough, but the third question can help find ways to prevent future bugs.

https://henrikwarne.com/2012/10/21/4-reasons-why-bugs-are-good-for-you/

EACH BUG CAN TEACH YOU SOMETHING

Feedback is one key to developing better products. It’s one of the primary tenets of agile development. Unit testing and iterative development are both techniques to provide feedback faster. With unit testing you get feedback on whether the code works, and with each delivered release you can hear what the customer thinks of the new features. A bug report is another form of feedback on your code.

There can be many different causes of a bug. Some possibilities are: a simple coding mistake (like a nested if-statement where you end up in the wrong branch), or a faulty assumption on your part (maybe the incoming messages don’t always have certain fields present, so you got a null pointer exception), or there is a missing requirement (you should have answered the message in a different way if a given parameter is present), or the customer is using the software in an unanticipated (but correct) way, leading to bugs.

In each of these cases, you can learn something about how to code, about your product, or about the domain it operates in.

YOUR OWN CODE BECOMES EASIER TO DEBUG

When you spend time trouble-shooting problems and fixing bugs, it doesn’t take long until you want to make your own code as easy as possible to debug. It is frustrating not having all available information presented.

If, for instance, 21 was received, the error message should say something like “Illegal value: 21, not in range 0 – 20”.
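A minimal sketch of a message like that, assuming a hypothetical requireInRange helper and the 0 to 20 limits from the example above (neither name comes from the original code):

```java
final class RangeCheck {
    // Hypothetical limits, chosen to match the 0 to 20 example in the text.
    static final int MIN = 0;
    static final int MAX = 20;

    /** Returns the value unchanged, or throws with the offending value and the valid range. */
    static int requireInRange(int value) {
        if (value < MIN || value > MAX) {
            throw new IllegalArgumentException(
                "Illegal value: " + value + ", not in range " + MIN + " - " + MAX);
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            requireInRange(21);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point is that both the dynamic value and the limits end up in the message, so whoever reads the log does not have to look up the limits in the code.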

Every time you find and fix a bug, you need to ask yourself: is there anything in my code I should do differently in order to eliminate bugs like this in the future? Is there anything I should be doing to make trouble-shooting this kind of bug easier in the future? This is very fertile ground for improvements.

BOTH YOU AND THE CUSTOMER WILL BE HAPPY

As I mentioned in Why I Love Coding, one of the joys of programming is making something that is useful to other people.

You get the same kind of kick out of fixing a bug, but on a different time scale. Delivering new features usually takes a while, but a bug-fix can be done in an hour. Each fixed bug makes you feel you are accomplishing something, and that’s a great feeling.

Everybody knows that there will always be bugs. What matters is that somebody is ready to fix them quickly when they are discovered.

SOLVING PROBLEMS IS FUN

Debugging and fixing bugs is the same. Each bug is a new mystery to figure out. Often your first reaction when seeing a new bug report is: that’s impossible! How could that happen? That’s when you start looking for clues. What do the logs say? Any error reports from the system? What else happened in the system at this time? Was anything changed recently – new software, configuration changes, traffic disturbances? Let the figuring-out begin!

https://henrikwarne.com/2014/01/27/a-bug-a-trace-a-test-a-twist/

The exception is including the information that the index was -3. This is better than a lot of other Java exceptions, but not as good as it could be. Had the index been too high (i.e. 4 for an array of length 4), you would not know from the exception how big the array was. A good exception or error message should include as much dynamic information as possible. For an ArrayIndexOutOfBoundsException that means both including the offending index, and including the valid array range. Not including all dynamic information is unfortunately a very common failing for many Java exceptions.

You also need to know what data the algorithm was operating on. Fortunately, in our system we have the option of using Trace on Error. This is a form of session-based logging (called tracing at work, to distinguish it from traditional logging).

The trace included with the error report showed the complete message received, as well as all key steps taken until the error occurred. Among other things, it included the destination MSISDN (phone number).

The hashCode-method can (of course) return negative values as well as positive values. Even though I had tested with several different phone numbers, I hadn’t picked one big enough to cause the hashCode-method to return a negative value.

Three Questions About Each Bug You Find by Tom Van Vleck. When you find and fix a bug, you should ask yourself 3 questions:

Is this mistake somewhere else also?
What next bug is hidden behind this one?
What should I do to prevent bugs like this?

int getBucket(Digits destMsisdn) {
    return Math.abs(destMsisdn.hashCode()) % numberOfBuckets;
}

We had recently discussed Integer.MIN_VALUE and Integer.MAX_VALUE at work. One of the properties of the binary encoding is that there is one more negative int value than there are positive int values. Therefore I was wondering what would happen if you take the absolute value of Integer.MIN_VALUE. A quick check showed that it returns the same value. So there is one (very unusual) case where the absolute value of a Java int is negative. One solution would have been to explicitly check for this case, and let getBucket() return zero instead. However, I decided on this solution instead (and this is a case that warrants a comment):

int getBucket(Digits destMsisdn) {
    // hashCode() can give negative values. Math.abs() gives
    // negative value for Integer.MIN_VALUE, so important
    // to do Math.abs() *after* taking the remainder.
    return Math.abs(destMsisdn.hashCode() % numberOfBuckets);
}
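The difference between the two orderings can be checked directly. The sketch below is only an illustration: the method names are made up, and it takes a plain int hash instead of the Digits type from the snippets above. It also shows Math.floorMod() as one possible alternative that was not used in the original fix.

```java
final class BucketDemo {
    // Math.abs() first: the result is still negative when hash == Integer.MIN_VALUE.
    static int absThenRemainder(int hash, int buckets) {
        return Math.abs(hash) % buckets;
    }

    // Remainder first: for a positive bucket count the remainder lies strictly
    // between -buckets and buckets, so Math.abs() cannot overflow.
    static int remainderThenAbs(int hash, int buckets) {
        return Math.abs(hash % buckets);
    }

    // Alternative: Math.floorMod() returns a value in [0, buckets) for a positive bucket count.
    static int floorModBucket(int hash, int buckets) {
        return Math.floorMod(hash, buckets);
    }

    public static void main(String[] args) {
        int hash = Integer.MIN_VALUE;
        System.out.println(Math.abs(hash) == Integer.MIN_VALUE); // true
        System.out.println(absThenRemainder(hash, 10));          // -8
        System.out.println(remainderThenAbs(hash, 10));          // 8
        System.out.println(floorModBucket(hash, 10));            // 2
    }
}
```

Note that Math.floorMod() maps negative hash codes to different buckets than Math.abs() after the remainder does, so the two are not interchangeable if existing bucket assignments must stay stable.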

The hashCode method can return negative values. Most of the time this doesn’t matter. But since I was using the hash code to pick an array index, I was only thinking of the value as unsigned. Even though I know that integers can be both positive and negative, in this context I was blind to it.

A trace makes trouble shooting so much easier. There is a big difference between only having a stack trace, and having all the session data available when trying to figure out what happened.

Unit tests help, but don’t catch everything. Well-factored code and unit tests make testing a hypothesis and testing a fix quick and easy. But bugs still get through. Therefore, you also need to have a system that is easy to trouble shoot.

Always ask the 3 questions. There may be more to the bug than you think, so make sure to use the 3 questions to see more cases.

https://henrikwarne.com/2016/06/16/18-lessons-from-13-years-of-tricky-bugs/

1. Event order. When handling events, it is fruitful to ask the following questions: Can the events arrive in a different order? What if we never receive this event? What if this event happens twice in a row? Even if it would normally never happen, bugs in other parts of the system (or interacting systems) could cause it to happen.

2. Too early. This is a special case of “Event order” above, but it has caused some tricky bugs, so it gets its own category. For example, if signaling messages are received too early, before configuration and start-up procedures are finished, a lot of strange behavior can happen. Another example: when a connection was marked as down even before it was put into the idle list. When debugging that problem, we always assumed it got set to down while it was in the idle list (but then why wasn’t it taken out of the list?). It was a failure of imagination on our part not to consider that things sometimes happen too early.

3. Silent failures. Some of the hardest bugs to track down have (in part) been caused by code that silently fails and continues instead of throwing an error. For example, system calls (like bind) that return error codes that aren’t checked. Another example: parsing-code that just returned instead of throwing an error when it encountered a faulty element. The call continued for a while in a faulty state, making the debugging much harder. It is better to return an error as soon as a failure case is detected.

4. If. If-statements with several conditions, if (a or b), especially when chained, if (x) else if (y), have caused many bugs for me. Even though if-statements are conceptually simple, they are easy to get wrong when there are multiple conditions to keep track of. These days I try to rewrite the code to be simpler to avoid having to deal with complicated if-statements.

5. Else. Several bugs have been caused by not properly considering what should happen if a condition is false. In almost every case, there should be an else-part for each if-statement. Furthermore, if you set a variable in one branch of an if-statement, you should probably set it in the other as well. Related to this is the case when a flag is set. It is easy to add only the condition for setting the flag, and forget to add the condition for when the flag should be reset again. Leaving a flag set forever will likely lead to bugs down the road.

6. Changing assumptions. Many of the bugs that were the hardest to prevent in the first place were caused by changing assumptions. For example, in the beginning there could only be one customer event per day. Then a lot of code is written under this assumption. At some later point, the design is changed to allow multiple customer events per day. When this happens, it can be hard to change all cases that are affected by the new design. It is easy to find all the explicit dependencies on the change, but the hard part is to find all the cases that implicitly depend on the old design. For example, there may be code that fetches all customer events for a given day. An implicit assumption may be that the result set is never greater than the number of customers. I don’t have a good strategy on how to prevent these problems, so suggestions are welcome.

7. Logging. Visibility into what the program does is crucial, especially when the logic is complicated. Make sure to add enough (but not too much) logging, so you can tell why the program does what it does. When everything works fine, it doesn’t matter, but as soon as (the inevitable) problem happens, you will be happy that you added proper logging.
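Lessons 1 and 3 above (event order and silent failures) can be combined in a small sketch. Everything here is hypothetical: the CallSession class, its states and its events are invented for illustration, and throwing on an unexpected event is just one possible policy (logging and ignoring it is another). The idea is only that every event is checked against the current state, and that an out-of-order or duplicate event is reported with enough detail instead of being silently accepted.

```java
final class CallSession {
    enum State { IDLE, RINGING, CONNECTED, ENDED }
    enum Event { SETUP, ANSWER, HANGUP }

    private State state = State.IDLE;

    State state() {
        return state;
    }

    /** Applies the event, or throws if it is not valid in the current state. */
    void handle(Event event) {
        switch (event) {
            case SETUP:
                require(state == State.IDLE, event);      // rejects a duplicate SETUP
                state = State.RINGING;
                break;
            case ANSWER:
                require(state == State.RINGING, event);   // rejects ANSWER before SETUP, or twice
                state = State.CONNECTED;
                break;
            case HANGUP:
                require(state != State.ENDED, event);     // rejects a second HANGUP
                state = State.ENDED;                      // allowed even if ANSWER never arrived
                break;
        }
    }

    private void require(boolean valid, Event event) {
        if (!valid) {
            // Include both the event and the state, so the failure explains itself.
            throw new IllegalStateException("Unexpected event " + event + " in state " + state);
        }
    }
}
```

With this shape, an ANSWER that arrives too early fails immediately with "Unexpected event ANSWER in state IDLE" rather than leaving the session in a faulty state.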

TESTING

As a developer, I am not done with a feature until I have tested it. At a minimum this means that every new or changed line of code has been executed at least once. Furthermore, unit testing or functional testing is good, but not enough. The new feature must also be tested and explored in a production-like environment. Only then can I say that I am done with a feature.

8. Zero and null. Make sure to always test with zero and null (when applicable). For a string it means both a string of length zero, and a string that is null. Another example: test the disconnection of a TCP connection before any data (zero bytes) was sent on it. Not testing with these combinations is the number one reason for bugs slipping through that I should have caught when testing.

9. Add and remove. Often new features involve being able to add new configurations to the system, for example a new profile for phone number translation. It is very natural to test that it works to add a new profile. However, I have found that it is easy to forget to test the removal of the profile as well.

10. Error handling. The code that handles errors is often hard to test. It’s best to have automatic tests that check the error handling code, but sometimes that is not possible. One trick I sometimes use then is to modify the code temporarily to cause the error handling code to run. The easiest way to do this is to reverse an if-statement, for example flipping it from if error_count > 0 to if error_count == 0. Another example is misspelling a database column name to cause the desired error handling code to run.

11. Random input. One way of testing that can often reveal bugs is to use random input. For example, the ASN.1 decoding of the H.323 protocol operates on binary data. By sending in random bytes to be decoded, we found several bugs in the decoder. Another example is to generate scripts with test calls, where the call duration, answer delay, first party to hang up and so on were all randomly generated. These test scripts exposed numerous bugs, particularly where there was interference from events happening close together.

12. Check what shouldn’t happen. Often testing involves checking that a desired action happened. But it is easy to overlook the opposite case – to check that an action that shouldn’t happen actually didn’t happen.

13. Own tools. Usually I have created my own small tools to make testing easier. For example, when I worked with the SIP protocol for VoIP, I wrote a small script that could reply with exactly the headers and values I wanted. That tool made testing a lot of corner cases easy. Another example is a command line tool that can make API calls. By starting small, and gradually adding features as needed, I have ended up with very useful tools. The advantage of writing my own tools is that I get exactly what I want.
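Lessons 8 and 11 above (zero and null, random input) can be sketched together against a toy decoder. The decoder and its length-prefixed format are assumptions made up for this example and have nothing to do with the real ASN.1 or H.323 code. The test first tries null and zero-length input, then feeds random bytes and treats any exception other than a clean rejection as a bug.

```java
import java.util.Random;

final class DecoderInputTest {
    /** Hypothetical decoder: the first byte is a length, followed by that many payload bytes. */
    static byte[] decode(byte[] data) {
        if (data == null) {
            throw new IllegalArgumentException("Input is null");
        }
        if (data.length == 0) {
            throw new IllegalArgumentException("Input is empty");
        }
        int length = data[0] & 0xFF;   // read the length byte as unsigned
        int available = data.length - 1;
        if (length > available) {
            throw new IllegalArgumentException(
                "Declared length " + length + " exceeds available bytes " + available);
        }
        byte[] payload = new byte[length];
        System.arraycopy(data, 1, payload, 0, length);
        return payload;
    }

    /** Returns true if decode() rejects the input with IllegalArgumentException. */
    static boolean isRejected(byte[] input) {
        try {
            decode(input);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    /** Feeds random bytes to decode(); any unexpected exception type propagates and fails the run. */
    static int fuzz(long seed, int rounds) {
        Random random = new Random(seed);   // fixed seed so a failing run can be repeated
        int rejected = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] input = new byte[random.nextInt(16)];
            random.nextBytes(input);
            if (isRejected(input)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("null rejected: " + isRejected(null));
        System.out.println("empty rejected: " + isRejected(new byte[0]));
        System.out.println("random inputs rejected: " + fuzz(42L, 1000) + " of 1000");
    }
}
```

Using a fixed seed is a deliberate choice here: when a random input does expose a bug, the same run can be repeated while debugging and again after the fix.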

DEBUGGING

14. Discuss. The debugging technique that has helped me the most in the past is to discuss the problem with a colleague. Often it is enough to simply describe the problem to a co-worker for me to realize what the problem is. Furthermore, even if they are not very familiar with the code in question, they can often come up with good ideas of what could be wrong anyway. Discussing with a co-worker has been especially effective with my most difficult bugs.

15. Pay close attention. Often when debugging a problem took a long time, it was because I made false assumptions. For example, I thought the problem happened in a certain method when in fact it never even got to that method in the first place. Or the exception that was thrown wasn’t the one I assumed it was. Or I thought the latest version of the software was running, but it was an older version. Therefore, be sure to verify the details instead of assuming. It’s easy to see what you expect to see, instead of what is actually there.

16. Most recent change. When things that used to work stop working, it is often caused by the last thing that was changed. In one case, the most recent thing changed was just the logging, but an error in the logging caused a bigger problem. To make regressions like this easier to find, it helps to commit different changes in different commits, and to use clear descriptions of the changes.

17. Believe the user. Sometimes when a user reports a problem, my instinctive reaction is: “That’s impossible. They must have done something wrong.” But I have learnt not to react that way. More times than I would like, it turns out that what they report is what actually happens. So these days, I take what they report at face value. Of course I still double check that everything has been set correctly etc. But I have seen so many cases where weird things happened because of unusual configuration or unanticipated usage, that my default assumption is that they are correct and the program is wrong.

18. Test the fix. When a fix for a bug is ready, it must be tested. First run the code without the fix, and observe the bug. Then apply the fix and repeat the test case. Now the buggy behavior should be gone. Following these steps makes sure it actually is a bug, and that the fix actually fixes the problem. Simple but necessary.

Other problems, like loop errors and corner cases, I see far fewer of because I have been unit-testing more logic. But that doesn’t mean there aren’t bugs – there still are. The lessons in this post help me to limit the damage at the three stages of coding, testing and debugging.

http://juristr.com/blog/2015/05/jersey-webresource-ignores-headers/

WebResource resource = Client.create(new DefaultClientConfig()).resource("http://myapp.org/api/v1/data");
resource.accept(MediaType.APPLICATION_JSON);
resource.type(MediaType.APPLICATION_JSON);
resource.header(HttpHeaders.AUTHORIZATION, "Negotiate " + token);

return resource.get(String.class);

However, the Negotiate token didn’t get appended; at least, that is what I concluded when I got a “401 Authorization denied” response. Logging the HTTP requests on my Apache server confirmed that assumption.

What seems insane initially gets much clearer when you take a look at how, for instance, accept(...) is implemented on the com.sun.jersey.api.client.WebResource class:

...
@Override
public Builder accept(String... types) {
    return getRequestBuilder().accept(types);
}
...

You get a new Builder object each time! That’s why it doesn’t work. So instead of the wrong version above, you have to write it like this:

WebResource resource = Client.create(new DefaultClientConfig()).resource("http://myapp.org/api/v1/data");
            
WebResource.Builder builder = resource.accept(MediaType.APPLICATION_JSON);
builder.type(MediaType.APPLICATION_JSON);
builder.header(HttpHeaders.AUTHORIZATION, "Negotiate " + token);

return builder.get(String.class);

Note that the first call, resource.accept(), returns the Builder object, and the subsequent calls to type() and header() work directly on that builder instance.

You can even invert the sequence of calls, for example calling resource.type(..) first and then accept(), and so on. Why? Because both WebResource.Builder and WebResource itself implement the same interface, RequestBuilder. The difference is that the WebResource implementation creates a new Builder object, while the Builder implementation actually adds the passed information to a metadata collection.

The Builder pattern is a common approach to simplify the creation of object instances by hiding the implementation details, especially in Java. Normally you invoke a series of methods that add information to your object, to finally call the build() method which returns the desired instance. The WebResource class totally hides this
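The pitfall can be reproduced without Jersey. The sketch below uses two made-up classes, Resource and Builder, as stand-ins; they are not the Jersey API and only mimic the behavior described above, where the resource starts a fresh builder on every call while the builder accumulates state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BuilderPitfall {
    /** Hypothetical stand-in for a builder that accumulates headers. */
    static final class Builder {
        private final Map<String, String> headers = new LinkedHashMap<>();

        Builder header(String name, String value) {
            headers.put(name, value);   // mutates this builder and returns it
            return this;
        }

        Map<String, String> build() {
            return new LinkedHashMap<>(headers);   // copy, so later calls don't leak in
        }
    }

    /** Hypothetical stand-in for a resource whose header() starts a fresh builder. */
    static final class Resource {
        Builder header(String name, String value) {
            return new Builder().header(name, value);   // new builder on every call
        }
    }

    public static void main(String[] args) {
        Resource resource = new Resource();

        // Wrong: the first builder is discarded, so "Accept" is lost.
        resource.header("Accept", "application/json");
        Builder lost = resource.header("Authorization", "Negotiate token");
        System.out.println(lost.build().keySet());

        // Right: keep working on the builder returned by the first call.
        Builder kept = resource.header("Accept", "application/json")
                               .header("Authorization", "Negotiate token");
        System.out.println(kept.build().keySet());
    }
}
```

In the first variant only the Authorization header survives. In the second both headers are present, because every call after the first one operates on the same builder instance.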

Thursday, February 18, 2016
