Massive Technical Interviews Tips: Misc

Tuesday, October 6, 2015

Misc - Http TCP Network

http://stackoverflow.com/questions/7360520/connectiontimeout-versus-sockettimeout

A connection timeout can occur only while the TCP connection is being established. It usually means the remote machine did not answer: the server has been shut down, you used the wrong IP address or DNS name, or the network path to the server is down.

A socket timeout monitors the incoming data flow. If no data arrives for the specified timeout, the connection is regarded as stalled or broken. This only works for connections on which data is expected to arrive continuously.

http://stackoverflow.com/questions/26022037/is-there-any-way-to-distinguish-a-connection-timeout-from-a-socket-timeout

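java.net.ConnectException : The connection could not be established. Typical causes are a refused connection (nothing is listening on the port), a wrong network, network overload, too many requests to the server, or a firewall.
java.net.SocketTimeoutException : A timeout expired on a socket operation. For reads, the socket timeout is how long the client waits for data coming back from the server. The server may still be processing and writing data back, but it is taking so long that the client has given up waiting. Note that Socket.connect called with a timeout also throws SocketTimeoutException when that timeout expires, so the exception type alone does not tell the two timeouts apart.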
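
Since the exception type alone is not enough, one way to tell the two cases apart is to catch around the connect phase and the read phase separately. The following is a minimal sketch under stated assumptions, not a definitive implementation: the class name `TimeoutPhases`, the method `probe`, and the returned labels are hypothetical, and the demo uses a local listening socket that never sends data as a stand-in for a slow server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class TimeoutPhases {
    /** Connects, then waits for one byte, and reports which phase failed (labels are hypothetical). */
    static String probe(InetSocketAddress target, int connectTimeoutMs, int readTimeoutMs) {
        try (Socket socket = new Socket()) {
            try {
                socket.connect(target, connectTimeoutMs);
            } catch (SocketTimeoutException e) {
                return "CONNECT_TIMEOUT";   // handshake did not finish in time
            } catch (IOException e) {
                return "CONNECT_FAILED";    // includes java.net.ConnectException (e.g. refused)
            }
            socket.setSoTimeout(readTimeoutMs);
            try {
                return socket.getInputStream().read() < 0 ? "CLOSED_BY_PEER" : "OK";
            } catch (SocketTimeoutException e) {
                return "READ_TIMEOUT";      // connected, but no data arrived in time
            } catch (IOException e) {
                return "READ_FAILED";
            }
        } catch (IOException e) {
            return "SOCKET_ERROR";
        }
    }

    /** Runs probe against a silent local listener and against a port that was just freed. */
    static String[] demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silent = new ServerSocket(0, 1, loopback)) {
            // The kernel completes the handshake even though accept() is never called.
            String silentResult = probe(new InetSocketAddress(loopback, silent.getLocalPort()), 3000, 200);
            int freedPort;
            try (ServerSocket tmp = new ServerSocket(0, 1, loopback)) {
                freedPort = tmp.getLocalPort();
            }
            // Nothing listens on freedPort any more, so the connection is normally refused.
            String refusedResult = probe(new InetSocketAddress(loopback, freedPort), 3000, 200);
            return new String[] { silentResult, refusedResult };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] results = demo();
        System.out.println("silent server: " + results[0]);
        System.out.println("closed port:   " + results[1]);
    }
}
```

The design choice here is to classify by where the exception was thrown rather than by its class, because a connect timeout and a read timeout both surface as SocketTimeoutException.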

1) What is the difference between connection and read timeout for sockets?

The connection timeout is the timeout in making the initial connection; i.e. completing the TCP connection handshake. The read timeout is the timeout on waiting to read data. Specifically, if the server fails to send a byte <timeout> seconds after the last byte, a read timeout error will be raised.
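
The "after the last byte" part is easy to miss: the read timeout bounds the gap between reads, not the total transfer time. The sketch below illustrates this with a local server that trickles bytes out with a pause before each one. The class name `ReadTimeoutPerRead`, the method name, and the chosen delays are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutPerRead {
    /**
     * Reads from a local server that sends `count` bytes, sleeping `gapMs` before each one.
     * Returns how many bytes arrived before a read timed out or the stream ended.
     */
    static int bytesReadBeforeTimeout(int count, long gapMs, int readTimeoutMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread sender = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    OutputStream out = peer.getOutputStream();
                    for (int i = 0; i < count; i++) {
                        Thread.sleep(gapMs);
                        out.write('x');
                        out.flush();
                    }
                } catch (IOException | InterruptedException ignored) {
                    // The client gave up or the demo was torn down.
                }
            });
            sender.setDaemon(true);
            sender.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(readTimeoutMs);   // applies to each blocking read
                InputStream in = client.getInputStream();
                int received = 0;
                try {
                    while (in.read() != -1) {
                        received++;
                    }
                } catch (SocketTimeoutException e) {
                    // No byte arrived within readTimeoutMs of the previous read.
                }
                return received;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Total transfer (about 300 ms) exceeds the 200 ms timeout, but every gap is shorter.
        System.out.println("short gaps: " + bytesReadBeforeTimeout(6, 50, 200));
        // The very first gap (400 ms) is longer than the 100 ms timeout.
        System.out.println("long gaps:  " + bytesReadBeforeTimeout(2, 400, 100));
    }
}
```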

http://www.infoq.com/cn/news/2015/11/TCP-network

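With some fundamentals in place, network problems can be diagnosed faster. For example, in the blog post "In Search of Performance: How We Shaved 200ms Off Every POST Request", the authors describe how they tracked down a latency problem with POST requests: why did every POST request take an extra 200ms? Here are excerpts of the root causes they eventually found: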

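Ruby's Net::HTTP library splits an HTTP POST request into two TCP packets: one for the request headers and one for the request body. The curl command does the opposite and tries to fit the headers and body into a single packet whenever possible. Worse, Net::HTTP does not set the TCP_NODELAY option when it opens the TCP socket, so the socket waits for the acknowledgement (ACK) of the first packet before sending the second. This behavior is a consequence of Nagle's algorithm.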
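
To illustrate the difference between the two write patterns, here is a minimal Java sketch (not the Ruby code from the post). It counts how many write calls reach the underlying stream when headers and body are written separately versus through a `BufferedOutputStream` that is flushed once. The class `CoalescedPost`, the counting stream, and the request contents are hypothetical, and the counting stream stands in for a socket. One write call does not guarantee one TCP packet, but for a small request it usually avoids the header/body split.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CoalescedPost {
    /** Counts bulk write calls that reach the wrapped stream (a stand-in for the socket). */
    static final class CountingStream extends FilterOutputStream {
        int writes = 0;

        CountingStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            writes++;
            out.write(b, off, len);
        }
    }

    /** Sends a tiny POST request and returns the number of writes seen by the "socket". */
    static int writesFor(boolean buffered) {
        byte[] head = "POST /items HTTP/1.1\r\nHost: example.test\r\nContent-Length: 7\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] body = "a=1&b=2".getBytes(StandardCharsets.US_ASCII);
        CountingStream sink = new CountingStream(new ByteArrayOutputStream());
        try {
            OutputStream out = buffered ? new BufferedOutputStream(sink, 8192) : sink;
            out.write(head);   // unbuffered: goes straight to the socket
            out.write(body);   // unbuffered: a second, separate write
            out.flush();       // buffered: headers and body leave together here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writes;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered writes: " + writesFor(false));
        System.out.println("buffered writes:   " + writesFor(true));
    }
}
```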

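At the other end of the connection, HAProxy has to choose how to acknowledge these two packets. In version 1.4.15 (the version we were using), it chose to use TCP delayed ACK. Delayed ACK and Nagle's algorithm interact badly, stalling the request until the server's delayed-ACK timeout fires.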

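At that point the two sides of the connection (Ruby Net::HTTP and HAProxy) were each waiting for the other to send a packet: the application was waiting for HAProxy to send an ACK (Nagle's algorithm), while HAProxy was waiting for further packets from the application (delayed ACK). This caused the 200ms delay in the middle.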

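Once the problem was found, the fix was straightforward: set the TCP_NODELAY option on the application side, or disable delayed ACK on the server side (the TCP_QUICKACK option). That raises another question: what effect do these two options have on the application and the server?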
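
For reference, this is roughly what the application-side fix looks like on a Java socket, where TCP_NODELAY is exposed through `Socket.setTcpNoDelay`. It is a sketch only: the class name `NoDelayExample` is hypothetical, and it connects to a throwaway local listener so that it runs without a real server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class NoDelayExample {
    /** Returns the TCP_NODELAY setting before and after enabling it on a client socket. */
    static boolean[] toggleNoDelay() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            boolean before = client.getTcpNoDelay();
            client.setTcpNoDelay(true);   // disable Nagle's algorithm for this socket
            return new boolean[] { before, client.getTcpNoDelay() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] state = toggleNoDelay();
        System.out.println("TCP_NODELAY before: " + state[0] + ", after: " + state[1]);
    }
}
```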

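With TCP_NODELAY set on the application socket, TCP segments are sent immediately instead of being held back in the send buffer. If the application sends a lot of small pieces of data, it may hit a bottleneck flushing the buffer, and a large share of bandwidth may be wasted on TCP headers.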
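
A rough calculation shows how large that header cost can get. The sketch below assumes the minimum header sizes of 20 bytes for IPv4 and 20 bytes for TCP (no options) and ignores link-layer framing. The class `HeaderOverhead` and its method are hypothetical names for the example.

```java
class HeaderOverhead {
    // Assumption: minimum IPv4 header (20 bytes) plus minimum TCP header (20 bytes), no options.
    static final int MIN_HEADER_BYTES = 20 + 20;

    /** Fraction of the bytes in one packet that are headers rather than payload. */
    static double overheadFraction(int payloadBytes) {
        return (double) MIN_HEADER_BYTES / (MIN_HEADER_BYTES + payloadBytes);
    }

    public static void main(String[] args) {
        for (int payload : new int[] { 1, 10, 100, 1460 }) {
            System.out.printf("payload %4d bytes: %.1f%% of the packet is headers%n",
                    payload, 100 * overheadFraction(payload));
        }
    }
}
```

Under these assumptions a 1-byte payload travels in a 41-byte packet, so 40/41 of it is headers, while a 1460-byte payload brings the header share down to 40/1500.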

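With TCP_QUICKACK enabled on the server, ACKs are no longer coalesced before being sent, which also increases the number of packets. Relatively speaking, though, the cost of the extra ACK packets is likely to be smaller than the cost of delayed ACK.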
