Monday, October 31, 2016

Google Borg





http://dongxicheng.org/mapreduce-nextgen/yarn-mesos-borg/
Every major Internet company has its own cluster resource-management system: Google has Borg, Twitter uses Mesos, Alibaba has Fuxi, and Microsoft has Apollo. The material on Google Borg in this article is based mainly on the paper "Large-scale cluster management at Google with Borg".

Google Borg uses a centralized master/slave architecture (Borgmaster and Borglet): the Borgmaster handles cluster resource management and scheduling, while the Borglets execute the actual tasks. The Borgmaster typically runs as five instances that form a highly available group via the Paxos protocol. The Borgmaster periodically and actively polls each Borglet to fetch its state and resource usage, and triggers fault handling once it detects a problem. Polling is used rather than the heartbeat mechanism of YARN/Mesos mainly because it makes network-communication overhead easier to control and avoids, for example, a network storm during the election of a new active Borgmaster. As mentioned, there are five Borgmasters; each is responsible for probing and collecting state from a subset of the Borglets, and after compressing and diffing the collected information it hands it to the active Borgmaster, spreading the active Borgmaster's load. This fine-grained design gives a first hint of why Borg, despite its centralized architecture, still scales very well, matching this sentence from the Borg paper: "We are not sure where the ultimate scalability limit to Borg’s centralized architecture will come from; so far, every time we have approached a limit, we’ve managed to eliminate it."
The open-source systems YARN and Mesos both use a two-level scheduling architecture. Note that in the Omega and Borg papers Google classifies YARN as centralized while putting Mesos in the two-level category; in my view this is inaccurate. YARN is similar to Mesos: the ResourceManager handles cluster-level scheduling and the ApplicationMaster handles scheduling inside a framework or application. Compared with Mesos, YARN's ResourceManager is somewhat heavier because it must track the state and location of every container it has handed out. Both YARN and Mesos solve master high availability with ZooKeeper: the active master does resource management and scheduling, while the backup masters are pure standbys that do not assist the active master in any way. Slaves proactively report their state to the master through periodic heartbeats.
1.1 Scalability
Borg horizontally shards all the Borglets it manages, with each Borgmaster responsible for one shard. The Borgmasters share cluster metadata, and any of them can allocate resources to applications, but a backup Borgmaster must send its allocation decisions to the active Borgmaster for approval; this matches the shared-state design in Google Omega. Neither YARN nor Mesos can do this: each has a single master doing resource management and scheduling, which may become a bottleneck in very large clusters. This is a direction in which YARN and Mesos need to improve.
1.2 High availability
On the application side: Borg has various built-in retry mechanisms that keep applications from failing in the face of machine failures, network failures, and so on.
On the service side: when a Borgmaster or Borglet crashes, the applications and tasks running on it are unaffected. Mesos and YARN have both achieved this as well.
2. Support for batch jobs and long-running services
To keep application classification simple, Borg divides applications into two categories: batch jobs and long-running services. Batch jobs are MapReduce- or Spark-style jobs that finish within some bounded time; long-running services are things like a web service or the HDFS service, which may run forever and never stop. Batch jobs and long-running services are a perfect match, and co-locating them helps cluster utilization enormously: long-running services occupy large chunks of resources, batch jobs fill the small unused gaps, and when a high-priority application needs resources it can simply kill and preempt batch jobs. According to the Borg and Omega papers, long-running services occupy 70-80% of the resources in Google's clusters.
Mesos and YARN both support batch jobs well; this class of application is also the easiest to support. The hard part is long-running services, which introduce the following problems:
(1) Resource contention and starvation. When a cluster runs only batch jobs, scheduling is easy: resources are continuously released and requested, so any request is eventually satisfied. With long-running services it is different. Mainstream schedulers use a resource-reservation mechanism: suppose a job needs 10 GB of memory but no node currently has 10 GB free. To keep the job from waiting forever, the scheduler picks a node with partial resources, say node1 with 6 GB free, and reserves them; once another 4 GB is released, the full 10 GB on node1 is allocated to the job. In a pure batch workload this happens naturally, but once long-running services are added, starvation can occur: the extra 4 GB on node1 may never appear, because a long-running service may hold it indefinitely. Borg handles this well, but Mesos and YARN both have this starvation problem and currently cannot solve it.
(2) Service high availability. Once a resource-management system supports long-running services, it must guarantee that a failure of the system's own services does not affect the services running on top of it. For example, in YARN, if the ResourceManager or a NodeManager fails, a long-running service on it, such as MySQL, should be unaffected, and after recovery the system should re-attach to these services. Borg, Mesos, and YARN (in this article, Hadoop 2.6.0 and later) all support this.
(3) Log collection and rotation. Long-running services never stop, so for debugging and monitoring, their log collection must also be solved. Borg, Mesos, and YARN all support this well; Mesos and YARN handle it via higher-level frameworks, such as Aurora (an Apache project) and Marathon on Mesos, and Twill and Slider (Apache Incubator projects) on YARN.
(4) Service discovery. Once a long-running service is deployed on a resource-management system, it may be scheduled onto any node, so for outside clients to discover its location there must be a service-registry component that records these services. Borg, Mesos, and YARN all support service registration: Mesos via higher-level frameworks such as Aurora, while YARN has registry support built into its core.
(5) Resource scaling. After a service has run for a while, rising or falling traffic may require dynamically adjusting the resources it uses. Scaling has two dimensions: horizontal, i.e. adding instances, and vertical, i.e. growing a running instance's resources in place. Borg, Mesos, and YARN all support this: horizontal scaling is implemented by higher-level frameworks, while vertical scaling is supported in the resource-management kernel itself (https://issues.apache.org/jira/browse/YARN-1197).
(6) Configuration updates and online upgrades. These have no direct relationship to the resource-management system and are generally implemented by higher-level frameworks.
3. Other implementation mechanisms
(1) Resource pre-reservation (called an "alloc" in the Borg paper). An application can reserve some resources in advance, useful for quickly launching low-latency tasks, dynamic scale-out, and so on. Mesos and YARN have no direct support for this: once a framework is granted resources, it must use them within a certain time or the system reclaims them. In practice, Mesos and YARN can solve this at the framework level; for example, Hive on Tez implements resource pre-reservation. See my article "Tez: a DAG computation framework running on YARN": http://dongxicheng.org/mapreduce-nextgen/tez-yarn-dag/
(2) Job priorities and resource preemption. In a cluster with mixed workloads, preemption must be supported. To support it, application priorities must be planned sensibly in advance so that, when needed, a high-priority job can preempt a low-priority job's resources. YARN already supports resource preemption, but not yet for mixed batch/long-running-service scheduling; in YARN, preemption is currently used only for reclaiming resources between queues.
(3) Quotas and admission control
To prevent applications from requesting unlimited resources and to meet the SLAs of long-running services, a complete admission-control mechanism is needed. Among open-source systems, YARN added such a mechanism starting with 2.6.0; see https://issues.apache.org/jira/browse/YARN-1051.
4. Summary
As things stand, Mesos and YARN still lag Google Borg somewhat in architecture and design. Note, however, that many details are the result of trade-offs, and it is hard to say which mechanism suits a given scenario better. For building small and medium-sized clusters and data centers, Mesos and YARN are more than sufficient.
5. References
(5) Apache Slider: http://slider.incubator.apache.org/
(6) Apache Twill: http://twill.incubator.apache.org/
(7) Apache Aurora: http://aurora.apache.org/
(8) Marathon: https://mesosphere.github.io/marathon/


http://allenlsy.com/google-production-environment
A cluster needs to manage its server resources efficiently. Inside Google there is the concept of a job; each job has many child tasks (for the job and task concepts, see the earlier notes on the Netflix Titus architecture). A job also declares how many resources it expects to use.
Engineers run jobs through internal tooling. A job is submitted to Borg (Google's internal container-management system, not open-sourced at the time). The Borg master asks the scheduler which machines should run the job; with the answer in hand, it sends the request to the Borglet on each chosen machine, which starts running the job. If a job fails, it is automatically rerun.
Google uses the Borg name service (BNS) to locate tasks on servers. The format is /bns/<cluster>/<user>/<job name>/<task number>.
A BNS address must be mapped to an IP:port address, and the mapping must be kept in sync.
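To make the path format above concrete, here is a minimal parser for it. The class and its fields are hypothetical illustrations of the /bns/<cluster>/<user>/<job name>/<task number> layout described in the text, not a real Google API.

```java
// Hypothetical sketch: parse a BNS-style path into its components.
public class BnsPath {
    final String cluster, user, jobName;
    final int taskNumber;

    BnsPath(String path) {
        // split("/") on "/bns/a/b/c/0" yields ["", "bns", "a", "b", "c", "0"]
        String[] parts = path.split("/");
        if (parts.length != 6 || !"bns".equals(parts[1])) {
            throw new IllegalArgumentException("not a BNS path: " + path);
        }
        cluster = parts[2];
        user = parts[3];
        jobName = parts[4];
        taskNumber = Integer.parseInt(parts[5]);
    }

    public static void main(String[] args) {
        BnsPath p = new BnsPath("/bns/cc/someuser/webserver/3");
        System.out.println(p.cluster + " " + p.jobName + " " + p.taskNumber);
    }
}
```

In a real system the resolved IP:port for each such path would live in a consistent store (Chubby, as the next section explains), not in the client.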

Lock service

The mapping from BNS addresses to IPs is stored in another internal service, Chubby. Chubby is a lock service for distributed systems; it also provides a filesystem that can be operated through a similar API, and it uses the Paxos algorithm to achieve asynchronous consensus among its servers. The BNS mapping lives inside Chubby.
  • HDD + SSD: at the very bottom of the storage stack are mechanical hard disks and solid-state drives.
  • D: short for "disk"; it manages the HDDs and SSDs. D stores temporary data, mainly for running jobs.
  • Colossus is a distributed file system developed from the Google File System. It runs within a cluster and supports durable storage, replication, encryption, and more.
  • Bigtable is a NoSQL database that stores ordered data. When replicating between clusters, Bigtable guarantees eventual consistency.
  • Spanner is a NewSQL database, aiming to be a relational database with NoSQL-like scalability.
Through Borg, Google makes a server cluster behave like a single computer that can both run jobs and store data. But a distributed system must cope with broken machines. For example: if a machine dies while a job is halfway through, what happens?

Monitoring

The question above is answered through monitoring. Borgmon is Borg's monitoring tool, and as the figure shows, Borgmon exists at many levels. Borgmon collects the running state of each task and aggregates it upward to the global Borgmon. The cluster Borgmon, besides forwarding to the global Borgmon, also sends data to Google's time-series database and to the alert-manager system; for example, an abnormally high error rate in a cluster triggers an alert.
There is also a helper tool, Prober, which sends requests to the servers hosting tasks and monitors response times, observing server health from another angle. Prober also reports its data to Borgmon.

Inter-task communication

Tasks communicate with each other using Stubby, an RPC (remote procedure call) service built on HTTP that uses protobuf. Put simply, tasks do RPC with each other via protobuf.
After code is committed, Blaze (Google's build tool, whose open-source version is Bazel) compiles it into binaries. To use Blaze, engineers specify the build outputs, dependencies, and so on.
Another tool, continuous testing, pulls the latest code from the repo and runs automated tests. Once they pass, a tool called Rapid invokes Blaze to produce an MPM (Midas Package Manager) package. MPM gives the software a name, a version number, and a signature to guarantee its authenticity. Finally the MPM package is deployed to production by Sisyphus, Google's deploy tool, which handles the many complex and tedious steps in a deployment and lets you specify how to deploy, e.g. immediately via a canary, or at some scheduled time in the future.
http://allenlsy.com/john-wikes-google-borg
Inside Borg, every machine runs a Borglet client that collects the machine's state and reports it to the Borgmaster. Taking the figure above as an example, when this job arrives, the Borgmaster works with the scheduler to plan when, and on which machines, to run the 10,000 replicas.
For a replica, we do not care whether it runs in a VM or in a container. Borg decides, based on the remaining resources of the machines in the current cell, how many replicas to run on which machines. Afterwards, the Borg UI shows how many replicas are running and other status.
Inside Google, replicas run in containers, while on GCP replicas run in VMs, because stronger security policies are required there.

By priority, jobs fall into two classes: prod and non-prod. Prod jobs run first; non-prod jobs are usually background tasks or less urgent work. When machine resources run short, non-prod jobs yield to prod jobs (preemption): the non-prod job (very likely running as a container) is killed, then restarted on another machine. The figure above shows the average number of events prod and non-prod jobs encounter over a full week of runtime: the gray portion is preemption, the blue portion is routine server system upgrades. Prod and non-prod jobs are killed for system upgrades about equally often, but non-prod jobs see far more preemptions.
In addition, many machines in Google's production environment fail and go down every day. Borg then reruns the stopped programs on other machines.
However, people often describe a job as needing more resources than it actually does. In Borg, when this happens, Borg reallocates the unreasonably large portion of the reservation to other jobs; even high-priority prod jobs are no exception.
The next question, then, is: how much reserved resource is reasonable?
Observation shows that most jobs use only about one third of the resources they declare; the rest is, of course, headroom for bursts. The figure above shows one server's resource usage over a period of time: the gray line at the top is what the job declared, and the red line is actual usage. Google devised an algorithm that computes a reasonable reservation, shown as the blue line. The yellow region between the red and blue lines is the job's actual reserved headroom, and the green region can be used by other jobs. The blue line adjusts as the job's resource usage changes: we can see that when traffic spikes, the yellow region shrinks and Borg scales automatically.
The green region is often used by non-prod jobs. Remember that when prod jobs need more resources, these non-prod jobs are evicted and restarted elsewhere in the cell, so the less often that happens, the better.
When engineers build a product, it is best for them if, on the server side, they only have to care about their app's servers and their business logic. The app servers are only a small fraction of all the servers the app needs; other servers do supporting work such as storage, configuration, monitoring, usage billing, and system upgrades, and many apps can share these supporting servers.
With these supporting services in place, Google could offer them externally as Google Cloud Platform. And a lightweight version of Borg, combined with Docker as the container-packaging tool, became Kubernetes.

Thursday, October 27, 2016

Apache HttpClient



http://stackoverflow.com/questions/5769717/how-can-i-get-an-http-response-body-as-a-string-in-java
Every library I can think of returns a stream. You could use IOUtils.toString() from Apache Commons IO to read an InputStream into a String in one method call. E.g.:
URL url = new URL("http://www.example.com/");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
String encoding = con.getContentEncoding();
encoding = encoding == null ? "UTF-8" : encoding;
String body = IOUtils.toString(in, encoding);
System.out.println(body);
Update: I changed the example above to use the content encoding from the response if available. Otherwise it'll default to UTF-8 as a best guess, instead of using the local system default.
  1. Using EntityUtils and HttpEntity:
    HttpResponse response = httpClient.execute(new HttpGet(URL));
    HttpEntity entity = response.getEntity();
    String responseString = EntityUtils.toString(entity, "UTF-8");
    System.out.println(responseString);
  2. Using BasicResponseHandler:
    HttpResponse response = httpClient.execute(new HttpGet(URL));
    String responseString = new BasicResponseHandler().handleResponse(response);
    System.out.println(responseString);
http://hc.apache.org/httpclient-3.x/performance.html

Reuse the HttpClient instance

Generally it is recommended to have a single instance of HttpClient per communication component or even per application. However, if the application makes use of HttpClient only very infrequently, and keeping an idle instance of HttpClient in memory is not warranted, it is highly recommended to explicitly shut down the multithreaded connection manager prior to disposing the HttpClient instance. This will ensure proper closure of all HTTP connections in the connection pool.

Concurrent execution of HTTP methods

If the application logic allows for execution of multiple HTTP requests concurrently (e.g. multiple requests against various sites, or multiple requests representing different user identities), the use of a dedicated thread per HTTP session can result in a significant performance gain. HttpClient is fully thread-safe when used with a thread-safe connection manager such as MultiThreadedHttpConnectionManager. Please note that each respective thread of execution must have a local instance of HttpMethod and can have a local instance of HttpState or/and HostConfiguration to represent a specific host configuration and conversational state. At the same time the HttpClient instance and connection manager should be shared among all threads for maximum efficiency.
For details on using multiple threads with HttpClient please refer to the HttpClient Threading Guide.

http://stackoverflow.com/questions/30528912/java-httpclient-4-x-performance-guide-to-resolve-issue
Please DO re-use the HttpClient instance! HttpClient instances are very expensive. By throwing one away you not only throw away the instance itself, you also throw away the SSL context, the connection manager, and all persistent connections that might be kept alive by the connection manager.

http://stackoverflow.com/questions/24904275/thread-is-blocked-when-to-reuse-appache-httpclient

http://www.cnblogs.com/wasp520/archive/2012/07/06/2580101.html
My server-side Apache configuration:
  1. Timeout 30
  2. KeepAlive On   # the server will not actively close connections
  3. MaxKeepAliveRequests 100
  4. KeepAliveTimeout 180

With this configuration, every connection is held for at least 180 s before being released, so under a large volume of requests the connections inevitably fill up and requests end up waiting.
Debugging revealed that HttpClient does not close the connection in method.releaseConnection(); that method merely returns the connection to the connection manager. If you instantiate with HttpClient client = new HttpClient(), the default connection-manager implementation is SimpleHttpConnectionManager.

The code change is trivial; everything else is the same as the sample code at the top. You only need to add one HTTP-header line after HttpMethod method = new GetMethod("http://www.apache.org");:
  1. method.setRequestHeader("Connection", "close");

Here is how the HTTP specification defines this header:
HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after completion of the response. For example,
       Connection: close
Now, the difference between the client closing the connection and the server closing it: if the client closes, running netstat -an on the client machine shows many TCP connections in TIME_WAIT; if the server actively closes, the same thing appears on the server side.
See the explanation on the wiki: http://wiki.apache.org/HttpComponents/FrequentlyAskedConnectionManagementQuestions
The TIME_WAIT state is a protection mechanism in TCP. The side that closes a socket connection orderly will keep the connection in state TIME_WAIT for some time, typically between 1 and 4 minutes.
The TIME_WAIT state appears on the side that actively closes the connection. In TCP, TIME_WAIT mainly guarantees complete data delivery; for details see this document:
http://www.softlab.ntua.gr/facilities/documentation/unix/unix-socket-faq/unix-socket-faq-2.html#ss2.7
One more point: the techniques above are for when the application knows for sure it will not reuse connections, and can therefore close them proactively to free resources. If your application needs connection reuse, there is no need to do this; reusing existing connections also improves performance.

http://jinnianshilongnian.iteye.com/blog/2089792
Because a connection pool was used but there were not enough connections, a large amount of waiting occurred, and that kind of waiting snowballs (a risk similar to the one in Apache Commons DBCP, which the trading team has been using recently).

HttpClient 3.1 uses synchronized + wait + notifyAll, which has two problems: under load, synchronized is slow, and notifyAll can cause thread starvation. HttpClient 4.2.3 uses ReentrantLock (non-fair by default) plus one Condition per waiting thread.
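To see why one Condition per waiter helps, here is a minimal sketch of the pattern in a toy permit pool. This is not HttpClient's actual code, just an illustration of the 4.x-style technique: a released permit signals exactly one chosen waiter instead of waking every blocked thread with notifyAll.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Toy pool: ReentrantLock + one Condition per waiting thread (FIFO wakeup).
public class TicketPool {
    private final ReentrantLock lock = new ReentrantLock();
    private final Deque<Condition> waiters = new ArrayDeque<>();
    private int available;

    public TicketPool(int permits) { available = permits; }

    public void acquire() throws InterruptedException {
        lock.lock();
        try {
            while (available == 0) {
                Condition self = lock.newCondition(); // one Condition per waiter
                waiters.addLast(self);
                try {
                    self.await();
                } finally {
                    waiters.remove(self);
                }
            }
            available--;
        } finally {
            lock.unlock();
        }
    }

    public void release() {
        lock.lock();
        try {
            available++;
            Condition next = waiters.peekFirst();
            if (next != null) next.signal(); // wake only the longest waiter
        } finally {
            lock.unlock();
        }
    }
}
```

With notifyAll, every blocked thread wakes, contends for the lock, and most go back to sleep (the "thundering herd"); signaling a single dedicated Condition avoids that and prevents an unlucky thread from losing the race forever.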





Tuesday, October 25, 2016

Java Internal



http://blog.csdn.net/yangzl2008/article/details/43202969
https://my.oschina.net/ambitor/blog/663957
When learning Java we generally assume that objects created with new are allocated on the heap, but that conclusion is not absolute. By analyzing Java's object-allocation process, we can see two mechanisms that may cause a new'ed object not to be allocated on the heap as expected: escape analysis and the TLAB (Thread-Local Allocation Buffer).

Escape analysis is a cross-function, global data-flow analysis that can effectively reduce synchronization overhead and heap-allocation pressure in Java programs. Through escape analysis, the Java HotSpot compiler can determine the scope in which references to a new object are used, and thus decide whether the object must be allocated on the heap.
In compiler-optimization theory, escape analysis is a method for analyzing the dynamic scope of pointers; it is related to pointer analysis and shape analysis. When a variable (or object) is allocated in a method, its pointer may be returned or referenced globally and thereby be referenced by other procedures or threads; this phenomenon is called the pointer (or reference) escaping.

Java has supported escape analysis, enabled by default, since Java SE 6u23. The HotSpot JIT compiler can perform escape analysis even in the presence of method overriding and dynamically loaded code; together with the facts that Java objects are heap-allocated and threads are built into the language, this makes escape analysis an important JVM feature.

Escape analysis classifies each object into one of three escape states:
  1. GlobalEscape: a reference to the object escapes the method or thread, e.g. it is assigned to a class (static) variable, stored inside an object that has already escaped, or returned as the method's return value to the caller.
  2. ArgEscape: the object is passed as an argument during a method call; this state can be determined by analyzing the bytecode of the called method.
  3. NoEscape: the object does not escape; it is a candidate for scalar replacement and need not be allocated on the conventional heap.
The compiler can use the results of escape analysis to optimize the program in the following ways:
  1. Stack allocation instead of heap allocation. If none of an object's references escape the method, the object may be allocated on the stack rather than the usual heap.
  2. Synchronization elimination. Thread synchronization is quite expensive, and its consequence is reduced concurrency and performance. Escape analysis can determine whether an object is only ever accessed by a single thread; if so, synchronization on that object can be turned into unsynchronized operations, greatly improving concurrency and performance.
  3. Scalar replacement. If escape analysis finds that an object's storage does not need to be contiguous in memory, parts or even all of the object can be kept in CPU registers, greatly improving access speed.

JVM options without escape analysis:
        -server -verbose:gc

JVM options with escape analysis:
        -server -verbose:gc -XX:+DoEscapeAnalysis
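The escape states above can be illustrated with two code shapes. Whether HotSpot actually stack-allocates or scalar-replaces is a JIT decision that we cannot observe directly from Java code; this sketch only shows the shapes the analysis distinguishes, and the class and method names are my own illustrations.

```java
// Illustrative only: code shapes escape analysis distinguishes.
public class EscapeDemo {
    static final class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // NoEscape: the Point never leaves this method, so the JIT may apply
    // scalar replacement and never allocate it on the heap at all.
    static int distSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // GlobalEscape: the Point is returned to the caller, so it must be a
    // real heap object.
    static Point make(int x, int y) {
        return new Point(x, y);
    }

    public static void main(String[] args) {
        System.out.println(distSquared(3, 4)); // 25
        System.out.println(make(1, 2).x);      // 1
    }
}
```

Running a hot loop over distSquared with -XX:+DoEscapeAnalysis versus -XX:-DoEscapeAnalysis (plus -verbose:gc) is one way to observe the difference in allocation pressure.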
The JVM carves a small thread-private region out of the young generation's Eden space, called the TLAB (Thread-Local Allocation Buffer), which by default occupies 1% of Eden. Java programs create many small, short-lived objects that are not shared across threads and are well suited to fast GC, so the JVM preferentially allocates small objects in the TLAB; and because the TLAB is thread-private, allocation there carries no lock overhead. In practice, allocating many small objects is therefore usually more efficient than allocating one large object.
In other words, every Java thread has its own buffer, the TLAB, that only that thread can allocate from. Combined with the bump-the-pointer technique, the TLAB enables fast object allocation without any lock-based synchronization: allocation does not lock the whole heap, only bumps a pointer in the thread's own buffer.

  1. The compiler uses escape analysis to decide whether the object is allocated on the stack or on the heap. If on the heap, go to step 2.
  2. If tlab_top + size <= tlab_end, allocate the object directly in the TLAB and bump tlab_top; if the current TLAB cannot hold the object, go to step 3.
  3. Request a new TLAB and try again to place the object; if it still does not fit, go to step 4.
  4. Lock the Eden space (which is shared by all threads); if eden_top + size <= eden_end, place the object in Eden and bump eden_top; if Eden cannot hold it, go to step 5.
  5. Run a young GC (minor collection).
  6. If Eden still cannot hold the object after the young GC, allocate it directly in the old generation.
The main reason for allocating objects somewhere other than the common heap is that the heap is shared, so heap allocation carries lock overhead. Both the TLAB and the stack are thread-private, and being private avoids contention (though it can introduce other issues, such as visibility problems); this is a classic trade of space for speed.
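The allocation steps above can be sketched as a toy simulation. This is not JVM code: addresses are plain longs, the Eden lock is elided, and GC is just a sentinel value; the field names mirror the tlab_top/tlab_end/eden_top/eden_end pointers from the steps.

```java
// Toy model of bump-the-pointer TLAB allocation (steps 2-5 above).
public class TlabSim {
    long tlabTop, tlabEnd;   // current thread-local buffer
    long edenTop, edenEnd;   // shared Eden space (lock elided in this toy)
    final long tlabSize;

    TlabSim(long edenSize, long tlabSize) {
        this.edenEnd = edenSize;
        this.tlabSize = tlabSize;
        refillTlab();
    }

    // Step 3: carve a fresh TLAB out of Eden.
    private boolean refillTlab() {
        if (edenTop + tlabSize > edenEnd) return false;
        tlabTop = edenTop;
        tlabEnd = edenTop + tlabSize;
        edenTop += tlabSize;
        return true;
    }

    /** Returns the object's address, or -1 meaning "trigger a young GC" (step 5). */
    long allocate(long size) {
        if (tlabTop + size <= tlabEnd) {        // step 2: fast path, no lock needed
            long addr = tlabTop;
            tlabTop += size;
            return addr;
        }
        if (size <= tlabSize && refillTlab()) { // step 3: retry in a new TLAB
            return allocate(size);
        }
        if (edenTop + size <= edenEnd) {        // step 4: allocate directly in Eden
            long addr = edenTop;
            edenTop += size;
            return addr;
        }
        return -1;                              // step 5: young GC needed
    }
}
```

Note how the fast path is just one comparison and one addition; that is why small-object allocation in a TLAB is so cheap.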


Monday, October 24, 2016

What is the toughest question ever asked in any interview?



https://www.quora.com/What-is-the-toughest-question-ever-asked-in-any-interview
This question was asked to one of my friends in his campus placement interview with Microsoft.
Interviewer: Ok, so one last question. A right triangle has a hypotenuse equal to 10 and an altitude to the hypotenuse equal to 6. Find the area of the triangle.

My friend started thinking 'Why would a software company ask a geometry question and that too such a trivial one! Maybe it is a trick question!? Maybe it isn't a trick question and he just wants me to think otherwise so that I would screw up even this paltry question!?'
He contemplated for a while and answered:
Friend: Sir, as area of any triangle is 0.5*base*height, the answer to this question would be 0.5 *10*6 which evaluates to 30!
Interviewer: Are you sure? Think about it again!
*My friend thought for a while and replied with full confidence*
Friend: Yes sir, I am sure the area of the triangle is 30. You are just messing with my brain to make me think otherwise so that I would make an error even on this trivial question.
(This was the exact response which my friend gave to the interviewer!)
Interviewer: Well, your answer is wrong XYZ, and that was the last question of this interview. You can wait outside until we declare the results.
Friend: Sir, could you please tell me what's the correct answer?
Interviewer: The correct answer is, such a triangle cannot exist. If you think about it, you will come to know why!
And my friend left the room dumbfounded, still thinking about why that triangle cannot exist.
Verdict: He wasn't selected!
----------------------------------------------------------------------------------------------
Here's the solution: It turns out that the maximum length of the altitude to the hypotenuse in the above triangle can only be 5, not 6, so its maximal area would be 25.
The angle opposite the hypotenuse must be a right angle of 90 degrees. By Thales' theorem, this means the hypotenuse must be a diameter of the triangle's circumscribed circle, and the third vertex can be any point on that circle (except the endpoints of the hypotenuse).


The vertical distance from the third point to the hypotenuse is the altitude to the hypotenuse. This is largest when the third point is at the top or bottom of the circle, and the vertical distance is equal to the radius of the circle (half the length of the hypotenuse, which is the diameter of the circle).


Therefore, a right triangle with a hypotenuse of 10 can have an altitude on its hypotenuse of at most 5.
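The claim can also be checked numerically, without the circle argument. Assuming nothing beyond the geometry: for a right triangle with hypotenuse c, the legs satisfy a² + b² = c², and the altitude to the hypotenuse is h = a·b/c (since area = a·b/2 = c·h/2). Sweeping over a shows h peaks at c/2:

```java
// Numeric check: max altitude to the hypotenuse of a right triangle.
public class AltitudeCheck {
    static double maxAltitude(double c) {
        double maxH = 0;
        for (double a = 0.01; a < c; a += 0.01) {
            double b = Math.sqrt(c * c - a * a); // other leg from Pythagoras
            maxH = Math.max(maxH, a * b / c);    // altitude h = a*b/c
        }
        return maxH;
    }

    public static void main(String[] args) {
        System.out.printf("max altitude for c=10: %.4f%n", maxAltitude(10));
    }
}
```

For c = 10 the sweep tops out at 5, confirming that an altitude of 6 is impossible.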
I would say it was one of the most difficult questions to answer because the hard question was very well disguised as a trivial question. And although it seemed quite strange that a software giant like Microsoft asked a geometry question, it wasn't strange at all. The real motive could have been to check whether the candidate has good analytical skills and levelheadedness, both of which are quite essential in the programming world.

After all the confusion, I must make it clear that the altitude to the hypotenuse is given as 6.
EDIT1: With a little thinking outside of the box, Job Bouwman showed that such a triangle may actually exist, although (according to his own words) you take a risk of coming across as a smart-ass...
EDIT2: I just stumbled upon a question which too was probably asked in a Microsoft interview(not sure) and I thought it would be nice to share it with you people. Here’s the link:

This incident took place with one of my friends during an interview for a technical position.
I- Interviewer, C- Candidate
I: There is a circular race-track of diameter 1 km. Two cars A and B are standing on the track diametrically opposite to each other. They are both facing in the clockwise direction. At t=0, both cars start moving at a constant acceleration of 0.1 m/s/s (initial velocity zero). Since both of them are moving at same speed and acceleration and clockwise direction, they will always remain diametrically opposite to each other throughout their motion.
At the center of the race-track there is a bug. At t=0, the bug starts to fly towards car A. When it reaches car A, it turn around and starts moving towards car B. When it reaches B, it again turns back and starts moving towards car A. It keeps repeating the entire cycle. The speed of the bug is 1 m/s throughout.
After 1 hour, all 3 bodies stop moving. What is the total distance traveled by the bug?
How would you, the reader, approach this problem?
First of all here is a graphic to help you visualize initial condition.
Now, let’s try to visualize the path of the bug. The question states that it will always be moving towards one of the cars. But the cars themselves are moving, so the bug’s path would not be a straight line. It would be a complicated spiral-like path. Plus, the cars are not moving at constant velocity; they are accelerating, which further complicates the spiral path.
So, the approach is clear. We need to find mathematical equation corresponding to bug’s path for one cycle. Then we can simply calculate the distance from this equation and a little integral calculus. Then multiply the answer with the number of cycles.
But how to calculate the equation of the complicated spiral path?
At this point my friend simply gave up.
The interviewer encouraged him to at least tell his approach. My friend explained above approach.
At this, the interviewer replied - “Are you ready to hear my solution?”
My friend was more than eager.
The interviewer said - “Bug is traveling at a constant speed of 1 m/s throughout it’s motion. At this constant speed, he travels for 1 hour. So distance = speed x time = 1 m/s x 3600s = 3.6 km.”

My thoughts -
I think this question (and its solution) provides a profound description of life itself. Most of the time, we face situations that seem hopelessly complicated. There seems to be no way out.
But once you hack away the outer layers - parts that are unimportant, parts that are irrelevant to the final solution, parts that are only present there to complicate the situation, you reach the core. And the core is beautiful, with an elegance that is signified by its sheer simplicity.

Some people are asking for further explanation.
What happened here is that the question provided a lot of detail about the complicated interconnected motion of the three bodies. But all of that information is irrelevant to the final answer. The only important statement in the question that you have to identify is, “The speed of the bug is 1 m/s throughout”. This is the core. Once you identify that its speed was constant throughout, the actual path the bug took becomes irrelevant. No matter how complicated that path was, the total distance is still given by the simple equation “distance = speed x time”.
That was the real challenge. To be able to see through the outer layers and get to the core. That ability is what the interviewer was looking for.
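The interviewer's insight reduces to one line of arithmetic, which can be written out as code:

```java
// The bug's complicated spiral path is irrelevant: at constant speed,
// distance = speed * time.
public class BugDistance {
    static double distance(double speedMps, double seconds) {
        return speedMps * seconds;
    }

    public static void main(String[] args) {
        double d = distance(1.0, 3600.0); // 1 m/s for 1 hour
        System.out.println(d + " m = " + d / 1000 + " km");
    }
}
```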

http://www.mytechinterviews.com/globe-walker
Question: How many points are there on the globe where, by walking one mile south, then one mile east and then one mile north, you would reach the place where you started?
Answer: The trivial answer to this question is one point, namely, the North Pole. But if you think that answer should suffice, you might want to think again! :)
Let’s think this through methodically. If we consider the southern hemisphere, there is a ring near the South Pole that has a circumference of one mile. So what if we were standing at any point one mile north of this ring? If we walked one mile south, we would be on the ring. Then one mile east would bring us back to the same point on the ring (since its circumference is one mile). One mile north from that point would bring us back to the point where we started. If we count, there are an infinite number of points one mile north of this ring.
So what’s our running total of possible points? We have 1 + infinite points. But we’re not done yet!
Consider a ring that is half a mile in circumference near the South Pole. Walking a mile along this ring would cause us to circle twice, but still bring us to back to the point we started from. As a result, starting from a point that is one mile north of a half mile ring would also be valid. Similarly, for any positive integer n, there is a circle with radius
r = 1 / (2 * pi * n)
centered at the South Pole. Walking one mile along these rings would cause us to circle n times and return to the same point as we started. There are infinite possible values for n. Furthermore, there are infinite ways of determining a starting point that is one mile north of these n rings, thus giving us (infinity * infinity) possible points that satisfy the required condition.
So the real answer to this question is 1 + infinity * infinity = infinite possible points!
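The family of rings described above can be illustrated numerically: for each positive integer n, the ring near the South Pole has circumference 1/n mile, i.e. radius r = 1 / (2 * pi * n), so walking one mile east circles it exactly n times.

```java
// Numeric illustration of the South Pole rings: circumference 1/n mile
// means radius 1/(2*pi*n), and one mile of walking makes exactly n laps.
public class GlobeRings {
    static double radius(int n) {
        return 1.0 / (2 * Math.PI * n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            double circumference = 2 * Math.PI * radius(n);
            double laps = 1.0 / circumference; // laps made by walking one mile
            System.out.printf("n=%d r=%.4f miles, laps=%.1f%n", n, radius(n), laps);
        }
    }
}
```

(This treats each ring as a flat circle, which is an excellent approximation so close to the pole.)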
