YARN Essentials
YARN (Yet Another Resource Negotiator) is a generic platform for managing resources in a cluster.
YARN enables multiple applications to run simultaneously on the same shared cluster and lets each application negotiate resources according to its needs. Resource allocation and management are central to YARN.
In Hadoop 1.x, a single JobTracker service was overloaded with many responsibilities, such as cluster resource management, job scheduling, managing computational resources, restarting failed tasks, and monitoring TaskTrackers.
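Key features of YARN:
Locality awareness: tasks are scheduled close to the data they read, ideally on a node that stores that data.
Multitenancy: multiple users and applications share the same cluster.
Support for more data processing models: besides MapReduce, other frameworks such as Spark can run on YARN.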
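Locality awareness can be illustrated with a small placement function. This is a minimal sketch, not YARN's scheduler code: the `Node` class and `choose_node` function are hypothetical, and only memory is modeled. It prefers a node that holds the task's data (node-local), then a node on the same rack as such a node (rack-local), and finally any node with room (off-switch).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node: hostname, rack name, and free memory in MB."""
    host: str
    rack: str
    free_mb: int


def choose_node(nodes, preferred_hosts, request_mb):
    """Pick a node for one task, preferring data locality.

    preferred_hosts: hosts that store the task's input data (an assumption of
    this sketch). Returns (node, locality_level), or (None, None) if no node
    has enough free memory.
    """
    fits = [n for n in nodes if n.free_mb >= request_mb]
    # Racks that contain at least one host holding the data.
    preferred_racks = {n.rack for n in nodes if n.host in preferred_hosts}
    levels = [
        ("node-local", lambda n: n.host in preferred_hosts),
        ("rack-local", lambda n: n.rack in preferred_racks),
        ("off-switch", lambda n: True),  # any node with enough memory
    ]
    for level, acceptable in levels:
        for node in fits:
            if acceptable(node):
                return node, level
    return None, None


nodes = [Node("h1", "r1", 512), Node("h2", "r1", 4096), Node("h3", "r2", 4096)]
node, level = choose_node(nodes, {"h1"}, request_mb=1024)
print(node.host, level)  # h2 rack-local (h1 holds the data but lacks memory)
```

Real YARN schedulers can also wait briefly for a better locality level before falling back (delay scheduling); this sketch always takes the best level available immediately.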
YARN separates the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons: a cluster-level ResourceManager (RM) and a per-application ApplicationMaster (AM).
YARN follows a master-slave architecture: the ResourceManager is the master, and a NodeManager (NM) runs on each node as the slave.
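The master-slave relationship can be pictured as the ResourceManager keeping a registry that NodeManagers refresh through periodic heartbeats. The sketch below is a hypothetical simplification (the `ResourceManagerView` class, the expiry value, and tracking only free memory are all assumptions): a node whose last heartbeat is older than the expiry window is treated as lost, and its capacity no longer counts toward the cluster total.

```python
class ResourceManagerView:
    """Hypothetical RM-side registry of NodeManagers, fed by heartbeats."""

    def __init__(self, expiry_s):
        self.expiry_s = expiry_s  # assumed liveness timeout in seconds
        self.nodes = {}           # host -> (time of last heartbeat, free memory in MB)

    def heartbeat(self, host, free_mb, now):
        """Record a NodeManager heartbeat reporting its current free memory."""
        self.nodes[host] = (now, free_mb)

    def live_nodes(self, now):
        """Hosts whose most recent heartbeat falls within the expiry window."""
        return sorted(
            host for host, (last, _) in self.nodes.items()
            if now - last <= self.expiry_s
        )

    def cluster_free_mb(self, now):
        """Free memory summed over live NodeManagers only."""
        return sum(self.nodes[host][1] for host in self.live_nodes(now))


rm = ResourceManagerView(expiry_s=10)
rm.heartbeat("nm1", free_mb=4096, now=0)
rm.heartbeat("nm2", free_mb=2048, now=5)
print(rm.live_nodes(now=12), rm.cluster_free_mb(now=12))  # ['nm2'] 2048
```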
The ResourceManager is the supervisor component that manages resources among all applications in the system. The per-application ApplicationMaster negotiates resources from the ResourceManager and works hand in hand with the NodeManagers to execute and monitor the application's tasks.
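This division of labor can be traced in a toy model: the ApplicationMaster asks the ResourceManager for containers, the ResourceManager grants them on NodeManagers with spare memory, and the ApplicationMaster then asks each NodeManager to launch a task. All class and method names here are hypothetical and do not correspond to YARN's Java API; containers are plain dictionaries and only memory is modeled.

```python
class NodeManager:
    """Hypothetical per-node agent that launches tasks in granted containers."""

    def __init__(self, host, free_mb):
        self.host = host
        self.free_mb = free_mb
        self.running = []  # (container id, task) pairs launched on this node

    def launch(self, container, task):
        self.running.append((container["id"], task))


class ResourceManager:
    """Hypothetical RM that grants containers by reserving memory on NodeManagers."""

    def __init__(self, node_managers):
        self.node_managers = node_managers
        self.next_id = 0

    def allocate(self, count, mb):
        """Grant up to `count` containers of `mb` MB each, filling nodes in order."""
        granted = []
        for nm in self.node_managers:
            while len(granted) < count and nm.free_mb >= mb:
                nm.free_mb -= mb
                self.next_id += 1
                granted.append({"id": self.next_id, "nm": nm, "mb": mb})
        return granted


class ApplicationMaster:
    """Hypothetical per-application master: negotiates with the RM, then uses NMs."""

    def __init__(self, rm):
        self.rm = rm

    def run(self, tasks, mb_per_task):
        containers = self.rm.allocate(len(tasks), mb_per_task)  # negotiate with the RM
        for container, task in zip(containers, tasks):          # launch via each NM
            container["nm"].launch(container, task)
        return len(containers)  # number of tasks actually launched


nm1, nm2 = NodeManager("nm1", 2048), NodeManager("nm2", 1024)
am = ApplicationMaster(ResourceManager([nm1, nm2]))
print(am.run(["t1", "t2", "t3", "t4"], mb_per_task=1024))  # 3
```

Tasks that do not receive a container are simply left unlaunched here; a real ApplicationMaster would keep asking the ResourceManager in later requests.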
Once an application is submitted, its ApplicationMaster is responsible for negotiating resources such as memory, CPU, and disk from the ResourceManager. It also tracks the application's status and monitors the application's processes in coordination with the NodeManagers.
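Because an ApplicationMaster asks for resources along more than one dimension, the scarcest dimension limits how many containers a node can host. The sketch below assumes a resource made of memory (MB) and virtual cores, loosely mirroring YARN's memory/vcores model; `Resource`, `containers_that_fit`, and `progress` are hypothetical helpers, and the task state names are illustrative. `progress` shows one simple way an ApplicationMaster could summarize the status of its tasks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical two-dimensional resource: memory in MB and virtual cores."""
    memory_mb: int
    vcores: int


def containers_that_fit(free: Resource, ask: Resource) -> int:
    """Number of `ask`-sized containers that fit in `free`; the scarcer dimension wins."""
    return min(free.memory_mb // ask.memory_mb, free.vcores // ask.vcores)


def progress(task_states: dict) -> float:
    """Fraction of tasks in the (illustrative) SUCCEEDED state, from 0.0 to 1.0."""
    if not task_states:
        return 0.0
    done = sum(1 for state in task_states.values() if state == "SUCCEEDED")
    return done / len(task_states)


node_free = Resource(memory_mb=8192, vcores=2)
print(containers_that_fit(node_free, Resource(memory_mb=2048, vcores=1)))  # 2 (CPU-bound)
```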
ResourceManager
ApplicationMaster
NodeManager
The NodeManager acts as a per-machine agent and is responsible for managing the life cycle of containers and for monitoring their resource usage.
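A NodeManager's two duties, driving each container through its life cycle and watching its resource usage, can be sketched as a small state machine plus a monitoring pass. The states below are a simplified, hypothetical subset (YARN's NodeManager uses a richer set), and the only check modeled is killing a running container whose memory use exceeds its limit, which is one kind of enforcement a NodeManager can perform.

```python
# Simplified, hypothetical container states and the transitions allowed between them.
TRANSITIONS = {
    "NEW": {"LOCALIZING", "KILLED"},
    "LOCALIZING": {"RUNNING", "KILLED"},  # files and jars being fetched to the node
    "RUNNING": {"COMPLETED", "KILLED"},
    "COMPLETED": set(),
    "KILLED": set(),
}


class Container:
    def __init__(self, cid, limit_mb):
        self.cid = cid
        self.limit_mb = limit_mb  # memory allocated to this container
        self.state = "NEW"

    def move_to(self, new_state):
        """Apply a transition, rejecting any that TRANSITIONS does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state


def monitor(containers, usage_mb):
    """Kill running containers whose reported memory use exceeds their limit.

    usage_mb maps container id -> current memory use in MB; returns killed ids.
    """
    killed = []
    for c in containers:
        if c.state == "RUNNING" and usage_mb.get(c.cid, 0) > c.limit_mb:
            c.move_to("KILLED")
            killed.append(c.cid)
    return killed


c1, c2 = Container("c1", limit_mb=1024), Container("c2", limit_mb=1024)
for c in (c1, c2):
    c.move_to("LOCALIZING")
    c.move_to("RUNNING")
print(monitor([c1, c2], {"c1": 900, "c2": 1500}))  # ['c2']
```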
Scheduler policies
The FIFO scheduler
The Fair scheduler
The Capacity scheduler
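The difference between the three policies shows up in how a fixed capacity is split among competing applications. The functions below are a hypothetical, single-resource simplification, not YARN's scheduler implementations: `fifo` serves applications in submission order, `fair` computes a max-min fair share (the basic idea behind fair sharing), and `capacity_sched` gives each queue a configured percentage of the cluster and runs FIFO inside it. The real Capacity and Fair schedulers add features such as hierarchical queues, weights, preemption, and letting a queue borrow idle capacity, none of which are modeled here.

```python
def fifo(capacity, demands):
    """Serve applications strictly in submission order (dict insertion order)."""
    alloc = {}
    for app, want in demands.items():
        give = min(want, capacity)
        alloc[app] = give
        capacity -= give
    return alloc


def fair(capacity, demands):
    """Max-min fair share: split capacity evenly among unsatisfied applications,
    redistributing whatever smaller applications do not need."""
    alloc = {app: 0 for app in demands}
    active = [app for app, want in demands.items() if want > 0]
    while active and capacity > 0:
        share = capacity // len(active)
        if share == 0:
            break  # leftover units smaller than one per app stay unassigned here
        for app in list(active):
            give = min(share, demands[app] - alloc[app])
            alloc[app] += give
            capacity -= give
            if alloc[app] == demands[app]:
                active.remove(app)
    return alloc


def capacity_sched(capacity, queue_percent, queue_demands):
    """Give each queue a fixed percentage of the cluster, then run FIFO inside it.

    queue_percent maps queue -> percent of capacity; queue_demands maps
    queue -> {app: demand}. Idle capacity is not lent to other queues here.
    """
    alloc = {}
    for queue, pct in queue_percent.items():
        alloc.update(fifo(capacity * pct // 100, queue_demands.get(queue, {})))
    return alloc


demands = {"a": 8, "b": 6}
print(fifo(10, demands))  # {'a': 8, 'b': 2}
print(fair(10, demands))  # {'a': 5, 'b': 5}
print(capacity_sched(10, {"prod": 70, "dev": 30}, {"prod": {"a": 8}, "dev": {"b": 6}}))  # {'a': 7, 'b': 3}
```

With 10 units and demands of 8 and 6, FIFO lets the first application take almost everything, the fair share splits evenly, and the capacity policy caps each queue at its configured percentage.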