Sunday, March 4, 2018

Mesos



/var/log/mesos/
/var/data/mesos

http://mesos.apache.org/documentation/latest/sandbox/#where-is-it
Mesos refers to the “sandbox” as a temporary directory that holds files specific to a single executor. Each time an executor is run, the executor is given its own sandbox and the executor’s working directory is set to the sandbox.

The sandbox is located on the agent, inside a directory tree like the following:
root ('--work_dir')
|-- slaves
|   |-- latest (symlink)
|   |-- <agent ID>
|       |-- frameworks
|           |-- <framework ID>
|               |-- executors
|                   |-- <executor ID>
|                       |-- runs
|                           |-- latest (symlink)
|                           |-- <container ID> (Sandbox!)
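
The directory tree above can be turned into a small path helper. This is a minimal sketch, not part of Mesos: the function name `sandbox_path` and the IDs in the usage line are hypothetical, and `/var/lib/mesos` is only assumed as the `--work_dir` value.

```python
from pathlib import PurePosixPath


def sandbox_path(work_dir, agent_id, framework_id, executor_id, container_id="latest"):
    """Build the sandbox path by following the agent directory tree.

    container_id defaults to "latest", the symlink under runs/.
    """
    return str(
        PurePosixPath(work_dir)
        / "slaves" / agent_id
        / "frameworks" / framework_id
        / "executors" / executor_id
        / "runs" / container_id
    )


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    print(sandbox_path("/var/lib/mesos", "agent-1", "framework-1", "executor-1"))
```

Passing an explicit container ID instead of the default points at one specific run rather than the most recent one.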

https://mesosphere.github.io/dcos-commons/
https://github.com/mesosphere/dcos-commons
https://github.com/dcos/examples.git

Install dcos using vagrant
https://yurisubach.com/2016/07/06/dcos-local-deployment/
https://github.com/dcos/dcos-vagrant
  1. Access the GUI http://m1.dcos/
vagrant up|destroy
https://github.com/dcos/dcos-cli/issues/679
If you are using Vagrant, the --user=vagrant switch works. Note that it still asks for a password; entering vagrant as the password works.
https://github.com/dcos/examples/tree/master/redis/1.10
dcos package install mr-redis
dcos node ssh --master-proxy --leader --user=vagrant
Use the API available at mrredis.mesos:5656 from within the DC/OS cluster. The following request creates a Redis instance named test with a memory capacity of 100 MB and 3 Redis slaves:
curl -X POST mrredis.mesos:5656/v1/CREATE/test/100/1/3
curl -s mrredis.mesos:5656/v1/STATUS/test | jq

docker run -it --rm redis:4-alpine redis-cli -h 10.0.1.5 -p 6380

curl -X DELETE mrredis.mesos:5656/v1/DELETE/test
dcos package uninstall mr-redis

https://github.com/dcos/examples/tree/master/elasticsearch/1.10
dcos package install elasticsearch
In the following we will use the DC/OS Admin Router to provide access to the Elasticsearch UI: use the URL http://$DCOS_DASHBOARD/service/elasticsearch/ and replace $DCOS_DASHBOARD with the URL of your DC/OS UI:

curl -XPUT 192.168.65.111:1025/customer?pretty

curl 192.168.65.111:1025/_cat/indices?v

curl -XPUT 192.168.65.111:1025/customer/external/1?pretty -d '
{
   "name": "Dave C Os"
}'
dcos package uninstall elasticsearch
https://gist.github.com/benclarkwood/77dea418bcac21a91d4a28e59b489af1


http://blog.dataart.com/getting-started-with-packages-in-dcos/

https://docs.mesosphere.com/1.11/tutorials/dcos-101/
https://mesosphere.github.io/marathon/docs/persistent-volumes.html
You can create a stateful application by specifying a local persistent volume. Local volumes enable stateful tasks because tasks can be restarted without data loss.
When you specify a local volume or volumes, tasks and their associated data are “pinned” to the node they are first launched on and will be relaunched on that node if they terminate. The resources the application requires are also reserved. Marathon will implicitly reserve an appropriate amount of disk space (as declared in the volume via persistent.size) in addition to the sandbox disk size you specify as part of your application definition.



    {
      "containerPath": "data",
      "mode": "RW",
      "persistent": {
        "type": "root",
        "size": 10,
        "constraints": []
      }
    }
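
The disk accounting described above (persistent volume size reserved in addition to the sandbox disk) can be sketched as follows. This is an illustration under assumptions, not a complete deployable app definition: the app id, `cmd`, and resource numbers are hypothetical, and the helper names are not part of Marathon.

```python
def persistent_volume(container_path, size):
    """Return a local persistent volume entry shaped like the fragment above."""
    return {
        "containerPath": container_path,
        "mode": "RW",
        "persistent": {"type": "root", "size": size, "constraints": []},
    }


def total_reserved_disk(app):
    """Sandbox disk plus the size of every persistent volume in the app."""
    volumes = app.get("container", {}).get("volumes", [])
    persistent = sum(v["persistent"]["size"] for v in volumes if "persistent" in v)
    return app.get("disk", 0) + persistent


# Hypothetical app definition, for illustration only.
app = {
    "id": "/stateful-demo",
    "cmd": "sleep 3600",
    "cpus": 0.5,
    "mem": 128,
    "disk": 50,  # sandbox disk size
    "container": {"type": "MESOS", "volumes": [persistent_volume("data", 10)]},
}

if __name__ == "__main__":
    print(total_reserved_disk(app))
```

With a sandbox disk of 50 and one volume of size 10, the helper reports 60 as the total disk that would be reserved.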
    
    https://docs.mesosphere.com/1.7/usage/tutorials/wordpress-mysql/
    https://docs.mesosphere.com/services/elastic/2.2.0-5.6.5/install/

    http://mesos.apache.org/documentation/latest/deploy-scripts/
    One particularly useful setting is LIBPROCESS_IP, which tells the master and agent binaries which IP address to bind to. In some installations, the default interface that the hostname resolves to is not the machine’s external IP address, so you can set the right IP through this variable.

    https://github.com/uzyexe/mesos-marathon-demo/blob/master/docker-compose.yml - works
    http://mesos.apache.org/documentation/latest/endpoints/

       /slaves
       /master/slaves
       /health
       /master/health
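
A minimal sketch for polling the endpoints listed above. It assumes a master reachable on port 5050 (the port used elsewhere in these notes) and treats an HTTP 200 from a health endpoint as "healthy"; the function names are hypothetical.

```python
from urllib.error import URLError
from urllib.parse import urlunsplit
from urllib.request import urlopen

ENDPOINTS = ("/slaves", "/master/slaves", "/health", "/master/health")


def endpoint_urls(host="localhost", port=5050):
    """Build full URLs for the master endpoints listed in the notes."""
    return [urlunsplit(("http", f"{host}:{port}", path, "", "")) for path in ENDPOINTS]


def is_healthy(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        return False


if __name__ == "__main__":
    for url in endpoint_urls():
        print(url)
```

Calling `is_healthy("http://localhost:5050/health")` needs a running master; without one it simply returns False.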
    https://hub.docker.com/r/mesosphere/mesos-slave-dind/
    • --privileged=true - Provides access to cgroups

    Recommended Environment Variables

    • MESOS_CONTAINERIZERS - Include docker to enable running tasks as docker containers. Ex: docker,mesos
    • MESOS_RESOURCES - Specify resources to avoid oversubscribing via auto-detecting host resources. Ex: cpus:4;mem:1280;disk:25600;ports:[21000-21099]
    • DOCKER_NETWORK_OFFSET - Specify an IP offset to give each mesos-slave-dind container (default: 0.0.1.0). Ex: 0.0.1.0 (slave A), 0.0.2.0 (slave B)
    • DOCKER_NETWORK_SIZE - Specify a CIDR range to apply to the above offset (default=24).
    • VAR_LIB_DOCKER_SIZE - Specify the max size (in GB) of the loop device to be mounted at /var/lib/docker (default=5). This is only used if OverlayFS is not supported by the kernel or the parent docker is configured to use AUFS.
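
The MESOS_RESOURCES example above uses a `name:value;name:value` format. The sketch below parses just that subset (scalars and `[a-b]` ranges) so the value can be sanity-checked before starting an agent; it is a hypothetical helper and does not cover the rest of the Mesos resource syntax, such as sets or roles.

```python
import re


def parse_resources(spec):
    """Parse 'cpus:4;mem:1280;ports:[21000-21099]' into a dict.

    Scalars become floats; ranges become a list of (low, high) int tuples.
    """
    resources = {}
    for item in filter(None, (part.strip() for part in spec.split(";"))):
        name, _, value = item.partition(":")
        if value.startswith("["):
            resources[name] = [
                (int(low), int(high))
                for low, high in re.findall(r"(\d+)-(\d+)", value)
            ]
        else:
            resources[name] = float(value)
    return resources


if __name__ == "__main__":
    print(parse_resources("cpus:4;mem:1280;disk:25600;ports:[21000-21099]"))
```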

    https://github.com/dcos/dcos-docker#network-routing-docker-for-mac
    HyperKit (the hypervisor used by Docker for Mac) does not currently support IP routing on Mac.
    Use one of the following alternative solutions instead:



    https://docs.mesosphere.com/1.11/deploying-services/expose-service/
    Create a Marathon app definition with the required "acceptedResourceRoles":["slave_public"] parameter specified.

    All other users: You can use Marathon-LB, a rapid proxy and load balancer that is based on HAProxy.
    https://mesosphere.github.io/marathon/docs/persistent-volumes.html
    https://docs.mesosphere.com/1.7/usage/tutorials/marathon/stateful-services/
    You’ll notice that we’re creating a volume for postgres to use for its data. Even if the task dies and restarts, it will get that volume back. Next, add this service to your cluster:
    dcos marathon app add /1.7/usage/tutorials/marathon/stateful-services/postgres.marathon.json
    
    
    Once the service has been scheduled and the docker container has downloaded, postgres will become healthy and be ready to use. You can see this by checking out what tasks are running on your cluster:
    dcos marathon task list
    

    dcos marathon app stop postgres
    

    dcos marathon app start postgres
    
    To restore the state of your cluster as it was before installing the stateful service, you delete the service:
    dcos marathon app remove postgres
    
    https://www.youtube.com/watch?v=kDVBRh9J2Ys

    https://www.reddit.com/r/devops/comments/6vft4j/main_differences_between_apache_mesos_vs/
    DCOS is highly opinionated in how you need to run your services.
    Mesos doesn't give a shit, it's pretty modular and can be plugged into any scheduler you want, along with whatever service discovery architecture.
    DCOS you pretty much have to do it their way else you're fighting against it, similar to Kubernetes.

    http://mesosframeworks.com/
    https://github.com/mesos/elasticsearch
    https://github.com/adamtulinius/mesos-solrcloud

    https://www.digitalocean.com/community/tutorials/an-introduction-to-mesosphere
    • Master daemon: runs on a master node and manages slave daemons
    • Slave daemon: runs on a slave node and runs tasks that belong to frameworks
    • Framework: also known as a Mesos application, is composed of a scheduler, which registers with the master to receive resource offers, and one or more executors, which launch tasks on slaves. Examples of Mesos frameworks include Marathon, Chronos, and Hadoop
    • Offer: a list of a slave node's available CPU and memory resources. All slave nodes send offers to the master, and the master provides offers to registered frameworks
    • Task: a unit of work that is scheduled by a framework, and is executed on a slave node. A task can be anything from a bash command or script, to an SQL query, to a Hadoop job
    • Apache ZooKeeper: software that is used to coordinate the master nodes
    https://docs.ovh.com/fr/docker/quick-start-with-marathon/
    https://mesosphere.github.io/marathon/docs/application-basics.html

    http://blog.csdn.net/pelick/article/details/45652117
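    Marathon depends on ZooKeeper and Mesos; if you do not have a Mesos cluster, you can run it in local mode.
    First download the ZooKeeper package, edit conf/zoo.cfg, and then run:
    bin/zkServer.sh start
    This starts a single ZooKeeper server on localhost:2181. Marathon uses ZooKeeper for election among multiple replicas of the same app, so that after an app fails Marathon can restart it on a new Mesos slave. ZooKeeper is also used for the HA mode of the Mesos cluster. In other words, ZooKeeper provides HA for both Mesos and Marathon, but under separate node paths, as the startup parameters below show.

    I started one master and one slave locally in the same way:
    /usr/local/sbin/mesos-master --registry=in_memory --ip=127.0.0.1
    /usr/local/sbin/mesos-slave --master=127.0.0.1:5050
    The Mesos master UI is then available at localhost:5050.
    Download the Marathon package. I used version 0.8.0, which supports Mesos 0.20.0 and later. Once downloaded and unpacked, it is ready to use: https://mesosphere.github.io/marathon/
    ./bin/start --master 127.0.0.1:5050 --zk zk://localhost:2181/marathon
    This connects to the locally started Mesos master and ZooKeeper. The Marathon UI is available at localhost:8080.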
    MESOS_NATIVE_JAVA_LIBRARY=/usr/local/Cellar/mesos/1.4.1/lib/libmesos.dylib
    ./marathon --master 127.0.0.1:5050 --zk zk://localhost:2181/marathon


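    Compared with Apache Aurora, another service-scheduling framework on Mesos, Marathon is easier to get started with. It is written in Scala and, like Mesos, feels lightweight overall. What it mainly provides is Google-scale capability and convenient app management.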

    http://xialingsc.github.io/home//mesos/How-to-install-Mesos-On-Mac/
    In practice, you also need to give the current user ownership of /var/lib/mesos; otherwise you get the error “/var/lib/mesos/replicated_log/LOCK: Permission denied Failed to recover the log”. Change the ownership with:
    sudo chown `whoami` /var/lib/mesos
    https://medium.com/@GetLevvel/how-to-get-started-with-apache-mesos-marathon-and-zookeeper-24fb72d76cf9



    https://medium.com/@gargar454/deploy-a-mesos-cluster-with-7-commands-using-docker-57951e020586

    https://github.com/sekka1/mesosphere-docker
    1.   HOST_IP=$(docker-machine ip default)
    https://github.com/sekka1/mesosphere-docker/issues/11


    https://mesosphere.com/blog/installing-mesos-on-your-mac-with-homebrew/
    brew update
    brew install mesos
    
    /usr/local/Cellar/mesos/0.19.0: 83 files, 24M, built in 17.4 minutes
    
    /usr/local/sbin/mesos-master --registry=in_memory --ip=127.0.0.1
    
    A Mesos cluster needs at least one Mesos Master to coordinate and dispatch tasks onto Mesos Slaves. When experimenting on your laptop, a single master is all you need. Full production clusters, such as those you might run in a public cloud or in a private datacenter, will usually run Mesos in High Availability Mode. A highly-available Mesos cluster (designed for fault-tolerance with no single point of failure) will often have three or more masters running.
    Once your Mesos Master has started, you can visit its management console: http://localhost:5050
    Since a Mesos Master needs slaves onto which it will dispatch jobs, you might also want to run some of those. Mesos Slaves can be started by running the following command for each slave you wish to launch:
    /usr/local/sbin/mesos-slave --master=127.0.0.1:5050 --work_dir=~/mesos/slave1
    

    https://github.com/mesosphere/dcos-commons


    https://www.usenix.org/legacy/events/nsdi11/tech/full_papers/Ghodsi.pdf

    https://medium.com/@GetLevvel/how-to-get-started-with-apache-mesos-marathon-and-zookeeper-24fb72d76cf9

    http://mesos.readthedocs.io/en/stable/getting-started/
    https://platform9.com/blog/compare-kubernetes-vs-mesos/
    https://stackoverflow.com/questions/26705201/whats-the-difference-between-apaches-mesos-and-googles-kubernetes


    Apache Mesos abstracts CPU, memory, and disk resources in a way that allows datacenters to function as if they were one large machine.

    It has built-in support for isolating processes using containers, such as Linux control groups (cgroups) and Docker, allowing multiple applications to run alongside each other on a single machine.

    Key concepts: resource offers, two-tier scheduling, and resource isolation.
    Mesos uses resource offers to advertise resources to frameworks.

    resource scheduling is the responsibility of the Mesos master’s allocation module and the framework’s scheduler, a concept known as two-tier scheduling

    Dominant Resource Fairness (DRF)
    DRF seeks to maximize the minimum dominant share across all users. For example, if user A runs CPU-heavy tasks and user B runs memory-heavy tasks, DRF attempts to equalize user A’s share of CPUs with user B’s share of memory
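
The DRF idea above can be sketched as a progressive-filling loop: repeatedly give one task to the user with the smallest dominant share until nothing more fits. This is a simplified illustration, not the Mesos allocator; the cluster size and per-task demands are illustrative numbers, and the function name is hypothetical.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Allocate tasks by Dominant Resource Fairness (progressive filling).

    capacity: {resource: total amount}
    demands:  {user: {resource: amount needed per task}}
    Returns {user: number of tasks launched}.
    """
    used = {resource: 0 for resource in capacity}
    tasks = {user: 0 for user in demands}
    active = set(demands)

    def dominant_share(user):
        # Largest fraction of any single resource this user currently holds.
        return max(
            Fraction(tasks[user] * amount, capacity[resource])
            for resource, amount in demands[user].items()
        )

    while active:
        user = min(sorted(active), key=dominant_share)
        demand = demands[user]
        if all(used[r] + demand[r] <= capacity[r] for r in demand):
            for r in demand:
                used[r] += demand[r]
            tasks[user] += 1
        else:
            # This user's next task no longer fits; stop considering them.
            active.discard(user)
    return tasks


if __name__ == "__main__":
    capacity = {"cpus": 9, "mem": 18}
    demands = {
        "A": {"cpus": 3, "mem": 1},  # CPU-heavy tasks
        "B": {"cpus": 1, "mem": 4},  # memory-heavy tasks
    }
    print(drf_allocate(capacity, demands))
```

With these numbers user A ends up with 2 tasks (6 of 9 CPUs) and user B with 3 tasks (12 of 18 memory units), so both dominant shares are 2/3.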

    Resource isolation
    Using Linux cgroups or Docker containers to isolate processes, Mesos allows for multitenancy, or for multiple processes to be executed on a single Mesos slave.

    When using cgroups, any packages or libraries that the tasks might depend on must be already present on the host operating system.

    The leading master is responsible for deciding which resources to offer to a particular framework using a pluggable allocation module, or scheduling algorithm, to distribute resource offers to the various schedulers. The scheduler can then either accept or reject the offer based on whether it has any work to be performed at that time.
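
The framework side of this exchange can be sketched as a function that looks at one offer and decides which pending tasks to launch on it. This is a simplified first-fit illustration with plain dictionaries, not the Mesos scheduler API; all names and numbers are hypothetical. An empty result corresponds to declining the offer.

```python
def fits(available, needed):
    """True if every needed resource amount is covered by what is available."""
    return all(available.get(name, 0) >= amount for name, amount in needed.items())


def handle_offer(offer, pending_tasks):
    """Return the names of pending tasks to launch on this offer (first-fit).

    offer:         {resource: amount offered}
    pending_tasks: [{"name": str, "resources": {resource: amount}}]
    """
    remaining = dict(offer)
    launched = []
    for task in pending_tasks:
        if fits(remaining, task["resources"]):
            for name, amount in task["resources"].items():
                remaining[name] = remaining.get(name, 0) - amount
            launched.append(task["name"])
    return launched


if __name__ == "__main__":
    offer = {"cpus": 4, "mem": 1280}
    pending = [
        {"name": "a", "resources": {"cpus": 1, "mem": 512}},
        {"name": "b", "resources": {"cpus": 1, "mem": 1024}},
        {"name": "c", "resources": {"cpus": 2, "mem": 256}},
    ]
    print(handle_offer(offer, pending))
```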

    --attributes='datacenter:pdx1;rack:1-1;os:rhel7'
    --resources='cpus:24;mem:24576;disk:409600'

    A framework is the term given to any Mesos application that’s responsible for scheduling and executing tasks on a cluster. A framework is made up of two components: a scheduler and an executor.

    A scheduler is typically a long-running service responsible for connecting to a Mesos master and accepting or rejecting resource offers. Mesos delegates the responsibility of scheduling to the framework.

    An executor is a process launched on a Mesos slave that runs a framework’s tasks.

    A Spark driver program connects to a cluster manager (the Spark master) that in turn distributes tasks to various worker nodes.

    The Spark Driver refers to the machine running the Spark job, and the SparkContext is the main entry point to Spark. The SparkContext is responsible for connecting to a cluster manager and running tasks on the cluster.

