Getting Started with Nomad

Nomad is an orchestration tool that can deploy and manage both containerized and non-containerized applications across on-premises and cloud environments.

Installation

This walkthrough was done on macOS Catalina Version 10.15.7.

First, install nomad with Homebrew.

$ brew install nomad
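
HashiCorp also publishes its own Homebrew tap; as an alternative (based on HashiCorp's documented install steps, not something used in this walkthrough), the following should install the same binary:

$ brew tap hashicorp/tap
$ brew install hashicorp/tap/nomad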

Confirm that it has been installed:

$ nomad -v
Nomad v1.1.3

dev mode

Start Nomad in dev mode. The other mode is normal mode, which allows the server and clients to be started separately and offers more flexible configuration.

Startup

Start the agent, which launches the server in dev mode.

$ nomad agent -dev
==> No configuration files loaded
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
            Bind Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
                Client: true
             Log Level: DEBUG
                Region: global (DC: dc1)
                Server: true
               Version: 1.1.3

==> Nomad agent started! Log data will stream in below:

    2021-08-08T00:29:53.496+0900 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
    2021-08-08T00:29:53.497+0900 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
    2021-08-08T00:29:53.497+0900 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-08-08T00:29:53.497+0900 [INFO]  agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
    2021-08-08T00:29:53.497+0900 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-08-08T00:29:53.497+0900 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-08-08T00:29:53.497+0900 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-08-08T00:29:53.497+0900 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-08-08T00:29:53.502+0900 [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:4647 Address:127.0.0.1:4647}]"
    2021-08-08T00:29:53.502+0900 [INFO]  nomad.raft: entering follower state: follower="Node at 127.0.0.1:4647 [Follower]" leader=
    2021-08-08T00:29:53.503+0900 [INFO]  nomad: serf: EventMemberJoin: 147ddac625fa.ant.amazon.com.global 127.0.0.1
    2021-08-08T00:29:53.504+0900 [INFO]  nomad: starting scheduling worker(s): num_workers=8 schedulers=[service, batch, system, _core]
    2021-08-08T00:29:53.504+0900 [INFO]  client: using state directory: state_dir=/private/var/folders/3x/2yhk0tsd2r7djkrt82cnhsv8gmbt28/T/NomadClient353248954
    2021-08-08T00:29:53.505+0900 [INFO]  nomad: adding server: server="147ddac625fa.ant.amazon.com.global (Addr: 127.0.0.1:4647) (DC: dc1)"
    2021-08-08T00:29:53.507+0900 [INFO]  client: using alloc directory: alloc_dir=/private/var/folders/3x/2yhk0tsd2r7djkrt82cnhsv8gmbt28/T/NomadClient116304081
    2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr: built-in fingerprints: fingerprinters=[arch, cni, consul, cpu, host, memory, network, nomad, signal, storage, vault, env_aws, env_gce, env_azure]
    2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr: CNI config dir is not set or does not exist, skipping: cni_config_dir=/opt/cni/config
    2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=consul period=15s
    2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr.cpu: detected cpu frequency: MHz=1400
    2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr.cpu: detected core count: cores=8
    2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr.cpu: detected reservable cores: cpuset=[]
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected and no speed specified by user, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=lo0 IP=127.0.0.1
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=lo0 IP=::1
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
    2021-08-08T00:29:53.615+0900 [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=vault period=15s
    2021-08-08T00:29:54.873+0900 [WARN]  nomad.raft: heartbeat timeout reached, starting election: last-leader=
    2021-08-08T00:29:54.874+0900 [INFO]  nomad.raft: entering candidate state: node="Node at 127.0.0.1:4647 [Candidate]" term=2
    2021-08-08T00:29:54.874+0900 [DEBUG] nomad.raft: votes: needed=1
    2021-08-08T00:29:54.874+0900 [DEBUG] nomad.raft: vote granted: from=127.0.0.1:4647 term=2 tally=1
    2021-08-08T00:29:54.874+0900 [INFO]  nomad.raft: election won: tally=1
    2021-08-08T00:29:54.874+0900 [INFO]  nomad.raft: entering leader state: leader="Node at 127.0.0.1:4647 [Leader]"
    2021-08-08T00:29:54.874+0900 [INFO]  nomad: cluster leadership acquired
    2021-08-08T00:29:54.879+0900 [INFO]  nomad.core: established cluster id: cluster_id=695b378c-f623-61ae-a0d8-d3281d4e8367 create_time=1628350194878909000
    2021-08-08T00:29:55.619+0900 [DEBUG] client.fingerprint_mgr.env_gce: could not read value for attribute: attribute=machine-type error="Get "http://169.254.169.254/computeMetadata/v1/instance/machine-type": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
    2021-08-08T00:29:55.619+0900 [DEBUG] client.fingerprint_mgr.env_gce: error querying GCE Metadata URL, skipping
    2021-08-08T00:29:57.624+0900 [DEBUG] client.fingerprint_mgr.env_azure: could not read value for attribute: attribute=compute/azEnvironment error="Get "http://169.254.169.254/metadata/instance/compute/azEnvironment?api-version=2019-06-04&format=text": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
    2021-08-08T00:29:59.629+0900 [DEBUG] client.fingerprint_mgr: detected fingerprints: node_attrs=[arch, cpu, host, network, nomad, signal, storage]
    2021-08-08T00:29:59.629+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
    2021-08-08T00:29:59.629+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
    2021-08-08T00:29:59.629+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=device
    2021-08-08T00:29:59.629+0900 [DEBUG] client.device_mgr: exiting since there are no device plugins
    2021-08-08T00:29:59.630+0900 [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=device
    2021-08-08T00:29:59.630+0900 [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=device
    2021-08-08T00:29:59.630+0900 [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=driver
    2021-08-08T00:29:59.630+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=raw_exec health=healthy description=Healthy
    2021-08-08T00:29:59.630+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=mock_driver health=healthy description=Healthy
    2021-08-08T00:29:59.630+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=exec health=undetected description="exec driver unsupported on client OS"
    2021-08-08T00:29:59.632+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=qemu health=undetected description=
    2021-08-08T00:29:59.633+0900 [DEBUG] client.server_mgr: new server list: new_servers=[127.0.0.1:4647] old_servers=[]
    2021-08-08T00:29:59.687+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=docker health=healthy description=Healthy
    2021-08-08T00:29:59.825+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=java health=healthy description=Healthy
    2021-08-08T00:29:59.825+0900 [DEBUG] client.driver_mgr: detected drivers: drivers="map[healthy:[raw_exec mock_driver docker java] undetected:[exec qemu]]"
    2021-08-08T00:29:59.825+0900 [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=driver
    2021-08-08T00:29:59.825+0900 [INFO]  client: started client: node_id=7cd596ff-0354-861c-9fd0-2467e2f2237a
    2021-08-08T00:29:59.827+0900 [DEBUG] client: updated allocations: index=1 total=0 pulled=0 filtered=0
    2021-08-08T00:29:59.829+0900 [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=0
    2021-08-08T00:29:59.829+0900 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=0 errors=0
    2021-08-08T00:29:59.830+0900 [INFO]  client: node registration complete
    2021-08-08T00:29:59.831+0900 [DEBUG] client: state updated: node_status=ready
    2021-08-08T00:30:00.830+0900 [DEBUG] client: state changed, updating node and re-registering
    2021-08-08T00:30:00.831+0900 [INFO]  client: node registration complete

Work in a separate terminal

$ mkdir nomadtest && cd $_
$ nomad job init
Example job file written to example.nomad

A file named example.nomad is created; it is shown below with the comments removed. Running nomad job init with the -short option generates a job file without comments in the first place (see the command sketch after the listing below).

job "example" {
  datacenters = ["dc1"]

  group "cache" {
    network {
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"

        ports = ["db"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
} 
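
For reference, the -short option mentioned above would be used like this (a sketch; the output message mirrors the one shown earlier):

$ nomad job init -short
Example job file written to example.nomad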

To use a fixed port, change the port specification under network from to to static (see the sketch after the parameter descriptions below). However, static ports are basically recommended only for system jobs or specialized jobs such as load balancers.

port Parameters

static (int: nil) - Specifies the static TCP/UDP port to allocate. If omitted, a dynamic port is chosen. We do not recommend using static ports, except for system or specialized jobs like load balancers.
to (string:nil) - Applicable when using "bridge" mode to configure port to map to inside the task's network namespace. Omitting this field or setting it to -1 sets the mapped port equal to the dynamic port allocated by the scheduler. The NOMAD_PORT_<label> environment variable will contain the to value.
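
As a minimal sketch of the static variant described above (an illustration only; it is not used in this walkthrough), the network block would change roughly like this:

    network {
      port "db" {
        static = 6379   # fixed host port instead of a dynamically allocated one
      }
    }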

The Deployment is shown as in progress and becomes successful after a while. If the deployment does not complete normally, it becomes failed.

$ nomad job run example.nomad
==> 2021-08-08T00:30:07+09:00: Monitoring evaluation "2f88c6c1"
    2021-08-08T00:30:07+09:00: Evaluation triggered by job "example"
==> 2021-08-08T00:30:08+09:00: Monitoring evaluation "2f88c6c1"
    2021-08-08T00:30:08+09:00: Evaluation within deployment: "0aa67e40"
    2021-08-08T00:30:08+09:00: Allocation "38353085" created: node "7cd596ff", group "cache"
    2021-08-08T00:30:08+09:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-08T00:30:08+09:00: Evaluation "2f88c6c1" finished with status "complete"
==> 2021-08-08T00:30:08+09:00: Monitoring deployment "0aa67e40"
  ✓ Deployment "0aa67e40" successful
    
    2021-08-08T00:30:36+09:00
    ID          = 0aa67e40
    Job ID      = example
    Job Version = 0
    Status      = successful
    Description = Deployment completed successfully
    
    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    cache       1        1       1        0          2021-08-08T00:40:35+09:00

Play

Check the list of jobs. Here you can see example, the job defined in the example.nomad file.

$ nomad job status
ID       Type     Priority  Status   Submit Date
example  service  50        running  2021-08-08T00:30:07+09:00

Specifying the job name example as an argument shows more detail. The Allocation ID can be found in the Allocations section; it is the same ID shown on the Allocation line of the output when nomad job run example.nomad was executed.

$ nomad job status example
ID            = example
Name          = example
Submit Date   = 2021-08-08T00:30:07+09:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         1        0       0         0

Latest Deployment
ID          = 0aa67e40
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
cache       1        1       1        0          2021-08-08T00:40:35+09:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
38353085  7cd596ff  cache       0        run      running  8m13s ago  7m45s ago

Running the following command with the Allocation ID as an argument shows the details of the resource allocation. There appears to be a known issue where the Addresses column under Task Resources shows no IP address or port number. Alternative ways to find the address for connecting to the container are covered later.

[feature] Expose contents of DriverNetwork via API #3285

$ nomad alloc status 38353085
ID                  = 38353085-0a43-cdcd-6c07-2697b5dfbf83
Eval ID             = 2f88c6c1
Name                = example.cache[0]
Node ID             = 7cd596ff
Node Name           = 147ddac625fa.ant.amazon.com
Job ID              = example
Job Version         = 0
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 8m33s ago
Modified            = 8m5s ago
Deployment ID       = 0aa67e40
Deployment Health   = healthy

Allocation Addresses
Label  Dynamic  Address
*db    yes      127.0.0.1:30624 -> 6379

Task "redis" is "running"
Task Resources
CPU         Memory           Disk     Addresses
16/500 MHz  748 KiB/256 MiB  300 MiB  

Task Events:
Started At     = 2021-08-07T15:30:25Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2021-08-08T00:30:25+09:00  Started     Task started by client
2021-08-08T00:30:07+09:00  Driver      Downloading image
2021-08-08T00:30:07+09:00  Task Setup  Building Task Directory
2021-08-08T00:30:07+09:00  Received    Task received by client

Logs can also be checked by specifying the same Allocation ID. As an additional argument you can also specify the task name defined in the nomad file, e.g. nomad alloc logs 38353085 redis.

$ nomad alloc logs 38353085
1:C 07 Aug 15:30:25.450 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 3.2.12 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

1:M 07 Aug 15:30:25.451 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 07 Aug 15:30:25.451 # Server started, Redis version 3.2.12
1:M 07 Aug 15:30:25.451 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 07 Aug 15:30:25.451 * The server is now ready to accept connections on port 6379

You can also operate Nomad from a browser through the GUI by accessing http://localhost:4646/.

Connecting to the container

Earlier, when nomad alloc status <ALLOCATION_ID> was executed, no IP address or port number was visible in the Addresses column under Task Resources, so here are alternative ways to check them.

First, check the container ID with the following command.

$ docker ps
CONTAINER ID   IMAGE       COMMAND                  CREATED         STATUS         PORTS                                                  NAMES
c08ced4cdb8d   redis:3.2   "docker-entrypoint.s…"   7 minutes ago   Up 7 minutes   127.0.0.1:25343->6379/tcp, 127.0.0.1:25343->6379/udp   redis-433ee0d0-b2a8-a617-6d8d-05a575c7523c

Specify the container ID you found (c08ced4cdb8d) or the container name (redis-433ee0d0-b2a8-a617-6d8d-05a575c7523c) and run the following; this shows the dynamically mapped port number.

$ docker port c08ced4cdb8d      
6379/tcp -> 127.0.0.1:25343
6379/udp -> 127.0.0.1:25343

The port can also be obtained as follows.

$ docker inspect c08ced4cdb8d --format='{{ (index (index .NetworkSettings.Ports "6379/tcp") 0) }}'
map[HostIp:127.0.0.1 HostPort:25343]
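
If jq is available (an assumption; it is not used elsewhere in this article), the host port can also be extracted directly from the inspect JSON:

$ docker inspect c08ced4cdb8d | jq -r '.[0].NetworkSettings.Ports["6379/tcp"][0].HostPort'
25343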

You can confirm that the connection works.

$ redis-cli -p 25343
127.0.0.1:25343> INFO
# Server
redis_version:3.2.12
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:b0df607ad3315254
redis_mode:standalone
os:Linux 4.19.121-linuxkit x86_64
arch_bits:64
:
# Cluster
cluster_enabled:0

# Keyspace

Changing the configuration file

As a test, change count under group in the .nomad file from 1 to 3, as shown in the sketch below. The diff against the current state can then be checked as follows.
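
A sketch of the modified group block (count is an addition here; the generated file omits it and defaults to 1):

  group "cache" {
    count = 3   # changed from the default of 1

    # the network and task stanzas stay the same as in the listing above
  }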

$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (2 create, 1 in-place update)
  +/- Count: "1" => "3" (forces create)
  +/- Task: "redis" (forces in-place update)

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "cache" (failed to place 2 allocations):
    * Resources exhausted on 1 nodes
    * Dimension "network: reserved port collision db=6379" exhausted on 1 nodes

Job Modify Index: 61
To submit the job with version verification run:

nomad job run -check-index 61 example.nomad

When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

Running nomad job run example.nomad as before, you can confirm that three containers are now running.

$ docker ps
CONTAINER ID   IMAGE       COMMAND                  CREATED          STATUS          PORTS                                                  NAMES
4a8c9a46dac1   redis:3.2   "docker-entrypoint.s…"   18 seconds ago   Up 17 seconds   127.0.0.1:29006->6379/tcp, 127.0.0.1:29006->6379/udp   redis-3965372e-177d-d03a-258b-a4c8c3a3cefc
e10c1dd8554d   redis:3.2   "docker-entrypoint.s…"   30 seconds ago   Up 29 seconds   127.0.0.1:23744->6379/tcp, 127.0.0.1:23744->6379/udp   redis-30dbcc89-d4d4-9125-12e1-97d9ee81a578
5e7abae7621a   redis:3.2   "docker-entrypoint.s…"   30 seconds ago   Up 29 seconds   127.0.0.1:25631->6379/tcp, 127.0.0.1:25631->6379/udp   redis-8d5be185-c241-c5f3-ed3f-cb4b0ccd68ca

You can also confirm that each container has been assigned its own port number on the host-side network interface. Conversely, if you specify the port with static instead of to under network in the .nomad file, the port numbers collide on the host side, the containers cannot be set up, and nomad job run <.nomad file> never progresses past in progress.

$ docker port 4a8c9a46dac1
6379/tcp -> 127.0.0.1:29006
6379/udp -> 127.0.0.1:29006
$ docker port e10c1dd8554d
6379/tcp -> 127.0.0.1:23744
6379/udp -> 127.0.0.1:23744
$ docker port 5e7abae7621a
6379/tcp -> 127.0.0.1:25631
6379/udp -> 127.0.0.1:25631

Stop

The job can be stopped as follows.

$ nomad job stop example
==> 2021-08-08T12:51:53+09:00: Monitoring evaluation "8642a80f"
    2021-08-08T12:51:53+09:00: Evaluation triggered by job "example"
==> 2021-08-08T12:51:54+09:00: Monitoring evaluation "8642a80f"
    2021-08-08T12:51:54+09:00: Evaluation within deployment: "18860b6e"
    2021-08-08T12:51:54+09:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-08T12:51:54+09:00: Evaluation "8642a80f" finished with status "complete"
==> 2021-08-08T12:51:54+09:00: Monitoring deployment "18860b6e"
  ✓ Deployment "18860b6e" successful
    
    2021-08-08T12:51:54+09:00
    ID          = 18860b6e
    Job ID      = example
    Job Version = 7
    Status      = successful
    Description = Deployment completed successfully
    
    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    cache       3        3       3        0          2021-08-08T12:49:00+09:00

The status then looks like this.

$ nomad job status
ID       Type     Priority  Status          Submit Date
example  service  50        dead (stopped)  2021-08-08T12:38:37+09:00

Normal mode

Earlier everything was set up in dev mode; this time we use normal mode. Normal mode allows the server and clients to be started separately and offers a variety of flexible settings. The setup here is one server and three clients.

I used the HashiCorp Nomad Workshop as a reference.

$ MY_PATH=$(pwd)

Create an .hcl file for the Nomad server.

$ cat << EOF > nomad-local-config-server.hcl
data_dir  = "${MY_PATH}/local-nomad-data"

bind_addr = "127.0.0.1"

server {
  enabled          = true
  bootstrap_expect = 1
}

advertise {
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1"
}
EOF

Create an .hcl file for a Nomad client.

$ cat << EOF > nomad-local-config-client-1.hcl

data_dir  = "${MY_PATH}/local-cluster-data-1"

bind_addr = "127.0.0.1"

client {
  enabled = true
  servers = ["127.0.0.1:4647"]
}

advertise {
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1"
}

ports {
  http = 5641
  rpc  = 5642
  serf = 5643
}
EOF

Configure the remaining two clients in the same way, giving each its own http/rpc/serf ports so the agents do not collide on the same host.

$ cat << EOF > nomad-local-config-client-2.hcl

data_dir  = "${MY_PATH}/local-cluster-data-2"

bind_addr = "127.0.0.1"

client {
  enabled = true
  servers = ["127.0.0.1:4647"]
}

advertise {
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1"
}

ports {
  http = 5644
  rpc  = 5645
  serf = 5646
}
EOF
$ cat << EOF > nomad-local-config-client-3.hcl

data_dir  = "${MY_PATH}/local-cluster-data-3"

bind_addr = "127.0.0.1"

client {
  enabled = true
  servers = ["127.0.0.1:4647"]
}

advertise {
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1"
}

ports {
  http = 5647
  rpc  = 5648
  serf = 5649
}
EOF

Create a script to start everything.

$ cat << EOF > run.sh
#!/bin/sh
pkill nomad
pkill java

sleep 10

nomad agent -config=${MY_PATH}/nomad-local-config-server.hcl &

nomad agent -config=${MY_PATH}/nomad-local-config-client-1.hcl &
nomad agent -config=${MY_PATH}/nomad-local-config-client-2.hcl &
nomad agent -config=${MY_PATH}/nomad-local-config-client-3.hcl &
EOF

Run the startup script.

$ chmod +x run.sh
$ ./run.sh
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.                                                                                        
==> Loaded configuration from /Users/hayshogo/workspace/nomad-test/normal/nomad-local-config-server.hcl
==> Starting Nomad agent...
==> Loaded configuration from /Users/hayshogo/workspace/nomad-test/normal/nomad-local-config-client-1.hcl
==> Starting Nomad agent...
==> Loaded configuration from /Users/hayshogo/workspace/nomad-test/normal/nomad-local-config-client-3.hcl
==> Starting Nomad agent...
==> Loaded configuration from /Users/hayshogo/workspace/nomad-test/normal/nomad-local-config-client-2.hcl
==> Starting Nomad agent...
:~/workspace/nomad-test/normal/ ==> Nomad agent configuration:

       Advertise Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
            Bind Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
                Client: false
             Log Level: INFO
                Region: global (DC: dc1)
                Server: true
               Version: 1.1.3

==> Nomad agent started! Log data will stream in below:

    2021-08-08T12:58:41.139+0900 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/Users/hayshogo/workspace/nomad-test/normal/local-nomad-data/plugins
    2021-08-08T12:58:41.141+0900 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.141+0900 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.141+0900 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.141+0900 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.141+0900 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.141+0900 [INFO]  agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.628+0900 [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:4647 Address:127.0.0.1:4647}]"
    2021-08-08T12:58:41.628+0900 [INFO]  nomad.raft: entering follower state: follower="Node at 127.0.0.1:4647 [Follower]" leader=
    2021-08-08T12:58:41.631+0900 [INFO]  nomad: serf: EventMemberJoin: 147ddac625fa.ant.amazon.com.global 127.0.0.1
    2021-08-08T12:58:41.631+0900 [INFO]  nomad: starting scheduling worker(s): num_workers=8 schedulers=[batch, system, service, _core]
    2021-08-08T12:58:41.631+0900 [INFO]  nomad: adding server: server="147ddac625fa.ant.amazon.com.global (Addr: 127.0.0.1:4647) (DC: dc1)"
    2021-08-08T12:58:43.417+0900 [WARN]  nomad.raft: heartbeat timeout reached, starting election: last-leader=
    2021-08-08T12:58:43.417+0900 [INFO]  nomad.raft: entering candidate state: node="Node at 127.0.0.1:4647 [Candidate]" term=2
    2021-08-08T12:58:43.576+0900 [INFO]  nomad.raft: election won: tally=1
    2021-08-08T12:58:43.576+0900 [INFO]  nomad.raft: entering leader state: leader="Node at 127.0.0.1:4647 [Leader]"
    2021-08-08T12:58:43.576+0900 [INFO]  nomad: cluster leadership acquired
    2021-08-08T12:58:43.831+0900 [INFO]  nomad.core: established cluster id: cluster_id=4d6f6e28-5286-cdc8-f4ca-37f9bc3627fd create_time=1628395123785336000
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 127.0.0.1:5641
            Bind Addrs: HTTP: 127.0.0.1:5641
                Client: true
             Log Level: INFO
                Region: global (DC: dc1)
                Server: false
               Version: 1.1.3

==> Nomad agent started! Log data will stream in below:

    2021-08-08T12:58:41.145+0900 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-1/plugins
    2021-08-08T12:58:41.147+0900 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.147+0900 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.147+0900 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.147+0900 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.147+0900 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.147+0900 [INFO]  agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.147+0900 [INFO]  client: using state directory: state_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-1/client
    2021-08-08T12:58:41.415+0900 [INFO]  client: using alloc directory: alloc_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-1/alloc
    2021-08-08T12:58:47.652+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
    2021-08-08T12:58:47.652+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
    2021-08-08T12:58:47.652+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=device
    2021-08-08T12:58:47.800+0900 [INFO]  client: started client: node_id=5751917d-f7c6-2911-23fd-7f472e22a6cf
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 127.0.0.1:5647
            Bind Addrs: HTTP: 127.0.0.1:5647
                Client: true
             Log Level: INFO
                Region: global (DC: dc1)
                Server: false
               Version: 1.1.3

==> Nomad agent started! Log data will stream in below:

    2021-08-08T12:58:41.148+0900 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-3/plugins
    2021-08-08T12:58:41.151+0900 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.151+0900 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.151+0900 [INFO]  agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.151+0900 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.151+0900 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.151+0900 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.192+0900 [INFO]  client: using state directory: state_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-3/client
    2021-08-08T12:58:41.460+0900 [INFO]  client: using alloc directory: alloc_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-3/alloc
    2021-08-08T12:58:47.710+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
    2021-08-08T12:58:47.710+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
    2021-08-08T12:58:47.710+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=device
    2021-08-08T12:58:47.847+0900 [INFO]  client: started client: node_id=a8035563-e361-e676-5f87-5234fcc111b8
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 127.0.0.1:5644
            Bind Addrs: HTTP: 127.0.0.1:5644
                Client: true
             Log Level: INFO
                Region: global (DC: dc1)
                Server: false
               Version: 1.1.3

==> Nomad agent started! Log data will stream in below:

    2021-08-08T12:58:41.173+0900 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-2/plugins
    2021-08-08T12:58:41.177+0900 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.177+0900 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.177+0900 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.177+0900 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.177+0900 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.177+0900 [INFO]  agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
    2021-08-08T12:58:41.212+0900 [INFO]  client: using state directory: state_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-2/client
    2021-08-08T12:58:41.460+0900 [INFO]  client: using alloc directory: alloc_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-2/alloc
    2021-08-08T12:58:47.709+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
    2021-08-08T12:58:47.709+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
    2021-08-08T12:58:47.709+0900 [INFO]  client.plugin: starting plugin manager: plugin-type=device
    2021-08-08T12:58:47.846+0900 [INFO]  client: started client: node_id=a93c7901-6556-95eb-2bab-7c84eb92a7ea
    2021-08-08T12:58:47.951+0900 [INFO]  client: node registration complete
    2021-08-08T12:58:47.951+0900 [INFO]  client: node registration complete
    2021-08-08T12:58:47.951+0900 [INFO]  client: node registration complete
    2021-08-08T12:58:56.499+0900 [INFO]  client: node registration complete
    2021-08-08T12:58:56.579+0900 [INFO]  client: node registration complete
    2021-08-08T12:58:57.336+0900 [INFO]  client: node registration complete

In another terminal, run the job.

$ nomad job run example.nomad
==> 2021-08-08T13:04:58+09:00: Monitoring evaluation "46e0ebb0"
    2021-08-08T13:04:58+09:00: Evaluation triggered by job "example"
==> 2021-08-08T13:04:59+09:00: Monitoring evaluation "46e0ebb0"
    2021-08-08T13:04:59+09:00: Evaluation within deployment: "3485031c"
    2021-08-08T13:04:59+09:00: Allocation "84e0861d" created: node "a93c7901", group "cache"
    2021-08-08T13:04:59+09:00: Allocation "c2c930dd" created: node "5751917d", group "cache"
    2021-08-08T13:04:59+09:00: Allocation "caf0cdbd" created: node "a8035563", group "cache"
    2021-08-08T13:04:59+09:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-08T13:04:59+09:00: Evaluation "46e0ebb0" finished with status "complete"
==> 2021-08-08T13:04:59+09:00: Monitoring deployment "3485031c"
  ✓ Deployment "3485031c" successful
    
    2021-08-08T13:05:12+09:00
    ID          = 3485031c
    Job ID      = example
    Job Version = 0
    Status      = successful
    Description = Deployment completed successfully
    
    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    cache       3        3       3        0          2021-08-08T13:15:11+09:00

You can confirm that the containers are running.

$ docker ps
CONTAINER ID   IMAGE       COMMAND                  CREATED         STATUS         PORTS                                                  NAMES
da4aa44e3c50   redis:3.2   "docker-entrypoint.s…"   4 minutes ago   Up 4 minutes   127.0.0.1:55002->6379/tcp, 127.0.0.1:55002->6379/udp   redis-caf0cdbd-ab28-2913-c13c-ed808641fe75
0c011a5f5868   redis:3.2   "docker-entrypoint.s…"   4 minutes ago   Up 4 minutes   127.0.0.1:55000->6379/tcp, 127.0.0.1:55000->6379/udp   redis-84e0861d-759c-9597-c640-0426f921d7aa
095b95cbad84   redis:3.2   "docker-entrypoint.s…"   4 minutes ago   Up 4 minutes   127.0.0.1:55001->6379/tcp, 127.0.0.1:55001->6379/udp   redis-c2c930dd-a52a-93c2-1dfe-7f2795a3d57b

The port mappings can also be checked.

$ docker port da4aa44e3c50
6379/tcp -> 127.0.0.1:55002
6379/udp -> 127.0.0.1:55002
$ docker port 0c011a5f5868
6379/tcp -> 127.0.0.1:55000
6379/udp -> 127.0.0.1:55000
$ docker port 095b95cbad84
6379/tcp -> 127.0.0.1:55001
6379/udp -> 127.0.0.1:55001

To stop, run nomad job stop example just as in dev mode.

$ nomad job stop example
==> 2021-08-08T13:46:55+09:00: Monitoring evaluation "db4fe79c"
    2021-08-08T13:46:55+09:00: Evaluation triggered by job "example"
    2021-08-08T13:46:56.126+0900 [INFO]  client.driver_mgr.docker: stopped container: container_id=1df557a9edd3f2f8da7cda236ce6b5a7a5412a8ce82a62f29b00a0b1e1b17b2f driver=docker
    2021-08-08T13:46:56.170+0900 [INFO]  client.driver_mgr.docker: stopped container: container_id=e6a36f303a6bef946e7964a9f83604c34c3e2f9bf8f80bce67e0233dd8151634 driver=docker
    2021-08-08T13:46:56.273+0900 [INFO]  client.driver_mgr.docker: stopped container: container_id=22b0dbab2875d94c555b168b47aa94e635556b29f8bf2c010188ece92d011335 driver=docker
    2021-08-08T13:46:56.408+0900 [INFO]  client.gc: marking allocation for GC: alloc_id=84e0861d-759c-9597-c640-0426f921d7aa
==> 2021-08-08T13:46:56+09:00: Monitoring evaluation "db4fe79c"
    2021-08-08T13:46:56+09:00: Evaluation within deployment: "3485031c"
    2021-08-08T13:46:56+09:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-08T13:46:56+09:00: Evaluation "db4fe79c" finished with status "complete"
==> 2021-08-08T13:46:56+09:00: Monitoring deployment "3485031c"
  ✓ Deployment "3485031c" successful
    
    2021-08-08T13:46:56+09:00
    ID          = 3485031c
    Job ID      = example
    Job Version = 0
    Status      = successful
    Description = Deployment completed successfully
    
    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    cache       3        3       3        0          2021-08-08T13:15:11+09:00
    2021-08-08T13:46:56.616+0900 [INFO]  client.gc: marking allocation for GC: alloc_id=c2c930dd-a52a-93c2-1dfe-7f2795a3d57b                              
:~/workspace/nomad-test/normal/ ==> 2021-08-08T13:46:56.887+0900 [INFO]  client.gc: marking allocation for GC: alloc_id=caf0cdbd-ab28-2913-c13c-ed808641fe75

Since the agents are running in the background as daemons, find the target processes with ps aux | grep nomad, then kill them by listing the process IDs separated by spaces (a sketch follows below).
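
A sketch (the PIDs below are placeholders; pkill nomad, which run.sh itself uses, would also work):

$ ps aux | grep nomad
$ kill <PID1> <PID2> <PID3> <PID4>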

Errors

API error (500): toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

In the end, the deployment failed as follows.

$ nomad job run example.nomad
==> 2021-08-08T00:02:32+09:00: Monitoring evaluation "47ccd42c"
    2021-08-08T00:02:32+09:00: Evaluation triggered by job "example"
    2021-08-08T00:02:32+09:00: Allocation "1a366690" created: node "a5d3a114", group "cache"
==> 2021-08-08T00:02:33+09:00: Monitoring evaluation "47ccd42c"
    2021-08-08T00:02:33+09:00: Evaluation within deployment: "56393041"
    2021-08-08T00:02:33+09:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-08T00:02:33+09:00: Evaluation "47ccd42c" finished with status "complete"
==> 2021-08-08T00:02:33+09:00: Monitoring deployment "56393041"
  ! Deployment "56393041" failed
    
    2021-08-08T00:12:32+09:00
    ID          = 56393041
    Job ID      = example
    Job Version = 0
    Status      = failed
    Description = Failed due to progress deadline
    
    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    cache       1        4       0        4          2021-08-08T00:12:32+09:00

The following output appeared in the logs on the side where the agents were started.

2021-08-08T00:23:01.070+0900 [ERROR] client.driver_mgr.docker: failed pulling container: driver=docker image_ref=redis:3.2 error="API error (500): toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit"
2021-08-08T00:23:01.070+0900 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=7ce8b091-7cdb-b6c8-3918-213b02150700 task=redis error="Failed to pull `redis:3.2`: API error (500): toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit"

This appears to be due to the rate limits Docker Hub introduced around 2020/11/20: anonymous users can pull up to 100 images per six hours, and free Docker Hub users up to 200.

I had not run docker login -u *** -p ***, so I was being treated as an anonymous user; however, I had not pulled anything in the previous six hours, so the cause was unclear.

Understanding Docker Hub Rate Limiting

On November 20, 2020, rate limits anonymous and free authenticated use of Docker Hub went into effect. Anonymous and Free Docker Hub users are limited to 100 and 200 container image pull requests per six hours. You can read here for more detailed information.

However, the page linked for more detailed information explains that the limit is enforced based on IP address.

Download rate limit

Unauthenticated (anonymous) users will have the limits enforced via IP.

Scaling Docker to Serve Millions More Developers: Network Egress

For anonymous (unauthenticated) users, pull rates are limited based on the individual IP address. 

Realizing this, I tried disconnecting from the corporate VPN I had been connected through (presumably its shared egress IP had already exhausted the anonymous limit), and the deployment then succeeded.
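
Authenticating to Docker Hub would be another way to raise the limit (a sketch; substitute your own account, and note this was not needed here):

$ docker login -u <your-dockerhub-username>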

Running nomad job run <.nomad file> does not start anything

Pattern 1: Specifying mode = "bridge" on macOS

When the Docker driver is used, Nomad applies bridged networking by default, but if you explicitly specify mode = "bridge" in the .nomad file, nomad job run <.nomad file> never progresses past in progress.

Networking

Nomad uses bridged networking by default, like Docker.

The reason seems to be that Nomad's bridge networking mode is Linux-only, so it appears that mode = "bridge" must not be written on macOS.
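
As an illustration only (not something to actually use on macOS), this is the kind of network stanza that triggers the problem; the issue linked below has more detail:

    network {
      mode = "bridge"   # Nomad's bridge networking is Linux-only; omit this on macOS
      port "db" {
        to = 6379
      }
    }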

"missing network" constraint w/ bridge network #8684

Pattern 2: Specifying the port with static instead of to under network in the .nomad file and starting multiple containers

The port numbers collide on the host side and the containers cannot be set up, so nomad job run <.nomad file> never progresses past in progress.
