Introduction to Nomad
Nomad is an orchestration tool that lets you deploy and manage both containerized and non-containerized applications across on-premises and cloud environments.
Installation
The steps below were run on macOS Catalina Version 10.15.7.
First, install nomad with Homebrew:
$ brew install nomad
Confirm that it is installed:
$ nomad -v
Nomad v1.1.3
dev mode
Start Nomad in dev mode. The other mode is normal mode, in which the server and clients can be started as separate processes and the configuration is more flexible.
Startup
Start the agent. In dev mode a single agent runs as both server and client:
$ nomad agent -dev
==> No configuration files loaded
==> Starting Nomad agent...
==> Nomad agent configuration:
Advertise Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
Bind Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
Client: true
Log Level: DEBUG
Region: global (DC: dc1)
Server: true
Version: 1.1.3
==> Nomad agent started! Log data will stream in below:
2021-08-08T00:29:53.496+0900 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
2021-08-08T00:29:53.497+0900 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
2021-08-08T00:29:53.497+0900 [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2021-08-08T00:29:53.497+0900 [INFO] agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
2021-08-08T00:29:53.497+0900 [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2021-08-08T00:29:53.497+0900 [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2021-08-08T00:29:53.497+0900 [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2021-08-08T00:29:53.497+0900 [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0
2021-08-08T00:29:53.502+0900 [INFO] nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:4647 Address:127.0.0.1:4647}]"
2021-08-08T00:29:53.502+0900 [INFO] nomad.raft: entering follower state: follower="Node at 127.0.0.1:4647 [Follower]" leader=
2021-08-08T00:29:53.503+0900 [INFO] nomad: serf: EventMemberJoin: 147ddac625fa.ant.amazon.com.global 127.0.0.1
2021-08-08T00:29:53.504+0900 [INFO] nomad: starting scheduling worker(s): num_workers=8 schedulers=[service, batch, system, _core]
2021-08-08T00:29:53.504+0900 [INFO] client: using state directory: state_dir=/private/var/folders/3x/2yhk0tsd2r7djkrt82cnhsv8gmbt28/T/NomadClient353248954
2021-08-08T00:29:53.505+0900 [INFO] nomad: adding server: server="147ddac625fa.ant.amazon.com.global (Addr: 127.0.0.1:4647) (DC: dc1)"
2021-08-08T00:29:53.507+0900 [INFO] client: using alloc directory: alloc_dir=/private/var/folders/3x/2yhk0tsd2r7djkrt82cnhsv8gmbt28/T/NomadClient116304081
2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr: built-in fingerprints: fingerprinters=[arch, cni, consul, cpu, host, memory, network, nomad, signal, storage, vault, env_aws, env_gce, env_azure]
2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr: CNI config dir is not set or does not exist, skipping: cni_config_dir=/opt/cni/config
2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=consul period=15s
2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr.cpu: detected cpu frequency: MHz=1400
2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr.cpu: detected core count: cores=8
2021-08-08T00:29:53.559+0900 [DEBUG] client.fingerprint_mgr.cpu: detected reservable cores: cpuset=[]
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected and no speed specified by user, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=lo0 IP=127.0.0.1
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=lo0 IP=::1
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.610+0900 [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: mbits=1000
2021-08-08T00:29:53.615+0900 [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=vault period=15s
2021-08-08T00:29:54.873+0900 [WARN] nomad.raft: heartbeat timeout reached, starting election: last-leader=
2021-08-08T00:29:54.874+0900 [INFO] nomad.raft: entering candidate state: node="Node at 127.0.0.1:4647 [Candidate]" term=2
2021-08-08T00:29:54.874+0900 [DEBUG] nomad.raft: votes: needed=1
2021-08-08T00:29:54.874+0900 [DEBUG] nomad.raft: vote granted: from=127.0.0.1:4647 term=2 tally=1
2021-08-08T00:29:54.874+0900 [INFO] nomad.raft: election won: tally=1
2021-08-08T00:29:54.874+0900 [INFO] nomad.raft: entering leader state: leader="Node at 127.0.0.1:4647 [Leader]"
2021-08-08T00:29:54.874+0900 [INFO] nomad: cluster leadership acquired
2021-08-08T00:29:54.879+0900 [INFO] nomad.core: established cluster id: cluster_id=695b378c-f623-61ae-a0d8-d3281d4e8367 create_time=1628350194878909000
2021-08-08T00:29:55.619+0900 [DEBUG] client.fingerprint_mgr.env_gce: could not read value for attribute: attribute=machine-type error="Get "http://169.254.169.254/computeMetadata/v1/instance/machine-type": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
2021-08-08T00:29:55.619+0900 [DEBUG] client.fingerprint_mgr.env_gce: error querying GCE Metadata URL, skipping
2021-08-08T00:29:57.624+0900 [DEBUG] client.fingerprint_mgr.env_azure: could not read value for attribute: attribute=compute/azEnvironment error="Get "http://169.254.169.254/metadata/instance/compute/azEnvironment?api-version=2019-06-04&format=text": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
2021-08-08T00:29:59.629+0900 [DEBUG] client.fingerprint_mgr: detected fingerprints: node_attrs=[arch, cpu, host, network, nomad, signal, storage]
2021-08-08T00:29:59.629+0900 [INFO] client.plugin: starting plugin manager: plugin-type=csi
2021-08-08T00:29:59.629+0900 [INFO] client.plugin: starting plugin manager: plugin-type=driver
2021-08-08T00:29:59.629+0900 [INFO] client.plugin: starting plugin manager: plugin-type=device
2021-08-08T00:29:59.629+0900 [DEBUG] client.device_mgr: exiting since there are no device plugins
2021-08-08T00:29:59.630+0900 [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=device
2021-08-08T00:29:59.630+0900 [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=device
2021-08-08T00:29:59.630+0900 [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=driver
2021-08-08T00:29:59.630+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=raw_exec health=healthy description=Healthy
2021-08-08T00:29:59.630+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=mock_driver health=healthy description=Healthy
2021-08-08T00:29:59.630+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=exec health=undetected description="exec driver unsupported on client OS"
2021-08-08T00:29:59.632+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=qemu health=undetected description=
2021-08-08T00:29:59.633+0900 [DEBUG] client.server_mgr: new server list: new_servers=[127.0.0.1:4647] old_servers=[]
2021-08-08T00:29:59.687+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=docker health=healthy description=Healthy
2021-08-08T00:29:59.825+0900 [DEBUG] client.driver_mgr: initial driver fingerprint: driver=java health=healthy description=Healthy
2021-08-08T00:29:59.825+0900 [DEBUG] client.driver_mgr: detected drivers: drivers="map[healthy:[raw_exec mock_driver docker java] undetected:[exec qemu]]"
2021-08-08T00:29:59.825+0900 [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=driver
2021-08-08T00:29:59.825+0900 [INFO] client: started client: node_id=7cd596ff-0354-861c-9fd0-2467e2f2237a
2021-08-08T00:29:59.827+0900 [DEBUG] client: updated allocations: index=1 total=0 pulled=0 filtered=0
2021-08-08T00:29:59.829+0900 [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=0
2021-08-08T00:29:59.829+0900 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=0 errors=0
2021-08-08T00:29:59.830+0900 [INFO] client: node registration complete
2021-08-08T00:29:59.831+0900 [DEBUG] client: state updated: node_status=ready
2021-08-08T00:30:00.830+0900 [DEBUG] client: state changed, updating node and re-registering
2021-08-08T00:30:00.831+0900 [INFO] client: node registration complete
Work in a separate terminal:
$ mkdir nomadtest && cd $_
$ nomad job init
Example job file written to example.nomad
A file named example.nomad is created; the listing below omits the comments. Passing the -short option to nomad job init generates the job file without comments in the first place.
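For reference, a minimal sketch of that variant with the same CLI (-short is a flag of nomad job init):
$ nomad job init -short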
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    network {
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
        ports = ["db"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
To use a fixed port, change the port specification under the network block from to to static (see the sketch after the quoted documentation below). However, static ports are recommended only for system jobs or specialized jobs such as load balancers:
static (int: nil) - Specifies the static TCP/UDP port to allocate. If omitted, a dynamic port is chosen. We do not recommend using static ports, except for system or specialized jobs like load balancers.
to (string:nil) - Applicable when using "bridge" mode to configure port to map to inside the task's network namespace. Omitting this field or setting it to -1 sets the mapped port equal to the dynamic port allocated by the scheduler. The NOMAD_PORT_<label> environment variable will contain the to value.
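As a concrete sketch (hypothetical values, not the stanza used in this walkthrough), the fixed-port variant of the network block would look like this:
network {
  port "db" {
    static = 6379
  }
}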
Run the job. The Deployment is first reported as in progress and becomes successful after a short while; if the deployment does not complete normally, it ends up failed.
$ nomad job run example.nomad
==> 2021-08-08T00:30:07+09:00: Monitoring evaluation "2f88c6c1"
2021-08-08T00:30:07+09:00: Evaluation triggered by job "example"
==> 2021-08-08T00:30:08+09:00: Monitoring evaluation "2f88c6c1"
2021-08-08T00:30:08+09:00: Evaluation within deployment: "0aa67e40"
2021-08-08T00:30:08+09:00: Allocation "38353085" created: node "7cd596ff", group "cache"
2021-08-08T00:30:08+09:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-08T00:30:08+09:00: Evaluation "2f88c6c1" finished with status "complete"
==> 2021-08-08T00:30:08+09:00: Monitoring deployment "0aa67e40"
✓ Deployment "0aa67e40" successful
2021-08-08T00:30:36+09:00
ID = 0aa67e40
Job ID = example
Job Version = 0
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
cache 1 1 1 0 2021-08-08T00:40:35+09:00
Play
Check the list of jobs. Here you can see example, the job defined in the example.nomad file:
$ nomad job status
ID Type Priority Status Submit Date
example service 50 running 2021-08-08T00:30:07+09:00
Passing the job name example as an argument shows more detail. The allocation ID can be found in the Allocations section of the output; it is the same ID that appears on the Allocation line when running nomad job run example.nomad.
$ nomad job status example
ID = example
Name = example
Submit Date = 2021-08-08T00:30:07+09:00
Type = service
Priority = 50
Datacenters = dc1
Namespace = default
Status = running
Periodic = false
Parameterized = false
Summary
Task Group Queued Starting Running Failed Complete Lost
cache 0 0 1 0 0 0
Latest Deployment
ID = 0aa67e40
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
cache 1 1 1 0 2021-08-08T00:40:35+09:00
Allocations
ID Node ID Task Group Version Desired Status Created Modified
38353085 7cd596ff cache 0 run running 8m13s ago 7m45s ago
Running the following command with the allocation ID you found shows the details of the resource allocation. There appears to be a known issue where no IP address or port number shows up under Addresses in Task Resources; an alternative way to reach the container is covered later.
[feature] Expose contents of DriverNetwork via API #3285
$ nomad alloc status 38353085
ID = 38353085-0a43-cdcd-6c07-2697b5dfbf83
Eval ID = 2f88c6c1
Name = example.cache[0]
Node ID = 7cd596ff
Node Name = 147ddac625fa.ant.amazon.com
Job ID = example
Job Version = 0
Client Status = running
Client Description = Tasks are running
Desired Status = run
Desired Description = <none>
Created = 8m33s ago
Modified = 8m5s ago
Deployment ID = 0aa67e40
Deployment Health = healthy
Allocation Addresses
Label Dynamic Address
*db yes 127.0.0.1:30624 -> 6379
Task "redis" is "running"
Task Resources
CPU Memory Disk Addresses
16/500 MHz 748 KiB/256 MiB 300 MiB
Task Events:
Started At = 2021-08-07T15:30:25Z
Finished At = N/A
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time Type Description
2021-08-08T00:30:25+09:00 Started Task started by client
2021-08-08T00:30:07+09:00 Driver Downloading image
2021-08-08T00:30:07+09:00 Task Setup Building Task Directory
2021-08-08T00:30:07+09:00 Received Task received by client
Logs can likewise be checked by specifying the allocation ID. You can also pass the task name defined in the nomad file as an extra argument, e.g. nomad alloc logs 38353085 redis.
$ nomad alloc logs 38353085
1:C 07 Aug 15:30:25.450 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 3.2.12 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 1
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
1:M 07 Aug 15:30:25.451 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 07 Aug 15:30:25.451 # Server started, Redis version 3.2.12
1:M 07 Aug 15:30:25.451 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 07 Aug 15:30:25.451 * The server is now ready to accept connections on port 6379
You can also operate Nomad through the web UI by opening http://localhost:4646/ in a browser.
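The same data is also exposed over Nomad's HTTP API; a minimal sketch against the agent's default local address (the /v1/jobs and /v1/job/<job>/allocations endpoints are part of the standard API):
$ curl -s http://localhost:4646/v1/jobs
$ curl -s http://localhost:4646/v1/job/example/allocations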
Connecting to the container
As noted above, when we ran nomad alloc status <ALLOCATION_ID> the IP address and port number did not appear under Addresses in Task Resources, so here is an alternative way to find them.
First, look up the container ID:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c08ced4cdb8d redis:3.2 "docker-entrypoint.s…" 7 minutes ago Up 7 minutes 127.0.0.1:25343->6379/tcp, 127.0.0.1:25343->6379/udp redis-433ee0d0-b2a8-a617-6d8d-05a575c7523c
Run the following with either the container ID (c08ced4cdb8d) or the container name (redis-433ee0d0-b2a8-a617-6d8d-05a575c7523c). This shows the dynamically mapped port number:
$ docker port c08ced4cdb8d
6379/tcp -> 127.0.0.1:25343
6379/udp -> 127.0.0.1:25343
The same information can also be obtained like this:
$ docker inspect c08ced4cdb8d --format='{{ (index (index .NetworkSettings.Ports "6379/tcp") 0) }}'
map[HostIp:127.0.0.1 HostPort:25343]
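If only the host port is needed (for scripting, say), the same Go template can select the HostPort field directly; a sketch against the same container:
$ docker inspect c08ced4cdb8d --format='{{ (index (index .NetworkSettings.Ports "6379/tcp") 0).HostPort }}'
25343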
We can confirm that the connection works:
$ redis-cli -p 25343
127.0.0.1:25343> INFO
# Server
redis_version:3.2.12
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:b0df607ad3315254
redis_mode:standalone
os:Linux 4.19.121-linuxkit x86_64
arch_bits:64
:
# Cluster
cluster_enabled:0
# Keyspace
Changing the configuration file
As a test, change count under group in the .nomad file from 1 to 3; count defaults to 1 when omitted, so the attribute may need to be added.
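A minimal sketch of the edited group stanza (only the count line changes; the network and task blocks stay as generated):
group "cache" {
  # was count = 1 (the default); everything else in the group is unchanged
  count = 3
}
With this change in place, nomad job plan reports the diff against the current state: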
$ nomad job plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (2 create, 1 in-place update)
+/- Count: "1" => "3" (forces create)
+/- Task: "redis" (forces in-place update)
Scheduler dry-run:
- WARNING: Failed to place all allocations.
Task Group "cache" (failed to place 2 allocations):
* Resources exhausted on 1 nodes
* Dimension "network: reserved port collision db=6379" exhausted on 1 nodes
Job Modify Index: 61
To submit the job with version verification run:
nomad job run -check-index 61 example.nomad
When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
Running nomad job run example.nomad again as before, you can confirm that three containers are now running:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4a8c9a46dac1 redis:3.2 "docker-entrypoint.s…" 18 seconds ago Up 17 seconds 127.0.0.1:29006->6379/tcp, 127.0.0.1:29006->6379/udp redis-3965372e-177d-d03a-258b-a4c8c3a3cefc
e10c1dd8554d redis:3.2 "docker-entrypoint.s…" 30 seconds ago Up 29 seconds 127.0.0.1:23744->6379/tcp, 127.0.0.1:23744->6379/udp redis-30dbcc89-d4d4-9125-12e1-97d9ee81a578
5e7abae7621a redis:3.2 "docker-entrypoint.s…" 30 seconds ago Up 29 seconds 127.0.0.1:25631->6379/tcp, 127.0.0.1:25631->6379/udp redis-8d5be185-c241-c5f3-ed3f-cb4b0ccd68ca
You can also see that each container gets its own port number on the host-side network interface. Conversely, if the port under network in the .nomad file is specified with static instead of to, the port numbers collide on the host, the containers cannot be set up, and nomad job run <.nomad file> never progresses past in progress.
$ docker port 4a8c9a46dac1
6379/tcp -> 127.0.0.1:29006
6379/udp -> 127.0.0.1:29006
$ docker port e10c1dd8554d
6379/tcp -> 127.0.0.1:23744
6379/udp -> 127.0.0.1:23744
$ docker port 5e7abae7621a
6379/tcp -> 127.0.0.1:25631
6379/udp -> 127.0.0.1:25631
Stop
The job can be stopped as follows:
$ nomad job stop example
==> 2021-08-08T12:51:53+09:00: Monitoring evaluation "8642a80f"
2021-08-08T12:51:53+09:00: Evaluation triggered by job "example"
==> 2021-08-08T12:51:54+09:00: Monitoring evaluation "8642a80f"
2021-08-08T12:51:54+09:00: Evaluation within deployment: "18860b6e"
2021-08-08T12:51:54+09:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-08T12:51:54+09:00: Evaluation "8642a80f" finished with status "complete"
==> 2021-08-08T12:51:54+09:00: Monitoring deployment "18860b6e"
✓ Deployment "18860b6e" successful
2021-08-08T12:51:54+09:00
ID = 18860b6e
Job ID = example
Job Version = 7
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
cache 3 3 3 0 2021-08-08T12:49:00+09:00
The status then looks like this:
$ nomad job status
ID Type Priority Status Submit Date
example service 50 dead (stopped) 2021-08-08T12:38:37+09:00
Normal mode
The setup so far used dev mode; this section uses normal mode instead. In normal mode the server and clients can be started as separate processes and a wide range of settings can be configured. Here the cluster consists of one server and three clients.
The steps follow the HashiCorp Nomad Workshop.
$ MY_PATH=$(pwd)
Create the .hcl file for the Nomad server:
$ cat << EOF > nomad-local-config-server.hcl
data_dir  = "${MY_PATH}/local-nomad-data"
bind_addr = "127.0.0.1"

server {
  enabled          = true
  bootstrap_expect = 1
}

advertise {
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1"
}
EOF
Create an .hcl file for a Nomad client:
$ cat << EOF > nomad-local-config-client-1.hcl
data_dir  = "${MY_PATH}/local-cluster-data-1"
bind_addr = "127.0.0.1"

client {
  enabled = true
  servers = ["127.0.0.1:4647"]
}

advertise {
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1"
}

ports {
  http = 5641
  rpc  = 5642
  serf = 5643
}
EOF
Configure the remaining two clients in the same way:
$ cat << EOF > nomad-local-config-client-2.hcl
data_dir  = "${MY_PATH}/local-cluster-data-2"
bind_addr = "127.0.0.1"

client {
  enabled = true
  servers = ["127.0.0.1:4647"]
}

advertise {
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1"
}

ports {
  http = 5644
  rpc  = 5645
  serf = 5646
}
EOF
$ cat << EOF > nomad-local-config-client-3.hcl
data_dir  = "${MY_PATH}/local-cluster-data-3"
bind_addr = "127.0.0.1"

client {
  enabled = true
  servers = ["127.0.0.1:4647"]
}

advertise {
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1"
}

ports {
  http = 5647
  rpc  = 5648
  serf = 5649
}
EOF
Create a startup script:
$ cat << EOF > run.sh
#!/bin/sh
pkill nomad
pkill java
sleep 10
nomad agent -config=${MY_PATH}/nomad-local-config-server.hcl &
nomad agent -config=${MY_PATH}/nomad-local-config-client-1.hcl &
nomad agent -config=${MY_PATH}/nomad-local-config-client-2.hcl &
nomad agent -config=${MY_PATH}/nomad-local-config-client-3.hcl &
EOF
Run the startup script you created:
$ chmod +x run.sh
$ ./run.sh
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from /Users/hayshogo/workspace/nomad-test/normal/nomad-local-config-server.hcl
==> Starting Nomad agent...
==> Loaded configuration from /Users/hayshogo/workspace/nomad-test/normal/nomad-local-config-client-1.hcl
==> Starting Nomad agent...
==> Loaded configuration from /Users/hayshogo/workspace/nomad-test/normal/nomad-local-config-client-3.hcl
==> Starting Nomad agent...
==> Loaded configuration from /Users/hayshogo/workspace/nomad-test/normal/nomad-local-config-client-2.hcl
==> Starting Nomad agent...
:~/workspace/nomad-test/normal/ ==> Nomad agent configuration:
Advertise Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
Bind Addrs: HTTP: 127.0.0.1:4646; RPC: 127.0.0.1:4647; Serf: 127.0.0.1:4648
Client: false
Log Level: INFO
Region: global (DC: dc1)
Server: true
Version: 1.1.3
==> Nomad agent started! Log data will stream in below:
2021-08-08T12:58:41.139+0900 [WARN] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/Users/hayshogo/workspace/nomad-test/normal/local-nomad-data/plugins
2021-08-08T12:58:41.141+0900 [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2021-08-08T12:58:41.141+0900 [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2021-08-08T12:58:41.141+0900 [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2021-08-08T12:58:41.141+0900 [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0
2021-08-08T12:58:41.141+0900 [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2021-08-08T12:58:41.141+0900 [INFO] agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
2021-08-08T12:58:41.628+0900 [INFO] nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:4647 Address:127.0.0.1:4647}]"
2021-08-08T12:58:41.628+0900 [INFO] nomad.raft: entering follower state: follower="Node at 127.0.0.1:4647 [Follower]" leader=
2021-08-08T12:58:41.631+0900 [INFO] nomad: serf: EventMemberJoin: 147ddac625fa.ant.amazon.com.global 127.0.0.1
2021-08-08T12:58:41.631+0900 [INFO] nomad: starting scheduling worker(s): num_workers=8 schedulers=[batch, system, service, _core]
2021-08-08T12:58:41.631+0900 [INFO] nomad: adding server: server="147ddac625fa.ant.amazon.com.global (Addr: 127.0.0.1:4647) (DC: dc1)"
2021-08-08T12:58:43.417+0900 [WARN] nomad.raft: heartbeat timeout reached, starting election: last-leader=
2021-08-08T12:58:43.417+0900 [INFO] nomad.raft: entering candidate state: node="Node at 127.0.0.1:4647 [Candidate]" term=2
2021-08-08T12:58:43.576+0900 [INFO] nomad.raft: election won: tally=1
2021-08-08T12:58:43.576+0900 [INFO] nomad.raft: entering leader state: leader="Node at 127.0.0.1:4647 [Leader]"
2021-08-08T12:58:43.576+0900 [INFO] nomad: cluster leadership acquired
2021-08-08T12:58:43.831+0900 [INFO] nomad.core: established cluster id: cluster_id=4d6f6e28-5286-cdc8-f4ca-37f9bc3627fd create_time=1628395123785336000
==> Nomad agent configuration:
Advertise Addrs: HTTP: 127.0.0.1:5641
Bind Addrs: HTTP: 127.0.0.1:5641
Client: true
Log Level: INFO
Region: global (DC: dc1)
Server: false
Version: 1.1.3
==> Nomad agent started! Log data will stream in below:
2021-08-08T12:58:41.145+0900 [WARN] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-1/plugins
2021-08-08T12:58:41.147+0900 [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2021-08-08T12:58:41.147+0900 [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2021-08-08T12:58:41.147+0900 [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2021-08-08T12:58:41.147+0900 [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0
2021-08-08T12:58:41.147+0900 [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2021-08-08T12:58:41.147+0900 [INFO] agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
2021-08-08T12:58:41.147+0900 [INFO] client: using state directory: state_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-1/client
2021-08-08T12:58:41.415+0900 [INFO] client: using alloc directory: alloc_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-1/alloc
2021-08-08T12:58:47.652+0900 [INFO] client.plugin: starting plugin manager: plugin-type=csi
2021-08-08T12:58:47.652+0900 [INFO] client.plugin: starting plugin manager: plugin-type=driver
2021-08-08T12:58:47.652+0900 [INFO] client.plugin: starting plugin manager: plugin-type=device
2021-08-08T12:58:47.800+0900 [INFO] client: started client: node_id=5751917d-f7c6-2911-23fd-7f472e22a6cf
==> Nomad agent configuration:
Advertise Addrs: HTTP: 127.0.0.1:5647
Bind Addrs: HTTP: 127.0.0.1:5647
Client: true
Log Level: INFO
Region: global (DC: dc1)
Server: false
Version: 1.1.3
==> Nomad agent started! Log data will stream in below:
2021-08-08T12:58:41.148+0900 [WARN] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-3/plugins
2021-08-08T12:58:41.151+0900 [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0
2021-08-08T12:58:41.151+0900 [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2021-08-08T12:58:41.151+0900 [INFO] agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
2021-08-08T12:58:41.151+0900 [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2021-08-08T12:58:41.151+0900 [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2021-08-08T12:58:41.151+0900 [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2021-08-08T12:58:41.192+0900 [INFO] client: using state directory: state_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-3/client
2021-08-08T12:58:41.460+0900 [INFO] client: using alloc directory: alloc_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-3/alloc
2021-08-08T12:58:47.710+0900 [INFO] client.plugin: starting plugin manager: plugin-type=csi
2021-08-08T12:58:47.710+0900 [INFO] client.plugin: starting plugin manager: plugin-type=driver
2021-08-08T12:58:47.710+0900 [INFO] client.plugin: starting plugin manager: plugin-type=device
2021-08-08T12:58:47.847+0900 [INFO] client: started client: node_id=a8035563-e361-e676-5f87-5234fcc111b8
==> Nomad agent configuration:
Advertise Addrs: HTTP: 127.0.0.1:5644
Bind Addrs: HTTP: 127.0.0.1:5644
Client: true
Log Level: INFO
Region: global (DC: dc1)
Server: false
Version: 1.1.3
==> Nomad agent started! Log data will stream in below:
2021-08-08T12:58:41.173+0900 [WARN] agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-2/plugins
2021-08-08T12:58:41.177+0900 [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
2021-08-08T12:58:41.177+0900 [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0
2021-08-08T12:58:41.177+0900 [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
2021-08-08T12:58:41.177+0900 [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0
2021-08-08T12:58:41.177+0900 [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0
2021-08-08T12:58:41.177+0900 [INFO] agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
2021-08-08T12:58:41.212+0900 [INFO] client: using state directory: state_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-2/client
2021-08-08T12:58:41.460+0900 [INFO] client: using alloc directory: alloc_dir=/Users/hayshogo/workspace/nomad-test/normal/local-cluster-data-2/alloc
2021-08-08T12:58:47.709+0900 [INFO] client.plugin: starting plugin manager: plugin-type=csi
2021-08-08T12:58:47.709+0900 [INFO] client.plugin: starting plugin manager: plugin-type=driver
2021-08-08T12:58:47.709+0900 [INFO] client.plugin: starting plugin manager: plugin-type=device
2021-08-08T12:58:47.846+0900 [INFO] client: started client: node_id=a93c7901-6556-95eb-2bab-7c84eb92a7ea
2021-08-08T12:58:47.951+0900 [INFO] client: node registration complete
2021-08-08T12:58:47.951+0900 [INFO] client: node registration complete
2021-08-08T12:58:47.951+0900 [INFO] client: node registration complete
2021-08-08T12:58:56.499+0900 [INFO] client: node registration complete
2021-08-08T12:58:56.579+0900 [INFO] client: node registration complete
2021-08-08T12:58:57.336+0900 [INFO] client: node registration complete
Meanwhile, in another terminal, run the job:
$ nomad job run example.nomad
==> 2021-08-08T13:04:58+09:00: Monitoring evaluation "46e0ebb0"
2021-08-08T13:04:58+09:00: Evaluation triggered by job "example"
==> 2021-08-08T13:04:59+09:00: Monitoring evaluation "46e0ebb0"
2021-08-08T13:04:59+09:00: Evaluation within deployment: "3485031c"
2021-08-08T13:04:59+09:00: Allocation "84e0861d" created: node "a93c7901", group "cache"
2021-08-08T13:04:59+09:00: Allocation "c2c930dd" created: node "5751917d", group "cache"
2021-08-08T13:04:59+09:00: Allocation "caf0cdbd" created: node "a8035563", group "cache"
2021-08-08T13:04:59+09:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-08T13:04:59+09:00: Evaluation "46e0ebb0" finished with status "complete"
==> 2021-08-08T13:04:59+09:00: Monitoring deployment "3485031c"
✓ Deployment "3485031c" successful
2021-08-08T13:05:12+09:00
ID = 3485031c
Job ID = example
Job Version = 0
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
cache 3 3 3 0 2021-08-08T13:15:11+09:00
You can confirm that the containers are running:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
da4aa44e3c50 redis:3.2 "docker-entrypoint.s…" 4 minutes ago Up 4 minutes 127.0.0.1:55002->6379/tcp, 127.0.0.1:55002->6379/udp redis-caf0cdbd-ab28-2913-c13c-ed808641fe75
0c011a5f5868 redis:3.2 "docker-entrypoint.s…" 4 minutes ago Up 4 minutes 127.0.0.1:55000->6379/tcp, 127.0.0.1:55000->6379/udp redis-84e0861d-759c-9597-c640-0426f921d7aa
095b95cbad84 redis:3.2 "docker-entrypoint.s…" 4 minutes ago Up 4 minutes 127.0.0.1:55001->6379/tcp, 127.0.0.1:55001->6379/udp redis-c2c930dd-a52a-93c2-1dfe-7f2795a3d57b
The port mappings can also be checked:
$ docker port da4aa44e3c50
6379/tcp -> 127.0.0.1:55002
6379/udp -> 127.0.0.1:55002
$ docker port 0c011a5f5868
6379/tcp -> 127.0.0.1:55000
6379/udp -> 127.0.0.1:55000
$ docker port 095b95cbad84
6379/tcp -> 127.0.0.1:55001
6379/udp -> 127.0.0.1:55001
To stop, run nomad job stop example just as in dev mode:
$ nomad job stop example
==> 2021-08-08T13:46:55+09:00: Monitoring evaluation "db4fe79c"
2021-08-08T13:46:55+09:00: Evaluation triggered by job "example"
2021-08-08T13:46:56.126+0900 [INFO] client.driver_mgr.docker: stopped container: container_id=1df557a9edd3f2f8da7cda236ce6b5a7a5412a8ce82a62f29b00a0b1e1b17b2f driver=docker
2021-08-08T13:46:56.170+0900 [INFO] client.driver_mgr.docker: stopped container: container_id=e6a36f303a6bef946e7964a9f83604c34c3e2f9bf8f80bce67e0233dd8151634 driver=docker
2021-08-08T13:46:56.273+0900 [INFO] client.driver_mgr.docker: stopped container: container_id=22b0dbab2875d94c555b168b47aa94e635556b29f8bf2c010188ece92d011335 driver=docker
2021-08-08T13:46:56.408+0900 [INFO] client.gc: marking allocation for GC: alloc_id=84e0861d-759c-9597-c640-0426f921d7aa
==> 2021-08-08T13:46:56+09:00: Monitoring evaluation "db4fe79c"
2021-08-08T13:46:56+09:00: Evaluation within deployment: "3485031c"
2021-08-08T13:46:56+09:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-08T13:46:56+09:00: Evaluation "db4fe79c" finished with status "complete"
==> 2021-08-08T13:46:56+09:00: Monitoring deployment "3485031c"
✓ Deployment "3485031c" successful
2021-08-08T13:46:56+09:00
ID = 3485031c
Job ID = example
Job Version = 0
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
cache 3 3 3 0 2021-08-08T13:15:11+09:00
2021-08-08T13:46:56.616+0900 [INFO] client.gc: marking allocation for GC: alloc_id=c2c930dd-a52a-93c2-1dfe-7f2795a3d57b
:~/workspace/nomad-test/normal/ ==> 2021-08-08T13:46:56.887+0900 [INFO] client.gc: marking allocation for GC: alloc_id=caf0cdbd-ab28-2913-c13c-ed808641fe75
Because the agents are running in the background, find the target processes with ps aux | grep nomad and kill them by passing the process IDs, separated by spaces, to kill.
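A minimal sketch of that cleanup (the PIDs below are placeholders):
$ ps aux | grep nomad
$ kill <PID1> <PID2> <PID3> <PID4>   # or simply: pkill nomad, as run.sh already does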
Errors
API error (500): toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
The deployment eventually ended up failed, as shown below:
$ nomad job run example.nomad
==> 2021-08-08T00:02:32+09:00: Monitoring evaluation "47ccd42c"
2021-08-08T00:02:32+09:00: Evaluation triggered by job "example"
2021-08-08T00:02:32+09:00: Allocation "1a366690" created: node "a5d3a114", group "cache"
==> 2021-08-08T00:02:33+09:00: Monitoring evaluation "47ccd42c"
2021-08-08T00:02:33+09:00: Evaluation within deployment: "56393041"
2021-08-08T00:02:33+09:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-08T00:02:33+09:00: Evaluation "47ccd42c" finished with status "complete"
==> 2021-08-08T00:02:33+09:00: Monitoring deployment "56393041"
! Deployment "56393041" failed
2021-08-08T00:12:32+09:00
ID = 56393041
Job ID = example
Job Version = 0
Status = failed
Description = Failed due to progress deadline
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
cache 1 4 0 4 2021-08-08T00:12:32+09:00
The following output appeared in the logs of the terminal where the agent was started:
2021-08-08T00:23:01.070+0900 [ERROR] client.driver_mgr.docker: failed pulling container: driver=docker image_ref=redis:3.2 error="API error (500): toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit"
2021-08-08T00:23:01.070+0900 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=7ce8b091-7cdb-b6c8-3918-213b02150700 task=redis error="Failed to pull `redis:3.2`: API error (500): toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit"
This appears to be caused by the rate limits Docker Hub introduced around 2020-11-20: anonymous users can pull up to 100 images per six hours and free Docker Hub users up to 200. I had not run docker login -u *** -p ***, so I counted as an anonymous user, but since I had not pulled anything in the preceding six hours the reason was unclear.
Understanding Docker Hub Rate Limiting
On November 20, 2020, rate limits anonymous and free authenticated use of Docker Hub went into effect. Anonymous and Free Docker Hub users are limited to 100 and 200 container image pull requests per six hours. You can read here for more detailed information.
However, the page linked for more detailed information explains that the limit is enforced per IP address:
Unauthenticated (anonymous) users will have the limits enforced via IP.
Scaling Docker to Serve Millions More Developers: Network Egress
For anonymous (unauthenticated) users, pull rates are limited based on the individual IP address.
Realizing this, I disconnected from the corporate VPN I had been using, and the deployment then completed successfully.
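If disconnecting from the VPN is not an option, one alternative (a sketch only, assuming you have Docker Hub credentials; it was not used here) is to let the Docker driver authenticate the pull through the task's auth block, since authenticated users get a higher limit:
config {
  image = "redis:3.2"
  ports = ["db"]

  # hypothetical credentials; avoid committing them in plain text
  auth {
    username = "your-dockerhub-user"
    password = "your-dockerhub-password"
  }
}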
nomad job run <.nomad file> does not start
Pattern 1: specifying mode = "bridge" on macOS
When Nomad uses the Docker driver, bridged networking is applied by default, but if you explicitly write mode = "bridge" in the .nomad file, nomad job run <.nomad file> never progresses past in progress.
Nomad uses bridged networking by default, like Docker.
The reason seems to be that Nomad's bridge networking mode is Linux-only, so mode = "bridge" should not be written on macOS.
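For concreteness, a sketch of the kind of network stanza that triggers this hang on macOS (drop the mode line there):
network {
  mode = "bridge"  # Linux-only; on macOS the allocation is never placed
  port "db" {
    to = 6379
  }
}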
"missing network" constraint w/ bridge network #8684
Pattern 2: specifying the port under network in the .nomad file with static instead of to and starting multiple containers
The port numbers collide on the host side, so the containers cannot be set up and nomad job run <.nomad file> never progresses past in progress.