HAYASHIER.COM - Private Page
How retry handling is implemented in the AWS CLI and AWS SDKs


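When the AWS CLI and the AWS SDKs call an API, they retry failed requests using an exponential backoff algorithm. This post traces through the source code to see exactly how that is implemented.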


AWS CLI or Python SDK


https://github.com/boto/botocore/blob/1.16.1/botocore/data/_retry.json#L91-L113 The default is an exponential backoff algorithm with a random base (jitter), with max_attempts set to 5.

  "retry": {
    "__default__": {
      "max_attempts": 5,
      "delay": {
        "type": "exponential",
        "base": "rand",
        "growth_factor": 2
      },
      "policies": {
          "general_socket_errors": {"$ref": "general_socket_errors"},
          "general_server_error": {"$ref": "general_server_error"},
          "bad_gateway": {"$ref": "bad_gateway"},
          "service_unavailable": {"$ref": "service_unavailable"},
          "gateway_timeout": {"$ref": "gateway_timeout"},
          "limit_exceeded": {"$ref": "limit_exceeded"},
          "throttling_exception": {"$ref": "throttling_exception"},
          "throttled_exception": {"$ref": "throttled_exception"},
          "request_throttled_exception": {"$ref": "request_throttled_exception"},
          "throttling": {"$ref": "throttling"},
          "too_many_requests": {"$ref": "too_many_requests"},
          "throughput_exceeded": {"$ref": "throughput_exceeded"}
      }
    },



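client = boto3.client('ec2', region_name='us-west-2', config=boto3_config)
# Sets _max_attempts on the registered retry handler's checker to 20 by reaching into private attributes.
client.meta.events._unique_id_handlers['retry-config-ec2']['handler']._checker.__dict__['_max_attempts'] = 20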


https://github.com/boto/botocore/blob/develop/botocore/client.py create_client calls _load_service_model.

def create_client(self, service_name, region_name, is_secure=True,
                  endpoint_url=None, verify=None,
                  credentials=None, scoped_config=None,
                  ...
    service_model = self._load_service_model(service_name, api_version)
    cls = self._create_client_class(service_name, service_model)
    endpoint_bridge = ClientEndpointBridge(
        self._endpoint_resolver, scoped_config, client_config,
        ...
    client_args = self._get_client_args(
        service_model, region_name, is_secure, endpoint_url,
        verify, credentials, scoped_config, client_config, endpoint_bridge)
    service_client = cls(**client_args)
    ...
        service_client, endpoint_bridge, endpoint_url, client_config,
        ...
    return service_client


def _load_service_model(self, service_name, api_version=None):
    json_model = self._loader.load_service_model(service_name, 'service-2',
                                                 ...
    service_model = ServiceModel(json_model, service_name=service_name)
    return service_model


def _register_retries(self, service_model):
    endpoint_prefix = service_model.endpoint_prefix

    # First, we load the entire retry config for all services,
    # then pull out just the information we need.
    original_config = self._loader.load_data('_retry')
    if not original_config:
        ...

    retry_config = self._retry_config_translator.build_retry_config(
        endpoint_prefix, original_config.get('retry', {}),
        original_config.get('definitions', {}))

    logger.debug("Registering retry handlers for service: %s",
                 ...
    handler = self._retry_handler_factory.create_retry_handler(
        retry_config, endpoint_prefix)
    unique_id = 'retry-config-%s' % endpoint_prefix
    self._event_emitter.register('needs-retry.%s' % endpoint_prefix,
                                 handler, unique_id=unique_id)
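
The registration step can be pictured with a toy event emitter. This is only a sketch: `ToyEmitter` and `make_retry_handler` are hypothetical stand-ins written for this post, not botocore classes. Only the event name pattern `needs-retry.<endpoint_prefix>` and the unique ID `retry-config-<endpoint_prefix>` are taken from the excerpt.

```python
from collections import defaultdict


class ToyEmitter:
    """Hypothetical, heavily simplified stand-in for an event emitter."""

    def __init__(self):
        self._handlers = defaultdict(list)  # event name -> list of handlers
        self._unique_ids = {}               # unique_id -> handler

    def register(self, event_name, handler, unique_id=None):
        # In this toy version a unique_id makes registration idempotent:
        # a second call with the same ID is ignored.
        if unique_id is not None:
            if unique_id in self._unique_ids:
                return
            self._unique_ids[unique_id] = handler
        self._handlers[event_name].append(handler)

    def emit(self, event_name, **kwargs):
        return [handler(**kwargs) for handler in self._handlers[event_name]]


def make_retry_handler(max_attempts):
    """Hypothetical handler: returns a delay in seconds, or None to stop."""
    def handler(attempts, **kwargs):
        if attempts >= max_attempts:
            return None
        return 2 ** (attempts - 1)
    return handler


endpoint_prefix = "ec2"
emitter = ToyEmitter()
for _ in range(2):  # registering twice still leaves a single handler
    emitter.register(
        "needs-retry.%s" % endpoint_prefix,
        make_retry_handler(max_attempts=5),
        unique_id="retry-config-%s" % endpoint_prefix,
    )

delays = emitter.emit("needs-retry.ec2", attempts=2)
print(delays)  # [2]
```

Because the handler is stored under a predictable unique ID, it can be looked up again later by that ID, which is what the `retry-config-ec2` lookup earlier in this post relies on.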



def load_data(self, name):
    """Load data given a data path.

    This is a low level method that will search through the various
    search paths until it's able to load a value.  This is typically
    only needed to load *non* model files (such as _endpoints and
    _retry).  If you need to load model files, you should prefer
    ...

    :type name: str
    :param name: The data path, i.e ``ec2/2015-03-01/service-2``.

    :return: The loaded data.  If no data could be found then
        a DataNotFoundError is raised.
    """
    for possible_path in self._potential_locations(name):
        found = self.file_loader.load_file(possible_path)
        if found is not None:
            return found
    # We didn't find anything that matched on any path.
    raise DataNotFoundError(data_path=name)
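
The "search each path until something loads" idea in load_data can be tried out in isolation. The following is a minimal sketch under my own assumptions: the `load_data(search_paths, name)` signature, the `.json` suffix handling, and the local `DataNotFoundError` class are all hypothetical simplifications, not the botocore loader itself.

```python
import json
import os
import tempfile


class DataNotFoundError(Exception):
    """Hypothetical stand-in for the exception named in the excerpt."""


def load_data(search_paths, name):
    """Return the first <path>/<name>.json found, searching in order."""
    for root in search_paths:
        candidate = os.path.join(root, name + ".json")
        if os.path.isfile(candidate):
            with open(candidate) as f:
                return json.load(f)
    # Nothing matched on any search path.
    raise DataNotFoundError(name)


with tempfile.TemporaryDirectory() as custom, \
        tempfile.TemporaryDirectory() as builtin:
    # Only the second directory contains _retry.json.
    with open(os.path.join(builtin, "_retry.json"), "w") as f:
        json.dump({"retry": {"__default__": {"max_attempts": 5}}}, f)
    retry_data = load_data([custom, builtin], "_retry")

print(retry_data["retry"]["__default__"]["max_attempts"])  # 5
```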

The retry behavior itself is built and returned by the create_retry_handler() function. https://github.com/boto/botocore/blob/develop/botocore/retryhandler.py#L72-L77

def create_retry_handler(config, operation_name=None):
    checker = create_checker_from_retry_config(
        config, operation_name=operation_name)
    action = create_retry_action_from_config(
        config, operation_name=operation_name)
    return RetryHandler(checker=checker, action=action)


def create_retry_action_from_config(config, operation_name=None):
    # The spec has the possibility of supporting per policy
    # actions, but right now, we assume this comes from the
    # default section, which means that delay functions apply
    # for every policy in the retry config (per service).
    delay_config = config['__default__']['delay']
    if delay_config['type'] == 'exponential':
        return create_exponential_delay_function(
            ...


def create_exponential_delay_function(base, growth_factor):
    """Create an exponential delay function based on the attempts.

    This is used so that you only have to pass it the attempts
    parameter to calculate the delay.
    """
    return functools.partial(
        delay_exponential, base=base, growth_factor=growth_factor)
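
To see what functools.partial buys here, the following sketch binds the two configuration values once and then varies only the attempt number. The `delay_exponential` below is a cut-down version that accepts only a numeric base, and the values 0.5 and 2 are arbitrary choices for illustration.

```python
import functools


def delay_exponential(base, growth_factor, attempts):
    # Same formula as in this post, restricted to a numeric base for clarity.
    return base * (growth_factor ** (attempts - 1))


# Bind the two configuration values once; callers then pass only `attempts`.
delay_for = functools.partial(delay_exponential, base=0.5, growth_factor=2)

delays = [delay_for(attempts=n) for n in range(1, 5)]
print(delays)  # [0.5, 1.0, 2.0, 4.0]
```

Since base and growth_factor are bound by keyword, attempts also has to be passed by keyword; passing it positionally would collide with base.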


Because the default base is rand, base is drawn with random.random(), and the retry interval is then computed from that value as base * (growth_factor ** (attempts - 1)).

import random


def delay_exponential(base, growth_factor, attempts):
    """Calculate time to sleep based on exponential function.

    The format is::

        base * growth_factor ^ (attempts - 1)

    If ``base`` is set to 'rand' then a random number between
    0 and 1 will be used as the base.
    Base must be greater than 0, otherwise a ValueError will be
    raised.
    """
    if base == 'rand':
        base = random.random()
    elif base <= 0:
        raise ValueError("The 'base' param must be greater than 0, "
                         "got: %s" % base)
    time_to_sleep = base * (growth_factor ** (attempts - 1))
    return time_to_sleep
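
With base set to rand and growth_factor set to 2, the delay for each attempt is a random value below a ceiling that doubles every time. The sketch below makes that ceiling explicit; `rand_base_delay` and `upper_bound` are helper names I made up for this illustration.

```python
import random


def rand_base_delay(growth_factor, attempts):
    """Delay in seconds when base is 'rand' (sketch of the formula above)."""
    base = random.random()  # uniform in [0.0, 1.0)
    return base * (growth_factor ** (attempts - 1))


def upper_bound(growth_factor, attempts):
    """Exclusive upper bound of the delay for a given attempt number."""
    return growth_factor ** (attempts - 1)


bounds = [upper_bound(2, attempts) for attempts in range(1, 5)]
print(bounds)  # [1, 2, 4, 8]

# Every sampled delay falls in [0, upper_bound).
for attempts in range(1, 5):
    for _ in range(1000):
        assert 0 <= rand_base_delay(2, attempts) < upper_bound(2, attempts)
```

Note that only the ceiling grows exponentially. Any individual delay can still come out close to zero, because the random base multiplies the whole term.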

Java SDK

Contents of the Java SDK's default ClientConfiguration object

The default retry policy is the one specified by PredefinedRetryPolicies.DEFAULT, so the settings are as follows.

public class PredefinedRetryPolicies {

    /* SDK default */

    /** SDK default max retry count **/
    public static final int DEFAULT_MAX_ERROR_RETRY = 3;

    /**
     * SDK default retry policy (except for AmazonDynamoDBClient,
     * whose constructor will replace the DEFAULT with DYNAMODB_DEFAULT.)
     */
    public static final RetryPolicy DEFAULT;

In the code below, DEFAULT_RETRY_CONDITION holds the object that decides whether a request should be retried, and DEFAULT_BACKOFF_STRATEGY holds the object of the class that decides how the retry is carried out.


/**
 * The SDK default retry condition, which checks for various conditions in
 * the following order:
 *
 * - Never retry on requests with non-repeatable content;
 * - Retry on client exceptions caused by IOException;
 * - Retry on service exceptions that are either 500 internal server
 *   errors, 503 service unavailable errors, service throttling errors or
 *   clock skew errors.
 */
public static final RetryPolicy.RetryCondition DEFAULT_RETRY_CONDITION = new SDKDefaultRetryCondition();


/**
 * The SDK default back-off strategy, which increases exponentially up to a max amount of delay. It also applies a larger
 * scale factor upon service throttling exception.
 */
public static final RetryPolicy.BackoffStrategy DEFAULT_BACKOFF_STRATEGY =
        new PredefinedBackoffStrategies.SDKDefaultBackoffStrategy();

Based on the settings above, the getDefaultRetryPolicy() method is defined to return a RetryPolicy object.


/**
 * Returns the SDK default retry policy. This policy will honor the
 * maxErrorRetry set in ClientConfiguration.
 *
 * @see ClientConfiguration#setMaxErrorRetry(int)
 */
public static RetryPolicy getDefaultRetryPolicy() {
    return new RetryPolicy(DEFAULT_RETRY_CONDITION,
                           ...


getDefaultRetryPolicy() is called in the static initializer below, and its result is stored in the DEFAULT variable. Other classes then reference this DEFAULT RetryPolicy, which determines how retries behave.

static {
    DEFAULT = getDefaultRetryPolicy();
    DYNAMODB_DEFAULT = getDynamoDBDefaultRetryPolicy();

The ClientConfiguration class

The ClientConfiguration class lets the client side override the default settings, such as retry handling, with custom values.



@NotThreadSafe
public class ClientConfiguration {

    /** The default timeout for creating new connections. */
    public static final int DEFAULT_CONNECTION_TIMEOUT = 10 * 1000;

    /** The default timeout for reading from a connected socket. */
    public static final int DEFAULT_SOCKET_TIMEOUT = 50 * 1000;

    /**
     * The default timeout for a request. This is disabled by default.
     */
    public static final int DEFAULT_REQUEST_TIMEOUT = 0;

    /**
     * The default timeout for a request. This is disabled by default.
     */
    public static final int DEFAULT_CLIENT_EXECUTION_TIMEOUT = 0;

    /** The default max connection pool size. */
    public static final int DEFAULT_MAX_CONNECTIONS = 50;

Below, the DEFAULT value from the PredefinedRetryPolicies class shown earlier is assigned to DEFAULT_RETRY_POLICY, which in turn becomes the initial value of the retryPolicy field.

public static final RetryPolicy DEFAULT_RETRY_POLICY = PredefinedRetryPolicies.DEFAULT;

/** The retry policy upon failed requests. **/
private RetryPolicy retryPolicy = DEFAULT_RETRY_POLICY;

When the client supplies its own ClientConfiguration, the settings are overwritten in the constructor below.

public ClientConfiguration(ClientConfiguration other) {
    this.connectionTimeout = other.connectionTimeout;
    this.maxConnections = other.maxConnections;
    this.maxErrorRetry = other.maxErrorRetry;
    this.retryPolicy = other.retryPolicy;

It follows that retries are performed even when no ClientConfiguration has been set.

Whether a request is retried is decided by the shouldRetry() method of the SDKDefaultRetryCondition class, which lives in the same file as the PredefinedRetryPolicies class.


public boolean shouldRetry(AmazonWebServiceRequest originalRequest,
                           AmazonClientException exception,
                           int retriesAttempted) {
    // Always retry on client exceptions caused by IOException
    if (exception.getCause() instanceof IOException) return true;

    // Only retry on a subset of service exceptions
    if (exception instanceof AmazonServiceException) {
        AmazonServiceException ase = (AmazonServiceException)exception;

        /*
         * For 500 internal server errors and 503 service
         * unavailable errors, we want to retry, but we need to use
         * an exponential back-off strategy so that we don't overload
         * a server with a flood of retries.
         */
        if (RetryUtils.isRetryableServiceException(ase)) return true;

        /*
         * Throttling is reported as a 400 error from newer services. To try
         * and smooth out an occasional throttling error, we'll pause and
         * retry, hoping that the pause is long enough for the request to
         * get through the next time.
         */
        if (RetryUtils.isThrottlingException(ase)) return true;

        /*
         * Clock skew exception. If it is then we will get the time offset
         * between the device time and the server time to set the clock skew
         * and then retry the request.
         */
        if (RetryUtils.isClockSkewError(ase)) return true;
    }

    return false;
}
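
The order of checks in shouldRetry() can be restated compactly in Python. This is a sketch, not a port: `ClientFailure` and `ServiceFailure` are hypothetical stand-ins for AmazonClientException and AmazonServiceException, and the three sets of codes are short illustrative samples that I chose, not the lists that RetryUtils actually consults.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative values only (assumption): the SDK keeps its own lists.
RETRYABLE_STATUS_CODES = {500, 503}
THROTTLING_ERROR_CODES = {"Throttling", "ThrottlingException"}
CLOCK_SKEW_ERROR_CODES = {"RequestTimeTooSkewed", "RequestExpired"}


@dataclass
class ClientFailure:
    """Hypothetical stand-in for a client-side exception."""
    cause: Optional[BaseException] = None


@dataclass
class ServiceFailure(ClientFailure):
    """Hypothetical stand-in for an error response from the service."""
    status_code: int = 0
    error_code: str = ""


def should_retry(error: ClientFailure) -> bool:
    # 1. Client-side failures caused by I/O errors are always retried.
    if isinstance(error.cause, OSError):
        return True
    # 2. Only a subset of service-side failures is retried.
    if isinstance(error, ServiceFailure):
        if error.status_code in RETRYABLE_STATUS_CODES:
            return True
        if error.error_code in THROTTLING_ERROR_CODES:
            return True
        if error.error_code in CLOCK_SKEW_ERROR_CODES:
            return True
    return False


print(should_retry(ClientFailure(cause=ConnectionResetError())))  # True
print(should_retry(ServiceFailure(status_code=404, error_code="NotFound")))  # False
```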



The code below defines which algorithm the default retry handling uses: Full Jitter backoff when the request was not throttled, and Equal Jitter backoff when it was.



/**
 * A private class that implements the default back-off strategy.
 */
static class SDKDefaultBackoffStrategy extends V2CompatibleBackoffStrategyAdapter {

    private final BackoffStrategy fullJitterBackoffStrategy;
    private final BackoffStrategy equalJitterBackoffStrategy;

    SDKDefaultBackoffStrategy() {
        fullJitterBackoffStrategy = new PredefinedBackoffStrategies.FullJitterBackoffStrategy(
                /* ... */
        equalJitterBackoffStrategy = new PredefinedBackoffStrategies.EqualJitterBackoffStrategy(
                /* ... */
    }

    SDKDefaultBackoffStrategy(final int baseDelay, final int throttledBaseDelay, final int maxBackoff) {
        fullJitterBackoffStrategy = new PredefinedBackoffStrategies.FullJitterBackoffStrategy(
                baseDelay, maxBackoff);
        equalJitterBackoffStrategy = new PredefinedBackoffStrategies.EqualJitterBackoffStrategy(
                throttledBaseDelay, maxBackoff);
    }

    public long computeDelayBeforeNextRetry(RetryPolicyContext context) {
        /*
         * We use the full jitter scheme for non-throttled exceptions and the
         * equal jitter scheme for throttled exceptions.  This gives a preference
         * to quicker response and larger retry distribution for service errors
         * and guarantees a minimum delay for throttled exceptions.
         */
        if (RetryUtils.isThrottlingException(context.exception())) {
            return equalJitterBackoffStrategy.computeDelayBeforeNextRetry(context);
        } else {
            return fullJitterBackoffStrategy.computeDelayBeforeNextRetry(context);
        }
    }
}

JavaScript SDK

In the JavaScript SDK, retry handling is mostly defined in service.js, directly under lib/.




  defaultRetryCount: 3,

For DynamoDB, however, the default is defined as 10 retries.

  defaultRetryCount: 10,


  /**
   * How many times a failed request should be retried before giving up.
   * the defaultRetryCount can be overridden by service classes.
   *
   * @api private
   */
  numRetries: function numRetries() {
    if (this.config.maxRetries !== undefined) {
      return this.config.maxRetries;
    } else {
      return this.defaultRetryCount;
    }
  },
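
The lookup in numRetries() is small, but one detail is worth spelling out: an explicit maxRetries of 0 is honored, because the check is against undefined and not against falsiness. A Python sketch of the same rule, where a missing key plays the role of undefined (the dict-based config is my simplification):

```python
def num_retries(config, default_retry_count):
    """Return config['maxRetries'] when set, else the service default."""
    max_retries = config.get("maxRetries")
    if max_retries is not None:
        return max_retries
    return default_retry_count


print(num_retries({}, 3))                  # 3
print(num_retries({"maxRetries": 0}, 10))  # 0
print(num_retries({"maxRetries": 7}, 3))   # 7
```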



  /**
   * @api private
   */
  retryDelays: function retryDelays(retryCount) {
    return AWS.util.calculateRetryDelay(retryCount, this.config.retryDelayOptions);
  },



The abstract ConfigurationOptions class has a retryDelayOptions property.

    /**
     * Returns A set of options to configure the retry delay on retryable errors.
     */
    retryDelayOptions?: RetryDelayOptions

RetryDelayOptions takes base, the base delay for exponential backoff (in milliseconds, default 100 ms), and customBackoff, a function you can supply when you want to customize the backoff algorithm.

export interface RetryDelayOptions {
    /**
     * The base number of milliseconds to use in the exponential backoff for operation retries.
     * Defaults to 100 ms.
     */
    base?: number
    /**
     * A custom function that accepts a retry count and returns the amount of time to delay in milliseconds.
     * The base option will be ignored if this option is supplied.
     */
    customBackoff?: (retryCount: number) => number
}


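The maximum number of retries follows whatever each service defines. In handleRequestWithRetries below, it falls back to 0 when options.maxRetries is not set. The retry interval is computed by the calculateRetryDelay() function.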

  /**
   * @api private
   */
  handleRequestWithRetries: function handleRequestWithRetries(httpRequest, options, cb) {
    if (!options) options = {};
    var http = AWS.HttpClient.getInstance();
    var httpOptions = options.httpOptions || {};
    var retryCount = 0;

    var errCallback = function(err) {
      var maxRetries = options.maxRetries || 0;
      if (err && err.code === 'TimeoutError') err.retryable = true;
      if (err && err.retryable && retryCount < maxRetries) {
        var delay = util.calculateRetryDelay(retryCount, options.retryDelayOptions);
        setTimeout(sendRequest, delay + (err.retryAfter || 0));
      } else {


  /**
   * @api private
   */
  calculateRetryDelay: function calculateRetryDelay(retryCount, retryDelayOptions) {
    if (!retryDelayOptions) retryDelayOptions = {};
    var customBackoff = retryDelayOptions.customBackoff || null;
    if (typeof customBackoff === 'function') {
      return customBackoff(retryCount);
    }
    var base = typeof retryDelayOptions.base === 'number' ? retryDelayOptions.base : 100;
    var delay = Math.random() * (Math.pow(2, retryCount) * base);
    return delay;
  },
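
For comparison with the Python SDK, here is the same delay calculation restated in Python. It is a sketch only: the dict keys mirror the JavaScript option names, and the function itself is not part of any SDK.

```python
import random


def calculate_retry_delay(retry_count, retry_delay_options=None):
    """Delay in milliseconds before the next retry."""
    options = retry_delay_options or {}
    custom_backoff = options.get("customBackoff")
    if callable(custom_backoff):
        # A custom function wins; `base` is ignored in that case.
        return custom_backoff(retry_count)
    base = options.get("base")
    if not isinstance(base, (int, float)):
        base = 100  # default base of 100 ms
    # Random value in [0, 2 ** retry_count * base).
    return random.random() * (2 ** retry_count) * base


print(calculate_retry_delay(3, {"customBackoff": lambda n: n * 10}))  # 30
```

Unlike the Python SDK, where the random base lies between 0 and 1 second, the base here is 100 ms by default, so the ceilings for the first retries are 100 ms, 200 ms, 400 ms, and so on.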



            var err = util.error(new Error(),
              { retryable: statusCode >= 500 || statusCode === 429 }
            );

      if (err && err.code === 'TimeoutError') err.retryable = true;

The timeout passed to setTimeout is the delay computed by calculateRetryDelay() plus the number of seconds given in the retry-after header, converted to milliseconds.

 setTimeout(sendRequest, delay + (err.retryAfter || 0));
 var retryAfter = parseInt(httpResponse.headers['retry-after'], 10) * 1000 || 0;
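
The arithmetic can be checked with a small Python sketch. The helper names are hypothetical, and the parsing is slightly stricter than JavaScript's parseInt: a value such as "3abc" yields 0 here, whereas parseInt would read it as 3.

```python
def retry_after_ms(headers):
    """Convert a retry-after header given in seconds to milliseconds."""
    try:
        seconds = int(headers.get("retry-after", ""))
    except (TypeError, ValueError):
        # Missing or non-numeric header: add no extra wait.
        return 0
    return seconds * 1000


def timeout_ms(backoff_delay_ms, headers):
    """Total wait: backoff delay plus whatever the server asked for."""
    return backoff_delay_ms + retry_after_ms(headers)


print(timeout_ms(250, {"retry-after": "2"}))  # 2250
print(timeout_ms(250, {}))                    # 250
```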


Exponential backoff without jitter:
sleep = min(cap, base * 2 ** attempt)

Full Jitter:
sleep = random_between(0, min(cap, base * 2 ** attempt))

Equal Jitter:
temp = min(cap, base * 2 ** attempt)
sleep = temp / 2 + random_between(0, temp / 2)