ECS aims to make it easier to work with Docker containers. It provides a clustering and orchestration layer that controls the deployment of your containers onto hosts and the subsequent management of the containers' lifecycle within a cluster.
ECS is an alternative to tools such as Docker Swarm, Kubernetes or Mesos. It operates at the same layer, but whereas you need to set up and administer those tools yourself, ECS is provided to you ‘as a service’.
ECS is based on a proprietary clustering technology rather than on another engine such as Docker Swarm, Kubernetes or Mesos. This is in contrast to Google’s Container Engine, which is an equivalent of ECS but is based on Kubernetes behind the scenes.
Why Do We Need Container Orchestration?
The orchestration layer for containers provided by ECS, Swarm or Kubernetes is an important piece in the puzzle of deploying and running container-based applications.
Firstly, we need to cluster our containers for scalability. As our workloads grow, we need to add more containers and scale them out horizontally across servers to process more of the workload in parallel.
Secondly, we need to cluster containers for robustness and resilience. When a host or container fails, we want the container to be re-created, perhaps on another healthy host, so the system is not impacted.
Finally, tools in the orchestration layer provide the important function of abstracting developers away from the underlying machines. In a containerized world, we shouldn’t need to care about individual hosts, only that our desired number of containers is up and running ‘somewhere appropriate’. Orchestration and clustering tools do this for us, allowing us to simply deploy the container to the cluster and let the supporting software work out the optimal scheduling of containers onto hosts.
Designing robust and performant distributed clustering systems is notoriously difficult, so tools such as Kubernetes and Swarm give us that capability without having to build it ourselves. ECS takes this one step further by taking away the need to set up, run and administer the orchestration layer. For this reason, ECS is something that developers working on container-based applications in the cloud should be looking at closely.
EC2 Container Service Architecture
ECS uses tasks for scheduling containers on the container cluster, similar to DC/OS. A task definition specifies the container image, port mappings (container ports, protocols, host ports), networking mode (bridge, host) and memory limits. Once a task definition is created, tasks can be created using the service scheduler, using a custom scheduler, or by running tasks manually. The service scheduler is used for long-running applications, and manual task creation can be used for batch jobs. If any business-specific scheduling is needed, a custom scheduler can be implemented. A task then creates a container on one of the container cluster hosts by pulling the container image from the given container registry and applying the port mappings, networking configuration, and resource limits.
Once a container is created, the ECS service uses the health checks defined in the load balancer and automatically recovers containers that become unhealthy. The healthy and unhealthy conditions of the containers can be fine-tuned according to the application requirements by changing the health check configuration.
In ECS, CloudWatch alarms need to be used to set up autoscaling. Here AWS has reused its existing monitoring features to measure resource utilization and make scale-up and scale-down decisions. It also seems to support scaling the EC2 instances of the ECS cluster.
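To make the contents of a task definition concrete, here is a minimal sketch that builds one as a plain Python dictionary. The field names (family, networkMode, containerDefinitions, portMappings, memory) follow the shape of the ECS RegisterTaskDefinition request; the helper make_task_definition, the family name "web" and the image "nginx:latest" are assumptions made for illustration only.

```python
import json


def make_task_definition(family, image, container_port, memory_mib,
                         network_mode="bridge", protocol="tcp"):
    """Build a task definition payload as a plain dict (hypothetical helper)."""
    if network_mode not in ("bridge", "host"):
        raise ValueError("network_mode must be 'bridge' or 'host'")
    # In bridge mode, host port 0 asks for a dynamically assigned host port.
    # In host mode, the container port is opened directly on the host.
    host_port = container_port if network_mode == "host" else 0
    return {
        "family": family,
        "networkMode": network_mode,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "memory": memory_mib,  # hard memory limit in MiB
                "portMappings": [
                    {
                        "containerPort": container_port,
                        "hostPort": host_port,
                        "protocol": protocol,
                    }
                ],
            }
        ],
    }


task_def = make_task_definition("web", "nginx:latest", 80, 256)
print(json.dumps(task_def, indent=2))
```

Assuming boto3 is installed and AWS credentials are configured, a dictionary of this shape could be passed to the ECS client as register_task_definition(**task_def); the sketch itself does not call AWS.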
Currently, ECS exposes container ports using dynamic host port mappings and does not use an overlay network. As a result, each container port is exposed on an ephemeral host port (between 49153 and 65535) on the container host if the networking mode is set to bridge. If the host networking mode is used, the container port is opened directly on the host, so only one such container can run on a given container host. Load balancing for these host ports can be done by creating an application load balancer and linking it to an ECS service. The load balancer automatically updates the listener ports based on the dynamic host ports provided via the service.
Note that, due to this design, containers on different hosts might not be able to communicate with each other directly without first discovering the corresponding host ports. The alternative is to route traffic through the load balancer, provided the relevant protocols support load balancing. Protocols such as JMS, AMQP, MQTT and Apache Thrift, which use client-side load balancing, might not work with a TCP load balancer and would need to discover the host ports dynamically.
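The difference between the two networking modes can be illustrated with a small simulation. This is only a sketch of the behaviour described above, not how ECS or Docker actually allocate ports; the Host class and its publish method are hypothetical names.

```python
import random

# 49153..65535 inclusive, the ephemeral range cited above.
EPHEMERAL_RANGE = range(49153, 65536)


class Host:
    """Hypothetical container host that tracks which host ports are in use."""

    def __init__(self, name):
        self.name = name
        self.used_ports = set()

    def publish(self, container_port, network_mode="bridge"):
        """Return the host port on which container_port becomes reachable."""
        if network_mode == "host":
            # Host mode: the container port is bound directly on the host.
            if container_port in self.used_ports:
                raise RuntimeError(
                    f"port {container_port} is already bound on {self.name}")
            host_port = container_port
        else:
            # Bridge mode: pick any free port from the ephemeral range.
            free = [p for p in EPHEMERAL_RANGE if p not in self.used_ports]
            host_port = random.choice(free)
        self.used_ports.add(host_port)
        return host_port


bridge_host = Host("host-a")
first = bridge_host.publish(8080)
second = bridge_host.publish(8080)
print(first, second)  # two different ephemeral host ports for the same container port

direct_host = Host("host-b")
direct_host.publish(8080, network_mode="host")
try:
    direct_host.publish(8080, network_mode="host")
except RuntimeError as err:
    print(err)  # a second container on the same port cannot start on this host
```

In bridge mode, several copies of the same container can share a host because each gets its own host port; in host mode, the second copy is rejected.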
Container Image Management
ECS supports pulling container images from both public and private container registries that are accessible from AWS. When accessing private registries, Docker credentials can be provided via environment variables. ECS also provides a container registry service for managing container images within the same AWS network. This service is useful in production deployments for avoiding network issues that may arise when accessing external container registries.
AWS recommends setting up any deployment on AWS within a Virtual Private Cloud (VPC) to isolate its network from other deployments that might be running on the same infrastructure. The same applies to ECS. The ECS instances may need a security group that restricts access to the ephemeral port range to the load balancer only. This prevents direct access to the container hosts from any other hosts. If SSH is needed, a key pair can be supplied at ECS cluster creation time, and port 22 can be added to the security group when needed. For both security and reliability, it would be better to use the ECS container registry and maintain all required container images within ECS.
Depending on the deployment architecture of the solution, the load balancer security group might need to be configured either to restrict inbound traffic to a specific network or to open it to the internet. This design ensures that only the load balancer ports are accessible from external networks.
Any container-based deployment needs a centralized logging system for monitoring and troubleshooting, as not all users have direct access to container logs or container hosts. ECS provides a solution for this using CloudWatch Logs. At the moment, it does not seem to provide advanced query features such as those that Apache Lucene offers in Elasticsearch. Nevertheless, Amazon Elasticsearch Service or a dedicated Elasticsearch container deployment could be used as an alternative, although this option needs further investigation.
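As an illustration of the security group described above, the following sketch builds the inbound rules as plain data. The structure mirrors the IpPermissions format of the EC2 security group API; the helper name, the security group ID and the CIDR block are placeholders assumed for this example.

```python
def container_host_ingress(lb_security_group_id, ssh_cidr=None):
    """Build inbound rules for the container hosts (hypothetical helper)."""
    rules = [
        {
            # Only the load balancer's security group may reach the
            # ephemeral host ports used by bridge-mode containers.
            "IpProtocol": "tcp",
            "FromPort": 49153,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": lb_security_group_id}],
        }
    ]
    if ssh_cidr is not None:
        # Optional: open port 22 to one network, only when SSH is needed.
        rules.append({
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": ssh_cidr}],
        })
    return rules


# Placeholder ID and documentation-only CIDR block; substitute real values.
rules = container_host_ingress("sg-0123456789abcdef0", ssh_cidr="203.0.113.0/24")
for rule in rules:
    print(rule["FromPort"], rule["ToPort"])
```

Assuming boto3 and valid credentials, a list like this could be supplied as the IpPermissions argument of the EC2 client's authorize_security_group_ingress call; the sketch only constructs the data.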
Programmatic access through the API
Now that we have a key/value store, we can successfully coordinate the cluster and ensure that the desired number of containers is running because we have a reliable method to store and retrieve the state of the cluster. As mentioned earlier, we decoupled container scheduling from cluster management because we want customers to be able to take advantage of Amazon ECS’ state management capabilities. We have opened up the Amazon ECS cluster manager through a set of API actions that allow customers to access all the cluster state information stored in our key/value store in a structured manner.
Through ‘list’ commands, customers can retrieve the clusters under management, EC2 instances running in a specific cluster, running tasks, and the container configuration that makes up the tasks (i.e., task definition). Through ‘describe’ commands, customers can retrieve details of specific EC2 instances and the resources available on each. Lastly, customers can start and stop tasks anywhere in the cluster. We recently ran a series of load tests on Amazon ECS, and we wanted to share some of the performance characteristics customers should expect when building applications on Amazon ECS.
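The ‘list’ then ‘describe’ pattern can be sketched as follows. The function takes any client object that offers list_container_instances and describe_container_instances with the response shapes used by the boto3 ECS client; the FakeECS class is a stand-in invented here so the example runs without AWS access, and pagination is ignored for brevity.

```python
def remaining_resources(ecs, cluster):
    """Map each container instance ARN to its remaining CPU units and memory (MiB)."""
    arns = ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]
    if not arns:
        return {}
    described = ecs.describe_container_instances(
        cluster=cluster, containerInstances=arns
    )["containerInstances"]
    snapshot = {}
    for instance in described:
        values = {r["name"]: r.get("integerValue")
                  for r in instance["remainingResources"]}
        snapshot[instance["containerInstanceArn"]] = {
            "cpu": values.get("CPU"),
            "memory": values.get("MEMORY"),
        }
    return snapshot


class FakeECS:
    """Hypothetical stand-in for a real client, returning canned responses."""

    def list_container_instances(self, cluster):
        return {"containerInstanceArns": ["arn:example:instance-1"]}

    def describe_container_instances(self, cluster, containerInstances):
        return {"containerInstances": [
            {
                "containerInstanceArn": arn,
                "remainingResources": [
                    {"name": "CPU", "type": "INTEGER", "integerValue": 1024},
                    {"name": "MEMORY", "type": "INTEGER", "integerValue": 3768},
                    {"name": "PORTS", "type": "STRINGSET", "stringSetValue": ["22"]},
                ],
            }
            for arn in containerInstances
        ]}


print(remaining_resources(FakeECS(), "demo"))
```

With boto3 installed and credentials configured, passing boto3.client("ecs") in place of FakeECS() should yield the same kind of snapshot for a real cluster.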
The above graph shows a load test where we added and removed instances from an Amazon ECS cluster and measured the 50th and 99th percentile latencies of the API call ‘Describe Task’ over a seventy-two-hour period. As you can see, the latency remains relatively jitter-free despite large fluctuations in the cluster size. Amazon ECS is able to scale with you no matter how large your cluster size – all without you needing to operate or scale a cluster manager.
This set of API actions forms the basis of solutions that customers can build on top of Amazon ECS. A scheduler just provides logic around how, when, and where to start and stop containers. Amazon ECS’ architecture is designed to share the state of the cluster and allow customers to run as many varieties of schedulers (e.g., bin packing, spread, etc.) as needed for their applications. The architecture enables the schedulers to query the exact state of the cluster and allocate resources from a common pool. The optimistic concurrency control in place allows each scheduler to receive the resources it requested without the possibility of resource conflicts. Customers have already created a variety of interesting solutions on top of Amazon ECS, and we want to share a few compelling examples.
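To show how little logic a scheduler needs once the cluster state is available, here is a minimal sketch of the two strategies named above, bin packing and spread, working from a shared list of hosts. The data layout and function names are assumptions made for this example; it runs in a single process and does not model the optimistic concurrency control that ECS applies between competing schedulers.

```python
def fits(host, task):
    """True if the host has enough CPU units and memory left for the task."""
    return host["cpu"] >= task["cpu"] and host["memory"] >= task["memory"]


def binpack(hosts, task):
    """Pick the fitting host with the least memory left, filling hosts tightly."""
    candidates = [h for h in hosts if fits(h, task)]
    return min(candidates, key=lambda h: h["memory"], default=None)


def spread(hosts, task):
    """Pick the fitting host running the fewest tasks, distributing them evenly."""
    candidates = [h for h in hosts if fits(h, task)]
    return min(candidates, key=lambda h: h["tasks"], default=None)


def place(hosts, task, strategy):
    """Choose a host with the given strategy and deduct the task's resources."""
    host = strategy(hosts, task)
    if host is None:
        return None  # no host in the common pool can take the task
    host["cpu"] -= task["cpu"]
    host["memory"] -= task["memory"]
    host["tasks"] += 1
    return host["name"]


def new_cluster():
    return [
        {"name": "a", "cpu": 1024, "memory": 2048, "tasks": 0},
        {"name": "b", "cpu": 1024, "memory": 1024, "tasks": 0},
    ]


task = {"cpu": 256, "memory": 512}

cluster = new_cluster()
packed = [place(cluster, task, binpack) for _ in range(3)]
print(packed)  # ['b', 'b', 'a']

cluster = new_cluster()
spaced = [place(cluster, task, spread) for _ in range(3)]
print(spaced)  # ['a', 'b', 'a']
```

Bin packing fills the smaller host before touching the larger one, while spread alternates between hosts; both read from and write to the same pool of resources.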