You’ve heard this before. The team has been working on this service and a couple months later traffic is picking up. Pretty awesome you think, customers are loving this feature! Hold on, now you hear finance people screaming at the Amazon bill. The application is considerably resource intensive. You got two options: 1) find the bottleneck and optimize, 2) limit cost of running the service. Let’s focus on the latter.
Welcome to Amazon Auto Scaling. If the fundamental premise of the cloud is “use only the resources you need for as long as you need”, then add “dynamic scaling based on traffic”. Bottom line: you save money when traffic is low. Enterprise SaaS is a great use case since customers are using your product during typical business hours, resulting in very low traffic at night. So what does it take to make the switch?
Stateless VS stateful application
Ideally you want to be dealing with a stateless application, where terminating one node won’t produce side effects on user experience. Typical stateless apps include frontend web servers or any app that doesn’t rely on keeping session state in memory.
Unfortunately not all software is created equal. Our use case is a video recorder for web based meetings. While the presenter is discussing slides, the recorder is watching the presentation unfolding real time.
Try to terminate one instance with active sessions and you’re impacting user experience. But there is a solution to which we’ll come back shortly.
One thing with dynamically terminating instances is that you can’t rely on SSH access any longer:
- Logs need to be forwarded to a remote host using Elasticsearch, Splunk or similar.
- Provisioning an instance is done in an automated fashion. We use Chef and Terraform.
- Not directly related to Auto Scaling here but proper deployment tooling is also a requirement. We use Jenkins pipelines and Chef.
Architecting the app around Auto Scaling
Our video recorder app went through a few changes. First you need to expose a health check endpoint that returns HTTP 200 if the app is in a good state. Amazon is continuously polling it and will replace unhealthy instances.
Then you need the app to properly report your chosen autoscaling metric to CloudWatch. For instance our metric is “number of running jobs”. A java thread can handle the task or even a cron job.
One more thing about stateful applications: we want to make sure we don’t disrupt running jobs during a scale down event. Amazon conveniently provides Lifecycle Hooks which allows to perform a custom action before terminating the instance. For instance: a decrease in traffic triggers a scale down event. The oldest instance (by creation time) is picked and moves to Termination:Wait state. Amazon sends a notification using SNS to check if the instance is ready to be terminated. The instance gets the notification and will give the green light for termination if there is no job running, otherwise it’ll respond with a heartbeat to keep waiting until all jobs are done. Which means your app needs to have a thread listening to SNS notifications.
Interestingly lifecycle hooks cannot be set up on Amazon web console, you'll have to use the CLI.
Amazon SDKs make it pretty straightforward to implement the above two items.