As some of you may already know, we were off to a somewhat rocky start when we launched this new website. A quick recap: we ran into a few load issues right before our switchover deadline, so we postponed the launch for nearly 22 hours before trying again. We have been monitoring the servers closely and working nonstop to squash bugs ever since. Now that the dust has settled, I'd like to talk to you guys about how we set up our servers on Amazon Web Services - the cloud.
This post may get a bit technical, so feel free to skip to the next one if you don't have any experience with systems engineering.
When Patrick and Lee told me they wanted to create a new website where thousands of people would sign up and use the Community daily, I knew our then-existing dedicated server setup wasn't going to cut it. Given that it would need to scale with usage, the only thing that would fit the bill was AWS.
With AWS EC2, you can essentially set up your webserver farm so that it scales out when there's a lot of traffic and scales back in when the traffic dies down (more on this later). AWS also offers other services like Virtual Private Cloud (VPC), Relational Database Service (RDS), Elastic Load Balancer (ELB), Route 53, ElastiCache, Simple Storage Service (S3), and Simple Email Service (SES) - everything we need to run a website. Here's a quick rundown of what each service does:
- VPC: when you sign up for an AWS account and start using the service, Amazon automatically creates a VPC and assigns it to you, with your own subnet. Ours is a /16 block with a 255.255.0.0 netmask (yes, that's 65,536 possible internal IPs). Basically, this is like being able to access all the computers in your home or company network. Pretty neat.
- RDS: we use MySQL for our website, so this is a must. With the help of AWS Security Groups, we can set this up so that only our EC2 instances (aka servers) can access our RDS instance. RDS is scalable. Right now, we're running on one m3.large instance.
- ELB: we're running a multi-webserver setup, so this is a must. Incoming internet traffic is routed to one of the instances, round-robin style. Each of you is assigned a "sticky" cookie that keeps you on the same instance for an hour after your last visit.
- Route 53: since an AWS ELB can only be reached via a CNAME, we have to use AWS Route 53 as our nameserver. Basically, the root DNS record (@) for our domain (fstoppers.com) would have to point to a CNAME, which is illegal at the zone apex (unless, of course, we use Cloudflare). With Route 53, Amazon can point our root record to an alias, which behaves like a CNAME but is served as a regular A record. The only downside is that the time-to-live for the record is very short, which isn't a huge deal.
- ElastiCache: we use memcached for our website - one cache for sessions and one for content (by the way, if you sometimes get logged out when you come back to our website, it may be because we had to restart our session cache while pushing out new code). You simply can't run a website this big without caching. AWS ElastiCache is scalable. Right now, we're running on 1x m1.medium and 2x m1.small.
- SES: when you set up your webservers on AWS EC2, you have to keep in mind that these instances come and go, and so do their IP addresses. For your email to be accepted as coming from a permitted sender, you generally need reverse DNS set up with your ISP so that when an email provider checks the IP that sent, say, email@fstoppers.com, it resolves back to a static IP you control. Since our IP addresses on EC2 are dynamic, that can't happen. SES is the solution for this case. SES is free for up to 2,000 emails a day, as long as those emails are sent from EC2 instances. Just make sure you create your DKIM and SPF records!
- S3: all of the existing images from the old server have been migrated to S3, and all the new ones uploaded by our writers and users will be saved on S3.
- EC2: basically, you can create and destroy servers with only a few clicks on the console. With our current setup, we work in our development environment first. Once the code has been tested, we deploy it to production. The deployment process goes something like this: save our work, shut down the development instance, create an Amazon Machine Image (AMI) from it, create new production instances from the newly created AMI, add the new production instances to the ELB, then remove the old ones. Right now we're running on two m3.xlarge instances.
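On the SPF and DKIM records mentioned under SES: in zone-file terms they look roughly like the entries below. The SPF include for Amazon SES is standard; the DKIM selector and token values are placeholders (SES's Easy DKIM actually gives you three such CNAMEs), not our real records:

```
; Illustrative zone-file entries -- token values are placeholders.
fstoppers.com.                    IN TXT   "v=spf1 include:amazonses.com ~all"
token1._domainkey.fstoppers.com.  IN CNAME token1.dkim.amazonses.com.
```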
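For the curious, the /16 math from the VPC item is easy to sanity-check with Python's standard ipaddress module (the 10.0.0.0/16 range here is just an example block, not necessarily ours):

```python
import ipaddress

# A /16 block like the one our VPC uses; the exact range is illustrative.
vpc = ipaddress.ip_network("10.0.0.0/16")
print(vpc.num_addresses)  # 65536 addresses in a /16
print(vpc.netmask)        # 255.255.0.0
```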
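To make the ELB's behavior concrete, here's a toy model of round-robin routing with sticky cookies. This is purely illustrative (the `StickyBalancer` class and instance names are made up, not AWS code), and it ignores the one-hour cookie expiry:

```python
import itertools

# Toy model: new visitors are assigned round-robin; the sticky cookie then
# pins each visitor to the instance they were first given.
class StickyBalancer:
    def __init__(self, instances):
        self._ring = itertools.cycle(instances)
        self._sessions = {}  # sticky-cookie value -> instance

    def route(self, cookie):
        if cookie not in self._sessions:
            self._sessions[cookie] = next(self._ring)  # round-robin assignment
        return self._sessions[cookie]

lb = StickyBalancer(["web-1", "web-2"])
assert lb.route("alice") == "web-1"  # first visitor gets the first instance
assert lb.route("bob") == "web-2"    # next visitor rotates to the second
assert lb.route("alice") == "web-1"  # sticky cookie pins alice to web-1
```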
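The reason a content cache matters so much is the cache-aside pattern: check memcached first, and only hit the database on a miss. A minimal sketch, with a plain dict standing in for the memcached cluster and a made-up `load_article` loader standing in for the database query:

```python
# Cache-aside pattern like the one memcached enables. The dict and the
# load_article callable are illustrative stand-ins, not our real code.
cache = {}

def get_article(article_id, load_article):
    if article_id in cache:
        return cache[article_id]       # cache hit: skip the database entirely
    value = load_article(article_id)   # cache miss: fall through to the database
    cache[article_id] = value          # populate the cache for next time
    return value

calls = []
def load_article(i):
    calls.append(i)                    # record every "database" hit
    return f"article-{i}"

assert get_article(7, load_article) == "article-7"
assert get_article(7, load_article) == "article-7"
assert calls == [7]  # second read came from cache; the loader ran only once
```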
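The AMI-swap deploy described under EC2 boils down to: bake an image, boot fresh instances from it, register them with the load balancer, then deregister the old ones. Here's a toy simulation of just that swap logic, with a plain list standing in for the ELB (the real steps go through the AWS console):

```python
# Toy simulation of the AMI-swap deploy. "elb" is just a list of instance
# names; function and names are illustrative, not AWS API calls.
def deploy(elb, ami, count=2):
    old = list(elb)
    new = [f"{ami}-instance-{i}" for i in range(1, count + 1)]  # boot from new AMI
    elb.extend(new)     # register the new instances first, so capacity never drops
    for inst in old:    # then deregister the old instances
        elb.remove(inst)
    return elb

fleet = ["v1-instance-1", "v1-instance-2"]
deploy(fleet, "v2")
print(fleet)  # ['v2-instance-1', 'v2-instance-2']
```

The ordering matters: adding before removing means there is never a moment with zero instances behind the balancer.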
As you can see, things can get a bit tricky when instances come and go. First, we have to assume that nothing is ever saved on those instances. We may need to get on an instance directly to inspect logs at times, but images and other media files have to live on S3.
Also, since these instances only talk to the ELB, all traffic appears to come from a single source. We had to tweak the web service to report the correct source address.
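In practice, fixes like this usually mean reading the X-Forwarded-For header that load balancers such as the ELB add to each request. A minimal sketch (the header semantics are standard; the `client_ip` helper is just an illustration):

```python
# Proxies and load balancers append the original client IP to X-Forwarded-For.
# With chained proxies the header is a comma-separated list; the first entry
# is the original client.
def client_ip(headers, fallback):
    xff = headers.get("X-Forwarded-For", "")
    if xff:
        return xff.split(",")[0].strip()
    return fallback  # direct connection: fall back to the socket's peer address

assert client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.1.5"}, "10.0.0.1") == "203.0.113.7"
assert client_ip({}, "10.0.0.1") == "10.0.0.1"
```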
AWS ELB also supports SSL termination, so all of our encrypted HTTPS traffic is decrypted at the ELB and then routed to the instances. We plan on rolling out HTTPS everywhere in the near future. Right now, we're still focused on fixing bugs reported by our users in the Facebook group.
Please understand that there are a lot of moving parts to our setup, and the codebase is huge, so things may or may not work properly within the first few weeks/months. We're constantly working on fixing bugs and developing new features. If you have any questions or suggestions, drop us a comment below or post it in the Facebook group and we'll make sure to get to it as soon as we can.