Edit: I'm linking to the discussion on HN, which has another 90 comments or so of further discussion.
I understand that it might sound heretical to claim that traditional colocation is still the more cost-effective approach to hosting, but let's look at the numbers.
tl;dr
Cloud computing isn't that "cheap" unless you're talking about tiny costs that early, early stage startups might face. Once you know you will require dedicated infrastructure it's very likely worth it to get your own hardware. You're employing technical people anyway; have some of them manage your infrastructure. You would have needed some people doing it with your cloud hosting anyway. Being able to throw stuff away is nice, but so is having authoritative control over your business operations and in-house accountability.
The actual numbers
Hypothetically, let's look at a scenario where we need 4 web servers, 50 app servers, and 2 reasonably high powered databases. For all intents and purposes, this is a small data center and the kind of configuration that cloud companies like Amazon claim they are designed to handle. It also mimics several of my clients' infrastructures, which are built on Rails, so it feels particularly apropos.
Next, let's suppose that each of these servers requires 8 cores to keep up with the workload. So all together we have 56 servers at 8 cores apiece. For the databases, let's assume those are also 8-core machines but will require 68GB of RAM apiece plus about 10TB of storage. I think it would make more sense to actually add more RAM, but 68GB is the largest amount you can get from Amazon right now on their Instance Types list.
We're going to talk about the total cost of ownership across a year for both options. I am also going to make the assumption that the duty cycle of activity is fairly uniform.
Note: If you're going to disagree with one of my assumptions, it's this last one. I am perfectly aware that a uniform duty cycle is unheard of for web applications, because it says our traffic load is uniform across the day. I am, however, making this assumption because of all the companies I talk to that cite the ability to dynamically decommission hardware, none of them actually do it. There seems to be a massive perceived risk in starting up and shutting down instances on the fly, and in whether or not the customer experience will suffer. If you are currently cloud hosted and you decommission hardware daily, then congratulations: anecdotally, you are extremely rare.
How many cores do we need?
54 * 8 = 432 cores
While it's totally overkill (and awesome), I put together a linear optimization model using glpk. If we optimize on server count, the fewest instances from Amazon that will meet this requirement is 54 High-CPU Extra Large instances at about $0.68 per hour. Further, let's assume 1 month is 720 hours. So for one year:
54 HCPUEL * 720hr * 12m * $0.68 = $317,260.80
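For the curious, here's a rough sketch of that model in Python using PuLP as a stand-in for glpk. The instance types, core counts, and prices are my assumptions pulled from Amazon's published list, so adjust them to taste:

```python
# Minimal instance-selection sketch: pick the fewest instances that
# cover 432 cores. PuLP stands in for glpk; specs are assumptions.
from pulp import LpProblem, LpMinimize, LpVariable, LpInteger, lpSum, value

instance_types = {
    # name: (cores, dollars_per_hour) -- a small assumed subset
    "m1_large": (2, 0.34),
    "m1_xlarge": (4, 0.68),
    "c1_xlarge": (8, 0.68),  # High-CPU Extra Large
}

prob = LpProblem("fewest_instances", LpMinimize)
count = {name: LpVariable(name, lowBound=0, cat=LpInteger)
         for name in instance_types}

prob += lpSum(count.values())  # objective: minimize total instance count
prob += lpSum(count[n] * instance_types[n][0] for n in instance_types) >= 432

prob.solve()
for name in instance_types:
    print(name, int(value(count[name])))
# The solver lands on 54 c1_xlarge instances: 54 * 8 = 432 cores.
```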
That covers our application servers. For our databases we're going to need high-memory instances. The only one that offers 68GB of RAM is the "High-Memory Quadruple Extra Large Instance," which runs $2.00/hour.
2 HMQEL * 720hr * 12m * $2.00 = $34,560
Now, these 2 databases are going to need 10TB of storage apiece, so let's add that in. Checking the price sheet again: $0.10 per GB-month of provisioned storage.
$0.10/GB-month * (10TB * 1024GB/TB) * 2 servers * 12 months = $24,576
We are currently up to $376,396.80 and we haven't talked about bandwidth or management. Since we have a uniform duty cycle (I can hear you screaming), let's suppose that each of our 54 servers pushes 1Mbps worth of traffic. Not a ton, but certainly not trivial either. At 1Mbps, each machine transfers a megabyte every 8 seconds. How much is that over a year?
(1Mbps * 54 servers * (60sec * 60min * 24hr * 30days)) / (8 * 1024) = 17,085.9375 GB/month
or
17,085.9375GB * $0.15/GB * 12m = $30,754.69
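If you want to sanity-check that unit conversion (sustained megabits per second into gigabytes per month), here it is as a small Python function; the constants mirror the formula above:

```python
def monthly_transfer_gb(mbps_per_server, servers, days=30):
    """Convert a sustained rate in Mbps into GB transferred per month."""
    seconds = 60 * 60 * 24 * days            # seconds in a 30-day month
    megabits = mbps_per_server * servers * seconds
    return megabits / 8 / 1024               # megabits -> MB -> GB

gb_per_month = monthly_transfer_gb(1, 54)    # 17085.9375
annual_cost = gb_per_month * 0.15 * 12       # at $0.15/GB
print(gb_per_month, round(annual_cost, 2))   # ~$30,754.69
```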
There's one last variable to account for, and that's EBS I/O activity. You're billed $0.10 for every million I/O requests you make to an EBS volume. With our uniform duty cycle, let's say we're looking at 300 reads and 150 writes per second, so 450 IOPS (once again, quite low).
(450IOPS * (60sec*60min*24hr*30days*12months) * 2 DB servers) * $0.10 / 1,000,000 = $2,799.36
Thus, our total cost for the year:
$409,950.85
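For anyone who wants to check my arithmetic, here's the whole Amazon-side bill in a few lines of Python, with every price exactly as quoted above:

```python
HOURS_PER_MONTH = 720
MONTHS = 12

app_servers = 54 * HOURS_PER_MONTH * MONTHS * 0.68       # High-CPU XL compute
db_servers  = 2 * HOURS_PER_MONTH * MONTHS * 2.00        # high-memory instances
ebs_storage = 0.10 * (10 * 1024) * 2 * MONTHS            # $0.10/GB-month, 10TB x 2
bandwidth   = 17085.9375 * 0.15 * MONTHS                 # GB/month at $0.15/GB
ebs_io      = 450 * (60*60*24*30*MONTHS) * 2 * 0.10 / 1_000_000

total = app_servers + db_servers + ebs_storage + bandwidth + ebs_io
print(round(total, 2))                                   # 409950.85
```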
What would it cost to colocate these servers ourselves? Let's look at some similar numbers. The boys (and girls) over at Dell offer a reasonably priced 8-core server for about $1901. Take a look for yourself over at Dell. I'm looking at the Dell R515 with 8 cores, 16GB of RAM, dual gigabit NICs, and redundant power.
Initial cost of servers:
54 servers * $1901 = $102,654
Let's add in 10% of the cost of servers in additional networking hardware just in case our colocation option doesn't offer any beyond an uplink port.
$10,265.40
Since the R515 is a 2U server, we're going to need 108U worth of rack space. A typical rack holds 40U, so that's 3 racks at minimum; since we want headroom for power draw and our network hardware, let's say we need 4 racks. Further, let's say we want to host this hardware in a Tier IV data center. That's going to run about $1000/month per rack.
12m * $1000 * 4racks = $48,000
Depending on the data center they may give you a certain number of hours per month of "remote hands" where you can have techs handling your hardware, or you can colocate in your city and pay employees to interact with your hardware. If touching hardware is that abhorrent to you, you probably wouldn't have read this far anyway.
Let's talk bandwidth fees. Data centers don't usually bill like Amazon. Amazon charges per gigabyte transferred, while most DCs charge at what's called 95th-percentile usage, or burstable billing. So if 95% of the time you're only using 54Mbps, then you're billed for a 54Mbps link. Let's choose a high number for cost and get some premium, guaranteed bandwidth at $150 per Mbps (quite high).
54Mbps * $150 * 12m = $97,200
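To make burstable billing concrete, here's a toy sketch of how a provider computes your billable rate; the five-minute sampling interval and the traffic numbers are made up for illustration:

```python
def billable_mbps(samples):
    """95th-percentile billing: sort the month's samples, discard the
    top 5%, and bill for the highest remaining rate."""
    ordered = sorted(samples)
    return ordered[int(len(ordered) * 0.95) - 1]

# A month of five-minute samples: mostly 54Mbps, with a few spikes.
samples = [54] * 8000 + [200] * 300
print(billable_mbps(samples))   # 54 -- spikes in the top 5% are free
print(54 * 150 * 12)            # annual cost at $150/Mbps: 97200
```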
Lastly, let's choose some really sweet databases. We can pump 64GB of RAM into an R515, so let's use that for comparison, bringing the cost to $3,831. We'll also add 10TB of high-performance direct-attached storage per server: I customized a Dell PowerVault MD1220 with 10TB of SAS drives and a hardware-buffered RAID controller for $12,200 apiece.
$3,831 * 2 servers + $12,200 * 2 servers = $32,062
Our servers, bandwidth, databases, and storage come out to a grand total of:
$290,181.40
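And the colocation side of the ledger, again with every price as quoted:

```python
MONTHS = 12

servers    = 54 * 1901                  # Dell R515s for web/app
networking = servers * 0.10             # 10% extra for switches, etc.
racks      = 4 * 1000 * MONTHS          # Tier IV space at $1,000/rack-month
bandwidth  = 54 * 150 * MONTHS          # 54Mbps at $150/Mbps, 95th percentile
databases  = 2 * 3831 + 2 * 12200       # 64GB R515s + PowerVault MD1220s

total = servers + networking + racks + bandwidth + databases
print(round(total, 2))                  # 290181.4
print(round(409950.85 - total, 2))      # 119769.45 less than Amazon
```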
That's $119,769.45 less than the cost of the equivalent Amazon service. So let's close out with a quick commentary on how to compare these numbers.
- Note that these numbers are for the first year of operation. If you don't change anything in year 2, you will pay Amazon another $409,950, because it's a subscription. If you bought the hardware, it's yours; you'll just have the cost of colocation and bandwidth: $145,200. Have your accountant look into depreciating a capital asset, and that number may come down further with tax advantages.
- I tried to achieve a certain level of "spec parity" between the two, but that isn't really fair. The hardware we're getting from Dell won't be virtualized (unless we add it), shared, or completely abstracted away. If you've ever had to deal with EBS volumes then you know the performance is questionable at best and you have zero visibility into it. We have direct-attached storage and no contention with other customers. The same applies to our internal network bandwidth: only our machines are linked on private switches.
- The advantage of cloud computing is that you can just "throw away" instances that are running poorly or misbehaving. You will have to manage and periodically fix broken hardware. Some of the hardware, like the PowerVault storage, includes a 3-year 24x7 support contract with a 4-hour maximum response time. It's opt-in for the general servers but isn't ridiculously expensive. A glance at the AWS support options suggests we'd need Platinum support for something comparable. That runs the greater of $15K a month or 10% of monthly usage; our monthly usage is about $34,163, so 10% is roughly $3,416 and the $15,000-a-month minimum applies.
- Never underestimate the power of taking on responsibility for your entire stack down to the hardware. Yes, you will have more responsibility and you will need to employ people. But you had to employ people with EC2, so you're just having the same people work on your own hardware. Need more bandwidth? Add switches. Need intrusion detection? Add it; you have control of the network. Want to share IPs for high-availability configurations? Add it; the switches are yours. Want to be PCI-compliant? Not a problem: the cabinets are yours and you can prove physical access restriction easily.
- You want to experiment with ideas and write code that can interface with your physical infrastructure? Then look at Amazon Virtual Private Cloud. Add cloud instances and link them in to your network. Throw them away when you're done and move them to dedicated hardware if they're doing something you plan on doing for the long-term.
- 37Signals manages their own hardware. "We were able to cut response times to about 1/3 of their previous levels even when handling over 20% more requests per minute." If it's cool enough for them, then it's cool enough for you: http://37signals.com/svn/posts/1819-basecamp-now-with-more-vroom
- What if we keep only 25 nodes up full-time and run the other 29 half-time, to reflect a more likely duty cycle? (See the sketch after this list.)
25 HCPUEL * 720hr * 12m * $0.68 + 29 HCPUEL * 360hr * 12m * $0.68 = $232,070.40
- Yes, you will have to build out your own data center, and that will take some time; most colocation facilities will be more than happy to help you with it. Yes, it will take longer than clicking a button. The result, though, is most likely worth it. Look at the companies that maintain tyrannical control over their products and integration and tell me they aren't generally perceived as market leaders.
- What about Reserved Instances? The pricing options will help, but you won't come out ahead of colocation. Since the example above used High-CPU Extra Larges, the cost of a one-year reserved instance is $1,820 up front plus a "reduced" hourly rate. That's a mere $81 less than purchasing a comparable server outright, and at the end of the year you'll have to make the investment again.
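Here's the duty-cycle what-if from the list above as a quick calculation (25 full-time and 29 half-time nodes, per the assumption stated there):

```python
RATE = 0.68    # High-CPU Extra Large, $/hour
MONTHS = 12

full_time = 25 * 720 * MONTHS * RATE    # nodes running around the clock
half_time = 29 * 360 * MONTHS * RATE    # nodes running 12 hours a day
print(round(full_time + half_time, 2))  # 232070.4 -- compute alone
```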