… and lazy. Jeff Atwood suggests that in this age of expensive programmers and cheap hardware it is almost always appropriate to throw hardware at the problem. A programmer costs roughly $90k per year, and actually a lot more once you figure in the fully weighted cost, which tacks on an additional 30-50% depending on the kind of organization you work in. The cost of a modest five person programming team buys a massive pile of hardware. Unfortunately his conclusion is way off the mark and I will explain why.
Programming is exponential. Hardware is Linear.
Adding hardware is a linear function at best. Go from three to six servers and your capacity roughly doubles. In practice it does not quite double, since some performance is lost to the increased overhead. Performance issues, on the other hand, are often inefficiencies that multiply with large numbers, be it users, records or what have you. When n is small everything is fast. When n is large … well … in the words of Atwood, “things start to go sideways”.

Above is a simple graph representing the number of servers required for a given n, say millions of sessions, with two dominant algorithms, n^2 and n log n. Obviously this is a pretty extreme example. The n^2 algorithm quickly requires hundreds of servers more than the more efficient n log n algorithm. Programming optimizations are almost always exponential in nature and are rarely linear. The old optimizing paradigm of reducing resources through profiling or “algorithmic optimization” does not apply, because it assumes the computer is the CPU. The computer is a stack of systems, and rarely is the CPU the one that is bogged down. Even the old trade-off of memory vs. disk vs. CPU is irrelevant when you are discussing hundreds of servers. Being able to throw hardware at the problem often means many interconnected stacks of systems. Performant code has a cumulative effect that is exponential in nature, since inefficiencies cascade and grow through the system. The problem is that we often benchmark performance at a given n and measure the improvement linearly. Thus we think of our improvement in linear terms. Say that for some given user base an optimization yields a 4x improvement. The calculus of doing all this optimization work versus throwing a few more servers at it seems obvious. Should the user base double, however, this relative improvement could easily be 16x or more. If you are in a fast growing business you could easily see usage double in a very short period of time and your server requirements spiral out of control.
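To make the comparison concrete, here is a rough sketch of the server math for the two curves. The per-server capacity of 100 work units and the sample values of n are made up for illustration, and the function names are mine; only the n^2 and n log n shapes come from the discussion above.

```python
import math

# Hypothetical figure: units of work a single server can absorb.
CAPACITY_PER_SERVER = 100.0


def quadratic_work(n):
    """Work performed by an n^2 algorithm at load n."""
    return n * n


def linearithmic_work(n):
    """Work performed by an n log n algorithm at load n (log base 2)."""
    return n * math.log2(n)


def servers_needed(work, capacity=CAPACITY_PER_SERVER):
    """Whole servers required to absorb the given amount of work."""
    return math.ceil(work / capacity)


def relative_improvement(n):
    """How many times less work n log n does than n^2 at load n."""
    return quadratic_work(n) / linearithmic_work(n)


# Print the server counts and the improvement factor as the load doubles.
for n in (25, 50, 100, 200, 400):
    print(
        n,
        servers_needed(quadratic_work(n)),
        servers_needed(linearithmic_work(n)),
        round(relative_improvement(n), 1),
    )
```

The number to watch is the last column. The improvement measured at one n understates the improvement at a larger n, which is why a benchmark taken at a single load level can make the optimization look less valuable than it is.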
Capacity is more than CPU Cycles.
Yes. Every year and a half Intel or AMD manages to jam twice as many transistors onto a die. However, applications often bog down at disk IO, memory utilization, network throughput or latency while the CPU is relatively idle. Idle, that is, except for managing its buffer right before crashing. Disk and memory do not benefit from the same law of performance economies as processors. Compare the cost of a 16GB server at roughly $85 per GB with a 48GB server at $165 per GB ( both Dec 2008 Newegg prices with a Supermicro barebones 1U server ). So to 3x the capacity you need roughly 6x the cost. The same is true with online transaction processing. As you move up the performance spectrum the cost per unit increases. To add linear capacity improvements the organization starts to pay exponential costs. Compare this with software optimization, where linear programming costs buy exponential capacity improvements. This becomes especially insidious if your application does not lend itself to parallelism. Writing distributed code does not come without a cost in both programming hours ( expensive, remember ) and performance. Thankfully there has been a lot of work on things like memcached which makes the costs much less significant. Of course if your application is highly specialized and more than just a website … good luck.
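The memory pricing comparison is simple enough to check by hand, but here it is spelled out. The prices are the Dec 2008 figures quoted above; the function name is just for illustration.

```python
def server_cost(memory_gb, dollars_per_gb):
    """Server cost implied by a quoted per-GB price."""
    return memory_gb * dollars_per_gb


small = server_cost(16, 85)    # 16GB server at $85 per GB
large = server_cost(48, 165)   # 48GB server at $165 per GB

capacity_ratio = 48 / 16       # how much more memory you get
cost_ratio = large / small     # how much more you pay for it

print(f"small server: ${small}, large server: ${large}")
print(f"capacity x{capacity_ratio:.1f}, cost x{cost_ratio:.2f}")
```

The cost ratio comes out a little under six, which is where the "3x the capacity for 6x the cost" figure comes from.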
Hardware is more than the Purchase Price.
Often programmers look at a hardware retailer’s website and wrongly assume this is the fully weighted cost of a server. This is definitely the fixed capitalized cost of the server, yet a much more significant cost is the ongoing recurring cost of keeping the server operational. A good rule of thumb is to start with 20% of the purchase price per year for vendor support. Then add rack space at $25 per month per U. That is an additional $300 a year per 1U server. Power and cooling can be significant as well. There was a movement toward denser systems at the expense of power consumption and cooling requirements, however most people quickly realized that power and cooling costs are much more dominant than space. Network costs are highly variable ( both bandwidth and port consumption ). Lastly, there is the technical operations staff, the system and network administrators. So what does a server cost? Well, luckily cloud computing is increasingly popular and it makes it very easy to estimate the at scale, recurring annual cost of a server. Your mileage may vary depending on your company. A single Amazon EC2 instance is roughly $875 per year. The performance of a single EC2 instance is basically what you would expect from a $800-900 server. One could then extrapolate that the fully weighted, at scale, annual recurring cost of a server is … well … the server’s purchase price. So when you say, “Hey, it’s only $3000 to add another server” it should really be, “Hey, it’s only $3000 per year to add another server.” Assuming a five year life, the annual cost equals the purchase price, with 20% of it capitalized cost and 80% recurring cost.
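Here is a rough sketch of that rule of thumb for the $3000 server. It assumes straight-line depreciation over the five year life, which is my reading of the 20% / 80% split, and the function names are invented for the example. Only vendor support and rack space have figures in the text, so the remainder ( power, cooling, network, staff ) is left as a lump.

```python
def annual_server_cost(purchase_price, service_life_years=5):
    """Split the yearly cost of a server using the rule of thumb that the
    fully weighted annual cost roughly equals the purchase price."""
    capitalized = purchase_price / service_life_years  # straight-line assumption
    total = purchase_price                             # rule of thumb from the text
    recurring = total - capitalized
    return capitalized, recurring, total


def known_recurring_items(purchase_price, rack_units=1):
    """The two recurring items that the text puts numbers on."""
    vendor_support = purchase_price * 20 / 100  # 20% of purchase price per year
    rack_space = 25 * 12 * rack_units           # $25 per month per U
    return vendor_support, rack_space


capitalized, recurring, total = annual_server_cost(3000)
support, rack = known_recurring_items(3000)
other = recurring - support - rack  # power, cooling, network, operations staff

print(f"capitalized per year: ${capitalized:.0f}")
print(f"recurring per year:   ${recurring:.0f}")
print(f"  vendor support ${support:.0f}, rack space ${rack}, everything else ${other:.0f}")
```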
Large Numbers of Servers Have Their Own Problems.
Instead of scaling vertically you may be able to scale horizontally. Big iron, servers with lots of cores and tons of memory, is really expensive. It is much more economical to scale horizontally. Often lots of cheap servers will give you the same performance at a significantly lower cost than a small number of big servers. However, you cannot scale your environment horizontally forever without encountering new and unforeseen problems. Potential inter-server communication grows with the square of the number of servers. A cluster of five servers has 5*(5-1) or 20 potential interconnects. A cluster of ten servers has 10*(10-1) or 90 potential interconnects. That is a 4.5x increase from just doubling the server count. An excellent write-up of what Facebook did to optimize memcached illustrates this point exactly. The more servers they added, the more resources each server spent managing its connections. Simple tasks become incredibly difficult in large environments. Your organization’s management tools and software distribution become unwieldy at large numbers of servers and require more and more engineering and operational effort.
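The interconnect arithmetic above is easy to play with. This small sketch uses the same n*(n-1) count, which treats each direction of a connection separately; the cluster sizes beyond five and ten are just examples.

```python
def potential_interconnects(servers):
    """Directed server-to-server connections in a fully connected cluster."""
    return servers * (servers - 1)


five = potential_interconnects(5)
ten = potential_interconnects(10)
growth = ten / five  # effect of doubling the server count

# A few larger clusters to show how quickly the count climbs.
for n in (5, 10, 20, 50, 100):
    print(n, potential_interconnects(n))
```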
Optimization has Tangible, Very Real Benefits.
I used to work at a large domain name registrar. As a free service to our customers we would host their domain name service. We provided a web interface that allowed each customer to manage not only the registration records of the domain, but also individual records in each domain. This is pretty much industry standard now, but when we were doing it, it wasn’t very common. For a period of time we were most likely the largest DNS service provider in the world. Since this was a free service it was considered a cost center and had relatively low expectations and requirements. We had millions of domains under management and tens of millions of domain records. The DNS software we ran kept the entire record database in memory. Since the number of records we managed exceeded a hard limit in the software, we ran multiple servers in pairs for redundancy, each pair holding a segment of the total database in memory. There were fifty something servers in total. The DNS software maintained a text file for each domain. There was a master file that told each server which domains it was authoritative for. When a customer updated their DNS records the software would write a new file, and hourly the DNS server would do a soft reload and pick up any changes or additions.
We eventually moved away from flat files and stored all the records in a relational database. The DNS software was modified to look up DNS records in the database. This had a number of benefits. First, we were able to reduce our server footprint from 50+ servers to four. Assuming $2000 a year per server, that is almost $100k recurring. Even if it took a solid man year to complete ( which it didn’t ) the annuity nature of software optimization makes it totally economically advantageous. Second, DNS entries would be updated in near real time. This meant customers did not have to wait up to an hour for their changes to go into effect. Lastly, the previous system was fragile. It would take 15-20 minutes for the server to stat hundreds of thousands of flat files to determine if records had changed. In that time it would stop processing requests. If the server crashed and had to do a hard start and read each of those files, it could take almost an hour to start up. That is an hour of not processing any requests. Of course there was the second half of the pair. Still, if you know anything about DNS, having an unresponsive nameserver isn’t an ideal situation.
What I am describing sounds like a really badly designed system … and … it was. If we had had back then the technology and techniques that are commonly available now, it would have been done differently. We didn’t, and the system was created because, “What is fifty servers to a $100mil business line?”. While the primary driver was to reduce cost, a very tangible benefit was a much better user experience and a more robust service. Eliminating artificial performance bottlenecks doesn’t just provide additional capacity, it will often improve the experience and stability of the application as a whole and thus your competitiveness improves. Software starts faster. Downtime is shorter. In our case, when the product manager was no longer shackled to this hourly restart, new revenue generating products were created.
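For the skeptics, a back of the envelope version of that trade. It takes 50 servers as the starting point ( the real number was a bit higher ), the $2000 per server per year from above, and prices the worst case man year using the $90k salary plus 30-50% overhead from the start of this post. The function names are made up for the sketch.

```python
def annual_savings(servers_before, servers_after, cost_per_server_year):
    """Recurring yearly savings from retiring servers."""
    return (servers_before - servers_after) * cost_per_server_year


def loaded_programmer_cost(salary, overhead_pct):
    """Fully weighted yearly cost of a programmer."""
    return salary * (100 + overhead_pct) / 100


def payback_years(one_time_cost, savings_per_year):
    """Years of savings needed to recover a one-time cost."""
    return one_time_cost / savings_per_year


savings = annual_savings(50, 4, 2000)
low_cost = loaded_programmer_cost(90_000, 30)
high_cost = loaded_programmer_cost(90_000, 50)

print(f"savings per year: ${savings}")
print(f"one man year: ${low_cost:.0f} to ${high_cost:.0f}")
print(f"payback: {payback_years(low_cost, savings):.2f} to "
      f"{payback_years(high_cost, savings):.2f} years")
```

Even in the worst case the work pays for itself in about a year and a half, and the savings keep coming every year after that.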
Research and Development vs. Cost of Goods Sold.
This may be a bit inside baseball, but even if there were dollar for dollar parity between optimization and hardware ( and we now know there is not ) you and your company would benefit from spending that dollar on optimization and not hardware. It all has to do with how companies are valued. Take two companies, Eastern Widgets and Western Sprockets. Western Sprockets spends a lot on research and development and has a high gross margin. Eastern Widgets spends less on R&D and has lower margins. They each bring in the same revenue and are equally profitable. Which one has the higher stock price? Western Sprockets of course. Why? Well, Western Sprockets spends more of its resources on making and improving things to sell. Additionally, Western Sprockets makes more gross profit per unit than Eastern Widgets because of the higher margin. Assuming both companies are in a growth industry ( in this economy that may be a HUGE assumption but it’ll get better someday ) Western Sprockets has greater potential to make more profit in the FUTURE because of its margins. The higher R&D budget is also beneficial. The assumption is that if both companies were to double their sales tomorrow, Western Sprockets would have more profit. Since the market values what it thinks each company’s performance will be in the FUTURE, the stock price for Western Sprockets will be higher.
What does this have to do with servers vs. coding? The cost of operating servers falls under cost of goods sold and negatively impacts the calculation of your company’s gross profit margin. The cost of purchasing servers will be capitalized, but remember that roughly 80% of the total cost, and thus the dominant cost driver, is operations. Programmers are almost always budgeted under product development or R&D. These are usually excluded from the gross margin calculation. Higher gross margin, higher stock price. Who here has options?
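If the accounting is fuzzy, a toy example may help. Every figure below is invented purely for illustration ( think millions of dollars ), and it assumes R&D stays fixed while cost of goods sold scales with sales, which is the usual simplification and not something either fictional company has told me.

```python
def gross_margin(revenue, cogs):
    """Gross margin: the share of revenue left after cost of goods sold."""
    return (revenue - cogs) / revenue


def operating_profit(revenue, cogs, r_and_d):
    """Profit after both cost of goods sold and R&D."""
    return revenue - cogs - r_and_d


# Hypothetical figures: same revenue, same profit, different spending mix.
western = {"revenue": 100, "cogs": 30, "r_and_d": 50}  # Western Sprockets
eastern = {"revenue": 100, "cogs": 60, "r_and_d": 20}  # Eastern Widgets


def doubled_sales_profit(company):
    """Profit if sales double: COGS doubles with them, R&D stays put."""
    return operating_profit(
        2 * company["revenue"], 2 * company["cogs"], company["r_and_d"]
    )


for name, c in (("Western", western), ("Eastern", eastern)):
    print(
        name,
        gross_margin(c["revenue"], c["cogs"]),
        operating_profit(c["revenue"], c["cogs"], c["r_and_d"]),
        doubled_sales_profit(c),
    )
```

Both companies show the same profit today, but the one that spent on programmers instead of servers keeps far more of every additional dollar of sales.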
Culture of Waste.
Throwing hardware at problems promotes a culture of waste. There is a Broken Window Theory of software development. The Broken Window Theory posits, “consider a building with a few broken windows. If the windows are not repaired, the tendency is for vandals to break a few more windows.” In software development it would imply that buggy or sloppy code promotes more bugs. Thus, this inattention to detail and software inefficiency creates an environment that fosters bugs. I have never seen anyone wake up in the morning and get excited about how big and bloated the stuff they make is. Programmers’ morale and dedication are as important to software quality as ability or experience.
Worse still is an application’s nasty habit of consuming all the resources available to it. A generalization of Parkinson’s law is, “The demand upon a resource tends to expand to match the supply of the resource.” As long as there is available capacity, be it server, memory, disk or what have you, it is often consumed unnecessarily. Adding more supply by throwing servers at the problem will only increase demand irrespective of actual need. This is especially true in an organization that promotes a culture of waste. Without dedicated attention to efficient, optimized code, server environments quickly spiral out of control.
So What Have We Learned?
Software optimization at large numbers has a very dominant curve. Hardware does not. Hardware gets really expensive in large numbers. Most of the cost of hardware is hidden from the developers. It’s better to spend on programmers than have lower margins. Optimization means more than being fast. No one wants to be a wasteful jerk. Jeff Atwood may be a nice guy, but his point is complete douchebaggery. What manager in their right mind would spend five person years optimizing software? The initial comparison is intellectually dishonest.