Data Storage Exchange
Posted: January 7, 2007
The advent of the internet drove innovation in information services by removing the burden of distribution from content providers and allowing them to focus on content creation. When distribution service (in the form of internet connectivity) became a commodity, internet service providers (ISPs) competing on price were pressured to innovate to reduce the cost of the commodity. At the same time, the lowered cost of distribution caused information providers to face greater competition from each other while having more resources to dedicate directly to improving their products. Such shifts toward standardized products, or "commodification," are a central theme of free markets, where it could be said that "anything that can be treated as a commodity, will be treated as a commodity." One such service that will likely experience a strong move toward commodification in the near future is hosted data storage. This article discusses a way to create an optimal marketplace for commodity data storage service using an exchange for buying and selling standardized service contracts. Such an exchange would create pressure on specialized data centers to provide storage service at a lower cost, reduce the capital expenditures of data-intensive software services companies, and create arbitrage opportunities for those inventive enough to mix and match buyers and sellers in non-obvious ways.
Information technology applications are already being segmented and delivered as component services, bringing increasing benefits of specialization and economies of scale to complete solutions. The current generation of innovation has seen a major shift to "software as a service," where applications are hosted and run by the software provider and accessed by a thin client over a network (typically a web browser). This model is displacing the traditional software sale of thick client applications because the burden of installing and running the necessary hardware and software stack is transferred from the customer to the service provider. Lower per-user cost is achieved because the service provider can serve all users more efficiently than each customer could by building a hardware/software stack from scratch.
A standardized storage service market would therefore be a logical next step in this commodification: rather than supplying specific information in a standard form (HTML) for a monthly fee, the storage of raw data would be broken out as its own service accessed in its standard form (ones and zeros). In this situation, the procurement and administration of data storage hardware would no longer be the burden of the software service provider, but would rather be handled by dedicated data centers that can serve up files with adequate speed and reliability. Much like the move to web services for applications simplifies the IT requirements of end users down to administration of a simple web browser, the adoption of storage as a service would allow software service providers to specialize in only the development and running of their software while outsourcing the task of building and maintaining a large storage grid.
Consider a hypothetical startup company that has developed a new business model for delivering High-Definition video content over the web. The primary costs for the business are the development of the business model, the software development, and a large data center to serve the files to customers. The business model and software make up the company's competitive advantage and are desirable investments. The data center, however, is a large capital expenditure that provides no real competitive advantage (although poor execution of the data center would swiftly kill the business). If the HD-video company instead purchases storage contracts from companies specializing in storage services, the storage provider can deliver a lower per-unit cost for storage, while the HD-video company can focus on supplying a user interface and the legal contracts governing consumers' access to the video files.
Another natural application for such a market would be a service that provides off-site backups. For small companies and individuals, this is often both a must-have service and painfully expensive. Currently, third-party providers of such a service must manage both the software that synchronizes a user's files with an off-site data store and the data store itself. If a market existed for instantly purchasing additional data storage per unit time, then the service provider could pay for the exact amount of storage its customers use without any additional capital expenditure. This would reduce the cost of entry into the off-site backup market considerably, which would in turn make greater innovation in the available products and lower prices more likely.
Exchange Mechanics
The fundamental use of the storage service exchange would be to buy and sell a standard unit of storage capacity with a variable level of performance for a variable length of time. An openly published protocol would be used as the mechanism to handle access to the storage service being traded and to report performance metrics back to the exchange for future listing.
Service Delivery
The delivery of the service from seller to buyer would require a standard protocol in order to ensure that buyers are able to swap out the service of different providers seamlessly (with little cost as compared to extending their service contract with their current provider). The protocol would also require the buyer's software client to collect performance metrics of the service so that the seller would be listed appropriately on the exchange in the future. The last key feature of the software would be the ability to redirect the service requests to one or more other clients. This would allow reselling of service contracts and enable arbitrage and futures opportunities as described below.
A storage service's performance is a function of reliability (uptime), latency (response time of requests to the storage over the network), and throughput (the data transfer rate once the initial latency time has passed). Performance could be represented as a multidimensional value, but for the purposes here can be treated as a single value. The commodity being traded is then a credential to use a network API with a particular performance rating for a specified length of time.
Pricing
Providers would post the number of units of storage they have available, the start and end times (dates) that the storage is available, and their asking price. The exchange would list these values along with a performance rating based on the statistics collected about that provider's service. Likewise, consumers would post the amount of storage, time window, performance requirements, and price they're willing to pay. When a match is made, the exchange would send each party the credentials needed for the buyer to use the provider's service for the paid-for period of time.
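The matching step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `Ask`, `Bid`, `satisfies`, and `match` are hypothetical, dates are reduced to integer day numbers, performance is a single number, and a bid is filled by exactly one provider at the lowest asking price.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ask:
    """A provider's posting (hypothetical structure)."""
    provider: str
    units: int                  # storage units offered
    start_day: int              # first day the storage is available
    end_day: int                # last day the storage is available
    price_per_unit_day: float   # asking price
    rating: float               # performance rating assigned by the exchange


@dataclass
class Bid:
    """A consumer's posting (hypothetical structure)."""
    consumer: str
    units: int
    start_day: int
    end_day: int
    min_rating: float
    max_price_per_unit_day: float


def satisfies(ask: Ask, bid: Bid) -> bool:
    """True if a single ask covers the bid's size, window, rating, and price."""
    return (ask.units >= bid.units
            and ask.start_day <= bid.start_day
            and ask.end_day >= bid.end_day
            and ask.rating >= bid.min_rating
            and ask.price_per_unit_day <= bid.max_price_per_unit_day)


def match(bid: Bid, asks: List[Ask]) -> Optional[Ask]:
    """Return the cheapest ask that satisfies the bid, or None if there is none."""
    candidates = [a for a in asks if satisfies(a, bid)]
    return min(candidates, key=lambda a: a.price_per_unit_day, default=None)


asks = [
    Ask("A", 100, 0, 365, 0.020, 0.90),
    Ask("B", 100, 0, 365, 0.010, 0.50),   # cheapest, but rated too low below
    Ask("C", 100, 0, 365, 0.015, 0.95),
]
bid = Bid("X", 50, 10, 40, 0.80, 0.030)
winner = match(bid, asks)
```

Here provider B is the cheapest but falls below the buyer's required rating, so the bid goes to provider C. A real exchange would also need to handle partial fills, ties, and the issuing of credentials, none of which is modeled here.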
In this scenario, either the exchange or a third party service could match buyers with those sellers that together could provide the requested level of service for the least cost. This would satisfy situations where no seller was offering the exact terms the buyer was requesting and the lowest priced option was to buy "too much" service.
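To make the "too much" case concrete, here is a minimal Python sketch of one way a matcher could combine sellers. It assumes, purely for illustration, that each offer is an indivisible lot with a fixed total price and that only capacity matters; `Lot` and `cheapest_cover` are hypothetical names, and the brute-force search over all subsets is only practical for a handful of offers.

```python
from itertools import combinations
from typing import List, Optional, Tuple

# (provider, units offered, total price for the whole lot): an assumed layout
Lot = Tuple[str, int, float]


def cheapest_cover(lots: List[Lot],
                   units_needed: int) -> Optional[Tuple[float, List[str]]]:
    """Find the cheapest set of lots whose combined units meet the need.

    Tries every subset, so the running time grows exponentially with
    the number of lots. Returns (total cost, providers) or None.
    """
    best = None
    for size in range(1, len(lots) + 1):
        for combo in combinations(lots, size):
            if sum(lot[1] for lot in combo) < units_needed:
                continue  # not enough capacity in this combination
            cost = sum(lot[2] for lot in combo)
            if best is None or cost < best[0]:
                best = (cost, [lot[0] for lot in combo])
    return best


lots = [("A", 60, 70.0), ("B", 50, 45.0), ("C", 200, 90.0), ("D", 40, 50.0)]
result = cheapest_cover(lots, 100)
```

With these made-up numbers, a buyer who needs 100 units is best served by provider C's 200-unit lot at 90.0, even though it is twice the requested capacity, because every combination of smaller lots costs more.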
Arbitrage
Arbitrage of service contracts of different lengths and performance levels will create some of the most significant opportunities in the market. In an ideal market, the ratio of price/time of a service would be constant, so that doubling the length of time of a service contract would double its cost. If the ratio were ever to become variable so that the price/time ratio of long term contracts was less than that of short term contracts, arbitragers could capitalize on the depressed prices of long term contracts by buying them and reselling them as a series of short term contracts. Such techniques are possible because of the distributed nature of the service delivery protocol that lets an arbitrager redirect a request for service that they have sold to another provider whose contracts they have bought.
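As a worked example of the term arbitrage described above, the following Python sketch compares price/time ratios and computes the gross profit from splitting one long contract into short ones. The function names and all prices are invented for illustration, and the calculation ignores exchange fees, the risk that some short periods go unsold, and the time value of money.

```python
def price_per_day(price: float, days: int) -> float:
    """The price/time ratio of a contract."""
    return price / days


def term_arbitrage_profit(long_price: float, long_days: int,
                          short_price: float, short_days: int) -> float:
    """Gross profit from buying one long contract and reselling it as
    back-to-back short contracts at the going short-term price."""
    n_short = long_days // short_days   # whole short contracts that fit
    return n_short * short_price - long_price


# Hypothetical market: a 360-day contract costs 300, a 30-day contract costs 30.
long_rate = price_per_day(300.0, 360)    # lower than the short-term rate
short_rate = price_per_day(30.0, 30)     # 1.0 per day
profit = term_arbitrage_profit(300.0, 360, 30.0, 30)   # 12 * 30 - 300 = 60
```

Because the long contract is cheaper per day than the short one, reselling it as twelve 30-day contracts yields a gross profit of 60 in this example. In the ideal market described above the two ratios would be equal and the profit would be zero.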
Arbitrage based on performance provides a more involved example. Consider that a reliability rating of one expected failure every 1065 years can be achieved with just 3 redundant systems mirroring the same data when those systems are expected to fail 5 times each per year (see footnote 1). Furthermore, by mirroring data and using a BitTorrent-style downloading scheme, the throughput experienced by the user could be up to 3x that of having a single service provider. Resellers and arbitragers could monitor the market for situations where the premium for high-performance service was greater than the sum of the cost for multiple low-performance contracts and act accordingly.
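The reliability figure can be sanity-checked with a short calculation. The Python sketch below is not the author's program. It is a simpler approximation that assumes each failure takes a system down for exactly one day, that systems fail independently, and that failure days are spread evenly over a 365-day year. The function name `all_down_days_per_year` is hypothetical.

```python
def all_down_days_per_year(failures_per_year: float,
                           repair_days: float,
                           replicas: int,
                           days_per_year: int = 365) -> float:
    """Expected number of days per year on which every replica is down,
    assuming independent failures spread evenly over the year."""
    # Chance that one system is down on any given day.
    p_down = failures_per_year * repair_days / days_per_year
    # Chance that all replicas are down on that day, summed over the year.
    return days_per_year * p_down ** replicas


rate = all_down_days_per_year(5, 1, 3)   # about 0.00094 per year
years_between_failures = 1 / rate        # about 1065.8 years
```

Under these assumptions the expected rate is 125 / 365^2, roughly 0.00094 total outages per year, or one every 1065.8 years or so, which agrees with the figure quoted above. Correlated failures, such as several providers sharing a network path, would make the real number worse.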
The potential to create higher level service offerings by combining service contracts is perhaps the single greatest advantage of such a market. It gives small providers that have invested in storage infrastructure, but don't have the resources to achieve the coveted "five nines of reliability," the ability to participate by being sold as part of compound solutions. This also means that only a modest number of buyers and sellers is necessary for the market to be virtually liquid. That is, it won't take many participants before any buyer or seller can have their needs met by grouping or segmenting participants on the other side of the transaction. For instance, when a buyer shows up, even if nothing similar to what they need is posted for sale, it will simply be created for them out of pieces of what is.
Closing
The increasing commodification of computing services is inevitable; it is simply a question of what mechanisms will win out. I believe simpler, lighter weight methods will prevail. Note the popularity of BitTorrent as a massively networked data retrieval mechanism for home users while on-demand computing initiatives by Sun and IBM have floundered with their for-big-business-by-big-business mentality. The market described above is the type of solution that can both support realistic business models and allow small players to participate. Not surprisingly, an open marketplace with electronic transactions and delivery appears to be a strong contender to best lubricate this market and spur innovation in the production of storage services and the web services that consume them.
Footnotes
1) Without belaboring the math, 3 disks mirroring the same data, each with a failure rate of 5 per year and a mean-time-to-repair of 1 day, will typically each be unavailable for approximately 5 days per year. A system failure (where all 3 disks are down on the same day) would be expected to occur just 0.00094 times a year, or once every 1065.8 years. This is a somewhat complicated extension of the "birthday problem," which calculates the likelihood that two people in a group share the same birthday. The combinatorics are complicated enough that I had to write a short program to get an answer. The OCaml code is available here.
Copyright 2007 Peter Groves. This text may be reproduced only in its entirety in any medium without royalty provided this copyright notice is included.