
Petabyte Files for Cloud Computing

What are the costs of a Petabyte now and in 2020?

In the blog post on the "Semantic Web for Navy Information Dominance Operations" I estimated that, with sensors included, the Navy could be generating over 300 petabytes of information per day.*  Most of this data would come from sources (such as UAVs) whose output is useful for only a few hours. Even so, the accumulation of data for forensic purposes could add initial storage requirements of more than 1,000 petabytes.

The prospect of ultimately requiring thousands of petabytes to support the Navy's Information Dominance objectives has raised questions about affordability.

The current costs of a petabyte range from the stripped-down $117,000/petabyte from Backblaze (http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/) to the $826,000/petabyte price from Dell. Considering the scale and software available within the Navy Private Cloud, we will use Backblaze as the basis for estimating the affordability of gigantic disk files.
The 2010 cost of 300 petabytes is $35 million, or about 12 cents per gigabyte. This is much less than the total costs of the legacy disk files that are in place for thousands of Navy servers. Huge savings are therefore available immediately.

The cost of disks is declining faster, at 25-30% per year, than the cost of semiconductor memories that follow Moore's Law. A conservative estimate would therefore deliver to the Navy a petabyte for about $6,500 in 2020. The daily 300 petabytes could then be supported for a highly cost-effective $2 million.
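For readers who want to check the arithmetic, here is a minimal Python sketch of the projection. The $117,000/petabyte baseline and the 25-30% annual decline come from the figures above; using the conservative 25% rate is an assumption.

```python
# Cost projection sketch based on the figures cited in this post.
BASELINE_2010 = 117_000      # $ per petabyte (Backblaze, 2010)
ANNUAL_DECLINE = 0.25        # conservative end of the 25-30% range
DAILY_PETABYTES = 300        # estimated Navy daily volume

cost_2010 = BASELINE_2010 * DAILY_PETABYTES
cents_per_gb = BASELINE_2010 / 1_000_000 * 100
print(f"2010: 300 PB ~ ${cost_2010 / 1e6:.1f}M ({cents_per_gb:.0f} cents/GB)")

cost_2020_per_pb = BASELINE_2010 * (1 - ANNUAL_DECLINE) ** 10
print(f"2020: ~${cost_2020_per_pb:,.0f}/PB, "
      f"300 PB ~ ${cost_2020_per_pb * DAILY_PETABYTES / 1e6:.1f}M")
# Prints roughly $35.1M and 12 cents/GB for 2010, and about $6,600/PB
# and $2.0M for 300 PB in 2020, matching the estimates above.
```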

Summary

Cloud computing offers enormous cost savings in disk storage immediately and in the future.
None of this includes the costs of the software needed to operate petabyte files in the cloud, nor the costs of redundancy and backup. However, the projected costs are well within the limits of the Navy's IT spending.  Planning for the migration to cloud computing can therefore proceed without concerns about the size of data files. Nor should the size of the Information Dominance files be seen as a deterrent to proceeding with the virtualization of data files without further delay.


* http://pstrassmann.blogspot.com/2010/06/semantic-web-for-navy-information.html

Integration of Legacy Applications on a Cloud

Although the software industry has seen sweeping changes over the last 30 years, there has been no major change in data management since the introduction of the relational database management system (RDBMS) in the 1970s. The world has changed drastically since then. We have orders of magnitude more data, arriving at much faster rates, from more sources. Applications that depend on this data have proliferated, reflecting the needs of the business to have faster and more ready access to information. The relationships among those applications have grown as one business process affects another, requiring the applications to share data in real time.

Modern relational databases have resolved many of the problems that they either introduced or suffered from in the early stages. They now provide mechanisms for dealing with high availability, clustering and fault tolerance. They can replicate data to peer databases around the world. However, a few problems remain. Firstly, relational databases are a good way to achieve data integration but are poor at achieving process integration. Secondly, using features such as ‘triggers’, they may be able to detect ‘events’ (changes in data that some application may be interested in), but they are traditionally poor at distributing those events back out to the client tier. And thirdly, they neither store nor present data to the client in a ‘ready-to-use’ format for most applications. Multiple layers of translation, transformation, memory mapping and allocation, network I/O and disk I/O need to occur in order for the simplest of queries to return the simplest of responses. As our use of RDBMSs has grown over time, we have come to depend on them to share data, but they were really only designed to store data.

In an attempt to break down stovepipe systems, there has been a move to Service Oriented Architectures (SOA). SOA helps organizations achieve reuse of individual components of a business process, and makes it easier to adapt their overall processes to align with changing business needs. SOA enables organizations to quickly build new business workflows. However, SOA still fundamentally leaves business processes as stovepipes and it operates on a basic assumption that the components are completely independent. SOA does not address the issue of the real-time interdependencies on the data that the processes share.

In an attempt to get a comprehensive view of data, large organizations are building data warehouses and online/real-time dashboards, so that senior management can see the big picture and drill into critical details. Most dashboard and/or data warehouse solutions pull a copy of the operational data together (usually into a new dimensional format), leaving the original data in place. The operational applications cannot take advantage of this combined data view. Data warehousing does not do anything to solve the real-time data interdependencies between applications where business processes intersect. The missing link is ‘data awareness’.

Consider as an example the way that mission-planning applications (such as JTT or JMPS – the Joint Mission Planning System) depend on data from Battle Damage Assessment (BDA) systems, Enemy Order of Battle (MIDB), situation reports, etc. Consider the process flow from the mission planner’s perspective and how changes to the sources he works with affect the work and the mission:

1. The mission planners start work designing missions to destroy enemy targets (bridges, bunkers, SAM batteries, etc.).
2. They pull in data from other systems, such as BDA and MIDB. Whether they use an SOA-based process or not has no real impact on the result, only on how tightly coupled one system is to another.
3. If one second later there is an update to the BDA system or MIDB, the mission planner is left unaware. He continues to plan to destroy a target that may already be destroyed, or plans a mission with inadequate resources due to a change at the target location (a new SAM battery, additional enemy forces, etc.).
4. The mission planners pull in data from other systems as a final check before releasing the plan. They make adjustments to the plan and release it for execution.
5. If one second later there is an update to the BDA system or MIDB, the mission planner is again unaware. The executor of the mission will have to deal with the change at run-time: rerouting to another target, hitting the wrong target or encountering unexpected enemy resistance.

How could this be different? The next generation in data management combines:

1. Distributed Caching
2. Messaging & Active Event Notification
3. Active/Continuous Querying
4. Traditional Querying
5. Support for users/applications on disadvantaged or periodically disconnected networks.
6. High Availability and some degree of Fault Tolerance

The interdependency between applications on data, and on changes to data, has serious impacts on mission-critical processes. The way data management is currently done in enterprise applications is over 40 years old and simply cannot provide many of the critical features needed to build today’s high-performance, cross-organization applications. It is time to consider enhancing your systems’ data management abilities.

Arguably one of the most significant developments in the history of data management came with the invention of the relational database (circa early 1970’s). With traditional database access, queries are run against the database, result sets are returned, and work then begins from the returned information.
If new data that would have changed the result set arrives in the database a microsecond later, life is tough. You work with the data you have and maybe synchronize with the database before you finish your analysis, planning, or other work. But once again, the data could change right after you finish.

What can you do? If only the database could call you back, based on your query, and show the data changes that would have caused that query to return a different result set. That is exactly what happens with the new database software. It acts like a tap on the shoulder, alerting people when queries they made have changed results.
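A minimal, illustrative Python sketch of this "tap on the shoulder" pattern is shown below. It is not any vendor's actual API; the store, predicate and callback names are invented. An in-memory store re-evaluates registered continuous queries on every update and notifies the subscribers whose result sets would change.

```python
# Minimal illustration of an active/continuous query: subscribers register a
# predicate and a callback; every update re-evaluates the predicate and
# notifies the subscriber when a record newly matches. Names are invented.

class ContinuousQueryStore:
    def __init__(self):
        self.records = {}        # key -> record (a dict of fields)
        self.subscriptions = []  # list of (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        self.subscriptions.append((predicate, callback))

    def put(self, key, record):
        self.records[key] = record
        for predicate, callback in self.subscriptions:
            if predicate(record):
                callback(key, record)   # the "tap on the shoulder"

# Usage: a mission planner watches for Battle Damage Assessment updates
# against targets in his plan.
store = ContinuousQueryStore()
planned_targets = {"bridge-17", "sam-battery-3"}

store.subscribe(
    predicate=lambda r: r.get("type") == "BDA" and r.get("target") in planned_targets,
    callback=lambda key, r: print(f"ALERT: {r['target']} status now {r['status']}"),
)

store.put("bda-001", {"type": "BDA", "target": "bridge-17", "status": "destroyed"})
```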

The new software enables the creation of applications that work both in garrison and in the field. It has built-in support for applications that are not always on the network and/or need to work over distressed networks. During the Trident Warrior military exercise, the Navy experienced a 90% reduction in the bandwidth used by a Command and Control (C2) application built with the new software on a ship. Additionally, that ship experienced a network outage, during which the application continued to function (although it did not receive new data). When the network was functioning again, the ship received new data and the latest update for each piece of stale data.

Performance
a. Customers experience applications that run 4 to 10 times faster on average.
b. Speeds online transaction processing and analysis systems in the financial industry by as much as a factor of 10.
c. Speeds up complex, long-running scientific computing jobs on a data grid.
d. Order of Battle data access for complex unit subordination improved from the previous two to twenty minutes (depending on the size of a nation's forces) to sub-second response.
e. Reduces application footprints – the speed gains can be traded for a smaller footprint instead of a faster application.
f. Instead of running an application faster, that speed can be used to run the application on fewer CPUs and fewer database resources. In turn that often means reduced costs for other software licenses. Overall the result is a significant saving in the cost and power needed to deploy a system. With today’s edge environments stressed for electricity, the new software can help address that issue. It also supports green initiatives in government.

The Global Command and Control System's new Common Operating Picture application, Agile Client, uses the advanced software to provide a high-performance service-oriented architecture in which users can dynamically subscribe to near real-time track management data from multiple sources and view that data live on a 3-D or 2-D surface. Agile Client enables data fusion from multiple sources. It supports DIL (disconnected, intermittent and limited-bandwidth) environments.

In the defense and intelligence sector, the new software provides four fundamental virtues: data awareness; support for disconnected operations and distressed networks; increased performance with a reduced application footprint; and a real-time view of operational data. In essence, it was able to pull in all the data streams produced by the military, manage that data in-memory so applications could provide a window into it, and achieve all of this with guaranteed low latency, fault tolerance and high throughput. Additionally, the software can provide a unified view of data across datacenters with high throughput and low latency.

Summary
Real-time integration of transactions from diverse legacy applications will dictate most of the investments in command and control of military operations.

Because of its critical importance in achieving the sensor-to-shooter integration for Information Dominance, this blog post has extracted most of the above text from relevant material published by GemFire (a division of VMware).

Why DoD Must Reduce Costs

On July 22, 2010 a Task Group of the Defense Business Board, appointed by the Secretary of Defense, delivered initial observations on “Reducing Overhead and Improving Business Operations”. The observations include a number of facts that should be relevant to cyber commanders:

1. Increased spending from 1988 through 2010 supported fewer active duty personnel, fewer reserves, fewer ships, fewer Army divisions and a smaller number of fighters (SOURCE: National Defense Budget Estimate for FY2011 as of April 2010):

 Conclusion: Spending more bought substantially less. 

2. 40% of active duty personnel were never deployed; 11.4% were deployed three or more times. 339,142 active duty personnel, costing on average $160,000/year, were performing commercial (not war fighting-related) activities.

Conclusion: A small number of active duty personnel do most of the fighting. Commercial work was performed by active duty personnel costing $54 billion. 

3. There has been a steady rise in non-payroll costs for DoD personnel. In FY10 these costs, such as TRICARE, family separation allowances and survivor benefits, accounted for $22.5 Billion. Total retiree outlays for military personnel were $46.7 Billion in FY10.

Conclusion: The trend of rising non-payroll costs will reduce the amount of money available for active duty and civilian personnel.

4. The Federal deficit, as a % of GDP, can be expected to increase from 9.9% for FY2009 to 24% in FY2040 if present trends continue without change (SOURCE: Peterson Foundation – A Citizen's Guide, April 2010).

Conclusion: A 24% share of the GDP deficit is unlikely to be sustainable.

5. The FY 2010 $3.5 Trillion Federal Budget, in constant 2009 $s, will allocate 40% to Social Security & Medicare; 6% to net interest; 34% to all other and 20% to Defense. The FY 2040 $12.3 Trillion Federal Budget, in constant 2009 $s, will allocate 52% to Social Security & Medicare; 30% to net interest and 7% to all other. Only 11% of the Federal Budget would be available for Defense (SOURCE: Peterson Foundation – A Citizen's Guide, April 2010).

Conclusion: If the FY 2040 Federal Budget is $12.3 Trillion in constant 2009 $s, it would support $1.4 Trillion for Defense, or double the current FY 2010 budget. However, the rise in non-payroll costs is likely to absorb most of the Defense budget increases over that thirty-year period.

6. DoD overhead costs, which are concentrated in Administration, Logistics, Finance and HR systems, have an estimated total cost of about $212 Billion. This is approximately 40% of the total DoD budget (SOURCE: Slide 16 of Defense Business Board report).

Conclusion: Information technology should be considered as a premier tool for reducing overhead costs by means of systems simplification and through business process improvement.

Summary:


The costs of DoD information technologies for FY10 are $33.7 Billion in O&M costs plus compensation for an estimated more than 200,000 military and civilian personnel at an average cost of $130,000/year (approximately $26 Billion). This represents about 10% of the total DoD FY10 budget.

Since information technology is deflating at a rate greater than 15% per year, it offers attractive opportunities for cost reductions.  If information technology is also applied to business process redesign, the resulting manpower and O&M savings can be even greater than what can be accomplished by cuts in technology spending.
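The arithmetic is simple enough to restate as a short sketch; the figures come from the paragraph above, and applying the 15% deflation rate to the O&M portion alone is an illustrative assumption.

```python
# Rough reconstruction of the FY10 DoD IT cost figures cited above.
OM_COST = 33.7e9              # IT O&M spending, FY10
PERSONNEL = 200_000           # estimated IT military and civilian personnel
AVG_COMPENSATION = 130_000    # average cost per person per year

it_payroll = PERSONNEL * AVG_COMPENSATION        # ~$26B
total_it = OM_COST + it_payroll                  # ~$60B
print(f"Total FY10 IT cost: ~${total_it / 1e9:.0f}B")

# Illustrative assumption: one year of 15% price deflation applied only to
# the technology (O&M) portion indicates the scale of available savings.
print(f"One year of 15% deflation on O&M: ~${OM_COST * 0.15 / 1e9:.1f}B")
```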

Will NGEN Be 99.999% Reliable?

NGEN has been conceived as the backbone network for supporting the Navy’s Information Dominance Corps (IDC). IDC networks will be transmitting sensor-to-shooter communications globally. Therefore NGEN will have to operate with 99.999% reliability.  This means close to zero unavailability (roughly five minutes per year) when connecting sources of information (such as keyboards from personal computers or tracking data) to network-hosted databases.  All of that will have to be accomplished with a latency at least comparable to Google’s end-to-end responses, which are measured in milliseconds.

Will NGEN achieve the IDC's ultimate uptime and latency objectives?  It may not be headed in this direction, judging by the most recent metrics included in an extension of the NMCI contract. The original contract with EDS called for Service Level Agreement (SLA) uptimes such as the following:

Critical services with an average >99.7% service level, which allows 26 hours/year of downtime per client device.
E-mail delivered within 4 minutes, on average, 99% of the time, which leaves up to 88 hours/year per client device during which delays can exceed 4 minutes.

The extent to which NGEN has adopted NMCI SLAs in the new contract extension is not known. The tracking of actual failure statistics was recently abandoned in favor of sampling user satisfaction ratings, which are of questionable utility as a statistically verifiable indicator.

Keeping track of compliance with uptime SLAs can be done automatically as a byproduct of information assurance. A counter can track the availability of each of the 400,000 computers on NMCI; at the critical-services SLA alone, over ten million hours/year of availability can be lost to the Navy and the Marine Corps.
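The downtime arithmetic behind those figures can be verified directly; the sketch below uses the 400,000-seat count and the 99.7% critical-services SLA cited in this post.

```python
# Verifying the SLA downtime arithmetic cited above.
HOURS_PER_YEAR = 24 * 365      # 8,760
CRITICAL_SLA = 0.997           # NMCI critical-services service level
FLEET = 400_000                # NMCI client devices

downtime_per_device = (1 - CRITICAL_SLA) * HOURS_PER_YEAR
fleet_downtime = downtime_per_device * FLEET
print(f"{downtime_per_device:.1f} hours/year of allowed downtime per device")
print(f"{fleet_downtime / 1e6:.1f} million device-hours/year across the fleet")
# Prints ~26.3 hours per device and ~10.5 million hours fleet-wide.
```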

Setting annual averages for IDC service levels is misleading. What matters is the uptime availability of the individual devices that personnel use whenever needed. Averages hide large variability. At any time there will be numerous client devices that are out of commission, regardless of whether that is due to a failure of a desktop, a server or a communication channel. Unless NGEN tracks the availability of each individual client device, in real time, along with situational awareness of local conditions, any contractual compliance report is not only without value but will also give commanders a false sense of trustworthiness. IDC warriors will always have to allow for the possibility that NGEN may not be completely reliable and may have to call for contingency actions.

The only solution that has merit for the IDC is to implement NGEN as a totally dependable network. There is no way of accomplishing this objective except by deploying massive amounts of redundancy everywhere.

Instead of expensive desktops, NGEN should depend on an excess population of thin clients plus replaceable smart phones for all communications. These devices are location-agnostic and can work anywhere.

Instead of fixed connections to dedicated servers NGEN should depend on a pool of virtual devices that can relocate workloads automatically and instantly.
The above arrangement is not a backup, but a fail-over design. The fail-over takes place in seconds and is completely automatic.

Instead of depending on fixed network connections NGEN should depend on virtual networks that set up connections to data centers through multiple paths.

SUMMARY

NGEN should be constructed as an inter-connected and completely redundant network that is for all practical purposes failure free. The economics of proceeding in this manner is attractive. Savings in labor will more than offset the costs of added technologies.

The greatest gain comes from the capacity to support IDC missions, where the reliability of NGEN communications will now be a part of warfare operations.

GAO Report on Social Media

The GAO Report on “Challenges in Use of Web 2.0 Technologies” (GAO-10-872T) of July 22, 2010 defines social media as inclusive of Web logs (known as “blogs”); social-networking sites (such as Facebook and Twitter); video-sharing Web sites (such as YouTube); “wikis,” which allow individual users to directly collaborate; “podcasting,” which allows users to download audio content; and “mashups,” which are Web sites that combine content from multiple sources. GAO based its findings on reviews of the Department of Homeland Security (DHS), General Services Administration (GSA), and National Archives and Records Administration (NARA), not on an examination of DoD requirements.

Security Issues
DoD personnel using social media face persistent threats targeting messages received as well as sent. Under the Federal Information Security Management Act (44 U.S.C. § 3544(a)(1)), DoD is responsible for the security of all information collected or maintained, which includes social media. When a military e-mail address or a message from NIPRNET identifies someone on a social media site as a DoD employee or contractor, they may be providing information that can be exploited in cyber attacks. Therefore, OSD policy must provide guidance on how to safeguard social media communications.

Privacy Act Issues
The Privacy Act of 1974 applies to the control of the collection, use and disclosure of personal information. The GAO makes it clear that the Privacy Act applies to social media use on systems owned and operated by the government. Government personnel have no privacy in such cases, though their information is protected.
If government personnel use a third-party service (such as Facebook or Twitter) over which there is no government control, the Privacy Act does not apply.  However, a government Agency must be able to make a “… determination what information to collect…” about information that is exchanged in this way, and decide what rules apply to the disclosure of personal information.

Records Management Issues
Does the information exchanged through social media technologies constitute federal records pursuant to the Federal Records Act (44 U.S.C. § 3301)?  In the case of content created with interactive software on sites owned by the government, all transactions constitute Agency records and must be managed accordingly. When social media transactions take place through third-party services, the recording and retention of such records is ambiguous and subject to Agency interpretation of whether any information is at risk. In the case of DoD the most likely answer is that all NIPRNET communications through sites not controlled by the government would have to be labeled as DoD records.

Freedom of Information Issues
The GAO has been unable to address the question of whether social media communications are open to FOIA requests. Whether social media transactions qualify as DoD records depends on whether DoD controls these exchanges. This is a matter ultimately determined by the courts, though the key criterion is whether DoD has relinquished control.

SUMMARY
The GAO report has raised four issues that affect the policies for dealing with social media in DoD. Though security, privacy, records and FOIA are addressed in an inconsistent manner, the security issue overrides all others. National security interests must be placed ahead of other considerations.

From the standpoint of DoD the use of social media qualifies as DoD business whenever conveyed over the NIPRNET through the public Internet.

Malware Delivered Through Social Media

Facebook malware ultimately involves interaction of a downloaded Facebook page with applications that can then transform a personal computer into a conduit of malicious code.

Once malicious Facebook content has been endorsed, either by a click or by being passed along automatically, the malware can steal personal information, monitor activity or spread infection.

For example, a fake notification will claim that somebody has “posted something on your pages” or “tagged a private video”.  The icon next to the notification mimics standard notification windows, which prompts instinctive acceptance.

There are several fake messages on Facebook that can be exploited.  Similar tricks will apply to dozens of social messages that are flooding mailboxes. Subversive text is placed directly or inserted as spam. It is always made to look plausible.

The list of malware that can be triggered by social media is growing at a fast rate. According to the April 20, 2010 Symantec report (http://www.symantec.com/about/news/release/article.jsp?prid=20100419_02) there were 240 million new malicious programs against which DoD must protect seven million computers. These programs show a rising sophistication in how malware is hidden.  Cybercrime attack toolkits are now available to speed the introduction of “zero day” attacks that flood the defenses and make most anti-virus countermeasures ineffective (http://www.symantec.com/connect/blogs/zeus-king-underground-crimeware-toolkits). The Zeus malware generation software can be purchased anonymously for $700.

The greatest threat to the Department of Defense from social media originates from various forms of “phishing”. According to a RSA report of January 20, 2010 (http://www.rsa.com/press_release.aspx?id=10671) three in ten participants in social networking are easy prey to such compromise.

SUMMARY
There is not much that DoD can do to prevent “phishing” or social engineering via social media. The only available defense is tracking potentially compromising outgoing responses from DoD personnel.

DoD cannot depend on BlogSpot, Digg, Epernicus, Exploroo, Gossipreport, Facebook, Flickr, Metacafe, Myspace, LinkedIn, Orkut, Technorati, Twitter and YouTube providers to install acceptable safeguards. In view of DoD’s endorsement of social media it will have to trust its people but must also take reasonable measures to verify that a leak of intelligence information is not taking place.

Why Social Media Can Convey Cyber-Attacks

RSA, a premier provider of security solutions for organizations, has just published a blog post (http://rsa.com/blog/blog_entry.aspx?id=1684). An extract from it warrants posting here.

RSA has tracked the operation of a banking Trojan, which is a custom variant of a large malware family. Any website that lets users upload social media content and then publish it can be exploited to store a Trojan’s encrypted configuration. This includes almost any Web 2.0 platform that enables unrestricted posting of comments, creation of public profiles and the setting up of newsgroups. This is how it works:

1. The cybercriminal sets up a bogus profile, such as “Ana Maria”.
2. An encrypted malware string is coded as text and then uploaded into the bogus profile.
3. After the malware enters a customer’s machine, it searches for the string, which signals the beginning of the malware code.
4. The malware is then executed. If it is a Trojan or a bot, it can proceed to attack the customer’s computer or to propagate further.

Using social media as a conveyor of malware has many advantages:

1. Cybercriminals need not buy and maintain a domain name through which they can be traced. Public web sites such as YouTube, Facebook, MySpace and Twitter, from which the attacks are launched, will sign up anybody.
2. Cybercriminals need not pay for or maintain a dedicated server, which could be used to track the origination source. For instance, this makes Russian and Chinese origins of malware untraceable, because all messages will show up as originating in California.
3. As soon as a suspected profile or account is removed, a new profile or account can be set up quickly.

SUMMARY

From the cybercriminal’s point of view, the exploitation of public social media is not difficult. Detecting malware hosted on public websites is not feasible merely by scanning suspicious URLs for viruses. Compromising attacks from public sources will require more sophisticated detection means. Unless a social media site is specifically protected against such incursions, the chances are that the cybercriminals will succeed.

Managing Open Source Software

DoD CIO’s “Guidance Regarding Open Source Software (OSS)” of October 16, 2009 states that OSS meets the definition of commercial computer software and shall be given preference in software acquisitions.

Capers Jones, in “Quality and Productivity Comparison of Selected Software Development Methodologies” (Version 11) of 6/4/2010 shows a range in the costs of software. The development costs plus five years of use for 1,000 Function Points are as follows:
- 85% reuse of certified code = $54,032
- Capability Maturity Model 1 method = $2,804,224

Reusing certified software from open sources is clearly the most advantageous way for DoD to write applications programs. 

What is needed is a programming model for open source code that insulates software components from the complexities of platform services, application management, transaction control, security assurance and data access procedures. In this way components can be configured and then “wired” together. The result is code that is more portable, reusable, testable and maintainable.
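A minimal Python sketch of such configuration and "wiring" is shown below; the component and interface names are invented for illustration (frameworks such as Spring or OSGi provide this kind of wiring declaratively). Application logic depends only on small interfaces, and a single wiring step decides which concrete components (open source or otherwise) are plugged in.

```python
# Sketch of configuring and "wiring" components behind small interfaces, so
# application logic is insulated from platform, security and data access
# details. All names here are invented for illustration.
from typing import Protocol

class DataStore(Protocol):
    def save(self, key: str, value: str) -> None: ...

class Authenticator(Protocol):
    def allowed(self, user: str) -> bool: ...

class InMemoryStore:
    def __init__(self):
        self._data = {}
    def save(self, key, value):
        self._data[key] = value

class AllowListAuth:
    def __init__(self, users):
        self._users = set(users)
    def allowed(self, user):
        return user in self._users

class ReportService:
    """Application logic: knows nothing about which components it is wired to."""
    def __init__(self, store: DataStore, auth: Authenticator):
        self.store, self.auth = store, auth
    def submit(self, user, report):
        if self.auth.allowed(user):
            self.store.save(user, report)

# The "wiring" happens in one place; swapping components requires no code changes.
service = ReportService(store=InMemoryStore(), auth=AllowListAuth(["analyst1"]))
service.submit("analyst1", "daily status report")
```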

The largest open source code library is SourceForge.net, although there are many other similar collections. Library documentation shows how to install downloaded code into applications at no cost.

SourceForge includes: Engineering (19,320 components); Formats and Protocols (5,185); Database (9,175); Security (5,070); Printing (756); Terminals (889); Business (13,200); System (24,737); Desktop Environment (4,918); Software Development (35,265); Communications (19,712); Multimedia (18,221); Text Editors (4,138) and Internet (31,620).

Each component would be tagged with its size (in megabytes), the number of weekly downloads, an index of quality (% of recommendations received) and the number of reviewers. A random sample showed a 32MB component with 759 weekly downloads and a 94% acceptance level by 19 reviewers.

DISA has derived its Forge.mil site from SourceForge. It has been modified to meet DoD security requirements, with smart cards used to provide log-in credentials. There are only a few open-source components hosted at Forge.mil so far. All of the code is open for public view, though only those with Defense Department credentials can edit or contribute.


Meanwhile, the DoD issued a memorandum (10/16/09) mandating the use of Open Source Software (OSS).* The memorandum states that:

1. OSS is defined. The definition is actionable.
2. OSS shall be used in classified and in unclassified environments.
3. Director, Enterprise Services is the lead on promoting OSS.
4. OSS meets the definition of “commercial computer software”.
5. OSS shall be given statutory preference in all acquisitions.
6. PEOs are required to conduct market research on OSS availability.
7. Market research for software must include OSS.

SUMMARY
The Defense Department is committed to using open source software as a customer as well as a developer. Program Executive Officers (PEOs) will have to verify the integrity of open source components to preserve a continuation of peer reviews.  PEOs will also have to see to it that expensive contractor-originated code is kept to a minimum.


* http://cio-nii.defense.gov/sites/oss/index.shtml


Storing Records of Social Media Transactions

The capture and the retrieval of the social media transactions for forensic purposes (see Strassmann blog on “Tracking Anomalies in Social Computing”) cannot use relational databases, such as provided by Oracle.

Database management designs, such as the proprietary “BigTable” used by Google, depart from the typical convention of a fixed number of columns. What is needed is a system that will store sparse, diverse and non-standard records that require hundreds of petabytes of storage (see Strassmann blog on “Should Petabyte Files Inhibit Migration to Cloud Computing?”).

The National Security Agency is taking a cloud computing approach to the development of intelligence gathering that can link disparate intelligence sources (see http://www.darkgovernment.com/news/nsa-embraces-cloud-computing#ixzz0tyFFCgV7). This increases intelligence awareness and safeguards national security.

Such a system can house the streams of outgoing social media communications. Analysts can then add metadata and tags that enable search, discovery, collaboration, correlation, and analysis.

NSA is using the Hadoop file system (http://hadoop.apache.org/), an open source implementation of Google’s Map/Reduce parallel processing system and its distributed file system. This makes it easier to rapidly reconfigure data and to scale up files as the number of recorded messages grows. Such a system runs on cheap commodity hardware and manages data servers as pools of storage resources.
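The Map/Reduce pattern that Hadoop implements can be illustrated with a minimal in-process Python sketch (a real Hadoop job would distribute the map and reduce tasks across a cluster; the message records and the "tags" field here are invented): map emits key/value pairs from each record, the framework groups them by key, and reduce aggregates each group.

```python
# Minimal in-process illustration of the Map/Reduce pattern Hadoop implements.
# The records and the "tags" field are invented for illustration.
from collections import defaultdict

messages = [
    {"channel": "facebook", "tags": ["unit-x", "photo"]},
    {"channel": "twitter",  "tags": ["unit-x"]},
    {"channel": "blog",     "tags": ["photo"]},
]

def map_phase(record):
    # Emit (key, 1) for every tag in an outgoing message.
    for tag in record["tags"]:
        yield tag, 1

def reduce_phase(key, values):
    return key, sum(values)

# Shuffle: group intermediate pairs by key, as the framework would.
grouped = defaultdict(list)
for record in messages:
    for key, value in map_phase(record):
        grouped[key].append(value)

counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)   # {'unit-x': 2, 'photo': 2}
```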

SUMMARY

Map/Reduce databases, rather than relational data software, are the only ones suitable for the retention of petabytes of diverse messages that cannot be categorized in advance. The Google approach, which tracks billions of free-form transactions, cannot be used by DoD because it is proprietary. However, Open Source Software such as Hadoop will fit DoD needs (see the DoD memorandum of October 16, 2009 from the OSD CIO).

Planning for Data Center Consolidation

The Federal Chief Information Officer issued on 2/26/10 a request for the submission of data center consolidation plans. Agencies were asked to deliver these by June 30, 2010. The plans would address the growth in Federal data centers from 432 in 1998 to 1,100 in 2009, but the primary objective was cost reduction.

A review of what the Agencies are proposing has raised a number of issues:

1. Agencies are using server consolidations in lieu of data center reductions. Counting the number of servers is not an appropriate metric. An HP ProLiant DL100 server costs from $699 to $1,529. An HP ProLiant DL700 server costs from $16,999 to $28,999. That is a difference of a factor of 41.  All servers are not equal. The number of servers is not an index of data center size.

2. Server consolidation without virtualization can replace a poorly utilized $699 server with a poorly utilized $28,999 server. There should be distinctions between consolidations through upgrading vs. consolidations through virtualization. There are major operating cost advantages in virtualization.

3. The greatest gains in data center consolidations are not in reducing hardware costs (<10% of operating costs) but in reducing application operations and maintenance expenses (>50% of cost) plus data center overhead (>20%). An excessive concentration on cutting hardware and power costs overlooks the greatest savings opportunities (see the sketch after this list).

4. In DoD there are a large number of locations which are not designated as data centers but are nevertheless running numerous servers. By excluding small locations from consolidation plans, many opportunities for savings are lost.

5. Any discussion of data center consolidation must include backup, fallback and service availability. Stand-alone, hard to defend and underfunded sites, with low capacity utilization, will result in service levels that are below acceptable standards. The primary objectives for data center consolidation should be the improvement in performance and in high levels of information security. Cost reductions are necessary but only secondary.

6. The analysis of data center consolidation savings must include an evaluation of related communications costs. Since transactions propagate over telecommunications lines in milliseconds anywhere in the world, a reduction in the number of data centers is feasible if greater bandwidth is available.
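To make point 3 concrete, the sketch below applies an assumed, uniform 30% savings rate to the cost shares cited above; the savings rates are illustrative assumptions, not data.

```python
# Where consolidation savings come from, using the cost shares cited in point 3.
# The 30% savings rate applied to each addressable category is an assumption.
cost_share = {"hardware": 0.10, "apps_o_and_m": 0.50, "overhead": 0.20, "other": 0.20}
assumed_savings_rate = {"hardware": 0.30, "apps_o_and_m": 0.30, "overhead": 0.30, "other": 0.0}

for category, share in cost_share.items():
    saved = share * assumed_savings_rate[category]
    print(f"{category:13s}: {saved:.1%} of total operating cost saved")
# An equal 30% cut yields 15% of total cost from application O&M and 6% from
# overhead, versus only 3% from hardware.
```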

SUMMARY

Any plans proposing a consolidation of over 1,000 data centers to a much smaller number of secure sites must include the total cost of ownership of Operations and Maintenance of $49 billion/year, according to OMB VUE FY09. (http://www.whitehouse.gov/omb/egov/vue-it).  Plans that focus only on hardware savings are not acceptable.

Defending the Presently Indefensible

DoD’s 7 million machines linked through 15,000 networks are exposed to unauthorized probes more than 6 million times a day according to Gen. Keith Alexander.

It is only a matter of time before another worm similar to Conficker sneaks into DoD. It will be looking for gaps in firewalls and for computers with weak passwords or without the latest security updates. There will always be some computers that are compromised and then commanded remotely to propagate malware. A more virulent form of attack on DoD computers are “bots”, hard-to-detect robotic software that now occupies tens of thousands of computers parked on the Internet. “Bots” can be directed by their controllers to execute subversive missions that could impair DoD war fighting operations. Bots self-propagate once they have gained network access.

Gen. Alexander advocates the establishment of “situational awareness” as the first step in countering threats to the DoD network.  Accordingly, DoD must possess a “common operating picture” of its network.  These are good ideas but leave open the question of how to proceed with implementation. What is the cyber defense program schedule? How can DoD afford creating 99.9999% reliable security perimeters? How will the Cyber Command reallocate security funding that now consumes over 10% of IT spending while not delivering demonstrable security?

Advocating network consolidation appears to be an obvious remedy. However, that is hard to execute in short order, especially if funds are limited and the intensity of assaults is rising faster than the ability to mount defenses. DoD networks are wedged into legacy applications, which are controlled by hundreds of local bureaucracies and by thousands of contractors. Budget-squeezed network operators do not have the funds to invest in defenses against increasingly aggressive attackers across thousands of separate applications.

The ideal solution is to expose DoD to the Internet through only a small number of extremely well-defended perimeters. Generating the cash savings that would fund such an investment would require the virtualization of all computer services. That would require the disentangling of application software from their respective CPUs, data and communications in order to form separate pools of well-protected resources.

It is hard to see how DoD could commit itself to the pursuit of a centrally managed program for the achievement of virtualized systems under the technical and financial guidance of the Cyber Command. The creation of a unified DoD network will require changes in the organization of IT resources.

Although DoD will ultimately be driven to adopt such a solution for security and financial reasons, in the immediate future the best option is to start “herding” major DoD applications in the desired direction through the encapsulation of legacy systems into a private cloud.

SUMMARY

The 15,000 separately managed networks that connect over seven million computers are indefensible.  Though the Cyber Command may proceed to improve security by means of policy and standards alone, there are insufficient people and funding to safeguard DoD networks by such methods.

The Cyber Command must immerse itself in the operational control of unified DoD networks.   The objective of creating shared pools of resources that can be protected in a secure cloud may be distant. Nevertheless, a start in the desired direction can be made now through rapid encapsulation of legacy applications into a version of the cloud that performs as infrastructure-as-a-service.

Constructing an Enterprise Data Base on the Cloud

The ultimate objective of cloud computing is to separate the CPUs, memory, storage and communication technologies from their respective applications, allowing the creation of shared pools that can be re-organized to achieve high levels of utilization.
Diagram from VMware.com

When each application owns its own technologies, the customer must acquire excess capacity for peak loads. With separate virtualization of technologies that is not necessary. The sharing of resources will deliver reductions in capital costs as well as cuts in operating expenses. Such pools can support thousands of applications and can be managed with fewer people.

The greatest cost advantages will be derived from the creation of storage pools. The need for added disk capacity is rising faster than for other assets, while disk capacity utilization is declining.  Storage pools are created by means of virtual disks, which are stored as files managed by the hypervisor.

One of the key features of Type 1 hypervisors is the encapsulation of legacy applications. This means that the complete legacy files can be migrated, copied, moved, de-duplicated and accessed quickly. Since an entire disk partition is saved as a file, virtual disks are easy to back up, move, and copy.
The bare-metal hypervisor architecture for managing storage pools permits near-native virtual machine performance as well as reliability and scalability without the need for a host operating system. Virtual machine disk files offer access to data while giving administrators the flexibility to create, manage and migrate virtual machine storage as separate, self-contained files. Redundant virtual disks eliminate single points of failure and balance storage resources. This allows the clustering of files and enables accessing several files concurrently.

Many of the available hypervisors are certified with storage systems from vendors such as Dell, EMC, Fujitsu, Fujitsu Siemens, HP, Hitachi Data Systems, IBM, NEC, Network Appliance, StorageTek, Sun Microsystems and 3PAR. Internal SATA drives, Direct Attached Storage (DAS) and Network Attached Storage (NAS), as well as both Fibre Channel SAN and iSCSI SAN, are supported. This provides the means for infrastructure services such as storage migration, distributed resource scheduling, consolidated backup and automated disaster recovery.

All the files that make up a virtual environment consolidate data in a single logical directory managed by means of a meta-data directory. With automated handling of virtual machine files, the management system provides encapsulation so that it can easily become part of a disaster recovery solution.

 Conventional file systems allow only one server to have read-write access to the same file at a given time. By contrast, enterprise storage allows multiple instances of virtual servers to have concurrent read and write access to the same resources. Virtual enterprise files also utilize the journaling of meta-data changes to allow fast recovery across these multi-server resource pools. Snapshot features are available for disaster recovery and backups.

SUMMARY

The migration of data files from individual applications into the virtual disk environment should be seen as a way to deliver storage pools into a cloud environment. Initially, the linking between applications and corresponding data will be closely coupled. However, through conversion software (such as provided by firms comparable to Informatica and AbInitio), a clean separation of data from applications can be achieved ultimately.

The consolidation of all business application data files into a DoD business repository, controlled by a single meta directory, will materially reduce the costs of DoD business systems operations. Over $20 billion/year is spent in DoD running applications that consume machine cycles exchanging each other’s data files.  A consolidated data file will eliminate much of that and make a Service Oriented Architecture (SOA) possible.

Secure Workstations for Systems Development

A virtual workstation enables the development, testing and deployment of diverse applications without changing equipment at a customer’s site. This is accomplished by adding a hypervisor, as virtualization software. Desktops, laptops or smart-phones thus become virtual workstations with the capacity to perform a large variety of tasks. The following are the functions of virtual workstations that operate as the platforms for systems development:

  1. Test applications with different levels of security on the identical desktop, using Linux or Windows, without rebooting.
  2. Experiment with and test combinations of newly proposed security safeguards on separate, isolated virtual computers without the need to acquire separate computing devices.
  3. Deploy different combinations of browsers and third-party security appliances to examine how they interact with different applications, and assure the elimination of conflicts arising from new software patches.
  4. Validate whether there is interference between security software and various versions of browsers, operating systems and proprietary application development tools. The number of cases that need testing could run into the thousands (see the sketch after this list).
  5. Demonstrate how the performance of the security software will affect proposed computing configurations, multi-core processors or virtual disks. This includes the verification of encryption codes.
  6. Run demonstrations of prototype versions of applications, which includes systems assurance.
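The combinatorics behind point 4 are easy to illustrate; the products listed in the sketch below are examples only, not a prescribed test set.

```python
# Why the test matrix grows so quickly: every added browser, OS image,
# security product, patch level or application multiplies the count.
from itertools import product

browsers  = ["IE 8", "Firefox 3.6", "Chrome 5"]
os_images = ["Windows XP SP3", "Windows 7", "RHEL 5"]
security  = ["McAfee", "Symantec", "Sophos", "Check Point"]
patches   = ["baseline", "latest patch set"]
apps      = ["Office", "Visual Studio", "Apache", "custom C2 client"]

combinations = list(product(browsers, os_images, security, patches, apps))
print(f"{len(combinations)} configurations to validate")   # 3*3*4*2*5 = 360

# Adding a few more versions of each component pushes the count into the
# thousands; each combination can be spun up as an isolated virtual machine
# on the same physical workstation, tested, and discarded.
```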

The development environment for a secure workstation requires the creation of fully isolated and secure virtual machines that encapsulate an operating system and its applications. The virtualization layer must map the physical hardware resources to the virtual machine's resources, so each virtual machine has its own CPU, memory, disks, and I/O devices, and is the full equivalent of a standard x86 machine.

Virtual workstations can operate on either a Type 1 (native, bare-metal) or a Type 2 (hosted) hypervisor. The difference is that a Type 1 hypervisor runs directly on the host's hardware to control the hardware and to monitor guest operating systems, whereas a Type 2 hypervisor runs within a conventional operating system environment.

The following figure shows one physical system with a type 1 hypervisor running directly on the system hardware, and three virtual systems using virtual resources provided by the hypervisor.


The following figure shows one physical system with a type 2 hypervisor running on a host operating system and three virtual systems using the virtual resources provided by the hypervisor.




The virtual workstation will run on any standard personal computer and will be the equivalent of a full PC, with full networking and devices — each virtual machine has its own CPU, memory, disks, I/O devices, etc. This provides the capacity to run supported guest applications such as Microsoft Office, Adobe Photoshop, Apache Web Server, Microsoft Visual Studio and kernel debuggers, as well as all security software provided by vendors such as McAfee, RSA, Check Point, Symantec, Sophos and others.

SUMMARY


The development environment for secure systems requires the capacity to test and validate complex interactions between hardware, operating systems, applications and a variety of security offerings. A very large number of possible combinations must be tested not only to verify compliance with required functionality but also to assure operational viability. A virtual workstation has the capacity to assure the exploration of a large number of security features so that project schedules can be accelerated.

By using virtual workstations developers can check the acceptability of available security options in a non-homogeneous environment.

Central Management of Desktops for Security

Workstation desktops can be managed as a central service instead of provisioning each workstation individually. These server-managed desktops bring the power of the datacenter to the network clients by creating images of standard desktops. User desktops can then be administered by moving the management of their IT infrastructure to central offices. Users can then instantly access their standard desktops, including data, applications and settings, from anywhere.

This arrangement allows standard desktops to run directly on a user's laptop, rather than on a server inside the data center. Centralized management provides users with better performance and lower maintenance costs.


 SUMMARY

The management of desktops as a service makes it possible to impose the identical security safeguards upon an entire population of computing clients. Up to 16,000 computing workstations can be managed, with limited personnel, from a single set of virtual servers.

Cyber Security is Asymmetric

DoD is spending $3.2 billion/year on information technology to secure networks against incoming malware. Meanwhile, DoD spends hardly any money to protect against compromising data flowing out from insiders.  Nobody seems to care much about the prevention of the exfiltration of information.

The time has come to recognize that cyber security has to deal with unequal threats on inbound and outbound traffic. Our enemies can gain more credible information from easily available disclosures from inside sources than from encrypted data that must be mined through firewalls, virus protection and filtering. That is why the imbalance between the expensive defenses against incoming intrusions and the puny amounts spent to deter outgoing leaks can be labeled as asymmetric.

A large-scale extraction of SECRET communications was accomplished by means of a disk that copied transactions. This method has recently become identified with the "wiki" leaks, though similar cases are most likely more prevalent than is acknowledged. A disgruntled military person, properly cleared for unrestricted access to all SIPRNET data, had access to a wide range of messages originating primarily from the Department of State.

When the difficulties in sharing information across several agencies in Iraq came to light, DoD policy makers relaxed restrictions that were previously in place on access to the SIPRNET.  As a result, the number of personnel cleared for SIPRNET searches increased.

The responsibility for enabling "wiki"-type uncontrollable disclosures can be traced to the decision to lift limitations on SIPRNET access without imposing corresponding restrictions. Exfiltration will continue until DoD changes how access authorizations are granted, so that they are no longer blanket permissions that apply in all situations. Access to the SIPRNET must be limited in scope and time as defined by a person's specific mission, as sketched below. Implementation of such a policy will require major changes in the way DoD personnel systems are administered.
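A minimal sketch of what a mission-scoped, time-limited access check could look like is shown below; the data structures, scope labels and user names are hypothetical, not a description of any existing DoD system.

```python
# Hypothetical sketch: access grants limited by mission scope and time window,
# instead of blanket SIPRNET permissions. All names and fields are invented.
from datetime import datetime

grants = {
    "analyst_jones": {
        "scopes": {"CENTCOM/BDA", "CENTCOM/SITREP"},
        "valid_from": datetime(2010, 7, 1),
        "valid_to": datetime(2010, 9, 30),
    },
}

def access_allowed(user, scope, when):
    grant = grants.get(user)
    if grant is None:
        return False
    in_scope = scope in grant["scopes"]
    in_window = grant["valid_from"] <= when <= grant["valid_to"]
    return in_scope and in_window

print(access_allowed("analyst_jones", "CENTCOM/BDA", datetime(2010, 8, 15)))    # True
print(access_allowed("analyst_jones", "STATE/CABLES", datetime(2010, 8, 15)))   # False
```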

The greatest source of persistent information leakage from DoD can be found in social computing such as YouTube, Facebook, MySpace, Twitter and blogs. The OSD policy on social networking of February 25, 2010 makes such activity “integral to operations across DoD”. It orders the re-configuration of the NIPRNET to provide access to Internet-based capabilities from all Components. How to implement that, or how to arrest the revealing of military information, was left without any guidance.  In short, the current OSD policy has opened the gates to the loss of intelligence to close to a billion people now engaged in social computing. A well-informed source tells me that about 20% of all DoD traffic consists of social communications conducted through public sites, which are unprotected as well as potentially toxic.

A recent incident demonstrated that outsiders could use social media to extract DoD information. A phony “Robin Sage”, easily masquerading as an employee of the Naval Network Warfare Command, was able to accumulate in a few months 300 friends on LinkedIn, 110 on Facebook and 141 followers on Twitter. She connected with the Joint Chiefs of Staff, the CIO of the NSA, an intelligence director for the U.S. Marines and a chief of staff in the U.S. House of Representatives. In all communications there were clues that “Robin” was a fake. In one case “Robin” duped an Army Ranger into friending her. The Ranger inadvertently exposed information about his coordinates in Afghanistan through uploaded photos from the field that contained embedded geolocation data.

Here is another case of disregarding elementary security by forgetting about asymmetric effects. It is a case in which I was involved. A bank's currency trading system was very secure. In its operations it followed best practices and was often praised as an exemplar of good risk management. All of the money transfers - sometimes hundreds of millions of dollars in a matter of an hour - were securely executed without ever having a problem. The computers, the data center and the transmission lines were locked down securely. Yet, suddenly, there was a problem--a large sum of money ($80 million) disappeared in a matter of seconds. When we finally walked through all of the scenarios, the problem was that although the currency applications were absolutely secure, the maintenance programmers (who were supporting the money transfer applications) were communicating by open e-mail about software fixes and the next software release. The e-mails were mostly about project management housekeeping, such as when to run the tests and when to do a software update. The e-mails therefore flagged when the money systems were most vulnerable. By keeping track of the programmers' chatter over e-mail, the attackers knew exactly when, for a few seconds, the system was naked.

When verifying cyber security, the number one rule is that attackers will not devote their time to attacking a target directly; seeking out the locations of maximum vulnerability will always take precedence. Therefore, I favor managing social media on the NIPRNET against potential exfiltration as a priority (see http://pstrassmann.blogspot.com/2010/06/tracking-anomalies-in-social-computing.html). Unchecked outgoing traffic will always leave military information vulnerable.

SUMMARY

Cyber security leaks originate from insiders. Unchecked social computing can be the attacker's favorite means for data mining. From the standpoint of our enemies, acquiring easily accessible intelligence from inside sources can be simpler than whatever can be obtained by working hard to crack DoD barriers.

Placing Legacy Applications in a Virtual Environment

Legacy applications can be encapsulated into executable packages that run completely isolated from each other in a virtual environment on data center servers. The virtualization layer maps the physical hardware resources to the virtual machine's resources, so each virtual machine has its own CPU, memory, disks, and I/O devices, and is the full equivalent of a standard x86 machine with Intel and AMD processors as well as with most Windows or Linux host operating systems.
In virtual operations the hardware support is provided by means of inheritance from the legacy host operating system.


Migrating individual applications to run on top of a hypervisor makes it possible for different versions of the Windows or Linux operating systems to run conflict-free on the same server. Once the legacy applications are deployed in the virtual environment, individual application packages can be moved to different virtual computers, eliminating costly recoding and testing. After that, the existing applications can migrate into successor environments in order to start the conversion of previously disjointed or incompatible systems.

The placement of diverse legacy applications on a shared hypervisor offers the following advantages:

Delivers uniform application access to all users.
Eliminates the need for additional server hardware in support of different operating systems.
Converts legacy applications for support by different operating system versions without the need to recode, retest and recertify.
Streams applications from a shared network drive with no intermediate servers or client software to install.
Controls storage costs by providing a higher level of utilization.
Allows over-allocation of storage capacity for increased storage utilization and enhanced application uptime, and simplifies storage capacity management.
Lowers capital and operating expenditures by reducing disk purchases as well as power and cooling costs.

SUMMARY

The encapsulation of legacy systems, with subsequent migration into a virtual environment is the next step after server consolidation. It should be seen as another phase of cloud formation.

The relocation of legacy applications into a virtual data center should be seen as an evolutionary step. It will deliver cost savings even while the legacy system continues intact, until such time as it is finally phased out.

The placement of legacy applications in a virtual data center should be seen as a step toward the ultimate achievement of greater interoperability of data, communication links and application logic.


Protecting the Cloud

The security of virtual computers can be achieved by means of application program interfaces that enable select partners (such as McAfee, RSA, Check Point, Symantec, Sophos and others) to install security products that will support virtual environments. The result is an approach to security that provides customers with a cloud-based approach for running secured applications.

The interoperability of hypervisors with the offerings of various security products makes it possible for third-party vendors to manage, through the hypervisor, the protection of virtual machines in a cloud. By this means the security applications can identify malware or denial of service attacks. Security vendors can also use the hypervisors to detect and eliminate intrusions that have unprecedented characteristics, while retaining a record of such attempts for taking corrective actions.

The virtualization technology program for security partners includes the sharing of open, interoperable and cross-platform technologies. These become affordable because a continued stream of innovative security solutions is spread over a large machine population. By deploying security measures to the entire cloud of virtual machines, customers can obtain lower costs and gain greater visibility at network control centers. By applying consolidated security techniques it is possible to fund sophisticated forensic analysis, which can be scaled over thousands of servers and millions of personal computers.

Virtualization of security cannot simply be appended to servers or desktop computers, as is currently the case when virus protection software and firewalls are installed individually. There will always be gaps in the protective measures, because security software updates become obsolete and because funding limitations leave too little maintenance talent. In most cases there will not be adequate personnel available to monitor and then react to security incursions.


Intruders will always be seeking out gaps in protection. With millions of security incursions into DoD networks per day, the number of potential out-of-control situations will overwhelm the defenders unless systems assurance designs provide for well-staffed, consolidated surveillance.

Third-party “Security Virtual Appliances” should be embedded within the hypervisor. These appliances provide services such as antivirus, personal firewall, intrusion detection, intrusion prevention, anti-spam, URL filtering and others. As clouds grow to manage thousands of servers under central control, it is important to realize that the implementation of security cannot be an afterthought. Security must be fused into the cloud design as it evolves into a comprehensive virtual machine infrastructure.



This arrangement allows network traffic within a virtual datacenter to be monitored and controlled so that it meets corporate security policies and ensures regulatory compliance. It enables applications to run efficiently within a shared computing resource pool while still maintaining trust and the network segmentation of users and sensitive data, as illustrated below.
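
As an illustrative sketch only (it mirrors no specific vendor product), the following Python fragment shows the kind of segment-level allow list that such an in-cloud firewall could enforce between tiers of virtual machines; segment names, ports and rules are hypothetical.

from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    src_segment: str
    dst_segment: str
    dst_port: int

# Explicit allow list; any flow not listed is denied by default.
ALLOWED = {
    ("web-tier", "app-tier", 8443),    # web servers may call application servers
    ("app-tier", "data-tier", 1433),   # application servers may reach the database
}

def is_allowed(flow: Flow) -> bool:
    """Check a proposed flow against the segmentation policy."""
    return (flow.src_segment, flow.dst_segment, flow.dst_port) in ALLOWED

if __name__ == "__main__":
    print(is_allowed(Flow("app-tier", "data-tier", 1433)))  # True
    print(is_allowed(Flow("web-tier", "data-tier", 1433)))  # False: web tier may not reach the data tier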

SUMMARY

Cloud computing, which can include thousands of servers, requires full integration with the capabilities offered by vendor-supplied security appliances. Such safeguards are expensive. They also require the vigilance of exceptionally well-trained personnel as well as the availability of an extensive suite of forensic tools.

Given the huge costs of assuring the protection of DoD computing, the security of its 15,000 networks can be achieved only through protective safeguards that operate in a cloud environment.



Who Will Deliver Actionable Cybersecurity Solutions?

In his first public appearance, Gen. Keith Alexander, the head of the US Cyber Command and Director of the National Security Agency, stated that DoD was lacking situational awareness - simply put, knowing what hackers were doing to its systems.

The lack of situational awareness means that key defense IT systems remain exposed to sabotage. With 7 million DoD computers linked by means of 15,000 networks, there are some 250,000 unauthorized probes per hour. Such events are discovered only after the fact, and often never.
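
The arithmetic behind these figures is straightforward; at the quoted rate the probes add up to roughly six million per day, or about 400 per network per day.

probes_per_hour = 250_000
networks = 15_000

probes_per_day = probes_per_hour * 24           # 6,000,000 unauthorized probes per day
probes_per_network = probes_per_day / networks  # about 400 probes per network per day

print(probes_per_day, round(probes_per_network))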

The cybersecurity dangers are clear and present. What is the government doing to address such threats?

The US Government Accountability Office (GAO-10-466) has just published a report on the current status of cybersecurity research & development. The GAO findings reveal an unsatisfactory proliferation of efforts, which are marginally funded.

The following organizations were identified as dealing with cyber defenses:

President's Council of Advisors on Science and Technology;
President's Information Technology Advisory Committee;
National Security Council;
Cybersecurity Office/U.S. Cybersecurity Coordinator;
Office of the Director of National Intelligence;
Office of Management and Budget;
Office of Science and Technology Policy;
National Science and Technology Council;
Committee on Technology;
Subcommittee on Networking and Information Technology;
National Coordination Office of Networking;
Senior Steering Group for Cyber Security;
Cyber Security and Information Assurance Interagency Working Group;
Special Cyber Operations Research and Engineering Group;
OSD Research and Engineering Directorate;
Office of Naval Research;
Army Research Laboratory;
Air Force Research Laboratory;
Defense Advanced Research Projects Agency;
National Security Agency;
Department of Energy Cybersecurity Research;
National Institute of Standards and Technology;
Department of Homeland Security;
Institute for Information Infrastructure Protection.

In addition to the government-managed R&D dealing with cybersecurity there are at least another dozen private sector organizations funding similar efforts.

The chances are remote that any of the above institutions will solve General Alexander's problems in the foreseeable future. In addition to the continuation of policy-level discussions, which rarely produce actionable solutions, the DoD should meanwhile concentrate on how to overcome the vulnerabilities of its porous networks and computers. DoD spends over $33 billion per year on IT to manage the sources of its network risks. DoD should concentrate immediately on the security of its presently installed computer applications. This can be achieved only through the adoption of much safer cloud computing designs. How to do so will be discussed in follow-on blogs.

SUMMARY

There is no question that the existing institutions will have to continue de-conflicting and re-focusing their cybersecurity research and development efforts. Meanwhile, on a more actionable level, the migration to a more secure cloud environment should proceed at an accelerated pace.

Edge Servers for Information Dominance

Managing the Navy's DNS (Domain Name Services)

In my November 2009 AFCEA Signal article on "Internet Vulnerabilities" I focused on the vulnerability of DNS routing in the path of transactions to end users.

One of the major exploitable vulnerabilities of the Internet is the hostile modification of routing tables that are managed by DNS.

It is now possible for the Navy to set up local "Edge Servers" for the distribution of web transactions. These servers are configured as secondary DNS services, set up centrally and controlled by network control operations (see diagram below).

This arrangement improves performance through reduced transaction latency. It makes possible the placement of redundant web services, which is useful for achieving reliability under war-fighting conditions. A mission-oriented edge server can also be placed on a vehicle or on a ship. Such servers can be assigned restricted roles and could operate with minimum bandwidth requirements.

The primary web servers are never exposed to the end users, thereby mitigating the risks of corruption, such as "cache poisoning" or denial-of-service attacks.
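
A minimal Python sketch, using the dnspython library, of how an edge server acting as a secondary DNS service could refresh its zone data from a hidden, centrally controlled primary; the primary's address and the zone name are hypothetical. End users query only the edge server, never the primary.

import dns.query
import dns.zone

HIDDEN_PRIMARY = "10.10.0.2"          # reachable only by the edge server, not by end users
ZONE_NAME = "fleet.example.navy.mil"  # hypothetical zone

def refresh_zone():
    """Pull the full zone (AXFR) from the hidden primary."""
    return dns.zone.from_xfr(dns.query.xfr(HIDDEN_PRIMARY, ZONE_NAME))

if __name__ == "__main__":
    zone = refresh_zone()
    for name in sorted(zone.nodes.keys()):
        print(name)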

The edge servers have sufficient capacity to incorporate numerous locally hosted security technologies. These can act as the first line of defense between the end user and the web sites located on the cloud.

SUMMARY

Supporting the highly diverse requirements of the Information Dominance initiative calls for the fielding of distributed computing resources. An "edge server" design will meet these needs.

The placement of "edge servers" throughout the Navy could avoid investments in huge data centers and allow incremental roll out the Information Dominance program.



NOTE: Illustration based on Akamai concepts.

GAO Concerns Do Not Apply to the Navy

GAO Report GAO-10-855T of July 1, 2010 noted that "... 22 of 24 major federal agencies reported that they are either concerned or very concerned about the potential information security risks associated with cloud computing. Risks include dependence on the security practices and assurances of a vendor, and the sharing of computing resources [with other firms]. Agencies have also identified challenges in assessing vendor compliance with government information security requirements and clarifying the division of information security responsibilities between the customer and vendor."
How appropriate are the GAO concerns?

By far the largest vendor of cloud services is the Amazon Elastic Compute Cloud (EC2). According to the best estimates, it generated 2009 revenues of about $220 million. That is a small amount compared with the Navy's annual IT operating costs of $4.9 billion (FY09). With the scope of Navy operations, and with the declared objective to operate a single network, it is inconceivable that the Navy would rely on the security practices of any one cloud vendor. It is also inconceivable that the Navy would wish to share computing resources with any other enterprise or agency.

The GAO concerns can be overcome by the following policies:

1. The Navy will operate and control all security practices of its cloud.
2. Navy personnel will manage and be solely accountable for security measures.
3. Network and computing operations will be managed by network control centers operated exclusively by Navy personnel.
4. Physical aspects of data centers may be operated by cloud providers but without control over the software or data.
5. There will be no division of responsibility between the Navy and a contractor with regard to information security.
6. The Navy will operate its cloud environment as a platform-as-a-service, totally isolated from any other shared service.

SUMMARY

The GAO concerns about information security risks in cloud operations are not appropriate. GAO did not give consideration to security policies that the Navy could adopt to protect its operations.

The six policies outlined above should become the basis of guidelines for the Navy to proceed with secure cloud computing.
