Sunday, July 18, 2010

Storing Records of Social Media Transactions

Capturing and retrieving social media transactions for forensic purposes (see Strassmann blog on “Tracking Anomalies in Social Computing”) cannot rely on relational databases, such as those provided by Oracle.

Database management designs, such as the proprietary “BigTable” used by Google, depart from the typical convention of a fixed number of columns. What is needed is a system that will store sparse, diverse and non-standard records that require hundreds of petabytes of storage (see Strassmann blog on “Should Petabyte Files Inhibit Migration to Cloud Computing?”).
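To make the idea of records without a fixed number of columns concrete, here is a minimal sketch in Python. It is not BigTable or any real product; the class name SparseTable, the row keys and the column names are all hypothetical and chosen only for illustration. Each row stores just the columns it actually uses, so two messages need not share a schema.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory table: each row keeps only the columns it uses.

    Hypothetical illustration, not the API of BigTable or Hadoop.
    """

    def __init__(self):
        # row key -> {column name -> value}
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Store one cell; no schema has to be declared first."""
        self.rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        """Read one cell; a missing cell costs no storage and returns default."""
        return self.rows.get(row_key, {}).get(column, default)

    def columns(self):
        """All column names seen so far, across every row."""
        return sorted({c for row in self.rows.values() for c in row})


table = SparseTable()
table.put("msg-001", "sender", "alice")
table.put("msg-001", "text", "status update")
table.put("msg-002", "sender", "bob")
table.put("msg-002", "attachment_type", "photo")
table.put("msg-002", "geo", "38.9,-77.0")
```

In this sketch the two messages share only the "sender" column. A relational table would need every column declared in advance and would carry empty cells for each message that does not use them.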

The National Security Agency is taking a cloud computing approach to the development of intelligence gathering that can link disparate intelligence sources (see http://www.darkgovernment.com/news/nsa-embraces-cloud-computing#ixzz0tyFFCgV7). This increases intelligence awareness and safeguards national security.

Such a system can house the streams of outgoing social media communications. Analysts can then add metadata and tags that enable search, discovery, collaboration, correlation, and analysis.
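How analyst tags might enable search can be sketched with a small inverted index. This is an assumption about one possible mechanism, not a description of any NSA system; the class TagIndex and the message identifiers are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index from analyst tags to message ids (hypothetical)."""

    def __init__(self):
        # tag -> set of message ids carrying that tag
        self._by_tag = defaultdict(set)

    def tag(self, message_id, *tags):
        """Attach one or more tags to a stored message."""
        for t in tags:
            self._by_tag[t.lower()].add(message_id)

    def search(self, *tags):
        """Return the ids of messages that carry every one of the given tags."""
        if not tags:
            return set()
        sets = [self._by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets)


index = TagIndex()
index.tag("msg-001", "travel", "urgent")
index.tag("msg-002", "travel")
travel_hits = index.search("travel")
urgent_travel_hits = index.search("travel", "urgent")
```

Because tags are added after capture, the stored messages themselves never have to be restructured when analysts invent a new category.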

NSA is using Hadoop (http://hadoop.apache.org/), an open source framework that combines a distributed file system with an implementation of Google’s Map/Reduce parallel processing model. This makes it easier to rapidly reconfigure data and to scale up files as the number of recorded messages grows. Such a system will run on cheap commodity hardware and will manage data servers as pools of storage resources.
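The Map/Reduce model mentioned above can be illustrated in a few lines of plain Python. This sketch runs in a single process and does not use Hadoop's own API; the function names and the sample messages are hypothetical. It counts messages per sender in three steps: map each record to key/value pairs, group the pairs by key, then reduce each group.

```python
from collections import defaultdict
from itertools import chain


def map_message(message):
    """Map step: emit one (sender, 1) pair per message."""
    yield (message["sender"], 1)


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)


def run_job(messages):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_message(m) for m in messages)
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


messages = [
    {"sender": "alice", "text": "first post"},
    {"sender": "bob", "text": "reply"},
    {"sender": "alice", "text": "second post"},
]
counts = run_job(messages)
```

In a real cluster the map and reduce steps would run in parallel on many machines, each working on the portion of the data stored locally. That is what lets the approach scale on commodity hardware.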

SUMMARY

Map/Reduce systems, rather than relational database software, are the only ones suitable for retaining petabytes of diverse messages that cannot be categorized in advance. The proprietary Google approach, which tracks billions of free form transactions, cannot be used by DoD. However, Open Source Software, such as Hadoop, will fit DoD needs (see DoD memorandum of October 16, 2009 from the OSD CIO).

For comments please e-mail paul@strassmann.com