ABSTRACT

    Carnegie Mellon University Parallel Data Lab Technical Report CMU-PDL-06-102, January, 2006.

    Challenges and Opportunities in Internet Data Mining

    David G. Andersen, Nick Feamster¹

    Parallel Data Laboratory
    School of Computer Science & Electrical and Computer Engineering
    Carnegie Mellon University
    Pittsburgh, PA 15213

    ¹ Georgia Institute of Technology

    http://www.pdl.cmu.edu/

    Internet measurement data provides the foundation for the operation and planning of the networks that comprise the Internet, and is a necessary component in research for analysis, simulation, and emulation. Despite its critical role, however, the management of this data—from collection and transmission to storage and its use within applications—remains primarily ad hoc, using techniques created and re-created by each corporation or researcher that uses the data. This paper examines several of the challenges faced when attempting to collect and archive large volumes of network measurement data, and outlines an architecture for an Internet data repository—the datapository—designed to create a framework for collaboratively addressing these challenges.

    KEYWORDS: network monitoring, databases, data management, data mining

    Last updated 1 February, 2006