Tools and Techniques for Analyzing Product and Process Data

Research output: Chapter in Book/Conference proceedings/Edited volume › Chapter › Scientific › peer-review

5 Citations (Scopus)


The analysis of data from software products and their development process is tempting, but often non-trivial. A flexible, extensible, scalable, and efficient way to perform this analysis is to use line-oriented textual data streams, which are the lowest useful common denominator for many software analysis tasks. With this technique, Unix tool-chest programs are combined into a pipeline that follows the pattern of fetching, selecting, processing, and summarizing. Product artifacts that can be handled in this way include source code (using heuristics, lexical analysis, or full-blown parsing and semantic analysis) as well as compiled code, which spans assembly code, machine code, byte code, and libraries. On the process front, data that can be analyzed includes configuration management metadata, time series snapshots, and checked-out repositories. The resulting data can then be visualized as graphs, diagrams, charts, and maps.
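The fetching, selecting, processing, and summarizing pattern can be illustrated with a minimal sketch. The chapter builds such pipelines from Unix tools; the Python version below is only an analogue that chains generators over a line-oriented text stream. The function names, the `SAMPLE_LOG` text, and its simplified log format are all hypothetical and chosen for illustration, not taken from the chapter.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical, simplified version-control log used only for illustration.
SAMPLE_LOG = """\
commit 1a2b3c
Author: alice
Date: 2015-01-02

commit 4d5e6f
Author: bob
Date: 2015-01-03

commit 7a8b9c
Author: alice
Date: 2015-01-05
"""

PREFIX = "Author: "


def fetch(text: str) -> Iterator[str]:
    """Fetch: deliver the raw data as a stream of lines."""
    yield from text.splitlines()


def select(lines: Iterable[str]) -> Iterator[str]:
    """Select: keep only the lines that carry the field of interest."""
    return (line for line in lines if line.startswith(PREFIX))


def process(lines: Iterable[str]) -> Iterator[str]:
    """Process: strip the field label, leaving the bare value."""
    return (line[len(PREFIX):].strip() for line in lines)


def summarize(values: Iterable[str]) -> List[Tuple[str, int]]:
    """Summarize: count occurrences, most frequent first."""
    return Counter(values).most_common()


# Each stage consumes the previous stage's line stream, like a pipe.
commits_per_author = summarize(process(select(fetch(SAMPLE_LOG))))
print(commits_per_author)
```

Because every stage reads and writes plain lines, a stage can be swapped or extended without touching the others, which is the property that makes line-oriented streams a convenient common denominator.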

Original language: English
Title of host publication: The Art and Science of Analyzing Software Data
Number of pages: 52
ISBN (Electronic): 9780124115439
ISBN (Print): 9780124115194
Publication status: Published - 1 Sep 2015
Externally published: Yes


  • Binary code analysis
  • Repository mining
  • Source code analysis
  • Unix toolkit
  • Visualization
