The analysis of data from software products and their development process is tempting, but often non-trivial. A flexible, extensible, scalable, and efficient way to perform this analysis is through the use of line-oriented textual data streams, which are the lowest useful common denominator for many software analysis tasks. Using this technique, Unix tool-chest programs are combined into a pipeline that forms the pattern: fetching, selecting, processing, and summarizing. Product artifacts that can be handled in this way include source code (using heuristics, lexical analysis, or full-blown parsing and semantic analysis) as well as compiled code, which spans assembly code, machine code, byte code, and libraries. On the process front, data that can be analyzed includes configuration management metadata, time series snapshots, and checked-out repositories. The resulting data can then be visualized as graphs, diagrams, charts, and maps.
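The fetch, select, process, and summarize pattern described above can be sketched with standard Unix tools. The sample log data and file name below are illustrative, not taken from the chapter; a real analysis would fetch the stream from a version-control or build tool instead.

```shell
# Fetch: obtain a raw line-oriented stream (here, fabricated log lines).
printf '%s\n' \
  'Author: alice' \
  'Date: 2015-01-02' \
  'Author: bob' \
  'Author: alice' > sample.log

# Select: keep only the lines of interest.
# Process: isolate the field to analyze.
# Summarize: count occurrences and rank by frequency.
grep '^Author: ' sample.log |
  cut -d' ' -f2 |
  sort | uniq -c | sort -rn
```

Each stage reads lines from its predecessor and writes lines to its successor, so stages can be added, swapped, or debugged independently.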
Title of host publication: The Art and Science of Analyzing Software Data
Number of pages: 52
Publication status: Published - 1 Sep 2015
- Binary code analysis
- Repository mining
- Source code analysis
- Unix toolkit