The analysis of data from software products and their development process is tempting, but often non-trivial. A flexible, extensible, scalable, and efficient way to perform this analysis is through the use of line-oriented textual data streams, which are the lowest useful common denominator for many software analysis tasks. Using this technique, Unix tool-chest programs are combined into a pipeline that forms the pattern: fetching, selecting, processing, and summarizing. Product artifacts that can be handled in this way include source code (using heuristics, lexical analysis, or full-blown parsing and semantic analysis) as well as compiled code, which spans assembly code, machine code, byte code, and libraries. On the process front, data that can be analyzed includes configuration management metadata, time series snapshots, and checked-out repositories. The resulting data can then be visualized as graphs, diagrams, charts, and maps.
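The fetch, select, process, and summarize pattern described above can be sketched with standard Unix tools. The sample log data and file name below are illustrative, not taken from the chapter; a real analysis would fetch the stream from a version-control or build tool instead.

```shell
# Fetch: obtain a raw line-oriented stream (here, fabricated log lines).
printf '%s\n' \
  'Author: alice' \
  'Date: 2015-01-02' \
  'Author: bob' \
  'Author: alice' > sample.log

# Select: keep only the lines of interest.
# Process: isolate the field to analyze.
# Summarize: count occurrences and rank by frequency.
grep '^Author: ' sample.log |
  cut -d' ' -f2 |
  sort | uniq -c | sort -rn
```

Each stage reads lines from its predecessor and writes lines to its successor, so stages can be added, swapped, or debugged independently.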
Title of host publication: The Art and Science of Analyzing Software Data
Number of pages: 52
Publication status: Published - 1 Sep 2015
- Binary code analysis
- Repository mining
- Source code analysis
- Unix toolkit