Automatic dependent surveillance–broadcast (ADS-B) [1,2] is widely implemented in modern commercial aircraft and will become mandatory equipment in 2020. Flight state information such as position, velocity, and vertical rate is broadcast constantly by tens of thousands of aircraft around the world using onboard ADS-B transponders. These data are identified by a 24-bit International Civil Aviation Organization (ICAO) address, are unencrypted, and can be received and decoded with simple ground station setups. This large amount of open data holds great potential for air traffic management (ATM) research. Most studies that rely on aircraft flight data (historical or real-time) require knowledge of the flight phase of each aircraft at a given time [3–7]. However, in large datasets such as those collected from ADS-B, which can contain many tens of thousands of flights, exceptions to deterministic definitions of flight phases are inevitable because of large variances in climb rate, altitude, velocity, or a combination of these. In such cases, deterministic logic based on flight conventions is not sufficient to process and extract flight data; robust and versatile identification algorithms are required instead. In this paper, a twofold method is proposed and tested: 1) a machine learning clustering step that can handle large amounts of scattered ADS-B data to extract continuous flights, and 2) a flight phase identification step that can segment the flight data of any type of aircraft and trajectory into flight phases.
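To make concrete how each broadcast is tied to an aircraft by its 24-bit ICAO address, the following minimal sketch reads the header of a raw 112-bit ADS-B message given as 28 hexadecimal characters. The function name `parse_adsb_header` and the sample message are illustrative assumptions and are not taken from this paper; the field layout (5-bit downlink format, 3-bit capability, 24-bit ICAO address) follows the standard Mode S extended squitter format.

```python
def parse_adsb_header(msg_hex: str) -> dict:
    """Extract the header fields of a 112-bit ADS-B message (hypothetical helper)."""
    if len(msg_hex) != 28:
        raise ValueError("expected 28 hex characters (112 bits)")
    first_byte = int(msg_hex[0:2], 16)
    return {
        "df": first_byte >> 3,          # downlink format: top 5 bits (17 for ADS-B)
        "ca": first_byte & 0b111,       # capability: lowest 3 bits of the first byte
        "icao": msg_hex[2:8].upper(),   # 24-bit ICAO address as 6 hex characters
    }


# Sample message chosen for illustration only (an assumption, not data from this study).
header = parse_adsb_header("8D4840D6202CC371C32CE0576098")
print(header)
```

Because the address is transmitted unencrypted in every message, grouping received messages by the `icao` field is a natural first step before any flight extraction or phase identification is attempted.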