Machine learning (ML) for real-time security assessment requires a diverse training database to remain accurate for scenarios beyond historical records. Generating diverse operating conditions is particularly relevant for emerging power systems, whose uncertain future operation may differ substantially from that of historical power systems. In response, this work proposes, for the first time, a split-based sequential sampling approach based on optimisation that generates more diverse operating scenarios for training ML models than state-of-the-art approaches. This work also proposes a volume-based coverage metric, the convex hull volume (CHV), to quantify the quality of samplers based on the coverage of the generated data. This metric accounts for the distribution of samples across multidimensional space to measure coverage within the physical network limits. Studies on IEEE test cases with 6, 68 and 118 buses demonstrate the efficiency of the approach. Samples generated using the proposed split-based sampling cover 37.5% more volume than random sampling in the IEEE 68-bus system. The proposed CHV metric distinguishes the quality of generated samples better (standard deviation of 0.74) than a distance-based coverage metric, which outputs nearly the same value (standard deviation of <0.001) for very different data distributions in the IEEE 68-bus system. As we demonstrate, the proposed split-based sampling is relevant as a pre-processing step for training ML models for critical tasks such as security assessment.
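The idea behind a volume-based coverage metric can be illustrated with a minimal two-dimensional sketch, in which the convex hull "volume" reduces to an area. This is not the paper's implementation: the function names `convex_hull`, `hull_area` and `coverage`, the unit-square operating limits and the sample sizes are all assumptions made for illustration, and the metric described above operates in a multidimensional space.

```python
import random


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area enclosed by the convex hull of 2-D points (shoelace formula)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def coverage(points, limits_area):
    """Fraction of the feasible region (assumed area) covered by the hull."""
    return hull_area(points) / limits_area


# Hypothetical operating limits: the unit square [0, 1] x [0, 1].
rng = random.Random(0)
spread = [(rng.random(), rng.random()) for _ in range(200)]
clustered = [(0.4 + 0.2 * rng.random(), 0.4 + 0.2 * rng.random()) for _ in range(200)]

print("coverage, spread samples:   ", coverage(spread, 1.0))
print("coverage, clustered samples:", coverage(clustered, 1.0))
```

In this toy setting the clustered samples can cover at most 4% of the assumed limits, whereas the spread samples cover most of them, which is the kind of difference a volume-based metric is intended to expose. In higher dimensions a library routine such as `scipy.spatial.ConvexHull` would normally replace the hand-written hull.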
|Number of pages||15|
|Journal||International Journal of Electrical Power and Energy Systems|
|Publication status||Published - 2023|
- Database generation
- Machine learning
- Power system operation
- Security assessment