Security misconfigurations and neglected updates commonly leave systems vulnerable. Especially in the context of websites, we often find pages that were forgotten, that is, they were left online after they served their purpose and were never updated thereafter. In this paper, we introduce a new methodology to detect such forgotten or orphaned web pages. We combine historic data from the Internet Archive with active measurements to identify pages that are no longer reachable via a path from the index page, yet remain accessible through their specific URL. We show the efficacy of our approach and the real-world relevance of orphaned web pages by applying it to a sample of 100,000 domains from the Tranco Top 1M. Leveraging our methodology, we find 1,953 pages on 907 unique domains that are orphaned, some of which are 20 years old. Analyzing their security posture, we find that these pages are significantly (p < 0.01 using χ²) more likely to be vulnerable to cross-site scripting (XSS) and SQL injection (SQLi) than maintained pages. In fact, orphaned pages are almost ten times as likely to suffer from XSS (19.3%) as maintained pages from a random Internet crawl (2.0%), and maintained pages of websites with some orphans are almost three times as vulnerable (5.9%). Concerning SQLi, maintained pages on websites with some orphans are almost as vulnerable (9.5%) as orphans (10.8%), and both are significantly more likely to be vulnerable than other maintained pages (2.7%). We observe a similar pattern in adherence to security best practices. Overall, we see a clear hierarchy: orphaned pages are the most vulnerable, followed by maintained pages on websites with some orphans, and the least vulnerable are other maintained pages. To allow researchers to reproduce our results and practitioners to scrutinize their own pages, we provide an implementation of our methodology as open source software.
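
The core idea of the detection step, a page that is known from historic data, is no longer linked from the index page, yet still responds at its URL, can be illustrated with a minimal Python sketch. This is not the released implementation: the function names, the in-memory link graph, and the `is_live` callback are hypothetical stand-ins for the Internet Archive lookup, the site crawl, and the active probe, and a real pipeline would likely need additional filtering.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


def reachable_from_index(link_graph: Dict[str, List[str]], index_url: str) -> Set[str]:
    """Breadth-first search over the current link graph, starting at the index page."""
    seen = {index_url}
    queue = deque([index_url])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def find_orphan_candidates(
    archived_urls: Iterable[str],
    link_graph: Dict[str, List[str]],
    index_url: str,
    is_live: Callable[[str], bool],
) -> List[str]:
    """Return archived URLs that are unreachable from the index but still respond.

    archived_urls: URLs seen in historic data (assumed input).
    link_graph:    current page -> outgoing links (assumed crawl result).
    is_live:       hypothetical probe returning True if the URL still serves content.
    """
    reachable = reachable_from_index(link_graph, index_url)
    return sorted(
        url for url in set(archived_urls)
        if url not in reachable and is_live(url)
    )


# Toy data for illustration only; these URLs are made up.
archived = [
    "https://example.org/",
    "https://example.org/about",
    "https://example.org/promo-2004",
    "https://example.org/old-login",
]
graph = {
    "https://example.org/": ["https://example.org/about"],
    "https://example.org/about": ["https://example.org/"],
}
still_online = {
    "https://example.org/",
    "https://example.org/about",
    "https://example.org/promo-2004",
}

candidates = find_orphan_candidates(
    archived, graph, "https://example.org/", lambda u: u in still_online
)
print(candidates)
```

In this toy example, only the promo page qualifies: it appears in the historic data, no current page links to it, and it still responds. The old login page is unlinked as well but no longer online, so it is not reported.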
