Measurement by Proxy: On the Accuracy of Online Marketplace Measurements

Alejandro Cuevas, F.E.G. Miedema*, Kyle Soska, Nicolas Christin, R.S. van Wegberg

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

A number of recent studies have investigated online anonymous (“dark web”) marketplaces. Almost all leverage a “measurement-by-proxy” design, in which researchers scrape market public pages, and take buyer reviews as a proxy for actual transactions, to gain insights into market size and revenue. Yet, we do not know if and how this method biases results. We build a framework to reason about marketplace measurement accuracy, and use it to contrast estimates projected from scrapes of Hansa Market with data from a back-end database seized by the police. We further investigate, by simulation, the impact of scraping frequency, consistency and rate-limits. We find that, even with a decent scraping regimen, one might miss approximately 46% of objects – with scraped listings differing significantly from not-scraped listings on price, views and product categories. This bias also impacts revenue calculations. We find Hansa’s total market revenue to be US $50M, which projections based on our scrapes underestimate by a factor of four. Simulations further show that studies based on one or two scrapes are likely to suffer from a very poor coverage (on average, 14% to 30%, respectively). A high scraping frequency is crucial to achieve reliable coverage, even without a consistent scraping routine. When high-frequency scraping is difficult, e.g., due to deployed anti-scraping countermeasures, innovative scraper design, such as scraping most popular listings first, helps improve coverage. Finally, abundance estimators can provide insights on population coverage when population sizes are unknown.
Original language: English
Title of host publication: Proceedings of the 31st USENIX Security Symposium
Publisher: USENIX Association
Pages: 2153-2170
Number of pages: 18
Publication status: Published - 2022
Event: 31st USENIX Security Symposium - Boston, United States
Duration: 10 Aug 2022 to 12 Aug 2022
Conference number: 31
https://www.usenix.org/conference/usenixsecurity22

Conference

Conference: 31st USENIX Security Symposium
Country/Territory: United States
City: Boston
Period: 10/08/22 to 12/08/22