Search tasks provide a medium for evaluating system performance and the underlying analytical aspects of IR systems. Researchers have recently developed new interfaces and mechanisms to support vague information needs and struggling search. However, little attention has been paid to generating a unified task set for the evaluation and comparison of search engine improvements targeting struggling search. Generating such tasks is inherently difficult, as each task must trigger struggling and exploratory user behavior rather than simple search behavior. Moreover, the ever-changing landscape of information needs renders old task sets less ideal, if not unusable, for system evaluation. In this paper, we propose a task generation method and develop a crowd-powered platform called TaskGenie to generate struggling search tasks online. Our experiments and analysis show that the generated tasks elicit struggling search behaviors such as ‘repeated similar queries’ and ‘quick-back clicks’, and that tasks of diverse topics, high quality, and high difficulty can be created with this framework. For the benefit of the community, we publicly release the platform, a task set of 80 topically diverse struggling search tasks generated and examined in this work, and the corresponding anonymized user behavior logs.