April 2026: The team's work has been accepted by the computer science journal World Wide Web.
Basic Information
Title: Trails to Gold-standard Data: Harnessing Scientific Networks for Dataset Recommendation
Authors: 周鑫, 姜元春, 钱洋*, 刘业政
Journal/Source: World Wide Web
Link:
Abstract
Dataset recommendation is pivotal for streamlining data selection and accelerating scientific discovery. In this study, we propose the Sparse-Link Dataset Recommendation Model (SLDRM), an explainable framework that maps textual content, authors, and datasets into a unified topic space. Specifically, SLDRM captures the correlations among words, research communities, and dataset usage patterns by linking their respective latent topics. To handle the inherent sparsity of the research landscape, we incorporate a Spike-and-Slab prior. We validate our model using a real-world dataset collected from the PapersWithCode website. Experimental results show that our model not only improves recommendation accuracy but also enhances interpretability. The proposed model provides researchers with an efficient tool for dataset discovery and deepens the understanding of the knowledge production process in scientific networks.
