mlf-core: a framework for deterministic machine learning.

Published: 2023 Apr 02
Authors: Lukas Heumos, Philipp Ehmele, Luis Kuhn Cuellar, Kevin Menden, Edmund Miller, Steffen Lemke, Gisela Gabernet, Sven Nahnsen
Source: Bioinformatics

Abstract:

Machine learning has grown extensively in recent years and is now routinely applied in sensitive areas. To allow appropriate verification of predictive models before deployment, models must be deterministic. Fixing all random seeds is not by itself sufficient for deterministic machine learning, because major machine learning libraries default to non-deterministic algorithms based on atomic operations. Various machine learning libraries have released deterministic counterparts to these non-deterministic algorithms. We evaluated the effect of these algorithms on determinism and runtime. Based on these results, we formulated a set of requirements for deterministic machine learning and developed a new software solution, the mlf-core ecosystem, which helps machine learning projects meet and maintain these requirements. We applied mlf-core to develop deterministic models in various biomedical fields, including a single-cell autoencoder with TensorFlow, a PyTorch-based U-Net model for liver-tumor segmentation in CT scans, and an XGBoost-based liver cancer classifier using gene expression profiles.

The complete data together with the implementations of the mlf-core ecosystem and use case models are available at https://github.com/mlf-core. Supplementary data are available at Bioinformatics online.

© The Author(s) 2023. Published by Oxford University Press.
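
The abstract's point that fixed seeds alone do not guarantee determinism can be illustrated with a minimal sketch. It is not taken from the paper or from mlf-core. It assumes the usual explanation for why atomic-operation-based reductions vary between runs: floating-point addition is not associative, so the order in which parallel threads commit their partial results changes the rounded sum. The function name `accumulate` and the use of shuffled index orders as a stand-in for thread scheduling are illustrative assumptions.

```python
import random


def accumulate(values, order):
    """Sum values in the given index order (hypothetical stand-in for
    threads committing atomic adds in whatever order they finish)."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Floating-point addition is not associative, even for three numbers.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("associative?", left == right)

# Fixed seed: the input data is identical on every run.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8)
          for _ in range(10_000)]

# Simulate different thread schedules as different accumulation orders.
indices = list(range(len(values)))
results = set()
for schedule_seed in range(20):
    order = indices[:]
    random.Random(schedule_seed).shuffle(order)
    results.add(accumulate(values, order))

# Deterministic counterpart: always reduce in one fixed order.
fixed = {accumulate(values, indices) for _ in range(20)}

print(len(results), "distinct sum(s) across shuffled orders")
print(len(fixed), "distinct sum(s) with a fixed order")
```

The data here are seeded and identical in every trial, yet the shuffled accumulation orders disagree in the low-order bits, while the fixed-order reduction is repeatable. Deterministic library algorithms of the kind the abstract mentions generally avoid the unordered accumulation, which is consistent with the authors evaluating their runtime cost. In practice such algorithms are enabled through library switches, for example `torch.use_deterministic_algorithms(True)` in PyTorch or `tf.config.experimental.enable_op_determinism()` in recent TensorFlow versions.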