Cell classification with worse-case boosting for intelligent cervical cancer screening.

Publication date: 2023 Oct 29

Authors: Youyi Song, Jing Zou, Kup-Sze Choi, Baiying Lei, Jing Qin

Source: Medical Image Analysis

Abstract:

Cell classification underpins intelligent cervical cancer screening, a cytology examination that effectively decreases both the morbidity and mortality of cervical cancer. This task, however, is rather challenging, mainly because it is difficult to collect a training dataset that is sufficiently representative of the unseen test data: cell appearance and shape vary widely across cancerous statuses. As a result, a classifier, even when properly trained, often misclassifies cells that are underrepresented in the training dataset, eventually leading to a wrong screening result. To address this, we propose a new learning algorithm, called worse-case boosting, that lets classifiers learn effectively from under-representative datasets in cervical cell classification. The key idea is to learn more from worse-case data, that is, data for which the classifier has a larger gradient norm than for other training data and which are therefore more likely to correspond to underrepresented data. We do so by dynamically assigning these data more training iterations and larger loss weights, which boosts the generalizability of the classifier on underrepresented data. We implement this idea by sampling worse-case data according to gradient norm information and then enhancing their loss values to update the classifier. We demonstrate the effectiveness of this new learning algorithm on two publicly available cervical cell classification datasets (the two largest, to the best of our knowledge), and extensive experiments yield positive results (a 4% accuracy improvement). The source code is available at: https://github.com/YouyiSong/Worse-Case-Boosting

Copyright © 2023 Elsevier B.V. All rights reserved.
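The two steps named in the abstract, sampling by gradient norm and then enlarging the loss of the sampled data, can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). It assumes a toy binary logistic-regression classifier in NumPy, and the function names, the `boost` parameter, and the specific weighting formula are all hypothetical choices made only for illustration.

```python
import numpy as np

def per_sample_grad_norms(w, X, y):
    """Norm of each sample's logistic-loss gradient with respect to w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
    # For logistic loss, the gradient of sample i is (p_i - y_i) * x_i.
    return np.abs(p - y) * np.linalg.norm(X, axis=1)

def worse_case_step(w, X, y, rng, batch_size=32, boost=1.0, lr=0.1):
    """One update: sample by gradient norm, then up-weight the sampled losses."""
    norms = per_sample_grad_norms(w, X, y) + 1e-12  # avoid all-zero probabilities
    probs = norms / norms.sum()
    # Samples with larger gradient norm are drawn more often (more iterations).
    idx = rng.choice(len(X), size=batch_size, replace=True, p=probs)
    # Assumed weighting: close to 1 for small norms, 1 + boost for the largest norm.
    weights = 1.0 + boost * norms[idx] / norms.max()
    p = 1.0 / (1.0 + np.exp(-(X[idx] @ w)))
    grad = ((weights * (p - y[idx]))[:, None] * X[idx]).mean(axis=0)
    return w - lr * grad

# Toy usage on synthetic, linearly separable data (not cervical cell images).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w = np.zeros(5)
for _ in range(200):
    w = worse_case_step(w, X, y, rng)
accuracy = np.mean(((X @ w) > 0) == (y == 1))
print(f"training accuracy: {accuracy:.2f}")
```

For a deep classifier the per-sample gradient norm would have to be computed or approximated differently. The closed form used here holds only for the logistic-regression assumption above.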