Deep reinforcement learning-based control of chemo-drug dose in cancer treatment.

Publication date: 2023 Oct 24

Authors: Hoda Mashayekhi, Mostafa Nazari, Fatemeh Jafarinejad, Nader Meskin

Source: Comput Meth Prog Bio

Abstract:

Advances in the treatment of cancer, a leading cause of death worldwide, have motivated research across many related fields. Over the last decades, the development of effective treatment regimens with optimal drug dosing within a mathematical modeling framework has received extensive research attention. However, most control techniques proposed for cancer chemotherapy are model-based. The available model-free techniques based on Reinforcement Learning (RL) commonly discretize the problem states and variables, which both demands expert supervision and prevents accurate modeling of real-world conditions. More recent Deep Reinforcement Learning (DRL) methods, which allow the problem to be modeled in its original continuous space, are rarely applied to cancer chemotherapy.

In this paper, we propose an effective and robust DRL-based, model-free method for the closed-loop control of cancer chemotherapy drug dosing. A nonlinear pharmacological cancer model is used to simulate the patient and capture the cancer dynamics. In contrast to previous work, the state variables and control action are modeled in their original infinite spaces to avoid expert-guided discretization and provide a more realistic solution. The DRL network is trained to automatically adjust the drug dose based on the monitored states of the patient. The proposed method provides an adaptive control technique that responds to the specific conditions and diagnostic measurements of different categories of patients.

The performance of the proposed DRL-based controller is evaluated by numerical analysis of diverse simulated patients. Comparison with the state-of-the-art RL-based method, which uses discretized state and action spaces, shows the superiority of the approach in terms of the course and duration of cancer chemotherapy treatment. In the majority of the studied cases, the proposed model decreases the medication period and the total amount of administered drug, while increasing the rate of reduction in tumor cells.

Copyright © 2023 Elsevier B.V. All rights reserved.
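
As a rough illustration of the closed-loop setting described in the abstract, the sketch below simulates a patient whose state and dose are both continuous, with no discretization step. Everything in it is an assumption made for illustration: `ToyPatient`, its two-state dynamics, all rate constants, the reward, and `placeholder_policy` are hypothetical and are not the pharmacological model, reward, or DRL network used in the paper. A trained DRL agent would take the place of the hand-written policy.

```python
from dataclasses import dataclass


@dataclass
class ToyPatient:
    """Hypothetical two-state patient model (NOT the model from the paper)."""
    tumor: float = 0.7   # normalized tumor burden (assumed initial value)
    drug: float = 0.0    # normalized drug concentration
    growth: float = 0.1  # assumed tumor growth rate per day
    kill: float = 0.5    # assumed drug kill rate
    decay: float = 0.3   # assumed drug elimination rate
    dt: float = 0.1      # Euler integration step in days

    def step(self, dose: float):
        """Apply a continuous dose, advance one Euler step, return (state, reward)."""
        dose = min(max(dose, 0.0), 1.0)  # continuous action clipped to [0, 1]
        d_tumor = (self.growth * self.tumor * (1.0 - self.tumor)
                   - self.kill * self.drug * self.tumor)
        d_drug = -self.decay * self.drug + dose
        self.tumor = max(self.tumor + self.dt * d_tumor, 0.0)
        self.drug = max(self.drug + self.dt * d_drug, 0.0)
        # Assumed reward: penalize tumor burden and, more weakly, drug use.
        reward = -self.tumor - 0.1 * dose
        return (self.tumor, self.drug), reward


def placeholder_policy(state):
    """Stand-in for a trained DRL policy: dose proportional to tumor burden."""
    tumor, _drug = state
    return min(1.0, 2.0 * tumor)


def run_episode(steps: int = 600):
    """Run the closed loop: observe state, choose dose, update the patient."""
    patient = ToyPatient()
    state = (patient.tumor, patient.drug)
    total_dose = 0.0
    for _ in range(steps):
        dose = placeholder_policy(state)
        state, _reward = patient.step(dose)
        total_dose += dose * patient.dt
    return state[0], total_dose


if __name__ == "__main__":
    final_tumor, total_dose = run_episode()
    print(f"final tumor burden: {final_tumor:.3f}, total dose: {total_dose:.2f}")
```

The two quantities returned by `run_episode` loosely mirror the outcome measures named in the abstract (tumor reduction and total administered drug), which is why the loop tracks them rather than only the reward.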