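Soil erosion prediction plays an important role in regional ecological restoration and soil and water conservation. Because of regional specificity, empirical soil erosion models require parameter calibration and validation when applied in different regions, which is costly in time and labor. By integrating previous research findings with the characteristics of the study area, this study determined the characteristic factors of soil erosion: precipitation (Pre), clay content (Clay), silt content (Silt), sand content (Sand), slope length (SL), slope (Slope), normalized difference vegetation index (NDVI), land use type (Land use), and profile curvature (PC). Taking the soil erosion modulus calculated by the RUSLE model as the reference value, three machine learning models (random forest, LightGBM, and Catboost) were used to model and estimate soil erosion, and SHAP values were introduced to analyze the mechanism by which each factor acts on soil erosion. The results show that: (1) All three machine learning models were rated "very good" on both the training and test sets, indicating that machine learning models can be used for soil erosion prediction. (2) Taking Yunlong County as an example, the Catboost model performed best (NSE=0.984, RSR=0.125), followed by random forest (NSE=0.966, RSR=0.183) and LightGBM (NSE=0.964, RSR=0.189). (3) SHAP analysis showed that the four factors with the greatest influence on soil erosion in Yunlong County were, in order, NDVI, SL, Slope, and Pre. NDVI was highly significantly negatively correlated with its SHAP value (r=-0.68, P < 0.01), whereas SL, Slope, and Pre were highly significantly positively correlated with their SHAP values (r=0.54, P < 0.01; r=0.57, P < 0.01; r=0.69, P < 0.01). (4) Based on the Catboost model, mapping the spatial distribution of soil erosion took only 3.07 s (CPU i5-9300H); correctly classified areas accounted for 86.62% of the total area, and the errors showed no obvious clustering. The results not only provide a new approach for rapid and accurate analysis of the spatial distribution of soil erosion, but also provide a scientific basis for research on soil erosion prediction models.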
Soil erosion prediction plays a crucial role in regional ecological restoration and soil and water conservation. However, because soil erosion processes are region-specific, empirical models require extensive parameter calibration and validation when applied to different areas, which is both time-consuming and labor-intensive. Additionally, quantifying the individual contributions of erosion-influencing factors remains methodologically challenging. In this study, we identified the key factors influencing soil erosion based on previous research and the characteristics of the study area: precipitation (Pre), clay content (Clay), silt content (Silt), sand content (Sand), slope length (SL), slope (Slope), normalized difference vegetation index (NDVI), land use type (Land use), and profile curvature (PC). Using the soil erosion modulus calculated with the Revised Universal Soil Loss Equation (RUSLE) as the benchmark, we employed three machine learning models (random forest, LightGBM, and Catboost) to predict soil erosion. To analyze the contributions of individual factors, we used SHapley Additive exPlanations (SHAP). The results showed that: (1) All three machine learning models were rated "very good" on both the training and testing datasets, indicating that machine learning models can be used for soil erosion prediction. (2) In the case study of Yunlong County, the Catboost model performed best (NSE=0.984, RSR=0.125), followed by random forest (NSE=0.966, RSR=0.183) and LightGBM (NSE=0.964, RSR=0.189). (3) SHAP analysis revealed that the four most influential factors for soil erosion in Yunlong County were, in order, NDVI, SL, Slope, and Pre. NDVI exhibited a highly significant negative correlation with its SHAP value (r=-0.68, P < 0.01), while SL, Slope, and Pre showed highly significant positive correlations (r=0.54, P < 0.01; r=0.57, P < 0.01; r=0.69, P < 0.01). (4) Using the Catboost model, the soil erosion spatial distribution map was produced in just 3.07 seconds (on an i5-9300H CPU); correctly classified areas accounted for 86.62% of the total area, and the errors showed no obvious clustering. These findings not only provide a new approach for rapid and accurate mapping of soil erosion spatial distribution but also offer a scientific basis for research on soil erosion prediction models.
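The two scores reported above, NSE (Nash-Sutcliffe efficiency) and RSR (ratio of the root mean square error to the standard deviation of the reference values), can be illustrated with a minimal sketch. The abstract does not give its formulas, so the definitions below are the standard ones and are an assumption here, and the sample values are hypothetical rather than taken from the study. Under these definitions RSR equals the square root of (1 - NSE), which agrees with the reported pairs to within rounding (for example, the square root of 1 - 0.984 is about 0.126, against a reported RSR of 0.125).

```python
import math


def _sums_of_squares(observed, predicted):
    """Return (squared-error sum, total sum of squares about the observed mean)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed values have zero variance")
    return sse, sst


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 means a perfect match with the reference."""
    sse, sst = _sums_of_squares(observed, predicted)
    return 1.0 - sse / sst


def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the reference: 0 is perfect."""
    sse, sst = _sums_of_squares(observed, predicted)
    return math.sqrt(sse) / math.sqrt(sst)


# Hypothetical reference (RUSLE-style) and model-predicted erosion values.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(f"NSE = {nse(observed, predicted):.3f}")
print(f"RSR = {rsr(observed, predicted):.3f}")
```

Because the two scores are tied together in this way, ranking the models by NSE or by RSR gives the same order, as it does in result (2).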