Estimation of Soil Organic Carbon and Total Nitrogen in Thailand's Rubber Plantations Using Multispectral Imagery and Machine Learning Algorithms
2026-09-02, Conference Management Room5

Soil organic carbon (SOC) and total nitrogen were estimated using Sentinel-2 vegetation indices and machine learning in northeastern Thailand. After outlier removal, Random Forest achieved R² = 0.63 for SOC and R² = 0.39 for total N, with BSI and BAEI as dominant predictors.


Description (≈320 words)

This study investigates the potential of multispectral satellite imagery and machine learning techniques to estimate soil organic carbon (SOC) and total nitrogen (N) in rubber plantation soils in northeastern Thailand. Soil organic carbon and nitrogen are important indicators of soil fertility, nutrient cycling, and ecosystem productivity. In rubber plantation systems, maintaining adequate soil nutrient levels is essential for sustainable agricultural production and long-term soil health. Conventional soil sampling and laboratory analyses provide reliable measurements but are often costly, time-consuming, and limited in spatial coverage. Remote sensing approaches, particularly satellite-derived spectral indices combined with machine learning algorithms, provide an alternative method for large-scale soil property assessment and monitoring.

Field-based soil measurements were integrated with spectral indices derived from Sentinel-2 multispectral imagery. Sentinel-2 provides 13 spectral bands at 10 to 60 m spatial resolution, spanning the visible, red-edge, near-infrared, and shortwave-infrared regions used for vegetation and soil analysis. Several vegetation, moisture, and soil-related indices were calculated, including the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Soil Adjusted Vegetation Index (SAVI), Modified Soil Adjusted Vegetation Index (MSAVI), Green Normalized Difference Vegetation Index (GNDVI), Normalized Difference Water Index (NDWI), Modified Normalized Difference Water Index (MNDWI), Normalized Difference Moisture Index (NDMI), Normalized Burn Ratio (NBR), Leaf Area Index (LAI), Bare Soil Index (BSI), Soil Index (SI), Normalized Difference Built-up Index (NDBI), Urban Index (UI), Built-up Area Extraction Index (BAEI), Normalized Difference Red Edge Index (NDRE), Red-Edge Chlorophyll Index (CIre), and MERIS Terrestrial Chlorophyll Index (MTCI). These indices capture variations in vegetation condition, soil exposure, moisture dynamics, and surface reflectance characteristics that may influence soil nutrient variability.
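As an illustration of how such indices are derived from band reflectances, the sketch below computes NDVI, NDBI, and BSI with NumPy using their commonly cited formulations. The exact formulations and band choices used in the study are not stated in this abstract, so the mapping to Sentinel-2 bands (blue = B2, red = B4, NIR = B8, SWIR1 = B11) and the reflectance values are assumptions for illustration only.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(nir, red):
    # NDVI = (NIR - Red) / (NIR + Red)
    return normalized_difference(nir, red)


def ndbi(swir1, nir):
    # NDBI = (SWIR1 - NIR) / (SWIR1 + NIR)
    return normalized_difference(swir1, nir)


def bsi(blue, red, nir, swir1):
    # BSI = ((SWIR1 + Red) - (NIR + Blue)) / ((SWIR1 + Red) + (NIR + Blue))
    blue, red, nir, swir1 = (
        np.asarray(x, dtype=float) for x in (blue, red, nir, swir1)
    )
    return normalized_difference(swir1 + red, nir + blue)


# Hypothetical surface reflectance values (0 to 1) for two pixels:
# the first resembles dense canopy, the second partly exposed soil.
blue = np.array([0.04, 0.08])
red = np.array([0.05, 0.15])
nir = np.array([0.45, 0.25])
swir1 = np.array([0.20, 0.30])

print("NDVI:", ndvi(nir, red))
print("NDBI:", ndbi(swir1, nir))
print("BSI: ", bsi(blue, red, nir, swir1))
```

In this toy input the canopy-like pixel has the higher NDVI and a negative BSI, while the soil-like pixel has a positive BSI, which is the kind of contrast these indices are meant to capture.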

Data preprocessing involved removing records with missing values and detecting and removing outliers to improve model reliability. After preprocessing, 79 samples were retained for SOC modelling and 95 samples for total nitrogen modelling. Random Forest regression was applied because it can capture nonlinear relationships and interactions among predictor variables.
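The abstract does not say how outliers were identified. As one plausible approach, the sketch below drops rows with missing values and then filters the target variable with Tukey's 1.5 × IQR rule using pandas. The function name, the column names, and the sample values are hypothetical and are not taken from the study data.

```python
import numpy as np
import pandas as pd


def drop_missing_and_outliers(df, target, k=1.5):
    """Drop rows with missing values, then rows whose target value
    lies outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    clean = df.dropna()
    q1, q3 = clean[target].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = clean[target].between(lower, upper)
    return clean[mask].reset_index(drop=True)


# Hypothetical samples: one missing SOC value and one extreme SOC value.
samples = pd.DataFrame({
    "SOC": [0.8, 0.9, 1.0, 1.1, 1.2, 5.0, np.nan],
    "BSI": [0.10, 0.12, 0.08, 0.05, 0.02, 0.01, 0.03],
})

clean = drop_missing_and_outliers(samples, target="SOC")
print(f"{len(samples)} samples before, {len(clean)} after preprocessing")
```

Filtering separately per target is one way the retained sample counts could differ between SOC and total nitrogen, although the abstract does not confirm that this is the reason.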

The modelling results indicate that SOC estimation achieved moderate predictive performance with a coefficient of determination (R²) of 0.63, root mean square error (RMSE) of 0.198, and mean absolute error (MAE) of 0.166. Feature importance analysis showed that the Bare Soil Index (BSI) and Built-up Area Extraction Index (BAEI) were the most influential predictors, followed by the Normalized Difference Built-up Index (NDBI) and Urban Index (UI). For total nitrogen prediction, the model showed lower predictive performance (R² = 0.39, RMSE = 0.0107, MAE = 0.0088), with key predictors including soil organic matter, MERIS Terrestrial Chlorophyll Index (MTCI), Modified Normalized Difference Water Index (MNDWI), Green Normalized Difference Vegetation Index (GNDVI), and Soil Index (SI).
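For readers less familiar with the workflow behind these figures, the following is a minimal sketch of fitting a Random Forest regressor with scikit-learn and computing R², RMSE, MAE, and a feature importance ranking. It runs on synthetic data, so it does not reproduce the reported values. The hold-out split, the hyperparameters, and the use of impurity-based importances are assumptions, since the abstract does not describe the validation scheme or model settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic predictors labelled with index names for illustration only.
feature_names = ["BSI", "BAEI", "NDBI", "NDVI"]
X = rng.uniform(-1.0, 1.0, size=(200, len(feature_names)))

# Synthetic target that depends mainly on the first column ("BSI").
y = 1.0 - 0.8 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0.0, 0.05, size=200)

# Assumed validation scheme: a single 75/25 hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
mae = mean_absolute_error(y_test, y_pred)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.3f}, MAE = {mae:.3f}")

# Impurity-based importances sum to 1; rank predictors from high to low.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda item: item[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

With sample sizes as small as those in the study, a cross-validated estimate (for example `sklearn.model_selection.cross_val_predict`) may give more stable metrics than a single split.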

The results demonstrate the potential of integrating Sentinel-2 multispectral data and machine learning techniques for soil property estimation. Future work should improve field sampling strategies and incorporate additional environmental variables to enhance model accuracy and support soil monitoring in rubber plantation systems.


Level of technical complexity: 2 - intermediate

Give indication of resources (video, web pages, papers, etc.) to read in advance, that will help get up to speed on advanced topics:

The following resources may help participants become familiar with the topics discussed in this presentation:
• Copernicus Sentinel-2 User Guide (ESA)
• Google Earth Engine documentation and tutorials
• scikit-learn documentation for machine learning methods, particularly Random Forest regression
• Introductory materials on vegetation indices and multispectral remote sensing for soil and vegetation monitoring

Indicate what is (are) the open source project(s) essential in your talk:

Python (scikit-learn, pandas, NumPy, rasterio), Google Earth Engine, matplotlib, and seaborn.

I make my conference contribution available under the CC BY 4.0 license. The conference contribution comprises the abstract, the text contribution for the conference proceedings, the presentation materials as well as the video recording and live transmission of the presentation:

Pramet Kaewmesri is a researcher in remote sensing, geospatial analysis, and environmental modelling. His work focuses on integrating satellite data, machine learning, and spatial analysis to monitor soil properties, ecosystems, and environmental change. He has experience working with multispectral satellite imagery and data-driven approaches for agricultural and environmental applications.