The ripeness of Camellia oleifera fruits is closely related to their oil yield and to the quality of the tea oil. At present, Camellia oleifera is harvested mainly by hand in a single pass. However, uneven ripeness among fruits harvested in the same batch can significantly reduce overall fruit quality. Furthermore, manual harvesting is inefficient and costly, and cannot fully meet the needs of the large-scale Camellia oleifera industry. Intelligent harvesting of Camellia oleifera fruits is therefore necessary. Deep learning can be used to detect fruit maturity and thereby determine the best maturity for harvesting. This study aims to identify the ideal ripeness of Camellia oleifera fruits in the natural environment and to estimate the harvesting period, so that intelligent harvesting can improve oil yield and fruit quality. A dataset was constructed for ripeness detection. Photographs of Camellia oleifera fruits at different ripening stages were captured in natural environments with a smartphone. The phenotypic characteristics of the fruits were determined according to industry standards, and ripeness was categorized into three stages: immature, mature, and over-mature. Data augmentation, including brightness adjustment, salt-and-pepper noise, and simulated occlusion, was applied to the dataset, which was then divided into training, validation, and test sets at a ratio of 7:1:2. An improved YOLOv7 model was constructed to deal with occlusion in the natural environment. A cross-attention module was added to the YOLOv7 feature extraction network. For each pixel of a Camellia oleifera image, the module gathers information along the vertical and horizontal directions through two attention-weighting passes.
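The dataset preparation described above (brightness adjustment, salt-and-pepper noise, simulated occlusion, and a 7:1:2 split) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the nested-list grayscale image format, the noise amount, the rectangle used for occlusion, and the integer rounding of the split are all assumptions made for illustration, and a real pipeline would more likely operate on color images with an image library.

```python
import random

# Illustrative only: images are assumed to be 2D lists of 8-bit gray values.

def adjust_brightness(img, factor):
    """Scale every pixel by `factor`, clamping to the 8-bit range [0, 255]."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def add_salt_and_pepper(img, amount, rng):
    """Set roughly a fraction `amount` of pixels to black (0) or white (255)."""
    out = [row[:] for row in img]
    for row in out:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = rng.choice((0, 255))
    return out

def occlude(img, top, left, height, width, fill=0):
    """Paint a filled rectangle over the image to mimic a leaf or branch."""
    out = [row[:] for row in img]
    for y in range(max(0, top), min(len(out), top + height)):
        for x in range(max(0, left), min(len(out[y]), left + width)):
            out[y][x] = fill
    return out

def split_dataset(items, ratios=(7, 1, 2), seed=0):
    """Shuffle, then split into train/validation/test by integer ratios.

    Train and validation sizes are rounded down (an assumption); the test
    set receives the remainder.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

Because the rounding rule used in the study is not stated, the subset sizes produced by this sketch can differ by one image from the counts reported in the abstract.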
The attention module thus identified the key features for determining the ripeness of Camellia oleifera fruits, effectively avoiding interference from complex backgrounds such as branches and leaves. Additionally, traditional non-maximum suppression (NMS) was replaced with distance- and intersection-based NMS. This variant incorporates the normalized distance between the center points of two candidate boxes into the intersection-ratio calculation, which reduces missed detections caused by mutual occlusion and allows overlapping fruits to be detected. The improved YOLOv7 model was trained on 3098 images in the training set, evaluated on 442 images in the validation set, and finally tested on 885 images in the test set. On the test set, the model achieved a precision of 93.52%, a recall of 90.25%, an F1 score of 91.86%, an average precision of 94.60%, an average detection time of 0.77 s, and a model weight size of 82.6 MB. Ablation experiments demonstrated that the improved model detected the ripeness of Camellia oleifera fruit effectively. Compared with the original YOLOv7 model, the two modifications improved the mean average precision by 1.10 and 1.81 percentage points, respectively, and the overall mean average precision improved by 2.91 percentage points, while the detection time and model size increased by only 0.015 s and 11.3 MB, respectively. Compared with the Faster R-CNN, EfficientDet, YOLOv3, and YOLOv5l models, the improved YOLOv7 model increased the average precision by 7.51, 5.89, 4.21, and 4.21 percentage points and reduced the detection time by 1.06, 1.12, 0.10, and 0.03 s, respectively. The improved model therefore discriminated the maturity grade of Camellia oleifera fruits more accurately than these models. In summary, the improved YOLOv7 model achieved higher accuracy at the cost of only a slight increase in detection time and model size.
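The distance- and intersection-based suppression step can be sketched as follows, assuming it corresponds to the commonly used DIoU-NMS, in which the overlap score is the intersection over union (IoU) minus the squared center distance normalized by the squared diagonal of the smallest enclosing box. The box format (x1, y1, x2, y2), the function names, and the threshold value are assumptions for illustration and are not taken from the study.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def diou(a, b):
    """IoU minus the normalized squared distance between box centers."""
    # Squared distance between the two box centers.
    d2 = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 \
       + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
       + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou(a, b) - (d2 / c2 if c2 > 0 else 0.0)

def diou_nms(boxes, scores, threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A candidate is suppressed only if its DIoU with an already kept box
    reaches `threshold`, so overlapping boxes whose centers are far apart
    (likely two different fruits) are more likely to survive than under
    plain IoU-based NMS.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep
```

With an illustrative threshold of 0.52, two 10 x 10 boxes offset horizontally by 3 units have an IoU of about 0.54 but a DIoU of about 0.51, so plain NMS would discard the lower-scoring box while this variant keeps both.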
These findings can provide a theoretical basis for estimating the optimal harvesting period of Camellia oleifera fruits and for intelligent picking under natural conditions. [ABSTRACT FROM AUTHOR]