The small size and occlusion of densely growing cherry tomatoes pose a great challenge to the rapid identification and localization of the fruits. A robust key technology is urgently needed to improve efficiency and yield prediction for cherry tomatoes in the facility agriculture environment. In this study, a novel recognition method was proposed to locate dense cherry tomatoes using an improved YOLOv4-LITE lightweight neural network. MobileNet-v3, which is easy to port to mobile terminals, was selected as the feature extraction network to construct a YOLOv4-LITE model with a higher detection speed for cherry tomatoes. To prevent the replacement of the backbone network from reducing detection accuracy, a modified feature pyramid network (FPN) + Path Aggregation Network (PANet) structure was adopted. Specifically, a 104×104 feature map was introduced to achieve fine-grained detection of small targets. More importantly, depthwise separable convolution was used in the PANet structure to reduce the computational cost of the model. The new network was more lightweight, and the generalization ability of the model was improved by loading pre-trained weights and freezing some layers during training. The F1 and AP of the models were compared with those of YOLOv4 on test sets with the same degree of occlusion or adhesion, in order to evaluate the differences between the models. The test results show that adding the improved FPN structure to YOLOv4 raised AP50 above that of the original YOLOv4 and raised AP75 by 15 percentage points, while F1 increased by 0.14 and 0.24 at the corresponding IoU thresholds. However, the weight increased by 4 MB, the detection speed increased by 5.84%, and the number of network parameters increased by 14.85%.
With the improved FPN structure added to YOLOv4+MobileNet-v3, AP50 increased by 0.14%, AP75 increased by 23.25 percentage points, and F1 increased by 0.14 and 0.49 at the corresponding IoU thresholds, indicating that YOLOv4 and YOLOv4+MobileNet-v3 were weak at detecting small targets. Adding the feature map for small targets improved the fine-grained detection of the model, but the number of model parameters and the weight size increased accordingly. The PANet structure was therefore modified to use depthwise separable convolution, while maintaining high F1, AP, recall, and precision. The best performance was achieved with the model weight compressed to 45.3 MB, a detection speed of 3.01 ms per image, and 12,026,685 network parameters. Compared with the original YOLOv4, the weight was reduced by 198.7 MB, the detection speed increased by 34.85%, and the number of parameters was reduced by 81.34%. Compared with YOLOv4+MobileNet-v3, the weight was reduced by 55.9%, detection was slower by 0.23 s, and the number of parameters was reduced by 69.97%. The data indicated that the improved PANet strategy achieved similar accuracy under these conditions, while effectively reducing memory consumption and the number of model parameters and accelerating model recognition. The F1, AP50, and recall of the proposed recognition model for dense cherry tomatoes on all test sets were 0.99, 99.74%, and 99.15%, respectively. These values were higher than those of YOLOv4 by 0.15, 8.29 percentage points, and 6.54 percentage points, respectively, and the weight size was 45.3 MB, about 1/5 that of YOLOv4. Additionally, the detection of a single 416×416 image reached a speed of 3.01 ms per image on the GPU. Therefore, the recognition model for dense cherry tomatoes offered faster recognition, higher accuracy, and a lighter weight than before. The findings can provide strong support for efficient yield prediction of cherry tomatoes in the facility agriculture environment. [ABSTRACT FROM AUTHOR]
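
The parameter savings attributed above to depthwise separable convolution in the PANet structure can be illustrated with a simple weight count. The sketch below is not the authors' implementation: the function names are hypothetical, bias terms are ignored, and the layer shape (3×3 kernel, 256 input channels, 512 output channels) is an assumption chosen only for illustration, not a layer taken from the paper.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weight count of a depthwise kernel x kernel convolution followed by
    a 1 x 1 pointwise convolution (bias ignored)."""
    depthwise = kernel * kernel * c_in  # one spatial filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def param_ratio(kernel: int, c_in: int, c_out: int) -> float:
    """Separable-to-standard weight ratio; algebraically 1/c_out + 1/kernel**2."""
    return depthwise_separable_params(kernel, c_in, c_out) / standard_conv_params(
        kernel, c_in, c_out
    )


if __name__ == "__main__":
    # Hypothetical layer shape, used only to illustrate the scale of the saving.
    k, c_in, c_out = 3, 256, 512
    print("standard conv weights: ", standard_conv_params(k, c_in, c_out))
    print("separable conv weights:", depthwise_separable_params(k, c_in, c_out))
    print("ratio: %.4f" % param_ratio(k, c_in, c_out))
```

For a 3×3 kernel the ratio approaches 1/9 as the number of output channels grows, which is consistent in direction with the reductions in parameters and weight size reported above, although the overall figures also depend on the backbone and the rest of the network.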