With the advancement of remote sensing technology, high spatial resolution remote sensing images contain rich spatial information in great detail. At the same time, the complexity of these images places higher demands on remote sensing image classification techniques. High spatial resolution images show more pronounced geometric structures and richer texture characteristics. Designing a rational feature system and selecting appropriate classification algorithms to determine the number and distribution of rural buildings accurately and quickly is therefore of great significance for balancing urban and rural development, saving land, and achieving sustainable development. This work explores the application of deep learning models to building extraction from high spatial resolution remote sensing images and contributes to improving the classification accuracy of such images. In this paper, the semantic segmentation model SegNet was used to extract buildings. SegNet consists mainly of an encoder network, a decoder network, and a pixel-wise classification layer. The encoder network transforms high-dimensional vectors into low-dimensional vectors, extracting low-dimensional representations of high-dimensional features. The decoder network maps low-resolution feature maps back to high spatial resolution feature maps, reconstructing high-dimensional vectors from low-dimensional ones. The softmax classifier classifies each pixel independently and outputs the probability that each pixel belongs to each class. One 3000 × 3000 pixel slice and two 2000 × 2000 pixel slices were taken from the full remote sensing image of Bazhou City, Hebei Province, as training samples, and a further 3000 × 3000 pixel slice was taken as the validation sample.
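The pixel-wise classification step described above can be illustrated with a minimal sketch. It assumes the decoder has already produced a score map of shape H × W × C (C class scores per pixel). The function names and the toy two-class score map (background vs. building) are hypothetical and are not taken from the paper; the sketch only shows how a softmax turns each pixel's scores into class probabilities and a label.

```python
import math

def softmax(scores):
    """Convert one pixel's class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_pixels(score_map):
    """Apply softmax independently to every pixel of an H x W x C score map.

    Returns (probs, labels): per-pixel class probabilities and, for each pixel,
    the index of the most probable class (ties go to the lower index).
    """
    probs = [[softmax(pixel) for pixel in row] for row in score_map]
    labels = [[max(range(len(p)), key=p.__getitem__) for p in row] for row in probs]
    return probs, labels

# Hypothetical 2 x 2 score map with two classes: index 0 = background, 1 = building.
scores = [
    [[2.0, 0.5], [0.1, 1.9]],
    [[0.0, 0.0], [-1.0, 3.0]],
]
probs, labels = classify_pixels(scores)
print(labels)  # [[0, 1], [0, 1]]
```

Because each pixel is classified on its own, the spatial context comes entirely from the encoder and decoder features that produce the scores, not from the softmax layer itself.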
Five comparative experiments were conducted to extract buildings, using PSPNet, support vector machine (SVM), random forest, ISO clustering, and maximum likelihood classification. The confusion matrix of each method was obtained by comparing its classification result with the ground truth. Moving from traditional classification algorithms to shallow learning algorithms to deep learning algorithms, the Kappa coefficient and overall accuracy increased steadily. Among all methods, the SegNet semantic segmentation algorithm, based on a deep convolutional network, outperformed the other five algorithms in extracting buildings from high spatial resolution remote sensing images. SegNet achieved a Kappa coefficient of 0.90 and an overall accuracy of 96.61%, and its classification result was largely consistent with the ground truth. Its building F1 score was 0.91, whereas those of the other five algorithms were all below 0.87. SegNet also had the lowest building error rate, 9.71%, indicating that this semantic segmentation algorithm identifies buildings in high spatial resolution remote sensing images better than traditional classification algorithms, machine-learning-based shallow learning algorithms, and the PSPNet semantic segmentation algorithm based on a deep convolutional network. The Kappa coefficients and overall accuracies of the other five algorithms were below 0.83 and 94.68%, respectively, and their classification results differed considerably from the ground truth. SegNet makes full use of not only spectral information but also abundant spatial information.
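The accuracy measures reported above (overall accuracy, Kappa coefficient, F1 score, error rate) are all derived from a confusion matrix. The following minimal sketch computes them using the standard Cohen's kappa formula. The abstract does not state which building "error rate" it reports, so both commission error (1 − precision) and omission error (1 − recall) are computed here as an assumption. The function name and the confusion matrix values are hypothetical and are not the paper's data.

```python
def accuracy_metrics(cm, positive):
    """Compute accuracy metrics from a square confusion matrix.

    cm[i][j] = number of pixels whose ground-truth class is i and whose
    predicted class is j. `positive` is the index of the class of interest
    (e.g. buildings).
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    overall_accuracy = sum(cm[i][i] for i in range(k)) / n

    # Chance agreement estimated from ground-truth (row) and prediction (column) totals.
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (overall_accuracy - p_e) / (1 - p_e)

    tp = cm[positive][positive]
    precision = tp / col_tot[positive]  # user's accuracy
    recall = tp / row_tot[positive]     # producer's accuracy
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "overall_accuracy": overall_accuracy,
        "kappa": kappa,
        "f1": f1,
        "commission_error": 1 - precision,  # assumed meaning of "error rate"
        "omission_error": 1 - recall,       # alternative meaning
    }

# Hypothetical two-class matrix: row/column 0 = background, 1 = building.
m = accuracy_metrics([[850, 30], [20, 100]], positive=1)
print({name: round(value, 4) for name, value in m.items()})
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside overall accuracy when one class (here, background) dominates the image.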
During SegNet training, more essential features are learned, ultimately forming features better suited to pattern classification; this enhances the convergence and generalization ability of the model and improves classification accuracy. Traditional classification algorithms, such as ISO clustering and maximum likelihood classification, fail to exploit the rich spatial information in high-resolution remote sensing images, so their accuracy is relatively low. Because of limited computing units and the large volume of high spatial resolution remote sensing image data, machine-learning-based shallow learning algorithms such as SVM and random forest cannot effectively represent the complex features of ground objects, so they offer no clear advantage in building extraction from high spatial resolution remote sensing images. The experimental results show that the deep-learning-based SegNet performs best, and exploring the application of deep learning models to remote sensing image classification is of important theoretical significance. The results also provide a reference for improving the classification accuracy of high resolution remote sensing images. [ABSTRACT FROM AUTHOR]