Precise identification of enterprise activity codes is crucial for quickly and effectively establishing or renewing databases of public and private companies, which in turn supports informed decisions about a country's economic tendencies. The research involves combining multi-source datasets, data cleaning, exploratory data analysis, retrieval of embeddings, feature selection, identification of the optimal number of clusters, data clustering, and post-clustering analysis. The insights gathered support informed decisions about taxes, required state aid, and competition analysis. In both the Republic of Lithuania and the European Union, enterprises are classified under the Nomenclature of Economic Activities (NACE), which employs a six-digit framework. For instance, code 461900 indicates that the business acts as an agent involved in the sale of a variety of goods. The first two digits represent the overarching enterprise classification, in this case wholesale trade, while the final four digits delineate specific categorisations within the country's industries. This study aims to apply clustering methods to help identify the economic activities of enterprises from the descriptions found in the "Company Description" section of the rekvizitai.lt website. The dataset consists of 28350 business descriptions. Two main observations were made about the data: (1) the average description length is 14 words, excluding stop words; (2) the most common activities in the Lithuanian economy are wholesale, retail, agriculture, and the service industry. In this study, 3 embedding methods (BERT, LaBSE and Word2Vec), 4 feature selection methods (PCA, UMAP, SVD, and autoencoders) and 8 clustering methods (K-means, GMM, agglomerative, mean shift, OPTICS, BIRCH, HDBSCAN, DEC) were used in the experiments, with 195 models trained in total.
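The six-digit code layout described above can be illustrated with a minimal sketch. The function name `split_activity_code` is hypothetical and is not taken from the study; the sketch only assumes that a code is a string of exactly six digits, with the first two giving the overarching classification and the last four the national detail.

```python
def split_activity_code(code: str) -> tuple[str, str]:
    """Split a six-digit activity code into (first two digits, last four digits).

    Hypothetical helper for illustration only; it checks the shape of the
    code and does not look anything up in the NACE nomenclature.
    """
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a six-digit code, got {code!r}")
    # First two digits: overarching classification.
    # Last four digits: country-specific categorisation.
    return code[:2], code[2:]


# Example with the code quoted in the abstract.
division, detail = split_activity_code("461900")
print(division, detail)
```

Keeping the code as a string rather than an integer preserves any leading zeros, which matters for codes whose first digit is 0.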
Three main metrics, the silhouette score, the Davies-Bouldin score, and the Calinski-Harabasz index, are evaluated across all clustering algorithms, with the adjusted Rand index and mutual information additionally evaluated for hard-clustering methods. The initial experiments showed that LaBSE and Word2Vec are the most promising methods for embedding retrieval, while PCA and UMAP are the most suitable for dimensionality reduction. The elbow method was employed in additional experiments to determine the ideal number of clusters. Although these experiments demonstrated that the data may be grouped into fewer clusters, the outcomes did not indicate a statistically significant improvement, and adhering to the original NACE space facilitates a more accurate assessment of the current economic landscape. Clustering results from the K-means, agglomerative, and mean shift methods showed good intra-cluster and slightly above-average inter-cluster results. This research demonstrates that enterprise activity sectors can be categorised using Lithuanian descriptions and the K-means, agglomerative, or mean shift clustering algorithms. Future research will focus on hyperparameter optimisation of all three algorithms to improve inter-cluster and intra-cluster results. [ABSTRACT FROM AUTHOR]
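To make the first of the metrics named above concrete, the following is a minimal sketch of the silhouette score using only the Python standard library. It assumes Euclidean distance and a hard cluster assignment; the toy points and labels are invented for illustration and are not data from the study, and the function is not the authors' implementation. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to any other cluster, and the point's score is `(b - a) / max(a, b)`.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data (illustrative only): two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # matches the geometry
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # cuts across the geometry
print(good, bad)
```

Scores close to 1 indicate compact, well-separated clusters, while negative scores indicate points that sit closer to another cluster than to their own, which is why the labelling that follows the geometry scores higher than the one that cuts across it.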