As machine learning, natural language processing, and big data become increasingly important across industries, including finance, it becomes crucial to evaluate how they are being utilised and to understand the motivations behind algorithmic recommendations, so that sound decisions can be made on the basis of this information. In finance in particular, concise and easy-to-comprehend tools for understanding machine learning models are extremely beneficial. Such tools could serve multiple purposes, including allowing managers to better explain machine learning-based trading strategies to investors and to build investor confidence in them. Decision-makers themselves might also act on the information these tools provide, which could assist in, for example, making more ethically informed decisions. This type of research therefore has implications not only for machine learning, Natural Language Processing (NLP), and finance, but also for fields such as AI explainability, model interpretability, and Responsible Research and Innovation.

The central problem this thesis addresses is whether machine learning methods can be used to mine a small number of representative sentences from financial text that capture the majority of the sentiment in a full document. These mined sentences are referred to as 'justifications', and the process is called 'justification mining'. The full documents used here are taken from 10-K filings.

Before examining justification mining, however, transfer learning methods suitable for training data annotated at a different level from the testing data (sentence- versus document-level) are assessed. This addresses a problem that often occurs in NLP research: annotated training data perfectly suited to the task at hand is rarely readily available, so methods must be devised to make use of what is available, in this case sentence-annotated training data employed to train classifiers for prediction on document-level testing data.

These transfer learning methods are then employed in the next step, which focuses on justification mining. The justification mining process is developed by first using transformer models to encode sentences as embeddings and then applying clustering algorithms and cosine similarity to extract, or mine, justifications. The process is evaluated by comparing sentiment from mined justifications with sentiment from full documents, to assess how well justification mining captures or summarises sentiment. It is also evaluated by correlating predicted sentiment from mined justifications and from full documents with future stock returns, to gauge whether justification mining offers any benefit in identifying signals in the data for financial purposes.

Little previous work evaluates the results of financial sentiment analyses in this way. In the NLP domain, the best way of extracting aggregate justifications for sentiment is still an open question. Moreover, few research papers attempt to apply transfer learning from lower-level data to entire documents, and 10-K filings are not widely studied in this context, as they are difficult to parse. The methods created in this thesis might therefore offer a novel means of providing information about the motivations behind sentiment analyses. In the transfer learning process, feature engineering and preprocessing steps were modified to obtain accuracies of up to 0.903.
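To make the justification mining step concrete, the following is a minimal sketch of one way such a pipeline could be implemented, assuming sentence embeddings from a pretrained transformer (here via the sentence-transformers library), KMeans clustering, and selection of the sentence closest in cosine similarity to each cluster centroid; the model name, cluster count, and selection rule are illustrative assumptions, not the exact configuration used in the thesis.

```python
# Illustrative sketch of a justification mining pipeline (assumed configuration,
# not the exact models or parameters used in the thesis).
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def mine_justifications(sentences, n_clusters=5, model_name="all-MiniLM-L6-v2"):
    """Return one representative sentence ('justification') per cluster."""
    # Encode each sentence of the document as a dense embedding.
    model = SentenceTransformer(model_name)
    embeddings = model.encode(sentences)

    # Group sentences into clusters of semantically similar content.
    n_clusters = min(n_clusters, len(sentences))
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    kmeans.fit(embeddings)

    justifications = []
    for centroid in kmeans.cluster_centers_:
        # Pick the sentence whose embedding is most cosine-similar to the centroid.
        sims = cosine_similarity(centroid.reshape(1, -1), embeddings)[0]
        justifications.append(sentences[int(np.argmax(sims))])
    return justifications

# Example usage on a toy "document" split into sentences.
doc_sentences = [
    "Revenue increased substantially compared to the prior year.",
    "The company faces ongoing litigation risk in several jurisdictions.",
    "Operating margins improved due to cost reductions.",
    "Management expects continued growth in the next fiscal year.",
]
print(mine_justifications(doc_sentences, n_clusters=2))
```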
For justification mining, considering statistically significant (p ≤ 0.05) correlations with |r| > 0.2, sentiment from mined justifications correlated with future stock returns more often than full-document sentiment did. Although the correlations (r) were somewhat weak overall, they could potentially be combined with traditional alpha signals to enhance those signals (see Section 11 for further discussion). Moreover, high degrees of similarity were found between aggregated sentiment from mined justifications and from full documents, with similarity scores of up to 0.9999, supporting the efficacy of justification mining in capturing full-document sentiment. In these evaluations, transformer models performed better than traditional approaches such as Bag of Words at numericising text for input into machine learning models. In fact, every statistically significant sentiment-to-stock-return correlation bar one, as well as every justification-to-full-document similarity and correlation score above 0.7, used transformer models for numericisation.

These results imply that justification mining might be successful in eliminating sentiment noise in financial data, as well as in capturing the majority of the sentiment of a full document, and that transformer models might provide an advantage over traditional approaches such as Bag of Words for numericisation of text. Mined justifications themselves provide an easily interpretable and presentable means of explaining the output of sentiment analysis, with numerous uses, including identifying the driving factors behind sentiment in a financial document, which could be helpful for making more ethically informed decisions or for building investor trust in the methodology of sentiment analysis algorithms.
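To illustrate how these two evaluations could be computed, the following is a minimal sketch assuming cosine similarity between aggregated sentiment scores and Pearson correlation against future returns; the toy data, aggregation scheme, and variable names are hypothetical and not taken from the thesis.

```python
# Illustrative sketch of the two evaluation steps described above (hypothetical
# data and variable names; not the thesis implementation).
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics.pairwise import cosine_similarity

# Assume one value per filing: predicted sentiment aggregated over the mined
# justifications, over the full document, and the subsequent stock return.
justification_sentiment = np.array([0.62, -0.10, 0.35, 0.80, -0.45])
full_document_sentiment = np.array([0.60, -0.05, 0.30, 0.75, -0.50])
future_returns = np.array([0.04, -0.01, 0.02, 0.05, -0.03])

# 1) How closely does justification sentiment track full-document sentiment?
similarity = cosine_similarity(
    justification_sentiment.reshape(1, -1),
    full_document_sentiment.reshape(1, -1),
)[0, 0]

# 2) Does predicted sentiment correlate with future stock returns (p <= 0.05)?
r_just, p_just = pearsonr(justification_sentiment, future_returns)
r_full, p_full = pearsonr(full_document_sentiment, future_returns)

print(f"justification vs full-document similarity: {similarity:.4f}")
print(f"justification sentiment vs returns: r={r_just:.2f}, p={p_just:.3f}")
print(f"full-document sentiment vs returns: r={r_full:.2f}, p={p_full:.3f}")
```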