Lightning talk presentation for the 17th International Digital Curation Conference (IDCC22) on the topic of Reusability. These slides present the methods and some preliminary results from Springer Nature and OpenAIRE's collaboration to improve data linking, specifically a workstream enabling better measurement of data reuse from existing publications.

Improving our measurement of data reuse: OpenAIRE and Springer Nature's collaboration to automate data link identification and classification

Abstract: Publishers, funders, governments and research institutions increasingly encourage researchers to make their data reusable. Making the case for FAIR data presents a challenge in how to track and measure data reuse. Springer Nature and OpenAIRE are collaborating on a text and data mining approach to improve our knowledge of data reuse. This lightning talk presents our research aims, methods and some preliminary results.

Since the publication of the FAIR principles in 2016, organisational efforts to make data FAIR have mainly focused on requirements and motivations for the dataset creator; data policies widely adopted by publishers and funders emphasise what authors should do with their data, supported by editorial guidance and checks. Technical solutions have built on the increased usage of repositories, metadata standards and the growing role of data curators. While there are clearly quantified problems that FAIR data can solve (e.g. the €10.2bn cost to the European economy of non-FAIR data), researchers may ask "will the time and effort spent making my data FAIR actually lead to its reuse?". Effective implementation of FAIR should lead to data being reused, not just theoretically reusable. This necessitates investigation of actual data reuse patterns, which presents certain challenges. The information required to link dataset creation and reuse, or creator and reuser, is often incomplete or absent.
Even where technical frameworks such as Scholix have enabled progress, the required culture change for data accreditation is insufficient to provide a comprehensive picture. A number of recent initiatives have sought to address the data reuse knowledge gap using machine learning techniques on published literature. The Coleridge Initiative’s ‘Show US the Data’ program and the NIH LitCoin Natural Language Processing (NLP) challenge are two competitions that use NLP to identify public datasets in research publications.

Springer Nature and OpenAIRE are partnering to address this issue by improving the discoverability of links between Springer Nature’s publications and underlying research data. This collaboration employs OpenAIRE’s text and data mining algorithms to detect and classify data-article links. It has the advantage of being able to analyse the existing corpus and to interrogate data reuse based on authorship (reuse by the same or different authors), based on discipline (reuse within or between disciplines), over time, and across repositories.

OpenAIRE identifies datasets from the literature using data identifiers and surrounding context from the research publication. Advanced artificial intelligence and NLP techniques process the corpus, aiming to capture all instances of underlying data in the manuscript. Authors’ data citation behaviour is then disambiguated in terms of reuse and attribution to creators based on the surrounding context. The process constructs a scientific knowledge graph integrating identified datasets, linked to FAIR metadata via the manuscript. Finally, we consolidate and quantify our analysis across different topics, disciplines and organisations over time. We intend that this work will provide a methodological basis for data link detection and classification, with insights into data reuse patterns across the published literature.
Determining granular and comprehensive patterns of data reuse offers the potential for more targeted interventions to promote reusability by research organisations.
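
As a purely illustrative aid, the identification and classification steps described above (find a data identifier, keep its surrounding context, then label reuse by the same or different authors) might be sketched as follows. This is not OpenAIRE's algorithm: the simplified DOI pattern, the `DataLink` type, the `find_data_links` function, the `window` parameter and the author-overlap rule are all hypothetical names and assumptions introduced here for the example.

```python
import re
from dataclasses import dataclass

# Assumption: a simplified DOI pattern stands in for "data identifiers".
# A real system would also handle accession numbers and other schemes.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


@dataclass
class DataLink:
    identifier: str   # the dataset identifier found in the text
    context: str      # text surrounding the identifier
    reuse_type: str   # "same-author", "different-author" or "unknown"


def find_data_links(text, article_authors, dataset_authors, window=60):
    """Hypothetical helper: detect DOI-like identifiers and label reuse.

    dataset_authors maps an identifier to its creators; in practice this
    would come from repository metadata, here it is supplied by the caller.
    """
    links = []
    for match in DOI_PATTERN.finditer(text):
        # Drop trailing punctuation that belongs to the sentence, not the DOI.
        doi = match.group().rstrip(".,;)")
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        context = text[start:end]

        creators = dataset_authors.get(doi)
        if creators is None:
            reuse_type = "unknown"
        elif set(creators) & set(article_authors):
            # Assumption: any shared author name counts as same-author reuse.
            reuse_type = "same-author"
        else:
            reuse_type = "different-author"
        links.append(DataLink(doi, context, reuse_type))
    return links


if __name__ == "__main__":
    sample = ("We reanalysed the survey data "
              "(https://doi.org/10.5061/dryad.abc123). Methods follow.")
    metadata = {"10.5061/dryad.abc123": ["A. Smith"]}
    for link in find_data_links(sample, ["B. Jones"], metadata):
        print(link.identifier, link.reuse_type)
```

Exact name matching is used only to keep the sketch short; the disambiguation described in the abstract relies on the surrounding context, which a rule this simple does not capture.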