This paper describes a linkage between the New York State Cancer Registry (NYSCR) and the New York State (NYS) Medicaid program, both housed within the New York State Department of Health (NYSDOH). Medicaid, the state-based health program for individuals and families with low incomes and resources in the United States, insures approximately one-sixth of New York adults aged 18–64 and one-fifth of elderly persons over 65, most of whom are dually enrolled in Medicare and Medicaid. Deficiencies in cancer care are known to disproportionately affect the poor as well as racial and ethnic minorities, who are overrepresented in Medicaid (Agency for Healthcare Research and Quality 2005; Institute of Medicine 2005; Landon et al. 2007).
The Surveillance, Epidemiology and End Results (SEER)–Medicare linkage is the prototype for our effort (Warren et al. 2002; National Cancer Institute, Division of Cancer Control and Population Sciences, Health Services and Economics Branch 2009). This linkage has enabled the development of a resource containing information on over 3 million elderly persons with cancer, which has yielded a large number of influential publications describing the patterns and outcomes of cancer care in the United States (Warren et al. 2002; Hershman et al. 2007; Wong et al. 2007; Gooden et al. 2008; Morris et al. 2008; White et al. 2008). The SEER program collects demographic and diagnostic information for persons diagnosed with cancer, with coverage that historically represented about 14 percent of the U.S. population and was more recently expanded to 26 percent. SEER data provide detailed characterizations of cancer diagnosis, tumor stage, and initial treatment but, other than vital status, do not track care longitudinally. Linkage with Medicare claims for enrollees in fee-for-service plans makes it possible to characterize covered health services over time.
A comparable match to Medicaid records has seldom been attempted owing to technical and procedural obstacles (Bradley, Given, and Roberts 2001; Bradley et al. 2007). These obstacles include the fact that each state's Medicaid program is distinct, that Medicaid enrollment can be discontinuous, and that managed care penetration might be high and thus limit the utility of the claims for ascertaining care. The few attempts to construct registry–Medicaid linkages at the state level include those in Michigan (Bradley, Given, and Roberts 2001, 2002, 2003), California (Perkins et al. 2001; Chan et al. 2006), Ohio (Koroukian et al. 2006a, 2006b), Washington (Ramsey et al. 2008), and Louisiana (Whitaker et al. 2009).
The NYSCR–Medicaid linkage has two primary objectives. The first, discussed in this paper, is to create a deidentified analytic dataset for use in ongoing research projects with extramural partners. We will use this dataset, for example, to compare community-dwelling cancer patients by Medicaid status, stage distribution, and receipt of quality care in order to determine whether racial and ethnic disparities exist within the Medicaid program. Findings will be related to timing and duration of enrollment, as those who enrolled in Medicaid in response to a cancer diagnosis might have quite different diagnostic or demographic characteristics than long-term enrollees.
The second objective is to assess the degree to which Medicaid claims data add value to the cancer treatment information already collected by the NYSCR, with an emphasis on the treatment of breast and colorectal cancer. The collection of detailed treatment information is a relatively new undertaking for most U.S. cancer registries that are not part of the SEER program; treatment data for New York are considered complete only for cases diagnosed beginning in 2003. Even for these “complete” data, the amount of missing information is typically more than double that seen in SEER.
For example, 3.2 percent of colorectal cancer cases diagnosed in NYS between 2004 and 2006 have unknown surgery information, compared with 1.3 percent in SEER. In addition, claims records contain potentially useful information that cancer registries do not collect at all, such as screening utilization, prescription medications, and nursing home services. Claims data also provide a means of independently verifying that the information collected by central registries is correct.
In this paper, we detail the data linkage process and the resulting analytic dataset. We report the percentage of cancer cases matching to the Medicaid enrollment files by cancer site, stage, age, marital status, race/ethnicity, and geography, and show how the matched cases differ in important ways from the cancer population generally. We then discuss ways in which the linkage enhanced the quality of registry data, some of which were unanticipated. Finally, we discuss file size issues, which were the single largest technical hurdle encountered.