[2015 PNAS] How to Borrow Information from Unlinked Data?



Article

“Assortative Mating without Assortative Preference.” Yu Xie, Siwei Cheng & Xiang Zhou. Proceedings of the National Academy of Sciences, 2015.

Abstract

One of the most important developments in the current era of social sciences is the growing availability and diversity of data, big and small. Social scientists increasingly combine information from multiple data sets in their research. While conducting statistical analyses with linked data is relatively straightforward, borrowing information across unlinked data can be much more challenging because unit-to-unit linkages are absent. This article proposes a new methodological approach for borrowing information across unlinked surveys to predict unobserved distributions. The core idea is to use the relative density between the observed and unobserved distributions in the reference data to characterize the difference between the two distributions, and then to transfer that information to the base data. Relying on the assumption that the relative density between the observed and unobserved distributions is similar across data sets, the proposed relative density approach has the key advantage of allowing the researcher to borrow information about the shape of the distribution, rather than only a few summary statistics. The approach also comes with a method for incorporating and quantifying the uncertainty in its output. We illustrate the formulation of this approach, demonstrate it with simulation examples, and finally apply it to the problem of employment selection in wage inequality research.
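
To make the relative density idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a simple histogram estimator of the relative density, and the function names `relative_density` and `borrow_weights`, the bin count, and the simulated normal data are all hypothetical choices made for illustration. The reference data contain both the observed and the unobserved group, the base data contain only the observed group, and the relative density estimated in the reference data is used to reweight the base data.

```python
import numpy as np


def relative_density(observed, unobserved, n_bins=10):
    """Histogram estimate of the relative density on [0, 1].

    Each unobserved-group value is converted to its rank (empirical CDF value)
    within the observed-group distribution; the density of those ranks is the
    relative density. The histogram estimator is an illustrative assumption.
    """
    observed = np.sort(np.asarray(observed, dtype=float))
    unobserved = np.asarray(unobserved, dtype=float)
    ranks = np.searchsorted(observed, unobserved, side="right") / observed.size
    density, edges = np.histogram(ranks, bins=n_bins, range=(0.0, 1.0), density=True)
    return density, edges


def borrow_weights(base_observed, density, edges):
    """Weights that reshape the base observed sample into a predicted
    unobserved distribution, assuming the relative density carries over."""
    base_observed = np.asarray(base_observed, dtype=float)
    # Mid-point ranks of each base value within the base observed sample.
    order = base_observed.argsort().argsort()
    ranks = (order + 0.5) / base_observed.size
    # Look up the relative density in the bin that contains each rank.
    bins = np.clip(np.digitize(ranks, edges) - 1, 0, density.size - 1)
    weights = density[bins]
    return weights / weights.sum()


# Simulated illustration (all numbers are made up for the sketch).
rng = np.random.default_rng(0)
ref_observed = rng.normal(0.0, 1.0, size=5000)     # reference data, observed group
ref_unobserved = rng.normal(-0.5, 1.0, size=5000)  # reference data, unobserved group
base_observed = rng.normal(1.0, 1.0, size=5000)    # base data, observed group only

density, edges = relative_density(ref_observed, ref_unobserved)
weights = borrow_weights(base_observed, density, edges)
predicted_mean = np.sum(weights * base_observed)
print("base observed mean:", base_observed.mean())
print("predicted unobserved mean in base data:", predicted_mean)
```

Because the weights apply to the whole sample, any feature of the predicted distribution (quantiles, variance, a weighted histogram) can be computed from them, which reflects the point about borrowing the shape of the distribution rather than a few summary statistics. The sketch does not address the uncertainty quantification mentioned in the abstract.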