[2020 Sociological Methods & Research] How to Borrow Information from Unlinked Data?



Article

“How to Borrow Information from Unlinked Data? A Relative Density Approach for Predicting Unobserved Distributions.” Siwei Cheng. Sociological Methods & Research, 2020.

Abstract

One of the most important developments in the current era of social sciences is the growing availability and diversity of data, big and small. Social scientists increasingly combine information from multiple data sets in their research. While conducting statistical analyses with linked data is relatively straightforward, borrowing information across unlinked data can be much more challenging due to the absence of unit-to-unit linkages. This article proposes a new methodological approach for borrowing information across unlinked surveys to predict unobserved distributions. The gist of the proposed approach lies in the idea of using the relative density between the observed and unobserved distributions in the reference data to characterize the difference between the two distributions and borrow that information to the base data. Relying on the assumption that the relative density between the observed and unobserved distributions is similar between data sets, the proposed relative density approach has the key advantage of allowing the researcher to borrow information about the shape of the distribution, rather than a few summary statistics. The approach also comes with a method for incorporating and quantifying the uncertainty in its output. We illustrate the formulation of this approach, demonstrate with simulation examples, and finally apply it to address the problem of employment selection in wage inequality research.
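
As a rough illustration of the idea in the abstract, the following is a minimal sketch and not the article's actual estimator. It assumes a simple rank-based version of the relative density: each value from the reference data's unobserved group is assigned its rank within the reference data's observed distribution, and those ranks are then mapped through the quantile function of the base data's observed distribution. The function names (`relative_ranks`, `predict_unobserved`), the simulated normal samples, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def relative_ranks(observed, unobserved):
    """Rank of each 'unobserved' value in the empirical CDF of 'observed'."""
    observed = np.sort(np.asarray(observed, dtype=float))
    unobserved = np.asarray(unobserved, dtype=float)
    # Empirical CDF: share of observed values less than or equal to y.
    return np.searchsorted(observed, unobserved, side="right") / observed.size


def predict_unobserved(base_observed, ref_observed, ref_unobserved):
    """Carry the reference relative ranks over to the base data.

    Assumption: the distribution of relative ranks (whose density is the
    relative density) is the same in the reference and the base data.
    """
    ranks = relative_ranks(ref_observed, ref_unobserved)
    # Base observed quantile function evaluated at the reference ranks.
    return np.quantile(np.asarray(base_observed, dtype=float), ranks)


# Hypothetical simulated data (e.g. log wages), for illustration only.
rng = np.random.default_rng(0)
ref_observed = rng.normal(3.0, 0.5, 5000)    # observed group, reference data
ref_unobserved = rng.normal(2.7, 0.6, 5000)  # unobserved group, reference data
base_observed = rng.normal(3.2, 0.5, 5000)   # observed group, base data

ranks = relative_ranks(ref_observed, ref_unobserved)
predicted = predict_unobserved(base_observed, ref_observed, ref_unobserved)
```

Here `predicted` is a sample from the predicted unobserved distribution in the base data, so its whole shape (quantiles, spread, tails) can be inspected rather than a single summary statistic. One way to gauge uncertainty under this sketch would be to bootstrap all three input samples and repeat the prediction, although the article's own procedure for quantifying uncertainty may differ.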