PanelWhiz: Efficient Data Extraction of Complex Panel Data Sets - an Example Using the German SOEP
- Author
John P. Haisken-DeNew and Markus Hahn
- Subjects
Computer science; analysis; Interface (computing); jel:C23; Data file; analysis procedure; panel; Graphical user interface; indicator; practical information; General Medicine; development of methods; Data extraction; data; method; ddc:300; Data mining; jel:C81; Population; jel:C87; Set (abstract data type); education; Social sciences, sociology, anthropology; research project; research; software; SOEP; Panel data, storage, retrieval; PanelWhiz; Stata; Data set; Methods and Techniques of Data Collection and Data Analysis, Statistical Methods, Computer Methods; applied research; data processing program; time series; business; computer; Panel data
- Abstract
Applied social scientists have long faced different data interfaces for different data sets. In most cases, no interface is available at all, forcing researchers to address data files by name and extract the required information by hand. Moreover, the structure of panel data can be very complex and can vary dramatically across data sets, as described in Haisken-DeNew (2001). Some panel data sets provide many files per year (“wide format”), differing by population, level of aggregation, and so on, which creates many obstacles for researchers. Combining variables across time (“long format”) is typically much more difficult, yet it is ultimately the format required for estimation. PanelWhiz is a collection of subroutines that gives researchers an intuitive “common” graphical interface for accessing many panel data sets directly within the statistical package Stata/SE 10 or later (http://www.stata.com). Rather than selecting individual variables, the researcher selects vectors of variables (items) with one mouse click. This makes selecting information for a data set retrieval efficient, especially when the panel data set contains many waves (years) of information. With one mouse click, the data are retrieved, and merging and matching are done automatically. With the PanelWhiz system, the user can open data files by clicking on a browse page.
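The abstract contrasts per-wave (“wide”) files with the long format needed for estimation, and describes selecting items, i.e. vectors of per-wave variable names, instead of single variables. The Python sketch below illustrates that idea only. It is not PanelWhiz's implementation (PanelWhiz runs inside Stata), and the file layout, the variable names such as `r_hours` and `s_income`, and the `ITEMS` mapping are hypothetical, not actual SOEP variables.

```python
"""Minimal sketch: build a long-format panel from per-wave files via "items".

Assumption: all file contents, variable names and item definitions below are
hypothetical illustrations, not real SOEP variables or PanelWhiz internals.
"""
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: wave (year) -> file name -> person ID -> {variable: value}.
Files = Dict[int, Dict[str, Dict[int, Dict[str, float]]]]
Row = Dict[str, Optional[float]]

FILES: Files = {
    2001: {
        "person": {101: {"r_hours": 40.0}, 102: {"r_hours": 20.0}},
        "income": {101: {"r_income": 1800.0}, 102: {"r_income": 2500.0}},
    },
    2002: {
        "person": {101: {"s_hours": 38.0}, 103: {"s_hours": 45.0}},
        "income": {101: {"s_income": 1900.0}},
    },
}

# An "item" is one concept expressed as a vector of wave-specific variable names.
ITEMS: Dict[str, Dict[int, str]] = {
    "hours": {2001: "r_hours", 2002: "s_hours"},
    "income": {2001: "r_income", 2002: "s_income"},
}


def extract_long(files: Files, items: Dict[str, Dict[int, str]]) -> List[Row]:
    """Return one row per (person, year), merging all items by person ID.

    A person found in some file of a wave gets a row for that year; items
    not found for that person in that wave are set to None.
    """
    rows: Dict[Tuple[int, int], Row] = {}
    for name, per_wave in items.items():
        for year, varname in per_wave.items():
            # Search every file of this wave for the wave-specific variable.
            for wave_file in files.get(year, {}).values():
                for pid, record in wave_file.items():
                    if varname in record:
                        row = rows.setdefault((pid, year), {"pid": pid, "year": year})
                        row[name] = record[varname]
    result: List[Row] = []
    # Sort by person, then year: the usual long-format layout for estimation.
    for key in sorted(rows):
        row = rows[key]
        for name in items:
            row.setdefault(name, None)
        result.append(row)
    return result


if __name__ == "__main__":
    for row in extract_long(FILES, ITEMS):
        print(row)
```

In this toy example, person 103 appears in the 2002 person file but not in the 2002 income file, so that row gets `income = None`. A person absent from every file of a wave (such as person 102 in 2002) simply has no row for that year, so the result is an unbalanced long panel.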
- Published
2010