Data and code

Data 

You are free to use and re-analyze all the data below. If you do, please cite the paper in which the data were originally analyzed. Feel free to contact me with any questions. Similarly, the R code below may be re-used, adapted, improved upon or ignored. I occasionally add new data to my figshare account.

1. Conscientiousness and the brain

This covariance matrix was used to analyze the relationship between conscientiousness and the brain in the paper ‘Kievit, R.A., Romeijn, J.W., Waldorp, L.J., Wicherts, J.M., Scholte, H.S. & Borsboom, D. (2011). Mind the gap: A psychometric approach to the reduction problem. Psychological Inquiry, 22, 67-87.’ pdf

2. IQ and brain size

This covariance matrix was used to analyze the relationship between general intelligence and brain size in the paper ‘Kievit, R.A., Romeijn, J.W., Waldorp, L.J., Wicherts, J.M., Scholte, H.S. & Borsboom, D. (2011). Mind the gap: A psychometric approach to the reduction problem. Psychological Inquiry, 22, 67-87.’ pdf

3. Intelligence and the brain

This covariance matrix was used to analyze the relationship between general intelligence and brain size in the paper ‘Kievit, R.A., van Rooijen, H., Wicherts, J.M., Kan, K.J., Waldorp, L.J., Scholte, H.S. & Borsboom, D. (2012). Intelligence and the brain: A model-based approach. Cognitive Neuroscience, 3, 89-9’ Kievitetal2012

4. Feeling the future

This dataset was collected in the context of a fully confirmatory, pre-registered study on precognition for the paper ‘Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R.A. (2012). An Agenda for Purely Confirmatory Research. Perspectives on Psychological Science, 7, 632-638.’

You can load the data as a .txt or as an .Rdata file. The paper is here, the online appendix with the results is here and our preregistration protocol is here.

multigroup_sim

 

Code

Detection and analysis of Simpson’s Paradox using the package ‘Simpsons’ in R.

Technical details

The package is written for R, a free software environment for statistical analysis, available here. For more documentation on R, see http://cran.r-project.org/. This package was created to accompany the paper
Kievit, R. A., Frankenhuis, W. E., Waldorp, L. J. & Borsboom, D. (2013). Simpson’s Paradox in Psychological Science: A Practical Guide. Frontiers in Psychology, 4, 513.  pdf

The package can be downloaded as source or binary here:

source: Simpsons_0.1.0.tar
Windows Binary: Simpsons_0.1.0
CRAN: http://cran.r-project.org/web/packages/Simpsons/index.html

Description
This package detects instances of Simpson’s Paradox in datasets of bivariate continuous data. That is, it tests whether a bivariate relationship found at the level of the whole dataset is consistent (in direction and strength) across possible subpopulations in the data. The subpopulations are either user-defined or identified by means of cluster analysis, in which case the package first examines whether there is evidence for more than one cluster in the data. Then, it plots the data, using a different color for every cluster, plots the regression lines for each cluster, and estimates the regression of X on Y for each cluster. Finally, it tests whether the regression at the level of the whole dataset differs from the regression at the level of the subclusters, using a permutation test to correct for dependencies.
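The core comparison described above can be sketched in base R without the package. This is a minimal illustration under stated assumptions, not the package's implementation: the data are simulated, the names toy, pooled_slope, within_slopes and sign_reversal are made up for this example, group membership is taken as known, and y is regressed on x.

```r
# Hypothetical toy data: two known groups whose within-group slopes are
# negative, while the group means lie on a positive trend.
set.seed(1)
n <- 100
group <- factor(rep(c("a", "b"), each = n))
x_center <- ifelse(group == "a", 0, 4)  # group means of x
y_center <- ifelse(group == "a", 0, 6)  # group means of y
x <- rnorm(2 * n, mean = x_center)
y <- y_center - 0.5 * (x - x_center) + rnorm(2 * n)
toy <- data.frame(x, y, group)

# Regression slope at the level of the whole dataset
pooled_slope <- coef(lm(y ~ x, data = toy))[["x"]]

# Regression slope within each group
within_slopes <- sapply(split(toy, toy$group), function(d) {
  coef(lm(y ~ x, data = d))[["x"]]
})

# A sign reversal means the within-group direction is opposite to the pooled one
sign_reversal <- sign(within_slopes) != sign(pooled_slope)

print(pooled_slope)
print(within_slopes)
print(sign_reversal)
```

Comparing signs only flags a reversal in direction. Whether a subgroup differs significantly from the whole dataset needs a test such as the permutation test mentioned above.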

Examples
The package contains three examples that cover three types of analysis. Below we describe the syntax and output of example one. Other examples can be found in the manual on the CRAN website.

Example 1. Looking for Simpson’s Paradox using manifest group membership.
Here, we want to estimate the relationship between ‘Coffee’ and ‘Neuroticism’, both continuous variables, taking into account possible gender differences. As we have measured gender, we supply this information using the ‘clusterid’ argument. This means that the function runs the analysis both for the dataset as a whole and within the two subgroups. It then checks whether the subgroups deviate significantly from the regression at the level of the whole group. Code to simulate data can be found here. A scatterplot of the full dataset shows no significant relationship between coffee and neuroticism.

The function Simpsons can then be run on the dataset, for instance as follows:

example1=Simpsons(Coffee,Neuroticism,clusterid=gender,data=data,nreps=10)

The results are reported back and the clusters, if any, are plotted.

Output:
2 clusters detected

Permuting cluster 1

|=====================================================================| 100%

Permuting cluster 2

|====================================================================| 100%

Warning: Beta regression estimate in Cluster 1 is significantly different compared to the group!

Sign reversal: Simpson’s Paradox! Cluster  2  is significantly different and in the opposite direction compared to the group!

We can then plot the results, showing the two clusters and their respective regression estimates:

example1

Finally, we can extract the regression coefficients for the individual clusters:

coef(example1)

              N         Int        Beta          Pval
Alldata     200    89.46165   0.0559708  4.351747e-01
Cluster 1   100    8.717222   0.8693445  2.725523e-29
Cluster 2   100   176.55301  -0.8192838  8.826872e-26
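A table of the same shape as the one above can be built by hand in base R, which may help in reading the columns (N, intercept, slope and p-value of the slope). The sketch below is only an illustration: the data are simulated with made-up parameters, so the numbers will not match the table, and the helper fit_row and the direction of the regression (Neuroticism on Coffee) are assumptions rather than the package's code.

```r
# Hypothetical data: two groups of 100 with opposite within-group slopes
set.seed(3)
n <- 100
gender <- rep(c(1, 2), each = n)
Coffee <- rnorm(2 * n, mean = 100, sd = 15)
Neuroticism <- ifelse(gender == 1,
                      10 + 0.8 * Coffee,    # positive slope in group 1
                      170 - 0.8 * Coffee) + # negative slope in group 2
  rnorm(2 * n, sd = 8)
toy <- data.frame(Coffee, Neuroticism, gender)

# Hypothetical helper: one row of N, intercept, slope and p-value
fit_row <- function(d) {
  est <- summary(lm(Neuroticism ~ Coffee, data = d))$coefficients
  c(N    = nrow(d),
    Int  = est["(Intercept)", "Estimate"],
    Beta = est["Coffee", "Estimate"],
    Pval = est["Coffee", "Pr(>|t|)"])
}

# One row for the whole dataset, then one row per group
result <- rbind(Alldata = fit_row(toy),
                do.call(rbind, lapply(split(toy, toy$gender), fit_row)))
rownames(result)[-1] <- paste("Cluster", 1:2)
print(result)
```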

Extract clusters:
cluster(example1,1)

Extracts all datapoints belonging to the first cluster.
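The ‘Permuting cluster’ lines in the output above refer to the permutation test. One plausible way such a test could work is sketched below in base R. This is an assumption for illustration and not necessarily the scheme the package uses: the slope of a cluster is compared with the slopes of random subsets of the same size drawn from the whole dataset. The data and the names slope_of, null_slopes and p_value are hypothetical.

```r
# Hypothetical data: the first 100 points form the cluster of interest
set.seed(2)
n <- 100
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 4))
y <- c(-0.5 * x[1:n], 6 - 0.5 * (x[(n + 1):(2 * n)] - 4)) + rnorm(2 * n)
in_cluster <- rep(c(TRUE, FALSE), each = n)

# Slope of y on x for the observations indexed by idx
slope_of <- function(idx) coef(lm(y[idx] ~ x[idx]))[[2]]

# Observed slope within the cluster
observed <- slope_of(which(in_cluster))

# Null distribution: slopes of random subsets of the same size,
# drawn without replacement from the whole dataset
nreps <- 500
null_slopes <- replicate(nreps, slope_of(sample(length(x), sum(in_cluster))))

# Two-sided p-value: how often a random subset deviates from the centre of
# the null distribution at least as much as the cluster does
null_center <- mean(null_slopes)
p_value <- (1 + sum(abs(null_slopes - null_center) >=
                      abs(observed - null_center))) / (nreps + 1)
print(p_value)
```

Because the cluster is itself part of the whole dataset, the two regressions are not independent. Drawing the reference subsets from the same data is one way to respect that dependency.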

Future developments: The package is currently being extended and further developed. Future versions will include automated frequency table analysis for detecting instances of Simpson’s Paradox. For questions or suggestions, please e-mail rogierkievit@gmail.com.

Using fuzzy set theory to study Major Depressive Disorder

This R code can be used to simulate a number of different assignment methods for quantifying the level of depression from a latent variable, a formative latent variable or a fuzzy set theory perspective. More information is available in our book chapter.