Pearson Matching
Correlation Conversion
Let's say we really wanted to estimate the Spearman correlation between the temperature and ozone.
julia> spearman_corr = cor(Matrix(df), Spearman)
2×2 Matrix{Float64}: 1.0 0.774043 0.774043 1.0
julia> cor_bounds(margins[1], margins[2], Spearman)
(lower = -1.0, upper = 1.0)
If we just use the Spearman correlation when we simulate data, then the errors are double.
- The NORTA algorithm is expecting a Pearson correlation
- The non-linear transformation in the NORTA step does not guarantee that the input correlation is the same as the output.
Here is what we get when we use the Spearman correlation directly with no transformation:
julia> x2 = rvec(1_000_000, spearman_corr, margins);
julia> cor(x2, Spearman)
2×2 Matrix{Float64}: 1.0 0.759132 0.759132 1.0
Let's try to address problem 1 and convert the Spearman correlation to a Pearson correlation.
julia> adjusted_spearman_corr = cor_convert(spearman_corr, Spearman, Pearson);
julia> x3 = rvec(1_000_000, adjusted_spearman_corr, margins);
julia> cor(x3, Spearman)
2×2 Matrix{Float64}: 1.0 0.774339 0.774339 1.0
Notice that the estimated Spearman correlation is essentially the same as the target Spearman correlation. This is because the transformation in the NORTA step is monotonic, which means that rank-based correlations are preserved. As a consequence, we can match the Spearman correlation exactly (up to stochastic error) with an explicit transformation.
Pearson Matching
We can employ a Pearson matching algorithm that determines the necessary input correlation in order to achieve the target Pearson correlation. Let's now address problem 2.
julia> pearson_corr = cor(Matrix(df), Pearson)
2×2 Matrix{Float64}: 1.0 0.69836 0.69836 1.0
If we use the measured correlation directly, then the estimated correlation from the simulated data is far off:
julia> x4 = rvec(1_000_000, pearson_corr, margins);
julia> cor(x4, Pearson)
2×2 Matrix{Float64}: 1.0 0.572569 0.572569 1.0
The estimated correlation is much too low. Let's do some Pearson matching and observe the results.
julia> adjusted_pearson_corr = pearson_match(pearson_corr, margins)
2×2 Matrix{Float64}: 1.0 0.850495 0.850495 1.0
julia> x5 = rvec(1_000_000, adjusted_pearson_corr, margins);
julia> cor(x5)
2×2 Matrix{Float64}: 1.0 0.697836 0.697836 1.0
Now the simulated data results in a correlation structure that exactly matches the target Pearson correlation!