Dria

Swan
NFT, Hardhat
21,000 USDC
Submission Details
Severity: low
Invalid

Biased variance estimator results in narrower accepted score intervals

Summary

When working with small sample sizes, using an unbiased estimator of variance is important to accurately reflect the population variance. Bessel’s correction is applied by adjusting the denominator to $n - 1$, rather than $n$, to counteract the tendency of small samples to underestimate the true population variance. This adjustment removes 1 degree of freedom because the sample mean is calculated from the sample itself:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

Vulnerability Details

Underestimating the variance results in a smaller standard deviation, which, in turn, leads to narrower confidence intervals. This bias reduces the range of scores considered in calculating final validation and generation scores, potentially missing the statistically optimal range of responses.
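The size of the narrowing can be worked out directly. The short derivation below is a sketch that assumes $n$ independent scores $x_1, \dots, x_n$ with sample mean $\bar{x}$ and common population variance $\sigma^2$; the symbols $\hat{\sigma}_b$ (biased estimate) and $s$ (unbiased estimate) are introduced here for illustration only.

$$
\hat{\sigma}_b^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,
\qquad
\mathbb{E}\big[\hat{\sigma}_b^2\big] = \frac{n-1}{n}\,\sigma^2
$$

$$
\frac{\hat{\sigma}_b}{s} = \sqrt{\frac{n-1}{n}}
\quad\Longrightarrow\quad
n = 10:\ \frac{\hat{\sigma}_b}{s} = \sqrt{0.9} \approx 0.949
$$

With 10 validation scores, any band of the form mean ± k·SD computed from the biased estimator is therefore about 5.1% narrower than the same band computed from the unbiased estimator, regardless of the score values.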

Impact

The impact of underestimating variance becomes more pronounced when:

  • The sample size of scores is small (as with 10 validation scores).

  • Scores vary significantly across samples.
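As a concrete illustration of the small-sample case, the MATLAB sketch below applies both estimators to a single set of 10 scores. The score values are hypothetical, chosen only so that the largest score falls between the two ±2 SD cut-offs; they are not taken from the contest data, and the ±2 SD filter mirrors the one used in the PoC rather than the contract code itself.

```matlab
% Hypothetical validation scores (illustrative values, not contest data)
scores = [3 4 4 5 5 5 6 6 7 8.5] * 1e17;
n = numel(scores);
m = mean(scores);

sd_biased   = std(scores, 1);  % normalizes by n
sd_unbiased = std(scores, 0);  % normalizes by n - 1 (Bessel's correction)

% The ratio depends only on n: sqrt((n - 1) / n)
ratio = sd_biased / sd_unbiased;
fprintf('n = %d, biased/unbiased SD ratio = %.4f\n', n, ratio);

% Scores kept by each +/- 2 SD filter around the sample mean
kept_biased   = scores(abs(scores - m) <= 2 * sd_biased);
kept_unbiased = scores(abs(scores - m) <= 2 * sd_unbiased);

fprintf('kept with biased SD:   %d scores, mean %.2fe17\n', ...
    numel(kept_biased), mean(kept_biased) / 1e17);
fprintf('kept with unbiased SD: %d scores, mean %.2fe17\n', ...
    numel(kept_unbiased), mean(kept_unbiased) / 1e17);
```

For this sample the biased filter drops the highest score (9 of 10 kept, filtered mean 5.00e17) while the unbiased filter keeps it (10 of 10 kept, filtered mean 5.35e17), so the choice of estimator alone changes the resulting response score.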

To quantify this, a Monte Carlo simulation with 100,000 trials was conducted using MATLAB. In each trial, random validation scores were generated for different fictitious responses, sampling from a normal distribution with a fixed mean and standard deviation. The range of validation scores selected to calculate response scores was computed using both the biased and unbiased estimators, as well as the known population parameters.

The simulation aimed to determine how often the unbiased estimator’s filtered range provided a closer approximation to the optimal score derived using population boundaries (our theoretical standard) than the biased estimator did.

According to the information provided by the sponsor team, scores lie in the range 0 to 1e18. To keep the simulation close to real scenarios, the scores are generated from a normal distribution with mean 5e17 and standard deviation 1e17.
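One way to sanity-check these parameters is to ask how often an unclipped normal draw would leave the valid score range. The sketch below is an illustrative check, not part of the original PoC; the variable names `score_min`, `score_max`, and `p_outside` are introduced here, and the tail probability is computed with MATLAB's built-in `erfc`.

```matlab
% Population parameters used in the simulation
mu = 5e17;
stdev = 1e17;

% Valid score range reported by the sponsor team
score_min = 0;
score_max = 1e18;

% Distance of each bound from the mean, in standard deviations (-5 and +5)
z_min = (score_min - mu) / stdev;
z_max = (score_max - mu) / stdev;

% Standard normal tails: P(Z < z) = 0.5 * erfc(-z / sqrt(2)),
%                        P(Z > z) = 0.5 * erfc( z / sqrt(2))
p_outside = 0.5 * erfc(-z_min / sqrt(2)) + 0.5 * erfc(z_max / sqrt(2));
fprintf('P(score outside [0, 1e18]) = %.3e\n', p_outside);

% Expected number of out-of-range draws in one run of the PoC
% (100000 trials x 10 generations x 10 validations = 1e7 scores)
expected_out_of_range = 100000 * 10 * 10 * p_outside;
fprintf('Expected out-of-range scores per run: %.1f\n', expected_out_of_range);
```

Both bounds sit five standard deviations from the mean, so the probability is on the order of 6e-7 per score, or roughly six of the 10^7 scores drawn per run. Under that assumption, leaving the draws unclipped has a negligible effect on the comparison.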

See PoC
% Monte Carlo simulation: biased vs. unbiased variance estimator
% Simulation parameters
num_runs = 10;          % Number of independent Monte Carlo runs
num_trials = 100000;    % Number of Monte Carlo trials per run
num_generations = 10;   % Number of generator responses in each trial
num_validations = 10;   % Number of validation scores per generation
mu = 5e17;              % Population mean
stdev = 1e17;           % Population standard deviation
diffs = zeros(1, num_runs); % Per-run mismatch count difference (biased minus unbiased)
% Outer loop over independent Monte Carlo runs
for i = 1:num_runs
    count_biased_diffs = 0;
    count_unbiased_diffs = 0;
    for trial = 1:num_trials
        % Generate validation scores for each generator response
        scores = randn(num_generations, num_validations) * stdev + mu; % Random scores ~ N(5e17, (1e17)^2)
        % Store results for biased and unbiased estimators
        avg_scores_biased = zeros(num_generations, 1);
        avg_scores_unbiased = zeros(num_generations, 1);
        avg_scores_theoretical = zeros(num_generations, 1);
        % Evaluate each generator response
        for gen = 1:num_generations
            % Current generator's validation scores
            validation_scores = scores(gen, :);
            % Calculate the biased and unbiased variances
            mean_score = mean(validation_scores);
            biased_variance = var(validation_scores, 1);   % biased: uses n
            unbiased_variance = var(validation_scores, 0); % unbiased: uses n-1
            % Filter scores within ±2 SD
            filtered_scores_biased = validation_scores(abs(validation_scores - mean_score) <= 2 * sqrt(biased_variance));
            filtered_scores_unbiased = validation_scores(abs(validation_scores - mean_score) <= 2 * sqrt(unbiased_variance));
            theoretical_filtered_scores = validation_scores(abs(validation_scores - mu) <= 2 * stdev);
            % Compute averages for scores within ±2 SD
            avg_scores_biased(gen) = mean(filtered_scores_biased);
            avg_scores_unbiased(gen) = mean(filtered_scores_unbiased);
            avg_scores_theoretical(gen) = mean(theoretical_filtered_scores);
        end
        % Identify the optimal response for each estimator and for the population bounds
        [~, opt_gen_biased] = max(avg_scores_biased);
        [~, opt_gen_unbiased] = max(avg_scores_unbiased);
        [~, opt_gen_theoretical] = max(avg_scores_theoretical);
        % Count trials where an estimator picks a different response than the theoretical one
        if opt_gen_biased ~= opt_gen_theoretical
            count_biased_diffs = count_biased_diffs + 1;
        end
        if opt_gen_unbiased ~= opt_gen_theoretical
            count_unbiased_diffs = count_unbiased_diffs + 1;
        end
    end
    diffs(i) = count_biased_diffs - count_unbiased_diffs;
    fprintf('%d iterations completed\n', i);
end
% Display results (the mean is generally not an integer, so print it as a float)
fprintf('Unbiased estimator results in a more accurate response on average %.1f runs more than the biased\n', mean(diffs));

The simulation found that, on average, the unbiased estimator selects the same optimal response as the true population range in approximately 300 more trials out of 100,000 (about 0.3%) than the biased estimator.

Tools Used

Manual review.

Recommendations

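Use the unbiased variance estimator in Statistics::variance. Note that the corrected divisor is zero for a single-element array, so the function must only be called with at least two scores: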

  function variance(
      uint256[] memory data
  ) internal pure returns (uint256 ans, uint256 mean) {
      mean = avg(data);
      uint256 sum = 0;
      for (uint256 i = 0; i < data.length; i++) {
          uint256 diff = data[i] - mean;
          sum += diff * diff;
      }
-     ans = sum / data.length;
+     ans = sum / (data.length - 1);
  }
Updates

Lead Judging Commences

inallhonesty Lead Judge 12 months ago
Submission Judgement Published
Invalidated
Reason: Non-acceptable severity
