Dria

Swan
NFTHardhat
21,000 USDC
View results
Submission Details
Severity: high
Valid

Statistics.sol's `variance` doesn't handle cases in which individual scores will be less than mean

Summary

The variance function upon which stdDev is based incorrectly assumes that the mean of the scores will always be greater than or equal to the individual scores which is not always the case.

Vulnerability Details

The arithmetic mean is the sum of a dataset of values divided by the count of values in the dataset. It provides the "central" tendency of the elements in the dataset and always lies between the smallest and largest elements. As a result, cases can occur in which the mean is greater than certain elements in the set.

This property isn't taken into consideration instead when calculating variance for the stddev function.

We can see that the stddev function first attempt to calculate the variance of the data.

function stddev(uint256[] memory data) internal pure returns (uint256 ans, uint256 mean) {
@> (uint256 _variance, uint256 _mean) = variance(data);
mean = _mean;
ans = sqrt(_variance);
}

And when variance is calculated, just like the formula requires, the mean is deducted from each indivual member of the data array. The parameter is set as diff, uint256.

function variance(uint256[] memory data) internal pure returns (uint256 ans, uint256 mean) {
mean = avg(data);
uint256 sum = 0;
for (uint256 i = 0; i < data.length; i++) {
@> uint256 diff = data[i] - mean;
sum += diff * diff;
}
ans = sum / data.length;
}

However, as established the mean lies in between the elements of the array, and in a lot of cases, will actually be greater than the element. But since the variance function doesn't account this, the line uint256 diff = data[i] - mean; will mostly always revert due to undeflow error. As a result, the queries to stddev will be dossed.

This is important as to validate a taskId's request, the finalizeValidation function is reqiured to compute the standard deviation and mean of the scores and generation scores to be validated.

function finalizeValidation(uint256 taskId) private {
TaskRequest storage task = requests[taskId];
// compute score for each generation
for (uint256 g_i = 0; g_i < task.parameters.numGenerations; g_i++) {
// get the scores for this generation, i.e. the g_i-th element of each validation
uint256[] memory scores = new uint256[]();
for (uint256 v_i = 0; v_i < task.parameters.numValidations; v_i++) {
scores[v_i] = validations[taskId][v_i].scores[g_i];
}
// compute the mean and standard deviation
@> (uint256 _stddev, uint256 _mean) = Statistics.stddev(scores);
// compute the score for this generation as the "inner-mean"
// and send rewards to validators that are within the range
uint256 innerSum = 0;
uint256 innerCount = 0;
for (uint256 v_i = 0; v_i < task.parameters.numValidations; ++v_i) {
uint256 score = scores[v_i];
if ((score >= _mean - _stddev) && (score <= _mean + _stddev)) {
innerSum += score;
innerCount++;
// send validation fee to the validator
_increaseAllowance(validations[taskId][v_i].validator, task.validatorFee);
}
}
// set score for this generation as the average of inner scores
uint256 inner_score = innerCount == 0 ? 0 : innerSum / innerCount;
responses[taskId][g_i].score = inner_score;
}
// now, we have the scores for each generation
// compute stddev for these and pick the ones above a threshold
uint256[] memory generationScores = new uint256[]();
for (uint256 g_i = 0; g_i < task.parameters.numGenerations; g_i++) {
generationScores[g_i] = responses[taskId][g_i].score;
}
// compute the mean and standard deviation
@> (uint256 stddev, uint256 mean) = Statistics.stddev(generationScores);
for (uint256 g_i = 0; g_i < task.parameters.numGenerations; g_i++) {
// ignore lower outliers
if (generationScores[g_i] >= mean - generationDeviationFactor * stddev) {
_increaseAllowance(responses[taskId][g_i].responder, task.generatorFee);
}
}
}

Impact

The function would mostly always revert due to underflow error, dossing request validations which is a very important protocol process.

Tools Used

Manual Review

Recommendations

Modify the variance function to check first for which is greater between data[i] and mean, then deduct the lesser from bigger.
Something like this.

function variance(uint256[] memory data) internal pure returns (uint256 ans, uint256 mean) {
mean = avg(data);
uint256 sum = 0;
for (uint256 i = 0; i < data.length; i++) {
- uint256 diff = data[i] - mean;
+ uint256 diff;
+ if (data[i] => mean) {
+ diff = data[i] - mean;
+ }
+ else {
+ diff = mean - data[i];
+ }
+ sum += diff * diff;
+ }
ans = sum / data.length;
}
Updates

Lead Judging Commences

inallhonesty Lead Judge 8 months ago
Submission Judgement Published
Validated
Assigned finding tags:

Underflow in computing variance

Support

FAQs

Can't find an answer? Chat with us on Discord, Twitter or Linkedin.