Summary
The variance
function upon which stdDev
is based incorrectly assumes that the mean of the scores will always be greater than or equal to the individual scores which is not always the case.
Vulnerability Details
The arithmetic mean is the sum of a dataset of values divided by the count of values in the dataset. It provides the "central" tendency of the elements in the dataset and always lies between the smallest and largest elements. As a result, cases can occur in which the mean is greater than certain elements in the set.
This property isn't taken into consideration instead when calculating variance
for the stddev
function.
We can see that the stddev
function first attempt to calculate the variance
of the data.
function stddev(uint256[] memory data) internal pure returns (uint256 ans, uint256 mean) {
@> (uint256 _variance, uint256 _mean) = variance(data);
mean = _mean;
ans = sqrt(_variance);
}
And when variance
is calculated, just like the formula requires, the mean is deducted from each indivual member of the data array. The parameter is set as diff, uint256.
function variance(uint256[] memory data) internal pure returns (uint256 ans, uint256 mean) {
mean = avg(data);
uint256 sum = 0;
for (uint256 i = 0; i < data.length; i++) {
@> uint256 diff = data[i] - mean;
sum += diff * diff;
}
ans = sum / data.length;
}
However, as established the mean lies in between the elements of the array, and in a lot of cases, will actually be greater than the element. But since the variance
function doesn't account this, the line uint256 diff = data[i] - mean;
will mostly always revert due to undeflow error. As a result, the queries to stddev
will be dossed.
This is important as to validate
a taskId's request, the finalizeValidation
function is reqiured to compute the standard deviation and mean of the scores and generation scores to be validated.
function finalizeValidation(uint256 taskId) private {
TaskRequest storage task = requests[taskId];
for (uint256 g_i = 0; g_i < task.parameters.numGenerations; g_i++) {
uint256[] memory scores = new uint256[]();
for (uint256 v_i = 0; v_i < task.parameters.numValidations; v_i++) {
scores[v_i] = validations[taskId][v_i].scores[g_i];
}
@> (uint256 _stddev, uint256 _mean) = Statistics.stddev(scores);
uint256 innerSum = 0;
uint256 innerCount = 0;
for (uint256 v_i = 0; v_i < task.parameters.numValidations; ++v_i) {
uint256 score = scores[v_i];
if ((score >= _mean - _stddev) && (score <= _mean + _stddev)) {
innerSum += score;
innerCount++;
_increaseAllowance(validations[taskId][v_i].validator, task.validatorFee);
}
}
uint256 inner_score = innerCount == 0 ? 0 : innerSum / innerCount;
responses[taskId][g_i].score = inner_score;
}
uint256[] memory generationScores = new uint256[]();
for (uint256 g_i = 0; g_i < task.parameters.numGenerations; g_i++) {
generationScores[g_i] = responses[taskId][g_i].score;
}
@> (uint256 stddev, uint256 mean) = Statistics.stddev(generationScores);
for (uint256 g_i = 0; g_i < task.parameters.numGenerations; g_i++) {
if (generationScores[g_i] >= mean - generationDeviationFactor * stddev) {
_increaseAllowance(responses[taskId][g_i].responder, task.generatorFee);
}
}
}
Impact
The function would mostly always revert due to underflow error, dossing request validations which is a very important protocol process.
Tools Used
Manual Review
Recommendations
Modify the variance
function to check first for which is greater between data[i]
and mean
, then deduct the lesser from bigger.
Something like this.
function variance(uint256[] memory data) internal pure returns (uint256 ans, uint256 mean) {
mean = avg(data);
uint256 sum = 0;
for (uint256 i = 0; i < data.length; i++) {
- uint256 diff = data[i] - mean;
+ uint256 diff;
+ if (data[i] => mean) {
+ diff = data[i] - mean;
+ }
+ else {
+ diff = mean - data[i];
+ }
+ sum += diff * diff;
+ }
ans = sum / data.length;
}