## Summary
`LLMOracleCoordinator.finalizeValidation()` reverts while computing `_stddev`: the `Statistics.variance()` library function subtracts the mean from each score using unsigned arithmetic, so the subtraction underflows whenever a score in the set is less than the mean of that set. This is the case for a score set with a high spread (the scores are widely distributed and far from the mean).
## Vulnerability Details
- A buyer agent calls `BuyerAgent.oraclePurchaseRequest()` with the required input (the listed assets to be purchased), which calls `LLMOracleCoordinator.request()` to create a purchase request after paying the request fees.
- The oracle request is first processed by the generators, which add their responses via `LLMOracleCoordinator.respond()`. Once response generation is done, and if the task requires validation, the task moves to the validation phase, where it is validated via `LLMOracleCoordinator.validate()` and finalized after the required number of validations is collected.
- In the validation phase, each validator gives a score to every generator response, after which the validation is finalized via `LLMOracleCoordinator.finalizeValidation()`.
- In `LLMOracleCoordinator.finalizeValidation()`, the validators' scores for each generator response are processed, and a score is accepted only if it falls within a calculated range (mean ± standard deviation). The scores of all generators are then processed, and generators with scores within the calculated range are rewarded:
```javascript
function finalizeValidation(uint256 taskId) private {
    //...
    // compute score for each generation
    for (uint256 g_i = 0; g_i < task.parameters.numGenerations; g_i++) {
        // get the scores for this generation, i.e. the g_i-th element of each validation
        uint256[] memory scores = new uint256[](task.parameters.numValidations);
        for (uint256 v_i = 0; v_i < task.parameters.numValidations; v_i++) {
            scores[v_i] = validations[taskId][v_i].scores[g_i];
        }
        // compute the mean and standard deviation
        (uint256 _stddev, uint256 _mean) = Statistics.stddev(scores);
        uint256 innerSum = 0;
        uint256 innerCount = 0;
        for (uint256 v_i = 0; v_i < task.parameters.numValidations; ++v_i) {
            uint256 score = scores[v_i];
            // @L343
            if ((score >= _mean - _stddev) && (score <= _mean + _stddev)) {
                innerSum += score;
                innerCount++;
                // send validation fee to the validator
                _increaseAllowance(validations[taskId][v_i].validator, task.validatorFee);
            }
        }
        // set score for this generation as the average of inner scores
        uint256 inner_score = innerCount == 0 ? 0 : innerSum / innerCount;
        responses[taskId][g_i].score = inner_score;
    }
    //...
}
```
- The mean and standard deviation are calculated by the `Statistics.stddev()` library function:
```javascript
function stddev(uint256[] memory data) internal pure returns (uint256 ans, uint256 mean) {
    (uint256 _variance, uint256 _mean) = variance(data);
    mean = _mean;
    ans = sqrt(_variance);
}
```
```javascript
function variance(uint256[] memory data) internal pure returns (uint256 ans, uint256 mean) {
    mean = avg(data);
    uint256 sum = 0;
    for (uint256 i = 0; i < data.length; i++) {
        uint256 diff = data[i] - mean; // << @audit: underflows if data[i] is less than the mean
        sum += diff * diff;
    }
    ans = sum / data.length;
}
```
- However, `Statistics.variance()`, which computes the variance needed for the standard deviation, reverts due to underflow if any score (`data[i]`) is less than the `mean`, because `data[i] - mean` is evaluated as a `uint256`. This is possible when the spread of the scores is large (see the numerical example in the PoC section).
## Impact
This prevents the task validation from being finalized: the task remains stuck in the `PendingValidation` state and never becomes `Completed`. As a result, the buyer agent is stuck and cannot complete the purchase unless another call to `oraclePurchaseRequest()` is made, which causes a loss of funds for the BuyerAgent (the fees paid to the oracles and the platform to process a new request).
## Proof of Concept
This example uses a dataset with a high spread, where the **standard deviation** is greater than the **mean**:
**Dataset**
data = [2, 4, 6, 100]
**Mean Calculation**
mean = (2 + 4 + 6 + 100)/4 = 112/4 = 28
**Variance Calculation**
Variance is the average of the squared differences from the mean.
1. For data point 1: (2 - 28)^2 = (-26)^2 = 676
2. For data point 2: (4 - 28)^2 = (-24)^2 = 576
3. For data point 3: (6 - 28)^2 = (-22)^2 = 484
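4. For data point 4: (100 - 28)^2 = (72)^2 = 5184
variance = (676 + 576 + 484 + 5184)/4 = 6920/4 = 1730
**Standard Deviation Calculation**
standard deviation = sqrt(1730) ≈ 41.6
On-chain, the differences for the first three data points (2 - 28, 4 - 28, 6 - 28) are negative, so `uint256 diff = data[i] - mean` reverts on the first loop iteration and `finalizeValidation()` never completes.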
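The numerical example above could be reproduced with a test along the following lines. This is a minimal sketch, assuming a Foundry setup with `forge-std` installed; `VarianceHarness` and the test contract are hypothetical names, and the `avg()` body is an assumption (integer mean, rounded down), since the report does not quote the library's own `avg()`. The `variance()` body repeats the arithmetic quoted earlier, wrapped in an external function so that `vm.expectRevert` can observe the panic.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test, stdError} from "forge-std/Test.sol";

/// Hypothetical harness: re-implements the reported logic so the test is self-contained.
contract VarianceHarness {
    // Assumed implementation of avg(): integer mean, rounded down.
    function avg(uint256[] memory data) public pure returns (uint256 ans) {
        uint256 sum = 0;
        for (uint256 i = 0; i < data.length; i++) {
            sum += data[i];
        }
        ans = sum / data.length;
    }

    // Same arithmetic as the Statistics.variance() snippet quoted in this report.
    function variance(uint256[] memory data) external pure returns (uint256 ans, uint256 mean) {
        mean = avg(data);
        uint256 sum = 0;
        for (uint256 i = 0; i < data.length; i++) {
            uint256 diff = data[i] - mean; // checked subtraction: panics when data[i] < mean
            sum += diff * diff;
        }
        ans = sum / data.length;
    }
}

contract VarianceUnderflowTest is Test {
    VarianceHarness internal harness;

    function setUp() public {
        harness = new VarianceHarness();
    }

    function test_varianceRevertsOnHighSpread() public {
        // Dataset from the numerical example: mean = 28
        uint256[] memory data = new uint256[](4);
        data[0] = 2;
        data[1] = 4;
        data[2] = 6;
        data[3] = 100;

        // Panic(0x11) is the arithmetic underflow/overflow panic code
        vm.expectRevert(stdError.arithmeticError);
        harness.variance(data);
    }
}
```

The external wrapper is a deliberate choice: `vm.expectRevert` applies to the next external call, so an internal library call made directly from the test would not be caught.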
## Tools Used
Manual Review.
## Recommendations
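In `Statistics.variance()`: handle the case when the score is less than the mean, for example by computing the absolute difference between the score and the mean before squaring it.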
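One possible shape for this change is sketched below. It is an illustration under stated assumptions, not the project's actual patch: `StatisticsPatched` is a hypothetical name, and `avg()` is again assumed to be a rounded-down integer mean. Because the difference is squared, the sign is irrelevant, so subtracting the smaller value from the larger one yields the same variance without underflow.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hypothetical patched library, shown only to illustrate the recommendation.
library StatisticsPatched {
    // Assumed implementation of avg(): integer mean, rounded down.
    function avg(uint256[] memory data) internal pure returns (uint256 ans) {
        uint256 sum = 0;
        for (uint256 i = 0; i < data.length; i++) {
            sum += data[i];
        }
        ans = sum / data.length;
    }

    function variance(uint256[] memory data) internal pure returns (uint256 ans, uint256 mean) {
        mean = avg(data);
        uint256 sum = 0;
        for (uint256 i = 0; i < data.length; i++) {
            // absolute difference: never underflows, and (x - m)^2 == (m - x)^2
            uint256 diff = data[i] >= mean ? data[i] - mean : mean - data[i];
            sum += diff * diff;
        }
        ans = sum / data.length;
    }
}
```

For the dataset `[2, 4, 6, 100]` this returns a variance of 1730 and a mean of 28, matching the hand calculation in the PoC. Note that with these values the standard deviation (about 41) exceeds the mean, so the range check `_mean - _stddev` in `finalizeValidation()` would likely underflow as well; clamping the lower bound, for example `_mean > _stddev ? _mean - _stddev : 0`, may be needed alongside the library change.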