NFTBridge
60,000 USDC
Submission Details
Severity: low
Invalid

The bridge cannot handle bridged NFT collections from L1 to L2 that use multi-byte UTF-8 characters.

Summary

It is common for some NFTs on L1 to use multi-byte UTF-8 characters, such as emojis, in their token names, symbols, or URIs. However, if this scenario is not properly considered, the bridged token on L2 will fail to decode these characters correctly, resulting in incorrect information in the L2 collection's metadata.

Vulnerability Details

Strings in Solidity are dynamically sized byte arrays (bytes). They are UTF-8 encoded by default, so each character occupies 1 to 4 bytes depending on the character set used. Cairo, by contrast, does not natively support UTF-8 encoding: it handles strings either as packed ASCII characters (short strings) or as raw byte arrays (ByteArray).

If a Solidity string contains only ASCII characters (1 byte per character), transferring it to Cairo is relatively straightforward, because in UTF-8 single bytes with values in the range 0 to 127 map directly to the Unicode code points of the ASCII range. If the Solidity string contains non-ASCII characters (such as emojis), however, Cairo will not decode it properly by default: the Cairo contract needs additional logic to interpret these byte sequences as multi-byte characters.
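To make the byte counts concrete, here is a minimal off-chain sketch in Python (not part of the bridge code). The helper name utf8_profile is hypothetical; it only reports how many code points and UTF-8 bytes a string has, and whether every byte falls in the ASCII range.

```python
def utf8_profile(text):
    """Return (code points, UTF-8 byte count, is every byte ASCII?)."""
    raw = text.encode("utf-8")
    return len(text), len(raw), all(b < 0x80 for b in raw)


# "\U0001F984" is the unicorn emoji used in the PoC strings.
for sample in ("Hello", "Hello \U0001F984"):
    print(utf8_profile(sample), sample.encode("utf-8").hex())
```

The emoji alone accounts for four of the ten bytes in the second string, which is why its byte length differs from its character count.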

Reference: https://docs.starknet.io/architecture-and-concepts/smart-contracts/serialization-of-cairo-types/

PoC:

Consider two NFTs with different metadata (the token name, for instance):

  • one consists only of plain ASCII characters (1 byte per character): "Hello"

  • the other uses an emoji (multiple bytes per character): "Hello 🦄"

For the ASCII-only string "Hello":

Run the following code:

function cairoStringSerializedLength(
    string memory str
)
    internal
    pure
    returns (uint256)
{
    // @audit - (DONE) - learn more about this, is this always correct?
    bytes memory strBytes = bytes(str);
    // uint256 constant CAIRO_STR_LEN = 31;
    uint256 dataLen = strBytes.length / CAIRO_STR_LEN;
    uint256 packedLen = 1 + dataLen + 1 + 1;
    return packedLen;
}

function cairoStringSerialize(
    string memory str,
    uint256[] memory buf,
    uint256 offset
)
    internal
    pure
    returns (uint256)
{
    uint256[] memory packed = cairoStringPack(str);
    // @audit - (DONE) - if this too long, will go OOG, that is why there is limit on L1
    for (uint256 i = 0; i < packed.length; i++) {
        buf[offset + i] = packed[i];
    }
    return packed.length;
}

function cairoStringPack(
    string memory str
)
    internal
    pure
    returns (uint256[] memory)
{
    // CAIRO_STR_LEN = 31;
    bytes memory strBytes = bytes(str);
    uint256 dataLen = strBytes.length / CAIRO_STR_LEN;
    uint256 pendingLen = strBytes.length % CAIRO_STR_LEN;
    uint256 packedLen = 1 + dataLen + 1 + 1;
    uint256[] memory packedData = new uint256[](packedLen);
    uint256 index = 0;
    uint256 v;
    uint256 offset = 0x20; // length is first u256
    packedData[index] = dataLen;
    index++;
    for (uint256 i = 0; i < dataLen; i++) {
        assembly {
            v := mload(add(strBytes, offset))
            // @audit - (DONE) - why shr eight here?
            v := shr(8, v)
        }
        packedData[index] = v;
        index++;
        offset += CAIRO_STR_LEN;
    }
    // pending word
    assembly {
        v := mload(add(strBytes, offset))
        // @audit - (DONE) - why like this?
        v := shr(mul(sub(32, pendingLen), 8), v)
    }
    packedData[index] = v;
    index++;
    packedData[index] = pendingLen;
    return packedData;
}

function serializeStringTest() public pure returns (uint256[] memory) {
    string memory input = unicode"Hello";
    uint256[] memory buf = new uint256[](cairoStringSerializedLength(input));
    cairoStringSerialize(input, buf, 0);
    return buf;
}

The returned serialized data :

0,310939249775,5
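As an off-chain cross-check of this output, the packing can be modelled in a few lines of Python. This is a sketch and not the bridge's implementation: the name cairo_string_pack is hypothetical, and it assumes the layout produced by cairoStringPack above, namely the number of full 31-byte words, the words themselves, the pending word, and the pending word length, with each word read as a big-endian integer.

```python
CAIRO_STR_LEN = 31  # bytes per full word, mirroring the constant in the Solidity snippet


def cairo_string_pack(text):
    """Model of the packing: [data_len, *full_words, pending_word, pending_len]."""
    raw = text.encode("utf-8")
    data_len, pending_len = divmod(len(raw), CAIRO_STR_LEN)
    # Each full word is 31 consecutive bytes interpreted as a big-endian integer.
    words = [
        int.from_bytes(raw[i * CAIRO_STR_LEN:(i + 1) * CAIRO_STR_LEN], "big")
        for i in range(data_len)
    ]
    # The remaining bytes (possibly none) form the pending word.
    pending_word = int.from_bytes(raw[data_len * CAIRO_STR_LEN:], "big")
    return [data_len, *words, pending_word, pending_len]


print(cairo_string_pack("Hello"))  # [0, 310939249775, 5]
```

The middle value is simply the five bytes 0x48656c6c6f read as one integer, so the model works on bytes and never inspects characters.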

Decode it on Cairo/Starknet:

Add the following test to byte_array_extra.cairo. It should pass, and the string is decoded properly:

#[test]
fn test_unicode_match() {
    let mut a: Span<felt252> = array![0,310939249775,5].span();
    let b: Option<ByteArray> = Serde::deserialize(ref a);
    match b {
        Option::Some(e) => {
            assert!(e.data.is_empty(), "Data should be empty");
            assert_eq!(e.pending_word, 'Hello', "Wrong pending word");
            assert_eq!(e.pending_word_len, 5, "Wrong pending word len");
        },
        Option::None => panic!("Should not be None")
    }
}

Now for the multi-byte string from L1, "Hello 🦄":

Run the test:

function serializeStringTest() public pure returns (uint256[] memory) {
    string memory input = unicode"Hello 🦄";
    uint256[] memory buf = new uint256[](cairoStringSerializedLength(input));
    cairoStringSerialize(input, buf, 0);
    return buf;
}

The returned serialized data :

0,341881320659699967698564,10

Decode it on Cairo/Starknet:

Add the following test to byte_array_extra.cairo. The test runs, but it fails to decode the string properly:

#[test]
fn test_unicode_missmatch() {
    // 0,341881320659699967698564,10
    let mut a: Span<felt252> = array![0,341881320659699967698564,10].span();
    let b: Option<ByteArray> = Serde::deserialize(ref a);
    match b {
        Option::Some(e) => {
            assert!(e.data.is_empty(), "Data should be empty");
            // FAILED TO DECODE! "Hello 🦄"
            assert_eq!(e.pending_word, 341881320659699967698564, "Wrong pending word");
            assert_eq!(e.pending_word_len, 10, "Wrong pending word len");
        },
        Option::None => panic!("Should not be None")
    }
}

Impact

The metadata of bridged NFTs will not be properly decoded and set on the L2 collection, potentially causing the bridged L2 collection to lose its value.

Tools Used

Manual review

Recommendations

Implement custom multi-byte UTF-8 decoding in Cairo, or clearly state in the documentation that NFTs whose metadata contains multi-byte characters are not supported.

Updates

Lead Judging Commences

n0kto Lead Judge 10 months ago
Submission Judgement Published
Invalidated
Reason: Incorrect statement
Assigned finding tags:

invalid-UTF8-not-supported

Serialization and deserialization are made directly on bytes, no data are lost during the transfer.
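
The judge's point can be checked off-chain. The sketch below is a Python model, not the bridge's Cairo code: the name cairo_string_unpack is hypothetical, and the layout [data length, full 31-byte words, pending word, pending word length] is assumed from the PoC in this submission. It reads the serialized values reported for "Hello 🦄" back into bytes.

```python
CAIRO_STR_LEN = 31  # bytes per full word, as assumed from the PoC


def cairo_string_unpack(felts):
    """Rebuild the raw bytes from [data_len, *full_words, pending_word, pending_len]."""
    data_len = felts[0]
    words = felts[1:1 + data_len]
    pending_word = felts[1 + data_len]
    pending_len = felts[2 + data_len]
    # Full words are always 31 bytes; the pending word keeps only pending_len bytes.
    raw = b"".join(w.to_bytes(CAIRO_STR_LEN, "big") for w in words)
    return raw + pending_word.to_bytes(pending_len, "big")


raw = cairo_string_unpack([0, 341881320659699967698564, 10])
print(raw.hex())
# "\U0001F984" is the unicorn emoji from the PoC string.
print(raw == "Hello \U0001F984".encode("utf-8"))
```

Under these assumptions the ten bytes come back unchanged, so a UTF-8 aware reader of the resulting byte array can still recover the original text, emoji included.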

Appeal created

said Submitter
10 months ago
n0kto Lead Judge
9 months ago
n0kto Lead Judge 9 months ago
Submission Judgement Published
Invalidated
Reason: Incorrect statement
