ArkProject: NFT Bridge

60,000 USDC

Submission Details

Severity: low

Invalid

The bridge cannot correctly handle NFT collections bridged from L1 to L2 whose metadata uses multi-byte UTF-8 characters.

said

Summary

NFT collections on L1 commonly use multi-byte UTF-8 characters, such as emojis, in their token names, symbols, or URIs. If the bridge does not account for this, the bridged token on L2 will fail to decode these characters correctly, resulting in incorrect information in the L2 collection's metadata.

Vulnerability Details

Strings in Solidity are dynamically sized byte arrays (bytes). String literals are encoded in UTF-8, so each character occupies 1 to 4 bytes depending on its code point. Cairo, by contrast, does not natively interpret UTF-8. It handles strings either as packed ASCII characters (short strings, up to 31 characters in a felt252) or as raw byte arrays (ByteArray).

If a Solidity string contains only ASCII characters (1 byte per character), transferring it to Cairo is relatively straightforward: in UTF-8, single bytes with values in the range 0 to 127 map directly to the Unicode code points of the ASCII range. If the Solidity string contains non-ASCII characters (such as emojis), Cairo will not decode it properly by default. The Cairo contract needs additional logic to interpret these byte sequences as multi-byte characters.
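To make the byte counts concrete, here is a small sketch in Python (used only as a neutral off-chain illustration; it is not part of the bridge code, and the helper name utf8_char_sizes is ours). It lists how many bytes each character of the two example names occupies in UTF-8.

```python
def utf8_char_sizes(text: str) -> list[tuple[str, int]]:
    """Pair each character with the number of bytes it takes in UTF-8."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


ascii_name = "Hello"
emoji_name = "Hello 🦄"

# Every ASCII character takes exactly one byte, with a value below 128.
print(utf8_char_sizes(ascii_name))
print(all(b < 128 for b in ascii_name.encode("utf-8")))

# The emoji is one character but four bytes, so the byte length (10)
# differs from the character count (7).
print(utf8_char_sizes(emoji_name))
print(len(emoji_name), len(emoji_name.encode("utf-8")))
```

The byte length, not the character count, is what the L1 serialization operates on.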

Reference: https://docs.starknet.io/architecture-and-concepts/smart-contracts/serialization-of-cairo-types/

PoC:

Consider two NFTs with different metadata (the token name, for instance):

one consists only of plain ASCII characters (1 byte per character): "Hello"
the other uses an emoji (multiple bytes per character): "Hello 🦄"

For the ASCII-only name "Hello":

Run the following code:

function cairoStringSerializedLength(
    string memory str
)
    internal
    pure
    returns (uint256)
{
    // @audit - (DONE) - learn more about this, is this always correct?
    bytes memory strBytes = bytes(str);
    // uint256 constant CAIRO_STR_LEN = 31;
    uint256 dataLen = strBytes.length / CAIRO_STR_LEN;
    uint256 packedLen = 1 + dataLen + 1 + 1;
    return packedLen;
}

function cairoStringSerialize(
    string memory str,
    uint256[] memory buf,
    uint256 offset
)
    internal
    pure
    returns (uint256)
{
    uint256[] memory packed = cairoStringPack(str);
    // @audit - (DONE) - if this too long, will go OOG, that is why there is limit on L1
    for (uint256 i = 0; i < packed.length; i++) {
        buf[offset + i] = packed[i];
    }
    return packed.length;
}

function cairoStringPack(
    string memory str
)
    internal
    pure
    returns (uint256[] memory)
{
    // CAIRO_STR_LEN = 31;
    bytes memory strBytes = bytes(str);
    uint256 dataLen = strBytes.length / CAIRO_STR_LEN;
    uint256 pendingLen = strBytes.length % CAIRO_STR_LEN;
    uint256 packedLen = 1 + dataLen + 1 + 1;
    uint256[] memory packedData = new uint256[](packedLen);
    uint256 index = 0;
    uint256 v;
    uint256 offset = 0x20; // length is first u256

    packedData[index] = dataLen;
    index++;

    for (uint256 i = 0; i < dataLen; i++) {
        assembly {
            v := mload(add(strBytes, offset))
            // @audit - (DONE) - why shr eight here?
            // mload reads 32 bytes; shifting right by 8 bits keeps the first 31.
            v := shr(8, v)
        }
        packedData[index] = v;
        index++;
        offset += CAIRO_STR_LEN;
    }

    // pending word
    assembly {
        v := mload(add(strBytes, offset))
        // @audit - (DONE) - why like this?
        // Keep only the first pendingLen bytes of the 32-byte word.
        v := shr(mul(sub(32, pendingLen), 8), v)
    }
    packedData[index] = v;
    index++;
    packedData[index] = pendingLen;

    return packedData;
}

function serializeStringTest() public pure returns (uint256[] memory) {
    string memory input = unicode"Hello";
    uint256[] memory buf = new uint256[](cairoStringSerializedLength(input));
    cairoStringSerialize(input, buf, 0);
    return buf;
}

The returned serialized data:

0,310939249775,5
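The serialized values in this PoC can be cross-checked off-chain. The following is a minimal Python model of cairoStringPack, assuming CAIRO_STR_LEN = 31 as stated in the snippet's comment. The name cairo_string_pack is ours, and the model only mirrors the layout the Solidity code produces: data length, full 31-byte words, pending word, pending word length.

```python
CAIRO_STR_LEN = 31  # bytes per full word, as in the Solidity snippet's comment


def cairo_string_pack(text: str) -> list[int]:
    """Model of cairoStringPack: [data_len, *full_words, pending_word, pending_len]."""
    # bytes(str) in Solidity exposes the same raw UTF-8 bytes.
    raw = text.encode("utf-8")
    data_len, pending_len = divmod(len(raw), CAIRO_STR_LEN)

    # Each full word is 31 bytes read as a big-endian integer.
    full_words = [
        int.from_bytes(raw[i * CAIRO_STR_LEN:(i + 1) * CAIRO_STR_LEN], "big")
        for i in range(data_len)
    ]

    # The remaining bytes (fewer than 31) form the pending word.
    pending_word = int.from_bytes(raw[data_len * CAIRO_STR_LEN:], "big")
    return [data_len, *full_words, pending_word, pending_len]


print(cairo_string_pack("Hello"))     # [0, 310939249775, 5]
print(cairo_string_pack("Hello 🦄"))  # [0, 341881320659699967698564, 10]
```

Both results match the values returned by serializeStringTest in this PoC, which indicates that the packing operates on raw bytes regardless of how many bytes a character occupies.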

Decode it on Cairo/Starknet:

Add the following test to byte_array_extra.cairo. The test should pass, and the string is decoded properly:

#[test]
fn test_unicode_match() {
    let mut a: Span<felt252> = array![0, 310939249775, 5].span();
    let b: Option<ByteArray> = Serde::deserialize(ref a);
    match b {
        Option::Some(e) => {
            assert!(e.data.is_empty(), "Data should be empty");
            assert_eq!(e.pending_word, 'Hello', "Wrong pending word");
            assert_eq!(e.pending_word_len, 5, "Wrong pending word len");
        },
        Option::None => panic!("Should not be None"),
    }
}

Now for the multi-byte string from L1, "Hello 🦄":

Run the test:

function serializeStringTest() public pure returns (uint256[] memory) {
    string memory input = unicode"Hello 🦄";
    uint256[] memory buf = new uint256[](cairoStringSerializedLength(input));
    cairoStringSerialize(input, buf, 0);
    return buf;
}

The returned serialized data:

0,341881320659699967698564,10

Decode it on Cairo/Starknet:

Add the following test to byte_array_extra.cairo. The test runs, but it fails to decode the string properly:

#[test]
fn test_unicode_missmatch() {
    // 0,341881320659699967698564,10
    let mut a: Span<felt252> = array![0, 341881320659699967698564, 10].span();
    let b: Option<ByteArray> = Serde::deserialize(ref a);
    match b {
        Option::Some(e) => {
            assert!(e.data.is_empty(), "Data should be empty");
            // FAILED TO DECODE! "Hello 🦄"
            assert_eq!(e.pending_word, 341881320659699967698564, "Wrong pending word");
            assert_eq!(e.pending_word_len, 10, "Wrong pending word len");
        },
        Option::None => panic!("Should not be None"),
    }
}

Impact

The metadata of bridged NFTs will not be properly decoded and set on the L2 collection, potentially causing the bridged L2 collection to lose its value.

Tools Used

Manual review

Recommendations

Implement custom multi-byte UTF-8 decoding in Cairo, or clearly state in the documentation that NFTs whose metadata contains multi-byte characters are not supported.
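As an illustration of the decoding step this recommendation refers to, the following Python sketch reassembles the raw bytes from a serialized ByteArray and interprets them as UTF-8. It is an off-chain model, not Cairo code: byte_array_to_str is a hypothetical helper, and it assumes the layout used in the PoC, namely [data_len, full 31-byte words, pending_word, pending_word_len].

```python
WORD_LEN = 31  # bytes per full word in the assumed ByteArray layout


def byte_array_to_str(felts: list[int]) -> str:
    """Rebuild the raw bytes from a serialized ByteArray and decode them as UTF-8."""
    data_len = felts[0]
    full_words = felts[1:1 + data_len]
    pending_word = felts[1 + data_len]
    pending_len = felts[2 + data_len]

    # Full words are 31 bytes each; the pending word holds the remaining bytes.
    raw = b"".join(word.to_bytes(WORD_LEN, "big") for word in full_words)
    raw += pending_word.to_bytes(pending_len, "big")

    # Interpreting the bytes as UTF-8 is the only step specific to
    # multi-byte characters.
    return raw.decode("utf-8")


print(byte_array_to_str([0, 310939249775, 5]))
print(byte_array_to_str([0, 341881320659699967698564, 10]))
```

Applied to the two PoC values, the sketch returns "Hello" and "Hello 🦄" respectively.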

Updates

Lead Judging Commences

n0kto Lead Judge 10 months ago

Submission Judgement Published

Invalidated

Reason: Incorrect statement

Assigned finding tags:

invalid-UTF8-not-supported

Serialization and deserialization are made directly on bytes, no data are lost during the transfer.

Appeal created

said Submitter

10 months ago

n0kto Lead Judge

9 months ago

n0kto Lead Judge 9 months ago

Submission Judgement Published

Invalidated

Reason: Incorrect statement

Assigned finding tags:

invalid-UTF8-not-supported

Serialization and deserialization are made directly on bytes, no data are lost during the transfer.

Prize pool breakdown

Total prize: 60,000 USDC
nSLOC: 2,301 (26 USDC / LOC)
High / Medium: 50,000 USDC
Low: 5,500 USDC
Judges: 4,500 USDC
