Some NFTs on L1 use multi-byte UTF-8 characters, such as emojis, in their token names, symbols, or URIs. If this case is not handled, the bridged token on L2 will fail to decode these characters correctly, resulting in incorrect information in the L2 collection's metadata.
Strings in Solidity are dynamically sized byte arrays (bytes). Solidity strings are UTF-8 encoded by default, so each character can take 1 to 4 bytes depending on the character. Cairo, by contrast, does not natively support UTF-8: it handles strings either as packed ASCII characters (short strings) or as raw byte arrays (ByteArray).
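To make the 1-to-4-bytes-per-character point concrete, here is a quick Python check of the two token names used in the PoC below. Python is used purely for illustration and is not part of either contract; the helper name utf8_profile is hypothetical.

```python
def utf8_profile(s: str) -> tuple[int, int, bytes]:
    """Return (character count, UTF-8 byte count, raw UTF-8 bytes)."""
    raw = s.encode("utf-8")
    return len(s), len(raw), raw

for name in ["Hello", "Hello 🦄"]:
    chars, nbytes, raw = utf8_profile(name)
    # The emoji is one character but four bytes (f0 9f a6 84).
    print(f"{name!r}: {chars} chars, {nbytes} bytes, {raw.hex()}")
```

"Hello" is 5 characters and 5 bytes, while "Hello 🦄" is 7 characters but 10 bytes, because U+1F984 encodes to the four bytes f0 9f a6 84.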
If a Solidity string contains only ASCII characters (1 byte per character), transferring it to Cairo is relatively straightforward: in UTF-8, single bytes with values from 0 to 127 map directly to the Unicode code points of the ASCII range. If the string contains non-ASCII characters (such as emojis), Cairo will not decode them correctly by default; the Cairo contract needs additional logic to interpret these byte sequences as multi-byte characters.
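The failure mode can be sketched in Python by comparing a byte-per-character interpretation (correct only for ASCII) with real UTF-8 decoding. This illustrates the general problem and is not the bridge's code; the helper names are hypothetical.

```python
def naive_byte_decode(raw: bytes) -> str:
    # Treat every byte as one character (code point == byte value).
    # This is only correct when all bytes are in the ASCII range 0-127.
    return "".join(chr(b) for b in raw)

def is_ascii(raw: bytes) -> bool:
    # UTF-8 bytes below 0x80 are exactly the ASCII code points.
    return all(b < 0x80 for b in raw)

for name in ["Hello", "Hello 🦄"]:
    raw = name.encode("utf-8")
    print(is_ascii(raw), repr(naive_byte_decode(raw)), repr(raw.decode("utf-8")))
```

For "Hello" both interpretations agree. For "Hello 🦄" the naive version produces four unrelated characters in place of the emoji, which is the kind of garbled metadata the finding describes.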
Reference: https://docs.starknet.io/architecture-and-concepts/smart-contracts/serialization-of-cairo-types/
PoC:
Consider two NFTs with different metadata (the token's name, for instance):
one consisting only of ASCII characters (1 byte per character): "Hello"
one using an emoji (multiple bytes per character): "Hello 🦄"
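Assuming the bridge serializes strings with the ByteArray layout described in the Starknet serialization reference above (number of full 31-byte words, the words themselves, a pending word, and the pending word's length), the two names map to felts roughly as follows. This Python sketch is illustrative only: whether the bridge's own serializer matches this layout exactly is an assumption, and it does not reproduce the PoC's original serialized output.

```python
# Assumed Cairo ByteArray layout:
# [num_full_words, *full_words, pending_word, pending_word_len]
WORD_BYTES = 31  # a bytes31 word holds up to 31 bytes

def serialize_byte_array(raw: bytes) -> list[int]:
    """Pack raw bytes big-endian into full 31-byte words plus a remainder word."""
    n_full = len(raw) // WORD_BYTES
    words = [
        int.from_bytes(raw[i * WORD_BYTES:(i + 1) * WORD_BYTES], "big")
        for i in range(n_full)
    ]
    pending = raw[n_full * WORD_BYTES:]
    return [n_full, *words, int.from_bytes(pending, "big"), len(pending)]

for name in ["Hello", "Hello 🦄"]:
    felts = serialize_byte_array(name.encode("utf-8"))
    print(name, [hex(f) for f in felts])
```

Under this layout, "Hello" becomes [0x0, 0x48656c6c6f, 0x5] and "Hello 🦄" becomes [0x0, 0x48656c6c6f20f09fa684, 0xa]: the emoji is carried as the four bytes f0 9f a6 84 inside the pending word.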
For the ASCII-only string (1 byte per character), "Hello":
Run the following code:
The returned serialized data:
Decode it on Cairo/Starknet:
Add the test to byte_array_extra.cairo; the test runs successfully and the string is decoded properly.
Now for the multi-byte string from L1, "Hello 🦄":
Run the test:
The returned serialized data:
Decode it on Cairo/Starknet:
Add the test to byte_array_extra.cairo; the test runs, but it fails to decode the string properly:
The metadata of bridged NFTs will not be properly decoded and set on the L2 collection, potentially causing the bridged L2 collection to lose its value.
Manual review
Implement custom multi-byte UTF-8 decoding in Cairo, or state clearly in the documentation that NFTs whose metadata contains multi-byte characters are not supported.
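One reading of the first option is to decode the ByteArray's bytes into Unicode code points on the Cairo side. The bit-level logic such a decoder needs is sketched below in Python for readability; a Cairo version would apply the same masks and shifts to the ByteArray's bytes. The function name is hypothetical, and the sketch deliberately skips overlong-encoding and surrogate checks that a production decoder should perform.

```python
def utf8_code_points(raw: bytes) -> list[int]:
    """Decode UTF-8 bytes into Unicode code points by hand.

    Raises ValueError on an invalid lead byte, a bad continuation byte,
    or a truncated sequence. Overlong encodings and surrogates are NOT rejected.
    """
    points: list[int] = []
    i = 0
    while i < len(raw):
        lead = raw[i]
        if lead < 0x80:               # 0xxxxxxx: 1 byte (ASCII)
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:      # 110xxxxx: 2-byte sequence
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:     # 1110xxxx: 3-byte sequence
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:    # 11110xxx: 4-byte sequence (e.g. emojis)
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02x} at offset {i}")
        if i + extra >= len(raw) and extra > 0:
            raise ValueError(f"truncated sequence at offset {i}")
        for j in range(1, extra + 1):
            cont = raw[i + j]
            if cont >> 6 != 0b10:     # continuation bytes must be 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i + j}")
            cp = (cp << 6) | (cont & 0x3F)  # append 6 payload bits
        points.append(cp)
        i += extra + 1
    return points
```

Applied to the bytes of "Hello 🦄", the last code point comes out as 0x1F984, the unicorn emoji, instead of four separate byte values.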
Serialization and deserialization operate directly on bytes, so no data is lost during the transfer.
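A round trip illustrates this point, again assuming the Starknet ByteArray layout (Python, illustrative only; the helper names are hypothetical). The bytes that come out match the bytes that went in, and the emoji is recovered as soon as the consumer decodes those bytes as UTF-8.

```python
WORD_BYTES = 31  # a bytes31 word holds up to 31 bytes

def serialize_byte_array(raw: bytes) -> list[int]:
    # [num_full_words, *full_words, pending_word, pending_word_len]
    n_full = len(raw) // WORD_BYTES
    words = [int.from_bytes(raw[i * WORD_BYTES:(i + 1) * WORD_BYTES], "big")
             for i in range(n_full)]
    pending = raw[n_full * WORD_BYTES:]
    return [n_full, *words, int.from_bytes(pending, "big"), len(pending)]

def deserialize_byte_array(felts: list[int]) -> bytes:
    # Fixed-width to_bytes restores any leading zero bytes in each word.
    n_full = felts[0]
    full = b"".join(w.to_bytes(WORD_BYTES, "big") for w in felts[1:1 + n_full])
    pending, pending_len = felts[1 + n_full], felts[2 + n_full]
    return full + pending.to_bytes(pending_len, "big")

for name in ["Hello", "Hello 🦄", "🦄" * 10]:
    raw = name.encode("utf-8")
    restored = deserialize_byte_array(serialize_byte_array(raw))
    assert restored == raw                   # byte-for-byte identical
    print(restored.decode("utf-8") == name)  # True once decoded as UTF-8
```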