Normalize Unicode String
A misunderstanding on my part started a lengthy discussion of something semi-related; take a look at this discussion. In case you didn't read all the replies, one of the topics was normalizing Unicode strings before storing them in the database.
Take the Spanish letter ñ, for example. It can be represented by a single Unicode code point, U+00F1 (decimal 241), a precomposed character; or by two code points, U+006E (110) and U+0303 (771): the letter n followed by a combining tilde.
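To make the difference concrete, this short Python snippet spells out both forms with escape sequences and lists their code points and lengths. Python is used here only because it is easy to run; it is not part of the DataFlex solution.

```python
# Two encodings of the Spanish letter ñ, written with explicit escapes
# so the difference is visible in the source code.
precomposed = "\u00f1"  # U+00F1 LATIN SMALL LETTER N WITH TILDE
decomposed = "n\u0303"  # U+006E 'n' followed by U+0303 COMBINING TILDE

for label, s in (("precomposed", precomposed), ("decomposed", decomposed)):
    # Show each code point in hex and decimal.
    points = ", ".join(f"U+{ord(c):04X} ({ord(c)})" for c in s)
    print(f"{label}: length {len(s)}: {points}")

# Both render as ñ, but as raw strings they are not equal.
print(precomposed == decomposed)  # False
```

The first loop prints a length of 1 for the precomposed form and 2 for the decomposed form, which is exactly the mismatch a database comparison would trip over.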
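To end users, this ñ (one code point) looks the same as this ñ (two code points); to programmers, they are different strings, if only because one is longer than the other. The following function converts a Unicode string to a normalized form by calling the Windows NormalizeString API in Normaliz.dll. By default it uses Normalization Form C (NFC), which composes characters, so ñ becomes the single code point U+00F1; passing False as the second argument selects Normalization Form D (NFD), which decomposes them instead.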
#IF !@<200
#ERROR 9999 "Requires DF 20+"
#ENDIF
Use UI
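
// NORM_FORM values from the Windows SDK (WinNls.h)
// NormalizationC = canonical composition (NFC)
// NormalizationD = canonical decomposition (NFD)
Define NormalizationC For 1
Define NormalizationD For 2

// Win32 signature:
// int NormalizeString(NORM_FORM NormForm, LPCWSTR lpSrcString, int cwSrcLength,
//                     LPWSTR lpDstString, int cwDstLength)
// All lengths are counted in UTF-16 code units (WCHARs), not bytes.
External_Function NormalizeString "NormalizeString" Normaliz.dll ;
    Integer NormForm ;
    WString lpSrcString ;
    Integer cwSrcLength ;
    Address lpDstString ;
    Integer cwDstLength ;
    Returns Integer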
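
// Returns sSrc normalized to NFC (the default) or, when bCompositionPass
// is False, to NFD. On failure the source string is returned unchanged.
Function Normalize Global String sSrc Boolean bCompositionPass Returns String
    WString swDst swSrc
    String sDst
    Integer iForm iBufferSize iLength iPos
    Boolean bComposition

    // The second argument is optional; compose (NFC) unless told otherwise
    Move (If(Num_Arguments > 1, bCompositionPass, True)) to bComposition
    Move (If(bComposition, NormalizationC, NormalizationD)) to iForm
    Move sSrc to swSrc

    // Ask the API for the buffer size it needs (in WCHARs): with a NULL
    // destination and a destination length of 0 it returns an estimate.
    // A fixed multiple of the source length is not safe for NFD, because
    // a single character can decompose into three or more code points.
    Move (NormalizeString(iForm, swSrc, -1, 0, 0)) to iBufferSize
    If (iBufferSize <= 0) Function_Return sSrc
    Move (Repeat(Character(0), iBufferSize)) to swDst

    // cwSrcLength = -1 means the source is null-terminated, so the
    // result written to swDst is null-terminated as well
    Move (NormalizeString(iForm, swSrc, -1, AddressOf(swDst), iBufferSize)) to iLength
    // A result <= 0 signals an error (e.g. invalid input or buffer too small)
    If (iLength <= 0) Function_Return sSrc

    // Drop the terminating null and the unused tail of the buffer
    Move swDst to sDst
    Move (Pos(Character(0), sDst)) to iPos
    If (iPos > 0) Move (Left(sDst, iPos - 1)) to sDst
    Function_Return sDst
End_Function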
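If you want to cross-check what the DataFlex function should return, Python's standard `unicodedata.normalize` implements the same NFC and NFD forms. The sketch below is only an illustration under that assumption: `normalize` is a hypothetical Python analogue of the DataFlex function, and the dict is a stand-in for a database table, not any real database API. It shows the NFC/NFD round trip and why normalizing both the stored value and the search value makes them match.

```python
import unicodedata


def normalize(text, composition=True):
    """Hypothetical Python analogue of the DataFlex Normalize function.

    Uses NFC when composition is True (the default) and NFD otherwise,
    mirroring the optional bCompositionPass argument.
    """
    return unicodedata.normalize("NFC" if composition else "NFD", text)


# Round trip: "mañana" typed with n + combining tilde (7 code points)
decomposed = "man\u0303ana"
composed = normalize(decomposed)  # n + U+0303 becomes U+00F1
print(len(decomposed), len(composed))  # 7 6
print(normalize(composed, composition=False) == decomposed)  # True

# Normalizing before storing: a dict stands in for a table keyed on a name.
table = {}
table[normalize("Pe\u00f1a")] = "customer 1"  # stored with precomposed ñ

query = "Pen\u0303a"  # a later search arrives with decomposed ñ
print(query in table)  # False: the raw strings differ
print(normalize(query) in table)  # True: both sides are NFC now
```

The design point carries over to DataFlex: normalizing only on input is not enough if searches can arrive in the other form, so the same normalization should be applied to values before they are saved and to values before they are used in a lookup.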