Normalize Unicode String

A misunderstanding on my part caused a lengthy discussion of something semi-related. Take a look at this discussion. In case you didn't read all the replies, one of the topics was normalizing Unicode strings before storing them in the database.

Take the Spanish letter ñ, for example. It can be represented by a single Unicode code point, 0xf1 (241), a precomposed character; or by two code points, 0x6e (110) and 0x303 (771): the letter n followed by a combining tilde.
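To make the two representations concrete, here is a minimal sketch that builds both strings and prints their lengths. It is not part of the original discussion; it assumes that Character() accepts a Unicode code point in DF 20+ and that Length counts characters rather than bytes. The variable names are made up for illustration.

```dataflex
// Hypothetical demo: the same visible letter built two ways.
// Assumes Character() takes a Unicode code point (DF 20+).
String sPrecomposed sDecomposed

Move (Character(241)) to sPrecomposed                   // U+00F1, one code point
Move (Character(110) + Character(771)) to sDecomposed   // U+006E + U+0303, two code points

// Both display as the same letter; the reported lengths should differ.
Showln "Precomposed length: " (Length(sPrecomposed))
Showln "Decomposed length:  " (Length(sDecomposed))
```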

To end users, this ñ looks the same as this ñ; to programmers, the two are different strings (at the very least, they differ in length). The following function converts a Unicode string to its normalized form (for example, one code point instead of two for the letter ñ).

#IF !@<200
	#ERROR 9999 "Requires DF 20+"
#ENDIF

Use UI

Define NormalizationC		For 1
Define NormalizationD		For 2

External_Function NormalizeString "NormalizeString" Normaliz.dll ;
	Integer NormForm ;
	WString lpSrcString ;
	Integer cwSrcLength ;
	Address lpDstString ;
	Integer cwDstLength ;
	Returns Integer

Function Normalize Global String sSrc Boolean bCompositionPass Returns String
	WString swDst swSrc
	String sDst
	Integer iBufferSize iLength iPos
	Boolean bComposition
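	// Default to composition (form C) when the second argument is omitted.
	Move (If(Num_Arguments>1,bCompositionPass,True)) To bComposition
	Move sSrc to swSrc
	// Reserve room for the source plus a terminating null; decomposition
	// (form D) can lengthen the string, so double the buffer in that case.
	Move (SizeOfWString(swSrc) + 1) to iBufferSize
	If (Not(bComposition)) Move (iBufferSize * 2) to iBufferSize
	Move (Repeat(Character(0),iBufferSize)) to swDst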
	Move (NormalizeString( ;
		If(bComposition, NormalizationC, NormalizationD), ;
		swSrc, ;
		-1, ;
		AddressOf(swDst), ;
		iBufferSize)) to iLength
	// Convert back to String and cut at the first null character.
	Move swDst to sDst
	Move (Pos(Character(0),sDst)) to iPos
	Function_Return (Left(sDst,iPos-1))
End_Function
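
A short usage sketch, assuming the Normalize function above is in scope and that Character() accepts a Unicode code point. The variable names are made up for illustration. It composes a two-code-point ñ and decomposes a one-code-point ñ, then prints the lengths so you can compare them on your own system.

```dataflex
// Hypothetical usage of the Normalize function defined above.
String sDecomposed sPrecomposed sToFormC sToFormD

Move (Character(110) + Character(771)) to sDecomposed   // n + combining tilde
Move (Character(241)) to sPrecomposed                   // precomposed letter

Move (Normalize(sDecomposed, True)) to sToFormC         // compose (form C)
Move (Normalize(sPrecomposed, False)) to sToFormD       // decompose (form D)

Showln "Form C length: " (Length(sToFormC))
Showln "Form D length: " (Length(sToFormD))
```

Normalizing to a single form (form C is a common choice) before writing to the database means that comparisons and lookups see one consistent representation.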