programing

T-SQL을 사용한 퍼지 매칭

minimums 2023. 4. 18. 21:56
반응형

T-SQL을 사용한 퍼지 매칭

개인 데이터 등이 포함된 테이블이 있습니다.많은 열이 있지만 여기서 한 번 주목해야 할 부분은 다음과 같습니다.addressindex,lastname ★★★★★★★★★★★★★★★★★」firstname서 ''는addressindex아파트 문까지 드릴로 뚫린 고유 주소입니다. '두 명 '아래처럼'이 두 명 lastname 는 ★★★★★★★★★★★★★★★★★★★.firstnames중복될 가능성이 가장 큽니다.

이 복제품들을 나열할 방법이 필요합니다.

tabledata:

personid     1
firstname    "Carl"
lastname     "Anderson"
addressindex 1

personid     2
firstname    "Carl Peter"
lastname     "Anderson"
addressindex 1

모든 열에 정확히 일치해야 하는 경우 이 방법을 알고 있지만 위의 예에서 나온 결과를 사용하여 다음과 같은 트릭을 수행하려면 퍼지 매치가 필요합니다.

Row     personid      addressindex     lastname     firstname
1       2             1                Anderson     Carl Peter
2       1             1                Anderson     Carl
.....

어떻게 하면 이 문제를 잘 해결할 수 있을까요?

SQL Server에서 제공하는 퍼지 매칭은 매우 투박하다는 것을 알게 되었습니다.Levenshtein 거리 알고리즘과 가중치를 사용한 CLR 기능은 정말 운이 좋았습니다.이 알고리즘을 사용하여 GetSimilarityScore라는 UDF를 만들었습니다.이 UDF는 2개의 스트링을 사용하여 0.0에서 1.0 사이의 점수를 반환합니다.1.0에 가까울수록 좋습니다.그런 다음 임계값 >=0.8 정도로 쿼리하여 가장 가능성이 높은 일치를 가져옵니다.다음과 같은 경우:

if object_id('tempdb..#similar') is not null drop table #similar
select a.id, (
    select top 1 x.id
   from MyTable x
   where x.id <> a.id
   order by dbo.GetSimilarityScore(a.MyField, x.MyField) desc
) as MostSimilarId
into #similar
from MyTable a

select *, dbo.GetSimilarityScore(a.MyField, c.MyField)
from MyTable a
join #similar b on a.id = b.id
join MyTable c on b.MostSimilarId = c.id

정말 큰 테이블에서는 하지 마세요.느린 과정입니다.

CLR UDF는 다음과 같습니다.

''' <summary>
''' Compute the distance between two strings.
''' </summary>
''' <param name="s1">The first of the two strings.</param>
''' <param name="s2">The second of the two strings.</param>
''' <returns>The Levenshtein cost.</returns>
<Microsoft.SqlServer.Server.SqlFunction()> _
Public Shared Function ComputeLevenstheinDistance(ByVal string1 As SqlString, ByVal string2 As SqlString) As SqlInt32
    If string1.IsNull OrElse string2.IsNull Then Return SqlInt32.Null
    Dim s1 As String = string1.Value
    Dim s2 As String = string2.Value

    Dim n As Integer = s1.Length
    Dim m As Integer = s2.Length
    Dim d As Integer(,) = New Integer(n, m) {}

    ' Step 1
    If n = 0 Then Return m
    If m = 0 Then Return n

    ' Step 2
    For i As Integer = 0 To n
        d(i, 0) = i
    Next

    For j As Integer = 0 To m
        d(0, j) = j
    Next

    ' Step 3
    For i As Integer = 1 To n
        'Step 4
        For j As Integer = 1 To m
            ' Step 5
            Dim cost As Integer = If((s2(j - 1) = s1(i - 1)), 0, 1)

            ' Step 6
            d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
        Next
    Next
    ' Step 7
    Return d(n, m)
End Function

''' <summary>
''' Returns a score between 0.0-1.0 indicating how closely two strings match.  1.0 is a 100%
''' T-SQL equality match, and the score goes down from there towards 0.0 for less similar strings.
''' </summary>
<Microsoft.SqlServer.Server.SqlFunction()> _
Public Shared Function GetSimilarityScore(string1 As SqlString, string2 As SqlString) As SqlDouble
    If string1.IsNull OrElse string2.IsNull Then Return SqlInt32.Null

    Dim s1 As String = string1.Value.ToUpper().TrimEnd(" "c)
    Dim s2 As String = string2.Value.ToUpper().TrimEnd(" "c)
    If s1 = s2 Then Return 1.0F ' At this point, T-SQL would consider them the same, so I will too

    Dim flatLevScore As Double = InternalGetSimilarityScore(s1, s2)

    Dim letterS1 As String = GetLetterSimilarityString(s1)
    Dim letterS2 As String = GetLetterSimilarityString(s2)
    Dim letterScore As Double = InternalGetSimilarityScore(letterS1, letterS2)

    'Dim wordS1 As String = GetWordSimilarityString(s1)
    'Dim wordS2 As String = GetWordSimilarityString(s2)
    'Dim wordScore As Double = InternalGetSimilarityScore(wordS1, wordS2)

    If flatLevScore = 1.0F AndAlso letterScore = 1.0F Then Return 1.0F
    If flatLevScore = 0.0F AndAlso letterScore = 0.0F Then Return 0.0F

    ' Return weighted result
    Return (flatLevScore * 0.2F) + (letterScore * 0.8F)
End Function

Private Shared Function InternalGetSimilarityScore(s1 As String, s2 As String) As Double
    Dim dist As SqlInt32 = ComputeLevenstheinDistance(s1, s2)
    Dim maxLen As Integer = If(s1.Length > s2.Length, s1.Length, s2.Length)
    If maxLen = 0 Then Return 1.0F
    Return 1.0F - Convert.ToDouble(dist.Value) / Convert.ToDouble(maxLen)
End Function

''' <summary>
''' Sorts all the alpha numeric characters in the string in alphabetical order
''' and removes everything else.
''' </summary>
Private Shared Function GetLetterSimilarityString(s1 As String) As String
    Dim allChars = If(s1, "").ToUpper().ToCharArray()
    Array.Sort(allChars)
    Dim result As New StringBuilder()
    For Each ch As Char In allChars
        If Char.IsLetterOrDigit(ch) Then
            result.Append(ch)
        End If
    Next
    Return result.ToString()
End Function

''' <summary>
''' Removes all non-alpha numeric characters and then sorts
''' the words in alphabetical order.
''' </summary>
Private Shared Function GetWordSimilarityString(s1 As String) As String
    Dim words As New List(Of String)()
    Dim curWord As StringBuilder = Nothing
    For Each ch As Char In If(s1, "").ToUpper()
        If Char.IsLetterOrDigit(ch) Then
            If curWord Is Nothing Then
                curWord = New StringBuilder()
            End If
            curWord.Append(ch)
        Else
            If curWord IsNot Nothing Then
                words.Add(curWord.ToString())
                curWord = Nothing
            End If
        End If
    Next
    If curWord IsNot Nothing Then
        words.Add(curWord.ToString())
    End If

    words.Sort(StringComparer.OrdinalIgnoreCase)
    Return String.Join(" ", words.ToArray())
End Function

여기에 있는 다른 유용한 정보 외에도 일반적으로 SOUNDEX보다 더 낫다고 여겨지는 이중 메타포네 음성 알고리즘을 사용하는 것을 고려해 보는 것이 좋습니다.

Tim Feiffer는 기사 Double Metaphone Sounds Great Convert the C++ Double Metaphone 알고리즘을 T-SQL(원래는 SQL Mag에서, 그 후 SQL Server Pro에서)에서 SQL에서의 구현에 대해 자세히 설명합니다.

이름 철자가 약간 틀리면 칼 대 칼 대 칼과 같은 이름을 일치시키는 데 도움이 됩니다..

업데이트: 실제 다운로드 가능한 코드가 사라진 것 같습니다만, 여기 github repo에서 발견된 구현이 있습니다.원래 코드를 복제한 것으로 보입니다.

SQL Server 전체 텍스트 인덱싱을 사용하면 검색 및 반환을 수행할 수 있으며 단어뿐만 아니라 철자 오류도 있을 수 있습니다.

Master Data Services의 첫 출시 이후 SOUNDEX가 구현하는 것보다 더 고급 퍼지 논리 알고리즘에 액세스할 수 있게 되었습니다.따라서 MDS가 설치되어 있는 경우 mdq 스키마(MDS 데이터베이스)에서 유사성()이라는 함수를 찾을 수 있습니다.

동작의 상세한 것에 대하여는, http://blog.hoegaerden.be/2011/02/05/finding-similar-strings-with-fuzzy-logic-functions-built-into-mds/ 를 참조해 주세요.

저는 개인적으로 Jaro-Winkler 알고리즘의 CLR 실장을 사용하고 있습니다.이 알고리즘은 약 15자를 넘는 문자열로 조금 고생하고 이메일 주소의 매칭을 좋아하지 않지만, 그 이외는 꽤 좋습니다.전체 실장 가이드는 여기에서 찾을 수 있습니다.

다음은 t-sql의 샘플입니다.

SELECT * 
FROM myTable1 as t1 
INNER JOIN myTable2 as t2 
ON dbo.StringDistance(t1.MyField, t2.MyField) > 0.85

Soundex에 비해 이 접근법의 장점은 오타를 더 쉽게 처리할 수 있다는 것입니다. 오타가 다른 소리를 낼 경우 SOUNDEX와 올바르게 일치하지 않기 때문입니다.

예를 들어, "type" 대신 "ytpe"를 입력하면 SOUNDEX에서 올바른 일치를 알 수 없습니다.SOUNDEX('ytpe') 및 SOUNDEX('type')에 대해 반환되는 코드는 다음과 같습니다.

Ytpe Y310

T100 타입

여기서 Jaro-Winkler String Distance('ytpe' 'type')를 사용하면 0.91(6)을 얻을 수 있습니다.이것은 좋은 일치를 의미합니다.1은 정확히 일치하므로 0.91이 매우 가깝습니다.

어떤 이유로든 CLR 함수를 사용할 수 없는 경우 SSIS 패키지(퍼지 변환 룩업 사용)를 통해 데이터를 실행해 보십시오.상세한 내용은 여기를 참조하십시오.

코드 파트 2 Redfilter 응답 링크의 경우:

참고 자료:
https://github.com/mb16/geocoderNet/blob/master/build/sql/doubleMetaphone.sql

BEGIN
                        SET @MP1 = @MP1 + 'N'
                        SET @MP2 = @MP2 + 'KN'
                    END
                                    ELSE
                    BEGIN
                        SET @MP1 = @MP1 + 'KN'
                        SET @MP2 = @MP2 + 'KN'
                    END
                END
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

                    --'tagliaro'
                    ELSE IF (dbo.fnStringAt((@CurrentPosition + 1),@Word,'LI')=1) AND 
(dbo.fnSlavoGermanic(@Word)=0)
                    BEGIN
                SET @MP1 = @MP1 + 'KL'
                SET @MP2 = @MP2 + 'L'
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

                    -- -ges-,-gep-,-gel-, -gie- at beginning
            -- This call to fnStringAt() is the 'worst case' in number of values passed. A UDF that used DEFAULT values instead of
                        -- a multi-valued argument would require ten DEFAULT arguments for EP, EB, EL, etc. (assuming the first was not defined with a DEFAULT).
                    ELSE IF ((@CurrentPosition = 1)
                            AND ((SUBSTRING(@Word,@CurrentPosition + 1,1) = 'Y') 
                                    OR (dbo.fnStringAt((@CurrentPosition + 
1),@Word,'ES,EP,EB,EL,EY,IB,IL,IN,IE,EI,ER')=1)) )
                    BEGIN
                SET @MP1 = @MP1 + 'K'
                SET @MP2 = @MP2 + 'J'
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

                    -- -ger-,  -gy-
                    ELSE IF (((dbo.fnStringAt((@CurrentPosition + 1), @Word, 'ER')=1) OR 
(SUBSTRING(@Word,@CurrentPosition + 1,1) = 'Y'))
                                    AND (dbo.fnStringAt(1, @Word, 'DANGER,RANGER,MANGER')=0)
                                            AND (dbo.fnStringAt((@CurrentPosition - 1), @Word, 
'E,I,RGY,OGY')=0) )
                    BEGIN
                SET @MP1 = @MP1 + 'K'
                SET @MP2 = @MP2 + 'J'
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

                    -- italian e.g, 'biaggi'
                    ELSE IF (dbo.fnStringAt((@CurrentPosition + 1),@Word,'E,I,Y')=1) OR 
(dbo.fnStringAt((@CurrentPosition - 1),@Word,'AGGI,OGGI')=1)
                    BEGIN
                            --obvious germanic
                            IF ((dbo.fnStringAt(1,@Word,'VAN ,VON ,SCH')=1)
                                    OR (dbo.fnStringAt((@CurrentPosition + 1),@Word,'ET')=1))
                BEGIN
                    SET @MP1 = @MP1 + 'K'
                    SET @MP2 = @MP2 + 'K'
                END
                            ELSE
                BEGIN
                                    --always soft if french ending
                                    IF (dbo.fnStringAt((@CurrentPosition + 1),@Word,'IER ')=1)
                    BEGIN
                        SET @MP1 = @MP1 + 'J'
                        SET @MP2 = @MP2 + 'J'
                    END
                                    ELSE
                    BEGIN
                        SET @MP1 = @MP1 + 'J'
                        SET @MP2 = @MP2 + 'K'
                    END
                END
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

            ELSE
            BEGIN
                        IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'G')
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 2
                END
                        ELSE
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 1
                END
                SET @MP1 = @MP1 + 'K'
                SET @MP2 = @MP2 + 'K'
            END
        END

        ELSE IF @CurrentChar = 'H'
        BEGIN
                    --only keep if first & before vowel or btw. 2 vowels
                    IF (((@CurrentPosition = 1) OR 
(dbo.fnIsVowel(SUBSTRING(@Word,@CurrentPosition - 1,1))=1)) 
                            AND (dbo.fnIsVowel(SUBSTRING(@Word,@CurrentPosition + 1,1))=1))
                    BEGIN
                SET @MP1 = @MP1 + 'H'
                SET @MP2 = @MP2 + 'H'
                            SET @CurrentPosition = @CurrentPosition + 2
            END
                    --also takes care of 'HH'
                    ELSE 
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
            END
        END

        ELSE IF @CurrentChar = 'J'
        BEGIN
                    --obvious spanish, 'jose', 'san jacinto'
                    IF (dbo.fnStringAt(@CurrentPosition,@Word,'JOSE')=1) OR 
(dbo.fnStringAt(1,@Word,'SAN ')=1)
                    BEGIN
                            IF (((@CurrentPosition = 1) AND (SUBSTRING(@Word,@CurrentPosition 
+ 4,1) = ' ')) OR (dbo.fnStringAt(1,@Word,'SAN ')=1) )
                BEGIN
                    SET @MP1 = @MP1 + 'H'
                    SET @MP2 = @MP2 + 'H'
                END
                            ELSE
                            BEGIN
                    SET @MP1 = @MP1 + 'J'
                    SET @MP2 = @MP2 + 'H'
                            END
                            SET @CurrentPosition = @CurrentPosition + 1
                    END

                    ELSE IF ((@CurrentPosition = 1) AND 
(dbo.fnStringAt(@CurrentPosition,@Word,'JOSE')=0))
            BEGIN
                SET @MP1 = @MP1 + 'J'
                --Yankelovich/Jankelowicz
                SET @MP2 = @MP2 + 'A' 
                        --it could happen!
                        IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'J') 
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 2
                END
                        ELSE
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 1
                END
            END
                    ELSE
            BEGIN
                            --spanish pron. of e.g. 'bajador'
                            IF( (dbo.fnIsVowel(SUBSTRING(@Word,@CurrentPosition - 1,1))=1)
                                    AND (dbo.fnSlavoGermanic(@Word)=0)
                                            AND ((SUBSTRING(@Word,@CurrentPosition + 1,1) = 'A') OR 
(SUBSTRING(@Word,@CurrentPosition + 1,1) = 'O')))
                BEGIN
                    SET @MP1 = @MP1 + 'J'
                    SET @MP2 = @MP2 + 'H'
                END
                            ELSE
                BEGIN
                                    IF (@CurrentPosition = @WordLength)
                    BEGIN
                        SET @MP1 = @MP1 + 'J'
                        SET @MP2 = @MP2 + ''
                    END
                                    ELSE
                    BEGIN
                                            IF ((dbo.fnStringAt((@CurrentPosition + 1), @Word, 
'L,T,K,S,N,M,B,Z')=0) 
                                                            AND (dbo.fnStringAt((@CurrentPosition - 1), @Word, 
'S,K,L')=0))
                        BEGIN
                            SET @MP1 = @MP1 + 'J'
                            SET @MP2 = @MP2 + 'J'
                        END
                    END
                END
                        --it could happen!
                        IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'J') 
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 2
                END
                        ELSE
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 1
                END
            END
        END

        ELSE IF @CurrentChar = 'K'
        BEGIN
                    IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'K')
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 2
            END
                    ELSE
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
            END
            SET @MP1 = @MP1 + 'K'
            SET @MP2 = @MP2 + 'K'
        END

        ELSE IF @CurrentChar = 'L'
        BEGIN
                    IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'L')
                    BEGIN
                            --spanish e.g. 'cabrillo', 'gallegos'
                            IF (((@CurrentPosition = (@WordLength - 3)) 
                                    AND (dbo.fnStringAt((@CurrentPosition - 
1),@Word,'ILLO,ILLA,ALLE')=1))
                                             OR (((dbo.fnStringAt((@WordLength - 1),@Word,'AS,OS')=1) 
OR (dbo.fnStringAt(@WordLength,@Word,'A,O')=1)) 
                                                    AND (dbo.fnStringAt((@CurrentPosition - 
1),@Word,'ALLE')=1)) )
                            BEGIN
                    SET @MP1 = @MP1 + 'L'
                    SET @MP2 = @MP2 + ''
                                SET @CurrentPosition = @CurrentPosition + 2
                            END
                ELSE
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 2
                    SET @MP1 = @MP1 + 'L'
                    SET @MP2 = @MP2 + 'L'
                END
            END
                    ELSE
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
                SET @MP1 = @MP1 + 'L'
                SET @MP2 = @MP2 + 'L'
            END
        END

        ELSE IF @CurrentChar = 'M'
        BEGIN
                     --'dumb','thumb'
                    IF (((dbo.fnStringAt((@CurrentPosition - 1), @Word,'UMB')=1)
                            AND (((@CurrentPosition + 1) = @WordLength) OR 
(dbo.fnStringAt((@CurrentPosition + 2),@Word,'ER')=1)))

                                    OR (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'M') )
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 2
            END
                    ELSE
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
            END
            SET @MP1 = @MP1 + 'M'
            SET @MP2 = @MP2 + 'M'
        END

        ELSE IF @CurrentChar = 'N'
        BEGIN
                    IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'N')
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 2
            END
                    ELSE
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
            END
            SET @MP1 = @MP1 + 'N'
            SET @MP2 = @MP2 + 'N'
        END

        ELSE IF @CurrentChar = 'Ñ'
        BEGIN
                        SET @CurrentPosition = @CurrentPosition + 1
            SET @MP1 = @MP1 + 'N'
            SET @MP2 = @MP2 + 'N'
        END

        ELSE IF @CurrentChar = 'P'
        BEGIN
                    --What about Michelle Pfeiffer, star of Grease 2? Price-Pfister?, Pfizer?
                    --Don't just look for an 'F' next, what about 'topflight', helpful, campfire, leapfrog, stepfather
                    --Sorry, Mark Knopfler, I don't know how to help you
                    IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'H')

                OR ((@CurrentPosition = 1) AND 
(SUBSTRING(@Word,@CurrentPosition + 1,1) = 'F') AND 
(dbo.fnIsVowel(SUBSTRING(@Word,@CurrentPosition+2,1))=1))
                    BEGIN
                SET @MP1 = @MP1 + 'F'
                SET @MP2 = @MP2 + 'F'
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

                    --also account for "campbell", "raspberry"
                    ELSE 
            BEGIN
                IF (dbo.fnStringAt((@CurrentPosition + 1),@Word, 'P,B')=1)
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 2
                END
                        ELSE
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 1
                END
                SET @MP1 = @MP1 + 'P'
                SET @MP2 = @MP2 + 'P'
            END
        END

        ELSE IF @CurrentChar = 'Q'
        BEGIN
                    IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'Q')
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 2
            END
                    ELSE
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
            END
            SET @MP1 = @MP1 + 'K'
            SET @MP2 = @MP2 + 'K'
        END

        ELSE IF @CurrentChar = 'R'
        BEGIN
            --QQ: Will SQL short circuit eval? Otherwise, I could try to read before string begins here...
                    --french e.g. 'rogier', but exclude 'hochmeier'
                    IF ((@CurrentPosition = @WordLength)
                            AND (dbo.fnSlavoGermanic(@Word)=0)
                                    AND (dbo.fnStringAt((@CurrentPosition - 2), @Word, 'IE')=1) 
                                            AND (dbo.fnStringAt((@CurrentPosition - 4), @Word, 
'ME,MA')=0))
            BEGIN
                SET @MP1 = @MP1 + ''
                SET @MP2 = @MP2 + 'R'
            END
                    ELSE
            BEGIN
                SET @MP1 = @MP1 + 'R'
                SET @MP2 = @MP2 + 'R'
            END

                    IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'R')
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 2
            END
                    ELSE
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
            END
        END

        ELSE IF @CurrentChar = 'S'
        BEGIN
                    --special cases 'island', 'isle', 'carlisle', 'carlysle'
                    IF (dbo.fnStringAt((@CurrentPosition - 1), @Word, 'ISL,YSL')=1)
                    BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
                    END

                    --special case 'sugar-'
                    ELSE IF ((@CurrentPosition = 1) AND (dbo.fnStringAt(@CurrentPosition, 
@Word, 'SUGAR')=1))
                    BEGIN
                SET @MP1 = @MP1 + 'X'
                SET @MP2 = @MP2 + 'S'
                            SET @CurrentPosition = @CurrentPosition + 1
                    END

                    ELSE IF (dbo.fnStringAt(@CurrentPosition, @Word, 'SH')=1)
                    BEGIN
                            --germanic
                            IF (dbo.fnStringAt((@CurrentPosition + 1), @Word, 
'HEIM,HOEK,HOLM,HOLZ')=1)
                BEGIN
                    SET @MP1 = @MP1 + 'S'
                    SET @MP2 = @MP2 + 'S'
                END
                            ELSE
                BEGIN
                    SET @MP1 = @MP1 + 'X'
                    SET @MP2 = @MP2 + 'X'
                END
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

                    --italian & armenian
                    ELSE IF (dbo.fnStringAt(@CurrentPosition, @Word, 'SIO,SIA')=1) OR 
(dbo.fnStringAt(@CurrentPosition, @Word, 'SIAN')=1)
                    BEGIN
                            IF (dbo.fnSlavoGermanic(@Word)=0)
                BEGIN
                    SET @MP1 = @MP1 + 'S'
                    SET @MP2 = @MP2 + 'X'
                END
                            ELSE
                BEGIN
                    SET @MP1 = @MP1 + 'S'
                    SET @MP2 = @MP2 + 'S'
                END
                            SET @CurrentPosition = @CurrentPosition + 3
                    END

                    --german & anglicisations, e.g. 'smith' match 'schmidt', 'snider' match 'schneider'
                    --also, -sz- in slavic language altho in hungarian it is pronounced 's'
                    ELSE IF (((@CurrentPosition = 1) 
                                    AND (dbo.fnStringAt((@CurrentPosition + 1), @Word, 'M,N,L,W')=1))
                                            OR (dbo.fnStringAt((@CurrentPosition + 1), @Word, 'Z')=1))
                    BEGIN
                SET @MP1 = @MP1 + 'S'
                SET @MP2 = @MP2 + 'X'
                            IF (dbo.fnStringAt((@CurrentPosition + 1), @Word, 'Z')=1)
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 2
                END
                            ELSE
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 1
                            END
                    END

                    ELSE IF (dbo.fnStringAt(@CurrentPosition, @Word, 'SC')=1)
                    BEGIN
                            --Schlesinger's rule
                            IF (SUBSTRING(@Word,@CurrentPosition + 2,1) = 'H')
                BEGIN
                                    --dutch origin, e.g. 'school', 'schooner'
                                    IF (dbo.fnStringAt((@CurrentPosition + 3), @Word, 
'OO,ER,EN,UY,ED,EM')=1)
                                    BEGIN
                                            --'schermerhorn', 'schenker'
                                            IF (dbo.fnStringAt((@CurrentPosition + 3), @Word, 'ER,EN')=1)
                                            BEGIN
                            SET @MP1 = @MP1 + 'X'
                            SET @MP2 = @MP2 + 'SK'
                        END
                                            ELSE
                        BEGIN
                            SET @MP1 = @MP1 + 'SK'
                            SET @MP2 = @MP2 + 'SK'
                        END
                                    SET @CurrentPosition = @CurrentPosition + 3
                    END
                                    ELSE
                    BEGIN
                                            IF ((@CurrentPosition = 1) AND 
(dbo.fnIsVowel(SUBSTRING(@Word,3,1))=0) AND (SUBSTRING(@Word,3,1) <> 'W'))
                        BEGIN
                            SET @MP1 = @MP1 + 'X'
                            SET @MP2 = @MP2 + 'S'
                        END
                                            ELSE
                        BEGIN
                            SET @MP1 = @MP1 + 'X'
                            SET @MP2 = @MP2 + 'X'
                        END
                                    SET @CurrentPosition = @CurrentPosition + 3
                                    END
                END

                            ELSE IF (dbo.fnStringAt((@CurrentPosition + 2), @Word, 'I,E,Y')=1)
                            BEGIN
                    SET @MP1 = @MP1 + 'S'
                    SET @MP2 = @MP2 + 'S'
                                SET @CurrentPosition = @CurrentPosition + 3
                            END
                            ELSE
                BEGIN
                    SET @MP1 = @MP1 + 'SK'
                    SET @MP2 = @MP2 + 'SK'
                                SET @CurrentPosition = @CurrentPosition + 3
                            END
                    END

                    ELSE 
            BEGIN
                        --french e.g. 'resnais', 'artois'
                IF ((@CurrentPosition = @WordLength) AND 
(dbo.fnStringAt((@CurrentPosition - 2), @Word, 'AI,OI')=1))
                BEGIN
                    SET @MP1 = @MP1 + ''
                    SET @MP2 = @MP2 + 'S'
                END
                        ELSE
                BEGIN
                    SET @MP1 = @MP1 + 'S'
                    SET @MP2 = @MP2 + 'S'
                END

                        IF (dbo.fnStringAt((@CurrentPosition + 1), @Word, 'S,Z')=1)
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 2
                END
                        ELSE
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 1
                END
            END
        END

        ELSE IF @CurrentChar = 'T'
        BEGIN
                    IF (dbo.fnStringAt(@CurrentPosition, @Word, 'TION,TIA,TCH')=1)
                    BEGIN
                SET @MP1 = @MP1 + 'X'
                SET @MP2 = @MP2 + 'X'
                            SET @CurrentPosition = @CurrentPosition + 3
            END

                    ELSE IF (dbo.fnStringAt(@CurrentPosition, @Word, 'TH,TTH')=1) 
                    BEGIN
                            --special case 'thomas', 'thames' or germanic
                            IF (dbo.fnStringAt((@CurrentPosition + 2), @Word, 'OM,AM')=1) 
                                    OR (dbo.fnStringAt(1, @Word, 'VAN ,VON ,SCH')=1) 
                            BEGIN
                    SET @MP1 = @MP1 + 'T'
                    SET @MP2 = @MP2 + 'T'
                END
                            ELSE    
                BEGIN
                    SET @MP1 = @MP1 + '0'
                    SET @MP2 = @MP2 + 'T'
                            END
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

            ELSE
            BEGIN
                        IF (dbo.fnStringAt((@CurrentPosition + 1), @Word, 'T,D')=1)
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 2
                END
                        ELSE
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 1
                END
                SET @MP1 = @MP1 + 'T'
                SET @MP2 = @MP2 + 'T'
            END
        END

        ELSE IF @CurrentChar = 'V'
        BEGIN
                    IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'V')
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 2
            END
                    ELSE
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
            END
            SET @MP1 = @MP1 + 'F'
            SET @MP2 = @MP2 + 'F'
        END

        ELSE IF @CurrentChar = 'W'
        BEGIN
                    --can also be in middle of word
                    IF (dbo.fnStringAt(@CurrentPosition, @Word, 'WR')=1)
                    BEGIN
                SET @MP1 = @MP1 + 'R'
                SET @MP2 = @MP2 + 'R'
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

                    ELSE IF ((@CurrentPosition = 1) 
                            AND ((dbo.fnIsVowel(SUBSTRING(@Word,@CurrentPosition + 1,1))=1) 
OR (dbo.fnStringAt(@CurrentPosition, @Word, 'WH')=1)))
                    BEGIN
                            --Wasserman should match Vasserman
                            IF (dbo.fnIsVowel(SUBSTRING(@Word,@CurrentPosition + 1,1))=1)
                BEGIN
                    SET @MP1 = @MP1 + 'A'
                    SET @MP2 = @MP2 + 'F'
                END
                            ELSE
                BEGIN
                                    --need Uomo to match Womo
                    SET @MP1 = @MP1 + 'A'
                    SET @MP2 = @MP2 + 'A'
                END
                            SET @CurrentPosition = @CurrentPosition + 1
                    END

                    --Arnow should match Arnoff
                    ELSE IF (((@CurrentPosition = @WordLength) AND 
(dbo.fnIsVowel(SUBSTRING(@Word,@CurrentPosition - 1,1))=1)) 
                            OR (dbo.fnStringAt((@CurrentPosition - 1), @Word, 
'EWSKI,EWSKY,OWSKI,OWSKY')=1) 
                                            OR (dbo.fnStringAt(1, @Word, 'SCH')=1))
                    BEGIN
                SET @MP1 = @MP1 + ''
                SET @MP2 = @MP2 + 'F'
                            SET @CurrentPosition = @CurrentPosition + 1
            END

                    --polish e.g. 'filipowicz'
                    ELSE IF (dbo.fnStringAt(@CurrentPosition, @Word, 'WICZ,WITZ')=1)
                    BEGIN
                SET @MP1 = @MP1 + 'TS'
                SET @MP2 = @MP2 + 'FX'
                            SET @CurrentPosition = @CurrentPosition + 4
            END
    -- skip it
                    ELSE 
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
            END
        END

        ELSE IF @CurrentChar = 'X'
        BEGIN
                    --french e.g. breaux
                    IF (NOT((@CurrentPosition = @WordLength) 
                            AND ((dbo.fnStringAt((@CurrentPosition - 3), @Word, 'IAU,EAU')=1) 
                                            OR (dbo.fnStringAt((@CurrentPosition - 2), @Word, 
'AU,OU')=1))) )
            BEGIN
                SET @MP1 = @MP1 + 'KS'
                SET @MP2 = @MP2 + 'KS'
            END

                    IF (dbo.fnStringAt((@CurrentPosition + 1), @Word, 'C,X')=1)
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 2
            END
                    ELSE
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
            END
        END

        ELSE IF @CurrentChar = 'Z'
        BEGIN
                    --chinese pinyin e.g. 'zhao'
                    IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'H')
                    BEGIN
                SET @MP1 = @MP1 + 'J'
                SET @MP2 = @MP2 + 'J'
                            SET @CurrentPosition = @CurrentPosition + 2
            END
                    ELSE
            BEGIN
                            IF ((dbo.fnStringAt((@CurrentPosition + 1), @Word, 'ZO,ZI,ZA')=1) 
                                    OR ((dbo.fnSlavoGermanic(@Word)=1) AND ((@CurrentPosition > 
1) AND SUBSTRING(@Word,@CurrentPosition - 1,1) <> 'T')))
                            BEGIN
                    SET @MP1 = @MP1 + 'S'
                    SET @MP2 = @MP2 + 'TS'
                            END
                            ELSE
                BEGIN
                    SET @MP1 = @MP1 + 'S'
                    SET @MP2 = @MP2 + 'S'
                END
                        IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'Z')
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 2
                END
                        ELSE
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 1
                        END
            END
        END

            ELSE
        BEGIN
                        SET @CurrentPosition = @CurrentPosition + 1
        END
    END

        --only give back 4 char metaphone
        IF (LEN(@MP1) > 4)
    BEGIN
        SET @MP1 = LEFT(@MP1, 4)
    END
        IF (LEN(@MP2) > 4)
    BEGIN
        SET @MP2 = LEFT(@MP2, 4)
    END
    IF @MP2 = @MP1
    BEGIN
        SET @MP2 = ''
    END

    INSERT @DMP(Metaphone1,Metaphone2) VALUES( @MP1, @MP2 )
    RETURN
END
GO;
------------------------------------------------------------------------
IF OBJECT_ID('fnDoubleMetaphoneScalar') IS NOT NULL BEGIN DROP FUNCTION 
fnDoubleMetaphoneScalar END
GO;
CREATE FUNCTION fnDoubleMetaphoneScalar( @MetaphoneType int, @Word varchar(50) )
RETURNS char(4)
AS
BEGIN
        RETURN (SELECT CASE @MetaphoneType WHEN 1 THEN Metaphone1 
WHEN 2 THEN Metaphone2 END FROM fnDoubleMetaphoneTable( @Word ))
END

Red Filter 코드를 두 부분으로 붙여 링크 부패를 방지하다

참고 자료:
https://github.com/mb16/geocoderNet/blob/master/build/sql/doubleMetaphone.sql 파트 1:

--WEB LISTING 1: Double Metaphone Script

-------------------------------------
IF OBJECT_ID('fnIsVowel') IS NOT NULL BEGIN DROP FUNCTION fnIsVowel END
GO;
CREATE FUNCTION fnIsVowel( @c char(1) )
RETURNS bit
AS
BEGIN
    IF (@c = 'A') OR (@c = 'E') OR (@c = 'I') OR (@c = 'O') OR (@c = 'U') OR (@c = 'Y') 
    BEGIN
        RETURN 1
    END
    --'ELSE' would worry SQL Server, it wants RETURN last in a scalar function
    RETURN 0
END
GO;
-----------------------------------------------
IF OBJECT_ID('fnSlavoGermanic') IS NOT NULL BEGIN DROP FUNCTION fnSlavoGermanic 
END
GO;
CREATE FUNCTION fnSlavoGermanic( @Word char(50) )
RETURNS bit
AS
BEGIN
    --Catch NULL also...
    IF (CHARINDEX('W',@Word) > 0) OR (CHARINDEX('K',@Word) > 0) OR 
(CHARINDEX('CZ',@Word) > 0)
    --'WITZ' test is in original Lawrence Philips C++ code, but appears to be a subset of the first test for 'W'
    -- OR (CHARINDEX('WITZ',@Word) > 0)
    BEGIN
        RETURN 1
    END
    --ELSE
        RETURN 0
END
GO;
---------------------------------------------------------------------------------------------------------------------------------
----------------------
--Lawrence Philips calls for a length argument, but this has two drawbacks:
--1. All target strings must be of the same length
--2. It presents an opportunity for subtle bugs, ie fnStringAt( 1, 7, 'Search me please', 'Search' ) returns 0 (no matter what is in the searched string)
--So I've eliminated the argument and fnStringAt checks the length of each target as it executes

--DEFAULTS suck with UDFs. Have to specify DEFAULT in caller - why bother?
IF OBJECT_ID('fnStringAtDef') IS NOT NULL BEGIN DROP FUNCTION fnStringAtDef END
GO;
CREATE FUNCTION fnStringAtDef( @Start int, @StringToSearch varchar(50), 
    @Target1 varchar(50), 
    @Target2 varchar(50) = NULL,
    @Target3 varchar(50) = NULL,
    @Target4 varchar(50) = NULL,
    @Target5 varchar(50) = NULL,
    @Target6 varchar(50) = NULL )
RETURNS bit
AS
BEGIN
    IF CHARINDEX(@Target1,@StringToSearch,@Start) > 0 RETURN 1
    --2 Styles, test each optional argument for NULL, nesting further tests
    --or just take advantage of CHARINDEX behavior with a NULL arg (unless 65 compatibility - code check before CREATE FUNCTION?
    --Style 1:
    --IF @Target2 IS NOT NULL
    --BEGIN
    --  IF CHARINDEX(@Target2,@StringToSearch,@Start) > 0 RETURN 1
    -- (etc.)
    --END
    --Style 2:
    IF CHARINDEX(@Target2,@StringToSearch,@Start) > 0 RETURN 1
    IF CHARINDEX(@Target3,@StringToSearch,@Start) > 0 RETURN 1
    IF CHARINDEX(@Target4,@StringToSearch,@Start) > 0 RETURN 1
    IF CHARINDEX(@Target5,@StringToSearch,@Start) > 0 RETURN 1
    IF CHARINDEX(@Target6,@StringToSearch,@Start) > 0 RETURN 1
    RETURN 0
END
GO;
-------------------------------------------------------------------------------------------------
IF OBJECT_ID('fnStringAt') IS NOT NULL BEGIN DROP FUNCTION fnStringAt END
GO;
CREATE FUNCTION fnStringAt( @Start int, @StringToSearch varchar(50), @TargetStrings 
varchar(2000) )
RETURNS bit
AS
BEGIN
    DECLARE @SingleTarget varchar(50)
    DECLARE @CurrentStart int
    DECLARE @CurrentLength int

    --Eliminate special cases
    --Trailing space is needed to check for end of word in some cases, so always append comma
    --loop tests should fairly quickly ignore ',,' termination
    SET @TargetStrings = @TargetStrings + ','

    SET @CurrentStart = 1
    --Include terminating comma so spaces don't get truncated
    SET @CurrentLength = (CHARINDEX(',',@TargetStrings,@CurrentStart) - 
@CurrentStart) + 1
    SET @SingleTarget = SUBSTRING(@TargetStrings,@CurrentStart,@CurrentLength)
    WHILE LEN(@SingleTarget) > 1
    BEGIN
        IF SUBSTRING(@StringToSearch,@Start,LEN(@SingleTarget)-1) = 
LEFT(@SingleTarget,LEN(@SingleTarget)-1)
        BEGIN
            RETURN 1
        END
        SET @CurrentStart = (@CurrentStart + @CurrentLength)
        SET @CurrentLength = (CHARINDEX(',',@TargetStrings,@CurrentStart) - 
@CurrentStart) + 1
        IF NOT @CurrentLength > 1 --getting trailing comma 
        BEGIN
            BREAK
        END
        SET @SingleTarget = 
SUBSTRING(@TargetStrings,@CurrentStart,@CurrentLength)
    END
    RETURN 0
END
GO;
------------------------------------------------------------------------
IF OBJECT_ID('fnDoubleMetaphoneTable') IS NOT NULL BEGIN DROP FUNCTION 
fnDoubleMetaphoneTable END
GO;
CREATE FUNCTION fnDoubleMetaphoneTable( @Word varchar(50) )
RETURNS @DMP TABLE ( Metaphone1 char(4), Metaphone2 char(4) )
AS
BEGIN
    DECLARE @MP1 varchar(4), @MP2 varchar(4)
    SET @MP1 = ''
    SET @MP2 = ''
    DECLARE @CurrentPosition int, @WordLength int, @CurrentChar char(1)
    SET @CurrentPosition = 1
    SET @WordLength = LEN(@Word)

    IF @WordLength < 1 
    BEGIN
        RETURN
    END

    --ensure case insensitivity
    SET @Word = UPPER(@Word)

    IF dbo.fnStringAt(1, @Word, 'GN,KN,PN,WR,PS') = 1 
    BEGIN
        SET @CurrentPosition = @CurrentPosition + 1
    END

    IF 'X' = LEFT(@Word,1)
    BEGIN
        SET @MP1 = @MP1 + 'S'
        SET @MP2 = @MP2 + 'S'
        SET @CurrentPosition = @CurrentPosition + 1
    END

    WHILE (4 > LEN(RTRIM(@MP1))) OR (4 > LEN(RTRIM(@MP2)))
    BEGIN
        IF @CurrentPosition > @WordLength 
        BEGIN
            BREAK
        END

        SET @CurrentChar = SUBSTRING(@Word,@CurrentPosition,1)

        IF @CurrentChar IN('A','E','I','O','U','Y')
        BEGIN
            IF @CurrentPosition = 1 
            BEGIN
                SET @MP1 = @MP1 + 'A'
                SET @MP2 = @MP2 + 'A'
            END
            SET @CurrentPosition = @CurrentPosition + 1
        END
        ELSE IF @CurrentChar = 'B'
        BEGIN
            SET @MP1 = @MP1 + 'P'
            SET @MP2 = @MP2 + 'P'
            IF 'B' = SUBSTRING(@Word,@CurrentPosition + 1,1) 
            BEGIN
                SET @CurrentPosition = @CurrentPosition + 2
            END
            ELSE 
            BEGIN
                SET @CurrentPosition = @CurrentPosition + 1
            END
        END
        ELSE IF @CurrentChar = 'Ç'
        BEGIN
            SET @MP1 = @MP1 + 'S'
            SET @MP2 = @MP2 + 'S'
            SET @CurrentPosition = @CurrentPosition + 1
        END
        ELSE IF @CurrentChar = 'C'
        BEGIN
            --various germanic
            IF (@CurrentPosition > 2) 
               AND (dbo.fnIsVowel(SUBSTRING(@Word,@CurrentPosition-2,1))=0) 
               AND (dbo.fnStringAt(@CurrentPosition-1,@Word,'ACH') = 1) 
               AND ((SUBSTRING(@Word,@CurrentPosition+2,1) <> 'I') 
                AND ((SUBSTRING(@Word,@CurrentPosition+2,1) <> 'E') OR 
(dbo.fnStringAt(@CurrentPosition-2,@Word,'BACHER,MACHER')=1)))
            BEGIN
                SET @MP1 = @MP1 + 'K'
                SET @MP2 = @MP2 + 'K'
                SET @CurrentPosition = @CurrentPosition + 2
            END
            -- 'caesar'
            ELSE IF (@CurrentPosition = 1) AND 
(dbo.fnStringAt(@CurrentPosition,@Word,'CAESAR') = 1)
            BEGIN
                SET @MP1 = @MP1 + 'S'
                SET @MP2 = @MP2 + 'S'
                SET @CurrentPosition = @CurrentPosition + 2
            END
            -- 'chianti'
            ELSE IF dbo.fnStringAt(@CurrentPosition,@Word,'CHIA') = 1
            BEGIN
                SET @MP1 = @MP1 + 'K'
                SET @MP2 = @MP2 + 'K'
                SET @CurrentPosition = @CurrentPosition + 2
            END
            ELSE IF dbo.fnStringAt(@CurrentPosition,@Word,'CH') = 1
            BEGIN
                -- Find 'michael'
                IF (@CurrentPosition > 1) AND 
(dbo.fnStringAt(@CurrentPosition,@Word,'CHAE') = 1)
                BEGIN
                    --First instance of alternate encoding
                    SET @MP1 = @MP1 + 'K'
                    SET @MP2 = @MP2 + 'X'
                    SET @CurrentPosition = @CurrentPosition + 2
                END
                --greek roots e.g. 'chemistry', 'chorus'
                ELSE IF (@CurrentPosition = 1) AND (dbo.fnStringAt(2, @Word, 
'HARAC,HARIS,HOR,HYM,HIA,HEM') = 1) AND (dbo.fnStringAt(1,@Word,'CHORE') = 0)
                BEGIN
                    SET @MP1 = @MP1 + 'K'
                    SET @MP2 = @MP2 + 'K'
                    SET @CurrentPosition = @CurrentPosition + 2
                END
                --germanic, greek, or otherwise 'ch' for 'kh' sound
                ELSE IF ((dbo.fnStringAt(1,@Word,'VAN ,VON ,SCH')=1) OR 
                   (dbo.fnStringAt(@CurrentPosition-
2,@Word,'ORCHES,ARCHIT,ORCHID')=1) OR 
                   (dbo.fnStringAt(@CurrentPosition+2,@Word,'T,S')=1) OR 
                   (((dbo.fnStringAt(@CurrentPosition-1,@Word,'A,O,U,E')=1) 
OR 
                   (@CurrentPosition = 1))  
                   AND 
(dbo.fnStringAt(@CurrentPosition+2,@Word,'L,R,N,M,B,H,F,V,W, ')=1)))
                BEGIN
                    SET @MP1 = @MP1 + 'K'
                    SET @MP2 = @MP2 + 'K'
                    SET @CurrentPosition = @CurrentPosition + 2
                END
                ELSE
                BEGIN
                    --is this a given?
                    IF (@CurrentPosition > 1)   
                    BEGIN
                        IF (dbo.fnStringAt(1,@Word,'MC') = 1)
                        BEGIN
                            --eg McHugh
                            SET @MP1 = @MP1 + 'K'
                            SET @MP2 = @MP2 + 'K'
                        END
                        ELSE
                        BEGIN
                            --Alternate encoding
                            SET @MP1 = @MP1 + 'X'
                            SET @MP2 = @MP2 + 'K'
                        END
                    END
                    ELSE
                    BEGIN
                        SET @MP1 = @MP1 + 'X'
                        SET @MP2 = @MP2 + 'X'
                    END
                    SET @CurrentPosition = @CurrentPosition + 2
                END
            END
                    --e.g, 'czerny'
                    ELSE IF (dbo.fnStringAt(@CurrentPosition,@Word,'CZ')=1) AND 
(dbo.fnStringAt((@CurrentPosition - 2),@Word,'WICZ')=0)
                    BEGIN
                SET @MP1 = @MP1 + 'S'
                SET @MP2 = @MP2 + 'X'
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

                    --e.g., 'focaccia'
                    ELSE IF(dbo.fnStringAt((@CurrentPosition + 1),@Word,'CIA')=1)
                    BEGIN
                SET @MP1 = @MP1 + 'X'
                SET @MP2 = @MP2 + 'X'
                            SET @CurrentPosition = @CurrentPosition + 3
                    END

                    --double 'C', but not if e.g. 'McClellan'
                    ELSE IF(dbo.fnStringAt(@CurrentPosition,@Word,'CC')=1) AND NOT 
((@CurrentPosition = 2) AND (LEFT(@Word,1) = 'M'))
                            --'bellocchio' but not 'bacchus'
                            IF (dbo.fnStringAt((@CurrentPosition + 2),@Word,'I,E,H')=1) AND 
(dbo.fnStringAt((@CurrentPosition + 2),@Word,'HU')=0)
                            BEGIN
                                    --'accident', 'accede' 'succeed'
                                    IF (((@CurrentPosition = 2) AND 
(SUBSTRING(@Word,@CurrentPosition - 1,1) = 'A')) 
                                                    OR (dbo.fnStringAt((@CurrentPosition - 
1),@Word,'UCCEE,UCCES')=1))
                    BEGIN
                        SET @MP1 = @MP1 + 'KS'
                        SET @MP2 = @MP2 + 'KS'
                    END
                                    --'bacci', 'bertucci', other italian
                                    ELSE
                    BEGIN
                        SET @MP1 = @MP1 + 'X'
                        SET @MP2 = @MP2 + 'X'
                    END
                                SET @CurrentPosition = @CurrentPosition + 3
                END
                            --Pierce's rule
                            ELSE 
                BEGIN
                    SET @MP1 = @MP1 + 'K'
                    SET @MP2 = @MP2 + 'K'
                                SET @CurrentPosition = @CurrentPosition + 2
                            END

                    ELSE IF (dbo.fnStringAt(@CurrentPosition,@Word,'CK,CG,CQ')=1)
                    BEGIN
                SET @MP1 = @MP1 + 'K'
                SET @MP2 = @MP2 + 'K'
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

                    ELSE IF (dbo.fnStringAt(@CurrentPosition,@Word,'CI,CE,CY')=1)
                    BEGIN
                            --italian vs. english
                            IF (dbo.fnStringAt(@CurrentPosition,@Word,'CIO,CIE,CIA')=1)
                BEGIN
                    SET @MP1 = @MP1 + 'S'
                    SET @MP2 = @MP2 + 'X'
                END
                            ELSE
                BEGIN
                    SET @MP1 = @MP1 + 'S'
                    SET @MP2 = @MP2 + 'S'
                END
                            SET @CurrentPosition = @CurrentPosition + 2
                    END

                    ELSE
            BEGIN
                SET @MP1 = @MP1 + 'K'
                SET @MP2 = @MP2 + 'K'

                        --name sent in 'mac caffrey', 'mac gregor
                        IF (dbo.fnStringAt((@CurrentPosition + 1),@Word,' C, Q, G')=1)
                BEGIN
                                SET @CurrentPosition = @CurrentPosition + 3
                END
                        ELSE
                BEGIN
                                IF (dbo.fnStringAt((@CurrentPosition + 1),@Word,'C,K,Q')=1)
                                        AND (dbo.fnStringAt((@CurrentPosition + 1), 2, 'CE,CI')=0)
                    BEGIN
                                    SET @CurrentPosition = @CurrentPosition + 2
                    END
                                ELSE
                    BEGIN
                                    SET @CurrentPosition = @CurrentPosition + 1
                    END
                        END
            END

        END
        ELSE IF @CurrentChar = 'D'
        BEGIN
                    IF (dbo.fnStringAt(@CurrentPosition, @Word, 'DG')=1)
            BEGIN
                            IF (dbo.fnStringAt((@CurrentPosition + 2),@Word,'I,E,Y')=1)
                            BEGIN
                                    --e.g. 'edge'
                    SET @MP1 = @MP1 + 'J'
                    SET @MP2 = @MP2 + 'J'
                                SET @CurrentPosition = @CurrentPosition + 3
                            END
                            ELSE
                BEGIN
                                    --e.g. 'edgar'
                    SET @MP1 = @MP1 + 'TK'
                    SET @MP2 = @MP2 + 'TK'
                                SET @CurrentPosition = @CurrentPosition + 2
                            END
            END
                    ELSE IF (dbo.fnStringAt(@CurrentPosition,@Word,'DT,DD')=1)
                    BEGIN
                SET @MP1 = @MP1 + 'T'
                SET @MP2 = @MP2 + 'T'
                            SET @CurrentPosition = @CurrentPosition + 2
                    END
                    ELSE
            BEGIN
                SET @MP1 = @MP1 + 'T'
                SET @MP2 = @MP2 + 'T'
                            SET @CurrentPosition = @CurrentPosition + 1
            END
        END

        ELSE IF @CurrentChar = 'F'
        BEGIN
                    IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'F')
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 2
            END
                    ELSE
            BEGIN
                            SET @CurrentPosition = @CurrentPosition + 1
            END
            SET @MP1 = @MP1 + 'F'
            SET @MP2 = @MP2 + 'F'
        END

        ELSE IF @CurrentChar = 'G'
        BEGIN
                    IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'H')
                    BEGIN
                            IF (@CurrentPosition > 1) AND 
(dbo.fnIsVowel(SUBSTRING(@Word,@CurrentPosition - 1,1)) = 0)
                            BEGIN
                    SET @MP1 = @MP1 + 'K'
                    SET @MP2 = @MP2 + 'K'
                                SET @CurrentPosition = @CurrentPosition + 2
                            END
                                --'ghislane', ghiradelli
                                ELSE IF (@CurrentPosition = 1)
                                BEGIN 
                                        IF (SUBSTRING(@Word,@CurrentPosition + 2,1) = 'I')
                    BEGIN
                        SET @MP1 = @MP1 + 'J'
                        SET @MP2 = @MP2 + 'J'
                    END
                                        ELSE
                    BEGIN
                        SET @MP1 = @MP1 + 'K'
                        SET @MP2 = @MP2 + 'K'
                    END
                                SET @CurrentPosition = @CurrentPosition + 2
                            END
                            --Parker's rule (with some further refinements) - e.g., 'hugh'
                            ELSE IF (((@CurrentPosition > 2) AND (dbo.fnStringAt((@CurrentPosition 
- 2),@Word,'B,H,D')=1) )
                                    --e.g., 'bough'
                                    OR ((@CurrentPosition > 3) AND (dbo.fnStringAt((@CurrentPosition 
- 3),@Word,'B,H,D')=1) )
                                    --e.g., 'broughton'
                                    OR ((@CurrentPosition > 4) AND (dbo.fnStringAt((@CurrentPosition 
- 4),@Word,'B,H')=1) ) )
                            BEGIN
                                SET @CurrentPosition = @CurrentPosition + 2
                END
                            ELSE
                BEGIN
                                    --e.g., 'laugh', 'McLaughlin', 'cough', 'gough', 'rough', 'tough'
                                    IF ((@CurrentPosition > 3) 
                                            AND (SUBSTRING(@Word,@CurrentPosition - 1,1) = 'U') 
                                            AND (dbo.fnStringAt((@CurrentPosition - 
3),@Word,'C,G,L,R,T')=1) )
                                    BEGIN
                        SET @MP1 = @MP1 + 'F'
                        SET @MP2 = @MP2 + 'F'
                    END
                                    ELSE
                    BEGIN
                                            IF ((@CurrentPosition > 1) AND 
SUBSTRING(@Word,@CurrentPosition - 1,1) <> 'I')
                        BEGIN
                            SET @MP1 = @MP1 + 'K'
                            SET @MP2 = @MP2 + 'K'
                        END
                    END

                                SET @CurrentPosition = @CurrentPosition + 2
                            END
                    END

                    ELSE IF (SUBSTRING(@Word,@CurrentPosition + 1,1) = 'N')
                    BEGIN
                            IF ((@CurrentPosition = 2) AND (dbo.fnIsVowel(LEFT(@Word,1))=1) AND 
(dbo.fnSlavoGermanic(@Word)=0))
                            BEGIN
                    SET @MP1 = @MP1 + 'KN'
                    SET @MP2 = @MP2 + 'N'
                END
                            ELSE
                BEGIN
                                    --not e.g. 'cagney'
                                    IF ((dbo.fnStringAt((@CurrentPosition + 2),@Word,'EY')=0) 
                                                    AND (SUBSTRING(@Word,@CurrentPosition + 1,1) <> 
'Y') AND (dbo.fnSlavoGermanic(@Word)=0))

SQL Server에서 SOUNDEX 및 관련 Difference 기능을 사용하여 유사한 이름을 찾을 수 있습니다.MSDN에 대한 참조는 다음과 같습니다.

중복 제거에 관해서는 스트링이 분할되어 일치하는 것이 첫 번째 컷입니다.워크로드 감소 및/또는 더 나은 결과를 얻기 위해 활용할 수 있는 데이터에 대해 알려진 항목이 있는 경우 항상 이러한 항목을 활용하는 것이 좋습니다.대부분의 경우 중복 제거를 위해 수동 작업을 완전히 제거하는 것은 불가능하지만, 가능한 한 많은 정보를 자동으로 포착한 후 "불확실성 사례" 보고서를 생성하면 훨씬 쉽게 작업을 수행할 수 있습니다.

이름 매칭에 대해서: SOUNDEX는 매칭 품질에 좋지 않고, 특히 목표와 너무 멀리 떨어져 있기 때문에 작업 타입에 좋지 않습니다.이중 메타포네 결과와 Levenshtein 거리를 조합하여 이름 매칭을 수행하는 것이 좋습니다.적절한 바이어스를 사용하면 이 기능은 매우 잘 작동하며, 알려진 항목을 정리한 후 두 번째 패스로 사용할 수 있습니다.

SSIS 패키지를 사용하여 Fuzzy Lookup and Grouping 변환(http://msdn.microsoft.com/en-us/library/ms345128(SQL.90).aspx)을 참조하는 것도 고려할 수 있습니다.

SQL 전체 텍스트 검색(http://msdn.microsoft.com/en-us/library/cc879300.aspx)을 사용하는 것도 가능하지만 특정 문제 영역에는 적합하지 않을 수 있습니다.

이렇게 하다

         create table person(
         personid int identity(1,1) primary key,
         firstname varchar(20),
         lastname varchar(20),
         addressindex int,
         sound varchar(10)
         )

나중에 트리거를 만듭니다.

         create trigger trigoninsert for dbo.person
         on insert 
         as
         declare @personid int;
         select @personid=personid from inserted;
         update person
         set sound=soundex(firstname) where personid=@personid;

이제 제가 할 수 있는 것은 이렇게 생긴 프로시저를 만드는 것입니다.

         create procedure getfuzzi(@personid int)
          as
         declare @sound varchar(10);
         set @sound=(select sound from person where personid=@personid;
         select personid,firstname,lastname,addressindex from person
         where sound=@sound

특정 개인 ID에 대해 에서 제공한 이름과 거의 일치하는 모든 이름이 반환됩니다.

언급URL : https://stackoverflow.com/questions/921978/fuzzy-matching-using-t-sql

반응형