De-duplicate records in an Access table using VBA

Jimbo's picture

Performing transformations on legacy data often entails removing duplicates from legacy data, especially when working with data from multiple legacy systems. The access feature Totals, often denoted by the Greek symbol Sigma (Σ), can be used for this, but there is a subtle difference between identifying (or counting) unique records and removing duplicates. Also, creating an Access query marries it to a single table with a fixed format; each table in need of de-duplication requires its own query.

This VBA function can be used to identify and delete duplicate records. It works by forming a simple SQL query that selects all the records in a table ordered by every field name and then deleting any record that is identical to the preceding record. Due to limitations in Access, tables with too many fields may cause an error because of the requisite complexity of the SQL query that sorts on every field.

Another limitation of the Windows operating system is the number of connections that can be opened into a single file at any time. It is adjustable in the registry, but a simpler workaround is to close the recordset and execute the query again in order to close some open connections into the file and then to adjust the status bar accordingly.

Function RemoveTableDuplicates(strTableName As String) As Boolean

    Dim dbs As DAO.Database
    Dim rs As DAO.Recordset
    Dim nCurrent As Long, nFieldCount As Long, nRecordCount As Long
    Dim RetVal As Variant, nCurRec As Long, dnow As Date, nCurSec As Long
    Dim nTotalSeconds As Long, nSecondsLeft As Long
    Dim strTest As String, strLastRecord As String, strThisRecord As String
    Dim strSQL As String, nTotalDeleted As Long
    
    Set rs = CurrentDb.OpenRecordset(strTableName)
    nFieldCount = rs.Fields.Count
    
    Rem **** Build Query ****
    strSQL = "SELECT * FROM " & strTableName & " ORDER BY "
    For nCurrent = 0 To rs.Fields.Count - 1
        strSQL = strSQL & rs.Fields(nCurrent).Name
        If nCurrent < rs.Fields.Count - 1 Then
            strSQL = strSQL & ", "
        End If
    Next
    strSQL = strSQL & ";"
    rs.Close
    Set rs = CurrentDb.OpenRecordset(strSQL)
    
    nRecordCount = rs.RecordCount

    RetVal = SysCmd(acSysCmdInitMeter, "Removing duplicates from " & _
     strTableName & " . . .", nRecordCount)
    Do While Not rs.EOF
        nCurRec = nCurRec + 1
        If Second(Now()) <> nCurSec And nCurRec <> rs.RecordCount Then
            nCurSec = Second(Now())
            RetVal = SysCmd(acSysCmdUpdateMeter, nCurRec)
            RetVal = DoEvents()
        End If
        
        strThisRecord = ""
        For nCurrent = 0 To rs.Fields.Count - 1
            strThisRecord = strThisRecord & rs.Fields(nCurrent).Value
        Next
        If strThisRecord = strLastRecord Then
            rs.Delete
            nTotalDeleted = nTotalDeleted + 1
            'Check to ensure there aren't too many connections to the file.
            If Right(nTotalDeleted, 3) = "000" Then
                'Close and reopen the query to close connections to the file.
                rs.Close
                Set rs = CurrentDb.OpenRecordset(strSQL)
                'Adjust the progress bar so that it reflects current status.
                nRecordCount = nRecordCount + nCurrentRecord
                RetVal = SysCmd(acSysCmdRemoveMeter)
                RetVal = SysCmd(acSysCmdInitMeter, "Removing duplicates from " & _
                 strTableName & " . . .", nRecordCount)
            End If
        End If
        strLastRecord = strThisRecord
        rs.MoveNext
    Loop
    rs.Close
    RemoveTableDuplicates = True
    RetVal = SysCmd(acSysCmdRemoveMeter)
End Function
Programming Language: 
VBA