De-duplicate records in an Access table using VBA
Performing transformations on legacy data often entails removing duplicates from legacy data, especially when working with data from multiple legacy systems. The access feature Totals, often denoted by the Greek symbol Sigma (Σ), can be used for this, but there is a subtle difference between identifying (or counting) unique records and removing duplicates. Also, creating an Access query marries it to a single table with a fixed format; each table in need of de-duplication requires its own query.
This VBA function can be used to identify and delete duplicate records. It works by forming a simple SQL query that selects all the records in a table ordered by every field name and then deleting any record that is identical to the preceding record. Due to limitations in Access, tables with too many fields may cause an error because of the requisite complexity of the SQL query that sorts on every field.
Another limitation of the Windows operating system is the number of connections that can be opened into a single file at any time. It is adjustable in the registry, but a simpler workaround is to close the recordset and execute the query again in order to close some open connections into the file and then to adjust the status bar accordingly.
Function RemoveTableDuplicates(strTableName As String) As Boolean Dim dbs As DAO.Database Dim rs As DAO.Recordset Dim nCurrent As Long, nFieldCount As Long, nRecordCount As Long Dim RetVal As Variant, nCurRec As Long, dnow As Date, nCurSec As Long Dim nTotalSeconds As Long, nSecondsLeft As Long Dim strTest As String, strLastRecord As String, strThisRecord As String Dim strSQL As String, nTotalDeleted As Long Set rs = CurrentDb.OpenRecordset(strTableName) nFieldCount = rs.Fields.Count Rem **** Build Query **** strSQL = "SELECT * FROM " & strTableName & " ORDER BY " For nCurrent = 0 To rs.Fields.Count - 1 strSQL = strSQL & rs.Fields(nCurrent).Name If nCurrent < rs.Fields.Count - 1 Then strSQL = strSQL & ", " End If Next strSQL = strSQL & ";" rs.Close Set rs = CurrentDb.OpenRecordset(strSQL) nRecordCount = rs.RecordCount RetVal = SysCmd(acSysCmdInitMeter, "Removing duplicates from " & _ strTableName & " . . .", nRecordCount) Do While Not rs.EOF nCurRec = nCurRec + 1 If Second(Now()) <> nCurSec And nCurRec <> rs.RecordCount Then nCurSec = Second(Now()) RetVal = SysCmd(acSysCmdUpdateMeter, nCurRec) RetVal = DoEvents() End If strThisRecord = "" For nCurrent = 0 To rs.Fields.Count - 1 strThisRecord = strThisRecord & rs.Fields(nCurrent).Value Next 'The line of code below was changed to ensure a case-sensitive comparison. If StrComp(strThisRecord, strLastRecord, 0) = 0 Then rs.Delete nTotalDeleted = nTotalDeleted + 1 'Avoid the "File sharing lock count exceeded." error. If nTotalDeleted / 1000 = Int(nTotalDeleted / 1000) Then rs.Close Set rs = CurrentDb.OpenRecordset(strSQL) rs.MoveFirst 'Technically starting over, but not completely. nCurRec = nRecordCount - rs.RecordCount End If End If strLastRecord = strThisRecord rs.MoveNext Loop rs.Close RemoveTableDuplicates = True RetVal = SysCmd(acSysCmdRemoveMeter) End Function