Handling Data from Dun & Bradstreet
D&B can be very useful when determining the creditworthiness of a client, and it assigns each unique business entity a nine-digit number (the DUNS number) for tracking purposes. Knitting this data into an extract can be useful in identifying duplicates and parent-child relationships in Customer Masters and Vendor Masters.
Some of the D&B data comes from self-reporting, so it is by no means perfect. The flaws in the data are far outweighed by the simplicity it affords when it comes time to disambiguate source data against itself and against what is already in the production system.
Because the data comes in the form of a CSV file and inevitably passes through the hands of a finance person before reaching the Data Migration team, it is often corrupted by Excel, which seems to hate text fields. The interface delivers the data to the finance user as a CSV file, which opens automatically in Excel and is then saved either as an Excel file or as a CSV that Excel has already corrupted: all of the leading zeros in numeric values are stripped off, and names and addresses containing commas are broken across multiple cells.
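The leading-zero damage is easy to reproduce. A minimal Python sketch (the DUNS value here is made up for illustration):

```python
duns = "001234567"             # a nine-digit DUNS number as it arrives in the CSV
corrupted = str(int(duns))     # what a numeric import in Excel effectively does
assert corrupted == "1234567"  # the leading zeros are silently dropped

# Because DUNS numbers are fixed at nine digits, zero-padding can repair them;
# names and addresses split across cells have no such easy fix.
repaired = corrupted.zfill(9)
assert repaired == duns
```

The padding trick only works because the field is fixed-width; it is a salvage operation, not a substitute for keeping the file clean in the first place.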
The way around this is to insist that the CSV reach the Data Migration team unmodified. Some hand-holding may be required, so plan on sitting with the finance user who handles the data from D&B.
Once the data arrives, open it in Notepad. It will look like plain comma-delimited text, and the leading zeros will still be intact.
Next, copy and paste all of it into Excel. Verify that each row of the source data occupies just one cell, then click the "Text to Columns" button on the Data tab.
The details on the next three screens of the Text to Columns wizard are very important; in particular, any column holding the nine-digit D&B number must be given the Text column data format in the final step, or Excel will strip its leading zeros all over again. Follow these steps closely to ensure that the data is not corrupted.
The D&B data is now in Excel with all of the leading zeros intact and no scientific notation. The first five lines are just notes from the system that generated the report, and they can be thrown out.
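The same cleanup can be scripted if the file bypasses Excel entirely. A hedged sketch using Python's standard csv module, with invented column names and note lines standing in for the real D&B layout:

```python
import csv
import io

# Hypothetical stand-in for the D&B file: five report notes, then the data.
sample = "\n".join(
    ["note line %d" % i for i in range(1, 6)]
    + ['DUNS,NAME,CITY', '001234567,"Acme, Inc.",Springfield']
)

rows = list(csv.reader(io.StringIO(sample)))[5:]  # throw out the five notes
header, data = rows[0], rows[1:]

assert data[0][0] == "001234567"   # every field stays text: zeros survive
assert data[0][1] == "Acme, Inc."  # quoted commas stay in a single field
```

Reading every field as text is the whole point: nothing in this path ever decides a DUNS number is numeric.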
All that remains to be done at this point is to knit the data into the LFA1 extract with a simple VLOOKUP function. Finding the duplicates is easier when the data is sorted; it is recommended to eye the data all the way through, sorted first by NAME1, then by STRAS, then by PSTLZ, and then by the D&B number, to ensure that all of the duplicates are identified and rolled up into a single Master Data record.
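For teams working outside Excel, the VLOOKUP step is just a keyed lookup. A minimal sketch with made-up vendor rows (LIFNR is the SAP vendor number field in LFA1; the DUNS values are invented):

```python
# Hypothetical LFA1 extract rows and a D&B lookup keyed by vendor number.
lfa1 = [
    {"LIFNR": "0000100001", "NAME1": "Acme Inc"},
    {"LIFNR": "0000100002", "NAME1": "Globex"},
]
dnb = {"0000100001": "001234567"}  # vendor number -> DUNS

# The VLOOKUP equivalent: enrich each row, leaving a blank when D&B has no match.
for row in lfa1:
    row["DUNS"] = dnb.get(row["LIFNR"], "")

# The duplicate sweep is easier on sorted data; the real key would also
# include STRAS and PSTLZ, omitted from these sample rows.
ordered = sorted(lfa1, key=lambda r: (r["NAME1"], r["DUNS"]))
```

Rows sharing a DUNS number after this join are the prime candidates for rolling up into a single Master Data record.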