
Finding duplicate column values in a CSV

I’m importing a CSV that has 3 columns; one of these columns could contain duplicate values.

I have 2 things to check:

1. NAME is valid
2. ID is unique

So far, I’m parsing the CSV file once and checking condition 1 (NAME is valid); if that check fails, the code simply breaks out of the while loop and stops.

I guess the question is: how would I check that ID is unique?

I have fields like the following:


This would output something like 'Duplicate ID on line 3'.

Thanks

P.S. The real CSV file has more columns and can have around 100,000 records. I have simplified it here to focus on the duplicate column/field problem.



Answer

I assumed a certain design and stripped out the CSV part, but the idea remains the same:


100,000 rows isn’t that many, so this will be enough. (It ran in 3 seconds on my machine.)

EDIT: As pointed out, in_array is less efficient than a key lookup, so I’ve updated my code accordingly.
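
The key-lookup idea can be sketched roughly as follows. This is a minimal illustration in PHP, not a definitive implementation: the function name `findDuplicateIdLine`, the sample rows, and the assumption that the ID sits in the first column are all hypothetical. Each ID is stored as an array key, so the check uses `isset` (a hash lookup) instead of `in_array` (which scans the whole array on every call).

```php
<?php
// Hypothetical sample rows standing in for parsed CSV lines: [ID, NAME].
$rows = [
    ['1', 'alice'],
    ['2', 'bob'],
    ['1', 'carol'], // repeats the ID from line 1
];

/**
 * Return the 1-based line number of the first repeated ID, or null if none.
 * Assumes every row has a value at index $idColumn.
 */
function findDuplicateIdLine(iterable $rows, int $idColumn = 0): ?int
{
    $seen = []; // IDs encountered so far, stored as array keys
    $line = 0;
    foreach ($rows as $row) {
        $line++;
        $id = $row[$idColumn];
        if (isset($seen[$id])) { // key lookup, no scan of earlier IDs
            return $line;
        }
        $seen[$id] = true;
    }
    return null;
}

$line = findDuplicateIdLine($rows);
echo $line === null ? "No duplicate IDs\n" : "Duplicate ID on line $line\n";
```

With the sample rows above this prints 'Duplicate ID on line 3'. Because the function accepts any iterable, the same check should work on rows read one at a time with `fgetcsv`, so the whole file never has to be held in memory; only the set of IDs seen so far is kept.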

User contributions licensed under: CC BY-SA