Categorical Data preprocessing for Data mining
I have been working on the Tanzania wells state problem, using the Taarifa data obtained from DrivenData, for my ML practice, and I am now trying to remove misspellings in the installer and funder columns. If anyone has tried this, I would appreciate help on how to go about it.
For the moment I am using regular expressions, but there is a lot of data and this approach is taking a long time.
For instance, when trying to correct the variants of "world bank", I tried this expression, which is still failing.
Here I was testing the expression in Atom, but it fails to correctly replace the selected words.
However, I am still wondering if there could be another "faster" way of approaching the issue!
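One direction that might be faster than hand-written regular expressions is to clean each distinct value only once and then use fuzzy string matching against a short list of canonical names. The sketch below uses only Python's standard library (difflib and re). The canonical list, the sample values, the helper names normalize and build_mapping, and the 0.8 cutoff are all assumptions made for illustration, not values taken from the Taarifa data.

```python
import difflib
import re

# Hypothetical canonical names; a real list would come from inspecting
# the most frequent values in the installer and funder columns.
CANONICAL = ["world bank", "government", "danida", "unicef"]


def normalize(name):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", str(name).lower()).strip()


def build_mapping(values, canonical=CANONICAL, cutoff=0.8):
    """Map each distinct raw value to its closest canonical name.

    Values with no canonical name above the similarity cutoff are kept
    in their normalized form so they can be reviewed by hand.
    """
    mapping = {}
    for raw in set(values):  # each distinct spelling is matched only once
        cleaned = normalize(raw)
        match = difflib.get_close_matches(cleaned, canonical, n=1, cutoff=cutoff)
        mapping[raw] = match[0] if match else cleaned
    return mapping


# Made-up sample values standing in for the installer column.
installer = ["World Bank", "world banc", "WORLD-BANK", "Wold Bank",
             "DANIDA", "Danid", "Private"]
mapping = build_mapping(installer)
cleaned = [mapping[value] for value in installer]
print(cleaned)
```

Because the matching runs over distinct values rather than every row, the cost depends on the number of different spellings and not on the size of the dataset. If the data is in a pandas DataFrame, the resulting dictionary could be applied with something like df["installer"].map(mapping). The cutoff would need tuning, since a value that is too low can merge names that are genuinely different.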