Also add tests for race and gender categorization
instead of hard coded values. Produce a clean csv and a csv with all the removed values and columns with reason for removal. Add script for running cleaning for each project