Data cleaning and preparation still eats up nearly half the workload of data scientists, according to Anaconda’s new survey
The hassles of data intake and cleaning, problems with biased models and data privacy, and difficulty finding experience and technical skills—all these ranked among the biggest challenges facing data scientists and software engineers in data-science disciplines according to a newly released survey.
Anaconda, makers of the Python distribution of the same name for scientific computing applications, conducted its 2020 State Of Data Science survey with 2,360 respondents from 100 countries, slightly fewer than half of them hailing from the U.S.
Despite all the advances in recent years in data science work environments, data drudgery remains a major part of the data scientist’s workday. According to self-reported estimates by the respondents, data loading and cleaning took up 19% and 26% of their time, respectively—almost half of the total. Model selection, training/scoring, and deployment took up about 34% total (around 11% for each of those tasks individually).
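To make the arithmetic behind those figures concrete, here is a minimal sketch that tallies the percentages quoted above. The variable names are illustrative, and the "everything else" figure rests on the assumption that the self-reported shares add up to 100%, which the article does not state.

```python
# Self-reported shares of working time, in percent, as quoted in the article.
data_prep = {"data loading": 19, "data cleaning": 26}
modeling_total = 34  # model selection, training/scoring, and deployment combined

# Loading plus cleaning: the "almost half" figure.
prep_total = sum(data_prep.values())

# Assumption (not stated in the article): all shares sum to 100%,
# so the remainder covers tasks that are not itemized here.
remaining = 100 - prep_total - modeling_total

print(f"Data preparation: {prep_total}%")
print(f"Modeling and deployment: {modeling_total}%")
print(f"Everything else (assumed): {remaining}%")
```

Under that assumption, data preparation alone outweighs the three modeling tasks combined.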
When it came to moving data science work into production, the biggest overall obstacle—for data scientists, developers, and sysadmins alike—was meeting IT security standards for their organization. At least some of that is in line with the difficulty of deploying any new app at scale, but the lifecycles for machine learning and data science apps pose their own challenges, like keeping multiple open source application stacks patched against vulnerabilities.
Another issue cited by the respondents was the gap between the skills taught in institutions and the skills needed in enterprise settings. Most universities offer classes in statistics, machine learning theory, and Python programming, and most students load up on such courses. But enterprises find themselves most in need of data management skills that are taught only rarely or not at all, and advanced math skills that students don’t often develop. Students themselves felt that a lack of experience (40%) and a lack of technical skills (26%) were the biggest barriers to jobs in the field, shortcomings that (according to Anaconda) could be better addressed by strong internship programs that “go beyond providing a résumé enhancement and hands-on-keyboard technical skills.”