Success stories about the benefits of machine learning and data science prompt companies from all industries to adopt some level of automated data analysis for themselves. In doing so, they often encounter problems early on or, even worse, late in the process. In this article I will give you an overview of the main pillars of Data Science readiness, as well as how you can achieve them.

Data Availability

No matter the method or goal, the basis for any machine learning endeavor is data. Any machine learning expert (or hobbyist [...]
About Max Uppenkamp
Max Uppenkamp has been a Data Scientist at INFORM since 2019. After previously working in Natural Language Processing and Text Mining, he is now engaged in the machine-learning-supported optimization of processes. In addition to accompanying customer projects, he translates the knowledge gained into practice-oriented products and solutions.
Data quality is an innocuous term. On first encounter, it usually evokes big tables filled with numbers (some of them erroneous), math, and complex statistics. The consequences, however, can be very real. In my previous article “Data Cleaning: Pitfalls and solutions” I shed light on some of the shapes data quality issues can take. I also talked about a few approaches to improving data quality and shared some insight into the business impact of inadequate data quality. Today, I would like to approach the topic from a more [...]
As interest in machine learning and artificial intelligence grows, companies regularly find themselves confronted with the unsatisfactory quality of their data. This discovery is made either early on, through a structured approach, or much later, when poor data quality is identified as the root cause of poorly performing models. In either case, the next step should be a methodical exploration of the available data, followed by a series of steps to remedy the identified issues. In this article, I will give you an overview of common data quality issues and [...]