Bachelor Thesis: Rule-based Data Smell Detection

With the growing use of machine learning approaches, quality control of training data is becoming increasingly important. Data smells, which have been defined as context-independent data quality problems, may indicate low-quality data. The goal of this bachelor thesis is to develop a software package that detects potential data quality issues in the form of data smells. The implementation will be written in Python, since it is a widely used language in the field of machine learning. It will build on a data quality framework such as Great Expectations. The developed data smell detectors will be evaluated on real-world CSV datasets.
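
To make the idea of rule-based detection concrete, the following is a minimal sketch of how such a detector could look. It uses only the Python standard library rather than Great Expectations, and the two rules, the placeholder list, and all function names are illustrative assumptions, not the smell catalogue or API of the thesis.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical placeholder tokens that often stand in for missing values.
# This list is an assumption made for illustration only.
PLACEHOLDERS = {"", "n/a", "na", "null", "none", "-", "?"}


def is_missing_placeholder(value: str) -> bool:
    """Flag cells that contain a token commonly used for 'no value'."""
    return value.strip().lower() in PLACEHOLDERS


def has_surrounding_whitespace(value: str) -> bool:
    """Flag cells with leading or trailing whitespace."""
    return value != value.strip()


# Each rule is context-independent: it inspects a single raw cell value.
RULES: Dict[str, Callable[[str], bool]] = {
    "missing_placeholder": is_missing_placeholder,
    "surrounding_whitespace": has_surrounding_whitespace,
}


def detect_smells(csv_text: str) -> Dict[str, Dict[str, List[int]]]:
    """Return {column: {rule_name: [zero-based data row indices]}}."""
    findings: Dict[str, Dict[str, List[int]]] = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_index, row in enumerate(reader):
        for column, value in row.items():
            # Skip cells from malformed rows (too few or too many fields).
            if column is None or not isinstance(value, str):
                continue
            for rule_name, rule in RULES.items():
                if rule(value):
                    findings.setdefault(column, {}).setdefault(rule_name, []).append(row_index)
    return findings


# Small made-up sample: row 1 has padded text and "N/A", row 2 has an empty cell.
SAMPLE = "name,age\nAlice,34\n Bob ,N/A\nCarol,\n"
findings = detect_smells(SAMPLE)
print(findings)
```

Keeping each rule as a plain function over a single cell makes it easy to add further detectors or to re-express them later as checks in a data quality framework.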