This article is aimed at software engineers interested in building machine learning applications, and at data scientists who want to bring more engineering practice into their daily work.
In traditional software engineering, bugs fall into two broad categories: implementation and logic. A machine learning system adds two more: the model and the data. A model bug occurs when the model is not suited to the problem, often because of the limits of what it can represent. For example, a linear classifier cannot fit the data when the true decision boundary is non-linear. Data bugs have many possible causes: outliers, an insufficient amount of data, uninformative features, and so on.
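To make the model bug concrete, here is a minimal sketch in plain Python. It trains a perceptron (one simple kind of linear classifier) on the XOR problem, whose decision boundary is non-linear. The toy dataset, the helper names `train_perceptron` and `accuracy`, and the extra product feature are all assumptions made for illustration; they are not taken from any particular library.

```python
def predict(weights, bias, x):
    """Return 1 if the linear score is positive, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples, labels, epochs=100):
    """Train a perceptron and return (weights, bias)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            step = y - predict(weights, bias, x)  # 0 if correct, else +1 or -1
            if step != 0:
                weights = [w + step * xi for w, xi in zip(weights, x)]
                bias += step
                mistakes += 1
        if mistakes == 0:  # every sample classified correctly, stop early
            break
    return weights, bias


def accuracy(samples, labels, weights, bias):
    """Fraction of samples the classifier labels correctly."""
    correct = sum(predict(weights, bias, x) == y for x, y in zip(samples, labels))
    return correct / len(samples)


# XOR: no straight line separates the two classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

weights, bias = train_perceptron(points, labels)
linear_acc = accuracy(points, labels, weights, bias)

# Same algorithm, but with an extra feature x1 * x2 that makes
# the classes linearly separable in the expanded space.
expanded = [(x1, x2, x1 * x2) for x1, x2 in points]
weights_exp, bias_exp = train_perceptron(expanded, labels)
expanded_acc = accuracy(expanded, labels, weights_exp, bias_exp)

print("linear features:  ", linear_acc)
print("with x1*x2 feature:", expanded_acc)
```

On the raw inputs, no linear classifier can label more than three of the four XOR points correctly, no matter how long it trains. The implementation is fine and so is the data; the model simply cannot represent the boundary. Once the product feature is added, the same training loop separates the classes, which is why this kind of failure is better treated as a model bug than as a coding error.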