When working on AI/ML classification or categorization, data cleansing and standardization are critical to developing a good model. Most AI/ML algorithms do not handle missing data values natively. Good data with a simple model will be more effective than bad data with a sophisticated model. Cleansing is easier when you are dealing with smaller data sets: it is relatively easy to look for data quality issues or missing elements and fix them with reasonable values (e.g., the mean value for a continuous data element). This approach does not scale when you deal with millions, billions, or trillions of rows, where finding anomalies in your data set is a daunting task. This presentation recommends different strategies your BigDataOps team can employ to prepare data for AI/ML modeling.
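As a minimal illustration of the small-data fix described here, the sketch below fills missing entries in one continuous column with the mean of its observed values. It uses plain Python with no third-party libraries. The function name `mean_impute` is hypothetical, and treating both `None` and NaN as "missing" is an assumption of this sketch, not a rule from the presentation.

```python
import math
from typing import List, Optional


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None or NaN) with the mean of the observed values.

    Hypothetical helper for illustration only; assumes a single continuous column
    that fits in memory (the small-data case).
    """
    # Keep only observed values; short-circuit avoids calling isnan on None.
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("cannot impute: column has no observed values")
    fill = sum(observed) / len(observed)
    # Substitute the mean wherever a value is missing; leave others unchanged.
    return [fill if v is None or math.isnan(v) else v for v in values]


if __name__ == "__main__":
    print(mean_impute([1.0, None, 3.0, float("nan")]))  # → [1.0, 2.0, 3.0, 2.0]
```

In pandas the same idea is typically written as `df["col"].fillna(df["col"].mean())`, since `Series.mean` skips NaN by default. Either way, the whole column is scanned in memory, which is exactly the approach that becomes impractical at the row counts mentioned above.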