Taming the data analytics jungle.


Often, the hardest part of accomplishing a task is just getting off the starting blocks. This is definitely true when adopting advanced analytics in your organization. The options can be overwhelming, and oftentimes we don’t know where to start. Go ahead. Google “Advanced Analytics for Operations.” You’ll get 3 million results in less than half a second. Yet winnowing those results to find what will help YOU most is daunting. That’s why we hope this blog post can give you tips to set yourself up for a successful analytics project.
First off, it helps to define precisely what problem you wish to solve. Do you want to predict:

  • Production Quantity?
  • Energy Utilization?
  • Product Quality?
  • An Undesirable Event?

Let’s start with one! The next thing you’ll want to look at is your data. Is it properly labeled? Labeling means that we know when the outcome we are trying to predict occurred in the data set. CORTEX uses a semi-supervised learning method. This means that you train models on historical data with known outcomes to predict what will happen in the future. Examples of labels we have seen:

  • Production flow meter
  • Pass/Fail of product
  • Energy Consumption Meter
  • Yes/No that the event happened
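As a quick illustration of what "properly labeled" means, here is a minimal Python sketch that separates historical records with a known outcome from those without one. The record layout, field order, and values are made up for illustration; this is not how CORTEX stores or ingests data.

```python
# Hypothetical historical records: (timestamp, sensor reading, label).
# A label of None means the outcome was never recorded for that row.
records = [
    ("2023-01-01T00:00", 101.2, "pass"),
    ("2023-01-01T01:00", 99.8, None),
    ("2023-01-01T02:00", 87.4, "fail"),
    ("2023-01-01T03:00", 102.9, "pass"),
]


def split_by_label(rows):
    """Return (labeled, unlabeled) lists based on whether a label exists."""
    labeled = [row for row in rows if row[2] is not None]
    unlabeled = [row for row in rows if row[2] is None]
    return labeled, unlabeled


labeled, unlabeled = split_by_label(records)
print(f"{len(labeled)} labeled rows, {len(unlabeled)} unlabeled rows")
```

A check like this, run before any modeling, gives a rough sense of how much of your history can actually be used to learn from known outcomes.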

By leveraging a mix of statistical and machine learning models, CORTEX can help you highlight previously hidden relationships in your data. Additionally, CORTEX gives you the ability to encode your subject matter expertise into the models. For example, you can set up the ranges (bins) of your variables to match the typical operating conditions of your process.

CORTEX offers two types of models: classification and regression. The first question to ask is “What is the nature of your target variable?” If you are predicting a failure, a class, or a range of values, you will want to pick from our classification algorithms. If you are predicting an exact value, you will want to choose a regression algorithm. Keep in mind that many regression problems can be framed as classification problems. For example, if you are trying to predict the concentration of an undesired product in the overhead stream from a distillation column, you might want to understand the difference between good and bad. If you always want to stay below 5%, your target variable bins would be 0-5 and 5-100. The target does not need to be binary, but we’ve found that the fewer target variable bins included in the model, the better.
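To make the distillation example concrete, here is a minimal Python sketch of turning a continuous concentration reading into one of the two target bins. The function name, the sample readings, and the choice to treat exactly 5% as the upper bin are assumptions for illustration only, not CORTEX behavior.

```python
# Assumed threshold from the example above: stay below 5% concentration.
THRESHOLD_PCT = 5.0


def label_concentration(pct: float) -> str:
    """Map a concentration in percent to the target bin '0-5' or '5-100'."""
    if not 0.0 <= pct <= 100.0:
        raise ValueError(f"concentration must be within 0-100%, got {pct}")
    # Assumption: a reading of exactly 5% is not "below 5%", so it falls
    # in the upper bin.
    return "0-5" if pct < THRESHOLD_PCT else "5-100"


# Hypothetical overhead-stream readings, in percent.
readings = [1.2, 4.9, 5.0, 7.3]
labels = [label_concentration(r) for r in readings]
print(labels)
```

With the target expressed this way, the problem becomes a two-class classification task rather than a prediction of the exact concentration.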

What exactly are bins? Binning breaks your continuous variables down into classes or groups. It also lets you bring your own process knowledge into your model.
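Binning a continuous variable can be sketched in a few lines of Python. The bin edges, bin names, and temperature values below are hypothetical stand-ins for the operating ranges a process engineer might choose; they do not come from CORTEX.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical operating ranges for a column temperature in deg C.
# Three edges define four bins: below 80, 80 to <95, 95 to <110, 110 and up.
EDGES = [80.0, 95.0, 110.0]
NAMES = ["low", "normal", "high", "very high"]


def assign_bin(value: float, edges=EDGES, names=NAMES) -> str:
    """Return the name of the bin that contains value."""
    # bisect_right counts how many edges are <= value, which is the bin index.
    return names[bisect_right(edges, value)]


temperatures = [72.5, 80.0, 94.9, 101.3, 115.0]
binned = [assign_bin(t) for t in temperatures]
counts = Counter(binned)
print(binned)
print(counts)
```

Choosing the edges yourself, rather than splitting the range evenly, is where process knowledge enters: the bins can follow the operating regimes you already know matter.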

A successful analytics project entails much more than what a simple blog post can cover, but these steps are a great start to framing up your problem and gaining insights from your data.

Check out our video “How to get started with CORTEX” for more information on how to start using CORTEX with your data. And get excited about where your analytics journey will lead you!

Kayla Graff
About Kayla Graff

As an advanced analytics process engineer, Kayla spends time working with clients to assist them in their analytics journey. This Kansas State University graduate is passionate about utilizing analytics to make data-driven decisions.

