How a machine learning application led to new KPIs and simpler dimensional analysis

Every time a timecard or schedule is touched, whether to add a punch, change a shift or edit time, a digital record captures the before and after values. Over the course of a year, these records can grow to millions of rows.

This volume makes proactive, manual auditing impractical and overwhelms spreadsheets. These records are most often consulted in response to a personnel issue or a Department of Labor inquiry, where the specifics of the request make it easy to focus on a small subset. Most of the time, the data is simply treated as a record the FLSA requires be retained for two years.

The hidden value in these records is the behavior embedded in them. This is an ideal scenario for machine learning to surface patterns that are otherwise hard to spot. Because each company has its own behavioral “signature,” we apply unsupervised learning using k-means clustering to look for unusual clusters. Below is one example of its output. In this case the purple cluster shows a unique behavior.
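As a rough sketch of the clustering step, k-means can flag a small, behaviorally distinct group. The features below (edits per week, average minutes changed, share of edits after the pay period closed) are illustrative assumptions, not the production feature set:

```python
# Sketch: clustering timecard-edit behavior with k-means.
# Feature columns are illustrative, not the actual schema.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Simulated per-supervisor features: edits/week, mean minutes changed,
# and share of edits made after the pay period closed.
normal = rng.normal([5, 0, 0.1], [2, 4, 0.05], size=(200, 3))
unusual = rng.normal([40, -15, 0.6], [5, 3, 0.1], size=(10, 3))
X = StandardScaler().fit_transform(np.vstack([normal, unusual]))

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
# A disproportionately small cluster is the one worth inspecting first.
sizes = np.bincount(labels)
print("cluster sizes:", sizes)
```

In practice, scaling the features first matters: k-means is distance-based, so an unscaled "edits per week" column would dominate the clustering.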


As an investigative tool, the machine learning algorithm works wonders at identifying unusual patterns within a company’s own data. But it’s not the right tool for everyone who is interested in this information. What it taught us was that, while the patterns were not identical between companies, there were telltale signs that let us identify very quickly that something was not operating correctly. This led us to a much simpler method: a KPI that quickly measures a specific type of behavior. Combining that KPI with dimensional analysis allowed us to roll up and drill down to identify significant changes in distribution, and sometimes even root causes.

One example of this is understanding how supervisors edit timecards, specifically whether their edits add or subtract time relative to employees’ original punches. To measure this, we created a new KPI that can be applied at any level of the organization: Timecard Skewness. A Timecard Skewness score of 0 means every timecard edit adds time to an employee’s timecard compared to the original punch. A score of 100 means every edit takes time away from an employee’s timecard relative to the original punch.

In general, we find companies have company-wide Timecard Skewness ratings in the high 40s to low 50s. At a corporate level, this means edits to timecards are well distributed: they both give and take away small amounts of time, which is what you expect in the normal course of business. As always, you still need to look at the distribution to understand whether this holds true at all levels of the organization.

The Timecard Skewness rating can be applied at any level of the organization, down to the individual supervisor. Below is an example of supervisors and their timecard edits with a skewness rating. The top chart shows the number of edits over the course of a year. The lower chart shows each supervisor’s individual skewness rating, with the orange line marking a rating of 50. Can you easily tell which supervisors make the most edits and which have abnormally high or abnormally low Timecard Skewness ratings?

In this example, most supervisors make very few edits. A handful on the left suggest closer inspection is warranted. Don’t draw conclusions yet! It could simply be that a timeclock is poorly located or that there aren’t enough clocks to accommodate a shift change. In some cases, however, we find favoritism or time theft from employees, often concentrated in just one or two supervisors among hundreds.
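A minimal sketch of the Timecard Skewness KPI, consistent with the definition above (0 when every edit adds time, 100 when every edit takes time away); the handling of zero-minute edits is an assumption:

```python
def timecard_skewness(edit_deltas_minutes):
    """edit_deltas_minutes: edited minutes minus originally punched
    minutes, one value per edit. Returns a score from 0 to 100, or
    None if there are no non-zero edits to score."""
    edits = [d for d in edit_deltas_minutes if d != 0]
    if not edits:
        return None
    # Share of edits that take time away from the employee.
    taking = sum(1 for d in edits if d < 0)
    return 100.0 * taking / len(edits)

# A supervisor whose edits mostly remove time stands out:
print(timecard_skewness([-10, -15, 5, -20]))  # 75.0
# Balanced give-and-take lands near 50, the expected norm:
print(timecard_skewness([8, -8, 12, -12]))    # 50.0
```

Because the score is just a ratio over a set of edits, it rolls up naturally: compute it over any slice (supervisor, department, company) using that slice's edits.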


Inspiration struck again when we were helping companies understand how well employees were working to scheduled hours. There are a variety of reasons why employees might deviate, all of which affect employee engagement and business performance. Measuring this is tricky: employees sometimes don’t work when they are scheduled and sometimes work beyond their scheduled hours, and sometimes schedules are edited to account for this and sometimes they aren’t.

For this we developed a metric called Schedule Adherence. A high Schedule Adherence means employees are working to the schedule; a low value means they are not working to their scheduled hours. We see high-performing companies or departments typically scoring in the 80s. Once again, the score can be applied at any level of the organization, and it uncovers a variety of situations.

In one case we saw a score in the high 90s. At face value, we might congratulate the manager for exceptional performance. But it seemed unusual to have such a high score, so we kept exploring. A histogram of schedule edits for this manager, shown below, revealed that edits were primarily made after the schedule had been worked, which is neither typical nor recommended. Looking at the individual edits made it apparent what was happening at several locations: supervisors were changing schedules to match whatever hours employees actually worked, to make it look like employees were following the schedule, which was the company’s intended practice. The x-axis in the chart shows how many days before or after the day of work the schedule was edited; 0 is the day of work.

[Figure: histogram of this manager’s historical schedule edits, by days before/after the day worked]
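The article doesn’t give the exact Schedule Adherence formula, so the following is an illustrative guess: 100 when actual hours exactly match scheduled hours, falling as the absolute deviation grows.

```python
def schedule_adherence(scheduled_hours, actual_hours):
    """One plausible Schedule Adherence score (the exact production
    formula is not specified here). Both arguments are lists of hours,
    one entry per employee-shift. Returns 0-100, or None if nothing
    was scheduled."""
    total_scheduled = sum(scheduled_hours)
    if total_scheduled == 0:
        return None
    # Penalize deviation in either direction: under- and over-working
    # the schedule both reduce the score.
    deviation = sum(abs(a - s) for s, a in zip(scheduled_hours, actual_hours))
    return max(0.0, 100.0 * (1 - deviation / total_scheduled))

print(schedule_adherence([8, 8, 8, 8], [8, 7.5, 9, 8]))  # 95.3125
```

Note that a definition like this cannot distinguish genuine adherence from after-the-fact schedule edits, which is exactly why the edit-timing histogram above is a necessary companion check.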

These two examples demonstrate how machine learning provides an initial step in the innovation process that would be very difficult for a data scientist to accomplish through traditional dimensional analysis and visualization techniques. Yet the final outcome is much simpler and more economical than the original machine learning process.

What’s a good schedule worth?

I was recently having lunch with a friend, Aram Faghfouri, and we were catching up on a variety of topics when I asked him how he approached business problems when there simply wasn’t much data to analyze.

Knowledgeable as always, he told me that this problem had long been solved and suggested I read How to Measure Anything. This book does a great job of explaining through examples how to use limited data along with different analysis techniques to improve decision-making and reduce risk.

I thought I’d share one of the areas where we applied what we learned. Many of our large retail customers have a highly digitized process that allows them to forecast labor demand, schedule against rules and employee availability, track actual hours worked, and measure resulting sales and productivity.

This allows retailers to follow the entire labor process through data to understand where there might be areas of improvement. For example, they can look at charts like the one below to quickly identify challenges such as insufficient employee availability to staff a schedule, or whether the generated schedule follows the forecast. The example below is the roll-up of all of the company’s locations, broken out by hour. As the chart shows, every step from forecast to sales is fairly tightly grouped. That’s not surprising, as this is a sophisticated specialty retailer that has been honing its processes for years:

[Figure: retail schedule analysis, forecast through sales by hour, rolled up across all locations]

For other industries, however, scheduling is a much more manual process. Many manufacturers take a production schedule and convert it into either a labor budget or hours through a spreadsheet, or simply through experience. Supervisors then schedule employees manually based on knowledge of their employees’ skills and the production processes.

As a result, significantly less data is generated for analyzing how well this process works. There is decent data at the beginning (production orders) and at the end (labor hours consumed and actual production completed). But understanding whether there is even an opportunity to improve labor scheduling typically requires a traditional industrial engineering approach: inspecting the actual process.

In reading the book, it became clear that rather than throwing up our hands and declaring defeat until we had more data, there was a middle ground. It might be possible to shed a little light on the process and determine whether it was worth more investment.

When we inventoried the data we had, we recognized that most manufacturers put their schedules into a system to measure actual punches against it to determine if employees are following attendance policies. And of course, they have the punches and pay rules to know the actual hours worked.

Our hypothesis was that if any of the curves deviated significantly then we would know that some part of the process before that deviation was not working well and there is opportunity to improve. After some experimentation we generated charts that look like the following:

[Figure: scheduled vs. actual hours by department-week, plotted as unused capacity (x) against overtime (y)]

What we are looking at above is a single manufacturing plant, where each circle represents the hours for one department for one week. The scheduled hours are in the first chart and the actual hours are in the second. The x-axis represents hours of unused capacity and the y-axis represents overtime hours. Capacity here is defined as regular hours (40 and under) that are not worked by a full-time employee but are available to work (e.g. the employee is not on vacation).

The scheduled chart looks pretty good. The departments are heavily clustered near the origin, meaning they have little scheduled overtime and little unused capacity. A couple of departments have significant unused capacity, so those would be worth investigating. However, when we look at the hours actually worked in the next chart, we see that overtime and unused capacity have both grown, and in many cases departments that had neither now have both! This means some people who were scheduled were sent home or asked not to work, while others who were scheduled worked more than their scheduled hours.
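The point behind each circle can be computed from basic time data. This sketch follows the capacity definition above; the function name and input shapes are illustrative:

```python
def department_week_point(hours_worked, hours_available):
    """Compute (unused_capacity, overtime) for one department-week.

    hours_worked: actual hours per full-time employee that week.
    hours_available: hours each employee was available to work
    (e.g. 40, less any vacation), per the capacity definition above.
    """
    # Unused capacity: regular hours (<= 40) available but not worked.
    unused = sum(max(0, min(avail, 40) - min(worked, 40))
                 for worked, avail in zip(hours_worked, hours_available))
    # Overtime: hours worked beyond 40 per employee.
    overtime = sum(max(0, worked - 40) for worked in hours_worked)
    return unused, overtime

# One employee sent home early, another working overtime: the same
# department-week shows both unused capacity and OT.
print(department_week_point([32, 48, 40], [40, 40, 40]))  # (8, 8)
```

Running this once on scheduled hours and once on actual hours yields the two scatter charts; the drift of points away from the origin between the two runs is the signal described above.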

Using this analysis, we can now calculate the financial opportunity (Hours of OT that could potentially be converted to regular if worked by someone with capacity). We can also guess that these employees are probably not thrilled because what they thought would be a stable schedule has suddenly changed with some employees working a lot more hours and other employees working a lot fewer hours.
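The financial opportunity can be sized roughly as follows; the wage rate and overtime premium are illustrative assumptions, not figures from the analysis:

```python
def ot_conversion_opportunity(ot_hours, capacity_hours,
                              base_rate=20.0, ot_premium=0.5):
    """Estimated savings from converting overtime to regular hours.

    Only OT hours that could actually be covered by someone with
    unused capacity are convertible; the saving per converted hour
    is the OT premium portion of the wage.
    """
    convertible = min(ot_hours, capacity_hours)
    return convertible * base_rate * ot_premium

# 120 OT hours alongside 90 hours of unused capacity, at an assumed
# $20/hr base rate and time-and-a-half OT:
print(ot_conversion_opportunity(ot_hours=120, capacity_hours=90))  # 900.0
```

This is deliberately conservative: it ignores skill mismatches between the over- and under-utilized employees, which is where the cross-training investment mentioned below comes in.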

The limited data we do have has told us the financial magnitude of the impact, and it has also told us where the problem lies: somewhere in the labor demand calculation or the scheduling process. What we often see is that department supervisors performing these processes manually make approximations to simplify creating and staffing a schedule. Now we know what that manual approximation costs the company, both in financial terms and in employee engagement. We can communicate the specific area of the problem and the financial opportunity to management using simple schedule and time data.

HR benefits too, as it can understand the hard-dollar benefit of investing in cross-training. If retention is an issue, better adherence to schedules, providing more stable work hours, will reduce turnover.

It’s exciting to see the continued creativity and results of applying even limited data to common business challenges, especially when we can improve employees’ lives while improving financial outcomes.