Every time a timecard or schedule is touched to add a punch, change a shift or edit time, a digital record is generated that captures the before and after values. Over the course of a year, these records can grow to millions of rows.
The sheer volume of these records makes them difficult to audit proactively by hand, and it overwhelms spreadsheets. They are most often consulted in response to a personnel issue or a Department of Labor inquiry, where the specifics of the request make it easy to focus on a small subset. Most of the time, they are simply treated as records that the FLSA requires be stored for two years.
The hidden value in these records is that behaviors are embedded in them. This makes them an ideal candidate for machine learning, which can surface patterns that are otherwise hard to identify. Because each company has its own “signature” in terms of behaviors, we apply unsupervised learning, specifically k-means clustering, to look for unusual clusters. Below is one example of how the output looks. In this case the purple cluster shows a unique behavior.
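To make the clustering step concrete, here is a minimal sketch of k-means in plain Python. It is not the production model: the two features (edits per month and average minutes changed per edit), the sample values, and the farthest-point seeding are all assumptions made for illustration. In practice the features would likely be scaled first and a library implementation used.

```python
from collections import Counter
from math import dist

def init_centroids(points, k):
    """Deterministic farthest-point seeding (an assumption; other seedings work too)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical per-supervisor features:
# (edits per month, average minutes changed per edit; negative = time removed)
points = [
    (4, 1.0), (5, -0.5), (6, 0.5), (5, 0.0), (4, -1.0), (6, 1.5),
    (40, -12.0), (42, -11.0),
]

centroids, labels = kmeans(points, k=2)

# Treat the smallest cluster as the one worth a closer look.
sizes = Counter(labels)
unusual_cluster = min(sizes, key=sizes.get)
unusual_rows = [i for i, label in enumerate(labels) if label == unusual_cluster]
print("Rows in the unusual cluster:", unusual_rows)
```

Flagging the smallest cluster is only one possible heuristic. Which clusters count as unusual depends on each company’s own signature, so the output is a starting point for investigation rather than a verdict.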
As an investigative tool, the machine learning algorithm works wonders at identifying unusual patterns within a company’s own data. But it’s not the right tool for everyone who is interested in this information. What it did teach us was that, although the details differ from company to company, there are telltale signs that quickly reveal when something is not operating correctly. This led us to a much simpler method: creating a KPI that quickly measures a specific type of behavior. Combining that KPI with dimensional analysis lets us roll up and drill down to identify significant changes in distribution, and sometimes even root causes.
One example of this is understanding how supervisors edit timecards, in terms of adding time to or subtracting time from employees’ original punches. To measure this, we created a new KPI that can be applied at any level of the organization, down to the individual supervisor: Timecard Skewness. A Timecard Skewness score of 0 means that every timecard edit adds time to an employee’s timecard compared to the original punch. A score of 100 means that every edit takes time from an employee’s timecard relative to the original punch. In general, we find that companies have company-wide Timecard Skewness ratings in the high 40s to low 50s. At a corporate level, this means that edits to timecards are well distributed: they both give and take away small amounts of time, which is what you expect to see in the normal course of business. As always, you still need to look at the distribution to understand whether this holds true at all levels of the organization.
Below is an example of supervisors and their timecard edits with a skewness rating. The top chart shows the number of edits each supervisor made over the course of a year. The lower chart shows each supervisor’s individual skewness rating, with the orange line marking a rating of 50. Can you easily tell which supervisors make the most edits and which have abnormally high or abnormally low Timecard Skewness ratings? In this example, most supervisors make very few edits. A handful on the left warrant closer inspection. Don’t draw conclusions yet! It could simply be that a timeclock is poorly located or that there aren’t enough clocks to accommodate a shift change. In some cases, however, we also find that favoritism or time theft from employees is occurring, often at the hands of just one or two supervisors among hundreds.
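The exact formula behind Timecard Skewness is not spelled out here, so the sketch below is one plausible reading rather than the definitive calculation. It assumes the score is the percentage of time-changing edits that remove time, which matches the endpoints described above (0 when every edit adds time, 100 when every edit takes time away). The supervisor names and minute values are hypothetical.

```python
from collections import defaultdict

def timecard_skewness(deltas_minutes):
    """Percentage (0-100) of time-changing edits that remove time.

    Each delta is edited minutes minus originally punched minutes.
    Assumption: edits that leave the time unchanged are ignored.
    Returns None when there is nothing to score.
    """
    changed = [d for d in deltas_minutes if d != 0]
    if not changed:
        return None
    removed = sum(1 for d in changed if d < 0)
    return 100.0 * removed / len(changed)

# Hypothetical edit log: (supervisor, minutes added or removed by the edit)
edits = [
    ("sup_a", 5), ("sup_a", -4), ("sup_a", 3), ("sup_a", -6),
    ("sup_b", -15), ("sup_b", -10), ("sup_b", -12), ("sup_b", 2),
    ("sup_c", 7),
]

# Drill down: group the edits by supervisor.
by_supervisor = defaultdict(list)
for supervisor, delta in edits:
    by_supervisor[supervisor].append(delta)

scores = {name: timecard_skewness(deltas) for name, deltas in by_supervisor.items()}
edit_counts = {name: len(deltas) for name, deltas in by_supervisor.items()}

# Roll up: the same function scores the whole company.
company_score = timecard_skewness([delta for _, delta in edits])

for name in sorted(scores):
    print(f"{name}: {edit_counts[name]} edits, skewness {scores[name]:.1f}")
print(f"company: skewness {company_score:.1f}")
```

Because the same function works on any list of edits, the score rolls up or drills down simply by changing how the edits are grouped. Reporting the edit count alongside the score matters: a supervisor with a single edit can score 0 or 100 without that meaning much.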
Inspiration struck again when we were helping companies understand how well employees were working to scheduled hours. There are a variety of reasons why employees might deviate, all of which affect employee engagement and business performance. Measuring how well employees adhere to schedules is tricky. Employees sometimes don’t work when they are scheduled, and sometimes they work beyond their scheduled hours. Sometimes schedules are edited to account for this and sometimes they aren’t. To capture this, we developed a metric called Schedule Adherence. A high Schedule Adherence score means employees are working to the schedule, and a low score means they are not. We see high-performing companies or departments typically scoring in the 80s. Once again, the score can be applied at any level of the organization, and it uncovers a variety of situations.
In one case we saw a score in the high 90s. At face value, we might congratulate the manager for exceptional performance. But such a high score seemed unusual, so we kept exploring. A histogram of this manager’s schedule edits, shown below, revealed that edits were made primarily after the schedule had been worked, which is neither typical nor recommended. The x-axis in this chart shows how many days before or after the day of work the schedule was edited; 0 is the day of work. Looking at the individual edits made it apparent what was happening at several locations: supervisors were changing schedules to match whatever hours employees had worked, to make it look as though employees were following the schedule, as the company intended.
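The edit-timing check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tooling used for the chart: the record layout and dates are hypothetical, and the sign convention (negative offsets for edits made before the day of work, positive for edits made after) is assumed.

```python
from collections import Counter
from datetime import date

def edit_offset_days(work_day, edit_day):
    """Days between the scheduled day of work and the day the schedule was edited.

    Assumed convention: negative = edited in advance, 0 = day of work,
    positive = edited after the shift was worked.
    """
    return (edit_day - work_day).days

# Hypothetical schedule edits for one manager: (day of work, day of edit)
schedule_edits = [
    (date(2023, 3, 6), date(2023, 3, 3)),  # edited three days ahead
    (date(2023, 3, 6), date(2023, 3, 6)),  # edited on the day of work
    (date(2023, 3, 6), date(2023, 3, 8)),  # edited two days after
    (date(2023, 3, 7), date(2023, 3, 9)),  # edited two days after
    (date(2023, 3, 8), date(2023, 3, 9)),  # edited one day after
]

# Count edits per day offset; this is the data behind the histogram.
histogram = Counter(edit_offset_days(work, edit) for work, edit in schedule_edits)

# Share of edits made after the shift had already been worked.
edits_after = sum(count for offset, count in histogram.items() if offset > 0)
after_share = 100.0 * edits_after / len(schedule_edits)

# Simple text rendering of the histogram, one row per day offset.
for offset in sorted(histogram):
    print(f"{offset:+3d} | {'#' * histogram[offset]}")
print(f"{after_share:.0f}% of edits were made after the day of work")
```

A high share of after-the-fact edits alongside a very high Schedule Adherence score is the combination worth investigating, since it suggests the schedule is being rewritten to match the hours worked.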
These two examples demonstrate how machine learning can provide an initial step in the innovation process, one that would be very difficult for a data scientist to accomplish through traditional dimensional analysis and visualization techniques. Yet the final outcome is much simpler and more economical than the original machine learning process.