Pitch Profile Clustering
- hudsonhalling5
- Jul 5, 2022
Pitchers and coaches are striving to find the optimal movement for each pitch. While this depends heavily on the pitcher's arsenal, it is valuable to determine whether a single pitch profile is superior within each pitch type. For the purpose of this study, there are 6 main pitch types: Fastballs, Sinkers, Cutters, Sliders, Curveballs, and Changeups. In this post, I attempt to find out whether one movement profile is superior to the rest for each of them.
I originally planned for this to cover only sliders because there is heavy debate over movement versus velocity. However, the code was easily interchangeable among the 6 pitch types, so I expanded it to all of them. I expected an obvious difference between profiles for every pitch except the Slider and Curveball. My guess beforehand was that those pitches succeed in a wide variety of shapes and really depend on the pitcher and the rest of his arsenal. The results of this study will allow coaches who are torn between two pitch profiles to choose the one that should, in theory, help their pitchers generate more swings and misses.
For simplicity, swing and miss percentage was the only statistic used to evaluate pitch success. Other stats, such as wOBA, xwOBA, and Run Value, depend heavily on the skills and abilities of the hitter and less on the quality of the pitch. While no stat perfectly isolates the success of the pitch, swing and miss rate was the closest evaluator I could think of that depended on the batter the least. We can also use some intuition about characteristics we know are valued in pitches to help make better decisions. With that being said, let's move into the methods of the study.
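The post does not say which denominator its swing and miss percentage uses. The sketch below assumes it means whiffs divided by swings, and it assumes each pitch carries a Statcast-style `description` label; both the labels and the helper name `whiff_rate` are illustrative assumptions, not the study's code. Dividing by all pitches instead would give a swinging-strike rate, which is a different number.

```python
# Assumed Statcast-style pitch descriptions (illustrative, not the study's data).
WHIFFS = {"swinging_strike", "swinging_strike_blocked"}
CONTACT = {"foul", "foul_tip", "hit_into_play"}


def whiff_rate(descriptions):
    """Return whiffs / swings for an iterable of pitch descriptions.

    Balls, called strikes, and other takes are ignored because the batter
    did not swing. Returns None when there are no swings at all.
    """
    whiffs = swings = 0
    for d in descriptions:
        if d in WHIFFS:
            whiffs += 1
            swings += 1
        elif d in CONTACT:
            swings += 1
    return whiffs / swings if swings else None


sample = ["ball", "swinging_strike", "foul", "called_strike",
          "hit_into_play", "swinging_strike_blocked"]
print(whiff_rate(sample))  # 2 whiffs on 4 swings
```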
I used K-means clustering to determine where each cluster of pitch movement would lie. I filtered data from the MLB season through 6/29 by pitch type and plotted each pitch's horizontal movement and induced vertical break in inches (all of the data is normalized to a RHP). K-means clustering starts with a set of cluster center points (centroids), assigns every pitch to its nearest centroid, and then moves each centroid to the average position of the pitches assigned to it. It repeats these two steps until the assignments stop changing, so every point ends up closest to its own cluster's centroid. This process allows us to objectively find which cluster each pitch's movement falls into. The only subjective piece was selecting the number of clusters. With a data set this large, the ideal number of clusters would likely be much higher than the number used in this study. However, I limited it to 7 clusters to avoid overcomplicating the movement of pitches. In my judgment, 7 was the right number to describe the distinct types of movement for each pitch (you will see why later). With more than 7, the algorithm would start to treat a pitch with 10 inches of horizontal and 10 inches of vertical movement as different from one with 10 horizontal and 11 vertical, even though there is realistically almost no difference. With that, let's look at each pitch to further explain the clusters.
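The clustering step described above can be sketched in plain Python. This is a minimal illustration of Lloyd's algorithm (the standard K-means procedure), not the author's code; the study likely used a library implementation. The function names `normalize_to_rhp`, `kmeans`, and `inertia`, the random initialization, and the toy pitch data are all assumptions for illustration.

```python
import math
import random


def normalize_to_rhp(hb, throws):
    """Mirror horizontal break for left-handers so every pitch reads as a RHP.

    Assumes horizontal break for both hands is measured in the same fixed
    frame, so flipping a lefty's sign makes it comparable to a righty's.
    """
    return -hb if throws == "L" else hb


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on (horizontal break, induced vertical break) pairs.

    Returns (centroids, labels). An empty cluster keeps its previous centroid.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct pitches
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each pitch joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no pitch changed cluster, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical pitches: (horizontal break, induced vertical break, throwing hand).
pitches = [(8.0, 18.0, "R"), (-7.5, 17.0, "L"), (15.0, 9.0, "R"), (-14.0, 8.0, "L")]
points = [(normalize_to_rhp(hb, t), ivb) for hb, ivb, t in pitches]
# points -> [(8.0, 18.0), (7.5, 17.0), (15.0, 9.0), (14.0, 8.0)]
centroids, labels = kmeans(points, k=2)
```

The study chose 7 clusters by judgment. One common way to make that choice less subjective is to compare `inertia` across several values of `k` and look for the point where adding clusters stops reducing it much (the "elbow" heuristic).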
4-Seam Fastball

The 4-Seam fastball shows a clear advantage for pitches with more vertical carry. This backs up the common belief that fastballs with high velocity and high carry are superior to any other variant. As the fastball loses vertical carry, its swing and miss rate consistently drops. Surprisingly, flat fastballs still have a 17% swing and miss rate. I would assume some of the data contains pitch tagging errors, so it would be interesting to see how the numbers change if every pitch were tagged correctly. However, this does not shake my confidence in the study, as there are more than 75,000 fastballs shown above.
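One way to see why a large sample supports the ranking is a confidence interval on a single cluster's whiff rate. The sketch below uses the normal-approximation (Wald) interval; the counts are hypothetical, since the post does not report per-cluster sample sizes, and the helper name is illustrative. Note that this only captures random sampling noise: systematic problems such as pitch tagging errors are not reflected in the interval.

```python
import math


def whiff_rate_interval(whiffs, swings, z=1.96):
    """Normal-approximation (Wald) interval for a whiff rate; z=1.96 gives ~95%."""
    p = whiffs / swings
    se = math.sqrt(p * (1 - p) / swings)  # standard error of a proportion
    return p - z * se, p + z * se


# Hypothetical cluster: 850 whiffs on 5,000 swings (a 17% whiff rate).
low, high = whiff_rate_interval(850, 5000)
print(f"{low:.3f} to {high:.3f}")
```

With a few thousand swings per cluster, the interval is only about plus or minus one percentage point, so gaps of several points between clusters are unlikely to be sampling noise alone.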
Sinker

Two-seamers have the highest swing and miss percentage, but these act more like high carry fastballs than true sinkers. Among true sinkers, the Heavy Sinkers have the highest swing and miss rate at about 16%. This makes sense, as these pitches can combine elite velocity with elite movement. Top Spin Sinkers aren't as effective, most likely because of lower velocity or because their movement is what hitters expect from the arm slot. The "Cut Sinkers" have the lowest swing and miss rate, which makes sense, as they have flatter characteristics and lack horizontal run (the component that generates swings and misses for sinkers). Like the fastballs, nothing surprising here.
Cutter

Cutters are a little more subjective because many of these pitches could be classified as sliders, fastballs, or even splitters based on movement characteristics alone. Regardless, the Top Spun Cutters have the highest swing and miss rate, which again makes sense because they are essentially sliders, and sliders have the highest whiff rate among all pitches. Interestingly enough, cutters thrown with more fastball-esque movement have the lowest whiff rate at about 16%. This could be due to their relatively low vertical carry, which would make a fastball-like pitch less effective. With cutters, the more they move like sliders, the more swings and misses they generate. This is a pitch where it is more important to play to a pitcher's strengths than to aim for a specific movement pattern. Cutters can be used to fill movement gaps, provide deception for other pitches, or serve as a fastball. Because of this, a cutter's shape needs to fit the pitcher's overall movement profile.
Slider

As mentioned earlier, this study was originally geared towards the slider because of the conflicting opinions on velocity vs. movement. Some people think high velocity matters most, while others think more movement is more important. According to whiff rates, the Gyro Slider is the best swing and miss profile for the slider. Unlike the other pitches, no slider profile has a clear disadvantage compared to the rest. However, the sliders with more vertical drop show lower whiff rates. This suggests that harder sliders with tighter movement are preferable to slower sliders with lots of movement. In an ideal world, we could combine the speed of the Gyro Sliders with the movement of the Sweeping Sliders to create a "Super Slider". Because all sliders are so successful, profiles need to be shaped around the pitcher's entire arsenal. If a Gyro Slider works better for a pitcher, he should throw that. If he has a more sweeping profile, then he should throw that. Either way, it is fair to say that pitchers should look to throw a slider, as it will generate plenty of swings and misses.
Curveball

Curveballs with a slider-esque profile tend to get more swings and misses, but they don't really act as curveballs, so we'll set those aside for now. Balanced curveballs show the most success, with about 10 inches of horizontal movement and any amount of vertical break. This suggests that pitchers should look to get at least some horizontal movement on their curveball instead of a straight 12-6. The most important thing about curveball shape is whether it mirrors the axis of the pitcher's fastball. The movement patterns of the Banger Curveballs best match the axis of the fastballs, and thus see the most success. Like the slider, if we could take the velocity of the smaller curveballs and pair it with the movement of the Bangers, we would build a "Super Curveball". Because this is not very realistic, organizations will continually have to choose between speed and movement.
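To make "mirrors the axis of the fastball" concrete, one option is to compare the direction of each pitch's movement vector and check how close the curveball comes to pointing exactly opposite the fastball. This is a hypothetical sketch: it interprets "mirroring" as movement directions 180 degrees apart, it uses movement-inferred direction rather than measured spin axis, and the helper names and sample numbers are illustrative assumptions.

```python
import math


def movement_angle(hb, ivb):
    """Direction of the (horizontal, vertical) movement vector in degrees, [0, 360)."""
    return math.degrees(math.atan2(ivb, hb)) % 360


def mirror_gap(fastball, curveball):
    """Degrees by which the curveball misses pointing exactly opposite the fastball.

    Each pitch is a (horizontal break, induced vertical break) pair in the same
    RHP-normalized frame. 0 means a perfect mirror; larger means less aligned.
    """
    diff = abs(movement_angle(*fastball) - movement_angle(*curveball)) % 360
    diff = min(diff, 360 - diff)  # smallest angle between the two directions
    return abs(180 - diff)


fastball = (8.0, 16.0)     # hypothetical movement, inches
curveball = (-10.0, -8.0)  # hypothetical movement, inches
print(round(mirror_gap(fastball, curveball), 1))
```

A smaller gap means the curveball's movement runs closer to the fastball's line in the opposite direction, which is one reading of the alignment the paragraph above describes.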
Changeup

Changeups with top spin have the most swing and miss potential, which fits the logical expectation. Splitters have the least success but still hold a high whiff rate. Just like the slider, there isn't a clear winner or loser among the changeup movement profiles. Because of this, pitchers should throw the changeup that best pairs with their fastball. A pitcher with a high carry fastball will look for a changeup with decent vertical carry, whereas a sinker pitcher will look for something more in the top spun range. It is unrealistic to aim for the Devin Williams changeup (a high carry fastball paired with a heavy top spun changeup), which is why we can't push everyone to throw a top spun changeup. Most high carry fastball pitchers will not be able to achieve this unique movement.
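One hypothetical way to quantify how a changeup "pairs" with a fastball is to measure the movement gap between the two pitches. The helper and the sample numbers below are illustrative assumptions and are not part of the study.

```python
import math


def separation(fastball, changeup):
    """Movement gap between two pitches, in inches.

    Each pitch is a (horizontal break, induced vertical break) pair in the same
    RHP-normalized frame. Returns the signed horizontal and vertical gaps
    (fastball minus changeup) and the straight-line distance between them.
    """
    dh = fastball[0] - changeup[0]
    dv = fastball[1] - changeup[1]
    return dh, dv, math.hypot(dh, dv)


# Hypothetical high carry fastball and a changeup with more run and less carry.
dh, dv, total = separation((8.0, 17.0), (14.0, 7.0))
print(dh, dv, round(total, 2))
```

Comparing these gaps across a pitcher's candidate changeup shapes is one way to reason about which option fits the fastball he already has.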
Conclusion
While this study didn't produce any groundbreaking results, we were still able to identify an advantageous movement pattern for each pitch. To emphasize, this is a simple study that only compares whiff rates across movement patterns. A more in-depth study would bring in more stats and analysis before drawing conclusions. Because most of the results line up with common knowledge in the community, I feel comfortable leaving the study in this simple state. The study also uses machine learning (K-means clustering) to group the pitches objectively, which adds reliability to how the movement profiles were defined.