In a previous study, we showed that indentation is regular across multiple languages and that the variance in the level of indentation of a block of revised code is correlated with metrics such as McCabe cyclomatic complexity. Building on that work, the current paper investigates the relationship between the “shape” of the indentation of the revised code block (the “revision”) and the corresponding syntactic structure of the code. We annotated revisions matching one of three indentation shapes: “flat” (all lines are equally indented), “slash” (indentation becomes increasingly deep), or “bubble” (indentation increases and then decreases). We then classified the code structure as one of: function definition, loop, expression, comment, etc. We studied thousands of revisions, coming from over 200 software projects, written in a variety of languages. Our study indicates that indentation shape correlates positively with code structure; that is, certain shapes typically corr...
Abram Hindle, Michael W. Godfrey, Richard C. Holt
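
The three indentation shapes named in the abstract can be made concrete with a small sketch. This is an illustration only, not the authors' annotation procedure: the function names `indent_levels` and `classify_shape`, the tab width of 4, the decision to skip blank lines, and the extra "other" and "empty" labels are all assumptions introduced here. "Slash" is read as indentation that never decreases and increases at least once; "bubble" is read as one run of increases followed by one run of decreases.

```python
from typing import List


def indent_levels(block: str, tab_width: int = 4) -> List[int]:
    """Return the leading-whitespace width of each non-blank line.

    Assumption: tabs are expanded to `tab_width` columns and blank
    lines are ignored, since they carry no indentation information.
    """
    levels = []
    for line in block.splitlines():
        if not line.strip():
            continue
        expanded = line.expandtabs(tab_width)
        levels.append(len(expanded) - len(expanded.lstrip(" ")))
    return levels


def classify_shape(levels: List[int]) -> str:
    """Label a sequence of indentation levels as flat, slash, or bubble.

    Sequences that fit none of the three shapes are labelled "other"
    (a hypothetical catch-all, not a category from the abstract).
    """
    if not levels:
        return "empty"
    if all(level == levels[0] for level in levels):
        return "flat"

    # Keep only the non-zero changes between consecutive lines.
    deltas = [b - a for a, b in zip(levels, levels[1:]) if b != a]

    if all(d > 0 for d in deltas):
        return "slash"

    # At this point at least one change is negative.
    first_drop = next(i for i, d in enumerate(deltas) if d < 0)
    if first_drop > 0 and all(d < 0 for d in deltas[first_drop:]):
        return "bubble"
    return "other"
```

For instance, a revision containing a loop with a nested conditional and a trailing statement at the loop level yields levels such as `[0, 4, 8, 4]`, which this sketch labels as a bubble.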