Using machine learning to classify student code has many applications in computer science education, such as auto-grading, identifying struggling students from their code, and propagating feedback to address particular misconceptions. However, a fundamental challenge of using machine learning for code classification is how to represent program code as a vector that modern learning algorithms can process. A program is structurally represented by an abstract syntax tree (AST), and a variety of approaches have been proposed to extract features from ASTs for use in learning algorithms, but no work has directly compared their effectiveness. In this paper, we do so by comparing three feature engineering approaches for classifying the behavior of novices’ open-ended programming projects according to expert labels. To evaluate these approaches, we hand-labeled a dataset of novice programs from the Scratch repository to indicate the presence of five complex, game-related programming behaviors, and we compared the approaches on their classification performance. Our results show that the three approaches perform similarly across the target labels. However, we also find evidence that all approaches led to overfitting, suggesting the need for future research on selecting and reducing code features, which may reveal advantages of more complex feature engineering approaches.
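To make the idea of AST-based feature engineering concrete, the sketch below shows one simple, illustrative possibility: a bag-of-node-types representation built from Python ASTs. This is only a minimal example of the general technique, not one of the approaches evaluated in this paper (which operate on block-based Scratch programs); the vocabulary and helper names are hypothetical.

```python
# Illustrative sketch only: a bag-of-node-types feature vector over Python ASTs.
# This is NOT the paper's method; it simply shows how an AST can be turned into a vector.
import ast
from collections import Counter

def ast_node_counts(source: str) -> Counter:
    """Parse source code and count occurrences of each AST node type."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

def to_vector(counts: Counter, vocabulary: list) -> list:
    """Project node-type counts onto a fixed vocabulary to get a feature vector."""
    return [counts.get(name, 0) for name in vocabulary]

if __name__ == "__main__":
    program = "for i in range(10):\n    print(i * 2)\n"
    counts = ast_node_counts(program)
    vocab = ["For", "Call", "Name", "BinOp", "If", "While"]  # hypothetical vocabulary
    print(to_vector(counts, vocab))
```

A vector like this could then be passed to a standard classifier; the approaches compared in the paper differ in how much of the AST's structure such features preserve.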