TMH: A Transformer-Markov Hybrid Model for Behavior-Aware Code Summarization
Abstract
Automatic code comment generation is crucial for enhancing code readability, maintainability, and developer efficiency. However, existing models often treat source code as static text, overlooking its dynamic execution behavior. To address this, we propose Transformer with Markov modeling (TMH), a hybrid architecture that combines static lexical embeddings with behavioral signals derived from control-flow semantics. This dual-view representation enables the model to capture both what the code says and how it behaves. To further enhance relevance, we introduce an entropy-guided attention mechanism that prioritizes tokens critical to control logic during decoding. TMH outperforms state-of-the-art baselines (e.g., SeTransformer, ALSI-Transformer) by +1.91 BLEU-4 and +1.37 METEOR on large-scale Java datasets. Human evaluations confirm improved accuracy and contextual fluency, particularly for logic-heavy methods. By unifying static and dynamic code understanding, our approach advances neural code summarization and paves the way for more intelligent, behavior-aware documentation tools in software engineering.
Copyright (c) 2025 ITEGAM-JETIA

This work is licensed under a Creative Commons Attribution 4.0 International License.
