Morphology-aware spatial transcriptomics: a critical review of translation and integration methods

1. Introduction
Spatial transcriptomics (ST) couples high-throughput RNA profiling with the visual context of histology, moving gene-centric biology from one-dimensional lists to two-dimensional tissue maps. Over the past five years, deep-learning pipelines have proliferated, reflecting both the richness and the idiosyncrasies of ST data [1]. Sequencing-based platforms such as 10x Visium, Slide-seqV2, DBiT-seq and Stereo-seq typically pair each captured section with a registered haematoxylin–eosin (H&E) image, and imaging-based assays such as NanoString CosMx and MERFISH acquire matched morphology channels. Exploiting that imagery can reduce sequencing cost, denoise sparse counts and add pathological context, but the community has pursued two distinct modelling philosophies. We first formalise these as translation versus integration, then survey algorithms, validation practice, benchmarks, technical barriers and future directions.

2. Conceptual framework: translation versus integration
Translation models treat morphology as a proxy for the transcriptome: given an H&E patch they predict a full or partial gene-expression vector. Proofs of concept outside ST, such as HE2RNA, showed that whole-slide images can approximate bulk RNA-seq [2]. Spot-level extensions—HE2Gene [3], Hist2ST [4] and HistoSPACE [5]—transfer the idea to spatial assays. The practical motivation is obvious: if H&E already encodes sufficient molecular information, digital surrogates could replace expensive ST assays and enable retrospective analysis of hospital slide archives.

Integration models, by contrast, assume that images contain complementary rather than redundant information. They fuse morphological, spatial and transcriptomic cues into a joint embedding that supports downstream tasks such as spatial domain identification, cell-type deconvolution or super-resolution. Graph-based encoders like SpaGCN [6], STAGATE [7] and MCGAE [8] epitomise this philosophy. The fundamental trade-off is between shared information, which eases training but risks learning imaging artefacts, and exclusive information, which increases relevance but complicates disentanglement. Similar dilemmas are well known in gene-set enrichment where redundancy generates over-optimistic statistics [9] and in pathway analysis [10].

3. Learning strategies, gene selection and model architectures
3.1 Translation
Early CNN backbones (ResNet, DenseNet) were coupled to gene-wise regressors or negative-binomial likelihoods. HE2Gene attains transcriptome-wide prediction via multi-task learning (Chen et al., 2024). Hist2ST augments convolution with a Vision Transformer and a graph neural network (GNN) to model both long-range context and local neighbourhoods (Zeng et al., 2022). mclSTExp adds multimodal contrastive pre-training that aligns image and transcript embeddings [11], while STGAT incorporates bulk RNA-seq during training so it can generalise to cohorts lacking ST [12]. Gene selection is typically supervised—top spatially variable genes (SVGs), pathology-related panels or ligand–receptor lists—to reduce output dimensionality and stabilise heteroscedastic losses; nevertheless, HE2Gene shows that full-transcriptome prediction is feasible at the cost of heavier compute.
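Because raw spot counts are over-dispersed, several of these models optimise a negative-binomial likelihood rather than mean-squared error. A minimal numerical sketch of that loss in the mean/dispersion parameterisation (in practice the dispersion θ is learned per gene inside an autograd framework, not fixed as here):

```python
import numpy as np
from scipy.special import gammaln

def nb_nll(y, mu, theta):
    """Negative-binomial negative log-likelihood, mean/dispersion form.

    y     : observed counts (array)
    mu    : predicted means (array, > 0)
    theta : inverse-dispersion (scalar, > 0); larger theta -> closer to Poisson
    """
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    ll = (gammaln(y + theta) - gammaln(theta) - gammaln(y + 1.0)
          + theta * np.log(theta / (theta + mu))
          + y * np.log(mu / (theta + mu)))
    return -ll.mean()

counts = np.array([0.0, 3.0, 7.0, 1.0, 12.0])
good = nb_nll(counts, mu=counts + 0.5, theta=2.0)      # predictions near truth
bad = nb_nll(counts, mu=np.full(5, 50.0), theta=2.0)   # badly inflated means
```

As expected, the loss is smaller for predictions close to the observed counts than for uniformly inflated means.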

3.2 Integration
Most current integrative models construct an explicit graph whose edges encode Euclidean proximity and/or histological similarity. SpaGCN blends RGB correlation with spatial distance to produce crisp cortical layers (Hu et al., 2021). STAGATE adds adaptive attention so that boundary spots are not over-smoothed (Dong & Zhang, 2022). SEDR couples a masked auto-encoder with a variational GNN to co-embed gene and spatial manifolds [13], whereas MCGAE (Yang et al., 2024) and SpaDAC [14] employ contrastive objectives to maximise agreement across modalities. The field is rapidly adopting vision foundation models: GIST plugs a CLIP-style backbone trained on millions of histology tiles into a hybrid graph-transformer and reports up to 50 % ARI improvement on domain detection [15]. AttentionVGAE [16] and SpaInGNN [17] further demonstrate how multi-head attention can balance local and global structures.
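The blended spatial–histological distance underlying such graphs can be illustrated in a few lines. This is not the published SpaGCN implementation—collapsing patch colour to a single scaled pseudo-coordinate is a simplifying assumption—but it shows the core idea of folding histology into a Gaussian affinity:

```python
import numpy as np

def blended_affinity(coords, rgb, scale=1.0, length=1.0):
    """Spot-spot affinity blending spatial distance with histology colour,
    in the spirit of SpaGCN's augmented distance (illustrative sketch, not
    the published implementation).

    coords : (n, 2) spot positions
    rgb    : (n, 3) mean H&E colour of each spot's patch
    scale  : weight of the histology pseudo-coordinate
    length : Gaussian kernel bandwidth
    """
    # Collapse colour to one pseudo-coordinate and z-score it so that
    # `scale` controls its influence relative to physical distance.
    z = rgb.mean(axis=1)
    z = scale * (z - z.mean()) / (z.std() + 1e-8)
    aug = np.column_stack([coords, z])           # (n, 3) augmented coordinates
    d2 = ((aug[:, None, :] - aug[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * length ** 2))        # Gaussian edge weights
    np.fill_diagonal(w, 0.0)                     # no self-loops
    return w

rng = np.random.default_rng(0)
W = blended_affinity(rng.normal(size=(30, 2)), rng.uniform(size=(30, 3)))
```

Setting `scale=0` recovers a purely spatial kernel, making the morphology contribution an explicit, tunable dial.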

Gene selection for integration is usually unsupervised (highly variable genes, Moran's I ranking), but several frameworks update the gene set during training; SpaDAC dynamically drops redundant genes whose gradients vanish (Huo et al., 2023), and STAMarker uses saliency maps to highlight domain-specific SVGs [18].
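Moran's I, the usual unsupervised ranking criterion, is straightforward to compute. A self-contained sketch with a toy chain graph, contrasting a smooth spatial gradient against a shuffled version of the same values:

```python
import numpy as np

def morans_i(x, w):
    """Moran's I spatial autocorrelation for one gene.

    x : (n,) expression values per spot
    w : (n, n) spatial weight matrix (e.g. k-NN adjacency), diagonal zero
    """
    x = np.asarray(x, float)
    z = x - x.mean()
    num = (w * np.outer(z, z)).sum()   # weighted cross-products of deviations
    den = (z ** 2).sum()
    return (len(x) / w.sum()) * num / den

# Toy example: 50 spots arranged in a 1-D chain.
n = 50
w = np.zeros((n, n))
idx = np.arange(n - 1)
w[idx, idx + 1] = w[idx + 1, idx] = 1.0          # chain adjacency
smooth = np.linspace(0.0, 1.0, n)                # smooth spatial gradient
shuffled = np.random.default_rng(1).permutation(smooth)
```

A smooth gradient scores near 1, while the shuffled values hover near the null expectation of −1/(n−1), which is why Moran's I separates spatially patterned genes from noise.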

3.3 Training and validation practice
Translation models minimise either mean-squared error on log-normalised counts or a negative-binomial likelihood (Zeng et al., 2022). Integration models use reconstruction or contrastive losses plus downstream clustering quality (Yang et al., 2024). Crucially, naïve random spot splits inflate performance because adjacent spots share both pixels and counts; spatial block cross-validation or leave-one-tile-out evaluation is now recommended [19]. Batch-effect removal receives less attention than in single-cell RNA-seq, yet ResST shows that Margin-Disparity Discrepancy can align multiple sections without degrading biological signal [20].
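Spatial block cross-validation can be implemented in a few lines. The sketch below uses vertical quantile strips as blocks—an illustrative choice; real pipelines may prefer 2-D tiles, whole sections or per-patient splits:

```python
import numpy as np

def spatial_block_folds(coords, n_blocks=3):
    """Assign spots to cross-validation folds by spatial blocks rather than
    at random, so held-out spots do not neighbour training spots.

    coords : (n, 2) spot positions
    returns: (n,) fold index per spot
    """
    x = coords[:, 0]
    # Quantile edges give roughly equal-sized strips even on irregular tissue.
    edges = np.quantile(x, np.linspace(0, 1, n_blocks + 1)[1:-1])
    return np.digitize(x, edges)

rng = np.random.default_rng(0)
coords = rng.uniform(size=(300, 2))
folds = spatial_block_folds(coords, n_blocks=3)
```

Contrast this with a random split, where nearly every test spot has a training spot within one spot-diameter, sharing pixels and diffused transcripts.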

4. Typical tasks and evaluation metrics
• Gene-expression prediction: Pearson/Spearman correlation, R-squared and fraction of genes with R > 0.3 (Chen et al., 2024).
• Spatial domain identification: Adjusted Rand Index (ARI) and Normalised Mutual Information against manual annotations such as DLPFC layers (Hu et al., 2021).
• Super-resolution & imputation: ImSpiRE redistributes counts via optimal transport [21]; TransformerST adds a cross-scale graph to reach near-single-cell granularity [22]. Structural-similarity index and gene-wise correlation after down-sampling serve as metrics.
• Cell-type deconvolution: F1 or area-under-precision–recall relative to scRNA-derived labels (Yang et al., 2024).
• Denoising: preservation of differentially expressed genes and improvement of downstream clustering (Xu et al., 2024).
Reporting metrics both spot-wise and gene-wise guards against aggregation artefacts akin to Simpson’s paradox, in which a handful of high-expression genes dominate averages (Zyla et al., 2017).
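For the gene-expression prediction task, the gene-wise metrics above reduce to a vectorised correlation. A sketch with synthetic data (the noise model is an assumption for illustration; the r > 0.3 threshold follows the convention cited above):

```python
import numpy as np

def genewise_pearson(y_true, y_pred):
    """Pearson correlation per gene, computed across spots.

    y_true, y_pred : (spots, genes) matrices of (log-normalised) expression
    returns        : (genes,) correlation vector
    """
    a = y_true - y_true.mean(axis=0)
    b = y_pred - y_pred.mean(axis=0)
    denom = np.sqrt((a ** 2).sum(0) * (b ** 2).sum(0)) + 1e-12
    return (a * b).sum(0) / denom

rng = np.random.default_rng(0)
truth = rng.normal(size=(200, 5))
pred = truth + rng.normal(scale=1.0, size=(200, 5))  # noisy but informative
pred[:, 0] = rng.normal(size=200)                    # gene 0: pure noise
r = genewise_pearson(truth, pred)
frac_good = (r > 0.3).mean()    # fraction of well-predicted genes
```

Reporting the full distribution of `r` alongside `frac_good` exposes models whose average is carried by a few easy genes.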

5. Datasets and benchmarks
10x Visium still dominates algorithm papers, yet cross-platform resources are emerging. SpatialRef aggregates > 9 million manually annotated spots across 17 tissues and several technologies [23]. SpatialBenchVisium provides splenic datasets with matched protocols for fair comparison [24]. A systematic comparison of eleven sequencing-based platforms highlights diffusion during capture as a hidden variable that affects effective resolution [25]. LLOKI tackles feature and batch alignment across five imaging-based platforms without requiring shared gene panels [26], pointing towards unified atlases.

6. Technical challenges
Registration is hampered by tissue warping between cryosectioned ST slides and paraffin-embedded diagnostic slides; Liu & Yang [27] review computational alignment strategies. Staining variability across scanners induces covariate shift; adaptive thresholding improves multiplex immunofluorescence quantification [28]. Batch effects across patients or chemistries confound embeddings; HarmonizR and stMDA show that missing-value-tolerant adjustment or domain adaptation can preserve biology while reducing variance [29, 30]. Resolution mismatch obliges super-resolution models to extrapolate beyond the Nyquist limit—TransformerST partially addresses this via cross-scale constraints (Zhao et al., 2024). Training on gigapixel slides strains GPU memory; task-specific self-supervised pre-training reduces parameter count and carbon footprint [31], while cloud tools such as ElasticBLAST illustrate how on-demand infrastructure can amortise compute cost [32].

7. Pitfalls and failure modes
Redundancy between image texture and gene counts can make a model appear accurate even when it learns trivial proxies [33]. Spatial leakage—train/test spots sharing a histological neighbourhood—inflates correlation metrics [34]. High Moran’s I housekeeping genes dominate aggregate scores yet add little biological insight [35]. Permutation baselines, gene-stratified reporting and spatial block CV mitigate these artefacts.
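A permutation baseline is cheap to construct: shuffling the spot order of the predictions destroys any genuine image–expression link while preserving both marginal distributions. An illustrative sketch for a single gene:

```python
import numpy as np

def permutation_null(y_true, y_pred, n_perm=200, seed=0):
    """Null distribution of spot-wise correlation obtained by shuffling the
    spot order of the predictions. A model is only convincing if its
    observed correlation clears this baseline by a wide margin.
    """
    rng = np.random.default_rng(seed)
    nulls = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(len(y_true))
        nulls[i] = np.corrcoef(y_true, y_pred[perm])[0, 1]
    return nulls

rng = np.random.default_rng(1)
truth = rng.normal(size=300)
pred = truth + rng.normal(scale=0.5, size=300)   # synthetic informative model
observed = np.corrcoef(truth, pred)[0, 1]
null = permutation_null(truth, pred)
```

Note that this spot-wise shuffle does not address spatial leakage; for that, the permutation must respect block structure (e.g. shuffling whole tiles).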

8. Interpretability
Grad-CAM, Score-CAM and attention roll-out reveal which nuclei or stroma drive predictions [36, 37]. STAMarker converts saliency maps into SVG lists with statistical control (Zhang et al., 2023). Pathomic Fusion shows how Kronecker-product fusion permits modality-specific attribution while improving survival prediction [38]. Virtual staining networks such as MVFStain [39] and ULST [40] offer orthogonal validation by predicting immunostains from H&E, and have already been applied to tumour budding assessment [41].

9. Computational cost
Patch extraction and on-the-fly augmentation are I/O bottlenecks; naïve graph construction scales quadratically in spot number. Sparse adjacency, mini-batch contrastive learning (Yang et al., 2024) and cloud elasticity (Camacho et al., 2023) partly alleviate this. Vision transformers outperform CNNs on robustness but add memory overhead [42]. Researchers should therefore report compute time and energy alongside accuracy.
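The quadratic scaling of naïve graph construction is easy to see, and to avoid, with a sparse k-NN adjacency. A sketch (brute-force distances are kept for clarity; production code would use a KD-tree or ball tree):

```python
import numpy as np
from scipy.sparse import csr_matrix

def knn_adjacency(coords, k=6):
    """Sparse k-nearest-neighbour adjacency: O(n*k) stored edges instead of
    the O(n^2) entries of a dense affinity matrix.

    coords : (n, 2) spot positions
    """
    n = len(coords)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-edges
    nbrs = np.argsort(d2, axis=1)[:, :k]         # k nearest per spot
    rows = np.repeat(np.arange(n), k)
    return csr_matrix((np.ones(n * k), (rows, nbrs.ravel())), shape=(n, n))

coords = np.random.default_rng(0).uniform(size=(1000, 2))
A = knn_adjacency(coords)
# 6 000 stored edges versus 1 000 000 dense entries
```

For a Visium slide with ~5 000 spots the saving is modest, but for subcellular platforms with millions of cells it is the difference between fitting in GPU memory and not.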

10. Outlook
Histology foundation models pretrained on millions of slides [43] now rival ImageNet backbones and transfer gracefully to ST (Ge et al., 2025). Crossmodal pre-training that pairs WSIs with ST spots [44] or bulk RNA-seq (Baul et al., 2024) hints at truly multimodal foundation models. PearlST shows that diffusion-regularised auto-encoders can trace developmental trajectories and spatiotemporal gradients [45]. Integration with spatial ATAC-seq, proteomics and metabolomics will require vertical data-fusion strategies now standard in other omics [46]. Ultimately, prompt-based models that ingest raw histology and output gene, protein and chromatin landscapes could enable rapid digital-pathology triage.

11. Recommendations for robust, clinically meaningful research
Transparency and FAIR principles are prerequisites for regulatory adoption. Researchers should preregister protocols, release code and model weights, and adopt reporting frameworks such as CALIFRAME [47]. Data should be split by spatial blocks or patients, not random spots, and per-gene as well as per-spot metrics reported. Comparative baselines must include image-only and gene-only models to quantify complementary value. Saliency maps should be validated by quantitative perturbation tests and, where possible, orthogonal IHC or RNAscope. External validation on an independent laboratory dataset—ideally from a different scanner and staining protocol—is essential before clinical claims.

By situating future work along the translation–integration continuum and embracing rigorous validation, the field can progress from proof-of-concept correlations toward clinically actionable, morphology-aware molecular diagnostics.

References
----------
[1] Roxana Zahedi, Reza Ghamsari, Ahmadreza Argha, Callum Macphillamy, Amin Beheshti, Roohallah Alizadehsani, Nigel H Lovell, Mohammad Lotfollahi, Hamid Alinejad-Rokny (2024). Deep learning in spatially resolved transcriptomics: a comprehensive technical view. PMID: 38483255.
[2] Benoît Schmauch, Alberto Romagnoni, Elodie Pronier, Charlie Saillard, Pascale Maillé, Julien Calderaro, Aurélie Kamoun, Meriem Sefta, Sylvain Toldo, Mikhail Zaslavskiy, Thomas Clozel, Matahi Moarii, Pierre Courtiol, Gilles Wainrib (2020). A deep learning model to predict RNA-Seq expression of tumours from whole slide images. PMID: 32747659.
[3] Xingjian Chen, Jiecong Lin, Yuchen Wang, Weitong Zhang, Weidun Xie, Zetian Zheng, Ka-Chun Wong (2024). HE2Gene: image-to-RNA translation via multi-task learning for spatial transcriptomics data. PMID: 38837395.
[4] Yuansong Zeng, Zhuoyi Wei, Weijiang Yu, Rui Yin, Yuchen Yuan, Bingling Li, Zhonghui Tang, Yutong Lu, Yuedong Yang (2022). Spatial transcriptomics prediction from histology jointly through Transformer and graph neural networks. PMID: 35849101.
[5] Shivam Kumar, Samrat Chatterjee (2024). HistoSPACE: Histology-inspired spatial transcriptome prediction and characterization engine. PMID: 39521362.
[6] Jian Hu, Xiangjie Li, Kyle Coleman, Amelia Schroeder, Nan Ma, David J Irwin, Edward B Lee, Russell T Shinohara, Mingyao Li (2021). SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. PMID: 34711970.
[7] Kangning Dong, Shihua Zhang (2022). Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. PMID: 35365632.
[8] Yiwen Yang, Chengming Zhang, Zhaonan Liu, Kazuyuki Aihara, Chuanchao Zhang, Luonan Chen, Wu Wei (2024). MCGAE: unraveling tumor invasion through integrated multimodal spatial transcriptomics. PMID: 39576225.
[9] Joanna Zyla, Michal Marczyk, January Weiner, Joanna Polanska (2017). Ranking metrics in gene set enrichment analysis: do they matter? PMID: 28499413.
[10] Ruth Alexandra Stoney, Jean-Marc Schwartz, David L Robertson, Goran Nenadic (2018). Using set theory to reduce redundancy in pathway sets. PMID: 30340461.
[11] Wenwen Min, Zhiceng Shi, Jun Zhang, Jun Wan, Changmiao Wang (2024). Multimodal contrastive learning for spatial gene expression prediction using histology images. PMID: 39471412.
[12] Sudipto Baul, Khandakar Tanvir Ahmed, Qibing Jiang, Guangyu Wang, Qian Li, Jeongsik Yong, Wei Zhang (2024). Integrating spatial transcriptomics and bulk RNA-seq: predicting gene expression with enhanced resolution through graph attention networks. PMID: 38960406.
[13] Hang Xu, Huazhu Fu, Yahui Long, Kok Siong Ang, Raman Sethi, Kelvin Chong, Mengwei Li, Rom Uddamvathanak, Hong Kai Lee, Jingjing Ling, Ao Chen, Ling Shao, Longqi Liu, Jinmiao Chen (2024). Unsupervised spatially embedded deep representation of spatial transcriptomics. PMID: 38217035.
[14] Yuying Huo, Yilang Guo, Jiakang Wang, Huijie Xue, Yujuan Feng, Weizheng Chen, Xiangyu Li (2023). Integrating multi-modal information to detect spatial domains of spatial transcriptomics by graph attention network. PMID: 37356752.
[15] Yongxin Ge, Jiake Leng, Ziyang Tang, Kanran Wang, Kaicheng U, Sophia Meixuan Zhang, Sen Han, Yiyan Zhang, Jinxi Xiang, Sen Yang, Xiang Liu, Yi Song, Xiyue Wang, Yuchen Li, Junhan Zhao (2025). Deep Learning-Enabled Integration of Histology and Transcriptomics for Tissue Spatial Profile Analysis. PMID: 39830364.
[16] Lixin Lei, Kaitai Han, Zijun Wang, Chaojing Shi, Zhenghui Wang, Ruoyan Dai, Zhiwei Zhang, Mengqiu Wang, Qianjin Guo (2024). Attention-guided variational graph autoencoders reveal heterogeneity in spatial transcriptomics. PMID: 38627939.
[17] Fangqin Zhang, Zhan Shen, Siyi Huang, Yuan Zhu, Ming Yi (2025). SpaInGNN: Enhanced clustering and integration of spatial transcriptomics based on refined graph neural networks. PMID: 39542070.
[18] Chihao Zhang, Kangning Dong, Kazuyuki Aihara, Luonan Chen, Shihua Zhang (2023). STAMarker: determining spatial domain-specific variable genes with saliency maps in deep learning. PMID: 37811885.
[19] Oscar E Ospina, Alex C Soupir, Roberto Manjarres-Betancur, Guillermo Gonzalez-Calderon, Xiaoqing Yu, Brooke L Fridley (2024). Differential gene expression analysis of spatial transcriptomic experiments using spatial mixed models. PMID: 38744956.
[20] Jinjin Huang, Xiaoqian Fu, Zhuangli Zhang, Yinfeng Xie, Shangkun Liu, Yarong Wang, Zhihong Zhao, Youmei Peng (2024). A graph self-supervised residual learning framework for domain identification and data integration of spatial transcriptomics. PMID: 39266614.
[21] Yuwei Hua, Yizhi Zhang, Zhenming Guo, Shan Bian, Yong Zhang (2025). ImSpiRE: image feature-aided spatial resolution enhancement method. PMID: 39327391.
[22] Chongyue Zhao, Zhongli Xu, Xinjun Wang, Shiyue Tao, William A MacDonald, Kun He, Amanda C Poholek, Kong Chen, Heng Huang, Wei Chen (2024). Innovative super-resolution in spatial transcriptomics: a transformer model exploiting histology images and spatial gene expression. PMID: 38436557.
[23] Ting Cui, Yan-Yu Li, Bing-Long Li, Han Zhang, Ting-Ting Yu, Jia-Ning Zhang, Feng-Cui Qian, Ming-Xue Yin, Qiao-Li Fang, Zi-Hao Hu, Yu-Xiang Yan, Qiu-Yu Wang, Chun-Quan Li, De-Si Shang (2025). SpatialRef: a reference of spatial omics with known spot annotation. PMID: 39417483.
[24] Mei R M Du, Changqing Wang, Charity W Law, Daniela Amann-Zalcenstein, Casey J A Anttila, Ling Ling, Peter F Hickey, Callum J Sargeant, Yunshun Chen, Lisa J Ioannidis, Pradeep Rajasekhar, Raymond K H Yip, Kelly L Rogers, Diana S Hansen, Rory Bowden, Matthew E Ritchie (2025). Benchmarking spatial transcriptomics technologies with the multi-sample SpatialBenchVisium dataset. PMID: 40156041.
[25] Yue You, Yuting Fu, Lanxiang Li, Zhongmin Zhang, Shikai Jia, Shihong Lu, Wenle Ren, Yifang Liu, Yang Xu, Xiaojing Liu, Fuqing Jiang, Guangdun Peng, Abhishek Sampath Kumar, Matthew E Ritchie, Xiaodong Liu, Luyi Tian (2024). Systematic comparison of sequencing-based spatial transcriptomic methods. PMID: 38965443.
[26] Ellie Haber, Ajinkya Deshpande, Jian Ma, Spencer Krieger (2025). Unified integration of spatial transcriptomics across platforms. PMID: 40236180.
[27] Yuyao Liu, Can Yang (2024). Computational methods for alignment and integration of spatially resolved transcriptomics data. PMID: 38495555.
[28] Anja L Frei, Anthony McGuigan, Ritik Rak Sinha, Mark A Glaire, Faiz Jabbar, Luciana Gneo, Tijana Tomasevic, Andrea Harkin, Tim J Iveson, Mark Saunders, Karin Oein, Noori Maka, Francesco Pezella, Leticia Campo, Jennifer Hay, Joanne Edwards, Owen J Sansom, Caroline Kelly, Ian Tomlinson, Wanja Kildal, Rachel S Kerr, David J Kerr, Håvard E Danielsen, Enric Domingo, David N Church, Viktor H Koelzer (2023). Accounting for intensity variation in image analysis of large-scale multiplexed clinical trial datasets. PMID: 37697694.
[29] Simon Schlumbohm, Julia E Neumann, Philipp Neumann (2025). HarmonizR: blocking and singular feature data adjustment improve runtime efficiency and data preservation. PMID: 39934730.
[30] Lequn Wang, Yaofeng Hu, Kai Xiao, Chuanchao Zhang, Qianqian Shi, Luonan Chen (2024). Multi-modal domain adaptation for revealing spatial functional landscape from spatially resolved transcriptomics. PMID: 38819253.
[31] Tawsifur Rahman, Alexander S Baras, Rama Chellappa (2025). Evaluation of a Task-Specific Self-Supervised Learning Framework in Digital Pathology Relative to Transfer Learning Approaches and Existing Foundation Models. PMID: 39455029.
[32] Christiam Camacho, Grzegorz M Boratyn, Victor Joukov, Roberto Vera Alvarez, Thomas L Madden (2023). ElasticBLAST: accelerating sequence search via cloud computing. PMID: 36967390.
[33] Alona Levy-Jurgenson, Xavier Tekpli, Vessela N Kristensen, Zohar Yakhini (2020). Spatial transcriptomics inferred from pathology whole-slide images links tumor heterogeneity to survival in breast and lung cancer. PMID: 33139755.
[34] Péter Nagy, Brigitta Tóth, István Winkler, Ádám Boncz (2024). The effects of spatial leakage correction on the reliability of EEG-based functional connectivity networks. PMID: 38825981.
[35] Gabriel Ostlund, Erik L L Sonnhammer (2014). Avoiding pitfalls in gene (co)expression meta-analysis. PMID: 24184361.
[36] Yusuf Brima, Marcellin Atemkeng (2024). Saliency-driven explainable deep learning in medical imaging: bridging visual explainability and statistical quantitative analysis. PMID: 38909228.
[37] Daniel T Huff, Amy J Weisman, Robert Jeraj (2021). Interpretation and visualization techniques for deep learning models in medical imaging. PMID: 33227719.
[38] Richard J Chen, Ming Y Lu, Jingwen Wang, Drew F K Williamson, Scott J Rodig, Neal I Lindeman, Faisal Mahmood (2022). Pathomic Fusion: An Integrated Framework for Fusing Histopathology and Genomic Features for Cancer Diagnosis and Prognosis. PMID: 32881682.
[39] Ranran Zhang, Yankun Cao, Yujun Li, Zhi Liu, Jianye Wang, Jiahuan He, Chenyang Zhang, Xiaoyu Sui, Pengfei Zhang, Lizhen Cui, Shuo Li (2022). MVFStain: Multiple virtual functional stain histopathology images generation based on specific domain mapping. PMID: 35810588.
[40] Haoran Zhang, Mingzhong Pan, Chenglong Zhang, Chenyang Xu, Hongxing Qi, Dapeng Lei, Xiaopeng Ma (2025). ULST: U-shaped LeWin Spectral Transformer for virtual staining of pathological sections. PMID: 40164031.
[41] Xingzhong Hou, Zhen Guan, Xianwei Zhang, Xiao Hu, Shuangmei Zou, Chunzi Liang, Lulin Shi, Kaitai Zhang, Haihang You (2024). Evaluation of tumor budding with virtual panCK stains generated by novel multi-model CNN framework. PMID: 39241330.
[42] Maximilian Springenberg, Annika Frommholz, Markus Wenzel, Eva Weicken, Jackie Ma, Nils Strodthoff (2023). From modern CNNs to vision transformers: Assessing the performance, robustness, and classification strategies of deep learning models in histopathology. PMID: 37201221.
[43] Daisuke Komura, Mieko Ochi, Shumpei Ishikawa (2025). Machine learning methods for histopathological image analysis: Updates in 2024. PMID: 39897057.
[44] Zarif L Azher, Michael Fatemi, Yunrui Lu, Gokul Srinivasan, Alos B Diallo, Brock C Christensen, Lucas A Salas, Fred W Kolling, Laurent Perreard, Scott M Palisoul, Louis J Vaickus, Joshua J Levy (2024). Spatial Omics Driven Crossmodal Pretraining Applied to Graph-based Deep Learning for Cancer Pathology Analysis. PMID: 38160300.
[45] Haiyun Wang, Jianping Zhao, Qing Nie, Chunhou Zheng, Xiaoqiang Sun (2024). Dissecting Spatiotemporal Structures in Spatial Transcriptomics via Diffusion-Based Adversarial Learning. PMID: 38812530.
[46] Pedro H Godoy Sanches, Nicolly Clemente de Melo, Andreia M Porcari, Lucas Miguel de Carvalho (2024). Integrating Molecular Perspectives: Strategies for Comprehensive Multi-Omics Integrative Data Analysis and Machine Learning Applications in Transcriptomics, Proteomics, and Metabolomics. PMID: 39596803.
[47] Kirubel Biruk Shiferaw, Irina Balaur, Danielle Welter, Dagmar Waltemath, Atinkut Alamirrew Zeleke (2024). CALIFRAME: a proposed method of calibrating reporting guidelines with FAIR principles to foster reproducibility of AI research in medicine. PMID: 39430802.