Understanding miRNA activity in individual cells is challenging because current single-cell technologies are limited in their ability to capture miRNAs. To address this limitation, we introduce two deep learning models, Cross-Modality (CM) and Single-Modality (SM), which use encoder-decoder architectures to predict miRNA expression from mRNA data at both the bulk and single-cell levels. We compared CM and SM with a state-of-the-art approach, miRSCAPE, on bulk and single-cell datasets, and both models predicted miRNA expression more accurately than miRSCAPE. Incorporating miRNA target information also improved performance significantly compared with using all genes. These models offer valuable tools for predicting miRNA expression from single-cell mRNA data.
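
To make the general idea concrete, the following is a minimal sketch of an encoder-decoder forward pass that maps an mRNA expression profile to predicted miRNA expression, with an optional restriction of the input to miRNA target genes. It is not the CM or SM architecture: the layer sizes, the untrained random weights, the gene names, the expression values, and the helper names (`make_layer`, `dense`, `select_genes`, `predict_mirna`) are all hypothetical and chosen only for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (illustrative init only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=None):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    if activation == "relu":
        out = [max(0.0, v) for v in out]
    return out


def select_genes(profile, gene_names, target_genes):
    """Keep only expression values of genes in the target set (order preserved)."""
    return [v for g, v in zip(gene_names, profile) if g in target_genes]


def predict_mirna(mrna_input, encoder, decoder):
    """Encode mRNA expression into a latent vector, then decode to miRNA expression."""
    latent = dense(mrna_input, encoder, activation="relu")
    return dense(latent, decoder)


rng = random.Random(0)

# Hypothetical gene panel, target set, and made-up expression values.
gene_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"]
target_genes = {"GENE_A", "GENE_C", "GENE_D", "GENE_F"}
mrna_profile = [2.1, 0.0, 3.4, 1.2, 0.5, 4.0]

# Restrict the input to miRNA target genes instead of using all genes.
x = select_genes(mrna_profile, gene_names, target_genes)

# Assumed sizes: 3 latent units, 2 predicted miRNAs. Weights are untrained.
encoder = make_layer(len(x), 3, rng)
decoder = make_layer(3, 2, rng)

mirna_pred = predict_mirna(x, encoder, decoder)
print(len(x), len(mirna_pred))
```

In practice the weights would be fitted on paired mRNA and miRNA measurements; the sketch only shows how data flows from the gene-filtered mRNA input through a latent representation to the miRNA output.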