
Visual Text      Self-supervised Learning      Incremental Detection      Other Work


  1. W Zeng, Y Shu, Z Li, D Yang, Y Zhou*. "TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control." NeurIPS, 2024. (CCF-A, Spotlight, PDF)
  2. Y Zhang, C Liu, Y Zhou*, W Wang, Q Ye, X Ji. "Beyond Instance Discrimination: Relation-aware Contrastive Self-supervised Learning." TMM, 2024. (SCI Zone 1, CCF-B, PDF)
  3. G Zeng, Y Zhang, J Wei, D Yang, P Zhang, Y Gao, X Qin, Y Zhou. "Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval." ACM MM, 2024. (CCF-A, PDF)
  4. D Wu, D Yang, Y Zhou, C Ma. "Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs." ACM MM, 2024. (CCF-A)
  5. D Wu, D Yang, Y Zhou, C Ma. "Robust Multimodal Sentiment Analysis of Image-Text Pairs by Distribution-Based Feature Recovery and Fusion." ACM MM, 2024. (CCF-A)
  6. Z Li, Y Shu, W Zeng, D Yang, Y Zhou*. "First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending." ECAI, 2024. (CCF-B, Acceptance Rate 23.0%, PDF)
  7. X Yang, D Yang, Z Qiao, Y Zhou. "Accurate and Robust Scene Text Recognition via Adversarial Training." ICASSP, 2024. (CCF-B, PDF)
  8. X Yang, Z Qiao, J Wei, D Yang, Y Zhou*. "Masked and Permuted Implicit Context Learning for Scene Text Recognition." IEEE SPL, 2024. (CCF-C, SCI, PDF)
  9. Y Zhang, G Zeng, H Shen, C Ma, Y Zhou*. "Show Exemplars and Tell Me What You See: In-context Learning with Frozen Large Language Models for TextVQA." PRCV, 2024. (CCF-C)
  10. Y Shu, W Zeng, Z Li, F Zhao, Y Zhou*. "Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing." arXiv, 2024. (PDF)
  11. J Lyu, J Wei, G Zeng, Z Li, E Xie, W Wang, Y Zhou*. "TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model." arXiv, 2024. (PDF)
  12. D Wu, D Yang, H Shen, C Ma, Y Zhou. "Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition." arXiv, 2024. (PDF)


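  1. Y Zhang, Q Li, H Shen, G Zeng, Y Zhou*, C Ma, Y Zhang, W Wang. "A Survey of Text-centric Image Understanding Techniques" (in Chinese). Journal of Image and Graphics (中国图象图形学报), 2023. (PDF)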
  2. B Fang, W Wu, C Liu, Y Zhou*, M Yang, Y Song, F Li, W Wang, X Ji, W Ouyang. "UATVR: Uncertainty-adaptive Text-Video Retrieval." ICCV, 2023. (CCF-A, PDF)
  3. H Shen, X Gao, J Wei, L Qiao, Y Zhou*, Q Li, Z Cheng. "Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables." IJCAI, 2023. (CCF-A, Oral Presentation, Acceptance Rate 15.0%, PDF)
  4. D Yang, Y Zhou*, X Hong, A Zhang, W Wang. "One-shot Replay: Boosting Incremental Object Detection via Retrospecting One Object." AAAI, 2023. (CCF-A, Oral Presentation, Acceptance Rate approx. 11.0%, PDF)
  5. X Qin, P Lyu, C Zhang, Y Zhou*, K Yao, P Zhang, H Lin, W Wang. "Towards Robust Real-time Scene Text Detection: From Semantic to Instance Representation Learning." ACM MM, 2023. (CCF-A, Oral Presentation, PDF)
  6. Y Shu, W Wang, Y Zhou*, S Liu, A Zhang, D Yang, W Wang. "Perceiving Ambiguity and Semantics without Recognition: An Efficient and Effective Ambiguous Scene Text Detector." ACM MM, 2023. (CCF-A, Oral Presentation, PDF)
  7. G Zeng, Y Zhang, Y Zhou*, B Fang, G Zhao, X Wei, W Wang. "Filling in the Blank: Rationale-augmented Prompt Tuning for TextVQA." ACM MM, 2023. (CCF-A, Oral Presentation, PDF)
  8. D Yang, Y Zhou*, X Hong, A Zhang, X Wei, L Zeng, Z Qiao, W Wang. "Pseudo Object Replay and Mining for Incremental Object Detection." ACM MM, 2023. (CCF-A, Oral Presentation, PDF)
  9. G Zeng, Y Zhang, Y Zhou*, X Yang, N Jiang, G Zhao, W Wang, XC Yin. "Beyond OCR + VQA: Towards End-to-end Reading and Reasoning for Robust and Accurate TextVQA." PR, 2023. (SCI Zone 1, CCF-B, PDF)
  10. C Liu, Y Yao, D Luo, Y Zhou, Q Ye. "Self-supervised Motion Perception for Spatio-temporal Representation Learning." TNNLS, 2023. (SCI Zone 1, CCF-B, PDF)
  11. X Yang, D Yang, Y Zhou, Y Guo, W Wang. "Mask-guided Stamp Erasure for Real Document Image." ICME, 2023. (CCF-B, PDF)
  12. Y Shu, S Liu, Y Zhou, H Xu, F Jiang. "EI2SR: Learning an Enhanced Intra-instance Semantic Relationship for Arbitrary-shaped Scene Text Detection." ICASSP, 2023. (CCF-B, PDF)
  13. X Sun, J Lyu, Y Zhang, G Zeng, B Fang, Y Zhou*, E Xie, C Ma. "Feature Enhancement with Text-specific Region Contrast for Scene Text Detection." PRCV, 2023. (CCF-C, Oral Presentation, Acceptance Rate 2.3%, PDF)
  14. X Yang, Z Qiao, Y Zhou*, W Wang. "IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition." arXiv, 2023. (PDF)


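  1. Y Zhou*, J Lyu, H Shen, W Wang, J Wei, G Zeng, W Zeng, W Wang. "From Detection and Recognition to Understanding: Research Progress in Scene Text and Related Fields" (in Chinese). Invited column, Newsletter of the Technical Committee on Pattern Recognition and Machine Intelligence, Chinese Association of Automation, 2022. (Link)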
  2. B Fang, W Wu, C Liu, Y Zhou*, D He, W Wang. "MaMiCo: Macro-to-micro Semantic Correspondence for Self-supervised Video Representation Learning." ACM MM, 2022. (CCF-A, Oral Presentation, Acceptance Rate 5.0%, PDF)
  3. W Wang, Y Zhou*, J Lv, D Wu, G Zhao, N Jiang, W Wang. "TPSNet: Reverse Thinking of Thin Plate Splines for Arbitrary Shape Scene Text Representation." ACM MM, 2022. (CCF-A, PDF)
  4. J Wei, Y Zhang, Y Zhou*, G Zeng, Z Qiao, Y Guo, H Wu, H Wang, W Wang. "TextBlock: Towards Scene Text Spotting without Fine-grained Detection." ACM MM, 2022. (CCF-A, PDF)
  5. X Chen, Y Zhou, D Wu, W Zhang, Y Zhou, B Li, W Wang. "Imagine by Reasoning: A Reasoning-based Implicit Semantic Data Augmentation for Long-tailed Classification." AAAI, 2022. (CCF-A, PDF)
  6. D Yang, Y Zhou*, A Zhang, X Sun, D Wu, W Wang, Q Ye. "Multi-view Correlation Distillation for Incremental Object Detection." PR, 2022. (SCI Zone 1, CCF-B, PDF)
  7. Y Zhou, X Li, Y Zhou, Y Wang, Q Hu, W Wang. "Deep Collaborative Multi-task Network: A Human Decision Process Inspired Model for Hierarchical Image Classification." PR, 2022. (SCI Zone 1, CCF-B, PDF)
  8. D Yang, Y Zhou*, W Shi, D Wu, W Wang. "RD-IOD: Two-level Residual-distillation-based Triple Network for Incremental Object Detection." TOMM, 2022. (SCI Zone 1, CCF-B, PDF)
  9. D Luo, Y Zhou*, B Fang, Y Zhou, D Wu, W Wang. "Exploring Relations in Untrimmed Videos for Self-supervised Learning." TOMM, 2022. (SCI Zone 1, CCF-B, PDF)
  10. Y Guo, Y Zhou*, X Qin, E Xie, W Wang. "UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection." ICME, 2022. (CCF-B, Oral Presentation, PDF)
  11. C Fang, G Zeng, Y Zhou*, D Wu, C Ma, D Hu, W Wang. "Towards Escaping from Language Bias and OCR Error: Semantics-centered Text Visual Question Answering." ICME, 2022. (CCF-B, PDF)
  12. W Li, D Luo, B Fang, X Li, Y Zhou*, W Wang. "Video Motion Perception for Self-supervised Representation Learning." ICANN, 2022. (CCF-C, PDF)


  1. Z Qiao, Y Zhou*, J Wei, W Wang, Y Zhang, N Jiang, H Wang, W Wang. "PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition." ACM MM, 2021. (CCF-A, Best Paper Candidate [5/1942=2.5‰], PDF)
  2. G Zeng, Y Zhang, Y Zhou*, X Yang. "Beyond OCR + VQA: Involving OCR into the Flow for Robust and Accurate TextVQA." ACM MM, 2021. (CCF-A, Oral Presentation, Acceptance Rate 9.2%, PDF)
  3. X Li, Y Zhou*, Y Zhang, A Zhang, W Wang, N Jiang, H Wu, W Wang. "Dense Semantic Contrast for Self-supervised Visual Representation Learning." ACM MM, 2021. (CCF-A, Oral Presentation, Acceptance Rate 9.2%, PDF)
  4. X Qin, Y Zhou*, Y Guo, D Wu, Z Tian, N Jiang, H Wang, W Wang. "Mask is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-shaped Scene Text Detection." ACM MM, 2021. (CCF-A, PDF)
  5. W Zhang, D Wu, Y Zhou, B Li, W Wang, D Meng. "Binary Neural Network Hashing for Image Retrieval." SIGIR, 2021. (CCF-A, PDF)
  6. X Qin, Y Zhou*, Y Guo, D Wu, W Wang. "FC2RN: A Fully Convolutional Corner Refinement Network for Accurate Multi-oriented Scene Text Detection." ICASSP, 2021. (CCF-B, PDF)
  7. G Zeng, Y Zhang, Y Zhou*, X Yang. "A Cost-efficient Framework for Scene Text Detection in the Wild." PRICAI, 2021. (CCF-C, PDF)
  8. Y Guo, Y Zhou*, X Qin, W Wang. "Which and Where to Focus: A Simple yet Accurate Framework for Arbitrary-shaped Nearby Text Detection in Scene Images." ICANN, 2021. (CCF-C, PDF)
  9. X Li, Y Zhou, Y Zhou, W Wang. "MMF: Multi-task Multi-structure Fusion for Hierarchical Image Classification." ICANN, 2021. (CCF-C, PDF)
  10. H Li, Y Guo, Y Zhou*, W Wang. "Density-Net: A Density-aware Network for 3D Object Detection." ICTAI, 2021. (CCF-C, PDF)


  1. Z Qiao, Y Zhou*, D Yang, Y Zhou, W Wang. "SEED: Semantics Enhanced Encoder-decoder Framework for Scene Text Recognition." CVPR, 2020. (CCF-A, Acceptance Rate 22%, 277 Citations, PDF)
  2. Y Yao, C Liu, D Luo, Y Zhou, Q Ye. "Video Playback Rate Perception for Self-supervised Spatio-temporal Representation Learning." CVPR, 2020. (CCF-A, Acceptance Rate 22%, 198 Citations, PDF)
  3. D Luo, C Liu, Y Zhou*, D Yang, C Ma, Q Ye, W Wang. "Video Cloze Procedure for Self-supervised Spatio-temporal Learning." AAAI, 2020. (CCF-A, Oral Presentation, Acceptance Rate 5.8%, 174 Citations, PDF)
  4. W Zhang, D Wu, Y Zhou, B Li, W Wang, D Meng. "Deep Unsupervised Hybrid-similarity Hadamard Hashing." ACM MM, 2020. (CCF-A, PDF)
  5. S Zhao, D Wu, W Zhang, Y Zhou, B Li, W Wang. "Asymmetric Deep Hashing for Efficient Hash Code Compression." ACM MM, 2020. (CCF-A, PDF)
  6. Y Chen, W Wang, Y Zhou*, F Yang, D Yang, W Wang. "Self-training for Domain Adaptive Scene Text Detection." ICPR, 2020. (CCF-C, Oral Presentation, Acceptance Rate 4.4%, PDF)
  7. Z Qiao, X Qin, Y Zhou*, F Yang, W Wang. "Gaussian Constrained Attention Network for Scene Text Recognition." ICPR, 2020. (CCF-C, PDF)
  8. Y Zhang, C Liu, Y Zhou*, W Wang, W Wang, Q Ye. "Progressive Cluster Purification for Unsupervised Feature Learning." ICPR, 2020. (CCF-C, PDF)
  9. Y Zhou, Y Wang, J Cai, Y Zhou, Q Hu, W Wang. "Expert Training: Task Hardness Aware Meta-learning for Few-shot Classification." arXiv preprint, 2020. (PDF)
2019 and earlier: see DBLP & Google Scholar