NSTL (National Science and Technology Library)

1. Learning-Based Sub-image Retrieval in Historical Document Images

Joseph Assaker | Stephane Nicolas ... - Pattern Recognition, Part XIX - International Conference on Pattern Recognition - 2025 - 137-151 - 15 pages

Abstract: The goal of this paper is to propose an unsupervised learning-based framework in order to deal with any kind of one-shot object detection scenario, focusing on the tasks of sub-image retrieval and pattern spotting in historical document images. Taking in an

Keywords: Sub-Image retrieval | Pattern spotting | Image retrieval | One-Shot object detection | Historical document images

2. Word-Diffusion: Diffusion-Based Handwritten Text Word Image Generation

Aniket Gurav | Narayanan C. Krishna ... - Pattern Recognition, Part XIX - International Conference on Pattern Recognition - 2025 - 53-72 - 20 pages

Abstract: Generating realistic handwritten word images that closely resemble a target style remains a challenging task in document image analysis. In recent years, deep learning techniques, such as Latent Diffusion Models (LDM), have shown promise in generating styled

Keywords: Denoising diffusion probabilistic model | Handwritten text recognition | Synthetic handwritten data

3. Enhancing Authorship Attribution Through Embedding Fusion: A Novel Approach with Masked and Encoder-Decoder Language Models

Arjun Ramesh Kaushik | R. P. Sunil Rufus ... - Pattern Recognition, Part XIX - International Conference on Pattern Recognition - 2025 - 460-471 - 12 pages

Abstract: The increasing prevalence of AI-generated content alongside human-written text underscores the need for reliable discrimination methods. To address this challenge, we propose a novel framework with textual embeddings from Pre-trained Language Models

Keywords: Authorship attribution | Large language models | Generative AI

4. LTNER: Large Language Model Tagging for Named Entity Recognition with Contextualized Entity Marking

Faren Yan | Peng Yu ... - Pattern Recognition, Part XIX - International Conference on Pattern Recognition - 2025 - 399-411 - 13 pages

Abstract: The use of LLMs for natural language processing has become a popular trend in the past two years, driven by their formidable capacity for context comprehension and learning, which has inspired a wave of research from academics and industry professionals

Keywords: Natural language processing | Named entity recognition | Large language models | Prompt engineering

5. Font Style Translation in Scene Text Images with CLIPstyler

Honghui Yuan | Keiji Yanai - Pattern Recognition, Part XIX - International Conference on Pattern Recognition - 2025 - 105-121 - 17 pages

Abstract: Scene text editing is widely used in various fields, such as poster design and correcting spelling mistakes in the image. Editing text in images is a challenging task that requires accurately and naturally integrating text within complex backgrounds. Existing

Keywords: Image style transfer | Font translation | Scene text images | Arbitrary style | CLIPstyler

6. VisEmoComic: Visual Emotion Recognition in Comics Image

Ruddy Theodose | Jean-Christophe Buri... - Pattern Recognition, Part XIX - International Conference on Pattern Recognition - 2025 - 281-296 - 16 pages

Abstract: Emotion recognition in images has been widely studied on captured data of real people, but few works have been realized on drawn data. Within this category, comic books have become an important part of popular culture. Whether realistic drawings or

Keywords: Emotion recognition | Manga | Comics analysis | Document analysis

7. LineTR: Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts

Vaibhav Agrawal | Niharika Vadlamudi ... - Pattern Recognition, Part XIX - International Conference on Pattern Recognition - 2025 - 217-233 - 17 pages

Abstract: The dense and unstructured text in historical manuscripts presents significant challenges for precise line segmentation due to large diversity in sizes, scripts and appearances of the documents. Existing approaches tackle this complexity either by performing dataset-specific

Keywords: Text line segmentation | Historical manuscripts | Deep learning | Zero-Shot | Transformers

8. Facet-Aware Multimodal Summarization via Cross-Modal Alignment

Yu Weng | Xuming Ye ... - Pattern Recognition, Part XIX - International Conference on Pattern Recognition - 2025 - 37-52 - 16 pages

Abstract: Multimodal generative models have demonstrated promising capabilities for bridging the semantic gap between visual and textual modalities, especially in the context of multimodal summarization. Most of the existing methods align the visual and textual information

Keywords: Multimedia analysis | Document understanding | Semantic technology | Summarization

9. Enhancing Bengali Text-to-Speech Synthesis Through Transformer-Driven Text Normalization

Krishnendu Ghosh | Munmun Patra ... - Pattern Recognition, Part XIX - International Conference on Pattern Recognition - 2025 - 429-444 - 16 pages

Abstract: This paper presents a transformer-driven approach for nonstandard word (NSW) normalization in Bengali text-to-speech synthesis (TTS) systems. Our text normalization (TN) approach is realized over three modules: pre-processing, NSW classification, and token-to

Keywords: Text-to-speech synthesis | Bengali | Text normalization | Transformer | Non-standard words

10. Visual Question Answering with Cascade of Self- and Co-Attention Blocks

Aakansha Mishra | Ashish Anand ... - Pattern Recognition, Part XIX - International Conference on Pattern Recognition - 2025 - 20-36 - 17 pages

Abstract: Recent advancements in Visual Question Answering (VQA) have been driven by the integration of complex attention mechanisms. This work introduces a novel approach aimed at enhancing multi-modal representations through dense interactions between visual and

Keywords: VQA | Attention | Self-Attention | Co-attention | Multi-modal fusion | Classification networks
