1. Diffusion-based diverse audio captioning with retrieval-guided Langevin dynamics - NSTL (National Science and Technology Library)

Zhu, Yonggang | Men, Aidong... - "Information Fusion" - 2025, 114 - 13 pages

Abstract: Audio captioning, a comprehensive task of | audio captioning systems fail to produce captions with | been explored in audio captioning. On the other hand | audio captioning tasks may aggravate this problem due | diffusion-based diverse audio captioning model by
Keywords: Diverse audio captioning | Diffusion models | Langevin dynamics

2. InVideo Search: Scene Description Clustering and Integrating Image and Audio Captioning for Enhanced Video Search

Almira Asif Khan | Muhammed... - "Distributed Computing and Internet Technology" - International Conference on Distributed Computing and Internet Technology - 2025 - pp. 195-208 - 14 pages

Abstract: in video analysis, image/video captioning, and | modules: Keyframe extraction, Captioning, and Query | focuses on a multimodal captioning strategy to integrate | In the digital era, the vast amount of long | videos highlights the need for an advanced system to
Keywords: Clustering | Keyframe extraction | Natural language processing | Video content retrieval system

3. Flying Together with Audio and Video: Enhancing Communication for the Hearing-Impaired Through an Emerging Closed Captioning Standard

Luntian Mou | Peize Li... - "Social Robotics" - International Conference on Social Robotics - 2025 - pp. 282-292 - 11 pages

Abstract: program's audio elements, Closed Captioning primarily | hearing impaired. Since text is much simpler than audio | and video, Closed Captioning is traditionally | elementary stream, usually accompanied by one or more audio | elementary streams. Since Closed Captioning is extremely
Keywords: Closed captioning | Caption elementary stream | Hearing impaired

4. MCANet: Multimodal Caption Aware Training-Free Video Anomaly Detection via Large Language Model

Prabhu Prasad Dev | Raju Hazari... - "Pattern Recognition, Part XXXII" - International Conference on Pattern Recognition - 2025 - pp. 362-379 - 18 pages

Abstract: captions produced by the audio captioning model. The | -the-shelf vision-language model (VLM), audio | generated by the image captioning model, while the second | module applies audio-text similarities to refine noisy | Towards Video Anomaly Detection (VAD
Keywords: Video anomaly detection | Large language model | Vision language model | Audio language model | Multimodal captions

5. VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Jing Liu | Sihan Chen... - "IEEE Transactions on Pattern Analysis and Machine Intelligence" - 2025, 47(2) - pp. 708-724 - 17 pages

Abstract: In this paper, we propose the Vision-Audio | jointly models the relationships among vision, audio | Multimodal Grouping Captioning (MGC). MGA projects vision | , language, and audio into the same common space | , simultaneously building vision-language, audio-language, and
Keywords: Videos | Correlation | Benchmark testing | Data models | Visualization | Question answering (information retrieval) | Law enforcement | Web sites | Video on demand | Training

6. Audio-Guided Visual Knowledge Representation

Fei Yu | Zhiguo Wan... - "Database Systems for Advanced Applications" - International Conference on Database Systems for Advanced Applications | International Workshop on Big Data Management and Service | International Workshop on Graph Data Management and Analysis | International Workshop on Big Data Quality Management | Workshop on Emerging Results in Data Science and Engineering - 2025 - pp. 129-146 - 18 pages

Abstract: this limitation, this paper proposes an audio-guided | integrating auditory cues into visual captioning, the model | on multimodal perception data is extensive, audio | through multimodal-guided visual captioning and link | Visual knowledge is primarily acquired through
Keywords: Visual knowledge | Interpretable expression | Multisensory collaboration | Audio guidance

7. Towards a Multimodal Framework for Remote Sensing Image Change Retrieval and Captioning

Roger Ferrod | Luigi Di Caro... - "Discovery Science, Part II" - International Conference on Discovery Science - 2025 - pp. 231-245 - 15 pages

Abstract: other modalities, such as images, audio and video, to | on specific tasks like classification, captioning | for both captioning and text-image retrieval. By | jointly training a contrastive encoder and captioning | , while maintaining captioning performances that are
Keywords: Remote sensing | Bi-temporal change detection | Image captioning | Text-image retrieval | Contrastive learning

8. Diffusion-Based Multimodal Video Captioning

Jaakko Kainulainen | Zixin Guo... - "Computer Vision - ACCV 2024, Part III" - Asian Conference on Computer Vision - 2025 - pp. 148-165 - 18 pages

Abstract: audio synthesis. However, their applicability to video | captioning has not yet received widespread attention | captioning and experiments with various modality fusion | diffusion models in multimodal video captioning and in the | diffusion-based models in video captioning, paving the way
Keywords: Video captioning | Multimodal captioning | Diffusion models | Deep learning

9. An efficient deep learning-based video captioning framework using multi-modal features

Soumya Varma | Dinesh Peter James - "Expert Systems" - 2025, 42(2) - pp. e12920.1-e12920.16 - 16 pages - Cited by: 2

Abstract: captioning, video summarizing, subtitling, blind navigation | into the various methods for video captioning using | practical, efficient video captioning architecture using | deep learning which that will utilize the audio clues | captioning process. Quantum deep learning architectures can
Keywords: attention context | encoder-decoder framework | language model | quantum machine learning | video captioning

10. AD2AT: Audio Description to Alternative Text, a Dataset of Alternative Text from Movies

Elise Lincker | Camille Guinaudeau... - "MultiMedia Modeling, Part I" - International Conference on MultiMedia Modeling - 2025 - pp. 58-71 - 14 pages

Abstract: captioning, this work often falls short in assessing visual | audio descriptions in movies. Our dataset, comprising | captioning and text generation models in producing | Alternative text (alt text) is often mistaken | for image captions. However, alt text is intended to
Keywords: Alt text | Alternative text | Audio description | Image-to-text generation | Visual accessibility
Search query: Audio captioning