[1] Bai J, Liu H, Wang M, et al. AudioSetCaps: An Enriched Audio-Caption Dataset Using Automated Generation Pipeline With Large Audio and Language Models[J]. IEEE Transactions on Audio, Speech and Language Processing, 2025, 33: 2817-2829.
[2] Bai J, Chen J, Wang M, et al. A squeeze-and-excitation and transformer-based cross-task model for environmental sound recognition[J]. IEEE Transactions on Cognitive and Developmental Systems, 2022, 15(3): 1501-1513.
[3] Bai J, Chen J, Wang M. Multimodal urban sound tagging with spatiotemporal context[J]. IEEE Transactions on Cognitive and Developmental Systems, 2022, 15(2): 555-565.
[4] Bai J, Chen J, Wang M, et al. SSDPT: Self-supervised dual-path transformer for anomalous sound detection[J]. Digital Signal Processing, 2023, 135: 103939.
[5] Bai J, Huang S, Yin H, et al. 3D audio signal processing systems for speech enhancement and sound localization and detection[C]//ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023: 1-2.
[6] Bai J, Yin H, Wang M, et al. AudioLog: LLMs-powered long audio logging with hybrid token-semantic contrastive learning[C]//2024 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2024: 1-6.