publications

publications by categories in reversed chronological order. generated by jekyll-scholar.

Google Scholar

2026

  1. M2S-AVSR: Modality-aware Multi-View Self-supervised Representations for Audio-Visual Speech Recognition
    Fei Su, Cancan Li, Juan Liu, and Ming Li
    submitted to IEEE Transactions on Audio, Speech, and Language Processing, 2026
  2. Multimodal Large Language Models for ADOS-M1 Behavioral Assessment
    Wenxing Liu, Yueran Pan, Dong Zhang, Hongzhu Deng, Xiaobing Zou, and Ming Li
    submitted to Neurocomputing, 2026
  3. VLM-Guided Semantic Augmentation and Uncertainty-Aware Tri-modal Fusion for Group Emotion Recognition
    Hongxi Yi, Yufei Xie, Dong Zhang, Wenxing Liu, Ming Li, and Dah-Jye Lee
    submitted to IEEE Transactions on Multimedia, 2026
  4. Emotional Description-Guided Vision-Language Semantic Alignment for Group Emotion Recognition
    Hongxi Yi, Dong Zhang, Yufei Xie, Wenxing Liu, Ming Li, and Dah-Jye Lee
    submitted to IEEE Transactions on Affective Computing, 2026
  5. Toward Multimodal Fault Analysis: A Single-Speed Chain Conveyor Dataset with Audio and Vibration Signals
    Zhang Chen, Yucong Zhang, Xiaoxiao Miao, and Ming Li
    2026
    submitted to Interspeech 2026
  6. Audio-Visual Speech Enhancement in Complex Scenarios with Separation and Dereverberation Joint Modeling
    Jiarong Du, Zhan Jin, Peijun Yang, Juan Liu, Zhuo Li, Xin Liu, and Ming Li
    2026
    submitted to Interspeech 2026
  7. Robust Audio-Visual Target Speaker Extraction with Multiple Enrollment Fusion
    Zhan Jin, Bang Zeng, Peijun Yang, Jiarong Du, Wei Ju, Yao Tian, Juan Liu, and Ming Li
    2026
    submitted to Interspeech 2026
  8. DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models
    Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, and Ming Li
    2026
    submitted to ACM Multimedia 2026
  9. Language-Invariant Multilingual Speaker Verification for the TidyVoice 2026 Challenge
    Ze Li, Xiaoxiao Miao, Juan Liu, and Ming Li
    2026
    submitted to Interspeech 2026
  10. Spatially-Augmented Sequence-to-Sequence Neural Diarization for Meetings
    Li Li, Ming Cheng, Juan Liu, and Ming Li
    2026
    submitted to Interspeech 2026
  11. WhisperVC: Decoupled Cross-Domain Alignment and Speech Generation for Low-Resource Whisper-to-Speech Conversion
    Dong Liu, Juan Liu, Wei Ju, Yao Tian, and Ming Li
    2026
    submitted to Interspeech 2026
  12. Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement
    Fei Su, Cancan Li, Juan Liu, Wei Ju, Hongbin Suo, and Ming Li
    2026
    submitted to Interspeech 2026
  13. Making Separation-First Multi-Stream Audio Watermarking Feasible via Joint Training
    Houmin Sun, Zi Hu, Linxi Li, Yechen Wang, Weijin Li, and Ming Li
    2026
    submitted to Interspeech 2026
  14. A Dual-Path Efficient EEG Encoder for Brain-Assisted Target Speaker Extraction
    Wang Xiang, Xue Zhang, Bang Zeng, Cunhang Fan, Juan Liu, and Ming Li
    2026
    submitted to Interspeech 2026
  15. AISHELL8-FISHEYE: A Fisheye Audio-Visual Dataset for Target Speaker Extraction with Distortion-Aware Baselines
    Peijun Yang, Zhan Jin, Juan Liu, Hui Bu, and Ming Li
    2026
    submitted to ACM Multimedia 2026
  16. Multi-View Based Audio Visual Target Speaker Extraction
    Peijun Yang, Zhan Jin, Juan Liu, and Ming Li
    2026
    submitted to Interspeech 2026
  17. MultiAPI Spoof: A Multi-API Dataset and Local-Attention Network for Speech Anti-spoofing Detection
    Xueping Zhang, Zhenshan Zhang, Yechen Wang, Linxi Li, Liwei Jin, and Ming Li
    2026
    submitted to Interspeech 2026
  18. Dual-Encoder Fusion with Explicit and Implicit Injection for the Interspeech 2026 Audio Encoder Capability Challenge
    Yucong Zhang, Zhang Chen, Juan Liu, Wei Ju, Hongbin Suo, and Ming Li
    2026
    submitted to Interspeech 2026
  19. Detecting Children with Autism Spectrum Disorder based on Script-Centric Behavior Understanding with Emotional Enhancement
    Wenxing Liu, Yueran Pan, Dong Zhang, Hongzhu Deng, Xiaobing Zou, and Ming Li
    Neurocomputing, 2026
  20. Glitter: Exploring an LLM Virtual Agent for Supporting Practitioners in Behavioral Interventions of Autistic Children
    Xin Tong, Liwen He, Zhaowen Deng, Weibo Li, Ziheng Tang, Yixuan Li, Yutong Ren, Matthew Louis Mauriello, and Ming Li
    International Journal of Human–Computer Interaction, 2026
  21. Enhancing Speaker Verification with W2v-Bert 2.0 and Knowledge Distillation Guided Pruning
    Ze Li, Ming Cheng, and Ming Li
    In ICASSP, 2026
  22. AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Speech Dataset with Speech Recognition Baselines
    Cancan Li, Fei Su, Juan Liu, Hui Bu, Yulong Wan, Hongbin Suo, and Ming Li
    In ICASSP, 2026
  23. Compspoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-Spoofing Countermeasure
    Xueping Zhang, Liwei Jin, Yechen Wang, Linxi Li, and Ming Li
    In ICASSP, 2026
  24. The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures
    Zhenshan Zhang, Xueping Zhang, Yechen Wang, Liwei Jin, and Ming Li
    In ICASSP, 2026
  25. ECHO: Frequency-Aware Hierarchical Encoding for Variable-Length Signals
    Yucong Zhang, Juan Liu, and Ming Li
    In ICASSP, 2026

2025

  1. Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning
    Danwei Cai, Zexin Cai, Ze Li, and Ming Li
    IEEE Transactions on Audio, Speech, and Language Processing, 2025
  2. Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization
    Ming Cheng, and Ming Li
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
  3. Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation
    Ming Cheng, Yuke Lin, and Ming Li
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
  4. Assessing the Expressive Language Levels of Autistic Children in Home Intervention
    Yueran Pan, Biyuan Chen, Wenxing Liu, Ming Cheng, Dong Zhang, Hongzhu Deng, Xiaobing Zou, and Ming Li
    IEEE Transactions on Computational Social Systems, 2025
  5. Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection
    Bang Zeng, and Ming Li
    Computer Speech and Language, 2025
  6. USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
    Bang Zeng, and Ming Li
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
  7. Multi-scale Scanning Network for Machine Anomalous Sound Detection
    Yucong Zhang, Juan Liu, and Ming Li
    ICONIP, 2025
  8. An Automatic Laryngoscopic Image Segmentation System Based on SAM Prompt Engineering: From Glottis Annotation to Vocal Fold Segmentation
    Yucong Zhang, Yuchen Song, Juan Liu, and Ming Li
    Frontiers in Molecular Biosciences, 2025
  9. Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Fold Paralysis
    Yucong Zhang, Xin Zou, Jinshan Yang, Wenjun Chen, Juan Liu, Faya Liang, and Ming Li
    Computer Speech and Language, 2025
  10. Improving Anomalous Sound Detection with Top-M Pseudo-Labeling
    Zhang Chen, Yucong Zhang, and Ming Li
    In NCMMSC, 2025
  11. Multi-Channel Sequence-to-Sequence Neural Diarization: Experimental Results for The MISP 2025 Challenge
    Ming Cheng, Fei Su, Cancan Li, Juan Liu, and Ming Li
    In Interspeech, 2025
  12. Improving the Robustness of Audio-Visual Target Speaker Extraction With AV-HuBERT Based Lip Features
    Jiarong Du, Zhan Jin, Bang Zeng, Peijun Yang, Ming Li, and Juan Liu
    In NCMMSC, 2025
  13. Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System
    Ze Li, Yao Shi, Yunfei Xu, and Ming Li
    In ICME, 2025
  14. Exploring Pre-trained models on Ultrasound Modeling for Mice Autism Detection with Uniform Filter Bank and Attentive Scoring
    Yuchen Song, Yucong Zhang, and Ming Li
    In Interspeech, 2025
  15. SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization
    Beilong Tang, Xiaoxiao Miao, Xin Wang, and Ming Li
    In ASRU, 2025
  16. LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models
    Beilong Tang, Bang Zeng, and Ming Li
    In ASRU, 2025
  17. TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
    Beilong Tang, Bang Zeng, and Ming Li
    In NCMMSC, 2025
  18. Enhancing the Robustness of Speech Anti-spoofing Countermeasures through Joint Optimization and Transfer Learning
    Yikang WANG, Xingming WANG, Chee Siang LEOW, Qishan ZHANG, Ming LI, and Hiromitsu NISHIZAKI
    IEICE Transactions on Information and Systems, 2025
  19. VCapAV: A Video-Caption Based Audio-Visual Deepfake Detection Dataset
    Yuxi Wang, Yikang Wang, Qishan Zhang, Hiromitsu Nishizaki, and Ming Li
    In Interspeech, 2025
  20. SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis
    Zhuojun Wu, Dong Liu, Juan Liu, Yechen Wang, Linxi Li, Liwei Jin, Hui Bu, Pengyuan Zhang, and Ming Li
    In ACM Multimedia, 2025
  21. Efficient Video to Audio Mapper with Visual Scene Detection
    Mingjing Yi, Yuxi Wang, and Ming Li
    In APSIPA ASC, 2025
  22. Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings
    Hongyu Zhang, Ming Cheng, Jing Feng, and Ming Li
    In Interspeech, 2025

2024

  1. StarRescue: the Design and Evaluation of A Turn-Taking Collaborative Game for Facilitating Social and Fine Motor Skills of Children with Autism Spectrum Disorder
    Rongqi Bei, Yajie Liu, Yihe Wang, Yuxuan Huang, Ming Li, Yuhang Zhao, and Xin Tong
    CHI, 2024
  2. Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation
    Danwei Cai, and Ming Li
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
  3. Location-guided Head Pose Estimation for Fisheye Image
    Bing Li, Dong Zhang, Cheng Huang, Yun Xian, Ming Li, and Dah-Jye Lee
    IEEE Transactions on Cognitive and Developmental Systems, 2024
  4. Two-stage and Self-supervised Voice Conversion for Zero-Shot Dysarthric Speech Reconstruction
    Dong Liu, Yueqian Lin, Hui Bu, and Ming Li
    IALP, 2024
  5. Speaker verification in deliberately disguised scenarios
    Xiaoyi Qin, Ze Li, Dong Liu, and Ming Li
    Computer Engineering and Applications, 2024
  7. Investigating Long-Term and Short-Term Time-Varying Speaker Verification
    Xiaoyi Qin, Na Li, Shufei Duan, and Ming Li
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
  8. Online Neural Speaker Diarization with Target Speaker Tracking
    Weiqing Wang, and Ming Li
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
  9. HSVRS: A Virtual Reality System of the Hide-and-Seek Game to Enhance Gaze Fixation Ability for Autistic Children
    Chengyan Yu, Shihuan Wang, Dong Zhang, Yingying Zhang, Chaoqun Cen, Zhixiang You, Xiaobing Zou, Hongzhu Deng, and Ming Li
    IEEE Transactions on Learning Technologies, 2024
  10. Joint Training on Multiple Datasets with Inconsistent Labeling Criteria for Facial Expression Recognition
    Chengyan Yu, Dong Zhang, Wei Zou, and Ming Li
    IEEE Transactions on Affective Computing, 2024
  11. Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios
    Bang Zeng, Hongbin Suo, Yulong Wan, and Ming Li
    J. Shanghai Jiaotong Univ. (Sci.), 2024
  12. Invertible Voice Conversion with Parallel Data
    Zexin Cai, and Ming Li
    In ICASSP, 2024
  13. Slidespeech: A Large Scale Slide-Enriched Audio-Visual Corpus
    Haoxu Wang, Fan Yu, Xian Shi, Yuezhang Wang, Shiliang Zhang, and Ming Li
    In ICASSP, 2024
  14. The Database and Benchmark for the Source Speaker Tracing Challenge 2024
    Ze Li, Yuke Lin, Yao Tian, Hongbin Suo, Pengyuan Zhang, Yanzhen Ren, Zexin Cai, Hiromitsu Nishizaki, and Ming Li
    In SLT, 2024
  15. Multi-Objective Progressive Clustering for Semi-Supervised Domain Adaptation in Speaker Verification
    Ze Li, Yuke Lin, Ning Jiang, Xiaoyi Qin, Guoqing Zhao, Haiying Wu, and Ming Li
    In ICASSP, 2024
  16. Vivid Background Audio Generation based on Large Language Models and AudioLDM
    Yiwei Liang, and Ming Li
    In ISCSLP, 2024
  17. VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark
    Yuke Lin, Ming Cheng, Fulin Zhang, Yingying Gao, Shilei Zhang, and Ming Li
    In Interspeech, 2024
  18. Bridging Facial Imagery and Vocal Reality: Stable Diffusion-Enhanced Voice Generation
    Yueqian Lin, Dong Liu, Yunfei Xu, Hongbin Suo, and Ming Li
    In ISCSLP, 2024
  19. Voxblink: A Large Scale Speaker Verification Dataset on Camera
    Yuke Lin, Xiaoyi Qin, Guoqing Zhao, Ming Cheng, Ning Jiang, Haiying Wu, and Ming Li
    In ICASSP, 2024
  20. TMCSpeech: A Chinese TV and Movie Speech Dataset with Character Descriptions and a Character-Based Voice Generation Model
    Dong Liu, Yueqian Lin, Yunfei Xu, and Ming Li
    In ICPR, 2024
  21. The WHU Wake Word Lipreading System for the 2024 Chat-scenario Chinese Lipreading Challenge
    Haoxu Wang, Cancan Li, Fei Su, Juan Liu, Hongbin Suo, and Ming Li
    In ICME challenge paper, 2024
  22. Joint Inference of Speaker Diarization and ASR with Multi-Stage Information Sharing
    Weiqing Wang, Danwei Cai, Ming Cheng, and Ming Li
    In ICASSP, 2024
  23. Robust Wake Word Spotting with Frame-Level Cross-Modal Attention based Audio-Visual Conformer
    Haoxu Wang, Ming Cheng, Qiang Fu, and Ming Li
    In ICASSP, 2024
  24. Lightweight Language Model for Speech Synthesis: Attempts and Analysis
    Zhuojun Wu, Dong Liu, and Ming Li
    In ISCSLP, 2024
  25. Efficient Personal Voice Activity Detection with Wake Word Reference Speech
    Bang Zeng, Ming Cheng, Yao Tian, Haifeng Liu, and Ming Li
    In ICASSP, 2024
  26. A Dual-Path Framework with Frequency-and-Time Excited Network for Machine Anomalous Sound Detection
    Yucong Zhang, Juan Liu, Yao Tian, Haifeng Liu, and Ming Li
    In ICASSP, 2024
  27. KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario
    Huali Zhou, Yuke Lin, Dong Liu, and Ming Li
    In ICPR, 2024

2023

  1. Integrating Frame-Level Boundary Detection and Deepfake Detection for Locating Manipulated Regions in Partially Spoofed Audio Forgery Attacks
    Zexin Cai, and Ming Li
    Computer Speech and Language, 2023
  2. Computer-aided Autism Spectrum Disorder Diagnosis with Behavior Signal Processing
    Ming Cheng, Yingying Zhang, Yixiang Xie, Yueran Pan, Xiao Li, Wenxing Liu, Chengyan Yu, Dong Zhang, Yu Xing, Xiaoqian Huang, Fang Wang, Cong You, Yuanyuan Zou, Yuchong Liu, Fengjing Liang, Huilin Zhu, Chun Tang, Hongzhu Deng, Xiaobing Zou, and Ming Li
    IEEE Transactions on Affective Computing, 2023
  3. Expressive language profiles in a clinically screening sample of Mandarin-speaking preschool children with Autism Spectrum Disorder
    Li Li, Yi (Esther) Su, Wenwen Hou, Muyu Zhou, Yixiang Xie, Xiaobing Zou, and Ming Li
    Journal of Speech, Language, and Hearing Research, 2023
  4. Assessing the Social Skills of Children with Autism Spectrum Disorder via Language-Image Pre-training Models
    Wenxing Liu, Ming Cheng, Yueran Pan, Lynn Yuan, Suxiu Hu, Ming Li, and Songtian Zeng
    The 6th Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2023
  5. From Speaker Verification to Deepfake Algorithm Recognition: Our Learned Lessons from ADD2023 Track 3
    Xiaoyi Qin, Xingming Wang, Yanli Chen, Qinglin Meng, and Ming Li
    IJCAI 2023 Workshop on Deepfake Audio Detection and Analysis (DADA 2023), 2023
  6. A multimodal machine learning system in early screening for toddlers with autism spectrum disorders based on the response to name
    Feng-lei Zhu, Shi-huan Wang, Wen-bo Liu, Hui-lin Zhu, Ming Li, and Xiao-bing Zou
    Frontiers in Psychiatry, 2023
  7. Pretraining Conformer with ASR for Speaker Verification
    Danwei Cai, Weiqing Wang, Ming Li, Rui Xia, and Chuanzeng Huang
    In ICASSP, 2023
  8. Identifying Source Speakers For Voice Conversion Based Spoofing Attacks For Speaker Verification
    Danwei Cai, Zexin Cai, and Ming Li
    In ICASSP, 2023
  9. Waveform Boundary Detection For Partially Spoofed Audio
    Zexin Cai, Weiqing Wang, and Ming Li
    In ICASSP, 2023
  10. Target-Speaker Voice Activity Detection Via Sequence-To-Sequence Prediction
    Ming Cheng, Weiqing Wang, Yucong Zhang, Xiaoyi Qin, and Ming Li
    In ICASSP, 2023
  11. The DKU-MSXF Diarization System for the VoxCeleb Speaker Recognition Challenge 2023
    Ming Cheng, Weiqing Wang, Xiaoyi Qin, Yuke Lin, Ning Jiang, Guoqing Zhao, and Ming Li
    In NCMMSC, 2023
  12. Real-time Automotive Engine Sound Simulation with Deep Neural Network
    Hao Li, Weiqing Wang, and Ming Li
    In NCMMSC, 2023
  13. EEG-Based Speech Envelope Decoding: Structured State Space and Diffusion Model Integration
    Yueqian Lin, and Ming Li
    In NCMMSC, 2023
  14. Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification
    Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, and Ming Li
    In ASRU, 2023
  15. Exploring Universal Singing Speech Language Identification Using Self-Supervised Learning Based Front-End Features
    Xingming Wang, Hao Wu, Chen Ding, Chuanzeng Huang, and Ming Li
    In ICASSP, 2023
  16. Robust audio anti-spoofing countermeasure with joint training of front-end and back-end models
    Xingming Wang, Bang Zeng, Hongbin Suo, Yulong Wan, and Ming Li
    In Interspeech, 2023
  17. Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios
    Bang Zeng, Hongbin Suo, Yulong Wan, and Ming Li
    In NCMMSC, 2023
  18. SEF-Net: Speaker Embedding Free Target Speaker Extraction Network
    Bang Zeng, Hongbin Suo, Yulong Wan, and Ming Li
    In Interspeech, 2023
  19. Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues
    Bang Zeng, Hongbin Suo, Yulong Wan, and Ming Li
    In APSIPA ASC, 2023
  20. Pre-training Deep Learning Models with Finite Element Simulation Data for Enhanced Machine Anomalous Sound Detection
    Zhixian Zhang, Yucong Zhang, and Ming Li
    In NCMMSC, 2023
  21. Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning
    Yucong Zhang, Hongbin Suo, Yulong Wan, and Ming Li
    In Interspeech, 2023
  22. BiSinger: Bilingual Singing Voice Synthesis
    Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, and Ming Li
    In ASRU, 2023

2022

  1. Cross-lingual Multispeaker Speech Synthesis under Limited-Data Scenarios
    Zexin Cai, Yaogen Yang, and Ming Li
    Computer Speech and Language, 2022
  2. Incorporating visual information in audio based self-supervised speaker recognition
    Danwei Cai, Weiqing Wang, and Ming Li
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022
  3. Accurate Head Pose Estimation Using Image Rectification and Lightweight Convolutional Neural Network
    Xiao Li, Dong Zhang, Ming Li, and Dah-Jye Lee
    IEEE Transactions on Multimedia, 2022
  4. Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios
    Xiaoyi Qin, Danwei Cai, and Ming Li
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022
  5. Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization
    Weiqing Wang, Qingjian Lin, Danwei Cai, and Ming Li
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022
  6. Paralinguistic singing attribute recognition using supervised machine learning for describing the singing voice in vocal pedagogy
    Yanze Xu, Ming Li, Huahua Cui, and Mingyang Xu
    EURASIP Journal on Audio, Speech, and Music Processing, 2022
  7. Electrolaryngeal Speech Enhancement based on Bottleneck Feature Refinement and Voice Conversion
    Yaogen Yang, Haozhe Zhang, Zexin Cai, Yao Shi, Ming Li, Dong Zhang, Xiaojun Ding, Jianhua Deng, and Jie Wang
    Biomedical Signal Processing and Control, 2022
  8. A Complementary Dual-branch Network for Appearance-based Gaze Estimation from Low-resolution Facial Image
    Zhesi Zhu, Dong Zhang, Cailong Chi, Ming Li, and Dah-Jye Lee
    IEEE Transactions on Cognitive and Developmental Systems, 2022
  9. The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP Challenge 2022
    Ming Cheng, Haoxu Wang, Ziteng Wang, Qiang Fu, and Ming Li
    In ICASSP 2023, 2022
  10. Single-Channel Target Speaker Separation using Joint Training with Target Speaker’s Pitch Information
    Jincheng He, Yuanyuan Bao, Na Xu, Hongfeng Li, Shicong Li, Linzhang Wang, Fei Xiang, and Ming Li
    In Odyssey, 2022
  11. Towards Lightweight applications: Asymmetric Enroll-Verify Structure For Speaker Verification
    Qingjian Lin, Lin Yang, Xuyang Wang, Xiaoyi Qin, Junjie Wang, and Ming Li
    In ICASSP, 2022
  12. A Multimodal Framework for Automated Teaching Quality Assessment of One-to-many Online Instruction Videos
    Yueran Pan, Jiaxin Wu, Ran Ju, Ziang Zhou, Jiayue Gu, Songtian Zeng, Lynn Yuan, and Ming Li
    In ICPR, 2022
  13. Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection
    Xiaoyi Qin, Na Li, Chao Weng, Dan Su, and Ming Li
    In ICASSP, 2022
  14. Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings
    Xiaoyi Qin, Na Li, Chao Weng, Dan Su, and Ming Li
    In Interspeech, 2022
  15. VC-AUG: Voice Conversion based Data Augmentation for Text-Dependent Speaker Verification
    Xiaoyi Qin, Yaogen Yang, Yao Shi, Lin Yang, Xuyang Wang, Junjie Wang, and Ming Li
    In NCMMSC, 2022
  16. Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MET Challenge
    Weiqing Wang, Xiaoyi Qin, and Ming Li
    In ICASSP, 2022
  17. Generating Adversarial Samples For Training Wake-Up Word Detection Systems Against Confusing Words
    Haoxu Wang, Yan Jia, Zeqing Zhao, Xuyang Wang, Junjie Wang, and Ming Li
    In Odyssey, 2022
  18. The DKU-OPPO System for the Spoofing-Aware Speaker Verification challenge 2022
    Xingming Wang, Xiaoyi Qin, Yikang Wang, Yunfei Xu, and Ming Li
    In Interspeech, 2022
  19. Low Pass Filtering and Band Extension for Robust Anti-spoofing Countermeasure against Codec Variabilities
    Yikang Wang, Xingming Wang, Hiromitsu Nishizaki, and Ming Li
    In ISCSLP, 2022
  20. Online Target Speaker Voice Activity Detection for Speaker Diarization
    Weiqing Wang, Ming Li, and Qingjian Lin
    In Interspeech, 2022
  21. Incorporating End-To-End Framework Into Target-Speaker Voice Activity Detection
    Weiqing Wang, and Ming Li
    In ICASSP, 2022
  22. Low-Latency Online Speaker Diarization with Graph-Based Label Generation
    Yucong Zhang, Qingjian Lin, Weiqing Wang, Lin Yang, Xuyang Wang, Junjie Wang, and Ming Li
    In Odyssey, 2022
  23. SIG-VC: A Speaker Information Guided Zero-Shot Voice Conversion System For Both Human Beings And Machines
    Haozhe Zhang, Zexin Cai, Xiaoyi Qin, and Ming Li
    In ICASSP, 2022
  24. Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Linguistic Information Fusion
    Ziang Zhou, Yanze Xu, and Ming Li
    In NCMMSC, 2022
  25. Source Tracing: Detecting Voice Spoofing
    Tinglong Zhu, Xingming Wang, Xiaoyi Qin, and Ming Li
    In APSIPA ASC, 2022

2021

  1. Discriminative Dictionary Learning for Autism Spectrum Disorder Identification
    Wenbo Liu, Ming Li, Xiaobing Zou, and Bhiksha Raj
    Frontiers in Computational Neuroscience, 2021
  2. Typical Facial Expression Network Using Facial Feature Decoupler and Spatial-Temporal Learning
    Jianing Teng, Dong Zhang, Ming Li, and Dah-Jye Lee
    IEEE Transactions on Affective Computing, 2021
  3. Audio-based Piano Performance Evaluation for Beginners with Convolutional Neural Network and Attention Mechanism
    Weiqing Wang, Jin Pan, Hua Yi, Zhanmei Song, and Ming Li
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29 (2021): 1119-1133, 2021
  4. Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication
    Yuanyuan Bao, Yanze Xu, Na Xu, Wenjing Yang, Hongfeng Li, Shicong Li, Yongtao Jia, Fei Xiang, Jincheng He, and Ming Li
    In NCMMSC, 2021
  5. Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays
    Danwei Cai, and Ming Li
    In SLT, 2021
  6. A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data
    Weicheng Cai, and Ming Li
    In APSIPA ASC, 2021
  7. The DKU-DukeECE System for the Self-Supervision Speaker Verification Task of the 2021 VoxCeleb Speaker Recognition Challenge
    Danwei Cai, and Ming Li
    In VoxSRC, 2021
  8. An Iterative Framework For Self-Supervised Deep Speaker Representation Learning
    Danwei Cai, Weiqing Wang, and Ming Li
    In ICASSP, 2021
  9. Cross-modal Assisted Training for Abnormal Event Recognition in Elevators
    Xinmeng Chen, Xuchen Gong, Ming Cheng, Qi Deng, and Ming Li
    In ICMI, 2021
  10. The DKU Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge
    Ming Cheng, Haoxu Wang, Yechen Wang, and Ming Li
    In ICASSP 2023, 2021
  11. The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis
    Haoxu Wang, Ming Cheng, Qiang Fu, and Ming Li
    In ICASSP 2023, 2021
  12. Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling
    Haiwei Wu, and Ming Li
    In NCMMSC, 2021
  13. Sams-Net: A Sliced Attention-based Neural Network for Music Source Separation
    Tingle Li, Jiawei Chen, Haowen Hou, and Ming Li
    In ISCSLP, 2021
  14. Acoustic Word Embedding on Code-switching Query by Example Spoken Term Detection
    Murong Ma, Haiwei Wu, Xuyang Wang, Lin Yang, Junjie Wang, and Ming Li
    In ISCSLP, 2021
  15. AISHELL-3: A Multi-Speaker Mandarin TTS Corpus
    Yao Shi, Hui Bu, Xin Xu, Shaoji Zhang, and Ming Li
    In INTERSPEECH, 2021
  16. End-to-End Mandarin Tone Classification with Short Term Context Information
    Jiyang Tang, and Ming Li
    In APSIPA ASC, 2021
  17. The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge
    Weiqing Wang, Danwei Cai, Qingjian Lin, Lin Yang, Junjie Wang, Jin Wang, and Ming Li
    In VoxSRC, 2021
  18. The DKU-Duke-Lenovo System Description for the Fearless Steps Challenge Phase III
    Weiqing Wang, Danwei Cai, Jin Wang, Mi Hong, Xuyang Wang, Qingjian Lin, and Ming Li
    In INTERSPEECH, 2021
  19. A Two-Stage Query-by-example Spoken Term Detection System for Personalized Keyword Spotting
    Yechen Wang, Yan Jia, Murong Ma, Zexin Cai, and Ming Li
    In NCMMSC, 2021
  20. Binary Neural Network for Speaker Verification
    Tinglong Zhu, Xiaoyi Qin, and Ming Li
    In INTERSPEECH, 2021

2020

  1. On the fly Data Loader and Utterance-level Aggregation for Speaker and Language Recognition
    Weicheng Cai, Jinkun Chen, Jun Zhang, and Ming Li
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28 (2020): 1038-1051, 2020
  2. STCAM: Spatial-Temporal and Channel Attention Module for Dynamic Facial Expression Recognition
    Weicong Chen, Dong Zhang, Ming Li, and Dah-Jye Lee
    IEEE Transactions on Affective Computing, 2020
  3. From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer by Feedback Constraint
    Zexin Cai, Chuxiong Zhang, and Ming Li
    In INTERSPEECH, 2020
  4. Within-sample variability-invariant loss for robust speaker recognition under noisy environments
    Danwei Cai, Weicheng Cai, and Ming Li
    In ICASSP, 2020
  5. The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results
    Yan Jia, Xingming Wang, Xiaoyi Qin, Yinping Zhang, Xuyang Wang, Junjie Wang, Dong Zhang, and Ming Li
    In INTERSPEECH, 2020
  6. Atss-Net: Target Speaker Separation via Attention-based Neural Network
    Tingle Li, Qingjian Lin, Yuanyuan Bao, and Ming Li
    In INTERSPEECH, 2020
  7. DIHARD II is Still Hard: Experimental Results and Discussions
    Qingjian Lin, Weicheng Cai, Lin Yang, Junjie Wang, Jun Zhang, and Ming Li
    In Odyssey, 2020
  8. Self-Attentive Similarity Measurement Strategies in Speaker Diarization
    Qingjian Lin, Yu Hou, and Ming Li
    In INTERSPEECH, 2020
  9. Optimal Mapping Loss: A Faster Loss for End-to-End Speaker Diarization
    Qingjian Lin, Tingle Li, Lin Yang, Junjie Wang, and Ming Li
    In Odyssey, 2020
  10. The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02
    Qingjian Lin, Tingle Li, and Ming Li
    In INTERSPEECH, 2020
  11. Responsive Social Smile: A Machine Learning based Multimodal Behavior Assessment Framework towards Early Stage Autism Screening
    Yueran Pan, Kunjing Cai, Ming Cheng, Xiaobing Zou, and Ming Li
    In ICPR, 2020
  12. HI-MIA: a far-field text-dependent speaker verification database and the baselines
    Xiaoyi Qin, Hui Bu, and Ming Li
    In ICASSP, 2020
  13. The INTERSPEECH 2020 Far-Field Speaker Verification Challenge
    Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, and Haizhou Li
    In INTERSPEECH, 2020
  14. Domain Aware Training for Far-field Small-footprint Keyword Spotting
    Haiwei Wu, Yan Jia, Yuanfei Nie, and Ming Li
    In INTERSPEECH, 2020

2019

  1. String Stability Analysis for Vehicle Platooning under Unreliable Communication Links with Event-Triggered Strategy
    Zhicheng Li, Bin Hu, Ming Li, and Gengnan Luo
    IEEE Transactions on Vehicular Technology, 68, no. 3 (2019): 2152-2164, 2019
  2. An Automated Assessment Framework for Atypical Prosody and Stereotyped Idiosyncratic Phrases related to Autism Spectrum Disorder
    Ming Li, Dengke Tang, Junlin Zeng, Tianyan Zhou, and Xiaobing Zou
    Computer Speech and Language, 56 (2019): 80-94, 2019
  3. Multi-Channel Training for End-to-End Speaker Recognition under Reverberant and Noisy Environment
    Danwei Cai, Xiaoyi Qin, and Ming Li
    In INTERSPEECH, 2019
  4. The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge
    Danwei Cai, Xiaoyi Qin, Weicheng Cai, and Ming Li
    In INTERSPEECH, 2019
  5. Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Feature
    Zexin Cai, Yaogen Yang, Chuxiong Zhang, Xiaoyi Qin, and Ming Li
    In INTERSPEECH, 2019
  6. F0 contour estimation using phonetic feature in electrolaryngeal speech enhancement
    Zexin Cai, Zhicheng Xu, and Ming Li
    In ICASSP, 2019
  7. The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion
    Weicheng Cai, Haiwei Wu, Danwei Cai, and Ming Li
    In INTERSPEECH, 2019
  8. Utterance-level End-to-end Language Identification using Attention-based CNN-BLSTM
    Weicheng Cai, Shen Huang, and Ming Li
    In ICASSP, 2019
  9. LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization
    Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, and Claude Barras
    In INTERSPEECH, 2019
  10. Far-Field End-to-End Text-Dependent Speaker Verification based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation
    Xiaoyi Qin, Danwei Cai, and Ming Li
    In INTERSPEECH, 2019
  11. Fixation Based Object Recognition in Autism Clinic Setting
    Sheng Sun, Shuangmei Li, Wenbo Liu, Xiaobing Zou, and Ming Li
    In ICIRA, 2019
  12. Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection
    Weiqing Wang, Haiwei Wu, and Ming Li
    In APSIPA ASC, 2019
  13. The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge
    Haiwei Wu, Weiqing Wang, and Ming Li
    In INTERSPEECH, 2019
  14. DKU-Tencent Submission to Oriental Language Recognition AP18-OLR Challenge
    Haiwei Wu, Weicheng Cai, Ming Li, Ji Gao, Shanshan Zhang, Zhiqiang Lv, and Shen Huang
    In APSIPA ASC, 2019

2018

  1. Cancellable Speech Template via Random Binary Orthogonal Matrices Projection Hashing
    Kong-Yik Chee, Zhe Jin, Danwei Cai, Ming Li, Wun-She Yap, Yen-Lung Lai, and Bok-Min Goi
    Pattern Recognition, 2018
  2. Facial Expression Recognition with Identity and Emotion Joint Learning
    Ming Li, Hao Xu, Xingchang Huang, Zhanmei Song, Xiaolin Liu, and Xin Li
    IEEE Transactions on Affective Computing, accepted in 2018, published in 12, no. 2 (2021): 544-550, 2018
  3. Finite-time Stability and Stabilization of Semi-Markovian Jump Systems with Time Delay
    Zhicheng Li, Yinliang Xu, and Ming Li
    International Journal of Robust and Nonlinear Control, 28, no. 6 (2018): 2064-2081, 2018
  4. A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification
    Weicheng Cai, Wenbo Liu, Zexin Cai, and Ming Li
    In ICASSP, 2018
  5. The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation
    Danwei Cai, Weicheng Cai, and Ming Li
    In INTERSPEECH, 2018
  6. The DKU-JNU-EMA Electromagnetic Articulography Database on Mandarin and Chinese Dialects with Tandem Feature based Acoustic-to-Articulatory Inversion
    Zexin Cai, Xiaoyi Qin, Danwei Cai, Ming Li, and Xinzhong Liu
    In ISCSLP, 2018
  7. Deep Speaker Embedding with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition
    Danwei Cai, Zexin Cai, and Ming Li
    In APSIPA ASC, 2018
  8. Analysis of Length Normalization in End-to-End Speaker Verification System
    Weicheng Cai, Jinkun Chen, and Ming Li
    In INTERSPEECH, 2018
  9. Insights into End-to-End Learning Scheme for Language Identification
    Weicheng Cai, Zexin Cai, Xiang Zhang, and Ming Li
    In ICASSP, 2018
  10. Exploring the Encoding Layer and Loss function in End-to-End Speaker and Language Recognition System
    Weicheng Cai, Jinkun Chen, and Ming Li
    In Odyssey, 2018
  11. End-to-end Language Identification using NetFV and NetVLAD
    Jinkun Chen, Weicheng Cai, and Ming Li
    In ISCSLP, 2018
  12. An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individuals
    Ming Li, and Dengke Tang (*)
    In INTERSPEECH, 2018
  13. Unsupervised Query by Example Spoken Term Detection Using Features Concatenated with Self-Organizing Map Distances
    Haiwei Wu, and Ming Li
    In ISCSLP, 2018

2017

  1. Reconstruction of Lamb wave dispersion curves by sparse representation and continuity constraints
    Wenbo Zhao, Ming Li, Joel B. Harley, Yuanwei Jin, Jose Moura, and Jimmy Zhu
    Journal of the Acoustical Society of America, 141, no. 2 (2017): 749-763, 2017
  2. Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion
    Weicheng Cai, Danwei Cai, Wenbo Liu, Gang Li, and Ming Li
    In INTERSPEECH, 2017
  3. End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum
    Danwei Cai, Zhidong Ni, Wenbo Liu, Weicheng Cai, Gang Li, and Ming Li
    In INTERSPEECH, 2017
  4. Automatic Emotional Spoken Language Text Corpus Construction from Written Dialogs in Fictions
    Jinkun Chen, and Ming Li
    In ACII, 2017
  5. Mandarin Electrolaryngeal Voice Conversion with Combination of Gaussian Mixture Model and Non-negative Matrix Factorization
    Ming Li, Luting Wang, Zhicheng Xu, and Danwei Cai
    In APSIPA ASC, 2017
  6. Response to Name: A Dataset and A Multimodal Machine Learning Framework towards Autism Study
    Wenbo Liu, Xiaobing Zou, and Ming Li
    In ACII, 2017
  7. SphereFace: Deep Hypersphere Embedding for Face Recognition
    Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song
    In CVPR, 2017
  8. An audio based piano performance evaluation method using deep neural network based acoustic modeling
    Jing Pan, Ming Li, Zhanmei Song, Xin Li, Xiaolin Liu, Hua Yi, and Manman Zhu
    In INTERSPEECH, 2017
  9. An Automated Assessment Framework for Speech Abnormalities related to Autism Spectrum Disorder
    Tianyan Zhou, Yixiang Xie, Xiaobing Zou, and Ming Li
    In INTERSPEECH, 2017

2016

  1. Speaker verification based on the fusion of speech acoustics and inverted articulatory signals
    Ming Li, Jangwon Kim, Adam Lammert, Prasanta Kumar Ghosh, Vikram Ramanarayanan, and Shrikanth Narayanan
    Computer Speech & Language, 36 (2016): 196-211, 2016
  2. Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification
    Ming Li, and Wenbo Liu
    Journal of Signal Processing Systems, 82, no. 2 (2016): 207-215, 2016
  3. Identifying Children with Autism Spectrum Disorder Based on Their Face Processing Abnormality: A Machine Learning framework
    Wenbo Liu, Ming Li, and Li Yi
    Autism Research, 9, no. 8 (2016): 888-898, 2016
  4. Locality Sensitive Discriminant Analysis for Speaker Recognition
    Danwei Cai, Weicheng Cai, and Ming Li
    In APSIPA ASC, 2016
  5. A Fast Tracking Algorithm for Estimating Ultrasonic Signal Time of Flight in Drilled Shafts Using Active Shape Models
    Zhun Chen, Wenbo Zhao, Yuanwei Jin, Ming Li, and Jimmy Zhu
    In IUS, 2016
  6. Entity Disambiguation by Knowledge and Text Jointly Embedding
    Wei Fang, Jianwen Zhang, Dilin Wang, Zheng Chen, and Ming Li
    In CoNLL, 2016
  7. The SYSU System for CCPR 2016 Multimodal Emotion Recognition Challenge
    Gaoyuan He, Jinkun Chen, Xuebo Liu, and Ming Li
    In CCPR, 2016
  8. Efficient Misalignment-Robust Face Recognition Via Locality-Constrained Representation
    Yandong Wen, Weiyang Liu, Meng Yang, and Ming Li
    In ICIP, 2016
  9. On Order-Constrained Transitive Distance Clustering
    Zhiding Yu, Weiyang Liu, Wenbo Liu, Yingzhen Yang, Ming Li, and Vijayakumar Bhagavatula
    In AAAI, 2016
  10. Text-Independent Voice Conversion Using Deep Neural Network Based Phonetic Level Features
    Huadi Zheng, Weicheng Cai, Tianyan Zhou, Shilei Zhang, and Ming Li
    In ICPR, 2016
  11. Speaker Diarization System for Autism Children’s Real-Life Audio Data
    Tianyan Zhou, Weicheng Cai, Xiaoyan Chen, Xiaobing Zou, Shilei Zhang, and Ming Li
    In ISCSLP, 2016

2015

  1. Automatic intelligibility classification of sentence-level pathological speech
    Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, and Shrikanth Narayanan
    Computer Speech & Language, 29, no. 1 (2015): 132-144, 2015
  2. Innovations in the Use of Interactive Technology to Support Weight Management
    Donna Spruijt-Metz, Cheng K.F. Wen, Gillian O’Reilly, Ming Li, Sangwon Lee, Adar Emken, Urbashi Mitra, Murali Annavaram, Gisele Ragusa, and Shrikanth Narayanan
    Current Obesity Reports, 4, no. 4 (2015): 510-519, 2015
  3. Robust Real-Time Distributed Optimal Control Based Energy Management in a Smart Grid
    Yinliang Xu, Zaiyue Yang, Wei Gu, Ming Li, and Zicong Deng
    IEEE Transactions on Smart Grid, 8, no. 4 (2015): 1568-1579, 2015
  4. Duration Dependent Covariance Regularization in PLDA Modeling for Speaker Verification
    Weicheng Cai, Ming Li, Lin Li, and Qingyang Hong
    In INTERSPEECH, 2015
  5. Automatic assessment of non-native accent degrees using phonetic level posterior and duration features from multiple languages
    Shushan Chen, Yiming Zhou, and Ming Li
    In APSIPA ASC, 2015
  6. Modified-prior PLDA and Score Calibration for Duration Mismatch Compensation in Speaker Recognition System
    Qingyang Hong, Lin Li, Ming Li, Ling Huang, and Jun Zhang
    In INTERSPEECH, 2015
  7. Speaker verification with the mixture of Gaussian factor analysis based representation
    Ming Li
    In ICASSP, 2015
  8. Locality Constrained Transitive Distance Clustering on Speech Data
    Wenbo Liu, Zhiding Yu, Bhiksha Raj, and Ming Li
    In INTERSPEECH, 2015
  9. Efficient Autism Spectrum Disorder Diagnosis with Eye Movement: A Machine Learning Framework
    Wenbo Liu, Zhiding Yu, Li Yi, Bhiksha Raj, and Ming Li
    In ACII, 2015
  10. Speech bandwidth expansion based on deep neural networks
    Yingxue Wang, Shenghui Zhao, Wenbo Liu, Ming Li, and Jingming Kuang
    In INTERSPEECH, 2015
  11. The SYSU system for the INTERSPEECH 2015 automatic speaker verification spoofing and countermeasures challenge
    Shitao Weng, Shushan Chen, Lei Yu, Xuewei Wu, Weicheng Cai, Zhi Liu, Yiming Zhou, and Ming Li
    In APSIPA ASC, 2015

2014

  1. Intoxicated Speech Detection: A Fusion Framework with Speaker-Normalized Hierarchical Functionals and GMM Supervectors
    Daniel Bone, Ming Li, Matthew Black, and Shrikanth Narayanan
    Computer Speech & Language, 28, no. 2 (2014): 375-391, 2014
  2. Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification
    Ming Li, and Shrikanth Narayanan
    Computer Speech & Language, 28, no. 4 (2014): 940-958, 2014
  3. Verification based ECG biometrics with cardiac irregular conditions using heartbeat level and segment level information fusion
    Ming Li, and Xin Li
    In ICASSP, 2014
  4. Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens
    Ming Li
    In INTERSPEECH, 2014
  5. Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features
    Ming Li, and Wenbo Liu
    In INTERSPEECH, 2014
  6. An Iterative Framework for Unsupervised Learning in the PLDA based Speaker Verification
    Wenbo Liu, Zhiding Yu, and Ming Li
    In ISCSLP, 2014
  7. Simplified and supervised i-vector modeling for speaker age regression
    Prashanth Gurunath Shivakumar, Ming Li, Vedant Dhandhania, and Shrikanth S. Narayanan
    In ICASSP, 2014

2013

  1. Automatic Speaker Age and Gender Recognition using acoustic and prosodic level information fusion
    Ming Li, Kyu J. Han, and Shrikanth Narayanan
    Computer Speech and Language, 27, no. 1 (2013): 151-167, 2013
  2. Automatic Classification of Palatal and Pharyngeal Wall Morphology Patterns from Speech Acoustics and Inverted Articulatory Signals
    , 2013
  3. Classifying Language-Related Developmental Disorders from Speech Cues: the Promise and the Potential Confounds
    Daniel Bone, Theodora Chaspari, Kartik Audhkhasi, James Gibson, Andreas Tsiartas, Maarten Van Segbroeck, Ming Li, Sungbok Lee, and Shrikanth Narayanan
    In INTERSPEECH, 2013
  4. TRAP Language Identification System for RATS Phase II Evaluation
    Kyu Jeong Han, Sriram Ganapathy, Ming Li, Mohamed K. Omar, and Shrikanth Narayanan
    In INTERSPEECH, 2013
  5. Speaker verification using simplified and supervised i-vector modeling
    Ming Li, Andreas Tsiartas, Maarten Van Segbroeck, and Shrikanth S. Narayanan
    In ICASSP, 2013
  6. Speaker verification based on fusion of acoustic and articulatory information
    Ming Li, Jangwon Kim, Prasanta Kumar Ghosh, Vikram Ramanarayanan, and Shrikanth Narayanan
    In INTERSPEECH, 2013
  7. Multi-band long-term signal variability features for robust voice activity detection
    In INTERSPEECH, 2013

2012

  1. Recognition of Physical Activities in Overweight Hispanic Youth using KNOWME Networks
    Adar Emken, Ming Li, Gautam Thatte, Sangwon Lee, Murali Annavaram, Urbashi Mitra, Shrikanth Narayanan, and Donna Spruijt-Metz
    Journal of Physical Activity and Health, 9, no. 3 (2012): 432-441, 2012
  2. KNOWME: a Case Study in Wireless Body Area Sensor Network Design
    Urbashi Mitra, Adar Emken, Sangwon Lee, Ming Li, Harshvardhan Vathsangam, Daphney-Stavroula Zois, Murali Annavaram, and Shrikanth Narayanan
    IEEE Communications Magazine, 50, no. 5 (2012): 116-125, 2012
  3. KNOWME: An energy-efficient multimodal body area network for physical activity monitoring
    Gautam Thatte, Ming Li, Sangwon Lee, Adar Emken, Shri Narayanan, Urbashi Mitra, Donna Spruijt-Metz, and Murali Annavaram
    ACM Transactions on Embedded Computing Systems, 11, no. S2 (2012): 1-24, 2012
  4. Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network
    Kartik Audhkhasi, Angeliki Metallinou, Ming Li, and Shrikanth Narayanan
    In INTERSPEECH, 2012
  5. Intelligibility classification of pathological speech using fusion of multiple high level descriptors
    Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, and Shrikanth Narayanan
    In INTERSPEECH, 2012
  6. Speaker Verification using Lasso based Sparse Total Variability Supervector and Probabilistic Linear Discriminant Analysis
    Ming Li, Charley Lu, Anne Wang, and Shrikanth Narayanan
    In APSIPA ASC, 2012
  7. Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling
    Ming Li, Angeliki Metallinou, Daniel Bone, and Shrikanth Narayanan
    In ICASSP, 2012

2011

  1. Optimal Time-Resource Allocation for Energy-Efficient Physical Activity Detection
    Gautam Thatte, Ming Li, Sangwon Lee, Adar Emken, Murali Annavaram, Shri Narayanan, Donna Spruijt-Metz, and Urbashi Mitra
    IEEE Transactions on Signal Processing, 59, no. 4 (2011): 1843-1857, 2011
  2. Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors
    Daniel Bone, Matthew P. Black, Ming Li, Angeliki Metallinou, Sungbok Lee, and Shrikanth Narayanan
    In INTERSPEECH, 2011
  3. Modeling high-level descriptions of real-life physical activities using latent topic modeling of multimodal sensor signals
    Samuel Kim, Ming Li, Sangwon Lee, Urbashi Mitra, Adar Emken, Donna Spruijt-Metz, Murali Annavaram, and Shrikanth Narayanan
    In EMBC, 2011
  4. Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors
    Ming Li, and Shrikanth Narayanan
    In ICASSP, 2011
  5. Speaker Verification using Sparse Representations on Total Variability I-Vectors
    Ming Li, Xiang Zhang, Yonghong Yan, and Shrikanth Narayanan
    In INTERSPEECH, 2011

2010

  1. Multimodal Physical Activity Recognition by Fusing Temporal and Cepstral Information
    Ming Li, Viktor Rozgic, Gautam Thatte, Sangwon Lee, Adar Emken, Murali Annavaram, Urbashi Mitra, Donna Spruijt-Metz, and Shrikanth Narayanan
    IEEE Transactions on Neural Systems & Rehabilitation Engineering, 18, no. 4 (2010): 369-380, 2010
  2. Combining Five Acoustic Level methods for Automatic Speaker Age and Gender Recognition
    Ming Li, Chi-Sang Jung, and Kyu Jeong Han
    In INTERSPEECH, 2010
  3. Robust ECG biometrics by fusing temporal and cepstral information
    Ming Li, and Shrikanth Narayanan
    In ICPR, 2010

2009

  1. Optimal Allocation of Time-Resources for Multihypothesis Activity-Level Detection
    Gautam Thatte, Viktor Rozgic, Ming Li, Sabyasachi Ghosh, Urbashi Mitra, Shri Narayanan, Murali Annavaram, and Donna Spruijt-Metz
    In DCOSS, 2009
  2. Energy-Efficient Multihypothesis Activity-Detection for Health-Monitoring Applications
    Gautam Thatte, Ming Li, Adar Emken, Urbashi Mitra, Shri Narayanan, Murali Annavaram, and Donna Spruijt-Metz
    In EMBC, 2009

2008

  1. Using SVM as back-end classifier for language identification
    Hongbin Suo, Ming Li, Ping Lu, and Yonghong Yan
    EURASIP Journal on Audio, Speech, and Music Processing, 2008
  2. Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping
    Ming Li, Chuan Cao, Di Wang, Ping Lu, Qiang Fu, and Yonghong Yan
    In INTERSPEECH, 2008
  3. Automatic language identification with discriminative language characterization based on SVM
    Hongbin Suo, Ming Li, Ping Lu, and Yonghong Yan
    IEICE Transactions on Information and Systems, 91, no. 3 (2008): 567-575, 2008

2007

  1. Authentication and quality monitoring based audio watermark for analog AM shortwave broadcasting
    Ming Li, Yun Lei, Xiang Zhang, Jian Liu, and Yonghong Yan
    In IIH-MSP, 2007
  2. Spoken Language Identification Using Score Vector Modeling and Support Vector Machine
    Ming Li, Hongbin Suo, Xiao Wu, Ping Lu, and Yonghong Yan
    In INTERSPEECH, 2007

2006

  1. A Novel Audio Watermarking in Wavelet Domain
    Ming Li, Yun Lei, Jian Liu, and Yonghong Yan
    In IIH-MSP, 2006

2000

  1. RWF-2000: An Open Large Scale Video Database for Violence Detection
    Ming Cheng, Kunjing Cai, and Ming Li
    In ICPR, 2000