publications | SMIIP Lab

2026

M2S-AVSR: Modality-aware Multi-view Self-supervised Representation for Robust Audio-Visual Speech Recognition

Fei Su, Cancan Li, Juan Liu, and Ming Li

submitted to IEEE Transactions on Audio, Speech, and Language Processing, 2026
Multimodal Large Language Models for ADOS-M1 Behavioral Assessment

Wenxing Liu, Yueran Pan, Dong Zhang, Hongzhu Deng, Xiaobing Zou, and Ming Li

submitted to Neurocomputing, 2026
VLM-Guided Semantic Augmentation and Uncertainty-Aware Tri-modal Fusion for Group Emotion Recognition

Hongxi Yi, Yufei Xie, Dong Zhang, Wenxing Liu, Ming Li, and Dah-Jye Lee

submitted to IEEE Transactions on Multimedia, 2026
Emotional Description-Guided Vision-Language Semantic Alignment for Group Emotion Recognition

Hongxi Yi, Dong Zhang, Yufei Xie, Wenxing Liu, Ming Li, and Dah-Jye Lee

submitted to IEEE Transactions on Affective Computing, 2026
Making Separation-First Multi-Stream Audio Watermarking Feasible via Joint Training

Houmin Sun, Zi Hu, Linxi Li, Yechen Wang, Weijin Li, and Ming Li

2026

submitted to SLT 2026
Audio-Visual Speech Enhancement in Complex Scenarios with Separation and Dereverberation Joint Modeling

Zhan Jin, Jiarong Du, Peijun Yang, Juan Liu, Zhuo Li, Xin Liu, and Ming Li

2026

submitted to NCMMSC 2026
Robust Audio-Visual Target Speaker Extraction with Multiple Enrollment Fusion

Zhan Jin, Bang Zeng, Peijun Yang, Jiarong Du, Wei Ju, Yao Tian, Juan Liu, and Ming Li

2026

submitted to NCMMSC 2026
A Dual-Path Efficient EEG Encoder for Brain-Assisted Target Speaker Extraction

Wang Xiang, Xue Zhang, Bang Zeng, Cunhang Fan, Juan Liu, and Ming Li

2026

submitted to NCMMSC 2026
DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, and Ming Li

2026

submitted to ACM Multimedia 2026
AISHELL8-FISHEYE: A Fisheye Audio-Visual Dataset for Target Speaker Extraction with Distortion-Aware Baselines

Peijun Yang, Zhan Jin, Juan Liu, Hui Bu, and Ming Li

2026

submitted to ACM Multimedia 2026
Toward Multimodal Fault Analysis: A Single-Speed Chain Conveyor Dataset with Audio and Vibration Signals

Zhang Chen, Yucong Zhang, Xiaoxiao Miao, and Ming Li

2026

Interspeech 2026

PDF
Language-Invariant Multilingual Speaker Verification for the TidyVoice 2026 Challenge

Ze Li, Xiaoxiao Miao, Juan Liu, and Ming Li

2026

Interspeech 2026

PDF
Spatially-Augmented Sequence-to-Sequence Neural Diarization for Meetings

Li Li, Ming Cheng, Juan Liu, and Ming Li

2026

Interspeech 2026

PDF
WhisperVC: Decoupled Cross-Domain Alignment and Speech Generation for Low-Resource Whisper-to-Speech Conversion

Dong Liu, Juan Liu, Wei Ju, Yao Tian, and Ming Li

2026

Interspeech 2026

PDF
Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement

Fei Su, Cancan Li, Juan Liu, Wei Ju, Hongbin Suo, and Ming Li

2026

Interspeech 2026

PDF
Multi-View Based Audio Visual Target Speaker Extraction

Peijun Yang, Zhan Jin, Juan Liu, and Ming Li

2026

Interspeech 2026

PDF
MultiAPI Spoof: A Multi-API Dataset and Local-Attention Network for Speech Anti-spoofing Detection

Xueping Zhang, Zhenshan Zhang, Yechen Wang, Linxi Li, Liwei Jin, and Ming Li

2026

Interspeech 2026

PDF
Dual-Encoder Fusion with Explicit and Implicit Injection for the Interspeech 2026 Audio Encoder Capability Challenge

Yucong Zhang, Zhang Chen, Juan Liu, Wei Ju, Hongbin Suo, and Ming Li

2026

Interspeech 2026

PDF
Detecting Children with Autism Spectrum Disorder based on Script-Centric Behavior Understanding with Emotional Enhancement

Wenxing Liu, Yueran Pan, Dong Zhang, Hongzhu Deng, Xiaobing Zou, and Ming Li

Neurocomputing, 2026

PDF
Glitter: Exploring an LLM Virtual Agent for Supporting Practitioners in Behavioral Interventions of Autistic Children

Xin Tong, Liwen He, Zhaowen Deng, Weibo Li, Ziheng Tang, Yixuan Li, Yutong Ren, Matthew Louis Mauriello, and Ming Li

International Journal of Human–Computer Interaction, 2026

PDF
Enhancing Speaker Verification with W2v-Bert 2.0 and Knowledge Distillation Guided Pruning

Ze Li, Ming Cheng, and Ming Li

In ICASSP, 2026

PDF
AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Speech Dataset with Speech Recognition Baselines

Cancan Li, Fei Su, Juan Liu, Hui Bu, Yulong Wan, Hongbin Suo, and Ming Li

In ICASSP, 2026

PDF
Compspoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-Spoofing Countermeasure

Xueping Zhang, Liwei Jin, Yechen Wang, Linxi Li, and Ming Li

In ICASSP, 2026

PDF
The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures

Zhenshan Zhang, Xueping Zhang, Yechen Wang, Liwei Jin, and Ming Li

In ICASSP, 2026

PDF
ECHO: Frequency-Aware Hierarchical Encoding for Variable-Length Signals

Yucong Zhang, Juan Liu, and Ming Li

In ICASSP, 2026

PDF

2025

Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning

Danwei Cai, Zexin Cai, Ze Li, and Ming Li

IEEE Transactions on Audio, Speech, and Language Processing, 2025

PDF
Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization

Ming Cheng, and Ming Li

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025

PDF
Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation

Ming Cheng, Yuke Lin, and Ming Li

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025

PDF
Assessing the Expressive Language Levels of Autistic Children in Home Intervention

Yueran Pan, Biyuan Chen, Wenxing Liu, Ming Cheng, Dong Zhang, Hongzhu Deng, Xiaobing Zou, and Ming Li

IEEE Transactions on Computational Social Systems, 2025

PDF
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection

Bang Zeng, and Ming Li

Computer Speech and Language, 2025

PDF
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction

Bang Zeng, and Ming Li

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025

PDF
Multi-scale Scanning Network for Machine Anomalous Sound Detection

Yucong Zhang, Juan Liu, and Ming Li

ICONIP, 2025

PDF
An Automatic Laryngoscopic Image Segmentation System Based on SAM Prompt Engineering: From Glottis Annotation to Vocal Fold Segmentation

Yucong Zhang, Yuchen Song, Juan Liu, and Ming Li

Frontiers in Molecular Biosciences, 2025

PDF
Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Fold Paralysis

Yucong Zhang, Xin Zou, Jinshan Yang, Wenjun Chen, Juan Liu, Faya Liang, and Ming Li

Computer Speech and Language, 2025

PDF
Improving Anomalous Sound Detection with Top-M Pseudo-Labeling

Zhang Chen, Yucong Zhang, and Ming Li

In NCMMSC, 2025

PDF
Multi-Channel Sequence-to-Sequence Neural Diarization: Experimental Results for The MISP 2025 Challenge

Ming Cheng, Fei Su, Cancan Li, Juan Liu, and Ming Li

In Interspeech, 2025

PDF
Improving the Robustness of Audio-Visual Target Speaker Extraction With AV-HuBERT Based Lip Features

Jiarong Du, Zhan Jin, Bang Zeng, Peijun Yang, Ming Li, and Juan Liu

In NCMMSC, 2025

PDF
Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System

Ze Li, Yao Shi, Yunfei Xu, and Ming Li

In ICME, 2025

PDF
Exploring Pre-trained models on Ultrasound Modeling for Mice Autism Detection with Uniform Filter Bank and Attentive Scoring

Yuchen Song, Yucong Zhang, and Ming Li

In Interspeech, 2025

PDF
SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization

Beilong Tang, Xiaoxiao Miao, Xin Wang, and Ming Li

In ASRU, 2025

PDF
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models

Beilong Tang, Bang Zeng, and Ming Li

In ASRU, 2025

PDF
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models

Beilong Tang, Bang Zeng, and Ming Li

In NCMMSC, 2025

PDF
Enhancing the Robustness of Speech Anti-spoofing Countermeasures through Joint Optimization and Transfer Learning

Yikang Wang, Xingming Wang, Chee Siang Leow, Qishan Zhang, Ming Li, and Hiromitsu Nishizaki

IEICE Transactions on Information and Systems, 2025

PDF
VCapAV: A Video-Caption Based Audio-Visual Deepfake Detection Dataset

Yuxi Wang, Yikang Wang, Qishan Zhang, Hiromitsu Nishizaki, and Ming Li

In Interspeech, 2025

PDF
SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis

Zhuojun Wu, Dong Liu, Juan Liu, Yechen Wang, Linxi Li, Liwei Jin, Hui Bu, Pengyuan Zhang, and Ming Li

In ACM Multimedia, 2025

PDF
Efficient Video to Audio Mapper with Visual Scene Detection

Mingjing Yi, Yuxi Wang, and Ming Li

In APSIPA ASC, 2025

PDF
Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings

Hongyu Zhang, Ming Cheng, Jing Feng, and Ming Li

In Interspeech, 2025

PDF

2024

StarRescue: the Design and Evaluation of A Turn-Taking Collaborative Game for Facilitating Social and Fine Motor Skills of Children with Autism Spectrum Disorder

Rongqi Bei, Yajie Liu, Yihe Wang, Yuxuan Huang, Ming Li, Yuhang Zhao, and Xin Tong

CHI, 2024

PDF
Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation

Danwei Cai, and Ming Li

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024

PDF
Location-guided Head Pose Estimation for Fisheye Image

Bing Li, Dong Zhang, Cheng Huang, Yun Xian, Ming Li, and Dah-Jye Lee

IEEE Transactions on Cognitive and Developmental Systems, 2024

PDF
Two-stage and Self-supervised Voice Conversion for Zero-Shot Dysarthric Speech Reconstruction

Dong Liu, Yueqian Lin, Hui Bu, and Ming Li

IALP, 2024

PDF
Speaker verification in deliberately disguised scenarios

Xiaoyi Qin, Ze Li, Dong Liu, and Ming Li

Computer Engineering and Applications, 2024

PDF
Investigating Long-Term and Short-Term Time-Varying Speaker Verification

Xiaoyi Qin, Na Li, Shufei Duan, and Ming Li

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024

PDF
Online Neural Speaker Diarization with Target Speaker Tracking

Weiqing Wang, and Ming Li

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024

PDF
HSVRS: A Virtual Reality System of the Hide-and-Seek Game to Enhance Gaze Fixation Ability for Autistic Children

Chengyan Yu, Shihuan Wang, Dong Zhang, Yingying Zhang, Chaoqun Cen, Zhixiang You, Xiaobing Zou, Hongzhu Deng, and Ming Li

IEEE Transactions on Learning Technologies, 2024

PDF
Joint Training on Multiple Datasets with Inconsistent Labeling Criteria for Facial Expression Recognition

Chengyan Yu*, Dong Zhang, Wei Zou, and Ming Li

IEEE Transactions on Affective Computing, 2024

PDF
Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios

Bang Zeng, Hongbin Suo, Yulong Wan, and Ming Li

J. Shanghai Jiaotong Univ. (Sci.), 2024

PDF
Invertible Voice Conversion with Parallel Data

Zexin Cai, and Ming Li

In ICASSP, 2024

PDF
Slidespeech: A Large Scale Slide-Enriched Audio-Visual Corpus

Haoxu Wang, Fan Yu, Xian Shi, Yuezhang Wang, Shiliang Zhang, and Ming Li

In ICASSP, 2024

PDF
The Database and Benchmark for the Source Speaker Tracing Challenge 2024

Ze Li, Yuke Lin, Yao Tian, Hongbin Suo, Pengyuan Zhang, Yanzhen Ren, Zexin Cai, Hiromitsu Nishizaki, and Ming Li

In SLT, 2024

PDF
Multi-Objective Progressive Clustering for Semi-Supervised Domain Adaptation in Speaker Verification

Ze Li, Yuke Lin, Ning Jiang, Xiaoyi Qin, Guoqing Zhao, Haiying Wu, and Ming Li

In ICASSP, 2024

PDF
Vivid Background Audio Generation based on Large Language Models and AudioLDM

Yiwei Liang, and Ming Li

In ISCSLP, 2024

PDF
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark

Yuke Lin, Ming Cheng, Fulin Zhang, Yingying Gao, Shilei Zhang, and Ming Li

In Interspeech, 2024

PDF
Bridging Facial Imagery and Vocal Reality: Stable Diffusion-Enhanced Voice Generation

Yueqian Lin, Dong Liu, Yunfei Xu, Hongbin Suo, and Ming Li

In ISCSLP, 2024

PDF
Voxblink: A Large Scale Speaker Verification Dataset on Camera

Yuke Lin, Xiaoyi Qin, Guoqing Zhao, Ming Cheng, Ning Jiang, Haiying Wu, and Ming Li

In ICASSP, 2024

PDF
TMCSpeech: A Chinese TV and Movie Speech Dataset with Character Descriptions and a Character-Based Voice Generation Model

Dong Liu, Yueqian Lin, Yunfei Xu, and Ming Li

In ICPR, 2024

PDF
The WHU Wake Word Lipreading System for the 2024 Chat-scenario Chinese Lipreading Challenge

Haoxu Wang, Cancan Li, Fei Su, Juan Liu, Hongbin Suo, and Ming Li

In ICME challenge paper, 2024

PDF
Joint Inference of Speaker Diarization and ASR with Multi-Stage Information Sharing

Weiqing Wang, Danwei Cai, Ming Cheng, and Ming Li

In ICASSP, 2024

PDF
Robust Wake Word Spotting with Frame-Level Cross-Modal Attention based Audio-Visual Conformer

Haoxu Wang, Ming Cheng, Qiang Fu, and Ming Li

In ICASSP, 2024

PDF
Lightweight Language Model for Speech Synthesis: Attempts and Analysis

Zhuojun Wu, Dong Liu, and Ming Li

In ISCSLP, 2024

PDF
Efficient Personal Voice Activity Detection with Wake Word Reference Speech

Bang Zeng, Ming Cheng, Yao Tian, Haifeng Liu, and Ming Li

In ICASSP, 2024

PDF
A Dual-Path Framework with Frequency-and-Time Excited Network for Machine Anomalous Sound Detection

Yucong Zhang, Juan Liu, Yao Tian, Haifeng Liu, and Ming Li

In ICASSP, 2024

PDF
KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario

Huali Zhou, Yuke Lin, Dong Liu, and Ming Li

In ICPR, 2024

PDF

2023

Integrating Frame-Level Boundary Detection and Deepfake Detection for Locating Manipulated Regions in Partially Spoofed Audio Forgery Attacks

Zexin Cai, and Ming Li

Computer Speech and Language, 2023

PDF
Computer-aided Autism Spectrum Disorder Diagnosis with Behavior Signal Processing

Ming Cheng, Yingying Zhang, Yixiang Xie, Yueran Pan, Xiao Li, Wenxing Liu, Chengyan Yu, Dong Zhang, Yu Xing, Xiaoqian Huang, Fang Wang, Cong You, Yuanyuan Zou, Yuchong Liu, Fengjing Liang, Huilin Zhu, Chun Tang, Hongzhu Deng, Xiaobing Zou, and Ming Li

IEEE Transactions on Affective Computing, 2023

PDF
Expressive language profiles in a clinically screening sample of Mandarin-speaking preschool children with Autism Spectrum Disorder

Li Li, Yi (Esther) Su, Wenwen Hou, Muyu Zhou, Yixiang Xie, Xiaobing Zou, and Ming Li

Journal of Speech, Language, and Hearing Research, 2023

PDF
Assessing the Social Skills of Children with Autism Spectrum Disorder via Language-Image Pre-training Models

Wenxing Liu, Ming Cheng, Yueran Pan, Lynn Yuan, Suxiu Hu, Ming Li, and Songtian Zeng

The 6th Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2023

PDF
From Speaker Verification to Deepfake Algorithm Recognition: Our Learned Lessons from ADD2023 Track 3

Xiaoyi Qin, Xingming Wang, Yanli Chen, Qinglin Meng, and Ming Li

IJCAI 2023 Workshop on Deepfake Audio Detection and Analysis (DADA 2023), 2023

PDF
A multimodal machine learning system in early screening for toddlers with autism spectrum disorders based on the response to name

Feng-lei Zhu, Shi-huan Wang, Wen-bo Liu, Hui-lin Zhu, Ming Li, and Xiao-bing Zou

Frontiers in Psychiatry, 2023

PDF
Pretraining Conformer with ASR for Speaker Verification

Danwei Cai, Weiqing Wang, Ming Li, Rui Xia, and Chuanzeng Huang

In ICASSP, 2023

PDF
Identifying Source Speakers For Voice Conversion Based Spoofing Attacks For Speaker Verification

Danwei Cai, Zexin Cai, and Ming Li

In ICASSP, 2023

PDF
Waveform Boundary Detection For Partially Spoofed Audio

Zexin Cai, Weiqing Wang, and Ming Li

In ICASSP, 2023

PDF
Target-Speaker Voice Activity Detection Via Sequence-To-Sequence Prediction

Ming Cheng, Weiqing Wang, Yucong Zhang, Xiaoyi Qin, and Ming Li

In ICASSP, 2023

PDF
The DKU-MSXF Diarization System for the VoxCeleb Speaker Recognition Challenge 2023

Ming Cheng, Weiqing Wang, Xiaoyi Qin, Yuke Lin, Ning Jiang, Guoqing Zhao, and Ming Li

In NCMMSC, 2023

PDF
Real-time Automotive Engine Sound Simulation with Deep Neural Network

Hao Li, Weiqing Wang, and Ming Li

In NCMMSC, 2023

PDF
EEG-Based Speech Envelope Decoding: Structured State Space and Diffusion Model Integration

Yueqian Lin, and Ming Li

In NCMMSC, 2023

PDF
Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification

Yuke Lin, Xiaoyi Qin, Ning Jiang, Guoqing Zhao, and Ming Li

In ASRU, 2023

PDF
Exploring Universal Singing Speech Language Identification Using Self-Supervised Learning Based Front-End Features

Xingming Wang, Hao Wu, Chen Ding, Chuanzeng Huang, and Ming Li

In ICASSP, 2023

PDF
Robust audio anti-spoofing countermeasure with joint training of front-end and back-end models

Xingming Wang, Bang Zeng, Hongbin Suo, Yulong Wan, and Ming Li

In Interspeech, 2023

PDF
Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios

Bang Zeng, Hongbin Suo, Yulong Wan, and Ming Li

In NCMMSC, 2023

PDF
SEF-Net: Speaker Embedding Free Target Speaker Extraction Network

Bang Zeng, Hongbin Suo, Yulong Wan, and Ming Li

In Interspeech, 2023

PDF
Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues

Bang Zeng, Hongbin Suo, Yulong Wan, and Ming Li

In APSIPA ASC, 2023

PDF
Pre-training Deep Learning Models with Finite Element Simulation Data for Enhanced Machine Anomalous Sound Detection

Zhixian Zhang, Yucong Zhang, and Ming Li

In NCMMSC, 2023

PDF
Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning

Yucong Zhang, Hongbin Suo, Yulong Wan, and Ming Li

In Interspeech, 2023

PDF
BiSinger: Bilingual Singing Voice Synthesis

Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, and Ming Li

In ASRU, 2023

PDF

2022

Cross-lingual Multispeaker Speech Synthesis under Limited-Data Scenarios

Zexin Cai, Yaogen Yang, and Ming Li

Computer Speech and Language, 2022

PDF
Incorporating visual information in audio based self-supervised speaker recognition

Danwei Cai, Weiqing Wang, and Ming Li

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022

PDF
Accurate Head Pose Estimation Using Image Rectification and Lightweight Convolutional Neural Network

Xiao Li, Dong Zhang, Ming Li, and Dah-Jye Lee

IEEE Transactions on Multimedia, 2022

PDF
Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios

Xiaoyi Qin, Danwei Cai, and Ming Li

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022

PDF
Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization

Weiqing Wang, Qingjian Lin, Danwei Cai, and Ming Li

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022

PDF
Paralinguistic singing attribute recognition using supervised machine learning for describing the singing voice in vocal pedagogy

Yanze Xu, Ming Li, Huahua Cui, and Mingyang Xu

EURASIP Journal on Audio, Speech, and Music Processing, 2022

PDF
Electrolaryngeal Speech Enhancement based on Bottleneck Feature Refinement and Voice Conversion

Yaogen Yang, Haozhe Zhang, Zexin Cai, Yao Shi, Ming Li, Dong Zhang, Xiaojun Ding, Jianhua Deng, and Jie Wang

Biomedical Signal Processing and Control, 2022

PDF
A Complementary Dual-branch Network for Appearance-based Gaze Estimation from Low-resolution Facial Image

Zhesi Zhu, Dong Zhang, Cailong Chi, Ming Li, and Dah-Jye Lee

IEEE Transactions on Cognitive and Developmental Systems, 2022

PDF
The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP Challenge 2022

Ming Cheng, Haoxu Wang, Ziteng Wang, Qiang Fu, and Ming Li

In ICASSP 2023, 2022

PDF
Single-Channel Target Speaker Separation using Joint Training with Target Speaker’s Pitch Information

Jincheng He, Yuanyuan Bao, Na Xu, Hongfeng Li, Shicong Li, Linzhang Wang, Fei Xiang, and Ming Li

In Odyssey, 2022

PDF
Towards Lightweight applications: Asymmetric Enroll-Verify Structure For Speaker Verification

Qingjian Lin, Lin Yang, Xuyang Wang, Xiaoyi Qin, Junjie Wang, and Ming Li

In ICASSP, 2022

PDF
A Multimodal Framework for Automated Teaching Quality Assessment of One-to-many Online Instruction Videos

Yueran Pan, Jiaxin Wu, Ran Ju, Ziang Zhou, Jiayue Gu, Songtian Zeng, Lynn Yuan, and Ming Li

In ICPR, 2022

PDF
Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection

Xiaoyi Qin, Na Li, Chao Weng, Dan Su, and Ming Li

In ICASSP, 2022

PDF
Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings

Xiaoyi Qin, Na Li, Chao Weng, Dan Su, and Ming Li

In Interspeech, 2022

PDF
VC-AUG: Voice Conversion based Data Augmentation for Text-Dependent Speaker Verification

Xiaoyi Qin, Yaogen Yang, Yao Shi, Lin Yang, Xuyang Wang, Junjie Wang, and Ming Li

In NCMMSC, 2022

PDF
Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MET Challenge

Weiqing Wang, Xiaoyi Qin, and Ming Li

In ICASSP, 2022

PDF
Generating Adversarial Samples For Training Wake-Up Word Detection Systems Against Confusing Words

Haoxu Wang, Yan Jia, Zeqing Zhao, Xuyang Wang, Junjie Wang, and Ming Li

In Odyssey, 2022

PDF
The DKU-OPPO System for the Spoofing-Aware Speaker Verification challenge 2022

Xingming Wang, Xiaoyi Qin, Yikang Wang, Yunfei Xu, and Ming Li

In Interspeech, 2022

PDF
Low Pass Filtering and Band Extension for Robust Anti-spoofing Countermeasure against Codec Variabilities

Yikang Wang, Xingming Wang, Hiromitsu Nishizaki, and Ming Li

In ISCSLP, 2022

PDF
Online Target Speaker Voice Activity Detection for Speaker Diarization

Weiqing Wang, Ming Li, and Qingjian Lin

In Interspeech, 2022

PDF
Incorporating End-To-End Framework Into Target-Speaker Voice Activity Detection

Weiqing Wang, and Ming Li

In ICASSP, 2022

PDF
Low-Latency Online Speaker Diarization with Graph-Based Label Generation

Yucong Zhang, Qingjian Lin, Weiqing Wang, Lin Yang, Xuyang Wang, Junjie Wang, and Ming Li

In Odyssey, 2022

PDF
SIG-VC: A Speaker Information Guided Zero-Shot Voice Conversion System For Both Human Beings And Machines

Haozhe Zhang, Zexin Cai, Xiaoyi Qin, and Ming Li

In ICASSP, 2022

PDF
Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Linguistic Information Fusion

Ziang Zhou, Yanze Xu, and Ming Li

In NCMMSC, 2022

PDF
Source Tracing: Detecting Voice Spoofing

Tinglong Zhu, Xingming Wang, Xiaoyi Qin, and Ming Li

In APSIPA ASC, 2022

PDF

2021

Discriminative Dictionary Learning for Autism Spectrum Disorder Identification

Wenbo Liu, Ming Li, Xiaobing Zou, and Bhiksha Raj

Frontiers in Computational Neuroscience, 2021

PDF
Typical Facial Expression Network Using Facial Feature Decoupler and Spatial-Temporal Learning

Jianing Teng, Dong Zhang, Ming Li, and Dah-Jye Lee

IEEE Transactions on Affective Computing, 2021

PDF
Audio-based Piano Performance Evaluation for Beginners with Convolutional Neural Network and Attention Mechanism

Weiqing Wang, Jin Pan, Hua Yi, Zhanmei Song, and Ming Li

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29 (2021): 1119-1133, 2021

PDF
Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication

Yuanyuan Bao, Yanze Xu, Na Xu, Wenjing Yang, Hongfeng Li, Shicong Li, Yongtao Jia, Fei Xiang, Jincheng He, and Ming Li

In NCMMSC, 2021

PDF
Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays

Danwei Cai, and Ming Li

In SLT, 2021

PDF
A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data

Weicheng Cai, and Ming Li

In APSIPA ASC, 2021

PDF
The DKU-DukeECE System for the Self-Supervision Speaker Verification Task of the 2021 VoxCeleb Speaker Recognition Challenge

Danwei Cai, and Ming Li

In VoxSRC, 2021

PDF
An Iterative Framework For Self-Supervised Deep Speaker Representation Learning

Danwei Cai, Weiqing Wang, and Ming Li

In ICASSP, 2021

PDF
Cross-modal Assisted Training for Abnormal Event Recognition in Elevators

Xinmeng Chen, Xuchen Gong, Ming Cheng, Qi Deng, and Ming Li

In ICMI, 2021

PDF
The DKU Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge

Ming Cheng, Haoxu Wang, Yechen Wang, and Ming Li

In ICASSP 2023, 2021

PDF
The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis

Haoxu Wang, Ming Cheng, Qiang Fu, and Ming Li

In ICASSP 2023, 2021

PDF
Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling

Haiwei Wu and Ming Li

In NCMMSC, 2021

PDF
Sams-Net: A Sliced Attention-based Neural Network for Music Source Separation

Tingle Li, Jiawei Chen, Haowen Hou, and Ming Li

In ISCSLP, 2021

PDF
Acoustic Word Embedding on Code-switching Query by Example Spoken Term Detection

Murong Ma, Haiwei Wu, Xuyang Wang, Lin Yang, Junjie Wang, and Ming Li

In ISCSLP, 2021

PDF
AISHELL-3: A Multi-Speaker Mandarin TTS Corpus

Yao Shi, Hui Bu, Xin Xu, Shaoji Zhang, and Ming Li

In INTERSPEECH, 2021

PDF
End-to-End Mandarin Tone Classification with Short Term Context Information

Jiyang Tang, and Ming Li

In APSIPA ASC, 2021

PDF
The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge

Weiqing Wang, Danwei Cai, Qingjian Lin, Lin Yang, Junjie Wang, Jin Wang, and Ming Li

In VoxSRC, 2021

PDF
The DKU-Duke-Lenovo System Description for the Fearless Steps Challenge Phase III

Weiqing Wang, Danwei Cai, Jin Wang, Mi Hong, Xuyang Wang, Qingjian Lin, and Ming Li

In INTERSPEECH, 2021

PDF
A Two-Stage Query-by-example Spoken Term Detection System for Personalized Keyword Spotting

Yechen Wang, Yan Jia, Murong Ma, Zexin Cai, and Ming Li

In NCMMSC, 2021

PDF
Binary Neural Network for Speaker Verification

Tinglong Zhu, Xiaoyi Qin, and Ming Li

In INTERSPEECH, 2021

PDF

2020

On-the-fly Data Loader and Utterance-level Aggregation for Speaker and Language Recognition

Weicheng Cai, Jinkun Chen, Jun Zhang, and Ming Li

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28 (2020): 1038-1051, 2020

PDF
STCAM: Spatial-Temporal and Channel Attention Module for Dynamic Facial Expression Recognition

Weicong Chen, Dong Zhang, Ming Li, and Dah-Jye Lee

IEEE Transactions on Affective Computing, 2020

PDF
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer by Feedback Constraint

Zexin Cai, Chuxiong Zhang, and Ming Li

In INTERSPEECH, 2020

PDF
Within-sample variability-invariant loss for robust speaker recognition under noisy environments

Danwei Cai, Weicheng Cai, and Ming Li

In ICASSP, 2020

PDF
The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results

Yan Jia, Xingming Wang, Xiaoyi Qin, Yinping Zhang, Xuyang Wang, Junjie Wang, Dong Zhang, and Ming Li

In INTERSPEECH, 2020

PDF
Atss-Net: Target Speaker Separation via Attention-based Neural Network

Tingle Li, Qingjian Lin, Yuanyuan Bao, and Ming Li

In INTERSPEECH, 2020

PDF
DIHARD II is Still Hard: Experimental Results and Discussions

Qingjian Lin, Weicheng Cai, Lin Yang, Junjie Wang, Jun Zhang, and Ming Li

In Odyssey, 2020

PDF
Self-Attentive Similarity Measurement Strategies in Speaker Diarization

Qingjian Lin, Yu Hou, and Ming Li

In INTERSPEECH, 2020

PDF
Optimal Mapping Loss: A Faster Loss for End-to-End Speaker Diarization

Qingjian Lin, Tingle Li, Lin Yang, Junjie Wang, and Ming Li

In Odyssey, 2020

PDF
The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02

Qingjian Lin, Tingle Li, and Ming Li

In INTERSPEECH, 2020

PDF
Responsive Social Smile: A Machine Learning based Multimodal Behavior Assessment Framework towards Early Stage Autism Screening

Yueran Pan, Kunjing Cai, Ming Cheng, Xiaobing Zou, and Ming Li

In ICPR, 2020

PDF
HI-MIA: a far-field text-dependent speaker verification database and the baselines

Xiaoyi Qin, Hui Bu, and Ming Li

In ICASSP, 2020

PDF
The INTERSPEECH 2020 Far-Field Speaker Verification Challenge

Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, and Haizhou Li

In INTERSPEECH, 2020

PDF
Domain Aware Training for Far-field Small-footprint Keyword Spotting

Haiwei Wu, Yan Jia, Yuanfei Nie, and Ming Li

In INTERSPEECH, 2020

PDF

2019

String Stability Analysis for Vehicle Platooning under Unreliable Communication Links with Event-Triggered Strategy

Zhicheng Li, Bin Hu, Ming Li, and Gengnan Luo

IEEE Transactions on Vehicular Technology, 68, no. 3 (2019): 2152-2164, 2019

PDF
An Automated Assessment Framework for Atypical Prosody and Stereotyped Idiosyncratic Phrases related to Autism Spectrum Disorder

Ming Li, Dengke Tang, Junlin Zeng, Tianyan Zhou, and Xiaobing Zou

Computer Speech and Language, 56 (2019): 80-94, 2019

PDF
Multi-Channel Training for End-to-End Speaker Recognition under Reverberant and Noisy Environment

Danwei Cai, Xiaoyi Qin, and Ming Li

In INTERSPEECH, 2019

PDF
The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge

Danwei Cai, Xiaoyi Qin, Weicheng Cai, and Ming Li

In INTERSPEECH, 2019

PDF
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Feature

Zexin Cai, Yaogen Yang, Chuxiong Zhang, Xiaoyi Qin, and Ming Li

In INTERSPEECH, 2019

PDF
F0 contour estimation using phonetic feature in electrolaryngeal speech enhancement

Zexin Cai, Zhicheng Xu, and Ming Li

In ICASSP, 2019

PDF
The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion

Weicheng Cai, Haiwei Wu, Danwei Cai, and Ming Li

In INTERSPEECH, 2019

PDF
Utterance-level End-to-end Language Identification using Attention-based CNN-BLSTM

Weicheng Cai, Shen Huang, and Ming Li

In ICASSP, 2019

PDF
LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization

Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, and Claude Barras

In INTERSPEECH, 2019

PDF
Far-Field End-to-End Text-Dependent Speaker Verification based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation

Xiaoyi Qin, Danwei Cai, and Ming Li

In INTERSPEECH, 2019

PDF
Fixation Based Object Recognition in Autism Clinic Setting

Sheng Sun, Shuangmei Li, Wenbo Liu, Xiaobing Zou, and Ming Li

In ICIRA, 2019

PDF
Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection

Weiqing Wang, Haiwei Wu, and Ming Li

In APSIPA ASC, 2019

PDF
The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge

Haiwei Wu, Weiqing Wang, and Ming Li

In INTERSPEECH, 2019

PDF
DKU-Tencent Submission to Oriental Language Recognition AP18-OLR Challenge

Haiwei Wu, Weicheng Cai, Ming Li, Ji Gao, Shanshan Zhang, Zhiqiang Lv, and Shen Huang

In APSIPA ASC, 2019

PDF

2018

Cancellable Speech Template via Random Binary Orthogonal Matrices Projection Hashing

Kong-Yik Chee, Zhe Jin, Danwei Cai, Ming Li, Wun-She Yap, Yen-Lung Lai, and Bok-Min Goi

Pattern Recognition, 2018

PDF
Facial Expression Recognition with Identity and Emotion Joint Learning

Ming Li, Hao Xu, Xingchang Huang, Zhanmei Song, Xiaolin Liu, and Xin Li

IEEE Transactions on Affective Computing, accepted in 2018, published in 12, no. 2 (2021): 544-550, 2018

PDF
Finite-time Stability and Stabilization of Semi-Markovian Jump Systems with Time Delay

Zhicheng Li, Yinliang Xu, and Ming Li

International Journal of Robust and Nonlinear Control, 28, no. 6 (2018): 2064-2081, 2018

PDF
A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification

Weicheng Cai, Wenbo Liu, Zexin Cai, and Ming Li

In ICASSP, 2018

PDF
The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation

Danwei Cai, Weicheng Cai, and Ming Li

In INTERSPEECH, 2018

PDF
The DKU-JNU-EMA Electromagnetic Articulography Database on Mandarin and Chinese Dialects with Tandem Feature based Acoustic-to-Articulatory Inversion

Zexin Cai, Xiaoyi Qin, Danwei Cai, Ming Li, and Xinzhong Liu

In ISCSLP, 2018

PDF
Deep Speaker Embedding with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition

Danwei Cai, Zexin Cai, and Ming Li

In APSIPA ASC, 2018

PDF
Analysis of Length Normalization in End-to-End Speaker Verification System

Weicheng Cai, Jinkun Chen, and Ming Li

In INTERSPEECH, 2018

PDF
Insights into End-to-End Learning Scheme for Language Identification

Weicheng Cai, Zexin Cai, Xiang Zhang, and Ming Li

In ICASSP, 2018

PDF
Exploring the Encoding Layer and Loss function in End-to-End Speaker and Language Recognition System

Weicheng Cai, Jinkun Chen, and Ming Li

In Odyssey, 2018

PDF
End-to-end Language Identification using NetFV and NetVLAD

Jinkun Chen, Weicheng Cai, and Ming Li

In ISCSLP, 2018

PDF
An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individual

Ming Li and Dengke Tang

In INTERSPEECH, 2018

PDF
Unsupervised Query by Example Spoken Term Detection Using Features Concatenated with Self-Organizing Map Distances

Haiwei Wu, and Ming Li

In ISCSLP, 2018

PDF

2017

Reconstruction of Lamb wave dispersion curves by sparse representation and continuity constraints

Wenbo Zhao, Ming Li, Joel B. Harley, Yuanwei Jin, Jose Moura, and Jimmy Zhu

Journal of the Acoustical Society of America, 141, no. 2 (2017): 749-763, 2017

PDF
Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion

Weicheng Cai, Danwei Cai, Wenbo Liu, Gang Li, and Ming Li

In INTERSPEECH, 2017

PDF
End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum

Danwei Cai, Zhidong Ni, Wenbo Liu, Weicheng Cai, Gang Li, and Ming Li

In INTERSPEECH, 2017

PDF
Automatic Emotional Spoken Language Text Corpus Construction from Written Dialogs in Fictions

Jinkun Chen, and Ming Li

In ACII, 2017

PDF
Mandarin Electrolaryngeal Voice Conversion with Combination of Gaussian Mixture Model and Non-negative Matrix Factorization

Ming Li, Luting Wang, Zhicheng Xu, and Danwei Cai

In APSIPA ASC, 2017

PDF
Response to Name: A Dataset and A Multimodal Machine Learning Framework towards Autism Study

Wenbo Liu, Xiaobing Zou, and Ming Li

In ACII, 2017

PDF
SphereFace: Deep Hypersphere Embedding for Face Recognition

Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song

In CVPR, 2017

PDF
An audio based piano performance evaluation method using deep neural network based acoustic modeling

Jing Pan, Ming Li, Zhanmei Song, Xin Li, Xiaolin Liu, Hua Yi, and Manman Zhu

In INTERSPEECH, 2017

PDF
An Automated Assessment Framework for Speech Abnormalities related to Autism Spectrum Disorder

Tianyan Zhou, Yixiang Xie, Xiaobing Zou, and Ming Li

In INTERSPEECH, 2017

PDF

2016

Speaker verification based on the fusion of speech acoustics and inverted articulatory signals

Ming Li, Jangwon Kim, Adam Lammert, Prasanta Kumar Ghosh, Vikram Ramanarayanan, and Shrikanth Narayanan

Computer Speech & Language, 36 (2016): 196-211, 2016

PDF
Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification

Ming Li, and Wenbo Liu

Journal of Signal Processing Systems, 82, no. 2 (2016): 207-215, 2016

PDF
Identifying Children with Autism Spectrum Disorder Based on Their Face Processing Abnormality: A Machine Learning framework

Wenbo Liu, Ming Li, and Li Yi

Autism Research, 9, no. 8 (2016): 888-898, 2016

PDF
Locality Sensitive Discriminant Analysis for Speaker Recognition

Danwei Cai, Weicheng Cai, and Ming Li

In APSIPA ASC, 2016

PDF
A Fast Tracking Algorithm for Estimating Ultrasonic Signal Time of Flight in Drilled Shafts Using Active Shape Models

Zhun Chen, Wenbo Zhao, Yuanwei Jin, Ming Li, and Jimmy Zhu

In IUS, 2016

PDF
Entity Disambiguation by Knowledge and Text Jointly Embedding

Wei Fang, Jianwen Zhang, Dilin Wang, Zheng Chen, and Ming Li

In CoNLL, 2016

PDF
The SYSU System for CCPR 2016 Multimodal Emotion Recognition Challenge

Gaoyuan He, Jinkun Chen, Xuebo Liu, and Ming Li

In CCPR, 2016

PDF
Efficient Misalignment-Robust Face Recognition Via Locality-Constrained Representation

Yandong Wen, Weiyang Liu, Meng Yang, and Ming Li

In ICIP, 2016

PDF
On Order-Constrained Transitive Distance Clustering

Zhiding Yu, Weiyang Liu, Wenbo Liu, Yingzhen Yang, Ming Li, and Vijayakumar Bhagavatula

In AAAI, 2016

PDF
Text-Independent Voice Conversion Using Deep Neural Network Based Phonetic Level Features

Huadi Zheng, Weicheng Cai, Tianyan Zhou, Shilei Zhang, and Ming Li

In ICPR, 2016

PDF
Speaker Diarization System for Autism Children’s Real-Life Audio Data

Tianyan Zhou, Weicheng Cai, Xiaoyan Chen, Xiaobing Zou, Shilei Zhang, and Ming Li

In ISCSLP, 2016

PDF

2015

Automatic intelligibility classification of sentence-level pathological speech

Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, and Shrikanth Narayanan

Computer Speech & Language, 29, no. 1 (2015): 132-144, 2015

PDF
Innovations in the Use of Interactive Technology to Support Weight Management

Donna Spruijt-Metz, Cheng K.F. Wen, Gillian O’Reilly, Ming Li, Sangwon Lee, Adar Emken, Urbashi Mitra, Murali Annavaram, Gisele Ragusa, and Shrikanth Narayanan

Current Obesity Reports, 4, no. 4 (2015): 510-519, 2015

PDF
Robust Real-Time Distributed Optimal Control Based Energy Management in a Smart Grid

Yinliang Xu, Zaiyue Yang, Wei Gu, Ming Li, and Zicong Deng

IEEE Transactions on Smart Grid, 8, no. 4 (2015): 1568-1579, 2015

PDF
Duration Dependent Covariance Regularization in PLDA Modeling for Speaker Verification

Weicheng Cai, Ming Li, Lin Li, and Qingyang Hong

In INTERSPEECH, 2015

PDF
Automatic assessment of non-native accent degrees using phonetic level posterior and duration features from multiple languages

Shushan Chen, Yiming Zhou, and Ming Li

In APSIPA ASC, 2015

PDF
Modified-prior PLDA and Score Calibration for Duration Mismatch Compensation in Speaker Recognition System

Qingyang Hong, Lin Li, Ming Li, Ling Huang, and Jun Zhang

In INTERSPEECH, 2015

PDF
Speaker verification with the mixture of Gaussian factor analysis based representation

Ming Li

In ICASSP, 2015

PDF
Locality Constrained Transitive Distance Clustering on Speech Data

Wenbo Liu, Zhiding Yu, Bhiksha Raj, and Ming Li

In INTERSPEECH, 2015

PDF
Efficient Autism Spectrum Disorder Diagnosis with Eye Movement: A Machine Learning Framework

Wenbo Liu, Zhiding Yu, Li Yi, Bhiksha Raj, and Ming Li

In ACII, 2015

PDF
Speech bandwidth expansion based on deep neural networks

Yingxue Wang, Shenghui Zhao, Wenbo Liu, Ming Li, and Jingming Kuang

In INTERSPEECH, 2015

PDF
The SYSU system for the INTERSPEECH 2015 automatic speaker verification spoofing and countermeasures challenge

Shitao Weng, Shushan Chen, Lei Yu, Xuewei Wu, Weicheng Cai, Zhi Liu, Yiming Zhou, and Ming Li

In APSIPA ASC, 2015

PDF

2014

Intoxicated Speech Detection: A Fusion Framework with Speaker-Normalized Hierarchical Functionals and GMM Supervectors

Daniel Bone, Ming Li, Matthew Black, and Shrikanth Narayanan

Computer Speech & Language, 28, no. 2 (2014): 375-391, 2014

PDF
Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification

Ming Li, and Shrikanth Narayanan

Computer Speech & Language, 28, no. 4 (2014): 940-958, 2014

PDF
Verification based ECG biometrics with cardiac irregular conditions using heartbeat level and segment level information fusion

Ming Li, and Xin Li

In ICASSP, 2014

PDF
Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens

Ming Li

In INTERSPEECH, 2014

PDF
Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features

Ming Li, and Wenbo Liu

In INTERSPEECH, 2014

PDF
An Iterative Framework for Unsupervised Learning in the PLDA based Speaker Verification

Wenbo Liu, Zhiding Yu, and Ming Li

In ISCSLP, 2014

PDF
Simplified and supervised i-vector modeling for speaker age regression

Prashanth Gurunath Shivakumar, Ming Li, Vedant Dhandhania, and Shrikanth S. Narayanan

In ICASSP, 2014

PDF

2013

Automatic Speaker Age and Gender Recognition using acoustic and prosodic level information fusion

Ming Li, Kyu J. Han, and Shrikanth Narayanan

Computer Speech & Language, 27, no. 1 (2013): 151-167, 2013

PDF
Automatic Classification of Palatal and Pharyngeal Wall Morphology Patterns from Speech Acoustics and Inverted Articulatory Signals

2013

PDF
Classifying Language-Related Developmental Disorders from Speech Cues: the Promise and the Potential Confounds

Daniel Bone, Theodora Chaspari, Kartik Audhkhasi, James Gibson, Andreas Tsiartas, Maarten Van Segbroeck, Ming Li, Sungbok Lee, and Shrikanth Narayanan

In INTERSPEECH, 2013

PDF
TRAP Language Identification System for RATS Phase II Evaluation

Kyu Jeong Han, Sriram Ganapathy, Ming Li, Mohamed K. Omar, and Shrikanth Narayanan

In INTERSPEECH, 2013

PDF
Speaker verification using simplified and supervised i-vector modeling

Ming Li, Andreas Tsiartas, Maarten Van Segbroeck, and Shrikanth S. Narayanan

In ICASSP, 2013

PDF
Speaker verification based on fusion of acoustic and articulatory information

Ming Li, Jangwon Kim, Prasanta Kumar Ghosh, Vikram Ramanarayanan, and Shrikanth Narayanan

In INTERSPEECH, 2013

PDF
Multi-band long-term signal variability features for robust voice activity detection

In INTERSPEECH, 2013

PDF

2012

Recognition of Physical Activities in Overweight Hispanic Youth using KNOWME Networks

Adar Emken, Ming Li, Gautam Thatte, Sangwon Lee, Murali Annavaram, Urbashi Mitra, Shrikanth Narayanan, and Donna Spruijt-Metz

Journal of Physical Activity and Health, 9, no. 3 (2012): 432-441, 2012

PDF
KNOWME: a Case Study in Wireless Body Area Sensor Network Design

Urbashi Mitra, Adar Emken, Sangwon Lee, Ming Li, Harshvardhan Vathsangam, Daphney-Stavroula Zois, Murali Annavaram, and Shrikanth Narayanan

IEEE Communications Magazine, 50, no. 5 (2012): 116-125, 2012

PDF
KNOWME: An energy-efficient multimodal body area network for physical activity monitoring

Gautam Thatte, Ming Li, Sangwon Lee, Adar Emken, Shri Narayanan, Urbashi Mitra, Donna Spruijt-Metz, and Murali Annavaram

ACM Transactions on Embedded Computing Systems, 11, no. S2 (2012): 1-24, 2012

PDF
Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network

Kartik Audhkhasi, Angeliki Metallinou, Ming Li, and Shrikanth Narayanan

In INTERSPEECH, 2012

PDF
Intelligibility classification of pathological speech using fusion of multiple high level descriptors

Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, and Shrikanth Narayanan

In INTERSPEECH, 2012

PDF
Speaker Verification using Lasso based Sparse Total Variability Supervector and Probabilistic Linear Discriminant Analysis

Ming Li, Charley Lu, Anne Wang, and Shrikanth Narayanan

In APSIPA ASC, 2012

PDF
Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling

Ming Li, Angeliki Metallinou, Daniel Bone, and Shrikanth Narayanan

In ICASSP, 2012

PDF

2011

Optimal Time-Resource Allocation for Energy-Efficient Physical Activity Detection

Gautam Thatte, Ming Li, Sangwon Lee, Adar Emken, Murali Annavaram, Shri Narayanan, Donna Spruijt-Metz, and Urbashi Mitra

IEEE Transactions on Signal Processing, 59, no. 4 (2011): 1843-1857, 2011

PDF
Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors

Daniel Bone, Matthew P. Black, Ming Li, Angeliki Metallinou, Sungbok Lee, and Shrikanth Narayanan

In INTERSPEECH, 2011

PDF
Modeling high-level descriptions of real-life physical activities using latent topic modeling of multimodal sensor signals

Samuel Kim, Ming Li, Sangwon Lee, Urbashi Mitra, Adar Emken, Donna Spruijt-Metz, Murali Annavaram, and Shrikanth Narayanan

In EMBC, 2011

PDF
Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors

Ming Li, and Shrikanth Narayanan

In ICASSP, 2011

PDF
Speaker Verification using Sparse Representations on Total Variability I-Vectors

Ming Li, Xiang Zhang, Yonghong Yan, and Shrikanth Narayanan

In INTERSPEECH, 2011

PDF

2010

Multimodal Physical Activity Recognition by Fusing Temporal and Cepstral Information

Ming Li, Viktor Rozgic, Gautam Thatte, Sangwon Lee, Adar Emken, Murali Annavaram, Urbashi Mitra, Donna Spruijt-Metz, and Shrikanth Narayanan

IEEE Transactions on Neural Systems & Rehabilitation Engineering, 18, no. 4 (2010): 369-380, 2010

PDF
Combining Five Acoustic Level methods for Automatic Speaker Age and Gender Recognition

Ming Li, Chi-Sang Jung, and Kyu Jeong Han

In INTERSPEECH, 2010

PDF
Robust ECG biometrics by fusing temporal and cepstral information

Ming Li, and Shrikanth Narayanan

In ICPR, 2010

PDF

2009

Optimal Allocation of Time-Resources for Multihypothesis Activity-Level Detection

Gautam Thatte, Viktor Rozgic, Ming Li, Sabyasachi Ghosh, Urbashi Mitra, Shri Narayanan, Murali Annavaram, and Donna Spruijt-Metz

In DCOSS, 2009

PDF
Energy-Efficient Multihypothesis Activity-Detection for Health-Monitoring Applications

Gautam Thatte, Ming Li, Adar Emken, Urbashi Mitra, Shri Narayanan, Murali Annavaram, and Donna Spruijt-Metz

In EMBC, 2009

PDF

2008

Using SVM as back-end classifier for language identification

Hongbin Suo, Ming Li, Ping Lu, and Yonghong Yan

EURASIP Journal on Audio, Speech, and Music Processing, 2008

PDF
Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping

Ming Li, Chuan Cao, Di Wang, Ping Lu, Qiang Fu, and Yonghong Yan

In INTERSPEECH, 2008

PDF
Automatic language identification with discriminative language characterization based on SVM

Hongbin Suo, Ming Li, Ping Lu, and Yonghong Yan

IEICE Transactions on Information and Systems, 91, no. 3 (2008): 567-575, 2008

PDF

2007

Authentication and quality monitoring based audio watermark for analog AM shortwave broadcasting

Ming Li, Yun Lei, Xiang Zhang, Jian Liu, and Yonghong Yan

In IIH-MSP, 2007

PDF
Spoken Language Identification Using Score Vector Modeling and Support Vector Machine

Ming Li, Hongbin Suo, Xiao Wu, Ping Lu, and Yonghong Yan

In INTERSPEECH, 2007

PDF

2006

A Novel Audio Watermarking in Wavelet Domain

Ming Li, Yun Lei, Jian Liu, and Yonghong Yan

In IIH-MSP, 2006

PDF

2000

RWF-2000: An Open Large Scale Video Database for Violence Detection

Ming Cheng, Kunjing Cai, and Ming Li

In ICPR, 2000

PDF