publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2026
- M2S-AVSR: Modality-aware Multi-View Self-supervised Representations for Audio-Visual Speech Recognition. Submitted to IEEE Transactions on Audio, Speech, and Language Processing, 2026
- Multimodal Large Language Models for ADOS-M1 Behavioral Assessment. Submitted to Neurocomputing, 2026
- VLM-Guided Semantic Augmentation and Uncertainty-Aware Tri-modal Fusion for Group Emotion Recognition. Submitted to IEEE Transactions on Multimedia, 2026
- Emotional Description-Guided Vision-Language Semantic Alignment for Group Emotion Recognition. Submitted to IEEE Transactions on Affective Computing, 2026
- Toward Multimodal Fault Analysis: A Single-Speed Chain Conveyor Dataset with Audio and Vibration Signals. Submitted to Interspeech, 2026
- Audio-Visual Speech Enhancement in Complex Scenarios with Separation and Dereverberation Joint Modeling. Submitted to Interspeech, 2026
- Robust Audio-Visual Target Speaker Extraction with Multiple Enrollment Fusion. Submitted to Interspeech, 2026
- DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models. Submitted to ACM Multimedia, 2026
- Language-Invariant Multilingual Speaker Verification for the TidyVoice 2026 Challenge. Submitted to Interspeech, 2026
- Spatially-Augmented Sequence-to-Sequence Neural Diarization for Meetings. Submitted to Interspeech, 2026
- WhisperVC: Decoupled Cross-Domain Alignment and Speech Generation for Low-Resource Whisper-to-Speech Conversion. Submitted to Interspeech, 2026
- Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement. Submitted to Interspeech, 2026
- Making Separation-First Multi-Stream Audio Watermarking Feasible via Joint Training. Submitted to Interspeech, 2026
- A Dual-Path Efficient EEG Encoder for Brain-Assisted Target Speaker Extraction. Submitted to Interspeech, 2026
- AISHELL8-FISHEYE: A Fisheye Audio-Visual Dataset for Target Speaker Extraction with Distortion-Aware Baselines. Submitted to ACM Multimedia, 2026
- Multi-View Based Audio Visual Target Speaker Extraction. Submitted to Interspeech, 2026
- MultiAPI Spoof: A Multi-API Dataset and Local-Attention Network for Speech Anti-spoofing Detection. Submitted to Interspeech, 2026
- Dual-Encoder Fusion with Explicit and Implicit Injection for the Interspeech 2026 Audio Encoder Capability Challenge. Submitted to Interspeech, 2026
- Detecting Children with Autism Spectrum Disorder based on Script-Centric Behavior Understanding with Emotional Enhancement. Neurocomputing, 2026
- Glitter: Exploring an LLM Virtual Agent for Supporting Practitioners in Behavioral Interventions of Autistic Children. International Journal of Human–Computer Interaction, 2026
- Enhancing Speaker Verification with W2v-Bert 2.0 and Knowledge Distillation Guided Pruning. In ICASSP, 2026
- AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Speech Dataset with Speech Recognition Baselines. In ICASSP, 2026
- Compspoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-Spoofing Countermeasure. In ICASSP, 2026
2025
- Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning. IEEE Transactions on Audio, Speech, and Language Processing, 2025
- Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
- Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
- Assessing the Expressive Language Levels of Autistic Children in Home Intervention. IEEE Transactions on Computational Social Systems, 2025
- Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection. Computer Speech and Language, 2025
- USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
- An Automatic Laryngoscopic Image Segmentation System Based on SAM Prompt Engineering: From Glottis Annotation to Vocal Fold Segmentation. Frontiers in Molecular Biosciences, 2025
- Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Fold Paralysis. Computer Speech and Language, 2025
- Multi-Channel Sequence-to-Sequence Neural Diarization: Experimental Results for The MISP 2025 Challenge. In Interspeech, 2025
- Improving the Robustness of Audio-Visual Target Speaker Extraction With AV-HuBERT Based Lip Features. In NCMMSC, 2025
- Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System. In ICME, 2025
- Exploring Pre-trained models on Ultrasound Modeling for Mice Autism Detection with Uniform Filter Bank and Attentive Scoring. In Interspeech, 2025
- SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization. In ASRU, 2025
- LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models. In ASRU, 2025
- Enhancing the Robustness of Speech Anti-spoofing Countermeasures through Joint Optimization and Transfer Learning. In IEICE Transactions on Information and Systems, 2025
- SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis. In ACM Multimedia, 2025
- Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings. In Interspeech, 2025
2024
- StarRescue: the Design and Evaluation of A Turn-Taking Collaborative Game for Facilitating Social and Fine Motor Skills of Children with Autism Spectrum Disorder. CHI, 2024
- Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
- Location-guided Head Pose Estimation for Fisheye Image. IEEE Transactions on Cognitive and Developmental Systems, 2024
- Two-stage and Self-supervised Voice Conversion for Zero-Shot Dysarthric Speech Reconstruction. IALP, 2024
- Speaker verification in deliberately disguised scenarios. Computer Engineering and Applications, 2024
- Investigating Long-Term and Short-Term Time-Varying Speaker Verification. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
- Online Neural Speaker Diarization with Target Speaker Tracking. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
- HSVRS: A Virtual Reality System of the Hide-and-Seek Game to Enhance Gaze Fixation Ability for Autistic Children. IEEE Transactions on Learning Technologies, 2024
- Joint Training on Multiple Datasets with Inconsistent Labeling Criteria for Facial Expression Recognition. IEEE Transactions on Affective Computing, 2024
- Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios. J. Shanghai Jiaotong Univ. (Sci.), 2024
- Multi-Objective Progressive Clustering for Semi-Supervised Domain Adaptation in Speaker Verification. In ICASSP, 2024
- VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark. In Interspeech, 2024
- Bridging Facial Imagery and Vocal Reality: Stable Diffusion-Enhanced Voice Generation. In ISCSLP, 2024
- TMCSpeech: A Chinese TV and Movie Speech Dataset with Character Descriptions and a Character-Based Voice Generation Model. In ICPR, 2024
- The WHU Wake Word Lipreading System for the 2024 Chat-scenario Chinese Lipreading Challenge. In ICME challenge paper, 2024
- Joint Inference of Speaker Diarization and ASR with Multi-Stage Information Sharing. In ICASSP, 2024
- Robust Wake Word Spotting with Frame-Level Cross-Modal Attention based Audio-Visual Conformer. In ICASSP, 2024
- A Dual-Path Framework with Frequency-and-Time Excited Network for Machine Anomalous Sound Detection. In ICASSP, 2024
2023
- Integrating Frame-Level Boundary Detection and Deepfake Detection for Locating Manipulated Regions in Partially Spoofed Audio Forgery Attacks. Computer Speech and Language, 2023
- Computer-aided Autism Spectrum Disorder Diagnosis with Behavior Signal Processing. IEEE Transactions on Affective Computing, 2023
- Expressive language profiles in a clinically screening sample of Mandarin-speaking preschool children with Autism Spectrum Disorder. Journal of Speech, Language, and Hearing Research, 2023
- Assessing the Social Skills of Children with Autism Spectrum Disorder via Language-Image Pre-training Models. The 6th Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2023
- From Speaker Verification to Deepfake Algorithm Recognition: Our Learned Lessons from ADD2023 Track 3. IJCAI 2023 Workshop on Deepfake Audio Detection and Analysis (DADA 2023), 2023
- A multimodal machine learning system in early screening for toddlers with autism spectrum disorders based on the response to name. Frontiers in Psychiatry, 2023
- Identifying Source Speakers For Voice Conversion Based Spoofing Attacks For Speaker Verification. In ICASSP, 2023
- The DKU-MSXF Diarization System for the VoxCeleb Speaker Recognition Challenge 2023. In NCMMSC, 2023
- EEG-Based Speech Envelope Decoding: Structured State Space and Diffusion Model Integration. In NCMMSC, 2023
- Exploring Universal Singing Speech Language Identification Using Self-Supervised Learning Based Front-End Features. In ICASSP, 2023
- Robust audio anti-spoofing countermeasure with joint training of front-end and back-end models. In Interspeech, 2023
- Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios. In NCMMSC, 2023
- Pre-training Deep Learning Models with Finite Element Simulation Data for Enhanced Machine Anomalous Sound Detection. In NCMMSC, 2023
- Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning. In Interspeech, 2023
2022
- Cross-lingual Multispeaker Speech Synthesis under Limited-Data Scenarios. Computer Speech and Language, 2022
- Incorporating visual information in audio based self-supervised speaker recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022
- Accurate Head Pose Estimation Using Image Rectification and Lightweight Convolutional Neural Network. IEEE Transactions on Multimedia, 2022
- Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022
- Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022
- Paralinguistic singing attribute recognition using supervised machine learning for describing the singing voice in vocal pedagogy. EURASIP Journal on Audio, Speech, and Music Processing, 2022
- Electrolaryngeal Speech Enhancement based on Bottleneck Feature Refinement and Voice Conversion. Biomedical Signal Processing and Control, 2022
- A Complementary Dual-branch Network for Appearance-based Gaze Estimation from Low-resolution Facial Image. IEEE Transactions on Cognitive and Developmental Systems, 2022
- The WHU-Alibaba Audio-Visual Speaker Diarization System for the MISP Challenge 2022. In ICASSP 2023, 2022
- Single-Channel Target Speaker Separation using Joint Training with Target Speaker’s Pitch Information. In Odyssey, 2022
- Towards Lightweight applications: Asymmetric Enroll-Verify Structure For Speaker Verification. In ICASSP, 2022
- A Multimodal Framework for Automated Teaching Quality Assessment of One-to-many Online Instruction Videos. In ICPR, 2022
- Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection. In ICASSP, 2022
- VC-AUG: Voice Conversion based Data Augmentation for Text-Dependent Speaker Verification. In NCMMSC, 2022
- Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MET Challenge. In ICASSP, 2022
- Generating Adversarial Samples For Training Wake-Up Word Detection Systems Against Confusing Words. In Odyssey, 2022
- The DKU-OPPO System for the Spoofing-Aware Speaker Verification challenge 2022. In Interspeech, 2022
- Low Pass Filtering and Band Extension for Robust Anti-spoofing Countermeasure against Codec Variabilities. In ISCSLP, 2022
- Incorporating End-To-End Framework Into Target-Speaker Voice Activity Detection. In ICASSP, 2022
- SIG-VC: A Speaker Information Guided Zero-Shot Voice Conversion System For Both Human Beings And Machines. In ICASSP, 2022
- Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Linguistic Information Fusion. In NCMMSC, 2022
2021
- Discriminative Dictionary Learning for Autism Spectrum Disorder Identification. Frontiers in Computational Neuroscience, 2021
- Typical Facial Expression Network Using Facial Feature Decoupler and Spatial-Temporal Learning. IEEE Transactions on Affective Computing, 2021
- Audio-based Piano Performance Evaluation for Beginners with Convolutional Neural Network and Attention Mechanism. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29 (2021): 1119-1133, 2021
- Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication. In NCMMSC, 2021
- Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays. In SLT, 2021
- The DKU-DukeECE System for the Self-Supervision Speaker Verification Task of the 2021 VoxCeleb Speaker Recognition Challenge. In VoxSRC, 2021
- The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis. In ICASSP 2023, 2021
- Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling. In NCMMSC, 2021
- Acoustic Word Embedding on Code-switching Query by Example Spoken Term Detection. In ISCSLP, 2021
- The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge. In VoxSRC, 2021
- The DKU-Duke-Lenovo System Description for the Fearless Steps Challenge Phase III. In INTERSPEECH, 2021
- A Two-Stage Query-by-example Spoken Term Detection System for Personalized Keyword Spotting. In NCMMSC, 2021
2020
- On the fly Data Loader and Utterance-level Aggregation for Speaker and Language Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28 (2020): 1038-1051, 2020
- STCAM: Spatial-Temporal and Channel Attention Module for Dynamic Facial Expression Recognition. IEEE Transactions on Affective Computing, 2020
- From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer by Feedback Constraint. In INTERSPEECH, 2020
- Within-sample variability-invariant loss for robust speaker recognition under noisy environments. In ICASSP, 2020
- The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results. In INTERSPEECH, 2020
- The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02. In INTERSPEECH, 2020
- Responsive Social Smile: A Machine Learning based Multimodal Behavior Assessment Framework towards Early Stage Autism Screening. In ICPR, 2020
- HI-MIA: a far-field text-dependent speaker verification database and the baselines. In ICASSP, 2020
2019
- String Stability Analysis for Vehicle Platooning under Unreliable Communication Links with Event-Triggered Strategy. IEEE Transactions on Vehicular Technology, 68, no. 3 (2019): 2152-2164, 2019
- An Automated Assessment Framework for Atypical Prosody and Stereotyped Idiosyncratic Phrases related to Autism Spectrum Disorder. Computer Speech and Language, 56 (2019): 80-94, 2019
- Multi-Channel Training for End-to-End Speaker Recognition under Reverberant and Noisy Environment. In INTERSPEECH, 2019
- The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge. In INTERSPEECH, 2019
- Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Feature. In INTERSPEECH, 2019
- F0 contour estimation using phonetic feature in electrolaryngeal speech enhancement. In ICASSP, 2019
- The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion. In INTERSPEECH, 2019
- Utterance-level End-to-end Language Identification using Attention-based CNN-BLSTM. In ICASSP, 2019
- LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization. In INTERSPEECH, 2019
- Far-Field End-to-End Text-Dependent Speaker Verification based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation. In INTERSPEECH, 2019
- Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection. In APSIPA ASC, 2019
- The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge. In INTERSPEECH, 2019
2018
- Cancellable Speech Template via Random Binary Orthogonal Matrices Projection Hashing. Pattern Recognition, 2018
- Facial Expression Recognition with Identity and Emotion Joint Learning. IEEE Transactions on Affective Computing, accepted in 2018, published in 12, no. 2 (2021): 544-550, 2018
- Finite-time Stability and Stabilization of Semi-Markovian Jump Systems with Time Delay. International Journal of Robust and Nonlinear Control, 28, no. 6 (2018): 2064-2081, 2018
- A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification. In ICASSP, 2018
- The DKU-JNU-EMA Electromagnetic Articulography Database on Mandarin and Chinese Dialects with Tandem Feature based Acoustic-to-Articulatory Inversion. In ISCSLP, 2018
- Deep Speaker Embedding with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition. In APSIPA ASC, 2018
- Exploring the Encoding Layer and Loss function in End-to-End Speaker and Language Recognition System. In Odyssey, 2018
- An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individual. In INTERSPEECH, 2018
- Unsupervised Query by Example Spoken Term Detection Using Features Concatenated with Self-Organizing Map Distances. In ISCSLP, 2018
2017
- Reconstruction of Lamb wave dispersion curves by sparse representation and continuity constraints. Journal of the Acoustical Society of America, 141, no. 2 (2017): 749-763, 2017
- Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion. In INTERSPEECH, 2017
- End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum. In INTERSPEECH, 2017
- Automatic Emotional Spoken Language Text Corpus Construction from Written Dialogs in Fictions. In ACII, 2017
- Mandarin Electrolaryngeal Voice Conversion with Combination of Gaussian Mixture Model and Non-negative Matrix Factorization. In APSIPA ASC, 2017
- Response to Name: A Dataset and A Multimodal Machine Learning Framework towards Autism Study. In ACII, 2017
- An audio based piano performance evaluation method using deep neural network based acoustic modeling. In INTERSPEECH, 2017
- An Automated Assessment Framework for Speech Abnormalities related to Autism Spectrum Disorder. In INTERSPEECH, 2017
2016
- Speaker verification based on the fusion of speech acoustics and inverted articulatory signals. Computer Speech & Language, 36 (2016): 196-211, 2016
- Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification. Journal of Signal Processing Systems, 82, no. 2 (2016): 207-215, 2016
- Identifying Children with Autism Spectrum Disorder Based on Their Face Processing Abnormality: A Machine Learning Framework. Autism Research, 9, no. 8 (2016): 888-898, 2016
- A Fast Tracking Algorithm for Estimating Ultrasonic Signal Time of Flight in Drilled Shafts Using Active Shape Models. In IUS, 2016
- Efficient Misalignment-Robust Face Recognition Via Locality-Constrained Representation. In ICIP, 2016
- Text-Independent Voice Conversion Using Deep Neural Network Based Phonetic Level Features. In ICPR, 2016
2015
- Automatic intelligibility classification of sentence-level pathological speech. Computer Speech & Language, 29, no. 1 (2015): 132-144, 2015
- Innovations in the Use of Interactive Technology to Support Weight Management. Current Obesity Reports, 4, no. 4 (2015): 510-519, 2015
- Robust Real-Time Distributed Optimal Control Based Energy Management in a Smart Grid. IEEE Transactions on Smart Grid, 8, no. 4 (2015): 1568-1579, 2015
- Duration Dependent Covariance Regularization in PLDA Modeling for Speaker Verification. In INTERSPEECH, 2015
- Automatic assessment of non-native accent degrees using phonetic level posterior and duration features from multiple languages. In APSIPA ASC, 2015
- Modified-prior PLDA and Score Calibration for Duration Mismatch Compensation in Speaker Recognition System. In INTERSPEECH, 2015
- Speaker verification with the mixture of Gaussian factor analysis based representation. In ICASSP, 2015
- Efficient Autism Spectrum Disorder Diagnosis with Eye Movement: A Machine Learning Framework. In ACII, 2015
- The SYSU system for the INTERSPEECH 2015 automatic speaker verification spoofing and countermeasures challenge. In APSIPA ASC, 2015
2014
- Intoxicated Speech Detection: A Fusion Framework with Speaker-Normalized Hierarchical Functionals and GMM Supervectors. Computer Speech & Language, 28, no. 2 (2014): 375-391, 2014
- Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification. Computer Speech & Language, 28, no. 4 (2014): 940-958, 2014
- Verification based ECG biometrics with cardiac irregular conditions using heartbeat level and segment level information fusion. In ICASSP, 2014
- Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens. In INTERSPEECH, 2014
- Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features. In INTERSPEECH, 2014
- An Iterative Framework for Unsupervised Learning in the PLDA based Speaker Verification. In ISCSLP, 2014
2013
- Automatic Speaker Age and Gender Recognition using acoustic and prosodic level information fusion. Computer Speech and Language, 27, no. 1 (2013): 151-167, 2013
- Automatic Classification of Palatal and Pharyngeal Wall Morphology Patterns from Speech Acoustics and Inverted Articulatory Signals, 2013
- Classifying Language-Related Developmental Disorders from Speech Cues: the Promise and the Potential Confounds. In INTERSPEECH, 2013
- Speaker verification based on fusion of acoustic and articulatory information. In INTERSPEECH, 2013
- Multi-band long-term signal variability features for robust voice activity detection. In INTERSPEECH, 2013
2012
- Recognition of Physical Activities in Overweight Hispanic Youth using KNOWME Networks. Journal of Physical Activity and Health, 9, no. 3 (2012): 432-441, 2012
- KNOWME: a Case Study in Wireless Body Area Sensor Network Design. IEEE Communications Magazine, 50, no. 5 (2012): 116-125, 2012
- KNOWME: An energy-efficient multimodal body area network for physical activity monitoring. ACM Transactions on Embedded Computing Systems, 11, no. S2 (2012): 1-24, 2012
- Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network. In INTERSPEECH, 2012
- Intelligibility classification of pathological speech using fusion of multiple high level descriptors. In INTERSPEECH, 2012
- Speaker Verification using Lasso based Sparse Total Variability Supervector and Probabilistic Linear Discriminant Analysis. In APSIPA ASC, 2012
- Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling. In ICASSP, 2012
2011
- Optimal Time-Resource Allocation for Energy-Efficient Physical Activity Detection. IEEE Transactions on Signal Processing, 59, no. 4 (2011): 1843-1857, 2011
- Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors. In INTERSPEECH, 2011
- Modeling high-level descriptions of real-life physical activities using latent topic modeling of multimodal sensor signals. In EMBC, 2011
- Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors. In ICASSP, 2011
- Speaker Verification using Sparse Representations on Total Variability I-Vectors. In INTERSPEECH, 2011
2010
- Multimodal Physical Activity Recognition by Fusing Temporal and Cepstral Information. IEEE Transactions on Neural Systems & Rehabilitation Engineering, 18, no. 4 (2010): 369-380, 2010
- Combining Five Acoustic Level methods for Automatic Speaker Age and Gender Recognition. In INTERSPEECH, 2010
2009
- Optimal Allocation of Time-Resources for Multihypothesis Activity-Level Detection. In DCOSS, 2009
- Energy-Efficient Multihypothesis Activity-Detection for Health-Monitoring Applications. In EMBC, 2009
2008
- Using SVM as back-end classifier for language identification. EURASIP Journal on Audio, Speech, and Music Processing, 2008
- Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping. In INTERSPEECH, 2008
- Automatic language identification with discriminative language characterization based on SVM. In IEICE Transactions on Information and Systems, 91, no. 3 (2008): 567-575, 2008
2007
- Authentication and quality monitoring based audio watermark for analog AM shortwave broadcasting. In IIH-MSP, 2007
- Spoken Language Identification Using Score Vector Modeling and Support Vector Machine. In INTERSPEECH, 2007