Diarization and speaker-attributed ASR

Diarization is the process of partitioning an audio stream into segments according to speaker identity ("who spoke when"), enabling systems to distinguish between speakers in a conversation. At Oracle, I have researched, trained, and developed diarization and speaker-attributed ASR systems, both for general applications and for specialized solutions in the healthcare domain. One of the models I contributed to has been deployed in production worldwide. My expertise covers both end-to-end (E2E) diarization and serialized diarization techniques. I also hold a US/EU patent for innovations in this area.

Automatic speech recognition and natural language processing

I have researched and implemented deep neural network architectures and training techniques for automatic speech recognition, building models across a range of domains and languages. My work has centered on an in-depth exploration of self-supervised and weakly supervised end-to-end (E2E) architectures such as wav2vec2 and Whisper, as well as other E2E models such as DeepSpeech2, Jasper, and QuartzNet. I have applied these models in demanding settings, from low-resource edge devices to federated learning systems in the H2020 European project GRACE, and to critical applications in the EU project APPRAISE.

Linguistic laws and complexity in communication

My research focuses on statistical patterns, including linguistic laws, in language and in communication systems more broadly. My interests include self-organized criticality in the voice, linguistic laws measured in physical units, and comparative studies of human and animal communication. In recent years, I have contributed to the first comprehensive study of linguistic laws based on physical speech magnitudes, and I have explored the complexity of communication through interdisciplinary collaborations drawing on applied physics, linguistics, cognitive science, and applied mathematics.

Applications to aphasia and Alzheimer's disease

I apply artificial intelligence algorithms and principles of fundamental physics to medical and clinical problems. In particular, my work targets the early detection of Alzheimer's disease through non-invasive, low-cost methods based on speech characteristics. I have also published automatic speech recognition techniques based on semi-supervised learning that could support the development of personalized rehabilitation apps for people with aphasia.

Data science, time series forecasting and deep neural networks in Industry 4.0

Alongside my academic work, I have collaborated with Vicomtech and Agrowingdata, partnering with various organizations to develop technological solutions that draw on recent advances in artificial intelligence and applied physics. I have been actively involved in transferring technology and knowledge from academia to industry, managing technical work packages in multi-partner European and national projects.

Multifractal complexity characterization of 2D and 3D images

I have worked on characterizing the complex structure of pore distributions in 2D and 3D soil images using techniques such as multifractal analysis, lacunarity, and configuration entropy analysis. Our research showed that multifractal spectra capture the complexity of soil pore structures more accurately than traditional analysis of binarized soil CT images. We also highlighted the importance of performing multifractal characterization in 3D to avoid biases introduced by gravitational effects.