Physiological Signal Analysis Research
Machine Learning Engineer / Research Data Scientist
Problem to Solve
As a Research Fellow at the LEICI Institute (Faculty of Engineering, UNLP) in collaboration with the Italian Hospital of Buenos Aires (HIBA), I developed a computational framework for analyzing physiological signals during exercise and stress episodes in individuals with Type 1 Diabetes.
The project involved processing multimodal physiological data from wearable sensors and continuous glucose monitors (CGM) to detect and classify physiological states, requiring scalable software architecture, robust signal processing pipelines, and machine learning models optimized for ambulatory monitoring.
Role: Research Fellow (2025) | Focus: Machine Learning Engineering & Signal Processing
1. Research Context
This project is part of an ongoing research collaboration between academic and medical institutions, focusing on computational analysis of physiological responses in clinical populations.
LEICI Institute
Institution: Universidad Nacional de La Plata (UNLP)
Faculty: Faculty of Engineering
Focus: Electronics, Control, and Signal Processing Research
HIBA Collaboration
Partner: Italian Hospital of Buenos Aires
Domain: Clinical Research
Population: Type 1 Diabetes Patients
2. Software Architecture & Framework Design
Developed a Python-based framework using Object-Oriented Programming (OOP) principles to manage complex multimodal physiological datasets.
2.1 Object-Oriented Data Modeling
Implemented custom data abstractions using Python classes to manage hierarchical physiological data structures. Key classes include:
- Sujeto (Subject): Aggregates all physiological data for a single participant, including multimodal signals and metadata
- Dia (Day): Manages temporal organization and synchronization of signals within a day
- Additional classes: Supporting classes for signal processing, feature extraction, and data management
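As a rough illustration of the Sujeto / Dia hierarchy described above, the sketch below uses Python dataclasses. Only the two class names come from the project; every field and method name (fecha, signals, sampling_hz, add_signal, duration_s, subject_id, cohort, dias, add_dia) is a hypothetical choice made for this example and may differ from the actual framework.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List

import numpy as np


@dataclass
class Dia:
    """One day of recordings for a participant (fields are hypothetical)."""
    fecha: date
    # Signal name -> samples, with the sampling rate (Hz) stored alongside.
    signals: Dict[str, np.ndarray] = field(default_factory=dict)
    sampling_hz: Dict[str, float] = field(default_factory=dict)

    def add_signal(self, name: str, samples, hz: float) -> None:
        self.signals[name] = np.asarray(samples, dtype=float)
        self.sampling_hz[name] = hz

    def duration_s(self, name: str) -> float:
        """Duration of one signal in seconds, from sample count and rate."""
        return len(self.signals[name]) / self.sampling_hz[name]


@dataclass
class Sujeto:
    """All data for one participant (fields are hypothetical)."""
    subject_id: str
    cohort: str  # e.g. "T1D" or "healthy"
    dias: List[Dia] = field(default_factory=list)

    def add_dia(self, dia: Dia) -> None:
        self.dias.append(dia)
        self.dias.sort(key=lambda d: d.fecha)  # keep days in chronological order


sujeto = Sujeto(subject_id="S01", cohort="T1D")
dia = Dia(fecha=date(2025, 3, 1))
dia.add_signal("EDA", np.zeros(4 * 60), hz=4.0)    # one minute at 4 Hz
dia.add_signal("ACC", np.zeros(32 * 60), hz=32.0)  # one minute at 32 Hz
sujeto.add_dia(dia)
```

Keeping the sampling rate next to each signal means downstream synchronization code does not need to hard-code per-sensor constants.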
3. Data Engineering & Signal Processing Pipeline
Implemented an ETL pipeline to process raw sensor data into machine-learning-ready features.
3.1 Data Sources
The system processes data from multiple sensor types:
Empatica E4
Multi-sensor device capturing:
- Electrodermal Activity (EDA) - 4 Hz
- Accelerometry (ACC) - 32 Hz
- Heart Rate Variability
- Temperature
- Other physiological signals
Continuous Glucose Monitor (CGM)
Medical device providing:
- Glucose levels - 1 Hz
- Continuous monitoring data
- Clinical-grade measurements
3.2 Signal Synchronization & Resampling
Synchronized signals with heterogeneous sampling frequencies:
- ACC: 32 Hz | EDA: 4 Hz | CGM: 1 Hz
- Resampled all signals to 10-second intervals using signal-specific interpolation methods
- Handled missing data, sensor dropouts, and timestamp misalignments
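The resampling step above could be sketched with pandas as follows. This is a minimal example under stated assumptions: bin averaging followed by time-based interpolation of short gaps stands in for the "signal-specific interpolation methods", which are not detailed here. The helper names make_series and resample_to_10s, the max_gap_bins parameter, and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd


def make_series(values, hz, start):
    """Build a datetime-indexed series from samples at a fixed rate (Hz)."""
    index = start + pd.to_timedelta(np.arange(len(values)) / hz, unit="s")
    return pd.Series(np.asarray(values, dtype=float), index=index)


def resample_to_10s(series, max_gap_bins=0):
    """Average samples into 10-second bins; optionally fill short gaps."""
    binned = series.resample("10s").mean()
    if max_gap_bins > 0:
        # Fill at most `max_gap_bins` consecutive empty bins, interior gaps only.
        binned = binned.interpolate(
            method="time", limit=max_gap_bins, limit_area="inside"
        )
    return binned


start = pd.Timestamp("2025-03-01 10:00:00")
acc = make_series(np.ones(32 * 60), 32.0, start)    # 32 Hz, one minute
eda = make_series(np.arange(4 * 60), 4.0, start)    # 4 Hz, one minute
cgm_full = make_series(100.0 + np.arange(60), 1.0, start)  # 1 Hz, as listed above

# Simulate a 10-second sensor dropout in the glucose stream.
seconds = np.arange(60)
cgm = cgm_full[(seconds < 20) | (seconds >= 30)]

frame = pd.concat(
    {
        "ACC": resample_to_10s(acc),
        "EDA": resample_to_10s(eda),
        "CGM": resample_to_10s(cgm, max_gap_bins=1),
    },
    axis=1,
)
```

Once every signal is on the same 10-second grid, the columns of frame share one index and can be windowed together.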
3.3 Digital Signal Processing
Applied 4th-order Butterworth filters to the raw signals:
- EDA: Separated phasic and tonic components
- ACC: Bandpass filtering to isolate relevant frequency bands
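The two filtering steps above might look like the following SciPy sketch. The filter order (4) and sampling rates (4 Hz EDA, 32 Hz ACC) come from the text. The cutoff frequencies (0.05 Hz for the tonic low-pass, 0.5 to 10 Hz for the ACC band), the use of zero-phase filtering, and the choice to take the phasic component as the low-pass residual are assumptions made for illustration, not the project's documented settings. The function names split_eda and bandpass_acc are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def split_eda(eda, fs=4.0, cutoff_hz=0.05, order=4):
    """Split EDA into a slow tonic part and a fast phasic residual."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    tonic = sosfiltfilt(sos, eda)  # zero-phase low-pass
    phasic = eda - tonic           # what the low-pass removed
    return tonic, phasic


def bandpass_acc(acc, fs=32.0, band_hz=(0.5, 10.0), order=4):
    """Band-pass an accelerometer axis to drop gravity (DC) and fast noise."""
    sos = butter(order, band_hz, btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, acc)


# Synthetic EDA: slow drift plus a small 0.5 Hz oscillation, 5 minutes at 4 Hz.
t_eda = np.arange(0, 300, 1 / 4.0)
eda = 2.0 + 0.5 * t_eda / 300 + 0.1 * np.sin(2 * np.pi * 0.5 * t_eda)
tonic, phasic = split_eda(eda)

# Synthetic ACC: constant offset plus 2 Hz movement, 1 minute at 32 Hz.
t_acc = np.arange(0, 60, 1 / 32.0)
acc = 1.0 + np.sin(2 * np.pi * 2.0 * t_acc)
acc_filtered = bandpass_acc(acc)
```

Note that sosfiltfilt runs the filter forward and backward, which removes phase lag but also steepens the effective roll-off compared with a single pass of the same order.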
Extracted features including statistical measures, frequency-domain (FFT-based), time-domain (peaks, slopes), and cross-signal correlations from windowed signal segments.
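To make the feature families above concrete, here is a small sketch that computes one example of each kind on a single window: statistical (mean, standard deviation), time-domain (slope, peak count), frequency-domain (dominant FFT frequency), and cross-signal (Pearson correlation). The specific features, the function names window_features and cross_correlation, and the synthetic window are illustrative assumptions, not the project's actual feature set.

```python
import numpy as np
from scipy.signal import find_peaks


def window_features(x, fs):
    """Compute a few per-window features for one signal sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs
    slope = np.polyfit(t, x, 1)[0]                # linear trend per second
    peaks, _ = find_peaks(x)                      # indices of local maxima
    spectrum = np.abs(np.fft.rfft(x - x.mean()))  # remove DC before the FFT
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "slope": float(slope),
        "n_peaks": int(len(peaks)),
        "dominant_hz": float(freqs[np.argmax(spectrum)]),
    }


def cross_correlation(x, y):
    """Pearson correlation between two equally long windows."""
    return float(np.corrcoef(x, y)[0, 1])


# One 60-second window at 4 Hz containing a 0.5 Hz oscillation.
t = np.arange(240) / 4.0
x = np.sin(2 * np.pi * 0.5 * t)
feats = window_features(x, fs=4.0)
corr = cross_correlation(x, 2.0 * x + 1.0)
```

In a full pipeline the same function would be applied to each window of each signal, and the resulting dictionaries collected into one feature table.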
4. Machine Learning Implementation
Developed classification models to detect physiological states from multimodal sensor data.
4.1 Classification Model
Random Forest Classifier
Implemented Random Forest models for physiological state detection:
- Ensemble method combining multiple decision trees
- Feature importance analysis for interpretability
- Handles missing values and noisy real-world data
- Captures non-linear interactions between multimodal signals
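The points above can be illustrated with scikit-learn's RandomForestClassifier. This is a minimal sketch on synthetic data: the feature names, the label rule, and the hyperparameters are made up for the example and do not reflect the project's models or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names; the data below is random, not real recordings.
feature_names = ["eda_phasic_mean", "acc_energy", "temp_slope", "noise"]

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
# The label depends on an interaction between the first two features only.
y = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)

# Impurity-based importances sum to 1 across features.
importances = dict(zip(feature_names, clf.feature_importances_))
ranked = sorted(importances, key=importances.get, reverse=True)
```

Impurity-based importances are a quick first look at which signals drive the model; permutation importance on held-out data is a common cross-check.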
4.2 Mixed-Cohort Training
Combined datasets from healthy subjects and Type 1 Diabetes patients:
- Models trained using both datasets together
- Domain adaptation techniques to handle distribution differences
- Increases training data size while maintaining relevance to the clinical population
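The specific domain adaptation techniques are not detailed here. Purely as an illustration of one simple option, the sketch below pools both cohorts, adds a cohort indicator feature, and weights samples so that each cohort contributes the same total weight during training. The column names, cohort sizes, and the weighting scheme are assumptions for this example, not the project's method.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)


def fake_cohort(n, cohort):
    """Synthetic stand-in for one cohort's feature table (hypothetical columns)."""
    df = pd.DataFrame(rng.normal(size=(n, 2)), columns=["eda_mean", "acc_energy"])
    df["label"] = (df["eda_mean"] > 0).astype(int)
    df["cohort"] = cohort
    return df


healthy = fake_cohort(300, "healthy")
t1d = fake_cohort(100, "T1D")
combined = pd.concat([healthy, t1d], ignore_index=True)

# Cohort indicator as an extra feature the model may split on.
combined["is_t1d"] = (combined["cohort"] == "T1D").astype(int)

# Instance weights: each cohort gets the same total weight, so the larger
# healthy cohort does not dominate the smaller clinical one.
cohort_size = combined["cohort"].map(combined["cohort"].value_counts())
sample_weight = (len(combined) / (2 * cohort_size)).to_numpy()

X = combined[["eda_mean", "acc_energy", "is_t1d"]].to_numpy()
y = combined["label"].to_numpy()
clf = RandomForestClassifier(n_estimators=100, random_state=2)
clf.fit(X, y, sample_weight=sample_weight)
```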
4.3 Data Curation & Labeling
Addressed challenges in processing real-world ambulatory data:
- Data Quality: Validation checks for sensor malfunctions, missing data, and outliers
- Class Imbalance: Stratified sampling and class weighting in model training
- Labeling: Multi-signal analysis and temporal pattern recognition for free-living data
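The class imbalance handling mentioned above (stratified sampling plus class weighting) could be sketched with scikit-learn as follows. The 10% positive rate, split size, and model settings are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = np.array([1] * 50 + [0] * 450)  # 10% positive class (synthetic)

# Stratified split keeps the class ratio identical in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# "balanced" weights are n_samples / (n_classes * class_count).
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y_train)

# The same weighting applied inside the model.
clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=1
).fit(X_train, y_train)
```

With 40 positives among 400 training samples, the rare class is weighted nine times more heavily than the common one.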
5. End-to-End Data Pipeline
Complete pipeline from raw sensor data to model deployment:
Data Ingestion
Load and validate raw sensor data from Empatica E4 and CGM devices.
Synchronization & Resampling
Temporal alignment and resampling to 10-second intervals.
Signal Processing
Apply Butterworth filters for EDA decomposition and ACC bandpass filtering.
Feature Engineering
Extract temporal, frequency-domain, and cross-signal features.
Model Training
Train Random Forest models using combined datasets with time-series-aware validation.
Research Insights & Results
Interpret the processed physiological data and model predictions to derive research findings.
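The time-series aware validation named in the Model Training step can be sketched with scikit-learn's TimeSeriesSplit, which always trains on earlier windows and tests on later ones. The number of folds, the gap of six windows (one minute at 10-second resolution), and the synthetic data are assumptions for illustration; the project's actual validation scheme is not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 3))   # 300 consecutive 10-second windows (synthetic)
y = (X[:, 0] > 0).astype(int)

# gap=6 leaves six windows between train and test to limit leakage
# from overlapping or autocorrelated neighbouring windows.
splitter = TimeSeriesSplit(n_splits=4, gap=6)

scores = []
fold_bounds = []
for train_idx, test_idx in splitter.split(X):
    clf = RandomForestClassifier(n_estimators=50, random_state=3)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))
    fold_bounds.append((int(train_idx.max()), int(test_idx.min())))
```

When several participants are pooled, grouping folds by subject (for example with GroupKFold) is a common complement, so that no participant appears in both training and test data.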
6. Engineering Impact
Key technical contributions:
Scalable Architecture
Modular OOP framework enabling easy extension to new sensor types and processing methods.
Robust Data Pipeline
ETL pipeline handles missing values, sensor failures, and heterogeneous sampling rates.
Innovative ML Approach
Mixed-cohort training addresses limited clinical data availability in medical ML.
Research-Grade Code
Production-ready framework with error handling and documentation.
Research Note
Confidentiality: This is an ongoing research project in collaboration with medical institutions. Specific model performance metrics, clinical findings, and detailed methodologies are subject to academic confidentiality pending publication. The focus here is on the engineering and technical implementation aspects of the work.
Key Concepts
Software Architecture
Modular OOP design enabling scalable, maintainable systems for complex scientific computing tasks.
Signal Processing
Digital signal processing techniques for extracting meaningful features from multimodal sensor data.
Data Engineering
ETL pipelines for handling heterogeneous, real-world sensor data with robust synchronization and preprocessing.