1. emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography
- Author
-
Sivakumar, Viswanath, Seely, Jeffrey, Du, Alan, Bittner, Sean R, Berenzweig, Adam, Bolarinwa, Anuoluwapo, Gramfort, Alexandre, and Mandel, Michael I
- Subjects
Computer Science - Machine Learning ,Computer Science - Human-Computer Interaction ,Electrical Engineering and Systems Science - Audio and Speech Processing ,I.2.1 ,I.2.7 ,H.5.2 ,H.1.2 - Abstract
Surface electromyography (sEMG) non-invasively measures signals generated by muscle activity with sufficient sensitivity to detect individual spinal neurons and richness to identify dozens of gestures and their nuances. Wearable wrist-based sEMG sensors have the potential to offer low friction, subtle, information rich, always available human-computer inputs. To this end, we introduce emg2qwerty, a large-scale dataset of non-invasive electromyographic signals recorded at the wrists while touch typing on a QWERTY keyboard, together with ground-truth annotations and reproducible baselines. With 1,135 sessions spanning 108 users and 346 hours of recording, this is the largest such public dataset to date. These data demonstrate non-trivial, but well defined hierarchical relationships both in terms of the generative process, from neurons to muscles and muscle combinations, as well as in terms of domain shift across users and user sessions. Applying standard modeling techniques from the closely related field of Automatic Speech Recognition (ASR), we show strong baseline performance on predicting key-presses using sEMG signals alone. We believe the richness of this task and dataset will facilitate progress in several problems of interest to both the machine learning and neuroscientific communities. Dataset and code can be accessed at https://github.com/facebookresearch/emg2qwerty., Comment: Submitted to NeurIPS 2024 Datasets and Benchmarks Track
- Published
- 2024