Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-12-24 | A Paragraph is All It Takes: Rich Robot Behaviors from Interacting, Trusted LLMs | OpenMind, Shaohong Zhong, Adam Zhou, Boyuan Chen, Homin Luo, Jan Liphardt | Link | Large Language Models (LLMs) are compact representations of all public knowledge of our physical environment and animal and human behaviors. The application of LLMs to robotics may offer a path to highly capable robots that perform well across most human tasks with limited or even zero tuning. Aside from increasingly sophisticated reasoning and task planning, networks of (suitably designed) LLMs offer ease of upgrading capabilities and allow humans to directly observe the robot's thinking. Here we explore the advantages, limitations, and particularities of using LLMs to control physical robots. The basic system consists of four LLMs communicating via a human language data bus implemented via web sockets and ROS2 message passing. Surprisingly, rich robot behaviors and good performance across different tasks could be achieved despite the robot's data fusion cycle running at only 1Hz and the central data bus running at the extremely limited rates of the human brain, of around 40 bits/s. The use of natural language for inter-LLM communication allowed the robot's reasoning and decision making to be directly observed by humans and made it trivial to bias the system's behavior with sets of rules written in plain English. These rules were immutably written into Ethereum, a global, public, and censorship resistant Turing-complete computer. We suggest that by using natural language as the data bus among interacting AIs, and immutable public ledgers to store behavior constraints, it is possible to build robots that combine unexpectedly rich performance, upgradability, and durable alignment with humans. |
2024-12-24 | Subsampling, aligning, and averaging to find circular coordinates in recurrent time series | Andrew J. Blumberg, Mathieu Carrière, Jun Hou Fung, Michael A. Mandell | Link | We introduce a new algorithm for finding robust circular coordinates on data that is expected to exhibit recurrence, such as that which appears in neuronal recordings of C. elegans. Techniques exist to create circular coordinates on a simplicial complex from a dimension 1 cohomology class, and these can be applied to the Rips complex of a dataset when it has a prominent class in its dimension 1 cohomology. However, it is known this approach is extremely sensitive to uneven sampling density. Our algorithm comes with a new method to correct for uneven sampling density, adapting our prior work on averaging coordinates in manifold learning. We use rejection sampling to correct for inhomogeneous sampling and then apply Procrustes matching to align and average the subsamples. In addition to providing a more robust coordinate than other approaches, this subsampling and averaging approach has better efficiency. We validate our technique on both synthetic data sets and neuronal activity recordings. Our results reveal a topological model of neuronal trajectories for C. elegans that is constructed from loops in which different regions of the brain state space can be mapped to specific and interpretable macroscopic behaviors in the worm. |
2024-12-24 | Think or Remember? Detecting and Directing LLMs Towards Memorization or Generalization | Yi-Fu Fu, Yu-Chieh Tu, Tzu-Ling Cheng, Cheng-Yu Lin, Yi-Ting Yang, Heng-Yi Liu, Keng-Te Liao, Da-Cheng Juan, Shou-De Lin | Link | In this paper, we explore the foundational mechanisms of memorization and generalization in Large Language Models (LLMs), inspired by the functional specialization observed in the human brain. Our investigation serves as a case study leveraging specially designed datasets and experimental-scale LLMs to lay the groundwork for understanding these behaviors. Specifically, we aim to first enable LLMs to exhibit both memorization and generalization by training with the designed dataset, then (a) examine whether LLMs exhibit neuron-level spatial differentiation for memorization and generalization, (b) predict these behaviors using model internal representations, and (c) steer the behaviors through inference-time interventions. Our findings reveal that neuron-wise differentiation of memorization and generalization is observable in LLMs, and targeted interventions can successfully direct their behavior. |
2024-12-24 | Towards the Automatic Detection of Vection in Virtual Reality Using EEG | Gaël Van der Lee, Anatole Lécuyer, Maxence Naud, Reinhold Scherer, François Cabestaing, Hakim Si-Mohammed | Link | Vection, the visual illusion of self-motion, provides a strong marker of the VR user experience and plays an important role in both presence and cybersickness. Traditional measurements have been conducted using questionnaires, which have inherent limitations because they are subjective and prevent real-time adjustments. Detecting vection in real time would allow VR systems to adapt to users' needs, improving comfort and minimizing negative effects like motion sickness. This paper investigates the presence of vection markers in electroencephalogram (EEG) brain signals using evoked potentials (brain responses to external stimulations). We designed a VR experiment that induces vection using two conditions: (1) forward acceleration or (2) backward acceleration. We recorded electroencephalographic (EEG) signals and gathered subjective reports from thirty (30) participants. We found an evoked potential of vection characterized by a positive peak around 600 ms (P600) after stimulus onset in the parietal region and a simultaneous negative peak in the frontal region. Our results also revealed participant variability in sensitivity to vection and cybersickness and EEG markers of acceleration across subjects. This result is promising for potential detection of vection using EEG and paves the way for future studies towards a better understanding of vection. It also provides insights into the functional role of the visual system and its integration with the vestibular system during motion-perception. It has the potential to help enhance VR user experience by qualifying users' perceived vection and adapting the VR environments accordingly. |
2024-12-24 | All-electric mimicking synaptic plasticity based on the noncollinear antiferromagnetic device | Cuimei Cao, Wei Duan, Xiaoyu Feng, Yan Xu, Yihan Wang, Zhenzhong Yang, Qingfeng Zhan, Long You | Link | Neuromorphic computing, which seeks to replicate the brain's ability to process information, has garnered significant attention due to its potential to achieve brain-like computing efficiency and human cognitive intelligence. Spin-orbit torque (SOT) devices can be used to simulate artificial synapses with non-volatile, high-speed processing and endurance characteristics. Nevertheless, achieving energy-efficient all-electric synaptic plasticity emulation using SOT devices remains a challenge. We chose the noncollinear antiferromagnetic Mn3Pt as the spin source to fabricate the Mn3Pt-based SOT device, leveraging its unconventional spin current resulting from magnetic space breaking. By adjusting the amplitude, duration, and number of pulsed currents, the Mn3Pt-based SOT device achieves nonvolatile multi-state behavior modulated by all-electric SOT switching, enabling it to emulate synaptic behaviors like excitatory postsynaptic potential (EPSP), inhibitory postsynaptic potential (IPSP), long-term depression (LTD) and the long-term potentiation (LTP) process. In addition, we show the successful training of an artificial neural network based on such an SOT device in recognizing handwritten digits with a high recognition accuracy of 94.95 %, which is only slightly lower than that from simulations (98.04 %). These findings suggest that the Mn3Pt-based SOT device is a promising candidate for the implementation of memristor-based brain-inspired computing systems. |
2024-12-24 | Agreement of Image Quality Metrics with Radiological Evaluation in the Presence of Motion Artifacts | Elisa Marchetto, Hannah Eichhorn, Daniel Gallichan, Julia A. Schnabel, Melanie Ganz | Link | Purpose: Reliable image quality assessment is crucial for evaluating new motion correction methods for magnetic resonance imaging. In this work, we compare the performance of commonly used reference-based and reference-free image quality metrics on a unique dataset with real motion artifacts. We further analyze the image quality metrics' robustness to typical pre-processing techniques. Methods: We compared five reference-based and five reference-free image quality metrics on data acquired with and without intentional motion (2D and 3D sequences). The metrics were recalculated seven times with varying pre-processing steps. The anonymized images were rated by radiologists and radiographers on a 1-5 Likert scale. Spearman correlation coefficients were computed to assess the relationship between image quality metrics and observer scores. Results: All reference-based image quality metrics showed strong correlation with observer assessments, with minor performance variations across sequences. Among reference-free metrics, Average Edge Strength offers the most promising results, as it consistently displayed stronger correlations across all sequences compared to the other reference-free metrics. Overall, the strongest correlation was achieved with percentile normalization and restricting the metric values to the skull-stripped brain region. In contrast, correlations were weaker when not applying any brain mask and using min-max or no normalization. Conclusion: Reference-based metrics reliably correlate with radiological evaluation across different sequences and datasets. Pre-processing steps, particularly normalization and brain masking, significantly influence the correlation values. Future research should focus on refining pre-processing techniques and exploring machine learning approaches for automated image quality evaluation. |
2024-12-24 | The Thousand Brains Project: A New Paradigm for Sensorimotor Intelligence | Viviane Clay, Niels Leadholm, Jeff Hawkins | Link | Artificial intelligence has advanced rapidly in the last decade, driven primarily by progress in the scale of deep-learning systems. Despite these advances, the creation of intelligent systems that can operate effectively in diverse, real-world environments remains a significant challenge. In this white paper, we outline the Thousand Brains Project, an ongoing research effort to develop an alternative, complementary form of AI, derived from the operating principles of the neocortex. We present an early version of a thousand-brains system, a sensorimotor agent that is uniquely suited to quickly learn a wide range of tasks and eventually implement any capabilities the human neocortex has. Core to its design is the use of a repeating computational unit, the learning module, modeled on the cortical columns found in mammalian brains. Each learning module operates as a semi-independent unit that can model entire objects, represents information through spatially structured reference frames, and both estimates and is able to effect movement in the world. Learning is a quick, associative process, similar to Hebbian learning in the brain, and leverages inductive biases around the spatial structure of the world to enable rapid and continual learning. Multiple learning modules can interact with one another both hierarchically and non-hierarchically via a "cortical messaging protocol" (CMP), creating more abstract representations and supporting multimodal integration. We outline the key principles motivating the design of thousand-brains systems and provide details about the implementation of Monty, our first instantiation of such a system. Code can be found at https://github.com/thousandbrainsproject/tbp.monty, along with more detailed documentation at https://thousandbrainsproject.readme.io/. |
2024-12-24 | Low count of optically pumped magnetometers furnishes a reliable real-time access to sensorimotor rhythm | Nikita Fedosov, Daria Medvedeva, Oleg Shevtsov, Alexei Ossadtchi | Link | This study presents an analysis of sensorimotor rhythms using an advanced, optically-pumped magnetoencephalography (OPM-MEG) system - a novel and rapidly developing technology. We conducted real-movement and motor imagery experiments with nine participants across two distinct magnetically-shielded environments: one featuring an analog active suppression system and the other a digital implementation. Our findings demonstrate that, under optimal recording conditions, OPM sensors provide highly informative signals, suitable for use in practical motor imagery brain-computer interface (BCI) applications. We further examine the feasibility of a portable, low-sensor-count OPM-based BCI under varied experimental setups, highlighting its potential for real-time control of external devices via user intentions. |
2024-12-24 | The same but different: impact of animal facility sanitary status on a transgenic mouse model of Alzheimer's disease | Caroline Ismeurt-Walmsley, Patrizia Giannoni, Florence Servant, Linda-Nora Mekki, Kevin Baranger, Santiago Rivera, Philippe Marin, Benjamin Lelouvier, Sylvie Claeysen | Link | The gut-brain axis has emerged as a key player in the regulation of brain function and cognitive health. Gut microbiota dysbiosis has been observed in preclinical models of Alzheimer's disease and in patients. Manipulating the composition of the gut microbiota enhances or delays neuropathology and cognitive deficits in mouse models. Accordingly, the health status of the animal facility may strongly influence these outcomes. In the present study, we longitudinally analysed the faecal microbiota composition and amyloid pathology of 5XFAD mice housed in a specific opportunistic pathogen-free (SOPF) and a conventional facility. The composition of the microbiota of 5XFAD mice after aging in a conventional facility showed marked differences compared to WT littermates that were not observed when the mice were bred in an SOPF facility. The development of amyloid pathology was also enhanced by conventional housing. We then transplanted faecal microbiota (FMT) from both sources into wild-type (WT) mice and measured memory performance, assessed in the novel object recognition test, in transplanted animals. Mice transplanted with microbiota from conventionally bred 5XFAD mice showed impaired memory performance, whereas FMT from mice housed in an SOPF facility did not induce memory deficits in transplanted mice. Finally, 18 weeks of housing SOPF-born animals in a conventional facility resulted in the reappearance of specific microbiota compositions in 5XFAD vs WT mice. In conclusion, these results show a strong impact of housing conditions on microbiota-associated phenotypes and question the relevance of breeding preclinical models in specific pathogen-free (SPF) facilities. |
2024-12-24 | SlimGPT: Layer-wise Structured Pruning for Large Language Models | Gui Ling, Ziyang Wang, Yuliang Yan, Qingwen Liu | Link | Large language models (LLMs) have garnered significant attention for their remarkable capabilities across various domains, but their vast parameter scales present challenges for practical deployment. Structured pruning is an effective method to balance model performance with efficiency, but performance restoration under computational resource constraints is a principal challenge in pruning LLMs. Therefore, we present a low-cost and fast structured pruning method for LLMs named SlimGPT based on the Optimal Brain Surgeon framework. We propose Batched Greedy Pruning for rapid and near-optimal pruning, which enhances the accuracy of head-wise pruning error estimation through grouped Cholesky decomposition and improves the pruning efficiency of FFN via Dynamic Group Size, thereby achieving approximately locally optimal pruning results within one hour. Besides, we explore the limitations of layer-wise pruning from the perspective of error accumulation and propose Incremental Pruning Ratio, a non-uniform pruning strategy to reduce performance degradation. Experimental results on the LLaMA benchmark show that SlimGPT outperforms other methods and achieves state-of-the-art results. |
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-12-24 | Towards the Automatic Detection of Vection in Virtual Reality Using EEG | Gaël Van der Lee, Anatole Lécuyer, Maxence Naud, Reinhold Scherer, François Cabestaing, Hakim Si-Mohammed | Link | Vection, the visual illusion of self-motion, provides a strong marker of the VR user experience and plays an important role in both presence and cybersickness. Traditional measurements have been conducted using questionnaires, which have inherent limitations because they are subjective and prevent real-time adjustments. Detecting vection in real time would allow VR systems to adapt to users' needs, improving comfort and minimizing negative effects like motion sickness. This paper investigates the presence of vection markers in electroencephalogram (EEG) brain signals using evoked potentials (brain responses to external stimulations). We designed a VR experiment that induces vection using two conditions: (1) forward acceleration or (2) backward acceleration. We recorded electroencephalographic (EEG) signals and gathered subjective reports from thirty (30) participants. We found an evoked potential of vection characterized by a positive peak around 600 ms (P600) after stimulus onset in the parietal region and a simultaneous negative peak in the frontal region. Our results also revealed participant variability in sensitivity to vection and cybersickness and EEG markers of acceleration across subjects. This result is promising for potential detection of vection using EEG and paves the way for future studies towards a better understanding of vection. It also provides insights into the functional role of the visual system and its integration with the vestibular system during motion-perception. It has the potential to help enhance VR user experience by qualifying users' perceived vection and adapting the VR environments accordingly. |
2024-12-23 | Signal Transformation for Effective Multi-Channel Signal Processing | Sunil Kumar Kopparapu | Link | Electroencephalography (EEG) is a non-invasive method to record the electrical activity of the brain. The EEG signals are low bandwidth and recorded from multiple electrodes simultaneously in a time synchronized manner. Typical EEG signal processing involves extracting features from all the individual channels separately and then fusing these features for downstream applications. In this paper, we propose a signal transformation, using basic signal processing, to combine the individual channels of a low-bandwidth signal, like the EEG, into a single-channel high-bandwidth signal, like audio. Further, this signal transformation is bi-directional, namely the high-bandwidth single-channel signal can be transformed to generate the individual low-bandwidth signals without any loss of information. Such a transformation, when applied to EEG signals, overcomes the need to process multiple signals and allows for single-channel processing. The advantage of this signal transformation is that it allows the use of pre-trained single-channel models for multi-channel signal processing and analysis. We further show the utility of the signal transformation on a publicly available EEG dataset. |
2024-12-23 | Neural-MCRL: Neural Multimodal Contrastive Representation Learning for EEG-based Visual Decoding | Yueyang Li, Zijian Kang, Shengyu Gong, Wenhao Dong, Weiming Zeng, Hongjie Yan, Wai Ting Siok, Nizhuan Wang | Link | Decoding neural visual representations from electroencephalogram (EEG)-based brain activity is crucial for advancing brain-machine interfaces (BMI) and has transformative potential for neural sensory rehabilitation. While multimodal contrastive representation learning (MCRL) has shown promise in neural decoding, existing methods often overlook semantic consistency and completeness within modalities and lack effective semantic alignment across modalities. This limits their ability to capture the complex representations of visual neural responses. We propose Neural-MCRL, a novel framework that achieves multimodal alignment through semantic bridging and cross-attention mechanisms, while ensuring completeness within modalities and consistency across modalities. Our framework also features the Neural Encoder with Spectral-Temporal Adaptation (NESTA), an EEG encoder that adaptively captures spectral patterns and learns subject-specific transformations. Experimental results demonstrate significant improvements in visual decoding accuracy and model generalization compared to state-of-the-art methods, advancing the field of EEG-based neural visual representation decoding in BMI. Codes will be available at: https://github.com/NZWANG/Neural-MCRL. |
2024-12-22 | Fatigue Monitoring Using Wearables and AI: Trends, Challenges, and Future Opportunities | Kourosh Kakhi, Senthil Kumar Jagatheesaperumal, Abbas Khosravi, Roohallah Alizadehsani, U Rajendra Acharya | Link | Monitoring fatigue is essential for improving safety, particularly for people who work long shifts or in high-demand workplaces. The development of wearable technologies, such as fitness trackers and smartwatches, has made it possible to continuously analyze physiological signals in real-time to determine a person's level of exhaustion. This has allowed for timely insights into preventing hazards associated with fatigue. This review focuses on wearable technology and artificial intelligence (AI) integration for tiredness detection, adhering to the PRISMA principles. Studies that used signal processing methods to extract pertinent aspects from physiological data, such as ECG, EMG, and EEG, among others, were analyzed as part of the systematic review process. Then, to find patterns of weariness and indicators of impending fatigue, these features were examined using machine learning and deep learning models. It was demonstrated that wearable technology and cutting-edge AI methods could accurately identify weariness through multi-modal data analysis. By merging data from several sources, information fusion techniques enhanced the precision and dependability of fatigue evaluation. Significant developments in AI-driven signal analysis were noted in the assessment, which should improve real-time fatigue monitoring while requiring less interference. Wearable solutions powered by AI and multi-source data fusion present a strong option for real-time tiredness monitoring in the workplace and other crucial environments. These developments open the door for more improvements in this field and offer useful tools for enhancing safety and reducing fatigue-related hazards. |
2024-12-20 | Mamba-based Deep Learning Approaches for Sleep Staging on a Wireless Multimodal Wearable System without Electroencephalography | Andrew H. Zhang, Alex He-Mo, Richard Fei Yin, Chunlin Li, Yuzhi Tang, Dharmendra Gurve, Nasim Montazeri Ghahjaverestan, Maged Goubran, Bo Wang, Andrew S. P. Lim | Link | Study Objectives: We investigate using Mamba-based deep learning approaches for sleep staging on signals from ANNE One (Sibel Health, Evanston, IL), a minimally intrusive dual-sensor wireless wearable system measuring chest electrocardiography (ECG), triaxial accelerometry, and temperature, as well as finger photoplethysmography (PPG) and temperature. Methods: We obtained wearable sensor recordings from 360 adults undergoing concurrent clinical polysomnography (PSG) at a tertiary care sleep lab. PSG recordings were scored according to AASM criteria. PSG and wearable sensor data were automatically aligned using their ECG channels with manual confirmation by visual inspection. We trained Mamba-based models with both convolutional-recurrent neural network (CRNN) and the recurrent neural network (RNN) architectures on these recordings. Ensembling of model variants with similar architectures was performed. Results: Our best approach, after ensembling, attains a 3-class (wake, NREM, REM) balanced accuracy of 83.50%, F1 score of 84.16%, Cohen's |
2024-12-20 | MarkovType: A Markov Decision Process Strategy for Non-Invasive Brain-Computer Interfaces Typing Systems | Elifnur Sunger, Yunus Bicer, Deniz Erdogmus, Tales Imbiriba | Link | Brain-Computer Interfaces (BCIs) help people with severe speech and motor disabilities communicate and interact with their environment using neural activity. This work focuses on the Rapid Serial Visual Presentation (RSVP) paradigm of BCIs using noninvasive electroencephalography (EEG). The RSVP typing task is a recursive task with multiple sequences, where users see only a subset of symbols in each sequence. Extensive research has been conducted to improve classification in the RSVP typing task, achieving fast classification. However, these methods struggle to achieve high accuracy and do not consider the typing mechanism in the learning procedure. They apply binary target and non-target classification without including recursive training. To improve performance in the classification of symbols while controlling the classification speed, we incorporate the typing setup into training by proposing a Partially Observable Markov Decision Process (POMDP) approach. To the best of our knowledge, this is the first work to formulate the RSVP typing task as a POMDP for recursive classification. Experiments show that the proposed approach, MarkovType, results in a more accurate typing system compared to competitors. Additionally, our experiments demonstrate that while there is a trade-off between accuracy and speed, MarkovType achieves the optimal balance between these factors compared to other methods. |
2024-12-20 | SODor: Long-Term EEG Partitioning for Seizure Onset Detection | Zheng Chen, Yasuko Matsubara, Yasushi Sakurai, Jimeng Sun | Link | Deep learning models have recently shown great success in classifying epileptic patients using EEG recordings. Unfortunately, classification-based methods lack a sound mechanism to detect the onset of seizure events. In this work, we propose a two-stage framework, SODor, that explicitly models seizure onset through a novel task formulation of subsequence clustering. Given an EEG sequence, the framework first learns a set of second-level embeddings with label supervision. It then employs model-based clustering to explicitly capture long-term temporal dependencies in EEG sequences and identify meaningful subsequences. Epochs within a subsequence share a common cluster assignment (normal or seizure), with cluster or state transitions representing successful onset detections. Extensive experiments on three datasets demonstrate that our method can correct misclassifications, achieving 5%-11% classification improvements over other baselines and accurately detecting seizure onsets. |
2024-12-20 | Predicting Artificial Neural Network Representations to Learn Recognition Model for Music Identification from Brain Recordings | Taketo Akama, Zhuohao Zhang, Pengcheng Li, Kotaro Hongo, Hiroaki Kitano, Shun Minamikawa, Natalia Polouliakh | Link | Recent studies have demonstrated that the representations of artificial neural networks (ANNs) can exhibit notable similarities to cortical representations when subjected to identical auditory sensory inputs. In these studies, the ability to predict cortical representations is probed by regressing from ANN representations to cortical representations. Building upon this concept, our approach reverses the direction of prediction: we utilize ANN representations as a supervisory signal to train recognition models using noisy brain recordings obtained through non-invasive measurements. Specifically, we focus on constructing a recognition model for music identification, where electroencephalography (EEG) brain recordings collected during music listening serve as input. By training an EEG recognition model to predict ANN representations (representations associated with music identification), we observed a substantial improvement in classification accuracy. This study introduces a novel approach to developing recognition models for brain recordings in response to external auditory stimuli. It holds promise for advancing brain-computer interfaces (BCI), neural decoding techniques, and our understanding of music cognition. Furthermore, it provides new insights into the relationship between auditory brain activity and ANN representations. |
2024-12-19 | LG-Sleep: Local and Global Temporal Dependencies for Mice Sleep Scoring | Shadi Sartipi, Mie Andersen, Natalie Hauglund, Celia Kjaerby, Verena Untiet, Maiken Nedergaard, Mujdat Cetin | Link | Efficiently identifying sleep stages is crucial for unraveling the intricacies of sleep in both preclinical and clinical research. The labor-intensive nature of manual sleep scoring, demanding substantial expertise, has prompted a surge of interest in automated alternatives. Sleep studies in mice play a significant role in understanding sleep patterns and disorders and underscore the need for robust scoring methodologies. In response, this study introduces LG-Sleep, a novel subject-independent deep neural network architecture designed for mice sleep scoring through electroencephalogram (EEG) signals. LG-Sleep extracts local and global temporal transitions within EEG signals to categorize sleep data into three stages: wake, rapid eye movement (REM) sleep, and non-rapid eye movement (NREM) sleep. The model leverages local and global temporal information by employing time-distributed convolutional neural networks to discern local temporal transitions in EEG data. Subsequently, features derived from the convolutional filters traverse long short-term memory blocks, capturing global transitions over extended periods. Crucially, the model is optimized in an autoencoder-decoder fashion, facilitating generalization across distinct subjects and adapting to limited training samples. Experimental findings demonstrate superior performance of LG-Sleep compared to conventional deep neural networks. Moreover, the model exhibits good performance across different sleep stages even when tasked with scoring based on limited training samples. |
2024-12-24 | CwA-T: A Channelwise AutoEncoder with Transformer for EEG Abnormality Detection | Youshen Zhao, Keiji Iramina | Link | Electroencephalogram (EEG) signals are critical for detecting abnormal brain activity, but their high dimensionality and complexity pose significant challenges for effective analysis. In this paper, we propose CwA-T, a novel framework that combines a channelwise CNN-based autoencoder with a single-head transformer classifier for efficient EEG abnormality detection. The channelwise autoencoder compresses raw EEG signals while preserving channel independence, reducing computational costs and retaining biologically meaningful features. The compressed representations are then fed into the transformer-based classifier, which efficiently models long-term dependencies to distinguish between normal and abnormal signals. Evaluated on the TUH Abnormal EEG Corpus, the proposed model achieves 85.0% accuracy, 76.2% sensitivity, and 91.2% specificity at the per-case level, outperforming baseline models such as EEGNet, Deep4Conv, and FusionCNN. Furthermore, CwA-T requires only 202M FLOPs and 2.9M parameters, making it significantly more efficient than transformer-based alternatives. The framework retains interpretability through its channelwise design, demonstrating great potential for future applications in neuroscience research and clinical practice. The source code is available at https://github.com/YossiZhao/CAE-T. |
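
The accuracy, sensitivity, and specificity figures quoted in the CwA-T entry above (and the balanced accuracy in the sleep-staging entry) are standard confusion-matrix ratios. The sketch below shows how such metrics are typically computed for a binary normal/abnormal decision. It is an illustration only: `binary_metrics` is a hypothetical helper name, treating "abnormal" as the positive class is an assumption, and the counts are made up rather than taken from any paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity and balanced accuracy.

    "Positive" is assumed to mean an abnormal recording (an assumption
    made for this sketch, not something stated in the listing).
    """
    sensitivity = tp / (tp + fn)  # share of abnormal cases that were flagged
    specificity = tn / (tn + fp)  # share of normal cases that were passed
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


# Illustrative counts only (hypothetical, not data from any listed paper).
metrics = binary_metrics(tp=80, fn=20, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages sensitivity and specificity, so it is less flattering than plain accuracy when one class dominates the dataset.
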
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-12-24 | Low count of optically pumped magnetometers furnishes a reliable real-time access to sensorimotor rhythm | Nikita Fedosov, Daria Medvedeva, Oleg Shevtsov, Alexei Ossadtchi | Link | This study presents an analysis of sensorimotor rhythms using an advanced, optically-pumped magnetoencephalography (OPM-MEG) system - a novel and rapidly developing technology. We conducted real-movement and motor imagery experiments with nine participants across two distinct magnetically-shielded environments: one featuring an analog active suppression system and the other a digital implementation. Our findings demonstrate that, under optimal recording conditions, OPM sensors provide highly informative signals, suitable for use in practical motor imagery brain-computer interface (BCI) applications. We further examine the feasibility of a portable, low-sensor-count OPM-based BCI under varied experimental setups, highlighting its potential for real-time control of external devices via user intentions. |
2024-12-20 | MarkovType: A Markov Decision Process Strategy for Non-Invasive Brain-Computer Interfaces Typing Systems | Elifnur Sunger, Yunus Bicer, Deniz Erdogmus, Tales Imbiriba | Link | Brain-Computer Interfaces (BCIs) help people with severe speech and motor disabilities communicate and interact with their environment using neural activity. This work focuses on the Rapid Serial Visual Presentation (RSVP) paradigm of BCIs using noninvasive electroencephalography (EEG). The RSVP typing task is a recursive task with multiple sequences, where users see only a subset of symbols in each sequence. Extensive research has been conducted to improve classification in the RSVP typing task, achieving fast classification. However, these methods struggle to achieve high accuracy and do not consider the typing mechanism in the learning procedure. They apply binary target and non-target classification without including recursive training. To improve performance in the classification of symbols while controlling the classification speed, we incorporate the typing setup into training by proposing a Partially Observable Markov Decision Process (POMDP) approach. To the best of our knowledge, this is the first work to formulate the RSVP typing task as a POMDP for recursive classification. Experiments show that the proposed approach, MarkovType, results in a more accurate typing system compared to competitors. Additionally, our experiments demonstrate that while there is a trade-off between accuracy and speed, MarkovType achieves the optimal balance between these factors compared to other methods. |
2024-12-20 | Predicting Artificial Neural Network Representations to Learn Recognition Model for Music Identification from Brain Recordings | Taketo Akama, Zhuohao Zhang, Pengcheng Li, Kotaro Hongo, Hiroaki Kitano, Shun Minamikawa, Natalia Polouliakh | Link | Recent studies have demonstrated that the representations of artificial neural networks (ANNs) can exhibit notable similarities to cortical representations when subjected to identical auditory sensory inputs. In these studies, the ability to predict cortical representations is probed by regressing from ANN representations to cortical representations. Building upon this concept, our approach reverses the direction of prediction: we utilize ANN representations as a supervisory signal to train recognition models using noisy brain recordings obtained through non-invasive measurements. Specifically, we focus on constructing a recognition model for music identification, where electroencephalography (EEG) brain recordings collected during music listening serve as input. By training an EEG recognition model to predict ANN representations-representations associated with music identification-we observed a substantial improvement in classification accuracy. This study introduces a novel approach to developing recognition models for brain recordings in response to external auditory stimuli. It holds promise for advancing brain-computer interfaces (BCI), neural decoding techniques, and our understanding of music cognition. Furthermore, it provides new insights into the relationship between auditory brain activity and ANN representations. |
2024-12-17 | Predicting Workload in Virtual Flight Simulations using EEG Features (Including Post-hoc Analysis in Appendix) | Bas Verkennis, Evy van Weelden, Francesca L. Marogna, Maryam Alimardani, Travis J. Wiltshire, Max M. Louwerse | Link | Effective cognitive workload management has a major impact on the safety and performance of pilots. Integrating brain-computer interfaces (BCIs) presents an opportunity for real-time workload assessment. Leveraging cognitive workload data from immersive, high-fidelity virtual reality (VR) flight simulations enhances ecological validity and allows for dynamic adjustments to training scenarios based on individual cognitive states. While prior studies have predominantly concentrated on EEG spectral power for workload prediction, delving into inter-brain connectivity may yield deeper insights. This study assessed the predictive value of EEG spectral and connectivity features in distinguishing high vs. low workload periods during simulated flight in VR and Desktop conditions. EEG data were collected from 52 non-pilot participants conducting flight tasks in an aircraft simulation, after which they reported cognitive workload using the NASA Task Load Index. Using an ensemble approach, a stacked classifier was trained to predict workload using two feature sets extracted from the EEG data: 1) spectral features (Baseline model), and 2) a combination of spectral and connectivity features (Connectivity model), both within the alpha, beta, and theta band ranges. Results showed that the performance of the Connectivity model surpassed the Baseline model. Additionally, Recursive Feature Elimination (RFE) provided insights into the most influential workload-predicting features, highlighting the potential dominance of parietal-directed connectivity in managing cognitive workload during simulated flight. Further research on other connectivity metrics and alternative models (such as deep learning) in a large sample of pilots is essential to validate the possibility of a real-time BCI for the prediction of workload under safety-critical operational conditions. |
2024-12-16 | Privacy-Preserving Brain-Computer Interfaces: A Systematic Review | K. Xia, W. Duch, Y. Sun, K. Xu, W. Fang, H. Luo, Y. Zhang, D. Sang, X. Xu, F-Y Wang, D. Wu | Link | A brain-computer interface (BCI) establishes a direct communication pathway between the human brain and a computer. It has been widely used in medical diagnosis, rehabilitation, education, entertainment, etc. Most research so far focuses on making BCIs more accurate and reliable, but much less attention has been paid to their privacy. Developing a commercial BCI system usually requires close collaborations among multiple organizations, e.g., hospitals, universities, and/or companies. Input data in BCIs, e.g., electroencephalogram (EEG), contain rich privacy information, and the developed machine learning model is usually proprietary. Data and model transmission among different parties may incur significant privacy threats, and hence privacy protection in BCIs must be considered. Unfortunately, there does not exist any contemporary and comprehensive review on privacy-preserving BCIs. This paper fills this gap, by describing potential privacy threats and protection strategies in BCIs. It also points out several challenges and future research directions in developing privacy-preserving BCIs. |
2024-12-16 | Accurate, Robust and Privacy-Preserving Brain-Computer Interface Decoding | Xiaoqing Chen, Tianwang Jia, Dongrui Wu | Link | An electroencephalogram (EEG) based brain-computer interface (BCI) enables direct communication between the brain and external devices. However, EEG-based BCIs face at least three major challenges in real-world applications: data scarcity and individual differences, adversarial vulnerability, and data privacy. While previous studies have addressed one or two of these issues, simultaneous accommodation of all three challenges remains challenging and unexplored. This paper fills this gap, by proposing an Augmented Robustness Ensemble (ARE) algorithm and integrating it into three privacy protection scenarios (centralized source-free transfer, federated source-free transfer, and source data perturbation), achieving simultaneously accurate decoding, adversarial robustness, and privacy protection of EEG-based BCIs. Experiments on three public EEG datasets demonstrated that our proposed approach outperformed over 10 classic and state-of-the-art approaches in both accuracy and robustness in all three privacy-preserving scenarios, even outperforming state-of-the-art transfer learning approaches that do not consider privacy protection at all. This is the first time that three major challenges in EEG-based BCIs can be addressed simultaneously, significantly improving the practicalness of EEG decoding in real-world BCIs. |
2024-12-15 | Imagined Speech State Classification for Robust Brain-Computer Interface | Byung-Kwan Ko, Jun-Young Kim, Seo-Hyun Lee | Link | This study examines the effectiveness of traditional machine learning classifiers versus deep learning models for detecting the imagined speech using electroencephalogram data. Specifically, we evaluated conventional machine learning techniques such as CSP-SVM and LDA-SVM classifiers alongside deep learning architectures such as EEGNet, ShallowConvNet, and DeepConvNet. Machine learning classifiers exhibited significantly lower precision and recall, indicating limited feature extraction capabilities and poor generalization between imagined speech and idle states. In contrast, deep learning models, particularly EEGNet, achieved the highest accuracy of 0.7080 and an F1 score of 0.6718, demonstrating their enhanced ability in automatic feature extraction and representation learning, essential for capturing complex neurophysiological patterns. These findings highlight the limitations of conventional machine learning approaches in brain-computer interface (BCI) applications and advocate for adopting deep learning methodologies to achieve more precise and reliable classification of detecting imagined speech. This foundational research contributes to the development of imagined speech-based BCI systems. |
2024-12-14 | Transfer Learning with Active Sampling for Rapid Training and Calibration in BCI-P300 Across Health States and Multi-centre Data | Christian Flores, Marcelo Contreras, Ichiro Macedo, Javier Andreu-Perez | Link | Machine learning and deep learning advancements have boosted Brain-Computer Interface (BCI) performance, but their wide-scale applicability is limited due to factors like individual health, hardware variations, and cultural differences affecting neural data. Studies often focus on uniform single-site experiments in uniform settings, leading to high performance that may not translate well to real-world diversity. Deep learning models aim to enhance BCI classification accuracy, and transfer learning has been suggested to adapt models to individual neural patterns using a base model trained on others' data. This approach promises better generalizability and reduced overfitting, yet challenges remain in handling diverse and imbalanced datasets from different equipment, subjects, multiple centres in different countries, and both healthy and patient populations for effective model transfer and tuning. In a setting characterized by maximal heterogeneity, we proposed P300 wave detection in BCIs employing a convolutional neural network fitted with adaptive transfer learning based on Poisson Disk Sampling (PDS) called Active Sampling (AS), which flexibly adjusts the transition from source data to the target domain. Our results reported for subject adaptive with 40% of adaptive fine-tuning that the averaged classification accuracy improved by 5.36% and standard deviation reduced by 12.22% using two distinct, internationally replicated datasets. These results outperformed in classification accuracy, computational time, and training efficiency, mainly due to the proposed Active Sampling (AS) method for transfer learning. |
2024-12-13 | Active Poisoning: Efficient Backdoor Attacks on Transfer Learning-Based Brain-Computer Interfaces | X. Jiang, L. Meng, S. Li, D. Wu | Link | Transfer learning (TL) has been widely used in electroencephalogram (EEG)-based brain-computer interfaces (BCIs) for reducing calibration efforts. However, backdoor attacks could be introduced through TL. In such attacks, an attacker embeds a backdoor with a specific pattern into the machine learning model. As a result, the model will misclassify a test sample with the backdoor trigger into a prespecified class while still maintaining good performance on benign samples. Accordingly, this study explores backdoor attacks in the TL of EEG-based BCIs, where source-domain data are poisoned by a backdoor trigger and then used in TL. We propose several active poisoning approaches to select source-domain samples, which are most effective in embedding the backdoor pattern, to improve the attack success rate and efficiency. Experiments on four EEG datasets and three deep learning models demonstrate the effectiveness of the approaches. To our knowledge, this is the first study about backdoor attacks on TL models in EEG-based BCIs. It exposes a serious security risk in BCIs, which should be immediately addressed. |
2024-12-13 | User Identity Protection in EEG-based Brain-Computer Interfaces | L. Meng, X. Jiang, J. Huang, W. Li, H. Luo, D. Wu | Link | A brain-computer interface (BCI) establishes a direct communication pathway between the brain and an external device. Electroencephalogram (EEG) is the most popular input signal in BCIs, due to its convenience and low cost. Most research on EEG-based BCIs focuses on the accurate decoding of EEG signals; however, EEG signals also contain rich private information, e.g., user identity, emotion, and so on, which should be protected. This paper first exposes a serious privacy problem in EEG-based BCIs, i.e., the user identity in EEG data can be easily learned so that different sessions of EEG data from the same user can be associated together to more reliably mine private information. To address this issue, we further propose two approaches to convert the original EEG data into identity-unlearnable EEG data, i.e., removing the user identity information while maintaining the good performance on the primary BCI task. Experiments on seven EEG datasets from five different BCI paradigms showed that on average the generated identity-unlearnable EEG data can reduce the user identification accuracy from 70.01% to at most 21.36%, greatly facilitating user privacy protection in EEG-based BCIs. |
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-12-23 | BrainMAP: Learning Multiple Activation Pathways in Brain Networks | Song Wang, Zhenyu Lei, Zhen Tan, Jiaqi Ding, Xinyu Zhao, Yushun Dong, Guorong Wu, Tianlong Chen, Chen Chen, Aiying Zhang, Jundong Li | Link | Functional Magnetic Resonance Image (fMRI) is commonly employed to study human brain activity, since it offers insight into the relationship between functional fluctuations and human behavior. To enhance analysis and comprehension of brain activity, Graph Neural Networks (GNNs) have been widely applied to the analysis of functional connectivities (FC) derived from fMRI data, due to their ability to capture the synergistic interactions among brain regions. However, in the human brain, performing complex tasks typically involves the activation of certain pathways, which could be represented as paths across graphs. As such, conventional GNNs struggle to learn from these pathways due to the long-range dependencies of multiple pathways. To address these challenges, we introduce a novel framework BrainMAP to learn Multiple Activation Pathways in Brain networks. BrainMAP leverages sequential models to identify long-range correlations among sequentialized brain regions and incorporates an aggregation module based on Mixture of Experts (MoE) to learn from multiple pathways. Our comprehensive experiments highlight BrainMAP's superior performance. Furthermore, our framework enables explanatory analyses of crucial brain regions involved in tasks. Our code is provided at https://github.com/LzyFischer/Graph-Mamba. |
2024-12-19 | Accessing the topological properties of human brain functional sub-circuits in Echo State Networks | Bach Nguyen, Tianlong Chen, Shu Yang, Bojian Hou, Li Shen, Duy Duong-Tran | Link | Recent years have witnessed an emerging trend in neuromorphic computing that centers around the use of brain connectomics as a blueprint for artificial neural networks. Connectomics-based neuromorphic computing has primarily focused on embedding human brain large-scale structural connectomes (SCs), as estimated from diffusion Magnetic Resonance Imaging (dMRI) modality, to echo-state networks (ESNs). A critical step in ESN embedding requires pre-determined read-in and read-out layers constructed by the induced subgraphs of the embedded reservoir. As *a priori* set of functional sub-circuits are derived from functional MRI (fMRI) modality, it is unknown, till this point, whether the embedding of fMRI-induced sub-circuits/networks onto SCs is well justified from the neuro-physiological perspective and ESN performance across a variety of tasks. This paper proposes a pipeline to implement and evaluate ESNs with various embedded topologies and processing/memorization tasks. To this end, we showed that different performance optimums highly depend on the neuro-physiological characteristics of these pre-determined fMRI-induced sub-circuits. In general, fMRI-induced sub-circuit-embedded ESN outperforms simple bipartite and various null models with feed-forward properties commonly seen in MLP for different tasks and reservoir criticality conditions. We provided a thorough analysis of the topological properties of pre-determined fMRI-induced sub-circuits and highlighted their graph-theoretical properties that play significant roles in determining ESN performance. |
2024-12-18 | ICA-based Resting-State Networks Obtained on Large Autism fMRI Dataset ABIDE | Sjir J. C. Schielen, Jesper Pilmeyer, Albert P. Aldenkamp, Danny Ruijters, Svitlana Zinger | Link | Functional magnetic resonance imaging (fMRI) has become instrumental in researching brain function. One application of fMRI is investigating potential neural features that distinguish people with autism spectrum disorder (ASD) from healthy controls. The Autism Brain Imaging Data Exchange (ABIDE) facilitates this research through its extensive data-sharing initiative. While ABIDE offers data preprocessed with various atlases, independent component analysis (ICA) for dimensionality reduction remains underutilized. We address this gap by presenting ICA-based resting-state networks (RSNs) from preprocessed scans from ABIDE, now publicly available: https://github.com/SjirSchielen/groupICAonABIDE. These RSNs unveil neural activation clusters without atlas constraints, offering a perspective on ASD analyses that complements the predominantly atlas-based literature. This contribution provides a valuable resource for further research into ASD, potentially aiding in developing new analytical approaches. |
2024-12-17 | Optimized two-stage AI-based Neural Decoding for Enhanced Visual Stimulus Reconstruction from fMRI Data | Lorenzo Veronese, Andrea Moglia, Luca Mainardi, Pietro Cerveri | Link | AI-based neural decoding reconstructs visual perception by leveraging generative models to map brain activity, measured through functional MRI (fMRI), into latent hierarchical representations. Traditionally, ridge linear models transform fMRI into a latent space, which is then decoded using latent diffusion models (LDM) via a pre-trained variational autoencoder (VAE). Due to the complexity and noisiness of fMRI data, newer approaches split the reconstruction into two sequential steps, the first one providing a rough visual approximation, the second one improving the stimulus prediction via LDM endowed by CLIP embeddings. This work proposes a non-linear deep network to improve fMRI latent space representation, optimizing the dimensionality alike. Experiments on the Natural Scenes Dataset showed that the proposed architecture improved the structural similarity of the reconstructed image by about 2% with respect to the state-of-the-art model, based on ridge linear transform. The reconstructed image's semantics improved by about 4%, measured by perceptual similarity, with respect to the state-of-the-art. The noise sensitivity analysis of the LDM showed that the role of the first stage was fundamental to predict the stimulus featuring high structural similarity. Conversely, providing a large noise stimulus affected less the semantics of the predicted stimulus, while the structural similarity between the ground truth and predicted stimulus was very poor. The findings underscore the importance of leveraging non-linear relationships between BOLD signal and the latent representation and two-stage generative AI for optimizing the fidelity of reconstructed visual stimuli from noisy fMRI data. |
2024-12-16 | Generalizable Representation Learning for fMRI-based Neurological Disorder Identification | Wenhui Cui, Haleh Akrami, Anand A. Joshi, Richard M. Leahy | Link | Despite the impressive advances achieved using deep learning for functional brain activity analysis, the heterogeneity of functional patterns and the scarcity of imaging data still pose challenges in tasks such as identifying neurological disorders. For functional Magnetic Resonance Imaging (fMRI), while data may be abundantly available from healthy controls, clinical data is often scarce, especially for rare diseases, limiting the ability of models to identify clinically-relevant features. We overcome this limitation by introducing a novel representation learning strategy integrating meta-learning with self-supervised learning to improve the generalization from normal to clinical features. This approach enables generalization to challenging clinical tasks featuring scarce training data. We achieve this by leveraging self-supervised learning on the control dataset to focus on inherent features that are not limited to a particular supervised task and incorporating meta-learning to improve the generalization across domains. To explore the generalizability of the learned representations to unseen clinical applications, we apply the model to four distinct clinical datasets featuring scarce and heterogeneous data for neurological disorder classification. Results demonstrate the superiority of our representation learning strategy on diverse clinically-relevant tasks. |
2024-12-13 | Data Integration with Fusion Searchlight: Classifying Brain States from Resting-state fMRI | Simon Wein, Marco Riebel, Lisa-Marie Brunner, Caroline Nothdurfter, Rainer Rupprecht, Jens V. Schwarzbach | Link | Spontaneous neural activity observed in resting-state fMRI is characterized by complex spatio-temporal dynamics. Different measures related to local and global brain connectivity and fluctuations in low-frequency amplitudes can quantify individual aspects of these neural dynamics. Even though such measures are derived from the same functional signals, they are often evaluated separately, neglecting their interrelations and potentially reducing the analysis sensitivity. In our study, we present a fusion searchlight (FuSL) framework to combine the complementary information contained in different resting-state fMRI metrics and demonstrate how this can improve the decoding of brain states. Moreover, we show how explainable AI allows us to reconstruct the differential impact of each metric on the decoding, which additionally increases spatial specificity of searchlight analysis. In general, this framework can be adapted to combine information derived from different imaging modalities or experimental conditions, offering a versatile and interpretable tool for data fusion in neuroimaging. |
2024-12-12 | Network Dynamics of Emotional Processing: A Structural Balance Theory Approach | Sepehr Gourabi, Parinaz Khosravani, Shahrzad Nosrat, Roya Mohammadi, Masoud Lotfalipour | Link | Understanding emotional processing in the human brain requires examining the complex interactions between different brain regions. While previous studies have identified specific regions involved in emotion processing, a holistic network approach may provide deeper insights. We use Structural Balance Theory to investigate the stability and triadic structures of signed brain networks during resting state and emotional processing, specifically in response to fear-related stimuli. We hypothesized that imbalanced triadic interactions would be more prevalent during emotional processing, especially in response to fear-related stimuli, potentially reflecting the brain's adaptation to emotional challenges. By analyzing fMRI data from 138 healthy, right-handed participants, we found that emotional processing was marked by an increase in positive connections and a decrease in negative connections compared to the resting state. Our findings clearly show that balanced triads significantly decreased while imbalanced triads increased, indicating a shift toward instability in the brain's functional network during emotional processing. Additionally, the number of influential hubs was significantly lower during fear processing than in neutral conditions, suggesting a more centralized network and higher levels of network energy. These findings reveal the brain's remarkable adaptive capacity during emotional processing, demonstrating how network stability dynamically shifts through changes in balanced and imbalanced triads, hub tendencies, and energy dynamics. Our research illuminates a complex mechanism by which the brain flexibly reconfigures its functional network in response to emotional stimuli with potential implications for understanding emotional resilience and neurological disorders. |
2024-12-11 | MHSA: A Multi-scale Hypergraph Network for Mild Cognitive Impairment Detection via Synchronous and Attentive Fusion | Manman Yuan, Weiming Jia, Xiong Luo, Jiazhen Ye, Peican Zhu, Junlin Li | Link | The precise detection of mild cognitive impairment (MCI) is of significant importance in preventing the deterioration of patients in a timely manner. Although hypergraphs have enhanced performance by learning and analyzing brain networks, they often only depend on vector distances between features at a single scale to infer interactions. In this paper, we deal with a more arduous challenge, hypergraph modelling with synchronization between brain regions, and design a novel framework, i.e., A Multi-scale Hypergraph Network for MCI Detection via Synchronous and Attentive Fusion (MHSA), to tackle this challenge. Specifically, our approach employs the Phase-Locking Value (PLV) to calculate the phase synchronization relationship in the spectrum domain of regions of interest (ROIs) and designs a multi-scale feature fusion mechanism to integrate dynamic connectivity features of functional magnetic resonance imaging (fMRI) from both the temporal and spectrum domains. To evaluate and optimize the direct contribution of each ROI to phase synchronization in the temporal domain, we structure the PLV coefficients dynamically adjust strategy, and the dynamic hypergraph is modelled based on a comprehensive temporal-spectrum fusion matrix. Experiments on the real-world dataset indicate the effectiveness of our strategy. The code is available at https://github.com/Jia-Weiming/MHSA. |
2024-12-07 | Biological Brain Age Estimation using Sex-Aware Adversarial Variational Autoencoder with Multimodal Neuroimages | Abd Ur Rehman, Azka Rehman, Muhammad Usman, Abdullah Shahid, Sung-Min Gho, Aleum Lee, Tariq M. Khan, Imran Razzak | Link | Brain aging involves structural and functional changes and therefore serves as a key biomarker for brain health. Combining structural magnetic resonance imaging (sMRI) and functional magnetic resonance imaging (fMRI) has the potential to improve brain age estimation by leveraging complementary data. However, fMRI data, being noisier than sMRI, complicates multimodal fusion. Traditional fusion methods often introduce more noise than useful information, which can reduce accuracy compared to using sMRI alone. In this paper, we propose a novel multimodal framework for biological brain age estimation, utilizing a sex-aware adversarial variational autoencoder (SA-AVAE). Our framework integrates adversarial and variational learning to effectively disentangle the latent features from both modalities. Specifically, we decompose the latent space into modality-specific codes and shared codes to represent complementary and common information across modalities, respectively. To enhance the disentanglement, we introduce cross-reconstruction and shared-distinct distance ratio loss as regularization terms. Importantly, we incorporate sex information into the learned latent code, enabling the model to capture sex-specific aging patterns for brain age estimation via an integrated regressor module. We evaluate our model using the publicly available OpenBHB dataset, a comprehensive multi-site dataset for brain age estimation. The results from ablation studies and comparisons with state-of-the-art methods demonstrate that our framework outperforms existing approaches and shows significant robustness across various age groups, highlighting its potential for real-time clinical applications in the early detection of neurodegenerative diseases. |
2024-12-06 | Probing the contents of semantic representations from text, behavior, and brain data using the psychNorms metabase | Zak Hussain, Rui Mata, Ben R. Newell, Dirk U. Wulff | Link | Semantic representations are integral to natural language processing, psycholinguistics, and artificial intelligence. Although often derived from internet text, recent years have seen a rise in the popularity of behavior-based (e.g., free associations) and brain-based (e.g., fMRI) representations, which promise improvements in our ability to measure and model human representations. We carry out the first systematic evaluation of the similarities and differences between semantic representations derived from text, behavior, and brain data. Using representational similarity analysis, we show that word vectors derived from behavior and brain data encode information that differs from their text-derived cousins. Furthermore, drawing on our psychNorms metabase, alongside an interpretability method that we call representational content analysis, we find that, in particular, behavior representations capture unique variance on certain affective, agentic, and socio-moral dimensions. We thus establish behavior as an important complement to text for capturing human representations and behavior. These results are broadly relevant to research aimed at learning human-aligned semantic representations, including work on evaluating and aligning large language models. |
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-12-24 | Low count of optically pumped magnetometers furnishes a reliable real-time access to sensorimotor rhythm | Nikita Fedosov, Daria Medvedeva, Oleg Shevtsov, Alexei Ossadtchi | Link | This study presents an analysis of sensorimotor rhythms using an advanced, optically-pumped magnetoencephalography (OPM-MEG) system - a novel and rapidly developing technology. We conducted real-movement and motor imagery experiments with nine participants across two distinct magnetically-shielded environments: one featuring an analog active suppression system and the other a digital implementation. Our findings demonstrate that, under optimal recording conditions, OPM sensors provide highly informative signals, suitable for use in practical motor imagery brain-computer interface (BCI) applications. We further examine the feasibility of a portable, low-sensor-count OPM-based BCI under varied experimental setups, highlighting its potential for real-time control of external devices via user intentions. |
2024-12-12 | LV-CadeNet: Long View Feature Convolution-Attention Fusion Encoder-Decoder Network for Clinical MEG Spike Detection | Kuntao Xiao, Xiongfei Wang, Pengfei Teng, Yi Sun, Wanli Yang, Liang Zhang, Hanyang Dong, Guoming Luan, Shurong Sheng | Link | It is widely acknowledged that the epileptic foci can be pinpointed by source localizing interictal epileptic discharges (IEDs) via Magnetoencephalography (MEG). However, manual detection of IEDs, which appear as spikes in MEG data, is extremely labor intensive and requires considerable professional expertise, limiting the broader adoption of MEG technology. Numerous studies have focused on automatic detection of MEG spikes to overcome this challenge, but these efforts often validate their models on synthetic datasets with balanced positive and negative samples. In contrast, clinical MEG data is highly imbalanced, raising doubts on the real-world efficacy of these models. To address this issue, we introduce LV-CadeNet, a Long View feature Convolution-Attention fusion Encoder-Decoder Network, designed for automatic MEG spike detection in real-world clinical scenarios. Beyond addressing the disparity between training data distribution and clinical test data through semi-supervised learning, our approach also mimics human specialists by constructing long view morphological input data. Moreover, we propose an advanced convolution-attention module to extract temporal and spatial features from the input data. LV-CadeNet significantly improves the accuracy of MEG spike detection, boosting it from 42.31% to 54.88% on a novel clinical dataset sourced from Sanbo Brain Hospital Capital Medical University. This dataset, characterized by a highly imbalanced distribution of positive and negative samples, accurately represents real-world clinical scenarios. |
2024-12-11 | Decoding individual words from non-invasive brain recordings across 723 participants | Stéphane d'Ascoli, Corentin Bel, Jérémy Rapin, Hubert Banville, Yohann Benchetrit, Christophe Pallier, Jean-Rémi King | Link | Deep learning has recently enabled the decoding of language from the neural activity of a few participants with electrodes implanted inside their brain. However, reliably decoding words from non-invasive recordings remains an open challenge. To tackle this issue, we introduce a novel deep learning pipeline to decode individual words from non-invasive electro- (EEG) and magneto-encephalography (MEG) signals. We train and evaluate our approach on an unprecedentedly large number of participants (723) exposed to five million words either written or spoken in English, French or Dutch. Our model outperforms existing methods consistently across participants, devices, languages, and tasks, and can decode words absent from the training set. Our analyses highlight the importance of the recording device and experimental protocol: MEG and reading are easier to decode than EEG and listening, respectively, and it is preferable to collect a large amount of data per participant than to repeat stimuli across a large number of participants. Furthermore, decoding performance consistently increases with the amount of (i) data used for training and (ii) data used for averaging during testing. Finally, single-word predictions show that our model effectively relies on word semantics but also captures syntactic and surface properties such as part-of-speech, word length and even individual letters, especially in the reading condition. Overall, our findings delineate the path and remaining challenges towards building non-invasive brain decoders for natural language. |
2024-12-06 | Measuring Goal-Directedness | Matt MacDermott, James Fox, Francesco Belardinelli, Tom Everitt | Link | We define maximum entropy goal-directedness (MEG), a formal measure of goal-directedness in causal models and Markov decision processes, and give algorithms for computing it. Measuring goal-directedness is important, as it is a critical element of many concerns about harm from AI. It is also of philosophical interest, as goal-directedness is a key aspect of agency. MEG is based on an adaptation of the maximum causal entropy framework used in inverse reinforcement learning. It can measure goal-directedness with respect to a known utility function, a hypothesis class of utility functions, or a set of random variables. We prove that MEG satisfies several desiderata and demonstrate our algorithms with small-scale experiments. |
2024-11-29 | Neuroplasticity and Psychedelics: a comprehensive examination of classic and non-classic compounds in pre and clinical models | Claudio Agnorelli, Meg Spriggs, Kate Godfrey, Gabriela Sawicka, Bettina Bohl, Hannah Douglass, Andrea Fagiolini, Hashemi Parastoo, Robin Carhart-Harris, David Nutt, David Erritzoe | Link | Neuroplasticity, the ability of the nervous system to adapt throughout an organism's lifespan, offers potential as both a biomarker and treatment target for neuropsychiatric conditions. Psychedelics, a burgeoning category of drugs, are increasingly prominent in psychiatric research, prompting inquiries into their mechanisms of action. Distinguishing themselves from traditional medications, psychedelics demonstrate rapid and enduring therapeutic effects after a single or few administrations, believed to stem from their neuroplasticity-enhancing properties. This review examines how classic psychedelics (e.g., LSD, psilocybin, N,N-DMT) and non-classic psychedelics (e.g., ketamine, MDMA) influence neuroplasticity. Drawing from preclinical and clinical studies, we explore the molecular, structural, and functional changes triggered by these agents. Animal studies suggest psychedelics induce heightened sensitivity of the nervous system to environmental stimuli (meta-plasticity), re-opening developmental windows for long-term structural changes (hyper-plasticity), with implications for mood and behavior. Translating these findings to humans faces challenges due to limitations in current imaging techniques. Nonetheless, promising new directions for human research are emerging, including the employment of novel positron-emission tomography (PET) radioligands, non-invasive brain stimulation methods, and multimodal approaches. By elucidating the interplay between psychedelics and neuroplasticity, this review informs the development of targeted interventions for neuropsychiatric disorders and advances understanding of psychedelics' therapeutic potential. |
2024-11-29 | On Monitoring Edge-Geodetic Sets of Dynamic Graph | Zin Mar Myint, Ashish Saxena | Link | The concept of a monitoring edge-geodetic set (MEG-set) in a graph |
2024-11-14 | Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion | Matteo Ferrante, Tommaso Boccato, Grigorii Rashkov, Nicola Toschi | Link | This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representations of brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities. The results highlight the model's ability to accurately capture semantic information across different brain imaging techniques, illustrating its potential in decoding, encoding, and modality conversion tasks. |
2024-11-12 | Search for the X17 particle in | The MEG II collaboration, K. Afanaciev, A. M. Baldini, S. Ban, H. Benmansour, G. Boca, P. W. Cattaneo, G. Cavoto, F. Cei, M. Chiappini, A. Corvaglia, G. Dal Maso, A. De Bari, M. De Gerone, L. Ferrari Barusso, M. Francesconi, L. Galli, G. Gallucci, F. Gatti, L. Gerritzen, F. Grancagnolo, E. G. Grandoni, M. Grassi, D. N. Grigoriev, M. Hildebrandt, F. Ignatov, F. Ikeda, T. Iwamoto, S. Karpov, P. -R. Kettle, N. Khomutov, A. Kolesnikov, N. Kravchuk, V. Krylov, N. Kuchinskiy, F. Leonetti, W. Li, V. Malyshev, A. Matsushita, M. Meucci, S. Mihara, W. Molzon, T. Mori, D. Nicolò, H. Nishiguchi, A. Ochi, W. Ootani, A. Oya, D. Palo, M. Panareo, A. Papa, V. Pettinacci, A. Popov, F. Renga, S. Ritt, M. Rossella, A. Rozhdestvensky, S. Scarpellini, P. Schwendimann, G. Signorelli, M. Takahashi, Y. Uchiyama, A. Venturini, B. Vitali, C. Voena, K. Yamamoto, R. Yokota, T. Yonemoto | Link | The observation of a resonance structure in the opening angle of the electron-positron pairs in the $^{7}$Li(p,e$^+$e$^-$)$^{8}$Be reaction was claimed and interpreted as the production and subsequent decay of a hypothetical particle (X17). Similar excesses, consistent with this particle, were later observed in processes involving $^{4}$He and $^{12}$C nuclei with the same experimental technique. The MEG II apparatus at PSI, designed to search for the
2024-11-07 | MEG: Medical Knowledge-Augmented Large Language Models for Question Answering | Laura Cabello, Carmen Martin-Turrero, Uchenna Akujuobi, Anders Søgaard, Carlos Bobed | Link | Question answering is a natural language understanding task that involves reasoning over both explicit context and unstated, relevant domain knowledge. Large language models (LLMs), which underpin most contemporary question answering systems, struggle to induce how concepts relate in specialized domains such as medicine. Existing medical LLMs are also costly to train. In this work, we present MEG, a parameter-efficient approach for medical knowledge-augmented LLMs. MEG uses a lightweight mapping network to integrate graph embeddings into the LLM, enabling it to leverage external knowledge in a cost-effective way. We evaluate our method on four popular medical multiple-choice datasets and show that LLMs greatly benefit from the factual grounding provided by knowledge graph embeddings. MEG attains an average of +10.2% accuracy over the Mistral-Instruct baseline, and +6.7% over specialized models like BioMistral. We also show results based on Llama-3. Finally, we show that MEG's performance remains robust to the choice of graph encoder. |
2024-10-30 | STIED: A deep learning model for the SpatioTemporal detection of focal Interictal Epileptiform Discharges with MEG | Raquel Fernández-Martín, Alfonso Gijón, Odile Feys, Elodie Juvené, Alec Aeby, Charline Urbain, Xavier De Tiège, Vincent Wens | Link | Magnetoencephalography (MEG) allows the non-invasive detection of interictal epileptiform discharges (IEDs). Clinical MEG analysis in epileptic patients traditionally relies on the visual identification of IEDs, which is time consuming and partially subjective. Automatic, data-driven detection methods exist but show limited performance. Still, the rise of deep learning (DL), with its ability to reproduce human-like abilities, could revolutionize clinical MEG practice. Here, we developed and validated STIED, a simple yet powerful supervised DL algorithm combining two convolutional neural networks with temporal (1D time-course) and spatial (2D topography) features of MEG signals inspired from current clinical guidelines. Our DL model enabled both temporal and spatial localization of IEDs in patients suffering from focal epilepsy with frequent and high amplitude spikes (FE group), with high-performance metrics (accuracy, specificity, and sensitivity all exceeding 85%) when learning from spatiotemporal features of IEDs. This performance can be attributed to our handling of input data, which mimics established clinical MEG practice. Reverse engineering further revealed that STIED encodes fine spatiotemporal features of IEDs rather than their mere amplitude. The model trained on the FE group also showed promising results when applied to a separate group of presurgical patients with different types of refractory focal epilepsy, though further work is needed to distinguish IEDs from physiological transients. This study paves the way for incorporating STIED and DL algorithms into the routine clinical MEG evaluation of epilepsy. |
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-11-27 | NeuroAI for AI Safety | Patrick Mineault, Niccolò Zanichelli, Joanne Zichen Peng, Anton Arkhipov, Eli Bingham, Julian Jara-Ettinger, Emily Mackevicius, Adam Marblestone, Marcelo Mattar, Andrew Payne, Sophia Sanborn, Karen Schroeder, Zenna Tavares, Andreas Tolias | Link | As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviate significantly from prior experiences, explore the world safely, understand pragmatics, and can cooperate to meet their intrinsic goals. Intelligence, when coupled with cooperation and safety mechanisms, can drive sustained progress and well-being. These properties are a function of the architecture of the brain and the learning algorithms it implements. Neuroscience may thus hold important keys to technical AI safety that are currently underexplored and underutilized. In this roadmap, we highlight and critically evaluate several paths toward AI safety inspired by neuroscience: emulating the brain's representations, information processing, and architecture; building robust sensory and motor systems from imitating brain data and bodies; fine-tuning AI systems on brain data; advancing interpretability using neuroscience methods; and scaling up cognitively-inspired architectures. We make several concrete recommendations for how neuroscience can positively impact AI safety. |
2024-11-21 | Evaluating Representational Similarity Measures from the Lens of Functional Correspondence | Yiqing Bo, Ansh Soni, Sudhanshu Srivastava, Meenakshi Khosla | Link | Neuroscience and artificial intelligence (AI) both face the challenge of interpreting high-dimensional neural data, where the comparative analysis of such data is crucial for revealing shared mechanisms and differences between these complex systems. Despite the widespread use of representational comparisons and the abundance classes of comparison methods, a critical question remains: which metrics are most suitable for these comparisons? While some studies evaluate metrics based on their ability to differentiate models of different origins or constructions (e.g., various architectures), another approach is to assess how well they distinguish models that exhibit distinct behaviors. To investigate this, we examine the degree of alignment between various representational similarity measures and behavioral outcomes, employing group statistics and a comprehensive suite of behavioral metrics for comparison. In our evaluation of eight commonly used representational similarity metrics in the visual domain -- spanning alignment-based, Canonical Correlation Analysis (CCA)-based, inner product kernel-based, and nearest-neighbor methods -- we found that metrics like linear Centered Kernel Alignment (CKA) and Procrustes distance, which emphasize the overall geometric structure or shape of representations, excelled in differentiating trained from untrained models and aligning with behavioral measures, whereas metrics such as linear predictivity, commonly used in neuroscience, demonstrated only moderate alignment with behavior. These insights are crucial for selecting metrics that emphasize behaviorally meaningful comparisons in NeuroAI research. |
2024-10-25 | A prescriptive theory for brain-like inference | Hadi Vafaii, Dekel Galor, Jacob L. Yates | Link | The Evidence Lower Bound (ELBO) is a widely used objective for training deep generative models, such as Variational Autoencoders (VAEs). In the neuroscience literature, an identical objective is known as the variational free energy, hinting at a potential unified framework for brain function and machine learning. Despite its utility in interpreting generative models, including diffusion models, ELBO maximization is often seen as too broad to offer prescriptive guidance for specific architectures in neuroscience or machine learning. In this work, we show that maximizing ELBO under Poisson assumptions for general sequence data leads to a spiking neural network that performs Bayesian posterior inference through its membrane potential dynamics. The resulting model, the iterative Poisson VAE (iP-VAE), has a closer connection to biological neurons than previous brain-inspired predictive coding models based on Gaussian assumptions. Compared to amortized and iterative VAEs, iP-VAE learns sparser representations and exhibits superior generalization to out-of-distribution samples. These findings suggest that optimizing ELBO, combined with Poisson assumptions, provides a solid foundation for developing prescriptive theories in NeuroAI. |
2024-09-09 | Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models | Emily Cheng, Richard J. Antonello | Link | Research has repeatedly demonstrated that intermediate hidden states extracted from large language models are able to predict measured brain response to natural language stimuli. Yet, very little is known about the representation properties that enable this high prediction performance. Why is it the intermediate layers, and not the output layers, that are most capable for this unique and highly general transfer task? In this work, we show that evidence from language encoding models in fMRI supports the existence of a two-phase abstraction process within LLMs. We use manifold learning methods to show that this abstraction process naturally arises over the course of training a language model and that the first "composition" phase of this abstraction process is compressed into fewer layers as training continues. Finally, we demonstrate a strong correspondence between layerwise encoding performance and the intrinsic dimensionality of representations from LLMs. We give initial evidence that this correspondence primarily derives from the inherent compositionality of LLMs and not their next-word prediction properties. |
2024-07-22 | Predictive Coding Networks and Inference Learning: Tutorial and Survey | Björn van Zwol, Ro Jefferson, Egon L. van den Broek | Link | Recent years have witnessed a growing call for renewed emphasis on neuroscience-inspired approaches in artificial intelligence research, under the banner of NeuroAI. A prime example of this is predictive coding networks (PCNs), based on the neuroscientific framework of predictive coding. This framework views the brain as a hierarchical Bayesian inference model that minimizes prediction errors through feedback connections. Unlike traditional neural networks trained with backpropagation (BP), PCNs utilize inference learning (IL), a more biologically plausible algorithm that explains patterns of neural activity that BP cannot. Historically, IL has been more computationally intensive, but recent advancements have demonstrated that it can achieve higher efficiency than BP with sufficient parallelization. Furthermore, PCNs can be mathematically considered a superset of traditional feedforward neural networks (FNNs), significantly extending the range of trainable architectures. As inherently probabilistic (graphical) latent variable models, PCNs provide a versatile framework for both supervised learning and unsupervised (generative) modeling that goes beyond traditional artificial neural networks. This work provides a comprehensive review and detailed formal specification of PCNs, particularly situating them within the context of modern ML methods. Additionally, we introduce a Python library (PRECO) for practical implementation. This positions PC as a promising framework for future ML innovations. |
2023-10-29 | Beyond Geometry: Comparing the Temporal Structure of Computation in Neural Circuits with Dynamical Similarity Analysis | Mitchell Ostrow, Adam Eisen, Leo Kozachkov, Ila Fiete | Link | How can we tell whether two neural networks utilize the same internal processes for a particular computation? This question is pertinent for multiple subfields of neuroscience and machine learning, including neuroAI, mechanistic interpretability, and brain-machine interfaces. Standard approaches for comparing neural networks focus on the spatial geometry of latent states. Yet in recurrent networks, computations are implemented at the level of dynamics, and two networks performing the same computation with equivalent dynamics need not exhibit the same geometry. To bridge this gap, we introduce a novel similarity metric that compares two systems at the level of their dynamics, called Dynamical Similarity Analysis (DSA). Our method incorporates two components: Using recent advances in data-driven dynamical systems theory, we learn a high-dimensional linear system that accurately captures core features of the original nonlinear dynamics. Next, we compare different systems passed through this embedding using a novel extension of Procrustes Analysis that accounts for how vector fields change under orthogonal transformation. In four case studies, we demonstrate that our method disentangles conjugate and non-conjugate recurrent neural networks (RNNs), while geometric methods fall short. We additionally show that our method can distinguish learning rules in an unsupervised manner. Our method opens the door to comparative analyses of the essential temporal structure of computation in neural circuits. |
2023-05-25 | Explaining V1 Properties with a Biologically Constrained Deep Learning Architecture | Galen Pogoncheff, Jacob Granley, Michael Beyeler | Link | Convolutional neural networks (CNNs) have recently emerged as promising models of the ventral visual stream, despite their lack of biological specificity. While current state-of-the-art models of the primary visual cortex (V1) have surfaced from training with adversarial examples and extensively augmented data, these models are still unable to explain key neural properties observed in V1 that arise from biological circuitry. To address this gap, we systematically incorporated neuroscience-derived architectural components into CNNs to identify a set of mechanisms and architectures that comprehensively explain neural activity in V1. We show drastic improvements in model-V1 alignment driven by the integration of architectural components that simulate center-surround antagonism, local receptive fields, tuned normalization, and cortical magnification. Upon enhancing task-driven CNNs with a collection of these specialized components, we uncover models with latent representations that yield state-of-the-art explanation of V1 neural activity and tuning properties. Our results highlight an important advancement in the field of NeuroAI, as we systematically establish a set of architectural components that contribute to unprecedented explanation of V1. The neuroscience insights that could be gleaned from increasingly accurate in-silico models of the brain have the potential to greatly advance the fields of both neuroscience and artificial intelligence. |
2024-11-09 | A Deep Probabilistic Spatiotemporal Framework for Dynamic Graph Representation Learning with Application to Brain Disorder Identification | Sin-Yee Yap, Junn Yong Loo, Chee-Ming Ting, Fuad Noman, Raphael C. -W. Phan, Adeel Razi, David L. Dowe | Link | Recent applications of pattern recognition techniques on brain connectome classification using functional connectivity (FC) are shifting towards acknowledging the non-Euclidean topology and dynamic aspects of brain connectivity across time. In this paper, a deep spatiotemporal variational Bayes (DSVB) framework is proposed to learn time-varying topological structures in dynamic FC networks for identifying autism spectrum disorder (ASD) in human participants. The framework incorporates a spatial-aware recurrent neural network with an attention-based message passing scheme to capture rich spatiotemporal patterns across dynamic FC networks. To overcome model overfitting on limited training datasets, an adversarial training strategy is introduced to learn graph embedding models that generalize well to unseen brain networks. Evaluation on the ABIDE resting-state functional magnetic resonance imaging dataset shows that our proposed framework substantially outperforms state-of-the-art methods in identifying patients with ASD. Dynamic FC analyses with DSVB-learned embeddings reveal apparent group differences between ASD and healthy controls in brain network connectivity patterns and switching dynamics of brain states. The code is available at https://github.com/Monash-NeuroAI/Deep-Spatiotemporal-Variational-Bayes. |
2023-03-11 | Towards NeuroAI: Introducing Neuronal Diversity into Artificial Neural Networks | Feng-Lei Fan, Yingxin Li, Hanchuan Peng, Tieyong Zeng, Fei Wang | Link | Throughout history, the development of artificial intelligence, particularly artificial neural networks, has been open to and constantly inspired by the increasingly deepened understanding of the brain, such as the inspiration of neocognitron, which is the pioneering work of convolutional neural networks. Per the motives of the emerging field: NeuroAI, a great amount of neuroscience knowledge can help catalyze the next generation of AI by endowing a network with more powerful capabilities. As we know, the human brain has numerous morphologically and functionally different neurons, while artificial neural networks are almost exclusively built on a single neuron type. In the human brain, neuronal diversity is an enabling factor for all kinds of biological intelligent behaviors. Since an artificial network is a miniature of the human brain, introducing neuronal diversity should be valuable in terms of addressing those essential problems of artificial networks such as efficiency, interpretability, and memory. In this Primer, we first discuss the preliminaries of biological neuronal diversity and the characteristics of information transmission and processing in a biological neuron. Then, we review studies of designing new neurons for artificial networks. Next, we discuss what gains can neuronal diversity bring into artificial networks and exemplary applications in several important fields. Lastly, we discuss the challenges and future directions of neuronal diversity to explore the potential of NeuroAI. |
2022-12-08 | A Rubric for Human-like Agents and NeuroAI | Ida Momennejad | Link | Researchers across cognitive, neuro-, and computer sciences increasingly reference human-like artificial intelligence and neuroAI. However, the scope and use of the terms are often inconsistent. Contributed research ranges widely from mimicking behaviour, to testing machine learning methods as neurally plausible hypotheses at the cellular or functional levels, or solving engineering problems. However, it cannot be assumed nor expected that progress on one of these three goals will automatically translate to progress in others. Here a simple rubric is proposed to clarify the scope of individual contributions, grounded in their commitments to human-like behaviour, neural plausibility, or benchmark/engineering goals. This is clarified using examples of weak and strong neuroAI and human-like agents, and discussing the generative, corroborate, and corrective ways in which the three dimensions interact with one another. The author maintains that future progress in artificial intelligence will need strong interactions across the disciplines, with iterative feedback loops and meticulous validity tests, leading to both known and yet-unknown advances that may span decades to come. |
Publish Date | Title | Authors | URL | Abstract |
---|---|---|---|---|
2024-12-24 | ClassifyViStA:WCE Classification with Visual understanding through Segmentation and Attention | S. Balasubramanian, Ammu Abhishek, Yedu Krishna, Darshan Gera | Link | Gastrointestinal (GI) bleeding is a serious medical condition that presents significant diagnostic challenges, particularly in settings with limited access to healthcare resources. Wireless Capsule Endoscopy (WCE) has emerged as a powerful diagnostic tool for visualizing the GI tract, but it requires time-consuming manual analysis by experienced gastroenterologists, which is prone to human error and inefficient given the increasing number of patients. To address this challenge, we propose ClassifyViStA, an AI-based framework designed for the automated detection and classification of bleeding and non-bleeding frames from WCE videos. The model consists of a standard classification path, augmented by two specialized branches: an implicit attention branch and a segmentation branch. The attention branch focuses on the bleeding regions, while the segmentation branch generates accurate segmentation masks, which are used for classification and interpretability. The model is built upon an ensemble of ResNet18 and VGG16 architectures to enhance classification performance. For the bleeding region detection, we implement a Soft Non-Maximum Suppression (Soft NMS) approach with YOLOv8, which improves the handling of overlapping bounding boxes, resulting in more accurate and nuanced detections. The system's interpretability is enhanced by using the segmentation masks to explain the classification results, offering insights into the decision-making process similar to the way a gastroenterologist identifies bleeding regions. Our approach not only automates the detection of GI bleeding but also provides an interpretable solution that can ease the burden on healthcare professionals and improve diagnostic efficiency. Our code is available at ClassifyViStA. |
2024-12-24 | FedVCK: Non-IID Robust and Communication-Efficient Federated Learning via Valuable Condensed Knowledge for Medical Image Analysis | Guochen Yan, Luyuan Xie, Xinyi Gao, Wentao Zhang, Qingni Shen, Yuejian Fang, Zhonghai Wu | Link | Federated learning has become a promising solution for collaboration among medical institutions. However, data owned by each institution would be highly heterogeneous and the distribution is always non-independent and identical distribution (non-IID), resulting in client drift and unsatisfactory performance. Despite existing federated learning methods attempting to solve the non-IID problems, they still show marginal advantages but rely on frequent communication which would incur high costs and privacy concerns. In this paper, we propose a novel federated learning method: **Fed**erated learning via **V**aluable **C**ondensed **K**nowledge (FedVCK). We enhance the quality of condensed knowledge and select the most necessary knowledge guided by models, to tackle the non-IID problem within limited communication budgets effectively. Specifically, on the client side, we condense the knowledge of each client into a small dataset and further enhance the condensation procedure with latent distribution constraints, facilitating the effective capture of high-quality knowledge. During each round, we specifically target and condense knowledge that has not been assimilated by the current model, thereby preventing unnecessary repetition of homogeneous knowledge and minimizing the frequency of communications required. On the server side, we propose relational supervised contrastive learning to provide more supervision signals to aid the global model updating. Comprehensive experiments across various medical tasks show that FedVCK can outperform state-of-the-art methods, demonstrating that it's non-IID robust and communication-efficient. |
2024-12-24 | Post-pandemic social contacts in Italy: implications for social distancing measures on in-person school and work attendance | Lorenzo Lucchini, Valentina Marziano, Filippo Trentini, Chiara Chiavenna, Elena D'Agnese, Vittoria Offeddu, Mattia Manica, Piero Poletti, Duilio Balsamo, Giorgio Guzzetta, Marco Aielli, Alessia Melegaro, Stefano Merler | Link | The collection of updated data on social contact patterns following the COVID-19 pandemic disruptions is crucial for future epidemiological assessments and evaluating non-pharmaceutical interventions (NPIs) based on physical distancing. We conducted two waves of an online survey in March 2022 and March 2023 in Italy, gathering data from a representative population sample on direct (verbal/physical interactions) and indirect (prolonged co-location in indoor spaces) contacts. Using a generalized linear mixed model, we examined determinants of individuals' total social contacts and evaluated the potential impact of work-from-home and distance learning on the transmissibility of respiratory pathogens. In-person attendance at work or school emerged as a primary driver of social contacts. Adults attending in person reported a mean of 1.69 (95% CI: 1.56-1.84) times the contacts of those staying home; among children and adolescents, this ratio increased to 2.38 (95% CI: 1.98-2.87). We estimated that suspending all non-essential work alone would marginally reduce transmissibility. However, combining distance learning for all education levels with work-from-home policies could decrease transmissibility by up to 23.7% (95% CI: 18.2%-29.0%). Extending these measures to early childcare services would yield only minimal additional benefits. These results provide useful data for modelling the transmission of respiratory pathogens in Italy after the end of the COVID-19 emergency. They also provide insights into the potential epidemiological effectiveness of social distancing interventions targeting work and school attendance, supporting considerations on the balance between the expected benefits and their heavy societal costs. |
2024-12-24 | Advancing Deformable Medical Image Registration with Multi-axis Cross-covariance Attention | Mingyuan Meng, Michael Fulham, Lei Bi, Jinman Kim | Link | Deformable image registration is a fundamental requirement for medical image analysis. Recently, transformers have been widely used in deep learning-based registration methods for their ability to capture long-range dependency via self-attention (SA). However, the high computation and memory loads of SA (growing quadratically with the spatial resolution) hinder transformers from processing subtle textural information in high-resolution image features, e.g., at the full and half image resolutions. This limits deformable registration as the high-resolution textural information is crucial for finding precise pixel-wise correspondence between subtle anatomical structures. Cross-covariance Attention (XCA), as a "transposed" version of SA that operates across feature channels, has complexity growing linearly with the spatial resolution, providing the feasibility of capturing long-range dependency among high-resolution image features. However, existing XCA-based transformers merely capture coarse global long-range dependency, which are unsuitable for deformable image registration relying primarily on fine-grained local correspondence. In this study, we propose to improve existing deep learning-based registration methods by embedding a new XCA mechanism. To this end, we design an XCA-based transformer block optimized for deformable medical image registration, named Multi-Axis XCA (MAXCA). Our MAXCA serves as a general network block that can be embedded into various registration network architectures. It can capture both global and local long-range dependency among high-resolution image features by applying regional and dilated XCA in parallel via a multi-axis design. Extensive experiments on two well-benchmarked inter-/intra-patient registration tasks with seven public medical datasets demonstrate that our MAXCA block enables state-of-the-art registration performance. |
2024-12-24 | Multi-Agent Norm Perception and Induction in Distributed Healthcare | Chao Li, Olga Petruchik, Elizaveta Grishanina, Sergey Kovalchuk | Link | This paper presents a Multi-Agent Norm Perception and Induction Learning Model aimed at facilitating the integration of autonomous agent systems into distributed healthcare environments through dynamic interaction processes. The nature of the medical norm system and its sharing channels necessitates distinct approaches for Multi-Agent Systems to learn two types of norms. Building on this foundation, the model enables agents to simultaneously learn descriptive norms, which capture collective tendencies, and prescriptive norms, which dictate ideal behaviors. Through parameterized mixed probability density models and practice-enhanced Markov games, the multi-agent system perceives descriptive norms in dynamic interactions and captures emergent prescriptive norms. We conducted experiments using a dataset from a neurological medical center spanning from 2016 to 2020. |
2024-12-24 | Agreement of Image Quality Metrics with Radiological Evaluation in the Presence of Motion Artifacts | Elisa Marchetto, Hannah Eichhorn, Daniel Gallichan, Julia A. Schnabel, Melanie Ganz | Link | Purpose: Reliable image quality assessment is crucial for evaluating new motion correction methods for magnetic resonance imaging. In this work, we compare the performance of commonly used reference-based and reference-free image quality metrics on a unique dataset with real motion artifacts. We further analyze the image quality metrics' robustness to typical pre-processing techniques. Methods: We compared five reference-based and five reference-free image quality metrics on data acquired with and without intentional motion (2D and 3D sequences). The metrics were recalculated seven times with varying pre-processing steps. The anonymized images were rated by radiologists and radiographers on a 1-5 Likert scale. Spearman correlation coefficients were computed to assess the relationship between image quality metrics and observer scores. Results: All reference-based image quality metrics showed strong correlation with observer assessments, with minor performance variations across sequences. Among reference-free metrics, Average Edge Strength offers the most promising results, as it consistently displayed stronger correlations across all sequences compared to the other reference-free metrics. Overall, the strongest correlation was achieved with percentile normalization and restricting the metric values to the skull-stripped brain region. In contrast, correlations were weaker when not applying any brain mask and using min-max or no normalization. Conclusion: Reference-based metrics reliably correlate with radiological evaluation across different sequences and datasets. Pre-processing steps, particularly normalization and brain masking, significantly influence the correlation values. Future research should focus on refining pre-processing techniques and exploring machine learning approaches for automated image quality evaluation. |
2024-12-24 | Unveiling the Threat of Fraud Gangs to Graph Neural Networks: Multi-Target Graph Injection Attacks against GNN-Based Fraud Detectors | Jinhyeok Choi, Heehyeon Kim, Joyce Jiyoung Whang | Link | Graph neural networks (GNNs) have emerged as an effective tool for fraud detection, identifying fraudulent users, and uncovering malicious behaviors. However, attacks against GNN-based fraud detectors and their risks have rarely been studied, thereby leaving potential threats unaddressed. Recent findings suggest that frauds are increasingly organized as gangs or groups. In this work, we design attack scenarios where fraud gangs aim to make their fraud nodes misclassified as benign by camouflaging their illicit activities in collusion. Based on these scenarios, we study adversarial attacks against GNN-based fraud detectors by simulating attacks of fraud gangs in three real-world fraud cases: spam reviews, fake news, and medical insurance frauds. We define these attacks as multi-target graph injection attacks and propose MonTi, a transformer-based Multi-target one-Time graph injection attack model. MonTi simultaneously generates attributes and edges of all attack nodes with a transformer encoder, capturing interdependencies between attributes and edges more effectively than most existing graph injection attack methods that generate these elements sequentially. Additionally, MonTi adaptively allocates the degree budget for each attack node to explore diverse injection structures involving target, candidate, and attack nodes, unlike existing methods that fix the degree budget across all attack nodes. Experiments show that MonTi outperforms the state-of-the-art graph injection attack methods on five real-world graphs. |
2024-12-24 | On the improved performances of FLUKA v4-4.0 in out-of-field proton dosimetry | Alexandra-Gabriela Şerban, Juan Alejandro de la Torre González, Marta Anguiano, Antonio M. Lallena, Francesc Salvat-Pujol | Link | A new model for the nuclear elastic scattering of protons below 250 MeV has been recently included in FLUKA v4-4.0, motivated by the evaluation of radiation effects in electronics. Nonetheless, proton nuclear elastic scattering plays a significant role also in proton dosimetry applications, for which the new model necessitated an explicit validation. Therefore, in this work a benchmark has been carried out against a recent measurement of radial-depth maps of absorbed dose in a water phantom under irradiation with protons of 100 MeV, 160 MeV, and 225 MeV. Two FLUKA versions have been employed to simulate these dose maps: v4-3.4, relying on a legacy model for proton nuclear elastic scattering, and v4-4.0, relying on the new model. The enhanced agreement with experimental absorbed doses obtained with FLUKA v4-4.0 is discussed, and the role played by proton nuclear elastic scattering, among other interaction mechanisms, in various regions of the radial-depth dose map is elucidated. Finally, the benchmark reported in this work is sensitive enough to showcase the importance of accurately characterizing beam parameters and the scattering geometry for Monte Carlo simulation purposes. |
2024-12-24 | Alleviating the trade-off between coincidence time resolution and sensitivity using scalable TOF-DOI detectors | Yuya Onishi, Ryosuke Ota | Link | Coincidence time resolution (CTR) in time-of-flight positron emission tomography (TOF-PET) has significantly improved with advancements in scintillators, photodetectors, and readout electronics. Achieving a CTR of 100 ps remains challenging due to the need for sufficiently thick scintillators (typically 20 mm) to ensure adequate sensitivity, because the photon transit time spread within these thick scintillators impedes achieving 100 ps CTR. Therefore, thinner scintillators are preferable for CTR better than 100 ps. To address the trade-off between TOF capability and sensitivity, we propose a readout scheme of PET detectors. The proposed scheme utilizes two orthogonally stacked one-dimensional PET detectors, enabling the thickness of the scintillators to be reduced to approximately 13 mm without compromising sensitivity. This is achieved by stacking the detectors along the depth-of-interaction (DOI) axis of a PET scanner. We refer to this design as the cross-stacked detector, or xDetector. Furthermore, the xDetector inherently provides DOI information using the same readout scheme. Experimental evaluations demonstrated that the xDetector achieved a CTR of 175 ps FWHM and an energy resolution of 11% FWHM at 511 keV with 3 × 3 × 12.8 mm³ lutetium oxyorthosilicate crystals, each coupled one-to-one with silicon photomultipliers. In terms of xy-spatial resolution, the xDetector exhibited an asymmetric resolution due to its readout scheme: one resolution was defined by the 3.2 mm readout pitch, while the other was calculated using the center-of-gravity method. The xDetector effectively resolves the trade-off between TOF capability and sensitivity while offering scalability and DOI capability. By integrating state-of-the-art scintillators, photodetectors, and readout electronics with the xDetector scheme, achieving a CTR of 100 ps FWHM alongside high DOI resolution becomes a practical possibility. |
2024-12-24 | An AI-directed analytical study on the optical transmission microscopic images of Pseudomonas aeruginosa in planktonic and biofilm states | Bidisha Sengupta, Mousa Alrubayan, Yibin Wang, Esther Mallet, Angel Torres, Ravyn Solis, Haifeng Wang, Prabhakar Pradhan | Link | Biofilms are resistant microbial cell aggregates that pose risks to health and food industries and produce environmental contamination. Accurate and efficient detection and prevention of biofilms are challenging and demand interdisciplinary approaches. This multidisciplinary research reports the application of a deep learning-based artificial intelligence (AI) model for detecting biofilms produced by Pseudomonas aeruginosa with high accuracy. Aptamer DNA templated silver nanocluster (Ag-NC) was used to prevent biofilm formation, which produced images of the planktonic states of the bacteria. Large-volume bright field images of bacterial biofilms were used to design the AI model. In particular, we used U-Net with ResNet encoder enhancement to segment biofilm images for AI analysis. Different degrees of biofilm structures can be efficiently detected using ResNet18 and ResNet34 backbones. The potential applications of this technique are also discussed. |