Skin Tone Disentanglement in 2D Makeup Transfer With Graph Neural Networks


Masoud Mokhtari, Fatemeh Taheri Dezaki, Timo Bolkart, Betty Mohler Tesch,
Rahul Suresh, Amin Banitalebi-Dehkordi

Abstract: Makeup transfer involves transferring makeup from a reference image to a target image while maintaining the target’s identity. Existing methods, which use Generative Adversarial Networks, often transfer not just makeup but also the reference image’s skin tone. This limits their use to similar skin tones and introduces bias. Our solution introduces a skin tone-robust makeup embedding achieved by augmenting the reference image with varied skin tones. Using Graph Neural Networks, we establish connections between target, reference, and augmented images to create this robust representation that preserves the target’s skin tone. In a user study, our approach outperformed other methods 66% of the time, showcasing its resilience to skin tone variations.

Graph Neural Networks and Transformers for Enhanced Explainability and Generalizability in Medical Machine Learning
MASc Thesis


Masoud Mokhtari

Supervisors: Drs. Purang Abolmaesumi and Renjie Liao

Abstract: Machine learning models have great potential for medical applications, but they must be explainable, flexible, and work well even with limited data. This thesis proposes three approaches to tackle these challenges using graph neural networks and Transformers, which are among today’s most advanced machine learning techniques. The first approach makes predictions more explainable through a learned graph structure, using the example of estimating heart function from echocardiogram videos. The second approach improves the model’s ability to work with limited data by using a hierarchical structure, demonstrated through detecting clinical landmarks in echocardiograms. The final approach offers a flexible framework that can be easily modified for different clinical tasks while maintaining explainability. Together, these approaches make machine learning models more practical for real-world clinical applications: they provide explainability, work with limited data, and can be tailored to different tasks.

GEMTrans: A General, Echocardiography-based, Multi-Level Transformer Framework for Cardiovascular Diagnosis


Masoud Mokhtari, Neda Ahmadi, Teresa S. M. Tsang, Purang Abolmaesumi, Renjie Liao

Best Paper Award

Abstract: Echocardiography (echo) is an ultrasound imaging modality that is widely used for various cardiovascular diagnosis tasks. Due to inter-observer variability in echo-based diagnosis, which arises from the variability in echo image acquisition and the interpretation of echo images based on clinical experience, vision-based machine learning (ML) methods have gained popularity as secondary layers of verification. For such safety-critical applications, it is essential for any proposed ML method to present a level of explainability along with good accuracy. In addition, such methods must be able to process several echo videos obtained from various heart views, and the interactions among them, to properly produce predictions for a variety of cardiovascular measurements or interpretation tasks. Prior work lacks explainability or is limited in scope by focusing on a single cardiovascular task. To remedy this, we propose a General, Echo-based, Multi-Level Transformer (GEMTrans) framework that provides explainability while enabling multi-video training, where the interplay among echo image patches in the same frame, among frames in the same video, and among videos is captured based on a downstream task. We show the flexibility of our framework by considering two critical tasks: ejection fraction (EF) estimation and aortic stenosis (AS) severity detection. Our model achieves mean absolute errors of 4.15 and 4.84 for single and dual-video EF estimation and an accuracy of 96.5% for AS detection, while providing informative task-specific attention maps and prototypical explainability.


EchoGLAD: Hierarchical Graph Neural Networks for Left Ventricle Landmark Detection on Echocardiograms


Masoud Mokhtari, Mobina Mahdavi, Hooman Vaseli, Christina Luong, Purang Abolmaesumi, Teresa Tsang, Renjie Liao

Abstract: The functional assessment of the left ventricle chamber of the heart requires detecting four landmark locations and measuring the internal dimension of the left ventricle and the approximate mass of the surrounding muscle. The key challenge of automating this task with machine learning is the sparsity of clinical labels, i.e., only a few landmark pixels in a high-dimensional image are annotated, leading many prior works to heavily rely on isotropic label smoothing. However, such a label smoothing strategy ignores the anatomical information of the image and induces some bias. To address this challenge, we introduce an echocardiogram-based, hierarchical graph neural network (GNN) for left ventricle landmark detection (EchoGLAD). Our main contributions are: 1) a hierarchical graph representation learning framework for multi-resolution landmark detection via GNNs; 2) induced hierarchical supervision at different levels of granularity using a multi-level loss. We evaluate our model on a public and a private dataset under the in-distribution (ID) and out-of-distribution (OOD) settings. For the ID setting, we achieve the state-of-the-art mean absolute errors (MAEs) of 1.46 mm and 1.86 mm on the two datasets. Our model also shows better OOD generalization than prior works with a testing MAE of 4.3 mm.
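For readers unfamiliar with the isotropic label smoothing that the abstract contrasts against, the following is a minimal sketch of the general idea: a single annotated landmark pixel is spread into a Gaussian heatmap that decays equally in every direction. It is not code from the paper; the function name `isotropic_heatmap`, the grid size, and the `sigma` value are all assumptions chosen for illustration.

```python
import math


def isotropic_heatmap(height, width, landmark, sigma=2.0):
    """Build a height x width target that decays with distance from `landmark`.

    `landmark` is a (row, col) pixel. `sigma` is a hypothetical smoothing
    width; the same value is used in every direction, which is what makes
    the smoothing isotropic (it ignores the anatomy in the image).
    """
    row0, col0 = landmark
    return [
        [
            math.exp(-((r - row0) ** 2 + (c - col0) ** 2) / (2.0 * sigma ** 2))
            for c in range(width)
        ]
        for r in range(height)
    ]


# Toy example: one landmark at the center of a 5 x 5 image.
heatmap = isotropic_heatmap(5, 5, landmark=(2, 2), sigma=1.0)
```

The target is identical one pixel to the right of and one pixel below the landmark, regardless of what tissue lies there, which is the anatomy-agnostic behavior the abstract describes as a source of bias.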


EchoGNN: Explainable Ejection Fraction Estimation with Graph Neural Networks


Masoud Mokhtari, Teresa Tsang, Purang Abolmaesumi, Renjie Liao

Abstract: Ejection fraction (EF) is a key indicator of cardiac function, allowing identification of patients prone to heart dysfunctions such as heart failure. EF is estimated from cardiac ultrasound videos known as echocardiograms (echo) by manually tracing the left ventricle and estimating its volume on certain frames. These estimations exhibit high inter-observer variability due to the manual process and varying video quality. Such sources of inaccuracy and the need for rapid assessment necessitate reliable and explainable machine learning techniques. In this work, we introduce EchoGNN, a model based on graph neural networks (GNNs) to estimate EF from echo videos. Our model first infers a latent echo-graph from the frames of one or multiple echo cine series. It then estimates weights over nodes and edges of this graph, indicating the importance of individual frames that aid EF estimation. A GNN regressor uses this weighted graph to predict EF. We show, qualitatively and quantitatively, that the learned graph weights provide explainability through identification of critical frames for EF estimation, which can be used to determine when human intervention is required. On the public EchoNet-Dynamic EF dataset, EchoGNN achieves EF prediction performance that is on par with the state of the art and provides explainability, which is crucial given the high inter-observer variability inherent in this task.
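To give a rough feel for how node and edge weights over video frames can shape a graph-level prediction, here is a toy sketch. It is not the EchoGNN architecture: each frame is reduced to a single scalar feature, one round of edge-weighted neighbor averaging stands in for a GNN layer, and a node-weighted mean stands in for the regressor. The function name `weighted_graph_readout` and all numbers are hypothetical.

```python
def weighted_graph_readout(node_features, node_weights, edge_weights):
    """Edge-weighted neighbor averaging, then a node-weighted mean readout.

    node_features: one scalar per frame (a simplification for illustration).
    node_weights:  importance of each frame, assumed to lie in [0, 1].
    edge_weights:  n x n matrix of connection strengths between frames.
    """
    n = len(node_features)
    updated = []
    for i in range(n):
        total = sum(edge_weights[i][j] for j in range(n))
        if total == 0.0:
            # An isolated frame keeps its own feature.
            updated.append(node_features[i])
        else:
            mixed = sum(edge_weights[i][j] * node_features[j] for j in range(n))
            updated.append(mixed / total)

    weight_sum = sum(node_weights)
    if weight_sum == 0.0:
        raise ValueError("at least one frame needs a non-zero weight")
    return sum(w * x for w, x in zip(node_weights, updated)) / weight_sum


# Three frames; the middle frame is given zero importance.
features = [1.0, 3.0, 5.0]
frame_weights = [1.0, 0.0, 1.0]
self_loops_only = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
readout = weighted_graph_readout(features, frame_weights, self_loops_only)
```

In this toy setting, frames with weight zero contribute nothing to the readout, so inspecting the weights shows which frames drove the output. That is the intuition behind using learned weights as an explanation.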
