Project Summary / Abstract

Autism spectrum disorder (ASD) is a developmental disorder that affects 1 in 54 children in the US (1). The economic cost of ASD is estimated to be $66 billion per year in the US from medical care and lost parental productivity (2). Early diagnosis is crucial since it allows for early treatment and the best long-term outcome. However, identifying children at high risk for ASD at an early age is challenging due to a lack of specialists. To address this problem, the project's objective is to create health information technology (HIT) that uses information in electronic health records (EHR) to support non-expert clinicians in identifying children at high risk for ASD.

The HIT will integrate two components that provide complementary information. The first component will leverage machine learning algorithms to label EHR of children at high risk for autism. Both traditional and deep learning, potentially leveraging each other, will be evaluated while systematically tracking the quality and quantity of information in EHR and their effect on performance. The second component will focus on the EHR free text and identify phenotypic behavioral expressions of diagnostic criteria as defined in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Rule-based natural language processing will be combined with machine learning algorithms. For both components, potential algorithm bias will be investigated and corrected, or documented when correction is not possible. The HIT will combine results from both components through an intuitive user interface. Since it is intended to be used as a human-in-the-loop decision tool, it will also provide descriptive data on performance for both components. The final HIT will be developed using rapid prototyping in interaction with domain experts.
It will be evaluated in a user study with representative non-expert clinicians.

The evaluation will compare the accuracy, confidence, and efficiency with which non-ASD experts identify children at risk for ASD with and without the HIT. It will also systematically focus on the type, amount, quality, and transparency of information provided and how this interacts with user beliefs about their own expertise as well as their bias toward machine decisions. Different types of EHR as well as different levels of clinical expertise will be compared for effects of HIT use.

Preliminary work has been conducted for all components with good results. However, this prior work focused on version IV of the DSM and used only free text from data-rich EHR. The proposed project will expand the prior work to use DSM-5 criteria, train and develop the algorithms to use structured and unstructured fields in clinically representative EHR, and work with EHR from different hospitals to evaluate potential obstacles and advantages of variability in data.

Using information in EHR, this HIT will provide support, especially for non-expert clinicians, in their evaluation of children who may be at risk of ASD. The HIT will support early referrals, leading to early diagnosis and therapy. It will be useful in a variety of settings where domain expertise is missing.