Zachary Jacokes

I'm a data scientist and neuroscientist who builds scalable, reproducible systems to extract reliable structure from complex, high-dimensional data. I turn noisy clinical and biomedical datasets into interpretable, real-world insights.

Outside of work, I'm a father of two young boys and a basketball and tennis enthusiast. I enjoy thoughtful conversations and welcome opportunities to connect.

About Me

I used to think the beauty of data science was in the arc: raw data in, clean result out, story told. That instinct wasn't exactly wrong, but it was the naïve version, the one you believe before you've spent years inside real datasets.

What I've actually learned is that the interesting part isn't the trajectory. It's the process of building everything so the structure holds up. The pipeline that makes an analysis repeatable across multiple sites. The quality checks that catch when a scanner drifts. The database design that means a clinician in Seattle and a clinician in Los Angeles are actually recording the same thing. That's where the real work lives.

How I got here

I studied psychology at Emory, originally out of curiosity about how the mind works. What actually grabbed me wasn't the clinical side, though; it was the methodology. The studies that stuck with me were the ones with disciplined design, and I got more interested in that than in the findings themselves.

At Georgia Tech, I worked on a computational experiment using expectation-maximization to model social smiling behavior, and that was the moment machine learning clicked. Not just as a concept, but as a tool you could point at a messy problem to get structure out of it.

After that, the direction was pretty clear. I wanted to be the person who makes the analysis possible, not just the person who runs it.

The multi-site years

I spent four years at USC's Lab of Neuroimaging working as part of a data coordination center for a multi-site study. The job was making sure data collected by different teams, on different scanners, in different cities could be compared meaningfully. Building workflows, running quality control, constantly asking whether the variation in a dataset is telling you something about biology or just about which MRI machine someone happened to use.

At UVA, I took that a step further and built systems from scratch: REDCap databases for dozens of clinical instruments, automated preprocessing pipelines, HIPAA-compliant data handling, the whole stack. I trained clinical staff; I wrote the documentation; I was the person who got the call when something broke. Most people touch one piece of a data pipeline. I've worked across all of it, from data entry to model output.

The PhD

I already knew how to build infrastructure, so the doctorate wasn't a pivot. It was about going deeper on what happens inside that infrastructure. How to represent and model high-dimensional biomedical data, and, importantly, how you know whether what you've found is real.

My research boils down to one question: can you extract reliable signal from noisy, heterogeneous, multi-site clinical data? That touches dimensionality reduction, spectral embeddings, cross-site harmonization, and brain-behavior modeling, but the common thread is skepticism. When you're working with data from multiple sites, the hardest problem is figuring out whether the pattern you found would survive a different scanner, a different cohort, or a different Tuesday. Once that’s clear, modeling becomes a question of fit, not faith.

What I do

I build and run data systems in biomedical environments where nothing is simple: multi-site studies, inconsistent inputs, regulatory constraints, and evolving scientific questions. I've published 14 peer-reviewed papers and a book chapter, presented at international conferences, built large-scale data pipelines, and mentored students and staff along the way.

The ethos that defines how I work is this: I don't treat modeling as separate from data infrastructure. If the data feeding your model aren't trustworthy, the model doesn't matter. If the result doesn't generalize past the dataset that produced it, you haven't learned anything. And if the whole system falls apart when you're not watching it, it’s not a system.

What's next

I'm drawn to problems where the answers aren't settled. Real-world evidence, neurotechnology, clinical data platforms: places where the data are messy because the biology is messy. Those are the environments where careful infrastructure and honest modeling matter most.

The version of me who wrote a grad school application once said he wanted to “participate in the data revolution.” I still like the energy of that, even if I'd phrase it differently now. These days, the goal is more specific: build systems and models that make complex data usable, reliably, at scale, and without anyone having to hold them together by hand.

Publications
First-Author & Lead Contributions
Additional Publications
Ressa HJ, Newman BT, Jacokes Z, McPartland JC, Kleinhans NM, Druzgal TJ, Pelphrey KA, Van Horn JD.
Imaging Neuroscience. 2025;3.
Links between behavior and brain structure in autism shift with development, suggesting that apparent “subtypes” may reflect age-dependent changes rather than fixed categories.
Newman BT, Jacokes Z, Venkadesh S, Webb SJ, Kleinhans NM, McPartland JC, et al.
PLoS ONE. 2024;19(4):e0301964.
Uses advanced MRI-derived metrics to show that differences in signal transmission along white matter pathways may underlie variability in how information is processed in autism.
Van Horn J, Jacokes Z, Newman B, Henry T.
Neuroinformatics. 2023:1–3.
Argues that neuroscience needs stronger theoretical frameworks, not just wider data collection efforts, to meaningfully understand how brain connectivity relates to disorders and behavior.
Irimia A, Lei X, Torgerson CM, Jacokes Z, Abe S, Van Horn JD.
Frontiers in Computational Neuroscience. 2018;12:93.
Combines machine learning and imaging to show that sex meaningfully changes how autism-related brain differences appear, rather than simply adding variability.
Gupta R, Audhkhasi K, Jacokes Z, Rozga A, Narayanan S.
IEEE Transactions on Affective Computing. 2018;9(1):76–89.
Develops a method to recover reliable signals from inconsistent human annotations by treating each annotator as a noisy observer of an underlying truth.
Rodriguez M, Harmony T, Carrillo-Prado C, Van Horn JD, Irimia A, Torgerson C, Jacokes Z.
NeuroImage: Clinical. 2017;16:355–368.
Reviews how early brain imaging can be used to predict developmental outcomes in preterm infants, while highlighting the limits of current predictive approaches.
Irimia A, Torgerson C, Jacokes Z, Van Horn JD.
Scientific Reports. 2017;7:46401.
Shows that large-scale brain wiring differs between males and females with autism, helping explain differences in how the condition presents.
Hull J, Jacokes Z, Torgerson C, Dokovna L, Irimia A, Van Horn JD.
Frontiers in Psychiatry. 2017;7:205.
Synthesizes a fragmented literature on brain connectivity in autism, highlighting inconsistent findings and the need for more structured, theory-driven approaches.
Harrop C, Libsack E, Bernier R, Dapretto M, Jack A, McPartland J, Van Horn J, Webb S, Pelphrey K, GENDAAR Consortium.
Autism Research. 2021;14(5).
Demonstrates that sex and early development shape when autism is recognized, suggesting diagnostic timing is influenced by more than symptom severity alone.
Jack A, Sullivan C, Aylward E, Bookheimer S, Dapretto M, Gaab N, Van Horn J, Eilbott J, Jacokes Z, Torgerson C, et al.
Brain. 2021;awab064.
Integrates genetic and brain imaging data to investigate why autism is diagnosed less frequently in females, pointing to distinct biological pathways.
Lawrence K, Hernandez L, Eilbott J, Jack A, Aylward E, Gaab N, Van Horn J, et al.
Translational Psychiatry. 2020;10:178.
Shows that reward-related brain responses during social interactions differ in autistic females, offering insight into how social motivation varies across groups.
Van Horn JD, Irimia A, Torgerson C, Bhattrai A, Jacokes Z, Vespa P.
Journal of Neuroscience Research. 2018;96(4):652–660.
Case study linking early brain injury to later-life cognitive decline, illustrating how long-term structural changes can unfold over decades.

Full record: 14 journal articles, 1 book chapter, 15+ conference abstracts (OHBM 2016–2023). See Google Scholar for complete list.

Projects
PCA-LASSO Feature Recovery
Tests when common ML pipelines fail to recover true signal in noisy, high-dimensional settings, with implications for real-world data like neuroimaging and clinical datasets.
Python · scikit-learn

More on GitHub →

Resume

Download my current resume for a detailed overview of my experience, skills, and publications.

Download Resume (PDF)