What Happens When Global Health and Data Science Collide?


Data scientist Amy Finnegan’s job is to ask the right questions to get new answers. She uses machine learning to solve global health puzzles like “Why do women in low-income countries who use contraception stop using it even though they still want to avoid pregnancy?” and “What elements of populations and geography can determine whether a health program is successful?”

We sat down with her to ask how this work is shaping the future of global health—and what comes next.

What's your favorite thing about working in this field?

I have always been fascinated by human experiences like births, deaths, and migration, and by the behaviors that drive choices and either help or hinder people in converting their intentions into action.

So often those behaviors and choices have implications for global health. I think of global health not as a discipline, but as a collection of problems that require many different approaches.

Machine learning helps us decide what’s important to the human experience.


It’s an old approach from computer science that has been unlocked for global health because there is so much data available.

For example, we’ve been using machine learning in our team’s pilot work in Uganda to quickly find new insights—finding different groupings we didn’t know existed in old data. Machine learning algorithms are really teaching us to challenge traditional approaches to improve and save human lives.

What's one thing you think people should know about data science in global health?

The real value of data lies not in how much you have but in how you use them.


I have found that people often expect machine learning to solve problems without being able to articulate exactly what question they are trying to answer.

Data scientists add value not only by having the technical skills to code and clean data but also by being good listeners and communicators. We often work with people who have deep field experience and who can amplify the power of data science by making it as useful as possible within the constraints on the ground.

I recently worked in Uganda on our SupCap project. The project leaders asked me to help them estimate the sample size needed for their study of postpartum contraceptive uptake. My first estimate needed to be adjusted due to field constraints like how villages are organized in Uganda.

We work together to design something today that will change things tomorrow, making it easier for villages to manage projects long after we’re gone.
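One common reason that village structure changes a sample size estimate is clustering: people in the same village tend to resemble one another, so each additional respondent from that village adds less information than a respondent drawn at random. The interview does not say which adjustment the SupCap team used, so the sketch below is only an illustration of the standard design effect formula, DEFF = 1 + (m - 1) * ICC. Every number in it (the starting sample size, the respondents per village, and the intracluster correlation) is a hypothetical placeholder, not a project figure.

```python
import math


def design_effect(cluster_size, icc):
    """Standard design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def adjusted_sample_size(n_simple_random, cluster_size, icc):
    """Inflate a simple-random-sample size to account for clustering."""
    return math.ceil(n_simple_random * design_effect(cluster_size, icc))


# Hypothetical inputs, chosen only to illustrate the calculation.
n_simple_random = 400   # size needed if respondents were sampled independently
women_per_village = 21  # assumed number of respondents per village
icc = 0.05              # assumed intracluster correlation

deff = design_effect(women_per_village, icc)
n_clustered = adjusted_sample_size(n_simple_random, women_per_village, icc)
print(f"Design effect: {deff:.2f}")
print(f"Sample size after adjusting for clustering: {n_clustered}")
```

With an intracluster correlation of zero the design effect is 1 and the estimate is unchanged; the more alike people within a village are, the larger the required sample becomes.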

How do we make sure we’re collecting the right data?

In our work, we collect and use data to monitor and evaluate our program results, but there are many existing data sources out there that can supplement program data—like data from household surveys that are publicly available.

For example, in Uganda, IntraHealth has a project called Regional Health Integration to Enhance Services in Eastern Uganda (RHITES-East). We used data from six different sources that are all publicly available, which is cost-effective and adds value to existing approaches.

In Uganda, the results gave the team so many ideas about what to do with the data and what they meant. In some cases, the Ugandan team saw things in the results that I hadn’t considered from my vantage point in Chapel Hill. A multicountry, interdisciplinary team was key to our success.

What’s one tip you have for anyone just getting started on working with data?

Context matters. There’s a difference between the number of people who do something and the percentage of people who do something. For example, if you focus on the number of adolescents giving birth, urban areas will always have more than rural areas. But if you look at the percentage of births among adolescents, you might find that rates are higher in rural areas.
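The difference between counts and percentages can be shown with a few lines of Python. The figures below are invented purely for illustration and do not come from any survey or IntraHealth project; the area names and the `adolescent_share` helper are likewise hypothetical.

```python
# Hypothetical figures for illustration only; not real survey data.
areas = {
    "urban": {"total_births": 50_000, "adolescent_births": 5_000},
    "rural": {"total_births": 10_000, "adolescent_births": 2_000},
}


def adolescent_share(total_births, adolescent_births):
    """Return adolescent births as a percentage of all births."""
    return 100.0 * adolescent_births / total_births


# Build a summary holding both the raw count and the percentage per area.
summary = {
    name: {
        "count": figures["adolescent_births"],
        "percent": adolescent_share(
            figures["total_births"], figures["adolescent_births"]
        ),
    }
    for name, figures in areas.items()
}

for name, row in summary.items():
    print(
        f"{name}: {row['count']} adolescent births, "
        f"{row['percent']:.1f}% of all births"
    )
```

In this made-up example the urban area has more adolescent births in absolute terms, while the rural area has the higher percentage, so the two measures point in opposite directions.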

Sometimes data will tell you something completely different from what you first think.


Numbers themselves can be deceiving if you don’t think about context and how those numbers fit into the overall picture.

What does the future hold for data science?

Data science is changing so rapidly because of open source tools like R and Python that are free to learn and use, the accumulation of data available for analysis, and the curiosity that makes people pick up these tools and follow their imaginations.

The Big Data revolution is not just about volume, but also velocity, variety, and veracity. We’re building models based on more than just large data sets: we’re focusing on more types of variables in these data sets, like growing seasons, rainfall, health center access, and roads. We must continue to inspect the inputs and outputs of these models not just as data scientists, but as practitioners who should always demand that models are intuitive and that their biases are transparent.

This piece is part of IntraHealth's celebration of 40 years of commitment to health workers and International Women's Day. Follow along: #TheFutureOf #IntraHealth40 #HealthWorkersCount #IWD2020 #EachforEqual
