How to Deploy Machine Learning with Messy, Real World Data


By Wayan Vota on November 9, 2022


Machine learning and artificial intelligence offer global health practitioners the ability to glean new insights from data they are already collecting as part of implementing their programs.

However, little practice-based research has been documented on how to incorporate machine learning into international development programs. Current systems mirror in form and format the use of manually completed paper records to create periodic reports for leadership. The resulting proliferation of systems has vexed health officials and left some “data rich, but information poor”.

Yet the growth of available analytical systems and exponential growth of data require the global digital health community to become conversant in this technology to continue to make contributions to help fulfill our missions. Our hope was to reach a level somewhere between forecasting and predictive modeling that exponentially adds value to mere clerical data collection, thus improving knowledge discovery from such data sources.

In this community case study, Deploying Machine Learning with Messy, Real World Data in Low- and Middle-Income Countries, we describe the approach we took at IntraHealth International to inform the use case for machine learning in global health and development. We found that the data needed to take advantage of machine learning were plentiful and that an international, interdisciplinary team can be formed to collect, clean, and analyze the data at hand using cloud-based and open source tools.

We organized our work as a “sprint” of roughly 10 weeks so that we could rapidly prototype these approaches and achieve institutional buy-in. Our initial sprint resulted in two requests in subsequent workplans for analytics using the data we compiled, and it directly impacted program implementation.

Four Lessons Learned in Machine Learning

Based on our successful experience of gaining buy-in, setting up an interdisciplinary team, building a data lake, and performing machine learning analysis, we are able to offer the following lessons learned to other practitioners who are seeking to enhance their own work with machine learning but are unsure where to start.

1. Show value (or fail) quickly

Machine learning is a new approach that can be risky to take on when other well-known approaches are available. We found that by setting up the activity as a sprint of about 10 weeks, we were able to rapidly test our approach and show value, leading to greater confidence from project leadership and additional requests for analysis.

Key to our success was internal support from IntraHealth’s digital health team, where this project originated, which understood the value that machine learning approaches can unlock and endorsed the initial pilot exercise.

2. Data scientist leads the technical approach

While much of the work of machine learning can be carried out by a consultant, we are committed to increasing the use of data science at our organization, which requires a full-time employee to develop and iterate on these approaches.

We built this initial approach by contracting a doctoral-level public health researcher from a local university, with training in international population health and data science, as lead technical expert. We later hired her as a senior data scientist on the digital health team. Since then we have grown the team by adding an associate data scientist who had been working in the private sector but was attracted to international development by the lure of data and opportunities for improving health outcomes.

3. Communicate the value of machine learning

Machine learning can be an intuitive process and data scientists should be able to explain the approach to non-technical audiences to build trust in the results of these methods. As data science continues to permeate the world of global health and development, a gap may emerge between the old guard with field expertise and newcomers trained in newly developed data science approaches.

There will be many opportunities for collaboration to make machine learning approaches complementary to more traditional approaches by grounding these approaches in the local contexts in which we work.

4. Main cost driver: cleaning and analyzing data

Machine learning using population data is an extremely cost-effective approach to learn more about the contexts in which we support local ministries to reach their health targets. Nearly all of the data used in this analysis was available without cost for the project’s use.

The only exception to this was data from the project’s baseline survey, which had already been financed by the project prior to the beginning of our activity. In addition, the DHS and census are widely available surveys that can be downloaded by researchers without fees. Ultimately, the ongoing cost of these projects is the staff time needed to clean the data and complete the analysis.
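To make the cleaning effort concrete, here is a minimal sketch of one common task when combining sources such as a census extract and a project survey: reconciling inconsistently written district names before joining records. It is an illustration only, not the team's actual pipeline. The district names, column names, and figures are all invented placeholders, and the helper functions are hypothetical.

```python
import csv
import io

# Hypothetical extracts. District names, columns, and values are invented
# placeholders for illustration; they are not the project's data.
CENSUS_CSV = """district,population
 Alpha ,1000
BETA,2000
gamma district,3000
"""

SURVEY_CSV = """District,households_surveyed
alpha,12
Beta District,8
Gamma,9
Delta,6
"""


def normalize_district(name: str) -> str:
    """Trim and collapse whitespace, lowercase, drop a trailing 'district'."""
    cleaned = " ".join(name.strip().lower().split())
    if cleaned.endswith(" district"):
        cleaned = cleaned[: -len(" district")]
    return cleaned


def load_rows(text: str, key_column: str) -> dict:
    """Read CSV text into a dict keyed by the normalized district name."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = normalize_district(row[key_column])
        rows[key] = {k: v.strip() for k, v in row.items() if k != key_column}
    return rows


def merge_sources(census: dict, survey: dict):
    """Join survey rows to census rows; also report unmatched survey keys."""
    merged = {}
    unmatched = []
    for district, survey_fields in survey.items():
        if district in census:
            merged[district] = {**census[district], **survey_fields}
        else:
            unmatched.append(district)
    return merged, sorted(unmatched)


census = load_rows(CENSUS_CSV, "district")
survey = load_rows(SURVEY_CSV, "District")
merged, unmatched = merge_sources(census, survey)
```

Returning the unmatched keys alongside the merged records is a deliberate choice in this sketch: names that fail to join are usually where the manual cleaning time goes, so it helps to surface them rather than drop them silently.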

Three Ways We Succeed

Our journey into building a data lake and conducting machine learning had no fixed destination, but what the team developed and discovered justified it.

First, and most important, we could do it. We could bring together a critical mass of talent, data, and technology resources to conduct an investigation using machine learning tools and practices.
We also learned that an ad hoc team could conduct a highly technical process with large amounts of data, many variables, and complex analyses in the virtual space.
Finally, we proved the concept to project leadership in Uganda and the Activity invested in another sprint to investigate questions they provided that could directly enhance service delivery. To date, a third sprint has been approved by the project leadership in Uganda.

Machine learning is more accessible than is commonly perceived. The various technologies—computational, connectivity, storage, and accessibility—offer an onramp for other such journeys. The human resources reside in a cross-section of project management, computer science, statistics, information and communications technology, health informatics, measurement and evaluation, and data science.

Working collaboratively in such a team lets each member contribute from their own discipline toward a common purpose. These are resources generally accessible in the domain of global digital health development.

A lightly edited synopsis of Deploying Machine Learning with Messy, Real World Data in Low- and Middle-Income Countries by Amy Finnegan, David Potenziani, Caroline Karutu, Irene Wanyana, Nicholas Matsiko, Cyrus Elahi, Nobert Mijumbi, Richard Stanley, and Wayan Vota


