
Using Indirect Data for Direct MERL Impact

By Guest Writer on November 28, 2016

The above example shows how secondary data across 100 hypothetical villages can be integrated so that signals appear within the noise of data, highlighting the least resilient — and most vulnerable — villages.  

RCTs (randomized controlled trials) and similar evidence-based approaches are widely regarded as the gold standard for evaluating the effectiveness of development strategies. To be fair, RCTs have also received their share of critique: Nobel laureate Angus Deaton has famously disparaged them as unhelpful in many situations.

Yes, in certain scenarios, RCTs may be the right choice for evaluation, but they are hardly universally applicable. Yet, our community in international development has made little effort to collectively move in a new direction.

Compromising external and internal validity

RCTs, adapted from clinical health research for use in development economics, tend to tell us a lot about one very distinct situation. Will a single strategy work in a single location to address a single issue? An RCT can give us a pretty definitive answer – it provides "internal validity," or reliability of the results in the original setting.

But what happens when we want to address that same issue in another location? And another? Every setting has a distinct set of characteristics that define it and, accordingly, the same intervention will have different effects in different settings. That means that if we want an accurate assessment of the effectiveness of, say, providing improved cook-stoves to reduce fuel use and indoor air pollution, and improve health in different communities, a separate RCT would need to be carried out in each one.

Because of its narrow geographical and topical focus, an RCT does not provide “external validity.”

To achieve external validity, one must carry out broad studies that draw conclusions from data about larger geographic areas or more varied populations – but these, naturally, are less accurate in any individual setting.

The transportability problem

Human response to interventions is naturally unpredictable from one place to another, reflecting both subtle and obvious disparities in different ecosystems. Returning to the previous example, one type of improved cook-stove may be widely adopted by people in Kenya and thus drastically improve health and indoor air-quality.

However, when distributed in neighboring Ethiopia it may instead be pushed aside and used for storage because it is inadequate for cooking traditional injera – in short, it does not fit within local socio-cultural norms. As we all know, this "transportability problem" commonly plagues development efforts.

But there is a way to find both internal and external validity when considering the best interventions to combat a specific issue over a broad area. When seeking data to solve a problem, we tend to look for data directly related to the issue at hand – like the type of cooking fuel used, when investigating improved cook-stoves – disregarding the fact that a vast quantity of other data defines the world in which we live.

By integrating a mass of indirect, secondary data, one can effectively characterize the vulnerability of broad geographic areas at the resolution of whatever data is used. A systems approach like this one takes into account the dynamic and complex nature of both human and natural environments.
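The integration step itself can be sketched simply. Here is a minimal, hypothetical example of merging indirect datasets – census records, a remote-sensing-derived roof-type proxy, and a partial health survey – keyed by village ID. All dataset names and values are invented for illustration:

```python
# Hypothetical secondary datasets, each keyed by village ID.
census = {"V001": {"population": 1200}, "V002": {"population": 800}}
remote_sensing = {"V001": {"metal_roof_share": 0.7}, "V002": {"metal_roof_share": 0.2}}
health_survey = {"V001": {"stunting_rate": 0.14}}  # coverage may be partial

def integrate(*datasets):
    """Merge per-village records; villages missing from a dataset simply keep gaps."""
    merged = {}
    for data in datasets:
        for village, record in data.items():
            merged.setdefault(village, {}).update(record)
    return merged

villages = integrate(census, remote_sensing, health_survey)
print(villages["V002"])  # V002 has no health-survey record, so no stunting_rate key
```

Because the merge tolerates gaps, the combined table's resolution is set by whatever data is actually available per location, as the paragraph above describes.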

We are not data limited

Where does secondary data come from? There is a deluge of data gathered by national and sub-national governments, NGOs, multi-lateral organizations, and scientific institutions. This includes typical census and survey data at different time scales and levels of resolution, as well as data collected for specific projects and studies.

It also consists of satellite imaging and remote sensing data, which shows us much more than just land use, crop types, and climate information. It can be used to gauge poverty levels based on roof types, access to markets based on road patterns, and ambient population based on movement of people, among other factors. Additional data can be harvested from social media, news sources, and private sector companies.

All of this information may fall within the category of big data, a much discussed topic at this year’s MERL Tech Conference.

The variation of a single factor from one setting to another may not always be illuminating or great in magnitude. However, when a group of small (or large) variations are stacked, the locations with the greatest collective variation appear as signals within the noise – and as likely locations of vulnerability. Using this method of Weak Signal Analysis, we can target the interventions that will have the greatest return on investment.
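The stacking idea above can be illustrated with a toy calculation. This sketch standardizes several hypothetical indicators (so they share a common scale) and sums the deviations per village; villages where many small deviations point the same way rise to the top. The indicator names and values are assumptions for illustration, not the author's actual data:

```python
import statistics

# Hypothetical indicator values for five villages (A–E). Each indicator is
# an indirect, secondary dataset measured on its own scale.
indicators = {
    "market_distance_km": {"A": 4, "B": 35, "C": 6, "D": 40, "E": 5},
    "solid_fuel_share":   {"A": 0.30, "B": 0.90, "C": 0.40, "D": 0.85, "E": 0.35},
    "stunting_rate":      {"A": 0.12, "B": 0.30, "C": 0.15, "D": 0.28, "E": 0.10},
}

def zscores(values):
    """Standardize an indicator so different scales can be stacked."""
    mean = statistics.mean(values.values())
    sd = statistics.pstdev(values.values())
    return {k: (v - mean) / sd for k, v in values.items()}

# Stack the standardized deviations across all indicators.
composite = {}
for values in indicators.values():
    for village, z in zscores(values).items():
        composite[village] = composite.get(village, 0.0) + z

# Villages with the largest collective deviation surface as weak signals.
ranked = sorted(composite, key=composite.get, reverse=True)
print(ranked)  # most-vulnerable villages first
```

With these made-up numbers, villages B and D stand out even though no single indicator is extreme on its own – the signal emerges from the stack, not from any one dataset.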

Gathering data is expensive. A common measure of evaluation in international development is cost per life saved, and methods of integrating secondary data can significantly drive down costs for discovering and implementing effective interventions. Leveraging pre-existing data takes advantage of money that has already been spent, and results can inform new data collection when necessary.

Further, an exploratory process like this one is what drives scientific discovery. When we test hypotheses with RCTs, we confirm or invalidate pre-conceived notions. But when we look for signals within a wealth of secondary data, we just might uncover brand-new solutions.

By Sally Goodman of Novametrics LLC 

Filed Under: Data
