Work with SurveyCTO data in R using the rsurveycto package

May 12, 2023
by Jake Hughey

Two SurveyCTO users demonstrate how to use the new rsurveycto package as a prelude to statistical analysis and data visualization.

This post was written by Jake Hughey and Rob On, core team members of the Agency Fund, a donor partnership advancing innovations and organizations that help people envision and navigate toward a better future. Jake and Rob are SurveyCTO users, R enthusiasts, and developers of the rsurveycto package. This post was originally published on SurveyCTO’s blog.

Why combine R and SurveyCTO?

A prerequisite to monitoring, evaluation, research, and learning (MERL) is collecting data in the field. One commonly used platform for this purpose is SurveyCTO, which specializes in mobile data collection in offline settings. At the Agency Fund, a number of our partner organizations use SurveyCTO to collect and manage their program data. This data is vital for making evidence-based decisions on what’s working (and should be scaled) and what needs more attention. Importantly, SurveyCTO’s REST API makes this data accessible for automated processing and analysis pipelines.

For Python users, our colleagues at IDinsight previously developed a wrapper around the REST API in the form of the pysurveycto package. But what about R users? The R programming environment and its rich ecosystem of packages are well-suited for all sorts of MERL-related tasks, from data cleaning to statistical analyses to visualization. Many MERL professionals use R because it’s free and open-source, so it can be used by anyone, anywhere.

What has been missing is a simple way to interact with SurveyCTO data in R, which is why we developed the rsurveycto package.

How does the rsurveycto package work?

The rsurveycto package allows R users to easily pull data from, and even push data to, a SurveyCTO server. It relies on SurveyCTO’s REST API but abstracts away the dreary details of dealing with API requests. To get a sense of what’s possible with R and SurveyCTO, let’s see the package in action.

What can you do with rsurveycto?

First, we load the package and authenticate to a SurveyCTO server. We recommend creating a text file containing the server name, user name, and password; for our example, let’s name this text file “scto_auth.txt”. Make sure the user is assigned a role that has permission to download data and for which “Allow server API access” is enabled. We also load the awesome data.table package, since rsurveycto makes heavy use of data.tables.

library('data.table')
library('rsurveycto')

auth = scto_auth('scto_auth.txt')
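If you have not created the credentials file yet, it can be written from R. The sketch below assumes the layout described in the package documentation (server name on the first line, user name on the second, password on the third); please verify this against the reference documentation for your installed version. The credentials and the temporary path are placeholders.

```r
# Placeholder credentials; replace with your own server name, user name, and password.
auth_path = file.path(tempdir(), 'scto_auth.txt')
writeLines(c('myserver', 'me@example.org', 'my-password'), auth_path)

# Restrict the file to the current user (this has limited effect on Windows).
Sys.chmod(auth_path, mode = '0600')

# Check the layout: three lines, in the order assumed above.
auth_lines = readLines(auth_path)
auth_lines
```

In real use, you would likely keep this file somewhere permanent and outside of version control, since it contains a password in plain text.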

Next, let’s read data from a form.

form_submissions = scto_read(auth, 'my_form_id')

The scto_read() function understands the same options as the API, allowing you to specify a start date, a review status, and a private key for encrypted fields (more on this in our reference documentation). You can retrieve server datasets in the same way, e.g., scto_read(auth, 'cases').
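As a rough illustration of those options, here is a small helper that fetches only approved submissions from a given date onward. The helper name is hypothetical, and the argument names (start_date, review_status) are taken from our reading of the reference documentation, so please double-check them there before relying on this.

```r
library('rsurveycto')

# Hypothetical helper: read approved submissions submitted on or after `since`.
# The argument names start_date and review_status are assumed to match
# the package's reference documentation.
read_approved_since = function(auth, form_id, since) {
  scto_read(auth, form_id, start_date = since, review_status = 'approved')
}

# Example call, assuming `auth` was created with scto_auth() as above:
# recent_submissions = read_approved_since(auth, 'my_form_id', '2023-01-01')
```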

What if you want to know all the forms and datasets on the server? The rsurveycto package has you covered.

catalog = scto_catalog(auth)
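Because the catalog is a table, you can filter it like any other data.table. The sketch below uses a mock catalog so that it runs without a server; the column names (type, id, title) and their values are assumptions for illustration, so check names(catalog) on your own server's catalog first.

```r
library('data.table')

# Mock stand-in for scto_catalog(auth); columns and values are assumptions.
catalog = data.table(
  type = c('form', 'form', 'dataset'),
  id = c('my_form_id', 'another_form', 'cases'),
  title = c('My Form', 'Another Form', 'Cases'))

# Keep only the forms and pull out their ids.
form_ids = catalog[type == 'form']$id
form_ids
```

Each of these ids could then be passed to scto_read() to fetch the corresponding submissions.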

In fact, you can read in all forms and datasets in one go.

forms_datasets = scto_read(auth)
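To get a quick overview of what came back, you can loop over the result. This sketch assumes that reading everything returns a named list with one data.table per form or dataset, and it uses a mock list (with made-up ids and columns) so it runs without a server.

```r
library('data.table')

# Mock stand-in for scto_read(auth); ids and columns are hypothetical.
forms_datasets = list(
  my_form_id = data.table(KEY = c('uuid:1', 'uuid:2', 'uuid:3'), age = c(31, 45, 28)),
  cases = data.table(id = c('c1', 'c2'), label = c('Case 1', 'Case 2')))

# Build one row per form or dataset, recording its dimensions.
overview = rbindlist(lapply(names(forms_datasets), function(id) {
  d = forms_datasets[[id]]
  data.table(id = id, n_rows = nrow(d), n_cols = ncol(d))
}))
overview
```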

Now the fun really begins. From here, you can use the power of R to wrangle the data, merge the data with data from other sources, fit statistical or predictive models, and make elegant and informative plots.
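Here is a minimal sketch of what that might look like. The data are mock data with hypothetical column names, standing in for submissions returned by scto_read() and for a second table from another source; the model is only there to show the workflow, not to suggest a particular analysis.

```r
library('data.table')

# Mock submissions with hypothetical columns; in practice these come from scto_read().
form_submissions = data.table(
  KEY = paste0('uuid:', 1:6),
  village_id = c('v1', 'v1', 'v2', 'v2', 'v3', 'v3'),
  household_size = c(4, 6, 3, 5, 7, 2),
  monthly_income = c(120, 150, 90, 130, 200, 80))

# Mock data from another source, e.g., program assignment by village.
villages = data.table(
  village_id = c('v1', 'v2', 'v3'),
  treated = c(TRUE, FALSE, TRUE))

# Merge the two sources on their shared column.
merged = merge(form_submissions, villages, by = 'village_id')

# Summarize by group using data.table syntax.
summary_by_arm = merged[, .(n = .N, mean_income = mean(monthly_income)), keyby = treated]

# Fit a simple linear model on the merged data.
fit = lm(monthly_income ~ treated + household_size, data = merged)
coef(fit)
```

From merged, it is a short step to plotting with your favorite graphics package.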

Want to know more?

But wait, there’s more! The rsurveycto package can also fetch detailed metadata and form definitions, download file attachments, and even write to an existing server dataset. Best of all, in our opinion, the package is free, open-source, and available on CRAN.

We hope the package is useful in R-based data processing, analysis, and visualization pipelines involving SurveyCTO data. Please try it. If you do, we’d welcome your feedback, especially suggestions for how to make it better, on the package’s GitHub repository.