A shiny new opportunity for big data in statistics education


Karsten Maurer (United States)


As truly massive data sets become increasingly available, it is enticing to incorporate these data sources into the curriculum of an undergraduate statistics course. However, the computationally intensive nature of working with large databases creates major barriers to including big data. Difficulties include gaining access to the database, interacting with database management software, and obtaining manageable subsamples from the database for student use. This paper describes a web-based application, the Shiny Database Sampler, which allows instructors to bypass these barriers using a simple JavaScript-based tool. The tool is built with R and the R packages Shiny and RMySQL, and it lets the instructor and/or students sample observations from a number of different large databases, using selected sampling schemes, for use in the statistics classroom.