Statistics classes often alienate undergraduates without necessarily providing a tangible benefit, while data and the open-source tools for analyzing it are increasingly available. Professor Braumoeller therefore decided to focus his undergraduate methods teaching on data literacy, data visualization, and exploratory data analysis. The result is the iTunes U course, “Data Literacy and Data Visualization,” which was listed on Apple’s “Off the Charts 2017” compilation of most popular courses on iTunes U (the course is also available on YouTube).
Some of the most useful resources he’s found while creating the course are listed below.
Overviews
- The Bamboo Dirt project, an aggregator of digital research tools from analysis to visualization
- The WikiViz Tools site
- Alberto Cairo’s thoughtfully curated list of readings, presentations, books and blogs over at The Functional Art
- The multifaceted dataviz site improving-visualisation.org and in particular its page of links to other sites
- High-tech and gorgeous Visualising Data, with a month-end best-of-web roundup and a list of essential resources
- Similar dataviz blogs at Information Aesthetics and The Why Axis
- CS 7450, Information Visualization from Georgia Tech
- RWW’s Best Tools for Visualization site
- dataviz.tools, which contains a great variety of tools for data cleaning, scraping, charting, design, and more
- The datavisualization.ch list of selected tools (most require JavaScript)
Design: How to Think About Visualization
- The Visual Complexity array of dataviz examples
- A wide range of examples (many too complex, but all beautiful) at FlowingData
- Information is Beautiful, a blog with a wide range of data visualizations
- The Spatial Analysis blog with examples, news, and ideas
- The mindboggling Periodic Table of Visualization Methods (with mouseover illustrations)
- Variations on Minard’s famous graph
- A Handsome Atlas, a catalog of gorgeous 19th-century data visualizations
- A tutorial on visualizing numeric data by groups
- A blog about how the New York Times does its visualizations
- Kaiser Fung’s useful “don’t let this happen to you” blog, Junk Charts, and the charmingly horrifying WTF Visualizations
- When good dataviz goes bad: infographics at Good : Transparency
All-in-One Examples: Data + Visualization
- A real-time Twitter trends map, Trendsmap
- A searchable Twitter stream graph, Twitter StreamGraph (uses Java; might not work on all browsers)
- Social Explorer (OSU login required), a portal for obtaining demographic data on the U.S.: Census, American Community Survey, etc. Also has lots of quick-and-easy (and very good) visualization options.
- IBM’s ManyEyes, a general data-visualization engine with quite a few (iffy) user-supplied datasets
- A Facebook social-network graphing application, Facebook Social Graph
- A great real-time stacked-time-series generator, NameVoyager, the baby name wizard
- The New York Times visualization of the American Time Use Survey
- Wolfram’s massively useful general knowledge engine, Wolfram Alpha
- To visualize members of Congress’ voting behavior (including data), IBM’s Many Bills
- Terrific visualization of weather data at Weatherspark
- A sobering graphic about deaths in war
- A general data-visualization and mapmaking site with some built-in data, StatPlanet
- For global maps with weather data, GPCC Visualizer
- An interesting, if somewhat rigid, conflict visualization tool
- The problems facing the American budget: graphs – data
- Some nice opinion graphs illustrating epistemic closure and the effects of education: graphs
Visualization Tools
- An easy-to-use wordcloud generator, Wordle (and a thoughtful essay on why wordclouds are bad, and an example of an alternative)
- A more advanced and style-forward wordcloud generator, Tagxedo
- The Overview project for visualizing relationships in large numbers of documents, with an example
- Google’s easy-to-use Google Chart Tools API (helps to know HTML)
- A basic plug-and-play online chart builder, Hohli
- The JavaScript-based JSCharts site, which allows easy construction of basic charts and graphs
- The glorious animated multicolored scatterplot engine, Gapminder.org
- Google’s similar and equally awesome Public Data Explorer
- A general data-visualization engine, IBM’s ManyEyes (update: RIP!)
- The extremely easy-to-use and attractive Datawrapper reactive-chart website
- Raw, a straightforward and gorgeous way to turn spreadsheet data into vector graphics
- Infogram, an online chart and graph maker. Free version has 30 different kinds of graphs; paid version has more features, including live update from JSON or Google Drive
- Lyra, a really slick interactive visualization design environment
- Flourish, a gorgeous D3-based visualization engine with some jawdropping templates
- Plotly, a collaborative online visualization and data analysis tool with some handy APIs
- The Tableau Public data-visualization tool
- The easy-to-use GPS Visualizer (requires longitude, latitude data)
- The Flash-based map- and trend-generation engine, StatSilk
- The Flash-based (and web-centric, but gorgeous) Flare [requires nontrivial compilation]
- The cross-platform, open-source Gephi tool for visualizing networks and complex systems
- The Cytoscape network visualization platform
- The NodeXL network graphing tool [requires Windows]
- The dead-simple and very impressive GunnMap world map visualization tool
- The OpenHeatMap distribution heatmap site
- The CartoDB site for creating dynamic, data-driven maps quickly and easily (update: now improved and called Carto)
- The stunning Tilemill program at Mapbox for visualizing data on maps, and some examples (tiered pricing includes free option)
- The “free for now” ChartsBin world map creation tool
- The easy-to-use, beautiful, and free GeoCommons map tool
- Chart Chooser, a website for graphs and tables from Excel or PowerPoint templates
- The Mondrian interactive-graph interface for creating graphs from ASCII, R, or database files
- The Chartle tool (beta) for creating and exporting a variety of graphs and maps from Excel data
- The Science of Science meta-tool for data analysis and visualization
- A blog with relevant resources and links, Visualizing Data
- Some information on flow maps, with source code (alpha version, far from user-friendly) and a demo
- A fairly useful-looking web-based general data analysis tool, StatCrunch
- Stunning graphics for the programming-oriented at processing.org
- FF Chartwell, a typeface for creating simple graphs
JavaScript Libraries (knowledge of JavaScript required… but wow)
- The flat-out-jawdropping Data-Driven Documents (or D3) and some video tutorials
- Christophe Viau’s massive compilation of D3 examples
- Crossfilter, a D3 library for creating dynamic views of different dimensions of a dataset
- Raphaël, a simple library for impressive vector graphics
- Arbor, a very slick library for creating network graphs
- The free amCharts JavaScript bundle
- Tangle, a library that allows reactive visualization of the results of complex interactions or equations
- Polymaps, a mapping library designed around data visualization
- Kartograph, a paired Python and JavaScript library for really impressive interactive map visualizations
Data Resources
- An overview of data scraping
- The OutWit Hub data scraping program (free version has limited functionality)
- Two tools—WebPlotDigitizer and DataThief—for scraping data from graphs
- A basic Twitter scraper at scraperwiki
- A comprehensive list of Twitter and Facebook data collection tools by Deen Freelon
- A clear and detailed tutorial from ProPublica on data scraping with Ruby
- Data scraping tools readLines, RCurl, and scrapeR, for R
- The Needlebase data-scraping, acquisition, cleaning, and analyzing engine [Update: R.I.P.!]
- Two websites for converting web pages into APIs: kimono and import.io
- Google’s Fusion Tables for data acquisition, fusing, mapping, and graphing
- Two amazingly slick tools for dataset cleaning, Data Wrangler and OpenRefine
- Free online data, plots, and dynamic data graphs at Data360
- Google’s web-based data-dredging tool, Google Correlate
- Google’s incredibly addictive tool for tracking trends in phrases mentioned in books, Google ngram viewer
- The more general Google Trends tool
- The QuantumGIS geographical information project
- Big data frameworks, resources, and tools, collected by Andrea Mostosi. Mostly computer-science tools, with some general data visualization tools at the end
Dataset Archives
- The IMF’s online database of economic indicators
- Incredible data from the World Bank, their API, and a tutorial for using it in R
- Social Explorer (OSU login required), a portal for obtaining demographic data on the U.S.: Census, American Community Survey, etc. Census data in particular are very hard to obtain otherwise
- Data.gov, the US Government’s online data warehouse, and its raw data catalog
- The USA.gov collection of data and statistics about the U.S.
- The Martin-Quinn data on the ideological positions of Supreme Court justices
- CDC Wonder, an easy-to-use portal for downloading data on a dazzling array of population, environmental, and mortality statistics
- The Census Bureau’s Statistical Abstract, “the authoritative and comprehensive summary of statistics on the social, political, and economic organization of the United States.”
- GSS Data Explorer, for extracting data from the huge and incredibly comprehensive General Social Survey at NORC
- NationalAtlas.gov’s downloadable map data, including data and shapefiles for past elections, climate, transportation, agriculture, etc., etc. Looking to map bat ranges? They’ve got that.
- Speaking of fun datasets, check out 100+ Interesting Data Sets for Statistics—including dolphin social network data. Data on dolphins!!
- And speaking of networks, dozens of cool network datasets from KONECT
- Ohio data at the Ohio Development Services Agency Office of Research
- Millions of free time-series datasets at Quandl for Academics
- Lots of quality-of-government indicators from the Quality of Government Institute
- Data on trends in college pricing from collegeboard.org
- The Bureau of Investigative Journalism website, with data on current news stories
- The New York Times’ list of APIs, including those on Congress and campaign finance
- CSV feed of most recent earthquakes, from USGS
- Open Payments Data, a huge database of pharmaceutical companies’ payments to doctors and teaching hospitals. Contains dates, addresses, and amount of payment
- POLCON, the political constraint index
- The Open Event Data Alliance website and their Phoenix events data project, a massive attempt to codify international events of interest
- The Dynamics of Collective Action (DoCA) data project, reflecting 35 years of protest events, hand-coded by experts on social movements
- Fantastic online data compilations and visualizations at The Guardian and The Los Angeles Times’ Data Desk
- Tons of Big Data datasets at InfoChimps
- Over 100 million time series at mercenary time-series aggregator DataMarket (free subscription required)
- The Dataverse project for archiving and storing research data
- Internet data sources for social scientists, from Cornell University
- Paul Hensel’s international relations data page and ISA’s data compendium (also by Hensel)
- The Empirical Studies of Conflict (ESOC) compilation of micro-level conflict data
- Emory University’s Electronic Data Center
- Google’s Public Data tool
- The ICPSR data archive
- Public opinion polls from Gallup
- The IMF’s Principal Global Indicators database
- User profile data for 59,946 San Francisco users of OkCupid, an online dating service
- Data on baseball, more baseball, football, hockey, and basketball players and teams
- Drone strike data from Dronestream (JSON feed / R tutorial), the New America Foundation, and the Bureau of Investigative Journalism, and a tutorial
- Raw data about frequencies of words in written vs. spoken English
- The International Social Survey Program
- Comprehensive electoral data from Adam Carr at Psephos
- The Panel Study of Income Dynamics website
- The Comparative Manifesto Project database
- The European Social Survey data website
- The Correlates of War
- International events data projects listed at the KEDS Project website
- Wikileaks data on Iraq war deaths, with latitude, longitude, and event type
- Wikileaks data on Afghanistan IED attacks, with spatial coordinates
- Historical economic data from The University of Groningen
- Monstrous omnibus lists of data archives: Social science data archives from Craig McKie
- Data Catalogs, a comprehensive list of open data catalogs from around the world
- Network data from Stanford
Presentation of Results
- Choose a color scheme using Colorbrewer or at colourlovers
- Select a new font or fonts at dafont
- Edit your initial image using Inkscape, an open-source Illustrator alternative
- Contemplate why exactly you suck at PowerPoint
Resources for Connecting with R
- The R Project
- The R Commander graphic interface
- Tom Short’s R Reference Card, with a great summary of some of the most useful commands in R
- Tools for making LaTeX tables in R
- The Rdatasets site, which catalogs all of the datasets available natively in R
- A useful blog post on importing data of different formats into R
- The rOpenSci catalog of R packages that interface with data repositories
- Quandl for R, a package for importing time series data from Quandl (above)
- RExcel, a program that integrates R into Excel
- Lubridate, an R package for handling dates (this will seem trivial unless you’ve tried to use dates in R)
- The Mondrian data-visualization interface, which can pull data from R to create interactive graphs
- The rdatamarket package for pulling data from DataMarket directly into R
- R datasets on truly random (but generally interesting) topics at reddit
- The incredible R Graph Gallery, with source code
- The ggplot2 R library [update: still actively developed; the once-planned move to ggvis never happened]
- The ggmap R library, which plots latitude/longitude data on maps
- A good example of how to make a heatmap (not geographical) with R’s heatmap function
- A useful example of how to make 3D maps with R’s persp function
- Two tutorials (here and here) on combining maps with data
- A tutorial on how to turn time series data into calendar heatmaps in R
- “Large Datasets and You,” a primer on big data in R by Matthew Blackwell and Maya Sen
R + JavaScript for Interactive Online Dataviz
- Shiny, an R package for creating interactive graphics with no (non-R) programming required
- healthvis, an R package for creating D3-enabled versions of some common graph types with no (non-R) programming required
- rCharts, Ramnath Vaidyanathan’s JavaScript-fueled interactive-graphics package for R
- The Plotly API for R (see Plotly, above, for full description)
- ggvis, an R package for creating interactive graphics
- D3Network, an R package for creating network, tree, dendrogram, and Sankey diagrams in D3
- The crazy-cool rPivotTable package for interactive data visualization (watch the GIF to get a sense of its capabilities)
- The awesome htmlwidgets package for using JavaScript visualization libraries in R, R Markdown, and Shiny applications
Why Dataviz Isn’t Enough
- The Wikipedia article on Simpson’s Paradox
- An intro chapter from Wainer, Picturing the Uncertain World, with five great examples of De Moivre’s equation in action
- A fantastic set of spurious correlations from Tyler Vigen
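
For a concrete feel for Simpson’s Paradox, here is a minimal Python sketch (standard library only). The counts are the kidney-stone treatment figures commonly used to illustrate the paradox; treat them as illustrative. The names COUNTS, subgroup_rates, and pooled_rates are made up for this example and do not come from any of the resources above.

```python
from fractions import Fraction

# (successes, trials) for each treatment within each subgroup.
# Illustrative counts: the commonly cited kidney-stone example.
COUNTS = {
    "small stones": {"A": (81, 87), "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}


def subgroup_rates(counts):
    """Success rate of each treatment inside each subgroup."""
    return {
        group: {arm: Fraction(s, n) for arm, (s, n) in arms.items()}
        for group, arms in counts.items()
    }


def pooled_rates(counts):
    """Success rate of each treatment after pooling all subgroups."""
    totals = {}
    for arms in counts.values():
        for arm, (s, n) in arms.items():
            prev_s, prev_n = totals.get(arm, (0, 0))
            totals[arm] = (prev_s + s, prev_n + n)
    return {arm: Fraction(s, n) for arm, (s, n) in totals.items()}


# Print the rates side by side: within groups first, then pooled.
for group, rates in subgroup_rates(COUNTS).items():
    print(group, {arm: round(float(r), 3) for arm, r in rates.items()})
print("pooled", {arm: round(float(r), 3) for arm, r in pooled_rates(COUNTS).items()})
```

Treatment A does better inside each subgroup, yet B looks better once the groups are pooled, because A was used far more often on the harder (large-stone) cases. A chart of only the pooled rates would hide that, which is the point of this section.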