Two datasets to learn how to be a wizard for rich visual analytics

& A selection of magical Keshif features for data insights

Coleman Harris
Keshif


We recently announced an incredible resource: a deep Knowledge Base that serves as an instructive learning guide for Keshif users and for anyone interested in making data easy to explore. For the Knowledge Base, our challenge was to choose two datasets that demonstrate a wide range of features and their utility.

Having explored hundreds of datasets with Keshif, we recognize that data types, relations, and hidden insights are unique to each one, and so are the best approaches for analyzing them. Here, we share hints on what makes two sample datasets excellent examples of what happens when you combine rich data sources with a powerful visualization tool. Some would say, too powerful.

Here’s what goes into the magic that makes these two datasets perfect examples for demonstrating hundreds of spells, or, Keshif features. (Photo by Artem Maltsev on Unsplash)

The Airbnb Dataset

This dataset boasts over 9,000 Airbnb listings in Washington, D.C., collected by the Inside Airbnb project. What makes it powerful is the diversity of data types available to explore! Each entry includes practically all publicly available data on a listing, from basics like price, number of reviews, and amenities, to more complex data such as geographic coordinates, detailed information on the host, and reviews tracked over time.

#1. Going from large to small — Analysis with ordinal categories

When you are about to book an Airbnb last minute, how quickly you get a response can be crucial! In this dataset, average response times are binned into categories such as “within an hour”, “within a few hours”, or “within a day.” A great visualization would show these options in order, from shortest to longest!

Taking advantage of this logical order is a useful visual analytics practice for revealing trends in the data. With custom sorting in Keshif, you control the order in which categories appear so that it matches your data.

Keep this feature in mind when exploring surveys with questions about preferences, such as scales that run from agree to disagree. It’s also relevant when attributes are binned into logical groups that follow an order or pattern, like short-, medium-, and long-term projects.
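To see why a logical order matters, here is a minimal Python sketch that runs outside of Keshif. It is not Keshif's implementation: the function name `count_in_order` and the sample values are made up for illustration, and the three labels are the ones quoted above.

```python
from collections import Counter

# Logical order of the categories, shortest response time first.
# The labels are the three quoted in the text; a real dataset may have more.
RESPONSE_ORDER = ["within an hour", "within a few hours", "within a day"]


def count_in_order(values, order):
    """Count each category and return (label, count) pairs in the given order."""
    counts = Counter(values)
    rank = {label: i for i, label in enumerate(order)}
    # Labels missing from `order` go last, sorted alphabetically among themselves.
    labels = sorted(counts, key=lambda label: (rank.get(label, len(order)), label))
    return [(label, counts[label]) for label in labels]


# Made-up sample of host response times, for illustration only.
sample = [
    "within a day", "within an hour", "within a few hours",
    "within an hour", "within a day", "within an hour",
]

ordered = count_in_order(sample, RESPONSE_ORDER)
alphabetical = sorted(Counter(sample).items())

for label, n in ordered:
    print(f"{label:<20} {'#' * n}")
```

Sorting alphabetically would put “within a day” first, which hides the shortest-to-longest pattern that the custom order makes visible.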

Learn more about this feature here!

#2. When the axis doesn’t cut it — handling skewed numeric data

The nightly price of an Airbnb listing in this dataset sits around $100, but in a few cases the price is much higher (like, thousands of dollars higher). Comparing prices on a default linear scale can hide details and nuance in the data that a different approach would reveal.

Histogram bins on a logarithmic scale are a great way to bring balance to skewed numeric data. To add to the visual analytics magic, Keshif automatically applies them when it detects skewed numeric data, AND once you set logarithmic binning, it applies to all charts using that numeric data, such as scatterplot axes.

This is an option to keep in mind whenever you use numeric data. You can also change the scale of the measurement axis in all aggregate chart types (such as categorical bar charts and line charts), which serves the same purpose: getting a more helpful view of your data when a few measurement values are far above the rest.
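For a feel of what logarithmic binning does to skewed values, here is a small Python sketch. It is an assumption-laden illustration rather than how Keshif computes its bins: the helper names `log_bin_edges` and `histogram`, the `bins_per_decade` parameter, and the prices are all hypothetical.

```python
import bisect
import math


def log_bin_edges(low, high, bins_per_decade=1):
    """Return bin edges evenly spaced in log10 space that cover [low, high]."""
    if low <= 0:
        raise ValueError("logarithmic bins need positive values")
    start = math.floor(math.log10(low) * bins_per_decade)
    stop = math.ceil(math.log10(high) * bins_per_decade)
    return [10 ** (k / bins_per_decade) for k in range(start, stop + 1)]


def histogram(values, edges):
    """Count values per bin; bins are [edge_i, edge_i+1)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        # Values at or beyond the last edge are placed in the last bin.
        i = min(i, len(counts) - 1)
        if i >= 0:  # values below the first edge are skipped
            counts[i] += 1
    return counts


# Made-up nightly prices: most near $100, a few far above.
prices = [25, 80, 95, 110, 150, 400, 2500, 8000]

linear_edges = [0, 2500, 5000, 7500, 10000]
log_edges = log_bin_edges(min(prices), max(prices))

linear_counts = histogram(prices, linear_edges)
log_counts = histogram(prices, log_edges)

print("linear:", linear_counts)
print("log:   ", log_counts)
```

On the linear bins, nearly every listing lands in the first bar. On the logarithmic bins, the same prices spread across the bars, so the differences among typical listings become visible.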

Learn more about this feature here!

#3. Location, location, location!

When we decide on a listing to book, or wish to understand trends across a whole city, the location of a listing is crucial, and this dataset provides approximate point locations for thousands of listings. However, looking at so many small points on a map is impractical and resource-intensive. So, we’ve implemented automated clustering in Keshif maps to dynamically group nearby locations together! Taking it one step further, Keshif also enables visual cluster analysis, for example by the percentage of townhouse-type properties in each cluster.
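The idea of grouping nearby points can be sketched in a few lines of Python. This uses a simple fixed grid, which is one common approach and not necessarily what Keshif does internally. The function name `grid_clusters`, the cell size, the coordinates, and the property types are all made up for illustration.

```python
from collections import defaultdict


def grid_clusters(listings, cell_size_deg, is_highlighted):
    """Group listings into square grid cells and summarize each cell.

    Each summary holds the number of listings, their average position,
    and the share of listings for which `is_highlighted` returns True.
    """
    cells = defaultdict(list)
    for item in listings:
        # Listings whose coordinates fall in the same cell share a key.
        key = (item["lat"] // cell_size_deg, item["lon"] // cell_size_deg)
        cells[key].append(item)

    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append({
            "count": n,
            "lat": sum(m["lat"] for m in members) / n,
            "lon": sum(m["lon"] for m in members) / n,
            "share": sum(1 for m in members if is_highlighted(m)) / n,
        })
    # Largest clusters first.
    return sorted(clusters, key=lambda c: c["count"], reverse=True)


# Made-up listings with approximate coordinates, for illustration only.
listings = [
    {"lat": 38.9072, "lon": -77.0369, "type": "Townhouse"},
    {"lat": 38.9075, "lon": -77.0365, "type": "Apartment"},
    {"lat": 38.9079, "lon": -77.0361, "type": "Townhouse"},
    {"lat": 38.9301, "lon": -77.0502, "type": "Apartment"},
    {"lat": 38.9305, "lon": -77.0507, "type": "Apartment"},
]

clusters = grid_clusters(listings, 0.01, lambda m: m["type"] == "Townhouse")
for c in clusters:
    print(f"{c['count']} listings, {c['share']:.0%} townhouses")
```

A fixed grid is easy to reason about, but a map that regroups points as you zoom would need to recompute the cells with a different cell size at each zoom level.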

Any dataset with locations opens an extra dimension of analysis and exploration available in Keshif. Take the Aid Worker Security database we’ve visualized here, or perhaps surveys with known response locations — visualizing trends over geographic regions is a powerful way to make sense of your data.

Learn more about this feature here!

#4. On comparisons — is the grass always greener?

Comparing across groups is a vital part of data exploration — it’s the basis for many types of analysis, from clinical trials to marketing. And in Keshif, it’s easier than ever. With a simple click, you can compare the top categories of any attribute across the other features in your dashboard. Exploring Airbnb prices by those with (and, shudder, those without) Wi-Fi? Done. Checking reviews by listings with a pull-out sofa vs. a real bed? Easy.

A detailed breakdown using Keshif to compare room types across Airbnb listings.

Keshif’s comparisons are usable across datasets with categorical attributes, and can be a powerful analysis tool to compare and contrast groups. A potential use case could be comparing the demographics and responses in surveys to better understand respondents. The possibilities are endless!
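Under the hood, a comparison like this boils down to splitting records into groups and summarizing each group the same way. Here is a minimal Python sketch of that idea, with a hypothetical helper `compare_groups` and made-up listings. It illustrates the concept, not Keshif's own code.

```python
from statistics import mean


def compare_groups(records, group_of, value_of):
    """Split records into groups and report the count and mean value of each."""
    groups = {}
    for record in records:
        groups.setdefault(group_of(record), []).append(value_of(record))
    return {
        name: {"count": len(values), "mean": mean(values)}
        for name, values in groups.items()
    }


# Made-up listings, for illustration only.
listings = [
    {"price": 100, "amenities": {"Wifi", "Kitchen"}},
    {"price": 140, "amenities": {"Wifi"}},
    {"price": 120, "amenities": {"Wifi", "Heating"}},
    {"price": 60, "amenities": {"Kitchen"}},
    {"price": 80, "amenities": set()},
]

by_wifi = compare_groups(
    listings,
    group_of=lambda r: "with Wi-Fi" if "Wifi" in r["amenities"] else "without Wi-Fi",
    value_of=lambda r: r["price"],
)

for name, summary in by_wifi.items():
    print(f"{name}: {summary['count']} listings, mean price ${summary['mean']}")
```

Swapping the `group_of` function is all it takes to compare by room type, host response time, or any other categorical attribute.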

Learn more about this feature here!

The World Development Indicators Dataset

We curated this dataset from the data released at data.worldbank.org. It is an inspiring and informative way to get a 30,000-foot view of the world by analyzing key numbers such as population, urbanization ratios, life expectancies, and economic development for every country. The annually recorded data also gives historical context and shows how the world has changed over the last 50 years.

#1. Exploring data over time

The most exciting part about this data is the time-series component for each development indicator, running from 1965 through 2015. Users can utilize this to explore how indicators like urbanization and life expectancy change over time in countries across the world.

Users can run these analyses and visualizations at individual time points or compare across time points, while also grouping by income and region if desired. One example is the bump chart, which compares changes in rank for the members of a group, like countries, over time. Keshif makes it easy to perform these analyses and visualize how order changes over time.
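The data behind a bump chart is simply each item's rank at each time point. A minimal Python sketch of that ranking step follows. The function name `ranks_by_year`, the country names, and the values are hypothetical, and this is not how Keshif builds its charts.

```python
def ranks_by_year(series):
    """Turn {name: {year: value}} into {year: {name: rank}}, where rank 1 is the largest value."""
    years = sorted({year for values in series.values() for year in values})
    result = {}
    for year in years:
        present = [(name, values[year]) for name, values in series.items() if year in values]
        # The sort is stable, so tied values keep their input order and still get distinct ranks.
        present.sort(key=lambda pair: pair[1], reverse=True)
        result[year] = {name: position for position, (name, _) in enumerate(present, start=1)}
    return result


# Made-up indicator values for three hypothetical countries.
data = {
    "Country A": {1965: 55.0, 2015: 71.0},
    "Country B": {1965: 62.0, 2015: 68.0},
    "Country C": {1965: 48.0, 2015: 79.0},
}

ranks = ranks_by_year(data)
for name in data:
    path = [ranks[year][name] for year in sorted(ranks)]
    print(name, "rank over time:", path)
```

Each printed path is one line of the bump chart: Country C climbs from last to first, while Country B drops from first to last.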

Taking advantage of time-dependent data in your dataset is a useful analysis technique to understand how trends change over time. Datasets like survey responses with a time component could use this approach to identify trends and developments among respondents throughout the survey. Combining this with location data adds even more detail, showing how the data varies across both time and place.

Learn more about this feature here!

#2. Time-series animations

Another advantage of time-series data is the ability to animate your stories as the data changes over time. Keshif automatically recognizes the presence of data over time and provides an option to animate any time-series chart. This is especially useful when comparing indicators of world development to better understand how something like GDP per capita changed throughout history across the world.

Learn more about this feature here!

#3. How you measure the data matters

When performing a visual data analysis, changing how you measure the data changes which question you are answering. Comparing groups by average life expectancy yields different results than comparing them by total population, or by the raw count of countries.

This dataset is excellent for comparing across both geographic regions and income groups. And with each step, the detail of the analysis improves: while it’s great to look at the simple raw count of countries per category, it’s far more powerful to compare population totals for each region or average life expectancy between income groups.

Deciding how to adjust this measurement function dictates how you compare groups in the analysis. And for data with time-series or geographic components (or, both!), the analysis becomes far more flexible. Each comparable measurement is selected from a specific time point, creating a dynamic way of comparing groups and attributes over time. And in general, changing this summary of your data provides a deeper focus on the question you are trying to answer.
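To make the effect of the measurement function concrete, here is a small Python sketch that summarizes the same groups three ways. The helper `aggregate`, its parameter names, the region labels, and the figures are all made up for illustration, and the values are assumed to come from a single time point.

```python
def aggregate(records, group_key, measure="count", value_key=None):
    """Summarize each group by count, sum, or average of `value_key`."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record)

    result = {}
    for name, members in groups.items():
        if measure == "count":
            result[name] = len(members)
        elif measure == "sum":
            result[name] = sum(m[value_key] for m in members)
        elif measure == "average":
            result[name] = sum(m[value_key] for m in members) / len(members)
        else:
            raise ValueError(f"unknown measure: {measure}")
    return result


# Made-up countries; population is in millions, life expectancy in years.
countries = [
    {"region": "Region X", "population": 5, "life_expectancy": 80},
    {"region": "Region X", "population": 10, "life_expectancy": 76},
    {"region": "Region X", "population": 15, "life_expectancy": 78},
    {"region": "Region Y", "population": 200, "life_expectancy": 70},
]

by_count = aggregate(countries, "region", "count")
by_population = aggregate(countries, "region", "sum", "population")
by_life = aggregate(countries, "region", "average", "life_expectancy")

print("countries per region:  ", by_count)
print("total population:      ", by_population)
print("avg. life expectancy:  ", by_life)
```

Region X leads by number of countries and by average life expectancy, while Region Y leads by total population. Same groups, three different stories.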

#4. Filtering — sometimes, it’s okay to exclude your “precious”

There are myriad ways to adjust and slice your data to better understand it, including using filters to exclude features and rows that aren’t of interest. And Keshif makes it easy to filter through and drill down to the data you need.

In this dataset, comparing changes in population over the past 50 years is a useful way to understand countries across the world. Yet two countries dominate the global population: China and India. In fact, in 2015 they accounted for more than a third of the world’s population.

When trying to understand these population trends, it might make sense to exclude these countries — Keshif offers a handful of ways to remove these outliers.

  1. Directly exclude the records (or rows) from the dataset.
  2. Filter the population attribute using its chart to ensure no records have more than a billion people.
  3. Exclude groups of countries geographically (Asia, since it contains both China and India), or filter to include only the other regions of the world.

An example of how to directly exclude (filter out) a row in your dataset.

This sort of drill-down analysis quickly provides the results you need for your visual data exploration. Whether comparing survey respondents or trying to understand the subjects in your dataset, filtering with Keshif gives you control of your dashboard to best make your data explorable.
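The three ways of excluding outliers listed above can be sketched as plain Python filters. The records below use rounded, illustrative population figures and simplified region labels, so treat them as assumptions rather than values from the dataset.

```python
# Rounded, illustrative figures; region labels are simplified.
countries = [
    {"name": "China", "region": "Asia", "population": 1_370_000_000},
    {"name": "India", "region": "Asia", "population": 1_310_000_000},
    {"name": "Japan", "region": "Asia", "population": 127_000_000},
    {"name": "Brazil", "region": "Americas", "population": 205_000_000},
    {"name": "Nigeria", "region": "Africa", "population": 181_000_000},
]

# 1. Directly exclude specific records.
excluded_names = {"China", "India"}
by_record = [c for c in countries if c["name"] not in excluded_names]

# 2. Filter on the population attribute: keep at most a billion people.
by_value = [c for c in countries if c["population"] <= 1_000_000_000]

# 3. Exclude a whole geographic group.
by_region = [c for c in countries if c["region"] != "Asia"]


def names(records):
    """Return just the country names, in their original order."""
    return [c["name"] for c in records]


print("by record:", names(by_record))
print("by value: ", names(by_value))
print("by region:", names(by_region))
```

The first two filters keep the same countries here, while the region filter also drops Japan. That side effect is worth remembering when you exclude by group instead of by record.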

🧙‍♂️Lumos insights!

This post serves as a set of important examples that take advantage of Keshif’s unique spells to bring data insights to your fingertips. And it’s no coincidence that this is coupled with the rollout of our Knowledge Base at help.keshif.me, an important step in making visual analytics easy to understand. We are constantly iterating, and we’d love to hear your ideas about how we can make powerful data visualization more accessible for you.

Happy data exploration,

The Keshif Team
