Recap from the #DataDive on supporting disadvantaged youth

Adil Yalcin
Keshif
6 min read · Dec 9, 2016


Keshif was at the #DataDive organized by DataKindDC in collaboration with the Annie E. Casey Foundation, an organization devoted to developing a brighter future for millions of children at risk of poor educational, economic, social, and health outcomes. This DataDive was a weekend-long volunteering effort that brought 100 socially conscious analysts, statisticians, data scientists, coders, hackers, and designers together to search for data-driven solutions to problems that impact 2 million kids in the US. The volunteers split into four teams, worked with multiple data sources on complex issues, and presented their findings and suggestions at the end of the DataDive.

On Friday, the volunteers received an energetic, powerful introduction to the four projects.

One of the strongest messages was “Data is human”. From the first day, the volunteers met and learned from young people who had been in the system and successfully overcame many challenges. Their stories were moving and motivating, and really highlighted the human side of the data. Another humanizing example came from the project on improving how disconnected youth are matched to the homes best suited to their individual needs. Although this looks like automated matchmaking (as in popular dating applications), the team recognized they were facing a very different setting. In home and shelter placements, a staff member controls the process, so solutions should support decision-making rather than automate decisions. More importantly, the outcome of any match affects the life of a young person, so demographic, racial, and regional trends need to be considered and communicated very carefully. There were too many other insightful takeaways to list here, shared by passionate, friendly, high-energy volunteers.

Exploring the potential of connected data systems

I joined the team working on Connecting Public Data Systems to Better Understand System-Involved Youth. Our challenge was to create a bigger picture and generate insights by merging data from multiple systems. Each system described a different aspect, such as homelessness, behavioral health services, welfare periods, and placements such as foster or kinship care. This rich data offered many opportunities for analysis, which can be summarized as (i) understanding demographic distributions in youth programs and the differences across programs, (ii) understanding patterns of migration and the services received from different programs, and (iii) identifying correlations between placements and outcomes based on personal histories.
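For goal (i), comparing programs with different sizes means looking at shares within each program rather than raw counts. Here is a minimal Python sketch, assuming each record has already been reduced to a hypothetical (program, demographic group) pair; the program names, age bands, and values are illustrative only, not the DataDive data:

```python
from collections import Counter, defaultdict

# Hypothetical (program, demographic group) rows; illustrative values only.
rows = [
    ("shelter", "age 16-17"), ("shelter", "age 18-21"), ("shelter", "age 18-21"),
    ("behavioral_health", "age 16-17"), ("behavioral_health", "age 16-17"),
    ("foster_care", "age 16-17"), ("foster_care", "age 18-21"),
]

def distribution_by_program(pairs):
    """Return {program: {group: percent of that program's records}}."""
    counts = defaultdict(Counter)
    for program, group in pairs:
        counts[program][group] += 1
    result = {}
    for program, counter in counts.items():
        total = sum(counter.values())
        # Normalize within each program so programs of different sizes compare fairly.
        result[program] = {g: round(100 * n / total, 1) for g, n in counter.items()}
    return result

for program, shares in distribution_by_program(rows).items():
    print(program, shares)
```

Normalizing per program is what makes "differences across programs" visible; a tool like Keshif lets you switch between such relative views and absolute counts interactively.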

Our team made great progress toward our goals in two days of hard work, from data cleaning to analysis and presentation. Keshif was used in many creative ways during the event to enable easy, effective visual analysis. Many team members used Python and R to merge datasets by unique identifiers, clean up the data, find people who appeared in multiple services, and generate preliminary charts showing trends in demographics, overlaps between systems, and correlations. Their data-wrangling work enabled much of the final analysis done in Keshif and opened new ways of looking at the data. As one of the last, yet very important, steps, we analyzed migrations and temporal trends using another tool, EventFlow, from HCIL, the research lab where Keshif started.
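Finding the people shared across services comes down to grouping records by their unique identifier. The team's actual Python and R scripts are not reproduced here; this is a minimal stdlib-only Python sketch, assuming every system's records carry a common `id` field (a hypothetical name) and using made-up IDs:

```python
from collections import defaultdict

# Hypothetical per-system records; the "id" field and values are illustrative.
homelessness = [{"id": "A1"}, {"id": "B2"}, {"id": "C3"}]
behavioral_health = [{"id": "B2"}, {"id": "C3"}, {"id": "D4"}]
placements = [{"id": "C3"}, {"id": "E5"}]

def systems_by_person(**systems):
    """Map each unique id to the set of systems in which it appears."""
    membership = defaultdict(set)
    for system_name, records in systems.items():
        for record in records:
            membership[record["id"]].add(system_name)
    return dict(membership)

def shared_across(membership, min_systems=2):
    """Return the sorted ids that appear in at least `min_systems` systems."""
    return sorted(pid for pid, s in membership.items() if len(s) >= min_systems)

membership = systems_by_person(
    homelessness=homelessness,
    behavioral_health=behavioral_health,
    placements=placements,
)
print(shared_across(membership))     # ['B2', 'C3']
print(shared_across(membership, 3))  # ['C3']
```

In practice the hard part is the cleaning step before this: the identifiers must actually match across systems for the overlap counts to be meaningful.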

Exploring youth system data with Keshif

I first used Keshif to understand each of the datasets we received from the different systems individually. I moved multiple CSV files into a Google Sheet for easy collaboration and sharing, imported them into Keshif using the YourData page, iterated on the dashboards, and shared the links and some screenshots on our shared notepad. These were some of my first steps while we were still trying to make sense of the unfamiliar data.

As other members checked out the early Keshif browsers, I demonstrated the tool's features for getting different measurements and helped others import newly processed datasets. Over the weekend, members of our team, and even of other teams, created new Keshif browsers. For example, Sheila Flick, who was working on the Youth and Young Adult Outcomes project, used Keshif to look at demographics data for her team and shared her experiences from the DataDive on her blog. I had been in email contact with her earlier, and it was great to finally meet her and share our passion for data.

The event was fast-paced and our team had ambitious goals. Volunteers had different backgrounds, perspectives, and hardware configurations. While everyone used the tools they were most comfortable with and analyzed the data from different perspectives, finding tools that helped many people across different stages of data analysis was challenging. It was a great setting to test-drive Keshif for collaborative data analysis, and it performed really well. Some of the features that helped were:

  • Keshif doesn’t require software installation or a specific operating system. It runs entirely inside your browser, without even setting up a profile.
  • Keshif enables a rapid workflow where data browsers can be shared easily through unique links with a single button (gist-upload).
  • Data can be imported from Google Sheets, which are easily accessible and usable.
  • With Keshif, going from data to visualization takes no time, so we could focus on understanding what the data really meant rather than on how to visualize and explore it.

At the end of the DataDive, our team ambassadors presented the results, including a live demo of one of the Keshif browsers focusing on the youth active in the system.

Analyzing connected data using EventFlow

One of the most important needs for our project was to understand how events in one dataset relate to events in another (for example, whether mental health services were received before, after, or during various placements). The key change in perspective that let us use EventFlow was bringing all the datasets together with an event-based approach, in which each event (row) records the individual’s unique ID, the event type, and the start and end dates. My teammates processed the data and generated event datasets focused on individuals, and I worked with the foundation member on our team to narrow our focus to a few key event categories.
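A minimal Python sketch of this event-based representation, assuming each system's rows carry an ID column and ISO-formatted start and end dates (all column names and values below are hypothetical). The `relation` helper is a simplified stand-in for the before/after/during question; it treats any overlap as "during" and is not how EventFlow works internally:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    person_id: str
    event_type: str
    start: date
    end: date

def to_events(rows, event_type, id_key, start_key, end_key):
    """Convert one system's rows (dicts with ISO date strings) into Events."""
    return [
        Event(r[id_key], event_type,
              date.fromisoformat(r[start_key]), date.fromisoformat(r[end_key]))
        for r in rows
    ]

def relation(a, b):
    """Classify when event `a` happens relative to event `b` for one person.
    Any overlap between the two date ranges counts as "during"."""
    if a.person_id != b.person_id:
        raise ValueError("events belong to different individuals")
    if a.end < b.start:
        return "before"
    if a.start > b.end:
        return "after"
    return "during"

# Hypothetical rows from two systems that use different column names.
placement_rows = [{"youth_id": "C3", "placed": "2015-03-01", "exited": "2015-09-30"}]
service_rows = [
    {"client": "C3", "from": "2015-01-10", "to": "2015-02-20"},
    {"client": "C3", "from": "2015-05-05", "to": "2015-06-01"},
    {"client": "C3", "from": "2015-11-02", "to": "2015-12-01"},
]

placements = to_events(placement_rows, "foster_care", "youth_id", "placed", "exited")
services = to_events(service_rows, "mental_health_service", "client", "from", "to")

# One combined, event-based table sorted by person and start date.
all_events = sorted(placements + services, key=lambda e: (e.person_id, e.start))
print([relation(s, placements[0]) for s in services])  # ['before', 'during', 'after']
```

Once every source shares the same (ID, type, start, end) shape, events from any system can be compared directly, which is what makes cross-system sequence questions possible at all.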

We were able to gather all the datasets together only a few hours before the presentations on the last day, so we had very little time to analyze the data. Still, EventFlow made it easy to align, query, filter, and search event sequences and relations very quickly. We presented results on how frequently various migrations happened before and after going to homeless shelters, which EventFlow quickly showed as both percentages and absolute numbers of records.
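EventFlow does this interactively and shows whole sequences. As a rough illustration of the underlying idea only, here is a simplified Python sketch that aligns each person on their first homeless-shelter event and counts which event type immediately precedes and follows it, reporting both absolute counts and percentages. The event names and sample data are hypothetical, and unlike EventFlow, it looks just one step in each direction:

```python
from collections import Counter
from datetime import date

# Each person's events as (start_date, event_type); illustrative values only.
events = {
    "A1": [(date(2014, 1, 5), "kinship_care"), (date(2014, 6, 1), "homeless_shelter"),
           (date(2014, 9, 1), "foster_care")],
    "B2": [(date(2015, 2, 1), "foster_care"), (date(2015, 4, 1), "homeless_shelter"),
           (date(2015, 8, 1), "homeless_shelter")],
    "C3": [(date(2013, 3, 1), "homeless_shelter"), (date(2013, 7, 1), "foster_care")],
    "D4": [(date(2016, 1, 1), "foster_care")],
}

def neighbors_of_first(events_by_person, anchor):
    """Align each person on their first `anchor` event and count the event
    types immediately before and after it ("(none)" marks a sequence edge)."""
    before, after = Counter(), Counter()
    aligned = 0
    for seq in events_by_person.values():
        types = [t for _, t in sorted(seq)]
        if anchor not in types:
            continue  # person never had the anchor event, so is not aligned
        aligned += 1
        i = types.index(anchor)
        before[types[i - 1] if i > 0 else "(none)"] += 1
        after[types[i + 1] if i + 1 < len(types) else "(none)"] += 1
    return aligned, before, after

def report(counter, total):
    """Pair each absolute count with its percentage of aligned individuals."""
    return {k: (n, round(100 * n / total, 1)) for k, n in counter.most_common()}

aligned, before, after = neighbors_of_first(events, "homeless_shelter")
print("aligned:", aligned)
print("before:", report(before, aligned))
print("after:", report(after, aligned))
```

Showing both the absolute number and the percentage matters with small groups: a large percentage can rest on only a handful of individuals.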

Given the limited time, our results were preliminary, and many other patterns remained to be revealed and understood. Even the reduced dataset we worked on had many categories and many different ways to align and query events. The foundation carried many questions back home, along with new ideas, tools, and ways of using data to improve their services and impact. We could also enrich the analysis by importing the demographics of individuals and focusing on outcomes.

EventFlow visually shows an overview of what comes before and after a selected event. Here, we see migrations across homeless shelters and other placements. The individuals are aligned on the first time they went to a homeless shelter.

Wrapping up

This was my first DataKindDC DataDive, and I hope to attend many more as opportunities arise. Over the weekend, I volunteered on an important problem that impacts millions of kids, learned many strategies for data analysis and cleaning, strengthened my bonds with familiar faces and friends, and made some new friends. I was also very excited to see how Keshif helped my team and the foundation see the potential of their data to improve outcomes and policies.

A snapshot of volunteers data-diving together on Saturday. Photo by Greg Matthews.


Enabling Data-Driven Insights for Everyone | Founder-CEO at @keshifme ~ Visual Data Analytics and Exploration, Design, Engineering, Entrepreneurship