Those of us who work in data science are part of a field that is advancing at lightning pace. We have front row seats to a show where data is the charismatic protagonist that everybody wants to get to know. It has come to the point where, for any large company or startup, the data that they collect and what they do with it is either part of, or as important as, the product they produce.
There is no doubt that data and data science are transforming how the world operates and impacting our daily lives, but like so many resources, the benefits are not evenly spread. While the use of data by companies to improve products and sell more of them is widespread, the situation for organisations tackling issues that really matter is wildly different.
If you are reading this, perhaps you work on the frontline of a pressing social or environmental challenge, and think that data is part of the answer. Or maybe you work in data science, and feel that it has the potential to serve the greater good. I share these views with you, and have been lucky enough to meet and talk with people in both camps who have helped me to form thoughts about how we might break down the barriers.
Depending on our backgrounds, we each look at our environment through different lenses. As data scientists, we develop our own additional perception of the world. It doesn’t quite start to look like the Matrix, but I do find as I walk through the streets almost all human activities - sights, sounds, movements, systems - appear measurable and quantifiable. For many others, the world might be viewed through lenses shaped by human understanding, policy, social sciences, culture, power, and oppression. Data-vision doesn’t offer magical advantages over any of these alone. To create visions of how to solve complex problems and use data, we need to stack up the data science lens with the others. As we step out into the new world, we will see new opportunities, but we will also have to navigate new barriers.
What is Data?
A common barrier that I have encountered when starting to explore the potential of data science with charities, NGOs and other social organisations is the disparity in what each of us thinks of as data. The common conception is that data generally takes the form of numbers, carefully collected through methods such as surveys, and kept in a spreadsheet. Data is boring, static, but necessary. The reality is that we can now treat just about anything as data, and the potential sources are numerous. Inside social impact organisations there are documents, photographs, internal records, and event logs. From the outside world, we can collect online articles and images, get weather records, download satellite and ground level images, listen to underwater recordings, crowdsource information… the list goes on.
Our ability to use computers to analyse not just numbers, but text, sound, images, and videos as data means we can distil richer insights than ever before about complex issues. There is big business in continuously innovating to find new ways to track us on and offline, to turn our personal lives into data and serve us advertisements. It’s about time that we started to realise that opening up that idea to the people with the questions will already get us a long way to answering them.
What is Data Science?
The second obstacle is a concept and language barrier. Once we have data, making use of it is no longer constrained to statistics, bar charts, and manual analysis. Those are useful tools, but modern data science methods go further: they can find patterns in data that no human could spot, predict future occurrences with increasing sophistication, or automate tasks that, up until now, have required intensive human labour. This is made possible by advances in machine learning, where computers can be taught by example, rather than hard-coded to follow preset rules. These ideas are being used to power self-driving cars and recommend movies to us, but the fact is that the underlying technologies can be applied to revolutionise responses to social, civic and environmental challenges. We just need to bring the right people together more often to have the conversations that create those links.
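To make "taught by example" a little more concrete, here is a minimal sketch in Python. It contrasts a rule written by hand with a predictor that simply looks at the most similar past records and takes a vote (a basic nearest-neighbours approach). The flood scenario, the numbers, and the function names are all made up for illustration; they are not real data or a recommended model.

```python
import math
from collections import Counter


def hard_coded_rule(rainfall_mm, river_level_m):
    """A preset rule: the thresholds are fixed guesses written by a person."""
    if rainfall_mm > 50 and river_level_m > 3.0:
        return "flood"
    return "no flood"


def learn_by_example(examples, k=3):
    """Return a predictor that labels a new point by majority vote
    among its k closest labelled examples."""
    def predict(point):
        nearest = sorted(examples, key=lambda ex: math.dist(ex[0], point))[:k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
    return predict


# Invented records for illustration only:
# (rainfall in mm, river level in m) -> what happened
examples = [
    ((10.0, 1.0), "no flood"),
    ((20.0, 1.5), "no flood"),
    ((15.0, 2.0), "no flood"),
    ((60.0, 3.5), "flood"),
    ((45.0, 3.8), "flood"),
    ((70.0, 2.9), "flood"),
]

predict = learn_by_example(examples)
new_day = (48.0, 3.6)
print("rule says:    ", hard_coded_rule(*new_day))
print("examples say: ", predict(new_day))
```

In this toy case the hand-written rule misses a day that sits just under its rainfall threshold, while the example-based predictor sides with the similar past days. A real project would need far more data, properly scaled measurements, and careful validation.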
It doesn’t help that the world of data is full of technical language wrapped in business jargon. “Big data”, “artificial intelligence” and “powered by the cloud” are the kind of terms that sound alluring but intangible and unreachable for social impact organisations. The truth is that they take attention away from what is actually possible and valuable about using data science. Most of the technologies are actually well within the reach of many organisations, and it is up to data scientists to find opportunities to communicate them clearly.
There are countless ways that data and data science could help those who are focused on improving lives around the world. The examples here are very general, but judging from my conversations with charities and NGOs, they are common low-hanging fruit. With innovation from the right mix of data science and domain expertise, the possibilities could be limitless.
- Automated Data Collection Systems - Why should someone with a Master’s in public policy spend their time gathering evidence by repetitively clicking through pages on a website to download PDFs and enter the information from them into a spreadsheet? Data engineers can build web scrapers to automatically collect the same documents and extract the information straight into a database.
- Understanding the Past - Many organisations keep records relating to their activities or the communities they serve, but these often get looked at only for operational purposes. Data scientists can use machine learning to uncover patterns and build dynamic visualisations to help understand trends.
- Making Predictions - Can we predict crises before they happen? As well as analysing the past, data scientists can use machine learning to make forecasts. We may be able to predict conflicts, the locations of communities at risk from disaster, or citizens becoming homeless.
- Communicating Issues - Is it me or is it hard to get your message across these days? Interactive data visualisations are a fantastic way to get a message across, and if done properly can allow people to get a sense of issues at both a high and a granular level.
- Data Sharing - Many problems could be better understood if organisations could pool their data. In data science, a common trope is that more data beats a more sophisticated algorithm! Although there are political elements to this, data scientists and engineers will be able to help understand what data is useful to share, and to build solutions that respect privacy and security concerns.
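As a rough illustration of the first idea in the list, automated data collection, the sketch below uses only Python's standard library to find PDF links on a web page and record them in a small database. The page content, the `example.org` address, and the `documents` table are hypothetical. In practice the HTML would be downloaded from the real site, and reading the contents of each PDF would be a further step not shown here.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collects (absolute url, link text) for every link ending in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # set while we are inside a PDF link
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.lower().endswith(".pdf"):
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def store_links(html, base_url, conn):
    """Parse the page, save any new PDF links, return how many were found."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (url TEXT PRIMARY KEY, title TEXT)"
    )
    conn.executemany("INSERT OR IGNORE INTO documents VALUES (?, ?)", parser.links)
    conn.commit()
    return len(parser.links)


# Hypothetical page content standing in for a downloaded web page.
SAMPLE_HTML = """
<ul>
  <li><a href="/reports/2019-annual.pdf">Annual report 2019</a></li>
  <li><a href="/about.html">About us</a></li>
  <li><a href="reports/housing.PDF">Housing survey</a></li>
</ul>
"""

conn = sqlite3.connect(":memory:")
store_links(SAMPLE_HTML, "https://example.org/data/", conn)
for row in conn.execute("SELECT url, title FROM documents ORDER BY url"):
    print(row)
```

Because the URL is the table's primary key, running the same collection again does not create duplicate rows, which is what lets a job like this run on a schedule instead of by hand.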
In a future article, I aim to write about how, as technologists and data scientists, we can more easily make our services available and build successful projects like these with partners on the front lines of global challenges. Of course data can’t be a solution alone, but it can help us to understand problems and create better solutions. To me, the future of data science in tackling complex social issues lies in computer-human collaboration, where the tools we build will not solve problems for us, but better enable us to build change that is grounded in compassion, empathy, and equity.