26th June 2019

How Big Data is shaping various sectors in 2019

Big data is a term that has increasingly been used in widespread forums over recent years and has perhaps grown in public consciousness through its discussed use in elections and even Brexit. The Institute of Physics defines1 big data as ‘a term describing the storage and analysis of large amounts of data which is frequently updated and comes in various formats, such as numeric, text, images or video’. Rebecca Tickle, a computer scientist from the University of Nottingham, puts it eloquently2 by saying that data becomes big ‘when we can no longer reasonably deal with it using traditional methods and process and store it on a single computer.’

Writing for Forbes3 at the start of the year, entrepreneur Kalev Leetaru questions, with some justification, whether big data should be defined by the size of an underlying dataset, the size of the portion of it actually touched by queries rather than remaining archived, and/or the complexity of the queries involved. It seems fair for Leetaru to assert that just because an organisation stores hundreds of petabytes of data, it cannot accurately be deemed big data if the majority comprises backup archives. Indeed, big data management isn’t big data analysis, and he’s right to state that GPS tracker data for fleets and private vehicles is often more useful and manageable when recorded at longer intervals.
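The point about longer recording intervals can be illustrated with a minimal sketch. The record layout (timestamp in seconds, latitude, longitude), the function name downsample and the sample trace are all assumptions made for illustration, not a description of any real tracker format.

```python
from typing import List, Tuple

# Hypothetical record layout: (timestamp_seconds, latitude, longitude)
GpsPoint = Tuple[int, float, float]


def downsample(points: List[GpsPoint], interval_s: int) -> List[GpsPoint]:
    """Keep the earliest point, then only points recorded at least
    interval_s seconds after the last point that was kept."""
    kept: List[GpsPoint] = []
    for point in sorted(points):
        if not kept or point[0] - kept[-1][0] >= interval_s:
            kept.append(point)
    return kept


# Invented trace: one fix per second for ten minutes
trace = [(t, 52.95 + t * 1e-5, -1.15) for t in range(600)]
reduced = downsample(trace, 30)
print(len(trace), len(reduced))
```

Here a ten-minute trace of one fix per second shrinks to one fix every 30 seconds, which is far less to store and query while still describing the journey.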

Initiatives using big data can typically prove just as valuable for small businesses and organisations as they can for much larger ones, because identifying the most insightful data and analysing it in the right ways is more fruitful than aimlessly trying to interpret torrents of information. Likewise, with cloud-based data solutions having become widespread, effective big data analysis isn’t the preserve of organisations with deeper pockets. It must also be acknowledged that despite artificial intelligence (AI) and the Internet of Things (IoT) permeating so many areas of life, computers can only do what humans program them to do, the intellect ultimately emanating from physical minds4.

Use of big data in automotive

Data plays an instrumental role in Trak Global Group’s services, products and activities, from managing and analysing data for our insurance partners and bringing about safer roads and more efficient driving through our Carrot Insurance and Appy Fleet products, to the R&D and innovation explored through Trak Labs, our collaborative incubation hub. Automotive is therefore the first sector we will focus on in our latest big data snapshot.

A few years ago, car manufacturer Daimler invested in its own onsite data centres, the firm’s analytics head Guido Vetter telling TechCrunch5 that ‘data is transforming our whole business’, and by 2016 its own ‘data lakes’ containing massive volumes of data had been built. Daimler recently announced, though, that its big data platform has been relocated to the cloud courtesy of Microsoft Azure in a project called eXtollo. More specialised and efficient scalability, flexibility and expertise in the data environment plus enhanced possibilities in AI were the chief aims behind this significant decision, and Azure was deemed the best solution in regard to encryption. Big data enables Daimler to make predictions or forecasts, from market trends and consumer appetites to vehicle maintenance and recalls.

Big data and ride-sharing

Ride-sharing firm Lyft6 reportedly benefitted from entering this growing mobility sector later than its primary competitor, Uber, as it was able to learn from the latter’s challenges in getting the Apache Hadoop architecture of its on-site big data infrastructure right. As a result, Lyft decided to go straight to the cloud and use AWS S3 object stores for its data, which is processed by AWS EC2 and primarily pertains to tracking and connecting drivers and riders through its app. In order to achieve the necessary scalability, Lyft migrated from Redshift to Apache Hive and also now uses Presto for more powerful queries in addition to ad hoc analytics, with user experience the focus of every big data architecture step the operator continues to take.

Autonomous vehicles’ reliance on and generation of big data

Driverless vehicles are also already generating massive volumes of big data while they’re being tested by various OEMs, tech companies and others, with artificial intelligence very much at the heart7. AVs both analyse data in real-time and also use historical road, surroundings, weather and other data to enable them to navigate along routes as safely and efficiently as possible, while equipped with far more sensors, radars, GPS and other technology than everyday cars, which are themselves as complex as ever. Platforms connected to by driverless cars will be memory-optimised and allow seamless scaling, and compatibility with third-party systems will be vital as operators including Alphabet, Google, Tesla and Waymo (whose driverless cars have now covered in excess of 10 million miles8) work towards launching self-driving taxi fleets and other vehicles. Eventually, when full Level 5 autonomy has been proven to be safe, the ethical decisions9 AVs will be required to make will ultimately come down to analysing big data very rapidly – as they are, after all, machines rather than animate beings.

Parking made possible through big data

Concluding our latest big data selection from our own automotive sector, the Chinese city Shanghai10 has turned to harnessing big data to address the problems resulting from cars outnumbering parking spaces six to one as of Q4 2017. We always welcome news of car park-sharing initiatives and Shanghai currently operates around one hundred, which are now monitored by sensors and cameras, with data on available parking spaces shared on an official smartphone app. Hospital outpatients are able to park in nearby residential spaces and vice versa, all facilitated through big data. It would certainly be welcome if businesses, hospitals and other organisations in the UK were similarly open to making their parking spaces available at off-peak times, which would reduce pavement parking in areas surrounding high-rise apartment blocks, for instance.

Big data in the healthcare sector

Seeking to find breakthroughs in the treatment of sepsis, the number one cause of death amongst hospitalised patients, healthcare researchers in the States have analysed 29 clinical variables across the medical records of over 20,000 patients using big data11. Focussing on those identified as having sepsis within six hours of admission between 2010 and 2012, the Sepsis ENdotyping in Emergency CAre (SENECA) project, funded by the National Institutes of Health and led by associate professor Christopher Seymour, clustered these specific patients into four types before analysing a further 43,000 patients’ data spanning the following two years. After deep study including comparison with international trials, senior author and professor Derek Angus spoke of the teams’ conclusions resulting from big data revealing hidden subtypes of sepsis: “Intuitively, this makes sense – you wouldn’t give all breast cancer patients the same treatment… The next step is to do the same for sepsis that we have for cancer — find therapies that apply to the specific types of sepsis and then design new clinical trials to test them.”

Keeping measles at bay

At a time when specific localities in the U.S. have made news headlines because of certain communities expressing anti-vaccination stances, attention is increasingly turning to predicting outbreaks of once virtually-eliminated diseases like measles by utilising big data, machine learning and artificial intelligence12. The World Health Organization (WHO) has developed a risk assessment tool for areas falling short of vaccination targets, but historical data including vaccination rate statistics, along with syndromic surveillance data, social media, geotagging and predictive algorithms, are also being used. As with any data-centric projects or activities, quality is vital and variables continuously need optimising. In the case of diseases like measles, large volumes of data lead to better results and more precise modelling.

Big data and mental health

With Mental Health Awareness Week having just been held across the UK, it’s encouraging that big data is helping organisations become more effective at offering support and intervention. The WHO places the annual global suicide figure at over 800,000 and the SMS-based platform Crisis Text Line (CTL) has received over 100 million messages so far. In an interview with the Global Dispatches13 podcast hosted by Mark Leon Goldberg, Crisis Text Line’s chief data scientist Bob Filbin revealed that they received a spike of over four times the normal volume of messages following Anthony Bourdain’s suicide, amidst the inevitable and widespread media attention that ensued, which tends to lead to an increase in suicidal ideation.

Crisis Text Line responds to such periods of peak demand by using algorithms to provide suitable and dynamic levels of counsellor staffing. CTL is the first crisis organisation built from the ground up with technology and data at its core, analysing the content of every SMS received in real-time at critical emotional moments rather than relying on retrospective ‘last year’ surveys, resulting in unique insights. Such use of data enables CTL to improve the messages its counsellors send in response, helping to move someone out of crisis. Instead of placing service users in traditional telephone queues, CTL prioritises cases by severity, analysing incoming data for trigger words such as ‘bridge’. At the research level, Crisis Text Line data is being used to shape policy, an example being a children’s hospital in Colorado that found texting to be the most effective channel for young people experiencing abuse. It and other organisations are consequently moving away from focussing on traditional, specific trigger words such as ‘abusing’ in favour of expressions like ‘mean’ within SMS messages. With such a prominent reliance on big data, Crisis Text Line had to make numerous data protection and security considerations during the platform’s expansion into other English-speaking countries outside the U.S., Canada and UK where it has operated so far, and its current trials towards going multilingual require algorithms to be intelligently translated into various languages to retain the same effectiveness. Bob Filbin believes that the power of the data collected by Crisis Text Line represents the world’s first look at crisis on a real-time global level, which simultaneously provides rich insight into how people talk about crisis, therefore enabling such platforms and other organisations to intervene more dynamically.
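The idea of ordering incoming messages by severity rather than by arrival time can be sketched with a priority queue. This is only an illustration: the trigger words are the two mentioned above, while the weights, function names and sample messages are invented assumptions, and a real service would rely on far richer models than keyword matching.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical weights, for illustration only
TRIGGER_WEIGHTS: Dict[str, int] = {"bridge": 5, "mean": 2}


def severity(message: str) -> int:
    """Sum the weights of any trigger words found in the message."""
    words = message.lower().split()
    return sum(TRIGGER_WEIGHTS.get(w.strip(".,!?"), 0) for w in words)


def triage(messages: List[str]) -> List[str]:
    """Return messages ordered by severity, highest first.
    Messages with equal severity keep their arrival order."""
    heap: List[Tuple[int, int, str]] = []
    for order, text in enumerate(messages):
        # heapq is a min-heap, so severity is negated to pop the highest first
        heapq.heappush(heap, (-severity(text), order, text))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


queue = triage(["can we talk", "they are mean to me", "i am near the bridge"])
print(queue)
```

The arrival index is stored alongside the score so that messages of equal severity are still answered in the order they came in.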

Farming and agriculture’s use of big data

Big data continues to enable growers and other organisations throughout the food and drink sector to refine their techniques, optimise productivity and ultimately boost profitability or effectiveness. A former programmer for GE in India, Krishna Kumar, quit his job to help improve the livelihoods of farmers in an ecosystem he identified as fragmented, data being the biggest gap. Nikkei Asian Review14 has published a motivating snapshot of how Krishna’s CropIn platform has successfully linked smallholders and buyers and facilitated the collection of significant data on farmers’ yields, methodologies and real-world challenges. CropIn uses machine learning and analyses how farms are performing under different conditions, the data gathered then being used to predictively model yields and potential issues months in advance of each harvest. As of 2016, CropIn’s data encompassed over 250 different crop types and two million hectares of agricultural land, and having secured contracts in 29 countries, the business has also raised venture capital to grow further. It’s just one of countless offerings in the precision and digital farming sector that Global Market Insights forecasts to be worth $12 billion by 2025, the tech utilised ranging from IoT systems and drones to sensors and automated robotics.

Connected beehives and big data

The IoT is having an impact on the beekeeping15 industry and $2.4 billion honey market, with ApisProtect from Ireland the current leader in connected beehive sensors but continuously joined by entrants in countries from Bulgaria and Israel to Italy and the UK, where Olombria is focussing on flies’ role in pollination. ApisProtect’s hive sensors monitor sound, movement, humidity, CO2 levels and temperature and the resulting data is analysed by machine learning, enabling alerts to be triggered over unusual events and the presence of pests or disease. Big data means that bee colonies can be optimised and the company’s AI and algorithms are continuously improving with input from human beekeepers as well as around 10 million bees. Real-time data, alerts and other information are cleverly fed to the Slack collaboration platform to enable relatively easy access for the stakeholders involved.
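Raising an alert over an unusual sensor event can be sketched as a simple rolling check. This is a minimal illustration only: the function name, window size, threshold and temperature readings are invented assumptions, and it stands in for, rather than reproduces, the machine learning that connected-hive companies actually use.

```python
from statistics import mean, stdev
from typing import List


def unusual_readings(readings: List[float], window: int = 5,
                     threshold: float = 3.0) -> List[int]:
    """Return the indexes of readings that sit more than `threshold`
    standard deviations away from the mean of the previous `window` readings."""
    alerts: List[int] = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            alerts.append(i)
    return alerts


# Invented hive temperatures in degrees Celsius, with one sudden drop
temps = [34.8, 35.0, 35.1, 34.9, 35.0, 35.1, 34.9, 31.2, 35.0]
print(unusual_readings(temps))
```

Comparing each reading with its own recent history, rather than with a fixed limit, lets the same check work across hives that run slightly warmer or cooler than one another.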

Hiveopolis, a pan-European project led from Berlin, is developing, remarkably, a robot bee that will guide other bees to the best locations for harvesting nectar and pollen or coax them away from various dangers. Pollenity, a startup from Bulgaria, uses AI and sensors too but has also partnered with BuzzCoin to introduce blockchain and cryptocurrency to beekeeping as a way of enabling small-scale keepers to earn an income from their endeavours.

It must be acknowledged, though, that despite all such big data success stories, critics will still express scepticism, David Lobell writing in Scientific American16 for example that ‘poorer parts of the world lag far behind in getting the tools they need to survive’. His assertion that lack of data delayed an otherwise quick fix to disappointing milk yields from cows in the East of Africa sounds plausible, and we agree with his sentiments that ‘farmers in the global north should not be the only ones basking in the sunlight of reliable data’.

Why may some big data projects fail?

The statement ‘nearly all big data projects end up in failure’ in an InfoWorld17 article may seem shocking, but Nick Heudecker, an analyst from Gartner, estimates that 60 to 85% of big data projects don’t turn out to be successes. Additionally, Microsoft executive Bob Muglia painted a negative picture when speaking to Datanami, pointing to a failure to understand Hadoop as a major factor, it being “the engine that launched the big data mania.”

Heudecker identifies the all-too-common integration of siloed data from multiple sources, many of which are legacy systems, as a primary stumbling block behind many big data projects. “A lot go the data lake route and think if I link everything to something magic will happen. That’s not the case”, he commented, before adding that many clients don’t grasp and leverage the data at their disposal and its relationships, leading to project cessation.

Undefined goals and a lack of structure, both in projects as a whole and in the data they revolve around, are cited as another reason, while other voices in the field stress the importance of identifying discrete business problems to solve using data rather than working haphazardly with a data lump or landfill. Gaps in skills, in the ability to understand the blending of historic and current data with newer streams of data such as social media, and in grasping the difference between big data and data warehousing, are also identified as failure contributors. We agree with Douglas Merrill’s thoughts in his fintech article18 for Forbes that ‘tinkering around the edges’ of big data is indeed counterproductive.

Planning ahead, collaborating effectively, focussing on specific organisational needs for data, and recognising the wisdom in upgrading hardware, databases and other systems rather than shoehorning older platforms are steps recommended for entities considering big data analytics.

What will shape the future of big data?

It’s indisputable that the arrival of 5G mobile19 networks with significantly enhanced bandwidth will enable machine learning technology to develop rapidly and spawn new services and applications. We welcome the way in which machine learning tools are likely to become less technical, enabling data scientists to work across silos and focus on the quality of the data and the insights extrapolated from it. Car manufacturers and countless other organisations across all sectors will increasingly place big data and analytics at the core of their services and product development. Data itself may become more ‘open’, although security and privacy challenges such as GDPR compliance will remain, and we foresee more reliance being placed on real-time big data, made possible partly by the power and flexibility of the aforementioned 5G plus ever-improving platforms that also continue to become more affordable.

It’s clear that big data and its analysis are reaping significant rewards for wide-ranging businesses, organisations and even communities throughout the world, and we will continue to highlight noteworthy developments.

 

Sources:
1. https://beta.iop.org/big-data
2. https://www.youtube.com/watch?v=H4bf_uuMC-g
3. https://www.forbes.com/sites/kalevleetaru/2019/01/09/how-do-we-define-big-data-and-just-what-counts-as-a-big-data-analysis/#60c82f691b66
4. https://enterprisersproject.com/article/2019/4/4-big-data-myths-busted
5. https://techcrunch.com/2019/02/20/why-daimler-moved-its-big-data-platform-to-the-cloud/
6. https://www.datanami.com/2019/05/10/whats-behind-lyfts-choices-in-big-data-tech/
7. https://www.androidheadlines.com/2019/05/big-data-behind-driverless-cars.html
8. https://venturebeat.com/2018/12/10/waymo-tests-ai-driving-system-that-learns-from-labeled-data/
9. https://www.smartdatacollective.com/analyzing-big-data-key-successful-self-driving-vehicles/
10. https://www.scmp.com/news/china/society/article/3011170/apps-helping-shanghai-tackle-its-chronic-car-park-shortage
11. https://www.sciencedaily.com/releases/2019/05/190519162344.htm
12. https://www.medicaldaily.com/how-big-data-and-machine-learning-can-predict-prevent-isolated-cases-disease-435359
13. https://www.undispatch.com/how-big-data-and-text-messaging-can-prevent-suicide-around-the-world/
14. https://asia.nikkei.com/Business/Startups/Asian-farmers-cultivate-big-data-for-richer-harvest2
15. https://sifted.eu/articles/bee-tech-startups-smart-hives-big-data-robot-beehero-apisprotect-pollenity/
16. https://blogs.scientificamerican.com/observations/big-data-has-transformed-agriculture-in-some-places-anyway/?redirect=1
17. https://www.infoworld.com/article/3393467/4-reasons-big-data-projects-failand-4-ways-to-succeed.html
18. https://www.forbes.com/sites/douglasmerrill/2019/05/06/fintech-needs-to-stop-tinkering-at-the-edges-of-big-data/#7936e33e5d43
19. https://www.cio.com/article/3393162/8-factors-shaping-the-future-of-big-data-machine-learning-and-ai.html