Data science can be such a force for change - but do not fall for all of the hype
Today’s world of sophisticated data harvesting and analytics is a far cry from the early 1900s, when Mark Twain said, ‘There are three kinds of lies – lies, damned lies and statistics.’
This is particularly true in the investment sector, where data science-based approaches prevail. Investors who pour money into projects and companies without using key insights from Big Data to steer their decisions are unlikely to see a great ROI.
Public companies are more likely to use data science than their private counterparts, which still rely heavily on legacy investment tools and make only limited use of current machine learning algorithms and data aggregation techniques.
What is data science?
Data science combines domain expertise and programming skills with an understanding of statistics in order to extract meaningful insights from data. Data scientists must be masters of many fields, including computing, statistical analysis, machine learning, deep learning, data visualisation, data wrangling, mathematics and programming.
Does data science have a future?
A resounding ‘Yes’. Data harvesting methods and capabilities continue to become more sophisticated, generating ever-growing quantities of data. Parallel to this is the growing capacity to automate data analysis, leading to widespread adoption of data science as a key input to decision-making.
The global data science platform market was valued at an estimated $95.3 billion in 2021 and is forecast to reach $322.9 billion in 2026, which would mean the market more than tripling (an increase of roughly 240 per cent) in just five years.
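The growth implied by those two market figures can be checked with a short calculation. This is a minimal sketch, assuming both figures are in billions of US dollars and that 2021 to 2026 counts as five year-on-year steps; the function names are illustrative only.

```python
def percent_increase(start: float, end: float) -> float:
    """Percentage increase from start to end."""
    return (end - start) / start * 100.0


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, expressed as a percentage."""
    return ((end / start) ** (1.0 / years) - 1.0) * 100.0


# Figures quoted in the text, in billions of US dollars.
size_2021 = 95.3
size_2026 = 322.9
years = 2026 - 2021  # assumption: five year-on-year steps

total_growth = percent_increase(size_2021, size_2026)
annual_growth = cagr(size_2021, size_2026, years)

print(f"Total increase: {total_growth:.0f} per cent")
print(f"Implied annual growth: {annual_growth:.1f} per cent")
```

The annual rate is often the more useful number, because a headline multi-year percentage depends heavily on how many years are counted.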
Additional factors driving market growth include the rising adoption of cloud-based solutions, the growing application of the data science platform in various industries, and the mounting need to extract in-depth insights from voluminous data to gain competitive business advantage.
Reliability of data science
Using data science is somewhat akin to using Google to search for something. Google generates answers, yet there is no guarantee that these will be accurate and adequate, which raises the question: ‘Who decides whether a given answer is the right one?’
People are rethinking how the answers produced by data science approaches are assessed, and refining the ways solutions are formulated in order to generate the best results. Yet most of these approaches deal only with the mathematical correctness of the models, and a mathematically correct answer can still be totally meaningless.
Say x = 5 and y = 2; then x/y = 2.5. But if x is the number of oranges and y is the temperature, then 2.5 doesn't make any sense at all. In the same vein, many Google-generated answers don't make sense, either. So, how does one get around this?
Even when a search generates several results, using common sense, we can often disregard some of the answers by inspecting the first few examples. This is not because we know the answer, but because we know what the answer cannot be.
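Knowing what an answer cannot be can also be written down as an explicit check. The following is a minimal sketch built on the oranges-and-temperature example; the `Quantity` class, the `MEANINGFUL_RATIOS` table and the `divide` function are hypothetical names introduced for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str  # e.g. "oranges", "degrees_c", "people"


# Hypothetical whitelist of ratios that have a real-world meaning.
MEANINGFUL_RATIOS = {
    ("oranges", "people"): "oranges per person",
}


def divide(x: Quantity, y: Quantity) -> Quantity:
    """Divide x by y, refusing ratios that are not known to be meaningful."""
    key = (x.unit, y.unit)
    if key not in MEANINGFUL_RATIOS:
        raise ValueError(f"{x.unit} / {y.unit} has no agreed meaning")
    return Quantity(x.value / y.value, MEANINGFUL_RATIOS[key])


oranges = Quantity(5, "oranges")
people = Quantity(2, "people")
temperature = Quantity(2, "degrees_c")

# The arithmetic and the meaning are both fine here.
per_person = divide(oranges, people)

# Same arithmetic, but the ratio is rejected as meaningless.
try:
    divide(oranges, temperature)
    rejected = False
except ValueError:
    rejected = True
```

The check does not know the right answer; it only encodes which kinds of answer can be ruled out, which is the same common-sense filter applied to search results.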
Make better investment decisions with data science
The answer to this is a three-pronged approach: better models, better data and interpretable AI.
Firstly, more specific models must be built, with fewer examples of repurposing. Biotech is a good example, where many solutions in the data science domain are built from scratch, specifically to address a given problem. This helps to avoid the interpretability problem that arises when a model that was built to analyse one set of data is adapted to another, often with unforeseen flaws.
Secondly, building customised models takes time and lots of data. Financial institutions, for example, generally operate in a data-scarce and time-scarce domain. To overcome these problems, one must appreciate that a) primary data is not the only data, and b) there is no way the system can advance without deep research. Many financial institutions do not have R&D departments to help solve specific problems. Therefore, when it comes to data, alternative data is valuable. It is often much easier to get and, with the right means, it can be effectively converted into data which sheds light on different areas of company performance.
Thirdly, it is essential to incorporate interpretable AI. Increasingly sophisticated models are frequently used without a good understanding of how a given model generates its results.
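To make the idea of an interpretable model concrete, consider a simple linear score, where each input's contribution to the result can be read off directly. This is only a sketch under stated assumptions: the feature names, weights and company values below are made up for illustration and do not come from any real model or data set.

```python
# Hypothetical weights for a toy linear scoring model; values are made up.
WEIGHTS = {
    "revenue_growth": 0.6,
    "debt_ratio": -0.4,
    "web_traffic_trend": 0.2,
}


def explain_score(features: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the total score and each feature's contribution to it."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    return sum(contributions.values()), contributions


# Illustrative inputs for one company (not real data).
company = {"revenue_growth": 0.5, "debt_ratio": 1.0, "web_traffic_trend": 0.5}
score, contributions = explain_score(company)

# List the drivers of the score, largest absolute effect first.
ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
for name, value in ranked:
    print(f"{name:>18}: {value:+.2f}")
print(f"{'total':>18}: {score:+.2f}")
```

With a model of this kind, anyone reviewing the output can see which input pushed the score up or down, something a more complex model may not offer without additional tooling.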
While catch-phrases like ‘deep learning’ and ‘AI’ are impressive when pitching a deck to investors, such models do not always outperform the classic ones. Furthermore, it is important to understand that an extra 3 per cent in accuracy is not always enough of a reason to abandon a less precise, yet interpretable, model.
Key trends in data science
Data science is changing almost daily. From data governance to DeepTech, the industry is set to face major shakeups. Keeping abreast of the trends is important to ensure that data science remains ethical, insightful and authentic. Some of the fastest-growing data science trends include the following:
- Explosion in deep-fake video and audio, which are used to facilitate scams.
- More applications created with Python, even for developing blockchain applications.
- Increased demand for end-to-end AI solutions, to help enterprise customers clean their large data sets and build ML models.
- Companies hiring more data analysts to parse and analyse the growing amount of available data.
- More and more data scientists are joining Kaggle, the world’s largest data science community with over eight million users in 194 countries.
- Increased interest in consumer data protection and privacy, particularly in the wake of the Cambridge Analytica scandal.
- AI developers are combating adversarial machine learning, where an attacker inputs data into an ML model with the aim of causing errors.
Data science is relevant across sectors
In addition to its role in the investment sector, data science also forms the basis of important decisions across a variety of industries. Complex problems in many areas have been solved with data science, from predicting agricultural development and the success of crops to assessing drug toxicity and allocating funds with certain financial instruments.