Big Data is not so big: Data Science for small- and medium-sized enterprises

New basis for any business

Current advances in technology are in many ways fueled by the growing flow of data that comes from multiple sources and is analyzed to create competitive advantage. Both individual users and businesses are switching to digital systems¹, which in turn generate pools of information. Organizations, for their part, share data with other companies, giving rise to digital ecosystems that begin to blur traditional industry borders. As the amount of available data grows, its size, diversity, and applications are expanding at a near-exponential rate, and businesses are discovering that traditional data management systems and strategies cannot support the demands of the new data-driven world.

Several years ago, data analytics was used mostly in finance, sales and marketing (for example, customer targeting), and risk analysis; today analytics is everywhere²: HR, manufacturing, customer service, security, crime prevention and much more. As Ashish Thusoo, co-founder and CEO of Qubole, pointed out, “A new generation of cloud-native, self-service platforms have become essential to the success of data programs, especially as companies look to expand their operations with new AI, machine learning and analytics initiatives.”

According to a report by Qubole³, only 9 percent of businesses already support self-service analytics, while 61 percent plan to move to a self-service analytics model. As different forms of data are collected and connected to help businesses draw analogies between datasets, derive actionable insights, and improve decision-making, Big Data and Data Science have moved to the foreground of the industrial and commercial sectors.

However, the volume of data may not be the decisive factor for optimizing business operations. Small- and medium-sized businesses need to understand the benefits that intelligent data analytics can bring and the opportunities for data collection and management.

Big Data and its relation to Data Science

Big Data is a term covering large collections of heterogeneous data whose size or type is beyond the ability of traditional databases to capture, manage, and process. Big Data encompasses all types of data, namely:

  • structured (such as RDBMS, OLTP, transaction data, etc.)
  • semi-structured (XML files, system logs, text files and such), and
  • unstructured information (emails, blogs, digital images, sensor data, web pages and many other types).
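As a minimal sketch of how the three categories differ in practice, the snippet below parses one tiny sample of each kind using only the Python standard library. The sample records, field names, and the helper name `describe_samples` are hypothetical and were invented for illustration; real sources would be databases, log files, or message archives rather than inline strings.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for illustration only.
STRUCTURED_CSV = "order_id,amount\n1,19.99\n2,5.00\n"
SEMI_STRUCTURED_XML = '<user name="anna"><event type="login"/><event type="purchase"/></user>'
UNSTRUCTURED_TEXT = "Hi, my order 2 arrived late. Please refund order 2."


def describe_samples():
    # Structured: a fixed schema, every row has the same columns.
    rows = list(csv.DictReader(io.StringIO(STRUCTURED_CSV)))
    total = sum(float(row["amount"]) for row in rows)

    # Semi-structured: self-describing tags, but nesting and fields may vary.
    root = ET.fromstring(SEMI_STRUCTURED_XML)
    event_types = [event.get("type") for event in root.findall("event")]

    # Unstructured: no schema at all, so structure has to be extracted.
    order_mentions = re.findall(r"order (\d+)", UNSTRUCTURED_TEXT)

    return {
        "structured_total": round(total, 2),
        "semi_structured_events": event_types,
        "unstructured_order_mentions": order_mentions,
    }
```

The structured sample can be aggregated directly, the semi-structured sample has to be navigated, and the unstructured sample needs a pattern (here a simple regular expression) before anything can be counted.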

The sources of big amounts of data are multiple and varying depending on the industry or the business sector: data may come from sensors, devices, video/audio, networks, log files, transactional applications, web, and social media, with much of it generated in real time and on a very large scale.

The heterogeneity of data and the inclusion of unstructured information in the data set require specialized data modeling techniques, tools, and systems to extract insights and information. With the help of advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics, and natural language processing, analyzing large amounts of data allows businesses to make decisions based on data that was previously inaccessible or unusable. In this sense, the term Big Data refers to the whole range of processes that information goes through, encompassing data gathering, data analysis, and data implementation.

The scientific approach that applies mathematical and statistical ideas and computer tools to processing big data is called Data Science. It is a specialized field that combines intersecting areas such as statistics, mathematics, intelligent data capture techniques, data cleansing, mining and programming to prepare and align big data for intelligent analysis that extracts insights and information.

Hence, the field of Data Science has evolved from Big Data; in other words, Big Data and Data Science are inseparable.

The big Vs of Big Data

The concept of Big Data as it is known today took shape in the early 2000s, when industry analyst Doug Laney articulated the now-mainstream definition of Big Data as the three Vs:

Volume: with data coming from sensors, business transactions, social media and machines, the problem of obtaining the amount of data required for analytics is considered solved.

Velocity, or the pace and regularity at which data flows in. It is critical that the flow of data is massive and continuous, and that the data can be obtained in real time or with a delay of milliseconds to seconds.

Variety: for the data to be representative, it should come from various sources and in many types and formats.

The initial concept has evolved to capture other factors that impact the effectiveness of manipulations with data, such as:

Variability: in addition to the increasing velocities and varieties of data, data flows can be highly inconsistent with periodic daily, seasonal or event-triggered peaks that need to be taken into account in analytics.

Veracity: in most general terms, data veracity is the degree of accuracy or truthfulness of a data set in terms of the source, the type, and processing techniques.

As technology evolves, more aspects of data come into the foreground giving rise to new big Vs.

Challenges posed by Big Data to businesses

Even when the amount of data collected is sufficient for analytics, volume alone cannot guarantee that the analytical findings will be useful for the company. The problems that companies face in their quest for effective analytics fall into three groups: an overabundance of heterogeneous data, a lack of tools, and a talent shortage.

On the data processing and machine learning side, analyzing extremely large data sets (40%), ensuring adequate staffing and resources (38%) and integrating new data into existing pipelines (38%) were called the primary obstacles to implementing projects.

Oversized pool of data

The research firm Gartner forecasts that 14.2 billion connected things will be in use in 2019, resulting in a never-ending stream of information from which it can be a challenge to draw meaningful insights.

Lack of adequate tools

To successfully compete in today’s marketplace, small businesses need the tools larger companies use. In its 2018 Big Data Trends and Challenges report, Qubole, the data activation company, stated that 75 percent of respondents reported a sizeable gap between the potential value of the data available to them and the tools and talent dedicated to delivering it.

Changes in labor market

The spread of new technologies will shift the core skills required to perform a job. The Future of Jobs Report estimates that by 2022, no less than 54% of employees will require re- and upskilling. According to Qubole, 83 percent of companies say it is difficult to find data professionals with the right skills and experience.

For businesses, these challenges mean choosing among several options: retraining existing personnel, hiring new talent with the required skills, investing in developing their own tools for data collection and processing, purchasing third-party analytical products, or finding subcontractors to carry out Big Data Analytics.

Applications of Big Data

Big Data affects organizations of any size across practically every industry, from governments and banking institutions to retailers.


Manufacturing

Armed with the power of Big Data, industries can turn to predictive manufacturing, which can improve quality and output and minimize waste and downtime. Data Science and Big Data Analytics can track process and product defects, plan supply chains, forecast output, and reduce energy consumption, as well as support mass customization of manufacturing.


Retail

The retail industry largely depends on building customer relationships. Retailers need to know the best way to reach their customers, the most effective way to handle transactions, and the most strategic way to bring back lapsed business, and Big Data provides the best solution for this. Having originated in the financial sector, the use of large amounts of data for customer profiling, expenditure prediction and risk management has become an essential Data Science task in the retail industry.


Digital Marketing

The digital marketing spectrum is probably the biggest application of Data Science and machine learning. From display banners on websites to digital billboards at airports, almost all digital advertising is driven by Data Science algorithms. Because it is based on the user’s past behavior, digital advertising achieves a higher click-through rate (CTR) than traditional advertising and targets the audience in a more timely and demand-based manner. Another facet of digital marketing is recommender systems: suggestions of similar products that businesses use to promote their products and services in line with the user’s interests and the relevance of the information.
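The two ideas in this paragraph, click-through rate and product suggestions, can be sketched in a few lines. The example below is a deliberately simple illustration and not a production recommender: it ranks products by how often they appear in the same purchase as a given product. The function names and the sample baskets are hypothetical.

```python
from collections import Counter


def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions; returns 0.0 when there are no impressions."""
    return clicks / impressions if impressions else 0.0


def recommend(baskets, product, top_n=2):
    """Suggest the products most often bought together with `product`."""
    co_counts = Counter()
    for basket in baskets:
        items = set(basket)
        if product in items:
            # Count every other item that shares a basket with `product`.
            co_counts.update(items - {product})
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(co_counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical purchase history, invented for illustration.
baskets = [
    ["laptop", "mouse"],
    ["laptop", "mouse", "bag"],
    ["laptop", "bag"],
    ["phone", "case"],
    ["laptop", "mouse"],
]
suggestions = recommend(baskets, "laptop")
```

Real recommender systems usually weight such counts by popularity or learn user and item representations, but the co-occurrence count already captures the "customers who bought this also bought" intuition.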


Logistics

Although it is still a new application area for Data Science, logistics benefits from its insights to improve operational efficiency. Data Science is used to determine the best shipping routes, the most suitable delivery times, and the most cost-efficient modes of transport. Furthermore, the data that logistics companies generate using the GPS devices installed on their vehicles in turn creates new possibilities to explore with Data Science.
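One common way to frame the "best route" question is as a shortest-path problem on a weighted graph, where nodes are locations and edge weights are costs such as distance, time, or fuel. The sketch below uses Dijkstra's algorithm, which assumes non-negative costs. The location names and costs are hypothetical, and real routing also has to handle constraints such as delivery windows and vehicle capacity, which are not modeled here.

```python
import heapq


def cheapest_route(graph, start, goal):
    """Return (total_cost, path) for the cheapest route, or (inf, []) if unreachable."""
    queue = [(0, start, [start])]  # (cost so far, current node, path taken)
    best = {}                      # cheapest known cost per settled node
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already reached this node at least as cheaply
        best[node] = cost
        for neighbor, edge_cost in graph.get(node, {}).items():
            heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical delivery network: costs could be kilometres or minutes.
network = {
    "depot": {"A": 4, "B": 1},
    "B": {"A": 2, "customer": 7},
    "A": {"customer": 3},
}
route_cost, route = cheapest_route(network, "depot", "customer")
```

In this small network the direct-looking route via A costs 7, while the detour through B and then A costs 6, which is the kind of non-obvious saving route analysis is meant to surface.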

Media & Entertainment

Current consumers’ search patterns and the demand for access to content anywhere, at any time, on any device are leading to new business models in media and entertainment. Big Data provides actionable information about millions of individuals, which helps in predicting what the audience wants, optimizing scheduling, increasing acquisition and retention, monetizing content, and developing new products.


Education

In education, data-driven insight can impact school systems, students and curricula by identifying at-risk students and by implementing a better system for evaluating and supporting teachers and principals.

Health Care

Big Data Analytics is recognized as a critical factor in improving healthcare through personalized medicine and prescriptive analytics. Researchers mine data to see which treatments are effective for particular conditions, identify patterns related to drug side effects, plan diagnostic strategies, and plan the stocking of serums and vaccines.

How it works

Step 1. Discover the data sources

The first step for processing data is discovering the sources that might be useful for your business. The sources for Big Data generally fall into one of three categories:

Streaming data — the data that reaches your IT systems from a web of connected devices, often part of the IoT.

Social media data — the data on social interactions that might be used for marketing, sales and support functions.

Publicly available sources — massive amounts of data are available through open data sources like the US government’s data.gov, the CIA World Factbook or the European Union Open Data Portal.

Step 2. Harness data

Harnessing information is the next step that requires choosing strategies for storing and managing the data.

Data storage and management: at present, there are low-cost options for storing data in clouds that can be used by small businesses.

Amount of data to analyze: while some organizations don’t exclude any data from their analyses, relying on grid computing or in-memory analytics, others try to determine upfront which data is relevant to spare machine resources.

Potential of insights: generally, the more knowledge you have, the more confident you are in making business decisions. However, to avoid being overwhelmed, it is critical to select only the insights relevant to the specific business or market.

Step 3. Choose the technology

The final step in making Big Data work for your business is to research the technologies that help you make the most of Big Data Analytics. Nowadays there is a variety of ready-made solutions for small businesses, such as SAS, ClearStory Data, or Kissmetrics, to name a few. Another option for tackling your specific needs is to develop your own solution or to subcontract its development. When choosing, it is useful to consider:

  • Cheap, abundant storage;
  • Fast processors;
  • Affordable open source, distributed big data platforms, such as Hadoop;
  • Parallel processing, clustering, MPP, virtualization, large grid environments, high connectivity, high throughputs and other techniques to optimize analytics;
  • Cloud computing and other flexible resource allocation solutions.
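The parallel-processing idea in the list above can be illustrated at a very small scale with the map-reduce pattern: split the data into chunks, process each chunk independently, then merge the partial results. The sketch below uses a thread pool from Python's standard `concurrent.futures` module as a stand-in for the nodes of a cluster; it does not use Hadoop or any distributed platform, and the function names, chunk size, and sample log lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, workers=4, chunk_size=2):
    """Split `lines` into chunks, count each chunk concurrently, merge results."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Reduce step: merge the partial counts as they come back.
        for partial in pool.map(count_words, chunks):
            total.update(partial)
    return total


# Hypothetical log lines, invented for illustration.
log_lines = ["error disk full", "ok", "error timeout", "ok ok"]
word_totals = parallel_word_count(log_lines)
```

The point here is the structure rather than the speed: because each chunk is processed without reference to the others, the same split-process-merge shape carries over when the workers are separate machines instead of threads.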


In the past, Big Data was used primarily by big businesses, since they were the only ones who could afford the technology and channels used to collect and analyze the information. Today, however, even smaller-scale businesses can take advantage of Big Data and Data Science by choosing the relevant information for their specific needs and by selecting tools or teams that can be accessed remotely and on demand.

Critically, the importance of Big Data doesn’t revolve around the amount of collected data, but around the specific insights it may bring to a specific business or an industry. The combination of relevant Big Data with high-powered and targeted analytics can serve the following tasks:

  • Determining causes of failures and defects in near-real time;
  • Generating advertisements or promotion campaigns based on the customer’s buying habits;
  • Recalculating risk portfolios in minutes;
  • Predicting stocks and sales;
  • Detecting and preventing fraudulent behavior, and much more.
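As one hedged illustration of the last task, unusual transactions can be flagged with a robust outlier score. The sketch below uses the modified z-score based on the median and the median absolute deviation (MAD), with the conventional constant 0.6745 and a cutoff of 3.5. It is a simple statistical screen and not a complete fraud-detection system; the function name and the sample amounts are hypothetical.

```python
from statistics import median


def flag_unusual_amounts(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds `threshold`."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread to compare against, so nothing can be scored
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > threshold]


# Hypothetical card payments by one customer, invented for illustration.
payments = [20, 22, 19, 21, 23, 20, 950]
suspicious = flag_unusual_amounts(payments)
```

Using the median rather than the mean matters here: a single very large payment would inflate a mean-based threshold and could hide itself, whereas the median-based score leaves it clearly visible.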


  1. https://www.forbes.com/sites/joshbersin/2016/12/11/how-everything-is-becoming-digital-and-why-businesses-must-adapt-now/
  2. https://www.ibm.com/analytics/nl/nl/?cm_mmc=OSocial_Blog-_-IBM+Analytics_Data+Science-_-IBN_NL-_-NL+BLOG+ANALYTICS&cm_mmca1=000017WL&cm_mmca2=10003914&
  3. https://insidebigdata.com/white-paper/report-depth-look-big-data-trends-challenges
  4. https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
  5. https://www.gartner.com/en/newsroom/press-releases/2018-11-07-gartner-identifies-top-10-strategic-iot-technologies-and-trends
  6. https://insidebigdata.com/white-paper/report-depth-look-big-data-trends-challenges
  7. http://www3.weforum.org/docs/WEF_Future_of_Jobs_2018.pdf