Data Lakes For Dummies. Alan R. Simon.

Author: Alan R. Simon
Publisher: John Wiley & Sons Limited
Genre: Databases
ISBN: 9781119786184
are employees happy working here?

      Prescriptive analytics is a relative newcomer to the overall analytics continuum. “Wait a minute!” you may be thinking. “I’ve been making decisions and taking actions for a long time!” The “secret sauce” of prescriptive analytics, however, is making those decisions and taking those actions with a healthy assist from your organization’s data being fed into increasingly sophisticated analytics. And yes, you guessed it: Your data lake will play a starring role in driving prescriptive analytics. So, your data lake will help you with scenarios such as the following:

       Based on market forecasts and the overall economy, you need to cut approximately 10 percent of your headcount. What are your options? How do you get the work done? Can you shift some of the work to lower-cost contractors? Should you try a voluntary early retirement program to reduce the number of involuntary terminations? Name four or five scenarios with all the data and all the trimmings!

       Then, out of those four or five scenarios, which one is “best” and why? Are there any downside surprise risks you should be aware of?

Question | Type of Analytics
What happened? | Descriptive analytics
Why did it happen? | Diagnostic analytics
What’s happening right now? | Descriptive analytics
What’s likely to happen? | Predictive analytics
What’s something interesting and important out of this mountain of data? | Discovery analytics
What are our options? What should we do? | Prescriptive analytics

      Mapping your analytics needs to your data lake road map

      Jan, your CPO, is thrilled with the work that Raul and his team have done compiling the HR analytics continuum. They’ve produced an exhaustive list of more than 500 analytical functions that will be supported by the data lake, covering the broad continuum from simple “What happened?” descriptive analytics through more than a dozen complex prescriptive analytics scenarios.

      Now what?

      As you might guess, that 500-plus master list of HR analytics isn’t going to be available the first day your data lake goes operational. A data lake is built in a phased, incremental manner, probably over several years.

      But where to start?

      In Chapter 17, I show you how to build a road map that takes you from your first ideas about your data lake all the way through multiple phases of implementation.

      

Your data lake road map should be driven by your organization’s analytical needs rather than by available data. You should address your highest-impact, highest-value analytics needs first, for two reasons:

       You need the initial operating capability (IOC) of your data lake to come with some “oomph.” In other words, you want people across your organization to sit up and take notice that the data lake is, from its first days, providing some really great analytics.

       You want to build your data lake using a “pipeline” approach that not only loads your data lake with lots of data but carries that data all the way through to critical business insights.

      Building the best data pipelines inside your data lake


      FIGURE 2-7: A data pipeline into, through, and then out of the data lake.

You can think of a data pipeline the same way you might think about shopping. Suppliers sell and ship their products to distributors, who then resell and ship some of those products to a wholesaler. The wholesaler then resells and ships the products yet again to a retailer, which is where you go to buy whatever it is that you’re looking for. Figure 2-8 shows how this paradigm can apply to data pipelines within a data lake.

      FIGURE 2-8: An easy way to understand data pipelines and data lakes.
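The shopping analogy maps fairly directly onto code: each stage takes whatever the previous stage hands it, adds some value, and passes the result downstream until it reaches the person who actually uses it. The Python sketch below is a minimal illustration under stated assumptions, not the book's design. The three zone names (raw, cleansed, curated), the function names, the `hr_app` source tag, and the HR record fields are all hypothetical, chosen only to echo this chapter's HR example.

```python
from typing import Iterable

# Hypothetical record type: one HR row, e.g. {"employee_id": 1, "dept": "Sales"}.
Record = dict


def ingest_raw(source_rows: Iterable[Record]) -> list[Record]:
    """Raw zone: land source rows unchanged, tagging each with its origin."""
    # "hr_app" is a hypothetical source-system name used only for illustration.
    return [dict(row, _source="hr_app") for row in source_rows]


def cleanse(raw_rows: list[Record]) -> list[Record]:
    """Cleansed zone: drop incomplete rows and normalize department names."""
    cleaned = []
    for row in raw_rows:
        if row.get("employee_id") is None or row.get("dept") is None:
            continue  # incomplete rows are not passed downstream
        cleaned.append({**row, "dept": row["dept"].strip().title()})
    return cleaned


def curate(clean_rows: list[Record]) -> dict[str, int]:
    """Curated zone: shape the data for one question (headcount per department)."""
    counts: dict[str, int] = {}
    for row in clean_rows:
        counts[row["dept"]] = counts.get(row["dept"], 0) + 1
    return counts


def run_pipeline(source_rows: Iterable[Record]) -> dict[str, int]:
    """Move the data through each stage in order, like goods moving downstream."""
    data = source_rows
    for stage in (ingest_raw, cleanse, curate):
        data = stage(data)
    return data


if __name__ == "__main__":
    source = [
        {"employee_id": 1, "dept": " sales "},
        {"employee_id": 2, "dept": "Sales"},
        {"employee_id": 3, "dept": "engineering"},
        {"employee_id": None, "dept": "sales"},  # incomplete: filtered out
    ]
    print(run_pipeline(source))  # {'Sales': 2, 'Engineering': 1}
```

Keeping each stage a separate function mirrors the hand-offs in the analogy: you can change how data is cleansed without touching how it is landed or how it is served to analysts.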

      Addressing future gaps and shortfalls

      Your road map is only the beginning of your data lake journey. You may think you have a pretty good idea of what your data and analytical needs will be over the next couple of years, and you may do a good job of prioritizing the various phases in which your data lake will be built.

      

The world is constantly changing, though, which means that the farther out your data lake road map stretches, the more likely it is that any given phase will be preempted by changing priorities and new analytical needs.

      As your organization’s analytical needs evolve and — hopefully — become more sophisticated over time, you’ll continually adjust your data lake plans to reflect the real world.

      

Think of a data lake as a living entity that is subject to constant change. Remember that century-long life span of a U.S. Air Force B-52, with changing missions over the years being addressed by constantly incorporating new technology to extend the plane’s value.

      You can stream all kinds of data into your data lake as quickly as that data is created in your source applications.