Discussion about this post

Momota

It seems that one of us (or maybe both) has misunderstood something fundamental here, and I am writing this to help you develop your book, so please don't take it as criticism. I might be wrong, in which case I will learn something from you, which is a win-win for both of us. Going back to the core of data warehousing: when I see that something needs to be "subject-oriented, nonvolatile, integrated, time-variant" (the Inmon criteria), I don't interpret these the way you have. For you, it seems that if the source data is in a data lake, then we have ticked off two of the criteria (nonvolatile and time-variant). The way I see it (and hopefully Inmon would agree), the four criteria work together: you must have a subject-oriented, integrated layer with historical data that does not get updated (only new data arrives). Once you have that, you have a DW that can be used for audit and analytics. Current data and new data can be compared, and the data can be fed back to the source systems (to get better data in the future) as well as to users. So storing files in a data lake does make them nonvolatile and time-variant, but not from a DW perspective, since the other two parts are missing. I see the criteria like the pieces in this picture: https://commons.wikimedia.org/wiki/File:Jigsaw.svg. Each piece represents one of the Inmon criteria, and without any one of them you don't have the whole picture (the full puzzle). So I don't agree that just because you have the data in the DL, you have ticked off the two criteria.

"We all like a good analogy, and it occurred to me that the HOOK approach to data warehousing mirrors how a library works. A library is just a big room that contains a whole bunch of shelves on which there are a whole bunch of books. The books happen to be organised or indexed so that it should be easy to locate books about a particular subject. Its organising structure is what makes a library work; otherwise, it is just a room full of books and finding what you want is an almost impossible task."

For me, the DL is a library that is not organized. It still ticks off two of the criteria, but it does not satisfy the four as a whole.
