Big Data Basics - Part 2 - Overview of Big Data Architecture


By: Dattatrey Sindol   |   Updated: 2014-01-09

Problem

I read the tip on Introduction to Big Data and would like to know more about how Big Data architecture looks in an enterprise, what are the scenarios in which Big Data technologies are useful, and any other relevant information.

Solution

In this tip, let us take a look at the architecture of a modern data processing and management system involving a Big Data ecosystem, a few use cases of Big Data, and also some of the common reasons for the increasing adoption of Big Data technologies.

Architecture

Before we look into the architecture of Big Data, let us take a look at a high level architecture of a traditional data processing and management system. It looks as shown below.

Traditional Data Processing and Management Architecture

As we can see in the above architecture, mostly structured data is involved, and it is used for Reporting and Analytics purposes. Although one or more unstructured sources are often involved, they usually contribute a very small portion of the overall data and hence are not represented in the above diagram for simplicity. In a Big Data architecture, however, there are various sources involved, each of which comes in at different intervals, in different formats, and in different volumes. Below is a high level architecture of an enterprise data management system with a Big Data engine.

Big Data Processing and Management Architecture

Let us take a look at various components of this modern architecture.

Source Systems

As discussed in the previous tip, there are various sources of Big Data, including Enterprise Data, Social Media Data, Activity Generated Data, Public Data, Data Archives, Archived Files, and other Structured or Unstructured sources.

Transactional Systems

In an enterprise, there are usually one or more Transactional/OLTP systems which act as the backend databases for the enterprise's mission critical applications. These constitute the transactional systems represented above.

Data Archive

A Data Archive is a collection of data which includes the data archived from the transactional systems in compliance with an organization's data retention and data governance policies, as well as aggregated data from a Big Data engine that is less likely to be needed in the near future.

ODS

Operational Data Store is a consolidated set of data from various transactional systems. This acts as a staging data hub and can be used by a Big Data Engine as well as for feeding the data into Data Warehouse, Business Intelligence, and Analytical systems.
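
As a rough illustration of the consolidation described above, the following Python sketch merges rows from two transactional extracts into one staging list. The source names (`orders`, `payments`), the field names, and the `OdsRecord` type are all hypothetical and chosen only for this example; a real ODS would typically be a database fed by ETL jobs rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OdsRecord:
    """One consolidated row in the (hypothetical) staging hub."""
    source_system: str  # which transactional system the row came from
    customer_id: str
    amount: float


def consolidate(orders, payments):
    """Map rows from two differently shaped OLTP extracts onto one schema."""
    ods = []
    for row in orders:
        ods.append(OdsRecord("orders", row["customer"], float(row["total"])))
    for row in payments:
        ods.append(OdsRecord("payments", row["cust_id"], float(row["paid"])))
    return ods


# Sample extracts (made-up data) from two transactional systems.
orders = [{"customer": "C1", "total": "120.50"}]
payments = [{"cust_id": "C1", "paid": "120.50"}, {"cust_id": "C2", "paid": "15"}]

ods = consolidate(orders, payments)
for record in ods:
    print(record)
```

The point of the sketch is only that each source keeps its own shape while the staging hub exposes a single, uniform one to downstream consumers such as the Data Warehouse or the Big Data engine.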

Big Data Engine

This is the heart of the modern (Next-Generation / Big Data) data processing and management system architecture. This engine is capable of processing large volumes of data, ranging from a few Megabytes to hundreds of Terabytes or even Petabytes, of different varieties, structured or unstructured, coming in at different speeds and/or intervals. This engine consists primarily of a Hadoop framework, which allows distributed processing of large heterogeneous data sets across clusters of computers. This framework consists of two main components, namely HDFS and MapReduce. We will take a closer look at this framework and its components in the next and subsequent tips.
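
To give a feel for the MapReduce programming model mentioned above, here is a minimal single-machine sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`) are assumptions made for this illustration, and a real job would run the map and reduce steps in parallel across a cluster with the input stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data engine", "Big Data use cases"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread over many machines, which is what makes the model suitable for very large data sets.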

Big Data Use Cases

Big Data technologies can solve business problems in a wide range of industries. Below are a few use cases.

  • Banking and Financial Services
    • Fraud detection: identifying possibly fraudulent or suspicious transactions in Accounts, Credit Cards, Debit Cards, Insurance, etc.
  • Retail
    • Targeting customers with different discounts, coupons, and promotions etc. based on demographic data like gender, age group, location, occupation, dietary habits, buying patterns, and other information which can be useful to differentiate/categorize the customers.
  • Marketing
    • Outbound marketing in particular can make use of customer demographic information like gender, age group, location, occupation, and dietary habits, as well as customer interests/preferences, usually expressed in the form of comments/feedback and on social media networks.
    • Customers' communication preferences can be identified from various sources like polls, reviews, comments/feedback, and social media, and can be used to target customers via different channels like SMS, Email, Online Stores, Mobile Applications, and Retail Stores.
  • Sentiment Analysis
    • Organizations use the data from social media sites like Facebook, Twitter etc. to understand what customers are saying about the company, its products, and services. This type of analysis is also performed to understand which companies, brands, services, or technologies people are talking about.
  • Customer Service
    • IT Services and BPO companies analyze call records/logs to gain insights into customer complaints and feedback and into call center executives' responses and ability to resolve tickets, and to improve the overall quality of service.
    • Call records/logs from the telecommunications industry can be analyzed to optimize pricing and calling, messaging, and data plans.

Apart from these, Big Data technologies/solutions can solve business problems in other industries like Healthcare, Automobile, Aeronautical, Gaming, and Manufacturing.
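
The sentiment analysis use case above can be sketched very roughly in Python as two steps: keep only the posts that mention the company, then score each one. Everything here is an assumption for illustration: the brand name "Contoso", the sample posts, and the tiny keyword lists are made up, and counting keywords is a deliberately naive stand-in for a real sentiment model.

```python
# Tiny, made-up keyword lists; a real system would use a trained model.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"slow", "broken", "hate"}


def mentions(posts, brand):
    """Keep only the posts that mention the brand (case-insensitive)."""
    brand = brand.lower()
    return [post for post in posts if brand in post.lower()]


def score(post):
    """Return (#positive keywords - #negative keywords) found in the post."""
    words = {word.strip(".,!?").lower() for word in post.split()}
    return len(words & POSITIVE) - len(words & NEGATIVE)


posts = [
    "I love the new Contoso phone, so fast!",
    "Contoso support is slow.",
    "Nice weather today.",
]

relevant = mentions(posts, "Contoso")
scores = [score(post) for post in relevant]
for post, value in zip(relevant, scores):
    print(value, post)
```

Note that the filtering step discards most of the input, which matches the observation that an organization is usually interested in only a small subset of the data available on social media sites.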

Big Data Adoption

Data has always been there and is growing at a rapid pace. One question asked quite often is "Why are organizations taking an interest in silos of data that were not utilized effectively in the past, and embracing Big Data technologies today?". The adoption of Big Data technologies is driven by various factors, including the following:

  • Cost Factors
    • Availability of Commodity Hardware
    • Availability of Open Source Operating Systems
    • Availability of Cheaper Storage
    • Availability of Open Source Tools/Software
  • Business Factors
    • There is a lot of data being generated outside the enterprise, and organizations are compelled to consume that data to stay ahead of the competition. Often organizations are interested in only a subset of this large volume of data.
    • The volume of structured and unstructured data being generated in the enterprise is very large and cannot be effectively handled using the traditional data management and processing tools.
Next Steps
  • Explore more Big Data use cases
  • Stay tuned for the next tips in this series to learn more about the Big Data ecosystem



About the author
Dattatrey Sindol (Datta) has 8+ years of experience working with SQL Server BI, Power BI, Microsoft Azure, Azure HDInsight and more.



Friday, February 14, 2014 - 1:17:56 AM - Dattatrey Sindol (Datta)

Hi Jay,

Please stay tuned to this series on Big Data. A few pointers on how to get started will be covered in future posts.

Best Regards,

Dattatrey Sindol (Datta)


Tuesday, February 11, 2014 - 3:13:03 PM - Jay

So I am interested in learning Big Data concepts as a DBA. Let me know the starting point to learn.


Tuesday, January 14, 2014 - 9:21:54 AM - Dattatrey Sindol (Datta)

Hi JustCurious,

If your current set of tools and technologies is able to process your data and your current BI infrastructure is able to cater to your informational/reporting needs to enable you to make informed decisions, then you don't need to concern yourself with Hadoop.

However, even if your current system meets your needs, you might still want to start thinking about Hadoop a few months or years down the line to stay competitive. For instance, say you are an e-Commerce business and you are currently capturing and storing the clickstream data, but are not doing anything with it. You might want to start mining that data to understand the users' browsing patterns on the website, such as what they are searching for, what they are filtering on, what they are sorting by, and so on. These insights can give you an added advantage.

Hope that answers your questions.

Best Regards,

Dattatrey Sindol (Datta)

http://dattatreysindol.com


Tuesday, January 14, 2014 - 9:10:12 AM - Scott

What about PDW? That is the technology for big data. Several of the Fortune 50 companies in the world have already implemented PDW.


Tuesday, January 14, 2014 - 8:59:31 AM - Dattatrey Sindol (Datta)

Hi Henry,

You need Hadoop to deal with the type of data described by the characteristics discussed in Part 1 of this series. If your data is pretty much structured or if your data volumes can be reasonably/satisfactorily handled using your current set of tools/technologies, then you don't need Hadoop.

As listed above for some of the use cases, Big Data is more than the traditional ETL processes. It is used for doing more complex operations on very large and complex data sets to gain meaningful insights, which is not possible using the traditional tools/technologies that we have been using before Hadoop came into existence.

Please stay tuned to this series and I am sure, in the subsequent tips, your questions will get answered.

Best Regards,

Dattatrey Sindol (Datta)

http://dattatreysindol.com


Monday, January 13, 2014 - 12:20:29 PM - JustCurious

So, I guess, unless I am reading it wrong, Big Data is for a business that is going to create petabytes of data (e.g. social networking), but I am not necessarily going to concern myself with it if I am a "typical" business with mostly structured data? Even if I am scanning documents, tools exist within the current modern relational database to effectively handle that.

If I am wanting to run queries against FaceBook, then the tools are there for me to do it, but typically any data I bring back is a very, very small fraction of the source.


Friday, January 10, 2014 - 8:25:58 PM - Henry Resheto

Hi Dattatrey, I am afraid you confused me even more with your answer. Initially I thought I was asking yes or no question: if I need to build "Big Data engine" is Hadoop all I need or I need something else as well? That is still the question, but I would also like to extend it a bit; you say "Hadoop is a framework which is used to deal with the Big Data sets", what does it mean "to deal"? Like to do data Extraction/Transformation/Load on Big Data sets? [Why not call it ETL then?]  To store Big Data sets? [Why not call it database then?] To generate reports from Big Data sets? [Why then we need another box on your diagram called WD/BIAnalytics/Data Mining?] Or in other words why are we using fancy word "ecosystem" what is really behind it?


Friday, January 10, 2014 - 12:26:36 PM - Dattatrey Sindol (Datta)

Hi Henry,

Big Data engine is that part of the Modern Architecture/Ecosystem where the processing and management of Big Data sets takes place. Hadoop is a framework which is used to deal with the Big Data sets.

Hope that answers your question.

Best Regards,

Dattatrey Sindol (Datta)

http://dattatreysindol.com


Friday, January 10, 2014 - 9:58:03 AM - Dattatrey Sindol (Datta)

Hi JustCurious,

Answer to your first question: Pretty much all the social media websites expose their data through APIs (Some restrictions do apply). For instance, Facebook has Graph API, Twitter has REST API, and so on.

Answer to your second question: Social Media websites like Facebook, Twitter etc. have huge volumes of data and you as a business would be interested in only a very small portion of that data. For instance, if you want to understand the sentiment of people about your company or product, you would be interested in only those comments, tweets etc. which mention your company or product and not everything that is posted on these sites.

Hope that answers your questions.

Best Regards,

Dattatrey Sindol (Datta)

http://dattatreysindol.com


Thursday, January 09, 2014 - 8:17:48 PM - Henry Resheto

Still not clear: is Hadoop part of Big Data Engine or Hadoop is Big Data Engine?


Thursday, January 09, 2014 - 11:27:39 AM - JustCurious

So, if I am a business and I want to know what people are saying about me on Facebook, is the Facebook database available to me to search it for my business name to see what comments are being posted? Same with Twitter.

You stated in part 1 that part of the data that makes up Big Data is Social Media, but I guess I am having some trouble understanding how Facebook, LinkedIn, Twitter, YouTube, etc. make up part of "my" data structure and play into my Big Data infrastructure. Facebook may have 30 PB of data, but that doesn't mean I've got that much.


