By: Siddharth Mehta
Overview
In a typical Business Intelligence (BI) project with the Software Development Life Cycle (SDLC) methodology, a project goes through different phases: Plan, Analyze, Design, Build, Test, Deploy and Support. Generally during the plan phase the project requirements are detailed out and estimates are prepared before the analysis phase ends. In this section we are going to discuss how to identify technical requirements and estimate the level of effort.
Explanation
If one reads thoughtfully between the lines of a business case, the major areas of the solution can be identified there. Considering the given business scenario, let us dissect the scenario and identify the major areas:
Data modeling - Each business unit from every geographical location would access the online system and manage sales and order related data that would be stored in a central data repository.
Data warehousing - This centralized data repository should act as the one-stop-shop for all the data related needs of the company, as the company grows and adds more sources of data in the future.
Data Mart - The CEO of the company is the key user of reporting, and is interested in deriving sales analytics from the data.
Reporting - A team of data analysts would be accessing the data to create reports on-demand, as required by the CXOs of the company. They need to be able to slice and dice the data for detailed analysis.
ETL - Though any business requirement directly does not translate to the need of an ETL solution, a mechanism is required to move data across these layers. Hence this warrants the need for ETL batch jobs that can move data between these layers.
We would look at few requirement FAQs that apply for the areas, and then we would discuss how to estimate the level of effort.
Requirements Analysis FAQs
Question: What is a technical architecture document?
Answer: A technical architecture document defines the technical blue-print of the solution.
It covers details like the technology, infrastructure, integration, security, operations,
etc.
Question: How is the technology stack selected for the solution?
Answer: A detailed comparison of various technologies is conducted based
on parameters like License Cost, Capital Expenditure, Operational Expenditure, Skills
Availability, Product Roadmap, Community Adoption, Time to Market, Feature Comparison,
Integration support for external products, etc. After a detailed analysis is
conducted a technology
stack that best meets the needs is selected.
Question: How do we know what the business entities are involved and how
they are organized as well as associated with each other?
Answer: A requirements document that involves a conceptual model, data dictionary
and appropriate use-case diagrams would provide the required information.
Question: What is OLTP and what is developed as a part of an OLTP system?
Answer: Online Transaction Processing (OLTP) is a system to maintain transactional
data, also known as system of records. Various database objects starting from installing
a database server to deploying database objects like Tables, Stored Procedures,
Views, Functions, Indexes etc.
Question: Is Data warehousing similar to OLTP?
Answer: The same set of database objects may be developed as part of relational
data warehouse, but the way a data warehouse is modeled is different from OLTP data
modeling.
Question: Are ETL packages / jobs used for import-export of data from one
database to another?
Answer: ETL is a data extraction, transformation and loading technique. The functionality
of the ETL packages may vary depending upon the environment. For example, the ETL between
an OLTP system and Staging area may just be a differential data extract with some
transformation. But the ETL between a Staging / Landing area and a data warehouse
can be significantly different as it would have the transform the data as per the
needs of the dimensional model of the warehouse.
Question: Are Data marts and Data warehouse the same in terms of design,
with the only difference that they are different databases?
Answer: Data marts theoretically are supposed to be a sub-section of the
data warehouse. The design of the data mart would be specific to the reporting
and analytics requirements of the users.
Question: What is a cube?
Answer: A cube is a multi-dimensional data structure that stores data in
an optimal way to support the analytical needs on a voluminous data. We will
learn more about data marts in an upcoming chapter.
Question: What do we develop as a part of Analytics?
Answer: Reports and Dashboards are one of the means of delivering meaningful
analysis of data. Analytics is a combined process of analyzing data and reporting
on the data.
Time Estimation Recommendations
Before starting with the development phase, project deliverables are estimated for effort and schedule then are planned accordingly for delivery. As a developer, one is often faced with the situation of providing effort estimates for his/her set of tasks. Generally projects employ a standard mechanism of estimating the effort and the same is communicated to the team members as the expectation from their side. But in case, if you are faced with the task of providing estimates, below are some of the factors that you can keep in mind when estimating the effort for the task assigned to you.
- Provide estimates for the effort if you are at least trained in the technology in which you would be delivering the task. If you consider yourself too novice on the technology, consider asking for help from a SME (Subject Matter Expert) who is well-versed with the technology. You may prepare the estimates, though consider reviewing your estimation logic with a SME.
- Consider developing an algorithmic logic for developing the task. Based on the number of steps / objects that would be used for the logic, create thresholds and create categories like Simple, Medium and Complex. For example. if a stored procedure joins less than 5 tables with less than 3 loops consider it Simple. If it joins more than 5 but less than 10 tables and 4 - 6 loops consider it Medium and anything more than that is Complex. This is just an example and should not be considered a general rule of thumb for estimating stored procedure development. Once you have defined categories, you can assign a generalized effort for each category of tasks. This is also known as a Work Breakdown Structure.
- Tasks and Schedule may not work hand in hand. For example, if a set of tasks takes 40 hours and considering 8 hours a day, the task would be completed in 5 days. This is not true in all cases as there are other factors to be considered as well. One should assess the dependency of one task on other, which would reduce working on tasks in parallel. For example, if you do not have the tables / schema of the tables available, you may not be able to develop the stored procedures. So be cognizant of inter-dependencies while communicating the schedule of deliverables for your tasks.
- Generally developers tend to estimate the development time by only considering the build effort. A task has a plan -> analyze -> design -> build -> unit test -> defect fix -> retest -> deploy phases. So considering estimating the effort for each of these phases. At times your tasks may be a vital integration point and may be functioning as an input to other tasks. So you may have to also account for regression testing efforts.
Additional Information
- Estimation is an interesting subject. Here is an interesting article on the topic.