Case Study – Database Development
CIS 515 – Strategic Planning for Database Systems
Quality of Data Sets in Software Development Life Cycle
The software development life cycle (SDLC) provides a well-defined framework that defines the tasks to be performed at each step of software development. The framework is essentially a detailed plan that tells the software team what needs to be developed, how it should be developed and maintained, and how any existing software or service should be replaced. By defining a methodology, the SDLC can improve both the quality of the software and the overall development process.
Because databases are an extremely important part of software, it is crucial that the quality of datasets is kept high. High-quality data helps an organization use its resources to best effect and, consequently, earn better profits. The two basic criteria for judging the quality of a dataset are its accuracy and its consistency. Inconsistencies and errors can arise in software projects when good integrity constraints are missing or fail. It is therefore important that the SDLC is refined and fine-tuned to achieve a substantial improvement in data quality. Such planning may introduce changes to budgets, schedules or other attributes of the project that may not appeal to some stakeholders. However, these changes yield significant and long-lasting benefits: better database processes, fewer data quality assessments and, of course, better retrieval of information from the databases.
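The role of integrity constraints can be illustrated with a minimal sketch. It assumes SQLite (through Python's standard `sqlite3` module) purely for convenience; the `customer` and `purchase` tables, their columns and the `try_insert` helper are hypothetical names chosen for this example and do not come from the case study.

```python
import sqlite3


def build_schema(conn: sqlite3.Connection) -> None:
    """Create two hypothetical tables guarded by integrity constraints."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE
        );
        CREATE TABLE purchase (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            amount      REAL NOT NULL CHECK (amount > 0)
        );
        """
    )


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    build_schema(db)
    print(try_insert(db, "INSERT INTO customer (id, email) VALUES (?, ?)", (1, "a@example.com")))
    # A purchase with a negative amount violates the CHECK constraint.
    print(try_insert(db, "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)", (1, -5.0)))
```

In this sketch the database itself refuses duplicate e-mail addresses, purchases for unknown customers and non-positive amounts, so inaccurate or inconsistent rows never reach the dataset.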
The quality of datasets can be improved by creating and following a data quality improvement process that runs alongside all the phases of the SDLC. A workflow should be established to move from an inaccurate data state to a fully accurate one:
- Creation of a data quality environment
- Assessment of data definitions
- Collection of the right facts
- Identification of problems
- Assessment of the impact of these problems
- Investigation of causes
- Proposing remedies
- Implementing remedies
- Monitoring results
The above data quality improvement workflow can be implemented across the phases of the SDLC. For example:
- Requirements gathering is the first and most crucial phase of software development. The software team must focus on gathering the critical items that are relevant to the project and can be translated into developable solutions. Accurate data gathered in this phase leads to accurate software, and inaccurate data leads to inaccurate software.
- Likewise, in the software design phase, data quality and data profiling must be well understood, implemented and made an integral part of the whole software development process. Metadata should be stored effectively so that it can be used to generate mappings, which help when corrections are needed in the datasets.
- When the SDLC reaches its implementation stage, the datasets should be checked for normalization, along with integrity checks, indexes, triggers, stored procedures and so on, so that errors in the database can be removed and the quality of the software can be ensured.
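The data profiling mentioned for the design phase, and the "identification of problems" step of the workflow, can be sketched as a small check that counts missing values and duplicate keys. This is only an illustration under stated assumptions: the `profile_records` function, its parameters and the sample records are hypothetical, and a real profiling tool would cover many more rules.

```python
from collections import Counter
from typing import Any


def profile_records(
    records: list[dict[str, Any]], key: str, required: list[str]
) -> dict[str, Any]:
    """Summarize basic quality problems in a list of row dictionaries."""
    missing: Counter = Counter()
    for row in records:
        for field in required:
            value = row.get(field)
            # Treat both absent values and empty strings as missing.
            if value is None or value == "":
                missing[field] += 1

    # A key value that occurs more than once signals an inconsistency.
    key_counts = Counter(row.get(key) for row in records)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1 and k is not None)

    return {
        "rows": len(records),
        "missing": dict(missing),
        "duplicate_keys": duplicates,
    }


if __name__ == "__main__":
    sample = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": ""},
        {"id": 2, "email": "b@example.com"},
    ]
    print(profile_records(sample, key="id", required=["email"]))
```

Running such a report before and after a remedy is applied gives a simple way to monitor whether the remedy actually improved the data.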
Optimizing Record Selections
Optimizing query performance for datasets depends heavily on the developer’s skill and intuition, and so it remains something of a black art. Fortunately, many popular databases give their users tools that help optimize the speed and accuracy of record selections. A few of them are:
- Use of indexes: Database tables can be indexed, which makes it quicker to fetch the relevant data without first performing a full scan of the table. Databases limit the number of indexes per table; the limit varies by product and version (16 in some older systems, considerably higher in most current ones).
- Analyzing the performance of queries: The EXPLAIN PLAN utility can be used to assess the performance of SQL queries. It describes how the database plans to execute a query, including estimates of the number of rows the query will process and of the cost of executing it. Queries can then be improved based on the plan’s output.
- Fine-tuning internal variables: Databases have internal variables that control parameters affecting the speed of record selections, for example the number of tables that can be open at the same time, the size of the buffer used when handling indexes, the total buffer size for a query, and the slow query log. These variables can be tuned to get better record selections.
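The effect of an index on a query plan can be shown with a short sketch. It assumes SQLite, whose equivalent of the EXPLAIN PLAN utility is the `EXPLAIN QUERY PLAN` statement; other databases use different commands and output formats. The `orders` table, the `idx_orders_customer` index and the helper functions are hypothetical names used only for illustration.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); the detail column is readable text.
    return "; ".join(row[3] for row in rows)


def compare_plans() -> tuple[str, str]:
    """Return the plan for the same query before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 100, float(i)) for i in range(1000)],
    )
    sql = "SELECT total FROM orders WHERE customer_id = ?"

    before = query_plan(conn, sql, (42,))  # no index yet: full table scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (42,))   # the planner can now use the index

    conn.close()
    return before, after


if __name__ == "__main__":
    plan_before, plan_after = compare_plans()
    print("before:", plan_before)
    print("after: ", plan_after)
```

The exact wording of the plan text differs between SQLite versions, so the sketch prints it rather than relying on a fixed string. The point is that the first plan scans the whole table while the second one names the index.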
Database Maintenance Plans
The following three database maintenance plans can be used to improve the quality of a database and make it less vulnerable to security threats:
- Shrink database maintenance plan: Some databases allow the size of their data files to be shrunk. For example, a database holding around 65 MB of data may occupy 100 MB of space in total; shrinking reduces its size to roughly the amount of data it contains.
- System maintenance plan: This plan is set up with SQL Server Management Studio. The system databases include the master, msdb and model databases. The plan offers the following features: backup, cleanup, updates and database integrity checks.
- Automated database maintenance plan: This maintenance plan is used mostly with the Oracle RDBMS. It offers features such as automatic optimizer statistics collection, the Automatic Segment Advisor and the SQL Tuning Advisor. These features improve the quality of the database by running automated tasks that improve the performance of queries on the data sets.
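The ideas behind these plans (integrity checking, statistics collection and shrinking) can be sketched on a small scale. The example below is an analogy only: it assumes SQLite rather than SQL Server or Oracle, using SQLite's `PRAGMA integrity_check`, `ANALYZE` and `VACUUM` statements. The `log` table and both function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def build_bloated_db(path: str) -> None:
    """Create a database file that holds far less data than the space it occupies."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO log (payload) VALUES (?)", [("x" * 500,) for _ in range(2000)]
    )
    conn.commit()
    # Deleting rows frees pages inside the file but does not shrink the file.
    conn.execute("DELETE FROM log WHERE id % 10 != 0")
    conn.commit()
    conn.close()


def run_maintenance(path: str) -> dict:
    """Check integrity, refresh planner statistics, then shrink the file."""
    conn = sqlite3.connect(path)
    try:
        integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]
        conn.execute("ANALYZE")  # refresh statistics used by the query planner
        conn.commit()
        size_before = os.path.getsize(path)
        conn.execute("VACUUM")   # rebuild the file, releasing unused pages
    finally:
        conn.close()
    return {
        "integrity": integrity,
        "size_before": size_before,
        "size_after": os.path.getsize(path),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        db_path = os.path.join(folder, "demo.db")
        build_bloated_db(db_path)
        print(run_maintenance(db_path))
```

A production maintenance plan would normally be scheduled by the database's own tooling and would also include backups, which this sketch leaves out.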