1. Information Architecture Principle
For any complex endeavor, there is value
in beginning with a common principle to drive designs, procedures, and
decisions. A credible principle is understandable, robust, complete,
consistent, and stable. When an overarching principle is agreed upon,
conflicting opinions can be objectively measured, and standards can be
decided upon that support the principle.
The Information Architecture Principle encompasses
the three main areas of information management: database design and
development, enterprise data center management, and business
intelligence analysis.
Unpacking this principle reveals several practical
implications. There should be a known inventory of information,
including its location, source, sensitivity, present and future value,
and current owner. Although most organizational information is stored in
IT databases, uninventoried critical data is often scattered
throughout the organization in desktop databases, spreadsheets, scraps
of paper, Post-it notes, and (most dangerous of all) the heads of key
employees.
Just as the value of physical assets varies from
asset to asset and over time, the value of information is also variable
and so must be assessed. Information may be highly valuable to an individual
or department but less valuable to the organization as a whole;
information that is critical today might be meaningless in a month; or
information that may seem insignificant individually might become
critical for organizational planning when aggregated.
If the data is to be made readily available in the
future, then current designs must be flexible enough to avoid locking
the data into a rigid but brittle database.
2. Database Objectives
Based on the Information Architecture
Principle, every database can be architected or evaluated by six
interdependent database objectives. Four of these objectives are
primarily a function of design, development, and implementation: usability, extensibility, data integrity, and performance. Availability and security are more a function of implementation than design.
With sufficient design effort and a clear goal to
meet all six objectives, it is fully possible to design and develop an
elegant database that does just that. No database architecture is going
to be 100 percent perfect, but with an early focus on design and
fundamental principles, you can go a long way toward creating a database
that can grow along with your organization.
You can measure each objective on a continuum. The
data architect is responsible for informing the organization about these
six objectives, including the cost associated with meeting each
objective, the risk of failing to meet it, and the recommended level for
each. It is then the organization's prerogative to prioritize the
objectives against their relative costs.
Usability
The usability of a data store (the
architectural term for a database) involves the completeness of meeting
the organization's requirements; the suitability of the design for its
intended purpose; the effectiveness of the format of data available to
applications; the robustness of the database; and the ease of extracting
information (by programmers and power users). The most common reason
why a database is less than usable is an overly complex or inappropriate
design.
Usability is enabled in the design by ensuring the following:
- A thorough and well-documented understanding of the organizational requirements
- Life-cycle planning of software features
- Selecting the correct meta-pattern (for example, transactional or dimensional) for the data store
- Normalization and correct handling of optional data
- Simplicity of design
- A well-defined abstraction layer (a minimal sketch follows this list)
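As a minimal sketch of that last point, assuming a hypothetical Customer table, applications read through a view and write through a stored procedure, so the physical table can change without breaking client code:

```sql
-- Hypothetical abstraction layer: applications never touch the base table.
CREATE TABLE dbo.Customer (
    CustomerID   int IDENTITY PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    IsActive     bit NOT NULL DEFAULT 1
);
GO
-- Read path: the view hides physical details and future schema changes.
CREATE VIEW dbo.vCustomer
AS
SELECT CustomerID, CustomerName
FROM dbo.Customer
WHERE IsActive = 1;
GO
-- Write path: the stored procedure is the only supported insert interface.
CREATE PROCEDURE dbo.pCustomer_Add
    @CustomerName nvarchar(100)
AS
INSERT INTO dbo.Customer (CustomerName)
VALUES (@CustomerName);
GO
```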
Extensibility
The Information Architecture Principle
states that information must be readily available today and in the
future, which requires the database to be extensible: able to adapt
easily to new requirements. The concepts of data integrity, performance,
and availability are mature and well understood by the computer science
and IT professions. With enough time and resources, you can design a
data architecture that also meets the objective of extensibility. The
trick is to make sure that your entire organization understands that
this resource investment is not only important, but absolutely necessary
to good data architecture. Many databases fall victim to the curse of
too little time and too few resources; these are usually the ones that
cannot grow and adapt well to new business requirements or
organizational change.
Extensibility is incorporated into the design as follows:
- Normalization and correct handling of optional data.
- Generalization of entities when designing the schema, as sketched below.
- Data-driven designs that not only model the obvious data (for
example, orders and customers), but also enable the organization to
store the behavioral patterns, or process flow.
- A well-defined abstraction layer that decouples the database from
all client access, including client apps, middle tiers, ETL, and
reports.
Extensibility is also closely related to simplicity. Complexity
breeds complexity and inhibits adaptation; a simple solution is easy to
understand and adopt and, ultimately, easy to adjust later.
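As one hedged illustration of generalization, the following sketch (all names hypothetical) models customers and vendors as subtypes of a generalized Contact entity; a new contact role can later be added as another subtype table without restructuring the schema:

```sql
-- Generalized supertype: attributes common to every kind of contact.
CREATE TABLE dbo.Contact (
    ContactID   int IDENTITY PRIMARY KEY,
    ContactName nvarchar(100) NOT NULL,
    ContactType char(1) NOT NULL
        CHECK (ContactType IN ('C', 'V'))   -- 'C' = customer, 'V' = vendor
);
-- Subtype tables hold only the role-specific attributes.
CREATE TABLE dbo.CustomerContact (
    ContactID   int PRIMARY KEY REFERENCES dbo.Contact(ContactID),
    CreditLimit money NOT NULL
);
CREATE TABLE dbo.VendorContact (
    ContactID    int PRIMARY KEY REFERENCES dbo.Contact(ContactID),
    PaymentTerms nvarchar(50) NOT NULL
);
```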
Data Integrity
The ability to ensure that persisted
data can be retrieved without error is central to the Information
Architecture Principle, and it was the first major problem tackled by
the database world. Without data integrity, a query's answer cannot be
guaranteed to be correct; consequently, there's not much point in
availability or performance. Data integrity can be defined in multiple
ways:
- Entity integrity: Involves the structure (primary key and its
attributes) of the entity. If the primary key is unique and all
attributes are scalar and fully dependent on the primary key, then the
integrity of the entity is good. In the physical schema, the table's
primary key enforces entity integrity.
- Domain integrity: Ensures that only valid data is permitted
in the attribute. A domain is a set of possible values for an attribute,
such as integers, bit values, or characters. Nullability (whether a
null value is valid for an attribute) is also a part of domain
integrity. In the physical schema, the data type and nullability of the
column enforce domain integrity.
- Referential integrity: Refers to the domain integrity of
foreign keys. Domain integrity means that if an attribute has a value,
then that value must be in the domain. In the case of the foreign key,
the domain is the list of values in the related primary key. Referential
integrity, therefore, is not an issue of the integrity of the primary
key but of the foreign key.
- Transactional integrity: Ensures that every logical unit of
work, such as inserting 100 rows or updating 1,000 rows, is executed as a
single transaction. The quality of a database product is measured by
its transactions' adherence to the ACID properties: atomic (all or nothing), consistent (the database begins and ends the transaction in a consistent state), isolated (one transaction does not affect another), and durable (once committed, always committed). A sketch illustrating all four forms of integrity follows this list.
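The following minimal sketch, using hypothetical Order and OrderDetail tables, shows how the physical schema can enforce the first three forms of integrity, with a transaction illustrating the fourth:

```sql
CREATE TABLE dbo.[Order] (
    OrderID   int PRIMARY KEY,   -- entity integrity: a unique primary key
    OrderDate date NOT NULL      -- domain integrity: data type and nullability
);
CREATE TABLE dbo.OrderDetail (
    OrderDetailID int PRIMARY KEY,
    OrderID int NOT NULL
        REFERENCES dbo.[Order](OrderID),  -- referential integrity: the FK's
                                          -- domain is the related PK's values
    Quantity int NOT NULL
        CHECK (Quantity > 0)              -- domain integrity: restricted range
);
-- Transactional integrity: both inserts commit or roll back as one unit.
BEGIN TRANSACTION;
    INSERT dbo.[Order] (OrderID, OrderDate) VALUES (1, '2025-01-15');
    INSERT dbo.OrderDetail (OrderDetailID, OrderID, Quantity) VALUES (1, 1, 5);
COMMIT TRANSACTION;
```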
In addition to these four generally accepted
definitions of data integrity, user-defined data integrity should be
considered as well:
- User-defined integrity means that the data meets the organization's
requirements, with simple business rules such as restricting an
attribute to a domain or limiting the list of valid data entries. Check
constraints are commonly used to enforce these rules in the physical schema.
- Complex business rules limit the list of valid data based on some
condition. For example, certain tours may require a medical waiver.
Implementing these rules in the physical schema generally requires
stored procedures or triggers.
- Some data-integrity concerns can't be checked by constraints or
triggers. Invalid, incomplete, or questionable data may pass all the
standard data-integrity checks. For example, an order without any order
detail rows is not a valid order, but no SQL constraint or trigger traps
such an order. The abstraction layer can assist with this problem, and
SQL queries (such as the one sketched after this list) can locate
incomplete orders and help identify other, less measurable
data-integrity issues, including wrong, incomplete, questionable, and
inconsistent data.
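For example, a query along these lines (reusing the hypothetical Order and OrderDetail tables from the earlier sketch) locates orders that pass every declared constraint yet have no detail rows:

```sql
-- Orders with no matching detail rows pass all declared constraints
-- but are not valid business data.
SELECT o.OrderID
FROM dbo.[Order] AS o
LEFT OUTER JOIN dbo.OrderDetail AS d
    ON o.OrderID = d.OrderID
WHERE d.OrderID IS NULL;
```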
Integrity is established in the design by ensuring the following:
- A thorough and well-documented understanding of the organizational requirements
- Normalization and correct handling of optional data
- A well-defined abstraction layer
- Data quality unit testing using a well-defined and understood set of test data
- Metadata and data audit trails documenting the source and veracity of the data, including updates
Performance/Scalability
Presenting readily usable information is
a key aspect of the Information Architecture Principle. Although the
database industry has achieved a high degree of performance, the ability
to scale that performance to large databases is still an area of
competition between database engine vendors.
Performance is enabled in the database design and development by ensuring the following:
- A well-designed schema with normalization and generalization, and correct handling of optional data
- Set-based queries implemented within a well-defined abstraction layer
- A sound indexing strategy, including careful selection of clustered and nonclustered indexes (see the sketch after this list)
- Tight, fast transactions that reduce locking and blocking
- Partitioning, which is useful for advanced scalability
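As a minimal indexing sketch (table and index names hypothetical), a clustered index defines the table's physical row order, while a nonclustered index covers a frequent search path:

```sql
CREATE TABLE dbo.WorkOrder (
    WorkOrderID int NOT NULL,
    ProductID   int NOT NULL,
    DueDate     date NOT NULL
);
-- Clustered index on the primary key: one per table, defines row order.
ALTER TABLE dbo.WorkOrder
    ADD CONSTRAINT PK_WorkOrder PRIMARY KEY CLUSTERED (WorkOrderID);
-- Nonclustered index supporting searches by product and due date.
CREATE NONCLUSTERED INDEX IX_WorkOrder_ProductDue
    ON dbo.WorkOrder (ProductID, DueDate);
```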
Availability
The availability of information refers
to the information's accessibility when required, in terms of uptime,
locations, and the availability of the data for future analysis.
Disaster recovery, redundancy, archiving, and network delivery all
affect availability.
Availability is strengthened by the following:
- Quality, redundant hardware
- SQL Server's high-availability features
- Proper DBA procedures regarding data backup and backup storage (a minimal sketch follows this list)
- Disaster recovery planning
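A minimal backup sketch, assuming a hypothetical Sales database in the full recovery model and hypothetical file paths: a full backup plus periodic log backups supports point-in-time recovery.

```sql
-- Full database backup (typically scheduled nightly).
BACKUP DATABASE Sales
    TO DISK = 'D:\Backup\Sales_Full.bak'
    WITH CHECKSUM, INIT;
-- Transaction log backup (typically scheduled every few minutes).
BACKUP LOG Sales
    TO DISK = 'D:\Backup\Sales_Log.trn'
    WITH CHECKSUM;
```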
Security
The sixth database objective based on
the Information Architecture Principle is security. Any organizational
asset must be secured at a level appropriate to its value and
sensitivity.
Security is enforced by the following:
- Physical security and restricted access of the data center
- Defensively coding against SQL injection
- Appropriate operating system security
- Reducing the surface area of SQL Server to only those services and features required
- Identifying and documenting ownership of the data
- Granting access according to the principle of least privilege, which
is the concept that users should have only the minimum access rights
required to perform necessary functions within the database (see the
sketch after this list)
- Cryptography — data encryption of live databases, backups, and data warehouses
- Metadata and data audit trails documenting the source and veracity of the data, including updates
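To illustrate least privilege, here is a hedged T-SQL sketch (role, schema, and login names hypothetical): the application's database user receives read-only access to a single schema through a role, and nothing more.

```sql
-- A role scoped to the access the application actually needs.
CREATE ROLE SalesReader;
GRANT SELECT ON SCHEMA::Sales TO SalesReader;   -- read-only, one schema
-- Map the application's login to a database user and add it to the role.
CREATE USER SalesApp FOR LOGIN SalesApp;
ALTER ROLE SalesReader ADD MEMBER SalesApp;
-- No INSERT, UPDATE, DELETE, or server-level permissions are granted.
```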
Planning Data Stores
The enterprise data architect helps an
organization plan the most effective use of information throughout the
organization. An organization's data store configuration (see Figure 1) includes multiple types of data stores, each with a specific purpose:
- Operational databases, or online transaction processing (OLTP) databases
collect first-generation transactional data that is essential to the
day-to-day operation of the organization and unique to the organization.
An organization might have an operational data store to serve each unit
or function within it. Regardless of its size, an organization with a
single, focused purpose may have only one operational database.
- For performance, operational stores are tuned for a balance of data
retrieval and updates, so indexes and locking are key concerns. Because
these databases receive first-generation data, they are subject to data
update anomalies and benefit from normalization.
- Caching data stores, sometimes called reporting databases,
are optional read-only copies of all or part of an operational
database. An organization might have multiple caching data stores to
deliver data throughout the organization. Caching data stores might use
SQL Server replication or log shipping to populate the database and are
tuned for high-performance data retrieval.
- Reference data stores are primarily read-only and store
generic data required by the organization but which seldom changes —
similar to the reference section of the library. Examples of reference
data might be unit of measure conversion factors or ISO country codes. A
reference data store is tuned for high-performance data retrieval.
- Data warehouses collect large amounts of data from multiple data stores across the entire enterprise using an extract-transform-load (ETL)
process to convert the data from the various formats and schema into a
common format, designed for ease of data retrieval. Data warehouses also
serve as the archival location, storing historical data and releasing
some of the data load from the operational data stores. The data is also
pre-aggregated, making research and reporting easier, thereby improving
the accessibility of information and reducing errors.
- Because the primary task of a data warehouse is data retrieval and
analysis, the data-integrity concerns presented with an operational data
store don't apply. Data warehouses are designed for fast retrieval and
are not normalized like master data stores. They are generally designed
using a basic star schema or snowflake design (a star-schema sketch
appears after this list). Locks generally aren't an issue, and the
indexing is applied without adversely affecting inserts or updates.
- The analysis process usually involves more than just SQL queries and
uses data cubes that consolidate gigabytes of data into dynamic pivot
tables. Business intelligence (BI) is the combination of the ETL process, the data warehouse data store, and the act of creating and browsing cubes.
- A common data warehouse is essential to ensure that the entire
organization researches the same data set and achieves the same result
for the same query — a critical aspect of the Sarbanes-Oxley Act and
other regulatory requirements.
- Data marts are subsets of the data warehouse with
pre-aggregated data organized specifically to serve the needs of one
organizational group or one data domain.
- Master data store, or master data management (MDM),
refers to the data warehouse that combines the data from throughout the
organization. The primary purpose of the master data store is to provide
a single version of the truth for organizations with a complex set of
data stores and multiple data warehouses.
- Data Quality Services (DQS) refers to the SQL Server instance
feature that consists of three SQL Server catalogs with data-quality
functionality and storage. The purpose of this feature is to enable you
to build a knowledge base to support data quality tasks.
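As a closing illustration of the dimensional designs mentioned above, here is a minimal star-schema sketch (all names hypothetical): a central fact table of measures joined to denormalized dimension tables, queried by aggregating over dimension attributes.

```sql
CREATE TABLE dbo.DimDate (
    DateKey      int PRIMARY KEY,      -- e.g., 20250115
    CalendarDate date NOT NULL,
    CalendarYear smallint NOT NULL
);
CREATE TABLE dbo.DimProduct (
    ProductKey  int PRIMARY KEY,
    ProductName nvarchar(100) NOT NULL,
    Category    nvarchar(50) NOT NULL  -- denormalized into the dimension
);
CREATE TABLE dbo.FactSales (
    DateKey     int NOT NULL REFERENCES dbo.DimDate(DateKey),
    ProductKey  int NOT NULL REFERENCES dbo.DimProduct(ProductKey),
    SalesAmount money NOT NULL,
    Quantity    int NOT NULL
);
-- Typical analytical query: aggregate measures by dimension attributes.
SELECT d.CalendarYear, p.Category, SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimDate    AS d ON f.DateKey = d.DateKey
JOIN dbo.DimProduct AS p ON f.ProductKey = p.ProductKey
GROUP BY d.CalendarYear, p.Category;
```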