About Me

My photo
"Enoughtheory.com" had its humble beginning in the year 2011 by ( Founder of Enoughtheory.com ) Mr Ravi Kant Soni , an Enterprise Java and Spring Framework Specialist, with a bachelor degree (B.E) in Information Science and Engineering from Reva Institute of Technology at Bangalore. He has been into the software development discipline for many years now. Ravi has worn many hats throughout his tenure, ranging from software development, designing multi-tenant applications, integration of new technology into an existing system, to his current love of writing a Spring Framework book. Currently, he is a lead engineer at HCL Technology. Ravi has focused on Web and Enterprise development using Spring Framework for most of his career and has been extensively involved in application design and implementation. He has developed applications for Core-Bank, HR and Payroll System, and e-Commerce systems using Spring Framework. Ravi Kant Soni is author of book "Learning Spring Application development" http://learningspringapplicationdevelopment.com

Wednesday, 25 July 2012

GROUP BY, HAVING, SUM, AVG, and COUNT(*)


Aggregation

You can use a SQL SELECT to aggregate data. Aggregation combines rows together and performs some operation on their combined values. Very common aggregations are COUNT, SUM, and AVG.
The simplest use of aggregations is to examine an entire table and pull out only the aggregations, with no other columns specified. Consider this SQL:
SELECT COUNT(*) as cnt
      ,SUM(sale_amount) as sumSales
      ,AVG(sale_amount) as avgSales
  FROM orders
If you have a very small sales order table, say about 7 rows, like this:
ORDER |  DATE      | STATE | SALE_AMOUNT
------+------------+-------+-------------
 1234 | 2007-11-01 | NY    |       10.00
 1235 | 2007-12-01 | TX    |       15.00
 1236 | 2008-01-01 | CA    |       20.00
 1237 | 2008-02-01 | TX    |       25.00
 1238 | 2008-03-01 | CA    |       30.00
 1237 | 2008-04-01 | NY    |       35.00
 1238 | 2008-05-01 | NY    |       40.00
Then the simple query above produces a one-row output:
CNT  | SUM  | AVG
-----+------+-----
  7  | 175  |  25

Some Notes on The Syntax

When we use COUNT(*) we always put the asterisk inside.
Note that the example names the output columns by saying "as sumSales" and "as avgSales". This is important because without it we will get whatever the database server decides to call it, which will vary from platform to platform, so it is a good idea to learn to use the "AS" clause.

The WHERE Clause Filters BEFORE the Aggregation

If you want to get just the sales from New York state, you can put a WHERE clause in:
SELECT COUNT(*) as cnt
      ,SUM(sale_amount) as sumSales
      ,AVG(sale_amount) as avgSales
  FROM orders
 WHERE state = 'NY'
...and you will get only the results for NY:
CNT | SUM  | AVG
----+------+----------
  3 |  85  |  28.33333
Notice of course that the average has a repeating decimal. Most databases have a ROUND function of some sort, so I can correct that with:
SELECT COUNT(*) as cnt
      ,SUM(sale_amount) as sum
      ,ROUND(AVG(sale_amount),2) as avg
  FROM orders
 WHERE state = 'NY'
...and get:
CNT | SUM  | AVG
----+------+----------
  3 |  85  |  28.33

The Fun Begins With GROUP BY

The query above is fine, but it would be very laborious if you had to issue the query (or write a program to do it) for every possible state. The answer is the GROUP BY clause. The GROUP BY clause causes aggregations to occur in groups (naturally) for the columns you name.
SELECT state,
      ,COUNT(*) as cnt
      ,SUM(sale_amount)          as sumSales
      ,ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 GROUP BY state
Which gives us this result:
STATE | CNT | SUM  | AVG
------+-----+------+----
NY    |  3  |  85  |  28
TX    |  2  |  40  |  20
CA    |  2  |  50  |  25  

Every Column a GROUP BY or Aggregate

When you use the GROUP BY column then every column in the output must either be a group by column or must be an aggregate function. To understand this, imagine we put "Date" into the query above:
SELECT state,
     , date -- huh?? which value should we get??
     , COUNT(*) as cnt
     , SUM(sale_amount)          as sumSales
     , ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 GROUP BY state
Several states have more than one row in the database, so the database server has to decide which value of DATE to give you. Since it cannot know which one you want, it throws an error and says in short, "don't confuse me!"

Two More Aggregations, MIN and MAX

If we think again about the DATE column, in most practical situations we usually want to know the smallest or largest value, or both, so this query is not uncommon:
SELECT state,
     , MIN(date)                 as minDate
     , MAX(date)                 as maxDate
     , COUNT(*)                  as cnt
     , SUM(sale_amount)          as sumSales
     , ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 GROUP BY state
which yields:
STATE | minDate    | maxDate    |CNT | SUM  | AVG
------+------------+------------+----+------+-----
NY    | 2007-11-01 | 2008-05-01 | 3  |  85  |  28
TX    | 2007-12-01 | 2008-02-01 | 2  |  40  |  20
CA    | 2008-01-01 | 2008-03-01 | 2  |  50  |  25  

HAVING Clause is Like WHERE after GROUP BY

The HAVING clause lets us put a filter on the results after the aggregation has taken place. If your Sales Manager wants to know which states have an average sale amount of $25.00 or more, then the query would look like this:
SELECT state,
      ,COUNT(*) as cnt
      ,SUM(sale_amount)          as sumSales
      ,ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 GROUP BY state
HAVING AVG(sale_amount) >= 25
Which gives us this result, notice that Texas is now missing, as they were just not selling big enough orders (sorry 'bout that Rhonda).
STATE | CNT | SUM  | AVG
------+-----+------+----
NY    |  3  |  85  |  28
CA    |  2  |  50  |  25  

When to use WHERE, When to use HAVING

Then the Sales Manager might come down and say, 'I don't want the states who have no sales after December 2008'. We might automatically code the following, which is tragically wrong:
SELECT state,
     , MIN(date)                 as minDate
     , MAX(date)                 as maxDate
     , COUNT(*)                  as cnt
     , SUM(sale_amount)          as sumSales
     , ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 -- WRONG! Will filter out individual rows!
 WHERE date <= '2008-12-31'
 GROUP BY state
The problem here is that individual rows that happened after 2008-12-31 will get filtered out, which will give you all stats for all states on sales before 2009. That is not right. The idea is to completely eliminate all results for states with no sales in 2009 or later, even if they had sales before that time. So we use MAX and the HAVING clause:
SELECT state,
     , MIN(date)                 as minDate
     , MAX(date)                 as maxDate
     , COUNT(*)                  as cnt
     , SUM(sale_amount)          as sumSales
     , ROUND(AVG(sale_amount),0) as avgSales
  FROM orders
 GROUP BY state
HAVING MAX(date) >= '2008-12-31'

Using All Three

You can pull some pretty nice results out of a database in a single query if you know how to combine the WHERE, GROUP BY, and HAVING. If you have ever worked with a Sales Manager, you know they constantly want to know strange numbers, so let's say our Sales Manager says, "Can you tell me the average order size by state for all orders greater than 20? And don't bother with any average less 30.00" We say, "Sure, don't walk away, I'll print it out right now."
SELECT state
      ,COUNT(*)
      ,SUM(sale_amount) as sum
      ,ROUND(AVG(sale_amount) as avg
  FROM orders
 WHERE sale_amount > 20
 GROUP BY state
HAVING avg(sale_amount) >= 30
   AND max(date) >= '2008-12-31'

How to Do a Weighted Average

Consider the case of a table that lists test, homework and quiz scores for the students in a certain course. Each particular score is worth a certain percentage of a student's grade, and the teacher wants the computer to calculate each student's file score. If the table looks like:
STUDENT     | WEIGHT | SCORE
------------+--------+-------
NIRGALAI    |     40 |    90
NIRGALAI    |     35 |    95
NIRGALAI    |     25 |    85
JBOONE      |     40 |    80
JBOONE      |     35 |    95
JBOONE      |     25 |    70
PCLAYBORNE  |     40 |    70
PCLAYBORNE  |     35 |    80
PCLAYBORNE  |     25 |    90
Then we can accomplish this in one pull like so:
SELECT student
      ,SUM(weight * score) / 100 as final
  FROM scores
 GROUP BY student
The nice thing about this query is that it works even if data is missing. If a student missed a test, they automatically get a zero averaged in.

Database Normalization: First, Second, and Third Normal

What is Normalization in SQL ?

 What is normalization ?


Defination : Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored. There are several benefits for using Normalization in Database.
Benefits :
  1. Eliminate data redundancy
  2. Improve performance
  3. Query optimization
  4. Faster update due to less number of columns in one table
  5. Index improvement
There are diff. - diff. types of Normalizations form available in the Database. Lets see one by one.
1. First Normal Form (1NF)
 First normal form (1NF) sets the very basic rules for an organized database:
  • Eliminate duplicative columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
    • Remove repetative groups
    • Create Primary Key

           
Name State Country Phone1 Phone2 Phone3
John 101 1 488-511-3258 781-896-9897 425-983-9812
Bob 102 1 861-856-6987    
Rob 201 2 587-963-8425 425-698-9684  
 PK
                 [ Phone Nos ]
   ?


?  
ID Name State Country Phone  
1 John 101 1 488-511-3258  
2 John 101 1 781-896-9897  
3 John 101 1 425-983-9812  
4 Bob 102 1 861-856-6987  
5 Rob 201 2 587-963-8425  
6 Rob 201 2 425-698-9684  
           

2. Second Normal Form (2NF)Second normal form (2NF) further addresses the concept of removing duplicative data:
·         Meet all the requirements of the first normal form.
·         Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
·         Create relationships between these new tables and their predecessors through the use of foreign keys.   
Remove columns which create duplicate data in a table and related a new table with Primary Key – Foreign Key relationship

ID Name State Country Phone


1 John 101 1 488-511-3258


2 John 101 1 781-896-9897


3 John 101 1 425-983-9812


4 Bob 102 1 861-856-6987


5 Rob 201 2 587-963-8425


6 Rob 201 2 425-698-9684


















ID Name State Country
PhoneID ID Phone
1 John 101  
1 1 488-511-3258
2 Bob 102
2 1 781-896-9897
3 Rob 201
3 1 425-983-9812





4 2 587-963-8425





5 3 587-963-8425





6 3 425-698-9684

3. Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
·         Meet all the requirements of the second normal form.
·         Remove columns that are not dependent upon the primary key.
  Country can be derived from State also… so removing country
  ID   Name   State   Country
  1   John    101       1
  2   Bob    102       1
  3   Rob    201       2

4. Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
·         Meet all the requirements of the third normal form.
·         A relation is in 4NF if it has no multi-valued dependencies.

If PK is composed of multiple columns then all non-key attributes should be derived from FULL PK only. If some non-key attribute can be derived from partial PK then remove it

The 4NF also known as BCNF NF

   TeacherID StudentID SubjectID  StudentName
     101   1001   1   John
     101   1002   2   Rob
     201   1002   3   Bob
     201   1001   2   Rob








   TeacherID    StudentID   SubjectID   StudentName
  101   1001   1          X
  101   1002   2          X
  201   1001   3          X
  201   1002   2         X

Friday, 20 July 2012

SaaS application development


SaaS application development

Software as a Service (SaaS) is a Software distribution model on which the data and associated files will be kept on the Main Server. Then it is possible to access the site using browser. By keeping the data and files on a centralized server, it only need to modify the files on a Single place and the changes will be reflected everywhere. Linked In, webex, Basecamp are examples of applications based on SaaS model. SaaS application is having the following architecture:
SaaS - Software as a service

Tenant

Tenant is the user or customer of the application. Usually SaaS uses Multi Tenant architecture that means, more than one user can use the same application. Single Tenant on which there is one user for the entire application.

SaaS Implementation

While developing an application using SaaS model, we have to consider few important things. This includes the database design changes, coding changes, authentication of Tenant, User, Loading User specific Theme & Content, SOA etc.

Authenticate Tenant

Once request received on Tenant Server, the very first thing which needs to do is to authenticate the Tenant. For this, we need to check the Host Name with our Tenant Database. If the Authentication is success, then the system receive the Tenant ID from database. We need to store this Tenant ID somewhere which can be accessible always either in SESSION or similar places.

User Authentication

  • Next steps is checking the User Authentication if needed. User may access the page with or without a role. For this, system uses Tenant ID.

Return Output

  • This contains a number of process.To get a result, it uses different independent components and achieve the result by using one or more of them together.
  • It may contact with database during this time.
  • In all cases, it carries the Tenant ID.

Database Structure

Keypoints which we need to consider while designing database are:
  • Design database tables to feel it like a separate website
  • Apply indexing
  • Always keep Tenant ID in all tables for easy database management

Service Oriented Architecture (SOA)

Important points to consider here are:
  • It need to split the functions as much as smaller independent components
  • The state of each function does not interact with other
  • Achieve each services with loosely coupled components

Theme Management

Multi Tenant Application needs option to change Theme by keeping the inner functionalities as same in some cases. The basic layout structure of a Multi Tenant SaaS application have three steps layouts.
  • Wrapper: This is common for the whole application.
  • Nested Layout: This contains the site specific details like logo, hard coded contents etc. We can use CMS (Content Management System) to change the Site Specific details.
  • View: The actual content requested will be displayed through this page.

Thursday, 19 July 2012

Multi-Tenant Data Architecture

Introduction

Trust, or the lack thereof, is the number one factor blocking the adoption of software as a service (SaaS). A case could be made that data is the most important asset of any business—data about products, customers, employees, suppliers, and more. And data, of course, is at the heart of SaaS. SaaS applications provide customers with centralized, network-based access to data with less overhead than is possible when using a locally-installed application. But in order to take advantage of the benefits of SaaS, an organization must surrender a level of control over its own data, trusting the SaaS vendor to keep it safe and away from prying eyes.
To earn this trust, one of the highest priorities for a prospective SaaS architect is creating a SaaS data architecture that is both robust and secure enough to satisfy tenants or clients who are concerned about surrendering control of vital business data to a third party, while also being efficient and cost-effective to administer and maintain.
This is the second article in our series about designing multi-tenant applications. The first article, Architecture Strategies for Catching the Long Tail, introduced the SaaS model at a high level and discussed its challenges and benefits. It is available on MSDN. Other articles in the series will focus on topics such as workflow and user interface design, overall security, and others.
In this article, we'll look at the continuum between isolated data and shared data, and identify three distinct approaches for creating data architectures that fall at different places along the continuum. Next, we'll explore some of the technical and business factors to consider when deciding which approach to use. Finally, we'll present design patterns for ensuring security, creating an extensible data model, and scaling the data infrastructure.

Three Approaches to Managing Multi-Tenant Data

The distinction between shared data and isolated data isn't binary. Instead, it's more of a continuum, with many variations that are possible between the two extremes.
Aa479086.mlttntda01(en-us,MSDN.10).gif
Data architecture is an area in which the optimal degree of isolation for a SaaS application can vary significantly depending on technical and business considerations. Experienced data architects are used to considering a broad spectrum of choices when designing an architecture to meet a specific set of challenges, and SaaS is certainly no exception. We shall examine three broad approaches, each of which lies at a different location in the continuum between isolation and sharing.
Aa479086.mlttntda02(en-us,MSDN.10).gif

Separate Databases

Storing tenant data in separate databases is the simplest approach to data isolation.
Aa479086.mlttntda03(en-us,MSDN.10).gif
Figure 1. This approach uses a different database for each tenant
Computing resources and application code are generally shared between all the tenants on a server, but each tenant has its own set of data that remains logically isolated from data that belongs to all other tenants. Metadata associates each database with the correct tenant, and database security prevents any tenant from accidentally or maliciously accessing other tenants' data.
Giving each tenant its own database makes it easy to extend the application's data model (discussed later) to meet tenants' individual needs, and restoring a tenant's data from backups in the event of a failure is a relatively simple procedure. Unfortunately, this approach tends to lead to higher costs for maintaining equipment and backing up tenant data. Hardware costs are also higher than they are under alternative approaches, as the number of tenants that can be housed on a given database server is limited by the number of databases that the server can support. (Using autoclose to unload databases from memory when there are no active connections can make an application more scalable by increasing the number of databases each server can support.)
Separating tenant data into individual databases is the "premium" approach, and the relatively high hardware and maintenance requirements and costs make it appropriate for customers that are willing to pay extra for added security and customizability. For example, customers in fields such as banking or medical records management often have very strong data isolation requirements, and may not even consider an application that does not supply each tenant with its own individual database.

Shared Database, Separate Schemas

Another approach involves housing multiple tenants in the same database, with each tenant having its own set of tables that are grouped into a schema created specifically for the tenant.
Aa479086.mlttntda04(en-us,MSDN.10).gif
Figure 2. In this approach each tenant has its own separate set of tables in a common database
When a customer first subscribes to the service, the provisioning subsystem creates a discrete set of tables for the tenant and associates it with the tenant's own schema. You can use the SQL CREATE command to create a schema and authorize a user account to access it. For example, in Microsoft SQL Server 2005:
CREATE SCHEMA ContosoSchema AUTHORIZATION Contoso

The application can then create and access tables within the tenant's schema using the SchemaName.TableName convention:
CREATE TABLE ContosoSchema.Resumes (EmployeeID int identity primary key, 
   Resume nvarchar(MAX))

ALTER USER Contoso WITH DEFAULT_SCHEMA = ContosoSchema

A tenant account can access tables within its default schema by specifying just the table name, instead of using the SchemaName.TableName convention. This way, a single set of SQL statements can be created for all tenants, which each tenant can use to access its own data:
SELECT * FROM Resumes

Like the isolated approach, the separate-schema approach is relatively easy to implement, and tenants can extend the data model as easily as with the separate-database approach. (Tables are created from a standard default set, but once they are created they no longer need to conform to the default set, and tenants may add or modify columns and even tables as desired.) This approach offers a moderate degree of logical data isolation for security-conscious tenants, though not as much as a completely isolated system would, and can support a larger number of tenants per database server.
A significant drawback of the separate-schema approach is that tenant data is harder to restore in the event of a failure. If each tenant has its own database, restoring a single tenant's data means simply restoring the database from the most recent backup. With a separate-schema application, restoring the entire database would mean overwriting the data of every tenant on the same database with backup data, regardless of whether each one has experienced any loss or not. Therefore, to restore a single customer's data, the database administrator may have to restore the database to a temporary server, and then import the customer's tables into the production server—a complicated and potentially time-consuming task.
The separate schema approach is appropriate for applications that use a relatively small number of database tables, on the order of about 100 tables per tenant or fewer. This approach can typically accommodate more tenants per server than the separate-database approach can, so you can offer the application at a lower cost, as long as your customers will accept having their data co-located with that of other tenants.

Shared Database, Shared Schema

A third approach involves using the same database and the same set of tables to host multiple tenants' data. A given table can include records from multiple tenants stored in any order; a Tenant ID column associates every record with the appropriate tenant.
Aa479086.mlttntda05(en-us,MSDN.10).gif
Figure 3. In this approach, all tenants share the same set of tables, and a Tenant ID associates each tenant with the rows that it owns
Of the three approaches explained here, the shared schema approach has the lowest hardware and backup costs, because it allows you to serve the largest number of tenants per database server. However, because multiple tenants share the same database tables, this approach may incur additional development effort in the area of security, to ensure that tenants can never access other tenants' data, even in the event of unexpected bugs or attacks.
The procedure for restoring data for a tenant is similar to that for the shared-schema approach, with the additional complication that individual rows in the production database must be deleted and then reinserted from the temporary database. If there are a very large number of rows in the affected tables, this can cause performance to suffer noticeably for all the tenants that the database serves.
The shared-schema approach is appropriate when it is important that the application be capable of serving a large number of tenants with a small number of servers, and prospective customers are willing to surrender data isolation in exchange for the lower costs that this approach makes possible.

Choosing an Approach

Each of the three approaches described above offers its own set of benefits and tradeoffs that make it an appropriate model to follow in some cases and not in others, as determined by a number of business and technical considerations. Some of these considerations are listed below.

Economic Considerations

Applications optimized for a shared approach tend to require a larger development effort than applications designed using a more isolated approach (because of the relative complexity of developing a shared architecture), resulting in higher initial costs. Because they can support more tenants per server, however, their ongoing operational costs tend to be lower.
Aa479086.mlttntda06(en-us,MSDN.10).gif
Figure 4. Cost over time for a hypothetical pair of SaaS applications; one uses a more isolated approach, while the other uses a more shared approach
Your development effort can be constrained by business and economic factors, which can influence your choice of approach. The shared schema approach can end up saving you money over the long run, but it does require a larger initial development effort before it can start producing revenue. If you are unable to fund a development effort of the size necessary to build a shared schema application, or if you need to bring your application to market more quickly than a large-scale development effort would allow, you may have to consider a more isolated approach.

Security Considerations

As your application will store sensitive tenant data, prospective customers will have high expectations about security, and your service level agreements (SLAs) will need to provide strong data safety guarantees. A common misconception holds that only physical isolation can provide an appropriate level of security. In fact, data stored using a shared approach can also provide strong data safety, but requires the use of more sophisticated design patterns.

Tenant Considerations

The number, nature, and needs of the tenants you expect to serve all affect your data architecture decision in different ways. Some of the following questions may bias you toward a more isolated approach, while others may bias you toward a more shared approach.
  • How many prospective tenants do you expect to target? You may be nowhere near being able to estimate prospective use with authority, but think in terms of orders of magnitude: are you building an application for hundreds of tenants? Thousands? Tens of thousands? More? The larger you expect your tenant base to be, the more likely you will want to consider a more shared approach.
  • How much storage space do you expect the average tenant's data to occupy? If you expect some or all tenants to store very large amounts of data, the separate-database approach is probably best. (Indeed, data storage requirements may force you to adopt a separate-database model anyway. If so, it will be much easier to design the application that way from the beginning than to move to a separate-database approach later on.)
  • How many concurrent end users do you expect the average tenant to support? The larger the number, the more appropriate a more isolated approach will be to meet end-user requirements.
  • Do you expect to offer any per-tenant value-added services, such as per-tenant backup and restore capability? Such services are easier to offer through a more isolated approach.
Aa479086.mlttntda07(en-us,MSDN.10).gif
Figure 5. Tenant-related factors and how they affect "isolated versus shared" data architecture decisions

Regulatory Considerations

Companies, organizations, and governments are often subject to regulatory law that can affect their security and record storage needs. Investigate the regulatory environments that your prospective customers occupy in the markets in which you expect to operate, and determine whether they present any considerations that will affect your decision.

Skill Set Considerations

Designing single-instance, multi-tenant architecture is still a very new skill, so subject matter expertise can be hard to come by. If your architects and support staff do not have a great deal of experience building SaaS applications, they will need to acquire the necessary knowledge, or you will have to hire people that already have it. In some cases, a more isolated approach may allow your staff to leverage more of its existing knowledge of traditional software development than a more shared approach would.

Realizing Multi-Tenant Data Architecture

The remainder of this article details a number of patterns that can help you plan and build your SaaS application. As we discussed in our introductory article, a well-designed SaaS application is distinguished by three qualities: scalability, configurability, and multi-tenant efficiency. The table below lists the patterns appropriate for each of the three approaches, divided into sections representing these three qualities.
Optimizing for multi-tenant efficiency in a shared environment must not compromise the level of security safeguarding data access. The security patterns listed below demonstrate how you can design an application with "virtual isolation" through mechanisms such as permissions, SQL views, and encryption.
Configurability allows SaaS tenants to alter the way the application appears and behaves without requiring a separate application instance for each individual tenant. The extensibility patterns describe possible ways you can implement a data model that tenants can extend and configure individually to meet their needs.
The approach you choose for your SaaS application's data architecture will affect the options available to you for scaling it to accommodate more tenants or heavier usage. The scalability patterns address the different challenges posed by scaling shared databases and dedicated databases.

Security Patterns

Trusted Database Connections

Aa479086.mlttntda08(en-us,MSDN.10).gif
Aa479086.mlttntda09(en-us,MSDN.10).gif
Aa479086.mlttntda10(en-us,MSDN.10).gif
This approach involves creating a database access account for each tenant, and using ACLs to grant each of these tenant accounts access to the database objects the tenant is allowed to use. When an end user performs an action that directly or indirectly requires a call to a database, the application uses credentials associated with the tenant account, rather than credentials associated with the end user. (One way for the application to obtain the proper credentials is through impersonation, in conjunction with a credentialing system like Kerberos. A second approach is to use a security token service that returns an actual set of encrypted login credentials established for the tenant, that the application process can then submit to the database.) The database server does not distinguish between requests originating from different end users associated with the same tenant, and grants all such requests access to the tenant's data. Within the application itself, security code prevents end users from receiving and modifying any data that they are not entitled to access.
For example, consider an end user of a customer relations management (CRM) application who performs an operation that queries the database for customer records matching a certain string. The application submits the query to the database using the security context of the tenant, so instead of returning all of the matching records in the database, the query only retrieves the matching rows from the tables the tenant is allowed to access. So far, so good—but suppose the end user's role only allows her to access records of customers located within a certain geographic region. (For more information about roles, see the section "Authorization" in Architecture Strategies for Catching the Long Tail, the first article in this series.) The application must intercept the query results and only present the user with the records that she is entitled to see.

Secure Database Tables

Tenant View Filter

Tenant Data Encryption

Cryptographic methods are categorized as either symmetric or asymmetric. In symmetric cryptography, a key is generated that is used to encrypt and decrypt data. Data encrypted with a symmetric key can be decrypted with the same key. In asymmetric cryptography (also called public-key cryptography), two keys are used, designated the public key and the private key. Data that is encrypted with a given public key can only be decrypted with the corresponding private key, and vice versa. Generally, public keys are distributed to any and all parties interested in communicating with the key holder, while private keys are held secure. For example, if Alice wishes to send an encrypted message to Bob, she obtains Bob's public key through some agreed-upon means, and uses it to encrypt the message. The resulting encrypted message, or cyphertext, can only be decrypted by someone in possession of Bob's private key (in practice, this should only be Bob). This way, Bob never has to share his private key with Alice. To send a message to Bob using symmetric encryption, Alice would have to send the symmetric key separately—which runs the risk that the key might be intercepted by a third party during transmission.

Extensibility Patterns

As designed, your application will naturally include a standard database setup, with default tables, fields, queries, and relationships that are appropriate to the nature of your solution. But different organizations have their own unique needs that a rigid, inextensible default data model won't be able to address. For example, one customer of a SaaS job-tracking system might have to store an externally generated classification code string with each record to fully integrate the system with their other processes. A different customer may have no need for a classification string field, but might require support for tracking a category ID number, an integer. Therefore, in many cases you will have to develop and implement a method by which customers can extend your default data model to meet their needs, without affecting the data model that other customers use.

Preallocated Fields

Aa479086.mlttntda11(en-us,MSDN.10).gif
Aa479086.mlttntda12(en-us,MSDN.10).gif
Aa479086.mlttntda13(en-us,MSDN.10).gif

Name-Value Pairs

Aa479086.mlttntda14(en-us,MSDN.10).gif

Custom Columns

Aa479086.mlttntda15(en-us,MSDN.10).gif
This pattern is appropriate for separate-database or separate-schema applications, because each tenant has its own set of tables that can be modified independently of those belonging to any other clients. From a data model standpoint, this is the simplest of the three extensibility patterns, because it does not require you to track data extensions separately. On the application architecture side, though, this pattern can sometimes be more difficult to implement, because it allows tenants to vary the number of columns in a table. Even if the Custom Columns pattern is available to you, you may consider using a variation on the Preallocated Fields or Name-Value Pairs pattern to reduce development effort, allowing you to write application code that can assume a known and unchanging number of fields in each table.

Using Data Model Extensions

Scalability Patterns

Large-scale enterprise software is intended to be used by thousands of people simultaneously. If you have experience building enterprise applications of this sort, you know first-hand the challenges of creating a scalable architecture. For a SaaS application, scalability is even more important, because you'll have to support data belonging to all your customers. For independent software vendors (ISVs) accustomed to building on-premise enterprise software, supporting this kind of user base is like moving from the minor leagues to the majors: the rules may be familiar, but the game is played on an entirely different level. Instead of a widely deployed, business-critical enterprise application, you're really building an Internet-scale system that needs to actively support a user base potentially numbering in the millions.

Scaling Techniques

Tenant-Based Horizontal Partitioning

Single Tenant Scaleout

Conclusion

The design approaches and patterns we've discussed in this article should help you create the foundation layer of trust that's vital to the success of your SaaS application. Designing a SaaS data architecture that reconciles the competing benefits and demands of sharing and isolation isn't a trivial task, but these approaches and patterns should help you identify and resolve many of the critical questions you will face. The ideas and recommendations presented here differ in the details, but they all help you leverage the principles of configurability, scalability, and multi-tenant efficiency to design a secure and extensible data architecture for a SaaS application.