Introduction
Trust, or the lack thereof, is the number one factor
blocking the adoption of software as a service (SaaS). A case could be
made that data is the most important asset of any business—data about
products, customers, employees, suppliers, and more. And data, of
course, is at the heart of SaaS. SaaS applications provide customers
with centralized, network-based access to data with less overhead than
is possible when using a locally-installed application. But in order to
take advantage of the benefits of SaaS, an organization must surrender a
level of control over its own data, trusting the SaaS vendor to keep it
safe and away from prying eyes.
To earn this trust, one of the highest priorities for a prospective
SaaS architect is creating a SaaS data architecture that is both robust
and secure enough to satisfy tenants or clients who are concerned about
surrendering control of vital business data to a third party, while also
being efficient and cost-effective to administer and maintain.
This is the second article in our series about designing multi-tenant applications. The first article, Architecture Strategies for Catching the Long Tail,
introduced the SaaS model at a high level and discussed its challenges
and benefits. It is available on MSDN. Other articles in the series will
focus on topics such as workflow and user interface design, overall
security, and others.
In this article, we'll look at the continuum between isolated data
and shared data, and identify three distinct approaches for creating
data architectures that fall at different places along the continuum.
Next, we'll explore some of the technical and business factors to
consider when deciding which approach to use. Finally, we'll present
design patterns for ensuring security, creating an extensible data
model, and scaling the data infrastructure.
Three Approaches to Managing Multi-Tenant Data
The distinction between shared data and isolated data isn't binary.
Instead, it's more of a continuum, with many variations that are
possible between the two extremes.
Data architecture is an area in which the optimal degree of isolation
for a SaaS application can vary significantly depending on technical
and business considerations. Experienced data architects are used to
considering a broad spectrum of choices when designing an architecture
to meet a specific set of challenges, and SaaS is certainly no
exception. We shall examine three broad approaches, each of which lies
at a different location in the continuum between isolation and sharing.
Separate Databases
Storing tenant data in separate databases is the simplest approach to data isolation.
Figure 1. This approach uses a different database for each tenant
Computing resources and application code are generally shared between
all the tenants on a server, but each tenant has its own set of data
that remains logically isolated from data that belongs to all other
tenants. Metadata associates each database with the correct tenant, and
database security prevents any tenant from accidentally or maliciously
accessing other tenants' data.
Giving each tenant its own database makes it easy to extend the
application's data model (discussed later) to meet tenants' individual
needs, and restoring a tenant's data from backups in the event of a
failure is a relatively simple procedure. Unfortunately, this approach
tends to lead to higher costs for maintaining equipment and backing up
tenant data. Hardware costs are also higher than they are under
alternative approaches, as the number of tenants that can be housed on a
given database server is limited by the number of databases that the
server can support. (Using autoclose to unload databases from memory
when there are no active connections can make an application more
scalable by increasing the number of databases each server can support.)
Separating tenant data into individual databases is the "premium"
approach, and the relatively high hardware and maintenance requirements
and costs make it appropriate for customers that are willing to pay
extra for added security and customizability. For example, customers in
fields such as banking or medical records management often have very
strong data isolation requirements, and may not even consider an
application that does not supply each tenant with its own individual
database.
Shared Database, Separate Schemas
Another approach involves housing multiple tenants in the same
database, with each tenant having its own set of tables that are grouped
into a schema created specifically for the tenant.
Figure 2. In this approach each tenant has its own separate set of tables in a common database
When a customer first subscribes to the service, the provisioning
subsystem creates a discrete set of tables for the tenant and associates
it with the tenant's own schema. You can use the SQL CREATE command to
create a schema and authorize a user account to access it. For example,
in Microsoft SQL Server 2005:
CREATE SCHEMA ContosoSchema AUTHORIZATION Contoso
The application can then create and access tables within the tenant's schema using the SchemaName.TableName convention:
CREATE TABLE ContosoSchema.Resumes (EmployeeID int identity primary key, Resume nvarchar(MAX))
ALTER USER Contoso WITH DEFAULT_SCHEMA = ContosoSchema
A tenant account can access tables within its default schema by specifying just the table name, instead of using the SchemaName.TableName
convention. This way, a single set of SQL statements can be created for
all tenants, which each tenant can use to access its own data:
SELECT * FROM Resumes
Like the isolated approach, the separate-schema approach is
relatively easy to implement, and tenants can extend the data model as
easily as with the separate-database approach. (Tables are created from a
standard default set, but once they are created they no longer need to
conform to the default set, and tenants may add or modify columns and
even tables as desired.) This approach offers a moderate degree of
logical data isolation for security-conscious tenants, though not as
much as a completely isolated system would, and can support a larger
number of tenants per database server.
A significant drawback of the separate-schema approach is that tenant
data is harder to restore in the event of a failure. If each tenant has
its own database, restoring a single tenant's data means simply
restoring the database from the most recent backup. With a
separate-schema application, restoring the entire database would mean
overwriting the data of every tenant on the same database with backup
data, regardless of whether each one has experienced any loss or not.
Therefore, to restore a single customer's data, the database
administrator may have to restore the database to a temporary server,
and then import the customer's tables into the production server—a
complicated and potentially time-consuming task.
The separate schema approach is appropriate for applications that use
a relatively small number of database tables, on the order of about 100
tables per tenant or fewer. This approach can typically accommodate
more tenants per server than the separate-database approach can, so you
can offer the application at a lower cost, as long as your customers
will accept having their data co-located with that of other tenants.
Shared Database, Shared Schema
A third approach involves using the same database and the
same set of tables to host multiple tenants' data. A given table can
include records from multiple tenants stored in any order; a Tenant ID
column associates every record with the appropriate tenant.
Figure 3. In this approach, all tenants share
the same set of tables, and a Tenant ID associates each tenant with the
rows that it owns
Of the three approaches explained here, the shared schema approach
has the lowest hardware and backup costs, because it allows you to serve
the largest number of tenants per database server. However, because
multiple tenants share the same database tables, this approach may incur
additional development effort in the area of security, to ensure that
tenants can never access other tenants' data, even in the event of
unexpected bugs or attacks.
The procedure for restoring data for a tenant is similar to that for
the shared-schema approach, with the additional complication that
individual rows in the production database must be deleted and then
reinserted from the temporary database. If there are a very large number
of rows in the affected tables, this can cause performance to suffer
noticeably for all the tenants that the database serves.
The shared-schema approach is appropriate when it is important that
the application be capable of serving a large number of tenants with a
small number of servers, and prospective customers are willing to
surrender data isolation in exchange for the lower costs that this
approach makes possible.
Choosing an Approach
Each of the three approaches described above offers its own set of
benefits and tradeoffs that make it an appropriate model to follow in
some cases and not in others, as determined by a number of business and
technical considerations. Some of these considerations are listed below.
Economic Considerations
Applications optimized for a shared approach tend to require a larger
development effort than applications designed using a more isolated
approach (because of the relative complexity of developing a shared
architecture), resulting in higher initial costs. Because they can
support more tenants per server, however, their ongoing operational
costs tend to be lower.
Figure 4. Cost over time for a hypothetical
pair of SaaS applications; one uses a more isolated approach, while the
other uses a more shared approach
Your development effort can be constrained by business and economic
factors, which can influence your choice of approach. The shared schema
approach can end up saving you money over the long run, but it does
require a larger initial development effort before it can start
producing revenue. If you are unable to fund a development effort of the
size necessary to build a shared schema application, or if you need to
bring your application to market more quickly than a large-scale
development effort would allow, you may have to consider a more isolated
approach.
Security Considerations
As your application will store sensitive tenant data, prospective
customers will have high expectations about security, and your service
level agreements (SLAs) will need to provide strong data safety
guarantees. A common misconception holds that only physical isolation
can provide an appropriate level of security. In fact, data stored using
a shared approach can also provide strong data safety, but requires the
use of more sophisticated design patterns.
Tenant Considerations
The number, nature, and needs of the tenants you expect to serve all
affect your data architecture decision in different ways. Some of the
following questions may bias you toward a more isolated approach, while
others may bias you toward a more shared approach.
- How many prospective tenants do you expect to target? You may be nowhere near being able to estimate prospective use with authority, but think in terms of orders of magnitude: are you building an application for hundreds of tenants? Thousands? Tens of thousands? More? The larger you expect your tenant base to be, the more likely you will want to consider a more shared approach.
- How much storage space do you expect the average tenant's data to occupy? If you expect some or all tenants to store very large amounts of data, the separate-database approach is probably best. (Indeed, data storage requirements may force you to adopt a separate-database model anyway. If so, it will be much easier to design the application that way from the beginning than to move to a separate-database approach later on.)
- How many concurrent end users do you expect the average tenant to support? The larger the number, the more appropriate a more isolated approach will be to meet end-user requirements.
- Do you expect to offer any per-tenant value-added services, such as per-tenant backup and restore capability? Such services are easier to offer through a more isolated approach.
Figure 5. Tenant-related factors and how they affect "isolated versus shared" data architecture decisions
Regulatory Considerations
Companies, organizations, and governments are often subject to
regulatory law that can affect their security and record storage needs.
Investigate the regulatory environments that your prospective customers
occupy in the markets in which you expect to operate, and determine
whether they present any considerations that will affect your decision.
Skill Set Considerations
Designing single-instance, multi-tenant architecture is still a very
new skill, so subject matter expertise can be hard to come by. If your
architects and support staff do not have a great deal of experience
building SaaS applications, they will need to acquire the necessary
knowledge, or you will have to hire people that already have it. In some
cases, a more isolated approach may allow your staff to leverage more
of its existing knowledge of traditional software development than a
more shared approach would.
Realizing Multi-Tenant Data Architecture
The remainder of this article details a number of patterns that can
help you plan and build your SaaS application. As we discussed in our introductory article, a well-designed SaaS application is distinguished by three qualities: scalability, configurability, and multi-tenant efficiency.
The table below lists the patterns appropriate for each of the three
approaches, divided into sections representing these three qualities.
Optimizing for multi-tenant efficiency in a shared environment must
not compromise the level of security safeguarding data access. The
security patterns listed below demonstrate how you can design an
application with "virtual isolation" through mechanisms such as
permissions, SQL views, and encryption.
Configurability allows SaaS tenants to alter the way the application
appears and behaves without requiring a separate application instance
for each individual tenant. The extensibility patterns describe possible
ways you can implement a data model that tenants can extend and
configure individually to meet their needs.
The approach you choose for your SaaS application's data architecture
will affect the options available to you for scaling it to accommodate
more tenants or heavier usage. The scalability patterns address the
different challenges posed by scaling shared databases and dedicated
databases.
Approach | Security Patterns | Extensibility Patterns | Scalability Patterns |
---|---|---|---|
Separate Databases | |||
Shared Database, Separate Schemas | |||
Shared Database, Shared Schema |
Security Patterns
- Filtering: Using an intermediary layer between a tenant and a data source that acts like a sieve, making it appear to the tenant as though its data is the only data in the database.
- Permissions: Using access control lists (ACLs) to determine who can access data in the application and what they can do with it.
- Encryption: Obscuring every tenant's critical data so that it will remain inaccessible to unauthorized parties even if they come into possession of it.
Trusted Database Connections
For example, consider an end user of a customer relations management
(CRM) application who performs an operation that queries the database
for customer records matching a certain string. The application submits
the query to the database using the security context of the tenant, so
instead of returning all of the matching records in the database, the
query only retrieves the matching rows from the tables the tenant is
allowed to access. So far, so good—but suppose the end user's role only
allows her to access records of customers located within a certain
geographic region. (For more information about roles, see the section
"Authorization" in Architecture Strategies for Catching the Long Tail,
the first article in this series.) The application must intercept the
query results and only present the user with the records that she is
entitled to see.
Secure Database Tables
Tenant View Filter
Tenant Data Encryption
Extensibility Patterns
Preallocated Fields
Name-Value Pairs
- The ID of the associated record in the primary data table.
- The extension ID associated with the correct custom field definition.
- The value of the custom field in the record that's being saved, cast to a string.
Custom Columns
Using Data Model Extensions
Scalability Patterns
Scaling Techniques
Tenant-Based Horizontal Partitioning
Single Tenant Scaleout
The article Scaling Out SQL Server 2005
contains additional guidance and suggestions about analyzing data for
scaling out. The article explains reference data, activity data, and
resource data in detail, gives some guidelines for replicating and
partitioning data, and explains some additional factors that affect
scaleout. Some of the scaleout guidelines to consider:
- Use replication to create read-only copies of data that doesn't change very often. Some kinds of data rarely or never change after the data is entered, such as part numbers or employee Social Security numbers. Other kinds of data are subject to active change for a defined period of time and then archived, such as purchase orders. These kinds of data are ideal candidates for one-way replication to any databases from which they might be referenced.
- Location, location, location. Keep data close to other data that references it. ("Close" in this sense generally means logically proximate rather than physically proximate, although logical proximity often implies physical proximity as well.) Consider the relationships between different kinds of data when deciding whether to separate them, and use replication to distribute read-only copies of reference data among different databases when appropriate. For example, if the act of retrieving a customer record routinely involves selecting the customer's recent purchase orders from a different table, try to keep the two tables in the same database, or use replication to create copies of appropriate kinds of data. Try to find natural divisions in the data that will minimize the amount of cross-database communication that needs to take place. For example, data associated with particular places can often be partitioned geographically.
- Identify data that shouldn't be partitioned. Resource data, such as warehouse inventory levels, are usually poor candidates for replication or partitioning. Use scaleout techniques to move other data off the server, leaving your resource data more room to grow. If you have moved all the data you can and still experience problems, consider scaling up to a bigger server for the resource data.
- Use single-master replication whenever possible. Synchronizing changes to multiple copies of the same data is difficult, so avoid using multi-master replication if you can. When replicated data must be changed, only allow changes to be written to the master copy.
For general scaling guidance, see the Performance & Scalability resources published by Microsoft patterns & practices.
Conclusion
The design approaches and patterns we've discussed in this article
should help you create the foundation layer of trust that's vital to the
success of your SaaS application. Designing a SaaS data architecture
that reconciles the competing benefits and demands of sharing and
isolation isn't a trivial task, but these approaches and patterns should
help you identify and resolve many of the critical questions you will
face. The ideas and recommendations presented here differ in the
details, but they all help you leverage the principles of
configurability, scalability, and multi-tenant efficiency to design a
secure and extensible data architecture for a SaaS application.
Shared Database, Shared Schema
ReplyDeleteThis seems to be a good approach for SAAS Application.
Advancement in technology has made the world go "gaga". As far as technology is concerned, you can expect the unexpected or imagine the unimaginable. The world has left the stage of crude implementation. Every facet of life has been touched and affected by technology. The bewilderment of everyone is that existing technologies are fast becoming obsolete by the day; courtesy of advancement in technology... mobile tracker
ReplyDeleteAs technology continues to evolve, it is increasingly important that people fully understand the "technology state of the company". Last week, Apical Resource Group had our second monthly All Candidates meeting in order to discuss the employment landscape, new initiatives and new hospitality technologies. camera spy
ReplyDeleteTennessee Technology Center at Nashville is one of the 26 technology centers established in the year 1963. The technology center excels in offering technical training programs in various fields. best xmind alternative
ReplyDeleteCCTV cameras aren't just used for home and business security. They can also be used for unique and extraordinary purposes. cctv
ReplyDeleteKnowing OBD1 diagnostic system is a must if you own vehicles made before 1995. Although very tricky sometimes, expected returns in savings is enormous considering garages will charge you almost $100 an hour to test your check engine light which sometimes can be done in less than 5 minutes. vehicle check
ReplyDelete