In The Mix

As a SharePoint architect I have the business behind me and the Developers and IT Pro on my shoulders.

Stubbing documents in SharePoint September 9, 2009

Filed under: SharePoint — fmuntean @ 10:31 pm

It is known that SharePoint keeps all its data into the SQL database. Now SQL is a relational database and there is nothing relational about storing documents into the content Database. As your SharePoint grow more and more people are storing documents into SharePoint.

When versioning is enabled on a document library each version of the document is kept separately instead of keeping only de bits that are different.

With many users on the system there is a high probability that the same document is store in multiple places in SharePoint.

 

To alleviate the problems described before we can store the documents outside SharePoint and keep only the attributes around them into the content database. This will keep the content database smaller and more manageable.

 

There are two options to achieve this:

  1. Using the External BLOB Storage API the BLOBS (this is how the documents are stored inside SQL database) can be stored outside of SharePoint content database. However this is a farm wide configuration and requires manual management of the orphaned blob files as there is no method in the interface to accommodate for this. For more info on this follow: http://technet.microsoft.com/en-us/magazine/2009.06.insidesharepoint.aspx
  2. Using a stubbing mechanism that I will describe further.

Stubbing

Defined as: a short part of something that is left after the main part has been removed; is the process of replacing the real files with a smaller file containing only the information necessary to retrieve the original file. Now the real file can be store anywhere and in any format as long it can be restored in a timely manner and unmodified.

 

How can we implement this in SharePoint?

The way that I envision is that you will create a new Stubbing Document Library that will allow for configuration of stubbing mechanism but for the end users will be transparent where the file is located physically.

If implemented correctly not only that the files are stored on an external location but duplicates could be detected and better versioning capabilities would be available thus optimizing the storage even further.

 

Few pieces are needed to make this work:

  • Event receiver for Add, Update, Delete events that will replace the real file with the stub.
  • A service that will handle the stubbing, versioning, binary diff, and any other management and reporting on the external storage.
  • An HTTP module that will catch the stub just before reaching the client and will call the service to get and return the real file to the client. (there is no Get in the event receivers)
  • Admin and configuration pages under document library settings.
  • If versioning is implemented then would be nice to have ECB items for getting more info about a certain file.

 

The Event Receiver:

We are going to need at least the following three events to handle the stubbing of a new item, updates to an existing item and deletion of an item from SharePoint.

Item Added: We will let SharePoint to finish uploading the file into the document library and then we are going to call the service  for a new stub as we have a new file. We now can replace the file with the stub keeping the filename and extension he same. the stub can be as simple as an XML.

Item Deleted: We will get the stub and call the web service to delete the file from external storage.

Item Updating: We will call the service for a new stub that is a child of the previous stub thus enabling versioning regardless of the document library having the versioning enabled. Actually enabling the versioning for this document library will complicate the things a little as we will need to handle more events.

 

The Stubbing Service:

A Web Service implementing the following methods:

Create Stub:

Parameters: file stream

Create a hash for the file and checks if the file is already in the external store.(this will handle duplicates). If the file exists then a reference count will need to be incremented and the existing stub will be cloned and returned. If the file does not exists then the file gets stored into the external storage and  a new stub get created and returned.

Update Stub:

Parameters: new file stream, old stub

Will get the file referenced into the old stub. We will create a binary diff file and store this into the external storage. Create and return a new stub and store the old stub inside it thus making a new version of the existing file.

Delete Stub:

Parameters: stub

We decrement the reference count for the specified stub and if the reference count is zero we can delete the file from the external storage.

Get:

Parameters: stub

We will get and return the file from external storage. If the file is a binary diff then we will recursively look for the parent and generate the real file before returning.

 

Most likely there is a need for a database to store hash codes, stubs counts and other metadata used by the stubbing service.

 

HTTP Module:

This is required to be able to replace the stub file with the real file when the user requests the file from SharePoint.

We are going to attach ourselves to EndRequest event using the Init method.

This will allow us to check for the content type and the request URL to determine if the user will receive a stub so we can call the web service to get it replaced by the real file.

 

Library Settings Pages:

This is where we can plug a page to configure the Stubbing web Service URL and the external storage location.

 

ECB Items:

Would be nice to have few items here that will give the user some control and information about the stub.

Un-Stub: Will replace the stub with the real file and mark that item to not be stub again.

Re-Stub: Will replace the file with the stub and clear the un-stub flag.

Versions: Will display a page with all existing versions for the current item and give the possibility to restore the item to a previous version. will be nice to give the possibility of deleting old versions too.

Advertisements
 

4 Responses to “Stubbing documents in SharePoint”

  1. Hi,

    do you have some tipps where to start implementing a stubbing for SharePoint? Some docs, links, or so on 😉

    Thanks so far,
    Daniel

  2. jawad Says:

    Very good information, but it was bit difficult to understand in the first go.

    The 2nd mechanism of using “Stubbing Service” is only applicable to any single document library within SharePoint, but not for alls sites across SP.

    Where will we define the archive policies? Will we have to do this for all lists/sites?

    Actually, what I was trying to extract from you document is:
    If you have thought of an efficient mechanism other than using EBS.

    The concept is somewhat understable, but still need some improvemet for better understanding.

    Thanks anyway for sharing your thoughts!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s