In The Mix

As a SharePoint architect I have the business behind me and the Developers and IT Pro on my shoulders.

Increase File Size crawling May 22, 2008

Filed under: SharePoint — fmuntean @ 12:17 pm

… Or how to fix the warning “The file reached the maximum download limit. Check that the full text of the document can be meaningfully crawled” in the crawl log.

By default, Microsoft Office SharePoint Server 2007 (MOSS) can crawl and filter files of up to 16 MB. When a file exceeds this limit, the server writes a warning to the gatherer log: “The file reached the maximum download limit. Check that the full text of the document can be meaningfully crawled.”

To change the 16 MB limit, you must add a new registry entry named MaxDownloadSize.

  1. Start Registry Editor (Regedit.exe).
  2. Locate the following key in the registry:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager
  3. Open Edit – New – DWORD Value. Name it MaxDownloadSize. Double-click it, set the base to Decimal, and type the maximum size (in MB) for files that the gatherer downloads.
  4. Restart the server.
  5. Start Full Crawl.
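
If you would rather not click through Regedit on every server, the registry change in steps 1 to 3 can also be written as a .reg file and imported. Here is a minimal sketch in Python that only builds the text of such a file. The helper name build_reg_file and the 64 MB example value are my own illustrative choices, not anything SharePoint provides, and I have only assumed the value must fit in a DWORD. You would still need to import the file and then do steps 4 and 5 yourself.

```python
# Sketch: build the text of a .reg file that sets MaxDownloadSize.
# build_reg_file and the 64 MB example value are illustrative assumptions.

# Registry key from step 2 (MOSS 2007).
MOSS_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server"
    r"\12.0\Search\Global\Gathering Manager"
)


def build_reg_file(max_download_mb: int, key: str = MOSS_KEY) -> str:
    """Return .reg file text that sets MaxDownloadSize (a DWORD, in MB)."""
    if not 0 < max_download_mb <= 0xFFFFFFFF:
        raise ValueError("MaxDownloadSize must fit in a positive DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # .reg files store DWORD values as 8 hexadecimal digits.
        f'"MaxDownloadSize"=dword:{max_download_mb:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Print the file content for a 64 MB limit; save it as a .reg file to use it.
    print(build_reg_file(64))
```

Saving the output as a .reg file and importing it on the index server has the same effect as step 3. Back up the registry first.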

Use this technique at your own risk! 🙂

Note: Increasing the maximum file size may cause timeout errors, because the crawler can time out if a large file takes too long to crawl and index. To increase the timeout value:

  1. In Central Administration, on the Application Management tab, in the Search section, click Manage search service.
  2. On the Manage Search Service page, in the Farm-Level Search Settings section, click Farm-level search settings.
  3. In the Timeout Settings section, change the Connection time and Request acknowledgement time values.

  Note: For WSS 3.0, the key is HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Global\Gathering Manager

We can control how much of a single document the indexer will index through registry values on the indexer, under the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager.

MaxGrowFactor * MaxDownloadSize = max size of a file that can be indexed, in MB.

For example, MaxDownloadSize = 64 MB (default is 16 MB) with MaxGrowFactor = 4 allows the index filter to produce up to 256 MB (64 x 4) of text from a file. (The defaults give 16 MB x 4 = 64 MB of text.)
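
The arithmetic is simple, but here it is as a small Python sketch for checking a planned pair of values before touching the registry. The function name max_indexed_text_mb is hypothetical. It only multiplies the two settings as described above and does not read anything from a server.

```python
def max_indexed_text_mb(max_download_size_mb: int = 16,
                        max_grow_factor: int = 4) -> int:
    """Upper bound, in MB, of text the index filter may produce from one file.

    The default arguments are the defaults quoted in the post
    (MaxDownloadSize = 16 MB, MaxGrowFactor = 4).
    """
    return max_download_size_mb * max_grow_factor


if __name__ == "__main__":
    print(max_indexed_text_mb())       # defaults: 16 * 4
    print(max_indexed_text_mb(64, 4))  # raised download size: 64 * 4
```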

8 Responses to “Increase File Size crawling”

  1. KAPILJITH.R Says:

    Formula:
    MaxGrowFactor * MaxDownloadSize = max size in MB

    Example:

    Scenario 1:
    MaxDownloadSize = 16 MB (default)
    MaxGrowFactor = 4
    16 * 4 = 64 MB of text

    Scenario 2:
    MaxDownloadSize = 64 MB (default * 4)
    MaxGrowFactor = 4
    64 * 4 = 256 MB of text

  2. Jennifer Says:

    You don’t need to restart the whole server to have the MaxDownloadSize become effective: just stop and start the Search Service (net stop osearch, net start osearch)
    jennifer

  3. Ruedi Says:

    Did someone have success with a new 2010 SP Server? I tried different Registry tricks, but no success.
    Any help would be very appreciated

  4. fmuntean Says:

    I have tried the registry keys in SharePoint 2010. TechNet (http://technet.microsoft.com/en-us/library/ff721975.aspx) provides the following:
    “The default SharePoint search will handle files with a maximum size of 16 MB. To crawl files larger than 16 MB, the server administrator must edit the server system registry at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Global\Gathering Manager\MaxDownloadSize.”

    However, setting this key does not currently work.
    I added two .txt files, one 20 MB and one just under 16 MB, with the same content (a log file). Search reports both files as crawled successfully but returns results only from the smaller one.

  7. kfrench581 Says:

    Is SharePoint supposed to index the content of the file up to the limit? We noticed that the only properties we could see for a large (over the limit) file were filename, file size, URL, and so on, with nothing from inside the file. It might act differently if the target file came from a hyperlink rather than a file system.

