SharePoint 2010 : Managing Crawls

3/23/2012 3:47:42 PM
To manage crawls, you must understand the differences between full and incremental crawls. A full crawl follows the instructions of the content source and the crawl rules to crawl the entire content source according to the content type, whether hierarchical, enumerated list, or link traversal. A full crawl replaces the current index for that content source with a new one. However, because some full crawls take many hours, the old index for that content source remains on the index and query servers to serve user queries and is replaced only after the full crawl has completed successfully. This means that, for a brief period, two full indexes of the same content source will exist on your hard drives. Be sure to plan for enough disk space to accommodate full crawls.
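Because the old and new indexes coexist until the full crawl completes, peak disk usage during a full crawl is roughly the sum of both copies. The sketch below illustrates that capacity arithmetic; the function name, growth factor, and safety margin are illustrative assumptions, not SharePoint-documented formulas.

```python
def full_crawl_disk_needed(current_index_gb, growth_factor=1.1, safety_margin=0.2):
    """Estimate peak disk space needed during a full crawl.

    The old index stays online to serve queries until the new one is
    committed, so both copies coexist briefly. All figures here are
    illustrative assumptions for capacity planning.
    """
    new_index = current_index_gb * growth_factor   # corpus may have grown since last crawl
    peak = current_index_gb + new_index            # old and new indexes coexist
    return peak * (1 + safety_margin)              # headroom for merges and temp files

# A 100 GB index peaks at roughly 252 GB under these assumptions.
print(round(full_crawl_disk_needed(100), 1))
```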

What is crawled during an incremental crawl depends on the content type and how changes are detected for that content type. For a file system crawl or a normal Web crawl, the date/time stamp of each item is compared to a crawl history log. For SharePoint incremental crawls, however, the change logs maintained in the content databases are used. SharePoint 2010 now supports a very quick ACL-only crawl to update security information for indexed items. Most databases do not support incremental crawls. FAST technology supports change notifications from SQL databases that essentially “push” changes to the crawler, but the SharePoint 2010 Search feature does not.
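The timestamp comparison for file system and Web crawls can be sketched as follows. This is a simplified illustration of the idea, not SharePoint's implementation; the function and dictionary names are hypothetical.

```python
def changed_since_last_crawl(mtimes, crawl_history):
    """Return the paths an incremental crawl would pick up.

    mtimes: path -> current modification timestamp of the item.
    crawl_history: path -> timestamp recorded when the item was last crawled.
    An item is "changed" if it was never crawled or was modified since
    the last crawl. Both mappings are illustrative stand-ins for the
    crawl history log described in the text.
    """
    return [path for path, mtime in mtimes.items()
            if mtime > crawl_history.get(path, 0)]

# Only a.txt was touched after its last crawl; new.txt has no history entry.
print(changed_since_last_crawl({"a.txt": 100, "b.txt": 50, "new.txt": 5},
                               {"a.txt": 90, "b.txt": 60}))
```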

In the following sections, you’ll learn how to manage crawls from the Manage Content Sources page shown in Figure 1, which presents the tools for managing crawls.

Figure 1. Crawl management options from the Manage Content Sources page


1. Global Crawl Management

Crawls for all content sources can be managed globally with the toolbar option to Start All Crawls, which changes to Stop All Crawls and Pause All Crawls after crawls are started. The type of crawl initiated by the Start All Crawls option depends on several factors.

  • The crawl follows the type of the next scheduled crawl for each content source, whether full or incremental.

  • If a crawl has been paused, then that crawl will be resumed.

  • If no crawl is scheduled and a full crawl has been completed, then an incremental crawl is started. However, remember that the first crawl of any content source is always a full crawl.

  • If either type of crawl has been stopped, the next crawl will always be a full crawl. Therefore, careful consideration should be given to the impact of using the Stop All Crawls tool.

  • The indexing process can always force a full crawl if it determines that enough errors exist in the index that an incremental crawl may not correct them.
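The bullet rules above can be summarized as a decision function. This is a hedged sketch of the documented behavior, not SharePoint's API; the dictionary keys and the precedence among rules where the text leaves it unspecified are assumptions.

```python
def crawl_type_on_start_all(source):
    """Decide what Start All Crawls does for one content source.

    'source' is a hypothetical dict of flags summarizing the state of a
    content source. The ordering of checks is an assumption where the
    documented rules do not specify precedence.
    """
    if source.get("paused_crawl"):
        return "resume"                    # a paused crawl is resumed
    if not source.get("ever_fully_crawled"):
        return "full"                      # first crawl is always a full crawl
    if source.get("last_crawl_stopped"):
        return "full"                      # a stopped crawl forces a full recrawl
    if source.get("scheduled_type"):
        return source["scheduled_type"]    # follow the next scheduled crawl type
    return "incremental"                   # full crawl done, nothing scheduled

print(crawl_type_on_start_all({"ever_fully_crawled": True}))  # incremental
```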


Note:

Although the crawl process is read-only and does not modify the files, it can update the last-accessed date on some files, which can affect access auditing.


2. Content Source Crawl Management

The context menu of each content source presents crawl management tools. You can start both full and incremental crawls from the context menu. You can also use the menu to pause, resume, or stop an active crawl. Remember that any time a crawl is stopped or does not complete for any reason, the next crawl of that content source will be a full crawl, because the information in the crawl log and markers set on the change logs are considered inaccurate. When a crawl is paused, the instructions for the crawl and the information about the crawl are retained in memory on the host of the crawl component for use when the crawl is resumed.

3. User Crawl Management

SharePoint crawlers have always obeyed “Do Not Crawl” instructions embedded in Web content. SharePoint 2010 continues to offer content owners of lists, libraries, and sites the ability to add these instructions through the user interface and eliminate their content from search indexes. Site collection administrators can also flag site columns (metadata) to keep them from being crawled. Personally identifiable information (PII) is an example of information that should not be indexed on public sites. Be sure to have clear policies regarding what type of content should or should not appear in your index.
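One standard form of a “Do Not Crawl” instruction embedded in Web content is the robots meta tag. The sketch below shows how a crawler might detect it; this is a generic illustration of the convention, not SharePoint's crawler code.

```python
from html.parser import HTMLParser

class NoIndexDetector(HTMLParser):
    """Detects <meta name="robots" content="noindex">, the standard
    do-not-index marker that well-behaved crawlers honor."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if (a.get("name", "").lower() == "robots"
                    and "noindex" in a.get("content", "").lower()):
                self.noindex = True

def page_is_excluded(html):
    """Return True if the page carries a noindex instruction."""
    detector = NoIndexDetector()
    detector.feed(html)
    return detector.noindex

print(page_is_excluded('<head><meta name="robots" content="noindex,nofollow"></head>'))
```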

4. Scheduling Crawls

The management of crawl schedules is an ongoing process that may require daily monitoring and tweaking. The Manage Content Sources page presents information on the duration of the current and last crawl but does not indicate the type of crawl involved.

However, the Crawl History view of the crawl logs itemizes each crawl’s start and end times with the calculated duration as well as the activity accomplished during the crawl. This information permits search administrators to adjust the crawl schedules as the corpus grows so that a crawl can complete successfully before the next crawl begins. Crawls must be scheduled as often as needed to meet the “freshness” requirements of your organization. You might need to adjust the topology of your search service to add resources to complete crawls often enough to meet these needs. When determining additional resources, consider the impact the additions will have on the WFEs being crawled and on the SQL servers hosting the content and search databases.
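The schedule check described above, verifying that the longest observed crawl still fits inside the scheduled interval, can be sketched as a small helper. The function name and inputs are illustrative; the durations would come from the start and end times in the Crawl History view.

```python
from datetime import timedelta

def schedule_fits(crawl_durations, interval):
    """Check whether the scheduled interval leaves headroom over observed crawls.

    crawl_durations: list of timedeltas computed from Crawl History
    start/end times. interval: the scheduled gap between crawl starts.
    A purely illustrative helper, not a SharePoint API.
    """
    longest = max(crawl_durations)
    return longest < interval, longest

# The longest observed crawl (61 min) exceeds an hourly schedule,
# so the schedule or the search topology needs adjusting.
durations = [timedelta(minutes=m) for m in (42, 55, 61)]
ok, longest = schedule_fits(durations, timedelta(hours=1))
print(ok, longest)
```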

With the improvements in incremental crawl instructions, you may schedule full crawls only when required rather than on a regular basis. The crawl component can itself switch to a full crawl if:

  • A search application administrator stopped the previous crawl or the previous crawl did not complete for any reason.

  • A content database was restored from backup without the appropriate switch on the STSADM -o restore operation, which allows farm administrators to restore a content database without forcing a full crawl.

  • A farm administrator has detached and reattached a content database.

  • A full crawl of the content source has never been done.

  • The change log does not contain entries for the addresses being crawled; without those entries, an incremental crawl cannot occur.

  • Corruption is detected in the index; depending on its severity, the index server might force a full crawl.

Finally, when is a full crawl required?

  • When a search application administrator added a new managed property.

  • To re-index ASPX pages on Windows SharePoint Services 3.0 or SharePoint Server 2007 sites.


    Note:

    Incremental crawls do not re-index views or home pages when content within the page has changed, such as when individual list items are deleted, because the crawler cannot detect when ASPX pages on SharePoint sites have changed. You should periodically perform full crawls of sites that contain ASPX files to ensure that these pages are re-indexed, unless the site is configured so that ASPX pages are not crawled. This behavior is the same as in previous versions of SharePoint.


  • To resolve consecutive incremental crawl failures. The index server has been reported to remove content that could not be accessed in 100 consecutive attempts.

  • When crawl rules have been added, deleted, or modified.

  • To repair a corrupted index.

  • When the search services administrator has created one or more server name mappings.

  • When the account assigned to the default content access account or crawl rule account has changed. This also automatically triggers a full crawl. Account password changes do not require or trigger a full crawl.

  • When new file types and/or IFilters have been installed and the new content needs to be indexed.
