SharePoint 2010 Search : Setting Up the Crawler - The Search Service Application & Indexing

8/18/2011 4:31:25 PM

1. The Search Service Application

SharePoint 2010 supports many business tasks, and a logical structure is needed to organize and control all of those functions. For this reason, many of the essential services delivered by SharePoint are split into what Microsoft calls service applications, each of which independently controls one of the tasks that SharePoint performs. Service applications can also be configured individually for performance and scaling.

For reasons of scaling, configurability, and performance, the Search components of SharePoint 2010 are isolated in the Search service application, an application layer for configuring the back-end functionality of SharePoint search. Almost all configuration directly related to the search components is done in the Search service application. However, as we will see, a great deal of supporting configuration may be required in the User Profile service application, the Managed Metadata service, or the Business Data Connectivity service. These services help extend SharePoint 2010 Search to address a variety of business needs.

There are often many ways to get to the same pages in SharePoint. The most direct route is outlined here.

  1. Open Central Administration. On the main page of SharePoint Central Administration, there are eight sections. Under Application Management (as shown in Figure 1), choose "Manage service applications".

    Figure 1. Choose "Manage service applications" from the Application Management menu.
  2. The Service Applications page shows all the service applications running in the SharePoint farm and their status. Scroll down and choose the Search Service Application option (Figure 2).

    Figure 2. The Search Service Application option
  3. The Search Service Application page shows a System Status section and a Crawl History section, as well as a navigation pane on the left with four sections: Administration, Crawling, Queries and Results, and Reports. Examine the information in the System Status section. This page is the starting point for most Search-related administration tasks.

1.1. Default Content Access Account

SharePoint's crawler requires a user account to access content. It makes requests to SharePoint and other content sources in much the same way that a user requests content through a browser and waits for a reply. The reply it gets often depends on which account it uses to make those requests. Some content sources restrict access to specific content based on user credentials, so assigning the wrong user to SharePoint's default content access account (Figure 3) can adversely affect the outcome of crawls.

Make sure that a user with appropriate permissions to crawl SharePoint is set as the default content access account on the Search Service Application page. This user should have read access to all content that should be crawled. This user should not be an administrator, because an administrator can read documents in an unpublished state, and those documents would then be crawled as well.

Figure 3. The default content access account

If there are content sources that do not recognize the default content access account, special crawl rules can be created to use a different user for those sources.
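
The relationship between the default content access account and crawl rules can be sketched in a few lines of Python. This is only an illustration of the idea, not SharePoint's implementation: the CrawlRule class, the account_for function, the account names, and the URLs are all hypothetical, and the sketch assumes that the first matching rule wins.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class CrawlRule:
    """Hypothetical stand-in for a crawl rule: a path pattern plus an optional account."""
    pattern: str
    account: Optional[str] = None  # None means "use the default content access account"


def account_for(url: str, rules: List[CrawlRule], default_account: str) -> str:
    """Return the account used to request `url` (assumption: first matching rule wins)."""
    for rule in rules:
        # Compare in lower case so matching does not depend on URL casing.
        if fnmatchcase(url.lower(), rule.pattern.lower()):
            return rule.account or default_account
    return default_account


# Hypothetical accounts and sources, for illustration only.
DEFAULT_ACCOUNT = r"CONTOSO\sp_crawl"
RULES = [
    CrawlRule("http://legacy.contoso.local/*", r"CONTOSO\legacy_reader"),
    CrawlRule("file://fileserver/finance/*", r"CONTOSO\finance_reader"),
]

for url in ("http://legacy.contoso.local/docs/a.aspx", "http://intranet.contoso.local/"):
    print(url, "is crawled as", account_for(url, RULES, DEFAULT_ACCOUNT))
```

Any source that no rule matches falls back to the default content access account, which is why that account needs read access to everything else that should be crawled.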

2. Indexing

Indexing is the process of collecting data and storing it in a data structure that a search application can query to locate the original data. This data structure is usually called a search index. Some indexes contain all the searchable information. Others, such as SharePoint's, store the words found in the documents along with pointers to more information about those documents, which is kept in a separate database. In SharePoint, the index is held on the query servers, while the document data and the data related to the crawler and its administration are held on the database servers. For the purposes of this section, however, indexing refers to the process that creates both the indexes and the related search databases.
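
The split between an index of words and a separate store of document information can be illustrated with a minimal Python sketch. The TinySearchStore class and the sample URLs are hypothetical, and the sketch makes no claim about how SharePoint actually lays out its index or databases. It only shows the general idea of an index that maps each word to document IDs, which in turn point to records held elsewhere.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Split text into lower-case alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinySearchStore:
    """Hypothetical sketch: a word index plus a separate document store."""

    def __init__(self) -> None:
        # term -> IDs of the documents containing it (the "index")
        self.index: Dict[str, Set[int]] = defaultdict(set)
        # document ID -> details about the document (the "database")
        self.documents: Dict[int, Dict[str, str]] = {}

    def add(self, doc_id: int, url: str, title: str, body: str) -> None:
        self.documents[doc_id] = {"url": url, "title": title}
        for term in tokenize(title + " " + body):
            self.index[term].add(doc_id)

    def search(self, query: str) -> List[Dict[str, str]]:
        """Return the records of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        postings = [self.index.get(term, set()) for term in terms]
        matching_ids = set.intersection(*postings)
        # The index yields only IDs; the details come from the document store.
        return [self.documents[doc_id] for doc_id in sorted(matching_ids)]


store = TinySearchStore()
store.add(1, "http://intranet/a", "Crawl schedule", "Full crawl runs weekly")
store.add(2, "http://intranet/b", "Search topology", "Query servers hold the index")
print(store.search("crawl"))
```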

SharePoint 2010 can crawl and index a number of different file types and content types from different sources. In this section, we will discuss the different content sources and how to set up the crawler to index each one.

Out of the box, SharePoint can index the following content sources:

  • Web content (HTTP and HTTPS)

  • SharePoint user profile databases

  • Lotus Notes

  • Exchange public folders

  • File shares

  • Business Connectivity Services-connected content

  • Other sources where a connector is provided (e.g., Documentum)

These different sources can be divided into two different types: structured and unstructured content.

2.1. Structured Content

Structured content has a defined structure that can generally be queried to retrieve specific items. Relational databases, such as Microsoft SQL Server, allow their content to be retrieved when the user or the user interface knows where the data sits, that is, the row and column of the cell that holds it. Most relational databases have their own indices to help locate these rows. These indices are generally slow for free text search and do not support it well. A search engine's index structure performs much better at finding all occurrences of a particular term in a timely manner.

When we marry unstructured and structured content or even two disparate structured content sources, we lose the ability to simply look up cell IDs to find the specific data. Additionally, different databases' indices seldom, if ever, work together. This is where a search engine becomes crucial. SharePoint's search components can index both unstructured and structured content, store them together, return them in a homogenized result set, filter based on determined metadata, and lead the end user to the specific source system.
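
As a rough illustration of a homogenized result set, the following Python sketch puts rows from a structured source and documents from an unstructured source into one list of items with common fields, then searches and filters them by a metadata value. Everything here is hypothetical: the in-memory SQLite table stands in for a line-of-business database, the source names and locations are invented, and the linear scan stands in for a real index lookup.

```python
import sqlite3
from typing import Dict, List, Optional


def load_structured() -> List[Dict[str, str]]:
    """Read rows from a (hypothetical) customer table and flatten them into items."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, notes TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Fabrikam", "renewal contract pending"), (2, "Northwind", "new supplier")],
    )
    rows = conn.execute("SELECT id, name, notes FROM customers ORDER BY id").fetchall()
    conn.close()
    return [
        {"source": "crm-db", "location": f"customers/{row_id}", "text": f"{name} {notes}"}
        for row_id, name, notes in rows
    ]


def load_unstructured() -> List[Dict[str, str]]:
    """Documents from a (hypothetical) file share, already reduced to plain text."""
    return [
        {"source": "file-share", "location": r"\\fileserver\legal\fabrikam.docx",
         "text": "Draft contract for Fabrikam"},
        {"source": "file-share", "location": r"\\fileserver\hr\handbook.docx",
         "text": "Employee handbook"},
    ]


def search(items: List[Dict[str, str]], term: str,
           source: Optional[str] = None) -> List[Dict[str, str]]:
    """Return items containing `term`, optionally filtered by the `source` metadata."""
    term = term.lower()
    return [
        item for item in items
        if term in item["text"].lower().split()
        and (source is None or item["source"] == source)
    ]


items = load_structured() + load_unstructured()
for hit in search(items, "contract"):
    # Each hit carries its source system and the location to send the user to.
    print(hit["source"], hit["location"])
```

Because every item carries its source and location, a single result list can mix database rows and documents while still leading the user back to the system each result came from.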

SharePoint 2010 has a powerful feature for indexing structured content. This feature, called Business Connectivity Services, allows administrators to define connectors to structured data sources and index the content from them in a logical and organized manner, making that data searchable and useful from SharePoint.

BCS is capable of collecting content out of the box from

  • Microsoft SQL Server databases

  • .NET assemblies

Additionally, custom connectors can be created to allow it to index almost any other content source, including

  • Other databases

  • Line-of-business applications such as Siebel and SAP

  • Other enterprise resource planning (ERP) systems

  • Many other applications and databases

2.2. Unstructured Content

Unstructured content refers to content that is not set in a strict structure such as a relational database. Unstructured content can be e-mails, documents, or web pages. Unstructured content is the biggest challenge for searching as it requires the search engine to look for specific terms across a huge corpus of free text. Unstructured search is often referred to as "free text" search.

Out of the box, SharePoint 2010 can index the following unstructured content sources:

  • SharePoint sites

  • Lotus Notes sites

  • File shares

  • Exchange public folders

  • External and internal web sites

  • Other sources where a connector is available
