
SharePoint 2010 Search : Setting Up the Crawler - Using Crawl Rules

9/16/2011 5:35:41 PM
SharePoint 2010's crawler communicates with the defined content sources in a standardized manner. It indexes content under the account it has been configured to use and collects information from every link it is given. If subfolders are set to be indexed, it navigates to those folders, collects the links, and gathers the content. It is not always desirable or possible, however, to crawl every content source in the same way with the same account. SharePoint 2010 therefore lets administrators define rules for specific paths that may be encountered during crawling. These rules can include or exclude specific content, and they can pass special user credentials to specific items so that the content is gathered correctly.

Crawl rules are managed in the Search service application on the Crawl Rules page, which is under the Crawler section of the left-hand navigation. Adding a new crawl rule is as easy as navigating to the Crawl Rules page and selecting New Crawl Rule. Because wildcards and regular expressions can be used in rules, the Crawl Rules page also includes a testing feature. An address can be entered and tested to see whether an existing rule affects how that address is crawled. Since many rules can be defined and their combined effect is not always obvious, this testing feature is very useful (Figure 1). If a page is not being crawled, administrators should check for conflicting rules.

Figure 1. Testing a crawl rule

To add a crawl rule, navigate to the Search service application and choose Crawl Rules in the left-hand navigation under Crawler. On the Crawl Rules page, select New Crawl Rule. On the Add Crawl Rule page, enter a path and choose whether it should be explicitly included or excluded. Wildcards or regular expressions can be used to build more complicated inclusion or exclusion rules. This is a powerful way to target desirable or undesirable content and make sure it is, or is not, crawled.
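
The include/exclude behavior and the test box on the Crawl Rules page can be approximated offline. The following is a minimal Python sketch, not SharePoint's implementation: the CrawlRule class and find_matching_rule function are hypothetical names, wildcard matching is approximated with Python's fnmatch module, and the "first matching rule wins" ordering is an assumption made for the illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass(frozen=True)
class CrawlRule:
    """Hypothetical stand-in for one crawl rule: a wildcard path plus an action."""
    path: str       # wildcard pattern, e.g. "http://prosharepointsearch/archive/*"
    include: bool   # True = include rule, False = exclude rule


def find_matching_rule(url: str, rules: List[CrawlRule]) -> Optional[CrawlRule]:
    """Return the first rule whose path matches the URL, or None.

    Assumption: rules are checked top to bottom and the first match wins.
    Both sides are lowercased so the comparison ignores case.
    """
    for rule in rules:
        if fnmatchcase(url.lower(), rule.path.lower()):
            return rule
    return None


rules = [
    CrawlRule("http://prosharepointsearch/archive/*", include=False),
    CrawlRule("http://prosharepointsearch/*", include=True),
]

hit = find_matching_rule("http://prosharepointsearch/archive/2009/report.docx", rules)
print(hit)  # the exclude rule for /archive/ is listed first, so it is the one returned
```

Swapping the two rules in the list would make the broad include rule match first, which is the kind of conflict the testing feature helps to uncover.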

Adjusting the crawler with crawl rules can go a long way toward improving the relevance and quality of the search result set. All too often, search result lists are polluted with unnecessary or irrelevant content. Excluding this content with crawl rules keeps unnecessary documents out of the crawl database and, consequently, out of the result lists. Typical candidates are documents of a certain type or in a certain location. Although many serious scenarios can be imagined where documents with a certain file name or in a certain path need to be excluded, one of the most common situations is crawling a public web site that offers a print version of each page. A crawl rule that excludes the print version (e.g., a print=true pattern in the URL) removes these pages from the crawled content and eliminates the noise. A simple inspection of the search results and of the URL patterns on the content source sites will help to determine what kinds of rules are appropriate.
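
The inspection of URL patterns mentioned above can be partly automated. The sketch below is one possible approach, assuming a list of URLs has been copied from the search results; the function name count_query_parameters and the sample URLs are hypothetical. It counts how often each query-string pair occurs so that recurring patterns such as print=true stand out.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_query_parameters(urls):
    """Count how often each 'name=value' query pair appears across the URLs."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        for name, value in parse_qsl(query, keep_blank_values=True):
            counts[f"{name.lower()}={value.lower()}"] += 1
    return counts


# Hypothetical sample of URLs taken from a search result list.
sample_urls = [
    "http://prosharepointsearch/news/item1.aspx",
    "http://prosharepointsearch/news/item1.aspx?print=true",
    "http://prosharepointsearch/news/item2.aspx",
    "http://prosharepointsearch/news/item2.aspx?print=true",
    "http://prosharepointsearch/news/item3.aspx?lang=en",
]

for pair, count in count_query_parameters(sample_urls).most_common():
    print(pair, count)
```

A pair that shows up once for every regular page, as print=true does in this sample, is a good candidate for an exclusion rule built around that pattern.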

1. Using Regular Expressions in Crawl Rules

SharePoint 2010 has the added feature of supporting regular expressions in crawl rules. The administrator must be sure to select the "Match regular expressions" check box and formulate the expressions properly, but this feature opens vast new possibilities for controlling what is crawled and what isn't.

SharePoint 2010 supports the regular expression operators listed in Tables 1 through 3.

Table 1. Acceptable Grouping Operators in SharePoint 2010

Operator: Group
Symbol: ()
Description: Parentheses group sets of characters. Operators applied to the group are applied to the entire group.

Operator: Disjunction
Symbol: |
Description: The pipe operator is applied between two expressions and matches when either one matches. It is a logical OR.
Example: \\prosharepointshare\((share1)|(share2))\.*
Valid match: \\prosharepointshare\share1\<files> OR \\prosharepointshare\share2\<files>
Invalid match: \\myshare\share1share2\<files>

Table 2. Acceptable Matching Operators in SharePoint 2010

Operator: Match any
Symbol: .
Description: The period (dot) operator matches any single character. It does not match a null character, so the number of dots must correspond to the number of characters matched.
Example: http://prosharepointsearch/default.as..
Valid match: http://prosharepointsearch/default.aspx
Invalid match: http://prosharepointsearch/default.asp

Operator: Conditional match
Symbol: ?
Description: The expression it follows may either exist or not. It does not expand (repeat) the expression.
Example: http://prosharepointsearch/default(1)?.aspx
Valid match: http://prosharepointsearch/default.aspx AND http://prosharepointsearch/default1.aspx
Invalid match: http://prosharepointsearch/default11.aspx

Operator: Wildcard match
Symbol: *
Description: The expression it follows may be absent or may repeat any number of times.
Example: http://prosharepointsearch/default(1)*.aspx
Valid match: http://prosharepointsearch/default.aspx AND http://prosharepointsearch/default111.aspx
Invalid match: http://prosharepointsearch/def.aspx

Operator: Match one or more times
Symbol: +
Description: The expression it follows must exist in the target address at least once.
Example: http://prosharepointsearch/default(1)+.aspx
Valid match: http://prosharepointsearch/default1.aspx AND http://prosharepointsearch/default111.aspx
Invalid match: http://prosharepointsearch/default.aspx

Operator: List match
Symbol: [<list of chars>]
Description: This operator is a list of characters inside square brackets "[]". It matches any one of the characters in the list. A range of characters can be specified using the hyphen "-" operator between the characters.
Example: http://prosharepointsearch/page[1-9].htm
Valid match: http://prosharepointsearch/page1.htm OR http://prosharepointsearch/page2.htm OR http://prosharepointsearch/page3.htm OR ...

Table 3. Acceptable Count Operators in SharePoint 2010

Operator: Exact count
Symbol: {num}
Description: This operator is a number inside curly brackets "{}", e.g., {1}. It sets the exact number of times a specific match must occur.
Example: http://prosharepointsearch/(1){5}-(0){3}.aspx
Valid match: http://prosharepointsearch/11111-000.aspx
Invalid match: http://prosharepointsearch/111-00.aspx

Operator: Min count
Symbol: {num,}
Description: This operator is a number inside curly brackets "{}" followed by a comma ",", e.g., {1,}. It sets the minimum number of repetitions a specific match must have.
Example: http://prosharepointsearch/(1){5,}-(0){2}.aspx
Valid match: http://prosharepointsearch/11111-00.aspx AND http://prosharepointsearch/111111-00.aspx
Invalid match: http://prosharepointsearch/1111-00.aspx

Operator: Range count
Symbol: {num1,num2}
Description: This operator holds two numbers inside curly brackets "{}" separated by a comma ",", e.g., {4,5}. The first number defines a lower limit, and the second number defines an upper limit. The number of repetitions in the URL must fall between the two values, num1 and num2. The first number should always be lower than the second to be valid.
Example: http://prosharepointsearch/(1){4}-(0){2,3}.aspx
Valid match: http://prosharepointsearch/1111-00.aspx AND http://prosharepointsearch/1111-000.aspx
Invalid match: http://prosharepointsearch/9999-0000.aspx
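
The operators in the tables carry the same meaning in most regular expression engines, so candidate patterns can be sanity-checked before they are entered as crawl rules. The sketch below uses Python's re module as a stand-in; it is not SharePoint's matcher, and the helper name matches is hypothetical. Because the protocol part of a crawl rule path cannot contain regular expressions, the protocol and host are kept literal and only the remainder is treated as a pattern. The non-matching address page0.htm is added here for illustration.

```python
import re

HOST = "http://prosharepointsearch/"

# (operator, pattern after the host, tail expected to match, tail expected not to match)
examples = [
    ("?", r"default(1)?.aspx", "default1.aspx", "default11.aspx"),
    ("*", r"default(1)*.aspx", "default111.aspx", "def.aspx"),
    ("+", r"default(1)+.aspx", "default111.aspx", "default.aspx"),
    ("[]", r"page[1-9].htm", "page3.htm", "page0.htm"),
    ("{n}", r"(1){5}-(0){3}.aspx", "11111-000.aspx", "111-00.aspx"),
    ("{n,}", r"(1){5,}-(0){2}.aspx", "111111-00.aspx", "1111-00.aspx"),
    ("{n,m}", r"(1){4}-(0){2,3}.aspx", "1111-000.aspx", "9999-0000.aspx"),
]


def matches(pattern: str, tail: str) -> bool:
    """True if the whole URL matches; the protocol and host are kept literal."""
    return re.fullmatch(re.escape(HOST) + pattern, HOST + tail) is not None


for operator, pattern, good, bad in examples:
    # Each line prints the operator, then True for the match and False for the non-match.
    print(operator, matches(pattern, good), matches(pattern, bad))
```

A pattern that behaves as expected here should still be verified with the testing feature on the Crawl Rules page, since engine details can differ.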

When adding regular expressions to match crawl paths, it is important to know that the protocol part of the path (e.g., http://) cannot contain regular expressions. Only the parts of the path after the protocol may contain them. If the protocol is omitted, SharePoint adds http:// in front of the hostname, and in front of any regular expression that was attempted in the protocol's place.

By default, regular expression matches are not case-sensitive. Additionally, SharePoint 2010's crawler normalizes all discovered links by converting them to lowercase. If it is necessary to match case, or to use regular expressions to exclude documents based on character case in the path, the "Match case" check box should be checked. Otherwise, leave it unchecked. Matching case may be necessary when crawling Apache-driven web sites where page addresses are case-sensitive, Linux-based file shares, or content from Business Connectivity Services that preserves case. Creating case-sensitive crawl rules for such content allows items whose paths differ only in case to be crawled and recognized as unique.
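
The difference between the default behavior and the "Match case" option can be illustrated with Python's re module. This is an analogy only: the IGNORECASE flag stands in for the default case-insensitive matching, and the path used here is a hypothetical example.

```python
import re

# Hypothetical rule pattern and discovered address that differ only in case.
pattern = r"http://prosharepointsearch/docs/report[0-9]+.pdf"
url = "http://prosharepointsearch/Docs/Report12.PDF"

# Default behavior: case is ignored, so the rule applies.
default_match = re.fullmatch(pattern, url, flags=re.IGNORECASE) is not None

# With "Match case" checked: the comparison is case-sensitive, so it does not.
match_case = re.fullmatch(pattern, url) is not None

print(default_match)  # True
print(match_case)     # False
```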

2. Using Crawl Rules to Grant Access

Crawl rules can also be used to grant access to specific content, or parts of content, by defining the user account that will crawl it. Generally, the crawler should be given full read access to content, and SharePoint's permissions filtering should be left to determine what users can see.

NOTE

Be careful when applying blanket permissions across large document repositories. Although giving the SharePoint crawler read access to everything is usually a good idea in well-managed SharePoint sites, doing so on other systems can expose security risks, such as documents that lack correct permissions and have gone unnoticed only through obscurity. A search engine is a great tool for finding things, even those best left hidden.

It is also possible and sometimes necessary to define a special user for indexing external sites or independent systems such as file shares or Exchange. In these cases, a special user with read access to the content can be defined in the crawl rules. For example, if indexing Exchange public folders, a separate user can be defined to allow read-only access to those folders. This user can be set in crawl rules to be the user to index that content, thereby protecting other Exchange content from unauthorized crawling (Figure 2).
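
Conceptually, a crawl rule with its own credentials overrides the default crawl account for the paths it covers. The sketch below models only that lookup, under stated assumptions: the account names, the Exchange path, and the function crawl_account_for are hypothetical, wildcard matching is approximated with fnmatch, and passwords are deliberately left out.

```python
from fnmatch import fnmatchcase

# Hypothetical account names; the default account is used when no rule applies.
DEFAULT_ACCOUNT = r"DOMAIN\sp_crawl"
ACCOUNT_RULES = [
    # (wildcard path, read-only account used for that path)
    ("http://exchange/public/*", r"DOMAIN\exch_public_reader"),
]


def crawl_account_for(path: str) -> str:
    """Return the account of the first matching rule, else the default account."""
    for pattern, account in ACCOUNT_RULES:
        if fnmatchcase(path.lower(), pattern.lower()):
            return account
    return DEFAULT_ACCOUNT


print(crawl_account_for("http://exchange/public/sales/thread1.eml"))
print(crawl_account_for("http://prosharepointsearch/default.aspx"))
```

Only addresses under the public folder path are read with the dedicated account; everything else continues to use the default crawl account.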

Figure 2. Specifying a crawl rule that applies a specific user to the crawler

