Full-text indexing in SharePoint with the Text IFilter

There is a lot of information out there about indexing PDF files in SharePoint, but not much on indexing files that are really just plain text yet don’t have an IFilter associated with them by default.

I found one great blog post at Adventures in SPWonderland that goes through the entire process of getting your source code indexed by SharePoint. So that I have it handy, I’m rehashing the part I really care about: getting the file extension recognized in SharePoint and assigning the text IFilter (tquery.dll) to the new extension for full-text queries.

Adding the extension to the File Types

  1. Open your Shared Services Administration and under the Search section, click on Search Settings
  2. Click on File Types

  3. Click New File Type
  4. Type the extension you are adding. For example, if you are adding *.cs files, type cs
    (Note: Do not include the period in front of the extension)
  5. Click OK

Assigning the text IFilter to your extension

This next step tells the Office SharePoint Server Search service which IFilter to use when it indexes your new file type.

  1. Open the registry editor (Start -> Run -> regedit)
  2. Navigate to the key below:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension
  3. Add a new key named after your extension, this time including the leading period. Continuing with the cs example from above, the new key is called .cs
  4. Change the default value of the new key to {4A3DD7AB-0A6B-43B0-8A90-0D8B0CC36AAB}.
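
The registry steps above can also be captured in a .reg file, which is handy when more than one server needs the same change. This is only a sketch: it assumes the key path is the backslash-separated form of the path in step 2 and uses the .cs extension from the example, so adjust the extension before importing it.

```
Windows Registry Editor Version 5.00

; Assumption: .cs is the example extension used in this post.
; The default value (@) is the IFilter GUID from step 4.
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.cs]
@="{4A3DD7AB-0A6B-43B0-8A90-0D8B0CC36AAB}"
```

Double-clicking the file (or importing it from the registry editor) should create the key and set its default value in one go.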

Finishing up

  1. Restart your Office SharePoint Server Search service, either through the Services snap-in or by running the two commands below
    net stop osearch
    net start osearch
  2. Now update your index by performing a full crawl on every content source that contains the type of file you just added.

C# Extension Method Example

I was telling a friend of mine about extension methods and created this one not much later, so I thought I’d post it, for him and anybody else that it could help.
 
This first snippet of code is a static class that has my extension method called GetChildElementName defined inside of it.  GetChildElementName takes two arguments.  The first is defined with “this XElement parentElement”.  The “this” keyword on the first parameter is what makes the method an extension method, and the parameter’s type is the type being extended (in this case XElement).  The second parameter is defined with “string name” which is the name of the child element I’m looking for.
 

    using System;
    using System.Linq;
    using System.Xml.Linq;

    public static class MyCustomExtensions
    {
        /// <summary>
        /// Gets the LocalName of a child element exactly as it is cased in the document.
        /// </summary>
        /// <param name="parentElement">XElement object to search through</param>
        /// <param name="name">The (case-insensitive) LocalName of the child element you are looking for</param>
        /// <returns>The LocalName of the first matching child element, or String.Empty if there is no match</returns>
        public static string GetChildElementName(this XElement parentElement, string name)
        {
            // Compare the names ignoring case, but select the name as it is actually written
            var children = from c in parentElement.Elements()
                           where string.Equals(c.Name.LocalName, name, StringComparison.OrdinalIgnoreCase)
                           select c.Name.LocalName;

            // FirstOrDefault() returns null when no child element matches
            return children.FirstOrDefault() ?? String.Empty;
        }
    }

 

Now to use the extension method, I can just call GetChildElementName like it was always a part of the XElement class, as long as I have a using directive for the namespace that the MyCustomExtensions class is inside of.  I’ll have IntelliSense available for me and everything.

    /*
    File defined somewhere as test.xml
    <Test>
        <Path>http://www.google.com</Path>
        <Path>http://www.bing.com</Path>
        <Path>http://www.yahoo.com</Path>
    </Test>
    */
    XElement myTestElement = XElement.Load("test.xml");

    // Use GetChildElementName() to get the name of "Path" in case I can’t trust what case was used
    string name = myTestElement.GetChildElementName("path");
    IEnumerable<XElement> children = myTestElement.Elements(name);
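
To try the whole thing in one file, here is a minimal console sketch. It makes a few assumptions that are not in the snippets above: the XML is parsed from an inline string instead of test.xml, the Program class and variable names are made up for the demo, and the name comparison uses StringComparison.OrdinalIgnoreCase. It also shows that the extension call is just a nicer way of writing an ordinary static method call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MyCustomExtensions
{
    // Finds a child element by name, ignoring case, and returns the name
    // exactly as it is cased in the XML (or String.Empty if nothing matches).
    public static string GetChildElementName(this XElement parentElement, string name)
    {
        var names = from c in parentElement.Elements()
                    where string.Equals(c.Name.LocalName, name, StringComparison.OrdinalIgnoreCase)
                    select c.Name.LocalName;
        return names.FirstOrDefault() ?? String.Empty;
    }
}

public static class Program
{
    public static void Main()
    {
        // Inline XML standing in for test.xml (an assumption for this demo)
        XElement root = XElement.Parse(
            "<Test><Path>http://www.google.com</Path><Path>http://www.bing.com</Path></Test>");

        // Extension-method syntax...
        string viaExtension = root.GetChildElementName("PATH");
        // ...does the same thing as calling the static method directly.
        string viaStatic = MyCustomExtensions.GetChildElementName(root, "PATH");

        Console.WriteLine(viaExtension);              // Path
        Console.WriteLine(viaExtension == viaStatic); // True

        // Only look up children when a matching name was actually found
        if (viaExtension.Length > 0)
        {
            IEnumerable<XElement> children = root.Elements(viaExtension);
            Console.WriteLine(children.Count());      // 2
        }
    }
}
```

The Length check is there because the method returns String.Empty when nothing matches, and an empty string is not a usable element name.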