<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:pingback="http://madskills.com/public/xml/rss/module/pingback/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" version="2.0">
  <channel>
    <title>Adventures in SPWonderland. - WTF</title>
    <link>http://blogs.flexnetconsult.co.uk/colinbyrne/</link>
    <description>Taking apart and putting back together</description>
    <language>en-us</language>
    <copyright>Colin Byrne</copyright>
    <lastBuildDate>Wed, 22 Feb 2012 22:10:42 GMT</lastBuildDate>
    <generator>newtelligence dasBlog 2.0.7226.0</generator>
    <managingEditor>webparts@flexnetconsult.co.uk</managingEditor>
    <webMaster>webparts@flexnetconsult.co.uk</webMaster>
    <item>
      <trackback:ping>http://blogs.flexnetconsult.co.uk/colinbyrne/Trackback.aspx?guid=dff8f41f-541a-41e2-a465-4d020ccea683</trackback:ping>
      <pingback:server>http://blogs.flexnetconsult.co.uk/colinbyrne/pingback.aspx</pingback:server>
      <pingback:target>http://blogs.flexnetconsult.co.uk/colinbyrne/PermaLink,guid,dff8f41f-541a-41e2-a465-4d020ccea683.aspx</pingback:target>
      <dc:creator>Colin Byrne</dc:creator>
      <wfw:comment>http://blogs.flexnetconsult.co.uk/colinbyrne/CommentView,guid,dff8f41f-541a-41e2-a465-4d020ccea683.aspx</wfw:comment>
      <wfw:commentRss>http://blogs.flexnetconsult.co.uk/colinbyrne/SyndicationService.asmx/GetEntryCommentsRss?guid=dff8f41f-541a-41e2-a465-4d020ccea683</wfw:commentRss>
      <slash:comments>1</slash:comments>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
So we’re setting up a crawl of a TWiki site as one source in a suite of content sources.
</p>
        <p>
So far so good. Once the authentication was sorted we noticed a problem: only the
root URL of the site was getting crawled.
</p>
        <p>
Various ideas were thrown around about nofollow and noindex attributes but we couldn’t
find anything wrong with our configuration and nothing seemed to fit the problem.
</p>
        <p>
By accident I noticed that this particular TWiki installation treated URLs as case sensitive
(I thought those days were gone; it’s configurable, apparently), and that got me thinking.
</p>
        <p>
Kicking off a crawl, I noticed that SharePoint was requesting lower-case URLs from
the site for every link on the home page, getting a 404 for each, and stopping.
</p>
        <p>
OK, the penny drops, but why is SharePoint sending a lower-case URL? Well… this is by design,
as part of the crawler’s normalization of URLs (<a href="http://blogs.msdn.com/b/enterprisesearch/archive/2010/07/09/crawling-case-sensitive-repositories-using-sharepoint-server-2010.aspx">http://blogs.msdn.com/b/enterprisesearch/archive/2010/07/09/crawling-case-sensitive-repositories-using-sharepoint-server-2010.aspx</a>).
</p>
        <p>
In SharePoint 2010, if you’re setting up a crawl rule, that checkbox you’ve ignored called Match
Case (badly named; surely Preserve URL Casing would get the point across better) just
needs to be set and, voilà, the crawler will preserve the case of the URLs it requests.
</p>
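        <p>
As a rough illustration of why the casing matters, here is a toy model in Python. It is
not SharePoint’s crawler code; the host, the page names, and the normalize and status_for
helpers are all made up for the sketch. It pretends to be a case-sensitive wiki and shows
what a lower-cased request gets back compared with one that keeps its casing.
</p>
        <pre>
```python
from urllib.parse import urlsplit

# Hypothetical page paths standing in for a case-sensitive wiki.
PAGES = {"/twiki/bin/view/Main/WebHome", "/twiki/bin/view/Main/ProjectPlan"}


def normalize(url, preserve_case):
    """Lower-case the whole URL unless asked to preserve its casing."""
    if preserve_case:
        return url
    return url.lower()


def status_for(url):
    """Return 200 if the exact-case path exists on the pretend server, else 404."""
    path = urlsplit(url).path
    return 200 if path in PAGES else 404


link = "http://wiki.example.com/twiki/bin/view/Main/ProjectPlan"
print(status_for(normalize(link, preserve_case=False)))  # prints 404
print(status_for(normalize(link, preserve_case=True)))   # prints 200
```
</pre>
        <p>
The first call mimics the default lower-casing and misses the page; the second mimics
Match Case being ticked and finds it.
</p>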
        <img width="0" height="0" src="http://blogs.flexnetconsult.co.uk/colinbyrne/aggbug.ashx?id=dff8f41f-541a-41e2-a465-4d020ccea683" />
      </body>
      <title>Case Sensitive MindF**k Part 1 SharePoint not indexing TWiki</title>
      <guid isPermaLink="false">http://blogs.flexnetconsult.co.uk/colinbyrne/PermaLink,guid,dff8f41f-541a-41e2-a465-4d020ccea683.aspx</guid>
      <link>http://blogs.flexnetconsult.co.uk/colinbyrne/2012/02/22/CaseSensitiveMindFkPart1SharePointNotIndexingTWiki.aspx</link>
      <pubDate>Wed, 22 Feb 2012 22:10:42 GMT</pubDate>
      <description>&lt;p&gt;
So we’re setting up a crawl of a TWiki site as one source in a suite of content sources.
&lt;/p&gt;
&lt;p&gt;
So far so good. Once the authentication was sorted we noticed a problem: only the
root URL of the site was getting crawled.
&lt;/p&gt;
&lt;p&gt;
Various ideas were thrown around about nofollow and noindex attributes but we couldn’t
find anything wrong with our configuration and nothing seemed to fit the problem.
&lt;/p&gt;
&lt;p&gt;
By accident I noticed that this particular TWiki installation treated URLs as case sensitive
(I thought those days were gone; it’s configurable, apparently), and that got me thinking.
&lt;/p&gt;
&lt;p&gt;
Kicking off a crawl, I noticed that SharePoint was requesting lower-case URLs from
the site for every link on the home page, getting a 404 for each, and stopping.
&lt;/p&gt;
&lt;p&gt;
OK, the penny drops, but why is SharePoint sending a lower-case URL? Well… this is by design,
as part of the crawler’s normalization of URLs (&lt;a href="http://blogs.msdn.com/b/enterprisesearch/archive/2010/07/09/crawling-case-sensitive-repositories-using-sharepoint-server-2010.aspx"&gt;http://blogs.msdn.com/b/enterprisesearch/archive/2010/07/09/crawling-case-sensitive-repositories-using-sharepoint-server-2010.aspx&lt;/a&gt;).
&lt;/p&gt;
&lt;p&gt;
In SharePoint 2010, if you’re setting up a crawl rule, that checkbox you’ve ignored called Match
Case (badly named; surely Preserve URL Casing would get the point across better) just
needs to be set and, voilà, the crawler will preserve the case of the URLs it requests.
&lt;/p&gt;
&lt;img width="0" height="0" src="http://blogs.flexnetconsult.co.uk/colinbyrne/aggbug.ashx?id=dff8f41f-541a-41e2-a465-4d020ccea683" /&gt;</description>
      <comments>http://blogs.flexnetconsult.co.uk/colinbyrne/CommentView,guid,dff8f41f-541a-41e2-a465-4d020ccea683.aspx</comments>
      <category>SharePoint 2010</category>
      <category>WTF</category>
    </item>
    <item>
      <trackback:ping>http://blogs.flexnetconsult.co.uk/colinbyrne/Trackback.aspx?guid=82d9774a-669d-4c7c-aa7a-382acf8bc9c9</trackback:ping>
      <pingback:server>http://blogs.flexnetconsult.co.uk/colinbyrne/pingback.aspx</pingback:server>
      <pingback:target>http://blogs.flexnetconsult.co.uk/colinbyrne/PermaLink,guid,82d9774a-669d-4c7c-aa7a-382acf8bc9c9.aspx</pingback:target>
      <dc:creator>Colin Byrne</dc:creator>
      <wfw:comment>http://blogs.flexnetconsult.co.uk/colinbyrne/CommentView,guid,82d9774a-669d-4c7c-aa7a-382acf8bc9c9.aspx</wfw:comment>
      <wfw:commentRss>http://blogs.flexnetconsult.co.uk/colinbyrne/SyndicationService.asmx/GetEntryCommentsRss?guid=82d9774a-669d-4c7c-aa7a-382acf8bc9c9</wfw:commentRss>
      <body xmlns="http://www.w3.org/1999/xhtml">
        <p>
Picture this: you're an IT bod working for Thomson Financial, and you've got I don't know
how many sites subscribing to your global financial news feed. You need to check that
your latest update is working, so you key a test message into the system. Hmm, that's
funny, nothing is coming out on the test feed reader.
</p>
        <p>
Try a second one, and a third and you just keep on going... 
</p>
        <p>
Now, you did check it's the test system you were logged in to?
</p>
        <p>
 
</p>
        <p>
          <a href="http://blogs.flexnetconsult.co.uk/colinbyrne/content/binary/WindowsLiveWriter/Sometestsarereallyhardtoignore_11653/Test%20Please%20Ignore_2.jpg">
            <img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="225" alt="Test Please Ignore" src="http://blogs.flexnetconsult.co.uk/colinbyrne/content/binary/WindowsLiveWriter/Sometestsarereallyhardtoignore_11653/Test%20Please%20Ignore_thumb.jpg" width="660" border="0" />
          </a>
        </p>
        <p>
 
</p>
        <p>
It's even harder to ignore on some other sites... looks like they might have lost ignore
1 and ignore 2 though; maybe they meant these to go around the world :-)
</p>
        <p>
 
</p>
        <p>
          <a href="http://blogs.flexnetconsult.co.uk/colinbyrne/content/binary/WindowsLiveWriter/Sometestsarereallyhardtoignore_11653/ThisIsATest2%20-%20Copy_2.jpg">
            <img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="484" alt="ThisIsATest2 - Copy" src="http://blogs.flexnetconsult.co.uk/colinbyrne/content/binary/WindowsLiveWriter/Sometestsarereallyhardtoignore_11653/ThisIsATest2%20-%20Copy_thumb.jpg" width="459" border="0" />
          </a>
        </p>
        <img width="0" height="0" src="http://blogs.flexnetconsult.co.uk/colinbyrne/aggbug.ashx?id=82d9774a-669d-4c7c-aa7a-382acf8bc9c9" />
      </body>
      <title>Some tests are really hard to ignore</title>
      <guid isPermaLink="false">http://blogs.flexnetconsult.co.uk/colinbyrne/PermaLink,guid,82d9774a-669d-4c7c-aa7a-382acf8bc9c9.aspx</guid>
      <link>http://blogs.flexnetconsult.co.uk/colinbyrne/2008/03/08/SomeTestsAreReallyHardToIgnore.aspx</link>
      <pubDate>Sat, 08 Mar 2008 19:47:50 GMT</pubDate>
      <description>&lt;p&gt;
Picture this: you're an IT bod working for Thomson Financial, and you've got I don't know
how many sites subscribing to your global financial news feed. You need to check that
your latest update is working, so you key a test message into the system. Hmm, that's
funny, nothing is coming out on the test feed reader.
&lt;/p&gt;
&lt;p&gt;
Try a second one, and a third and you just keep on going... 
&lt;/p&gt;
&lt;p&gt;
Now, you did check it's the test system you were logged in to?
&lt;/p&gt;
&lt;p&gt;
&amp;nbsp;
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://blogs.flexnetconsult.co.uk/colinbyrne/content/binary/WindowsLiveWriter/Sometestsarereallyhardtoignore_11653/Test%20Please%20Ignore_2.jpg"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="225" alt="Test Please Ignore" src="http://blogs.flexnetconsult.co.uk/colinbyrne/content/binary/WindowsLiveWriter/Sometestsarereallyhardtoignore_11653/Test%20Please%20Ignore_thumb.jpg" width="660" border="0"&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
&amp;nbsp;
&lt;/p&gt;
&lt;p&gt;
It's even harder to ignore on some other sites... looks like they might have lost ignore
1 and ignore 2 though; maybe they meant these to go around the world :-)
&lt;/p&gt;
&lt;p&gt;
&amp;nbsp;
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://blogs.flexnetconsult.co.uk/colinbyrne/content/binary/WindowsLiveWriter/Sometestsarereallyhardtoignore_11653/ThisIsATest2%20-%20Copy_2.jpg"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="484" alt="ThisIsATest2 - Copy" src="http://blogs.flexnetconsult.co.uk/colinbyrne/content/binary/WindowsLiveWriter/Sometestsarereallyhardtoignore_11653/ThisIsATest2%20-%20Copy_thumb.jpg" width="459" border="0"&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;img width="0" height="0" src="http://blogs.flexnetconsult.co.uk/colinbyrne/aggbug.ashx?id=82d9774a-669d-4c7c-aa7a-382acf8bc9c9" /&gt;</description>
      <comments>http://blogs.flexnetconsult.co.uk/colinbyrne/CommentView,guid,82d9774a-669d-4c7c-aa7a-382acf8bc9c9.aspx</comments>
      <category>WTF</category>
    </item>
  </channel>
</rss>