Channel: Mavention

SharePoint 2013 Duplicate Check


In one of our working environments multiple users gather documents from several sources and upload them to a SharePoint 2013 document library. Because each user adds documents independently, the same document may be uploaded more than once. In this blog post we will use the SharePoint Search Trim Duplicates feature to detect duplicate items, so that the number of duplicates stored in SharePoint is kept to a minimum.

Ideally we would inform users immediately when they upload a duplicate document. However, because the Trim Duplicates feature is based on search, a document has to be indexed before this check is possible. In this example a timer job therefore periodically checks which new documents have been indexed by search and then searches for possible duplicates of those documents.

SharePoint FieldDefinition DuplicateReferenceField

First we will create a Site Column that will contain the information about any duplicates for a specific document. In this example we create a site column with the following Field definition:

<?xml version="1.0" encoding="utf-8"?>
<Elements xmlns="http://schemas.microsoft.com/sharepoint/">
  <Field
       ID="{6101d06f-d099-4979-867d-7d1da1261b4a}"
       Name="DuplicateReferenceField"
       DisplayName="Duplicate references"
       Type="Note"
       NumLines="6"
       RichText="TRUE"
       Required="FALSE"
       Group="Custom Site Columns">
    <Default>Duplicate check not completed.</Default>
  </Field>
</Elements>

This site column will contain the status of the duplicate check and the links to possible duplicates. After the site column is deployed, we add it to the default ‘Shared Documents’ library.
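Because the same status strings appear in the field definition, in the search query and in the timer job, it can help to keep them in one place. The following is a minimal sketch with hypothetical names (`DuplicateCheckState`, `DuplicateCheckStatus`); it assumes the field value is read as plain text, which may not hold for a rich text field that wraps its value in HTML:

```csharp
using System;

// Hypothetical helper: the three states the DuplicateReferenceField can be in.
public enum DuplicateCheckState
{
    Pending,        // default value set by the field definition
    NoDuplicates,   // the timer job ran and found nothing
    HasDuplicates   // the field holds one or more duplicate paths
}

public static class DuplicateCheckStatus
{
    // Must match the <Default> element of the field definition exactly.
    public const string PendingText = "Duplicate check not completed.";

    // Value written by the timer job when no duplicates are found.
    public const string NoDuplicatesText = "No duplicates found.";

    // Classifies a raw field value (assumed to be plain text).
    // Treating an empty value as pending is a design choice of this sketch.
    public static DuplicateCheckState Classify(string fieldValue)
    {
        string value = (fieldValue ?? string.Empty).Trim();
        if (value.Length == 0 || value == PendingText)
            return DuplicateCheckState.Pending;
        if (value == NoDuplicatesText)
            return DuplicateCheckState.NoDuplicates;
        return DuplicateCheckState.HasDuplicates;
    }
}
```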

SharePoint Managed property

Next we create a Managed Property in the Search Schema so that we can use search to find the items that need to be checked. I created a Managed Property called ‘DuplicateReference’ of type Text and mapped it to the crawled property ows_DuplicateReferenceField. Make sure you enable Searchable, Queryable and Retrievable.

SharePoint Timer JobDefinition

In this example we automatically check whether there are duplicates for newly added items by creating a new timer job. The following code is the SPJobDefinition for the DuplicateChecker timer job.

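    using System;
    using System.Data;
    using Microsoft.Office.Server.Search.Query;
    using Microsoft.SharePoint;
    using Microsoft.SharePoint.Administration;

    public class DuplicateCheckerJobDefinition : SPJobDefinition
    {
        // Parameterless constructor, required so the persisted job can be deserialized
        public DuplicateCheckerJobDefinition()
            : base()
        {
        }

        public DuplicateCheckerJobDefinition(SPWebApplication webApp)
            : base("DuplicateCheckerJobDefinition", webApp, null, SPJobLockType.Job)
        {
            // The first base argument is the job Name; Title is only the display name
            Title = "Duplicate Checker";
        }
    }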

We override the Execute method to store the current web application and the URL of its default zone in two private members. After that we call our CheckForDuplicates method, as shown below.

       private SPWebApplication _currentWebApp;
       private string _defaultUrlZone;

       public override void Execute(Guid targetInstanceId)
       {
            base.Execute(targetInstanceId);

            _currentWebApp = this.Parent as SPWebApplication;
            if (_currentWebApp == null)
                return;

            // Get the URL of the default zone for the given web application
            foreach (SPAlternateUrl url in _currentWebApp.AlternateUrls)
            {
                if (url.UrlZone == SPUrlZone.Default)
                    _defaultUrlZone = url.IncomingUrl;
            }

            // Without a default zone URL there is no site to query
            if (!string.IsNullOrEmpty(_defaultUrlZone))
                CheckForDuplicates();
       }
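The default-zone lookup in Execute can be tried outside a SharePoint farm. The sketch below uses hypothetical stand-in types (`UrlZone`, `AlternateUrl`, `ZoneLookup`) instead of SPUrlZone and SPAlternateUrl, and returns the first URL mapped to the Default zone:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for SPUrlZone / SPAlternateUrl so the lookup
// can run without SharePoint.
public enum UrlZone { Default, Intranet, Internet, Custom, Extranet }

public sealed class AlternateUrl
{
    public AlternateUrl(UrlZone zone, string incomingUrl)
    {
        Zone = zone;
        IncomingUrl = incomingUrl;
    }

    public UrlZone Zone { get; }
    public string IncomingUrl { get; }
}

public static class ZoneLookup
{
    // Returns the first URL mapped to the Default zone, or null when none exists.
    // The loop in Execute keeps the last match instead; with a single
    // Default mapping both give the same result.
    public static string GetDefaultZoneUrl(IEnumerable<AlternateUrl> alternateUrls)
    {
        return alternateUrls
            .Where(url => url.Zone == UrlZone.Default)
            .Select(url => url.IncomingUrl)
            .FirstOrDefault();
    }
}
```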

 

The CheckForDuplicates method executes a KeywordQuery to get the newly added items that should be checked for duplicates. In this example it is hardcoded to check the ‘Shared Documents’ library at the default zone URL.

        private void CheckForDuplicates()
        {
            using (var spSite = new SPSite(_defaultUrlZone))
            using (SPWeb spWeb = spSite.OpenWeb())
            {
                // Find the items in 'Shared Documents' that have not been checked yet.
                // Use the already opened site instead of _currentWebApp.Sites[...],
                // which returns an SPSite that would have to be disposed.
                KeywordQuery kq = new KeywordQuery(spSite);
                kq.QueryText = string.Format(@"PATH:""{0}/{1}/*"" AND DuplicateReference=""Duplicate check not completed.""", _defaultUrlZone, "Shared Documents");
                kq.SelectProperties.Add("Title");
                kq.SelectProperties.Add("DocId");
                kq.SelectProperties.Add("ListItemId");
                kq.RowLimit = 20;

                ResultTableCollection searchResults = new SearchExecutor().ExecuteQuery(kq);

                // The library with URL 'Shared Documents' has the title 'Documents'
                SPList documentList = spWeb.Lists["Documents"];

                // Only the relevant results table contains the selected properties
                foreach (ResultTable rt in searchResults.Filter("TableType", KnownTableTypes.RelevantResults))
                {
                    foreach (DataRow dr in rt.Table.Rows)
                    {
                        // The item is in the search index, so it has a DocId
                        string docIdValue = dr["DocId"].ToString();
                        if (string.IsNullOrWhiteSpace(docIdValue))
                            continue;

                        string currentDuplicateReferences = CheckItemForDuplicates(long.Parse(docIdValue));

                        // Add the duplicate references to the SPListItem
                        AddDuplicateReferencesToSPListItem(documentList, int.Parse(dr["ListItemId"].ToString()), currentDuplicateReferences);
                    }
                }
            }
        }
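The query text itself can be built and checked in isolation. In the sketch below, `PendingItemsQuery.Build` is a hypothetical helper that produces the same KQL as CheckForDuplicates and additionally trims a trailing slash from the zone URL, in case the URL ends with one:

```csharp
using System;

public static class PendingItemsQuery
{
    // Status text written by the field's <Default> element.
    public const string PendingText = "Duplicate check not completed.";

    // Builds the KQL used by CheckForDuplicates: items below the library path
    // whose DuplicateReference managed property still holds the pending text.
    public static string Build(string siteUrl, string libraryUrlName)
    {
        if (string.IsNullOrWhiteSpace(siteUrl))
            throw new ArgumentException("A site URL is required.", nameof(siteUrl));

        // Avoid a double slash when the zone URL ends with '/'.
        string root = siteUrl.TrimEnd('/');
        return string.Format(
            "PATH:\"{0}/{1}/*\" AND DuplicateReference=\"{2}\"",
            root, libraryUrlName, PendingText);
    }
}
```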

 

For each item that needs to be checked, another search query is executed. The item's DocId is passed to the TrimDuplicatesIncludeId property of the query, so that search returns the documents it considers duplicates of that item.

 

        private string CheckItemForDuplicates(long docId)
        {
            // Get the duplicates for the given DocId
            ResultTableCollection duplicateItems = GetDuplicateItems(docId);
            string currentDuplicateReferences = string.Empty;

            foreach (ResultTable rt in duplicateItems.Filter("TableType", KnownTableTypes.RelevantResults))
            {
                // Create a string with the links to the duplicate files
                foreach (DataRow dr in rt.Table.Rows)
                {
                    // Skip the original document, which is always part of the result set
                    if (string.Equals(docId.ToString(), dr["DocId"].ToString(), StringComparison.OrdinalIgnoreCase))
                        continue;

                    if (string.IsNullOrWhiteSpace(currentDuplicateReferences))
                        currentDuplicateReferences = dr["Path"].ToString();
                    else
                        currentDuplicateReferences += ";" + Environment.NewLine + dr["Path"].ToString();
                }
            }

            return currentDuplicateReferences;
        }

 

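        private ResultTableCollection GetDuplicateItems(long docId)
        {
            // Open (and dispose) the site explicitly instead of using _currentWebApp.Sites[...]
            using (var spSite = new SPSite(_defaultUrlZone))
            {
                KeywordQuery kq = new KeywordQuery(spSite);
                kq.QueryText = "* AND IsDocument:True";
                kq.TrimDuplicates = true;
                kq.TrimDuplicatesIncludeId = docId;
                kq.SelectProperties.Add("Path");
                // DocId is read in CheckItemForDuplicates to skip the original document
                kq.SelectProperties.Add("DocId");

                // Return a max of 2 duplicates (the original is always present in the result set)
                kq.RowLimit = 3;

                return new SearchExecutor().ExecuteQuery(kq);
            }
        }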

 

        private static void AddDuplicateReferencesToSPListItem(SPList documentList, int listItemID, string duplicateReferences)
        {
            // Update the DuplicateReferenceField
            SPListItem item = documentList.GetItemById(listItemID);

            if (string.IsNullOrWhiteSpace(duplicateReferences))
                item["DuplicateReferenceField"] = "No duplicates found.";
            else
                item["DuplicateReferenceField"] = duplicateReferences;

            // Update the SPListItem without changing the Modified and Modified By fields
            item.SystemUpdate();
        }

 

        private string GetUrlDefaultZone(SPWebApplication webApp)
        {
            string defaultUrlZone = null;

            // Get the default URL for the given webApp
            foreach (SPAlternateUrl url in webApp.AlternateUrls)
            {
                if (url.UrlZone == SPUrlZone.Default)
                    defaultUrlZone = url.IncomingUrl;
            }

            return defaultUrlZone;
        }
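The way CheckItemForDuplicates turns search rows into the field value can also be isolated from SharePoint. In this sketch each search row is represented by a hypothetical `(DocId, Path)` tuple, and `DuplicateReferenceBuilder` is an illustrative name; the original document is skipped and the remaining paths are joined with the same `;` plus line-break separator:

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateReferenceBuilder
{
    // Joins the paths of all search hits except the document being checked.
    // Each (DocId, Path) tuple stands in for one search result row.
    public static string Build(long docId, IEnumerable<(long DocId, string Path)> hits)
    {
        var paths = new List<string>();
        foreach (var hit in hits)
        {
            // The original document is always in the result set; skip it.
            if (hit.DocId == docId)
                continue;
            paths.Add(hit.Path);
        }
        return string.Join(";" + Environment.NewLine, paths);
    }
}
```

An empty result means "no duplicates", which AddDuplicateReferencesToSPListItem turns into the text 'No duplicates found.'.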

 

SharePoint 2013 TimerJob DuplicateChecker Instance

Now that we have created the DuplicateChecker job definition, we can schedule the job. We do this with the following SPFeatureReceiver, which belongs to a feature scoped to the web application.

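   using System.Collections.Generic;
   using Microsoft.SharePoint;
   using Microsoft.SharePoint.Administration;

   public class DuplicateCheckerTimerJobEventReceiver : SPFeatureReceiver
   {
        // Must match the name passed to the SPJobDefinition base constructor
        // ("Duplicate Checker" is only the Title, so matching on it never finds the job)
        private const string JobName = "DuplicateCheckerJobDefinition";

        public override void FeatureActivated(SPFeatureReceiverProperties properties)
        {
            // The feature must be scoped to WebApplication for this cast to succeed
            SPWebApplication webApp = properties.Feature.Parent as SPWebApplication;
            DeleteDuplicateCheckJob(webApp.JobDefinitions);

            DuplicateCheckerJobDefinition duplicateCheckerJobDefinition = new DuplicateCheckerJobDefinition(webApp);

            // Run every 5 minutes; the job may start anywhere between second 0 and 59
            SPMinuteSchedule schedule = new SPMinuteSchedule();
            schedule.BeginSecond = 0;
            schedule.EndSecond = 59;
            schedule.Interval = 5;

            duplicateCheckerJobDefinition.Schedule = schedule;
            duplicateCheckerJobDefinition.Update();
        }

        public override void FeatureDeactivating(SPFeatureReceiverProperties properties)
        {
            SPWebApplication webApp = properties.Feature.Parent as SPWebApplication;
            DeleteDuplicateCheckJob(webApp.JobDefinitions);
        }

        private void DeleteDuplicateCheckJob(SPJobDefinitionCollection jobs)
        {
            // Collect the jobs first, so the collection is not modified while it is enumerated
            var jobsToDelete = new List<SPJobDefinition>();
            foreach (SPJobDefinition job in jobs)
            {
                if (job.Name.Equals(JobName))
                    jobsToDelete.Add(job);
            }

            foreach (SPJobDefinition job in jobsToDelete)
                job.Delete();
        }
    }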

When the feature is activated, the timer job checks the ‘Shared Documents’ library every 5 minutes for items with the value ‘Duplicate check not completed.’ in the DuplicateReferenceField. Keep in mind that an item has to be indexed before we can find its duplicates. For each new document that is found, its DocId is used to search for duplicate items. If any duplicates are found, links to them are written to the Duplicate references field so that users can view the detected duplicates.
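The timer job above stores the plain paths, separated by `;` and a line break. Because the column is defined with `RichText="TRUE"`, one option, which is an assumption and not what the code above does, is to store HTML anchors so each duplicate becomes a clickable link. The helper name `DuplicateLinkFormatter` is hypothetical; `WebUtility.HtmlEncode` encodes characters such as `&` and `"` so that a path cannot break the markup:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class DuplicateLinkFormatter
{
    // Hypothetical helper: renders duplicate paths as HTML anchors, separated by
    // <br/>, for storage in a RichText Note field. Returns the "no duplicates"
    // text when the list is empty, mirroring AddDuplicateReferencesToSPListItem.
    public static string ToHtml(IReadOnlyList<string> duplicatePaths)
    {
        if (duplicatePaths == null || duplicatePaths.Count == 0)
            return "No duplicates found.";

        var html = new StringBuilder();
        for (int i = 0; i < duplicatePaths.Count; i++)
        {
            if (i > 0)
                html.Append("<br/>");

            // Encode once and use the result both as href and as link text.
            string encoded = WebUtility.HtmlEncode(duplicatePaths[i]);
            html.Append("<a href=\"").Append(encoded).Append("\">")
                .Append(encoded).Append("</a>");
        }
        return html.ToString();
    }
}
```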

Testing the DuplicateCheck

First we add a document to the library. This document will get 'Duplicate check not completed.' as default value for the Duplicate references field.

SharePoint Duplicate Check not completed

Now we have to wait until the document is indexed by search and the duplicate checker timer job has run. The field will then show 'No duplicates found.', as this is currently the only document in the library.

SharePoint Duplicate Check completed

Next we add a second document with different content. If everything is configured correctly, no duplicates will be found for the second document.

SharePoint Duplicate Check add Unique Document

Finally, we add a duplicate of each of the two previous documents to the library.

SharePoint Duplicate Check add duplicate documents

When the duplicate check timer job has completed, the document library shows the references to the duplicates that were found.

SharePoint Duplicate Check found duplicate documents

 

 

