Data retention and deletion…I’m sure this is a something that anyone involved in Office 365, SharePoint on information management in general gets fed up of saying since the recent GDPR legislation!
Recently we have been rationalising and cleaning up our data in preparation for moving to Office 365. We are starting with SharePoint as the first target repository or silo of content.
The general consensus is to delete files and folders over 7 years old unless there is a pre-existing data retention policy to adhere to. So the next task is to identify those files that fall within our threshold, and ultimately delete.
Luckily, we have Tree Size Pro and ShareGate so I was able to relatively easily identify the files in question (there were a lot!).
As our SharePoint environment is a) rather full; and b) rather old, I made the decision to incrementally delete files rather than en-masse to mitigate risk, targeting the lists/libraries containing the most out of date content. I started by creating a view in the first library – library A with the following parameters:
- Standard library view
- Filtered by Created Date if less than or equal to 01/01/2011
- Folders or Flat: Show items inside folders
Show this view: In all folders
(all other settings are left default)
Results this returned looked good, I could see folders and files in this view that matched the criteria – brilliant! Based on my previous statement I decided to delete in batches out of working hours, again to mitigate risk. I deleted first from library A, then from the first stage and finally from the second stage recycle bin all in this fashion.
I had permanently deleted around 50% of the total volume of content to be deleted from library A when we started to receive reports of current files being ‘missing’ from library A…not a good day.
After these reports were investigated they were indeed true. It turns out that when folders are included within a library view, folders that match the filter will be shown in the view, regardless of whether the files inside match.
We tested the view exluding folders and all the files returned matched the filter criteria. The same results were demonstrated from a SharGate report of the same nature. The report of all files over 7 years old brought back folders over 7 years old, but they also contained files that were newer.
At present, we are not entirely sure as to why these filters are not able to drill down past a top-level folder. It appears to be difficult to specify via view settings to only show files within folders, including the folder itself that matches the criteria.
We have decided to omitt folders from our reports and views going forward and to solely focus on files as this is the most reliable way we can delete files.
Bonus: for those of you with ShareGate, heres an example of my report we created to bring back all files over 7 years old, excluding folders. I ran this report across the entire intranet application over a weekend and it worked a treat 🙂