A thorough explanation of how to filter when indexing, either manually or using AVAIL Stream
Table of Contents
- Overview
- Four Components of a Filter
- Filtering Specifications and Examples
- Advanced Filtering: Using AND and OR (Boolean Relationships) and Filter Groups
Overview
AVAIL’s Content Filters are a powerful way to cull unwanted content before it is indexed into AVAIL, whether through manual indexing or AVAIL Stream. However, with this power comes some complexity. The purpose of this guide is to show what filters are capable of and to explain how they behave. By the end, you should have everything you need to start building your own sets of filters, further enhancing your ability to manage content in AVAIL.
Before we dive in, we should ground ourselves with a quick definition of what exactly a filter is. In essence, a filter is a rule that your files and folders will be checked against before being indexed into your channel. For any filter, a file either matches or does not match. The filter itself defines how a file or folder is matched and whether being a match means the file will or won’t be indexed, as we’ll get into.
Four Components of a Filter
A single filter can be broken down into four different components, which all work together to form a unified filter. These different components are:
- Type
- Scope
- Specification
- Condition
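As a mental model only, the four components can be pictured as the fields of a single record. The Python sketch below uses hypothetical names (`Filter`, `FilterType`, `Scope`, `Specification`) and does not describe how AVAIL actually stores filters. It also encodes the rule, covered later in this guide, that File Type and File Size are only available for file filters.

```python
from dataclasses import dataclass
from enum import Enum


class FilterType(Enum):
    INCLUSION = "inclusion"  # "only include the following..."
    EXCLUSION = "exclusion"  # "include everything except the following..."


class Scope(Enum):
    FILE = "file"      # checks the file's own name
    FOLDER = "folder"  # checks the folders an item lives in


class Specification(Enum):
    CONTAINING = "containing"
    BEGINS_WITH = "begins with"
    ENDS_WITH = "ends with"
    EXACTLY_MATCHING = "exactly matching"
    REGULAR_EXPRESSION = "regular expression"
    FILE_TYPE = "file type"  # file filters only
    FILE_SIZE = "file size"  # file filters only


@dataclass(frozen=True)
class Filter:
    """Hypothetical record holding the four components of one filter."""
    filter_type: FilterType
    scope: Scope
    specification: Specification
    condition: str  # user-defined text, matched case-insensitively

    def __post_init__(self) -> None:
        # File Type and File Size are only offered for file-scoped filters.
        file_only = (Specification.FILE_TYPE, Specification.FILE_SIZE)
        if self.scope is Scope.FOLDER and self.specification in file_only:
            raise ValueError(f"{self.specification.value} requires a file scope")


# Example: include only files whose names contain "avail".
avail_filter = Filter(FilterType.INCLUSION, Scope.FILE, Specification.CONTAINING, "avail")
print(avail_filter)
```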
Type
There are two main types of filters: Inclusions and Exclusions. As the names suggest, an inclusion filter specifies a rule that a file must meet to be added to your channel. In contrast, an exclusion filter specifies a rule that, if met, prevents a file from being added to your channel.
Important Note:
An Inclusion = "only include the following..."
An Exclusion = "include everything except the following..."
Whenever a file is checked against filters, it is checked against the inclusion filters first to see if it is something that should be included in your channel. If the file matches the inclusion filters, it is then checked against the exclusion filters. If the file also matches the exclusion filters, it will not be added to your AVAIL channel. In a sense, the exclusion filters cull the set of files that manage to pass through the inclusion filters. For this reason, it makes sense to manage filters in separate buckets based on their type.
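The order of evaluation can be sketched in a few lines of Python. Here each bucket is collapsed into one hypothetical predicate (`include`, `exclude`); how filters inside a bucket combine is covered later under boolean relationships. Treating an empty inclusion bucket as “let everything through” is an assumption, based on the note that an exclusion means “include everything except the following”.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a whole bucket of filters: name -> matched?
Predicate = Callable[[str], bool]


def should_index(name: str, include: Optional[Predicate], exclude: Optional[Predicate]) -> bool:
    # 1. Inclusion filters are checked first.
    #    ASSUMPTION: with no inclusion filters, every file passes this step.
    if include is not None and not include(name):
        return False
    # 2. Exclusion filters then cull whatever survived the inclusion step.
    if exclude is not None and exclude(name):
        return False
    return True


def include(name: str) -> bool:  # hypothetical: name contains "avail"
    return "avail" in name.lower()


def exclude(name: str) -> bool:  # hypothetical: name ends with "draft"
    return name.lower().endswith("draft")


for name in ["AVAIL_logo", "avail_draft", "logo"]:
    print(name, should_index(name, include, exclude))
```

Only AVAIL_logo would be indexed here: avail_draft passes the inclusion step but is culled by the exclusion, and logo never gets past the inclusion step.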
Scope
The Scope indicates whether a filter will be applied to files or folders (and all of the files within the folder). A file filter will only look at the filename of an incoming Stream change or indexing operation, ignoring its parent folders. A folder filter looks at all of the parent folders that a file lives in.
For Stream, when a change happens to a folder, that change is checked only against folder filters; file filters are irrelevant to folder changes and are ignored. However, in the case of a file change (or when manually indexing a file), both folder filters and file filters are relevant. The file name is checked against all of the file filters, and the file’s parent folders are checked against all of the folder filters.
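The split between what file filters and folder filters see can be sketched in Python. The helper names are hypothetical and paths are assumed to be forward-slash paths. Checking the changed folder’s own name on a folder change is an assumption the text above does not state. A folder filter is treated as matching when any folder in the path matches, consistent with the ancestor-folder example later in this guide.

```python
from pathlib import PurePosixPath
from typing import Callable, List, Optional, Tuple


def split_for_filters(path: str, is_folder: bool) -> Tuple[Optional[str], List[str]]:
    """Return (name seen by file filters, folder names seen by folder filters)."""
    p = PurePosixPath(path)
    if is_folder:
        # Folder change: file filters are skipped entirely (no name is returned).
        # ASSUMPTION: the changed folder's own name is checked with its parents.
        return None, [part for part in p.parts if part != p.anchor]
    # File change: file filters see only the file name,
    # while folder filters see every folder above it.
    return p.name, [part for part in p.parent.parts if part != p.anchor]


def folder_filter_matches(path: str, is_folder: bool, rule: Callable[[str], bool]) -> bool:
    # A folder filter matches if any folder in the path satisfies it.
    _, folders = split_for_filters(path, is_folder)
    return any(rule(folder) for folder in folders)


print(split_for_filters("Projects/A/Marketing/logo.png", is_folder=False))
print(split_for_filters("Projects/A/Marketing", is_folder=True))
```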
Specification & Condition
The Specification determines what methodology a filter will use to determine whether a Stream change or file indexing operation is a match.
The Condition is a string of text, defined by you, that tells the Specification what to match on for incoming files or folders. Conditions are not case sensitive.
Below, we'll walk through some examples to provide further clarification.
Filtering Specifications and Examples
The following specifications are available for both file and folder filters. For a file filter, these specifications look only at the name of the file, without its file extension. The Regular Expression specification is the exception: it looks at a file’s name including its file type extension.
Containing
This specification looks for a condition match anywhere in the name of the folder/file.
The above example will include any files whose names contain the text string “avail”. Each of the files on the left contains the string “avail”, so all of them would match and be included.
Since filters are case insensitive, the different casings of AVAIL within each file name wouldn’t prevent any of these files from matching this filter.
Begins With
This specification looks for a condition match at the beginning of the name of the folder/file.
Using the example from the previous section, if that filter instead used the Begins With specification, only the second of the three files would be included, as it’s the only one that begins with the string “avail”.
Ends With
This specification looks for a condition match at the end of the name of the file/folder.
Continuing with the same example, if we switched the specification to Ends With, only the first of the three files would be included, as it’s the only one that ends with the string “avail”.
Exactly Matching
This specification looks for a condition that exactly matches the name of the file/folder.
Continuing with the same example, if we switched the specification to Exactly Matching, none of the three files would be included, as none of their names exactly match “avail”.
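The four name-based specifications above can be sketched in Python. The file names are hypothetical (the screenshots are not reproduced here), but they behave like the example: all contain “avail”, only the second begins with it, only the first ends with it, and none matches it exactly. Lowercasing both sides mirrors the rule that conditions are not case sensitive; using `PurePosixPath.stem` to drop the extension is an assumption about how AVAIL defines an extension.

```python
from pathlib import PurePosixPath


def name_matches(spec: str, condition: str, name: str, is_file: bool = True) -> bool:
    if is_file:
        # File filters ignore the extension: "Avail_Logo.png" -> "Avail_Logo".
        name = PurePosixPath(name).stem
    # Conditions are not case sensitive, so compare lowercased text.
    text, cond = name.lower(), condition.lower()
    if spec == "containing":
        return cond in text
    if spec == "begins with":
        return text.startswith(cond)
    if spec == "ends with":
        return text.endswith(cond)
    if spec == "exactly matching":
        return text == cond
    raise ValueError(f"unsupported specification: {spec}")


files = ["My_AVAIL.txt", "Avail_Logo.png", "The_avail_team.jpg"]  # hypothetical names
for spec in ("containing", "begins with", "ends with", "exactly matching"):
    print(f"{spec}: {[f for f in files if name_matches(spec, 'avail', f)]}")
```

Because the extension is dropped first in this sketch, a file named avail.txt would exactly match “avail”, while Ends With “txt” would not match report.txt.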
Regular Expression
This specification allows you to define your own search pattern using standard regular expression syntax. To understand this specification, it’s best to do some independent research on what regular expressions are capable of and how you might use them to suit your needs. For absolute beginners, we recommend https://regexone.com/. Once you have a basic understanding of regular expressions, https://regexr.com/ is incredibly useful for building and testing any patterns you might need, and it also has an abundance of reference material if you get stuck.
The above example is a method for preventing Revit backups from being indexed.
Important Note: Unlike the other folder and file specifications, Regular Expression matches on file names including the file type extension.
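The pattern from the screenshot is not reproduced here, so the Python sketch below uses a hypothetical pattern of its own. It assumes Revit backups follow the usual `<name>.0001.rvt` naming (four digits before the extension, also seen with `.rfa` families). It further assumes that a match anywhere in the name counts (`re.search`) and that the case-insensitivity of conditions applies to patterns too; this guide confirms neither detail.

```python
import re

# HYPOTHETICAL pattern: a dot, four digits, then a .rvt or .rfa extension at the end.
REVIT_BACKUP = r"\.\d{4}\.(rvt|rfa)$"


def regex_matches(pattern: str, filename: str) -> bool:
    # The full name *including* its extension is tested, unlike other specifications.
    # ASSUMPTIONS: a match anywhere in the name counts, and matching ignores case.
    return re.search(pattern, filename, flags=re.IGNORECASE) is not None


for name in ["Tower.rvt", "Tower.0001.rvt", "Door_Family.0012.RFA"]:
    print(name, regex_matches(REVIT_BACKUP, name))
```

Used in an exclusion filter, this pattern would keep Tower.0001.rvt and Door_Family.0012.RFA out of the channel while Tower.rvt is still indexed.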
File Only Specifications
These specifications are only available for file filters.
File Type
This specification looks for files with a certain extension. The leading “.” character is accepted but not required (for instance, “txt” and “.txt” will both match text files).
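A minimal Python sketch of the File Type check, where the leading dot in the condition is optional. Comparing only the last extension, and doing so case-insensitively, are assumptions.

```python
from pathlib import PurePosixPath


def file_type_matches(condition: str, filename: str) -> bool:
    # "txt" and ".txt" are equivalent: drop a single leading dot from the condition.
    wanted = condition.strip().lower()
    if wanted.startswith("."):
        wanted = wanted[1:]
    # ASSUMPTION: only the last extension counts, compared case-insensitively.
    actual = PurePosixPath(filename).suffix.lower().lstrip(".")
    return actual == wanted


print(file_type_matches("txt", "notes.txt"), file_type_matches(".txt", "NOTES.TXT"))
```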
File Size
This specification matches based on a file’s size in bytes. It can use a number of conditional operators to compare a file’s size with the size given in the filter’s condition. When using this specification, the condition must be structured as a conditional operator followed by a size in bytes, as follows:
You’ll notice that there’s a “<=” symbol before the “10”. The “<=” symbol is a conditional operator (its meaning is given in the table below). The number after the operator is the size that files will be compared to. In this instance, a file’s size must be less than or equal to 10 bytes to match. Because this is an exclusion filter, any file that matches will be excluded and won’t be indexed into your AVAIL channel.
Here’s a list of all the conditional operators that this specification will accept.
| Operator | Meaning |
| --- | --- |
| = | Has a value of |
| == | Has a value of |
| > | Has a value greater than |
| >= | Has a value greater than or equal to |
| < | Has a value less than |
| <= | Has a value less than or equal to |
Important Note: The conditional operator must come before the file’s size.
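The condition format and the operator table can be sketched as a small Python parser. Allowing spaces around the operator, and rejecting a condition whose operator comes after the number, are assumptions. The hypothetical `safe_match` wrapper models the documented behavior that a filter with a negative size matches no files.

```python
import operator
import re

# Operators from the table above, mapped to Python comparisons.
OPERATORS = {
    "=": operator.eq,
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

# Operator first, then a whole number of bytes, e.g. "<=10".
# Two-character operators are tried first.
CONDITION = re.compile(r"\s*(==|>=|<=|=|>|<)\s*(-?\d+)\s*$")


def file_size_matches(condition: str, size_in_bytes: int) -> bool:
    m = CONDITION.match(condition)
    if m is None:
        # ASSUMPTION: e.g. "10<=" (operator after the size) is rejected outright.
        raise ValueError(f"expected an operator followed by a size: {condition!r}")
    op, limit = m.group(1), int(m.group(2))
    if limit < 0:
        raise ValueError("a file's size cannot be negative")
    return OPERATORS[op](size_in_bytes, limit)


def safe_match(condition: str, size_in_bytes: int) -> bool:
    # A filter that errors out matches no file
    # (documented for negative sizes, assumed for other errors).
    try:
        return file_size_matches(condition, size_in_bytes)
    except ValueError:
        return False


print(file_size_matches("<=10", 10), file_size_matches("<=10", 11), safe_match("<=-5", 0))
```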
A file’s size cannot be negative. If you use a negative number with the File Size specification, your filter will throw an error and all files that encounter the filter will not match.
Advanced Filtering: Using AND and OR (Boolean Relationships) and Filter Groups
To enable advanced filtering strategies, we’ve introduced the concept of boolean relationships for when more than one filter is used. Each filter has a boolean relationship with the filter that comes before it and with the filter that comes after it (where those filters exist). There are currently only two types of boolean relationships that can relate two filters: OR and AND.
The OR
The OR relationship is a loose connection between two filters that says a file is a match as long as the file matches at least one of these filters. It might even be helpful to think of the OR as a divider rather than a connector, considering the result of one filter is not contingent on the result of the other. An example should clearly illustrate this.
In the above situation, we’re essentially saying that these filters will match files that are of type .jpg or .png. Both file types will be considered a match.
As the table on the left shows, our This_is_avail.jpg and This_is_avail.PNG files match these filters, but the This_is_avail.txt file does not. This_is_avail.jpg matches the .jpg filter, so its result on the .png filter (which it does not match) is irrelevant. Likewise, This_is_avail.PNG matches the .png filter, so its result on the .jpg filter is irrelevant. This_is_avail.txt matches neither the .jpg filter nor the .png filter, so it is not a match.
The AND
The AND relationship is a strong connection between two filters that says a file must match both of the related filters to be considered a match. When filters have an AND relationship between them, it’s best to think of the filters as being grouped together into a single filter, with two requirements that need to be met for a file to match. The more filters with contiguous AND relationships, the more requirements for that AND grouping.
As the table on the left shows, our This_is_avail.jpg and This_is_avail.txt files do not match the filters above. Because the filters are related by an AND, a file must match both of them. While both files contain the string “avail”, neither is of file type .png. However, the This_is_avail.PNG file does match, because it matches both related filters: its name contains the string “avail” and it is of file type .png.
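Both relationships can be sketched in Python using the three example files. The helper functions are hypothetical; the results match the two tables described above.

```python
from pathlib import PurePosixPath


def contains_avail(filename: str) -> bool:  # file filter: Containing "avail"
    return "avail" in PurePosixPath(filename).stem.lower()


def is_type(ext: str):  # file filter: File Type, case-insensitive
    def check(filename: str) -> bool:
        return PurePosixPath(filename).suffix.lower() == "." + ext
    return check


is_jpg, is_png = is_type("jpg"), is_type("png")
files = ["This_is_avail.jpg", "This_is_avail.PNG", "This_is_avail.txt"]

# OR: matching either filter is enough.
or_matches = [f for f in files if is_jpg(f) or is_png(f)]
# AND: the file must satisfy both filters.
and_matches = [f for f in files if contains_avail(f) and is_png(f)]

print(or_matches)
print(and_matches)
```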
Now, let’s expand on this example and add a third filter. We’ll scope this filter to folders.
Now that we’ve added a filter scoped to folders, we need to analyze the folders that files live in. The table above shows the entire file paths up to the file itself. As you can see, only the first avail_logo.png file is a match. The avail_logo.png matches all three of the filters related by AND. The file is of type png, the file’s name contains the string “avail”, and the name of its parent folder begins with the string “another”. The other two files both fail against the folder filter, as neither has a parent or ancestor folder that begins with the string “another”.
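Reading “contiguous AND relationships” as forming one grouped requirement suggests a rule where AND binds before OR, much like operator precedence in most programming languages. The Python sketch below applies that reading, which is an interpretation rather than a documented rule, to the three-filter example. The paths are hypothetical stand-ins for the table above, and `evaluate_chain` is an illustrative name.

```python
from pathlib import PurePosixPath
from typing import Callable, Sequence

Check = Callable[[str], bool]  # hypothetical: full path -> matched?


def evaluate_chain(checks: Sequence[Check], relations: Sequence[str], path: str) -> bool:
    """relations[i] ("AND"/"OR") links checks[i] to checks[i + 1]."""
    group = checks[0](path)
    for relation, check in zip(relations, checks[1:]):
        if relation == "AND":
            group = group and check(path)
        else:  # "OR" closes the current AND group and starts a new one
            if group:
                return True
            group = check(path)
    return group


def is_png(path: str) -> bool:  # file filter, File Type "png"
    return PurePosixPath(path).suffix.lower() == ".png"


def name_contains_avail(path: str) -> bool:  # file filter, Containing "avail"
    return "avail" in PurePosixPath(path).stem.lower()


def folder_begins_with_another(path: str) -> bool:  # folder filter, Begins With "another"
    return any(part.lower().startswith("another") for part in PurePosixPath(path).parent.parts)


paths = [  # hypothetical paths
    "Another_Folder/avail_logo.png",
    "Logos/avail_logo.png",
    "Archive/Old/avail_logo.png",
]
checks = [is_png, name_contains_avail, folder_begins_with_another]
for p in paths:
    print(p, evaluate_chain(checks, ["AND", "AND"], p))
```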
Filter Groups
Along with offering boolean relationships between single filters, AVAIL also offers ways to “group” filters that have been related by AND/OR expressions. These filter groups are in turn related by AND/OR expressions, the same way single filters are. A filter group takes any number of filters, matches a file against all of the filters within the group, and computes a single Match or No Match result for that file. The Match/No Match results of multiple groups are then condensed into a single Match or No Match result, depending on how the groups are related (AND or OR).
In the above example, we see two filter groups. The first filter group contains two filters for the file types .png and .jpg. The second filter group contains one filter for folders that contain the string “Marketing”. The two filter groups are related by an AND relationship. You can add new filter groups using the two buttons circled in red; the AND/OR relationship of the new filter group depends on which of these buttons you click.
Continuing with the above example, the AND relationship between the two filter groups means that, for a file to be considered a match, it has to match both the first filter group and the second filter group. In concrete terms, the file must be of file type .png or .jpg, and it must also live in a folder whose name contains “Marketing”.
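Filter groups add one more level: each group reduces its filters to a single Match or No Match, and those results are then combined using the relationships between groups. The Python sketch below models the example above with hypothetical paths from a Projects folder. The function names are illustrative, and applying the same AND-before-OR rule inside and between groups is an assumption.

```python
from pathlib import PurePosixPath
from typing import Callable, List, Sequence, Tuple

Check = Callable[[str], bool]  # hypothetical: full path -> matched?
Group = Tuple[List[Check], List[str]]  # (filters, relations between them)


def combine(values: Sequence[bool], relations: Sequence[str]) -> bool:
    # ASSUMPTION: contiguous ANDs are grouped first, then the results are OR'd.
    result, current = False, values[0]
    for relation, value in zip(relations, values[1:]):
        if relation == "AND":
            current = current and value
        else:  # "OR"
            result = result or current
            current = value
    return result or current


def group_matches(group: Group, path: str) -> bool:
    checks, relations = group
    return combine([check(path) for check in checks], relations)


def groups_match(groups: Sequence[Group], group_relations: Sequence[str], path: str) -> bool:
    # Each group yields one Match/No Match; those results are then combined.
    return combine([group_matches(g, path) for g in groups], group_relations)


def of_type(ext: str) -> Check:  # file filter, File Type
    return lambda path: PurePosixPath(path).suffix.lower() == "." + ext


def folder_contains(text: str) -> Check:  # folder filter, Containing
    return lambda path: any(text.lower() in part.lower() for part in PurePosixPath(path).parent.parts)


images: Group = ([of_type("png"), of_type("jpg")], ["OR"])
marketing: Group = ([folder_contains("Marketing")], [])
groups = [images, marketing]

paths = [  # hypothetical paths
    "Projects/A/Marketing/hero.png",
    "Projects/A/Project Files/site_photo.jpg",
    "Projects/B/Marketing/brief.pdf",
    "Projects/C/Marketing/Social/banner.JPG",
]
for p in paths:
    print(p, groups_match(groups, ["AND"], p))
```

Only the images that also sit under a Marketing folder match. Switching the group relationship to OR would instead match any image, plus any file at all under a Marketing folder.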
If you continue to the next section, we’ll look at this example (and others) with specific folder structures in mind so we can better understand how one might apply these filters to their actual content.
Advanced Filtering Examples
In this section, we’ll look at numerous different filtering techniques for different organizational folder structures. If you have a filtering example that you find useful and that you think others might benefit from, please reach out to support@getavail.com and let us know about it! We love hearing about how our customers get creative when using AVAIL!
Marketing Images Filter
This example was actually the inspiration for the introduction of Boolean relationships and filter groups to AVAIL’s filtering. Let’s take a look at the folder structure below.
We have a “Projects” folder that contains numerous different projects, denoted by the letters “A”, “B”, and “C” in this example. Within each of these project folders, there exists a folder for “Project Files” and a folder for “Marketing”.
With this folder structure in mind, let’s say we want to index all of the Marketing images from each project folder into a Marketing Images channel. To do this, we might index the all-encompassing “Projects” folder and only include .jpg or .png files (or any other relevant image format), as the filters below show.
This will index all of the project marketing images into the Marketing Images channel. However, suppose the “Projects” folder also contains other images that aren’t marketing images and don’t live in any of the “Marketing” folders. In that case, the filters above are not sufficient, and unwanted images would be indexed into the Marketing Images channel. We need to modify these filters so that they only match images that live in a folder whose name contains “Marketing”. We can do this by adding a second filter group, related by AND, that contains a folder filter for “Marketing”.
Now when content is run through these filters, the content has to match on the first filter group and be of type .png or .jpg. If the content matches on this first filter group, then the AND relationship with the second filter group means the content also has to match on it, which enforces the rule that this content must also live in a folder that contains the string “Marketing”. This will effectively eliminate all other .png and .jpg files that aren’t in the marketing folders of each of our projects, leaving us only with the wanted content.
Have more questions on filtering in AVAIL? Reach out to us at support@getavail.com.