Search Syntax
About Search Syntax
The search engine allows users to locate uploads via tags that are associated with an upload and some other metadata such as uploader and user count. It also permits chaining together specified tags and metadata to search for specific logical combinations, to allow more precise filtering. This guide explains the syntax and features of individual terms and then shows how they are combined into more complex queries.
- About Search Syntax
- Search Terms
- Special Characters and Suffixes
- Search Grammar: Term Operators and Combinations
- Boosting Terms
Search Terms
Specific searches require the inclusion of search terms, which individually define the criteria expected of each upload result to be returned by the search engine.
Tag Search Behavior
Searching a single term is obvious: merely type in the term you want. By default, the term you use will be searched among the indexed image tags and aliases. Thus, a search for
pinkie pie
would, as you may surmise, result in all appropriately tagged and indexed pictures of Pinkie Pie. Aliases are also indexed, so a search for the tag alias ts
is the same as one for
twilight sparkle
.
The default tag search has particular aspects associated with it for your convenience. For tag searches, case is insensitive. This means capitalization is irrelevant for queries. For example, the search queries
pinkie pie
and
Pinkie Pie
will share the same result set.
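The case folding and alias lookup described above can be pictured as a small normalization step. This is a minimal sketch and not the site's implementation; the `ALIASES` table and the `normalize_tag` helper are hypothetical names introduced only for illustration.

```python
# Hypothetical alias table; the real mapping lives in the site's tag database.
ALIASES = {"ts": "twilight sparkle"}


def normalize_tag(term: str) -> str:
    """Lowercase a tag term, then resolve it through the alias table."""
    key = term.strip().lower()
    return ALIASES.get(key, key)


# Both spellings normalize to the same tag, so they share a result set.
same = normalize_tag("Pinkie Pie") == normalize_tag("pinkie pie")
canonical = normalize_tag("TS")
```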
Searching Through Other Fields
Other fields are also indexed, and you can search them using the namespace convention that is also used by tags. Namely, one enters the field name followed by a colon, and finally, the target value. For example, to search for images with a width of 1920,
we would search within the
width
field and so construct the query
width:1920
. If a tag with namespace were to share the namespace with a given field, it can still be queried via quoting or escaping.
Numeric Range Queries
Numeric fields support queries for ranges of values. Append a qualifier to the field name, separated by a single period, to request results that are greater than or less than the supplied value; the supplied value itself can optionally be included in the range. To find images with a score greater than 100, we would enter
score.gt:100
. For an inclusive search of scores greater than
or equal to 100, we would instead enter
score.gte:100
. The following table enumerates the supported qualifiers.
Qualifier | Meaning | Example |
---|---|---|
gt | Values greater than specified, and not including the specified value | score.gt:100 |
gte | Values greater than or equal to specified | score.gte:100 |
lt | Values less than specified, and not including the specified value | width.lt:100 |
lte | Values less than or equal to specified | width.lte:100 |
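To make the qualifier semantics concrete, the following sketch parses a numeric term and applies it to one upload. It assumes an upload is a plain dictionary of numeric fields; `parse_numeric_term` and `term_matches` are hypothetical helpers, not part of the search engine.

```python
import operator

# Comparison applied for each qualifier; no qualifier means an exact match.
QUALIFIERS = {
    None: operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def parse_numeric_term(term: str):
    """Split a term such as 'score.gte:100' into ('score', 'gte', 100.0)."""
    selector, _, raw_value = term.partition(":")
    field, _, qualifier = selector.partition(".")
    qualifier = qualifier or None
    if qualifier not in QUALIFIERS:
        raise ValueError(f"unknown qualifier: {qualifier}")
    return field, qualifier, float(raw_value)


def term_matches(term: str, upload: dict) -> bool:
    """Check one upload (assumed to be a dict of numeric fields) against a term."""
    field, qualifier, value = parse_numeric_term(term)
    return QUALIFIERS[qualifier](upload[field], value)
```

With an upload whose score is exactly 100, `score.gt:100` does not match but `score.gte:100` does.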
Date/Time Range Queries
Date and time values are specified using a tweaked subset of the ISO 8601 standard. A full date is specified by a four-digit year, followed by a two-digit month and a two-digit day, with each value delimited by a hyphen, i.e., "YYYY-MM-DD". As in ISO 8601, one can specify just the month or even just the year, as long as the less precise information is included in left-to-right order without dangling hyphens. This is semantically interpreted as the range of the entire period (not just the first day of the month, etc.). For example,
2015-04
represents the entire month of April 2015.
Given a full date, a specification for the time of day can be added. To do so, separate the time with a
T
or space, followed by the hours, minutes, and seconds, each specified with two digits and separated by colons, i.e., "HH:MM:SS". The hours follow a 24-hour clock. As with date values, one may alternatively specify entire minutes and even hours by truncating the value without dangling colons. The value
2014-04-20 16
represents the entire hour of 4 PM on 20 April 2014 (UTC). The entire first minute of that hour can be specified with
2014-04-20 16:00
.
By default, time follows international UTC ("Zulu") time. (In terms of the ISO 8601 standard, a
Z
suffix is implied.) One may specify an offset for local time by affixing a plus or minus sign, followed by the offset hours as two digits, a colon, and the offset minutes (usually
00
), e.g.,
-04:00
for US Eastern Daylight Time (EDT). Note that unlike ISO 8601, this can be attached to dates as well as times, to ensure date boundaries fit the locale of interest. For example,
2015-05:00
represents the year of 2015 with an offset of minus five hours (US Eastern Standard Time).
Date/time range queries also accept range qualifiers. The
gt
and
lt
qualifiers omit everything matching the implied time range of the specified value, whereas
gte
and
lte
include the entirety of said time range.
The following examples are valid search queries.
Example | Explanation |
---|---|
created_at:2015 | Returns all uploads made in 2015 (UTC). |
created_at:2015+08:00 | Returns all uploads made in 2015 (SGT). |
created_at:2015-04 | Returns all uploads made in April 2015 (UTC). |
created_at:2015-04-03:00 | Returns all uploads made in April 2015 (BRT). |
created_at:2015-04-01 | Returns all uploads made on 1 April 2015 (UTC). |
created_at:2015-04-01+08:00 | Returns all uploads made on 1 April 2015 (SGT). |
created_at:2015-04-01 01 | Returns all uploads made in the hour of 1 AM on 1 April 2015 (UTC). |
created_at:2015-04-01 01Z | Returns all uploads made in the hour of 1 AM on 1 April 2015 (UTC). The zero UTC offset designator ("Zulu") is explicit. |
created_at:2015-04-01T01Z | Returns all uploads made in the hour of 1 AM on 1 April 2015 (UTC). This uses the standard "T" separator associated with ISO 8601. |
created_at:2015-04-01 01-04:00 | Returns all uploads made in the hour of 1 AM on 1 April 2015 (EDT). |
created_at:2015-04-01 01:00 | Returns all uploads made sometime in the minute of 1:00 AM on 1 April 2015 (UTC). |
created_at:2015-04-01 01:00Z | Returns all uploads made sometime in the minute of 1:00 AM on 1 April 2015 (UTC). The zero UTC offset designator ("Zulu") is explicit. |
created_at:2015-04-01 00:00:00 | Returns all uploads made exactly at midnight on 1 April 2015 (UTC). |
created_at:2015-04-01 00:00:00+08:00 | Returns all uploads made exactly at midnight on 1 April 2015 (SGT). |
created_at.lt:2015 | Returns all uploads before the start of 2015 (UTC). |
created_at.gte:2015-04-04 | Returns all uploads since and including the entire day of 4 April 2015 (season 5 premiere, UTC). |
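The way a truncated date expands into a time range can be sketched as follows. This is an illustrative reading of the rules above, not the engine's actual parser: the regular expression, `date_range`, and `created_at_matches` are hypothetical, and timestamps are assumed to be timezone-aware `datetime` objects.

```python
import re
from datetime import datetime, timedelta, timezone

# Year, then optional month, day, hour, minute, second, then an optional offset.
PATTERN = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:[T ](\d{2})(?::(\d{2})(?::(\d{2}))?)?)?)?)?"
    r"(Z|[+-]\d{2}:\d{2})?$"
)


def parse_offset(text):
    """Turn None, 'Z', or an offset such as '+08:00' into a tzinfo (UTC by default)."""
    if text in (None, "Z"):
        return timezone.utc
    sign = -1 if text[0] == "-" else 1
    return timezone(sign * timedelta(hours=int(text[1:3]), minutes=int(text[4:6])))


def date_range(value):
    """Return the half-open UTC interval [start, end) that the value covers."""
    match = PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognized date/time: {value!r}")
    *parts, offset = match.groups()
    year, month, day, hour, minute, second = (
        int(p) if p is not None else None for p in parts
    )
    start = datetime(year, month or 1, day or 1, hour or 0, minute or 0,
                     second or 0, tzinfo=parse_offset(offset))
    # The first missing component decides how long the implied period is.
    if month is None:
        end = start.replace(year=year + 1)
    elif day is None:
        end = start.replace(year=year + month // 12, month=month % 12 + 1)
    elif hour is None:
        end = start + timedelta(days=1)
    elif minute is None:
        end = start + timedelta(hours=1)
    elif second is None:
        end = start + timedelta(minutes=1)
    else:
        end = start + timedelta(seconds=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


def created_at_matches(qualifier, value, created_at):
    """Apply the qualifier rules: gt/lt exclude the period, gte/lte include it."""
    start, end = date_range(value)
    if qualifier == "gt":
        return created_at >= end
    if qualifier == "gte":
        return created_at >= start
    if qualifier == "lt":
        return created_at < start
    if qualifier == "lte":
        return created_at < end
    return start <= created_at < end
```

Under these assumptions, `date_range("2015-04-03:00")` covers April 2015 shifted by three hours, because the trailing `-03:00` is read as an offset rather than as a day.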
Supported Fields
The following table enumerates all of the supported fields, with examples.
Field Selector | Type | Description | Example |
---|---|---|---|
aspect_ratio | Numeric Range | Matches any image with the specified aspect ratio. | aspect_ratio:1 |
comment_count | Numeric Range | Matches any image with the specified number of comments | comment_count.gt:50 |
created_at | Date/Time Range | Matches any image posted at the specified date and/or time. | created_at:2015-04-01 |
description | Full Text | Full-text search against image descriptions with the specified string. | description:derp |
downvotes | Numeric Range | Matches any image with the specified downvote count. | downvotes:0 |
faved_by | Literal | Matches any image favorited by the specified user. Case-insensitive. | faved_by:roboshi |
faves | Numeric Range | Matches any image with the specified number of favorites. | faves:20 |
height | Numeric Range | Matches any image with the specified height. | height:1080 |
id | Numeric Range | Matches any image with the specified number. | id:111111 |
mime_type | Literal | Returns images with the specified IANA media type. | mime_type:video/webm |
orig_sha512_hash | Literal | Matches the original SHA-512 checksum of an uploaded image. | |
original_format | Literal | Returns images with the specified image format. | original_format:png |
score | Numeric Range | Matches any image with the specified net score. | score.gt:200 |
sha512_hash | Literal | Matches any image with the specified SHA-512 checksum. N.B.: Image optimization usually alters the original checksum! | |
source_count | Numeric Range | Matches any image with the specified number of sources. | source_count:3 |
source_url | Literal | Matches image source URLs. Case-insensitive. | source_url:*deviantart.com* |
tag_count | Numeric Range | Matches any image with the specified number of tags | tag_count.gt:10 |
uploader | Literal | Matches any image with the specified uploader account. Case-insensitive. | uploader:k_a |
upvotes | Numeric Range | Matches any image with the specified upvote count. | upvotes.gt:200 |
width | Numeric Range | Matches any image with the specified width. | width:1920 |
wilson_score | Numeric Range | Matches any image with the specified lower bound of a 99.5% Wilson CI. | wilson_score.gt:0.9 |
It is worth noting the absence of certain “fields” such as
artist
and
spoiler
. These are
tag namespaces, not metadata, but they are functionally the same. Thus, a search for
spoiler:s04
performs as expected.
Tag Categories
Additionally, you can search by the number of tags an image has in specific tag categories. The supported category count fields are body_type_tag_count, error_tag_count, character_tag_count, content_fanmade_tag_count, content_official_tag_count, oc_tag_count, rating_tag_count, species_tag_count, and spoiler_tag_count. For example, origin_tag_count:2
Note that tag categories may not return the exact results you expect. With the example above, tags like alternate version or edit are currently under the origin_tag category.
Special Characters and Suffixes
Wildcards
Wildcards allow for matching with terms that begin with, end with, or contain a given string of characters, like wildcards used in file management. Two wildcards are recognized: the asterisk (or star) and the question mark.
An asterisk "expands" or matches to any number of characters in its place, including 0. For example,
apple*
matches to uploads with any of the tags
apple bloom
,
applejack
, and simply
apple
.
A question mark matches to a single character in its place. For example,
t?ixie
can match to either
trixie
or
twixie
.
Wildcard Character | Match |
---|---|
* | Zero or more characters |
? | A single character |
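One way to picture wildcard matching is as a translation into an anchored regular expression. The sketch below is an assumption-laden illustration (`wildcard_to_regex` and `wildcard_matches` are hypothetical helpers); it ignores backslash-escaped wildcards and says nothing about how the backend actually evaluates them.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * and ? wildcards into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # zero or more characters
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    # Tag searches are case-insensitive, so the pattern is too.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def wildcard_matches(pattern: str, tag: str) -> bool:
    """True if the whole tag matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(tag) is not None
```

Here `wildcard_matches("apple*", "apple")` holds because the asterisk may match zero characters, while `wildcard_matches("t?ixie", "tixie")` does not because the question mark needs exactly one character.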
Escaping Special Characters
The use of special characters that modify search terms or exist outside search terms mandates a facility for “escaping” those characters, so that they are not excluded from search terms themselves. To use special characters within a search term, both of the conventional string escaping mechanisms are used: the backslash and quoting. The following are special characters and sequences that may need to be escaped:
- (
- )
- *
- ?
- - (when placed in front of a term)
- ! (when placed in front of a term)
- ,
- &&
- ||
- OR (if all-capitalized)
- AND (if all-capitalized)
- NOT (if all-capitalized)
- "
- \
- ~ (with fuzzy matching syntax)
- ^ (with boosting)
A backslash is placed in front of a special character (or in front of a sequence such as those in the preceding list). This forces the character to be counted as part of the preceding or following term. In front of any other character, it has no effect. For example,
\-_-
forces a search for the emoticon
-_-
, which without the backslash would follow the syntax for negation. Also consider the search term
rose \(flower\)
, although parentheses have intuitive rules that do not make escaping them necessary in most cases. The backslash is a special character and thus must also be escaped; a literal
backslash is indicated with \\
.
The alternative to escaping is to simply surround the search query in double quotes ("
), e.g.,
"rose (flower)"
. When searching with a specified field, quotes
must surround the field and colon as well, e.g.,
"width:1920"
. Everything in quotes is treated together as a verbatim search term, with one exception: the double quote character itself bounds the search term, so if it appears inside, it must be escaped with a backslash.
All other uses of backslash are treated literally.
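When queries are built by a script, the backslash rules can be applied mechanically. The following sketch is a deliberately conservative, hypothetical helper (`escape_term` is not part of the site): it escapes every listed character even where the parser would not require it, such as balanced parentheses, and it does not handle the all-capitalized words AND, OR, and NOT.

```python
# Characters from the list above that are escaped wherever they appear.
ALWAYS_SPECIAL = set('()*?,"\\~^')
# Two-character operator sequences, escaped with one leading backslash.
SEQUENCES = ("&&", "||")


def escape_term(term: str) -> str:
    """Backslash-escape special characters so the term is searched verbatim."""
    out = []
    i = 0
    while i < len(term):
        ch = term[i]
        if term.startswith(SEQUENCES, i):
            out.append("\\" + term[i:i + 2])
            i += 2
            continue
        # A minus sign or exclamation point is only special in front of a term.
        if ch in ALWAYS_SPECIAL or (i == 0 and ch in "-!"):
            out.append("\\" + ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

Applied to the two examples from the text, the helper reproduces the escaped forms shown there.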
Approximate (Fuzzy) String Matching
The search engine backend, Apache Lucene, also enables so-called “fuzzy” string matching. Fuzzy string matching can be used with any literal search term, including the default tags field. To request a fuzzy match, follow the term with a tilde and either a whole-number edit distance or a similarity factor ranging from 0 to 1.0.
The whole number specifies an optimal string alignment edit distance, which is the maximum number of edits applied to a string to match a given target, with an edit defined as a deletion, an insertion, a replacement, or a swap of two adjacent characters. A similarity factor ranges from 0 to 1.0, with 1.0 the least “fuzzy”. The derived edit distance is the length of the term (without the field name prefix) multiplied by one minus the similarity factor, rounded down. Note that in both cases Lucene caps the maximum edit distance at 2 as an optimization. Therefore, very large edit distances or very small similarities will not behave as expected.
For example,
fluttersho~0.8
searches for uploads with tags that approximately match
fluttersho
, with a similarity of 0.8. This is an edit distance of ⌊(1 − 0.8)(10)⌋ = 2. Note that uploads tagged
fluttershy
are included in the result set. The utility of this is obvious: if you are unsure of a character or tag's exact spelling, you can use this as an aid, like a more manual and controlled version of Google's (in)famous spelling correction
features.
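The arithmetic behind the derived edit distance can be checked with a short sketch. It assumes, as a labeled guess, that a fuzz value below 1 is read as a similarity factor and anything else as a whole-number edit distance; `effective_edit_distance` is a hypothetical helper, not a Lucene API.

```python
from decimal import Decimal
from math import floor

MAX_EDIT_DISTANCE = 2  # cap described in the text above


def effective_edit_distance(term: str, fuzz: str) -> int:
    """Edit distance implied by 'term~fuzz'; term excludes any field prefix."""
    value = Decimal(fuzz)
    if value < 1:
        # Similarity factor: floor(length * (1 - similarity)).
        distance = floor(len(term) * (1 - value))
    else:
        # Whole number: used directly as the edit distance.
        distance = int(value)
    return min(distance, MAX_EDIT_DISTANCE)
```

`Decimal` is used here instead of `float` because binary floating point evaluates (1 − 0.8) × 10 to slightly less than 2, which would round down to 1 and disagree with the worked example.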
Fuzziness can also be applied to numeric queries to specify a range. In this case, the fuzziness parameter is the magnitude above and below the specified number that will be included in the result set. For example,
width:800~200
specifies images with a width ranging from 600 (800 − 200) to 1000 (800 + 200), inclusive.
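The numeric case reduces to simple interval arithmetic, sketched below. The helper name `fuzzy_numeric_range` is hypothetical, and the parsing is intentionally minimal (no qualifiers, no validation beyond `float`).

```python
def fuzzy_numeric_range(term: str):
    """Split 'width:800~200' into ('width', 600.0, 1000.0), bounds inclusive."""
    selector, _, rest = term.partition(":")
    center_text, _, fuzz_text = rest.partition("~")
    center = float(center_text)
    fuzz = float(fuzz_text) if fuzz_text else 0.0
    return selector, center - fuzz, center + fuzz
```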
Fuzzy matching can be freely applied to any term inside an
expression
.
Search Grammar: Term Operators and Combinations
Expressions
Terms can be combined to define a search query corresponding to a specific result set. These combinations are formalized as expressions that are constructed from terms, operators, and even other expressions, which are then called subexpressions. Expressions recognized by the search frontend are the negation of a term or subexpression, the requirement that either of two terms or subexpressions match, and the requirement that both of them match.
At its core, a search expression is either binary or unary. A binary expression consists of a term or subexpression, an operator indicating the type of expression, and another term or subexpression. Binary expressions can be “chained” by adding the operator followed by another term. A unary expression consists of the operator followed by a single term or subexpression. Both expression types and how to use subexpressions will be covered in the following sections.
Summary Table
Operator | Symbols | Comments |
---|---|---|
Negation (NOT) | NOT, - (minus sign), or ! | Applied in front of a single term or parenthesized subexpression. The minus sign does not require padding to the right. Specifies that the term or subexpression must not match. |
Conjunction (AND) | AND, &&, or a comma (,) | Applied between two terms. The comma may be optionally padded with space on either side; the other forms must be padded. Specifies that both terms match. Can be chained to more terms. |
Disjunction (OR) | OR or \|\| | Applied between two terms, with surrounding space. Specifies that either of the terms match. Can be chained to more terms. |
Negation
Negation of a term or expression specifies that the original term or subexpression
must not match. The corresponding negation operator is
unary, that is, applied to either a single term or to a subexpression. It is specified with the all-capitalized word
NOT
, a dash of the non-multi-chromatic variety (-
), or an exclamation point (!
). For example,
-fluttershy
or NOT fluttershy
matches pictures that are
not tagged with
fluttershy
. In set theory terms, this is taking the
complement of the original result set, that is, all uploads outside it.
Commas and AND Expressions
An expression that queries for images that meet
all specified terms is a
conjunction or
AND expression. As in the past, you can query images that meet a list of terms by hooking the terms together with commas. For example,
fluttershy,pinkie pie
results in pictures that contain
both the
fluttershy
and
pinkie pie
tags. In set theory terms, the result set is the intersection of uploads tagged
fluttershy
and uploads tagged
pinkie pie
.
Commas can be padded with spaces however you like. Unlike the past, commas are now plain AND operators, so they are more versatile. As will be discussed, they can be used in subexpressions and alongside the OR operator.
AND operators can also be expressed using
&&
(derived from typical programming notation) or the all-capitalized word
AND
, e.g.,
rarity && pinkie pie
or
rarity AND pinkie pie
. These forms, unlike the comma, require padding space on either side.
OR Expressions
A
disjunction or
OR expression requests uploads that meet
any of the specified search terms. This is markedly different from the aforementioned AND expression, which, to reiterate, mandates that
all terms match. OR operators are expressed either with
||
(also a programming notation) or the all-capitalized word
OR
, e.g.,
rarity || pinkie pie
or
rarity OR pinkie pie
. In set theory terms, the result set is the union of uploads tagged
rarity
and uploads tagged
pinkie pie
. All forms of the OR operator require padding on either side.
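The set-theory reading of the three operators maps directly onto Python's set operations. The tiny `UPLOADS` index and the `tagged` helper below are made up for illustration and do not reflect how the site stores uploads.

```python
# Hypothetical index: upload id -> set of tags.
UPLOADS = {
    1: {"rarity"},
    2: {"pinkie pie"},
    3: {"rarity", "pinkie pie"},
    4: {"fluttershy"},
}


def tagged(tag):
    """Return the ids of all uploads carrying the given tag."""
    return {uid for uid, tags in UPLOADS.items() if tag in tags}


everything = set(UPLOADS)
both = tagged("rarity") & tagged("pinkie pie")      # rarity, pinkie pie (intersection)
either = tagged("rarity") | tagged("pinkie pie")    # rarity || pinkie pie (union)
not_fluttershy = everything - tagged("fluttershy")  # -fluttershy (complement)
```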
Compound Expressions
Complex combinations of terms, and therefore search criteria, are possible by combining expressions together. Doing so effectively is analogous to arithmetic. Consider multiplication and addition (which in so-called Boolean algebra are respectively analogous to AND and OR operations). We can write an algebraic expression with multiplication and addition in several ways. For three terms, A, B, and C, consider the expression A × B + C. Multiplication is evaluated before addition, so this expression is equivalent to (A × B) + C, in which case the order of operations is explicit.
Operator Precedence
Likewise, precedence is applied to determine the order in which chained OR, AND, and NOT operations are evaluated. The order of operations in the search syntax is as follows:
- negation (NOT)
- conjunction (AND)
- disjunction (OR)
Consider the query
twilight sparkle || fluttershy && pinkie pie
. In this example,
fluttershy && pinkie pie
is evaluated first, as an implicit
subexpression. Then, that result is OR'd together with
twilight sparkle
. Thus, the query instructs the engine to return uploads
either tagged with
twilight sparkle
or tagged with
both
fluttershy
and
pinkie pie
. Note how if the OR expression
twilight sparkle || fluttershy
were evaluated first, the result set would differ.
Defining Subexpressions with Parentheses
Returning to an earlier example with arithmetic, we can trump the order of operations using explicit subexpressions. This requires the use of
delimiters that act as boundaries, and most often parentheses are used for this purpose. Hence,
A × (B + C) forces B + C to be evaluated and then multiplied with A, which is contrary to the order otherwise followed. Likewise,
(twilight sparkle || fluttershy) && pinkie pie
instructs the search engine to return results that have
either
twilight sparkle
or
fluttershy
and always match the tag
pinkie pie
.
As was mentioned earlier, the unary NOT operator can be applied to parenthesized subexpressions. The semantics of this is analogous to applying it to a single term: a negated subexpression specifies uploads that
do not adhere to what the subexpression specifies. For example, the query
-(pinkamena diane pie, grimdark)
returns all uploads that are
not tagged with
both
pinkamena diane pie
and
grimdark
. Uploads tagged with
either of the two would be returned as long as they do not have both. Thus light-hearted Pinkamena images and grimdark material not involving Pinkamena would be included, yet the intersection of those two sets of images would be excluded, that
is, images that are grimdark and feature Pinkamena.
Explicit subexpressions with parentheses allow for complex queries as they can be arbitrarily nested inside other subexpressions, to fine-tune the result set even more.
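The precedence rules and parenthesized subexpressions can be illustrated with a toy recursive-descent parser. This is a simplified sketch, not the site's parser: it ignores quoting, escaping, wildcards, field selectors, and parentheses inside tag names, and all function names are hypothetical.

```python
import re

# Operators that separate terms; word and symbol forms must be space-padded.
SPLIT = re.compile(r"(\(|\)|,| && | \|\| | AND | OR )")
NORMALIZE = {",": "AND", "&&": "AND", "||": "OR"}


def tokenize(query):
    tokens = []
    for piece in SPLIT.split(query):
        piece = piece.strip()
        while piece:
            if piece[0] in "-!":  # prefix negation
                tokens.append("NOT")
                piece = piece[1:].lstrip()
            elif piece == "NOT" or piece.startswith("NOT "):
                tokens.append("NOT")
                piece = piece[3:].lstrip()
            else:
                tokens.append(NORMALIZE.get(piece, piece))
                break
    return tokens


def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() == "OR":
            take()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "AND":
            take()
            node = ("AND", node, parse_not())
        return node

    def parse_not():  # highest precedence
        token = take()
        if token == "NOT":
            return ("NOT", parse_not())
        if token == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        if token in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token: {token!r}")
        return ("TERM", token.lower())

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return tree


def matches(node, tags):
    """Evaluate a parsed query against one upload's set of lowercase tags."""
    if node[0] == "TERM":
        return node[1] in tags
    if node[0] == "NOT":
        return not matches(node[1], tags)
    if node[0] == "AND":
        return matches(node[1], tags) and matches(node[2], tags)
    return matches(node[1], tags) or matches(node[2], tags)
```

Because `parse_or` calls `parse_and`, which calls `parse_not`, the tighter-binding operators are grouped first, and a parenthesized subexpression restarts at the lowest-precedence level.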
Automatic Parentheses Escaping
Finally, a footnote about parentheses is warranted. Traditionally, if an expression parser encounters an open parenthesis without a closing parenthesis, or if parentheses are swapped, an error is raised. This is indeed the case with the search engine,
as highlighted in the search parsing error page. However, to a limited extent, a term can contain parentheses within. Parentheses are accepted within search terms as long as they are closed and do not cover the entire expression. The first limit is
a heuristic to address the typical use of parentheses, and the latter arises from the legal use of parentheses to single out a term. Thus, the search
rose (flower)
searches for uploads tagged with
rose (flower)
; however, the emoticon query
))B-(
raises an error, while
(q)
effectively searches for
q
, instead. For the latter two examples, simply surround with double quotes to clarify your meaning to the search engine.
Boosting Terms
The search engine also allows the boosting of specific terms when sorting by relevance, so that uploads including or not including the term occur earlier or later in the results. Boosting is done by modifying a term's relevance score with a positive or
negative value. This value is affixed to a term with a preceding caret (^
) followed by a positive or negative decimal number. For example,
pinkie pie^1 || tara strong
returns uploads tagged either with
pinkie pie
or
tara strong
, but when sorting by relevance descending, uploads with
pinkie pie
are prioritized. A negative value meanwhile reduces the relevance score and deprioritizes the affected term when sorting by relevance, e.g.,
pinkie pie^-1 || tara strong
. Sorting options are found below the search box on this page and
must be set to sort by relevance for boosting to take proper effect. Thus, in both cases, pictures with
both tags will still appear first.