Syntax Highlighting is what makes the editor automatically display text in different styles/colors, depending on the function of the string in relation to the purpose of the file. In program source code for example, control statements may be rendered bold, while data types and comments get different colors from the rest of the text. This greatly enhances the readability of the text, and thus helps the author to be more efficient and productive.
Of the two examples, which is easiest to read?
KatePart comes with a flexible, configurable and capable system for doing syntax highlighting, and the standard distribution provides definitions for a wide range of programming, scripting and markup languages and other text file formats. In addition you can provide your own definitions in simple XML files.
KatePart will automatically detect the right syntax rules when you open a file, based on the MIME Type of the file, determined by its extension, or, if it has none, the contents. Should you experience a bad choice, you can manually set the syntax to use from the → menu.
The styles and colors used by each syntax highlight definition can be configured using the Highlighting Text Styles tab of the Config Dialog, while the MIME Types and file extensions it should be used for are handled by the Modes & Filetypes tab.
Note
Syntax highlighting is there to enhance the readability of correct text, but you cannot trust it to validate your text. Marking text for syntax is difficult depending on the format you are using, and in some cases the authors of the syntax rules will be proud if 98% of text gets correctly rendered, though most often you need a rare style to see the incorrect 2%.
This section will discuss the KatePart syntax highlighting mechanism in more detail. It is for you if you want to know about it, or if you want to change or create syntax definitions.
Whenever you open a file, one of the first things the KatePart editor does is detect which syntax definition to use for the file. While reading the text of the file, and while you type away in it, the syntax highlighting system will analyze the text using the rules defined by the syntax definition and mark in it where different contexts and styles begin and end.
When you type in the document, the new text is analyzed and marked on the fly, so that if you delete a character that is marked as the beginning or end of a context, the style of surrounding text changes accordingly.
The syntax definitions used by the KatePart Syntax Highlighting System are XML files, containing
Rules for detecting the role of text, organized into context blocks
Keyword lists
Style Item definitions
When analyzing the text, the detection rules are evaluated in the order in which they are defined, and if the beginning of the current string matches a rule, the related context is used. The start point in the text is moved to the final point at which that rule matched and a new loop of the rules begins, starting in the context set by the matched rule.
The detection rules are the heart of the highlighting detection system. A rule is a string, character or regular expression against which to match the text being analyzed. It contains information about which style to use for the matching part of the text. It may switch the working context of the system either to an explicitly mentioned context or to the previous context used by the text.
Rules are organized in context groups. A context group is used for main text concepts within the format, for example quoted text strings or comment blocks in program source code. This ensures that the highlighting system does not need to loop through all rules when it is not necessary, and that some character sequences in the text can be treated differently depending on the current context.
Contexts may be generated dynamically to allow the usage of instance specific data in rules.
In some programming languages, integer numbers are treated differently from floating point ones by the compiler (the program that converts the source code to a binary executable), and there may be characters having a special meaning within a quoted string. In such cases, it makes sense to render them differently from the surroundings so that they are easy to identify while reading the text. So even if they do not represent special contexts, they may be seen as such by the syntax highlighting system, so that they can be marked for different rendering.
A syntax definition may contain as many styles as required to cover the concepts of the format it is used for.
In many formats, there are lists of words that represent a specific concept. For example, in programming languages, control statements are one concept, data type names another, and built in functions of the language a third. The KatePart Syntax Highlighting System can use such lists to detect and mark words in the text to emphasize concepts of the text formats.
If you open a C++ source file, a Java™ source file and an HTML document in KatePart, you will see that even though the formats are different, and thus different words are chosen for special treatment, the colors used are the same. This is because KatePart has a predefined list of Default Styles which are employed by the individual syntax definitions.
This makes it easy to recognize similar concepts in different text formats. For example, comments are present in almost any programming, scripting or markup language, and when they are rendered using the same style in all languages, you do not have to stop and think to identify them within the text.
Tip
All styles in a syntax definition use one of the default styles. A few syntax definitions use more styles than there are defaults, so if you use a format often, it may be worth launching the configuration dialog to see if some concepts use the same style. For example, there is only one default style for strings, but as the Perl programming language operates with two types of strings, you can enhance the highlighting by configuring those to be slightly different. All available default styles will be explained later.
KatePart uses the Syntax-Highlighting framework from KDE Frameworks™. The default highlighting XML files shipped with KatePart are compiled into the Syntax-Highlighting library by default.
This section is an overview of the Highlight Definition XML format. Based on a small example it will describe the main components and their meaning and usage. The next section will go into detail with the highlight detection rules.
The formal definition, also known as the XSD you
find in Syntax
Highlighting repository in the file language.xsd
Custom .xml
highlight definition files are
located in org.kde.syntax-highlighting/syntax/
in
your user folder found with qtpaths
which usually are
--paths
GenericDataLocation
and $HOME
/.local/share//usr/share/
.
In Flatpak and Snap packages, the above directory will not work
as the data location is different for each application.
In a Flatpak application, the location of custom XML files is usually
and in a Snap application that location is
$HOME
/.var/app/flatpak-package-name
/data/org.kde.syntax-highlighting/syntax/
.
$HOME
/snap/snap-package-name
/current/.local/share/org.kde.syntax-highlighting/syntax/
On Windows® these files are located %USERPROFILE%\AppData\Local\org.kde.syntax-highlighting\syntax
.
%USERPROFILE%
usually expands to C:\Users\
.user
In summary, for most configurations the directory of custom XML files is as follows:
For local user |
|
For all users | /usr/share/org.kde.syntax-highlighting/syntax/ |
For Flatpak packages |
|
For Snap packages |
|
On Windows® | %USERPROFILE%\AppData\Local\org.kde.syntax-highlighting\syntax |
If multiple files exist for the same language, the file with the highest version
attribute in the language
element will be loaded.
Main sections of KatePart Highlight Definition files
- A highlighting file contains a header that sets the XML version:
<?xml version="1.0" encoding="UTF-8"?>
- The root of the definition file is the element
language
. Available attributes are: Required attributes:
name
sets the name of the language. It appears in the menus and dialogs afterwards.section
specifies the category.extensions
defines file extensions, such as "*.cpp;*.h"version
specifies the current revision of the definition file in terms of an integer number. Whenever you change a highlighting definition file, make sure to increase this number.kateversion
specifies the latest supported KatePart version.Optional attributes:
mimetype
associates files MIME type.casesensitive
defines, whether the keywords are case sensitive or not.priority
is necessary if another highlight definition file uses the same extensions. The higher priority will win.author
contains the name of the author and his email-address.license
contains the license, usually the MIT license for new syntax-highlighting files.style
contains the provided language and is used by the indenters for the attributerequired-syntax-style
.indenter
defines which indenter will be used by default. Available indenters are: ada, normal, cstyle, cmake, haskell, latex, lilypond, lisp, lua, pascal, python, replicode, ruby and xml.hidden
defines whether the name should appear in KatePart's menus.So the next line may look like this:
<language name="C++" version="1" kateversion="2.4" section="Sources" extensions="*.cpp;*.h" />
- Next comes the
highlighting
element, which contains the optional elementlist
and the required elementscontexts
anditemDatas
. list
elements contain a list of keywords. In this case the keywords are class and const. You can add as many lists as you need.Since KDE Frameworks™ 5.53, a list can include keywords from another list or language/file, using the
include
element.##
is used to separate the list name and the language definition name, in the same way as in theIncludeRules
rule. This is useful to avoid duplicating keyword lists, if you need to include the keywords of another language/file. For example, the othername list contains the str keyword and all the keywords of the types list, which belongs to the ISO C++ language.The
contexts
element contains all contexts. The first context is by default the start of the highlighting. There are two rules in the context Normal Text, which match the list of keywords with the name somename and a rule that detects a quote and switches the context to string. To learn more about rules read the next chapter.The third part is the
itemDatas
element. It contains all color and font styles needed by the contexts and rules. In this example, theitemData
Normal Text, String and Keyword are used.<highlighting> <list name="somename"> <item>class</item> <item>const</item> </list> <list name="othername"> <item>str</item> <include>types##ISO C++</include> </list> <contexts> <context attribute="Normal Text" lineEndContext="#pop" name="Normal Text" > <keyword attribute="Keyword" context="#stay" String="somename" /> <keyword attribute="Keyword" context="#stay" String="othername" /> <DetectChar attribute="String" context="string" char=""" /> </context> <context attribute="String" lineEndContext="#stay" name="string" > <DetectChar attribute="String" context="#pop" char=""" /> </context> </contexts> <itemDatas> <itemData name="Normal Text" defStyleNum="dsNormal" /> <itemData name="Keyword" defStyleNum="dsKeyword" /> <itemData name="String" defStyleNum="dsString" /> </itemDatas> </highlighting>
- The last part of a highlight definition is the optional
general
section. It may contain information about keywords, code folding, comments, indentation, empty lines and spell checking. The
comment
section defines with what string a single line comment is introduced. You also can define a multiline comment using multiLine with the additional attribute end. This is used if the user presses the corresponding shortcut for comment/uncomment.The
keywords
section defines whether keyword lists are case sensitive or not. Other attributes will be explained later.The other sections,
folding
,emptyLines
andspellchecking
, are usually not necessary and are explained later.<general> <comments> <comment name="singleLine" start="#"/> <comment name="multiLine" start="###" end="###" region="CommentFolding"/> </comments> <keywords casesensitive="1"/> <folding indentationsensitive="0"/> <emptyLines> <emptyLine regexpr="\s+"/> <emptyLine regexpr="\s*#.*"/> </emptyLines> <spellchecking> <encoding char="á" string="\'a"/> <encoding char="à" string="\`a"/> </spellchecking> </general> </language>
This part will describe all available attributes for contexts, itemDatas, keywords, comments, code folding and indentation.
- The element
context
belongs in the groupcontexts
. A context itself defines context specific rules such as what should happen if the highlight system reaches the end of a line. Available attributes are: name
states the context name. Rules will use this name to specify the context to switch to if the rule matches.lineEndContext
defines the context the highlight system switches to if it reaches the end of a line. This may either be a name of another context,#stay
to not switch the context (e.g.. do nothing) or#pop
which will cause it to leave this context. It is possible to use for example#pop#pop#pop
to pop three times, or even#pop#pop!OtherContext
to pop two times and switch to the context namedOtherContext
. It is also possible to switch to a context that belongs to another language definition, in the same way as in theIncludeRules
rules, e.g.,SomeContext##JavaScript
. Note that it is not possible to use this context switch in combination with#pop
, for example,#pop!SomeContext##JavaScript
is not valid. Context switches are also described in the section called “Highlight Detection Rules”.lineEmptyContext
defines the context if an empty line is encountered. The nomenclature of context switches is the same as previously described in lineEndContext. Default: #stay.fallthroughContext
specifies the next context to switch to if no rule matches. The nomenclature of context switches is the same as previously described in lineEndContext. Default: #stay.fallthrough
defines if the highlight system switches to the context specified infallthroughContext
if no rule matches. Note that since KDE Frameworks™ 5.62 this attribute is deprecated in favor offallthroughContext
, since if thefallthroughContext
attribute is present it is implicitly understood that the value offallthrough
is true. Default: false.noIndentationBasedFolding
disables indentation-based folding in the context. If indentation-based folding is not activated, this attribute is useless. This is defined in the element folding of the group general. Default: false.- The element
itemData
is in the groupitemDatas
. It defines the font style and colors. So it is possible to define your own styles and colors. However, we recommend you stick to the default styles if possible so that the user will always see the same colors used in different languages. Though, sometimes there is no other way and it is necessary to change color and font attributes. The attributes name and defStyleNum are required, the others are optional. Available attributes are: name
sets the name of the itemData. Contexts and rules will use this name in their attribute attribute to reference an itemData.defStyleNum
defines which default style to use. Available default styles are explained in detail later.color
defines a color. Valid formats are '#rrggbb' or '#rgb'.selColor
defines the selection color.italic
if true, the text will be italic.bold
if true, the text will be bold.underline
if true, the text will be underlined.strikeout
if true, the text will be struck out.spellChecking
if true, the text will be spellchecked.- The element
keywords
in the groupgeneral
defines keyword properties. Available attributes are: casesensitive
may be true or false. If true, all keywords are matched case sensitively.weakDeliminator
is a list of characters that do not act as word delimiters. For example, the dot'.'
is a word delimiter. Assume a keyword in alist
contains a dot, it will only match if you specify the dot as a weak delimiter.additionalDeliminator
defines additional delimiters.wordWrapDeliminator
defines characters after which a line wrap may occur.Default delimiters and word wrap delimiters are the characters
.():!+,-<=>%&*/;?[]^{|}~\
, space (' '
) and tabulator ('\t'
).- The element
comment
in the groupcomments
defines comment properties which are used for → , → and → . Available attributes are: name
is either singleLine or multiLine. If you choose multiLine the attributes end and region are required. If you choose singleLine you can add the optional attribute position.start
defines the string used to start a comment. In C++ this would be "/*" in multiline comments. This attribute is required for types multiLine and singleLine.end
defines the string used to close a comment. In C++ this would be "*/". This attribute is only available and is required for comments of type multiLine.region
should be the name of the foldable multiline comment. Assume you have beginRegion="Comment" ... endRegion="Comment" in your rules, you should use region="Comment". This way uncomment works even if you do not select all the text of the multiline comment. The cursor only must be in the multiline comment. This attribute is only available for type multiLine.position
defines where the single line comment is inserted. By default, the single line comment is placed at the beginning of the line at column 0, but if you use position="afterwhitespace" the comment is inserted after leading whitespaces right, before the first non-whitespace character. This is useful for putting comments correctly in languages where indentation is important, such as Python or YAML. This attribute is optional and the only possible value is afterwhitespace. This is only available for type singleLine.- The element
folding
in the groupgeneral
defines code folding properties. Available attributes are: indentationsensitive
if true, the code folding markers will be added indentation based, as in the scripting language Python. Usually you do not need to set it, as it defaults to false.- The element
emptyLine
in the groupemptyLines
defines which lines should be treated as empty lines. This allows modifying the behavior of the lineEmptyContext attribute in the elementscontext
. Available attributes are: regexpr
defines a regular expression that will be treated as an empty line. By default, empty lines do not contain any characters, therefore, this adds additional empty lines, for example, if you want lines with spaces to also be considered empty lines. However, in most syntax definitions you do not need to set this attribute.- The element
encoding
in the groupspellchecking
defines a character encoding for spell checking. Available attributes: char
is a encoded character.string
is a sequence of characters that will be encoded as the character char in the spell checking. For example, in the language LaTeX, the string\"{A}
represents the characterÄ
.
Default Styles were already explained, as a short summary: Default styles are predefined font and color styles.
- General default styles:
dsNormal
, when no special highlighting is required.dsKeyword
, built-in language keywords.dsFunction
, function calls and definitions.dsVariable
, if applicable: variable names (e.g. $someVar in PHP/Perl).dsControlFlow
, control flow keywords like if, else, switch, break, return, yield, ...dsOperator
, operators like + - * / :: < >dsBuiltIn
, built-in functions, classes, and objects.dsExtension
, common extensions, such as Qt™ classes and functions/macros in C++ and Python.dsPreprocessor
, preprocessor statements or macro definitions.dsAttribute
, annotations such as @override and __declspec(...).- String-related default styles:
dsChar
, single characters, such as 'x'.dsSpecialChar
, chars with special meaning in strings such as escapes, substitutions, or regex operators.dsString
, strings like "hello world".dsVerbatimString
, verbatim or raw strings like 'raw \backlash' in Perl, CoffeeScript, and shells, as well as r'\raw' in Python.dsSpecialString
, SQL, regexes, HERE docs, LATEX math mode, ...dsImport
, import, include, require of modules.- Number-related default styles:
dsDataType
, built-in data types like int, void, u64.dsDecVal
, decimal values.dsBaseN
, values with a base other than 10.dsFloat
, floating point values.dsConstant
, built-in and user defined constants like PI.- Comment and documentation-related default styles:
dsComment
, comments.dsDocumentation
, /** Documentation comments */ or """docstrings""".dsAnnotation
, documentation commands like @param, @brief.dsCommentVar
, the variable names used in above commands, like "foobar" in @param foobar.dsRegionMarker
, region markers like //BEGIN, //END in comments.- Other default styles:
dsInformation
, notes and tips like @note in doxygen.dsWarning
, warnings like @warning in doxygen.dsAlert
, special words like TODO, FIXME, XXXX.dsError
, error highlighting and wrong syntax.dsOthers
, when nothing else fits.
This section describes the syntax detection rules.
Each rule can match zero or more characters at the beginning of the string they are tested against. If the rule matches, the matching characters are assigned the style or attribute defined by the rule, and a rule may ask that the current context is switched.
A rule looks like this:
<RuleName attribute="(identifier)" context="(identifier)" [rule specific attributes] />
The attribute identifies the style to use for matched characters by name, and the context identifies the context to use from here.
The context can be identified by:
An identifier, which is the name of the other context.
An order telling the engine to stay in the current context (
#stay
), or to pop back to a previous context used in the string (#pop
).To go back more steps, the #pop keyword can be repeated:
#pop#pop#pop
An order followed by an exclamation mark (!) and an identifier, which will make the engine first follow the order and then switch to the other context, e.g.
#pop#pop!OtherContext
.An identifier, which is a context name, followed by two hashes (
##
) and another identifier, which is the name of a language definition. This naming is similar to that used inIncludeRules
rules and allows you to switch to a context belonging to another syntax highlighting definition, e.g.SomeContext##JavaScript
. Note that it is not possible to use this context switch in combination with#pop
, for example,#pop!SomeContext##JavaScript
is not valid.
Rule specific attributes varies and are described in the following sections.
Common attributes
All rules have the following attributes in common and are
available whenever (common attributes)
appears.
attribute and context
are required attributes, all others are optional.
attribute: An attribute maps to a defined itemData.
context: Specify the context to which the highlighting system switches if the rule matches.
beginRegion: Start a code folding block. Default: unset.
endRegion: Close a code folding block. Default: unset.
lookAhead: If true, the highlighting system will not process the matches length. Default: false.
firstNonSpace: Match only, if the string is the first non-whitespace in the line. Default: false.
column: Match only, if the column matches. Default: unset.
Dynamic rules
Some rules allow the optional attribute dynamic
of type boolean that defaults to false. If dynamic is
true, a rule can use placeholders representing the text
matched by a regular expression rule that switched to the
current context in its string
or
char
attributes. In a string
,
the placeholder %N
(where N is a number) will be
replaced with the corresponding capture N
from the calling regular expression, starting from 1. In a
char
the placeholder must be a number
N
and it will be replaced with the first character of
the corresponding capture N
from the calling regular
expression. Whenever a rule allows this attribute it will contain a
(dynamic).
dynamic: may be (true|false).
How does it work:
In the regular expressions of the
RegExpr
rules, all text within simple curved brackets
(PATTERN)
is captured and remembered.
These captures can be used in the context to which it is switched, in the rules with the
attribute dynamic
true, by
%N
(in String) or
N
(in char).
It is important to mention that a text captured in a RegExpr
rule is
only stored for the switched context, specified in its context
attribute.
Tip
If the captures will not be used, both by dynamic rules and in the same regular expression,
non-capturing groups
should be used:(?:PATTERN)
The lookahead or lookbehind groups such as
(?=PATTERN)
,(?!PATTERN)
or(?<=PATTERN)
are not captured. See Regular Expressions for more information.The capture groups can be used within the same regular expression, using
\N
instead of%N
respectively. For more information, see Capturing matching text (back references) in Regular Expressions.
Example 1:
In this simple example, the text matched by the regular expression
=*
is captured and inserted into %1
in the dynamic rule. This allows the comment to end with the same amount of
=
as at the beginning. This matches text like:
[[ comment ]]
, [=[ comment ]=]
or
[=====[ comment ]=====]
.
In addition, the captures are available only in the switched context Multi-line Comment.
<context name="Normal" attribute="Normal Text" lineEndContext="#stay"> <RegExpr context="Multi-line Comment" attribute="Comment" String="\[(=*)\[" beginRegion="RegionComment"/> </context> <context name="Multi-line Comment" attribute="Comment" lineEndContext="#stay"> <StringDetect context="#pop" attribute="Comment" String="]%1]" dynamic="true" endRegion="RegionComment"/> </context>
Example 2:
In the dynamic rule, %1
corresponds to the capture that matches
#+
, and %2
to "+
.
This matches text as: #label""""inside the context""""#
.
These captures will not be available in other contexts, such as OtherContext, FindEscapes or SomeContext.
<context name="SomeContext" attribute="Normal Text" lineEndContext="#stay"> <RegExpr context="#pop!NamedString" attribute="String" String="(#+)(?:[\w-]|[^[:ascii:]])("+)"/> </context> <context name="NamedString" attribute="String" lineEndContext="#stay"> <RegExpr context="#pop!OtherContext" attribute="String" String="%2(?:%1)?" dynamic="true"/> <DetectChar context="FindEscapes" attribute="Escape" char="\"/> </context>
Example 3:
This matches text like:
Class::function<T>( ... )
.
<context name="Normal" attribute="Normal Text" lineEndContext="#stay"> <RegExpr context="FunctionName" lookAhead="true" String="\b([a-zA-Z_][\w-]*)(::)([a-zA-Z_][\w-]*)(?:<[\w\-\s]*>)?(\()"/> </context> <context name="FunctionName" attribute="Normal Text" lineEndContext="#pop"> <StringDetect context="#stay" attribute="Class" String="%1" dynamic="true"/> <StringDetect context="#stay" attribute="Operator" String="%2" dynamic="true"/> <StringDetect context="#stay" attribute="Function" String="%3" dynamic="true"/> <DetectChar context="#pop" attribute="Normal Text" char="4" dynamic="true"/> </context>
Local deliminators
Some rules allow the optional attributes weakDeliminator
and additionalDeliminator
which are combined with attributes
of the same name of keywords
tag. For example, when
'%'
is a weak delimiter of keywords
,
it can become a word delimiter only for a rule by putting it in its
additionalDeliminator
attribute. Whenever a rule allows these
attributes it will contain a (local deliminators).
weakDeliminator: list of characters that do not act as word delimiters.
additionalDeliminator: defines additional delimiters.
- DetectChar
Detect a single specific character. Commonly used for example to find the ends of quoted strings.
<DetectChar char="(character)" (common attributes) (dynamic) />
The
char
attribute defines the character to match.- Detect2Chars
Detect two specific characters in a defined order.
<Detect2Chars char="(character)" char1="(character)" (common attributes) />
The
char
attribute defines the first character to match,char1
the second.- AnyChar
Detect one character of a set of specified characters.
<AnyChar String="(string)" (common attributes) />
The
String
attribute defines the set of characters.- StringDetect
Detect an exact string.
<StringDetect String="(string)" [insensitive="true|false"] (common attributes) (dynamic) />
The
String
attribute defines the string to match. Theinsensitive
attribute defaults to false and is passed to the string comparison function. If the value is true insensitive comparing is used.- WordDetect
Detect an exact string but additionally require word boundaries such as a dot
'.'
or a whitespace on the beginning and the end of the word. Think of\b<string>\b
in terms of a regular expression, but it is faster than the ruleRegExpr
.<WordDetect String="(string)" [insensitive="true|false"] (common attributes) (local deliminators) />
The
String
attribute defines the string to match. Theinsensitive
attribute defaults to false and is passed to the string comparison function. If the value is true insensitive comparing is used.Since: Kate 3.5 (KDE 4.5)
- RegExpr
Matches against a regular expression.
<RegExpr String="(string)" [insensitive="true|false"] [minimal="true|false"] (common attributes) (dynamic) />
The
String
attribute defines the regular expression.insensitive
defaults to false and is passed to the regular expression engine.minimal
defaults to false and is passed to the regular expression engine.Because the rules are always matched against the beginning of the current string, a regular expression starting with a caret (
^
) indicates that the rule should only be matched against the start of a line.See Regular Expressions for more information on those.
- keyword
Detect a keyword from a specified list.
<keyword String="(list name)" (common attributes) (local deliminators) />
The
String
attribute identifies the keyword list by name. A list with that name must exist.The highlighting system processes keyword rules in a very optimized way. This makes it an absolute necessity that any keywords to be matched need to be surrounded by defined delimiters, either implied (the default delimiters), or explicitly specified within the additionalDeliminator property of the keywords tag.
If a keyword to be matched shall contain a delimiter character, this respective character must be added to the weakDeliminator property of the keywords tag. This character will then loose its delimiter property in all keyword rules. It is also possible to use the weakDeliminator attribute of keyword so that this modification only applies to this rule.
- Int
Detect an integer number (as the regular expression:
\b[0-9]+
).<Int (common attributes) (local deliminators) />
This rule has no specific attributes.
- Float
Detect a floating point number (as the regular expression:
(\b[0-9]+\.[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?
).<Float (common attributes) (local deliminators) />
This rule has no specific attributes.
- HlCOct
Detect an octal point number representation (as the regular expression:
\b0[0-7]+
).<HlCOct (common attributes) (local deliminators) />
This rule has no specific attributes.
- HlCHex
Detect a hexadecimal number representation (as a regular expression:
\b0[xX][0-9a-fA-F]+
).<HlCHex (common attributes) (local deliminators) />
This rule has no specific attributes.
- HlCStringChar
Detect an escaped character.
<HlCStringChar (common attributes) />
This rule has no specific attributes.
It matches literal representations of characters commonly used in program code, for example
\n
(newline) or\t
(TAB).The following characters will match if they follow a backslash (
\
):abefnrtv"'?\
. Additionally, escaped hexadecimal numbers such as for example\xff
and escaped octal numbers, for example\033
will match.- HlCChar
Detect an C character.
<HlCChar (common attributes) />
This rule has no specific attributes.
It matches C characters enclosed in a tick (Example:
'c'
). The ticks may be a simple character or an escaped character. See HlCStringChar for matched escaped character sequences.- RangeDetect
Detect a string with defined start and end characters.
<RangeDetect char="(character)" char1="(character)" (common attributes) />
char
defines the character starting the range,char1
the character ending the range.Useful to detect for example small quoted strings and the like, but note that since the highlighting engine works on one line at a time, this will not find strings spanning over a line break.
- LineContinue
Matches a specified char at the end of a line.
<LineContinue (common attributes) [char="\"] />
char
optional character to match, default is backslash ('\'
). New since KDE 4.13.This rule is useful for switching context at end of line. This is needed for example in C/C++ to continue macros or strings.
- IncludeRules
Include rules from another context or language/file.
<IncludeRules context="contextlink" [includeAttrib="true|false"] />
The
context
attribute defines which context to include.If it is a simple string it includes all defined rules into the current context, example:
<IncludeRules context="anotherContext" />
If the string contains a
##
the highlight system will look for a context from another language definition with the given name, for example<IncludeRules context="String##C++" />
would include the context String from the C++ highlighting definition.
If
includeAttrib
attribute is true, change the destination attribute to the one of the source. This is required to make, for example, commenting work, if text matched by the included context is a different highlight from the host context.- DetectSpaces
Detect whitespaces.
<DetectSpaces (common attributes) />
This rule has no specific attributes.
Use this rule if you know that there can be several whitespaces ahead, for example in the beginning of indented lines. This rule will skip all whitespace at once, instead of testing multiple rules and skipping one at a time due to no match.
- DetectIdentifier
Detect identifier strings (as the regular expression:
[a-zA-Z_][a-zA-Z0-9_]*
).<DetectIdentifier (common attributes) />
This rule has no specific attributes.
Use this rule to skip a string of word characters at once, rather than testing with multiple rules and skipping one at a time due to no match.
Once you have understood how the context switching works it will be easy to write highlight definitions. Though you should carefully check what rule you choose in what situation. Regular expressions are very mighty, but they are slow compared to the other rules. So you may consider the following tips.
If you only match two characters use
Detect2Chars
instead ofStringDetect
. The same applies toDetectChar
.Regular expressions are easy to use but often there is another much faster way to achieve the same result. Consider you only want to match the character
'#'
if it is the first character in the line. A regular expression based solution would look like this:<RegExpr attribute="Macro" context="macro" String="^\s*#" />
You can achieve the same much faster in using:
<DetectChar attribute="Macro" context="macro" char="#" firstNonSpace="true" />
If you want to match the regular expression
'^#'
you can still useDetectChar
with the attributecolumn="0"
. The attributecolumn
counts characters, so a tabulator is only one character.In
RegExpr
rules, use the attributecolumn="0"
if the pattern^PATTERN
will be used to match text at the beginning of a line. This improves performance, as it will avoid looking for matches in the rest of the columns.In regular expressions, use non-capturing groups
(?:PATTERN)
instead of capturing groups(PATTERN)
, if the captures will not be used in the same regular expression or in dynamic rules. This avoids storing captures unnecessarily.You can switch contexts without processing characters. Assume that you want to switch context when you meet the string
*/
, but need to process that string in the next context. The below rule will match, and thelookAhead
attribute will cause the highlighter to keep the matched string for the next context.<Detect2Chars attribute="Comment" context="#pop" char="*" char1="/" lookAhead="true" />
Use
DetectSpaces
if you know that many whitespaces occur.Use
DetectIdentifier
instead of the regular expression'[a-zA-Z_]\w*'
.Use default styles whenever you can. This way the user will find a familiar environment.
Look into other XML files to see how other people implement tricky rules.
You can validate every XML file by using the command validatehl.sh language.xsd mySyntax.xml. The files
validatehl.sh
andlanguage.xsd
are available in Syntax Highlighting repository.If you repeat complex regular expression very often you can use ENTITIES. Example:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE language SYSTEM "language.dtd" [ <!ENTITY myref "[A-Za-z_:][\w.:_-]*"> ]>
Now you can use &myref; instead of the regular expression.