CROSSBOW-FORMAT(5) | File Formats Manual (urm) | CROSSBOW-FORMAT(5) |
crossbow-format - format string reference
The crossbow(1) feed aggregator processes each new entry by means of a handler, according to the feed settings in the crossbow.conf(5) configuration file.
The print handler prints a textual representation of an entry on stdout(3). The optional format setting can define a template to be used in place of the default. The exec and pipe handlers process individual entries by invoking an external program. They both require the definition of a command setting, to expand into the argument vector of the invoked subprocess.
The values of format and command are interpreted as format strings: they are allowed to contain placeholders that are replaced with the fields of the processed entry. All the available placeholders are listed below (see Supported placeholders).
The placeholder syntax resembles that of the printf(3) function, with some differences and simplifications:
- The ‘%’ character can only be followed by one or more alphabetic characters. There is no support for field width or other modifiers.
- A literal ‘%’ character is represented with ‘\%’ instead of ‘%%’.
- The ‘\’ character escapes the subsequent character:
  - ‘\n’ (new line), ‘\r’ (carriage return), ‘\t’ (horizontal tab) and ‘\\’ (literal backslash) are recognized escape sequences.
  - If the handler is exec or pipe, the ‘\ ’ string is interpreted as a space character, as opposed to a non-escaped white-space, which is interpreted as an argument separator. Escaping white-spaces has no effect if the handler is print, since in this case they do not have any special meaning.
  - The ‘\:’ string is interpreted as a zero width break (see Zero width break below).

In the context of the exec or pipe handlers, the argument vector of a subprocess is constructed from the command setting.
The value is split on white-spaces into an intermediate array of tokens. When a new entry is processed, each element of the intermediate array is further evaluated for placeholder expansion. The obtained array is then used as argument vector for the subprocess. The first of its elements determines the command to execute.
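The splitting step described above can be sketched in Python as follows. This is only an illustration, not the actual crossbow(1) implementation: the function name split_command is hypothetical, and the sketch assumes that a backslash always keeps the following character inside the current token, leaving the escape sequences untouched for the later expansion step.

```python
def split_command(command):
    """Split a command setting into the intermediate array of tokens.

    Assumption: a backslash protects the next character from being
    treated as a separator. Escape pairs are copied verbatim, since
    they are only evaluated later, when an entry is processed.
    """
    tokens, current, in_token = [], [], False
    i = 0
    while i < len(command):
        ch = command[i]
        if ch == "\\" and i + 1 < len(command):
            current.append(command[i:i + 2])  # keep the escape pair as-is
            in_token = True
            i += 2
        elif ch.isspace():
            if in_token:  # unescaped white-space closes the current token
                tokens.append("".join(current))
                current, in_token = [], False
            i += 1
        else:
            current.append(ch)
            in_token = True
            i += 1
    if in_token:
        tokens.append("".join(current))
    return tokens
```

Under these assumptions, the value `sh -c echo\ "%t"\ |\ wc\ -c` is split into three tokens, the last of which still contains the escaped white-spaces.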
Details:
- The PATH environment variable is honoured.
- ‘argv[0]’ corresponds to the name of the executable.

Zero width break

The language recognized by the output format parser allows placeholders to be composed of multiple characters. While this feature makes it easier to have mnemonic placeholders (such as ‘%a’ for "Author" and ‘%am’ for "Author eMail"), it introduces some additional edge cases.
The zero width break sequence ‘\:’ has been introduced to cover a case of ambiguity which can be easily explained by means of an example.
Let ‘%x’ and ‘%xn’ be valid placeholders. In such a case, obtaining the expansion of ‘%x’ followed by a literal ‘n’ would be impossible, as the sequence ‘%x\n’ would be rendered as the expansion of ‘%x’ followed by a new-line, while ‘%xn’ would be rendered as the expansion of ‘%xn’. Using the backslash would have worked if ‘\n’ wasn't a recognized escape sequence.
The zero width break can be used to force the termination of an escape sequence, so that whatever follows can be interpreted independently of it.
The behaviour is summarized by the following table, where expand(x) expresses the expansion of the ‘%x’ placeholder into the string representation of the corresponding field, and the "." operator expresses string concatenation.
Format | Expansion | Notes |
%x | expand(x) | |
%xn | expand(xn) | |
%x\m | expand(x) . "m" | Works because ‘\m’ is not a recognized escape sequence. |
%x\n | expand(x) . "\n" | |
%x\:n | expand(x) . "n" | |
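The rows of the table can be reproduced with a small Python sketch. It is not the parser used by crossbow(1): the name expand_token and the sample field values are hypothetical, and the sketch assumes that a placeholder name greedily consumes every alphabetic character following ‘%’, and that an unknown placeholder is an error.

```python
# Escape sequences described in this manual; ':' is the zero width break.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", "%": "%", " ": " ", ":": ""}


def expand_token(token, fields):
    """Expand escapes and placeholders of one token (illustrative only)."""
    out, i = [], 0
    while i < len(token):
        ch = token[i]
        if ch == "\\" and i + 1 < len(token):
            nxt = token[i + 1]
            # Unrecognized escapes (e.g. '\m') yield the character itself.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        elif ch == "%":
            j = i + 1
            # Assumption: the name extends over all following ASCII letters.
            while j < len(token) and token[j].isascii() and token[j].isalpha():
                j += 1
            name = token[i + 1:j]
            if name not in fields:
                raise ValueError("unknown placeholder: %" + name)
            out.append(fields[name])
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = {"x": "X", "xn": "XN"}  # hypothetical field values
    for fmt in ["%x", "%xn", "%x\\m", "%x\\n", "%x\\:n"]:
        print(repr(fmt), "->", repr(expand_token(fmt, sample)))
```

With the sample values, ‘%x\:n’ expands to "Xn", while a token made only of ‘\:’ expands to the empty string.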
Additionally, although not originally intended for this purpose, the zero width break can be used to pass empty strings as subprocess arguments. This is demonstrated in the following example, where the configuration prints the entry author, followed by an empty string, and by the entry title:
feed foobar
    url https://example.conf/feed.xml
    handler exec
    command printf \%s\%s\%s\\n %a \: %t
Note: the literal ‘%s’ is intended to be interpreted by the printf(1) command. The corresponding percent character is escaped, since it is meant to be a literal percent, and not to be expanded by crossbow(1).

Supported placeholders

The following table shows the supported placeholders and the corresponding entry properties for the RSS and Atom feed formats.
Depending on the feed format, some placeholders may refer to unavailable entry properties. If this is the case they are expanded with an empty string, with some exceptions (see Notes).
Placeholder | RSS | Atom |
%a | - | author.name |
%am | author | author.email |
%au | - | author.uri |
%ca | category[0] (1) | - |
%co | comments | - |
%c | content:encoded (2) | content |
%g | guid | id |
%gp | guid_isPermaLink (3) | - (4) |
%l | link | link[0] (1) |
%pd | pubDate | published |
%s | description | summary |
%t | title | title |
Notes:
(1) If multiple elements are present, only the first one is used.
(2) The ‘<content:encoded>’ element of RSS is an extension provided by the http://purl.org/rss/1.0/modules/content/ XML namespace. If such namespace is not enabled for the feed, the ‘%c’ placeholder will be expanded with the regular ‘<content>’ tag of RSS.
(3) Expanded as either ‘0’ or ‘1’.
(4) The Atom ‘<id>’ element conveys a permanent, universally unique identifier. ‘%gp’ is therefore undefined, but always expanded as ‘1’.

Some additional placeholders, not referring to any entry field, are also available:
Placeholder | Description |
%fi | The unique name of the feed. |
%ft | The feed title. |
%n | A per-feed, zero-padded, six-digit incremental number. |
The incremental number expanded in place of ‘%n’ is initialized to zero for new feeds, and gets incremented for every new feed entry. This is an important security feature: the value of this number is not controlled by the feed content, thus it can be used safely as a filename. The value is persisted across executions, and incremented even if the item was not successfully processed, so that the same value is never used twice.
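The properties of the counter can be illustrated with a short Python sketch. The class name EntryCounter is hypothetical and persistence is left out; the sketch also assumes that the first entry of a new feed receives the value zero, which this manual does not state explicitly.

```python
class EntryCounter:
    """Per-feed incremental number, rendered as six zero-padded digits."""

    def __init__(self, value=0):
        # New feeds start from zero; an existing feed would resume from
        # the value persisted by the previous execution.
        self.value = value

    def take(self):
        """Return the number for the next entry and consume it.

        The counter advances before the handler runs, so a failing
        handler cannot cause the same value to be handed out twice.
        """
        current = self.value
        self.value += 1
        return f"{current:06d}"
```

Since the returned string only contains digits, it cannot carry path separators or shell metacharacters, regardless of the feed content.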
Since the exec and pipe handlers process entries by passing parameters to a subprocess, it is important to keep security in mind when configuring the corresponding command setting.
Consider, for example, the following configuration:
feed foobar
    url https://example.conf/feed.xml
    handler exec
    command sh -c echo\ "%t"\ |\ wc\ -c
The provided command is dangerous in that the entry title, expanded in place of ‘%t’, might be exploited by a malicious XML like the following:
<?xml version="1.0" encoding="utf8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <item>
      <guid>inject</guid>
      <title>"; echo pwned; echo "</title>
    </item>
  </channel>
</rss>
The correct way of achieving the same result consists in moving the code into a shell script, and making the entry properties available to it by means of safe, uninterpreted parameter passing:
feed foobar
    url https://example.conf/feed.xml
    handler exec
    command /usr/local/bin/count_bytes %t
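The difference between the two configurations can be made visible with a Python sketch that builds both argument vectors for the malicious title shown above. The helper build_argv is hypothetical and uses a plain string replacement as a simplified stand-in for the placeholder expansion; nothing is executed.

```python
def build_argv(tokens, title):
    """Expand '%t' in each token independently (simplified stand-in)."""
    return [token.replace("%t", title) for token in tokens]


MALICIOUS_TITLE = '"; echo pwned; echo "'

# Tokens as they would look after splitting and unescaping each command.
unsafe = build_argv(["sh", "-c", 'echo "%t" | wc -c'], MALICIOUS_TITLE)
safe = build_argv(["/usr/local/bin/count_bytes", "%t"], MALICIOUS_TITLE)

# The third element is parsed by sh as code: the title closes the quotes
# and smuggles in an additional command.
print(unsafe[2])  # echo ""; echo pwned; echo "" | wc -c

# The title is a single, uninterpreted argument of the script.
print(safe[1])    # "; echo pwned; echo "
```

In the first case the feed content ends up inside a string that the shell interprets; in the second it stays confined to one element of the argument vector.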
feed foobar
    url https://example.conf/feed.xml
    handler pipe
    command sed -n w%t
This is an effective (yet dangerous) way of dumping the entry contents into a file named after the entry title. A specially crafted XML can exploit a similar configuration to attempt the replacement of sensitive files:
<?xml version="1.0" encoding="utf8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <item>
      <guid>shenanigans</guid>
      <title>.bashrc</title>
      <description>echo pwned</description>
    </item>
    <item>
      <guid>shenanigans_1</guid>
      <title>../.bashrc</title>
      <description>echo pwned</description>
    </item>
    <item>
      <guid>shenanigan's_2</guid>
      <title>../../.bashrc</title>
      <description>echo pwned</description>
    </item>
    ...
  </channel>
</rss>
The correct way of achieving the same result consists in using the ‘%n’ placeholder in place of ‘%t’, obtaining a safer (although admittedly less descriptive) file naming.
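The effect of the crafted titles can be checked with a short Python sketch that resolves each candidate file name against a working directory. The directory /home/user/feeds and the helper names are hypothetical; the sketch only manipulates path strings and does not touch the file system.

```python
import posixpath

BASE = "/home/user/feeds"  # hypothetical directory where the handler runs


def resolve(name):
    """Return the normalized path a relative file name would refer to."""
    return posixpath.normpath(posixpath.join(BASE, name))


def stays_inside(name):
    """Tell whether the resolved path is still located under BASE."""
    return resolve(name).startswith(BASE + "/")


titles = [".bashrc", "../.bashrc", "../../.bashrc"]
numbers = [f"{n:06d}" for n in range(3)]  # six-digit names, as with '%n'

for name in titles + numbers:
    print(name, "->", resolve(name), "inside" if stays_inside(name) else "ESCAPES")
```

The second and third titles resolve to /home/user/.bashrc and /home/.bashrc, outside of the working directory, whereas the numeric names always stay inside it. The first title stays inside too, which is enough for the attack when the handler runs from the home directory.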
See crossbow-cookbook(7).
Giovanni Simoni <dacav@fastmail.com>
September 30, 2021