CROSSBOW(7) Miscellaneous Information Manual (urm) CROSSBOW(7)

crossbow-cookbook - examples of handling feeds

crossbow set [...]

This manual page contains short recipes describing common usage patterns for the crossbow feed aggregator.

In all the following examples we will assume that the $ID environment variable is defined as an arbitrary feed identifier, and that the $URL environment variable is defined as the feed URL.

We want a periodic bulk notification about the availability of updates.

The following feed setup can be used for this purpose:

crossbow set -i "$ID" -u "$URL" \
  -o pretty \
  -f "updates from $ID:\n title: %t\n link: %l\n"

The invocation of crossbow-fetch(1) will emit on stdout(3) a "record" like the following for each new item:

updates from foobar:
 title: Today is a good day
 link: http://example.com/today-is-a-good-day

The user can schedule a periodic invocation of crossbow-fetch(1) with cron(8). Assuming that local mail delivery is enabled, and since any output of a cron job is emailed to the owner of the crontab(5), the user will receive an email whose body is the concatenation of the records.
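As a rough sketch of the scheduling step, the snippet below prints a crontab(5) entry that runs the fetch every day at 08:00. The schedule is arbitrary, the cron_entry helper is hypothetical, and the "crossbow fetch" spelling is an assumption made by analogy with "crossbow set"; check crossbow-fetch(1) for the actual invocation.

```shell
#!/bin/sh

# Print a crontab(5) entry: minute 0, hour 8, every day.
# ASSUMPTION: the fetch step is invoked as "crossbow fetch".
cron_entry() {
    printf '%s\n' '0 8 * * * crossbow fetch'
}

cron_entry
```

On most cron implementations the entry could then be appended to the existing crontab with { crontab -l 2>/dev/null; cron_entry; } | crontab -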

Let's consider the case of a feed whose item description field carries the whole article in HTML format. Each article needs to be stored in a separate HTML file under a certain directory on the filesystem.

The following feed setup can be used for this purpose:

crossbow set -i "$ID" -u "$URL" \
  -o pipe \
  -f "sed -n w%n.html" \
  -C /some/destination/path/

The invocation of crossbow-fetch(1) will spawn one sed(1) process for each new item. The item description will be piped to sed(1), which in turn will write it to a file (w command). Each output file is named after the expansion of %n, an incremental numeric value, followed by the .html suffix. See crossbow-outfmt(5).
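The sed(1) idiom used above can be tried outside of crossbow. The following is a minimal sketch: the save_with_sed helper, the temporary directory and the 0.html file name are illustrative assumptions, not values produced by crossbow.

```shell
#!/bin/sh

# Copy standard input to the file named by $1 using only sed(1):
# -n suppresses the normal output, the w command writes each line to the file.
save_with_sed() {
    sed -n "w$1"
}

workdir="$(mktemp -d)"      # scratch directory for the demonstration
printf '<p>Hello</p>\n' | save_with_sed "$workdir/0.html"
cat "$workdir/0.html"
```

No shell redirection is involved: the file is opened by sed(1) itself, which is what makes the idiom usable from a command template.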

Security remark: Unless the feed is trusted, it is strongly discouraged to use anything but %n to name files. A placeholder expanded from feed content, such as the title (%t), would let the feed author influence the path of the file being written.
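The risk can be illustrated with plain sh(1), without crossbow. In this sketch the hostile title is a made-up value and the directory layout is created only for the demonstration; it shows how a file name taken from feed content can escape the destination directory.

```shell
#!/bin/sh

base="$(mktemp -d)"         # scratch area for the demonstration
mkdir "$base/dest"          # stands in for the destination directory

title='../escaped'          # HYPOTHETICAL title chosen by a hostile feed

# Same idiom as the recipe, but named after the title instead of %n.
( cd "$base/dest" && printf 'payload\n' | sed -n "w$title.html" )

ls "$base"                  # the file landed next to dest/, not inside it
```

With an incremental number as the file name, the written path cannot be steered by the feed content.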

Security remark: We are using the w command of sed(1) to write to a file. It is not possible to use shell redirection since sub-commands are never executed through a shell interpreter. Invoking a shell interpreter from a command template is strongly discouraged, since the placeholders would be directly mixed with the shell script, and doing proper shell escaping against untrusted input is really hard, if not impossible. It is on the other hand safe to invoke a shell script whose code lives in a file and pass parameters to it. See crossbow-outfmt(5).
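The difference between mixing a value into script text and passing it as a parameter can be seen with sh(1) alone. This is only an illustration of the principle: the unsafe and safe helpers and the sample title are hypothetical, and the value stands in for a placeholder expansion.

```shell
#!/bin/sh

# HYPOTHETICAL untrusted value, standing in for a placeholder expansion.
title='x; echo INJECTED'

# Unsafe: the value becomes part of the script text and is parsed as code.
unsafe() {
    sh -c "echo title: $1"
}

# Safe: the script text is constant, the value arrives as parameter $1.
safe() {
    sh -c 'echo "title: $1"' sh "$1"
}

unsafe "$title"     # runs the injected echo command
safe "$title"       # prints the value verbatim
```

A script stored in a file behaves like the second form: its code never changes, and the untrusted values only ever appear as positional parameters.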

This scenario is similar to the previous one, except that the item description contains only part of the content, or nothing at all. The link field contains a valid URL, which is intended to be reached by means of a browser.

In this case we can leverage curl(1) to do the retrieval:

crossbow set -i "$ID" -u "$URL" \
  -o subproc \
  -f "curl -o %n.html %l" \
  -C /some/destination/path/

Remark: Placeholders such as %n and %l do not need to be quoted: they are handled safely even when their expansions contain whitespace.

We want to turn individual feed items into plain (HTML-free) text messages delivered via email.

Our goal can be achieved by means of a generic shell script like the following:

#!/bin/sh

set -e

feed_title="$1"
post_title="$2"
link="$3"

lynx "${link:--stdin}" -dump -force_html |
    sed "s/^~/~~/" |    # Escape dangerous tilde expressions
    mail -s "${feed_title:+${feed_title}: }${post_title:-...}" "${USER:?}"

The script can be installed in the PATH, e.g. as crossbow-to-mail, and then integrated in crossbow(1) as follows:

  • If the feed provides the whole content as item description:
    crossbow set -i "$ID" -u "$URL" \
        -o pipe \
        -f "crossbow-to-mail %ft %t"
  • If the feed provides only the URL of the article as item link:
    crossbow set -i "$ID" -u "$URL" \
        -o subproc \
        -f "crossbow-to-mail %ft %t %l"
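The subject line built by the script relies on two parameter expansions: ${var:+...} adds the feed title prefix only when the feed title is non-empty, and ${var:-...} falls back to "..." when the post title is empty. A small sketch, where the subject helper and the sample titles are made up for the demonstration:

```shell
#!/bin/sh

# Build the mail subject exactly as the script does.
subject() {
    feed_title="$1"
    post_title="$2"
    printf '%s\n' "${feed_title:+${feed_title}: }${post_title:-...}"
}

subject "Example Feed" "Today is a good day"  # Example Feed: Today is a good day
subject "" "Today is a good day"              # Today is a good day
subject "" ""                                 # ...
```

This is why the same script works for both setups above: in the pipe variant the third argument is missing, and "${link:--stdin}" makes lynx(1) read the content from standard input instead.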

Remark: The script depends on the excellent lynx(1) browser to download and parse the HTML into textual form.

Security remark: The "s/^~/~~/" sed(1) command prevents tilde escapes from being honored by unsafe implementations of mail(1). The mutt(1) mail user agent, if available, can be used as a safer drop-in replacement.
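The effect of the substitution is easy to check in isolation. In this sketch the escape_tildes helper and the input lines are made up: only a tilde at the beginning of a line is doubled, while tildes elsewhere are left alone.

```shell
#!/bin/sh

# Double a leading tilde so that the line is taken as literal text.
escape_tildes() {
    sed 's/^~/~~/'
}

printf '%s\n' '~!touch /tmp/owned' 'a ~ in the middle' | escape_tildes
# prints:
#   ~~!touch /tmp/owned
#   a ~ in the middle
```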

The YouTube site provides feeds for users, channels and playlists. Each of these entities is assigned a unique identifier, which can be easily obtained by looking at the web URL.

Once the user, channel or playlist identifier is known, it is trivial to obtain the corresponding feed.

It is possible to combine crossbow(1) with the youtube-dl(1) tool to keep a local collection of video or audio files up to date.

What follows is a convenient wrapper script that ensures proper file naming:

#!/bin/sh

link="${1:?mandatory argument missing: link}"
incremental_id="${2:?mandatory argument missing: incremental id}"
format="$3"

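# Transform a title into a reasonably safe 'slug'.
# The tr(1) operands are quoted so that the shell never treats the
# bracket expressions as file name patterns.
slugify() {
    tr -d '\n' |                        # explicitly drop new-lines
    tr '/[:punct:][:space:]' . |        # turn all sneaky chars into dots
    tr -cs '[:alnum:]'                  # squeeze ugly repetitions
}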

fname="$(
    youtube-dl \
        --get-filename \
        -o "%(id)s_%(title)s.%(ext)s" \
        "$link"
)" || exit 1

youtube-dl \
    ${format:+-f "$format"} \
    -o "$(printf %s_%s "$incremental_id" "$fname" | slugify)" \
    --no-progress \
    "$link"

Once again, the script can be installed in the PATH, e.g. as crossbow-ytdl, and then integrated in crossbow(1) as follows:

  • To save each published item:
    crossbow set -i "$ID" -u "$URL" \
        -o subproc \
        -f "crossbow-ytdl %l %n" \
        -C /some/destination/path
  • To save each published item as audio:
    crossbow set -i "$ID" -u "$URL" \
        -o subproc \
        -f "crossbow-ytdl %l %n bestaudio" \
        -C /some/destination/path
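Two details of the wrapper script can be tried in isolation: the slug produced for a file name, and the way the optional format argument turns into zero or two arguments for youtube-dl(1). The following sketch uses a made-up file name and a hypothetical count_args helper; slugify is the function from the script, with its tr(1) operands quoted.

```shell
#!/bin/sh

# Same pipeline as in the wrapper script.
slugify() {
    tr -d '\n' |
    tr '/[:punct:][:space:]' . |
    tr -cs '[:alnum:]'
}

# Report how many arguments ${format:+-f "$format"} expands to.
count_args() {
    format="$1"
    set -- ${format:+-f "$format"}
    echo "$#"
}

printf '%s_%s' 3 'abc_My Video: part 1!.mp4' | slugify; echo
# prints: 3.abc.My.Video.part.1.mp4

count_args ""           # prints: 0
count_args bestaudio    # prints: 2
```

Every punctuation or whitespace character, including the underscores inserted by printf(1), becomes a dot, and runs of dots are squeezed into one; the dot before the file extension survives for the same reason.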

crossbow-fetch(1), crossbow-set(1), lynx(1), sed(1), youtube-dl(1), crontab(5), cron(8)

Giovanni Simoni <dacav@fastmail.com>

July 11, 2020