CROSSBOW(7)    Miscellaneous Information Manual (urm)    CROSSBOW(7)
crossbow-cookbook - examples of handling feeds
crossbow set [...]
This manual page contains short recipes describing common usage patterns for the crossbow feed aggregator.
In all the following examples we will assume that the $ID environment variable is defined as an arbitrary feed identifier, and that the $URL environment variable is defined as the feed URL.
We want a periodic bulk notification about available updates.
The following feed setup can be used for this purpose:
    crossbow set -i "$ID" -u "$URL" \
        -o pretty \
        -f "updates from $ID:\n title: %t\n link: %l\n"
The invocation of crossbow-fetch(1) will emit on stdout(3) a "record" like the following for each new item:
    updates from foobar:
     title: Today is a good day
     link: http://example.com/today-is-a-good-day
The user can schedule a periodic invocation of crossbow-fetch(1) with cron(8). Assuming that local mail delivery is enabled, and since any output of a cron job is emailed to the owner of the crontab(5), the user will receive an email whose body is the concatenation of the records.
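As a rough sketch of the scheduling step, the snippet below writes a candidate crontab entry to a scratch file for review. The daily 08:00 schedule and the scratch file name are arbitrary assumptions, and the command is spelled crossbow fetch by analogy with crossbow set; see crossbow-fetch(1) for the actual invocation.

```shell
#!/bin/sh
# Scratch file holding the candidate crontab entry (hypothetical location).
cronfile="${TMPDIR:-/tmp}/crossbow.cron"

# Assumed schedule: every day at 08:00. The command spelling is an
# assumption made by analogy with "crossbow set".
cat > "$cronfile" <<'EOF'
# m h dom mon dow command
0 8 * * * crossbow fetch
EOF

cat "$cronfile"
```

Note that running crontab with a file argument replaces the whole table, so it is usually safer to paste the entry into the editor opened by crontab -e.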
Let's consider the case of a feed for which the item's description field reports the whole article in HTML format. Individual articles need to be stored in a separate HTML file under a certain directory on the filesystem.
The following feed setup can be used for this purpose:
    crossbow set -i "$ID" -u "$URL" \
        -o pipe \
        -f "sed -n w%n.html" \
        -C /some/destination/path/
The invocation of crossbow-fetch(1) will spawn one sed(1) process for each new item. The item description will be piped to sed(1), which in turn will write it to a file (w command). The output files will be named 000000.html, 000001.html, 000002.html ..., since %n expands to an incremental numeric value. See crossbow-outfmt(5).
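The effect of a single expanded command can be reproduced by hand. The following sketch assumes that %n expanded to 000000 and uses a made-up item description; it runs in a temporary directory standing in for the destination path.

```shell
#!/bin/sh
# Temporary directory standing in for /some/destination/path/.
workdir="$(mktemp -d)"

# What one spawned process amounts to, assuming %n expanded to 000000:
# the description arrives on stdin and sed copies it to the named file.
( cd "$workdir" && printf '<p>Hello</p>\n' | sed -n w000000.html )

cat "$workdir/000000.html"
```

The -n option keeps sed(1) from echoing the description to standard output, so the file is the only place where the content ends up.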
Security remark: Unless the feed is trusted, it is strongly discouraged to use anything but %n to name files. Consider for example the case where %t is used instead of %n, and the title of a post is ../../../../home/user/public_html/index: the output file would be written outside the destination directory.
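The risk can be observed safely inside a throwaway directory. In this sketch the directory layout and the hostile title are made up, and the command mimics what a hypothetical w%t.html template would expand to.

```shell
#!/bin/sh
# Throwaway sandbox: feeds/blog plays the destination directory,
# public_html plays a directory the feed should never touch.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/feeds/blog" "$sandbox/public_html"

# Hostile post title (hypothetical), as it could come from an untrusted feed.
title='../../public_html/index'

# Expansion of a hypothetical "sed -n w%t.html" template.
( cd "$sandbox/feeds/blog" && printf 'unwanted content\n' | sed -n "w$title.html" )

# The file landed outside the destination directory.
ls "$sandbox/public_html"
```

With %n the name is always a plain number, so the written file cannot leave the destination directory.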
Security remark: We are using the w command of sed(1) to write to a file. It is not possible to use shell redirection since sub-commands are never executed through a shell interpreter. Invoking a shell interpreter from a command template is strongly discouraged, since the placeholders would be directly mixed with the shell script, and doing proper shell escaping against untrusted input is really hard, if not impossible. It is on the other hand safe to invoke a shell script whose code lives in a file and pass parameters to it. See crossbow-outfmt(5).
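The difference between mixing a value into the script text and passing it as a parameter can be shown with a harmless payload. This is only an illustration: the title value is made up, and sh -c is used here just to keep the example self-contained.

```shell
#!/bin/sh
# Hypothetical untrusted value, as it could appear in a feed title.
title='$(echo INJECTED)'

# Unsafe pattern: the value becomes part of the script text, so the
# inner shell runs the command substitution it contains.
unsafe_out="$(sh -c "printf '%s\n' $title")"

# Safe pattern: the script text is fixed and the value arrives as "$1".
safe_out="$(sh -c 'printf "%s\n" "$1"' sh "$title")"

printf 'unsafe: %s\n' "$unsafe_out"   # prints "unsafe: INJECTED"
printf 'safe:   %s\n' "$safe_out"     # prints the title verbatim
```

A script stored in a file behaves like the safe pattern: its code never changes, and the placeholders only reach it as positional parameters.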
This scenario is similar to the previous one, except that the item description contains only part of the content, or nothing at all. The link field contains a valid URL, which is intended to be reached by means of a browser.
In this case we can leverage curl(1) to do the retrieval:
    crossbow set -i "$ID" -u "$URL" \
        -o subproc \
        -f "curl -o %n.html %l" \
        -C /some/destination/path/
Remark: Placeholders such as %n and %l do not need to be quoted: they are handled safely even when their expansions contain whitespace.
We want to turn individual feed items into plain (HTML-free) text messages delivered via email.
Our goal can be achieved by means of a generic shell script like the following:
    #!/bin/sh

    set -e

    feed_title="$1"
    post_title="$2"
    link="$3"

    lynx "${link:--stdin}" -dump -force_html |
        sed "s/^~/~~/" |    # Escape dangerous tilde expressions
        mail -s "${feed_title:+${feed_title}: }${post_title:-...}" "${USER:?}"
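The parameter expansions in the script decide the mail subject and the input source of lynx(1). The sketch below isolates them in two hypothetical helper functions, subject and lynx_source, which are not part of the script and exist only to show what the expansions produce.

```shell
#!/bin/sh
# Hypothetical helper reproducing the subject expression of the script.
subject() {
    feed_title="$1"
    post_title="$2"
    printf '%s\n' "${feed_title:+${feed_title}: }${post_title:-...}"
}

# Hypothetical helper reproducing the first argument given to lynx.
lynx_source() {
    link="$1"
    printf '%s\n' "${link:--stdin}"
}

subject "Some Feed" "Some Post"   # prints "Some Feed: Some Post"
subject "" "Some Post"            # prints "Some Post"
subject "" ""                     # prints "..."
lynx_source ""                    # prints "-stdin"
lynx_source "http://example.com/today-is-a-good-day"
```

When the third argument is missing or empty, lynx(1) is told to read the HTML from standard input; otherwise it downloads the link.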
The script can be installed in the PATH, e.g. as /usr/local/bin/crossbow-to-mail, and then integrated in crossbow(1) as follows:
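If the item description carries the whole article, it can be piped to the script; no link is passed, so lynx(1) reads from standard input:
    crossbow set -i "$ID" -u "$URL" \
        -o pipe \
        -f "crossbow-to-mail %ft %t"
If the article must be retrieved from the item link instead:
    crossbow set -i "$ID" -u "$URL" \
        -o subproc \
        -f "crossbow-to-mail %ft %t %l"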
Remark: The crossbow-to-mail script depends on the excellent lynx(1) browser to download and parse the HTML into textual form.
Security remark: The "s/^~/~~/" sed(1) command prevents tilde escapes from being honored by unsafe implementations of mail(1). The mutt(1) mail user agent, if available, can be used as a safer drop-in replacement.
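The effect of the escaping step can be checked on a made-up message body whose first line starts with a tilde escape:

```shell
#!/bin/sh
# Sample body (hypothetical): the first line looks like a tilde escape.
body="$(printf '~!echo pwned\nRegular line\n' | sed 's/^~/~~/')"

# The leading tilde is doubled, other lines are left alone.
printf '%s\n' "$body"
```

Only a tilde at the very beginning of a line is doubled, since that is the only position where the line could be read as an escape.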
The YouTube site provides feeds for users, channels and playlists. Each of these entities is assigned a unique identifier, which can be easily obtained by looking at the web URL.
Once the user, channel or playlist identifier is known, it is trivial to obtain the corresponding feeds:
It is possible to combine crossbow(1) with the youtube-dl(1) tool to keep a local collection of video or audio files up to date.
What follows is a convenient wrapper script that ensures proper file naming:
    #!/bin/sh

    link="${1:?mandatory argument missing: link}"
    incremental_id="${2:?mandatory argument missing: incremental id}"
    format="$3"

    # Transform a title into a reasonably safe 'slug'.
    # The tr operands are quoted so that the shell cannot glob-expand them.
    slugify() {
        tr -d '\n' |                      # explicitly drop new-lines
        tr '/[:punct:][:space:]' . |      # turn all sneaky chars into dots
        tr -cs '[:alnum:]'                # squeeze ugly repetitions
    }

    fname="$(
        youtube-dl \
            --get-filename \
            -o "%(id)s_%(title)s.%(ext)s" \
            "$link"
    )" || exit 1

    youtube-dl \
        ${format:+-f "$format"} \
        -o "$(printf %s_%s "$incremental_id" "$fname" | slugify)" \
        --no-progress \
        "$link"
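The slugify function of the wrapper can be exercised on its own. In this sketch the incremental identifier and the file name are made-up sample values, and the tr(1) operands are quoted to keep the shell from glob-expanding them.

```shell
#!/bin/sh
# Copy of the slugify function, with quoted tr operands.
slugify() {
    tr -d '\n' |                      # drop new-lines
    tr '/[:punct:][:space:]' . |      # punctuation, blanks and slashes become dots
    tr -cs '[:alnum:]'                # squeeze repeated non-alphanumeric chars
}

# Sample values (hypothetical): incremental id and youtube-dl file name.
slug="$(printf '%s_%s\n' 000003 'abc123_My Video: Part 1/2.mp4' | slugify)"

printf '%s\n' "$slug"   # prints "000003.abc123.My.Video.Part.1.2.mp4"
```

The slash is replaced along with the punctuation, so the resulting name cannot point into another directory.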
Once again, the script can be installed in the PATH, e.g. as /usr/local/bin/crossbow-ytdl, and then integrated in crossbow(1) as follows:
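To download each item in the default format of youtube-dl(1):
    crossbow set -i "$ID" -u "$URL" \
        -o subproc \
        -f "crossbow-ytdl %l %n" \
        -C /some/destination/path
To pass an explicit format selector, here bestaudio, as third argument:
    crossbow set -i "$ID" -u "$URL" \
        -o subproc \
        -f "crossbow-ytdl %l %n bestaudio" \
        -C /some/destination/path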
crossbow-fetch(1), crossbow-set(1), lynx(1), sed(1), youtube-dl(1), crontab(5), cron(8)
Giovanni Simoni <dacav@fastmail.com>
July 11, 2020