Supercazzola is my own scraper tar pit, designed to dynamically generate an endless graph of webpages.
I wrote it to poison web crawlers that ignore my robots.txt.
This software depends on pkg-config and libevent. It has been tested on GNU/Linux and FreeBSD.
The following instructions cover provisioning and installation on FreeBSD, but they can easily be adapted to other operating systems (e.g. GNU/Linux).
Future versions might improve the procedure and make it more generic.
root@freebsd:~ # pkg install -y devel/pkgconf devel/libevent
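As an example of adapting this step to GNU/Linux, on a Debian-based system the equivalent should look roughly like the line below (the package names pkg-config and libevent-dev are my assumption based on Debian's archive, not taken from the Supercazzola docs):

root@debian:~# apt-get install -y pkg-config libevent-dev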
root@freebsd:~ # tar -xzf supercazzola-*.tar.gz
root@freebsd:~ # make -C supercazzola-*/
# ...
root@freebsd:~ # make -C supercazzola-*/ install
# ...
Get some long text, e.g. Frankenstein from Gutenberg.org, and turn it into a Markov chain:
root@freebsd:~ # fetch 'https://www.gutenberg.org/ebooks/84.txt.utf-8'
84.txt.utf-8                                    438 kB  589 kBps    00s
root@freebsd:~ # mkdir /usr/local/share/spamd
root@freebsd:~ # supercazzola-*/mchain ./84.txt.utf-8 /usr/local/share/spamd/mkvchain
mchain: number of states: 42181 (build-time max: 81920)
mchain: number of edges: 65106
mchain: spamd(8) mallocs: 858296 bytes
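To double-check that the chain file was actually written (its size will depend on your corpus):

root@freebsd:~ # ls -lh /usr/local/share/spamd/mkvchain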
root@freebsd:~ # service spamd enable
spamd enabled in /etc/rc.conf
root@freebsd:~ # service spamd start
Starting spamd.
root@freebsd:~ # tail -n2 /var/log/daemon.log
Dec 8 23:53:23 freebsd spamd[3500]: listening on localhost:7180
Dec 8 23:53:23 freebsd spamd[3500]: listening on localhost:7181
Point your browser at the server, or (the intended use) reverse-proxy to it from whichever web server you run.
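For a quick look at what the tar pit serves, you can pull a page straight from one of the listening ports shown in the log above (fetch -qo - writes it to stdout):

root@freebsd:~ # fetch -qo - http://localhost:7180/ | head

For the reverse proxy, the exact configuration depends on your web server. As a minimal sketch, assuming nginx installed from packages and an arbitrary path of your choosing (/trap/ here is just an example name), something along these lines forwards crawlers into the pit:

# fragment of /usr/local/etc/nginx/nginx.conf, assuming nginx
location /trap/ {
    # hand the request over to the tar pit listener on localhost:7180
    proxy_pass http://127.0.0.1:7180/;
    # pass along the original host and client address
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $remote_addr;
}

Keep the proxied path disallowed in your robots.txt, so that only crawlers ignoring it end up in the pit.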