Supercazzola

Generate spam for web scrapers

Supercazzola is my own scraper tar pit, designed to dynamically generate an endless graph of web pages.

I wrote it to poison web crawlers that ignore my robots.txt.
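One way to picture an "endless graph" of pages is to derive each page's outgoing links from a hash of its own path, so that nothing needs to be stored and every link leads to another valid page. The Python sketch below illustrates that general idea only. It is not taken from Supercazzola's source, and the names `links_for`, `fanout` and the `/p/` URL prefix are hypothetical.

```python
import hashlib


def links_for(path: str, fanout: int = 5) -> list[str]:
    """Return `fanout` links derived deterministically from `path`.

    Hypothetical illustration: the same path always yields the same
    links, and each link is itself a valid input, so a crawler that
    keeps following links never reaches a leaf page.
    """
    links = []
    for i in range(fanout):
        # Hash the path plus the link index to get a stable pseudo-random id.
        digest = hashlib.sha256(f"{path}#{i}".encode("utf-8")).hexdigest()
        links.append(f"/p/{digest[:12]}")
    return links


if __name__ == "__main__":
    # Follow the first link a few times to show the graph never runs out.
    page = "/"
    for _ in range(3):
        print(page, "->", links_for(page))
        page = links_for(page)[0]
```

Because the links are a pure function of the path, a server built this way would need no database of pages: any URL a crawler asks for can be answered on the spot.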

This software depends on pkg-config and libevent. It has been tested on GNU/Linux and FreeBSD.

How-To

The following instructions cover provisioning and installation on FreeBSD, but they can easily be adapted to other operating systems (e.g. GNU/Linux).

Future versions of the software might improve the procedure and make it more generic.

  1. Build the software
  2. Create and install the Markov chain

    Get some long text, e.g. Frankenstein from Gutenberg.org, and turn it into a Markov chain:

    root@freebsd:~ # fetch 'https://www.gutenberg.org/ebooks/84.txt.utf-8'
    84.txt.utf-8                                           438 kB  589 kBps    00s
    root@freebsd:~ # mkdir /usr/local/share/spamd
    root@freebsd:~ # supercazzola-*/mchain ./84.txt.utf-8 /usr/local/share/spamd/mkvchain
    mchain: number of states:  42181 (build-time max: 81920)
    mchain: number of edges:   65106
    mchain: spamd(8) mallocs:  858296 bytes
    
  3. Start spamd
  4. Enjoy some spam

    Point a browser at your server, or (the intended use) set up a reverse proxy to it from whichever web server you run.
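For readers unfamiliar with the Markov chain built in step 2, here is a minimal word-level sketch in Python. It shows the general technique only: each word is a state, and each observed "word followed by word" pair is a transition. The file format that mchain writes and the way spamd reads it are not described here and are not reproduced. The function names `build_chain` and `generate` are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words seen directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, length: int,
             rng: random.Random) -> str:
    """Walk the chain for `length` words, starting from `start`."""
    word = start
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if followers:
            # Repeated followers appear more than once in the list,
            # so frequent transitions are picked more often.
            word = rng.choice(followers)
        else:
            # Dead end (word only seen at the end of the text):
            # restart from an arbitrary known state.
            word = rng.choice(sorted(chain))
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    sample = "the monster saw the man and the man saw the monster flee"
    chain = build_chain(sample)
    print(generate(chain, "the", 12, random.Random()))
```

The output is grammatical-looking nonsense whose vocabulary and local word order mirror the source text, which is the property that makes a long book such as Frankenstein a convenient input.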