Setting up a mirror site for E-JC
by Brendan McKay
This page describes how to set up a mirror site for E-JC on
a UNIX computer (including Linux).
Naturally, you need a computer with an http server running, a
reasonable amount of disk space (about 400MB was enough as of
March 2002, but allow at least 600MB for expansion), and a
base directory for E-JC in a place where the http server can see it.
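If you want to confirm that the file system holding your base directory has that much room, the sketch below asks df for the free space. It assumes a df that understands the POSIX -P and -k options; the function names free_mb and has_space are hypothetical, not part of the E-JC scripts.

```shell
# Hypothetical helper: print the free space, in megabytes, on the file
# system that holds directory $1. With "df -Pk" the second output line
# describes that file system and its 4th column is the space available
# in 1024-byte blocks.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Hypothetical helper: succeed if directory $1 has at least $2 MB free.
has_space() {
    [ "$(free_mb "$1")" -ge "$2" ]
}

# Example (substitute your own base directory for /LOCALDIR):
# has_space /LOCALDIR 600 || echo "less than 600MB free" >&2
```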
To explain directories a bit more:
When I am logged into the computer where my mirror is, the base
directory for E-JC is called /cs/pub/publications/eljc,
but when people use their Web browser to look in from the
outside it appears to be called /publications/eljc.
That is because our http server has been told that /cs/pub
is the root directory for Web access.
In my description below, I will use /LOCALDIR to mean the
local name of the directory (such as /cs/pub/publications/eljc),
and /HTTPDIR to mean the http name
(such as /publications/eljc).
You have to substitute the values for your own site wherever I mention
those two names.
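In other words, /HTTPDIR is /LOCALDIR with the http server's root directory removed from the front. A small sketch of that relationship, assuming you know the server root (/cs/pub in my case); the function name http_name is hypothetical:

```shell
# Hypothetical helper: given the http server's root directory ($1) and a
# local directory name ($2), print the name seen through the http server.
# Fails (printing nothing) if $2 is not inside $1.
http_name() {
    root=${1%/}                     # drop a trailing slash, if any
    case "$2" in
        "$root"/*) printf '%s\n' "${2#"$root"}" ;;
        *) return 1 ;;
    esac
}

http_name /cs/pub /cs/pub/publications/eljc    # prints /publications/eljc
```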
If you want other people to use your mirror, send a note to our
Managing Editor and we
will advertise it for you.
You have two choices for collecting the E-JC files: HTTP and FTP.
The first is recommended, but we will describe both.
Mirroring via HTTP
The recommended tool for this is wget. If you don't
already have it, you can fetch it from
ftp://prep.ai.mit.edu/pub/gnu/wget.
Installing it on most Unix systems is very simple: just unpack the
archive, run ./configure, then run make.
You might want to ask your system manager to install it in a
standard place.
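Before building anything, you can check whether wget is already on your search path; the same check works for perl, which the FTP method needs. A minimal sketch using the POSIX command -v builtin; the function name need is hypothetical:

```shell
# Hypothetical helper: report whether each named program is on $PATH.
# Returns non-zero if any of them is missing.
need() {
    missing=0
    for prog in "$@"; do
        if path=$(command -v "$prog"); then
            echo "$prog: found at $path"
        else
            echo "$prog: not found" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: need wget perl
```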
The process of using wget is very simple:
- Fetch the file wgetejc
and make it executable (chmod +x wgetejc).
- Edit wgetejc to replace the string "/LOCALDIR"
by the name of your E-JC base directory.
- If your access to the internet must be via a local proxy
server, create a file .wgetrc (including the dot) in your
home directory, containing lines like these:
use_proxy = on
http_proxy = my.proxy.com
proxy_user = my-proxy-username
proxy_passwd = my-proxy-password
Obviously, you have to set those variables to the correct values for
your site. If your proxy server doesn't need a username or password,
leave out the last two lines. If you can access the internet directly
(without going through a proxy server), don't make .wgetrc at all.
- Now you can just execute wgetejc to start collecting
files from the E-JC main site. The first run will take a very long time
because there are many files; it may well take several hours.
- After the first time, executing wgetejc will only collect
the files that are new or changed, but since it must ask for the modification
time of every file it will still take an hour or so.
A log of the downloads will appear in /LOCALDIR/wget.log.
- To make fetching of new files automatic, you can arrange for
wgetejc to be automatically executed every night.
For example, the line
25 2 * * * (date; /LOCALDIR/wgetejc) >>getem.log 2>&1
in your crontab (see crontab(1)) will cause wgetejc to be run
at 2:25am each night, with the file
getem.log in your home directory receiving any error messages.
- The alternative script wgetejc8 will
only update the contents of Volume 8, in case you want to do that more often.
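If you edit your crontab from a script, the sketch below appends a line such as the one above only when an identical line is not already there, so running it twice does not schedule the job twice. It is a hypothetical helper (add_cron_line is not part of the E-JC scripts), and the usage line assumes your crontab command accepts - to mean standard input; check crontab(1) on your system first, because a failing "crontab -l" would leave you installing only the new line.

```shell
# Hypothetical helper: copy an existing crontab (read from standard input)
# to standard output, appending the line $1 unless an identical line is
# already present. The line is passed through the environment so that
# awk does not interpret backslashes in it.
add_cron_line() {
    CRON_LINE=$1 awk '
        $0 == ENVIRON["CRON_LINE"] { seen = 1 }
        { print }
        END { if (!seen) print ENVIRON["CRON_LINE"] }
    '
}

# Example (hypothetical usage):
# job='25 2 * * * (date; /LOCALDIR/wgetejc) >>getem.log 2>&1'
# crontab -l 2>/dev/null | add_cron_line "$job" | crontab -
```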
Mirroring via FTP
An alternative is to use FTP to collect the E-JC files.
This is more complicated to set up but has the advantage
of being quite a lot faster than wget.
[However, if you are running the mirror software overnight, who
cares how long it takes?]
The method I will describe uses a clever Perl script
written by Leo Novik of the Weizmann Institute, Israel.
You need Perl, but these days there is hardly
a UNIX system without it.
Here goes...
- Go to /LOCALDIR.
- Fetch the script update.pl,
and rename it to getem.pl.
Check that the location for Perl that appears on the
first line is correct. The UNIX command "which perl"
might tell you where Perl is.
- Create a shell script getem like this:
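#!/bin/sh
# Work in the E-JC base directory; stop if it cannot be entered.
cd /LOCALDIR || exit 1
# Keep a copy of the timestamp in case this run goes wrong.
cp timestamp timestamp_save
# Fetch every file newer than the date recorded in timestamp.
./getem.pl ftp.combinatorics.org /pub/ejc/Journal -stamp timestamp \
    get /LOCALDIR
# If timestamp now contains 1900, put back the saved copy.
if grep -q 1900 timestamp ; then
    mv timestamp_save timestamp
    echo "replacing timestamp with previous version"
fi
# Make directories and files readable through the http server,
# but leave the two scripts executable.
find . -type d -exec chmod 755 {} \; -o -type f ! -name getem ! -name getem.pl -exec chmod 644 {} \;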
- Create a file timestamp containing these six lines:
1990
Jan
1
12:00
http:/Journal
http:/HTTPDIR
- Now you should have three files, getem.pl,
getem and timestamp.
Make sure the first two are executable
(chmod +x getem.pl getem).
- Execute getem and wait.
- Keep waiting.
- Unless something is wrong, this will copy all of the files from
the master site in Pennsylvania to your machine. If you are far
away from Pennsylvania, it might take you hours. And hours.
Fortunately, this only has to be done once.
- When it finally finishes, test the mirror, for example by looking at it through a Web browser.
- Arrange for getem to be executed periodically. It will
never take as long as the first time, but will just copy over any
new stuff.
What you need to do is rather system-dependent;
I did it by putting this entry in my crontab:
21 8,21 * * * (date; /LOCALDIR/getem) >>getem.log 2>&1
A log appears in the file getem.log in my home directory.
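Before executing getem for the first time, a quick check that the three files are in place can save a long wait. This is a hypothetical sketch (check_getem_setup is not part of the E-JC scripts) that only tests what the steps above ask for: getem.pl and getem exist and are executable, and timestamp has six lines.

```shell
# Hypothetical pre-flight check for the FTP method. Pass the E-JC base
# directory as $1 (default: the current directory). Prints a message for
# each problem found and returns non-zero if there was any.
check_getem_setup() {
    dir=${1:-.}
    status=0
    # Both scripts must be regular files with execute permission.
    for f in getem.pl getem; do
        if [ ! -f "$dir/$f" ] || [ ! -x "$dir/$f" ]; then
            echo "$dir/$f: missing or not executable" >&2
            status=1
        fi
    done
    # timestamp must exist and have exactly six lines.
    if [ ! -f "$dir/timestamp" ]; then
        echo "$dir/timestamp: missing" >&2
        status=1
    elif [ "$(awk 'END { print NR }' "$dir/timestamp")" -ne 6 ]; then
        echo "$dir/timestamp: expected six lines" >&2
        status=1
    fi
    return "$status"
}

# Example: check_getem_setup /LOCALDIR && /LOCALDIR/getem
```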
A tiny bit of explanation.
The first four lines of timestamp contain a date and time
in the timezone of the master site in Pennsylvania.
getem.pl connects to the master site in Pennsylvania by FTP and fetches
any file whose creation time is later than that.
Then getem.pl edits timestamp to contain the
creation time of the most recent file it copied.
The last two lines in timestamp tell getem.pl how
to edit html files so that http addresses valid at Pennsylvania will
be valid at your site instead.
(We attempt to avoid site-specific addresses anyway.)
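The rewriting amounts to replacing one address prefix by another. The sketch below only illustrates that idea; it is not the code getem.pl uses, and the function name replace_prefix is hypothetical. It treats both prefixes as literal strings, so characters such as '.' need no escaping.

```shell
# Illustration only, not getem.pl's code: copy standard input to standard
# output, replacing every literal occurrence of the string $1 by $2.
replace_prefix() {
    FROM=$1 TO=$2 awk '
        BEGIN { from = ENVIRON["FROM"]; to = ENVIRON["TO"] }
        {
            rest = $0; out = ""
            if (from != "") {
                # index() does a plain substring search, so no regex escaping.
                while ((i = index(rest, from)) > 0) {
                    out = out substr(rest, 1, i - 1) to
                    rest = substr(rest, i + length(from))
                }
            }
            print out rest
        }
    '
}

# Example (hypothetical file names):
# replace_prefix 'http:/Journal' 'http:/HTTPDIR' < old.html > new.html
```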
If something goes wrong, for example FTP times out during a file
transfer, you can always get back on track by manually setting back
the date in timestamp.
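Setting back the date by hand just means rewriting the first four lines of timestamp and leaving the last two alone. A minimal sketch, assuming the format shown earlier (year, three-letter month, day, hh:mm in the master site's timezone); the function name set_timestamp is hypothetical, and it keeps the previous version as timestamp.bak.

```shell
# Hypothetical helper: rewrite the date (lines 1 to 4) of a timestamp
# file, keeping lines 5 and 6 (the address-rewriting lines) unchanged.
# Usage: set_timestamp FILE YEAR MONTH DAY HH:MM
# The previous contents are saved in FILE.bak.
set_timestamp() {
    file=$1
    cp "$file" "$file.bak" || return 1
    {
        printf '%s\n%s\n%s\n%s\n' "$2" "$3" "$4" "$5"
        sed -n '5,6p' "$file.bak"
    } > "$file"
}

# Example: fetch again everything newer than noon on 1 March 2002.
# set_timestamp /LOCALDIR/timestamp 2002 Mar 1 12:00
```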
Comments on this description are welcome.
Happy mirroring!
Brendan McKay. bdm@cs.anu.edu.au