Mapping Old Links into Sitellite

Chapter 2: Taking Stock

The first step is to identify all of the old pages that need to be rerouted. An easy way to get a list of the pages on your former site that visitors actually requested is to analyze your Apache access logs. The following script parses the Apache access log, finds every request for an .html or .htm page, and saves the unique paths to a text file named old_links.txt.

Note that if the pages you need to reroute use other file extensions, you can add as many extensions as you want to the $ext list at the top of the script.
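As a rough sketch of what adding an extension looks like, suppose your old site also served .shtml pages. The 'shtml' entry and the two request strings below are made up for illustration; the pattern is built the same way as in the script that follows.

```php
<?php

// Hypothetical: the old site also served .shtml pages
$ext = array ('html', 'htm', 'shtml');

// The same pattern the script builds from $ext
$pattern = '/^(GET|POST) (.+)\.(' . join ('|', $ext) . ') HTTP\/1/i';

// Made-up request strings, as they appear between the quotes of a log entry
$requests = array (
	'GET /about/index.shtml HTTP/1.1',
	'GET /images/logo.gif HTTP/1.1',
);

foreach ($requests as $request) {
	if (preg_match ($pattern, $request, $regs)) {
		// only the .shtml request matches, so only it is printed
		echo $regs[2] . '.' . $regs[3] . "\n";
	}
}
```

Requests for images, stylesheets and anything else not listed in $ext are simply ignored.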

Also note that the path I've specified for my access log may not be correct for your web site. Check your server configuration or check with your web server administrator to find out the correct path for your site.

<?php

// valid file extensions to map
$ext = array ('html', 'htm');

// open the access log file
$fp = fopen ('/var/log/httpd/access_log', 'r');
if (! $fp) {
	die ('File open failed!');
}

$pages = array ();

// read the access log
while (($line = fgets ($fp)) !== false) {
	// the request is the first double-quoted field of each log entry
	$parts = explode ('"', $line);
	if (count ($parts) < 3) {
		// skip blank or malformed lines
		continue;
	}

	// parse the request for relevant page requests
	if (preg_match (
		'/^(GET|POST) (.+)\.(' . join ('|', $ext) . ') HTTP\/1/i',
		$parts[1],
		$regs
	)) {
		$pages[] = $regs[2] . '.' . $regs[3];
	}
}

fclose ($fp);

// save the links
$fp = fopen ('old_links.txt', 'w');
if (! $fp) {
	die ('File write failed!');
}

foreach (array_unique ($pages) as $page) {
	fwrite ($fp, $page . "\n");
}

fclose ($fp);

?>

To run the script, save it to a file called parse_log.php in your web site document root, then issue the following command from that directory on your command line:

php -f parse_log.php

Once you've successfully created the old_links.txt file, you won't need this script again, so you may delete it if you like.
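Before deleting the script, you may want to confirm that old_links.txt holds what you expect. The following is only a minimal sketch: load_links is a hypothetical helper name, and it assumes you run it from the directory that contains old_links.txt.

```php
<?php

// Hypothetical helper: read a links file into an array, one path per element
function load_links ($path) {
	if (! is_readable ($path)) {
		return array ();
	}
	$links = file ($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
	return ($links === false) ? array () : $links;
}

$links = load_links ('old_links.txt');
echo count ($links) . " pages found\n";

// show the first few entries as a spot check
foreach (array_slice ($links, 0, 5) as $link) {
	echo $link . "\n";
}
```

If the count is zero, the most likely causes are an incorrect access log path or pages that use an extension missing from $ext.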

Chapter 3: Mapping the Old Site »