SiteRank | TheNoodleman.ca
/---------------------------------------------------------------------------------\
| SiteRank -- the colour scheme on this site makes me think of Halloween!          |
|                                                                                  |
| This is a new idea that I had while trying to fall asleep last night. The        |
| thought started as just a place for me to put links to all my little websites    |
| and projects that I was working on, but I decided that was too boring.           |
|                                                                                  |
| Then I said to myself (not out loud), "Hey, it would be cool if it showed each   |
| site in the order of 'most recently updated'." I toyed around with that idea,    |
| and realised that this thing didn't have to be restricted to my websites, but    |
| it could work with other websites too.                                           |
|                                                                                  |
| It's not implemented yet, but what I'm thinking is that when you add a site to   |
| be 'monitored', it will copy the index page and take an 'md5 sum' of it          |
| (because that'll be smaller than actually storing the page). Then, at a regular  |
| interval, the script (or maybe a small daemon that I write that will reside on   |
| the server) will check each site, take an md5 sum of its index page, and         |
| compare it against the stored sum. If it has changed, the database will be       |
| updated to reflect this, and the new md5 sum will be stored.                     |
|                                                                                  |
| Then I will be able to track statistics about these pages, like Most Recently    |
| Updated, Most Frequently Updated, and Most Updates Ever (since being added).     |
|                                                                                  |
| I think this is going to be a lot of fun, and I look forward to when I finally   |
| stop procrastinating and work on it. But until then here is a list of all my     |
| sites and projects that I'm working on:                                          |
|                                                                                  |
| - TheNoodleman.ca                                                                |
| - TheDump                                                                        |
| - Simple PHP Photo Gallery                                                       |
|                                                                                  |
| If I actually end up implementing this, and it becomes some kind of internet     |
| phenomenon, I'm going to need a domain for it. Plus, for present purposes, I     |
| need a better name than SiteRank -- it doesn't really reflect what the page is   |
| actually going to do. So here are a few ideas I have:                            |
|                                                                                  |
| - UpdateTHIS.net (.com is already registered)                                    |
|                                                                                  |
|                                                                                  |
|                                                                                  |
|                                                                                  |
|                                                                                  |
| Check out my colossal failure and his esteemed brother, ginormous failure... it  |
| can't handle dynamic content that is set on the server side. Looks like it's     |
| back to the drawing board.                                                       |
| ---                                                                              |
| REVIVAL!!! I seem to have solved one problem... next problem on the list:        |
| dealing with pages with frames where the content of interest is in a frame.      |
| Chizzek it out.                                                                  |
| ---                                                                              |
| Next Step in the right direction...                                              |
| Worked out a better way to analyse the pages. It is more CPU intensive (less     |
| efficient), but it's more logical. Need to do some research to see if Perl       |
| might be a better choice based on the specific needs of this project.            |
| ---                                                                              |
| This info is a pain to maintain                                                  |
| I've cleaned up test4 quite a bit, and broke it into files. Everything is a      |
| lot more organised now, which helps with the work. I've just got a couple of     |
| lingering bugs that are disagreeing with me. Need a way to keep an eye on        |
| things. More info in the notes at the bottom of this page.                       |
\---------------------------------------------------------------------------------/
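
The md5 idea described above (take a sum of the index page, compare it against
the stored sum, store the new one if it changed) could look roughly like this.
It is only a sketch in Python, not the PHP the test files use, and the names
md5_sum, check_site and stored_sums are made up for illustration. A plain dict
stands in for the database, and fetching the page is left out so the example
runs offline.

```python
import hashlib


def md5_sum(page_bytes: bytes) -> str:
    """Return the hex md5 digest of a page's raw bytes."""
    return hashlib.md5(page_bytes).hexdigest()


def check_site(stored_sums: dict, url: str, page_bytes: bytes) -> bool:
    """Compare a page against its stored sum and store the new sum.

    Returns True when the page counts as updated. A site seen for the
    first time is only recorded; it is not reported as an update.
    stored_sums is a hypothetical stand-in for the database table.
    """
    new_sum = md5_sum(page_bytes)
    old_sum = stored_sums.get(url)
    stored_sums[url] = new_sum
    return old_sum is not None and old_sum != new_sum


if __name__ == "__main__":
    sums = {}
    # First visit: the sum is recorded, nothing to compare against yet.
    print(check_site(sums, "http://example.com/", b"<html>hello</html>"))
    # Same content again: no update.
    print(check_site(sums, "http://example.com/", b"<html>hello</html>"))
    # Content changed: reported as an update and the new sum is stored.
    print(check_site(sums, "http://example.com/", b"<html>hello!</html>"))
```

Any single changed byte changes the sum, so a page with a clock or a rotating
ad on it would register as updated on every check.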

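The three statistics mentioned in the intro (Most Recently Updated, Most
Frequently Updated, Most Updates Ever) could be computed from a simple log of
update times per site. This is a hedged Python sketch: the history and added
dicts are hypothetical stand-ins for database tables, and reading "Most
Frequently Updated" as "updates per day since the site was added" is an
assumption, since the page doesn't say how it differs from Most Updates Ever.

```python
from datetime import datetime


def most_recently_updated(history):
    """Sites with at least one update, newest update first."""
    updated = {url: times for url, times in history.items() if times}
    return sorted(updated, key=lambda url: max(updated[url]), reverse=True)


def most_updates_ever(history):
    """Sites ordered by total number of updates since being added."""
    return sorted(history, key=lambda url: len(history[url]), reverse=True)


def most_frequently_updated(history, added, now):
    """Sites ordered by updates per day since being added (assumed meaning)."""
    def per_day(url):
        # Treat anything younger than a day as one day to avoid dividing by ~0.
        days = max((now - added[url]).total_seconds() / 86400, 1.0)
        return len(history[url]) / days
    return sorted(history, key=per_day, reverse=True)


if __name__ == "__main__":
    added = {"site-a": datetime(2005, 6, 23), "site-b": datetime(2005, 7, 3)}
    history = {
        "site-a": [datetime(2005, 6, 24), datetime(2005, 6, 30)],
        "site-b": [datetime(2005, 7, 3, 12)],
    }
    now = datetime(2005, 7, 4)
    print(most_recently_updated(history))
    print(most_updates_ever(history))
    print(most_frequently_updated(history, added, now))
```
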
/---------------------------------------------------------------------------------\
| Changelog and Version notes:                                                     |
| (first couple test files (test.php, test2.php) don't really count; these notes   |
| start from test3.php)                                                            |
| (Project started on June 23, 2005)                                               |
|                                                                                  |
| super/test/testcomp.php                                                          |
| ==========                                                                       |
| - Giant leap forward! All that is left is some slight tweaking to the            |
| algorithm, then sit back and collect data to make sure it does what is intended  |
| - Next on the plate:                                                             |
|       * Design the website                                                       |
|       * User database                                                            |
|       * Write server daemon to handle file comparisons and DB updates            |
|                                                                                  |
|                                                                                  |
| super/index.php                                                                  |
| ==========                                                                       |
| - I've lost complete track of everything, but I've got stuff working, and it     |
| feels good. Just need to iron out some things.                                   |
| * Changelog will no longer be changed. It's a whole job unto itself, and I       |
| don't care enough to do it.                                                      |
|                                                                                  |
| test4.php                                                                        |
| ==========                                                                       |
| - New method used to analyse page... no more guessing or 'hoping' that 30% is    |
| good enough                                                                      |
| - (details will not be given; this is considered sensitive info)                 |
| - Cannot detect changes on a page where every line on the page contains          |
| dynamic content that changes every time the page is reloaded (like this)         |
|                                                                                  |
| -- Handles dynamic content on thegoge.com perfectly now, no more surprises       |
| -- Still need to handle pages with frames                                        |
| -- Will consider updates to be 'real' based on the user's desired number of      |
| changes.                                                                         |
|                                                                                  |
|                                                                                  |
| test3.php                                                                        |
| ==========                                                                       |
| - Removes the first and last 30% of the source file (hoping to get rid of any    |
| server-side dynamic ads).                                                        |
| - Strips numbers to remove any dynamic 'number' functions (clocks, percent       |
| calculations, etc.)                                                              |
|                                                                                  |
| It can detect changes such as:                                                   |
| - New news posts                                                                 |
| - Edited news posts                                                              |
| - Tiny spelling corrections (even only 1 character)                              |
| - Ignores changes where the new source file looks identical to a previous        |
| source file (thinking of removing this)                                          |
| - Small changes in minor parts of the webpage (some people might not consider    |
| them important or 'update worthy' changes, but they are changes nonetheless      |
| and therefore are detected)                                                      |
\---------------------------------------------------------------------------------/
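
The test3.php notes boil down to three steps: drop the first and last 30% of
the source, drop the digits, then compare what is left. Here is a hedged
Python sketch of that idea, not the actual PHP (which isn't shown on this
page). It assumes the 30% is measured in lines rather than characters, it
borrows md5 from the original SiteRank plan for the comparison, and the names
normalise and fingerprint are made up for illustration.

```python
import hashlib
import re


def normalise(source: str, trim_percent: int = 30) -> str:
    """Keep only the middle of the page and strip every digit from it."""
    lines = source.splitlines()
    # Number of lines to cut from each end (integer maths, rounds down).
    cut = len(lines) * trim_percent // 100
    middle = lines[cut:len(lines) - cut]
    return re.sub(r"\d", "", "\n".join(middle))


def fingerprint(source: str) -> str:
    """md5 of the normalised page; equal fingerprints mean 'no update'."""
    return hashlib.md5(normalise(source).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    def page(ad, clock, news):
        # Ten lines: ads at the top and bottom, a clock and a news post inside.
        return "\n".join([f"ad {ad}", "header", "header", f"time {clock}",
                          news, "body", "body", "footer", "footer", f"ad {ad}"])

    before = fingerprint(page("banner A", "12:30", "old post"))
    # Different ad and clock, same news: fingerprints match.
    print(before == fingerprint(page("banner B", "09:15", "old post")))
    # Edited news post: fingerprints differ.
    print(before == fingerprint(page("banner A", "12:30", "old post, edited")))
```

Stripping digits has a cost: an edit that changes nothing but a number (say, a
date in a news post) would go unnoticed, and anything new in the trimmed 30% at
either end is ignored as well.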