/---------------------------------------------------------------------------------\
| SiteRank -- the colour scheme on this site makes me think of Halloween! |
| |
| This is a new idea that I had while trying to fall asleep last night. The |
| thought started as just a place for me to put links to all my little websites |
| and projects that I was working on, but I decided that was too boring. |
| |
| Then I said to myself (not out loud), "Hey, it would be cool if it showed each |
| site in the order of 'most recently updated'." I toyed around with that idea, |
| and realised that this thing didn't have to be restricted to my websites, but |
| it could work with other websites too. |
| |
| It's not implemented yet, but what I'm thinking is that when you add a site to |
| be 'monitored', it will fetch the index page and take an 'md5 sum' of it |
| (because that'll be smaller than actually storing the page). Then at a regular |
| interval the script (or maybe a small daemon that I write that will reside on |
| the server) will check each site, take a fresh md5 sum of its index page, and |
| compare it against the stored sum. If it's changed, then the database will be |
| updated to reflect this, and the new md5 sum will be stored. |
| |
| Then I will be able to track statistics about these pages, like Most Recently |
| Updated, Most Frequently Updated, and Most Updates Ever (since being added). |
| |
| I think this is going to be a lot of fun, and I look forward to when I finally |
| stop procrastinating and work on it. But until then here is a list of all my |
| sites and projects that I'm working on: |
| |
| - TheNoodleman.ca |
| - TheDump |
| - Simple PHP Photo Gallery |
| |
| If I actually end up implementing this, and it becomes some kind of internet |
| phenomenon, I'm going to need a domain for it. Plus, for the purposes now, I |
| need a better name than SiteRank -- it doesn't really reflect what the page is |
| actually going to do. So here are a few ideas I have: |
| |
| - UpdateTHIS.net (.com is already registered) |
| |
| |
| |
| |
| |
| Check out my colossal failure and his esteemed brother ginormous failure... it |
| can't handle dynamic content that is set on the server side. Looks like it's |
| back to the drawing board. |
| --- |
| REVIVAL!!! I seem to have solved one problem... next problem on the list, |
| dealing with pages with frames where the content of interest is in a frame. |
| Chizzek it out. |
| --- |
| Next Step in the right direction... |
| Worked out a better way to analyse the pages, but it is more CPU intensive |
| (less efficient). But it's more logical. Need to do some research to see if |
| Perl might be a better choice based on the specific needs of this project. |
| --- |
| This info is a pain to maintain |
| I've cleaned up test4 quite a bit, and broken it into files. Everything is a |
| lot more organised now, which helps with the work. I've just got a couple of |
| lingering bugs that are disagreeing with me. Need a way to keep an eye on |
| things. More info in the notes at the bottom of this page. |
\---------------------------------------------------------------------------------/
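A minimal sketch of the md5 idea from the box above, in Python rather than the project's PHP. Nothing here is the actual SiteRank code: the names `Monitor`, `SiteRecord`, `fetch_index` and `md5_of` are made up for illustration, and the "database" is just an in-memory dict. It stores one md5 sum per site, re-checks on demand, and keeps enough bookkeeping for two of the planned statistics (Most Recently Updated and Most Updates Ever).

```python
import hashlib
import time
from dataclasses import dataclass
from urllib.request import urlopen


def md5_of(content: bytes) -> str:
    """Return the hex md5 sum of a page body."""
    return hashlib.md5(content).hexdigest()


def fetch_index(url: str) -> bytes:
    """Download a page body (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


@dataclass
class SiteRecord:
    url: str
    checksum: str        # md5 sum of the last version seen
    added_at: float      # when the site was added
    last_updated: float  # when the md5 sum last changed
    update_count: int = 0


class Monitor:
    """In-memory stand-in for the database (an assumption of this sketch)."""

    def __init__(self, fetch=fetch_index, clock=time.time):
        self.fetch = fetch
        self.clock = clock
        self.sites = {}

    def add_site(self, url):
        """Store the initial md5 sum instead of the page itself."""
        now = self.clock()
        self.sites[url] = SiteRecord(url, md5_of(self.fetch(url)), now, now)

    def check_all(self):
        """One polling pass; returns the URLs whose md5 sum changed."""
        changed = []
        for record in self.sites.values():
            new_sum = md5_of(self.fetch(record.url))
            if new_sum != record.checksum:
                record.checksum = new_sum
                record.last_updated = self.clock()
                record.update_count += 1
                changed.append(record.url)
        return changed

    def most_recently_updated(self):
        return sorted(self.sites.values(),
                      key=lambda r: r.last_updated, reverse=True)

    def most_updates_ever(self):
        return sorted(self.sites.values(),
                      key=lambda r: r.update_count, reverse=True)
```

A cron job or small daemon would call `check_all()` at the regular interval. Note that a plain md5 comparison flags any byte that differs, which is exactly why server-side dynamic content turns out to be a problem.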
/---------------------------------------------------------------------------------\
| Changelog and Version notes: |
| (first couple test files (test.php, test2.php) don't really count, these notes |
| start from test3.php) |
| (Project started on June 23, 2005) |
| |
| super/test/testcomp.php |
| ========== |
| - Giant leap forward! All that is left is some slight tweaking to the |
| algorithm, then sit back and collect data to make sure it does what is intended |
| - Next on plate: |
|     * Design the website |
|     * User database |
|     * Write server daemon to handle file comparisons and DB updates |
| |
| |
| super/index.php |
| ========== |
| - I've completely lost track of everything, but I've got stuff working, and it |
| feels good. Just need to iron out some things. |
| * Changelog will no longer be updated. It's a whole job unto itself, and I |
| don't care enough to do it. |
| |
| test4.php |
| ========== |
| - New method used to analyse page... no more guessing or 'hoping' that 30% is |
| good enough |
| - (details will not be given, this is considered sensitive info) |
| - Cannot detect changes on a page where every line on the page contains |
| dynamic content that changes every time the page is reloaded (like this) |
| |
| -- Handles dynamic content on thegoge.com perfectly now, no more surprises |
| -- still need to handle pages with frames |
| -- Will consider updates to be 'real' based on user's desire for number of |
| changes. |
| |
| |
| test3.php |
| ========== |
| - Removes first and last 30% of the source file (hoping to get rid of any |
| server-side dynamic ads). |
| - Removes numbers to remove any dynamic 'number' functions (clocks, percent |
| calculations, etc) |
| |
| It can detect changes such as: |
| - New news posts |
| - Edited news posts |
| - tiny spelling corrections (even only 1 character) |
| - ignores changes where the new source file looks identical to a previous |
| source file (thinking of removing this) |
| - small changes in minor parts of the webpage (some people might not consider |
| them important or 'update worthy' changes, but they are changes nonetheless |
| and therefore are detected) |
\---------------------------------------------------------------------------------/
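A rough sketch of the test3.php filtering described above, written in Python for illustration. It is not the original PHP. It assumes the "30%" is counted in lines (the notes don't say whether it was lines or characters), and `trim_edges`, `strip_digits` and `fingerprint` are hypothetical names.

```python
import hashlib
import re


def trim_edges(source: str, percent: int = 30) -> str:
    """Drop the first and last `percent` of the lines (assumed unit: lines)."""
    lines = source.splitlines()
    cut = len(lines) * percent // 100
    return "\n".join(lines[cut:len(lines) - cut])


def strip_digits(text: str) -> str:
    """Remove every digit so clocks, counters and percentages stop mattering."""
    return re.sub(r"\d", "", text)


def fingerprint(source: str) -> str:
    """md5 sum of the page after both filters have been applied."""
    cleaned = strip_digits(trim_edges(source))
    return hashlib.md5(cleaned.encode("utf-8")).hexdigest()
```

Two fetches of the same page that differ only in the trimmed top and bottom, or only in digits, get the same fingerprint, while a one-character edit in the middle still changes it. The obvious weakness is the one the test4 notes point out: 30% is a guess, so real updates near the top or bottom of the page are missed.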