Word cloud

Started by JPDeni, August 04, 2006, 08:34:56 PM

JPDeni

This is what I'd started out to make, but it was a bit complicated and I wanted something simpler first, so I did the poster cloud. Now I think I've got the word cloud done. This was adapted from code that someone else wrote for WordPress.

The script takes the most recent posts from the forum (50 in the code below; the LIMIT in the query sets how many), counts the words, dumps all the punctuation and such, and prints out the 50 most frequently occurring words that are 4 or more letters long. All of this is adjustable, of course, so you can tweak it to fit your situation.

There's a list of common words that you probably don't want in your cloud because they're not very interesting. There may be more that you want to add. There are also some HTML entities that I've caught -- non-breaking space, apostrophe, exclamation point, quotation mark -- but there will likely be others. The best thing to do is to check it from time to time to see if there's anything you need to add.
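
A side note on the word list, offered as a sketch rather than part of the original block: space-padded patterns such as '@ that @' only match a word with a space on both sides, so they miss a word at the very start or end of the text and the second of two back-to-back repeats. One alternative is a single pattern with word boundaries. The $stop_words list and the sample text here are made up for illustration, and the script is shown standalone (with an opening PHP tag) so it can be run outside a block.

```php
<?php
// Hypothetical short stop list; the block's own list is much longer.
$stop_words = array('about', 'that', 'with', 'this');

// One pattern with word boundaries instead of one space-padded pattern per word.
$pattern = '@\b(' . implode('|', array_map('preg_quote', $stop_words)) . ')\b@';

// Made-up sample text with a stop word at the start, at the end, and repeated.
$text = 'that cloud that that block came with this';

$clean = preg_replace($pattern, ' ', $text);          // drop the stop words
$clean = trim(preg_replace('/\s\s+/', ' ', $clean));  // collapse extra white space
echo $clean, "\n";
```

With the space-padded patterns, the same sample would keep the leading "that", one of the doubled "that"s, and the trailing "this".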

One thing to particularly take note of... If you're trying to get rid of HTML entities, you can't just type the entity into the code. TP will translate it and it'll be pointless to try. You need to build it from two pieces. (What did she say?!?!!?)  :o

Example: To define the non-breaking space, I had to use:

$nbsp = 'nb' . 'sp;';

You'll see others in the code and you may need to come up with some more as you work with it. But it does work.  ;D
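
A minimal sketch of that two-piece trick, assuming a made-up sample string in place of a real parsed post body. The entity pieces are the ones from the post; $amp and $sample are hypothetical and exist only so the example can run on its own.

```php
<?php
// Build each entity in two pieces, as in the post, so the literal
// entity never appears in the block source.
$nbsp = 'nb' . 'sp;';
$quot = 'qu' . 'ot;';
$apostrophe = '&#' . '39;';

// Hypothetical sample text standing in for a parsed post body.
$amp = '&';
$sample = 'don' . $apostrophe . 't' . $amp . $nbsp . 'say' . $amp . $quot . 'hello' . $amp . $quot;

// Remove the entity pieces, then turn leftover ampersands into spaces,
// the same way the symbol filter in the block does.
$clean = str_replace(array($apostrophe, $nbsp, $quot), '', $sample);
$clean = preg_replace('@&@', ' ', $clean);
$clean = trim(preg_replace('/\s\s+/', ' ', $clean));
echo $clean, "\n";
```

Note that $nbsp and $quot leave the leading ampersand behind on purpose; the punctuation filter catches it afterwards.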

global $db_prefix;
$number_of_words = 50;
$min_length = 4;

// This is the list of words to exclude from your cloud
  $exclude_words = array(
    '@http://@',
    '@ about @',
    '@ also @',
    '@ amp @',
    '@ because @',
    '@ been @',
    '@ cant @',
    '@ could @',
    '@ didnt @',
    '@ doesnt @',
    '@ dont @',
    '@ even @',
    '@ from @',
    '@ going @',
    '@ have @',
    '@ havent @',
    '@ here @',
    '@ http_request @',
    '@ into @',
    '@ its @',
    '@ just @',
    '@ like @',
    '@ look @',
    '@ make @',
    '@ many @',
    '@ more @',
    '@ much @',
    '@ must @',
    '@ need @',
    '@ should @',
    '@ shouldnt @',
    '@ some @',
    '@ someone @',
    '@ such @',
    '@ the @',
    '@ take @',
    '@ that @',
    '@ their @',
    '@ then @',
    '@ there @',
    '@ theres @',
    '@ these @',
    '@ they @',
    '@ this @',
    '@ want @',
    '@ well @',
    '@ were @',
    '@ what @',
    '@ when @',
    '@ where @',
    '@ which @',
    '@ will @',
    '@ with @',
    '@ without @',
    '@ would @',
    '@ wouldnt @',
    '@ your @',
    '@ youre @'
  );

// Various punctuation that should be filtered from the cloud
  $exclude_symbs = array('@[0-9]@','@\.@','@\,@','@\:@','@"@','@\?@','@\(@','@\)@','@\!@','@\/@','@\&@');
  $apostrophe = '&#'. '39;';
  $exclamation = '&#'. '33;';
  $nbsp = 'nb' . 'sp;';
  $quot = 'qu' . 'ot;';
  $low_count = 0;

// Reset our class globals and other variables
  $cloudy = '';
  $word_list = array();
  $cnt = 0;
  $totalwords = '';

  $query = db_query(
    "SELECT body
     FROM {$db_prefix}messages AS mess
     LEFT JOIN {$db_prefix}boards AS board
     ON mess.ID_BOARD = board.ID_BOARD
     WHERE FIND_IN_SET(-1, board.memberGroups)
     AND board.permission_mode=0
     AND board.ID_BOARD <> 2
     AND board.ID_BOARD <> 12
     AND board.ID_BOARD <> 21
     ORDER BY posterTime DESC
     LIMIT 50", __FILE__, __LINE__);

  while ($row = mysql_fetch_assoc($query))
  {
    $words = $row['body'];
    $words = parse_bbc($words,1);
    $words = strip_tags($words); // Clean HTML tags
    $words = strtolower($words); // Make all words lower case
    $words = str_replace($apostrophe,'',$words); // remove apostrophes
    $words = str_replace($exclamation,'',$words); // remove exclamations
    $words = str_replace($nbsp,'',$words); // remove non-breaking space
    $words = str_replace($quot,'',$words); // remove quote
    $words = preg_replace($exclude_symbs, ' ', $words); // Strip excluded symbols
    $words = preg_replace($exclude_words, ' ', $words); // Strip excluded words
    $words = preg_replace('/\s\s+/', ' ', $words); // Strip extra white space
    $totalwords .= ' ' . $words; // Keep a space between posts so words don't run together
  }
  $words = '';
  $wordslist = explode(' ', $totalwords); // Turn it back into an array
  $word_count = array_count_values($wordslist); // Count word usage

// Sort the big array of words, most used first.
  arsort($word_count); // Sort the array by usage count

// Here we build our smaller array of words that will be used.
  foreach ($word_count as $key => $val) {
    if (strlen($key) >= $min_length) {
      $word_list[$key] = $val;
      $low_count = $val; // Lowest count kept so far (the array is sorted high to low)
      $cnt++;
    }
    if ($cnt >= $number_of_words) {
      break;
    }
  }


// Put the words in random order for the cloud.
// Shuffling the keys works even when fewer than $number_of_words words were found.
  $random = array_keys($word_list);
  shuffle($random);
  $divisor = max(1, $low_count); // Avoid dividing by zero
echo '<div style="text-align: center">';
// Build the cloud's HTML
  foreach ($random as $value) {
    $fsize = intval($word_list[$value] / $divisor) * 4;
    $fsize = $fsize + 4;
    if ($fsize > 20) { $fsize = 20; }
    echo '<span style="font-size:' . $fsize . 'pt;">' . $value . '</span> ';
  }
echo '</div>';


I'll add a screen shot of what it looks like when there's room in the upload folder. It's pretty much the same as the Poster Cloud, though.
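
To see how the font sizes come out, here is a small worked example of the sizing arithmetic in the block. cloud_font_size() is a hypothetical helper written for this illustration, the counts are made up, and the max() guard against a zero low count is an addition that is not in the sizing lines as posted.

```php
<?php
// Same arithmetic as the block: whole multiples of the lowest count,
// times 4, plus a base of 4, capped at 20pt.
// cloud_font_size() is a hypothetical helper, not part of the original block.
function cloud_font_size($count, $low_count)
{
    $divisor = max(1, $low_count); // guard against a zero low count (an addition)
    $fsize = intval($count / $divisor) * 4 + 4;
    return min($fsize, 20);
}

// Made-up usage counts, with 3 as the lowest count in the cloud.
foreach (array(3, 5, 6, 12, 40) as $count) {
    echo $count, ' uses -> ', cloud_font_size($count, 3), "pt\n";
}
```

Because of the intval(), a word needs to be used at least twice as often as the least-used word before it gets any bigger, and anything four or more times as frequent hits the 20pt cap.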

Assistance

interesting idea
kinda power hungry
and didn't work
wouldn't even let me go to the index page

JPDeni

I suppose it's one of those "Your mileage may vary" things. Works fine on mine and doesn't seem to slow things down at all. I'll see if there's room on the server here to upload my screenshot this time.

sburke930

I can't seem to get this to work. Is there something in the code that I need to change that would be specific to my forum?

akulion

Nice, this would come in handy for letting people know what's the most discussed thing :D

thanks

JPDeni

Quote from: sburke930 on August 19, 2006, 06:32:36 AM
I can't seem to get this to work. Is there something in the code that I need to change that would be specific to my forum?

No. There's nothing specific to a forum. It uses the SMF database tables. When you say "I can't seem to get this to work," what do you mean? What does it do?

It might slow things down, especially if your server is being slow. If you want to play around with it a bit, change the number in the query's

LIMIT 30

line to something smaller, like 10 or 20.

akulion, I think it would encourage people to create substantive posts. They would be able to see the results of their posts pretty quickly.

sburke930

There is nothing in the block, it is completely empty.

Here is what I have

global $db_prefix;
$number_of_words = 50;
$min_length = 4;

// This is the list of words to exclude from your cloud
  $exclude_words = array(
    '@http://@',
    '@ about @',
    '@ also @',
    '@ because @',
    '@ been @',
    '@ cant @',
    '@ could @',
    '@ didnt @',
    '@ doesnt @',
    '@ dont @',
    '@ even @',
    '@ from @',
    '@ going @',
    '@ have @',
    '@ havent @',
    '@ here @',
    '@ http_request @',
    '@ into @',
    '@ its @',
    '@ just @',
    '@ like @',
    '@ look @',
    '@ make @',
    '@ many @',
    '@ more @',
    '@ much @',
    '@ must @',
    '@ need @',
    '@ should @',
    '@ shouldnt @',
    '@ some @',
    '@ someone @',
    '@ such @',
    '@ the @',
    '@ take @',
    '@ that @',
    '@ their @',
    '@ then @',
    '@ there @',
     '@ theres @',
   '@ these @',
    '@ they @',
    '@ this @',
    '@ this @',
    '@ want @',
    '@ well @',
    '@ were @',
    '@ what @',
    '@ when @',
    '@ where @',
    '@ which @',
    '@ will @',
    '@ with @',
    '@ without @',
    '@ would @',
    '@ wouldnt @',
    '@ your @',
    '@ youre @'
  );

// Various punctuation that should be filtered from the cloud
  $exclude_symbs = array('@[0-9]@','@\.@','@\,@','@\:@','@"@','@\?@','@\(@','@\)@','@\!@','@\/@','@\&@');
  $apostrophe = '&#'. '39;';
  $exclamation = '&#'. '33;';
  $nbsp = 'nb' . 'sp;';
  $quot = 'qu' . 'ot;';

// Reset our class globals and other variables
  $cloudy = '';
  $word_list = array();
  $cnt = 0;
  $high_count = 0;
  $low_count = 0;
  $totalwords = '';

  $query = db_query(
    "SELECT body
     FROM {$db_prefix}messages AS mess
     LEFT JOIN {$db_prefix}boards AS board
     ON mess.ID_BOARD = board.ID_BOARD
     WHERE FIND_IN_SET(-1, board.memberGroups)
     AND board.permission_mode = 2
     ORDER BY posterTime DESC
     LIMIT 30", __FILE__, __LINE__);

  while ($row = mysql_fetch_assoc($query))
  {
    $words = $row['body'];
    $words = parse_bbc($words,1);
    $words = strip_tags($words); // Clean HTML tags
    $words = strtolower($words); // Make all words lower case
    $words = str_replace($apostrophe,'',$words); // remove apostrophes
    $words = str_replace($exclamation,'',$words); // remove exclamations
    $words = str_replace($nbsp,'',$words); // remove non-breaking space
    $words = str_replace($quot,'',$words); // remove quote
    $words = preg_replace($exclude_symbs, ' ', $words); // Strip excluded symbols
    $words = preg_replace($exclude_words, ' ', $words); // Strip excluded words
    $words = preg_replace('/\s\s+/', ' ', $words); // Strip extra white space
    $totalwords .= $words;
  }
  $words = '';
  $wordslist = explode(' ', $totalwords); // Turn it back into an array
  $word_count = array_count_values($wordslist); // Count word usage

// Clear out the big array of words.
  arsort($word_count); // Sort the array by usage count

// Here we build our smaller array of words that will be used.
  foreach ($word_count as $key => $val) {
    if (strlen($key) >= $min_length) {
      if ($high_count == 0)
        $high_count = $val;
      $word_list[$key] = $val;
      $cnt++;
    }
    if ($cnt >= $number_of_words) {
      $low_count = $val;
      break;
    }
  }


// Get the high and low, and calculate the range.
// This is used to weight the size of the words

  $range = ($high_count - $low_count) / 5;


// Sort the array randomly for the cloud
  $random = array_rand($word_list, $number_of_words);
echo '<div style="text-align: center">';
// Build the cloud's HTML
  foreach ($random as $value) {
    $fsize = $word_list[$value] + 7;
    echo '<span style="font-size:' . $fsize . 'pt;">' . $value . '</span> ';
  }
echo '</div>';

JPDeni

I copied your code into a block and got the same thing as I had before. The only thing I can figure is that the posts on your forum have a lot of short words in them that are being eliminated.

To test it out, comment out -- put // at the beginning of -- the line
    $words = preg_replace($exclude_words, ' ', $words); // Strip excluded words

You also might want to change the value of the $min_length variable.
Change the number in the line
$min_length = 4;
to 3 or 2 or even 1.

If you still don't get anything (and you have posts in your forum! :)) I'm not sure what to tell you. It may just not work for you for some reason.
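
One way to check the short-words theory without touching the forum is to run the counting step on its own. This is only a sketch: the $totalwords string is a made-up stand-in for the text the block collects from the database, and $kept is a hypothetical name for the filtered list.

```php
<?php
// Hypothetical stand-in for the text collected from the forum posts.
$totalwords = 'the cloud block shows forum words and the cloud block counts words';
$min_length = 4;

// Count and sort the words the same way the block does.
$word_count = array_count_values(explode(' ', $totalwords));
arsort($word_count);

// Keep only the words that are long enough to show in the cloud.
$kept = array();
foreach ($word_count as $word => $count) {
    if (strlen($word) >= $min_length) {
        $kept[$word] = $count;
    }
}

echo count($word_count), ' distinct words, ', count($kept), " long enough to show\n";
print_r($kept);
```

If the second number comes out far below $number_of_words on real post text, lowering $min_length or trimming the exclude list is the first thing to try.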

Harro

Another great block code!
Gonna add this one as well, I think! :D

[edit]
Also not working for me.
And getting this in my error log:
2: Invalid argument supplied for foreach()
File: /home/f5186for/public_html/Themes/default/languages/Stats.english.php (main_below sub template - eval?)
Line: 139
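
One possible reading of that warning, offered as a guess rather than a diagnosis: on PHP versions of that era, array_rand() emits a warning and returns NULL when it is asked for more keys than the array holds, so a foreach over the result has nothing valid to loop over. That would happen whenever fewer than $number_of_words words survive the filtering. A sketch of a way around it, using a made-up $word_list, is to shuffle all the keys instead of asking for a fixed number:

```php
<?php
$number_of_words = 50;
// Hypothetical short list: fewer words than $number_of_words.
$word_list = array('cloud' => 3, 'forum' => 2, 'block' => 1);

// Take every key and shuffle, so the list size never has to match a fixed number.
$random = array_keys($word_list);
shuffle($random);
$random = array_slice($random, 0, $number_of_words);

// $random is always an array here, even when the list is short or empty.
foreach ($random as $value) {
    echo $value, ' (', $word_list[$value], ")\n";
}
```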

JPDeni

You know, I wish they would just let it not work so we'd get a real error message instead of the error log. I can never figure out where the problem is from the little bit of info they give.

It's probably best just not to use this code.