Re: MacroOrganism

Thomas Lapp (thomas%mvac23.uucp@udel.edu)
Tue, 8 Dec 92 21:31:30 EST

for usenet.hist@weber.ucsd.edu
Date: Tue, 8 Dec 92 22:07:55 -0800
From: Eeyore's Evil Twin <chuq@apple.com>
To: usenet.hist@weber.ucsd.edu
Subject: Re: MacroOrganism

>There is a anyone-can-be-moderator feature:
>e.g. one can subscribe to all articles that either Jack or Jill liked.

Interesting. It sounds like a rehash of a system called "accolades", dreamed
up by Erik Fair a number of years ago and published in ";login:" (the name
accolade was coined by me). Nice concept. One technical problem: imagine
increasing the number of messages slogged around by an order of magnitude as
we ship out lots of accolade control messages around the network. Of course,
the technical issues are the easy ones...
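
To make the idea concrete, here's a rough Python-ish sketch of an
accolade-style filter (the control-message format and the addresses are
invented purely for illustration; Erik's actual proposal may have looked
nothing like this):

    # Pick out articles endorsed by people you trust. Each accolade is a
    # tiny control message pairing an endorser with a Message-ID.
    endorsers_i_follow = {"jack@example.uucp", "jill@example.uucp"}

    accolades = [
        ("jack@example.uucp", "<1992Dec8.123456@site.uucp>"),
        ("someone@else.uucp", "<1992Dec8.654321@site.uucp>"),
    ]

    wanted_ids = {msg_id for who, msg_id in accolades if who in endorsers_i_follow}

    def want_article(headers):
        # Show an article only if someone I follow endorsed it.
        return headers.get("Message-ID") in wanted_ids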

>There are services that inform people of new journal articles
>that may be of interest. For example, I can ask to be notified
>of all papers whose title and/or contents contains
>either "mobile" or "nomadic" and either "computing" or "networking".

Keyword searches are very primitive. Laurie has an account with the NYTimes
clipping service. She not only gets articles about Apple Computer, but also
stories about apple harvests in Washington and the business column by a guy
named Bill Apple. On the other hand, trying to close down those loopholes
means she misses articles she wants...

They're a step in the right direction, but still stupid.
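
To illustrate the over- and under-matching (the headlines are made up):

    import re

    headlines = [
        "Apple Computer announces new Macintosh line",   # wanted
        "Washington apple harvest hits record levels",   # not wanted
        "Bill Apple: the week on Wall Street",           # not wanted
        "Apple ships System 7 update",                   # wanted, but no "Computer"
    ]

    # Naive keyword match: catches everything with "apple" in it.
    naive = [h for h in headlines if re.search(r"\bapple\b", h, re.IGNORECASE)]

    # Tightened match: requires "Apple Computer" literally, and now misses
    # the System 7 story -- exactly the loophole problem described above.
    tight = [h for h in headlines if "Apple Computer" in h]

    print(naive)
    print(tight)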

They can also be very resource hungry. When I was at National, I spent a few
weeks prototyping a keyword-lookup front end to usenet using DBM. I found I
could eat up 20% of a 780 vax for about four hours trying to build a
database that ended up taking about 45% of the disk space of
/usr/spool/news. That might be doable for a circa-1982 usenet, but I shudder
at trying to do it now. (although once the basic algorithm is proven, a
custom, compressed, densely populated lookup tree could save a lot of that
disk space. DBM makes a great prototyping tool and little else, which is why
we depend on it so greatly in usenet).
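
The shape of that prototype, in a modern Python sketch using the dbm module
(the original wasn't Python, and the file names here are invented):

    import dbm

    def index_article(db, article_id, text):
        # Append this article's ID to the posting list of every word in it.
        # This naive, uncompressed inverted index is what ate the disk.
        for word in set(text.lower().split()):
            key = word.encode()
            db[key] = db.get(key, b"") + article_id.encode() + b"\n"

    def lookup(db, word):
        return db.get(word.lower().encode(), b"").decode().splitlines()

    with dbm.open("/tmp/newsindex", "c") as db:
        index_article(db, "<123@site.uucp>", "Macintosh disk drive question")
        index_article(db, "<124@site.uucp>", "apple harvest in Washington")
        print(lookup(db, "macintosh"))   # ['<123@site.uucp>']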

I learned a fair amount with the prototype, including that the first thing I
had to do to get useful data was throw out the Keywords: header line, since
95% of the ones that aren't blank are filled with garbage values. I finally
ended up pulling significant values (poster, newsgroup, distribution,
References: line values, subject) and keywords from the first 1K of the
file, then running it through a filter that built meta-keywords from a
synonym list (so that, for instance, tomato and tomatoes and tomatoe all
mapped to the same keyword, because otherwise trying to find useful, general
keywords was, um, fruitless) and then stripped out nonsense words ("the",
"I"). So you could, in fact, look for all messages posted by a person with
the words "Macintosh" and "Computer" within 25 words of each other that
weren't followups to a given message thread.
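
In rough outline, those filtering steps look something like this (the
synonym table, stop list, and proximity test below are tiny stand-ins, not
the prototype's actual code):

    SYNONYMS = {"tomatoes": "tomato", "tomatoe": "tomato", "macs": "macintosh"}
    STOPWORDS = {"the", "i", "a", "an", "of", "to", "and"}

    def keywords(text):
        words = [w.strip(".,!?\"'()").lower() for w in text.split()]
        words = [SYNONYMS.get(w, w) for w in words]
        return [w for w in words if w and w not in STOPWORDS]

    def within(text, word_a, word_b, distance=25):
        # True if word_a and word_b occur within `distance` words of each other.
        kw = keywords(text)
        pos_a = [i for i, w in enumerate(kw) if w == word_a]
        pos_b = [i for i, w in enumerate(kw) if w == word_b]
        return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

    body = "I just bought a Macintosh and the computer runs great."
    print(within(body, "macintosh", "computer"))   # True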

By the time it finished chugging away, there was a good chance the article
had expired. Massively resource hungry, and I simply didn't find it got me a
significantly better hit rate on the data I was looking for than a nicely
tuned rn with a kill file.

I've thought since then that an alternative would be to create a daemon that
would allow users to register a clipping service that would swim the input
stream of usenet looking for matches to various criteria and "mail" the
successes to you (mail being a euphemistic term for passing a pointer to the
reader prog, not the actual data. You could, however, modify mail to
recognize the pointer and pull the message so it'd be invisible to the user.
I've ALWAYS felt that mail and usenet should be viewed from the same app,
since ultimately it's a transport layer difference and not an essential
data-material difference. Like having to read uucp mail with one prog and
smtp mail with another. But I digress). It wouldn't be real time, but I
don't consider that a disadvantage, either. And the daemon would only use
idle time on the system (and parallel-processing during off hours to speed
throughput) so the loading wouldn't be so bad.
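
The registration part is simple enough to sketch (names like Subscription
and deliver_pointer are invented here; the real daemon would sit on the news
spool and batch its work into idle time):

    # Users register criteria; the daemon scans incoming articles and
    # delivers matches as pointers (Message-IDs), not copies of the text.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Subscription:
        user: str
        matches: Callable[[dict], bool]   # predicate over an article

    subscriptions = [
        Subscription("laurie", lambda art: "apple computer" in art["body"].lower()),
    ]

    def deliver_pointer(user, article):
        # Stand-in for "mailing" a pointer the reader program resolves later.
        print(f"to {user}: see article {article['message-id']}")

    def scan(article):
        for sub in subscriptions:
            if sub.matches(article):
                deliver_pointer(sub.user, article)

    scan({"message-id": "<1@site.uucp>", "body": "Apple Computer ships new Macs"})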

>Some work could also be done on improving the quality of posted articles.

True, but to date, work at improving posting quality has had limited success
and occasionally blows up in our faces. Remember the really neat addition to
postnews that wouldn't post an article that was more than 50% inclusion?
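
The check itself amounts to counting quoted lines, roughly like this (a
sketch of the idea, not the actual postnews code):

    def mostly_inclusion(body_lines):
        # Reject an article if more than half its lines are quoted material.
        # Counting ">" prefixes is exactly why blank-line padding and odd
        # quote characters beat it.
        quoted = sum(1 for line in body_lines if line.lstrip().startswith(">"))
        return quoted > len(body_lines) / 2

    print(mostly_inclusion(["> old text", "> more old text", "Me too!"]))   # True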

Nice theory, but what it did was cause people to artificially add blank
lines, or to change the way they did inclusions to spoof the inclusion
checker. The former simply created longer, less useful postings; the latter
made it impossible for news readers to easily locate the inclusions so
readers could skip past them. In both cases, it created worse problems than
it solved.

>It would be nice if postnews did a spelling check.

I have very mixed feelings about this. As a writer and editor, I've found
far too many times that users get a false sense of security from spell
checkers. If the checker says it's okay, then it must be, and so they send
out really horrible crap that happens to pass a spell-checker. It might help
somewhat, but this stuff tends to help less than you might think, unless you
have a user who is interested in using a spellchecker right, in which case
you don't need to make them use it anyway.

>When a user does a "followup", postnews could check if any unread
>followups were present.

Good idea, depending on how you define followups. Are we talking about the
entire related thread? Messages specific to this sub-thread? If a sub-thread
has gone off on a tangent, forcing a user to read the entire thread just to
comment on the tangent is wrong, but then again, not forcing them to read
the entire thread when they haven't broken off into a tangent would also be
wrong. Even using the References: line, there's no practical way to tell the
difference.
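
If you did try it, the check would boil down to something like this (a
sketch with invented names; tracking read articles by Message-ID is a
simplification of what a real newsreader keeps in its .newsrc):

    def unread_followups(parent_id, spool, read_ids):
        # Unread articles whose References: chain includes the parent's
        # Message-ID, i.e. followups somewhere downthread of it.
        return [art for art in spool
                if parent_id in art.get("references", "")
                and art["message-id"] not in read_ids]

    spool = [
        {"message-id": "<2@a>", "references": "<1@a>"},
        {"message-id": "<3@a>", "references": "<1@a> <2@a>"},
    ]
    print(unread_followups("<1@a>", spool, read_ids={"<2@a>"}))   # only <3@a>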

The absolute best thing we could do is force the user to sleep on it before
writing a followup. (glyph of chuqui laughing hysterically). Never happen.
