November 22, 2005

Net News Note

This brief article appeared at the top of the "print" November 2005 NN issue:

You didn’t miss it; due to various other emergencies, there was no e-mailed issue of NN for October. This is the November issue, the first of volume 6 (can you believe it?). Part of the problem is that I do the writing for NN directly on the blog and then put together this paper copy, which is rather time-consuming. I’m thinking about making NN a bimonthly (every other month) rather than a monthly “print” publication. But you can always see what I’m working on at http://netnewsarchives.blogspot.com, and even read articles from all the way back to the beginning.

There are even two articles on the blog that didn’t make it into this print copy (which, at five pages, is long enough as it is): “Scientology vs. the Internet pt. 2” and “Citation Muddle.”

November 21, 2005

Splogger? Who Are You Calling a Splogger?

No sooner do I read in this week's Newsweek about spam blogs (or "splogs") than I find out that that's what Blogger thinks I am: a splogger. "Splog" and "splogger" are ugly words with ugly meanings, so I'm offended. Let me explain.

A splog is a computer-generated blog that is characterized by "irrelevant, repetitive, or nonsensical text, along with a large number of links, usually all pointing to a single site," according to the Blogspot help pages. Automated programs create splogs, and they're yet another way some people are trying to turn a dishonest profit off the internet. I admit I don't really get this (I guess I don't have a devious mind), but here is what Newsweek says about how spam blogs make money:

Here's how they work: first find a subject that draws consumers who may be valuable to advertisers on Google or Yahoo, and register for the programs that let those search companies place ads on your blog. Then set up a blog that automatically sucks in items from the news (via easy-to-set-up feeds) about that subject. If you've done it right, Google's search engines will identify your blog as a prime place for a high-value ad. Then, as [Technorati's David] Sifry says, "you can pay housewives in India to sit there and click on the ads."


Because it is a free service that makes setting up a blog so easy even a computer can do it, Blogspot is particularly prone to splog abuse.

Like e-mail spam, spam blogs clog up the system. What's worse, they threaten to distort the results you get when you search Google, the world's most popular search engine. Remember, when you search the big G, Google uses its proprietary PageRank algorithm to determine the order in which search results appear on your results page. Roughly speaking, the most linked-to sites appear at the top of the list; the idea is that sites with lots of links to them must be the best. Writes one observer:

Page-rank is under attack and the attackers are winning. "It won't be long before Google itself is infested. ... It's time for Google to get on top of this. They're both the victimizer and the victim. The spammers found a huge hole in Page-rank."


The "victimizer and victim" part of the quote refers to the fact that, ironically, Google owns both the Blogger blogging software and the Blogspot hosting service.
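For readers curious about the mechanics, here is a toy sketch of the textbook PageRank calculation in Python. It is only an illustration under stated assumptions: Google's real ranking is proprietary and uses many more signals, the page names are hypothetical, and the damping factor of 0.85 is simply the value commonly used in descriptions of PageRank. It shows why a splog farm can work: a "spam_site" that receives links from three automatically generated splogs ends up scoring higher than an "honest_site" linked from a single real blog.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration (illustrative, not Google's system).

    links maps each page name to the list of pages it links to.
    Returns a dict mapping every page to a score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: one real blog links to an honest site,
# while three machine-made splogs all point at the same spam site.
links = {
    "real_blog": ["honest_site"],
    "splog_1": ["spam_site"],
    "splog_2": ["spam_site"],
    "splog_3": ["spam_site"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:12} {score:.3f}")
```

Because each page passes its score along to the pages it links to, every extra incoming link pushes more score toward the target. Splogs are cheap to generate by the thousands, which is exactly the "huge hole" the observer describes.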

Like the bird flu, the distortion of Google search results is more an ominous threat than a present reality. Google has taken steps to stop the creation of "blogspam," most notably the addition of a "word verification" requirement for blogs that are suspected of being splogs. This is the situation I described in the first paragraph of this post. Because Blogger's spam-prevention robots think that the NN blog "has characteristics of a spam blog" ("irrelevant, repetitive, or nonsensical text"?! I'm offended!), I now have to type a string of meaningless letters (like "yjygmr") into a box to prove that I'm a real person before a new post is added to the NN blog. The letters are distorted so that they are hard for a computer to read. This method of verifying that a person is a person and not a machine is called CAPTCHA (which stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart). You may have seen CAPTCHA when you've helped users set up a free e-mail account with Yahoo.

Blogger apologizes:

Since you're an actual person reading this, your blog is probably not a spam blog. Automated spam detection is inherently fuzzy, and we sincerely apologize for this false positive.


I could ask Blogger to review my blog to determine if it merits the removal of the CAPTCHA, but it hardly seems worth the effort.

Since splogging seems to have reached critical mass, and since it seems to be such a Google-centric problem, it will be interesting to see whether this issue causes any major changes in the coming year in the way that Google and/or Blogger do business.

Search Back in the Spotlight

Back in January 2004, there was an NN article called "When Is A Web Surfer Not a Web Surfer?" The answer to this riddle is that a web surfer is not a web surfer when he/she is using non-browser internet applications such as "media players, instant messenger programs and file sharing programs such as KaZaa." This article reported that at that time, 76% of web users were using these non-browser applications.

Times have changed. Old-fashioned web searching is back in a big way, and it is closing in on e-mail as the most popular online activity. According to Yahoo's report on the Pew Charitable Trusts' Internet and American Life Project:

These results from September 2005 represent a sharp increase from mid-2004. Pew Internet Project data from June 2004 show that use of search engines on a typical day has risen from 30% to 41% of the internet-using population, which itself has grown in the past year. This means that the number of those using search engines on an average day jumped from roughly 38 million in June 2004 to about 59 million in September 2005 – an increase of about 55%. comScore data, which are derived from a different methodology, show that from September 2004 to September 2005 the average daily use of search engines jumped from 49.3 million users to 60.7 million users – an increase of 23%.


And furthermore:


This means that the use of search engines is edging up on email as a primary internet activity on any given day. The Pew Internet Project data show that on a typical day, email use is still the top internet activity. On any given day, about 52% of American internet users are sending and receiving email, up from 45% in June of 2004.

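The percentages in the Pew and comScore figures quoted above are easy to check. This short Python sketch recomputes them from the rounded numbers in the report, so the results are approximate. It also divides users by share to get a rough implied size of the online population; those two population figures are derived here for illustration and are not stated in the report. They show why the Pew headcount rose by more than the share did: the jump from 30% to 41% was applied to a growing online population.

```python
def percent_increase(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Daily search-engine users, in millions, as quoted in the report.
pew_june_2004, pew_sept_2005 = 38, 59
comscore_sept_2004, comscore_sept_2005 = 49.3, 60.7

# Share of internet users searching on a typical day (Pew).
share_2004, share_2005 = 0.30, 0.41

print(f"Pew daily searchers:      +{percent_increase(pew_june_2004, pew_sept_2005):.0f}%")
print(f"comScore daily searchers: +{percent_increase(comscore_sept_2004, comscore_sept_2005):.0f}%")

# Rough implied online population (millions), derived from rounded figures.
online_2004 = pew_june_2004 / share_2004
online_2005 = pew_sept_2005 / share_2005
print(f"Implied online population: {online_2004:.0f}M -> {online_2005:.0f}M")
```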

Where, I wonder, does this leave those "media players, instant messenger programs and file sharing programs such as KaZaa"?

November 20, 2005

International Crisis Averted

Nobody owns the Internet, but the U.S. and U.S.-based companies like Google do a pretty thorough job of running the show. Even the Internet Corporation for Assigned Names and Numbers (ICANN), the group that controls the assignment of top-level domain names (.com, .org, etc.), ultimately answers to the U.S. Department of Commerce. To be fair, ICANN has board members from several different countries. But foreign leaders don't like U.S. control of the "root zone file" of top-level domain names, including the two-letter country codes you see on web sites in other countries. "Control of the root means that the United States could, in theory, wipe another country's top-level domain out of the system for political reasons, leaving it largely unreachable to web and e-mail traffic," writes one observer. Some nations are calling for ICANN to be brought under international control, perhaps under the auspices of the United Nations. The Bush Administration, on the other hand, argues that it's better to have ICANN hosted in one country, in order to ensure its stability and to cut down on red tape.

Tensions came to a head recently just before the U.N.'s World Summit on the Information Society. Technology observers were prepared for a battle between the U.S. and the forces for an international ICANN (the European Union and its allies on this issue, including Brazil and Iran), but negotiators managed to avert a crisis. For the foreseeable future, it looks like the U.S. will retain its hegemony over the internet. The U.S. may have won this battle, but the war for control of top-level domain names isn't over yet.

November 19, 2005

Search Engine Promises Something New

It's hard to find anything new under the search engine sun, but here's a system that claims to be really different. The Australians who created Factbites claim that "other search engines spew out meaningless site-names and mangled phrases. Factbites offers you real, meaningful sentences that are right on topic." These sentences appear on the results page, and they come directly from the pages identified as most relevant by the Factbites search engine. The Factbiters say:

You can often gain a great deal of factual information on a topic without ever having to leave the search page! When users do select a page, they can have much more confidence that the page deals directly and informatively with their topic.


Keeping in mind that Google is the de facto standard for web search engines, and that G's hegemony is not likely to change anytime soon, comparisons are inevitable. The "FactBites vs. Google" page shows you the differences between the two services. FactBites, however, does not point out that Google's "define" command (just type "define [word]" into the Google search box for a list of definitions of the word in question, culled from the web, along with links to the sites the definitions came from) does much the same thing.

Give it a try at FactBites.com, keeping in mind that the search engine is still in beta and many searches won't work.