21st August 2008

Sitemap

This post is for Google. It’s not really meant to be read by humans, but you can if you’d like. With its new, more useful 404s, Google promises such things as:

In addition to attempting to correct the URL, the 404 widget also suggests the following, if available:

  • a link to the parent subdirectory
  • a sitemap webpage
  • site search query suggestions and search box

I haven’t seen them offer the sitemap page yet. Perhaps it’s because I was clever and named it articles and not sitemap, or because they hate it, since it’s graybarred even though it’s linked to on every page of the site. Either way, Googlebot, the sitemap is located at http://www.jlh-design.com/articles/. Seeing that it’s just a giant list of all the posts and pages on the site, I would think it would have been quite transparent that though the page is named articles, it’s actually a SITEMAP.

posted in Google | 0 Comments

21st August 2008

Giving Yahoo some love


This blog is quite Google-centric, but every once in a while I need to give Yahoo! some love. Well, today is the day. See how the word Yahoo! was highlighted in that first sentence? (whoops, I did it again) I didn’t need to do anything but type it; a plug-in took over and stylized it for me. It will do that automatically throughout the blog, in all posts and comments.

Yahoo! Highlighter Download

posted in Plug-ins | 0 Comments

1st August 2008

Don’t use Robots.txt to control indexing

It seems a day doesn’t go by in GWHG without someone being concerned that some page they blocked in their robots.txt file is showing up in Google. Google’s handling of robots.txt is quite elaborate, well documented, and easily tested. Having said all of that, many do not fully understand the intent of robots.txt or the opportunity to use it for optimization of a web site.

Any discussion of robots.txt cannot be complete without the caveat that only GOOD robots follow it and that it’s a very public file, so don’t expect it to keep out rogue bots or to work as a security measure to keep stuff hidden. That being said, I’d like to talk about an obedient bot, Googlebot.

As elaborate or simple as your robots.txt may be, it accomplishes one thing: it directs the crawler where it can and cannot go, either explicitly by disallowing some pages/folders or indirectly by allowing only certain pages and blocking others. Stopping the crawler from crawling a page should not be confused with giving it direction on what to do with that page. As a matter of fact, Google will indeed index URLs that are explicitly blocked by the robots.txt file. Since they cannot crawl them, they really don’t know what’s on the page, so the URL will often be listed as URL-only, without a title or description (snippet). Sometimes, if they can find the information elsewhere, like the ODP, they’ll use that to help fill in the blanks.

I don’t know exactly what threshold exists for the decision to include a URL that’s blocked by robots.txt, but I’d imagine, as with anything Google, it has something to do with the quantity and quality of links pointing to it. That being said, and as anyone who’s trying to rank something in Google knows, those links are gold and not to be taken lightly. Most honest-to-goodness real links start out in someone’s browser bar. They’ve navigated to a page and found it interesting enough to tell others about it by cutting-n-pasting the URL into some sort of HTML somewhere. It would be a crying shame if Google were to follow that link only to be blocked by a robots.txt, unable to transfer any value to the site other than to list the URL as URL-only in the search results, where it will more than likely only ever be shown for a search on the anchor text, which may actually only be “click here”.

Say Matt Cutts really wants to rip into me with one of his famous debunking posts. In part of his article he really wants to show how often I speak of Google on this blog. To emphasize that fact he may link to an internal site search page like http://www.jlh-design.com/?s=google, which will find all the posts here that use the word Google. Being a good webmaster, I don’t want Google to return my search results in their search results, as we’ve been warned not to.

I could block all search results from being crawled in my robots.txt with something like this:

User-agent: *
Disallow: /?s=*

That will keep Google from crawling that URL. However, a link from Matt Cutts is prized and rare, so I may want to take advantage of it when it does come around.

The better option is to allow the URL to be crawled but stop Google from indexing it via a robots meta tag.

<meta name="robots" content="noindex,follow,noodp,noydir" />

The page that Matt linked to does contain all of my site’s navigation pointing to previous posts, the home page, categories etc, that I’d like indexed and ranked. Allowing Google to crawl the page and follow the links while stopping it from being indexed accomplishes the goal of keeping it out of the index but passing value to the site as a whole.
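To make the tag's effect concrete, here is a minimal sketch of how a crawler-side check might read it. It uses only Python's standard `html.parser`; the class and function names (`RobotsMetaParser`, `robots_directives`) are made up for illustration and say nothing about how Googlebot is actually implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,follow,noodp,noydir" /></head></html>'
directives = robots_directives(page)

may_index = "noindex" not in directives    # False: keep the page out of the index
may_follow = "nofollow" not in directives  # True: links on the page can still be followed
print(may_index, may_follow)
```

The point of the sketch is that the page has to be fetched before the tag can be read at all, which is exactly why the URL must not be blocked in robots.txt.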

For a fine example of this in the wild, let’s take a renowned SEO site, SEOmoz, which has this in its robots.txt file:

User-agent: *
Disallow: /ugc/category/

Yet Google has 28 URL-only pages indexed currently. (screenshot)
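What that rule does and does not say can be checked with Python's standard `urllib.robotparser`. This is only a sketch: `www.example.com` and the two paths are placeholders, not real SEOmoz URLs.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted above; the host and paths below are placeholders.
rules = """\
User-agent: *
Disallow: /ugc/category/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The rule answers one question only: may this user agent fetch this URL?
can_crawl_category = parser.can_fetch("Googlebot", "http://www.example.com/ugc/category/some-topic")
can_crawl_blog = parser.can_fetch("Googlebot", "http://www.example.com/blog/")
print(can_crawl_category, can_crawl_blog)
```

Nothing in the file or in the parser's answer says anything about indexing; a blocked URL that picks up links can still be listed URL-only.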

So remember that robots.txt doesn’t stop a page from being indexed. It does, however, stop the page from passing any value to your site, because Google can’t crawl it. Using the robots noindex meta tag will control indexing but allow crawling for discovery of other links on the page.

posted in SEO, Webmastering | 2 Comments

31st July 2008

Publish or Perish

Publish or perish is a term used in academia to describe the notion that one must publish on a consistent basis to sustain one’s career and prestige within one’s institution and among one’s colleagues. The concept was never more apparent than tonight, in a monthly review of this site’s statistics. I offer you this small snapshot:

Screen shot of awstats for JLH-Design.com

The trend is not what you’d like to see in the normal development of a site. Notice that uniques, visits, and pages are all down by at least 50% this month compared to last month, and this after a reasonably steady natural growth rate.

Upon further inspection, the search engine traffic is right where it normally is and readers per post is within the normal range, but the large disparity is in that “other sites” category: normally (normal for me anyway) the largest source of traffic, made up of other sites, type-ins, bookmarks, social media, etc.

Admittedly, posting and quality of content have been down lately as other pressing needs and sites have become more important than this small blog, but the trend is an important lesson in web publishing. If you (by ‘you’ I mean I) are not putting forth the effort to publish new and compelling material, you’re also not spending enough time on promotion of the material. What can be more of an example than a loss of nearly 100,000 pageviews in a single month? Blog-type formats may be more susceptible to this, as the content tends to be timely in nature and they rely much less on search engines supplying the visitors than normal information or commerce-type sites.

In all the ongoing discussion of Google search results, links, optimisation, etc., I think what’s often lost in the discourse is the less-than-concrete concept of passion. When I write or publish something that I’m excited about, I get passionate about it and I want to share that passion with other readers. Saying something you believe in isn’t enough; you want others to hear it. Given the flakiness and uncontrollable nature of search referrals, I tend to promote ideas I’m passionate about through other means, and that can be seen in the site’s stats. It’s not all about Google when it comes to a site’s readership, involvement, and ultimately conversion; it’s about engaging the audience and bringing them to the site first.

I half expect to see next month’s search referrals down as well. With 100,000 fewer pageviews this month, that’s 100,000 fewer chances for someone to be inspired to provide a link and share the information. Negative link acceleration on a site can be the death knell for it in the natural rankings, and those tend to lag reality by a few weeks. It should be noted that the lack of publishing really started (or stopped, as the case may be) in June and continued in July; only now can the fallout be seen and graphed.

I’m not making any promises on being more engaging on this site in the near future, but I have made a mental note of the effects of passionate involvement and hope to further cultivate that in other projects.

posted in Site News, Webmastering | 1 Comment

28th July 2008

Cuil

Cuil.com debuted its search engine. I’m not going to declare it the next Google killer or a total failure based on one day’s results; however, I did find this interesting.

If you visit their section for webmasters they have:

If you would like Cuil to crawl your site and have it included in our index, please let us know

Where the “please let us know” is a link to an actual email address.

I doubt many remember this, but Google used to actually use email back when they were young (and not billionaires).

Before I am too quick to judge Cuil’s capabilities I’ll keep in mind Google’s once humble beginnings.

posted in search | 1 Comment

24th July 2008

I have arrived!

I’ve been flattered with interviews, received recognition on Google’s webmaster blog, mentioned on industry leading sites like Search Engine Roundtable and Search Engine Land, linked to by Matt Cutts, and even made the BigList.

But today I have arrived. My fame is now official.

I have an insane cyber-stalker.

Please, I beg you, please, go read my #1 fan’s site by John H. Gohde (screenshot). Apparently somewhere he got the impression that I was an SEO. Okay, so he doesn’t have his facts straight, but it makes for good comedy. He spends most of his day searching for [JLH] on Google to see where I rank. I never knew of my desire to rank for JLH until this very moment when I was trying to follow his posts.

I don’t know this guy from Adam, other than that he was one of a very few banned from Google Groups for being abusive to people. I see he’s on Sphinn now; I expect the mods there will have their hands full once he settles in and starts rambling and attacking people.

If they’re shooting at you, you know you’re doing something right. (The West Wing - the Midterms)

I can’t think of someone I’d rather not like me in the online world more than someone who manages to get themselves banned from both Google Groups and Wikipedia. That puts me with some good company.

posted in SEO | 12 Comments

23rd July 2008

Sean Michael Korte: 1969-2008

On a very personal note.

I lost one of my closest high school friends late last week. We’d lost touch for one reason or another. I last spoke to him in October. I’ll remember every word.

I did some calligraphy in high school that still hangs in my parents’ house, and it rings so true today. So sorry, I don’t know the author, but here is the poem.

Before It’s Too Late

If you’ve a tender message
Or a loving word to say
Don’t wait till you forget,
But whisper it today.
The tender word unspoken,
The letter never sent,
The long forgotten messages,
The wealth of love unspent.
For these hearts are breaking,
For these loved ones wait,
So show them that you care
Before it is too late.

Sean Korte will be missed. My heart is very heavy.

Update 7/23/08:

The full obituary has now been published:

Sean M. Korte, 39, passed away Thursday, July 17, 2008 at his home in Shawnee, KS. Memorial services will be 2 p.m. Saturday, August 2, 2008 at the Atonement Lutheran Church, 9948 Metcalf, Overland Park, KS. Graveside services will be 2 p.m. Friday, August 1, 2008 at Fremont Lutheran Church Cemetery in Red Oak, IA. Sean was born February 18, 1969 in Red Oak, IA. He graduated from Aquinas High School in La Crosse, WI in 1987 where he avidly played football and wrestled. After high school Sean attended Saint John’s University in Collegeville, MN and later joined the United States Navy and served proudly for five years as a Gunner’s Mate Second Class. After his service in the Navy he pursued his love of cooking and had attended culinary arts school. Sean was an avid outdoorsman and tried to spend as much time as possible camping and hiking. He is survived by his parents, Dr. Stephen and Judy Korte, Shawnee, KS; his grandmother, Mary C. Korte of Arkansas City, KS, a sister Stephenie Korte, Louisville, CO; two brothers, Jonathan Korte, Lawrence, KS; Jason and wife Angela Korte, Lafayette, CO; two nieces Antonia and goddaughter Lauren Korte. Numerous aunts, uncles and many, many cousins will miss him greatly.

posted in Personal | 5 Comments

22nd July 2008

Get your twitter links while you can

Earlier today Dave Naylor outed the little known twitter fact that you can get a non-nofollowed link by adding a web address in your “One Line Bio” of your profile.

Like:

For a resulting profile page like this:

It’s not going to last long as internet officer on-the-spot Matt Cutts has spotted it and taken action to stop the flow of link juice to people:

Since Matt is so interested in helping out twitter he may want to mention that some of their “capacity” issues may be due to Google crawling the non-canonical versions of URLs that exist throughout the site.

Notice the same page is indexed twice in Google: once as twitter/johnweb and once as twitter/JohnWeb. Same content, same spelling, just different cases used.

I guess we’ll see who twitter is more interested in pleasing, its users by reducing the server load with a simple URL canonicalization fix or Google with their cure-all rel="nofollow", by seeing which is fixed first.
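For what it's worth, the canonicalization fix can be sketched in a few lines. This is an assumption-laden illustration, not twitter's code: it treats the all-lowercase path as the canonical form (any single consistent form would do), and the function names are invented for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_profile_url(url):
    """Return the URL with host and path lowercased (assumed canonical form)."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path.lower(), parts.query, parts.fragment)
    )


def redirect_for(url):
    """Return (301, target) if the request should be redirected, else None."""
    target = canonical_profile_url(url)
    return (301, target) if target != url else None


print(redirect_for("http://twitter.com/JohnWeb"))  # redirect to the lowercase URL
print(redirect_for("http://twitter.com/johnweb"))  # already canonical, no redirect
```

With a permanent redirect like this in place, a crawler ends up with one URL per profile instead of one per capitalization.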

posted in Google, Matt Cutts | 0 Comments

11th July 2008

GWHG Loses a valuable Googler

In case you missed it, “Bergy” Berghausen announced that he is leaving Google and moving on to pursue a career in the legal profession.

His goodbye is here:

Hi folks!

I am extremely grateful for the time I’ve been able to spend
monitoring this group–responding to questions, watching threads,
reading all the new “Introduce yourself” posts, and being consistently
amazed at the speed with which some of our users can type.  This has
become my home online and holds a very special place in my heart,
though with great regret I must announce that my time monitoring this
group in an official capacity has come to an end.

I have made a very tough decision to leave my position at Google to
follow my calling to join the legal profession, and today is my last
day on the job.  It’s been a wonderful time, and I would like to thank
all of you, especially beussery for his great attitude and Flash
expertise, webado for her untiring dedication and mod_rewrite
expertise, and Autocrat for his sense of humor and for his rocket-
speed ascent from being an occasional poster to the second most
frequent in a matter of a few weeks.

It’s been a lot of fun spending time with you here.  Also, this isn’t
exactly goodbye either, since I will definitely be stopping by without
my big blug [G] and contributing in my personal capacity when I’m not
reading about contracts or rules of evidence.  :-)

So, thanks for helping each other–keep on posting!
-Bergy

Good luck, Bergy, and thank you very much for all of your help with webmastering issues!

posted in GWHG | 7 Comments

9th July 2008

Googlebot using Yahoo IP range for crawling?

Okay, the title may be jumping to conclusions but please help me understand this.

I noticed an odd referral today in my stats for this blog. It was for the search term [ip address 74.6.8.94]. It seemed a bit strange, so I checked it out.

The IP address 74.6.8.94 belongs to Inktomi Corporation:

Every one of my single post pages contains a little plug-in that shows the user’s IP address, like:

So it would make sense that the IP address of the crawler would be added into the text of the page and returned for search results.

The ODD thing however was that this search referral was from Google, with the Yahoo! IP address.

The Google Search for [ip address 74.6.8.94]

Screenshot

Returns one of my pages at the 10th spot, and clicking on the cache of that page shows the Yahoo! address stored in the cache:

To be sure this isn’t normal behavior the following thumbnail is for a cache of another page showing the Google IP address 66.249.65.100:

So the question I have is how does a Google cache get taken showing a Yahoo! IP address? I’m sure there is a logical explanation that I am just missing but I am hoping that somebody out there can explain it to me.

Added After Initial Posting

After I initially posted this, I thought it would be a good idea to see if this one page was an anomaly or if other indexed pages showed the Yahoo! IP address. Apparently the one I showed above is the only one. Note that the other two URLs shown are this post and the home page, which were already updated in the index when I went back and checked.

posted in Google, search | 0 Comments

8th July 2008

Bits-n-pieces

I have been quite busy with other stuff and have been quite derelict in my duty as a blogger. Hope to find more time/inspiration/desire in the near future.

In the meantime….

  • Google, PLEASE fix the Webmaster tools so that the statistics regarding home page crawling, the cache links, and what Googlebot sees are at least close to reality. I realize you’ve got limited resources with only a few hundred thousand computers and all, but at least set aside some computing time to push fresh data weekly on a Monday at 1:00 pm or something. The GWHG is inundated with people who get the wrong impression of their sites’ performance based on this data. Considering that probably only a very small percentage of the confused masses actually find their way to the GWHG, I would say that your tools and statistics are actually hurting more people than they help. At some point, when a tool becomes harmful, it really ceases being a tool. If resources are a problem, then I suggest a disclaimer placed within the webmaster account stating that the statistics provided are not up to date, are generally wrong, and are for amusement only. People respect the quality of Google’s index and expect a certain amount of quality from their other offerings; the webmaster tools, while innovative and well intended, are doing more to harm that reputation than help it.
  • There was a Sebastian sighting. I wish he’d finish taking over the world or whatever he’s doing; I miss his presence.
  • You should consider joining Adam Lasnik’s Question of the Day room over at friendfeed. While not search related at all, it provides some good entertainment and insight. It combines Adam’s elegant style as a writer with the preciseness of a programmer and his philosophic outlook on things.
  • While you are at it join me on friendfeed.
  • Or Plurk
  • Or Twitter
  • Or Facebook
  • John Mueller lasted a little over three days without posting on his vacation. He is incredible.
  • Barry can’t find the source of the spiders.
  • Speaking of spiders, and I was, example.com has a robots.txt that includes:
    • User-agent: *
      Disallow: /
    • Yet Google has indexed over 15,700 URLs for the site. A fine example of how robots.txt is a crawler directive, not something to be used to limit indexing of content, and of how ineffective a 404 is at removing indexed content. (There is money to be made in that last statement; maybe I should write a post about that?)

posted in search | 2 Comments

19th June 2008

Google, please let us report paid links

In their ever-vigilant zeal to be perplexing and clear as mud on the issue, Google has taken many stances on the paid links situation.

Some official:

Buying or selling links that pass PageRank is in violation of Google’s webmaster guidelines and can negatively impact a site’s ranking in search results.

Some not so official:

We’ll be concentrating primarily on the sellers, but if you send us a site that appears to be buying links that pass PageRank it’s trivial for us to look up all the backlinks for that site to find potential sellers and work from there.

Whether or not they are “concentrating” on link buyers, it appears from many threads on Google Webmasters Help Group that people are actually being penalized for buying links. The ones I’ve seen have been pretty obvious, either through sponsored themes, automated link networks, or, most obvious of all, sitewide footer links.

They do offer a method for buying links without getting in Google-hot-water, with the much maligned and oft misapplied rel="nofollow" link or through a robots.txt block:

Links purchased for advertising should be designated as such. This can be done in several ways, such as:

  • Adding a rel="nofollow" attribute to the <a> tag
  • Redirecting the links to an intermediate page that is blocked from search engines with a robots.txt file

Which is all well and good if you are running the site and have control over the links. But what if you are buying the links? What Google is failing to recognize is that sometimes people may actually buy links because they want the traffic. Gasp. It is possible that a permanent link purchased for a set price will in the long run cost less per click than… let’s say… an AdWords ad.
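The arithmetic behind that claim is simple enough to sketch. Every number below is hypothetical, picked only to make the division easy; none of it is real pricing or traffic data.

```python
def cost_per_click(total_cost, clicks):
    """Effective cost per click for a fixed-price purchase."""
    return total_cost / clicks


def breakeven_months(link_price, clicks_per_month, ad_cpc):
    """Months after which a one-time link costs the same per click as the ad."""
    return link_price / (clicks_per_month * ad_cpc)


# All three figures are made up for illustration.
link_price = 300.00      # one-time price of a permanent link
clicks_per_month = 50    # traffic the link sends each month
ad_cpc = 0.75            # price of one click on a comparable ad

months = breakeven_months(link_price, clicks_per_month, ad_cpc)
two_year_cpc = cost_per_click(link_price, clicks_per_month * 24)
print(months, two_year_cpc)
```

Under these made-up numbers the link matches the ad's price per click after eight months, and every click after that pulls its effective cost lower, while the ad's cost per click stays where it is.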

I haven’t mentioned the negative SEO aspect yet, as I’m not convinced that it’s really a viable method, but it is often discussed. If Google is penalizing sites that buy links, generally the next thought in the room is “Then I’ll just buy my competition a bunch of links and report them!”. First off, I’m not 100% convinced they actually penalize the buying sites rather than just discounting the links from the sites that sold them, in which case you are just paying money for clicks to your competition. Not generally a good business practice. Second, I’m not sure they’ll react to all of the reports, so you may in fact be buying them some links that will help them in the rankings PLUS the clicks, also not a sustainable plan. Either way, there are a fair number of webmasters out there worrying that someone else can buy links to their site and have it hurt them.

With all this in mind, the desire to buy links (that you cannot control the format of) for traffic and the logical concern that someone else could buy links to your site that may hurt you, I propose that Google institute a “Report My Paid Links” or “Disavow Links” feature in Webmaster Tools.

I envision this tool to allow a webmaster to list domains or pages that have linked to their verified domain that they do not want to count for or against them in ranking. It’s a way for a webmaster to say that they’ve purchased links for traffic in a local directory or perhaps a high profile school newspaper but don’t want to give the impression that those links were purchased for PageRank manipulation. It would have the added benefit of letting a webmaster feel more at ease if they see some spammy links pointing to their site that they may want to disavow. Oh, perhaps the old idea that there is almost nothing a competitor can do to harm you still applies and those links won’t actually hurt you, but it would be a good thing to help put them at ease.

So I say: Google, please let me report paid links! Let me tell you which links I bought for traffic. Let me tell you so that if somebody reports my site as a link buyer you can see that I already told you about them, increasing your trust in me rather than taking the chance that some human reviewer gets it wrong. Let me have those links on record in case the link I bought which was on a nofollow page is changed later by the webmaster without my knowledge.

Then again, if you are only going to punish the sellers and not the buyers, then say so, so we can put all this “Google bowling” nonsense behind us. :)

posted in Google, Paid Links | 0 Comments

23rd May 2008

Really Minty Fresh Indexing

It took all of 13 minutes for Google to pick up my post before this about my death threat and send a visitor for mayhissolrestinpeace.  Very impressive.  Expecting my first referral from Yahoo! late next week. :-)

posted in Google | 1 Comment

23rd May 2008

Someone I call my friend, wants me dead

I received a disturbing bit of email the other day. Searching for it I found very little information so I thought I’d post it as I’m sure I’m not the only person that got it.

From: BLOOD BLOOD (mayhissolerestinpice3@gala.net)
Subject: SOMEONE YOU CALL YOUR FRIEND, WANTS YOU DEAD.

SOMEONE YOU CALL YOUR FRIEND, WANTS YOU DEAD.

I felt very sorry and bad for you, that your life is going to end like this if you don’t comply, i was paid to eliminate you and I have to do it within 10 days.

Someone you call your friend wants you dead by all means, and the person have spent a lot of money on this, the person came to us and told us that he wants you dead and he provided us your names, photograph and other necessary information we needed about you. If you are in doubt with this I will send you your name and where you are residing in my next mail.

Meanwhile, I have sent my boys to track you down and they have carried out the necessary investigation needed for the operation, but I ordered them to stop for a while and not to strike immediately because I just felt something good and sympathetic about you. I decided to contact you first and know why somebody will want you dead by all means. Right now my men are monitoring you, their eyes are on you, and even the place you think is safer for you to hide might not be.

Now do you want to LIVE OR DIE? It is up to you. Get back to me now if you are ready to enter deal with me, I mean life trade, who knows, and I might just spear your life, $8,000 is all you need to spend. You will first of all pay $3,000 then I will send the tape of the person that want you dead to you and when the tape gets to you, you will pay the remaining $5,000. If you are not ready for my help, then I will have no choice but to carry on the assignment after all I have already being paid before now.

Warning: do not think of contacting the police or even tell anyone because I will extend it to any member of your family since you are aware that somebody want you dead, and the person knows some members of your family as well.

For your own good I will advise you not to go out once is 7pm until I make out time to see you and give you the tape of my discussion with the person who want you dead then you can use it to take any legal action. Good luck as I await your reply to this e-mail contact: mayhissolrestinpeace@gmail.com

Bye.

With a threat like that I had to respond. So I did, both to the Gmail address given in the letter and to the address that sent the letter, mayhissolerestinpice3@gala.net.

I want to put this behind me as soon as possible I will meet you at walmart with the money.

About 19 hours later I got this response:

From: ASSASSIN ASSASSIN (mayhissolrestinpeace1@gmail.com)
Subject: Attention John Honeck, [with the comma]

Attention John Honeck,

you only have three days to send this money because time is not on my side I am now in the state together with my boys so you don’t need to delay this payment for any reason I don’t just want to west your life with reason but if you can not comply then bye. I will locate you where ever you are it will only take me 24 hours to get you.

You can send this money direct to my local boys in Benin via western union money transfer.

Receivers name. IGWAZE SAMUEL.
Country. Benin. [map]
City. Cotonou.
Address Cotonou Benin republic.
Question. What is my name?
Answer. Boys.

You should send the money via western union money transfer amount 3000 any delay that you apply on this will be on your own risk. And I will guarantee you that as soon as you send this money the person that sends me to this job will be on danger because I don’t want to west you but any delay on sending this money will affect your life and your family because that is my job. As soon as you send the money I will send you the tape of my conversation with this person that sends me for this job.

Bye bye.

William yahman.

From the generally nice closing of “bye bye” I guess my would-be killer is warming up to me; hopefully we can strike a deal. I’m not sure if anyone is actually reading my emails, so I replied this time with:

I have washed the elephant with cheese, please see how it glows.

I am awaiting William Yahman’s response to that. I’ll have to let you know if they do in fact come for me. Google was unable to calculate driving directions for me to make a personal visit. Bummer.

More updates to come…

posted in Personal | 32 Comments

20th May 2008

Twitter: Epic Fail

Twitter Fail Boat

posted in SEO, web 2.0 | 0 Comments
