How to make Search Engines work for you
Search Engines
Search engines are actually huge databases compiled by "spiders" or "robots"
("bots").
The "bots" automatically crawl through the WWW, indexing
pages.
Many websites stop search engines indexing them.
Most search engines don't index every web page within a site.
Some web pages may be only partially indexed.
The biggest search engines probably index less than 20% of the web.
Even so, a simple search is likely to return hundreds of thousands of
hits.
Typically, you'll find more than you want, and less than you need.
When you do a search, the search engine doesn't actually go out and
search the entire WWW while you wait.
It simply scans its database for matches to the keywords and phrases
you typed in.
So you are really searching a small portion of the web as it existed
when the search engine last indexed it.
That's why your search sometimes finds websites which appear irrelevant
- the contents have changed since the search engine bots last visited.
It also explains why search engines won't always find really current
news items.
Some search engines show the date they last indexed the page, which
is very useful.
Search engines are inadequate but they are the best means yet devised
for searching the web.
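As a rough illustration of the point above (a search scans a stored database, not the live web), the Python sketch below builds a tiny index from three made-up pages and looks keywords up in it. The page addresses, the page text and the function names build_index and search are all invented for this example; real search engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical pages, as they looked when the "bot" last visited them.
CRAWLED_PAGES = {
    "http://example.org/reef": "coral reef ecology in north queensland",
    "http://example.org/croc": "crocodiles of north queensland rivers",
    "http://example.org/news": "university news and events",
}


def build_index(pages):
    """Map each word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return pages containing every keyword, using only the stored index."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


INDEX = build_index(CRAWLED_PAGES)
print(sorted(search(INDEX, "north queensland")))
```

Note that search never looks at the pages themselves. If a page changes after the index is built, the old words are still what gets matched, which is how out-of-date hits arise.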
Are all search engines created equal?
No two search engines are the same in terms of size, speed, content,
ranking schemes and search options.
Therefore, your search is going to be different on every engine you
use.
Typically, with two search engines, out of any 100 hits, 60 will appear
in both, and 40 will appear in only one.
Try a few search engines.
Choose the one that seems to work best for you for your default use.
If it doesn't come up with the goods, then try others - for serious
searching, always consider using two search engines.
What's on top: how search engines rank web pages
Ranking is the order in which hits are displayed and it is vitally
important to the usefulness of a search engine.
You want the most relevant pages to appear at the top of the list.
Anything which doesn't appear on the first screen has a seriously reduced
chance of being looked at.
Search engines rank according to various rules and the exact methodology
is usually secret.
Factors can include the number of times a keyword appears, where it appears (at the top
of a document is better than at the bottom),
how many other sites link to the page, how often the page gets returned in searches,
and how often it gets visited.
There are many possibilities, none of them perfect. Google seems to
do it best at present.
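To make the idea of ranking concrete, here is a toy scoring function in Python. It combines three of the factors mentioned above: keyword count, keyword position and the number of inbound links. The weights, the sample pages and the names score and rank are assumptions made up for this sketch; real ranking rules are secret and much more complex.

```python
def score(page_text, keyword, inbound_links):
    """Toy relevance score; the weights are invented for illustration."""
    words = page_text.lower().split()
    keyword = keyword.lower()
    count = words.count(keyword)
    if count == 0:
        return 0.0
    first = words.index(keyword)
    # A keyword near the top of the page earns a larger bonus.
    position_bonus = 1.0 - first / len(words)
    return count + position_bonus + 0.5 * inbound_links


def rank(pages, keyword):
    """Return page names from highest to lowest score, dropping non-matches."""
    scored = [(score(text, keyword, links), name)
              for name, (text, links) in pages.items()]
    return [name for s, name in sorted(scored, reverse=True) if s > 0]


# Hypothetical pages: name -> (text, number of other sites linking to it)
PAGES = {
    "a": ("reef reef fishing guide", 0),
    "b": ("guide to the reef", 4),
    "c": ("cooking with chips", 9),
}
print(rank(PAGES, "reef"))
```

With these made-up weights, page "b" outranks page "a" because its inbound links outweigh the extra keyword mention, and page "c" is dropped despite being the most linked-to, since it never mentions the keyword.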
Describing the sites
How search engines describe each site is also important.
That description is all you have to decide whether a site is worth exploring
further.
Some search engines include the first line or two on the page.
Others attempt to show how each search word appears in the page, which
is generally more useful.
Duplicate hits
Often the same page exists on the web with a variety of different URLs.
Some search engines filter out this duplication so you get fewer, but
all different, hits.
Others cluster duplicate hits (or those appearing under the same home
page) so they appear together.
This saves you time and helps you evaluate how useful a site might be.
Meta-Search Engines
Meta-search engines don't have their own databases.
They search the databases of many individual engines simultaneously.
In other words, meta-search engines search search engines.
Most meta-search engines display the results as a single merged list,
identify the source search engine,
and attempt to remove duplicates.
It sounds good, but there are drawbacks. The most common are:
- Because search engines differ in search methods, there's no guarantee
your meta-search will work properly.
- To reduce problems, most meta-search engines restrict you to very
simple searches.
- You rarely get as many results as you would with the individual
search engines.
On the other hand, those that are displayed are the most highly
ranked, and so tend to be more relevant.
- Google, one of the best search engines, is excluded from most meta-searches.
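The merging step that meta-search engines perform can be sketched in a few lines of Python. This is only an illustration: the engine names, the URLs and the function merge_results are invented, and the duplicate test (ignoring a trailing slash) is far cruder than anything a real service would use.

```python
def merge_results(results_by_engine):
    """Merge ranked hit lists from several engines into one list.

    Each URL is kept once, together with the engines that returned it.
    Hits are ordered by their best position in any of the lists.
    """
    best_rank = {}
    sources = {}
    for engine, urls in results_by_engine.items():
        for position, url in enumerate(urls):
            key = url.rstrip("/")  # crude duplicate detection
            best_rank[key] = min(position, best_rank.get(key, position))
            sources.setdefault(key, []).append(engine)
    ordered = sorted(best_rank, key=lambda k: best_rank[k])
    return [(key, sources[key]) for key in ordered]


# Hypothetical results from two imaginary engines.
RESULTS = {
    "engine_one": ["http://example.org/a", "http://example.org/b"],
    "engine_two": ["http://example.org/b/", "http://example.org/c"],
}
for url, engines in merge_results(RESULTS):
    print(url, engines)
```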
When do you use meta-search engines?
Use meta-searchers when you
- want a broad overview
- have a very simple search
- haven't had any luck with your favourite search engine/s
- want help choosing the best search engine for your purposes
Subject Directories, Portals, Vortals and Gateways
Subject Directories organise links to Internet resources into subject
categories.
They are created by humans, not robots.
Every link is individually selected, evaluated and often annotated
by "expert" editors.
Selection criteria and content quality tend to be variable.
Directories are much smaller and more focussed than search engine
databases.
They index far fewer sites, and fewer pages on each site (typically
only the home page or top level pages).
Most offer a search engine to query all sites in the directory or
in a subject category.
For and against subject directories
Directories are organised hierarchically into browsable subject
categories and sub-categories.
For broad topics, they typically give you fewer, higher quality,
and more relevant results than search engines.
Because directories are maintained by humans, keeping them up to
date is more difficult than for search engines.
There tend to be more dead links and pages with changed content
which are no longer relevant to your search.
Some directories also appear to favour e-commerce sites.
Giving high rankings in return for payment is an issue for subject
directories and search engines.
Most search engines now include directories - their own (as with
Yahoo!) or under licence (usually the Open Directory).
Search engines are a far easier way to find the relevant directory
than having to drill down through several hierarchical subject layers.
When do you use subject directories?
Subject directories are best for
- browsing
- broad searches likely to return too many hits with a search
engine
- information on popular topics, organisations, and products
- a quick overview of the kind of information available on the
web in a particular field of interest
What are Portals?
Portals are subject directories with added commercial and community
content.
What are Vortals?
Vortals (Vertical Industry Portals) are focussed portals that concentrate
on a particular subject or industry.
What are Subject Gateways (and the "Invisible Web")?
Subject gateways provide links to evaluated resources which support
research in a particular discipline.
These resources are reviewed, recommended and described by specialists
in the field, usually librarians and academics.
Gateways are like directories, but are usually designed to support
academic teaching, learning and research needs.
They are good sources for specialised high quality information relevant
to the subject area.
They can include links to information in the "Invisible Web".
A large portion of the web, dubbed the "Invisible Web," is barred
to search engine spiders.
This includes password-protected sites, documents behind firewalls,
and the contents of specialised databases created by researchers,
governmental agencies etc.
Creating a Search Strategy
First, think carefully about what you are searching for:
- What question do you want answered, and why?
- Is your question very general or highly specialised?
- What sort of information are you looking for?
- Is there likely to be lots of information, or very little, available?
Next, decide your approach - do you want to
- Locate a specific piece of information?
  - Use a search engine. If that fails, try a subject gateway or
    directory.
- Retrieve everything you can find on the subject?
  - Start with your favourite search engine and follow up with a
    couple of others and/or a meta-search engine.
  - Don't forget to check resources off the web, such as books, journals
    and other print reference sources.
- Browse?
  - If you're browsing and getting an idea of what's available in
    your subject area, start with a subject directory or gateway. If
    this fails, try a meta-search engine, just to see what sort of
    stuff is out there.
Finally, construct your search statement.
Constructing a search statement
Every search statement or query consists of a few key words or phrases
which
MUST, MAY or MUST NOT occur in a webpage for the search engine to list
it.
When constructing your search, keep the following tips in mind:
- carefully analyse your question/topic
- reduce it to a few KEYWORDS - words that are vital to your search
and accurately describe your topic
- use more than one keyword (three is a useful minimum to aim for)
- try to think of all the other words (and spellings) that might be
used in webpages covering your topic.
- be as specific as you can
- use nouns as keywords wherever possible
- try to avoid using very common words
- start with a simple search first (unless you are pretty expert)
- if a simple search fails, try using advanced search strategies
like BOOLEANS and TRUNCATION
(see below)
Basic Search Techniques
These will work with most search engines in their basic search option.
Forcing the inclusion/exclusion of words
Use the plus (+) and minus (-) signs in front of words to force their
inclusion and exclusion in searches.
EXAMPLE: Searching for:
- anorexia bulimia will, with some search engines, return
any page that contains EITHER word
- +anorexia +bulimia will return only pages that contain
BOTH words
(NOTE: NO space between the sign and the keyword)
- +anorexia -bulimia should return only pages that contain
the word anorexia without the word bulimia
Searching for phrases
Use double quotation marks (" ") around phrases to ensure they are
searched exactly as you type them,
with the words side by side in the same order.
EXAMPLE: Searching for
- "north queensland" will only return pages where queensland
follows north, separated by a single space
- +north +queensland will return pages with both words appearing
anywhere, in any order
Simple Search Menus
Many search engines allow you to choose a simple search option from
a drop-down menu or using buttons:
- All of the words or Must contain will
return only documents containing ALL the words you enter (as with
+)
- Any of the words or Should contain
will return documents containing ANY of the words you enter
- Must not contain will exclude all documents containing
this word (as with -)
- Search as a phrase will return only documents containing
that exact string of letters and spaces (as with " ")
Combining words and phrases
You can combine phrases with keywords, using the double quotes and
the plus (+) and/or minus (-) signs.
EXAMPLE: Searching for
- +"james cook university" +biology -marine will find references
to non-marine biology at JCU
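For readers who like to see the logic spelled out, the Python sketch below shows one way the plus sign, the minus sign and quoted phrases could be interpreted. The functions parse_query and matches are invented for this illustration and do not describe how any particular search engine works. For simplicity the sketch matches whole words separated by spaces, so punctuation attached to a word in the page text will prevent a match.

```python
import re

# An optional sign, followed by either a quoted phrase or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query into required (+), excluded (-) and optional terms."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional


def matches(text, query):
    """True if text has every + term, no - term and, when there are
    no + terms, at least one of the unsigned terms."""
    padded = " " + " ".join(text.lower().split()) + " "

    def has(term):
        # Padding with spaces means only whole words or phrases match.
        return " " + term + " " in padded

    required, excluded, optional = parse_query(query)
    if any(has(t) for t in excluded):
        return False
    if not all(has(t) for t in required):
        return False
    return not optional or bool(required) or any(has(t) for t in optional)


QUERY = '+"james cook university" +biology -marine'
print(matches("James Cook University school of biology", QUERY))
print(matches("James Cook University marine biology", QUERY))
```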
Upper and lower case
Type words in lower case to find both lower and upper case versions.
With some search engines, capital letters will only return an exact
case match.
EXAMPLE: Searching for
- north will return pages containing north OR North
OR NORTH.
- North may not return pages that contain north OR
NORTH UNLESS they also contain North
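The case rule described above, as applied by some search engines, can be written as a short Python function. The name case_match is invented for this sketch, and not every search engine behaves this way.

```python
def case_match(keyword, page_word):
    """Lower-case keywords match any case; keywords containing
    capital letters must match the page exactly."""
    if keyword == keyword.lower():
        return keyword == page_word.lower()
    return keyword == page_word


for page_word in ["north", "North", "NORTH"]:
    print(page_word, case_match("north", page_word), case_match("North", page_word))
```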
Punctuation
Punctuation is sometimes included and sometimes ignored by search engines.
Google includes apostrophes, but ignores commas, full stops, colons,
semi-colons etc.
EXAMPLE: Searching in Google for
- the keyword Crocodile's gives different results to Crocodiles.
- the phrases "crocodiles, alligators" "crocodiles.
Alligators" "crocodiles: alligators" "crocodiles;
alligators" and "crocodiles alligators"
all return the same results.
Truncation
Use truncation to look for all words which start the same way.
The most common truncator is the asterisk (*).
Alta Vista is best for truncation (most other search engines don't support
it).
You don't usually need truncation, but it can be a very useful tool.
EXAMPLE: Searching for
- north* returns north, northern, northerly etc (but remember,
it will also return Northumbria and other irrelevant words so be
careful how and where you truncate).
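In programming terms, truncation is simply a "starts with" test. The short Python sketch below, using an invented function truncation_matches and a made-up word list, shows why north* picks up Northumbria as well as the words you wanted.

```python
def truncation_matches(pattern, words):
    """Return the words matched by a pattern such as 'north*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [w for w in words if w.lower().startswith(stem)]
    return [w for w in words if w.lower() == pattern.lower()]


WORDS = ["north", "northern", "northerly", "Northumbria", "south"]
print(truncation_matches("north*", WORDS))
```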
Stop Words
Most search engines speed up searches by ignoring small and common
words.
These might include a, an, and, as, at, be, if, into, it, of, on, or,
the, to, etc.
If a search engine is ignoring words essential to your search, you'll
have to find another one that accepts them.
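The effect of stop words is easy to demonstrate. The Python sketch below uses the word list given above as its stop list; the function name strip_stop_words is invented, and each real search engine has its own list.

```python
# Stop list taken from the examples in the text; real lists vary by engine.
STOP_WORDS = {"a", "an", "and", "as", "at", "be", "if", "into",
              "it", "of", "on", "or", "the", "to"}


def strip_stop_words(query):
    """Return the keywords that are left after stop words are ignored."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]


print(strip_stop_words("to be or not to be"))
print(strip_stop_words("history of the web"))
```

With this list, the first query is reduced to the single word "not", which shows how a search made up mostly of common words can lose nearly all its meaning.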
Advanced Search Techniques
Not all search engines offer all (or even most) of the following advanced
search options.
You should check out the search engine's help file before trying to
use them.
As a general rule, the more advanced the search, the more chance of
errors.
It is always worth giving your results a reality check.
Boolean logic
Boolean logic is a fancy term for using the operators: AND, OR, and
NOT to link words and phrases for more precise queries.
NOTE: Always use CAPS when typing Boolean operators - even if it isn't
necessary, it won't hurt.
AND
AND narrows your search by returning only documents that contain every
one of the keywords you enter
- the more you enter, the narrower your search becomes (works the same
as +)
OR
OR expands your search by returning documents which contain any of
your keywords
- the more keywords you enter, the more documents you will retrieve.
EXAMPLE: coke OR chips
Note that the use of OR can sometimes produce erratic results.
NOT
NOT limits your search by eliminating documents containing keyword/s
(works the same as - above).
Note that some search engines accept NOT while others accept AND NOT.
EXAMPLE: coke NOT chips
The plus (+) and minus (-) signs used instead of AND and NOT are referred
to as "implied Boolean operators".
Implied Boolean operators are accepted in the basic search options of
most search engines.
AND, NOT etc as Boolean operators are sometimes accepted only in the
advanced search option.
Some basic search engines allow limited use of Booleans via dropdown
menus (see also Simple Search Menus
above).
EXAMPLE:
- All of the words or Must contain works
the same as AND,
- Any of the words or Should contain
works the same as OR,
- Must not contain works the same as NOT.
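It can help to think of each keyword as standing for the set of pages that contain it. AND, OR and NOT then become ordinary set operations, as this small Python sketch shows. The page names are made up for the example.

```python
# Hypothetical sets of pages containing each keyword.
COKE = {"page1", "page2", "page3"}
CHIPS = {"page3", "page4"}

coke_and_chips = COKE & CHIPS   # AND: pages containing both words
coke_or_chips = COKE | CHIPS    # OR: pages containing either word
coke_not_chips = COKE - CHIPS   # NOT: pages with coke but without chips

print(sorted(coke_and_chips))
print(sorted(coke_or_chips))
print(sorted(coke_not_chips))
```

AND can only shrink the result and OR can only enlarge it, which is why the first narrows a search and the second expands it.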
Search engine defaults
If your search statement contains more than one keyword without any
Boolean operator,
the search engine will automatically default to either AND or
OR.
This can radically alter your search, so make sure you know the default
settings your search engine uses.
EXAMPLE: if you search for north queensland
- some search engines will return any page that contains EITHER
north OR queensland,
- most search engines will return only pages that contain BOTH
north AND queensland.
Nesting
Nesting (using parentheses, or brackets) allows you to do very complex
searches by combining several search statements into one. Very few search
engines support nesting, and you should be careful attempting it. Usually
it isn't necessary.
EXAMPLE
- (coral OR corals) AND ("great barrier reef" OR "queensland")
- ((DNA OR protein* OR nucleic) AND (sequenc* OR data*)) AND
human AND (genome* OR gene*)
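Nested queries are evaluated from the innermost brackets outwards. The Python sketch below writes the first example as nested tuples and evaluates it against a page's text. The tuple notation and the function evaluate are assumptions made for this illustration. Truncation is not handled, and words are matched whole, separated by spaces.

```python
def evaluate(expr, text):
    """Evaluate a nested query against a page's text.

    A query is either a string (a keyword or phrase) or a tuple of the
    form ("AND", q1, q2, ...), ("OR", q1, q2, ...) or ("NOT", q).
    """
    if isinstance(expr, str):
        padded = " " + " ".join(text.lower().split()) + " "
        return " " + expr.lower() + " " in padded
    operator, *operands = expr
    if operator == "AND":
        return all(evaluate(q, text) for q in operands)
    if operator == "OR":
        return any(evaluate(q, text) for q in operands)
    if operator == "NOT":
        return not evaluate(operands[0], text)
    raise ValueError("unknown operator: " + operator)


# (coral OR corals) AND ("great barrier reef" OR "queensland")
QUERY = ("AND",
         ("OR", "coral", "corals"),
         ("OR", "great barrier reef", "queensland"))

print(evaluate(QUERY, "Corals of the Great Barrier Reef"))
print(evaluate(QUERY, "coral gardens of fiji"))
```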
Proximity operators
Some search engines accept proximity operators. These can be useful
in refining search statements.
NEAR allows you to search for terms situated within a specified
distance of each other in any order.
The use of NEAR can sometimes produce better results than AND.
EXAMPLE: phylogeny NEAR ontogeny
ADJ (adjacent to) returns documents where the two terms appear next to
each other but, unlike phrase searching,
allows them to appear in any order.
EXAMPLE: Searching for
- Ernest ADJ Hemingway returns both Ernest Hemingway
and Hemingway Ernest.
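Proximity operators compare the positions of words on the page. In the Python sketch below, the functions near and adj are invented, and the default distance of ten words is an assumption, since each search engine sets its own limit.

```python
def positions(words, term):
    """Return every position at which term occurs in the word list."""
    return [i for i, w in enumerate(words) if w == term]


def near(text, first, second, distance=10):
    """True if the two terms occur within `distance` words of each
    other, in either order. The default of 10 is an assumption."""
    words = text.lower().split()
    return any(abs(i - j) <= distance
               for i in positions(words, first.lower())
               for j in positions(words, second.lower()))


def adj(text, first, second):
    """True if the two terms are side by side, in either order."""
    return near(text, first, second, distance=1)


print(adj("by Hemingway Ernest 1926", "Ernest", "Hemingway"))
print(adj("Ernest Miller Hemingway", "Ernest", "Hemingway"))
```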
Domain, country, host and URL searching
If you are seeking information from a particular kind of site, you
may be able to limit your search to a top level domain
(see Checking the Source below).
You can also restrict a search to sites in a particular country by
using the two-letter country code.
For a list of ISO Internet Country Codes, try searching "Internet country
codes".
Remember that most US sites do not have a country letter code, and
the use of country codes is not perfect.
EXAMPLE: Searching for
- domain:edu AND "Theory of Evolution" AND Darwin AND history
limits your search to educational sites dealing with the history of
Darwin's Theory of Evolution.
- domain:au AND "coral reef ecology"
limits your hits to documents on Australian servers dealing with coral
reef ecology.
Remember, however, that not all Australian servers have the .au domain.
If you are looking for information housed on a specific computer or
server, check if you can use a "host" or "site" query.
EXAMPLE: Searching for
- host:www.jcu.edu.au will return all pages hosted at this
site (but not those at www.library.jcu.edu.au).
URL searching finds all pages whose URL contains a given string.
EXAMPLE: Searching for
- url:jcu will return all pages on any JCU server (plus
other pages whose URL contains the letter string jcu)
Some search engines provide a special box in the Advanced Search options.
This allows you to search for the above without having to remember the
special commands.
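The difference between domain, host and URL searches becomes clearer when each is written as a test on a web address. The Python sketch below uses the standard urllib.parse module. The three function names are invented, and the tests are only a plausible reading of what the domain:, host: and url: commands do.

```python
from urllib.parse import urlsplit


def domain_match(url, suffix):
    """True if the host name ends with the given domain, e.g. 'au' or 'edu'."""
    host = urlsplit(url).hostname or ""
    return host == suffix or host.endswith("." + suffix)


def host_match(url, host):
    """True only if the page is on exactly this host."""
    return (urlsplit(url).hostname or "") == host.lower()


def url_match(url, fragment):
    """True if the letter string appears anywhere in the URL."""
    return fragment.lower() in url.lower()


print(domain_match("http://www.jcu.edu.au/", "au"))
print(host_match("http://www.library.jcu.edu.au/", "www.jcu.edu.au"))
print(url_match("http://www.library.jcu.edu.au/", "jcu"))
```

Under this reading, a domain:edu search would not match www.jcu.edu.au, because that address ends in au rather than edu.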
Link searching
Link searches tell you which web pages are providing links to your
webpage (or any page you are interested in).
EXAMPLE: Searching for
- link:www.jcu.edu.au returns pages with links to the
JCU home page.
Language Limiting
Most search engines allow you to limit your results to webpages
in a single specified language (English, French etc)
Date Limiting
Some search engines allow you to limit your searches (usually in
the advanced search option) to pages which have been updated within
a given period of time - e.g. last three months, last year. Use
this carefully as it is very imperfect.
File Format Limiting
Some search engines now index more than just .HTM, .HTML, .SHTML
etc web pages.
Google and AlltheWeb both index .PDF files on the web.
PDF (Adobe's Portable Document Format) files are exact facsimiles
of printed pages.
Google also indexes web documents created using Microsoft Word,
Excel and PowerPoint and some others.
Image searching
Most search engines allow you to search for images via a special
search page - often accessed by clicking a tab on the main search
page.
The search will return thumbnail (ie reduced size) images which
you can then click on to get the full picture.
Image searching is very imprecise - you depend on how accurately
the web page creator has named the image file or how well the search
engine links the surrounding descriptive text to an image.
Search engines also allow you to search specifically for video and
MP3 files in a similar way to images.
Translations
Some search engines (eg Alta Vista, Google) provide very rough
but serviceable machine translations of webpage text.
This is usually limited to a few paragraphs in the more common European
languages like French, German, Italian or Spanish -
but it's often enough to be useful.
Finding email addresses
Search engines are one of the most effective ways of finding email
addresses.
Try searching for "email peter costello". If that doesn't
work, try it without the quote marks.
Finding your search words in a web document
When you've gone to a document as the result of a search, use the
EDIT >>> FIND (ie find in page) command
in Netscape to locate your keyword/s.
Problems: What To Do If You Get...
Too many hits
Try using
- AND (+)
- NOT (-)
- additional search terms
- more specific search terms
- a phrase
- a subject directory
Too few hits
Try using
- OR with words that are synonymous with your existing search terms
- truncation
- fewer search terms
- broader, less specific terms
- another search engine
"404 - FILE NOT FOUND" message
This message tells you that the file you seek no longer exists at the
given URL. It may have been moved, removed, or renamed.
- check out the other search results to see if the same page appears
with a different URL
- try your search on Google, which maintains cached copies of pages.
- edit the URL by successively removing parts from the right
EXAMPLE: if you get a 404 from
http://www.wherever.edu/library/information/howtosearch.html
try
http://www.wherever.edu/library/information/
then try
http://www.wherever.edu/library/
and finally
http://www.wherever.edu/
At each stage look for links or a site search engine that might point
you to the document you are looking for.
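If you have many broken links to chase, the URL-shortening steps in the example above can be generated automatically. This Python sketch uses the standard urllib.parse module; the function name parent_urls is invented for the illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """List the progressively shorter URLs to try after a 404."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    while segments:
        segments.pop()  # drop the last part of the path
        path = "/" + "/".join(segments) + ("/" if segments else "")
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


for candidate in parent_urls(
        "http://www.wherever.edu/library/information/howtosearch.html"):
    print(candidate)
```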
"SERVER DOES NOT HAVE A DNS ENTRY" message
This message tells you that your browser can't locate the server (i.e.
the computer that hosts the web page).
It usually means that the network is busy or that the server is temporarily
unavailable. Try again later.
It could also mean you typed in the URL incorrectly - check your spelling.
"SERVER ERROR" OR "SERVER IS BUSY" message
The server you are attempting to contact may be down or very busy.
Try again later.
For More Information on Search Engines (and links to our preferred ones)
Visit our Searching the Internet page.
It is the result of testing all the major search engines with a variety
of searches
and finding which gave the best results with the least effort.
We looked at
- how long searches took,
- how easy it was to use the search engines, help files and results
pages
- which gave the most results - and more importantly,
- which gave the most relevant results on the first page.
Remember that choice of search engines is very subjective, and the search
engine world changes very rapidly.
Evaluating Web Pages
You can usually rely on information you find in print sources in the University
library (or their online versions).
Scholarly books and journal articles are written by experts and go through
a peer-reviewing process.
This helps ensure quality and accuracy.
Knowing the author, editor and publisher gives you a good idea of the
quality of what you are reading.
It's much easier for anyone to "publish" on the web.
It follows that it is much harder (and more important) to evaluate the
quality and accuracy of what you find there.
Don't automatically accept everything you read on a web page.
Check for bias and objectivity.
Who sponsors the page? The Flat Earth Society? The Royal Astronomical
Society?
What are the qualifications and affiliations of the author?
Do sites which you accept as reputable link to the page?
Does the page itself link to dubious sites?
These are just a few of the questions you should ask.
Checking the source
You can expect to find everything on the web: from scholarly papers
to commercial puffery, from encyclopaedic knowledge
by world experts to downright lies by malicious hoaxers. How do you
sort it all out?
You can tell a lot about the authenticity of a page by finding out
all you can about its author/publisher.
Knowing how to read a URL (Uniform Resource Locator) helps.
Every webpage has a unique URL - it is the WWW version of a street address.
URLs tell you where a document is housed on the WWW, and take you straight
to it.
http://www.jcu.edu.au/ is the URL for the JCU homepage.
Here's what it means:
- http is the protocol
- www is the host computer name
- jcu (James Cook University) is the organisation's domain name
- edu is the domain type (educational)
- au is the country code, which is the top-level domain in this address
(sites without a country code are usually in the USA, where the type,
such as edu or com, is the top-level domain)
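If you want to take a URL apart programmatically, Python's standard urllib.parse module will do the splitting for you. The sketch below is a minimal illustration, and the function name describe_url is invented.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into its protocol, the dot-separated parts of the
    host name (read left to right), and the path on the server."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    return {"protocol": parts.scheme, "host parts": labels, "path": parts.path}


print(describe_url("http://www.jcu.edu.au/"))
```

For the JCU homepage the host parts come out as www, jcu, edu and au, in the same order as the list above.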
Government and educational hostnames (GOV, EDU, AC) are usually more reliable
and accurate
than commercial and private hostnames (NET, ORG, COM).
Checking the vital signs
A reputable web page will usually provide you with the following information:
- Last date page updated
- Mail-to link for questions and comments
- Name, description and contact information for the page author
When was the page last updated? Is the information current?
(In Netscape, you can check View >>> Document Info for more information
on when the page was updated.)
Try to distinguish between advertising, and serious content.
Beware of authors with email addresses like queenie@buckingham.palace.
Watch out for deliberate frauds and hoaxes. Many urban legends start
on the web.
Check out every statement on your internal credibility meter and if
alarm bells ring, check further.
Saving Web Pages
Unlike printed materials, web pages are fluid.
The page you cite today may disappear tomorrow (or be radically changed
in content).
When you are using web pages as source material (eg for a research
paper),
keep copies for later reference and verification.
To save a webpage to disk, go to FILE on the menu bar and select
SAVE AS.
You can save it either as TEXT (.txt) or HTML (.htm).
You can also save the entire page exactly as you see it, pictures
and all, in Netscape. Here's how:
- Create a new folder
- Under FILE select EDIT PAGE
- The page will now open in COMPOSER
- Under FILE, select SAVE AS
- Save with an appropriate name in the new folder
Some older versions of Netscape require a slightly different procedure:
- Create a new folder
- Under FILE select EDIT DOCUMENT.
- A dialogue box "Save Remote Document" will appear.
- Make sure a cross appears in all the boxes, then select
SAVE.
- Save it to the previously created new folder.
- You will now be able to open the document locally exactly
as you saw it.
If you just want to save an image, right click in Netscape and select
SAVE IMAGE AS...
Bookmarking
Search Engines are dynamic. If you exactly repeat your search at
a later date, you won't necessarily get the same results in the
same order.
If you find a useful page as the result of a search and there is
a likelihood you will want to go back to it later, BOOKMARK it -
otherwise you may never find it again.
Acknowledgement
Much of this information was borrowed from the very excellent Bare
Bones 101 web tutorial created by Ellen Chamberlain, Head Librarian
and Full Professor at the University of South Carolina Beaufort campus.
Feedback to Ward
Saylor, including suggested additions, excisions and corrections,
is welcome. Produced for Information and Research Support,
part of the Information Services program, Academic Support
Division at James Cook University
by Ward Saylor, August 2002.