
Finding More on the Web

How to make Search Engines work for you

Search Engines | Meta-Search Engines | Subject Directories, Portals, and Gateways | Creating a Search Strategy | Basic Search Techniques | Advanced Search Techniques | Problems | More Information | Evaluating Web Pages | Saving Web Pages





Search Engines

    Search engines are actually huge databases compiled by programs called "spiders" or "robots" ("bots"),
    which automatically crawl through the WWW indexing pages.

    Many websites stop search engines from indexing them.
    Most search engines don't index every web page within a site.
    Some web pages may be only partially indexed.
    The biggest search engines probably index less than 20% of the web.
    Even so, a simple search is likely to return hundreds of thousands of hits.
    Typically, you'll find more than you want, and less than you need.

    When you do a search, the search engine doesn't actually go out and search the entire WWW while you wait.
    It simply scans its database for matches to the keywords and phrases you typed in.
    So you are really searching a small portion of the web as it existed when the search engine last indexed it.
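Here is a toy Python sketch of that idea; it is not how any real search engine is built, and the page addresses and text are made up. The index is built once, when the "bot" visits, and searches only ever consult that stored copy, so a page that changes afterwards is still found under its old words.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lower-case word to the set of page addresses containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, keyword):
    """Look the keyword up in the stored index; the live pages are never consulted."""
    return sorted(index.get(keyword.lower(), set()))

# Hypothetical pages, as they looked when the "bot" last visited.
pages = {
    "example.org/reef": "coral reef news",
    "example.org/rainforest": "tropical rainforest news",
}
index = build_index(pages)

# The reef page changes after indexing, but the index still holds the old text.
pages["example.org/reef"] = "page moved"
print(search(index, "coral"))  # ['example.org/reef']
print(search(index, "moved"))  # []
```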

    That's why your search sometimes finds websites which appear irrelevant - the contents have changed since the search engine bots last visited.
    It also explains why search engines won't always find really current news items.
    Some search engines show the date they last indexed the page, which is very useful.

    Search engines are inadequate, but they are the best means yet devised for searching the web.

    Are all search engines created equal?

    No two search engines are the same in terms of size, speed, content, ranking schemes and search options.
    Therefore, your search is going to be different on every engine you use.

    Typically, with two search engines, out of any 100 hits, 60 will appear in both, and 40 will appear in only one.

    Try a few search engines.
    Choose the one that seems to work best for you for your default use.
    If it doesn't come up with the goods, then try others - for serious searching, always consider using two search engines.

    What's on top: how search engines rank web pages

    Ranking is the order in which hits are displayed and it is vitally important to the usefulness of a search engine.
    You want the most relevant pages to appear at the top of the list.
    Anything which doesn't appear on the first screen has a seriously reduced chance of being looked at.

    Search engines rank pages according to various rules, and the exact methodology is usually secret.
    Possible factors include the number of times a keyword appears, where it appears (near the top of a document counts for more than near the bottom),
    how many other sites link to the page, how often the page gets returned in searches and how often it gets visited.
    There are many possibilities, none of them perfect. Google seems to do it best at present.
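As a rough illustration only, this Python sketch scores pages using two of the factors mentioned above: how often a keyword appears, and whether it appears near the top. The weights (one point per occurrence, two extra points if the word is among the first ten words) are invented for the example; real engines keep their formulas secret and use many more signals.

```python
def toy_score(keywords, text):
    """Hypothetical relevance score based on keyword count and position."""
    words = text.lower().split()
    score = 0
    for keyword in keywords:
        keyword = keyword.lower()
        score += words.count(keyword)   # one point per occurrence
        if keyword in words[:10]:       # bonus if it appears near the top
            score += 2
    return score

# Made-up pages for the example.
pages = {
    "facts": "crocodile facts and crocodile habitat",
    "general": ("general information about many animals found in the tropical "
                "wetlands of australia including crocodile"),
}

# Highest score first.
ranking = sorted(pages, key=lambda name: toy_score(["crocodile"], pages[name]),
                 reverse=True)
print(ranking)  # ['facts', 'general']
```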

    Describing the sites

    How search engines describe each site is also important.
    That description is all you have to decide whether a site is worth exploring further.

    Some search engines include the first line or two on the page.
    Others attempt to show how each search word appears in the page, which is generally more useful.

    Duplicate hits

    Often the same page exists on the web with a variety of different URLs.

    Some search engines filter out this duplication so you get fewer, but all different, hits.
    Others cluster duplicate hits (or those appearing under the same home page) so they appear together.
    This saves you time and helps you evaluate how useful a site might be.


Meta-Search Engines

    Meta-search engines don't have their own databases.
    They search the databases of many individual engines simultaneously.
    In other words, meta-search engines search search engines.

    Most meta-search engines display the results as a single merged list, identify the source search engine,
    and attempt to remove duplicates.
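The merging step of a meta-search engine might be sketched in Python like this. The engine names and result lists are hypothetical, and interleaving the lists rank by rank is just one simple way of combining them; each real meta-search engine has its own merging rules.

```python
def merge_results(results_by_engine):
    """Merge ranked lists from several engines into one list.

    Each URL appears once, tagged with every engine that returned it,
    in the order it is first seen when the lists are interleaved rank by rank.
    """
    merged = {}  # url -> list of engine names (dicts keep insertion order)
    longest = max((len(r) for r in results_by_engine.values()), default=0)
    for rank in range(longest):
        for engine, results in results_by_engine.items():
            if rank < len(results):
                url = results[rank]
                merged.setdefault(url, []).append(engine)  # duplicates collapse here
    return list(merged.items())

results = {
    "EngineA": ["a.com", "b.com", "c.com"],
    "EngineB": ["b.com", "d.com"],
}
for url, engines in merge_results(results):
    print(url, engines)
```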

    It sounds good, but there are drawbacks. The most common are:

    • Because search engines differ in search methods, there's no guarantee your meta-search will work properly.
    • To reduce problems, most meta-search engines restrict you to very simple searches.
    • You rarely get as many results as you would with the individual search engines.
    • On the other hand, those that are displayed are the most highly ranked, and so tend to be more relevant.
    • Google, one of the best search engines, is excluded from most meta-searches.


    When do you use meta-search engines?

    Use meta-searchers when you

    • want a broad overview,
    • have a very simple search,
    • haven't had any luck with your favourite search engine/s, or
    • want help choosing the best search engine for your purposes.

Subject Directories, Portals, Vortals and Gateways

    Subject Directories organise links to Internet resources into subject categories.

    They are created by humans, not robots.
    Every link is individually selected, evaluated and often annotated by "expert" editors.
    Selection criteria and content quality tend to be variable.

    Directories are much smaller and more focussed than search engine databases.
    They index far fewer sites, and fewer pages on each site (typically only the home page or top level pages).
    Most offer a search engine to query all sites in the directory or in a subject category.

    For and against subject directories

    Directories are organised hierarchically into browsable subject categories and sub-categories.
    For broad topics, they typically give you fewer, higher quality, and more relevant results than search engines.

    Because directories are maintained by humans, keeping them up to date is more difficult than for search engines.
    There tend to be more dead links and pages with changed content which are no longer relevant to your search.

    Some directories also appear to favour e-commerce sites.
    Giving high rankings in return for payment is an issue for both subject directories and search engines.

    Most search engines now include directories - their own (as with Yahoo!) or under licence (usually the Open Directory).
    Searching is a far easier way to find the relevant directory category than drilling down through several hierarchical subject layers.

    When do you use subject directories?

    Subject directories are best for

    • browsing
    • broad searches likely to return too many hits with a search engine
    • information on popular topics, organisations, and products
    • a quick overview of the kind of information available on the web in a particular field of interest


    What are Portals?

    Portals are subject directories with added commercial and community content.

    What are Vortals?

    Vortals (Vertical Industry Portals) are focussed portals that concentrate on a particular subject or industry.

    What are Subject Gateways (and the "Invisible Web")?

    Subject gateways provide links to evaluated resources which support research in a particular discipline.
    These resources are reviewed, recommended and described by specialists in the field, usually librarians and academics.

    Gateways are like directories, but are usually designed to support academic teaching, learning and research needs.
    They are good sources for specialised high quality information relevant to the subject area.
    They can include links to information in the "Invisible Web".

    A large portion of the web, dubbed the "Invisible Web," is barred to search engine spiders.
    This includes password-protected sites, documents behind firewalls, and the contents of specialised databases created by researchers, governmental agencies etc.


Creating a Search Strategy

    First, think carefully about what you are searching for:

    • What question do you want answered, and why?
    • Is your question very general or highly specialised?
    • What sort of information are you looking for?
    • Is there likely to be lots of information, or very little, available?


    Next, decide your approach - do you want to

    • Locate a specific piece of information?
      • Use a search engine. If that fails, try a subject gateway or directory.
    • Retrieve everything you can find on the subject?
      • Start with your favourite search engine and follow up with a couple of others and/or a meta-search engine.
      • Don't forget to check resources off the web, such as books, journals and other print reference sources.
    • Browse?
      • If you're browsing and getting an idea of what's available in your subject area, start with a subject directory or gateway. If this fails, try a meta-search engine, just to see what sort of stuff is out there.


    Finally, construct your search statement

    Constructing a search statement

    Every search statement or query consists of a few key words or phrases which
    MUST, MAY or MUST NOT occur in a webpage for the search engine to list it.

    When constructing your search, keep the following tips in mind:

    • carefully analyse your question/topic
    • reduce it to a few KEYWORDS - words that are vital to your search and accurately describe your topic
    • use more than one keyword (three is a useful minimum to aim for)
    • try to think of all the other words (and spellings) that might be used in webpages covering your topic.
    • be as specific as you can
    • use nouns as keywords wherever possible
    • try to avoid using very common words
    • start with a simple search first (unless you are pretty expert)
    • if a simple search fails, try using advanced search strategies like BOOLEANS and TRUNCATION (see below)

Basic Search Techniques

    These will work with most search engines in their basic search option.

    Forcing the inclusion/exclusion of words

    Use the plus (+) and minus (-) signs in front of words to force their inclusion or exclusion in searches.

    EXAMPLE: Searching for:

      • anorexia bulimia will, with some search engines, return any page that contains EITHER word
      • +anorexia +bulimia will return only pages that contain BOTH words
      • +anorexia -bulimia should return only pages that contain the word anorexia without the word bulimia

      NOTE: there is NO space between the sign and the keyword.
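The plus and minus rules can be sketched in Python as a simple filter over one page's words. This is an illustration under one assumption: words without a sign are treated as optional (any one will do), which matches the engines described above that return pages containing EITHER word.

```python
def matches(query, page_text):
    """Apply the +word / -word rules to one page (illustrative sketch)."""
    words = set(page_text.lower().split())
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)  # assumption: unsigned words are optional
    if any(t not in words for t in required):
        return False
    if any(t in words for t in excluded):
        return False
    # With no required terms, at least one optional term must appear.
    return bool(required) or any(t in words for t in optional)

print(matches("anorexia bulimia", "facts about bulimia"))     # True
print(matches("+anorexia +bulimia", "facts about anorexia"))  # False
print(matches("+anorexia -bulimia", "anorexia and bulimia"))  # False
```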


    Searching for phrases

    Use double quotation marks (" ") around phrases to ensure they are searched exactly as you type them,
    with the words side by side in the same order.

    EXAMPLE: Searching for

      • "north queensland" will only return pages where queensland follows north, separated by a single space
      • +north +queensland will return pages with both words appearing anywhere, in any order
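The difference between a phrase search and a two-word search can be shown with a short Python sketch. Both functions are illustrative only and simply compare lower-case words; real engines also deal with punctuation, stop words and so on.

```python
def contains_phrase(text, phrase):
    """True if the phrase's words appear side by side, in the same order."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def contains_all(text, terms):
    """True if every term appears somewhere, in any order (like +north +queensland)."""
    words = set(text.lower().split())
    return all(t.lower() in words for t in terms)

page = "queensland lies north of new south wales"
print(contains_phrase(page, "north queensland"))     # False
print(contains_all(page, ["north", "queensland"]))   # True
```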


    Simple Search Menus

    Many search engines allow you to choose a simple search option from a drop-down menu or using buttons:

    • All of the words or Must contain will return only documents containing ALL the words you enter (as with +)
    • Any of the words or Should contain will return documents containing ANY of the words you enter
    • Must not contain will exclude all documents containing this word (as with -)
    • Search as a phrase will return only documents containing that exact string of letters and spaces (as with " ")


    Combining words and phrases

    You can combine phrases with keywords, using the double quotes and the plus (+) and/or minus (-) signs.

    EXAMPLE: Searching for

      • +"james cook university" +biology -marine will find references to non-marine biology at JCU


    Upper and lower case

    Type words in lower case to find both lower and upper case versions.
    With some search engines, capital letters will only return an exact case match.

    EXAMPLE: Searching for

      • north will return pages containing north OR North OR NORTH.
      • North may not return pages that contain north OR NORTH UNLESS they also contain North
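The case rule described above, as used by some (not all) search engines, might look like this in Python. The function name is made up for the example.

```python
def case_rule_match(query_word, text):
    """Lower-case query: match any case. Query containing capitals: exact case only."""
    words = text.split()
    if query_word.islower():
        return any(word.lower() == query_word for word in words)
    return query_word in words

print(case_rule_match("north", "heading NORTH today"))   # True
print(case_rule_match("North", "heading NORTH today"))   # False
print(case_rule_match("North", "the North Queensland"))  # True
```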

Punctuation

Punctuation is sometimes included and sometimes ignored by search engines. Google includes apostrophes, but ignores commas, full stops, colons, semi-colons etc.

    EXAMPLE: Searching in Google for

    • the keyword Crocodile's gives different results to Crocodiles.
    • the phrases "crocodiles, alligators" "crocodiles. Alligators" "crocodiles: alligators" "crocodiles; alligators" and "crocodiles alligators" all return the same results.


    Truncation

    Use truncation to look for all words which start the same way.
    The most common truncation symbol is *.
    Alta Vista is best for truncation (most other search engines don't support it).
    You don't usually need truncation, but it can be a very useful tool.

       

      EXAMPLE: Searching for

      • north* returns north, northern, northerly etc (but remember, it will also return Northumbria and other irrelevant words so be careful how and where you truncate).
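Truncation is essentially prefix matching. This Python sketch applies a trailing * to a made-up word list, and shows the Northumbria problem mentioned above.

```python
def truncate_match(pattern, words):
    """Return the words that start with the stem before a trailing '*'."""
    if not pattern.endswith("*"):
        # No truncation symbol: exact (case-insensitive) match only.
        return [w for w in words if w.lower() == pattern.lower()]
    stem = pattern[:-1].lower()
    return [w for w in words if w.lower().startswith(stem)]

vocabulary = ["north", "northern", "northerly", "Northumbria", "south"]
print(truncate_match("north*", vocabulary))
# ['north', 'northern', 'northerly', 'Northumbria']
```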

Stop Words

    Most search engines speed up searches by ignoring small and common words.
    These might include a, an, and, as, at, be, if, into, it, of, on, or, the, to, etc.

    If a search engine is ignoring words essential to your search, you'll have to find another one that accepts them.
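This Python sketch drops stop words from a query before searching. The list is only the sample given above; every engine uses its own. The second example shows how a query made mostly of common words can almost disappear.

```python
# Sample stop list taken from the examples above; real engines differ.
STOP_WORDS = {"a", "an", "and", "as", "at", "be", "if", "into",
              "it", "of", "on", "or", "the", "to"}

def strip_stop_words(query):
    """Return the query words the engine would actually search for."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]

print(strip_stop_words("The Lord of the Rings"))  # ['lord', 'rings']
print(strip_stop_words("to be or not to be"))     # ['not']
```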


Advanced Search Techniques

Not all search engines offer all (or even most) of the following advanced search options.
You should check out the search engine's help file before trying to use them.
As a general rule, the more advanced the search, the more chance of errors.
It is always worth giving your results a reality check.

Boolean logic

Boolean logic is a fancy term for using the operators AND, OR and NOT to link words and phrases for more precise queries.

NOTE: Always use CAPS when typing Boolean operators - even if it isn't necessary, it won't hurt.

AND

AND narrows your search by returning only documents that contain every one of the keywords you enter
- the more you enter, the narrower your search becomes (works the same as +)

    EXAMPLE: coke AND chips

OR

OR expands your search by returning documents which contain any of your keywords
- the more keywords you enter, the more documents you will retrieve.

EXAMPLE: coke OR chips
Note that the use of OR can sometimes produce erratic results.

NOT

NOT limits your search by eliminating documents containing keyword/s (works the same as - above).
Note that some search engines accept NOT while others accept AND NOT.

EXAMPLE: coke NOT chips

The plus (+) and minus (-) signs used instead of AND and NOT are referred to as "implied Boolean operators".
Implied Boolean operators are accepted in the basic search options of most search engines.
The written operators (AND, OR, NOT and so on) are sometimes accepted only in the advanced search option.

Some basic search engines allow limited use of Booleans via dropdown menus (see also Simple Search Menus above).

EXAMPLE:

  • All of the words or Must contain works the same as AND,
  • Any of the words or Should contain works the same as OR,
  • Must not contain works the same as NOT.
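Because each keyword picks out a set of documents, AND, OR and NOT behave like set intersection, union and difference. The tiny document collection below is invented for illustration.

```python
# Hypothetical document collection: document id -> words it contains.
docs = {
    1: {"coke", "chips"},
    2: {"coke"},
    3: {"chips"},
    4: {"water"},
}

def having(word):
    """Set of document ids containing the word."""
    return {doc_id for doc_id, words in docs.items() if word in words}

print(sorted(having("coke") & having("chips")))  # coke AND chips -> [1]
print(sorted(having("coke") | having("chips")))  # coke OR chips  -> [1, 2, 3]
print(sorted(having("coke") - having("chips")))  # coke NOT chips -> [2]
```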

Search engine defaults

If your search statement contains more than one keyword without any Boolean operator,
the search engine will automatically default to either AND or OR.
This can radically alter your search, so make sure you know the default settings your search engine uses.

EXAMPLE: if you search for north queensland

  • some search engines will return any page that contains EITHER north OR queensland,
  • most search engines will return only pages that contain BOTH north AND queensland.
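The effect of the default can be seen in this Python sketch, which runs the same two-word query with an AND default and then an OR default over three made-up pages.

```python
def default_search(query, pages, default="AND"):
    """Return the pages matching every term (AND) or any term (OR)."""
    terms = query.lower().split()
    hits = []
    for name, text in pages.items():
        words = set(text.lower().split())
        found = [t in words for t in terms]
        if all(found) if default == "AND" else any(found):
            hits.append(name)
    return hits

pages = {
    "page1": "north queensland travel",
    "page2": "north pole expedition",
    "page3": "queensland government",
}
print(default_search("north queensland", pages, "AND"))  # ['page1']
print(default_search("north queensland", pages, "OR"))   # ['page1', 'page2', 'page3']
```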

Nesting

Nesting (using parentheses, or brackets) allows you to do very complex searches by combining several search statements into one. Very few search engines support nesting, and you should be careful attempting it. Usually it isn't necessary.

EXAMPLE
  • (coral OR corals) AND ("great barrier reef" OR "queensland")
  • ((DNA OR protein* OR nucleic) AND (sequenc* OR data*)) AND human AND (genome* OR gene*)

Proximity operators

    Some search engines accept proximity operators. These can be useful in refining search statements.

    NEAR allows you to search for terms situated within a specified distance of each other in any order.
    The use of NEAR can sometimes produce better results than AND.

      EXAMPLE: phylogeny NEAR ontogeny
    ADJ (adjacent to) returns documents where the two terms appear next to each other but, unlike phrase searching,
    allows them to appear in any order.
      EXAMPLE: Searching for
      • Ernest ADJ Hemingway returns both Ernest Hemingway and Hemingway Ernest.
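Proximity operators compare the positions of words within a page. In this Python sketch the distance for NEAR is a parameter, because each engine that supports NEAR sets its own limit; the function names are made up for the example.

```python
def positions(words, term):
    """Word positions (0, 1, 2, ...) at which the term occurs."""
    return [i for i, w in enumerate(words) if w == term]

def near(text, a, b, distance):
    """NEAR: a and b occur within `distance` words of each other, in either order."""
    words = text.lower().split()
    return any(abs(i - j) <= distance
               for i in positions(words, a.lower())
               for j in positions(words, b.lower()))

def adj(text, a, b):
    """ADJ: the two terms are next to each other, in either order."""
    return near(text, a, b, 1)

print(adj("Hemingway Ernest", "Ernest", "Hemingway"))                        # True
print(near("phylogeny and ontogeny compared", "phylogeny", "ontogeny", 3))  # True
```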


    Domain, country, host and URL searching

    If you are seeking information from a particular kind of site, you may be able to limit your search to a top level domain
    (see Checking the Source below).

    You can also restrict a search to sites in a particular country by using the two-letter country code.
    For a list of ISO Internet Country Codes, try searching "Internet country codes".

    Remember that most US sites do not have a two-letter country code, and the use of country codes is not perfect.

      EXAMPLE: Searching for
      • domain:edu AND "Theory of Evolution" AND Darwin AND history
        limits your search to educational sites dealing with the history of Darwin's Theory of Evolution.
      • domain:au AND "coral reef ecology"
        limits your hits to documents on Australian servers dealing with coral reef ecology
        (remember, however, that not all Australian servers have the .au domain).

     

    If you are looking for information housed on a specific computer or server, check if you can use a "host" or "site" query.

      EXAMPLE: Searching for
      • host:www.jcu.edu.au will return all pages hosted at this site (but not those at www.library.jcu.edu.au).


    URL searching finds all pages whose URL contains a given string.

      EXAMPLE: Searching for
      • url:jcu will return all pages on any JCU server (plus other pages whose URL contains the letter string jcu)
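These three kinds of query can be approximated in Python with the standard urllib.parse module. This is a simplification: here domain: is assumed to match only the last part of the host name, host: the whole host name, and url: any part of the address. Check each search engine's help file for how it actually interprets these commands.

```python
from urllib.parse import urlparse

def host_of(url):
    """Host name part of a URL, e.g. 'www.jcu.edu.au'."""
    return urlparse(url).hostname or ""

def domain_query(url, domain):
    """domain:au - assumed here to match only the last part of the host name."""
    return host_of(url).endswith("." + domain.lower())

def host_query(url, host):
    """host:www.jcu.edu.au - the whole host name must match."""
    return host_of(url) == host.lower()

def url_query(url, text):
    """url:jcu - the letters may appear anywhere in the address."""
    return text.lower() in url.lower()

print(domain_query("http://www.jcu.edu.au/", "au"))                    # True
print(host_query("http://www.library.jcu.edu.au/", "www.jcu.edu.au"))  # False
print(url_query("http://www.library.jcu.edu.au/", "jcu"))              # True
```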

     

Some search engines provide a special box in the Advanced Search options.
This allows you to search for the above without having to remember the special commands.

Link searching

    Link searches tell you which web pages are providing links to your webpage (or any page you are interested in).

      EXAMPLE: Searching for
      • link:www.jcu.edu.au returns pages with links to the JCU home page.


    Language Limiting

    Most search engines allow you to limit your results to webpages in a single specified language (English, French etc)

    Date Limiting

    Some search engines allow you to limit your searches (usually in the advanced search option) to pages which have been updated within a given period of time - e.g. last three months, last year. Use this carefully as it is very imperfect.

    File Format Limiting

    Some search engines now index more than just .HTM, .HTML, .SHTML etc web pages.
    Google and AlltheWeb both index .PDF files on the web.
    PDF (Adobe's Portable Document Format) files are exact facsimiles of printed pages.
    Google also indexes web documents created using Microsoft Word, Excel and PowerPoint and some others.

    Image searching

    Most search engines allow you to search for images via a special search page - often accessed by clicking a tab on the main search page.
    The search will return thumbnail (ie reduced size) images which you can then click on to get the full picture.
    Image searching is very imprecise - you depend on how accurately the web page creator has named the image file or how well the search engine links the surrounding descriptive text to an image.
    Search engines also allow you to search specifically for video and MP3 files in a similar way to images.

    Translations

    Some search engines (eg Alta Vista, Google) provide very rough but serviceable machine translations of webpage text.
    This is usually limited to a few paragraphs in the more common European languages like French, German, Italian or Spanish -
    but it's often enough to be useful.

    Finding email addresses

    Search engines are one of the most effective ways of finding email addresses.
    Try searching for "email peter costello". If that doesn't work, try it without the quote marks.

    Finding your search words in a web document

    When you've gone to a document as the result of a search, use the EDIT >>> FIND (ie find in page) command
    in Netscape to locate your keyword/s.


Problems: What To Do If You Get...

    Too many hits

    Try using

    • AND (+)
    • NOT (-)
    • additional search terms
    • more specific search terms
    • a phrase
    • a subject directory


    Too few hits

    Try using

    • OR with words that are synonymous with your existing search terms
    • truncation
    • fewer search terms
    • broader, less specific terms
    • another search engine


    "404 - FILE NOT FOUND" message

    This message tells you that the file you seek no longer exists at the given URL. It may have been moved, removed, or renamed.

    • check out the other search results to see if the same page appears with a different URL
    • try your search on Google, which maintains cached copies of pages.
    • edit the URL by successively removing parts from the right (the end of the address)

    EXAMPLE: if you get a 404 from
      http://www.wherever.edu/library/information/howtosearch.html
      try
      http://www.wherever.edu/library/information/
      then try
      http://www.wherever.edu/library/
      and finally
      http://www.wherever.edu/

      At each stage look for links or a site search engine that might point you to the document you are looking for.
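The trimming steps above are mechanical enough to sketch in Python. This version assumes a simple address such as http://host/directory/file.html, with no ? or # part.

```python
def parent_urls(url):
    """Successively shorter URLs, removing one part at a time from the right.

    Assumes a simple address with no ? or # part.
    """
    scheme, rest = url.split("://", 1)
    parts = rest.rstrip("/").split("/")  # [host, directory, ..., file]
    return [scheme + "://" + "/".join(parts[:end]) + "/"
            for end in range(len(parts) - 1, 0, -1)]

for candidate in parent_urls("http://www.wherever.edu/library/information/howtosearch.html"):
    print(candidate)
```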
       

    "SERVER DOES NOT HAVE A DNS ENTRY" message

    This message tells you that your browser can't locate the server (i.e. the computer that hosts the web page).
    It usually means that the network is busy or that the server is temporarily unavailable. Try again later.
    It could also mean you typed in the URL incorrectly - check your spelling.

    "SERVER ERROR" OR "SERVER IS BUSY" message

    The server you are attempting to contact may be down or very busy. Try again later.


For More Information on Search Engines (and links to our preferred ones):

    Visit our Searching the Internet page.

    It is the result of testing all the major search engines with a variety of searches
    and finding which gave the best results with the least effort.

    We looked at

    • how long searches took,
    • how easy it was to use the search engines, help files and results pages
    • which gave the most results - and more importantly,
    • which gave the most relevant results on the first page.
    Remember that choice of search engines is very subjective, and the search engine world changes very rapidly.

Evaluating Web Pages

    You can usually rely on information you find in print sources in the University library (or their online versions).
    Scholarly books and journal articles are written by experts and go through a peer-reviewing process.
    This helps ensure quality and accuracy.
    Knowing the author, editor and publisher gives you a good idea of the quality of what you are reading.

    It's much easier for anyone to "publish" on the web.
    It follows that it is much harder (and more important) to evaluate the quality and accuracy of what you find there.

    Don't automatically accept everything you read on a web page.
    Check for bias and objectivity.
    Who sponsors the page? The Flat Earth Society? The Royal Astronomical Society?
    What are the qualifications and affiliations of the author?
    Do sites which you accept as reputable link to the page?
    Does the page itself link to dubious sites?
    These are just a few of the questions you should ask.

    Checking the source

    You can expect to find everything on the web: from scholarly papers to commercial puffery, from encyclopaedic knowledge
    by world experts to downright lies by malicious hoaxers. How do you sort it all out?

    You can tell a lot about the authenticity of a page by finding out all you can about its author/publisher.

    Knowing how to read a URL (Uniform Resource Locator) helps.
    Every webpage has a unique URL - it is the WWW version of a street address.
    URLs tell you where a document is housed on the WWW, and take you straight to it.

    http://www.jcu.edu.au/ is the URL for the JCU homepage:

    Here's what it means:

      • http is the protocol
      • www is the host computer name
      • jcu (James Cook University) is the third-level domain name
      • edu (education) is the second-level domain name
      • au is the country code, which here is the top-level domain name (sites without a country code are usually in the USA)
    Government and educational domains (GOV, EDU, AC) are usually more reliable and accurate
    than commercial and private domains (NET, ORG, COM).
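The same breakdown can be done in Python with the standard urllib.parse module. Treating a final two-letter label as a country code is an assumption of this sketch, and it is not foolproof.

```python
from urllib.parse import urlparse

def describe_url(url):
    """Break a URL into its protocol, host name labels and path."""
    parsed = urlparse(url)
    labels = parsed.hostname.split(".")  # e.g. ['www', 'jcu', 'edu', 'au']
    return {
        "protocol": parsed.scheme,
        "host labels": labels,
        # Assumption: a final two-letter label is read as a country code.
        "country code": labels[-1] if len(labels[-1]) == 2 else None,
        "path": parsed.path,
    }

print(describe_url("http://www.jcu.edu.au/"))
```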


    Checking the vital signs

    A reputable web page will usually provide you with the following information:

      • Last date page updated
      • Mail-to link for questions and comments
      • Name, description and contact information for the page author
    When was the page last updated? Is the information current?
    (In Netscape, you can check View >>> Document Info for more information on when the page was updated.)

    Try to distinguish between advertising and serious content.
    Beware of authors with email addresses like queenie@buckingham.palace.
    Watch out for deliberate frauds and hoaxes. Many urban legends start on the web.
    Check every statement against your internal credibility meter, and if alarm bells ring, check further.



Saving Web Pages

    Unlike printed materials, web pages are fluid.
    The page you cite today may disappear tomorrow (or be radically changed in content).

    When you are using web pages as source material (eg for a research paper),
    keep copies for later reference and verification.

    To save a webpage to disk, go to FILE on the menu bar and select SAVE AS.
    You can save it either as TEXT (.txt) or HTML (.htm).

    You can also save the entire page exactly as you see it, pictures and all, in Netscape. Here's how:

        • Create a new folder
        • Under FILE select EDIT PAGE
        • The page will now open in COMPOSER
        • Under FILE, select SAVE AS
        • Save with an appropriate name in the new folder
      Some older versions of Netscape require a slightly different procedure:
        • Create a new folder
        • Under FILE select EDIT DOCUMENT.
        • A dialogue box "Save Remote Document" will appear.
        • Make sure a cross appears in all the boxes, then select SAVE.
        • Save it to the previously created new folder.
        • You will now be able to open the document locally exactly as you saw it.
    If you just want to save an image, right-click it in Netscape and select SAVE IMAGE AS...

    Bookmarking

    Search Engines are dynamic. If you exactly repeat your search at a later date, you won't necessarily get the same results in the same order.

    If you find a useful page as the result of a search and there is a likelihood you will want to go back to it later, BOOKMARK it - otherwise you may never find it again.

Acknowledgement

    Much of this information was borrowed from the excellent Bare Bones 101 web tutorial created by Ellen Chamberlain, Head Librarian and Full Professor at the University of South Carolina Beaufort campus.
Feedback to Ward Saylor, including suggested additions, excisions and corrections, is welcome. Produced for Information and Research Support, part of the Information Services program, Academic Support Division at James Cook University by Ward Saylor, August 2002.
 

