Copyright Robelle Solutions Technology Inc. 1995-1996
Overview
The World Wide Web (WWW) is a collection of servers distributed all over the world that respond to various clients. The WWW allows you to click on links to text, pictures, music, or video located on these servers and then to play the selected files on your local client PC, workstation, or terminal, along with more links to related information. You never need to know where the information is located or to learn any obscure commands to access it.
The on-line version of this paper is available as a linked set of files or as a large single file. Downloading this paper as a single file may take some time, but has the advantage of making it convenient to save or print the entire paper with your Web browser.
To help you understand the World Wide Web, we have organized this paper into these major sections:
To understand the WWW, it helps if you understand some basic Web concepts. Fundamental to this understanding is the concept of client/server computing on a global scale.
Whether you're reading WWW documents or creating your own, it helps if you understand the basic components of the WWW language.
One powerful feature of the WWW is that the information you publish on your server can be read by many different clients. In this section, we provide a quick introduction to some of the popular WWW clients.
If you want to make your own information available to WWW clients, you'll want to set up your own server. In this section, we discuss some common WWW server software and give our suggestions for how WWW server information should be designed.
The WWW is a big place. Here are a few pointers to some of the things that we have liked or found useful.
These are our parting thoughts on client/server, WWW, and the Internet.
A short list of books that we have found very useful for learning more about the WWW.
Jump on board for a ride on the Web. We hope that you'll find enough information here to join us with your own WWW information.
The WWW is a new way of viewing information — and a rather different one. If, for example, you are viewing this paper as a WWW document, you will view it with a browser, in which case you can immediately access hypertext links. If you are reading this on paper, you will see the links indicated in parentheses and in a different font. Keep in mind that the WWW is constantly evolving. We have tried to pick stable links, but sites reorganize and sometimes they even move. By the time you read the printed version of this paper, some WWW links may have changed.
The World Wide Web
The WWW project has the potential to do for the Internet what Graphical User Interfaces (GUIs) have done for personal computers: make the Net useful to end users. The Internet contains vast resources in many fields of study (not just in computer and technical information). In the past, finding and using these resources has been difficult. The Web provides consistency: servers provide information in a consistent way and clients show information in a consistent way. To add a further thread of consistency, many users view the Web through graphical browsers that look and behave like the other windowed applications (Microsoft Windows, Macintosh, or X-Windows) that they use. A principal feature of the Web is its links between one document and another. These links, described in the section on hypertext, allow you to move from one document to another. Hypertext links can point to any server connected to the Internet and to any type of file. These links are what transform the Internet into a web.
A History of the Web
The Web project was started by Tim Berners-Lee at the European Particle Physics Laboratory (CERN) in Geneva, Switzerland. Tim wanted to find a way for scientists doing projects at CERN to collaborate with each other on-line. He thought of hypertext as one possible method for this collaboration. Tim started the WWW project at CERN in March 1989. In January 1992, the first versions of WWW software, which communicate using the Hypertext Transfer Protocol (HTTP), appeared on the Internet. By October 1993, 500 known HTTP servers were active. When Robelle joined the Internet in June 1994, we were about the 80,000th registered HTTP server. By the end of 1994, it was estimated that there were over 500,000 HTTP servers. Attempts to keep track of the number of HTTP servers on the Internet have not been successful. Programs that try to automatically count HTTP servers never stop, because new servers are being added constantly.
On-Line versus Batch
This paper is available on the World Wide Web (on-line) or as a paper document (batch). If you are reading this via Robelle's WWW Service, you probably already know how to access the on-line version. Much of the value of the Web lies in its links between one document and another. When you view this paper with a WWW browser, the links are hidden from you. When you read the text or paper copy of this paper, you see the links in parentheses. Because links tend to be long, they do not format well in the text and paper versions. Since more than half the effort of writing this paper went into finding and testing the links, we have left them in the text and printed versions, despite their distracting appearance. We will describe what the links mean a little later.
What is Hypertext?
Hypertext provides the links between different documents and different document types. If you have used the Microsoft Windows WinHelp system or the Macintosh HyperCard application, you likely know how to use hypertext. In a hypertext document, links from one place in the document to another are included with the text. By selecting a link, you are able to jump immediately to another part of the document or even to a different document. In the WWW, links can go not only from one document to another, but from one computer to another.
Client/Server Computing
The last few years have seen an explosion of information about client/server computing. For many people, the definition of client/server is still unclear. We describe it as a method of distributing applications over one or more computers. A client is a process that requests services of another process, the server. These processes can be on different computers or on the same computer. The processes communicate via a networking protocol.
People often think of client/server computing in terms of local area networks, PCs with graphical user interface capabilities, and servers with information that is needed by the PC clients. You do not have to implement client/server computing this way. It is possible for the same computer to be both the client and the server. The key point is that there is a communications protocol that allows two processes (often on different computers) to request and to respond to demands for services.
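To make the point about one computer playing both roles concrete, here is a minimal sketch in Python. It is our own illustration, not part of any WWW software: the "service" (returning the request in upper case), the function names, and the use of a thread to stand in for a separate server process are all assumptions made for the example.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server side: accept one client, answer its request, then hang up."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024).decode("ascii")
        # The "service" here is trivial: return the request in upper case.
        conn.sendall(request.upper().encode("ascii"))


def request_service(port: int, text: str) -> str:
    """Client side: connect, send a request, and read the reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(text.encode("ascii"))
        return sock.recv(1024).decode("ascii")


def demo(text: str = "hello") -> str:
    """Run the server and the client on the same computer."""
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        # A thread stands in for what would normally be a separate process.
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        reply = request_service(port, text)
        worker.join()
    return reply
```

Calling demo("hello") sends the request over TCP to the loopback address and returns the server's reply. Nothing in the client code would change if the server were moved to another machine, apart from the address it connects to.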
The Hypertext Transfer Protocol
When you use a WWW client, it communicates with a WWW server using the Hypertext Transfer Protocol (HTTP). When you select a WWW link, the following things happen:
The client looks up the hostname and makes a connection with the WWW server.
The HTTP software on the server responds to the client's request.
The client and the server close the connection.
Compare this with traditional terminal/host computing. Users usually log on (connect) to the server and remain connected until they log off (disconnect). An HTTP connection, on the other hand, is made only for as long as it takes for the server to respond to a request. Once the request is completed, the client and the server are no longer in communication.
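The connect, respond, and close sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the HelloHandler class is a stand-in server we invented so that the example runs without an Internet connection, and the page it returns is made up. The client side uses a plain socket and an HTTP/1.0 request so that each step is visible.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in WWW server that answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello, Web</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(host: str, port: int, location: str) -> bytes:
    # Step 1: look up the host name and connect to the server.
    with socket.create_connection((host, port)) as sock:
        # Step 2: send one request and read the server's complete response.
        request = f"GET {location} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server has closed its end
                break
            chunks.append(data)
    # Step 3: leaving the "with" block closes the client's end as well.
    return b"".join(chunks)


def demo() -> bytes:
    """Start the stand-in server on a free local port and fetch one page."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.start()
    try:
        return fetch("127.0.0.1", server.server_address[1], "/")
    finally:
        server.shutdown()
        server.server_close()
        thread.join()
```

The bytes returned by demo() begin with the server's status line and headers, followed by a blank line and the HTML page. A second request would repeat all three steps with a new connection.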
WWW clients use the same technique for other protocols. For example, if you request a directory at an anonymous FTP site, the WWW client makes an FTP connection, logs on as an anonymous user, switches to the directory, requests the directory contents, and then logs off the FTP server. If you then select a file, the WWW client once again makes an FTP connection, logs on again, changes directories, downloads the file, and then logs off. If you use an FTP client to do the same thing, you would normally log on to the FTP server, change directories several times, and download one or more files. Only when you were finished would you log off.
The Internet
The Internet is the world's largest interconnected computer network. Computers on the Internet communicate using the Internet Protocol (IP) and the Transmission Control Protocol (TCP). You identify individual computers by their IP address. This address is a 32-bit number that is usually written as four decimal octets separated by dots (e.g., 184.108.40.206). Fortunately, you can usually refer to a computer by its name (e.g.,
If you can send network packets to one computer on the Internet, you can send network packets to any computer on the Internet. This feature is what makes the Internet so powerful; it is also what concerns system managers. If you can send packets to the Internet, it follows that anyone can send packets to your computer, even the PC on your desktop.
Accessing the Internet
If you are reading the text or paper version of this paper, you're probably wondering, "How do I get started on the Internet?" It is much easier to connect an individual PC and a modem to the Internet than it is to connect a server like an HP 3000 or HP 9000. We suggest that you find a local Internet access provider to connect your PC to the Net. Most access providers include everything you need to log on and start exploring. In addition, several books on connecting to the Internet also provide all the software and the telephone numbers of Internet access providers you need to get started. Once you're connected to the Internet, you can begin investigating many of the sites described in this paper. You will also be able to access and download much of the software needed to create your own WWW application which, as we discuss further on, can be of help to you, even if you never plan to connect your servers to the Internet.
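The section on the Internet describes an IP address as a 32-bit number written as four octets. The following Python sketch shows the arithmetic behind that notation. The function names are our own, and 192.0.2.10 is simply an address from a range reserved for documentation, not a real host.

```python
import ipaddress


def to_dotted(number: int) -> str:
    """Split a 32-bit number into four 8-bit octets, most significant first."""
    octets = [(number >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


def to_number(dotted: str) -> int:
    """Convert dotted notation back into the underlying 32-bit number."""
    return int(ipaddress.IPv4Address(dotted))


# 192.0.2.10 is (192 * 2**24) + (0 * 2**16) + (2 * 2**8) + 10.
EXAMPLE_NUMBER = to_number("192.0.2.10")
EXAMPLE_DOTTED = to_dotted(EXAMPLE_NUMBER)
```

Each octet holds eight bits, so each of the four numbers in an address lies between 0 and 255.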
In order to use the WWW, you must know something about the language used to communicate in the Web. There are three main components to this language:
Uniform Resource Locators (URLs)
URLs provide the hypertext links between one document and another. These links can access a variety of protocols (e.g., ftp, gopher, or http) on different machines (or your own machine).
Hypertext Markup Language (HTML)
WWW documents contain a mixture of directives (markup), and text or graphics. The markup directives do such things as make a word appear in bold type. This is similar to the way UNIX users write nroff or troff documents, and MPE users write with Galley, TDP, or Prose. For PC users, this is completely different from WYSIWYG editing. However, a number of tools are now available on the market that hide the actual HTML.
Common Gateway Interface (CGI)
Servers use CGI to execute local programs. CGI programs provide a gateway between the HTTP server software and the host machine.
Uniform Resource Locators (URLs) specify the access method (how), the server name (where), and the location (what) needed for a WWW client to find and access a WWW object. The general form of a URL is
access-method://server-name[:port]/location
Access Methods
The three most popular access methods are
http:
This is the method provided by WWW servers. It includes hypertext linking, the hypertext markup language, and server scripts.
gopher:
Gopher was developed at the University of Minnesota as a distributed campus information service. There are gopher servers everywhere, and many of them provide campus-wide information systems. Gopher information is organized into menus. Because hypertext provides the same services as gopher and more, many sites are moving from gopher-supplied information to WWW-supplied information.
ftp:
The File Transfer Protocol is one of the oldest and most popular of all Internet services. You can access millions of files, documentation, source code, and other useful objects on anonymous FTP archives. You can use a WWW browser to view and to retrieve information from FTP archives.
Server Name
The server name is an IP host name or an IP address. WWW server names often start with "www". The port number is usually not needed. If there are many servers on one machine (e.g., two different WWW servers on the same host), you would use a port number to select one of them. By default, WWW servers are on port 80. Other protocols have different ports (e.g., the default for FTP is 21). Most users never need to know about port numbers.
Welcome Page
Most WWW servers provide a welcome or home page. This is the document that you see if you specify a machine name, but not a document name (see all the examples above under Server Name). Good WWW welcome pages provide a short description of the information the WWW server provides, as well as links to all the other information available on the server. The welcome page must be explicitly configured for each WWW server. If you access a WWW server without giving a document name, and receive the error message "no document found", you should try one of the following common document names: welcome.html, index.html, or default.html.
Location
The location can be a filename, a directory, a directory and filename, a server-script name, or something specific to the access method. Filenames and directory structure often change, so don't be surprised if a URL that worked a few months ago no longer works now.
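The parts of a URL described above (access method, server name, optional port, and location) can be pulled apart with Python's standard urllib.parse module. The sketch below is illustrative only: the describe function and the example.com host names are hypothetical, and the gopher default port of 70 is our addition (the text above mentions only ports 80 and 21).

```python
from urllib.parse import urlsplit

# Default ports used when a URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def describe(url: str) -> dict:
    """Break a URL into access method, server name, port, and location."""
    parts = urlsplit(url)
    return {
        "access_method": parts.scheme,
        "server_name": parts.hostname,
        # Fall back to the protocol's default when no port is given.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        # An empty location means the server's welcome page.
        "location": parts.path or "/",
    }


explicit = describe("http://www.example.com:8080/docs/index.html")
implicit = describe("ftp://ftp.example.com/pub/")
```

Here explicit carries port 8080 because the URL names it, while implicit falls back to the FTP default of 21.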
When you write documents for WWW, you use the Hypertext Markup Language (HTML). In a markup language, you mix your text with the marks that indicate how formatting is to take place. Most WWW browsers have an option to View Source that will show you the HTML for the current document that you are viewing.
Each WWW browser renders HTML in its own way. Character-mode browsers use terminal highlights (e.g., inverse video, dim, or underline) to show links, bold, italics, and so on. Graphical browsers use different typefaces, colors, and bold and italic formats to display different HTML marks. Writers have to remember that each browser in effect has its own HTML style sheet. For example, Lynx and Mosaic do not insert a blank line before unnumbered user lists, but Netscape does.
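The idea that each browser has its own style sheet can be illustrated with a toy renderer. The Python sketch below is our own simplification, not how Lynx, Mosaic, or Netscape actually work: it uses the standard html.parser module, and the two style tables (STYLE_A and STYLE_B) are invented stand-ins for two browsers that mark bold and italic text differently.

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Turn HTML into plain text, decorating tags according to a style table."""

    def __init__(self, style: dict):
        super().__init__()
        self.style = style  # maps a tag name to (opening mark, closing mark)
        self.output = []

    def handle_starttag(self, tag, attrs):
        self.output.append(self.style.get(tag, ("", ""))[0])

    def handle_endtag(self, tag):
        self.output.append(self.style.get(tag, ("", ""))[1])

    def handle_data(self, data):
        self.output.append(data)


def render(html: str, style: dict) -> str:
    renderer = TextRenderer(style)
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.output)


# Two hypothetical "browsers" that show the same marks in different ways.
STYLE_A = {"b": ("*", "*"), "i": ("_", "_")}
STYLE_B = {"b": ("[", "]"), "i": ("/", "/")}

PAGE = "Read <B>this</B> <I>now</I>"
view_a = render(PAGE, STYLE_A)
view_b = render(PAGE, STYLE_B)
```

The same document produces "Read *this* _now_" under the first style and "Read [this] /now/" under the second, which is why it pays to test your pages with more than one browser.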
If you want to see how your browser handles standard and non-standard HTML, try the WWW Test Pattern. The test pattern will show differences between your browser, standard HTML, and other browsers.
Creating HTML
Creating HTML is awkward, but not that difficult. The most common method of creating HTML is to write the raw markup language using a standard text editor. If you are creating HTML yourself, we have found the chapter "Authoring for the Web" in the O'Reilly book Managing Internet Information Services to be an excellent resource. You might also find the HTML Quick Reference to be useful.
Bob Green, founder of Robelle, finds HTML Writer to be useful for learning HTML. Instead of hiding the HTML tags, HTML Writer provides menus with all of the HTML elements and inserts these into a text window. To see how your documents look, you must use a separate Web browser.
If you don't want to deal directly with HTML, you can get a WYSIWYG HTML editor. On the PC, we have tried HoTMetaL and the Microsoft Word Internet add-on. HoTMetaL is produced by SoftQuad. There is a free version, which we found somewhat unreliable, and a professional version. HoTMetaL probably works best if you are writing HTML documents from scratch (we tried to edit existing documents, some of which may have had invalid HTML).
Microsoft has produced a new add-on to Microsoft Word that produces HTML. The Internet Assistant is available from Microsoft at no charge. You will need to know the basic concepts of Microsoft Word to take advantage of the Internet Assistant. Since we are not experienced Microsoft Word users, we found that the Internet Assistant didn't help us much.
The HTML area of WWW is changing quickly. Users do not want to go back to ASCII text editing after they've used WYSIWYG editors for the last several years. The Web itself carries a list of WYSIWYG HTML editors for a variety of operating systems.
The Common Gateway Interface (CGI) provides a method for WWW servers to invoke other programs. You can write these programs with any tool or language. They usually return HTML as their output. The Robelle WWW server statistics are provided by a CGI script that runs the getstats program.
Forms
The WWW supports simple forms with text boxes, radio buttons, and pull-down lists. Forms are processed by CGI scripts.
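As a rough illustration of what a CGI program does, here is a minimal sketch in Python. It relies on two CGI conventions: the server passes the form data of a GET request in the QUERY_STRING environment variable, and the program writes a header block, a blank line, and then the document. The form field called name and the greeting are hypothetical, chosen only for this example.

```python
import os
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ) -> str:
    """Build the text a CGI program would write to standard output."""
    # The server passes a GET form's fields in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["stranger"])[0]
    # Escape user input so it cannot inject its own markup.
    body = f"<html><body><p>Hello, {escape(name)}!</p></body></html>"
    # A blank line separates the header from the HTML document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a WWW server, the environment comes from the server.
    print(cgi_response(os.environ), end="")
```

If the server runs this script for a URL ending in ?name=Ada, the browser receives a small HTML page that greets Ada.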
You will likely first experience the World Wide Web through a WWW client. In WWW terms, these are called browsers. Browsers are available for almost all major computer platforms; however, you also need the appropriate network infrastructure to make them work.
Network Infrastructure
What browser you use depends largely on how you are connected to the Internet. If you are using a terminal emulator and a serial connection, you will most likely use a character-mode browser. If you can send network packets from your computer to the Internet, you will probably use a graphical-mode browser.
A popular character-mode browser is Lynx. You cannot use Lynx to display graphical images, but it does support forms, as well as all HTML 2.0.
Three popular graphical browsers are Mosaic, Netscape, and Microsoft Internet Explorer.
Mosaic and Netscape are available for Microsoft Windows, X-Windows, and the Macintosh, while Microsoft's IE is only available for Microsoft Windows. Mosaic and Microsoft IE are free to anyone; Netscape is free to any not-for-profit institution.
How you connect to the Internet affects how you view the WWW. If you connect via a modem, you will find large WWW pages, images, sounds, and video slow to download; if you have a T1 connection (1.544M bits/second), you will be able to enjoy these features. Some WWW pages assume that you have a fast connection to the Internet.
Local Area Networks
If your Local Area Network has a gateway to the Internet (there are several different methods to do this), you should be able to use a graphical browser on your own workstation to cruise the WWW. If you are using a PC with Microsoft Windows, you'll need to have a Winsock interface installed (in addition to the regular networking configuration). Macintosh users already have network support via MacTCP. UNIX workstation users should also have built-in support for networking.
Dial-in Access
There are two methods of dialing into a machine to get access to the Internet. If you dial in and log on as usual (on UNIX you see a login: prompt and then a shell prompt, or on MPE you type HELLO and get a colon prompt), your computer is not directly connected to the Internet, so it cannot send network packets from your PC to the Internet. In this case, you will have to use Lynx to access the WWW.
If you dial in using SLIP (Serial Line IP) or PPP (Point-to-Point Protocol), your computer becomes part of the Internet, which means it can send network packets to and from the Internet. In this case, you can use graphical browsers like Mosaic or Netscape to access the WWW. The Internet Adapter is supposed to allow users with only shell account access to obtain a SLIP connection. Shiva and Livingston provide products that allow users to dial into hosts using SLIP or PPP.
While Lynx is not the only character-mode browser, it is one of the most powerful. Lynx is available for many platforms. You can obtain a pre-compiled version of Lynx for MPE/iX. Some users are disappointed that Lynx's display is limited to text. What Lynx does demonstrate is that a single server can provide information to both character-mode and graphical clients. Still, to gain a full understanding of how powerful the client/server concept can be, you should compare Lynx's capabilities to the capabilities of graphical browsers such as Mosaic or Netscape.
Mosaic is one of the tools that makes the WWW so popular. With Mosaic, you can view in-line graphical images surrounded by proportional font text in multiple colors. For an excellent introduction to Mosaic, see the O'Reilly book The Mosaic Handbook. Three versions of the book are available (Windows, Macintosh, and X-Windows). The PC version of Mosaic requires the Win32s subsystem, which is described in the Mosaic readme file.
While Mosaic is popular, the newer Netscape browser is even more appealing, especially when used with slower network connections. Earlier versions of Mosaic did not display anything until an entire URL (and its associated graphical images) had been downloaded. Netscape, by contrast, starts displaying as soon as a screenful of information is available. As you page down through a document, Netscape barely pauses as it continues to download the URL in the background.
The newest graphical browser is the Microsoft Internet Explorer. This browser is part of Microsoft's strategy to make the Internet an important part of all Microsoft products. Like Netscape, the Microsoft IE also does background network transfers. We prefer Netscape over Microsoft IE, due to Netscape's user interface and better reliability.
External Viewers
Neither Mosaic nor Netscape tries to handle all the data that can potentially be served up on the Web. They both understand HTML, in-line graphics, and URLs. Netscape can display external GIF (Graphics Interchange Format) files, but Mosaic cannot. To view images, listen to sound, watch movies, or view spreadsheets, you must have external tools to support these data formats. For Microsoft Windows users, a popular graphical viewer is LView. The Mosaic Handbook provides a good introduction to the external tools that you need to support full multimedia applications. Most of these tools also work with Netscape.
WWW servers provide information to the Web. Server software is available for many computer platforms, but setting up a server isn't always easy.
Why Set Up a WWW Server?
Even if you don't have an Internet connection, there are lots of uses for an internal WWW server.
Setting up a server to provide information to the many different Internet clients requires extra thought, but the effort is worth it.
Server software exists for UNIX, MPE, Windows NT, Microsoft Windows, and even MS-DOS.
Like most applications, your WWW server will need a little help from time to time.
If you have a full-time Internet connection, you might want to set up a WWW server to provide information about your company, your division, your group, or yourself. Even if you are not connected to the Internet, you still might want to set up a server.
Hypertext is a useful way to distribute information because it can contain mixed text and graphics (or more), as well as links to other documents. Using WWW servers, you can create sophisticated help systems without a lot of work. Once established, these systems then become available to all users on your internal network who have suitable client software (browsers).
With CGI scripts and e-mail, you can automate forms which you now process by hand (e.g., expense reports, travel reports, or purchase requisitions). With some extra work, you could even have the forms processed directly into a database. You can also design scripts to look up information in your existing databases and display it for clients.
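As a sketch of what processing a form directly into a database might look like, the Python example below parses the body of a submitted form and stores it with the standard sqlite3 module. Everything specific here is an assumption made for illustration: the expense-report fields (employee and amount), the table layout, and the choice of SQLite rather than whatever database you already run.

```python
import sqlite3
from urllib.parse import parse_qs


def save_expense(db: sqlite3.Connection, form_body: str) -> int:
    """Store one submitted expense form and return its row number."""
    # A browser sends form fields as name=value pairs joined by "&".
    fields = parse_qs(form_body)
    employee = fields["employee"][0]
    amount = float(fields["amount"][0])
    db.execute("CREATE TABLE IF NOT EXISTS expenses (employee TEXT, amount REAL)")
    # Placeholders keep user input from being interpreted as SQL.
    cursor = db.execute(
        "INSERT INTO expenses (employee, amount) VALUES (?, ?)",
        (employee, amount),
    )
    db.commit()
    return cursor.lastrowid


def total_for(db: sqlite3.Connection, employee: str) -> float:
    """Look up existing data, as a reporting script might."""
    row = db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM expenses WHERE employee = ?",
        (employee,),
    ).fetchone()
    return row[0]
```

A CGI script would call save_expense with the form data it receives from the server, and a second script could call total_for to build a report page for the client.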
If your users are pushing for Microsoft Windows interfaces to all of their database data, you can use your WWW server as an intermediate solution. This way users get an immediate graphical interface and managers can experience the difficulties of managing client/server configurations.
When you set up a WWW server, keep in mind that many different clients will be accessing your server. If your server is available on the Internet, you should not assume that the clients will all have high-speed Internet connections and graphical browsers.
Consider these things when designing your WWW server:
Concentrate on your text. Well-written text conveys a lot of information. If you use text to convey essential information, then your server will be friendly to text-based clients like Lynx.
Organize documents the way you would organize a book: gather information together into chapters; each chapter should describe a single idea or related topics. Provide navigational tools (like previous or next chapter) and an overview with a table-of-contents. We have attempted to have all of these elements in this paper.
Question each graphical image that you provide. Does the graphic add meaning to the text or is it just neat? Compare the size of the graphics file to the size of your text files. If the graphical image is much larger, does it really add a lot of necessary information?
If your WWW server is on a fast network, do all the clients have fast access to your server? You may have a T1 connection (1.544M bits/sec), but many WWW clients connect via 14.4 kbit/s modems. Some commercial Internet providers even charge by the hour, which makes it more expensive for clients to download large files and graphics. If your clients have a fast connection to the Internet, you can provide more graphical information and larger text files without annoying them. Nevertheless, it's a good idea to keep these limitations in mind when you're developing your server.
Try to keep files to a reasonable size (we suggest three to ten thousand bytes long). When converting existing documents to HTML, remember that they will often end up quite large (tens of thousands of bytes). Do clients want to download such a large file only to find that it is of no interest? The converse is also true. Can clients download a single file with the complete text (e.g., this paper), without having to follow all the hypertext links?
Hypertext does not mean disorganization. Provide an index or a table of contents to your web pages, so users can quickly find information. Provide summaries for long articles and files.
Use graphic-design common sense. Use white space to increase readability. If you use special effects (bold, italics, underline, horizontal rules, etc.), use them sparingly to increase their effect.
If your WWW server is available on the Internet, many visitors will access your server out of curiosity. Make your welcome page attractive, but clearly identify what information your WWW server is providing. Of all the files you publish, be most careful of the size of your welcome page. It will likely be the most frequently accessed page.
We also suggest that you look at the W3 Style Guide.
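To put the advice about file sizes and connection speeds into numbers, here is a small worked calculation in Python. It is a best-case estimate only: it assumes the full line speed is available and ignores protocol overhead, modem compression, and network delays. The 100,000-byte image is a hypothetical size chosen for comparison.

```python
MODEM_14_4 = 14_400   # bits per second for a 14.4 modem
T1 = 1_544_000        # bits per second for a T1 connection


def download_seconds(size_bytes: int, bits_per_second: float) -> float:
    """Best-case transfer time: eight bits per byte at the full line speed."""
    return size_bytes * 8 / bits_per_second


# A text file at the top of our suggested range (ten thousand bytes).
text_on_modem = download_seconds(10_000, MODEM_14_4)
text_on_t1 = download_seconds(10_000, T1)

# A hypothetical 100,000-byte image.
image_on_modem = download_seconds(100_000, MODEM_14_4)
```

By this estimate a ten-thousand-byte page takes between five and six seconds on a 14.4 modem and a small fraction of a second on a T1 line, while the image keeps the modem user waiting for nearly a minute.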
First, you need to decide what computer will host your WWW information (or you could pick several hosts). If your WWW server will make information available to many machines, the host must be connected to your network or the Internet.
While WWW server software is available for a variety of machines, each server software package runs only on certain operating systems. The server software you pick will have to be compatible with the host machine that provides the WWW service.
WWW Server Software
W3 maintains a good list of WWW server software. Two of the most popular UNIX WWW server software packages are NCSA HTTPD and CERN HTTPD. A pre-compiled copy of the NCSA HTTPD software is available for MPE/iX.
Windows NT is becoming more popular as a WWW server, largely due to its built-in networking support and its familiar Windows interface. Free Windows NT HTTP Server software is available from the European Microsoft Windows NT Academic Center. The Robelle Windows NT WWW Server uses the O'Reilly Website software. Website comes with comprehensive documentation, something other server software is lacking.
Configuration and management are different for each package. We found the O'Reilly book Managing Internet Information Services to be a valuable resource in setting up our WWW servers. The book is an excellent introduction to HTML, with many good examples of configurations. Unfortunately, the book only covers the configuration of the NCSA HTTPD software.
Security
The CERN and NCSA HTTPD packages allow the WWW administrator to configure security. By default, both packages allow anyone to connect to your WWW service. However, you can configure the servers to allow connections only from specific IP addresses (be sure to do this if your WWW service is for internal use only). You can also password protect individual files. The MPE WWW Server includes a demonstration of the NCSA security features.
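The idea of allowing connections only from specific IP addresses can be sketched as follows. This is not the CERN or NCSA configuration syntax; it is a Python illustration of the check itself, using the standard ipaddress module. The network ranges in ALLOWED_NETWORKS are hypothetical private ranges standing in for your internal network.

```python
import ipaddress

# Hypothetical internal networks; replace with your own ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True when the client's address falls inside an allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```

A server applying this rule would answer requests from 10.20.30.40 or 192.168.1.7 and refuse one from 203.0.113.5. Keep in mind that address checks identify machines, not people, which is one reason the packages also offer password protection.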
By default, the CERN and NCSA server software allow individual users to publish their own directories of hypertext files. If someone specifies a URL with a directory starting with a tilde (~), the server software looks up the user of that name and then searches that user's home directory for the directory public_html.
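The tilde lookup can be pictured as a simple path translation. The Python sketch below is our own approximation, not code from either server package: it assumes that every user's home directory lives under /home (real servers ask the operating system), and the function name resolve_user_path is hypothetical.

```python
import posixpath
from typing import Optional


def resolve_user_path(location: str, home_root: str = "/home") -> Optional[str]:
    """Map a URL location such as /~bob/page.html to a file under public_html."""
    if not location.startswith("/~"):
        return None  # not a user directory request
    user, _, rest = location[2:].partition("/")
    if not user:
        return None
    base = posixpath.join(home_root, user, "public_html")
    target = posixpath.normpath(posixpath.join(base, rest))
    # Refuse locations that climb out of public_html with "..".
    if target != base and not target.startswith(base + "/"):
        return None
    return target
```

With these assumptions, the location /~bob/papers/www.html maps to /home/bob/public_html/papers/www.html, while a location that tries to escape with ".." is rejected.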
Writing HTML
Once you have the WWW server software running, you need to create WWW information. WWW documents use the Hypertext Markup Language (HTML). See the HTML description earlier in this paper for suggestions and tools for writing HTML.
Be sure to test your files before adding them to your WWW server. We test with at least three different browsers (Lynx, Mosaic, and Netscape). We also use Weblint on all of our Web documents. Weblint checks for common errors in HTML. While Weblint isn't perfect, it does help produce HTML that is acceptable to the widest range of WWW browsers.
Weblint is written in Perl. To use Weblint, you must have a working copy of Perl. Perl is short for Practical Extraction and Report Language. Perl is designed to be more powerful than the shell, but easier to use than C.
Host Name
If your WWW server is available on the Internet, it's a good idea to create an alias for the actual computer that hosts your WWW service. Most people choose "www" as the alias name. This will make it easier for you to change the host without affecting users of your WWW service.
Robots
WWW s