HTML and the Web

HTML is a computer language that a Web browser interprets. Unlike C, C++ or Java, it is not a compiled programming language; it is a markup language, read and acted on by the browser much as an interpreter reads a script. To understand this language and its purpose, the reader has to be familiar with the Internet and the World Wide Web. The aim here is to give the reader a picture of the Web, and from that picture HTML will be easy to understand and, hopefully, easy to use. A number of concepts and definitions need to be explained briefly so that the reader is not overwhelmed by unfamiliar terms.

Any computer system is composed of hardware and software. The hardware is the physical machine, and the software is made up of programs and data. Programs are divided into system programs and applications. System programs are in turn divided into the operating system and services. For example, the old IBM PC has DOS as its operating system, which provides control over the machine. DOS has a number of utilities, such as routines for formatting diskettes and printing. Other system programs that are not part of DOS can be added to the PC; the most famous is Norton Utilities.

Understanding the Internet in terms of hardware and software is a little more complicated than understanding a PC, but the Web is likewise composed of hardware and software. The difference is that the PC is a dedicated machine with a dedicated operating system and only one user, whereas the Web is a network of computers. We therefore need to understand what a network is, and from that we can understand the Web.

Network
A network is a collection of computers, terminals and other equipment that uses communication channels (such as phone lines, microwaves or satellites) to share data, information, hardware and software.

Client/Server
The Client/Server concept needs some clarification. A client is the receiver of a service and a server is the provider of that service. For example, a computer requesting data from a network is a client, and the computer providing the data is the server. Does this mean that a machine can switch between the client and server roles? The answer is yes, but there are also dedicated servers whose only job is to provide service to clients. The same holds true for some clients: they have nothing to offer but requests. On any network there is a layer of software that acts as a server for that network. The client uses this layer of software by issuing specific commands.
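The request-and-reply relationship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DOCUMENTS table and the function names are invented for the example, and a connected pair of local sockets stands in for a real network link.

```python
import socket
import threading

# Hypothetical data the server can hand out; the names are illustrative only.
DOCUMENTS = {"greeting": "Hello from the server"}


def serve_one_request(conn):
    """Server role: read one request name and send back the matching data."""
    with conn:
        name = conn.recv(1024).decode("utf-8")
        reply = DOCUMENTS.get(name, "not found")
        conn.sendall(reply.encode("utf-8"))


def request(name):
    """Client role: send a request over a connection and return the reply."""
    # socketpair() gives two connected endpoints on the same machine.
    client_end, server_end = socket.socketpair()
    worker = threading.Thread(target=serve_one_request, args=(server_end,))
    worker.start()
    with client_end:
        client_end.sendall(name.encode("utf-8"))
        reply = client_end.recv(1024).decode("utf-8")
    worker.join()
    return reply


print(request("greeting"))
```

The client knows nothing about how the server stores its data; it only sends a request and waits for the answer, which is the essence of the client/server split.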

World Wide Web
To understand the Internet or the Web, an analogy may help. Assume that you, the reader, want to send a letter to a company. The reader lives in Chicago and the company is located in Los Angeles. The reader can simply use a fax machine to dial up and fax the letter. This is a direct link from the reader to the company, much as a client dials up, connects to a network and uses its services. This direct dial-up may be costly if it is a long-distance call. If the reader needs to send a number of large packages to several companies located in different parts of the country, dialing up is not feasible, and delivering each package personally to each company is costly and time consuming. The reader may find it cheaper, faster and more convenient to use a package delivery company such as UPS. How the delivery company sends the package is not important to the reader, as long as it gets there on time and in one piece. The delivery company can send it by air, sea, train, truck or car, or even hire another company to do it. If the package is supposed to go to Los Angeles and an airplane takes it from Chicago to Japan and back to Los Angeles, the reader may neither know nor care.

The Internet or the Web is hundreds of thousands of networks (servers) that are connected to each other like a spider web. A spider weaves a large net of silk, and that net is an apt analogy for the Internet. The term World Wide Web (or the Web for short) may have come from the spider web analogy. The parts of a spider web connect in several ways, and there are numerous routes to any part of the web. If the reader can imagine that every silk intersection on the web is a package delivery company (a site, server or network), and that all these companies cooperate to deliver packages, then using one of these companies is equivalent to using all of them. The Internet is a web of networks that are connected to each other and share a pool of data, information, software and equipment. Each network is independent, with its own operating system, applications, data and hardware. This web of networks is similar to the web of package delivery companies, which cooperate and connect to each other to provide a service. A package may be handled by several companies before it reaches its destination. The same is true of the Internet: a client in Chicago may go through several networks to connect to a server in Los Angeles. For all these independent networks to cooperate, a number of issues need to be addressed, as follows:

         1. Connection and Communication
         2. Address – TCP/IP
         3. Domain names and Category
         4. Services and Service Providers
         5. Individuality
         6. Interfaces
         7. Common Languages
         8. Accessibility
         9. Cost
         10. Security
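
Before turning to these issues, the idea that a request may pass through several networks, and that more than one route exists, can be sketched as a small route search. The network names and links below are invented for illustration, and the search shown (breadth-first) is only one simple way to find a route; it is not a description of how the Internet actually chooses routes.

```python
from collections import deque

# Hypothetical links between networks; each key can pass traffic to its neighbors.
LINKS = {
    "Chicago": ["Denver", "New York"],
    "Denver": ["Chicago", "Los Angeles"],
    "New York": ["Chicago", "Dallas"],
    "Dallas": ["New York", "Los Angeles"],
    "Los Angeles": ["Denver", "Dallas"],
}


def find_route(links, start, goal):
    """Return one shortest list of networks from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in links.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_route(LINKS, "Chicago", "Los Angeles"))
```

If the Denver network were removed from the table, the same call would still find a route through New York and Dallas, which is the point of the spider web analogy.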

A Web user or Web client can connect to and use any of the sites (servers) on the Web. The client can access the services and hardware of the entire Web (where permitted) as if they were local to the client's own network. The Web gives the client worldwide access to services and information. Web service providers are companies that provide Web access at a cost. They typically let their users connect through a local phone call. The cost of using such a service is very small compared with what users would pay to do the same thing on their own.

Connection and Communication
The Internet communicates using a number of protocols. Its sites are connected through phone lines, microwaves, satellites or bridges. A bridge is a combination of hardware and software that connects two networks of a similar type.

Communication Protocols
A protocol is a set of rules and procedures for exchanging information between computers regardless of their make and operating system. Communication software is designed to work with one or more protocols. A protocol defines the following:

         1. How the communication link is established
         2. How information is transmitted
         3. How errors are detected and handled
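
The second and third points can be illustrated with a toy message format. This is a made-up format for illustration, not a real Internet protocol: each message is sent as a length, a CRC-32 checksum and then the data, and the receiver rejects any message whose checksum does not match. The function names are assumptions of this sketch.

```python
import struct
import zlib


def encode_frame(message):
    """Pack a message as: 4-byte length, 4-byte CRC-32 checksum, then the data."""
    data = message.encode("utf-8")
    header = struct.pack("!II", len(data), zlib.crc32(data))
    return header + data


def decode_frame(frame):
    """Unpack a frame and raise ValueError if it was damaged in transit."""
    length, checksum = struct.unpack("!II", frame[:8])
    data = frame[8:8 + length]
    if len(data) != length or zlib.crc32(data) != checksum:
        raise ValueError("frame is damaged; ask the sender to transmit it again")
    return data.decode("utf-8")


print(decode_frame(encode_frame("hello")))
```

Because both sides agree on the same rules, it does not matter what hardware or operating system either one runs.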
        

For example, an IBM PC and an Apple computer have different hardware and software. They are not compatible and their software is not portable between them, but they can still communicate using protocols.

TCP/IP
The Internet communicates using a family of protocols known as the Transmission Control Protocol/Internet Protocol (TCP/IP). TCP/IP is used to connect any machine on the Internet to another and to send packets (specially formatted data) from one to the other.
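
The idea of sending data as packets can be sketched as follows. This is a deliberate simplification: real TCP/IP packets also carry headers with addresses, checksums and other fields, and the function names here are invented for the example. The sketch only shows data being cut into numbered pieces and put back together, even when the pieces arrive out of order.

```python
def split_into_packets(data, size):
    """Cut data into numbered packets of at most `size` bytes each."""
    return [
        (number, data[start:start + size])
        for number, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets):
    """Put packets back in order by sequence number and join their contents."""
    return b"".join(chunk for _, chunk in sorted(packets))


packets = split_into_packets(b"Hello, World Wide Web!", 5)
packets.reverse()  # pretend the packets arrived in the wrong order
print(reassemble(packets))
```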

Address
The Web is hundreds of thousands of networks (servers) that are connected to each other, and these networks need some kind of labeling or addressing. The postal address of a house consists of the state, the city, the street name and the house number. The state is the most distinguishing item, since two cities can have the same name, as with Springfield, Illinois and Springfield, Missouri. The same applies to street names and house numbers. So we can think of the state as the Domain of the address.

Domain Names and Category
What is the Internet Domain Name?
The Web is composed of servers and every Web server has a numeric address called an “Internet Protocol (IP) address”, or IP number. An IP number is composed of four numbers separated by periods, where each number is between 0 and 255. For example, an IP number can be one of the following:

         123.99.88.90

         1.2.3.4

         0.0.0.0

Computers work with numbers, but humans have trouble remembering them. To make life easy, each IP or Web address is given a unique name. For example, the above three IP numbers could be named as follows:

         123.99.88.90            Banana

         1.2.3.4                 YMCA

         0.0.0.0                 FirstIP

These names are called Domain names, which are the text alternative to IP numbers. In most cases either the Domain name or the IP number can be used to reach the same server.
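
The two ideas above, the shape of an IP number and the mapping from a name to a number, can be sketched in Python. The name table reuses illustrative names from the text and is purely hypothetical; on the real Internet this lookup is performed by the Domain Name System rather than by a table inside a program.

```python
# Hypothetical name table, using illustrative names from the text.
NAME_TABLE = {
    "YMCA": "1.2.3.4",
    "FirstIP": "0.0.0.0",
}


def is_valid_ip(text):
    """Check for four dot-separated decimal numbers, each between 0 and 255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    return all(
        part.isascii() and part.isdigit() and int(part) <= 255
        for part in parts
    )


def resolve(name):
    """Return the IP number registered for a name, or None if it is unknown."""
    return NAME_TABLE.get(name)


print(resolve("YMCA"), is_valid_ip(resolve("YMCA")))
```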

The Domain name or IP number is equivalent to the state, but once the state is known, the city name, street name and house number are still needed to get to a particular place. The URL (Uniform Resource Locator) is what is needed to get to a specific site or Internet resource.

Uniform Resource Locator (URL)
A URL (Uniform Resource Locator) is the actual address or location of a Web page, directory, path or any other resource on the Web. For example, the following two URLs point to JavaSoft pages.

         “http://java.sun.com”
         “http://java.sun.com/doc/language_enviroment”

The URL is made up of the protocol name and “://”, followed by the Domain name, then “/” and the directory or path. It may end with the name of a file or a resource. The file can be an “.HTML” file, a Java applet or a Web page. The URL is composed of the following:

         1. Protocol (the type of server) - http
         2. Host - Domain name = java.sun.com
         3. Port number - default for http is 80 and it does not have to be listed
         4. Resource path - directory plus the file name of the resource to be accessed.
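
The four parts listed above (protocol, host, port and resource path) can be pulled out of a URL with Python's standard urllib.parse module. The helper name describe_url and the DEFAULT_PORTS table are assumptions made for this sketch; only the default port for http, which the text states, is filled in.

```python
from urllib.parse import urlsplit

# Default port per protocol; only http is listed because the text names it.
DEFAULT_PORTS = {"http": 80}


def describe_url(url):
    """Break a URL into protocol, host, port and resource path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # The port is usually left out, so fall back to the default.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
    }


print(describe_url("http://java.sun.com/doc/language_enviroment"))
```

For the second JavaSoft URL this yields the protocol http, the host java.sun.com, the port 80 and the path /doc/language_enviroment.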

Domain Categories
Domain names end with a dot (“.”) and a two- or three-character extension, similar to file names. The Web is organized into functional groups, such as education, government and so on. The Web also spans different countries. A two- or three-character code distinguishes the different groups. The following are the organization codes:

         1. COM - for commercial
         2. EDU - for education
         3. GOV - for government
         4. INT - for international organizations
         5. MIL - for military
         6. NET - for network
         7. ORG - for Organization         

Countries have codes similar to the organization codes. For example, “CA” is for Canada and “UK” is for the United Kingdom. The United States code is “US”, but it is usually omitted, since the Internet began in the US.

What is the Domain name composed of?
The Domain name is read from right to left: the most general part is on the right and the most specific part is on the left. For example, in “java.sun.com”, “com” indicates a commercial organization, “sun” is the company Sun, and “java” is the name Sun gave to this particular server or group of pages. No country code follows “com” here; a name registered under a country ends with that country's code instead, such as “ca” for Canada.
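
Reading a Domain name from the right can be sketched as a simple split on the dots. The table below copies a few of the organization codes from the list above; the function names are invented for this example.

```python
# Organization codes taken from the list in the text.
ORGANIZATION_CODES = {
    "com": "commercial",
    "edu": "education",
    "gov": "government",
    "mil": "military",
    "net": "network",
    "org": "organization",
}


def read_domain(name):
    """Split a Domain name into its parts, rightmost part first."""
    labels = name.lower().split(".")
    labels.reverse()
    return labels


def category(name):
    """Return the functional group for the rightmost part, if it is listed."""
    return ORGANIZATION_CODES.get(read_domain(name)[0], "unknown")


print(read_domain("java.sun.com"), category("java.sun.com"))
```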

Services and Service Providers
The original objective of the Internet was to provide communication and services for government, education and research institutes, and the US government funded its cost. Now the main objective of the Web is to provide communication and services for businesses, government, education and so on. The Web is largely a commercial service, and its funding is provided by the networks that make it up. These networks belong to independent companies, governments or educational institutes. They provide Web services to their members for a fee or for free. These networks are called service providers. For example, a student enrolled in a state university may have a free Web or Internet account. A company like America On-line provides its customers with Internet access and a Web page for a monthly or yearly fee.

Every network has all the software and hardware needed to communicate with the rest of the Web and with its customers or members. It may be called the service provider, host or server, but it is actually a service provider. Note that the software the service provider owns, which does all the communication and provides the services, is called the server. This means the server is the network communication software and hardware plus all the utilities that the network has. This also means a service provider may have more than one server on the Web. For example, a company or a university may have one or more of the following servers on the Web:

         1. HTTP
         2. FTP
         3. Gopher
         4. Archie
         5. Telnet
         6. WAIS

Each of these servers may use a different communication protocol on top of TCP/IP. For example, an HTTP server speaks HTTP, which is carried over the TCP/IP family of protocols.

Server Services
The server is the software that service providers use to communicate and provide services. The server can provide a number of services, which may include the following:

         1. Web Page – Home Page
         2. Browser
         3. E-Mail
         4. Common Gateway Interface
         5. Web Site – Virtual host
         6. Search Services
         7. Communication Channels
         8. News
         9. File Services

Script
A group of commands stored in a file is called a script. A script is executed by a command interpreter. For example, a DOS batch file can be considered a DOS script. A Unix shell script is another type of script, where the Unix shell is the command interpreter that executes it.

Web Page
A Web page is a text file containing a number of commands to be executed by a program called a “Web Browser”. It is basically a script to be executed by an interpreter. The Web page commands are HTML commands, which are known as HTML tags. These commands are used to create a Web page with messages, banners, logos, buttons and graphic images, as well as to call Java applets or run programs (CGI).
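
To make the idea of tags as commands concrete, here is a toy interpreter built on Python's standard html.parser module. It is only a sketch: the page content is made up, the TextRenderer class is invented for this example, and it understands just two tags, whereas a real Browser handles many more and draws graphics rather than returning lines of text.

```python
from html.parser import HTMLParser

# A made-up page; the tags are ordinary HTML, the content is illustrative.
PAGE = "<html><body><h1>Welcome</h1><p>This is a <b>Web</b> page.</p></body></html>"


class TextRenderer(HTMLParser):
    """A toy 'browser' that turns a few HTML tags into plain-text output."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.current = ""
        self.in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_heading = False
        if tag in ("h1", "p"):
            # A heading or paragraph ends, so finish the current output line.
            self.lines.append(self.current)
            self.current = ""

    def handle_data(self, data):
        # Headings are shown in capitals; other text is shown unchanged.
        self.current += data.upper() if self.in_heading else data


def render(page):
    renderer = TextRenderer()
    renderer.feed(page)
    renderer.close()
    return renderer.lines


print(render(PAGE))
```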

Home Page
A home page is the starting Web page for a server, or the first page the Browser starts with. Every server, Web site or group of pages has a home page. A Web user can set any Web page as the Browser's home page. For example, JavaSoft has a home page for Web users to link to and access information about Java and Java’s latest changes.

Web Site/ Web Host
A Web site is a group of related Web pages sharing a common subject or theme. A Web server is the “Web Host”, which may host many Web sites. For example, the AT&T service provider may have a server with over 200 Web sites, ranging from software developer groups to ant or bug collectors.

Virtual host/Presence Provider
A Web site can be set up on a server with its own unique Domain name, so that it appears to the outside world to be a separate server. This type of Web site is called a Virtual Host. For example, a mid-size insurance company named Banana Insurance may pay the AT&T service provider for a Web site and give this Web site the Domain name “Banana.COM”. This Web site would look to the outside world like a Web server belonging to Banana Insurance Company. By providing the space for Banana Insurance Company, the AT&T service provider would also be called a “Presence Provider”.
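
One way a single server can present several sites is to choose the site by the Domain name the client asked for. The sketch below assumes that approach; other arrangements exist, and the table of sites (apart from the Banana.COM name used above) is invented for illustration.

```python
# Hypothetical sites hosted on one server; only Banana.COM comes from the text.
HOSTED_SITES = {
    "banana.com": "Banana Insurance home page",
    "antcollectors.example": "Ant collectors home page",
}


def serve(requested_host):
    """Pick the hosted site that matches the Domain name the client asked for."""
    site = HOSTED_SITES.get(requested_host.lower())
    if site is None:
        return "404 no such site on this server"
    return site


print(serve("Banana.COM"))
```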

Browser
A Web page is a text file with HTML tags (commands) to be interpreted by an interpreter. A Web Browser is a program (interpreter) which is used to execute Web pages. The two most well-known Web Browsers are Netscape Navigator and Microsoft Internet Explorer, which are used by the vast majority of Web users. A Web Browser, sometimes called a "user agent", is located on the user's machine. It works by using a special protocol called HTTP to request a specially encoded text document from a Web server. The text document contains special instructions (written in HTML) that tell the Browser how to display the document on the user's screen.

Search Engines
A search engine is a Web service that helps Web users find information about any topic. For example, a Web user can search for “travel” and a search engine such as Yahoo can provide a listing of Web pages about travel.

Search engines basically work in three steps. The first is to visit Web sites, read every Web page and use the information the pages provide to categorize them. The second step is to index these Web pages in order to speed up the last step, which is searching the index to find matches for what the search engine user is seeking.
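
The indexing and searching steps can be sketched with a small word-to-page table. The pages and their addresses are invented stand-ins for what a search engine would have read on its visits, and real search engines use far more elaborate ranking and categorizing than this.

```python
from collections import defaultdict

# Hypothetical pages standing in for what a search engine read on its visits.
PAGES = {
    "http://travel.example/paris": "cheap travel to paris",
    "http://travel.example/rome": "travel guide for rome",
    "http://food.example/pasta": "pasta recipes from rome",
}


def build_index(pages):
    """Step two: map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Step three: return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


index = build_index(PAGES)
print(search(index, "travel"))
```

Searching the index is fast because the pages are read only once, when the index is built, and not each time a user asks a question.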

Individuality
Web pages have given the Web the ability to let every user express or display individuality. For example, companies can display their products and allow users to buy them. Individuals can have their own Web page and share their ideas with others.

Interfaces
An interface is a connection between parts of the computer hardware or the software that handles the interaction between the user and an application.

Gateway
A gateway is a combination of hardware and software that allows users on one network to access the resources of a different type of network.

Common Gateway Interface (CGI)
CGI is an interface that helps execute external programs. It defines how information is exchanged between the Web server and the external programs, namely CGI programs. CGI programs are usually written in interpreted languages such as Unix shell script or PERL, but they can also be written in C. CGI programs can have performance problems, since each one runs in a separate process from the server and requires significant start-up time.
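
A minimal CGI program might look like the sketch below, written in Python rather than the languages named above. It relies on the CGI convention that the Web server passes the request's query string in the QUERY_STRING environment variable and sends whatever the program prints back to the Browser. The "name" parameter and the greeting are invented for this example.

```python
import html
import os
from urllib.parse import parse_qs


def handle_request(environ):
    """Build the CGI response for one request described by environment variables."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical parameter; escape it before placing it in the page.
    name = html.escape(query.get("name", ["stranger"])[0])
    body = "<html><body><p>Hello, {}!</p></body></html>".format(name)
    # A CGI program writes a header, a blank line, and then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The Web server starts this program once for every request.
    print(handle_request(os.environ), end="")
```

The start-up cost mentioned above is visible here: the whole program, including the interpreter, is launched again for each request.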

Common Languages
With all the complexity of the Web, the servers must have some kind of common language or languages. Any Browser must understand Web pages and know where to find things. CGI must be able to run external programs. The main issue here is the Web languages. The Web has two main languages, which are Java and HTML.

Hypertext
One of the Web's features is the ability to move between Web pages and other sites with a click of the mouse. Documents or Web pages may have highlighted text that the user clicks on to get to a different document or Web page. This ability to jump between pages and sites is possible with the use of what is known as “Hypertext”.

Hypertext is text that is not constrained to be linear and that contains links to other text. The links are jump points (addresses, that is, URLs) used to jump to other pages or to another part of the same page. A Hypertext link is also known as an anchor.
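
In HTML, an anchor is written with the a tag, and its href attribute holds the address to jump to. The sketch below uses Python's standard html.parser module to list the jump points in a page; the LinkCollector class and the sample page are invented for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target address of every anchor (<a href=...>) in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_links(page):
    collector = LinkCollector()
    collector.feed(page)
    collector.close()
    return collector.links


SAMPLE = '<p>See <a href="http://java.sun.com">JavaSoft</a> or <a href="#top">the top</a>.</p>'
print(find_links(SAMPLE))
```

The first link jumps to another site, while the second, which begins with “#”, jumps to another part of the same page.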

Hypermedia
Hypermedia is similar to hypertext, but includes media other than text, e.g. a hypermedia document could include text and graphics, or sound and animation.

Security
Security means protecting against unauthenticated interactive logins from the "outside" world. This helps prevent vandals from logging into machines on your network.

Firewall
A firewall can be defined in any of the following ways:
A firewall is a set of related programs, located at a network gateway server, that protects the resources of a private network from users on other networks.

It is a system that restricts access from machines that are not on the company’s internal network.

It is also a computer that filters traffic going into and out of the corporate network.

A network connected to the Internet installs a firewall to prevent outsiders from accessing its own private data resources and to control which outside resources its own users have access to.
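
The filtering role of a firewall can be sketched as a single decision function. The private network range and the blocked outside address below are hypothetical values chosen for illustration, and a real firewall examines much more than the two addresses (ports, protocols and connection state, for example).

```python
import ipaddress

# Hypothetical private network and rule set, chosen only for illustration.
PRIVATE_NETWORK = ipaddress.ip_network("10.0.0.0/8")
BLOCKED_OUTSIDE_HOSTS = {ipaddress.ip_address("203.0.113.9")}


def allow(source, destination):
    """Decide whether traffic from source to destination may pass the firewall."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    src_inside = src in PRIVATE_NETWORK
    dst_inside = dst in PRIVATE_NETWORK
    if src_inside and dst_inside:
        return True  # traffic that stays inside the private network
    if src_inside:
        # Inside users may reach outside resources unless they are blocked.
        return dst not in BLOCKED_OUTSIDE_HOSTS
    # Outsiders may not reach the private resources.
    return False


print(allow("10.1.2.3", "198.51.100.7"), allow("198.51.100.7", "10.1.2.3"))
```

The two branches mirror the two purposes given above: keeping outsiders away from private resources, and controlling which outside resources inside users can reach.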