On the way to the Internet with the Geneve and TI
=================================================

by Michael Zapf

Part I.  The Internet
---------------------

1. Introduction
---------------

Computer networks have been around for several decades now. Computer scientists
soon realized the advantages of connecting computers to enhance calculations,
distribute data, share expensive hardware between different participants and so
on. At first there were not many services; you could at most send some
messages, but gradually networks became much more convenient. Today networks
often consist of a server that is equipped with many resources, e.g. large
hard drives and fast processors, and many clients that are computers in their
own right but use the services offered by the server by loading files from it
or dispatching difficult jobs to it. Computers are connected to each other in
a network, and networks may also be connected to each other. This forms what
is called an internet.

As is often the case with inventions, military requirements were the starting
point. The U.S. Department of Defense instructed scientists to develop an
internet on American territory that is fail-safe in case participating nodes
are destroyed. That means that there must not be a center through which all
the messages have to pass; instead, the net should be able to re-route the
data when connections become inaccessible. This led to the development of the
ARPAnet.

A growing need for scientific know-how and an increasingly relaxed military
situation in the world allowed more and more scientific sites to join the net
that was now called the "Internet". The development gained speed at a dramatic
rate: Protocols were defined for different services in the Internet; computers
that were miles away could be used as if you were working at them directly. And
then even the border was crossed, and the Internet started spreading around the
world.

For many years the Internet was mainly used for message exchange. Because of
its analogy to the real world this exchange was called e-mail (electronic
mail). Even today, e-mail is one of the most popular services of the Internet.
What could be easier than to type in a few words and execute a send command?
Only a few minutes later a reply arrives - although your peer sits in an office
on the other side of the world. And unlike with a telephone call, you need not
make sure you can actually reach him - your message will be presented to him
as soon as he takes a look in his mailbox.

Another interesting institution is the USENET that was soon integrated into the
Internet. The USENET consists of a collection of so-called newsgroups; today
you will find many thousands of them. Messages sent to a newsgroup are routed
to a special computer (called a news server) that stores them and
distributes them to other news servers. Users can subscribe to these newsgroups
at a news server of their choice and download all messages of the newsgroups
they are interested in. This allows a lot of interesting discussions on every
possible subject, about the TI-99/4A in comp.sys.ti, for instance.

In order to enhance file transfer on the Internet, the File Transfer Protocol
was specified. Now it was possible to store files at well-known locations in
the net and retrieve them when necessary. While the development of distributed
file access and remote execution was more interesting for subsections of the
Internet (subnets), the global data exchange started to grow exponentially with
the invention of the World Wide Web (WWW). The WWW is still based on the
Internet and uses a new protocol "on top", the HyperText Transfer Protocol
(which you often see as "http" at the start of WWW addresses). It makes it
possible to define files that appear as hypertext (text with links to other
files) when they are downloaded to the user's computer. A special language
(HyperText Markup Language, HTML) is used to compose these documents.
Furthermore, multimedia elements are supported so that images, sounds, video
clips and so on can be sent along with the text.

With these features the Internet became interesting for virtually
everyone. More and more companies try to advertise their products; newspapers
and magazines offer excerpts of their printed products; there is entertainment,
knowledge, connectivity, home banking, electronic commerce and much more. It is
clear by now that this growth is becoming a real threat to the usefulness of
the Internet for everyone, including its fathers, the scientists. The more
participants start sending around their data, the slower the whole net becomes
because the bandwidth (amount of data that can pass a network connection in a
specific time) is limited. Even worse, the supply of Internet addresses is
expected to be exhausted within the next few years, so a new addressing scheme
has already been defined that will replace the current one.


2. The infrastructure
---------------------

If someone plants a bomb in the building where your favorite BBS is located,
what happens? It will take quite some time before you can get online again - if
the BBS is ever restored at all. Not so with the Internet - if a server
crashes, it will take only a short time for the adjacent net nodes to realize
this and to revise their respective routing decisions. (Practice shows that
this does not always work very reliably - but at least it is possible.)

This implies that participating hosts (network nodes) have many more things to
do than simply to receive or send texts. And since there are so many
applications that want to utilize the network functionality, the software must
be very well designed to be usable in very different situations without
constraining future developments.

A good real-world example is the situation where the bosses of two companies
want to arrange a meeting. Each one has a secretary who takes the messages from
him or passes received messages to him; in this case there is even a clerk who
delivers the messages inside the company. The secretary herself is free to
choose a transmission medium to her peer at the other company; her boss does
not care. She, in turn, does not care about the job the telecommunication
service has to perform to transmit the fax that she decided to use. The telecom
service, on the other hand, is not interested in the message itself, but only
in delivering it as requested. Her peer notices her fax machine putting out a
sheet; she takes it, checks it briefly to see if it was correctly transmitted
but is not interested in the content. She just looks at the recipient and drops
it in the appropriate box. Another clerk comes by, fetches all the papers in
this box, and brings them to the boss. The advantage is that everyone just does
a small job and is soon ready to continue with other work. If somebody's work
is unsatisfactory, he can easily be replaced.

This turned out to be a good model for building the global Internet. The most
successful strategy proved to be to organize the network software in layers
where only layers of the same depth understand each other.

Each layer receives outgoing data from the next upper layer, modifies it and
hands it over to the next lower layer. Incoming data is first processed by a
lower layer before it reaches the next higher one. This restriction that data
can only be passed from one layer to the next one gives the impression of a
stack that must first be worked down, then be rebuilt. Therefore, we also use
the term "protocol stack". A protocol is a template for the communication
between different peers; besides the actual data, it includes information for
the recipients that have to process the data. In real life, people normally
say "hello" to each other before they start a conversation; or, if they don't
have visual contact, they first call each other by name.
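
The way headers are added on the way down and stripped again on the way up can
be illustrated with a toy sketch. The layer names and the textual headers are
invented for this illustration; real protocols use binary headers with many
more fields.

```python
# Toy protocol stack: every layer prepends its own header when sending
# and strips it again when receiving. The header texts are invented.
LAYERS = [b"TRANSPORT|", b"NETWORK|", b"DATALINK|"]  # from layer 4 down to 2


def send_down(app_data: bytes) -> bytes:
    """Pass application data down the stack, adding one header per layer."""
    data = app_data
    for header in LAYERS:
        data = header + data
    return data


def pass_up(frame: bytes) -> bytes:
    """Pass received data up the stack, removing one header per layer."""
    data = frame
    for header in reversed(LAYERS):
        if not data.startswith(header):
            raise ValueError("unexpected header for this layer")
        data = data[len(header):]
    return data


wire = send_down(b"hello")
print(wire)           # the header of the lowest layer comes first
print(pass_up(wire))  # the application gets its original data back
```

Note that each layer only looks at its own header and treats everything behind
it as opaque data.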

The layers are named by their functionality and do not prescribe a special
protocol:

Layer 4: Transport Layer
Layer 3: Network Layer
Layer 2: Data Link Layer
Layer 1: Physical Layer

The lower the number, the closer to the phone line or network cable the layer
can be found. Applications are set on top of this stack and communicate only
with layer 4. Each layer adds its own header to the outgoing application
data. In detail:

Physical Layer: This layer is concerned with the transmission of the bits, the
specification of the electrical values, the hardware (plugs), the transmission
rate. It is specific to the kind of connection you are using; for serial
transmission, it is the RS-232 specification. Outgoing byte strings from layer
2 are converted into bit strings; incoming data bits are converted to byte
strings before they are sent to layer 2.

Data Link Layer: The bytes from layer 1 are grouped into so-called frames of a
limited length; a checksum is calculated to verify that the transmission was
correct. In case of an Ethernet where several hosts are connected to one
wire, the header contains the network card addresses of the sender and
receiver. This is of course not necessary with point-to-point protocols such as
PPP or SLIP that are used between two hosts on a serial connection (e.g. a
modem). Flag bytes are used to decide whether the incoming data is to be passed
up or is control data for this layer.

Network Layer: The data we got from layer 2 has been checked; now we need to
see if we are the true recipient. This can again be found in the header, and it
need not refer to the same host that can be found in the layer 2 header because
layer 2 only knows about our local network but nothing about the world
outside. If we are the final recipient, the data is passed up. Otherwise it
continues its journey, and it is the task of this layer to decide where to
forward the data. An example of a layer 3 protocol is IP, the Internet
Protocol.

Transport Layer: The network layer can only work with data strings of limited
length, also called packets. This means that longer data strings are broken
into suitable pieces and then sent to the network layer. In the other
direction, the situation is more complicated. Nothing guarantees that the
incoming packets are complete and in the right order. The transport layer takes
care of the completeness of the transmission; if we order 10 kilobytes, it will
try to deliver them, regardless of the packet size prescribed by the lower
levels. An example of a layer 4 protocol is TCP, the Transmission Control
Protocol.

You can see that this structure implies a lot of overhead on the transmitted
data: Each layer (except layer 1 that will not be of further interest) adds its
special header that enables the corresponding layer on the recipient's host to
correctly process the data. From the upper layers to the lower layers, the
amount of transmitted data increases; in the opposite direction, the amount
decreases with every header being stripped away. To give you an example:

In an Ethernet network which uses TCP/IP/IEEE-802.3, we have 18 bytes for data
link, 20 bytes for IP, 20 bytes for TCP and then the payload bytes. One frame
normally carries up to 1500 bytes, not counting the data link bytes.
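
To get a feeling for these numbers, here is a small calculation. It is only a
rough sketch: it assumes that the 1500 bytes are the data carried inside one
frame (IP header, TCP header and payload), and it ignores acknowledgements and
the packets needed for setting up the connection. The function names are made
up for this example.

```python
# Header sizes quoted in the text (bytes).
DATA_LINK = 18     # Ethernet header and trailer
IP_HEADER = 20
TCP_HEADER = 20
FRAME_DATA = 1500  # assumption: bytes carried per frame, without data link


def payload_per_frame() -> int:
    """Application bytes that fit into one frame."""
    return FRAME_DATA - IP_HEADER - TCP_HEADER


def frames_needed(n_bytes: int) -> int:
    """Number of frames for n_bytes of application data (rounded up)."""
    per_frame = payload_per_frame()
    return (n_bytes + per_frame - 1) // per_frame


def bytes_on_wire(n_bytes: int) -> int:
    """Application data plus all headers of layers 2 to 4."""
    per_frame_overhead = DATA_LINK + IP_HEADER + TCP_HEADER
    return n_bytes + frames_needed(n_bytes) * per_frame_overhead


size = 61739  # size of the file downloaded in section 4.1
print(payload_per_frame(), frames_needed(size), bytes_on_wire(size))
```

Under these assumptions the headers add roughly 4 percent to the file that is
downloaded in section 4.1.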

The inestimable advantage of this layer strategy is that each layer
1. can be replaced without influencing the others
2. can rely on a guaranteed service of the adjacent layers

1. means that if we decide to use Ethernet instead of a serial connection, it
is only a matter of the data link layer; the functionality of the network and
transport layers remains the same.

2. means that the higher layers have no idea what happens to their data in the
lower layers, and the lower layers have no idea what the data means to the
higher ones. Our application (to be found in a layer greater than four) does
not have to bother about the fragmentation of the data, checksums, or the
correct ordering. It simply expects the transport layer to deliver exactly the
amount of data that it wants and to do the right thing to the data it sends to
the transport layer. On the other hand, the transport layer does not know what
the meaning of the data is.

LANs (Local Area Networks) are often built in a simpler way, so you might ask
why the Internet needs all this effort. The reason is clear: The inventors of
this protocol stack were wise enough not to require a particular computer
system for taking part. A world-wide system is difficult to change, but the
smaller its components are, the more quickly they can be replaced. And even if
we have completely new transmission media or processor types, the Internet
will continue to work.


3. Overview of the TCP/IP Protocol Stack
----------------------------------------

After the last section you will now be able to figure out what is meant by this
term. The TCP/IP stack is the main protocol stack in the Internet; the data
link layer is freely selectable, e.g. Ethernet or serial line (PPP, SLIP), the
network layer is controlled by the Internet Protocol, the transport layer uses
the Transmission Control Protocol. There are of course other protocols that
are only used for special services; they are nonetheless important and will be
described later. I will only describe those protocols that are of major
interest for us, and so I will not go into Ethernet any further. While writing
this chapter I noticed that it was becoming longer and longer, so I will first
give an overview of how the different protocols work without getting too far
into the details.


3.0 Request for Comments: RFC
-----------------------------

A strange name for a set of specifications, isn't it? Since the beginning of
the Internet (in those days still the ARPAnet), the documentation for the
various protocols and utilities, as well as proposals (even jokes on April
Fool's Day), has been collected under this label at special Internet sites.
Everyone who wants to find precise information about a particular subject
should take a look at this list, which comprised more than 2000 entries at the
beginning of 1997. Many of these RFCs are updates to former ones, which are
thereby obsolete. There are several ways to get a copy of an RFC:

Using FTP: The RFCs can be found on the "InterNIC Directory and Database
Services" server at 'ds.internic.net'. Change to the directory 'rfc' and you
will find the RFCs as text or PostScript files. (Note to German users: You can
use the server at 'nic2.nic.de'.)

Using e-mail: It is possible to 'order' an RFC by e-mail. Just send the
following message (nothing more; no subject) to 'mailserv@ds.internic.net':

document-by-name rfcXXXX

and replace XXXX by the corresponding RFC number. You can request more than one
RFC by using 'document-by-name rfcXXXX, rfcYYYY' or separate lines. You should
also get the index by sending

document-by-name rfc-index

I will provide the latest RFC number with each following subsection. I strongly
encourage you to get the corresponding RFC because the information that you
can read below cannot cover all necessary aspects.


3.1 Point-to-Point Protocol (PPP)  ---  RFC 1661
----------------------------------------------------

I'll start with the Point-to-Point Protocol because it's more widely used for
modem connections than SLIP (Serial Line Internet Protocol, RFC 1055).

Receiving (data flow from lower to higher numbered layers):

Suppose our interface card has received a stream of bytes (in fact, our layer 1
program has received them) and sends it to the data link layer, for which we
want to use PPP. The tasks for PPP are

- group the bytes into frames of a maximum length (approx. 1500 bytes)
- check the consistency of the transmitted data by checking the CRC value
- demultiplex the data for the different upper-layer services
- negotiate transmission options with the other end

Of every frame, PPP keeps the first five bytes and the last three for itself.
The remaining bytes are passed on to the service that is determined by the
fourth and fifth byte.

Sending (data flow from higher to lower numbered layers):

In the opposite direction, some upper layer has passed data to this layer.
What must be done is

- calculate the CRC value
- write the frame header before the data and the CRC and end byte after them
- hand the whole frame to the interface driver as soon as possible.

As the data link layer driver is fixed to one interface, it is often considered
to be the interface driver itself. If there are more interfaces, each one has
its separate data link layer driver.
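
The framing steps for both directions can be sketched as follows. This is a
simplified illustration, not a complete PPP driver: it assumes the HDLC-like
framing described in RFC 1662 (flag byte 0x7E, address 0xFF, control 0x03, a
16-bit CRC), and it leaves out the escaping of special bytes inside the data
as well as the option negotiation. The function names are my own.

```python
import struct

FLAG, ADDRESS, CONTROL = 0x7E, 0xFF, 0x03
PROTO_IP = 0x0021  # PPP protocol number for IP datagrams


def fcs16(data: bytes) -> int:
    """16-bit frame check sequence (CRC), computed bit by bit."""
    fcs = 0xFFFF
    for byte in data:
        fcs ^= byte
        for _ in range(8):
            fcs = (fcs >> 1) ^ 0x8408 if fcs & 1 else fcs >> 1
    return fcs ^ 0xFFFF


def build_frame(protocol: int, payload: bytes) -> bytes:
    """Sending: header in front, CRC and end flag behind the data."""
    body = struct.pack("!BBH", ADDRESS, CONTROL, protocol) + payload
    # The CRC is transmitted least significant byte first.
    trailer = struct.pack("<H", fcs16(body)) + bytes([FLAG])
    return bytes([FLAG]) + body + trailer


def parse_frame(frame: bytes):
    """Receiving: check the CRC, return (protocol, payload)."""
    if len(frame) < 8 or frame[0] != FLAG or frame[-1] != FLAG:
        raise ValueError("not a complete frame")
    body = frame[1:-3]
    (fcs,) = struct.unpack("<H", frame[-3:-1])
    if fcs16(body) != fcs:
        raise ValueError("CRC mismatch, frame discarded")
    (protocol,) = struct.unpack("!H", frame[3:5])  # fourth and fifth byte
    return protocol, frame[5:-3]


frame = build_frame(PROTO_IP, b"an IP packet would go here")
print(frame[:5].hex(), parse_frame(frame))
```

The five header bytes and three trailer bytes mentioned above are exactly what
parse_frame() cuts away before the payload is handed to the upper layer.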


3.2 Internet Protocol (IP) version 4 ---  RFC 791
-------------------------------------------------

Receiving:

The data we received comes from layer 2 and is considered to be IP data. The
first thing to do is to check if this host is in fact the recipient. It is
possible that our host is to forward the data because it has two connections
to different networks (such a host is called a gateway). If this is the case,
the next receiver is determined by using a special directory, called the
routing table. This table also tells which interface to use in order to reach
a destination, so in case of PPP, the data is just handed to the appropriate
data link layer driver.

If our host is indeed the recipient, the IP layer looks more closely at the
data. Again there is a header (20 bytes) that describes among other things the

- length of the packet
- packet ID and offset
- type of transmitted data (TCP, ICMP, UDP, ...)
- source and destination IP address

The IP layer strips off the header and sends the remainder to the selected
service (according to the type).
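
As an illustration, the fields listed above can be extracted from the 20-byte
header like this. It is a minimal sketch that follows the header layout of
RFC 791 but ignores header options, the header checksum and the reassembly of
fragments; the function and dictionary names are chosen for this example only.

```python
import struct

IP_HEADER = struct.Struct("!BBHHHBBH4s4s")  # the fixed 20 bytes
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}


def parse_ip_header(packet: bytes) -> dict:
    """Decode some header fields and strip the header off the packet."""
    fields = IP_HEADER.unpack(packet[:IP_HEADER.size])
    ver_ihl, _tos, length, ident, flags_frag = fields[:5]
    _ttl, protocol, _checksum, src, dst = fields[5:]
    header_len = (ver_ihl & 0x0F) * 4  # header length is given in words
    return {
        "length": length,
        "id": ident,
        "offset": (flags_frag & 0x1FFF) * 8,  # fragment offset in bytes
        "service": PROTOCOLS.get(protocol, "other"),
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
        "payload": packet[header_len:length],
    }


# A hand-made packet from 141.2.150.16 to 141.2.28.160 carrying 4 bytes.
header = IP_HEADER.pack(0x45, 0, 24, 7, 0, 64, 6, 0,
                        bytes([141, 2, 150, 16]), bytes([141, 2, 28, 160]))
print(parse_ip_header(header + b"data"))
```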

Sending:

The data that came from an upper layer is cut into suitable pieces (fragments)
for the lower layer. As I already said, IP uses a so-called routing table to
find out where to send the packets. A typical entry in the table contains the
IP address of the destination host A, the IP address of a gateway B, and the
interface name I. This tells IP: To send the packets to A, the driver of
interface I needs to send them to B, which will forward the data.

If our host is a dead end with a PPP connection, there should be only one
entry, namely the other side of the PPP connection as the gateway for 'default'
delivery, which means any host.
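
A routing table of this kind can be modelled as a simple dictionary. The sketch
below only knows single hosts and a 'default' entry; real routing tables also
match whole networks. The addresses and interface names are taken from the
example in section 4.2, except for the default entry of the gateway, which is
made up.

```python
# Routing tables as dictionaries: destination -> (gateway, interface).
# Only single hosts and a 'default' entry are modelled here.
SERVER_ROUTES = {
    "default": ("141.2.29.2", "en2"),
}
GATEWAY_ROUTES = {
    "141.2.28.160": ("141.2.28.160", "ppp160"),
    "default": ("141.2.1.1", "en0"),  # hypothetical entry
}


def next_hop(destination: str, routes: dict):
    """Return (gateway, interface) for a destination IP address."""
    if destination in routes:
        return routes[destination]
    return routes["default"]


# The server does not know the client, so it uses its default gateway;
# the gateway reaches the client directly over its PPP connection.
print(next_hop("141.2.28.160", SERVER_ROUTES))
print(next_hop("141.2.28.160", GATEWAY_ROUTES))
```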

IP addresses are composed of four bytes that define a location in the
Internet. The value is written as four decimal numbers, separated by dots: For
example, one of the addresses of the InterNIC server that is mentioned above is
'198.49.45.10'. The name 'ds.internic.net' is another way of addressing, but it
must first be translated to these numbers by the Domain Name System (DNS).
Only the numbers can be used in IP datagrams. The numbers are structured in a
particular way, but I won't explain that in this overview.
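
The conversion between the dotted notation and the four bytes that actually
appear in the IP header is straightforward. A small sketch (the function names
are my own):

```python
def to_bytes(dotted: str) -> bytes:
    """Convert '198.49.45.10' into the four bytes used in IP headers."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("not a valid dotted IP address: " + dotted)
    return bytes(parts)


def to_dotted(raw: bytes) -> str:
    """Convert four address bytes back into the dotted notation."""
    if len(raw) != 4:
        raise ValueError("an IP address has exactly four bytes")
    return ".".join(str(b) for b in raw)


address = to_bytes("198.49.45.10")
print(address.hex(), to_dotted(address))
```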


3.3.1 Transmission Control Protocol (TCP)  ---  RFC 793
-------------------------------------------------------

The Internet Protocol is said to be connectionless and unreliable. This does
not imply that it works badly; it simply means that IP neither cares about the
order of the transmitted packets nor checks that all packets have arrived. As
this is not acceptable for real applications like FTP where we would rather
have files without ugly holes and not shuffled up, another protocol is used to
guarantee this. TCP adds another 20 bytes as header, and when this one is
stripped away, we eventually get our application data.

In addition, several programs may require network access simultaneously. Even
FTP needs two connections; one for the data, another for control commands. In
order to send the data to the correct application, the concept of 'ports' was
introduced. Each application allocates as many ports as it wants, and if they
are granted to it, it can start its communication. Since TCP is a bidirectional
service, both sender and recipient need to define ports for their
communication. Ports are just an operating system construct, not physical
devices.

Another important aspect of TCP is flow control. This is achieved by using the
'sliding window' strategy: The receiver continuously informs the sender about
the size of its 'window'. If the receiver does not manage to process the data
fast enough, the window 'closes', and when it is shut, the sender cannot
proceed with the transmission. (Only data classified as 'urgent' can still be
sent.) While the receiver processes the data, it lets the window slide open
again.

Once the connection has been established, during which the participants
exchange their respective sequence numbers, each one is free to send data to
the other side. An explicit termination procedure is required to close the
connection.

Receiving:

The sequence number of the data from the IP layer tells TCP where to put the
segment it has received into the buffer. If no segment is missing up to this
point, TCP sends an acknowledgement to the other side. As long as the
application does not fetch the data from the buffer, TCP decreases the window
size.

If there is a hole in the buffer, TCP continues to send acknowledgements for
the end of the contiguous block from the start of the buffer. The sender, not
receiving any acks for the latest segments, tries to retransmit the segment
that seems to be missing. When the hole is closed, an ack of the whole block
can be sent (the last segments of which may have been waiting there for quite
a while until the missing segments were filled in).
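
The behaviour of the receiving side can be simulated with a few lines. This is
a toy model, not a TCP implementation: it assumes that sequence numbers count
bytes, that retransmitted segments have the same boundaries as the original
ones, and that the window is simply the free space in a buffer of fixed size.
The class and method names are invented.

```python
class ReceiveBuffer:
    """Toy TCP receiver: stores segments by sequence number."""

    def __init__(self, first_seq: int, capacity: int):
        self.next_seq = first_seq  # first byte not yet received in order
        self.capacity = capacity
        self.pending = {}          # segments behind a hole: seq -> data
        self.ready = b""           # contiguous data, not yet read

    def receive(self, seq: int, data: bytes):
        """Store a segment; return (ack number, window size)."""
        if seq >= self.next_seq:
            self.pending[seq] = data
        # Close holes: append every segment that continues the block.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.ready += chunk
            self.next_seq += len(chunk)
        return self.next_seq, self.window()

    def window(self) -> int:
        """Free buffer space that is announced to the sender."""
        held = len(self.ready) + sum(len(d) for d in self.pending.values())
        return max(self.capacity - held, 0)

    def read(self, n: int) -> bytes:
        """The application fetches data; the window opens again."""
        data, self.ready = self.ready[:n], self.ready[n:]
        return data


buf = ReceiveBuffer(first_seq=1000, capacity=4096)
print(buf.receive(1000, b"aaaa"))  # in order: the ack advances
print(buf.receive(1008, b"cccc"))  # hole at 1004: the ack stays the same
print(buf.receive(1004, b"bbbb"))  # hole closed: the whole block is acked
```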

This situation is encountered more often than you might imagine. Especially
when the connection is poor, the lower layers may have discarded some data so
that some segments could not be reconstructed. The effect of IP's silently
discarding packets with checksum errors is that the corresponding segment is
not acknowledged. So the connectionless and unreliable character of IP is
effectively worked around by TCP.

Sending:

During the connection establishment a maximum segment size is negotiated
between the two participants. The data from the application is split into
segments of this size, and the TCP header is written in front of each one. If
the window size most recently announced by the receiver is larger than the
segment size, the segment is sent to the IP layer.
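
The sending side can be sketched in the same simplified way. The example only
shows the splitting into segments and the check against the receiver's window;
the TCP header itself, timers and retransmission are left out, and the function
names are made up. As a simplification, a segment is sent whenever it still
fits into the announced window.

```python
def segments(data: bytes, first_seq: int, mss: int):
    """Split application data into (sequence number, chunk) pairs."""
    return [(first_seq + i, data[i:i + mss])
            for i in range(0, len(data), mss)]


def sendable(segs, window: int):
    """Return the leading segments that fit into the receiver's window."""
    out, used = [], 0
    for seq, chunk in segs:
        if used + len(chunk) > window:
            break  # window closed: wait until the receiver opens it again
        out.append((seq, chunk))
        used += len(chunk)
    return out


segs = segments(b"x" * 2500, first_seq=1, mss=1000)
print([(seq, len(chunk)) for seq, chunk in segs])
print([seq for seq, _ in sendable(segs, window=2000)])
```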


3.3.2 User Datagram Protocol (UDP)  ---  RFC 768
------------------------------------------------

There are situations when the full-blown TCP machinery is not necessary; for
example, when only small packets are to be sent, when it does not matter
whether every packet reaches the destination, or when the flow control is
performed by the application itself. UDP is a very simple protocol (the RFC is
only three pages long; TCP's is 85 pages). The IP packets (datagrams) are
equipped with the already described source and destination ports; there is no
connection establishment and no automatic acknowledgement. If this is desired,
it is up to the application to implement it.
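
How little UDP adds can be seen from its header, which consists of only eight
bytes. The following sketch builds and parses such a datagram according to
RFC 768. It leaves the checksum field at zero, which means that no checksum
was computed; the port numbers are just examples.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # source port, dest. port, length, sum


def build_udp(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Put the 8-byte UDP header in front of the payload."""
    length = UDP_HEADER.size + len(payload)
    # A checksum of 0 tells the receiver that none was computed.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_udp(datagram: bytes):
    """Return (source port, destination port, payload)."""
    src, dst, length, _checksum = UDP_HEADER.unpack(datagram[:8])
    return src, dst, datagram[8:length]


datagram = build_udp(1048, 53, b"query")  # port 53 is used by DNS
print(len(datagram), parse_udp(datagram))
```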

Although UDP may seem to be of little use, it is needed by the Domain Name
System (DNS) that translates textual Internet references like
'ds.internic.net' to four-byte IP addresses.


3.3.3 Internet Control Message Protocol (ICMP) ---  RFC 792 (v4)
----------------------------------------------------------------

This is yet another important protocol that must be present in every TCP/IP
implementation. Hosts use ICMP to transmit messages that are of major
importance for current or future connections.

ICMP messages normally start with a 4-byte header, followed by
message-specific content bytes. The types of messages include

- echo request/reply
- destination unreachable
- time exceeded (measured by hop count)
- redirect
- source quench
- router solicitation/advertisement
- parameter problem
- timestamp request/reply
- information request/reply
- address mask request/reply

The first three are used very often. The echo request from any host on the
net must be answered by an echo reply; some nets consider hosts that do not
reply as crashed and cut dial-up connections.
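
Answering an echo request is simple enough to be shown here. The sketch follows
the message layout of RFC 792: type, code and checksum form the first four
bytes, and for echo messages an identifier and a sequence number follow before
the data. The function names are my own, and the surrounding IP header is not
shown.

```python
import struct

ECHO_REQUEST, ECHO_REPLY = 8, 0


def internet_checksum(data: bytes) -> int:
    """One's complement sum over 16-bit words, as used by IP and ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo(msg_type: int, ident: int, seq: int, payload: bytes) -> bytes:
    """Build an echo message; the checksum covers the whole message."""
    header = struct.pack("!BBHHH", msg_type, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", msg_type, 0, checksum, ident, seq)
    return header + payload


def answer(request: bytes) -> bytes:
    """Turn an echo request into the matching echo reply."""
    msg_type, _code, _sum, ident, seq = struct.unpack("!BBHHH", request[:8])
    if msg_type != ECHO_REQUEST:
        raise ValueError("not an echo request")
    return build_echo(ECHO_REPLY, ident, seq, request[8:])


request = build_echo(ECHO_REQUEST, ident=1, seq=1, payload=b"ping")
print(answer(request).hex())
```

A receiver can verify a message by computing the checksum over all of its
bytes, including the checksum field; the result must be zero.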


4. The FTP application  ---  RFC 959
------------------------------------

Now I want to show you one very important application that works on the TCP/IP
stack: the File Transfer Protocol application. Although the usage of HTTP is
growing, FTP is still the major protocol for uploading and downloading files on
the Internet.

Besides the file transfer capabilities inside a subnet where users log on to
an FTP server with a password, there is another possibility for everyone to
download files, called 'anonymous FTP'. If the system maintainer allows this
kind of access, any user can log on by identifying himself as user 'anonymous'
or 'ftp' and typing in his e-mail address as password. After that, a special
part of the directory tree of the FTP host is available for browsing and
downloading. Sometimes there is also a special subdirectory (usually called
'incoming') that is writable so that files can be uploaded; the maintainer
should sort the files into appropriate directories.

4.1 A sample session with our FTP host
--------------------------------------

Let us first look at an example of an FTP session by logging on to the FTP
server in my subnet. Whenever there is a <Return> it means that the user has to
hit the Return key to continue after typing the preceding text.

(some_prompt) ftp www.vsb.cs.uni-frankfurt.de <Return>
Connected to diamant-atm.vsb.cs.uni-frankfurt.de.
220 www.vsb.cs.uni-frankfurt.de FTP server (Version 1.2.3 Fri Jan 10 12:02:30
 MET 1997) ready.
Name (www.vsb.cs.uni-frankfurt.de:anyone): anonymous <Return>
Guest login ok, send your complete e-mail address as password.
Password: anyone@some.where.out.there <not shown on your display, Return>
230 Guest login ok, access restrictions apply.
ftp> cd pub/people/mz <Return>
250 CWD command successful.
ftp> binary <Return>
200 Type set to I.
ftp> get fract20.xmo <Return>
200 PORT command successful.
150 Opening BINARY mode data connection for fract20.xmo (61739 bytes).
226 Transfer complete.
local: fract20.xmo remote: fract20.xmo
61739 bytes received in 61.7 seconds (1 Kbytes/s)
ftp> bye <Return>
221 Goodbye.

(some_prompt) _

What we did was to download the file fract20.xmo from the FTP server at
'vsb.cs.uni-frankfurt.de' (it is the same as the WWW server, hence the name).
As you see - no sign of segmentation, fragmentation, dropped frames or the
like. It seems as if we were doing just an ordinary file copy or a familiar BBS
file download, not involving all that fuss about this TCP/IP stack. But rest
assured, it was involved from the very first <Return>.
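
The lines of the session that start with a three-digit number are the replies
of the FTP server. According to RFC 959 the first digit tells whether the
command succeeded, so a client program can evaluate the replies quite easily.
The following sketch handles single-line replies only; the function name and
the wording of the classes are my own.

```python
# Meaning of the first digit of an FTP reply code (RFC 959).
REPLY_CLASSES = {
    1: "action started, more replies follow",
    2: "command completed",
    3: "accepted, more input needed",
    4: "temporary failure",
    5: "permanent failure",
}


def parse_reply(line: str):
    """Split a single-line reply into (code, class, text)."""
    code_text, _, text = line.partition(" ")
    if len(code_text) != 3 or not code_text.isdigit():
        raise ValueError("not a single-line FTP reply: " + line)
    code = int(code_text)
    return code, REPLY_CLASSES.get(code // 100, "unknown"), text


for reply in ("230 Guest login ok, access restrictions apply.",
              "150 Opening BINARY mode data connection for fract20.xmo "
              "(61739 bytes).",
              "226 Transfer complete."):
    print(parse_reply(reply)[:2])
```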


4.2 A closer look at the FTP session
------------------------------------

After entering the 'cd' command that changes the directory to the one with the
file and using 'binary' that tells the computer that the files to be
transferred should not undergo any conversions, we are ready to download the
file 'fract20.xmo' by using 'get'.

We suppose that everything is set up correctly and that the FTP server has just
received the FTP command "RETR fract20.xmo" (this is the actual command that is
transmitted as that very string) and take a look at the actions of the
different layers.

FTP application (141.2.150.16):
   We have just received the command to send the contents of the file
   "fract20.xmo". The client should have said on which port it expects the data
   to arrive, so we try to send the data to the client "socket" (IP address
   and port). As we do not (have to) care what the transfer details are,
   e.g. packet size, we just write the file to that socket. As this would be
   much faster than the network can transport the data, the write operation
   is blocking, which serves as a brake. When there is no more data, the
   connection is closed.

TCP (141.2.150.16):
   The application (whatever it is) continues to feed data into the layer. If
   the client told us that it cannot receive more data (closed window), we
   don't accept more bytes from the application (and let it block). If the
   client announced that it is able to receive data (open window), the data are
   at first partitioned in segments of a maximum length (as was negotiated at
   the start of the connection). Now that we know the receiver's IP address
   (e.g. 141.2.28.160) and port (e.g. 1048) we construct a header with this
   information, the segment sequence number and a checksum for each segment
   and just drop these segments to the network layer.

IP (141.2.150.16):
   The segments that arrive from the upper layer get another header before each
   one that contains our IP address (141.2.150.16) and the one of the recipient
   (141.2.28.160). But where's this 141.2.28.160? No idea, but our routing
   table says: Send it to 141.2.29.2, it probably knows more about it. And this
   one is reachable via interface en2.

Ethernet (141.2.150.16):
   The protocol layer above us sent us some data for 141.2.29.2. First we'll
   find out the address of its Ethernet interface in this network (called
   "address resolution"). Now that we know it, we encapsulate the data once
   more and send it to this Ethernet address.

Ethernet (141.2.29.2):
   There are some data for us. We check the frames and pass them up.

IP (141.2.29.2):
   Ah, there are some packets. But - they are not for us, the recipient is
   141.2.28.160. Taking a look in our routing table, we find that this one is
   reachable via the PPP connection ppp160. So down again with the packets.

PPP (141.2.29.2):
   The packets we receive must be destined for the one on the other side of our
   connection. So we encapsulate them again and put them on their way.

PPP (141.2.28.160):
   There are some data for us. Up to the IP layer.

IP (141.2.28.160):
   Check the recipient; OK, it's us, no more forwarding. What protocol must be 
   used? The field in the header says it's TCP, so we strip off the header and
   pass it on to the TCP layer.

TCP (141.2.28.160):
   There are segments coming up from the network layer that are obviously
   destined for an application above us. We take the sequence number of each
   segment and put the segment in our buffer at the appropriate place. If it
   was the segment we expected, we'll send an ACK (acknowledgement) of the
   segment to the server. If not, we just don't acknowledge this incoming
   segment and rely on the server eventually retransmitting missing segments.
   The application continues to read the data from our buffer (connected to
   the indicated port) so that our window is opening again. The other side
   must be informed of this.

FTP application (141.2.28.160):
   After we sent the command, the data that arrives at the port (whose number
   was transmitted to the FTP server before) is simply stored in a file that
   normally has the same name as the remote file. The data arrives
   asynchronously, which means that we need to use a blocking read so that no
   data is lost. When the connection is closed by the server, there is no
   more data, and the transfer is complete.

As you can see, each layer "speaks the same language" as its peer on the other
side. And - as is often the case - there is another computer in between
(141.2.29.2) that forwards the IP packets.
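
One detail of this walk-through deserves a small example: the FTP server
learns the client socket from the PORT command, which was answered with "200
PORT command successful" in the sample session. RFC 959 defines the argument
of PORT as six decimal numbers: the four bytes of the IP address and the two
bytes of the port number. The sketch below uses the example address and port
from above; the function names are invented.

```python
def port_argument(ip: str, port: int) -> str:
    """Encode an IP address and a port as the argument of PORT."""
    high, low = divmod(port, 256)
    return ",".join(ip.split(".") + [str(high), str(low)])


def parse_port_argument(argument: str):
    """Decode the argument of PORT into (IP address, port)."""
    fields = [int(f) for f in argument.split(",")]
    if len(fields) != 6 or any(not 0 <= f <= 255 for f in fields):
        raise ValueError("PORT needs six numbers between 0 and 255")
    ip = ".".join(str(f) for f in fields[:4])
    return ip, fields[4] * 256 + fields[5]


argument = port_argument("141.2.28.160", 1048)
print("PORT " + argument)
print(parse_port_argument(argument))
```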

-----------------------------------------------------------------------------
This concludes the first part of my Internet tutorial. The second part will
examine implementation issues, especially what is needed to implement a minimal
FTP client.
