networking tutorial

How We Got Here

Sometimes, bad things give us good results. TCP/IP -- and a lot of its underlying structure -- evolved to meet a specific need. TBH, that need is rather gruesome: how can we keep a computer network functioning in the event of a nuclear war? When nodes go offline, randomly, how can surviving nodes keep the communication going?

Happily, the threat of nuclear war has dropped off substantially in the meantime. Also happily, the TCP/IP network survived even the loss of its original purpose. This tutorial is about the surviving network, although, as an appropriately trained and experienced engineer, I could probably talk at equal length about nuclear war. Most happily, I won't.

Focusing On Architecture

With complicated subjects, it's always hard to know where to start. There's a huge chicken-and-egg problem with TCP/IP when trying to define terms. I prefer to take Isaac Newton's approach to physics, as he did in the Principia -- create some definitions that start from common things we're all likely to understand.

For example, Newton begins with mass; slightly paraphrasing his definition:

The quantity of matter is defined as the density of that matter and the volume that it takes up, conjointly.

In other words, m = ρ × V, or "mass equals density times volume". That doesn't seem like much, but it's an astounding starting point. Hint: If you haven't read an English translation of the Principia (it's natively in Latin), you should take the time out to do so: it'll change your understanding. Anyway, back to our story.

Network Architecture

It's very easy to just dive in, of course, and some fair percentage of my readers will get what I'm saying, but that isn't good enough for this tutorial. Instead, imagine two computers, "SanDiego" and "Bangor", located at opposite corners of the country. They want to communicate via available networks. How do they do it?

Well, we could just hook up a wire between SanDiego and Bangor:

sandiego-bangor.jpg

That would work, but there are at least two drawbacks:

  1. It's a long wire, so the signal attenuates badly along the way. It will likely disappear into the noise long before it gets there.
  2. It's a single point of failure. If someone cuts the wire, there's no alternative way for the two computers to communicate.

We could solve this by dreaming up all sorts of network architectures, but the easiest way is to create and use the Internet:

sandiego-bangor-2.jpg

In this model, SanDiego sends a message, labeled for Bangor, to some router on the Internet (which one doesn't matter so much). If this router doesn't know where Bangor is, it just sends it on to another router, until the message finds a router that knows where to forward the message:

sandiego-bangor-3.jpg
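
To make the forwarding idea concrete, here's a minimal Python sketch. The router names (RouterA, RouterB, RouterC) and their tables are made up for illustration, and real routers use far more elaborate tables and protocols. Each node either knows a route to Bangor or hands the message to its default neighbor:

```python
# Hypothetical topology: names and links are invented for this example.
ROUTES = {
    "SanDiego": {"default": "RouterA"},
    "RouterA": {"default": "RouterB"},
    "RouterB": {"Bangor": "RouterC", "default": "RouterA"},
    "RouterC": {"Bangor": "Bangor"},
}


def forward(source, destination, max_hops=16):
    """Return the list of nodes a message visits on its way to destination."""
    path = [source]
    current = source
    while current != destination:
        if len(path) > max_hops:
            raise RuntimeError("too many hops; giving up")
        table = ROUTES[current]
        # Use a specific route if this node knows one; otherwise pass it along.
        next_hop = table.get(destination, table.get("default"))
        if next_hop is None:
            raise RuntimeError(f"{current} has no route to {destination}")
        current = next_hop
        path.append(current)
    return path


print(forward("SanDiego", "Bangor"))
```

Notice that SanDiego and RouterA never learn where Bangor is; they only know who to hand the message to next.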

Not Just Any Old Computer

There's an idea floating around that the Internet is survivable because any and every computer can connect to any and every other computer. While that might be possible, that's not generally how it works. There's actually a hierarchy which we refer to as the Internet Infrastructure:

Internet Infrastructure
a hierarchy of computers used to transfer messages from one computer to another.

For the purposes of this tutorial, we don't need to try to model this complex infrastructure too closely. What is useful is to say that there are really large networks, known as Network Service Providers or NSPs. An NSP must be connected to at least three Network Access Points (NAPs), which are just connections where messages can jump from one NSP to another. NAPs are public access points. There are also privately-owned access points (at the same level) known as Metropolitan Area Exchanges; these act just like a NAP for the purposes of this discussion.

Given all this, we can make a very simplified picture of the infrastructure of the Internet that looks something like this:

sandiego-bangor-4.jpg

A Different Kind of Architecture

As implied by the discussion above, these networks can get really complicated. There's really (almost) no reason to even want to know how many hops a message takes, or where it hops, unless you're trying to debug a broken route with, say, traceroute. From a TCP/IP point of view, it's much easier to ignore the specific network, since it gets built on-the-fly, so to speak, and it can change every time a message is sent, even between the same two computers.

In other words, when it comes to understanding and troubleshooting networks, knowing the specific route (almost) never helps. Instead, what we want to know about is the network traffic that travels between computers. BTW, let's stop and clean up our language a little. In TCP/IP parlance, SanDiego and Bangor would be called hosts. At any given time, one would be the sending host, and one the receiving host, but that distinction rarely matters for this discussion.

The OSI Model

When we begin to look at networks as a continuous wire, we need to understand what travels on that wire from one host to the other. But that depends on our perspective, that is, our level of magnification. If we look at the highest "zoom" level, all we'll see are electrons travelling down the wire. That's not very useful for debugging purposes. We can use that information to determine whether anything's being sent, but if the message isn't going out on the wire, we can't guess why not.

In the OSI model, we start just above the raw physics, with what we'd call the Physical Layer, also known as Layer 1. The choice of "1" makes sense, because this is the lowest level we consider. Layers are normally added on top of each other. For example, if you put six coats of varnish on a piece of furniture, you're going to have six layers. The first layer you put on wouldn't sensibly be called "layer 6". Neither does the layering model work that way in networks.

At the physical layer, we don't look at electrons flowing, but we do look at signals. Specifically, we're looking for binary (on/off) signals, set to the cadence of a clock. Every computer brings its own clock to the party, so we need a way to "synchronise our watches". That method is NTP. We'll cover it in more detail a little later on. Just know that it's designed for packet-switched, variable-latency data networks.

The Physical Layer

Okay, what does "packet-switched" mean? What does "variable-latency" mean? For that matter, what do "packet" and "latency" mean in this context? Hmm.

Packets are message units. A message might be split into multiple packets. For example, imagine that you're sending a very long letter to your friend, and all you have is lots of envelopes and first-class stamps. If you've ever done a lot of mailing, you'll know that mailing a one-ounce letter costs you, say, fifty-eight cents. If you add another ounce of paper to it, that second ounce only costs you, say, twenty cents. But all you have are first-class (i.e., fifty-eight-cent) stamps.

If you don't want to waste your money, you can either cram more pages in the envelope, until you're at three ounces (the most you can get with two stamps), or send two letters, each with one ounce in it. And the way envelopes go through the mailing system, you're better off not over-stuffing an envelope. So what do you do?

You sit down and write the letter to your friend, carefully numbering the pages. Then you divvy it up into piles of pages that are just under one ounce. Finally, you put each pile into an addressed, stamped envelope and mail each letter separately. When your friend gets the letter, it doesn't matter which one gets there first, because they can reassemble the letter, using the page numbers.
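
The envelope trick translates almost directly into code. Here's a small Python sketch, assuming a made-up "envelope" size of 16 bytes and hypothetical helper names (packetize, reassemble); real protocols carry their sequence information in headers rather than tuples:

```python
import random


def packetize(message: bytes, size: int):
    """Split message into (sequence_number, chunk) pairs of at most size bytes."""
    return [
        (seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets):
    """Rebuild the message by sorting on the sequence number (the page number)."""
    return b"".join(chunk for _, chunk in sorted(packets))


letter = b"Dear friend, this letter is far too long for one envelope."
packets = packetize(letter, 16)
random.shuffle(packets)  # envelopes may arrive in any order
print(reassemble(packets) == letter)
```

The shuffle stands in for the postal system: arrival order doesn't matter, as long as every numbered piece eventually shows up.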

We Could Have Used Indeterminate Length

I suppose we could have designed computer networks to take messages of indeterminate lengths, but that presents some unique challenges when trying to manage network traffic. For example, suppose you send seven overstuffed letters to your friend, and so does everyone else on your block? All these huge letters aren't going to fit in one letter-carrier's bag, so they'll have to either send out two delivery people, or wait until tomorrow to send out someone's letters.

Choosing a fixed (relatively short) length makes it statistically possible for everyone's letters (everyone's messages) to be delivered at a fairly constant, reliable rate. That rate will vary with the size of the overall message, not with who threw their message on the Internet first. You can see that's a much fairer way of doing things.

I can't come in early in the morning and send 200 giant JPEG images to my mom, causing everyone else to wait an hour for my monster message to go through. Instead, messages are split into packets of equal length, so larger messages take longer. Otherwise, everyone would be sandbagging all their messages with giant, meaningless JPEGs, just to hog the network early in the day.

That's a simplified explanation, but basically, it's statistically more efficient to split messages into equally-sized packets than any other arrangement. It's the method that gets the highest count of complete messages through the network in a given amount of time, or as we might say in network terminology, it's the highest-throughput approach to network traffic. Specifically, it's a form of multiplexing, usually called statistical multiplexing.
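
Here's a toy Python sketch of the fairness argument. It assumes a strict take-one-packet-per-sender-per-turn rule, which is a simplification I've chosen for illustration (real networks interleave traffic less tidily), and the sender and packet names are invented:

```python
from collections import deque


def round_robin(queues):
    """Interleave packets, taking one packet per sender per turn."""
    pending = deque(
        (name, deque(packets)) for name, packets in queues.items() if packets
    )
    order = []
    while pending:
        name, packets = pending.popleft()
        order.append((name, packets.popleft()))
        if packets:
            # This sender still has packets, so it goes to the back of the line.
            pending.append((name, packets))
    return order


traffic = {
    "me": ["jpeg-1", "jpeg-2", "jpeg-3", "jpeg-4"],
    "you": ["note-1"],
}
for sender, packet in round_robin(traffic):
    print(sender, packet)
```

Your one-packet note goes out second, right behind my first JPEG packet, instead of waiting for my whole monster message.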

(See, you're learning things already! And you thought networking was complicated....)

Variable-Latency

In order to understand variable latency, we need to understand network latency. Essentially, there's a delay from the time you send a packet until it reaches its destination. It's like the travel time for a packet (or a bit, matters not) from point to point. It usually consists of four things:

  1. The processing delay - how long does it take the router to process the packet header?
  2. A queuing delay - how long does the packet sit idle in routing queues?
  3. Transmission delay - how long does it take layer 1 to push the packet's bits onto the link?
  4. Propagation delay - how long does it take the bits to travel through the wire to the other end?

So which one of these would qualify as "variable" latency? Well, if you guessed "queuing delay", you'd be correct, since the size of the queue directly influences how fast data can get onto the link. The processing and transmission delays are relatively constant. The propagation delay depends on how the network is structured and how many hops the packet has to take (i.e., how many routers it has to go through to get to the destination).

So in essence, a variable-latency network is "variable" because of the density of network traffic and the complexity of the route between hosts. Essentially, because (1) we can't predict how busy the network will be, and (2) we can't predict in advance (in most cases) what route the data will take between hosts, we can't predict exactly how long it will take to transmit a packet, thus "variable-latency".
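
A quick back-of-the-envelope calculation shows how the four delays add up for a single hop. This Python sketch assumes a signal speed of about 2 × 10^8 m/s in the wire (roughly two-thirds the speed of light, a common rule of thumb), and the function name and sample numbers are mine, not measurements:

```python
def one_hop_delay(packet_bytes, link_bps, distance_m,
                  processing_s=0.0, queuing_s=0.0, signal_mps=2.0e8):
    """Add up the four delay components for one hop, in seconds."""
    transmission_s = packet_bytes * 8 / link_bps  # time to push the bits onto the link
    propagation_s = distance_m / signal_mps       # time for the bits to travel the wire
    return processing_s + queuing_s + transmission_s + propagation_s


# Same packet, same link, same distance; only the queue changes.
for queuing_s in (0.0, 0.002, 0.020):
    total = one_hop_delay(1500, 100e6, 1_000_000, queuing_s=queuing_s)
    print(f"queuing {queuing_s * 1000:5.1f} ms -> total {total * 1000:6.2f} ms")
```

Everything except the queuing term stays put from run to run, which is the whole point: the part we can't predict is how long the packet sits in line.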

We Really Don't Care That Much About the Physical Layer

Other than verifying that traffic is flowing on the wire, the physical layer doesn't usually tell us much about what happened to that DHCP request that never made it to the router. Consequently, we really won't talk that much about the physical layer here. Just know that it's the thing that's passing bits back and forth between hosts.

The Data-Link Layer

Layer 2 in our varnish stack -- er, protocol stack -- is the data-link layer. This layer creates a way to link two devices that share the same medium (e.g., two routers connected to the same network cable). Some of these link-layer networks, like Digital Subscriber Lines (DSL), can only connect two devices, but for all practical purposes, most of the modern layer 2 interfaces are multi-access, meaning more than two stations on the same medium can communicate directly. Examples of this include Wi-Fi and Ethernet networks. There are things built into the protocols to manage the conversation, much like the military radio term "over" -- meaning "I'm done talking until I hear something back from you, go ahead."

Technically speaking, the purpose of the link layer -- at least from a TCP/IP perspective -- is to send and receive IP datagrams. Okay, then, what is a "datagram"? Well, it depends on what OSI layer you're examining, but basically, a datagram is a basic transfer unit. It's the indivisible unit for a given layer. So, for example, if we're talking about the data-link layer (aka the "link" layer), it's an IEEE 802.xx frame (we'll jump on that in a minute). At the network layer, it's a data packet -- yeah, that thing. At the transport layer, it would probably be a data segment. At the physical layer, it would be a chip, which is a spread-spectrum pulse in that weird, CDMA, noise-utilizing transmission system (yeah; not gonna go there: doesn't do much for us in this tutorial).

So, TBH, there are several layers whose job is to send and receive datagrams. Wait, it's really all the layers, if you define the datagram right. But for the link layer, it's an IP datagram. TCP/IP can deal with lots of different link layers. Let's stick to Ethernet (and maybe Wi-Fi, if we get that far). Just be aware that the barbershop mirror effect happens, i.e., tunneling, which is just protocols inside protocols inside protocols, all at the same (link) layer. We tend to make things more complex by talking about PDUs (protocol data units) almost synonymously with datagrams, but then we also talk about datagrams like in the User Datagram Protocol (UDP).

To unconfuse all this, let's agree to call them PDUs (protocol data units). This avoids conflation with UDP units, and it also reminds you that it's the atomic unit at the current network layer. At the link layer, it's a frame. Each of the other layers has its own, although at the application layer, it can vary widely.

Okay, What's a Frame

A frame encapsulates the packets from the network layer so that they can be transmitted on the physical layer. A frame can be small or big, anywhere from 5 bytes up to the kilobyte range. The upper size limit is called the maximum transmission unit (MTU) or maximum frame size. This size is set by the particular network technology in use, which brings up a good point: In order to talk sensibly about frames, we'd need to say what kind of frame. In this case, we're talking about packet-switched networks, so there are about four frame types to consider: Ethernet, Fibre Channel, V.42, and good old PPP (point-to-point protocol). Let's randomly decide to start with Ethernet, which is defined in the IEEE 802 standards.

Ethernet: Chaos Control

Before explaining an Ethernet Frame, we need to give a little background information about how Ethernet works. Otherwise a lot of the frame components either won't make sense, or you'll wonder how it works at all.

Remember earlier, when we talked about voice radio, and the need to say "over"? Well, Ethernet at the link layer is all about controlling the conversation, so that computers don't talk over each other. Ethernet implements an algorithm called CSMA/CD, which stands for "carrier sense multiple access with collision detection." Fancy name for "one person speaking at a time." This algorithm controls which computers can access the shared medium (an Ethernet cable) without any special synchronization requirements.

"Carrier sense" means that every NIC (network interface card) does what humans do when we're talking: it waits for quiet. In this case, it's waiting for the network to be quiet, that is, for a moment when no signal is being sent on the network. Cool, huh? "Collision detection" means that, should two NICs both start to send on a shared network at the same time (because, hey, the network was quiet, right?), they each receive a jam signal. This signal tells them to wait a random amount of time before attempting again. Every time subsequent attempts collide, the NIC doubles the range from which it picks that random wait, until it has tried some maximum number of times (at that point, it declares failure and reports that the message didn't go). This ensures that, collisions aside, only one frame is traversing the shared medium at any given time.
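
The wait-longer-each-time scheme is usually called truncated binary exponential backoff. Here's a minimal Python sketch of it, using the numbers I understand classic 10/100 Mbps Ethernet to use (a 512-bit slot time, a range that stops growing after 10 collisions, and 16 attempts before giving up); treat those constants and the function name as assumptions to check against the standard:

```python
import random

SLOT_TIME_BITS = 512  # assumed slot time, in bit times
BACKOFF_LIMIT = 10    # assumed: the range stops doubling after this many collisions
MAX_ATTEMPTS = 16     # assumed: the NIC gives up at this many collisions


def backoff_slots(collisions, rng=random):
    """Pick how many slot times to wait after the given number of collisions."""
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions; frame not sent")
    exponent = min(collisions, BACKOFF_LIMIT)
    # Any whole number of slots from 0 to 2**exponent - 1, chosen at random.
    return rng.randrange(2 ** exponent)


for collisions in range(1, 6):
    slots = backoff_slots(collisions)
    print(f"collision {collisions}: wait {slots} slots ({slots * SLOT_TIME_BITS} bit times)")
```

Note that it's the range of possible waits that doubles, not any particular wait. Two NICs that just collided are unlikely to pick the same number twice in a row, and less likely still as the range grows.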

Media Access Control

Systems like CSMA/CD are a subset of the Media Access Control (MAC) protocol kit. MAC is one-half of the link layer, with Logical Link Control (LLC) being the other half. People who worry a lot about parallel constructs would call them sublayers (bah). By the way, LLC mostly just defines the frame format for most of the 802.xx protocols, so we can safely ignore it for now.

Now if you've worked with networks at all, you've heard of MAC addresses. Those are basically unique "serial numbers" assigned to network interface devices (like NICs) at the time of manufacture. Theoretically, they are supposed to be unique in the world, although it gets pretty complicated when we start talking about virtual NICs in virtual machines. I mean, with enough virtual machines in the world, you could theoretically run out of addresses no matter how long you made the address sequence. MAC address collisions do happen when using VMs, and there are ways to fix them, assuming that your VMs are confined to a subnetwork (subnet, we'll cover that later).

The MAC sublayer is connected to the physical layer by a media-independent interface (MII), so it doesn't care whether it's sending by cellular broadband, Wi-Fi radio, Bluetooth, Cat5e, or Morse code. The MII is kinda hardware geeky, so I love it, but you may not care, so I'll just link to Wikipedia's excellent treatise on MII, rather than going down that rabbit hole.

Essentially, the MAC sublayer grabs higher-level frames and makes them digestible by the physical layer, by encoding them into an MII format. It adds some synchronization info and pads the message (if needed). The MAC sublayer also adds a frame check sequence that makes it easier to identify errors. In conventional Ethernet networks, all this looks something like the following:

eframe.jpg

Some of those weird acronyms are part of the IEEE standard, and some of them are my own personal acronyms to help me remember what goes there. Here's what they all mean, and what's in those blocks of bits:

  • The Preamble is 7 bytes of clock sync, basically just zeroes and ones like this: ...0101010101.... This gives the receiving station a chance to catch up and sync their clock, so the following data isn't out of sync (and thus misinterpreted). To delve just a little deeper, the Preamble helps the receiving NIC figure out the exact amount of time between encoded bits (it's called clock recovery, in case you didn't know). NTP is nice, but Ethernet is an asynchronous LAN protocol, meaning it doesn't expect perfectly synchronized clocks between any two NICs. The Preamble is similar to the way an orchestra conductor might "count the players in" so they all get the same rhythm to start.

    This is much more reliable than trying to get computers all over the world synced up to the same clock frequency and the same downbeat (starting point). Ethernet actually started out that way with something called Manchester Encoding or Manchester Phase Encoding (MPE). This was important because electrical frequency varies not only across the world, but also from moment to moment when the power is slightly "dirty". MPE involved bouncing a bit between two fractional voltages using a 20MHz oscillator to generate a reference square wave. It works, but it's not very efficient, so MPE was scrapped in favor of using the Preamble like alignment marks on reels of movie film.

  • The Start Frame Delimiter is the last chance for clock sync. It is exactly 10101011, where the "11" tells the receiving station that the real message content (in this case, the destination address) starts next. The receiving NIC has to recover the clock by the time it hits the SFD, or abandon the frame to the bit bucket.
  • The Destination Address is six bytes long, and gives the physical address -- the MAC address -- of the next hop. Be aware that the next hop might be the destination, but that it's also possible that the next hop might be a NAP, MAE, NSP, or intermediate ISP. It's basically the next address, in the direction of the destination, that the sender knows about. Unlike the Source Address (described next), the Destination Address can be in a broadcast format (similar to a subnet like 192.18.0.0, but using MAC addresses).
  • The Source Address is also a six-byte MAC address, this time the MAC address of the sender, which does not change as long as the message is traversing only layer-2 (Ethernet) switches and routers.
  • The PDU Length gives the byte length of the payload, assuming that the value is 1500 or less. If the value is 1536 (0x0600) or more, it indicates (instead) a message type, like IPv4, ARP, or a "q-tagged" frame (one that carries a VLAN ID).
  • The DSAP, SSAP, and Control elements are each one byte in length, and help define devices and protocols. For the most part, we won't be worried about these. Just know that as more and more 802 point standards come out (e.g., 802.11, Wi-Fi), these elements get longer and more complex.
  • The Data or Payload is the actual packet being sent, passed on from the layer above. It cannot be less than 46 bytes, and in conventional Ethernet, it cannot be larger than 1500 bytes. If the actual data is too small, it's padded out to 46 bytes.
  • The CRC, or Frame Check Sequence (FCS), is just a good old-fashioned checksum, used to verify that the message hasn't been corrupted along the way.

The Preamble and SFD are often considered to be part of the IEEE "packet", so some people start counting the "frame" at the Destination Address.
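
If you count the frame from the Destination Address, the first fourteen bytes can be picked apart with a few lines of Python. This is only a sketch: the function names are mine, the sample frame is invented, and the type codes listed are the three well-known values for IPv4, ARP, and the VLAN tag:

```python
import struct

# Well-known type codes for the length/type field.
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x8100: "802.1Q VLAN tag"}


def format_mac(raw: bytes) -> str:
    """Render six raw bytes in the usual colon-separated hex form."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_header(frame: bytes):
    """Decode destination, source, and length/type from a frame that
    starts at the Destination Address (no Preamble or SFD)."""
    dst, src, length_or_type = struct.unpack("!6s6sH", frame[:14])
    if length_or_type <= 1500:
        meaning = f"length {length_or_type}"
    elif length_or_type >= 0x0600:
        meaning = ETHERTYPES.get(length_or_type, f"type 0x{length_or_type:04x}")
    else:
        meaning = "undefined"  # values between 1501 and 1535 mean neither
    return format_mac(dst), format_mac(src), meaning


# An invented broadcast frame: all-ones destination, made-up source, ARP type.
sample = bytes.fromhex("ffffffffffff" "020000000001" "0806") + bytes(46)
print(parse_header(sample))
```

The all-ones destination (ff:ff:ff:ff:ff:ff) is the broadcast format mentioned above.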

Trunking VLANs

There is a crucial addition to the basic frame format called a P/Q or VLAN Tag, which includes:

  • Sixteen bits of tags or a protocol ID.
  • Three bits representing a priority.
  • One bit used as a Canonical Format Indicator (CFI), which is 0 if the following VLAN ID is in Ethernet format, or 1 if it's in Token Ring format.
  • Twelve bits of VLAN ID.
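
Those four fields pack into exactly four bytes, so pulling them apart is just a matter of shifting and masking. Here's a Python sketch; the function name and the sample values (priority 5, VLAN 100) are made up, and 0x8100 is the protocol ID normally used for these tags:

```python
import struct


def parse_vlan_tag(tag: bytes):
    """Split a four-byte VLAN tag into its four fields."""
    protocol_id, control = struct.unpack("!HH", tag)
    return {
        "protocol_id": protocol_id,    # sixteen bits
        "priority": control >> 13,     # top three bits
        "cfi": (control >> 12) & 0x1,  # one bit
        "vlan_id": control & 0x0FFF,   # bottom twelve bits
    }


# Build a sample tag: priority 5, CFI 0, VLAN ID 100.
sample_tag = struct.pack("!HH", 0x8100, (5 << 13) | (0 << 12) | 100)
print(parse_vlan_tag(sample_tag))
```

Twelve bits of VLAN ID means there are 4096 possible values, which is why VLAN numbers never go higher than that.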

This will matter when we're building complex networks with lots of VLANs that cross over switches (e.g., when using MAAS). After all, VLANs were initially controlled with ports and switches (although now they have graduated to broader methods). When more than one VLAN spans multiple switches, frames need to carry VLAN information that can be used by switches to sort or trunk the traffic. The word trunking is derived from the telephone network term trunk lines, which are lines connecting switchboards.

In the original telephone company model, each telephone had a subscriber line, which was a wire that went straight from the local Central Office (CO) to that subscriber's telephone. Connections between Central Offices were handled by trunk lines, because they ran between phone company facilities. You'd have a thick cable with lots of pairs running from CO to CO, basically enough wires to handle something like 35% of the possible calls. If you ever got the message, "All circuits are busy now; please try your call again later", you've heard what happens when the system is "trunking above capacity" or "TAC'd", as it was called.

At the CO, the wires would "branch" and run all over the place: First to junction points (those five-foot-tall boxes you see from time to time on the road), then to interface points (the square cans beside the road every half mile or so, also called "pedestals") and from there to subscriber homes. When you draw out this network, it looks like a tree, where the bundles of cables between COs look like the trunks of trees. You're welcome.

In the parlance of networks, especially VLANs, the term "trunking" is used to indicate the sharing of network routes. This sharing is made possible by the Ethernet VLAN tags, which make the VLAN-bound messages less dependent on switches and routers to get the traffic to the right place (and only the right place). Otherwise, you'd have to designate complicated port configurations for switches, which is particularly easy to screw up. With Ethernet, the trunks are actually virtual, rather than bundled pairs of wires, but the effect is the same, hence the term.

Note that the MAC sublayer is responsible for managing CSMA/CD, that is, detecting a clear line, initiating retransmission if there's a jam signal, etc. On the way in, the MAC sublayer verifies the CRC/FCS and removes frame elements before passing the data up to the higher layers. Basically, anything that some other MAC layer did to encapsulate the message for sending, the receiving MAC layer un-does on the way in.
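
The check-on-the-way-in idea can be sketched in Python with the standard library's zlib.crc32, which as far as I know uses the same CRC-32 polynomial as Ethernet. The helper names are mine, and the byte order I use for the four check bytes is an assumption; real NICs do all of this in hardware:

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Sender side: compute a CRC-32 over the frame and tack it on the end."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)  # assumed byte order


def fcs_ok(frame: bytes) -> bool:
    """Receiver side: recompute the CRC-32 and compare it with the trailer."""
    body, received = frame[:-4], frame[-4:]
    expected = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return expected == received


good = append_fcs(b"an invented frame body, padded out a bit" + bytes(20))
damaged = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit in transit
print(fcs_ok(good), fcs_ok(damaged))
```

The receiver never needs to know what the data means. It only needs to notice that the bits it got don't match the check bytes that came with them.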

Visualizing the Link Layer

Okay, so before we go any further, let's see if we can visualize some of this information, and then we'll try to do something useful with it. Let's start with a message coming on Layer 1 from SanDiego to Bangor (later on we can mock up the whole flow, if we want). This will also show you that we can't necessarily infer what the Link Layer does, based just on the frame structure.

When the message comes in, it looks like this:

eframe.jpg

When the Link Layer passes it up to the Network Layer, it only passes four of these fields: DSAP, SSAP, Control, and Data. Everything else is stripped off by the Link Layer, meaning that the Link Layer does whatever needs doing with these other fields. So without knowing any more, we can infer that the Link Layer does the following things:

  1. It synchronizes the NIC, so that bits will indeed be recognized as bits and the message can be properly decoded.
  2. It sort of handles the source and destination addresses. One protocol that maps MAC addresses to local IP addresses is the Address Resolution Protocol (ARP). I said sort of because the OSI model is just that -- a model. Not every part of the process maps cleanly into the model, but ARP is generally considered a layer 2 protocol because it builds and sends frames, not packets. We'll talk at length about ARP in a minute.
  3. It interprets the length/type bytes and uses them, which means it must judge the length of a frame, and of the data in a frame, or, alternatively, decide whether a frame is IPv4, ARP, or VLAN ("q-tagged"). This kinda gives away that ARP is processed by the Link Layer.
  4. It processes VLAN tags, which means, at the very least, dealing with the message priority, deciding whether the VLAN frame is Ethernet or Token Ring, and capturing and using the VLAN ID. From this, we can infer that layer handles messages by priority, knows how and when to send Ethernet or Token Ring frames, and knows how to route traffic to a specific VLAN.
  5. It computes the checksum to make sure the message is valid.

Understand these are, at this point, just inferences, meaning they could be wrong. Let's take some time to check these things.

Synchronizing the NIC to the frame

Within a single frame, Ethernet is a synchronous method of communication. This means that the sending and receiving hosts must synchronize their clocks in order to understand each other's data packets. Synchronization could be done externally, but with Ethernet, it's done with the data itself, by the Preamble. Externally-clocked systems (like serial transmissions) have separate paths (wires, circuit paths, chip paths) that transmit a clock sync signal. Internally-clocked protocols, like Ethernet frames, have the sync information built into the incoming data. In this case, the first 7 bytes of the frame are used to sync the receiving clock with the transmitting clock.

All of this happens at layer 2. How do we know this? Well, the ability to process frames is built into the firmware on the NIC. Essentially, the Link Layer is implemented in microcode on the network interface.

Handling source and destination addresses

Layer 2, the Link Layer, uses ARP to find the MAC address of the host with a given IP address. We know layer 2 does this because ARP builds Ethernet frames, not packets. ARP has to get the target IP address from the Network Layer, layer 3 -- which is why it's easy to misunderstand where ARP works. Any translation of MAC addresses to IP addresses (or vice versa) takes place -- when using ARP -- via Ethernet frames, and thus functions at layer 2.
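
To see why ARP counts as a frame-builder, here's a Python sketch that assembles an ARP request by hand: an Ethernet header (broadcast destination, type 0x0806) followed by the 28-byte ARP body for IPv4 over Ethernet. The function name, MAC address, and IP addresses are invented for the example, and nothing is actually sent on a network:

```python
import socket
import struct


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Assemble an ARP "who has target_ip?" request as raw frame bytes."""
    # Ethernet header: broadcast destination, our MAC as source, type ARP.
    ethernet_header = struct.pack("!6s6sH", b"\xff" * 6, sender_mac, 0x0806)
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # MAC address length in bytes
        4,                            # IP address length in bytes
        1,                            # operation: request
        sender_mac,
        socket.inet_aton(sender_ip),
        b"\x00" * 6,                  # target MAC: unknown, that's the question
        socket.inet_aton(target_ip),
    )
    return ethernet_header + arp_body


frame = build_arp_request(bytes.fromhex("020000000001"), "192.168.0.10", "192.168.0.1")
print(len(frame), frame.hex())
```

The IP addresses ride along as plain data inside the body. The only addressing the link layer itself acts on is the pair of MAC addresses at the front.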

The Network / Internetwork Layer

This is the layer where packets are managed, independent of Layers 1 and 2. At layer 3, everything is coded in terms of packet formats that are independent of the type of link-layer network you're using. These packet formats are compatible with (but not driven by) the constraints of the physical layer. Things like addressing schemes and routing algorithms are the main characteristics of this layer.

The Transport Layer

Layer 4 brings us to protocols implemented only by the end hosts (i.e., not by the routers or other switching gear that connect the network). This layer handles things like redundancy, confirmed delivery, managing packets on an unreliable network, etc. This is the last layer that TCP/IP has anything to say about; layers above this are unique to specific applications.

The Session Layer

Layer 5, the session layer, is where ongoing interactions between applications happen, so the data is couched in terms of things an application might understand (e.g., cookies for a Web browser). This is also the layer where check-pointing (i.e., saving work finished so far) happens.

The Presentation Layer

The presentation layer converts data formats and ensures standard encodings are used to present the information to the application.

The Application Layer

The top layer, layer 7, is totally the province of the application(s) involved in processing messages.


Updated 2022-01-07 Fri 12:44 by Bill Wear (stormrider)


Copyright (C) 2020 by Bill Wear. All rights reserved, but asking to use is permitted and welcome.