Trials of an Upper Division Computer Science Student: September 2021

Monday, September 27, 2021

CST 311 - Week 4

Secure Communication

This week I studied secure communication. The internet provides many useful communication services, but hackers can easily intercept important messages. Apps must provide confidentiality by ensuring intercepted messages cannot be read. Hackers can also modify intercepted messages or even forge messages or headers. There must be ways to verify identities (authentication) and that messages have not been altered (integrity).

Confidentiality

Encryption enforces confidentiality. Plaintext messages are converted to encrypted versions known as ciphertext. A key and a message are passed into an encryption algorithm (RSA, for example). For encryption to work, the receiver must have a key to decrypt the message. Since it is not feasible for two random people on the internet to already share the same (symmetric) key, a public key system was introduced to solve the problem. Every participant has a pair of keys: a public key and a private key, which are two mathematically linked halves of the same key pair (K+ and K-, respectively). If I want to send you a message, I will run your public key and my message through an encryption algorithm before I send it to you. You will then use your private key with the relevant decryption algorithm to read the message. While this does provide confidentiality, it does not include authentication: anyone can encrypt with your public key, so impersonation can occur.
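To make the K+ / K- idea concrete, here is a minimal sketch of textbook RSA in Python (3.9+). The tiny primes, the exponent, and the function names are my own assumptions for illustration; real RSA uses far larger keys plus padding, so this is not secure.

```python
# Toy textbook RSA, for illustration only: tiny primes, no padding, NOT secure.
p, q = 61, 53                 # two small primes (assumed values)
n = p * q                     # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent: modular inverse of e

public_key = (e, n)           # K+, shared with everyone
private_key = (d, n)          # K-, kept secret by the receiver


def encrypt(message: int, key: tuple[int, int]) -> int:
    """Sender: apply the receiver's public key to the plaintext."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: tuple[int, int]) -> int:
    """Receiver: apply the private key to recover the plaintext."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


plaintext = 65                # a message encoded as a number smaller than n
ciphertext = encrypt(plaintext, public_key)
recovered = decrypt(ciphertext, private_key)
print(plaintext, ciphertext, recovered)
```

Only the holder of the private key can undo the encryption, but nothing here says who produced the ciphertext, which is the authentication gap.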

Authentication and Integrity

To verify that a message has been received from a trusted sender, digital signatures can be used. Senders apply their private key to a message and receivers use the sender's public key to verify the signature. This also preserves message integrity: if the message has been altered, verification fails because applying the public key to the signature no longer yields the original message. Note that it can be computationally expensive to apply digital signatures to entire messages, so the digital signature can instead be applied to a message that has been passed through a hash function (like SHA-1 or MD5, although both are now considered weak and SHA-256 is a common replacement). The hashed version of the message would be used for authentication and integrity checks. To verify that a public key really belongs to the sender, certificate authorities may be referenced that bind public keys to specific, unique identities.
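The hash-then-sign idea can be sketched as follows. This is a toy under stated assumptions: the key pair is built from two known Mersenne primes, the hash is reduced modulo n so it fits the small modulus, and I use SHA-256 from Python's hashlib rather than SHA-1 or MD5. The function names are hypothetical, and a real signature scheme adds padding and much larger keys.

```python
import hashlib

# Toy RSA key pair built from two known Mersenne primes. Illustration only.
p, q = 2**61 - 1, 2**31 - 1
n = p * q
e = 65537                            # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes) -> int:
    """Sender: apply the private key (d, n) to the hash of the message."""
    return pow(digest(message), d, n)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: apply the public key (e, n) and compare with a fresh hash."""
    return pow(signature, e, n) == digest(message)


message = b"Meet me at noon"
signature = sign(message)
print(verify(message, signature))              # the untouched message
print(verify(b"Meet me at nine", signature))   # an altered message
```

Signing the short hash instead of the whole message is what keeps the expensive private-key operation cheap, whatever the message length.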

Tuesday, September 21, 2021

CST 311 - Week 3

TCP Connection Management

While UDP essentially just provides a "send and pray" service, TCP provides a reliable data transfer service. As discussed previously, TCP initiates a connection with a three-way-handshake procedure. This is accomplished by the TCP client and TCP server sending special segments to each other. These segments contain no application-layer data, but do contain headers with bits that are used to establish, maintain, and close a connection.

Establishment

To initiate a connection, the client sends a SYN segment with the SYN bit set to 1. This segment's header contains a random initial sequence number (client_isn) to be used by the server for acknowledgement. The server responds with a SYNACK segment with the SYN bit also set to 1. This segment contains a random initial sequence number (server_isn) and an acknowledgement number (client_isn+1), which is the sequence number of the next expected segment from the client. Finally, the client responds with a segment that contains the acknowledgement number (server_isn+1) and (usually) its first data payload. The SYN bit is set to 0 in this final acknowledgement because the connection is already established.
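The three segments can be modeled in a few lines of Python. This is only a simulation of the header arithmetic, not real networking; the Segment class and its field names are my own simplification of the TCP header.

```python
import random
from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2**32   # sequence numbers are 32-bit values and wrap around


@dataclass
class Segment:
    """Only the header fields the handshake cares about (simplified)."""
    syn: int
    seq: int
    ack: Optional[int] = None   # None means no acknowledgement is carried


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    # Step 1: client -> server, SYN carrying the client's initial sequence number.
    syn = Segment(syn=1, seq=client_isn)
    # Step 2: server -> client, SYNACK acknowledging client_isn + 1.
    synack = Segment(syn=1, seq=server_isn, ack=(syn.seq + 1) % SEQ_SPACE)
    # Step 3: client -> server, final ACK with the SYN bit cleared.
    ack = Segment(syn=0, seq=(client_isn + 1) % SEQ_SPACE,
                  ack=(synack.seq + 1) % SEQ_SPACE)
    return [syn, synack, ack]


client_isn = random.randrange(SEQ_SPACE)
server_isn = random.randrange(SEQ_SPACE)
for segment in three_way_handshake(client_isn, server_isn):
    print(segment)
```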

Maintenance

As noted above, sequence numbers and acknowledgement numbers are used by the client and server during connection establishment. However, it would be incorrect to assume they are offset by 1 every time. During normal TCP communication, the sequence numbers are offset by the number of bytes that have been transmitted. Each sequence number identifies the byte number of the first byte in the segment's payload. The maximum segment size (MSS) is the maximum size of the data payload (NOT the data + headers) and is dependent on the maximum transmission unit (MTU) in the link layer. When sending large files, the sequence numbers of consecutive full-sized segments are therefore one MSS apart. By synchronizing SEQ and ACK numbers, senders can retransmit lost packets when the receiver fails to acknowledge a segment (detected through a timeout timer) or when the receiver sends too many duplicate acknowledgements, which also implies packet loss. The procedure also guarantees that segments are delivered to the application in the correct order.
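A small worked example of the byte-based numbering, assuming a hypothetical 5,000-byte transfer, an MSS of 1,460 bytes, and a first data byte numbered 1,000 (all values chosen by me for illustration):

```python
def segment_sequence_numbers(first_seq: int, total_bytes: int, mss: int) -> list:
    """Return (sequence number, payload length) for each segment of a transfer."""
    segments = []
    offset = 0
    while offset < total_bytes:
        length = min(mss, total_bytes - offset)   # the last segment may be short
        segments.append(((first_seq + offset) % 2**32, length))
        offset += length
    return segments


def expected_ack(seq: int, length: int) -> int:
    """The receiver acknowledges the number of the next byte it expects."""
    return (seq + length) % 2**32


for seq, length in segment_sequence_numbers(1000, 5000, 1460):
    print(f"seq={seq} len={length} -> ack={expected_ack(seq, length)}")
```

The first three segments are one MSS apart (1000, 2460, 3920), and the final segment carries the remaining 620 bytes.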

Closure

A connection can be closed by either a client or a server. First, the initiator sends a shutdown segment with the FIN bit set to 1 and enters the FIN_WAIT_1 state. The receiver acknowledges it and enters the CLOSE_WAIT state, then sends its own shutdown segment (note two segments are sent here). The initiator receives the acknowledgement and enters FIN_WAIT_2 while waiting for the receiver's shutdown segment. When it arrives, the initiator sends one final acknowledgement and enters the TIME_WAIT state, which gives it time to resend the final acknowledgement if it is lost (typically 30 seconds to 2 minutes). Afterwards, the connection formally closes.
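The closing sequence reads naturally as a state machine. Below is a rough sketch with a lookup table; the event labels are my own, and I have added the LAST_ACK state from the standard TCP state diagram, which the summary above skips (it is where the second side waits for the final acknowledgement after sending its FIN).

```python
# (current state, event) -> next state. Event names are illustrative labels.
TRANSITIONS = {
    # Side that closes first
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timer expires"): "CLOSED",
    # Side that closes second
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(events: list, state: str = "ESTABLISHED") -> list:
    """Follow a list of events and return every state visited."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited


initiator = run(["send FIN", "recv ACK", "recv FIN, send ACK", "timer expires"])
responder = run(["recv FIN, send ACK", "send FIN", "recv ACK"])
print(" -> ".join(initiator))
print(" -> ".join(responder))
```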

Tuesday, September 14, 2021

CST 311 - Week 2

Socket Programming

This week we were introduced to something near and dear to me: socket programming! I am no stranger to it. In fact, it was one of the very first things I ever learned about programming (while modifying mIRC connection scripts for the MSN Chat Network). What I did not know or understand until this week, however, was the difference between UDP and TCP sockets. Either the User Datagram Protocol (UDP) or the Transmission Control Protocol (TCP) may be used to send data across the internet from one host to another. For general information about them, see my blog post from last week.

So what is a socket? A socket is, in reality, just part of the transport layer's API. Sockets are what applications use to send and receive data over the internet. All of the major languages provide a high-level abstraction (socket library) for accessing the transport layer's services. There is little programmatic difference between using TCP or UDP sockets. The primary difference is that TCP requires a connection to be established using a three-way-handshake procedure before sending data. First, the client (sending application) sends a packet (SYN) to the server (receiving application). The server then responds to the client with a packet (SYN/ACK). Finally, the client acknowledges the response with another packet (ACK). Note that applications often choose to send their first data packet with their ACK packet. For example, web browsers place an HTTP request in theirs.

The handshake procedure involves a little extra work in our program. Unlike UDP's interface, which simply requires us to bind the server application to a certain port number* before receiving data, TCP's interface requires the server application to listen to the port (after binding to it) for any incoming connections, and then explicitly accept the connection. On the client side, a TCP application must first connect to the server before sending data, whereas UDP applications attach the server's IP address and port to each sent data packet. Note that the operating system automatically assigns an available port to our client application so it can receive data from the server.
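Here is a minimal sketch of both styles using Python's standard socket module. To keep it self-contained, each function runs a tiny "uppercase" server in a background thread on the loopback address; the function names, the uppercase behavior, and the timeouts are my own choices for illustration.

```python
import socket
import threading

HOST = "127.0.0.1"   # loopback, so the sketch runs on one machine


def tcp_round_trip(message: bytes) -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    server.bind((HOST, 0))              # port 0: let the OS pick a free port
    server.listen(1)                    # TCP only: listen for connections
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _addr = server.accept()   # TCP only: explicitly accept
        with conn:
            conn.sendall(conn.recv(1024).upper())

    worker = threading.Thread(target=serve)
    worker.start()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.settimeout(5)
        client.connect((HOST, port))    # the three-way handshake happens here
        client.sendall(message)
        reply = client.recv(1024)
    worker.join()
    server.close()
    return reply


def udp_round_trip(message: bytes) -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(5)
    server.bind((HOST, 0))              # bind only: no listen, no accept
    port = server.getsockname()[1]

    def serve() -> None:
        data, client_addr = server.recvfrom(1024)
        server.sendto(data.upper(), client_addr)

    worker = threading.Thread(target=serve)
    worker.start()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(5)
        client.sendto(message, (HOST, port))   # address goes with each datagram
        reply, _server_addr = client.recvfrom(1024)
    worker.join()
    server.close()
    return reply


print(tcp_round_trip(b"hello over tcp"))
print(udp_round_trip(b"hello over udp"))
```

Neither client ever binds to a port itself: the operating system assigns one when the client connects (TCP) or sends its first datagram (UDP).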

* Most internet users know that their computer/device has an IP address that others can potentially connect to. What they may not know about is something called a port. There are many port numbers, ranging from 0 to 65535, and they all belong to your IP address! Common applications are tied to certain ports. For example, e-mail (SMTP) servers use port 25, web (HTTP) servers use port 80, and IRC chat servers use port 6667. Ports make it easier for the operating system to direct traffic to the correct application. If you think of your IP address as a seaport, you probably would not want all arriving ships to unload their cargo at the same dock.

Tuesday, September 7, 2021

CST 311 - Week 1

A New Semester

Here we go! Summer break has ended and a new class has begun. This one will (hopefully) be a bit less demanding than my last class. Indeed, I have been looking forward to learning about networking. I do have a little bit of prior knowledge from watching the likes of NetworkChuck and PowerCert Animated Videos on YouTube, but I am very much an amateur in the realm of networking. I hope that by the end of this class I will know enough about it to test for a certificate.

Five Layers of Networking

The Internet protocol stack (TCP/IP) consists of five layers*: application layer, transport layer, network layer, link layer, and physical layer. The highest level, the application layer, is what most people are familiar with. The application layer of networking includes common user software such as web browsers (HTTP), e-mail applications (SMTP), file transfer (FTP) and chat clients (IRC). In order for these applications to be useful, they must be able to send messages across the internet to other computers running the same software.

To analyze each layer of networking, we will follow an e-mail sent from our own computer (localhost) to another host: a computer, phone, laptop, or any other end-system belonging to a user on the network edge. The edge of the network consists of regular users (or the servers they connect to). These users often share a local area network (LAN) that is connected to the internet using a router (in conjunction with a modem, which converts digital signals to analog for transport over the wire, or vice versa for incoming analog signals). A router also works in tandem with a switch, which provides similar functionality ("routing" data, or packets of information, to the correct destination) but operates in a lower layer of the network.

When we press "send" in our e-mail client (say, Outlook), the first thing the application does is decide which transport protocol to use: Transmission Control Protocol (TCP) or User Datagram Protocol (UDP). These protocols provide the same basic functionality: taking our message (which now includes our e-mail text and other SMTP-specific header information) and packaging it up into a transport-layer segment for transfer to the network (the next layer). There is one primary difference, however. TCP guarantees that our message arrives at the destination. Sometimes packet loss is experienced as data is sent across the internet. This occurs when a router along the path to our destination cannot accept any more packets because its queue is full. With TCP, the lost packets will be resent, guaranteeing that they will eventually arrive. With UDP, there is no such guarantee.

The next layer is the network layer. Let's assume our e-mail client chose TCP to guarantee delivery of our mail, and the protocol passed a nicely wrapped segment down to the network, with headers noting our own network address (the source IP address**) and a destination IP address. Here, the network will act like a post office by analyzing the destination address and selecting the best path for forwarding the message along. Any relevant header information is added onto the segment, and the resulting datagram is passed down to the link layer. Alternatively, if the destination address of an arriving datagram belongs to the node, the network layer passes the enclosed segment back up to the transport layer, which passes the enclosed message back up to the application layer so that our friend's e-mail client can read it!***

Of course, our network-layer datagrams need to be transported from router-to-router somehow. That is where the last two layers come in. The link layer provides transport services for the network layer by converting datagrams into specially-crafted frames for transport on the physical layer. The physical layer is the medium through which bits are passed from one node to the next, and frames often pass through more than one type of physical medium on their journey across the internet (and, by extension, are handled by different link-layer and physical-layer protocols). On the receiving node, incoming bits from the physical link are reassembled into a frame and the enclosed datagram is passed back up to the network layer. Note that link-layer switches function similarly to network-layer routers, but for packet switching they use MAC addresses, which are unique to each node's network interface card, rather than IP addresses.
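The journey down and back up the stack can be sketched as nested wrappers. This is a loose model under my own assumptions: the class fields, port numbers, and addresses are hypothetical placeholders, and real headers carry many more fields.

```python
from dataclasses import dataclass


@dataclass
class Segment:            # transport layer
    src_port: int
    dst_port: int
    payload: bytes        # the application-layer message


@dataclass
class Datagram:           # network layer
    src_ip: str
    dst_ip: str
    payload: Segment


@dataclass
class Frame:              # link layer
    src_mac: str
    dst_mac: str
    payload: Datagram


def send(message: bytes) -> Frame:
    """Sender: each layer wraps what it receives from the layer above."""
    segment = Segment(src_port=50000, dst_port=25, payload=message)
    datagram = Datagram("192.0.2.10", "198.51.100.20", segment)
    return Frame("aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb", datagram)


def receive(frame: Frame) -> bytes:
    """Receiver: each layer strips its own header and passes the rest up."""
    datagram = frame.payload      # link layer -> network layer
    segment = datagram.payload    # network layer -> transport layer
    return segment.payload        # transport layer -> application layer


frame = send(b"Hello from my e-mail client")
print(frame)
print(receive(frame))
```

A router along the path would only unwrap as far as the datagram to read the destination IP address, then wrap it in a new frame for the next link.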

* The OSI Model defines seven layers of networking, the missing two being the presentation and session layers. The TCP/IP Model assumes that if an application requires the services provided by these missing layers (such as data compression or encryption), the functionality will be added manually by the developer.

** The Internet Protocol (IP) is the backbone of the internet and dictates how the network layer (of any node on the internet) must operate. Many nodes on the internet are just routers, and since routers operate in the network layer, they do not typically include higher level layers (transport or application).

*** All these enclosures result in multi-layer encapsulation of our message.
