Novice's Thoughts

Introducing the Sockets API — Study Notes

Based on Beej's Guide to Network Concepts, Chapter 3


What is the Sockets API?

The sockets API is how programs communicate over a network. In Python, it wraps the lower-level C sockets API and gives you an object-oriented interface to send and receive data over the Internet.


Client Connection Process

When a client wants to connect to a server, it follows these steps:

  1. Ask the OS for a socket — Creates a socket object; not connected to anything yet.
  2. DNS lookup — Resolves a hostname like example.com to an IP address. In Python, socket.connect() does this implicitly.
  3. Connect the socket — Connects to the target IP on a specific port (e.g., port 80 for HTTP).
  4. Send and receive data — Data is exchanged as a sequence of bytes.
  5. Close the connection — Either side can close the socket when done.
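
The five client steps can be sketched in Python roughly like this. It is a minimal sketch, not a full client: the function name fetch and the 4096-byte read size are choices made for these notes, and it assumes the server closes the connection once it has finished replying.

```python
import socket


def fetch(host: str, port: int, request: bytes) -> bytes:
    """Connect, send one request, and read until the peer closes."""
    # Step 1: ask the OS for a TCP socket (not connected to anything yet).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Steps 2 and 3: connect() resolves a hostname if given one,
        # then connects to that address on the given port.
        s.connect((host, port))
        # Step 4: data goes out and comes back as plain bytes.
        s.sendall(request)
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # empty bytes means the peer closed its side
                break
            chunks.append(data)
    # Step 5: leaving the with-block closes the socket.
    return b"".join(chunks)
```

For example, fetch("example.com", 80, b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n") would be one way to request a web page over plain HTTP.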

Server Listening Process

A server works differently — it waits for clients to come to it:

  1. Ask the OS for a socket — Same as client.
  2. Bind to a port — Claims a specific port number (e.g., port 8080) using bind(). On Unix-like systems, ports below 1024 typically require root privileges.
  3. Listen for connections — Tells the OS to start accepting incoming connection requests via listen().
  4. Accept connections — accept() blocks (sleeps) until a client connects, then returns a new socket dedicated to that client. The original socket keeps listening.
  5. Send and receive data — Server handles the client's request and sends a response.
  6. Loop back to accept — Servers are long-running and handle many clients over their lifetime.
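
The six server steps map onto Python's socket module roughly as follows. This is only a sketch: make_listener and serve are names invented for these notes, the "protocol" is simply upper-casing whatever arrives, and the loop is bounded by max_clients so the example terminates (a real server would loop forever).

```python
import socket


def make_listener(port: int = 0) -> socket.socket:
    """Create a listening socket on the loopback interface."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # step 1
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))  # step 2 (port 0 lets the OS pick a free port)
    s.listen()                   # step 3
    return s


def serve(listener: socket.socket, max_clients: int) -> None:
    """Handle max_clients connections, one after another."""
    for _ in range(max_clients):        # step 6: loop back to accept
        conn, addr = listener.accept()  # step 4: blocks until a client connects
        with conn:                      # conn is the new per-client socket
            data = conn.recv(1024)      # step 5: read the request...
            conn.sendall(data.upper())  # ...and send a response
```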

Key Concepts & Insights

bind() — Claiming a Port

bind() assigns a specific port number to the socket so the OS knows to route incoming traffic on that port to this server. Without it, nobody knows where to knock.
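
One way to see that bind() really claims the port is to try binding a second socket to the same address. In this sketch the helper name port_is_claimed is made up for these notes, and it assumes default socket options (no SO_REUSEADDR or SO_REUSEPORT), in which case the second bind() is expected to fail with "address already in use".

```python
import errno
import socket


def port_is_claimed(port_holder: socket.socket) -> bool:
    """Return True if a second socket cannot bind to port_holder's address."""
    addr = port_holder.getsockname()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as rival:
        try:
            rival.bind(addr)
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
    return False


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick any free port
server.listen()
claimed = port_is_claimed(server)
server.close()
```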

Do Clients Call bind()?

Usually not. The OS automatically assigns an available ephemeral port to the client when connect() is called. Clients can explicitly call bind() if they need a specific source port (e.g., for firewall rules), but this is rare.
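
Both cases can be observed with getsockname(), which reports the local address and port of a socket. A small loopback sketch, assuming the made-up function name client_ports and using port 0 as a stand-in for "whatever specific source port is needed":

```python
import socket


def client_ports() -> tuple:
    """Return (implicit_port, bound_port, port_after_connect)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen()
        server_addr = server.getsockname()

        # Usual case: no bind(); the OS picks the source port during connect().
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as implicit:
            implicit.connect(server_addr)
            implicit_port = implicit.getsockname()[1]

        # Rare case: bind() first to fix the source port, then connect().
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as explicit:
            explicit.bind(("127.0.0.1", 0))  # a real port number would go here
            bound_port = explicit.getsockname()[1]
            explicit.connect(server_addr)
            port_after_connect = explicit.getsockname()[1]

    return implicit_port, bound_port, port_after_connect
```

The port chosen at bind() time should be the same one the connection ends up using.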

Why Does accept() Return a New Socket?

So the server can handle multiple clients simultaneously. The original listening socket stays free to keep accepting new connections, while each client gets its own dedicated socket. Without this, the first client would "take over" the socket and no new connections could be accepted.
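
A quick loopback experiment makes this visible: the listening socket and each accepted socket are separate objects with separate file descriptors, and each accepted socket talks only to its own client. The function name accept_two is invented for this sketch, and both clients live in the same process purely for convenience.

```python
import socket


def accept_two() -> tuple:
    """Accept two clients on one listener; return (distinct, reply1, reply2)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen()
        addr = listener.getsockname()
        with socket.create_connection(addr) as c1, \
                socket.create_connection(addr) as c2:
            conn1, _ = listener.accept()  # new socket for one client
            conn2, _ = listener.accept()  # listener was still free to accept
            with conn1, conn2:
                fds = {listener.fileno(), conn1.fileno(), conn2.fileno()}
                c1.sendall(b"first")
                c2.sendall(b"second")
                for conn in (conn1, conn2):
                    # Each per-client socket only sees its own client's data.
                    conn.sendall(b"got " + conn.recv(1024))
                return len(fds) == 3, c1.recv(1024), c2.recv(1024)
```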

The 4-Tuple — How the OS Routes Data

Every TCP connection is uniquely identified by:

(source IP, source port, destination IP, destination port)

The OS uses this 4-tuple to route every incoming packet to the correct socket. Even two connections from the same client machine will have different source ports, making them distinguishable.
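
The 4-tuple can be read off a connected socket by combining getsockname() (local side) with getpeername() (remote side). In this sketch, four_tuples is a name chosen for these notes; it opens two connections from the same client to the same loopback server and returns each connection's tuple as the client sees it.

```python
import socket


def four_tuples() -> list:
    """Return [(src ip, src port, dst ip, dst port), ...] for two connections."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen()
        addr = listener.getsockname()
        with socket.create_connection(addr) as a, \
                socket.create_connection(addr) as b:
            # For AF_INET both calls return an (ip, port) pair.
            return [c.getsockname() + c.getpeername() for c in (a, b)]
```

The two tuples should match in every field except the source port.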

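What If the Server Doesn't Loop Back to accept()?

The OS keeps a backlog queue of completed connections, so a second client may appear to connect successfully at the TCP level, but the server never takes it off the queue. The client waits with no response. Once the backlog fills up, further clients are refused or time out.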
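
What a client sees when the server never calls accept() can be tried out on the loopback interface. This is a sketch under assumptions: the function name connect_without_accept and the half-second timeout are arbitrary choices, and the exact backlog behaviour differs between operating systems.

```python
import socket


def connect_without_accept() -> str:
    """Connect to a listener that never calls accept()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        # accept() is deliberately never called on the listener.
        with socket.create_connection(listener.getsockname(),
                                      timeout=0.5) as client:
            # The OS completed the TCP handshake and queued the connection.
            client.sendall(b"hello?")
            try:
                client.recv(1024)
            except socket.timeout:
                return "connected, but no reply"
    return "got a reply"
```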

Ports Are Per-Machine, Not Global

Two different machines can both use port 3490 simultaneously — no conflict. The full 4-tuple (which includes the IP address) keeps connections unique across the entire network.

Why Do Ports Exist?

IP addresses get you to the right machine. Ports get you to the right service on that machine. Without ports, a computer could only run one network service at a time. With ports, a single machine can simultaneously serve HTTP, HTTPS, SSH, mail, and more, each on its own port.

Bonus: NAT and Port Remapping

In a LAN, multiple devices share a single public IP via NAT (Network Address Translation). If two devices happen to use the same source port, the router remaps one to a different port and maintains a translation table to route replies back to the correct device. This port-rewriting form of NAT is often called PAT (Port Address Translation) or IP masquerading.
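
The translation table can be pictured with a toy model. ToyNat below is a made-up class for these notes, not how any real router is implemented: it keys only on the private (IP, port) pair, whereas real NATs also track the protocol and usually the destination, and they expire old entries.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class ToyNat:
    """Toy model of a port-remapping NAT table (not a real router)."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound: Dict[Endpoint, int] = {}  # private endpoint -> public port
        self.inbound: Dict[int, Endpoint] = {}   # public port -> private endpoint

    def translate_out(self, private: Endpoint) -> Endpoint:
        """Map an outgoing private endpoint to the shared public endpoint."""
        if private not in self.outbound:
            port = private[1]
            if port in self.inbound:  # another device already uses this port
                while self.next_port in self.inbound:
                    self.next_port += 1
                port = self.next_port
            self.outbound[private] = port
            self.inbound[port] = private
        return (self.public_ip, self.outbound[private])

    def translate_in(self, public_port: int) -> Endpoint:
        """Route a reply arriving on public_port back to the right device."""
        return self.inbound[public_port]


nat = ToyNat("203.0.113.7")
a = nat.translate_out(("192.168.1.10", 50000))  # keeps port 50000
b = nat.translate_out(("192.168.1.11", 50000))  # clashes, so it is remapped
```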

Bonus: IP Spoofing

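An attacker can forge the source IP in a packet, but TCP makes this very hard to exploit. Replies go to the spoofed IP, not to the attacker, so the attacker is blind. The TCP handshake also requires echoing back a random sequence number from the server's SYN-ACK, which the attacker cannot see. Spoofing is still used in SYN flood DoS attacks and is easier with UDP, which has no handshake.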


Reflection Q&A

Q: What role does bind() play on the server side? bind() assigns a specific port number to the socket so the OS knows to route incoming traffic on that port to this server. It is distinct from listen(): bind() claims the port, and listen() then starts waiting for connections on it.


Q: Would a client ever call bind()? Usually no. When a client calls connect(), the OS automatically assigns an available ephemeral port. However, a client can explicitly call bind() if it needs a specific source port, for example to satisfy firewall rules or protocol requirements.


Q: Why does accept() return a new socket instead of reusing the listening socket? So the server can handle multiple clients simultaneously. The original listening socket stays free to keep accepting new connections, while each client gets its own dedicated socket. Think of it like a receptionist who keeps greeting new visitors while assigning each one to a different staff member.


Q: How does the OS route data to the correct socket when the same client connects multiple times? Every TCP connection is uniquely identified by a 4-tuple: (source IP, source port, destination IP, destination port). Even if the same client connects again, the OS assigns it a different ephemeral source port, making the 4-tuple unique. Every packet is routed to the correct socket using this fingerprint.


Q: What happens if the server doesn't loop back to accept()? The OS maintains a backlog queue, so a second client may appear to connect successfully at the TCP level — but the server never picks it off the queue. The client is stuck waiting with no response. Once the backlog fills up, further clients start getting refused or timing out.


Q: If one computer is using TCP port 3490, can another computer use port 3490? Yes. Ports are per-machine, not global. The full 4-tuple (which includes the IP address) keeps connections unique across the entire network, so two different machines using the same port number is perfectly fine.


Q: Why do ports exist? What do they add over plain IP addresses? IP addresses get you to the right machine. Ports get you to the right service on that machine. Without ports, a computer could only run one network service at a time. With ports, a single machine can simultaneously serve HTTP, HTTPS, SSH, mail, and more — each on its own port.


Q: In a LAN, what happens if two devices use the same source port? The router handles this via NAT (Network Address Translation). It remaps one device's source port to a different unused port and maintains a translation table to route replies back to the correct device. This port-rewriting form of NAT is often called PAT (Port Address Translation) or IP masquerading.


Q: Is IP spoofing possible? Technically yes, but TCP makes it very hard to exploit. Replies go to the spoofed IP, not the attacker, so the attacker is blind. TCP's handshake also requires echoing back a random sequence number from the server's SYN-ACK, which the attacker can't see. Spoofing is still used in SYN flood DoS attacks and is easier with UDP (no handshake). Many networks also apply ingress filtering (BCP 38) to drop packets whose source address could not legitimately come from the network they arrived from.


