Chengyu SHEN

Graduate Student at USC

Network Protocols: What Happens When You Type "http://www.linkedin.com"

Mar 29, 2018 / Written by Chengyu

Thanks to powerful search engines such as Google, Bing and Baidu, internet users find it quite convenient to look up information by simply typing a few keywords into the search bar. But do you know what happens when you type a well-known address, such as “https://www.linkedin.com”? This article explores the “magic” behind the web browser, so that afterward you will have a better understanding of how the corresponding content is fetched when you type a URL.

OSI Model and TCP/IP Protocol Layers

The Open Systems Interconnection (OSI) model is a conceptual model that describes how communication systems work without depending on any particular internal structure or technology. In this article, we explain the process of internet communication layer by layer. For simplicity we use the TCP/IP protocol layers, which are easier to understand. Instead of 7 separate layers, TCP/IP combines the Application Layer, Presentation Layer and Session Layer into a single Application Layer (as shown in Figure.1), so there are only 5 layers in the TCP/IP protocol stack.
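To make the mapping concrete, here is a small sketch of how the seven OSI layers collapse into the five TCP/IP layers used in this article. The dictionary and the helper name `tcpip_layers` are only for illustration and are not part of any library.

```python
# Mapping from each OSI layer (top to bottom) to the TCP/IP layer
# it belongs to in the five-layer model used in this article.
OSI_TO_TCPIP = {
    "Application": "Application",
    "Presentation": "Application",
    "Session": "Application",
    "Transport": "Transport",
    "Network": "Network",
    "Data Link": "Data Link",
    "Physical": "Physical",
}


def tcpip_layers():
    """Return the distinct TCP/IP layers, ordered from top to bottom."""
    layers = []
    for layer in OSI_TO_TCPIP.values():
        if layer not in layers:
            layers.append(layer)
    return layers
```

Calling `tcpip_layers()` walks the seven OSI entries and keeps each TCP/IP layer once, which leaves five layers.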

  Figure.1 Telecommunication Layers  

Processes:

This article separates the process by TCP/IP Protocol Layers.

First Step: Application Layer:

When you type “https://www.linkedin.com” in a browser, we start at the Application Layer. The protocol used in the Application Layer is clearly HTTPS. However, we cannot send the request to the remote server yet, because we have not established a connection to the LinkedIn server at this moment.

Second Step: move down to Transport Layer:

To establish a connection, we move down to the Transport Layer. The “https” prefix only declares the protocol used in the Application Layer (and, by default, port 443), so the lower layers do not need it; they only need a host and a port. At this point we want to open a TCP connection to the LinkedIn server(s). However, a host-to-host connection requires the IP addresses of both sides. We know our own IP address, but the address of the LinkedIn server(s) is still unknown, right?
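As a rough sketch of what the Transport Layer actually needs from the URL, the snippet below uses Python's standard `urllib.parse` to pull out the host name and the port. The helper name `transport_target` and the `DEFAULT_PORTS` table are my own assumptions for illustration.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two schemes discussed in this article.
DEFAULT_PORTS = {"http": 80, "https": 443}


def transport_target(url):
    """Return the (hostname, port) pair a TCP connection would be opened to."""
    parts = urlsplit(url)
    # Use the explicit port if the URL has one, else the scheme's default.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port
```

For “https://www.linkedin.com” this gives the host name “www.linkedin.com” and port 443. Note that the result is still a name, not an IP address, which is exactly the problem the next step solves.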

Third Step: move down to Network Layer:

This is where DNS (Domain Name System) comes in: it finds the IP address for a domain name. (Strictly speaking, DNS is itself an Application Layer protocol, but what it produces is the IP address that the Network Layer needs.) You can treat it as a database whose entries are key-value pairs, in which the key is the domain name and the value is its IP address(es). Unlike a HashMap, where you can only get a value from a key and not a key from a value, DNS can also be queried in the reverse direction, from an IP address to a name. If you have a terminal on your computer, or access to a Linux-like command line (for example through PuTTY), you can type “nslookup www.linkedin.com” to perform a DNS lookup yourself.
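If you prefer code to the command line, a similar lookup can be sketched with Python's standard `socket` module. This is only a minimal example; the helper name `lookup_ipv4` is made up for this article, and it asks the operating system's resolver rather than talking to a DNS server directly.

```python
import socket


def lookup_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses for a host name."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) tuple.
    return sorted({info[4][0] for info in infos})
```

Calling `lookup_ipv4("www.linkedin.com")` on a machine with internet access should return one or more addresses; the exact values depend on where and when you run it.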
It may seem that the lookup wastes some time, and someone may ask why we don't simply use IP addresses directly. DNS is a user-friendly design: domain names are meaningful strings, like LinkedIn or Google, whereas an IP address looks like an arbitrary number to a user, so a domain name is much easier to remember.
Before sending a request to the DNS servers, the browser tries to get the IP address from a series of caches:

  • First, it checks the browser cache. The browser keeps its own short-lived cache of DNS results to speed up visits. If you have visited “www.linkedin.com” recently, congratulations, you probably already have LinkedIn's IP address.
  • Second, it checks the operating system cache. If the browser doesn't find the IP address in its own cache, it asks the OS for help, since the OS also keeps the IP addresses of recently visited websites.
  • Third, it checks the router cache. Maybe some other users on the Local Area Network (LAN) have visited “www.linkedin.com” before.
  • Finally, it checks the ISP's DNS server (you can simply think of the ISP as your internet provider, like Spectrum in your area). Even if no one on your local network is interested in LinkedIn, some other customers of your internet provider probably are.
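The order of these checks can be sketched as a simple loop over caches. Everything here is hypothetical: the function name `resolve_with_caches`, the cache contents, and the address 192.0.2.10, which comes from a range reserved for documentation and is not LinkedIn's real address.

```python
def resolve_with_caches(hostname, caches):
    """Check each (name, cache) pair in order.

    Returns (address, cache_name) for the first hit, or (None, None)
    if no cache knows the host name.
    """
    for name, cache in caches:
        address = cache.get(hostname)
        if address is not None:
            return address, name
    return None, None


# Hypothetical cache contents, nearest cache first.
caches = [
    ("browser", {}),
    ("operating system", {}),
    ("router", {"www.linkedin.com": "192.0.2.10"}),
    ("ISP resolver", {"www.linkedin.com": "192.0.2.10"}),
]
```

With these contents the lookup misses in the browser and the OS and stops at the router, so the ISP's server is never asked. When every cache misses, the function returns `(None, None)` and a real DNS query is needed.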

It is acknowledged that the more information is stored in caches, the more exposed that information is. However, caching is quite prevalent in most companies because it reduces the time needed to fetch data.

  Figure.2: Domain Hierarchy  

Suppose that, unfortunately, your browser cannot find the IP address in any cache, so it sends a DNS query to the DNS servers. The DNS domain name space is a tree structure, illustrated in Figure.2. This tree structure makes DNS lookups efficient, and the lookup proceeds in the following steps (Figure.3):

  • Your computer asks the ISP's DNS server for the IP address of the LinkedIn server(s).
  • The ISP's DNS server doesn't have the IP address of the LinkedIn server(s) either, so it sends a request to a root DNS server.
  • The root DNS server cannot resolve the name itself, but it returns the IP address of the top-level domain DNS server, namely the .com server in this scenario.
  • The ISP's DNS server sends the request to the .com DNS server.
  • The .com top-level domain server knows the IP addresses of the authoritative name servers for every second-level domain ending in .com, so it returns the address of the authoritative name server for linkedin.com.
  • The ISP's DNS server gets the address of the LinkedIn organization's DNS server and sends the request to it.
  • The LinkedIn DNS server receives the request from the ISP and returns the IP address stored in its database.
  • The ISP's DNS server sends the IP address of www.linkedin.com back to you, and also caches it in its local DNS database.
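The walk down the tree can be imitated with a few dictionaries. This is a toy model under loud assumptions: the tables `ROOT`, `TLD` and `AUTHORITATIVE`, the function `resolve_iteratively`, and every address (all from the 192.0.2.0/24 documentation range) are invented for illustration. No real DNS traffic is involved.

```python
# Toy zone data; none of these addresses are real.
ROOT = {"com": "192.0.2.1"}  # root server: TLD -> TLD server address
TLD = {"192.0.2.1": {"linkedin.com": "192.0.2.2"}}  # TLD server: domain -> name server
AUTHORITATIVE = {"192.0.2.2": {"www.linkedin.com": "192.0.2.10"}}  # name server: host -> IP

RESOLVER_CACHE = {}  # plays the role of the ISP server's local cache


def resolve_iteratively(hostname):
    """Return (address, steps), where steps lists who answered and with what."""
    name = hostname.lower().rstrip(".")
    if name in RESOLVER_CACHE:
        return RESOLVER_CACHE[name], [("cache", RESOLVER_CACHE[name])]

    labels = name.split(".")
    steps = []
    tld_server = ROOT[labels[-1]]  # ask the root server
    steps.append(("root", tld_server))
    name_server = TLD[tld_server][".".join(labels[-2:])]  # ask the TLD server
    steps.append(("tld", name_server))
    address = AUTHORITATIVE[name_server][name]  # ask the authoritative server
    steps.append(("authoritative", address))

    RESOLVER_CACHE[name] = address  # remember the answer for next time
    return address, steps
```

The first call for “www.linkedin.com” goes through all three servers; a second call is answered from the cache in a single step, which is why caching at the ISP matters so much.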

  Figure.3: DNS Request Process  

Fourth Step: move up to Transport Layer:

Now we have the IP addresses of both peers, so we can establish a connection between them! The TCP three-way handshake is the key process in TCP connection establishment: a three-step exchange in which the client and the server send messages whose TCP headers carry flags (only SYN and ACK are used in this step).

  • The client wants to connect to the LinkedIn server, so it sends a TCP segment with the SYN flag set to the LinkedIn IP address we got in the last step.
  • The LinkedIn server receives the request and agrees to let this client connect. It replies with a TCP segment that has both the SYN and ACK flags set (SYN-ACK).
  • The client is happy to receive the acknowledgment from the LinkedIn server. It sends an ACK segment back, and now the two peers have established the TCP connection.
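In standard TCP the three segments carry the flags SYN, then SYN plus ACK, then ACK, and each acknowledgment number is the other side's sequence number plus one. The sketch below only simulates those numbers; the `Segment` class and the function `three_way_handshake` are illustrative names of my own, and in practice the operating system does all of this for you when a program opens a socket.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: tuple  # which TCP flags are set, e.g. ("SYN", "ACK")
    seq: int      # sequence number of this segment
    ack: int = 0  # acknowledgment number (meaningful only with ACK set)


def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged to open a TCP connection.

    client_isn and server_isn are the initial sequence numbers that
    each side picks for itself.
    """
    syn = Segment(flags=("SYN",), seq=client_isn)
    syn_ack = Segment(
        flags=("SYN", "ACK"),
        seq=server_isn,
        ack=(syn.seq + 1) % SEQ_SPACE,  # server acknowledges the client's SYN
    )
    ack = Segment(
        flags=("ACK",),
        seq=(syn.seq + 1) % SEQ_SPACE,
        ack=(syn_ack.seq + 1) % SEQ_SPACE,  # client acknowledges the server's SYN
    )
    return [syn, syn_ack, ack]
```

For example, with a client number of 100 and a server number of 300, the server acknowledges 101 and the client acknowledges 301.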

Fifth Step: move up to Application Layer:

With the TCP connection in place, you can now exchange data with the LinkedIn server! For HTTP/S, the browser sends a GET request over this connection (with HTTPS, a TLS handshake first encrypts the channel), and the server answers with the page content.
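To give an idea of what such a GET request looks like on the wire, here is a minimal sketch that builds the raw bytes of an HTTP/1.1 request. The helper name `build_get_request` is made up, and a real browser sends many more headers than this.

```python
def build_get_request(host, path="/"):
    """Build the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, version
        f"Host: {host}",         # the Host header is required in HTTP/1.1
        "Connection: close",     # ask the server to close after responding
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

For an “https://” URL these bytes would not be sent in plain text; they would travel through a TLS-wrapped socket, which in Python can be created with `ssl.create_default_context().wrap_socket(...)`.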



Congrats, I believe you now have a better understanding of the mechanism behind the browser when you type “https://www.linkedin.com”!

Add on:

What we have discussed above is based on one assumption: you want to connect to the LinkedIn server and exchange data with it across the internet. However, if your IP address is in the same local network as the LinkedIn server, your packets do not need to be routed through a gateway. Instead, at the Data Link Layer, ARP helps you get the MAC address of the LinkedIn server(s) from its IP address, and you can communicate with it directly via that MAC address.
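Whether two machines are on the same local network depends on their IP addresses and the network prefix. The sketch below uses Python's standard `ipaddress` module to make that decision and a plain dictionary to stand in for an ARP table. The function names, the addresses and the MAC address are all hypothetical.

```python
import ipaddress


def on_same_network(local_ip, remote_ip, prefix_length):
    """Return True if remote_ip falls inside local_ip's network."""
    network = ipaddress.ip_network(f"{local_ip}/{prefix_length}", strict=False)
    return ipaddress.ip_address(remote_ip) in network


def arp_target(local_ip, remote_ip, prefix_length, gateway_ip):
    """Return the IP address whose MAC address ARP has to resolve.

    On the same network this is the remote host itself; otherwise the
    frame goes to the gateway, which forwards the packet onward.
    """
    if on_same_network(local_ip, remote_ip, prefix_length):
        return remote_ip
    return gateway_ip


# A made-up ARP table: IP address -> MAC address.
ARP_TABLE = {"192.168.1.20": "00:00:5e:00:53:01"}
```

With a local address of 192.168.1.10 and a /24 prefix, a server at 192.168.1.20 is reached directly through its own MAC address, while a server at 192.0.2.10 is reached through the gateway.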