http://searchnetworking.techtarget.com/definition/Address-Resolution-Protocol-ARP
Address Resolution Protocol (ARP) is a protocol for mapping an Internet Protocol address (IP address) to a physical machine address that is recognized in the local network. ARP is used to resolve the Ethernet (MAC) address of a NIC from an IP address in order to construct an Ethernet frame around an IP data packet. This must happen before any data can be sent across the network. Reverse Address Resolution Protocol (RARP) is used by diskless computers to determine their IP address using the network.
In IPv4, an IP address is 32 bits long. In an Ethernet local area network, however, addresses for attached devices (MAC addresses) are 48 bits long. A table, usually called the ARP cache, is used to maintain a correlation between each MAC address and its corresponding IP address. ARP provides the protocol rules for making this correlation and providing address conversion in both directions.
Why does a host need to send an ARP request for a MAC address when the IP address is already unique? Because you need a unique identifier built into the network card to identify the station even when it does not have an IP address. Otherwise, how could a system obtain a valid IP address using DHCP, when there would be no way to identify the station that wants one? And since IP is not the only protocol you can send over Ethernet, Ethernet itself has to provide a unique identifier that works for every protocol. It's also historical: MAC addresses are a layer-2 concept, while IP addresses are layer 3.
Why this separation? Well, when Ethernet was invented, IP was not the only networking technology that could be carried on an Ethernet network.
How ARP Works
When an incoming packet destined for a host machine on a particular local area network arrives at a gateway, the gateway asks the ARP program to find a physical host or MAC address that matches the IP address. The ARP program looks in the ARP cache and, if it finds the address, provides it so that the packet can be converted to the right packet length and format and sent to the machine. If no entry is found for the IP address, ARP broadcasts a request packet in a special format to all the machines on the LAN to see if one machine knows that it has that IP address associated with it. A machine that recognizes the IP address as its own returns a reply saying so. ARP updates the ARP cache for future reference and then sends the packet to the MAC address that replied.
There is a Reverse ARP (RARP) for host machines that don't know their IP address. RARP enables them to request their IP address from the gateway's ARP cache.
http://searchnetworking.techtarget.com/definition/Reverse-Address-Resolution-Protocol
RARP (Reverse Address Resolution Protocol) is a protocol by which a physical machine in a local area network can request to learn its IP address from a gateway server's Address Resolution Protocol (ARP) table or cache. A network administrator creates a table in a local area network's gateway router that maps the physical machine (Media Access Control, or MAC) addresses to corresponding Internet Protocol addresses. When a new machine is set up, its RARP client program asks the RARP server on the router to send it its IP address. Assuming that an entry has been set up in the router table, the RARP server will return the IP address to the machine, which can store it for future use.
http://en.wikipedia.org/wiki/Reverse_Address_Resolution_Protocol
The Reverse Address Resolution Protocol (RARP) is an obsolete computer networking protocol used by a client computer to request its Internet Protocol (IPv4) address from a computer network, when all it has available is its Link Layer or hardware address, such as a MAC address. The client broadcasts the request, and does not need prior knowledge of the network topology or the identities of servers capable of fulfilling its request. It has been rendered obsolete by the Bootstrap Protocol (BOOTP) and the modern Dynamic Host Configuration Protocol (DHCP), which both support a much greater feature set than RARP. RARP requires one or more server hosts to maintain a database of mappings of Link Layer addresses to their respective protocol addresses. Media Access Control (MAC) addresses needed to be individually configured on the servers by an administrator. RARP was limited to serving only IP addresses.
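The "special format" of the ARP request described under How ARP Works is the packet layout defined in RFC 826. As a rough sketch in Python (standard library only), the function below builds the bytes of an Ethernet frame carrying an ARP request; the function names and the example addresses are illustrative assumptions. It only constructs bytes: actually transmitting the frame would need a raw socket and administrator privileges. RARP (RFC 903) reuses the same layout with opcodes 3 and 4 and EtherType 0x8035.

```python
import ipaddress
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Return an Ethernet frame carrying an ARP 'who has target_ip?' request."""
    src_mac = mac_to_bytes(sender_mac)
    # Ethernet header: broadcast destination, our source MAC, EtherType 0x0806 (ARP).
    eth_header = BROADCAST_MAC + src_mac + struct.pack("!H", ETHERTYPE_ARP)
    # ARP fixed fields: hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # hardware address length 6, protocol address length 4, opcode 1 (request).
    arp_fixed = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then target MAC (unknown, so all zeros) and target IP.
    arp_addresses = (src_mac + ipaddress.IPv4Address(sender_ip).packed
                     + b"\x00" * 6 + ipaddress.IPv4Address(target_ip).packed)
    return eth_header + arp_fixed + arp_addresses


frame = build_arp_request("02:00:00:00:00:01", "192.168.1.20", "192.168.1.1")
print(len(frame))  # 42: a 14-byte Ethernet header plus a 28-byte ARP packet
```

On the wire, Ethernet pads short frames to its minimum size; the NIC or driver usually takes care of that.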
This repository is an attempt to answer the age old interview question "What happens when you type google.com into your browser's address box and press enter?"
The "g" key is pressed
The following sections explain the physical keyboard actions and the OS interrupts. But a whole lot happens after that which isn't explained here. When you press "g", the browser receives the event and the entire auto-complete machinery kicks into high gear. Depending on your browser's algorithm, and on whether you are in private/incognito mode, various suggestions will be presented to you in the dropdown below the URL bar. Most of these algorithms prioritize results based on search history and bookmarks. You are going to type "google.com", so none of it matters, but a lot of code will run before you get there, and the suggestions will be refined with each key press. It may even suggest "google.com" before you type it.
The "enter" key bottoms out
To pick a zero point, let's choose the Enter key on the keyboard hitting the bottom of its range. At this point, an electrical circuit specific to the enter key is closed (either directly or capacitively). This allows a small amount of current to flow into the logic circuitry of the keyboard, which scans the state of each key switch, debounces the electrical noise of the rapid intermittent closure of the switch, and converts it to a keycode integer, in this case 13. The keyboard controller then encodes the keycode for transport to the computer. This is now almost universally over a Universal Serial Bus (USB) or Bluetooth connection, but historically has been over PS/2 or ADB connections.
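Debouncing can be done in several ways. The sketch below is a hypothetical counter-based scheme in Python, not any particular keyboard's firmware: it only accepts a new key state after the raw switch reading has been the same for a few consecutive scans, so brief bounces are ignored.

```python
from typing import Iterable, List


def debounce(samples: Iterable[int], stable_count: int = 3) -> List[int]:
    """Return the debounced key state after each raw sample (0 = up, 1 = down)."""
    state = 0       # accepted (debounced) state
    candidate = 0   # raw value currently being counted
    run = 0         # consecutive samples equal to `candidate`
    output = []
    for raw in samples:
        if raw == candidate:
            run += 1
        else:
            candidate, run = raw, 1
        # Only switch state once the new reading has been stable long enough.
        if candidate != state and run >= stable_count:
            state = candidate
        output.append(state)
    return output


# A bouncy press: the early 0/1 flicker is ignored, the sustained 1s are accepted.
print(debounce([0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```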
In the case of the USB keyboard:
The USB circuitry of the keyboard is powered by the 5V supply provided over pin 1 from the computer's USB host controller.
The generated keycode is stored in the keyboard's internal circuitry memory, in a register called an "endpoint".
The host USB controller polls that "endpoint" every ~10ms (the minimum interval declared by the keyboard), so it gets the keycode value stored there.
This value goes to the USB SIE (Serial Interface Engine) to be converted into one or more USB packets that follow the low-level USB protocol.
Those packets are sent as a differential electrical signal over the D+ and D- pins (the middle two) at a maximum speed of 1.5 Mb/s, since a HID (Human Interface Device) keyboard is commonly declared to be a "low speed device" (USB 2.0 compliance).
This serial signal is then decoded at the computer's host USB controller, and interpreted by the computer's Human Interface Device (HID) universal keyboard device driver. The value of the key is then passed into the operating system's hardware abstraction layer.
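At the HID level the keyboard does not send the number 13. In the standard 8-byte boot-protocol keyboard report (a modifier byte, a reserved byte, then up to six key codes), each pressed key is identified by its HID usage ID, and Enter is usage 0x28. The 13 mentioned earlier is the ASCII carriage-return value, which Windows also uses for VK_RETURN. Below is a minimal Python decoding sketch with a deliberately tiny, illustrative usage table.

```python
# Small illustrative subset of the HID keyboard usage table.
USAGE_NAMES = {0x0A: "g", 0x28: "Enter"}

# Bit order of the modifier byte in a boot-protocol keyboard report.
MODIFIER_BITS = ["LeftCtrl", "LeftShift", "LeftAlt", "LeftGUI",
                 "RightCtrl", "RightShift", "RightAlt", "RightGUI"]


def parse_boot_report(report: bytes):
    """Decode an 8-byte HID boot keyboard report into (modifiers, keys)."""
    if len(report) != 8:
        raise ValueError("boot keyboard reports are 8 bytes long")
    modifiers = [name for bit, name in enumerate(MODIFIER_BITS) if report[0] & (1 << bit)]
    # report[1] is reserved; report[2:8] lists up to six pressed keys (0 = empty slot).
    keys = [USAGE_NAMES.get(code, hex(code)) for code in report[2:8] if code != 0]
    return modifiers, keys


print(parse_boot_report(bytes([0, 0, 0x28, 0, 0, 0, 0, 0])))  # ([], ['Enter'])
```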
In the case of Virtual Keyboard (as in touch screen devices):
When the user puts their finger on a modern capacitive touch screen, a tiny amount of current gets transferred to the finger. This completes the circuit through the electrostatic field of the conductive layer and creates a voltage drop at that point on the screen. The screen controller then raises an interrupt reporting the coordinate of the key press.
Then the mobile OS notifies the currently focused application of a press event in one of its GUI elements (in this case, the virtual keyboard application's buttons).
The virtual keyboard can now raise a software interrupt for sending a 'key pressed' message back to the OS.
This interrupt notifies the current focused application of a 'key pressed' event.
Interrupt fires [NOT for USB keyboards]
The keyboard sends signals on its interrupt request line (IRQ), which is mapped to an interrupt vector (integer) by the interrupt controller. The CPU uses the Interrupt Descriptor Table (IDT) to map the interrupt vectors to functions (interrupt handlers) which are supplied by the kernel. When an interrupt arrives, the CPU indexes the IDT with the interrupt vector and runs the appropriate handler. Thus, the kernel is entered.
(On Windows) A WM_KEYDOWN message is sent to the app
The HID transport passes the key down event to the KBDHID.sys driver, which converts the HID usage into a scancode (0x1C for the main Enter key in scan-code set 1). That scancode is later translated into the virtual-key code VK_RETURN (0x0D). The KBDHID.sys driver interfaces with KBDCLASS.sys (the keyboard class driver). This driver is responsible for handling all keyboard and keypad input in a secure manner. It then calls into Win32K.sys (after potentially passing the message through any installed third-party keyboard filters). This all happens in kernel mode.
Win32K.sys determines which thread owns the foreground window (the window returned by the GetForegroundWindow() API) and which of its windows has keyboard focus: here, the browser's address box. A WM_KEYDOWN message, with wParam set to VK_RETURN, is then posted to that thread's message queue. lParam is a bitmask that indicates further information about the keypress: the repeat count (1 in this case), the actual scan code (can be OEM dependent, but generally wouldn't be for VK_RETURN), whether the key is an extended key (such as the right-hand ALT and CTRL keys, or the Enter key on the numeric keypad; the main Enter key is not), and some other state.
Keyboard input is posted rather than sent. Unlike SendMessage, which calls the target window procedure directly and waits for it to return, a posted message is placed in a queue. The application's message loop (GetMessage followed by DispatchMessage) later takes each message off the queue and passes it to the main message processing function (called a WindowProc) assigned to the hWnd.
The window (hWnd) that is active is actually an edit control, and the WindowProc in this case has a message handler for WM_KEYDOWN messages. This code looks at the message's wParam and, because it is VK_RETURN, knows the user has hit the ENTER key.
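The bit layout of WM_KEYDOWN's lParam is documented by Microsoft. Here is a small Python sketch that decodes it. The function name is made up, and the sample values assume scan-code set 1, where the main Enter key is 0x1C and the keypad Enter key sets the extended-key bit.

```python
def decode_keydown_lparam(lparam: int) -> dict:
    """Split a WM_KEYDOWN lParam into its documented bit fields."""
    return {
        "repeat_count": lparam & 0xFFFF,          # bits 0-15
        "scan_code": (lparam >> 16) & 0xFF,       # bits 16-23
        "extended": bool((lparam >> 24) & 1),     # bit 24: extended key (e.g. keypad Enter)
        "context_code": (lparam >> 29) & 1,       # bit 29: always 0 for WM_KEYDOWN
        "previous_state": (lparam >> 30) & 1,     # bit 30: 1 if the key was already down
        "transition_state": (lparam >> 31) & 1,   # bit 31: always 0 for WM_KEYDOWN
    }


print(decode_keydown_lparam(0x001C0001))  # main Enter, first press, repeat count 1
```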
(On OS X) A KeyDown NSEvent is sent to the app
The interrupt signal triggers an interrupt event in the I/O Kit kext keyboard driver. The driver translates the signal into a key code, which is passed to the OS X WindowServer process. As a result, the WindowServer dispatches an event to any appropriate (e.g. active or listening) applications through their Mach port, where it is placed into an event queue. Events can then be read from this queue by threads with sufficient privileges calling the mach_ipc_dispatch function. This most commonly occurs through, and is handled by, an NSApplication main event loop, via an NSEvent of NSEventTypeKeyDown.
(On GNU/Linux) the Xorg server listens for keycodes
When a graphical X server is used, X will use the generic event driver evdev to acquire the keypress. Keycodes are then mapped to keysyms (symbols such as Return or g) using X server specific keymaps and rules. The X server delivers the key event to the window that currently has input focus, which the window manager (DWM, metacity, i3, etc.) controls by setting the focus. The graphical API of the window that receives the event then draws the appropriate font symbol in the focused field.
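On Linux, the keypress reaches X (or any other reader of /dev/input/event*) as a `struct input_event` record. The Python sketch below decodes one such record. It assumes a 64-bit system, where the timestamp is two 64-bit fields and a record is therefore 24 bytes. It uses the codes EV_KEY = 1 and KEY_ENTER = 28 from linux/input-event-codes.h. The +8 offset is the convention the evdev X driver uses to turn kernel keycodes into X keycodes.

```python
import struct

# struct input_event on 64-bit Linux: timeval (two 64-bit fields),
# __u16 type, __u16 code, __s32 value -> 24 bytes (assumes 64-bit time fields).
INPUT_EVENT = struct.Struct("=qqHHi")
EV_KEY = 1       # event type for keys and buttons
KEY_ENTER = 28   # kernel keycode for the main Enter key


def decode_event(raw: bytes):
    """Return (type, code, value); for EV_KEY, value 1 = press, 0 = release, 2 = repeat."""
    _sec, _usec, ev_type, code, value = INPUT_EVENT.unpack(raw)
    return ev_type, code, value


def evdev_to_x_keycode(code: int) -> int:
    """X keycodes are the kernel keycode shifted by 8 under the evdev driver."""
    return code + 8


# Reading a real device needs permission, and the device path varies, e.g.:
# with open("/dev/input/event0", "rb") as dev:
#     print(decode_event(dev.read(INPUT_EVENT.size)))
```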
Parse URL
The browser now has the following information contained in the URL (Uniform Resource Locator):
Protocol "http": use Hyper Text Transfer Protocol
Resource "/": retrieve the main (index) page
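For a quick sketch, Python's urllib.parse can stand in for the browser's URL parser. Browsers implement the WHATWG URL Standard, which differs in edge cases. The helper name and the default-port table are assumptions for illustration.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_target(url: str):
    """Split an absolute http(s) URL into (scheme, host, port, path)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]  # no explicit port: use the default
    path = parts.path or "/"                          # empty path means the root resource
    return parts.scheme, parts.hostname, port, path


print(parse_target("http://google.com/"))  # ('http', 'google.com', 80, '/')
```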
Is it a URL or a search term?
When no protocol or valid domain name is given, the browser proceeds to feed the text given in the address box to the browser's default web search engine. In many cases the URL has a special piece of text appended to it to tell the search engine that it came from a particular browser's URL bar.
Check HSTS list
The browser checks its "preloaded HSTS (HTTP Strict Transport Security)" list. This is a list of websites that have requested to be contacted via HTTPS only.
If the website is in the list, the browser sends its request via HTTPS instead of HTTP. Otherwise, the initial request is sent via HTTP. (Note that a website can still use the HSTS policy without being in the HSTS list. The first HTTP request to the website by a user will receive a response requesting that the user only send HTTPS requests. However, this single HTTP request could potentially leave the user vulnerable to a downgrade attack, which is why the HSTS list is included in modern web browsers.)
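Here is a toy version of the preload check in Python. The two-entry list is a hypothetical stand-in for the real preload list shipped with browsers. Each entry records whether it also covers subdomains (the includeSubDomains directive). The sketch does not rewrite an explicit ":80" port.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for a browser's preload list: host -> includeSubDomains.
PRELOADED_HSTS = {"google.com": True, "example.org": False}


def is_hsts_host(host: str) -> bool:
    """True if host, or a parent domain that covers subdomains, is preloaded."""
    host = host.lower().rstrip(".")
    if host in PRELOADED_HSTS:
        return True
    labels = host.split(".")
    # Check each parent domain (but not the bare top-level domain).
    for i in range(1, len(labels) - 1):
        if PRELOADED_HSTS.get(".".join(labels[i:])):
            return True
    return False


def upgrade_if_hsts(url: str) -> str:
    """Rewrite an http:// URL to https:// when its host is on the preload list."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname and is_hsts_host(parts.hostname):
        return parts._replace(scheme="https").geturl()
    return url


print(upgrade_if_hsts("http://google.com/"))  # https://google.com/
```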
Convert non-ASCII Unicode characters in hostname
The browser checks the hostname for characters other than a-z, A-Z, 0-9, hyphen (-), and period (.).
Since the hostname is google.com there won't be any, but if there were the browser would apply Punycode encoding to the hostname portion of the URL.
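Python's built-in idna codec can demonstrate the conversion. It implements the older IDNA 2003 rules, while browsers follow UTS #46 processing, so the results can differ for some edge-case characters. The hostname bücher.example is only a sample.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Return the ASCII (Punycode) form of a hostname; ASCII names pass through."""
    if hostname.isascii():
        return hostname
    # The "idna" codec applies nameprep and Punycode to each label (IDNA 2003).
    return hostname.encode("idna").decode("ascii")


print(to_ascii_hostname("google.com"))      # google.com
print(to_ascii_hostname("bücher.example"))  # xn--bcher-kva.example
```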
DNS lookup
The browser checks if the domain is in its cache.
If not found, the browser calls the gethostbyname library function (this varies by OS; getaddrinfo is its modern replacement) to do the lookup.
gethostbyname checks if the hostname can be resolved by reference in the local hosts file (whose location varies by OS) before trying to resolve the hostname through DNS.
If gethostbyname neither has it cached nor can find it in the hosts file, it makes a request to the DNS server configured in the network stack. This is typically the local router or the ISP's caching DNS server.
If the DNS server is on the same subnet the network library follows the ARP process below for the DNS server.
If the DNS server is on a different subnet, the network library follows the ARP process below for the default gateway IP.
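The same-subnet test in the last two steps is a prefix comparison. Below is a minimal sketch using Python's ipaddress module. The function name and sample addresses are hypothetical. A real network stack consults its full routing table rather than a single interface.

```python
import ipaddress


def arp_target_for(dns_server: str, interface: str, default_gateway: str) -> str:
    """Return the IP address whose MAC must be resolved to reach dns_server.

    `interface` is this host's address with its prefix length, e.g. "192.168.1.20/24".
    """
    local_network = ipaddress.ip_interface(interface).network
    if ipaddress.ip_address(dns_server) in local_network:
        return dns_server       # same subnet: ARP for the DNS server itself
    return default_gateway      # different subnet: ARP for the default gateway


print(arp_target_for("8.8.8.8", "192.168.1.20/24", "192.168.1.1"))  # 192.168.1.1
```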
ARP process
In order to send an ARP broadcast the network stack library needs the target IP address to look up. It also needs to know the MAC address of the interface it will use to send out the ARP broadcast.
The ARP cache is first checked for an ARP entry for our target IP. If it is in the cache, the library function returns the result: Target IP = MAC.
If the entry is not in the ARP cache:
The route table is looked up, to see if the Target IP address is on any of the subnets on the local route table. If it is, the library uses the interface associated with that subnet. If it is not, the library uses the interface that has the subnet of our default gateway.
The MAC address of the selected network interface is looked up.
Depending on what type of hardware is between the computer and the router:
Directly connected:
If the computer is directly connected to the router, the router responds with an ARP Reply.
Hub:
If the computer is connected to a hub, the hub will broadcast the ARP request out of all other ports. If the router is connected on the same "wire", it will respond with an ARP Reply.
Switch:
If the computer is connected to a switch, the switch records the sender's MAC address and port in its local CAM/MAC table. Because the ARP request is addressed to the broadcast MAC address ff:ff:ff:ff:ff:ff, the switch floods it out of all other ports.
If the router is on the same "wire", it will respond with an ARP Reply.
The ARP Reply is unicast, so here the CAM/MAC table matters: the switch already knows which port our MAC address is on and sends the reply only to that port.
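A toy model of an Ethernet switch's CAM/MAC table, with hypothetical port numbers and MAC addresses. It learns the source address of each frame, floods broadcasts and unknown destinations, and forwards known unicast frames to a single port.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model of a switch's MAC learning and forwarding decisions."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.cam = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        self.cam[src_mac] = in_port  # learn where the sender lives
        out_port = self.cam.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast (such as an ARP request) or unknown destination: flood.
            return [p for p in self.ports if p != in_port]
        # Known unicast: forward to one port (or drop if it came from there).
        return [] if out_port == in_port else [out_port]


switch = LearningSwitch([1, 2, 3, 4])
print(switch.receive(1, "02:00:00:00:00:01", BROADCAST))            # ARP request: [2, 3, 4]
print(switch.receive(3, "02:00:00:00:00:fe", "02:00:00:00:00:01"))  # ARP reply: [1]
```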
Now that the network library has the MAC address of either our DNS server or the default gateway, it can resume its DNS process:
The DNS client sends a UDP request to port 53 on the DNS server (if the response is too large, TCP is used instead).
If the local/ISP DNS server does not have the answer cached, it performs a recursive lookup. It queries a root nameserver, then the .com top-level domain nameservers, and finally the authoritative nameservers for google.com, and returns the answer (if one is found) to the client.
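The UDP request sent to port 53 is a DNS message in the RFC 1035 wire format. The Python sketch below builds a single-question query for an A record; the query ID is arbitrary. It sets the "recursion desired" flag, which is how a client asks its resolver to perform the recursive lookup. To send it, the bytes would go out with sendto() on a SOCK_DGRAM socket to the configured resolver's port 53.

```python
import struct


def build_dns_query(name: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a DNS query for `qtype` records of `name` (1 = A record, class IN)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question


query = build_dns_query("google.com")
print(len(query))  # 28: 12-byte header + 12-byte QNAME + 4 bytes of type/class
```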
Opening of a socket
Once the browser receives the IP address of the destination server, it takes that and the given port number from the URL (the HTTP protocol defaults to port 80, and HTTPS to port 443), and makes a call to the system library function named socket, requesting a TCP stream socket (AF_INET and SOCK_STREAM).
This request is first passed to the Transport Layer where a TCP segment is crafted. The destination port is added to the header, and a source port is chosen from within the kernel's dynamic port range (ip_local_port_range in Linux).
This segment is sent to the Network Layer, which wraps an additional IP header. The IP address of the destination server as well as that of the current machine is inserted to form a packet.
The packet next arrives at the Link Layer. A frame header is added that includes the MAC address of the machine's NIC as well as the MAC address of the gateway (local router). As before, if the kernel does not know the MAC address of the gateway, it must broadcast an ARP query to find it.
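You can observe the socket call and the kernel-chosen source port without touching the Internet. To stay self-contained, this Python sketch connects to a throwaway listener on 127.0.0.1 instead of google.com (an assumption), then reports the port numbers at both ends.

```python
import socket


def show_ephemeral_port():
    """Connect a TCP client to a local listener and return (server port, client source port)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free listening port
    server.listen(1)
    server_port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", server_port))  # the kernel chooses the source port
    conn, _ = server.accept()
    source_port = client.getsockname()[1]

    for s in (conn, client, server):
        s.close()
    return server_port, source_port


print(show_ephemeral_port())
```

On Linux, the source port comes from the range in /proc/sys/net/ipv4/ip_local_port_range.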
At this point the packet is ready to be transmitted through either:
For most home or small business Internet connections the packet will pass from your computer, possibly through a local network, and then through a modem (MOdulator/DEModulator) which converts digital 1's and 0's into an analog signal suitable for transmission over telephone, cable, or wireless telephony connections. On the other end of the connection is another modem which converts the analog signal back into digital data to be processed by the next network node where the from and to addresses would be analyzed further.
Most larger businesses and some newer residential connections will have fiber or direct Ethernet connections in which case the data remains digital and is passed directly to the next network node for processing.
Eventually, the packet will reach the router managing the local subnet. From there, it will continue to travel to the AS's border routers, other ASes, and finally to the destination server. Each router along the way extracts the destination address from the IP header and routes it to the appropriate next hop. The TTL field in the IP header is decremented by one by each router the packet passes through. The packet will be dropped if the TTL field reaches zero or if the current router has no space in its queue (perhaps due to network congestion).
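Because a router changes the TTL, it also has to recompute the IPv4 header checksum (the RFC 1071 one's-complement sum). Below is a Python sketch of that per-hop step. It assumes a 20-byte header without options. Real routers typically update the checksum incrementally instead, and they send an ICMP Time Exceeded message when they drop a packet because of its TTL.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071)."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward(header: bytes):
    """Decrement TTL and recompute the checksum; return None if the packet is dropped."""
    ttl = header[8]
    if ttl <= 1:
        return None                         # TTL would reach zero: drop the packet
    new = bytearray(header)
    new[8] = ttl - 1
    new[10:12] = b"\x00\x00"                # checksum field is zero while summing
    new[10:12] = struct.pack("!H", ipv4_checksum(bytes(new)))
    return bytes(new)
```

A header carrying a correct checksum sums to zero under ipv4_checksum, which is also how a receiver verifies it.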
This send and receive happens multiple times following the TCP connection flow:
Client chooses an initial sequence number (ISN) and sends the packet to the server with the SYN bit set to indicate it is setting the ISN
Server receives SYN and if it's in an agreeable mood:
Server chooses its own initial sequence number
Server sets SYN to indicate it is choosing its ISN
Server copies the (client ISN +1) to its ACK field and adds the ACK flag to indicate it is acknowledging receipt of the first packet
Client acknowledges the connection by sending a packet:
Sets its own sequence number to client ISN + 1
Sets its acknowledgment number to server ISN + 1
Sets ACK field
Data is transferred as follows:
As one side sends N data bytes, it increases its SEQ by that number
When the other side acknowledges receipt of that packet (or a string of packets), it sends an ACK packet whose ACK value is the next sequence number it expects: the sequence number of the last received segment plus that segment's length
To close the connection:
The closer sends a FIN packet
The other side ACKs the FIN packet and sends its own FIN
The closer acknowledges the other side's FIN with an ACK
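The sequence and acknowledgment arithmetic above fits in a few lines of Python. The ISNs 1000 and 5000 are made up; real stacks choose them unpredictably. Sequence numbers wrap modulo 2^32. The ack_for helper covers data segments only, because SYN and FIN each consume one extra sequence number.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: int = 0
    length: int = 0  # payload bytes carried


def handshake(client_isn: int, server_isn: int):
    """Return the three segments of the TCP three-way handshake."""
    syn = Segment("SYN", seq=client_isn)
    # A SYN consumes one sequence number, so the next expected byte is ISN + 1.
    syn_ack = Segment("SYN,ACK", seq=server_isn, ack=(client_isn + 1) % SEQ_SPACE)
    ack = Segment("ACK", seq=(client_isn + 1) % SEQ_SPACE, ack=(server_isn + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


def ack_for(data: Segment) -> int:
    """ACK number acknowledging a data segment: the next byte the receiver expects."""
    return (data.seq + data.length) % SEQ_SPACE


syn, syn_ack, ack = handshake(1000, 5000)
print(syn_ack.ack, ack.seq, ack.ack)                            # 1001 1001 5001
print(ack_for(Segment("ACK", seq=1001, ack=5001, length=100)))  # 1101
```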
TLS handshake
The client computer sends a ClientHello message to the server with its TLS version, list of cipher algorithms and compression methods available.
The server replies with a ServerHello message to the client with the TLS version, selected cipher, selected compression methods and the server's public certificate signed by a CA (Certificate Authority). The certificate contains a public key that will be used by the client to encrypt the rest of the handshake until a symmetric key can be agreed upon.
The client verifies the server digital certificate against its list of trusted CAs. If trust can be established based on the CA, the client generates a string of pseudo-random bytes and encrypts this with the server's public key. These random bytes can be used to determine the symmetric key.
The server decrypts the random bytes using its private key and uses these bytes to generate its own copy of the symmetric master key.
The client sends a Finished message to the server, encrypting a hash of the transmission up to this point with the symmetric key.
The server generates its own hash, and then decrypts the client-sent hash to verify that it matches. If it does, it sends its own Finished message to the client, also encrypted with the symmetric key.
From now on the TLS session transmits the application (HTTP) data encrypted with the agreed symmetric key.
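Python's ssl module wraps this same handshake. In this sketch the function name is hypothetical, and calling it needs network access. create_default_context() loads the system's trusted CA certificates and turns on certificate and hostname verification, which matches the verification step above. A modern server will usually negotiate TLS 1.3. There the key exchange uses ephemeral Diffie-Hellman instead of encrypting random bytes with the server's RSA public key.

```python
import socket
import ssl


def tls_connect_info(host: str, port: int = 443):
    """Perform a TLS handshake with host:port and report what was negotiated."""
    ctx = ssl.create_default_context()  # trusted CAs + certificate/hostname checks
    with socket.create_connection((host, port), timeout=5) as sock:
        # server_hostname is used for SNI and for matching the certificate's name.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]


# Example (requires Internet access):
# print(tls_connect_info("google.com"))
```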
HTTP protocol
If the web browser used was written by Google, then instead of sending a plain HTTP request to retrieve the page, it may try to negotiate with the server to use the SPDY protocol (since superseded by HTTP/2) instead of HTTP.
If the client is using the HTTP protocol and does not support SPDY, it sends a request to the server of the form:
GET / HTTP/1.1
Host: google.com
Connection: close
[other headers]
where [other headers] refers to a series of colon-separated key-value pairs formatted as per the HTTP specification, each line terminated by a CRLF (carriage return, line feed) sequence. (This assumes the web browser being used doesn't have any bugs violating the HTTP spec. This also assumes that the web browser is using HTTP/1.1; otherwise it may not include the Host header in the request, and the version specified in the GET request will be either HTTP/1.0 or HTTP/0.9.)
HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after completion of the response. For example,
Connection: close
HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.
After sending the request and headers, the web browser sends a single blank line (a bare CRLF) to the server, indicating that the headers are complete. A GET request has no body, so this ends the request.
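The Python sketch below serializes the request above and makes the CRLF line endings and the terminating blank line explicit. The function name and the optional extra headers are illustrative.

```python
def build_request(host: str, path: str = "/", headers=None) -> bytes:
    """Serialize a minimal HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends in CRLF; an empty line (a bare CRLF) ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_request("google.com"))
# b'GET / HTTP/1.1\r\nHost: google.com\r\nConnection: close\r\n\r\n'
```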
The server replies with a status code denoting the result of the request, in a response of the form:
HTTP/1.1 200 OK
[response headers]
This is followed by a blank line and then a payload of the HTML content of www.google.com. The server may then either close the connection or, if headers sent by the client requested it, keep the connection open to be reused for further requests.
If the HTTP headers sent by the web browser included sufficient information for the web server to determine whether the version of the file cached by the web browser has been unmodified since the last retrieval (i.e. if the web browser included an If-None-Match header carrying the cached ETag, or an If-Modified-Since header), it may instead send a response of the form:
HTTP/1.1 304 Not Modified
[response headers]
and no payload, and the web browser instead retrieves the HTML from its cache.
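On the server side, the ETag check might look like the Python sketch below. This is a simplified, hypothetical handler. It uses strong comparison only and ignores weak validators (W/"...") and If-Modified-Since.

```python
def respond(resource_etag: str, body: bytes, if_none_match=None):
    """Return (status line, body) for a GET, honoring If-None-Match."""
    if if_none_match is not None:
        # The header may list several comma-separated ETags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or resource_etag in candidates:
            return "HTTP/1.1 304 Not Modified", b""   # client's cached copy is current
    return "HTTP/1.1 200 OK", body


print(respond('"v1"', b"<html></html>", '"v1"'))  # ('HTTP/1.1 304 Not Modified', b'')
```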
After parsing the HTML, the web browser (and server) repeats this process for every resource (image, CSS, favicon.ico, etc) referenced by the HTML page, except instead of GET / HTTP/1.1 the request will be GET /$(URL relative to www.google.com) HTTP/1.1.
If the HTML referenced a resource on a different domain than www.google.com, the web browser goes back to the steps involved in resolving the other domain, and follows all steps up to this point for that domain. The Host header in the request will be set to the appropriate server name instead of google.com.
HTTP Server Request Handle
The HTTPD (HTTP Daemon) server is the one handling the requests/responses on the server side. The most common HTTPD servers are Apache or nginx for Linux and IIS for Windows.
The HTTPD (HTTP Daemon) receives the request.
The server breaks down the request to the following parameters:
HTTP Request Method (such as GET, POST, HEAD, PUT, or DELETE). In the case of a URL entered directly into the address bar, this will be GET.
Domain, in this case - google.com.
Requested path/page, in this case - / (as no specific path/page was requested, / is the default path).
The server verifies that there is a Virtual Host configured on the server that corresponds with google.com.
The server verifies that google.com can accept GET requests.
The server verifies that the client is allowed to use this method (by IP, authentication, etc.).
If the server has a rewrite module installed (like mod_rewrite for Apache or URL Rewrite for IIS), it tries to match the request against one of the configured rules. If a matching rule is found, the server uses that rule to rewrite the request.
The server goes to pull the content that corresponds with the request, in our case it will fall back to the index file, as "/" is the main file (some cases can override this, but this is the most common method).
The server parses the file according to the handler. If Google is running on PHP, the server uses PHP to interpret the index file, and streams the output to the client.
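The steps above can be sketched as a tiny Python dispatcher. Everything here is hypothetical: the configuration dictionary, the single rewrite rule, and the choice of 404 for an unknown host (real servers often fall back to a default virtual host). Access control is omitted. The sketch only mirrors the order of the checks: virtual host, allowed method, rewrite rules, then the index-file fallback.

```python
import re

# Hypothetical configuration; Apache, nginx, and IIS each express this differently.
VIRTUAL_HOSTS = {
    "google.com": {
        "methods": {"GET", "HEAD"},
        "rewrites": [(re.compile(r"^/home$"), "/")],
        "index": "index.html",
    }
}


def route(method: str, host: str, path: str):
    """Return (status code, resolved path) for a request."""
    vhost = VIRTUAL_HOSTS.get(host.lower())
    if vhost is None:
        return 404, None                 # no matching virtual host
    if method not in vhost["methods"]:
        return 405, None                 # method not allowed for this host
    for pattern, replacement in vhost["rewrites"]:
        if pattern.match(path):
            path = pattern.sub(replacement, path)
            break                        # first matching rule wins
    if path.endswith("/"):
        path += vhost["index"]           # fall back to the index file
    return 200, path


print(route("GET", "google.com", "/"))  # (200, '/index.html')
```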
Behind the scenes of the Browser
Once the server supplies the resources (HTML, CSS, JS, images, etc.) to the browser, the browser goes through the following process:
Parsing - HTML, CSS, JS
Rendering - Construct DOM Tree → Render Tree → Layout of Render Tree → Painting the render tree
Browser
The browser's functionality is to present the web resource you choose, by requesting it from the server and displaying it in the browser window. The resource is usually an HTML document, but may also be a PDF, image, or some other type of content. The location of the resource is specified by the user using a URI (Uniform Resource Identifier).
The way the browser interprets and displays HTML files is specified in the HTML and CSS specifications. The HTML specification is maintained as a living standard by the WHATWG (Web Hypertext Application Technology Working Group), while the CSS specifications are maintained by the W3C (World Wide Web Consortium), the standards organization for the web.
Browser user interfaces have a lot in common with each other. Among the common user interface elements are:
An address bar for inserting a URI
Back and forward buttons
Bookmarking options
Refresh and stop buttons for refreshing or stopping the loading of current documents
Home button that takes you to your home page
Browser High Level Structure
The components of the browsers are:
User interface: The user interface includes the address bar, back/forward buttons, bookmarking menu, etc. It covers every part of the browser display except the window where you see the requested page.
Browser engine: The browser engine marshals actions between the UI and the rendering engine.
Rendering engine: The rendering engine is responsible for displaying requested content. For example if the requested content is HTML, the rendering engine parses HTML and CSS, and displays the parsed content on the screen.
Networking: The networking handles network calls such as HTTP requests, using different implementations for different platforms behind a platform-independent interface.
UI backend: The UI backend is used for drawing basic widgets like combo boxes and windows. This backend exposes a generic interface that is not platform specific. Underneath it uses operating system user interface methods.
JavaScript engine: The JavaScript engine is used to parse and execute JavaScript code.
Data storage: The data storage is a persistence layer. The browser may need to save all sorts of data locally, such as cookies. Browsers also support storage mechanisms such as localStorage, IndexedDB, WebSQL and FileSystem.
HTML parsing
The rendering engine starts getting the contents of the requested document from the networking layer. This will usually be done in 8kB chunks.
The primary job of the HTML parser is to parse the HTML markup into a parse tree.
The output tree (the "parse tree") is a tree of DOM element and attribute nodes. DOM is short for Document Object Model. It is the object representation of the HTML document and the interface of HTML elements to the outside world, such as JavaScript. The root of the tree is the "Document" object. Prior to any manipulation via scripting, the DOM has an almost one-to-one relation to the markup.
The parsing algorithm
HTML cannot be parsed using the regular top-down or bottom-up parsers.
The reasons are:
The forgiving nature of the language.
The fact that browsers have traditional error tolerance to support well known cases of invalid HTML.
The parsing process is reentrant. For other languages, the source doesn't change during parsing, but in HTML, dynamic code (such as script elements containing document.write() calls) can add extra tokens, so the parsing process actually modifies the input.
Unable to use the regular parsing techniques, the browser utilizes a custom parser for parsing HTML. The parsing algorithm is described in detail by the HTML5 specification.
The algorithm consists of two stages: tokenization and tree construction.
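Python's html.parser can illustrate the two stages, with an important caveat. It only tokenizes, reporting start tags, end tags, and text, and it does not implement the HTML5 tree-construction algorithm. The small stack-based builder below is a simplified, hypothetical stand-in for that second stage. It also shows a little error tolerance by implicitly closing an unclosed <p>.

```python
from html.parser import HTMLParser

# Elements that never have children or an end tag (a partial list).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Turn html.parser's token callbacks into a nested (tag, children) tree."""

    def __init__(self):
        super().__init__()
        self.root = ("#document", [])
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name, implicitly
        # closing anything still open inside it; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1][1].append(data.strip())


builder = TreeBuilder()
builder.feed("<html><body><p>Hello<br>world</body></html>")
builder.close()
print(builder.root)
```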
Actions when the parsing is finished
The browser begins fetching external resources linked to the page (CSS, images, JavaScript files, etc.).
At this stage the browser marks the document as interactive and starts executing scripts that are in "deferred" mode: those that should be executed after the document is parsed. The document state is then set to "complete" and a "load" event is fired.
Note there is never an "Invalid Syntax" error on an HTML page. Browsers fix any invalid content and go on.
Each CSS file is parsed into a StyleSheet object. Each object contains CSS rules made up of selectors and declaration objects corresponding to the CSS grammar.
A CSS parser can be top-down or bottom-up when a specific parser generator is used.
Page Rendering
Create a 'Frame Tree' or 'Render Tree' by traversing the DOM nodes, and calculating the CSS style values for each node.
Calculate the preferred width of each node in the 'Frame Tree' bottom up by summing the preferred width of the child nodes and the node's horizontal margins, borders, and padding.
Calculate the actual width of each node top-down by allocating each node's available width to its children.
Calculate the height of each node bottom-up by applying text wrapping and summing the child node heights and the node's margins, borders, and padding.
Calculate the coordinates of each node using the information calculated above.
Create layers to describe which parts of the page can be animated as a group without being re-rasterized. Each frame/render object is assigned to a layer.
Textures are allocated for each layer of the page.
The frame/render objects for each layer are traversed and drawing commands are executed for their respective layer. This may be rasterized by the CPU or drawn on the GPU directly using D2D/SkiaGL.
All of the above steps may reuse calculated values from the last time the webpage was rendered, so that incremental changes require less work.
The page layers are sent to the compositing process where they are combined with layers for other visible content like the browser chrome, iframes and addon panels.
Final layer positions are computed and the composite commands are issued via Direct3D/OpenGL. The GPU command buffer(s) are flushed to the GPU for asynchronous rendering and the frame is sent to the window server.
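Below is a heavily simplified Python sketch of the layout passes. It assumes only block boxes stacked vertically, one uniform margin+border+padding value per box, and no text wrapping or preferred-width pass. Widths flow top-down from the parent and heights are summed bottom-up from the children, with coordinates assigned along the way. The Box type is hypothetical, not any engine's data structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    content_height: float = 0   # intrinsic height of leaf content (e.g. one line of text)
    extra: float = 0            # margin + border + padding on each side (simplified)
    children: List["Box"] = field(default_factory=list)
    x: float = 0
    y: float = 0
    width: float = 0
    height: float = 0


def layout(box: Box, x: float, y: float, available_width: float) -> None:
    """Simplified block layout: width flows top-down, height bottom-up."""
    box.x, box.y = x, y
    box.width = available_width                  # top-down: take the parent's width
    inner_width = available_width - 2 * box.extra
    cursor = y + box.extra
    for child in box.children:                   # children stack vertically
        layout(child, x + box.extra, cursor, inner_width)
        cursor += child.height
    children_height = cursor - (y + box.extra)
    box.height = max(box.content_height, children_height) + 2 * box.extra  # bottom-up


root = Box(extra=10, children=[Box(content_height=20), Box(content_height=30, extra=5)])
layout(root, 0, 0, 800)
print(root.height, root.children[1].y)  # 80 30
```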
GPU Rendering
During the rendering process the graphical computing layers can use the general-purpose CPU or the graphics processor (GPU) as well.
When using the GPU for graphical rendering computations, the graphical software layers split the task into multiple pieces, so they can take advantage of the GPU's massive parallelism for the floating-point calculations required for the rendering process.
Window Server
Post-rendering and user-induced execution
After rendering has completed, the browser executes JavaScript code as a result of some timing mechanism (such as a Google Doodle animation) or user interaction (typing a query into the search box and receiving suggestions). Plugins such as Flash or Java may execute as well, although not at this time on the Google homepage. Scripts can cause additional network requests to be performed, as well as modify the page or its layout, causing another round of page rendering and painting.
1. You enter a URL into the browser
2. The browser looks up the IP address for the domain name
The first step in the navigation is to figure out the IP address for the visited domain. The DNS lookup proceeds as follows:
Browser cache – The browser caches DNS records for some time. Interestingly, the OS does not tell the browser the time-to-live for each DNS record, and so the browser caches them for a fixed duration (varies between browsers, 2 – 30 minutes).
OS cache – If the browser cache does not contain the desired record, the browser asks the operating system’s resolver (for example through gethostbyname, which exists on Windows as well as Unix-like systems). The OS has its own cache.
Router cache – The request continues on to your router, which typically has its own DNS cache.
ISP DNS cache – The next place checked is your ISP’s DNS server, which naturally has its own cache.
Recursive search – Your ISP’s DNS server begins a recursive search, from the root nameserver, through the .com top-level nameserver, to Facebook’s nameserver. Normally, the DNS server will have names of the .com nameservers in cache, and so a hit to the root nameserver will not be necessary.
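The lookup order above is essentially a chain of caches with an expensive fallback. Here is a minimal Python sketch of that idea; the cache names, the TTL values and `fake_recursive_lookup` are made up for illustration, and a real resolver uses each record's own TTL rather than one fixed value per cache (the browser being the exception noted above).

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

class TTLCache:
    """A domain -> IP cache whose entries expire after a fixed number of seconds."""
    def __init__(self, name: str, ttl_seconds: float):
        self.name = name
        self.ttl = ttl_seconds
        self.entries: Dict[str, Tuple[str, float]] = {}  # domain -> (ip, expiry time)

    def get(self, domain: str, now: float) -> Optional[str]:
        hit = self.entries.get(domain)
        return hit[0] if hit and hit[1] > now else None

    def put(self, domain: str, ip: str, now: float) -> None:
        self.entries[domain] = (ip, now + self.ttl)

def resolve(domain: str, caches: List[TTLCache],
            recursive_lookup: Callable[[str], str], now: Optional[float] = None) -> str:
    """Try each cache in order (browser, OS, router, ISP); fall back to a recursive search."""
    now = time.monotonic() if now is None else now
    for i, cache in enumerate(caches):
        ip = cache.get(domain, now)
        if ip is not None:
            for closer in caches[:i]:          # refill the caches that missed
                closer.put(domain, ip, now)
            return ip
    ip = recursive_lookup(domain)              # root -> .com -> facebook.com nameservers
    for cache in caches:
        cache.put(domain, ip, now)
    return ip

calls = []
def fake_recursive_lookup(domain: str) -> str:
    # Hypothetical stand-in for the ISP's recursive search.
    calls.append(domain)
    return "192.0.2.10"  # documentation-range address, not Facebook's real IP

caches = [TTLCache("browser", 60), TTLCache("os", 300), TTLCache("router", 300), TTLCache("isp", 3600)]
resolve("facebook.com", caches, fake_recursive_lookup, now=0)    # miss everywhere: recursive search
resolve("facebook.com", caches, fake_recursive_lookup, now=120)  # browser entry expired, OS cache hits
print(len(calls))  # 1
```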
Here is a diagram of what a recursive DNS search looks like:
One worrying thing about DNS is that the entire domain like wikipedia.org or facebook.com seems to map to a single IP address. Fortunately, there are ways of mitigating the bottleneck:
Round-robin DNS is a solution where the DNS lookup returns multiple IP addresses, rather than just one. For example, facebook.com actually maps to four IP addresses.
A load balancer is a piece of hardware (or software) that listens on a particular IP address and forwards the requests to other servers. Major sites will typically use expensive high-performance load balancers.
Geographic DNS improves scalability by mapping a domain name to different IP addresses, depending on the client’s geographic location. This is great for hosting static content so that different servers don’t have to update shared state.
Anycast is a routing technique where a single IP address maps to multiple physical servers. Unfortunately, anycast does not fit well with TCP and is rarely used in that scenario.
Many DNS servers, including the root nameservers, themselves use anycast to achieve high availability and low latency for DNS lookups.
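Round-robin DNS, the first technique in the list, can be sketched in a few lines of Python. The `RoundRobinZone` class and the addresses (from the 192.0.2.0/24 documentation range) are hypothetical. The server rotates the order of the address list on every query, and since most clients simply try the first address, successive clients are spread across the servers.

```python
from typing import Dict, List

class RoundRobinZone:
    """Hypothetical authoritative zone that rotates its address records on each query."""
    def __init__(self, records: Dict[str, List[str]]):
        self.records = records
        self.offsets: Dict[str, int] = {}

    def lookup(self, name: str) -> List[str]:
        ips = self.records[name]
        k = self.offsets.get(name, 0)
        self.offsets[name] = (k + 1) % len(ips)   # next query starts one address later
        return ips[k:] + ips[:k]

zone = RoundRobinZone({"facebook.com": ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]})
first_choices = [zone.lookup("facebook.com")[0] for _ in range(5)]
print(first_choices)  # ['192.0.2.1', '192.0.2.2', '192.0.2.3', '192.0.2.4', '192.0.2.1']
```

Every answer still contains all four addresses, so a client can fall back to the next one if the first server does not respond.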
3. The browser sends an HTTP request to the web server
You can be pretty sure that Facebook’s homepage will not be served from the browser cache because dynamic pages expire either very quickly or immediately (expiry date set to the past).
So, the browser will send this request to the Facebook server:
GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=WW[...]; c_user=2101[...]
The GET request names the URL to fetch: “http://facebook.com/”. The browser identifies itself (User-Agent header) and states what types of responses it will accept (Accept and Accept-Encoding headers). The Connection header asks the server to keep the TCP connection open for further requests.
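The request above is plain text, so it is easy to assemble by hand. The Python sketch below builds a similar request; the helper name `build_get_request` is hypothetical and the header values are shortened. One detail differs from the captured request: it uses the origin form (`GET / HTTP/1.1` plus a `Host` header), which is what a browser sends directly to an origin server, whereas the absolute-URL form shown above is the form used when a request goes through a proxy.

```python
from typing import Dict, Optional

def build_get_request(host: str, path: str = "/", headers: Optional[Dict[str, str]] = None) -> bytes:
    """Hypothetical helper: serialize a GET request line, headers, then an empty line."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # HTTP/1.1 uses CRLF line endings; a blank line ends the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_get_request("facebook.com", "/", {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "Keep-Alive",
})
print(request.decode("ascii"))
```

Sending it is then a matter of writing these bytes to a TCP connection to port 80, for example with `socket.create_connection((host, 80))` followed by `sendall(request)`.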
The request also contains the cookies that the browser has for this domain. As you probably already know, cookies are key-value pairs that track the state of a web site in between different page requests. And so the cookies store the name of the logged-in user, a secret number that was assigned to the user by the server, some of user’s settings, etc. The cookies will be stored in a text file on the client, and sent to the server with every request.
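Python's standard library can show both halves of this exchange. In the sketch below the helper names are hypothetical and the cookie values are placeholders (the real ones are elided in the request above): `parse_cookie_header` turns a `Cookie` request header into a dictionary, roughly as a server-side handler would read it, and `set_cookie_header` produces the `Set-Cookie` response header a server uses to store a value in the browser.

```python
from http.cookies import SimpleCookie
from typing import Dict

def parse_cookie_header(header: str) -> Dict[str, str]:
    """Hypothetical helper: split a 'a=1; b=2' Cookie header value into a dict."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}

def set_cookie_header(name: str, value: str) -> str:
    """Hypothetical helper: build the Set-Cookie header a server sends to store a pair."""
    jar = SimpleCookie()
    jar[name] = value
    return jar.output()  # e.g. 'Set-Cookie: locale=en_US'

print(parse_cookie_header("datr=abc123; locale=en_US; c_user=42"))
print(set_cookie_header("locale", "en_US"))
```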
The trailing slash in the URL “http://facebook.com/” is important. In this case, the browser can safely add the slash. For URLs of the form http://example.com/folderOrFile, the browser cannot automatically add a slash, because it is not clear whether folderOrFile is a folder or a file. In such cases, the browser will visit the URL without the slash, and the server will respond with a redirect, resulting in an unnecessary roundtrip.
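The rule can be sketched in Python with `urllib.parse`. Both functions are hypothetical helpers: `browser_normalize` adds a slash only when the path is empty, since an empty path can only mean the root, and `server_slash_redirect` models a server that answers a directory requested without its slash with a redirect (a 301 here), which is the extra round trip described above. The `directories` set stands in for whatever the server knows about its file layout.

```python
from typing import Set, Tuple
from urllib.parse import urlsplit, urlunsplit

def browser_normalize(url: str) -> str:
    """An empty path can only mean the site root, so adding '/' is always safe."""
    parts = urlsplit(url)
    if parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)

def server_slash_redirect(path: str, directories: Set[str]) -> Tuple[int, str]:
    """Hypothetical server rule: a directory requested without a trailing slash gets a 301."""
    if not path.endswith("/") and path in directories:
        return 301, path + "/"   # costs the browser an extra round trip
    return 200, path

print(browser_normalize("http://facebook.com"))              # http://facebook.com/
print(browser_normalize("http://example.com/folderOrFile"))  # unchanged
```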
4. The Facebook server responds with a permanent redirect
This is the response that the Facebook server sent back to the browser request:
The server responded with a 301 Moved Permanently response to tell the browser to go to “http://www.facebook.com/” instead of “http://facebook.com/”.
There are interesting reasons why the server insists on the redirect instead of immediately responding with the web page that the user wants to see.
One reason has to do with search engine rankings. See, if there are two URLs for the same page, say http://www.igoro.com/ and http://igoro.com/, search engines may consider them to be two different sites, each with fewer incoming links and thus a lower ranking. Search engines understand permanent redirects (301) and will combine the incoming links from both sources into a single ranking.
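A minimal Python sketch of such a canonical-host rule. The `handle` function and `CANONICAL_HOST` constant are hypothetical, not Facebook's actual code; the function returns a status code, headers and body, and any request whose `Host` is not the canonical one gets a 301 pointing at the same path on the canonical host, so search engines and caches see a single URL.

```python
from typing import Dict, Tuple

CANONICAL_HOST = "www.facebook.com"  # hypothetical configuration value

def handle(host: str, path: str) -> Tuple[int, Dict[str, str], bytes]:
    """Redirect non-canonical hosts permanently; serve the page otherwise."""
    if host.lower() != CANONICAL_HOST:
        return 301, {"Location": f"http://{CANONICAL_HOST}{path}"}, b""
    return 200, {"Content-Type": "text/html; charset=utf-8"}, b"<html>...</html>"

print(handle("facebook.com", "/"))  # (301, {'Location': 'http://www.facebook.com/'}, b'')
```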
Also, multiple URLs for the same content are not cache-friendly. When a piece of content has multiple names, it will potentially appear multiple times in caches.
5. The browser follows the redirect
The browser now knows that “http://www.facebook.com/” is the correct URL to go to, and so it sends out another GET request:
6. The server ‘handles’ the request
Web server software
The web server software (e.g., IIS or Apache) receives the HTTP request and decides which request handler should be executed to handle this request. A request handler is a program (in ASP.NET, PHP, Ruby, …) that reads the request and generates the HTML for the response.
In the simplest case, the request handlers can be stored in a file hierarchy whose structure mirrors the URL structure, and so for example the URL http://example.com/folder1/page1.aspx will map to the file /httpdocs/folder1/page1.aspx. The web server software can also be configured so that URLs are manually mapped to request handlers, and so the public URL of page1.aspx could be http://example.com/folder1/page1.
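Both mapping styles fit in a small Python sketch. The `/httpdocs` root comes from the example above, while `ROUTES` and `map_url_to_handler` are hypothetical; real servers such as IIS or Apache configure this through their own settings. The `..` check reflects a detail any file-based mapping needs: a URL must not be able to climb out of the document root.

```python
from pathlib import PurePosixPath

DOC_ROOT = PurePosixPath("/httpdocs")
# Hypothetical manual mapping from a public URL to a handler file.
ROUTES = {"/folder1/page1": "/folder1/page1.aspx"}

def map_url_to_handler(url_path: str) -> PurePosixPath:
    """Return the handler file for a URL path, using ROUTES first, then the file hierarchy."""
    url_path = ROUTES.get(url_path, url_path)
    parts = [p for p in PurePosixPath(url_path).parts if p != "/"]
    if ".." in parts:
        raise PermissionError("URL path escapes the document root")
    return DOC_ROOT.joinpath(*parts)

print(map_url_to_handler("/folder1/page1.aspx"))  # /httpdocs/folder1/page1.aspx
print(map_url_to_handler("/folder1/page1"))       # /httpdocs/folder1/page1.aspx
```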
Request handler
The request handler reads the request, its parameters, and cookies. It will read and possibly update some data stored on the server. Then, the request handler will generate an HTML response.
Sites that store a large amount of data and/or have many visitors have to find a way to split the database across multiple machines. Solutions include sharding (splitting up a table across multiple databases based on the primary key), replication, and usage of simplified databases with weakened consistency semantics.
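Sharding by primary key boils down to a deterministic function from key to database. A Python sketch under simple assumptions: a fixed, hypothetical `NUM_SHARDS`, and plain dictionaries standing in for the databases. A stable hash (SHA-1 here) is used instead of Python's built-in `hash()`, because `hash()` of a string is randomized per process and would send the same key to a different shard after a restart.

```python
import hashlib
from typing import Any, Dict, List

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(primary_key: Any, num_shards: int = NUM_SHARDS) -> int:
    """Map a primary key to a shard index with a hash that is stable across processes."""
    digest = hashlib.sha1(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each dict stands in for one database machine.
shards: List[Dict[Any, dict]] = [{} for _ in range(NUM_SHARDS)]

def put_user(user_id: int, row: dict) -> None:
    shards[shard_for(user_id)][user_id] = row

def get_user(user_id: int) -> dict:
    return shards[shard_for(user_id)][user_id]

put_user(2101, {"name": "example user"})
print(shard_for(2101), get_user(2101))
```

The weakness of plain modulo is that changing the number of shards moves most keys to a different machine, which is why larger systems often use consistent hashing or a lookup directory instead.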
7. The server sends back a HTML response
The Content-Encoding header tells the browser that the response body is compressed using the gzip algorithm.
In addition to compression, headers specify whether and how to cache the page, any cookies to set (none in this response), privacy information, etc.
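Compression is negotiated: the browser lists what it can decode in `Accept-Encoding`, and the server labels what it actually did in `Content-Encoding`. A Python sketch with the standard `gzip` module; the helper names are hypothetical and the `Accept-Encoding` check is deliberately simplistic (it ignores quality values such as `gzip;q=0`).

```python
import gzip
from typing import Dict, Tuple

def encode_body(html: str, accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    """Server side: compress the body only if the client said it can decode gzip."""
    body = html.encode("utf-8")
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if "gzip" in accept_encoding.lower():   # simplistic: ignores q-values
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))   # length of the bytes actually sent
    return headers, body

def decode_body(headers: Dict[str, str], body: bytes) -> str:
    """Browser side: undo the Content-Encoding before parsing the HTML."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")

page = "<html><body>" + "<p>hello</p>" * 200 + "</body></html>"
headers, body = encode_body(page, "gzip, deflate")
print(headers["Content-Encoding"], len(page), "->", len(body))
```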
8. The browser begins rendering the HTML
Even before the browser has received the entire HTML document, it begins rendering the website:
9. The browser sends requests for objects embedded in HTML
As the browser renders the HTML, it will notice tags that require fetching other URLs. The browser will send a GET request to retrieve each of these files, such as images and CSS/JS files.
Each of these URLs will go through a process similar to what the HTML page went through. So, the browser will look up the domain name in DNS, send a request to the URL, follow redirects, etc.
However, static files, unlike dynamic pages, allow the browser to cache them. Some of the files may be served from the cache without contacting the server at all. The browser knows how long to cache a particular file because the response that returned the file contained an Expires header. Additionally, each response may also contain an ETag header that works like a version number: when the browser requests a file it already has, it can send that ETag back in an If-None-Match header, and if the version has not changed, the server replies 304 Not Modified without resending the file.
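A sketch of the server side of this in Python. Assumptions: `serve_static` is a hypothetical handler, the ETag is derived from a hash of the file contents (a common choice, not a requirement), the one-day lifetime is arbitrary, and `If-None-Match` is compared as a single value even though the header may list several ETags. A matching ETag yields `304 Not Modified` with an empty body, so only headers cross the network.

```python
import hashlib
import time
from email.utils import formatdate
from typing import Dict, Optional, Tuple

ONE_DAY = 24 * 60 * 60  # arbitrary cache lifetime for this sketch

def make_etag(content: bytes) -> str:
    """Quoted version tag derived from the content, so it changes when the file does."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'

def serve_static(content: bytes, request_headers: Dict[str, str],
                 now: Optional[float] = None) -> Tuple[int, Dict[str, str], bytes]:
    now = time.time() if now is None else now
    headers = {
        "ETag": make_etag(content),
        # HTTP-date in GMT; the browser may reuse its copy until then without asking.
        "Expires": formatdate(now + ONE_DAY, usegmt=True),
    }
    if request_headers.get("If-None-Match") == headers["ETag"]:
        return 304, headers, b""   # browser's copy is current: send no body
    return 200, headers, content
```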
10. The browser sends further asynchronous (AJAX) requests
Long polling is an interesting technique to decrease the load on the server in these polling scenarios. If the server does not have any new messages when polled, it does not respond right away; it holds the request open. If a message for this client arrives within the timeout period, the server finds the outstanding request and returns the message in the response.
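The mechanism can be sketched with Python's `asyncio`. The `LongPollMailbox` class is hypothetical and stands for one client's pending messages on the server: `poll` is what the handler for an AJAX request awaits, so the request stays open until either a message is delivered or the timeout expires, in which case the client would simply poll again.

```python
import asyncio
from typing import Optional, Tuple

class LongPollMailbox:
    """Hypothetical per-client message queue held by the server."""
    def __init__(self) -> None:
        self.queue: "asyncio.Queue[str]" = asyncio.Queue()

    async def poll(self, timeout: float) -> Optional[str]:
        """Handler for one AJAX poll: wait for a message, or give up after `timeout` seconds."""
        try:
            return await asyncio.wait_for(self.queue.get(), timeout)
        except asyncio.TimeoutError:
            return None   # empty response; the client immediately polls again

    def deliver(self, message: str) -> None:
        """Called when a message for this client arrives; wakes up a waiting poll."""
        self.queue.put_nowait(message)

async def demo() -> Tuple[Optional[str], Optional[str]]:
    box = LongPollMailbox()                   # created inside the running event loop
    nothing = await box.poll(timeout=0.05)    # no message yet: the request times out

    async def chat_partner() -> None:
        await asyncio.sleep(0.01)
        box.deliver("new chat message")

    sender = asyncio.create_task(chat_partner())
    message = await box.poll(timeout=1.0)     # held open until the message arrives
    await sender
    return nothing, message

print(asyncio.run(demo()))  # (None, 'new chat message')
```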
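The parse tree is a tree whose nodes are DOM elements and attributes. DOM is short for Document Object Model; it is the object representation of the HTML document and also the interface that HTML elements expose to the outside world (such as JavaScript). The root of the tree is the "Document" object. The DOM has an almost one-to-one relationship with the HTML markup.
Parsing algorithm
HTML cannot be parsed with the usual top-down or bottom-up parsers. The main reasons are:
The forgiving nature of the language.
HTML itself may be malformed, and browsers need traditional error-tolerance mechanisms to support the common cases of broken markup.
The parsing process is reentrant. For other languages the source does not change during parsing, but in HTML, dynamic code such as a document.write() call inside a script element can add content to the source, which means the parsing process actually modifies its own input.
Because the usual parsing techniques cannot be used, browsers create custom parsers for HTML. The parsing algorithm is described in detail in the HTML5 specification and consists of two main stages: tokenization and tree construction.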
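The two stages can be illustrated with a deliberately tiny Python sketch. It is not the HTML5 algorithm, which uses a tokenizer state machine and many insertion modes; `tokenize`, `build_tree` and the `VOID` set are hypothetical simplifications. It does show the split into tokenization and tree construction, plus one kind of the tolerance described above: a missing end tag is closed implicitly and a stray end tag is ignored instead of causing an error.

```python
import re
from typing import Iterable, Iterator, Tuple

# A start tag, an end tag, or a run of text (a lone '<' is treated as text).
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+|<)")
VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements that never have children

def tokenize(html: str) -> Iterator[Tuple[str, str]]:
    """Stage 1: turn the character stream into (kind, value) tokens."""
    for m in TOKEN_RE.finditer(html):
        if m.group(3) is not None:
            yield ("text", m.group(3))
        else:
            yield ("end" if m.group(1) else "start", m.group(2).lower())

def build_tree(tokens: Iterable[Tuple[str, str]]) -> dict:
    """Stage 2: build a DOM-like tree using a stack of open elements."""
    root = {"tag": "#document", "children": []}
    stack = [root]
    for kind, value in tokens:
        if kind == "text":
            stack[-1]["children"].append(value)
        elif kind == "start":
            node = {"tag": value, "children": []}
            stack[-1]["children"].append(node)
            if value not in VOID:
                stack.append(node)
        else:
            # Error recovery: close every element opened after the matching one;
            # an end tag with no matching open element is ignored.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i]["tag"] == value:
                    del stack[i:]
                    break
    return root

# Note the missing </b> and </html>: the parser still produces a sensible tree.
doc = build_tree(tokenize("<html><body><p>Hello<b>world</p><p>again</body>"))
```

The sketch parses a fixed string; in a real browser, `document.write()` can insert new characters into the tokenizer's input while parsing is in progress, which this model does not attempt.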