How the web works

The Web is a subset of the Internet.

The Internet encompasses all communication between networked computers, over any application, port, or protocol. The Web generally refers to the part of it where computers communicate over the Hypertext Transfer Protocol (HTTP), serving HyperText Markup Language (HTML) documents and related assets such as images, stylesheets, and scripts.

Together, these technologies make up web sites and web applications, which are consumed by web browsers and, through Application Programming Interfaces (APIs), by other programs.
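
To make "communicating over HTTP" concrete, here is a minimal sketch of what a simple request and response look like as text. The host name, the headers chosen, and the canned response are made-up sample data, and real browsers send many more headers than this.

```python
# Build a minimal HTTP/1.1 request by hand and parse a canned response.
# The host name and SAMPLE_RESPONSE are illustrative sample data only.

def build_request(host, path="/"):
    """Return the text a simple client might send for a GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: text/html",
        "Connection: close",
        "",  # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw HTTP response into (status code, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Cache-Control: max-age=300\r\n"
    "\r\n"
    "<!doctype html><title>Hello</title>"
)

print(build_request("example.com"))
status, headers, body = parse_response(SAMPLE_RESPONSE)
print(status, headers["content-type"])
```

The response body here is the HTML itself; the headers travel alongside it and describe how the browser should treat it.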

When a user requests a web page in a web browser, the process typically looks like this:

(Image from easy engine’s request cycle)

  • Search, Social, or Direct Access: If the user types a search phrase into an address bar that doubles as a search bar, the browser sends the query to the default search engine.
  • DNS Lookup: If the user types in an exact web address, or clicks a link from a social web site or search engine, a Domain Name System (DNS) lookup determines the IP address of the computer that should respond to the request.
    • DNS results can be overridden locally with a hosts file.
    • Some DNS providers also act as a proxy in front of the site and can intercept, modify, and optimize content. For example, Cloudflare can add SSL, optimize HTML, and optimize images.
  • Connection & Headers: The browser opens either an unencrypted HTTP connection on default port 80 or an encrypted HTTPS connection on default port 443. It then sends request headers and receives response headers, both of which are viewable in the Web Inspector.
    • A web server doesn’t have to be running on default ports. For example, many development servers might load websites at localhost:8080 or any other arbitrary port.
  • Initial HTML & AJAX: For the first request to a page, the server most likely returns HTML. This HTML usually references many other files, sometimes hundreds, which the browser then requests and processes.
    • Initial HTML and AJAX requests, in our case, are typically handled by a Content Management System such as WordPress.
      • Each dynamic request is typically handled by its own server-side process (or a worker from a pool) for the duration of the request. One page might involve hundreds of HTTP requests!
        • Content Management Systems, such as WordPress, typically not only run a programming language process but also query a database for content. This might be MySQL, SQLite, MongoDB (NoSQL), or many others.
      • If WordPress is not in use, PHP might still be in use with another PHP-based CMS or a framework such as Laravel.
      • If PHP is not in use, the site could be built with any number of other technologies: static files (plain HTML or JavaScript), a Node.js framework such as AdonisJS, or another language such as Ruby, Java, Python, or C# (Microsoft ASP.NET).
  • Static Assets: If the requested file is a static asset, such as CSS, JavaScript, or an image, a PHP process is typically not spun up (though PHP can serve these files). Instead, static assets are typically served directly by the web server: Apache, Nginx, a combination of the two, or another server such as Lighttpd or ASP.NET Core’s Kestrel.
    • Serving static assets without PHP is faster than spinning up a PHP process, often by an order of magnitude.
    • Nginx is usually faster than Apache at serving static files. For this reason, they are sometimes used together, with Nginx as a front-end proxy.
    • Static assets are sometimes offloaded to another server entirely, for example a CDN such as Cloudflare CDN, Jetpack, or jsDelivr.
    • Each static asset (JavaScript, CSS, or image) requires processing by the browser.
      • This processor utilization can be seen in the Web Inspector and in browser metric tools such as Lighthouse. We cover it in detail in a dedicated course on improving website performance.
      • If CSS, JavaScript, or images are too large, users may experience a Flash of Unstyled Content (FOUC).
      • For this reason, Lazy Loading (Jetpack) might be implemented for images, critical CSS might be inlined into the HTML head, and CSS or JavaScript may be minified or combined to optimize transfer speed and processing time.
  • User Interaction: Once a web page is loaded, the user starts to interact with it. From here, simple web pages might link to other pages, re-initiating the entire cycle. If JavaScript is in use, it might update page content without reloading the initial HTML. Examples include React, Decoupled WordPress (a website that uses a different framework or library as its front end and talks to WordPress, acting as the content management system, through its REST API), and plugins like Flying Pages.
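
The DNS lookup and port selection steps in the list above can be sketched in a few lines. This is a minimal illustration and not how any particular browser is implemented; the function names connection_target and resolve, and the example URLs, are made up for this sketch.

```python
import socket
from urllib.parse import urlsplit

# Default ports named in the text: 80 for HTTP, 443 for HTTPS.
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_target(url):
    """Return (scheme, host, port) for a URL, applying the default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # An explicit port (for example localhost:8080) overrides the default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return scheme, parts.hostname, port


def resolve(host, port):
    """Ask the operating system's resolver for the host's IP addresses.

    On most systems the resolver consults the hosts file as well as DNS,
    which is why a hosts file entry can override a DNS answer.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


for url in ("http://example.com/", "https://example.com/blog", "http://localhost:8080/"):
    print(url, "->", connection_target(url))
```

Calling resolve("localhost", 8080) usually returns a loopback address such as 127.0.0.1, though the exact result depends on the machine's configuration.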

Caching: It’s important to note that caching may take effect at almost every step of the request process. The browser has a cache. DNS may have a cache (Cloudflare), the CMS may have a cache, and there may be various application-level caches (WordPress wp_object_cache, WordPress Transients API, or plugins such as W3 Total Cache) or server-level caches, such as Redis, Varnish, or Nginx.
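
Most of these caches share one idea: store an expensive result under a key and reuse it until it expires. The sketch below shows that idea with a small in-memory cache. It is only an analogy for the layers named above, not the WordPress, Redis, or Varnish API, and TTLCache and render_page are hypothetical names.

```python
import time


class TTLCache:
    """A tiny in-memory cache whose entries expire after a fixed lifetime."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive work
        value = compute()  # cache miss or expired entry
        self._store[key] = (now + self.ttl, value)
        return value


render_count = 0


def render_page():
    """Stand-in for an expensive dynamic page render (hypothetical)."""
    global render_count
    render_count += 1
    return f"<html>render #{render_count}</html>"


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_compute("/about", render_page)
second = cache.get_or_compute("/about", render_page)
print(first == second, render_count)  # prints: True 1
```

The second call never reaches render_page, which is the saving a page cache or object cache provides on a real site.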

Understanding which processes are responsible for which portions of a web page load should enable you to interpret graphs such as the Waterfall Chart provided by the Web Inspector or by sites like webpagetest.org.
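
As a rough illustration of what a Waterfall Chart encodes, the sketch below draws one bar per request, offset by its start time. The file names and timings are made-up sample data, not measurements, and real tools break each bar down further (DNS, connection, waiting, download).

```python
# Made-up timings in milliseconds from navigation start: (name, start, end).
REQUESTS = [
    ("index.html", 0, 180),
    ("style.css", 190, 260),
    ("app.js", 190, 420),
    ("hero.jpg", 270, 600),
]


def render_waterfall(requests, ms_per_char=20):
    """Return one text row per request: name, then a bar offset by start time."""
    name_width = max(len(name) for name, _, _ in requests)
    rows = []
    for name, start, end in requests:
        offset = " " * (start // ms_per_char)
        bar = "#" * max(1, (end - start) // ms_per_char)
        rows.append(f"{name.ljust(name_width)} |{offset}{bar}")
    return rows


def total_load_time(requests):
    """Time from the earliest request start to the latest request end."""
    return max(end for _, _, end in requests) - min(start for _, start, _ in requests)


print("\n".join(render_waterfall(REQUESTS)))
print("total:", total_load_time(REQUESTS), "ms")
```

In this sample, the stylesheet and script cannot start until the initial HTML has arrived, which is the dependency pattern a real waterfall makes visible.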