In this article, I will present the basics of HTTP.
But why HTTP?
Why should I read about HTTP, you may ask yourself?
Well, if you are a software developer, you will understand how to write better applications by learning how they communicate. If you are a system architect or network admin, you will gain deeper knowledge for designing complicated network architectures.
REST, a very important architectural style nowadays, relies completely on HTTP features, which makes HTTP even more important to understand. If you want to make great RESTful applications, you must understand HTTP first.
So are you willing to pass on the chance to understand and learn the fundamental concepts behind the World Wide Web and network communication?
I hope not 🙂
The focus of the article will be on explaining the most important parts of HTTP as simply as humanly possible. The idea is to organize all the useful information about HTTP in one place, to save you the time of going through books and RFCs to find the information you need.
This is the first article of the HTTP series. It will give you a short introduction to the most important concepts of HTTP.
- The HTTP series (Part 1): Overview of the basic concepts
- The HTTP series (Part 2): Architectural aspects
- The HTTP series (Part 3): Client identification
- The HTTP series (Part 4): Authentication mechanisms
- The HTTP series (Part 5): Security
- The HTTP Reference
You will learn about:
- What HTTP is exactly
- How messages are exchanged between a Web client and a Web server
- Messages and some message examples
- MIME types
- Request Methods
- Status codes
Without further ado, let’s dive in.
HTTP was created by Tim Berners-Lee (the guy also considered to be the inventor of the World Wide Web). Another name important to the development of HTTP is Roy Fielding, who is also the originator of the REST architectural style.
The Hypertext Transfer Protocol is the protocol that applications use to communicate with each other. In essence, HTTP is in charge of transferring all of the internet's media files between clients and servers. That includes HTML, images, text files, movies and everything in between. And it does this quickly and reliably.
HTTP is an application protocol, not a transport protocol, because it is used for communication in the application layer. To jog your memory, here is what the network stack looks like.
From this image, you can clearly see that HTTP is the application protocol and that TCP works on the transport layer.
Everything on the internet is a resource, and HTTP works with resources. That includes files, streams, services and everything else. An HTML page is a resource, a YouTube video is a resource, your spreadsheet of daily tasks on a web application is a resource… You get the point.
And how do you differentiate one resource from another?
By giving them URLs (Uniform Resource Locators).
A URL points to the unique location where your browser can find the resource.
Every piece of content, every resource, lives on some Web server (HTTP server). These servers wait for HTTP requests and provide those resources in response.
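To make the parts of a URL a bit more concrete, here is a minimal sketch that splits one with Python's standard `urllib.parse` module. The URL itself is made up for illustration and does not come from this article.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only to show the individual components.
url = "https://www.example.com:8080/articles/http-basics?page=2#messages"
parts = urlsplit(url)

print(parts.scheme)    # protocol used to reach the resource
print(parts.hostname)  # server that hosts the resource
print(parts.port)      # port the server listens on
print(parts.path)      # location of the resource on that server
print(parts.query)     # extra parameters sent to the server
print(parts.fragment)  # position inside the resource (handled by the client)
```

The scheme, host and path together are what lets a client find the server and ask it for one specific resource.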
But how do you request a resource from a Web server?
You need an HTTP client of course 🙂
You are using an HTTP client right now to read this article. Web browsers are HTTP clients. They communicate with HTTP servers to retrieve resources to your computer. Some of the most popular clients are Google’s Chrome, Mozilla’s Firefox, Opera, Apple’s Safari, and, unfortunately, the still infamous Internet Explorer.
So what does an HTTP message look like?
Without talking too much about it, here are some examples of HTTP messages:
GET /repos/CodeMazeBlog/ConsumeRestfulApisExamples HTTP/1.1
Authorization: Basic dGhhbmtzIEhhcmFsZCBSb21iYXV0LCBtdWNoIGFwcHJlY2lhdGVk
POST /repos/CodeMazeBlog/ConsumeRestfulApisExamples/hooks?access_token=5643f4128a9cf974517346b2158d04c8aa7ad45f HTTP/1.1
Here is an example of one GET and one POST request. Let’s go quickly through the different parts of these requests.
The first line of the request is reserved for the request line. It consists of the request method name, the request URI, and the HTTP version.
The next few lines represent the request headers. Request headers provide additional info about the request, like the content types the client expects in the response, authorization information, etc.
For the GET request, the story ends right there. A POST request can also have a body and carry additional info in the form of a body message. In this case, it is a JSON message with additional info on how the GitHub webhook should be created for the repo specified in the URI. That message is required for the webhook creation, so we are using a POST request to provide that information to the GitHub API.
The request line and each request header must be followed by <CR><LF> (carriage return and line feed, \r\n), and a single empty line that contains only CRLF separates the message headers from the message body.
Reference for HTTP request: https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
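The layout described above (request line, headers, empty line, optional body) can be sketched in a few lines of Python. This is only an illustration of the wire format, not a real client: the `build_request` helper, the `Host` value and the JSON payload are assumptions made up for this example, and only the request path is taken from the POST example above.

```python
def build_request(method, target, headers, body=b""):
    """Assemble a raw HTTP/1.1 request (hypothetical helper for illustration)."""
    lines = [f"{method} {target} HTTP/1.1"]            # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; one extra CRLF marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

body = b'{"example": "payload"}'  # placeholder, not the real webhook JSON
raw = build_request(
    "POST",
    "/repos/CodeMazeBlog/ConsumeRestfulApisExamples/hooks",
    {
        "Host": "api.github.com",  # assumed host
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
    },
    body,
)
print(raw.decode("ascii"))
```

For a GET request you would simply leave the body empty, and the message would end right after the blank line.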
And what do we get as a response to these requests?
HTTP/1.1 200 OK
Date: Sun, 18 Jun 2017 13:10:41 GMT
Content-Type: application/json; charset=utf-8
Status: 200 OK
Cache-Control: private, max-age=60, s-maxage=60
"message": "Invalid HTTP Response: 404"
The response message is structured pretty much the same as the request, except that the first line is called the status line, which, surprising as it is, carries information about the response status. 🙂
The status line is followed by the response headers and response body.
Reference for HTTP response: https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html
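As a rough sketch of how a client might take such a response apart, the snippet below splits a raw message into the status line, the headers and the body. The `parse_response` function is a hypothetical helper, and the response bytes are a shortened, partly made-up version of the example above (the JSON body is a placeholder).

```python
# Shortened sample response; the body is a placeholder for illustration.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Sun, 18 Jun 2017 13:10:41 GMT\r\n"
    b"Content-Type: application/json; charset=utf-8\r\n"
    b"\r\n"
    b'{"ok": true}'
)

def parse_response(raw):
    """Split a raw HTTP/1.x response into its parts (simplified sketch)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # empty line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")     # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers["content-type"])
print(body)
```

A real client has to deal with much more (chunked bodies, repeated headers, and so on), so treat this only as a picture of the message layout.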
MIME types are used as a standardized way to describe file types on the internet. Your browser has a list of MIME types, and the same goes for web servers. That way, files can be transferred the same way regardless of the operating system.
A fun fact is that MIME stands for Multipurpose Internet Mail Extensions, because MIME types were originally developed for multimedia email. They have since been adapted for HTTP and several other protocols.
Every MIME type consists of a type, subtype and a list of optional parameters in the following format: type/subtype; optional parameters.
Here are a few examples:
Content-Type: text/xml; charset=utf-8
You can find the list of commonly used MIME types and subtypes in the HTTP reference.
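If you want to see the type/subtype/parameters structure in code, Python's standard library can pick a `Content-Type` value apart. The sketch below reuses the header from the example above. The file name passed to `mimetypes.guess_type` is made up.

```python
import mimetypes
from email.message import Message

# Reuse the Content-Type value from the example above.
msg = Message()
msg["Content-Type"] = "text/xml; charset=utf-8"

print(msg.get_content_maintype())  # the type
print(msg.get_content_subtype())   # the subtype
print(msg.get_param("charset"))    # an optional parameter

# Servers often map a file extension to a MIME type in a similar way.
guessed_type, _ = mimetypes.guess_type("index.html")  # hypothetical file name
print(guessed_type)
```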
HTTP request methods (also referred to as “verbs”) define the action that will be performed on the resource. HTTP defines several request methods, of which the most commonly used are GET and POST.
A request method can be idempotent or not idempotent. This is just a fancy term for saying that calling the method several times on the same resource has the same effect on the server as calling it once. The GET method, whose sole purpose is retrieving information, should therefore be idempotent: calling GET on the same resource over and over should not change that resource. GET is also a safe method, which means it should not modify anything on the server at all. The POST method, on the other hand, is not idempotent.
Prior to HTTP/1.1, there were just three methods: GET, POST and HEAD. The HTTP/1.1 specification brought a few more into play: OPTIONS, PUT, DELETE, TRACE and CONNECT.
Find out more about what each of these methods does in the HTTP Reference.
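Idempotency is easier to see with a toy example. The sketch below is not a real server: `TaskStore` is a hypothetical in-memory class whose methods only mimic how GET, PUT (one of the other HTTP methods) and POST are typically expected to behave.

```python
class TaskStore:
    """Hypothetical in-memory 'server' holding a list of tasks."""

    def __init__(self):
        self.tasks = {}
        self.next_id = 1

    def get(self, task_id):
        # GET-like: only reads, never changes the stored state.
        return self.tasks.get(task_id)

    def put(self, task_id, title):
        # PUT-like: creates or replaces the task at a known id.
        self.tasks[task_id] = title

    def post(self, title):
        # POST-like: every call creates a new task under a fresh id.
        task_id = self.next_id
        self.next_id += 1
        self.tasks[task_id] = title
        return task_id

put_store, post_store = TaskStore(), TaskStore()
for _ in range(3):
    put_store.put(1, "write article")   # repeated call, same end state
    put_store.get(1)                    # reading changes nothing
    post_store.post("write article")    # repeated call, state keeps growing

print(len(put_store.tasks))
print(len(post_store.tasks))
```

After three identical calls the PUT-like store still holds a single task, while the POST-like store holds three, which is exactly the difference between an idempotent and a non-idempotent method.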
Header fields are colon-separated name-value pairs that you can find just after the first line of a request or response message. They provide more context to HTTP messages and ensure clients and servers are appropriately informed about the nature of the request or response.
There are five types of headers in total:
- General headers: These headers are useful to both server and client. One good example is the Date header field, which provides information about the time the message was created.
- Request headers: Specific to request messages. They provide the server with additional information. For example, the Accept: */* header field informs the server that the client is willing to receive any media type.
- Response headers: Specific to response messages. They provide the client with additional information. For example, the Allow: GET, HEAD, PUT header field informs the client which methods are allowed for the requested resource.
- Entity headers: These headers deal with the entity body. For example, the Content-Type: text/html header lets the application know that the data is an HTML document.
- Extension headers: These are nonstandard headers constructed by application developers. They are not part of HTTP but need to be tolerated.
You can find the list of commonly used request and response headers in the HTTP Reference.
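To show what "colon-separated name-value fields" means in practice, here is a small sketch that parses a block of headers with `http.client.parse_headers` from the Python standard library. The header block is made up for illustration, and `X-Example-Trace` is a hypothetical extension header.

```python
import io
from http.client import parse_headers

# Made-up header block; the empty line (CRLF only) marks the end of the headers.
raw_headers = (
    b"Date: Sun, 18 Jun 2017 13:10:41 GMT\r\n"   # general header
    b"Content-Type: text/html\r\n"               # entity header
    b"X-Example-Trace: abc123\r\n"               # hypothetical extension header
    b"\r\n"
)

headers = parse_headers(io.BytesIO(raw_headers))

# Header names are case-insensitive, so any spelling finds the field.
print(headers["content-type"])
print(headers.get("X-EXAMPLE-TRACE"))
print(headers.get("Allow"))  # a header that is not present gives None
```

Note that only the first colon separates the name from the value, which is why the time in the Date header survives intact.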
The status code is a three-digit number that denotes the result of a request. It is followed by the reason phrase, which is a human-readable explanation of the status code.
Some examples include:
- 200 OK
- 404 Not Found
- 500 Internal Server Error
The status codes are classified by range into five groups: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error) and 5xx (server error).
Both status code classification and the entire list of status codes and their meaning can be found in the HTTP Reference.
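Since the group is determined by the first digit, classifying a status code takes very little code. The sketch below uses Python's built-in `http.HTTPStatus` enum to look up the reason phrase. The `describe` helper and the `STATUS_CLASSES` table are names invented for this example.

```python
from http import HTTPStatus

# The first digit of the status code decides the group.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe(code):
    """Return 'code reason-phrase (group)' for a standard status code."""
    group = STATUS_CLASSES[code // 100]
    reason = HTTPStatus(code).phrase
    return f"{code} {reason} ({group})"

for code in (200, 404, 500):
    print(describe(code))
```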
Phew, that was a lot of information.
The knowledge you gain by learning HTTP is not the kind that helps you solve a particular problem directly. But it gives you an understanding of the underlying principles of internet communication, which you can apply to almost any problem at a higher level than HTTP. Whether it is REST, APIs, web application development or networking, you can now be at least a bit more confident while solving these kinds of problems.
Of course, HTTP is a pretty large topic, and there is still a lot more to it than the basic concepts.
Was this article helpful to you? Please leave a comment and let me know.