Shweby Developer Documentation

Preliminary report

Riadh Elloumi <riadh@melix.net>
Fares Triki <triki@enst.fr>


Project targets

In this project, we plan to develop a stable and reliable HTTP proxy with ICAP support. The proxy can be tested and used by the Internet community, especially by people who want to control or modify HTTP traffic on the fly. We are cooperating with the POESIA project to test the first versions of Shweby.

We have to find a compromise between stability, performance and features. Stability is the most important requirement, because the proxy server must run for a long time without crashing. This calls for a careful programming style: great attention will be paid to testing, and all exceptions must be caught.

The performance requirements are modest. Shweby must handle the load of 20 client PCs, as in a classroom. Assuming that each client requests a web page every 10 seconds and that a web page contains 10 components, the generated load is 20 requests per second.
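The load estimate can be checked with a few lines of arithmetic. This is only a worked version of the figures stated above; the variable names are illustrative.

```python
# Load estimate from the figures given in the text.
clients = 20                 # PCs in the classroom
seconds_between_pages = 10   # each client requests one page every 10 s
components_per_page = 10     # HTTP requests needed to display one page

pages_per_second = clients / seconds_between_pages
requests_per_second = pages_per_second * components_per_page

print(pages_per_second, requests_per_second)
```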

Project Planning

We have fixed three deadlines:

Base of development

Because the licence we use is the GPL, we can base our development on existing free software. Many open source projects deal with HTTP proxies; the best known is Squid.

There is also an ICAP-enabled version of Squid, Squid ICAP, but tests show that it is not stable enough to be used in production. The developers of POESIA, an open source web filtering project, are experimenting with and debugging Squid ICAP. Nevertheless, maintaining this software is quite hard.

The reason is that Squid is a caching proxy, and Squid ICAP has kept the cache feature. It is therefore designed around four processing points:

  1. reqmod_precache: the request is modified by the ICAP server "on its way to the cache".
  2. reqmod_postcache: the request is modified by the ICAP server "on its way to the origin server".
  3. respmod_precache: the origin server's reply is modified before it is stored in the cache.
  4. respmod_postcache: the reply is modified by the ICAP server "on its way to the client".

To learn more about these processing points, see the ICAP RFC, §6.1.

Squid is designed as a single-threaded process. It uses one big poll loop to detect I/O events on the sockets opened to clients and servers. When data is available on a socket, the socket is read and the callback function responsible for that socket is called.
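The callback model described above can be sketched in a few lines. This is not Squid's code: it is a minimal illustration that uses Python's standard `selectors` module, with a `socketpair` standing in for a real client connection. The names `on_readable` and `received` are hypothetical.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
received = []

def on_readable(sock):
    """Callback attached to one socket: read what is available."""
    data = sock.recv(4096)
    if data:
        received.append(data)
    else:
        # Empty read means the peer closed: stop watching this socket.
        sel.unregister(sock)
        sock.close()

# A socketpair stands in for one client connected to the proxy.
client_side, proxy_side = socket.socketpair()
proxy_side.setblocking(False)
sel.register(proxy_side, selectors.EVENT_READ, on_readable)

client_side.sendall(b"GET / HTTP/1.0\r\n\r\n")
client_side.close()

# Single-threaded loop: wait for I/O events, then call the
# callback that was registered for the socket that became ready.
while sel.get_map():
    for key, _events in sel.select(timeout=1):
        key.data(key.fileobj)
sel.close()
```

Every socket shares the same loop, so a callback must never block; this is what makes the style harder to follow than straight-line code.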

A second way to write the proxy is to use threads. Each client connection starts an independent thread that handles all the I/O traffic requested by that client. This approach is simpler than Squid's callbacks, although it requires more care with concurrent access to shared data.
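For comparison, here is a minimal sketch of the thread-per-connection model. It is not taken from Middleman; `handle_client`, `hits` and `hits_lock` are hypothetical names, and a `socketpair` again stands in for each client connection. The shared counter shows why a lock is needed as soon as threads touch common data.

```python
import socket
import threading

hits = 0                      # shared between all threads
hits_lock = threading.Lock()  # protects the shared counter

def handle_client(conn):
    """Runs in its own thread and owns all I/O for one connection."""
    global hits
    with conn:
        request = conn.recv(4096)   # blocking is fine: only this thread waits
        with hits_lock:             # shared data needs explicit locking
            hits += 1
        conn.sendall(b"seen %d bytes" % len(request))

# Three simulated clients, one thread each.
pairs = [socket.socketpair() for _ in range(3)]
threads = [threading.Thread(target=handle_client, args=(proxy_side,))
           for _client_side, proxy_side in pairs]
for t in threads:
    t.start()

replies = []
for client_side, _proxy_side in pairs:
    client_side.sendall(b"GET / HTTP/1.0\r\n\r\n")
    replies.append(client_side.recv(4096))
    client_side.close()
for t in threads:
    t.join()
```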

We found a multithreaded HTTP proxy suitable for adaptation to ICAP support: Middleman. Middleman is an advanced HTTP/1.1 proxy server with features designed to increase privacy and remove unwanted content. It was written in C by Jason Mclaughlin and is distributed under the terms of the GPL licence. As this proxy has reached production/stable status, we decided to patch it for ICAP support.

HTTP communication

In the HTTP protocol, there are three types of connection:

Non-persistent
  Protocols:
  • HTTP/1.0 by default
  • HTTP/1.1 with "Connection: close" header
  Description: The TCP connection is closed after the end of the reply body.

Persistent, without pipelining
  Protocols:
  • HTTP/1.0 with "Connection: keep-alive" header
  • HTTP/1.1 by default
  Description: Many URLs can be fetched over the same TCP connection. Requests are sent one by one, each after the reception of the previous reply.

Persistent, with pipelining
  Protocols:
  • HTTP/1.1 only
  Description: Many URLs can be fetched over the same TCP connection. Requests are sent together, and replies are provided together.
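The rules for the first two connection types can be written as a small decision function. This is a sketch under a simplifying assumption: the "Connection" header carries a single token ("close" or "keep-alive"). The function name `is_persistent` is hypothetical.

```python
def is_persistent(version, connection_header=None):
    """Return True if the TCP connection stays open after the reply.

    version is "1.0" or "1.1"; connection_header is the value of the
    "Connection" header, or None when the header is absent.
    """
    token = (connection_header or "").strip().lower()
    if version == "1.1":
        # HTTP/1.1 is persistent unless "Connection: close" is sent.
        return token != "close"
    if version == "1.0":
        # HTTP/1.0 is closed unless "Connection: keep-alive" is sent.
        return token == "keep-alive"
    raise ValueError("unsupported HTTP version: %r" % version)
```

For example, `is_persistent("1.0")` is false, while `is_persistent("1.0", "keep-alive")` and `is_persistent("1.1")` are true.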

The ICAP protocol supports persistent connections without pipelining; pipelining is not mentioned in the ICAP RFC. We will therefore not implement pipelining in Shweby.

In the case of a POST, the client "posts" some data to the server, and the server needs to know the length of the request body. Because the client cannot close the connection to mark the end of the body (it still has to receive the reply), it must use either the "Content-Length" header or chunked transfer encoding (announced by the "Transfer-Encoding" header). The server must do the same for the reply body if the connection is persistent. Note that chunking can only be used with HTTP/1.1.
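To make the chunked mode concrete, the sketch below encodes and decodes a body in chunked transfer encoding: each chunk is preceded by its size in hexadecimal, and a zero-size chunk ends the body. It is a simplified illustration, not Shweby code: trailers are ignored, and the function names and the small default `chunk_size` are assumptions chosen for readability.

```python
def encode_chunked(body, chunk_size=8):
    """Encode body (bytes) using chunked transfer encoding."""
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        chunk = body[i:i + chunk_size]
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"          # last chunk, no trailers
    return bytes(out)

def decode_chunked(data):
    """Decode a complete chunked body; trailers are not handled."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # The size line may carry extensions after ';', which we drop.
        size = int(data[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2          # skip the chunk and its trailing CRLF

encoded = encode_chunked(b"name=shweby")
decoded = decode_chunked(encoded)
```

The receiver never needs the total length in advance, which is why this mode suits bodies produced on the fly over a persistent connection.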

To validate HTTP communication, we have developed a series of tests covering all valid combinations of protocol version, method, Connection header and chunking usage. These combinations are listed in the table below:

(CL = "Content-Length" header; 1 = used, 0 = not used)

 #  method  version  comment                         Connection (S&C)  client CL  server CL  client chunked  server chunked
--  ------  -------  ------------------------------  ----------------  ---------  ---------  --------------  --------------
 0  GET     1.0      closed by default                                 0          0          0               0
 1  GET     1.0      closed by header                close             0          0          0               0
 2  GET     1.0      closed with useless CL          close             0          1          0               0
 3  GET     1.0      keep-alive                      keep-alive        0          1          0               0
 4  GET     1.1      keep-alive by default                             0          1          0               0
 5  GET     1.1      closed                          close             0          0          0               0
 6  GET     1.1      closed with useless CL          close             0          1          0               0
 7  GET     1.1      keep-alive by header            keep-alive        0          1          0               0
 8  GET     1.1      chunked, keep-alive by default                    0          0          0               1
 9  GET     1.1      chunked, closed                 close             0          0          0               1
10  GET     1.1      chunked, keep-alive by header   keep-alive        0          0          0               1
--  ------  -------  ------------------------------  ----------------  ---------  ---------  --------------  --------------
11  POST    1.0      closed by default                                 1          0          0               0
12  POST    1.0      closed by header                close             1          0          0               0
13  POST    1.0      closed with useless CL          close             1          1          0               0
14  POST    1.0      keep-alive                      keep-alive        1          1          0               0
15  POST    1.1      keep-alive by default                             1          1          0               0
16  POST    1.1      closed                          close             1          0          0               0
17  POST    1.1      closed with useless CL          close             1          1          0               0
18  POST    1.1      keep-alive by header            keep-alive        1          1          0               0
19  POST    1.1      chunked, keep-alive by default                    0          0          1               1
20  POST    1.1      chunked, closed                 close             0          0          1               1
21  POST    1.1      chunked, keep-alive by header   keep-alive        0          0          1               1
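As an illustration of how one row of this table maps to bytes on the wire, the sketch below builds the client request for a given combination. It is not the actual test tool: the function name `build_request`, the `/test` path and the `Host: localhost` header are assumptions made for the example.

```python
def build_request(method, version, connection, use_cl, use_chunked, body=b""):
    """Build a raw client request from the client-side columns of one row.

    connection is "", "close" or "keep-alive"; use_cl and use_chunked
    correspond to the "client CL" and "client chunked" columns.
    """
    lines = ["%s /test HTTP/%s" % (method, version), "Host: localhost"]
    if connection:
        lines.append("Connection: %s" % connection)
    if use_cl:
        lines.append("Content-Length: %d" % len(body))
        payload = body
    elif use_chunked:
        lines.append("Transfer-Encoding: chunked")
        # One data chunk (if any), then the terminating zero-size chunk.
        payload = b"%x\r\n" % len(body) + body + b"\r\n" if body else b""
        payload += b"0\r\n\r\n"
    else:
        payload = b""            # no request body, e.g. the GET rows
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + payload

# Row 0: GET, HTTP/1.0, no Connection header, no body.
request_0 = build_request("GET", "1.0", "", False, False)
# Row 19: POST, HTTP/1.1, no Connection header, chunked body.
request_19 = build_request("POST", "1.1", "", False, True, b"a=1")
```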

If 100 is added to the test number (for example, 103 instead of 3), the data is sent in small fragments of a few bytes (typically 5 or 10 bytes), with a delay of 100 ms between fragments. This means that the receiver (i.e. the proxy) reads small pieces of data and must collect them before it can process them. For example, the proxy must collect all the HTTP headers before it can know whether the connection is persistent.
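The collecting step can be sketched as a small buffer that accumulates fragments until the blank line ending the HTTP headers has arrived. This is an illustration only, not Shweby's implementation; the class name `HeaderCollector` and the sample request are assumptions.

```python
class HeaderCollector:
    """Accumulate fragments until the end of the HTTP headers is seen."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, fragment):
        """Add one fragment.

        Returns None while the headers are incomplete, otherwise a
        (headers, leftover) pair, where leftover is the start of the body.
        """
        self.buffer += fragment
        end = self.buffer.find(b"\r\n\r\n")
        if end < 0:
            return None
        return bytes(self.buffer[:end]), bytes(self.buffer[end + 4:])

# Simulate a fragmented test: the request arrives 5 bytes at a time.
raw = b"GET / HTTP/1.1\r\nHost: example.org\r\nConnection: close\r\n\r\n"
collector = HeaderCollector()
result = None
incomplete_reads = 0
for i in range(0, len(raw), 5):
    result = collector.feed(raw[i:i + 5])
    if result is not None:
        break
    incomplete_reads += 1
headers, leftover = result
```

Only once `feed` returns a pair can the proxy inspect the "Connection" header and decide whether the connection is persistent.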