We have to find a compromise between stability, performance and features. Stability is the most important requirement, because the proxy server must run for a long time without crashing. This requires a particular style of programming: great attention will be paid to testing, and all exceptions must be caught.
The performance requirements are modest. Shweby must be able to handle the load of 20 PC clients, as in a classroom. Assuming that each client requests a web page every 10 seconds and that a web page contains 10 components, the generated load is 20 requests per second.
There is also an ICAP-enabled version of Squid, Squid ICAP, but our tests of this version show that it is not stable enough to be used in production. The developers of POESIA, an open source web filtering project, are experimenting with and debugging Squid ICAP. Nevertheless, maintaining this software is quite hard.
The reason is that Squid is a caching proxy, and Squid ICAP has kept the cache feature. It is designed with four processing points: request modification before the cache (reqmod precache), request modification after the cache (reqmod postcache), response modification before the cache (respmod precache), and response modification after the cache (respmod postcache).
Squid is designed as a single-threaded process. It uses one big poll() in order to detect I/O events on the open sockets to clients and servers. When data is available to be read on a socket, the socket is read and the callback function responsible for that socket is called.
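The following minimal sketch illustrates this event-driven style; it is not Squid's actual code, and the callback table and all names are invented for the example:

```c
/* Minimal sketch of a single-threaded poll() event loop with per-socket
 * callbacks, in the style described above.  Not Squid's actual code:
 * the callback table and all names here are invented for illustration. */
#include <poll.h>

#define MAX_FDS 1024

typedef void (*read_handler)(int fd);     /* called when fd is readable */

static struct pollfd fds[MAX_FDS];        /* sockets to clients and servers */
static read_handler  handlers[MAX_FDS];   /* one registered callback per slot */
static int           nfds;

void event_loop(void)
{
    for (;;) {
        /* One big poll over every open socket. */
        int ready = poll(fds, nfds, -1);
        if (ready < 0)
            continue;                     /* interrupted; retry */

        for (int i = 0; i < nfds; i++) {
            if (fds[i].revents & POLLIN)
                handlers[i](fds[i].fd);   /* data available: run the callback */
        }
    }
}
```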
A second way to code the proxy is to use threads. Each client connection starts one independent thread which handles all the I/O traffic requested by that client. This approach is simpler than Squid's callbacks, although it needs more care with concurrent access to shared data.
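A minimal sketch of this model, using POSIX threads (the names are invented for illustration; this is not Middleman's code):

```c
/* Minimal sketch of the thread-per-connection model, using POSIX threads.
 * Names are invented for illustration; this is not Middleman's code. */
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Handles all the client<->server traffic for one connection. */
static void *handle_client(void *arg)
{
    int client_fd = (int)(long)arg;

    /* ... read the request, contact the origin server, relay the reply ...
     * Any data shared between threads (counters, configuration, ...) must
     * be protected, e.g. with a pthread mutex. */

    close(client_fd);
    return NULL;
}

void accept_loop(int listen_fd)
{
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;

        pthread_t tid;
        if (pthread_create(&tid, NULL, handle_client,
                           (void *)(long)client_fd) == 0)
            pthread_detach(tid);          /* no join needed: thread cleans up */
        else
            close(client_fd);
    }
}
```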
We found a multithreaded HTTP proxy suitable to be adapted for ICAP support: Middleman. Middleman is an advanced HTTP/1.1 proxy server with features designed to increase privacy and remove unwanted content. It was written in C by Jason Mclaughlin and distributed under the terms of the GPL licence. As this proxy has reached the production/stable development level, we decided to patch it for ICAP support.
Connection type | Protocols | Description |
---|---|---|
Non persistent | HTTP/1.0 (default behaviour) | The TCP connection is closed after the end of the reply body |
Persistent, without pipelining | HTTP/1.0 with "Connection: keep-alive", HTTP/1.1 (default behaviour) | Many URLs can be fetched over the same TCP connection. Requests are sent one by one, each after the reception of the previous reply |
Persistent, with pipelining | HTTP/1.1 | Many URLs can be fetched over the same TCP connection. Requests are sent together, and the replies are returned together. |
The ICAP protocol supports persistent connections, but without pipelining; pipelining is not mentioned in the ICAP RFC. Therefore we will not implement pipelining in Shweby.
In the case of a POST, the client "posts" some data to the server, and the server needs to know the length of the request body. Since the client cannot simply close the connection to signal the end of the body (it still needs to receive the reply), it must either send a "Content-Length" header or use chunked transfer encoding (announced by the "Transfer-Encoding" header). The server must do the same for its reply if the connection is persistent. Note that chunking can only be used in HTTP/1.1.
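As an illustration of these rules, here is a simplified sketch of the receiver-side logic for deciding how a message body is delimited (invented names, not the actual Shweby code):

```c
/* Sketch of how a receiver decides how an HTTP message body is delimited,
 * based on the rules above.  Simplified illustration, not Shweby's code. */
#include <strings.h>            /* strcasecmp */

enum framing {
    FRAMING_NONE,               /* no body at all                              */
    FRAMING_CHUNKED,            /* Transfer-Encoding: chunked (HTTP/1.1 only)  */
    FRAMING_CONTENT_LENGTH,     /* read exactly Content-Length bytes           */
    FRAMING_UNTIL_CLOSE         /* reply body ends when the connection closes  */
};

/* transfer_encoding / content_length are the header values, or NULL if
 * the corresponding header is absent. */
enum framing body_framing(const char *transfer_encoding,
                          const char *content_length,
                          int is_request)
{
    if (transfer_encoding && strcasecmp(transfer_encoding, "chunked") == 0)
        return FRAMING_CHUNKED;
    if (content_length)
        return FRAMING_CONTENT_LENGTH;
    /* A request body cannot be delimited by closing the connection;
     * a reply body can (non-persistent connection). */
    return is_request ? FRAMING_NONE : FRAMING_UNTIL_CLOSE;
}
```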
In order to validate the HTTP communication, we have developed a series of tests covering all valid combinations of protocol version, method, Connection header and chunking usage. These combinations are listed in the table below ("CL" indicates that a Content-Length header is sent, and "Connection (S&C)" gives the value of the Connection header sent by both server and client):
# | method | version | comment | Connection (S&C) | client CL | server CL | client chunked | server chunked |
---|---|---|---|---|---|---|---|---|
0 | GET | 1.0 | closed by default | | 0 | 0 | 0 | 0 |
1 | GET | 1.0 | closed by header | close | 0 | 0 | 0 | 0 |
2 | GET | 1.0 | closed with useless CL | close | 0 | 1 | 0 | 0 |
3 | GET | 1.0 | keep-alive | keep-alive | 0 | 1 | 0 | 0 |
4 | GET | 1.1 | keep-alive by default | | 0 | 1 | 0 | 0 |
5 | GET | 1.1 | closed | close | 0 | 0 | 0 | 0 |
6 | GET | 1.1 | closed with useless CL | close | 0 | 1 | 0 | 0 |
7 | GET | 1.1 | keep-alive by header | keep-alive | 0 | 1 | 0 | 0 |
8 | GET | 1.1 | chunked, keep-alive by default | | 0 | 0 | 0 | 1 |
9 | GET | 1.1 | chunked, closed | close | 0 | 0 | 0 | 1 |
10 | GET | 1.1 | chunked, keep-alive by header | keep-alive | 0 | 0 | 0 | 1 |
11 | POST | 1.0 | closed by default | | 1 | 0 | 0 | 0 |
12 | POST | 1.0 | closed by header | close | 1 | 0 | 0 | 0 |
13 | POST | 1.0 | closed with useless CL | close | 1 | 1 | 0 | 0 |
14 | POST | 1.0 | keep-alive | keep-alive | 1 | 1 | 0 | 0 |
15 | POST | 1.1 | keep-alive by default | | 1 | 1 | 0 | 0 |
16 | POST | 1.1 | closed | close | 1 | 0 | 0 | 0 |
17 | POST | 1.1 | closed with useless CL | close | 1 | 1 | 0 | 0 |
18 | POST | 1.1 | keep-alive by header | keep-alive | 1 | 1 | 0 | 0 |
19 | POST | 1.1 | chunked, keep-alive by default | | 0 | 0 | 1 | 1 |
20 | POST | 1.1 | chunked, closed | close | 0 | 0 | 1 | 1 |
21 | POST | 1.1 | chunked, keep-alive by header | keep-alive | 0 | 0 | 1 | 1 |
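To show how such a matrix can be driven programmatically, here is a minimal sketch that encodes a few rows of the table as C data; the struct and field names are hypothetical and this is not the actual Shweby test harness:

```c
/* Hypothetical encoding of a few rows of the test matrix above.
 * Illustrative sketch only, not the actual Shweby test harness. */
#include <stdio.h>

struct http_test {
    int         num;            /* test number (column "#")           */
    const char *method;         /* GET or POST                        */
    const char *version;        /* "1.0" or "1.1"                     */
    const char *connection;     /* Connection header value, or NULL   */
    int         client_cl;      /* client sends Content-Length        */
    int         server_cl;      /* server sends Content-Length        */
    int         client_chunked; /* client uses chunked encoding       */
    int         server_chunked; /* server uses chunked encoding       */
};

static const struct http_test tests[] = {
    {  0, "GET",  "1.0", NULL,         0, 0, 0, 0 },
    {  3, "GET",  "1.0", "keep-alive", 0, 1, 0, 0 },
    {  8, "GET",  "1.1", NULL,         0, 0, 0, 1 },
    { 14, "POST", "1.0", "keep-alive", 1, 1, 0, 0 },
    { 21, "POST", "1.1", "keep-alive", 0, 0, 1, 1 },
};

int main(void)
{
    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++)
        printf("test %d: %s HTTP/%s\n",
               tests[i].num, tests[i].method, tests[i].version);
    return 0;
}
```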
If we add 100 to the test number, the data is sent in small fragments of a few bytes (typically 5 or 10 bytes), with a delay of 100 ms between fragments. The receiver (i.e. the proxy) therefore reads small pieces of data and must collect them before it can process them; for example, the proxy must gather the complete HTTP headers before it can know whether the connection is persistent.
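To cope with this, the proxy has to buffer the received bytes until the empty line that terminates the HTTP headers has been seen. A minimal sketch of such buffering on a blocking socket (invented names, not the actual Shweby code):

```c
/* Sketch of buffering fragmented input until the complete HTTP headers
 * (terminated by an empty line, "\r\n\r\n") have been received.
 * Invented names; not the actual Shweby code. */
#include <string.h>
#include <unistd.h>

#define HDR_MAX 8192

/* Returns 1 when the full header block is in buf; returns -1 on error,
 * on oversized headers, or if the peer closed the connection early.
 * *used is the number of bytes already accumulated in buf. */
int read_headers(int fd, char *buf, size_t *used)
{
    for (;;) {
        if (*used >= HDR_MAX - 1)
            return -1;                          /* headers too large */

        ssize_t n = read(fd, buf + *used, HDR_MAX - 1 - *used);
        if (n <= 0)
            return -1;                          /* error or connection closed */

        *used += (size_t)n;
        buf[*used] = '\0';

        if (strstr(buf, "\r\n\r\n"))            /* empty line: headers complete */
            return 1;
        /* otherwise: only a fragment arrived, keep reading */
    }
}
```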