Design of the proxy server

One big advantage of developing under the GNU GPL licence is that we can start from existing source code distributed under the same licence. In this project, called Shweby, we extended Middleman with ICAP support. Middleman [6] is an open proxy server with features designed to increase privacy and remove unwanted content. For example, Middleman lets the user block unwanted content such as banners, remove cookies from request headers, filter delivered content with simple scripts, etc. The first step in developing Shweby was to remove the features we did not need and to keep the ones we did, such as caching and transparent proxying.

Description of Middleman

Multithreading

Middleman's design is based on multithreading. There is no thread pool in Middleman; threads are created when clients connect to the proxy and are destroyed when clients disconnect. Each thread serves one client connection. The proxy listens on a configurable set of ports. When a client makes a TCP connection to the proxy, the socket returned by accept is recorded in a structure called connection. The connection is passed as an attribute to a newly created POSIX thread. This thread serves the client requests arriving on that TCP connection.
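The thread-per-connection model described above can be sketched as follows. This is a minimal Python illustration, not Middleman's C code: the Connection class only mirrors the idea of the connection structure, and the names serve_client and accept_loop, as well as the echo behaviour, are assumptions made for the example.

```python
import socket
import threading
from dataclasses import dataclass


@dataclass
class Connection:
    """Per-client state, loosely modelled on the 'connection' structure."""
    client_sock: socket.socket
    client_addr: tuple


def serve_client(conn: Connection) -> None:
    """Runs in its own thread and serves one client until it disconnects."""
    with conn.client_sock:
        while True:
            data = conn.client_sock.recv(4096)
            if not data:  # the client closed the connection
                break
            # Placeholder for real request handling: echo the bytes back.
            conn.client_sock.sendall(data)
    # Returning ends the thread, so it disappears together with the client.


def accept_loop(listener: socket.socket) -> None:
    """Accepts clients forever: one new thread per accepted socket, no pool."""
    while True:
        sock, addr = listener.accept()
        conn = Connection(client_sock=sock, client_addr=addr)
        threading.Thread(target=serve_client, args=(conn,), daemon=True).start()
```

Creating a thread per client keeps the code simple at the cost of thread creation on every connection, which is the trade-off a thread pool would avoid.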

Caching

Middleman implements HTTP caching [7]. HTTP caching relies on two mechanisms: an expiration model and a validation model. When a client requests a cached file, the proxy must check whether the cached file has expired. If it has not, the proxy delivers the cached file to the client as the response. If it has, the proxy must send a conditional request to the origin web server (or to the next caching proxy). The conditional request is an HTTP request carrying the header "If-Modified-Since" followed by the time at which the proxy downloaded the file. If the file has been modified since that time, the server replies with a fresh copy and the proxy updates its cached copy; otherwise the server responds with a special status code (usually 304 (Not Modified)) with no body, and the proxy revalidates its cache entry.
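The two mechanisms can be illustrated with a small Python sketch. It is not Middleman's implementation: the CacheEntry fields, the function names and the lifetime parameter (how long a validated entry stays fresh) are assumptions, and only the 200 and 304 answers are handled.

```python
from dataclasses import dataclass
from email.utils import formatdate


@dataclass
class CacheEntry:
    body: bytes
    fetched_at: float  # Unix time when the proxy downloaded the file
    expires_at: float  # Unix time after which the entry is expired


def is_fresh(entry: CacheEntry, now: float) -> bool:
    """Expiration model: a fresh entry can be served without asking the server."""
    return now < entry.expires_at


def conditional_headers(entry: CacheEntry) -> dict:
    """Validation model: header added to the request for an expired entry."""
    return {"If-Modified-Since": formatdate(entry.fetched_at, usegmt=True)}


def apply_validation(entry: CacheEntry, status: int, body: bytes,
                     now: float, lifetime: float) -> CacheEntry:
    """Update the cache entry from the server's answer to the conditional request."""
    if status == 304:
        # Not Modified: keep the cached body and extend its lifetime.
        return CacheEntry(entry.body, entry.fetched_at, now + lifetime)
    # Otherwise the server sent a fresh copy that replaces the cached one.
    return CacheEntry(body, now, now + lifetime)
```

A caller would serve entry.body directly while is_fresh holds, and otherwise send the request with conditional_headers and pass the answer to apply_validation.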

Configuration

Middleman reads its configuration at runtime from an XML file. The structure of the XML file is fully explained in README.html. Middleman can also be configured through its web interface: we simply request the URL http://mman in a web browser connected to Middleman. We have changed this URL to http://shweby, but URLs of this kind are unlikely to be accepted by system administrators because they break the addressing plan of the local network. The URL should be changed to the address of the host machine.

Middleman uses an embedded XML parser coded in src/xml.c.

Upgrading Middleman to Shweby

ICAP implementation

The ICAP RFC [2] introduces the concept of "ICAP services". An ICAP service is a specific resource of an ICAP server. It is identified by its URI (Uniform Resource Identifier), which is composed of the protocol name (icap), the name of the server and the path of the service. Here is an example of an ICAP service URI:

icap://icap.net/services/translate?lang=fr
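As an illustration, such a URI can be split into its components with Python's standard library. The function name parse_icap_uri and the shape of the returned dictionary are assumptions made for this sketch; the default port 1344 is the one defined for ICAP in RFC 3507.

```python
from urllib.parse import urlsplit, parse_qs

ICAP_DEFAULT_PORT = 1344  # default ICAP port defined in RFC 3507


def parse_icap_uri(uri: str) -> dict:
    """Split an ICAP service URI into server, port, path and parameters."""
    parts = urlsplit(uri)
    if parts.scheme != "icap":
        raise ValueError(f"not an ICAP URI: {uri!r}")
    return {
        "server": parts.hostname,
        "port": parts.port or ICAP_DEFAULT_PORT,
        "path": parts.path,
        "params": parse_qs(parts.query),
    }


service = parse_icap_uri("icap://icap.net/services/translate?lang=fr")
```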

When we send an ICAP request to an ICAP service, we request a specific value-added service on the encapsulated HTTP message. This service can take place at different points of the HTTP transaction, called vectoring points (for example, on a response before or after it is stored in the cache).

We can call one or more ICAP services at each vectoring point, in a defined order. ICAP services are grouped into ICAP classes. An ICAP class includes one or more ICAP services with their associated vectoring points, and defines a global value-added service for the HTTP session. For example, we define an ICAP class called "kids-filter" in table 2.1.


Table 2.1: Structure of the ICAP class "kids-filter".


In the ICAP class "kids-filter", there is a filter service for requests, a filter service for responses and a virus-scanning service for responses. Virus scanning is performed before the file is stored in the cache, so the cache will not be infected. However, the cache may still contain pornographic images, because response filtering is done after caching.
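One way to picture an ICAP class is as a mapping from vectoring points to ordered lists of services. The sketch below only restates the paragraph above in Python: the vectoring point names, the IcapClass type and the icap.example URIs are hypothetical and are not taken from Shweby or from table 2.1.

```python
from dataclasses import dataclass, field

# Hypothetical vectoring-point names, chosen for this sketch only.
REQUEST = "request"
RESPONSE_BEFORE_CACHE = "response-before-cache"
RESPONSE_AFTER_CACHE = "response-after-cache"


@dataclass
class IcapClass:
    name: str
    # vectoring point -> ICAP service URIs, called in list order
    services: dict = field(default_factory=dict)

    def services_at(self, point: str) -> list:
        """Services to call at a vectoring point (empty list if none)."""
        return self.services.get(point, [])


# Hypothetical service URIs standing in for the three services of the class.
kids_filter = IcapClass(
    name="kids-filter",
    services={
        REQUEST: ["icap://icap.example/filter-request"],
        RESPONSE_BEFORE_CACHE: ["icap://icap.example/virus-scan"],
        RESPONSE_AFTER_CACHE: ["icap://icap.example/filter-response"],
    },
)
```

With this layout, the order of the URIs in each list is the order in which the services are called at that point.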

When a client connects to the proxy, the selected ICAP class depends on the client IP address, the proxy port to which the client is connected and the access rules stored in the XML configuration file. The selection of the ICAP class is coded in src/main.c.

The entity of the proxy that communicates with an ICAP service is called an ICAP client. ICAP clients are stored in structures of type ICAP_CLIENT and are coded in src/icap.c. The connection holds references to these structures in connection->icap.

Adding gzip encoding and decoding

In order to perform content adaptation, ICAP servers must systematically decompress any compressed file before analysing it, and this consumes CPU resources. On the other hand, requesting compressed content from the Internet saves access bandwidth. It is therefore worthwhile for the proxy to request compressed content and to decompress it before sending it to the ICAP servers.
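A minimal sketch of this step in Python, using the standard gzip module. The function names and the representation of headers as a plain dictionary with canonical capitalisation are assumptions; Shweby itself is written in C and its actual code is not shown here.

```python
import gzip


def decode_body(headers: dict, body: bytes):
    """Return (headers, body) with a gzip Content-Encoding removed, if any."""
    encoding = headers.get("Content-Encoding", "").strip().lower()
    if encoding not in ("gzip", "x-gzip"):
        return headers, body  # nothing to do for other encodings
    plain = gzip.decompress(body)
    new_headers = {k: v for k, v in headers.items() if k != "Content-Encoding"}
    new_headers["Content-Length"] = str(len(plain))
    return new_headers, plain


def encode_body(headers: dict, body: bytes):
    """Return (headers, body) with the body gzip-compressed."""
    packed = gzip.compress(body)
    new_headers = dict(headers)
    new_headers["Content-Encoding"] = "gzip"
    new_headers["Content-Length"] = str(len(packed))
    return new_headers, packed
```

The proxy would call decode_body on a downloaded response before building the ICAP request, so that the ICAP server receives uncompressed content.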

Using keep-alive connections

Keep-alive connections improve performance. Middleman keeps its connections to HTTP servers alive, even if the client asks for the connection to be closed. These open connections are put in a socket pool so that they can be reused for other HTTP requests.

We have used the same mechanism in the ICAP implementation. Open ICAP connections are also put in the socket pool and reused for other ICAP sessions.
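The idea of a shared pool for HTTP and ICAP connections can be sketched as below. The SocketPool class, its method names and the choice of (protocol, host, port) as key are assumptions made for the example, not a description of Middleman's data structures.

```python
import threading
from collections import defaultdict


class SocketPool:
    """Keeps idle keep-alive connections, keyed by (protocol, host, port)."""

    def __init__(self):
        self._lock = threading.Lock()  # the pool is shared by all threads
        self._idle = defaultdict(list)

    def get(self, proto: str, host: str, port: int):
        """Return an idle connection for this endpoint, or None if there is none."""
        with self._lock:
            conns = self._idle[(proto, host, port)]
            return conns.pop() if conns else None

    def put(self, proto: str, host: str, port: int, conn) -> None:
        """Give a still-open connection back to the pool for later reuse."""
        with self._lock:
            self._idle[(proto, host, port)].append(conn)
```

A thread would first call get, open a new connection only when it returns None, and call put once the response has been read and the connection is still usable.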

Testing and debugging the proxy

For the tests, we developed test scripts in Python using the asyncore library. These scripts are committed in the test module of the CVS repository. Here are the test components:

The HTTP client sends a request to the HTTP server through the proxy. Each request, identified by a number, makes the system work in a specific HTTP configuration. We can experiment with POST requests, GET requests, chunking, keep-alive connections and closed connections.

We can test the thread synchronisation by running two HTTP clients in a loop.

Debugging is done with GDB (the GNU Debugger). Memory leaks can be detected in debugging mode by logging all allocated memory addresses in the table marray. See src/mem.c for more details.
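The bookkeeping behind this kind of leak detection can be modelled in a few lines. The sketch below is a Python model with simulated addresses, not the code of src/mem.c: the AllocationTable class, its methods and the location strings are all hypothetical.

```python
class AllocationTable:
    """Records live allocations so that leftovers can be reported as leaks."""

    def __init__(self):
        self._live = {}          # address -> (size, where it was allocated)
        self._next_addr = 0x1000  # simulated addresses, for the model only

    def alloc(self, size: int, where: str) -> int:
        """Log a new allocation and return its (simulated) address."""
        addr = self._next_addr
        self._next_addr += max(size, 1)
        self._live[addr] = (size, where)
        return addr

    def free(self, addr: int) -> None:
        """Remove an allocation from the table; unknown addresses are errors."""
        if addr not in self._live:
            raise ValueError(f"free of unknown address {addr:#x}")
        del self._live[addr]

    def leaks(self) -> list:
        """Allocations that were never freed, sorted by address."""
        return sorted(self._live.items())
```

Whatever remains in the table when the program exits was allocated but never freed, and the recorded location points to the code responsible.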

Configuration of Shweby and deployment scenario

We describe here an example deployment of Shweby in the POESIA environment. We consider a school with two computer rooms, A and B. We assume that we can tell from a computer's IP address whether it is in room A or room B. We also assume that the POESIA components (Shweby, monitor and filters) run on a machine called "gateway", located in the administrator's office.

Room A will be used by kids and room B by teenagers. The system administrator must configure POESIA to filter requests coming from room A with the "kids-filter" ICAP class and requests coming from room B with the "teenagers-filter" ICAP class. On his machine "gateway", the system administrator can surf the web without filtering and can also test "kids-filter" and "teenagers-filter". He can configure the Shweby proxy directly from his machine by requesting the Shweby web configuration interface. POESIA filters can download web content through the Shweby proxy without filtering, in order to determine whether a web page contains any pornographic pictures.

To meet these requirements, we configure Shweby with three listening ports. When a client connects to a port, the proxy applies the appropriate ICAP filtering according to the client's IP address and the rules below (table 2.2). We suppose that we have already configured three ICAP classes: "kids-filter", "teenagers-filter" and "bypass", which means no filtering.


Table 2.2: Example of deploying POESIA: table of access rules.

IP address | Port 4000                      | Port 4001                                | Port 4002
Gateway    | configuration, class = bypass  | configuration, class = teenagers-filter  | configuration, class = kids-filter
Room A     | class = kids-filter            | Not allowed                              | Not allowed
Room B     | class = teenagers-filter       | Not allowed                              | Not allowed


Here is how these rules are coded in the XML file:

<!-- Access control description -->
        <access>
                <policy>deny</policy>
                <allow>
                        <enabled>true</enabled>
                        <comment>No filtering on Gateway:4000</comment>
                        <class>bypass</class>
                        <ip>127.0.0.1</ip>
                        <port>4000</port>
                        <access>config,proxy,connect,http,transparent</access>
                </allow>                
                <allow>
                        <enabled>true</enabled>
                        <comment>to test teenagers-filter on Gateway:4001</comment>
                        <class>teenagers-filter</class>
                        <ip>127.0.0.1</ip>
                        <port>4001</port>
                        <access>config,proxy,connect,http,transparent</access>
                </allow>
                <allow>
                        <enabled>true</enabled>
                        <comment>to test kids-filter on Gateway:4002</comment>
                        <class>kids-filter</class>
                        <ip>127.0.0.1</ip>
                        <port>4002</port>
                        <access>config,proxy,connect,http,transparent</access>
                </allow>
                <allow>
                        <enabled>true</enabled>
                        <comment>Room A: filter for kids on Gateway:4000</comment>
                        <class>kids-filter</class>
                        <ip>137.194.34.5</ip>
                        <iprange>137.194.34.8-137.194.34.50</iprange>
                        <port>4000</port>
                        <access>proxy,http,transparent</access>
                </allow>
                <allow>
                        <enabled>true</enabled>
                        <comment>Room B: filter for teenagers on Gateway:4000</comment>
                        <class>teenagers-filter</class>
                        <iprange>137.194.34.51-137.194.34.80</iprange>
                        <ip>137.194.34.88</ip>
                        <iprange>137.194.34.90-137.194.34.96</iprange>
                        <port>4000</port>
                        <access>proxy,http,transparent</access>
                </allow>
        </access>
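To make the effect of these rules concrete, the sketch below restates the five allow entries as Python data and selects the ICAP class for a given client. It is not the logic of src/main.c: the AllowRule type, the first-match evaluation order and the fact that the enabled flag and the list of permitted access types are ignored are assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address


@dataclass
class AllowRule:
    icap_class: str
    port: int
    ips: list = field(default_factory=list)     # single addresses
    ranges: list = field(default_factory=list)  # (first, last), inclusive

    def matches(self, ip: str, port: int) -> bool:
        if port != self.port:
            return False
        addr = IPv4Address(ip)
        if any(addr == IPv4Address(i) for i in self.ips):
            return True
        return any(IPv4Address(lo) <= addr <= IPv4Address(hi)
                   for lo, hi in self.ranges)


# The five <allow> entries of the XML file, in the same order.
RULES = [
    AllowRule("bypass", 4000, ips=["127.0.0.1"]),
    AllowRule("teenagers-filter", 4001, ips=["127.0.0.1"]),
    AllowRule("kids-filter", 4002, ips=["127.0.0.1"]),
    AllowRule("kids-filter", 4000, ips=["137.194.34.5"],
              ranges=[("137.194.34.8", "137.194.34.50")]),
    AllowRule("teenagers-filter", 4000, ips=["137.194.34.88"],
              ranges=[("137.194.34.51", "137.194.34.80"),
                      ("137.194.34.90", "137.194.34.96")]),
]


def select_class(ip: str, port: int):
    """ICAP class of the first matching rule; None means denied (policy deny)."""
    for rule in RULES:
        if rule.matches(ip, port):
            return rule.icap_class
    return None
```

For instance, a room A machine connecting to port 4001 matches no rule and is refused by the default deny policy.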


Riadh Elloumi 2003-07-25