Wireless Web Server
This Java-based web server is portable across all platforms that support the Java Development Kit. It is easily extended with Sun's standard Servlet API. Extensions can also be written in C++, because Java can call native libraries and a bridge to the servlet interface is implemented. The server can encapsulate many older Internet protocols in HTTP, so that they can be used through a standard HTML browser and, consequently, through mobile phones.
- Proxies
- Servlets
- Content negotiation
- Mobile functionality
The server contains cacheable FTP, Gopher and HTTP proxies. Mirrored files are cached in the standard file system under a configurable cache root directory. They are stored in subdirectories named after the protocol and the site name (possibly with a port number), using the same tree structure as the original site. Cached files cannot be used directly, because they contain message headers; they must be accessed only through the proxy mechanism. Proxies can be configured to check whether files are up to date either every time they are accessed or periodically. The Gopher cache is updated only when the Pragma: no-cache header is set in the request, because Gopher files carry no dates. Proxy caches are also cleaned automatically at intervals: all mirrored files that have not been accessed during a specified time interval are deleted. A gateway proxy to WAIS servers and SSL tunneling are also implemented.
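The cache layout described above can be pictured as a function from a proxied URL to a file under the cache root. The following Java sketch is illustrative only. The class `CacheLayout` is hypothetical. Writing a non-default port as `host_port` is an assumption, because the text does not say how the port is added to the site directory name.

```java
import java.net.URI;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps a proxied URL to its file under the cache root. */
final class CacheLayout {
    // Well-known default ports; only a differing port is added to the site name.
    private static final Map<String, Integer> DEFAULT_PORTS =
            Map.of("http", 80, "ftp", 21, "gopher", 70);

    private CacheLayout() {}

    static Path cacheFile(Path cacheRoot, URI url) {
        URI u = url.normalize(); // removes "." and resolvable ".." segments
        if (u.getScheme() == null || u.getHost() == null) {
            throw new IllegalArgumentException("not an absolute URL with a host: " + url);
        }
        String protocol = u.getScheme().toLowerCase(Locale.ROOT);
        String site = u.getHost().toLowerCase(Locale.ROOT);
        int port = u.getPort(); // -1 when the URL has no explicit port
        if (port != -1 && port != DEFAULT_PORTS.getOrDefault(protocol, -1)) {
            site = site + "_" + port; // assumption: the real naming scheme may differ
        }
        Path file = cacheRoot.resolve(protocol).resolve(site);
        String path = u.getPath();
        if (path == null) {
            return file;
        }
        // Mirror the original site's directory tree below the site directory.
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            if (segment.equals("..")) {
                throw new IllegalArgumentException("path escapes site directory: " + path);
            }
            file = file.resolve(segment);
        }
        return file;
    }
}
```

For example, under this sketch `http://Example.com:8080/x` would map to `http/example.com_8080/x` below the cache root.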
A servlet is a way to extend the web server and a replacement for CGI (Common Gateway Interface). Servlets make the server extensible in practical ways because a servlet has a global state, initialized when the server starts, and a local state relating to a single request. In its global state a servlet can control, for example, a daemon thread that handles tasks common to all requests. In its local state it can change this global state according to a particular request. Servlets can be called with the following URL form:
http://<site>/servlet/<servlet name>[?<parameters>]
or through a virtual path whose physical path is set to the form above. Two servlets are also called automatically: frame/core when a request is directed to the local server, and frame/proxy when it is directed to remote servers and the local server is acting as a proxy. Servlets can also be used as filters that act like pipes, modifying the data flowing through them; image maps and server side includes, for example, are handled with filters. A content type can be mapped to a filter servlet, so that whenever content of that type is encountered, the data is delegated to the corresponding filter. For example, files ending with the suffix ‘.map’ are associated with the image map filter, and files ending with ‘.shtml’ with the server side include filter. Filters are chained when a servlet returns a content type that is mapped to another filter.
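Filter chaining can be sketched as a loop. It looks up the filter mapped to the current content type, runs it, and repeats with the content type the filter returns until no mapping remains. The types `Content`, `ContentFilter` and `FilterChain` below are hypothetical stand-ins, not the server's or the Servlet API's classes. The step limit is an added safeguard against a filter whose output type maps back to itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical body-plus-type value passed between filters. */
final class Content {
    final String type;
    final String body;

    Content(String type, String body) {
        this.type = type;
        this.body = body;
    }
}

/** Hypothetical filter: transforms content and reports the type it produces. */
interface ContentFilter {
    Content apply(Content input);
}

/** Follows filters chained through the content types they return. */
final class FilterChain {
    private static final int MAX_STEPS = 16; // guards against cyclic type mappings
    private final Map<String, ContentFilter> filtersByType = new HashMap<>();

    void register(String contentType, ContentFilter filter) {
        filtersByType.put(contentType, filter);
    }

    Content run(Content content) {
        for (int step = 0; step < MAX_STEPS; step++) {
            ContentFilter filter = filtersByType.get(content.type);
            if (filter == null) {
                return content; // no filter mapped to this type: the chain ends here
            }
            content = filter.apply(content);
        }
        throw new IllegalStateException("filter chain too long; cyclic mapping?");
    }
}
```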
A content type can be associated with one or more automatic filter servlet chains, in which each conversion step has a target content type. Based on the Accept header sent by the client, one step is chosen as the final one. If the content type in the Accept header carries a quality factor less than 1.0, a specific quality filter servlet is called if one exists. Quality filter servlets may cache files at different quality factors where applicable.
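One way to choose the final conversion step is to compute, for each target type in the chain, the quality factor the Accept header gives it, and then pick the highest. The sketch below is a simplified assumption, not the server's algorithm. The `Negotiator` class is hypothetical. It understands only `type/subtype`, `type/*` and `*/*` ranges with an optional `q` parameter. As in HTTP content negotiation, the most specific matching range decides the quality.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for Accept-header based choice of the final chain step.
final class Negotiator {
    private Negotiator() {}

    // Quality the Accept header gives a content type; 0.0 means not acceptable.
    static double quality(String accept, String contentType) {
        if (accept == null || accept.isBlank()) {
            return 1.0; // no Accept header: any type is acceptable
        }
        String type = contentType.toLowerCase(Locale.ROOT);
        double result = 0.0;
        int bestSpecificity = -1;
        for (String range : accept.split(",")) {
            String[] parts = range.split(";");
            String media = parts[0].trim().toLowerCase(Locale.ROOT);
            int specificity = specificity(media, type);
            if (specificity <= bestSpecificity) {
                continue; // no match, or a more specific range already matched
            }
            double q = 1.0;
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // malformed q: treat the range as unacceptable
                    }
                }
            }
            bestSpecificity = specificity;
            result = q;
        }
        return result;
    }

    // 2 = exact match, 1 = type/* match, 0 = */* match, -1 = no match.
    private static int specificity(String range, String type) {
        if (range.equals(type)) return 2;
        if (range.endsWith("/*") && type.startsWith(range.substring(0, range.length() - 1))) return 1;
        if (range.equals("*/*")) return 0;
        return -1;
    }

    // Picks the chain target with the highest positive quality; earlier steps win ties.
    static Optional<String> chooseFinalStep(List<String> chainTargets, String accept) {
        String chosen = null;
        double bestQ = 0.0;
        for (String target : chainTargets) {
            double q = quality(accept, target);
            if (q > bestQ) {
                bestQ = q;
                chosen = target;
            }
        }
        return Optional.ofNullable(chosen);
    }
}
```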
Primitive mobile clients are supported by simulating a full client on the server side. Client state is preserved by keeping cookies in a server-side persistent store and authentication information in server memory (and also in the persistent store when required). Session-specific data are discarded if no further requests arrive from the mobile client within a predetermined time (e.g. 15 minutes). Permanent cookies (those with an expires field) are kept in the persistent store and removed when they expire. URLs are also kept in the persistent store and are removed automatically if they are not used during their predetermined life span. In principle, all Internet content that can be rendered in a browser window with standard HTML 2.0 (without JavaScript) is also accessible with mobile phones. Because the server is also a proxy, content served over protocols other than HTTP can be accessed with mobile phones as well. Among other things, the server has basic servlets for accessing email and news. These are therefore automatically usable with mobile phones, and they are additionally fine-tuned for this purpose. If the client runs this same Java-based server with minimal resources as a personal proxy, all data flowing between the central proxy and the personal proxy can be compressed. Polling requests from the short message center are disguised as a server socket. The requests are converted to HTTP requests, encapsulated in a pseudo socket and delegated to the standard HTTP handler method, which is not explicitly aware of the request's origin. The Accept header specifies which content types the mobile phone can handle, e.g. text/x-ttml.
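Discarding idle session data can be sketched as a map from client identifier to the time of its last request, purged periodically. The `MobileSessions` class is hypothetical. Persistent cookies, URLs and authentication data are left out. The current time is passed in explicitly so that the expiry rule is easy to follow and test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store of mobile-client sessions that expire after inactivity. */
final class MobileSessions {
    private final Map<String, Instant> lastAccess = new HashMap<>();
    private final Duration maxIdle;

    MobileSessions(Duration maxIdle) {
        this.maxIdle = maxIdle;
    }

    /** Records a request from the client (session data itself is omitted here). */
    synchronized void touch(String clientId, Instant now) {
        lastAccess.put(clientId, now);
    }

    /** Discards sessions idle longer than maxIdle; returns how many were removed. */
    synchronized int purge(Instant now) {
        int before = lastAccess.size();
        lastAccess.values().removeIf(t -> Duration.between(t, now).compareTo(maxIdle) > 0);
        return before - lastAccess.size();
    }

    synchronized boolean isActive(String clientId) {
        return lastAccess.containsKey(clientId);
    }
}
```

With a 15-minute limit, a phone last heard from 16 minutes ago is purged, while one heard from 6 minutes ago keeps its session.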
The best way to describe the server’s features is to give short usage instructions and the concrete configuration parameters. The server is configured with parameter files. Comments can be added to parameter files by prefixing them with the # (hash) character. The server can also be managed remotely with the FI.realitymodeler.server.Management program when the FI.realitymodeler.server.W3Manager servlet is initialized with the name system/W3Manager in the servlets file. With this application the server’s dynamic configuration can be modified over the Internet. The user database, domains, servlets, MIME types, virtual paths and cache paths can all be edited interactively in its graphical user interface.
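Reading a parameter file while respecting `#` comments can be sketched as follows. Beyond what the text states, the sketch assumes that a comment occupies a whole line (no trailing comments) and that blank lines are ignored. `ParameterFile` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for line-based parameter files. */
final class ParameterFile {
    private ParameterFile() {}

    /** Returns the significant lines, skipping blank lines and # comment lines. */
    static List<String> significantLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.strip();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue; // assumption: comments always span a whole line
                }
                lines.add(trimmed);
            }
        }
        return lines;
    }
}
```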
- Simple server startup
- Installing server as NT service
The server can be started by calling the Java virtual machine directly in the directory where index.html resides (if no base directory is specified). This method is suitable in Unix environments and when testing or running the server on the client side. In an NT environment the server can instead be installed as an NT service. The server is started as follows:
java FI.realitymodeler.server.W3Server [options]
where the allowed options are (default values in parentheses):
-b <base directory> (<current directory>)
This sets the directory where home page files reside.
-c <cache root> (<data directory>/cache/)
This sets the proxy cache root directory, under which files are placed in subdirectories by protocol and host name.
-d <data directory> (/)
This sets the directory where the parameter data files needed by the server reside.
-k
This specifies that only the caches are checked for foreign content and no connection to the Internet is attempted. This can be used when browsing cached content offline.
-l <log level 0 - 3> (0)
This turns logging on. At log level 1 errors are logged to error_log, at log level 2 other events are logged to event_log, and at log level 3 access operations are logged to access_log.
-p <port number> (80 or 443 when secure)
This is the port number on which the server listens.
-s
This turns secure mode on. In this mode the server is accessible only over HTTPS. The data directory must contain files named cert.pem and key.pem containing the certificate and the RSA private key in PEM format.
-v
This turns verbose mode on. All requests are printed to standard output, and no logging information is written to files; this is intended only for testing. The -l and -v options are mutually exclusive: whichever appears last takes effect.
-z
This separates the option sets of different server instances, so that, for example, a second server can be started in secure mode. Data files common to all server instances should be in the data directory of the last server.
-?
Shows these options.
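Because `-z` separates option sets, one command line can describe several server instances. The sketch below splits the arguments into one list per instance. It also applies the rule that the last of `-l` and `-v` takes effect. `OptionSets` is a hypothetical helper, not part of `FI.realitymodeler.server.W3Server`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for W3Server-style command lines. */
final class OptionSets {
    private OptionSets() {}

    /** Splits the arguments into one option list per server instance. */
    static List<List<String>> split(String[] args) {
        List<List<String>> sets = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String arg : args) {
            if (arg.equals("-z")) {
                sets.add(current); // close this instance's option set
                current = new ArrayList<>();
            } else {
                current.add(arg);
            }
        }
        sets.add(current);
        return sets;
    }

    /** Returns "-l" or "-v", whichever appears last in one option set, or null. */
    static String effectiveLoggingOption(List<String> options) {
        String effective = null;
        for (String opt : options) {
            if (opt.equals("-l") || opt.equals("-v")) {
                effective = opt;
            }
        }
        return effective;
    }
}
```

For `-p 8080 -l 2 -v -z -s`, this yields two sets, and the first one runs in verbose mode because `-v` comes after `-l`.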
The server program can be installed as an NT service by issuing the following command in the JRE bin directory (for example C:\Program Files\JavaSoft\JRE\1.2\bin):
W3Service -install
Correspondingly, the NT service can be removed with the following command:
W3Service -remove
When the program runs as an NT service, all parameters must be given in the data directory as described below. If the service fails to start, the stack trace can be read from the file KeppiProxyError in the user's home directory.
Before the command-line options are applied, initial options are read from the file KeppiProxyOptions in the user's home directory. As soon as the -d option is encountered, the server tries to read a file named system.properties from the data directory. This file uses the standard Java properties file format (key=value). It can contain system properties for various standard Java classes, which read them with java.lang.System.getProperty. After that, the server tries to read a file named server.properties from the data directory, also in the standard Java properties format. The following properties are available; default values, where applicable, are in parentheses.
autoCaches=true | false (false)
If this is set, proxy caches are filled even if the client interrupts the request. The proxy continues to read content from the Internet into the disk cache even though the client that requested it has closed the connection.
autoLogStats=true | false (false)
If this is set, the server generates a summary report of the log files before they are emptied (when their size has exceeded the maximum log file size).
autoResume=true | false (false)
If this is set, connections to remote servers are resumed transparently for the requester, even when caches are not used. This is especially useful when the proxy is located in the local area network or runs on the same host as the client. Because the primary connection is then completely reliable, the proxy can act as an automatic connection manager, for example when downloading very large files.
backlog=<backlog count> (50)
This sets the backlog value of the server socket, that is, the maximum length of the queue of incoming client connections waiting to be accepted.
bindAddress=<bind address>
This sets the IP address to which the server socket is bound. It is used when the host machine has multiple network interfaces and this server instance must use a specific one.
cacheRoot=<directory, static> (<data directory>/cache/)
This gives the root directory for proxy cache files, under which each protocol has a subdirectory named after it (such as ftp, http, gopher, pop3). In principle, all files under this directory can be deleted while the server is not running, because they can be retrieved again from the Internet.
completionBufferSize=<size of buffer in bytes before headers are written> (4 * 1024)
With HTTP 1.0, when a response does not specify a Content-Length header, this gives the maximum response size that the server buffers so that it can set Content-Length afterwards. This allows the connection between client and server to be kept open. With HTTP 1.1 this has no effect, because chunked Transfer-Encoding is used instead.
contentEncoding=gzip | deflate
This forces the server to compress content even if the client has not requested it. It should be used only in exceptional cases or when testing compression.
dataDirectory=<directory> (/)
This gives the data directory where the parameter files reside. It is usually not given, because the server properties and the parameter files reside in the same directory. In that case the data directory is given in the KeppiProxyOptions file in the user's home directory.
ftpProxyCache=true | false (false)
This directs the proxy to cache content from FTP servers.
ftpProxyCacheCleaningInterval=<minutes, static> (one week)
This sets the cleaning interval for content from FTP servers: files not accessed during this interval are deleted. Cleaning is triggered only when a request to an FTP server is received. This property is common to all server instances running in the same virtual machine.
ftpProxyCacheRefreshingInterval=<minutes, static> (0)
This sets the refreshing interval for content from FTP servers: after this interval the server tries to update content from the origin server. By default (0), it tries to update content every time it is requested. This property is common to all server instances running in the same virtual machine.
ftpProxyHost=<host name, static>
If the server’s own internal FTP client is not used and content from FTP servers is instead requested from another FTP proxy, this must give the host name of that proxy server. In this way proxies can be chained. This property is common to all server instances running in the same virtual machine.
ftpProxyPort=<port number, static> (80)
This gives the port of the ftp proxy host, if used. It defaults to 80.
gopherProxyCache=true | false (false)
This directs the proxy to cache content from Gopher servers.
gopherProxyCacheCleaningInterval=<minutes, static> (one week)
This sets the cleaning interval for content from Gopher servers: files not accessed during this interval are deleted. Cleaning is triggered only when a request to a Gopher server is received. This property is common to all server instances running in the same virtual machine.
gopherProxyCacheRefreshingInterval=<minutes, static> (0)
This sets the refreshing interval for content from Gopher servers. Content is updated from the origin server after this interval and when the client has set the Pragma or Cache-Control header to no-cache, because Gopher servers do not deliver file modification dates. This property is common to all server instances running in the same virtual machine.
gopherProxyHost=<host name, static>
If the server’s own internal Gopher client is not used and content from Gopher servers is instead requested from another Gopher proxy, this must give the host name of that proxy server. In this way proxies can be chained. This property is common to all server instances running in the same virtual machine.
gopherProxyPort=<port number, static> (80)
This gives the port of the Gopher proxy host, if used. It defaults to 80.
httpProxyCache=true | false (false)
This directs the proxy to cache content from HTTP servers.
httpProxyCacheCleaningInterval=<minutes, static> (one week)
This sets the cleaning interval for content from HTTP servers: files not accessed during this interval are deleted. Cleaning is triggered only when a request to an HTTP server is received. This property is common to all server instances running in the same virtual machine.
httpProxyCacheRefreshingInterval=<minutes, static> (0)
This sets the refreshing interval for content from HTTP servers: after this interval the server tries to update content from the origin server. By default (0), it tries to update content every time it is requested. This property is common to all server instances running in the same virtual machine.
httpProxyHost=<host name, static>
If the server’s own internal HTTP client is not used and content from HTTP servers is instead requested from another HTTP proxy, this must give the host name of that proxy server. In this way proxies can be chained. This property is common to all server instances running in the same virtual machine.
httpProxyPort=<port number, static> (80)
This gives the port of the HTTP proxy host, if used. It defaults to 80.
imap4CacheCleaningInterval=<minutes, static> (one day)
This sets the cleaning interval for attachment content from IMAP4 servers: files not accessed during this interval are deleted. Cleaning is triggered only when a request to an IMAP4 server is received. This property is common to all server instances running in the same virtual machine.
imap4ProxyHost=<host name, static>
If the server’s own internal IMAP4 client is not used and content from IMAP4 servers is instead requested from another IMAP4 proxy, this must give the host name of that proxy server. In this way proxies can be chained. This property is common to all server instances running in the same virtual machine. No IMAP4 proxies other than servers of this kind are known.
imap4ProxyPort=<port number, static> (80)
This gives the port number of the IMAP4 proxy host, if used. It defaults to 80.
imap4ServerHost=<host name, static>
This gives the default IMAP4 server host, used when no IMAP4 proxy is configured and no host name is specified.
keepAliveCount=<number> (10)
This gives the maximum number of requests a client can make over one connection to the server. If it is set to -1, there is no limit.
keepAliveTimeout=<seconds> (30)
This gives the time in seconds after which the server closes the connection if the client has not requested anything.
logFormat=(c-ip time-taken authuser [date time] "request" s-status bytes)
This gives the log format for the access log file in Extended Log File Format, which is described below.
logLevel=0 - 3 (0)
This sets the log level. In log level 1 errors are logged to error_log, in log level 2 other events are logged to event_log and in log level 3 access operations are logged to file access_log. In log level 0, no logging information is generated.
logSize=<kbytes> (1024)
This sets the maximum size of log files in kilobytes. When a log file exceeds this size, it is renamed to a unique name based on the current system time and a new log file is started. If autoLogStats is not set, these files must periodically be moved to a backup store or deleted.
managerCredentials=<string> (manager:reganam)
This gives the default manager credentials for the remote management servlet when the server runs for the first time and no user database has yet been created.
maxCacheEntryLength=<kbytes> (128)
This gives the maximum length of a memory cache entry. Local files smaller than this limit that are accessed through the server are cached in memory if there is space available.
maxConnections=<number> (-1)
This gives the maximum number of simultaneous connections one client can open to the server. When it is -1 (the default), there is no limit.
maxInactiveInterval=<seconds> (60)
This gives the maximum time a client session can be inactive before it is removed.
maxProxyCacheEntryLength=<kbytes, static> (512)
This gives the maximum length of proxy cache entries. Files smaller than this limit that are accessed through the proxy are cached on disk.
maxProxyConnections=<number> (-1)
This gives the maximum number of connections the proxy can open to the Internet on behalf of one client. Connections are kept as long as the session lasts.
maxRequests=<number> (-1)
This gives the maximum number of requests the server accepts simultaneously. When it is -1 (the default), there is no limit.
mobileServer=true | false (false)
This indicates whether the server is used as a proxy for mobile users. In this case all data flowing through the server is compressed and normal HTTP requests cannot be accepted; only another server of the same kind can make requests to this server.
mobileServerHost=<host name, static>
This gives the host of a mobile server that this server will use. In this case all data flowing between this server and the mobile server is compressed, while this server can still accept normal HTTP requests.
mobileServerPort=<port number, static> (80)
This gives the port of the mobile server host, if used. It defaults to 80.
nativeDomain=<name of native domain in NT>
If user information from an NT domain is used, this gives the domain name.
nntpCacheCleaningInterval=<minutes, static> (one day)
This sets the cleaning interval for attachment content from NNTP servers: files not accessed during this interval are deleted. Cleaning is triggered only when a request to an NNTP server is received. This property is common to all server instances running in the same virtual machine.
nntpProxyHost=<host name, static>
If the server’s own internal NNTP client is not used and content from NNTP servers is instead requested from another NNTP proxy, this must give the host name of that proxy server. In this way proxies can be chained. This property is common to all server instances running in the same virtual machine. No NNTP proxies other than servers of this kind are known.
nntpProxyPort=<port number, static> (80)
This gives the port number of the NNTP proxy host, if used. It defaults to 80.
nntpServerHost=<host name, static>
This gives the default NNTP server host, used when no NNTP proxy is configured and no host name is specified.
noCacheHeaders=true | false (false)
When this flag is set to true, no response headers are stored in the cache files. Cache files with and without response headers should not be mixed in the same cache file tree. When the server program runs, the cache files already stored in the current cache directory must correspond to the setting of this flag. Otherwise incorrect results will be returned from the cache, because this flag is the server program's only indication of whether cache files contain response headers. Note that the FI.realitymodeler.server.ClientRobot program does not store cache headers by default. When no cache headers are stored, the Content-Type of a file is guessed from its file suffix or, in some cases, from its first few bytes. Its Last-Modified date is read from the corresponding file system attribute, where it was stored after reading from the remote server. In some cases the Content-Type cannot be guessed from the file, and the response will then be incorrect. All possibly significant extra headers are also lost. This is why cache headers should always be stored when the server is used by many users. Storing cache headers can be omitted when the server is used only personally, correct operation need not be guaranteed, and the cache files are also to be used directly through the file system. Cache files without headers are binary-compatible with the original files, such as text, image and audio files.
pop3CacheCleaningInterval=<minutes, static> (one day)
This sets the cleaning interval for attachment content from POP3 servers: files not accessed during this interval are deleted. Cleaning is triggered only when a request to a POP3 server is received. This property is common to all server instances running in the same virtual machine.
pop3ProxyHost=<host name, static>
If the server’s own internal POP3 client is not used and content from POP3 servers is instead requested from another POP3 proxy, this must give the host name of that proxy server. In this way proxies can be chained. This property is common to all server instances running in the same virtual machine. No POP3 proxies other than servers of this kind are known.
pop3ProxyPort=<port number, static> (80)
This gives the port number of the POP3 proxy host, if used. It defaults to 80.
pop3ServerHost=<host name, static>
This gives the default POP3 server host, used when no POP3 proxy is configured and no host name is specified.
port=<port number> (80 or 443 when secure)
This gives the port number to which the server socket is bound. It defaults to 80, or to 443 if the server is set to secure mode.
proxyTimeout=<proxy socket read timeout in seconds, static> (60)
This gives the time in seconds the proxy waits for a response from external Internet servers. This property is common to all server instances in one virtual machine and to all protocols the proxy uses.
requestTimeout=<seconds> (100)
This gives the time in seconds after which an unused request handler thread is removed from the thread pool. The server maintains a pool of request threads waiting for requests, which shortens its response time.
secure=true | false (false)
If this is set, the server uses the Secure Sockets Layer and can be contacted only over HTTPS. The parameter data directory must contain files named cert.pem and key.pem containing the certificate and the RSA private key in PEM format.
serverDisabled=true | false (false)
If this is set, the actual server is disabled (no server socket is instantiated). The server can then be used only to launch servlets that contain some kind of server functionality of their own. Examples are an RMI registry servlet or a cellular engine servlet, which serves requests from mobile phones and can use the servlets configured in the server that launched it.
serverGroup=<server group name or id in unix>
When the server is used in a Unix environment, this can give the account group under which the server should run.
serverID=<server identifier (maybe pseudonym)>
This gives the server identifier used in the Via header. It defaults to the server name, plus the port if it differs from the default. It can be set to a pseudonym if the real name should not be revealed.
serverName=<server host name>
This gives the host name of the server. It is returned, for example, by the Servlet API’s getServerName() method and in the Via header (if serverID is not explicitly specified). It defaults to the host name of the machine, but in some environments the system call does not return the domain name. If the server is used in a public network, the full name of the machine should therefore be specified here. Generally, this property and the aliases file should together contain all names by which this server can be referred to at the DNS level. The loopback address and the numeric IP address are excluded, because they are included automatically. The reason is that the server recognizes that it is not being used as a proxy only when no host name is specified in a request, or when the host name is one of the names specified in these parameters. When the server is used locally, it tries not to resolve host names with DNS, so that if DNS is down, local use of the server is at least not delayed.
serverUser=<server user name or id in unix>
When the server is used in a Unix environment, this can give the user account under which the server should run.
shutdownCommand=<shutdown command>
This can give a native command that is executed automatically when the server is shut down. In a Windows environment it could be, for example, cmd /c somename.bat. It can, for example, close network connections that were made during server startup.
smtpProxyHost=<host name, static>
If the server’s own internal SMTP client is not used and content to SMTP servers is instead sent through another SMTP proxy, this must give the host name of that proxy server. In this way proxies can be chained. This property is common to all server instances running in the same virtual machine. No SMTP proxies other than servers of this kind are known.
smtpProxyPort=<port number, static> (80)
This gives the port number of the SMTP proxy host, if used. It defaults to 80.
smtpServerHost=<host name, static>
This gives the default SMTP server host, used when no SMTP proxy is configured and no host name is specified.
sslProxyDisabled=true | false (false)
This disables the SSL proxy, that is, the use of the CONNECT method in HTTP requests.
sslProxyHost=<host name>
If SSL tunneling is redirected to another SSL proxy, this must give its host name.
sslProxyPort=<port number> (80)
This gives the port number of the SSL proxy host, if used. It defaults to 80.
startupCommand=<startup command>
This can give a native command that is executed automatically at server startup. In a Windows environment it could be, for example, cmd /c somename.bat. It can, for example, make connections to the local network.
userDatabase=<user database class> (FI.realitymodeler.server.UserDatabase)
This can give an alternative class that implements the user database for the server. The class must be derived from the default class used for this purpose.
verbose=true | false (false)
If this is set, all logging information is written to standard output, together with additional details that are not normally given. It should be used only when testing, or when running a personal server on the client side.
virtual=true | false (false)
If this is set, this server instance is used only as a virtual host, whose name is given by the serverName property and whose port is given by the port property. Another server instance must receive the actual requests and direct them to this instance.
waisProxyHost=<host name, static>
If the server’s own internal WAIS client is not used and content from WAIS servers is instead requested from another WAIS proxy, this must give the host name of that proxy server. In this way proxies can be chained. This property is common to all server instances running in the same virtual machine. Besides servers of this kind, only some versions of the HTTP server from www.w3c.org can be used as WAIS proxies.
waisProxyPort=<port number, static> (80)
This gives the port number of the WAIS proxy host, if used. It defaults to 80.
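The loading order described before this list can be sketched with `java.util.Properties`: `system.properties` goes into the Java system properties, then `server.properties` is read. The sketch also covers the typed defaults given in the list. This is a minimal sketch under assumptions. `ServerConfig` and its methods are hypothetical, and only a few getters are shown. A missing file is simply skipped, since the server only tries to read each one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical view of the data-directory property files. */
final class ServerConfig {
    private final Properties props = new Properties();

    /** Loads system.properties, then server.properties; missing files are skipped. */
    static ServerConfig load(Path dataDirectory) throws IOException {
        Properties system = read(dataDirectory.resolve("system.properties"));
        for (String key : system.stringPropertyNames()) {
            // Makes the value visible to classes calling System.getProperty.
            System.setProperty(key, system.getProperty(key));
        }
        ServerConfig config = new ServerConfig();
        config.props.putAll(read(dataDirectory.resolve("server.properties")));
        return config;
    }

    private static Properties read(Path file) throws IOException {
        Properties p = new Properties();
        if (Files.isRegularFile(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                p.load(in); // standard key=value format; # starts a comment line
            }
        }
        return p;
    }

    /** Boolean properties such as autoCaches or secure default to false. */
    boolean flag(String key) {
        return Boolean.parseBoolean(props.getProperty(key, "false").trim());
    }

    int integer(String key, int defaultValue) {
        String v = props.getProperty(key);
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }

    /** The port defaults to 80, or to 443 when secure=true. */
    int port() {
        return integer("port", flag("secure") ? 443 : 80);
    }
}
```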
The log format in the server property logFormat can be specified with the following expressions (Extended Log File Format):
identifier
Relates to the transaction as a whole.
prefix-identifier
Relates to information transfer between parties defined by the prefix.
prefix(header)
Identifies the value of the HTTP header field header for transfer between parties defined by the prefix.
The following prefixes are defined:
c - Client
s - Server
r - Remote
cs - Client to Server
sc - Server to Client
sr - Server to Remote Server
rs - Remote Server to Server
The following identifiers do not require a prefix:
date - Date at which transaction completed.
time - Time at which transaction completed.
time-taken - Time taken for transaction in seconds.
bytes - Bytes transferred
cached - Records whether a cache hit occurred.
authuser - The user name with which the user has authenticated.
request - The request line exactly as it came from the client.
thread - Name of the thread in which the request is running.
number - Number of the request in the same connection.
running-requests - Number of running requests.
waiting-requests - Number of waiting requests.
The following identifiers require a prefix:
ip - IP address and port
dns - DNS name
status - Status code
comment - Comment returned with status code
method - Request method
uri - Request URI
uri-stem - Stem portion of URI
uri-query - Query portion of URI
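To produce an access-log line, each identifier in logFormat is replaced with the transaction's value. All other characters (spaces, brackets, quotes) are copied unchanged. In this sketch `LogFormatter` is hypothetical. Values are supplied in a map keyed by identifier, including the `prefix(header)` form. Missing values are written as `-`, the Extended Log File Format convention for absent data.

```java
import java.util.Map;

/** Hypothetical renderer for logFormat strings. */
final class LogFormatter {
    private LogFormatter() {}

    static String format(String logFormat, Map<String, String> values) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < logFormat.length()) {
            char c = logFormat.charAt(i);
            if (!Character.isLetter(c)) {
                out.append(c); // separators and punctuation are copied as-is
                i++;
                continue;
            }
            int start = i;
            while (i < logFormat.length()
                    && (Character.isLetterOrDigit(logFormat.charAt(i)) || logFormat.charAt(i) == '-')) {
                i++;
            }
            // prefix(header) form, e.g. cs(User-Agent)
            if (i < logFormat.length() && logFormat.charAt(i) == '(') {
                int close = logFormat.indexOf(')', i);
                if (close < 0) {
                    throw new IllegalArgumentException("unclosed header name in " + logFormat);
                }
                i = close + 1;
            }
            String identifier = logFormat.substring(start, i);
            out.append(values.getOrDefault(identifier, "-"));
        }
        return out.toString();
    }
}
```

Applied to the default format `c-ip time-taken authuser [date time] "request" s-status bytes`, this yields a line in the familiar common-log style, with `-` wherever no user was authenticated.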
If the proxy cache is used with a protocol, files are cached under the cache root in subdirectories by protocol and host name (with the port if it differs from the default). If a file named <protocol name>_cache_paths exists in the parameter data directory, only files whose paths are listed there are cached. It has the following format:
<host/path>
.
.
.
Paths are listed on separate lines, preceded by the host name and without the protocol name. Subdirectories are included if a path ends with * (asterisk). Proxy cache paths can also be modified interactively with the remote management application. The cleaning interval in minutes, given in the server properties, specifies how often the cache is cleaned; cleaning means that files not accessed during the cleaning interval are deleted. When the cleaning interval is set to -1, the cache is never cleaned automatically by the server and must be cleaned through the Management program. The interval is a minimum: the actual cleaning is triggered only when the cache is used from the client side. The refreshing interval, given in minutes, specifies how often cached files are updated from the origin server. It defaults to 0, meaning that files are checked every time they are accessed and updated if necessary. The exception is the Gopher cache, where cached files are updated only when the client sets the Pragma header to no-cache.
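Matching a request against a `<protocol name>_cache_paths` list can be sketched as below. The sketch encodes one reading of the rules above, which is an assumption. An entry ending in `*` admits everything below its prefix. An entry ending in `/` admits only files directly in that directory. Any other entry must match exactly. `CachePaths` is a hypothetical name, and the check applies only when the file exists.

```java
import java.util.List;

/** Hypothetical matcher for <protocol name>_cache_paths entries (host/path form). */
final class CachePaths {
    private final List<String> entries;

    CachePaths(List<String> entries) {
        this.entries = List.copyOf(entries);
    }

    /** True if host/path (without protocol) falls under one of the listed entries. */
    boolean isCacheable(String hostPath) {
        for (String entry : entries) {
            if (entry.endsWith("*")) {
                // trailing asterisk: the directory and all of its subdirectories
                if (hostPath.startsWith(entry.substring(0, entry.length() - 1))) {
                    return true;
                }
            } else if (entry.endsWith("/")) {
                // assumption: a directory without asterisk covers only its own files
                if (hostPath.startsWith(entry) && hostPath.indexOf('/', entry.length()) < 0) {
                    return true;
                }
            } else if (hostPath.equals(entry)) {
                return true; // a single file
            }
        }
        return false;
    }
}
```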
The file named aliases should contain the various site names of this server. These include all DNS names by which this server can be referred to, and addresses that are not local to the server but are directed to it (e.g. through an address converter). They also include alternative names of the server that are not recognized as such through name resolution. This file is common to all server instances. See also serverName.
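A request is served locally only when it names no host, or names one of the server's own names from serverName and the aliases file. This rule can be sketched as a set lookup that never consults DNS. `LocalNames` is hypothetical. Names are compared case-insensitively, and the caller is assumed to have stripped any port. `localhost` and `127.0.0.1` stand in for the loopback names the server adds automatically; the machine's numeric address is omitted.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical check of whether a request targets this server or is a proxy request. */
final class LocalNames {
    private final Set<String> names = new HashSet<>();

    /** serverName plus aliases-file entries; loopback names are added automatically. */
    LocalNames(String serverName, Iterable<String> aliases) {
        add(serverName);
        for (String alias : aliases) {
            add(alias);
        }
        add("localhost"); // assumption: stands in for the automatic loopback entries
        add("127.0.0.1");
    }

    private void add(String name) {
        if (name != null && !name.isBlank()) {
            names.add(name.trim().toLowerCase(Locale.ROOT));
        }
    }

    /** True when the request is for this server rather than a proxy request. */
    boolean isLocal(String requestHost) {
        return requestHost == null || requestHost.isEmpty()
                || names.contains(requestHost.toLowerCase(Locale.ROOT));
    }
}
```

Keeping this a pure set lookup matches the note under serverName that local use should not be delayed when DNS is down.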
If a file named servlets exists in the data directory, it is read. It maps servlet names to classes, together with their parameters, and has the following format:
<protection directory>/<servlet name> <servlet class> [<options>]
<parameter name>=<value>
.
.
.
Options are tag names preceded by a colon. If the option ‘:native’ is specified, the servlet is considered to be implemented as a native library. The DLL (or shared library in Unix) must export the following three functions:
void <library's name>Init(Native *native);
void <library's name>Service(Context *context);
void <library's name>Destroy(Native *native);
The source file must include Native.h, and the library must be linked with native.lib. This library contains support functions resembling the methods of the Servlet classes, and those methods can be used as a reference when reading the functions in the header file. Instead of the getInputStream and getOutputStream methods, there are the following functions:
int read(int off, int len);
void write(int off, int len);
The read function returns the number of bytes actually read. The variable buf in Context points to the buffer area used by the functions above, and the variable ‘len’ contains the length of this area.
If the option ‘:disabled’ is specified, the servlet is configured but not loaded and cannot be called by clients. Servlets can also be managed interactively with the remote management application. Servlet classes can be in jar files in the Java Runtime Environment's lib/ext directory, in which case they are available if they appear in the servlets file. They can also be class files in subdirectories of the data directory named after their protection directory, e.g. public or private (and under their package directories if they belong to packages). In this case they are loaded dynamically at runtime if their class definitions are not found in memory when first invoked; they do not need to appear in the servlets file and are referred to by their class name (without the .class extension). Servlets used in this way can have initialization parameters only when used in .shtml files as server-side includes.
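For dynamically loaded servlets, the class file location follows from the data directory, the protection directory and the package name. The sketch below (hypothetical `ServletLocator`) computes that path and loads the class with a standard `URLClassLoader` whose root is the protection directory; it illustrates the layout and is not how W3Server is known to load classes.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

// Hypothetical locator for servlet classes stored under the data directory.
class ServletLocator {
    // e.g. data/public/com/example/Hello.class for class com.example.Hello in "public".
    static Path classFile(Path dataDir, String protectionDir, String className) {
        return dataDir.resolve(protectionDir).resolve(className.replace('.', '/') + ".class");
    }

    // The protection directory acts as a class-path root. It must exist, because Path.toUri()
    // adds the trailing slash that URLClassLoader needs to treat the URL as a directory only
    // for existing directories. The loader stays open while the loaded class is in use.
    static Class<?> load(Path dataDir, String protectionDir, String className)
            throws MalformedURLException, ClassNotFoundException {
        URL root = dataDir.resolve(protectionDir).toUri().toURL();
        URLClassLoader loader = new URLClassLoader(new URL[] {root}, ServletLocator.class.getClassLoader());
        return loader.loadClass(className);
    }
}
```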
Authorization is checked on every request if the data directory contains a file named domains, which maps domain names to request paths and has the following format:
<domain name>[:<authentication servlet>]=<request path>...
.
.
.
If no authentication servlet is specified, basic authentication is assumed. Subdirectories are separated with a slash (/) as in URL paths. Each domain contains all the paths listed for it. Subdirectories are included if a path ends with an asterisk (*). If a request path is absolute, it restricts use of the proxy. If a local path starts with a colon (:), it grants write access (relating to the PUT method). By default, all PUT operations on the local machine are forbidden.
All directories not found in any domain, or found in the special domain called public, are freely accessible. Access to all directories in the special domain called hidden is disabled. Request paths starting with /hidden/ and :* (writing to all directories) are in the hidden domain by default. The domain called system is reserved for the server's internal use. The domain called native contains the operating system's own user accounts. In an NT environment, the server property nativeDomain must specify the domain name for user accounts, and the account under which the server runs must have the advanced user right 'act as part of the operating system'. Domains can also be modified interactively with the remote management application.
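The matching rules above can be modeled as follows. This is a sketch under stated assumptions: multiple request paths on a line are taken to be whitespace-separated, the first matching rule wins, and the defaults for /hidden/ and for writes apply only when no explicit rule matches. `Domains` is a hypothetical class, and no actual authentication is performed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the domains file; rule order and separators are assumptions.
class Domains {
    static final class Rule {
        final String domain, authServlet, path;

        Rule(String domain, String authServlet, String path) {
            this.domain = domain;
            this.authServlet = authServlet;
            this.path = path;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses "<domain name>[:<authentication servlet>]=<request path>...".
    void addLine(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) throw new IllegalArgumentException("missing '=': " + line);
        String left = line.substring(0, eq).trim();
        int colon = left.indexOf(':');
        String domain = colon < 0 ? left : left.substring(0, colon);
        String auth = colon < 0 ? "basic" : left.substring(colon + 1);
        for (String path : line.substring(eq + 1).trim().split("\\s+")) {
            if (!path.isEmpty()) rules.add(new Rule(domain, auth, path));
        }
    }

    // A trailing '*' includes subdirectories; otherwise the path must match exactly.
    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("*")) return path.startsWith(pattern.substring(0, pattern.length() - 1));
        return pattern.equals(path);
    }

    // Writes (PUT) are looked up with the ':' prefix used for write-access paths.
    String domainFor(String requestPath, boolean write) {
        String key = write ? ":" + requestPath : requestPath;
        for (Rule rule : rules) {
            if (matches(rule.path, key)) return rule.domain;
        }
        // Defaults: /hidden/ and all writes (":*") are hidden; everything else is public.
        if (write || requestPath.startsWith("/hidden/")) return "hidden";
        return "public";
    }
}
```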
If the data directory contains a file named mime_types, it is read. The file maps MIME types to file extensions and has the following format:
<mime type>[<;quality filter servlet name>][ [.]<target mime type>:<filter servlet name>...]=<file extension>...
.
.
.
This file is common to all server instances of this virtual machine. If a quality filter servlet is specified, it is called when the client asks for this type with a quality factor less than 1.0.
If filter servlets are specified, they are called if the target MIME type is accepted by the client. A dot (.) in front of a target MIME type starts a new filter chain. Filters can also be chained when a servlet returns a MIME type that has a filter. Filter servlets must call request methods before writing to the output stream. The type with the file extension asterisk (*) is considered the default. A dot (.) can be used to indicate an empty file extension list (when only a filter servlet chain is defined). MIME types can also be modified interactively with the remote management application.
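Extension lookup with the asterisk default can be sketched like this. The parser keeps only the MIME type and the extension list and ignores quality filters and filter chains; extensions are assumed to be listed without leading dots, and the fallback type used when no `*` entry exists is a hypothetical choice. `MimeTypes` is not W3Server's own class.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical extension-to-type lookup for the mime_types format (filters ignored).
class MimeTypes {
    private final Map<String, String> byExtension = new HashMap<>();
    private String defaultType = "application/octet-stream"; // assumption if no '*' entry exists

    void addLine(String line) {
        int eq = line.lastIndexOf('=');
        if (eq < 0) throw new IllegalArgumentException("missing '=': " + line);
        // The MIME type is the first token, before any ";quality filter" or filter chain.
        String type = line.substring(0, eq).trim().split("[;\\s]", 2)[0];
        for (String ext : line.substring(eq + 1).trim().split("\\s+")) {
            if (ext.equals("*")) defaultType = type;                 // default type
            else if (!ext.isEmpty() && !ext.equals(".")) {           // "." means no extensions
                byExtension.put(ext.toLowerCase(Locale.ROOT), type);
            }
        }
    }

    String typeOf(String fileName) {
        String name = fileName.substring(fileName.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0) return defaultType;
        return byExtension.getOrDefault(name.substring(dot + 1).toLowerCase(Locale.ROOT), defaultType);
    }
}
```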
The file named virtual_paths may contain mappings from virtual paths to actual paths and has the following format:
<virtual path> <actual path>
.
.
.
A virtual path can be a pattern with an asterisk (*) at the beginning or end of the string. The actual path can be another request path. It can also be a physical file path: in Windows this must be preceded by a drive letter and colon (:), in Unix by a colon to identify it as a file path, and in both cases the slash (/) must be used as the path separator character. It can also be a servlet path (starting with /servlet) or an absolute URL with any protocol. It can contain an asterisk (*), which is replaced by the remaining part of the original path that matched the asterisk in the virtual path. Virtual paths can also be modified interactively with the remote management application.
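The asterisk substitution can be illustrated with a small resolver. It handles a single asterisk at either end of the virtual path and copies the matched part into every asterisk of the actual path; `VirtualPath.resolve` is a hypothetical helper, and the result is not further classified as a request path, file path, servlet path or URL.

```java
// Hypothetical resolver for virtual_paths entries.
class VirtualPath {
    // Returns the actual path for requestPath, or null if the virtual pattern does not match.
    static String resolve(String virtualPath, String actualPath, String requestPath) {
        String rest;
        if (virtualPath.endsWith("*")) {
            // Trailing asterisk: the virtual path is a prefix.
            String prefix = virtualPath.substring(0, virtualPath.length() - 1);
            if (!requestPath.startsWith(prefix)) return null;
            rest = requestPath.substring(prefix.length());
        } else if (virtualPath.startsWith("*")) {
            // Leading asterisk: the virtual path is a suffix.
            String suffix = virtualPath.substring(1);
            if (!requestPath.endsWith(suffix)) return null;
            rest = requestPath.substring(0, requestPath.length() - suffix.length());
        } else {
            return virtualPath.equals(requestPath) ? actualPath : null;
        }
        // The part matched by the asterisk replaces the asterisk in the actual path.
        return actualPath.replace("*", rest);
    }
}
```

For example, with the virtual path `/docs/*` and the actual path `/public/manual/*`, the request `/docs/a/b.html` resolves to `/public/manual/a/b.html`.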
The file named virtual_hosts may contain the port numbers of servers acting as virtual hosts in this server instance.
The W3Server management frontend is started as follows:
java FI.realitymodeler.server.Management [options] [<server host>]
The parameter gives the host name of the server and defaults to the local machine. Allowed options are (default values in parentheses):
-f : clean ftp proxy cache and exit
This cleans all files accessed via the ftp proxy and then exits the management program. This option is useful for scheduled cleaning operations. Note that the machine where operations are scheduled need not be the same one where the server runs. Cleaning the proxy cache means that all files that have not been accessed during the configured cache cleaning interval are deleted.
-g : clean gopher proxy cache and exit
This cleans all files accessed via the gopher proxy and then exits the management program. Cleaning is done in the same manner as with the ftp proxy.
-h : clean http proxy cache and exit
This cleans all files accessed via the http proxy and then exits the management program. Cleaning is done in the same manner as with the ftp proxy.
-m <username> <password>
Sets manager username and password.
-p <port number> (80 or 443 when secure)
Gives port number of the server host.
-s : secure on (off)
This sets secure mode on.
-t : stop server and exit (works only from server host)
This stops the server. The program must be started on the server host machine, in the data directory.
-y <socket read timeout in seconds> (60)
Sets read timeout for socket in seconds.
-?
Shows these options.
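Since the cleaning options exit after one operation, they lend themselves to periodic invocation. A minimal sketch of a Java scheduler that runs the management frontend once a day for each cache follows; it assumes the W3Server classes are on the inherited CLASSPATH, and the daily period and the `ScheduledCleaning` class are illustrative choices. A system scheduler such as cron would serve equally well.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical scheduler that invokes the Management frontend's cleaning options.
class ScheduledCleaning {
    // Builds: java FI.realitymodeler.server.Management [-m user password] <flag> <server host>
    static List<String> command(String serverHost, String cacheFlag, String user, String password) {
        List<String> cmd = new ArrayList<>(List.of("java", "FI.realitymodeler.server.Management"));
        if (user != null) {
            cmd.add("-m");
            cmd.add(user);
            cmd.add(password);
        }
        cmd.add(cacheFlag);
        cmd.add(serverHost);
        return cmd;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            for (String flag : new String[] {"-f", "-g", "-h"}) { // ftp, gopher, http caches
                try {
                    new ProcessBuilder(command(host, flag, null, null)).inheritIO().start().waitFor();
                } catch (IOException e) {
                    System.err.println("cleaning with " + flag + " failed: " + e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, 0, 24, TimeUnit.HOURS);
    }
}
```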
The W3Server program is equipped with various built-in servlets, available through the server's standard servlet interface. The basic modules responsible for handling requests directed to the local file system and to remote servers are also implemented as servlets. For example, the proxy servlet can be called explicitly when requests to remote servers must be tunneled through another proxy server. This can be the case, for example, when an ISP requires the use of its own proxy server and the preferred W3Server is located remotely. Servlets are called using their name after the /servlet path in requests. Among others, the following servlets are available:
- Core
- Proxy
- Ldap
- Mail
- News
- Wais
- CGI
Core
Default request path pattern: /frame/core/<file path>
This servlet handles requests directed to the local file system.
Proxy
Default request path pattern: /frame/proxy/<url>
This servlet handles requests directed to remote servers.
Ldap
Default request path pattern: /public/Ldap
This servlet can be used to browse the contents of LDAP directories.
Mail
Default initial request path pattern: /public/Mail?new=
This servlet is an email client operating in a browser window. It can be used with POP3, IMAP4 and SMTP servers.
News
Default initial request path pattern: /public/News
This servlet is a news client operating in a browser window. It can be used with NNTP servers.
Wais
Default initial request path pattern: /public/Wais
This servlet can be used to query information from WAIS (Wide Area Information Server) servers.
CGI
Default request path pattern: /public/cgi/<cgi name>/<cgi parameters>
This servlet can be used to call CGI programs.
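Requests to servlets follow the URL form given earlier, `http://<site>/servlet/<servlet name>[?<parameters>]`. A small hypothetical helper that builds such URLs with URL-encoded parameters (`ServletUrl` is not part of W3Server, and whether a servlet name includes its protection directory is left to the caller):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical builder for http://<site>/servlet/<servlet name>[?<parameters>].
class ServletUrl {
    static String of(String site, String servletName, Map<String, String> params) {
        StringBuilder url = new StringBuilder("http://").append(site).append("/servlet/").append(servletName);
        if (!params.isEmpty()) {
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> e : params.entrySet()) {
                // URLEncoder produces application/x-www-form-urlencoded text (space becomes '+').
                query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            }
            url.append(query);
        }
        return url.toString();
    }
}
```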
ClientRobot is a general-purpose Internet client robot. It:
1. Mirrors content in remote servers to local cache.
2. Tests servers by simulating client, which is sending requests and forms.
3. Searches content by keywords from remote servers.
4. Searches content by keywords from other robots.
5. Sends content and messages to remote servers and mail systems.
It also checks the robot exclusion file robots.txt and recognizes cookies used during one session.
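Robot exclusion can be sketched with a simplified robots.txt reader that collects Disallow prefixes from records addressed to `*` or to the robot's own name. This follows the common robots exclusion convention rather than ClientRobot's actual code, and it merges all applicable records instead of preferring the most specific one; `RobotsTxt` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified, hypothetical robots.txt reader: Disallow prefixes only.
class RobotsTxt {
    private final List<String> disallowed = new ArrayList<>();

    RobotsTxt(List<String> lines, String agent) {
        String me = agent.toLowerCase(Locale.ROOT);
        boolean applies = false, inRules = false;
        for (String raw : lines) {
            String line = raw.replaceFirst("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (inRules) { applies = false; inRules = false; } // a new record starts
                String name = value.toLowerCase(Locale.ROOT);
                applies |= name.equals("*") || (!name.isEmpty() && me.contains(name));
            } else {
                inRules = true;
                if (field.equals("disallow") && applies && !value.isEmpty()) disallowed.add(value);
            }
        }
    }

    boolean allowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```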
The class can be used as a standalone program. When invoked, it first tries to read a file named ‘system.properties’ from the current directory; this file can contain, in the standard Java properties file format (name=value), various system properties for the standard Java classes. It requires a file called ‘mime_types’ in the current directory, which contains file name suffix to MIME type mappings; a file of the same name can be found in the W3Server program's data directory. The client robot is invoked as follows:
java FI.realitymodeler.server.ClientRobot [options] URL [keywords]
where the allowed options are (default values in parentheses where applicable):
-3 <pop3 proxy host> <pop3 proxy port>
Sets the host name and port of the pop3-server proxy, if used (must be FI.realitymodeler.server.W3Server).
-A : append remaining user values to query
When filling HTML forms automatically, this adds the remaining name-value pairs specified in the form values file for the -m option to the query, even though they do not exist in the original form.
-C <accept content type list>
Sets value for Accept-header used in requests.
-D <header name> <default value>
Sets value as default for specified HTTP-header name used in all requests.
-E : force resuming of cache filling
Use this when cache filling was interrupted unexpectedly. This avoids checking whether the partial cache file is valid.
-F : fill only caches, do not read input stream
-H : request fresh copy from origin server
If this is set, intermediate caches are requested to respond with fresh copies from origin servers.
-N <NNTP proxy host> <nntp proxy port>
This sets the host name and port of the nntp-server proxy, if used (must be FI.realitymodeler.server.W3Server).
-P <file listing name parts that the protocol://host names of collected URLs must start or end with, or, if a part is prefixed with !, must neither start nor end with>
This constrains the set of URLs collected from parsed HTML pages when recursively navigating the web. For example, 'host.domain' means that only URLs whose protocol://host name starts or ends with the string 'host.domain' are collected.
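One reading of the -P rule is that a collected URL's `protocol://host` must start or end with at least one listed part and must neither start nor end with any part prefixed with `!`. A sketch under that interpretation (`NameParts` is hypothetical; the port is ignored):

```java
import java.net.URI;
import java.util.List;

// Hypothetical filter for the -P option under the interpretation described above.
class NameParts {
    static boolean accepts(String protocolHost, List<String> parts) {
        boolean anyPositive = false, matchedPositive = false;
        for (String part : parts) {
            if (part.startsWith("!")) {
                String p = part.substring(1);
                // Excluded if the name starts or ends with a negated part.
                if (protocolHost.startsWith(p) || protocolHost.endsWith(p)) return false;
            } else {
                anyPositive = true;
                if (protocolHost.startsWith(part) || protocolHost.endsWith(part)) matchedPositive = true;
            }
        }
        // With no positive parts listed, everything not excluded is accepted.
        return !anyPositive || matchedPositive;
    }

    static String protocolHost(URI uri) {
        return uri.getScheme() + "://" + uri.getHost();
    }
}
```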
-R : trace only form handling
Specifies that form handling defined with the -m option is only traced, not actually performed.
-S <SMTP proxy> <port>
This sets the host name and port of the smtp-server proxy, if used (must be FI.realitymodeler.server.W3Server).
-T : use head-tail matching
Used with options -b and -c; specifies that a URL is collected if its head (protocol and host name) and tail (end of file name) match the pattern. For example, the pattern http://host.domain/file matches http://host.domain/directory/file.
-U <file listing root URLs>
Specifies the name of a file listing URLs used as starting points of navigation. This can be used to give more than the single URL accepted as a command-line parameter.
-a <username> <password>
Specifies username and password used when logging in to remote servers.
-b <file listing paths branched URLs should start with or not start with if path is prefixed with !>
Specifies a file listing URL prefixes with which collected URLs must start; a prefix preceded by ! is one they must not start with. An asterisk can be used as a wildcard at the end of the string. For example: http://host.domain*
-c <file listing paths sending URLs should start with or not start with if path is prefixed with !>
Specifies a file listing URL prefixes with which sending URLs must start; a prefix preceded by ! is one they must not start with. Used with options -d or -e. An asterisk can be used as a wildcard at the end of the string. For example: mailto:*
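The -b and -c lists, optionally combined with -T head-tail matching, can be sketched as one filter. Assumptions: a trailing asterisk is stripped and the rest is used as a prefix, a `!` entry excludes matching URLs, a list without positive entries accepts everything else, and in head-tail mode the head must be followed by a slash. `UrlFilter` is a hypothetical class.

```java
import java.util.List;

// Hypothetical filter for -b/-c path lists, with optional -T head-tail matching.
class UrlFilter {
    static boolean accepts(String url, List<String> entries, boolean headTail) {
        boolean anyPositive = false, matched = false;
        for (String entry : entries) {
            boolean negated = entry.startsWith("!");
            boolean hit = matchesEntry(negated ? entry.substring(1) : entry, url, headTail);
            if (negated && hit) return false;
            if (!negated) {
                anyPositive = true;
                matched |= hit;
            }
        }
        return !anyPositive || matched;
    }

    // A trailing '*' is a wildcard; plain mode is then a prefix match.
    static boolean matchesEntry(String entry, String url, boolean headTail) {
        String pattern = entry.endsWith("*") ? entry.substring(0, entry.length() - 1) : entry;
        return headTail ? headTailMatch(pattern, url) : url.startsWith(pattern);
    }

    // Head = protocol and host, tail = rest of the pattern, e.g. "http://host.domain" + "/file".
    static boolean headTailMatch(String pattern, String url) {
        int schemeEnd = pattern.indexOf("://");
        int pathStart = pattern.indexOf('/', schemeEnd < 0 ? 0 : schemeEnd + 3);
        if (pathStart < 0) return url.startsWith(pattern);
        String head = pattern.substring(0, pathStart);
        String tail = pattern.substring(pathStart);
        return url.length() >= head.length() + tail.length()
                && url.startsWith(head)
                && url.charAt(head.length()) == '/'
                && url.endsWith(tail);
    }
}
```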
-d : send files given in place of keywords to URLs found in given url.
Sends the files given in place of keywords to the URLs found in the specified URL.
-e : send files given in place of keywords to URL.
Sends the files given in place of keywords to the specified URL.
-f : set FTP proxy cache on (off)
Turns FTP-proxy cache on.
-g : set GOPHER proxy cache on (off)
Turns GOPHER-proxy cache on.
-h : set HTTP proxy cache on (off)
Turns HTTP-proxy cache on.
-i <local IP-address to bind> <local port>
Sets the local IP address and port to which the socket is bound when connecting.
-j <GOPHER proxy> <port>
Sets the host name and port of the gopher-server proxy, if used (can be any web server supporting a GOPHER proxy).
-k : only keywords given and they are searched from robots
Keywords are searched from search engine URLs specified in file 'robots'. Forms sent to search engines are specified in file 'robot_forms' and parts of URLs not collected from parsed result pages are specified in file 'robot_parts'.
-l : test all links
-m <form values file containing action paths with list of name=value pairs>
Specifies the file where assignments for form handling are defined. The file 'robot_forms' can be used as an example. The file has the following format:
<action URL>
name=value | *
.
.
.
<action URL> is the complete URL that appears in the FORM tag's ACTION attribute. When this URL is encountered in a page, the form is submitted with the specified parameters. All input parameters given inside the form tag are used in the request, possibly overwritten by values specified in the form file. If a parameter's value is set to an asterisk (*), the keywords given on the command line are used, in order, in its place.
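The form filling rules can be sketched as a function from the form's own inputs, the form-file values and the keyword list to a query string. The `appendRemaining` flag models the -A option; sending an asterisk literally when the keywords run out is an assumption, and `FormFiller` is a hypothetical class.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of filling a form from a form values file (-m) and keywords.
class FormFiller {
    static String query(Map<String, String> formInputs, Map<String, String> overrides,
                        List<String> keywords, boolean appendRemaining) {
        // Start from the inputs found inside the FORM tag, in page order.
        Map<String, String> values = new LinkedHashMap<>(formInputs);
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            // Without -A, values for names absent from the form are ignored.
            if (appendRemaining || values.containsKey(e.getKey())) values.put(e.getKey(), e.getValue());
        }
        Iterator<String> nextKeyword = keywords.iterator();
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : values.entrySet()) {
            String value = e.getValue();
            // "*" takes the next command-line keyword, in order.
            if (value.equals("*") && nextKeyword.hasNext()) value = nextKeyword.next();
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(value, StandardCharsets.UTF_8));
        }
        return query.toString();
    }
}
```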
-n <number of threads in client simulation>
-o <HTTP proxy> <port>
Sets the host name and port of the http-server proxy, if used (can be any web server supporting an http proxy).
-q : type files to standard output
-r : read all files so that cache gets filled
-s : stay only under given URL
-t : trace only and do not actually send
-u <ftp proxy> <port>
Sets the host name and port of the ftp-server proxy, if used (can be any web server supporting an ftp proxy).
-w <delay in seconds between requests>
-x : force cache filling with dynamic content
Forces the cache to be filled also with responses to requests containing queries.
-y <socket read timeout in seconds> (60)
-z : set cache header storing on (off)
Specifies whether response headers are stored in cache files. In this case, cache files cannot be used directly from the file system but must be accessed through FI.realitymodeler.server.W3Server's proxy mechanism. Cache files with and without headers must not be mixed in the same cache file tree. See noCacheHeaders. When response headers are not stored in cache files, the files can usually be used directly by a browser or some other program, because they are then identical at the binary level to the original files (e.g. text, image and audio files).
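The cache file tree mentioned here mirrors each origin site under the cache root, in subdirectories named after the protocol and site. A sketch of that mapping follows; the `_` separator before a port number and the `index` name for directory URLs are hypothetical choices, and the normalization check simply prevents `..` segments from escaping the cache directory. `CacheLayout` is not W3Server's own class.

```java
import java.net.URI;
import java.nio.file.Path;

// Hypothetical mapping from a URL to its file in the proxy cache tree.
class CacheLayout {
    static Path cacheFile(Path root, URI url) {
        String site = url.getHost();
        if (url.getPort() != -1) site += "_" + url.getPort(); // hypothetical port separator
        Path dir = root.resolve(url.getScheme()).resolve(site);
        String path = url.getPath() == null ? "" : url.getPath();
        if (path.isEmpty() || path.endsWith("/")) path += "index"; // hypothetical directory file
        // Strip leading slashes: resolving an absolute path would discard dir entirely.
        Path file = dir.resolve(path.replaceFirst("^/+", "")).normalize();
        if (!file.startsWith(dir.normalize())) {
            throw new IllegalArgumentException("path escapes cache: " + url);
        }
        return file;
    }
}
```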