Bill Welliver is sharing code with you
fulltext overview
Recent commits

| Revision | Message |
|---|---|
| b32e04fc31ac | more logging cleanup |
| 1244c0b3bbfb | update logging |
| fec739d36bba | a little more verbosity for mime type prohibitions |
| e7dd41bef95b | clean up a little bit more. |
| 42acbe871148 | adding individual index info |

Getting Started

You should have the following installed:

- Pike 7.8
- Fins framework
- Xapian full text library
- Public.Xapian module

To start the server, first review the settings in the config/dev.cfg file to make sure that everything looks good. Then set the FINS_HOME environment variable so that it points to the location where you installed the Fins framework. You should then be able to run the start script, which is located in the bin/ directory:

    FINS_HOME=/path/to/fins
    export FINS_HOME
    cd /path/to/fulltext
    bin/start.sh

(or "bin/start.sh -d", optionally, once you've verified everything's working)

FTAdmin

Fins/Xapian provides a script that performs certain administrative functions. This script is located in the bin directory and performs the following functions:

Create a new index:

    bin/ftadmin.sh new indexname

Grant access to an index (prints out the newly granted auth code):

    bin/ftadmin.sh grant indexname

Revoke access to an index for an auth code:

    bin/ftadmin.sh revoke indexname authcode

Shut down the server (optionally after a delay):

    bin/ftadmin.sh shutdown [seconds]

Note that in order for the script to work, the server must be running on the local host.

Security

The FullText application supports two levels of security: standard and simplified. You may choose either based on your particular needs; the "standard" model is enabled by default.

When using the standard security model, administrative authorization codes are used to create new indices and to grant or revoke access to a given index. These codes are placed in the "auth" section of the application configuration file, and multiple administrative authorization codes may be enabled at one time. The codes are read at startup, and the application must be shut down in order to flush existing codes.

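Because administrative codes are only read at startup, changing them implies a restart. The following is a minimal sketch of that cycle, using only the bin/ftadmin.sh shutdown and bin/start.sh commands described above. The function name restart_fulltext and its arguments are hypothetical. It also assumes that the shutdown command returns as soon as the request is accepted and that "bin/start.sh -d" returns control to the caller; neither is confirmed here.

```shell
# Hypothetical helper: restart FullText so that edited admin codes are re-read.
# Usage: restart_fulltext /path/to/fulltext [delay_seconds]
restart_fulltext() {
    ft_dir=$1
    delay=${2:-0}

    # The start script needs FINS_HOME (see Getting Started).
    if [ -z "${FINS_HOME:-}" ]; then
        echo "FINS_HOME is not set" >&2
        return 1
    fi

    (
        cd "$ft_dir" || exit 1
        # Ask the running server to shut down after the given delay.
        bin/ftadmin.sh shutdown "$delay" || exit 1
        # Assumption: shutdown returns right away, so wait out the delay here.
        sleep "$delay"
        # Start again; the configuration, including the auth section, is read now.
        bin/start.sh -d
    )
}
```

For example, "restart_fulltext /path/to/fulltext 5" would request a shutdown in five seconds and then start the server again. A production script would probably confirm that the old process has actually exited before starting a new one.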
In order to search or update an index while running in standard security mode, a client must provide a valid index authorization code. A given code is specific to a particular index and may be obtained by using the administrative client. Similarly, codes may be revoked using the administrative client. Codes may be granted and revoked at any time, without restarting the FullText application.

When using the simplified mode, the FullText application simply validates the authorization code provided by a client during search or update operations against its list of administrative authorization codes. This can simplify management of authorization codes for certain scenarios, such as development or other small-scale installations, at the expense of giving each user "the keys to the castle". You may enable the simplified security mechanism by setting the "use_simple_security" flag in the "auth" section of the application configuration file. When running in the simplified mode, the grant and revoke functionality is disabled.

In either case, if a valid administrative authorization code is not present in the application configuration file on startup, one will be created and enabled. A message containing the new administrative authorization code will be written to the application log.

Client Example

    import FullText;

    string index = "myFTIndex";

    // change to '1' if you want to create the index if it doesn't exist.
    int create_if_new = 0;

    string authcode = "1234567890"; // see the security section for details on auth codes.

    // if we're running the FullText application on http://localhost:8124,
    // we can use the default url.
    object a = AdminClient(0, authcode);

    if(!a->exists(index))
    {
      a->new(index);
      werror("new auth code for index: %O\n", authcode = a->grant_access(index));
    }

    object u = UpdateClient(0, index, authcode);

    // now, let's add some content
    string content = "mary had a little lamb, its fleece was white as snow.";
    string title = "mary and her lamb"; // the title of the content, stored and returned with searches
    string handle = "/rhymes/mary"; // a (hopefully) unique identifier for this bit of content

    u->add(title, Calendar.now()->seconds(), content, handle, 0, "text/plain");

    // ok, now that we've added, we can search:
    object s = SearchClient(0, index, authcode);

    foreach(s->search("lamb");; mapping doc)
      werror("found a hit: %O, rating: %O, handle: %O\n", doc->title, doc->score, doc->handle);

Indexing support for various file formats

The indexer has built-in support for plain text files and HTML. You may add support for additional file formats by telling the engine about programs that can convert other formats to HTML or plain text. Some examples that have been successfully tested:

PDF

http://pdftohtml.sourceforge.net/

Install pdftohtml and then add the following to your FullText config file:

    [transform_pdf]
    type=converter
    mimetype=application/pdf
    command=/usr/local/bin/pdftohtml -stdout -q %f

RTF

http://sourceforge.net/projects/rtf2html-lite/

The free tool rtf2html can be used to process RTF files. However, out of the box, this tool does not behave as either a filter or a converter. A simple script is included in the extras folder which can be used to make the rtf2html utility behave in a compatible manner.

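To illustrate what "compatible" might mean here, the sketch below wraps a tool that only writes to an output file so that its result appears on standard output, the way "pdftohtml -stdout" behaves in the PDF example. This is not the script shipped in the extras folder. The function name to_stdout is hypothetical, and the "TOOL input output" calling convention is an assumption that should be checked against the actual tool.

```shell
# Hypothetical wrapper: run a tool that only writes to an output file and
# relay its result on standard output.
# Usage: to_stdout TOOL INPUT_FILE
to_stdout() {
    tool=$1
    input=$2

    # Temporary file that receives the tool's output.
    out=$(mktemp) || return 1

    # Assumption: the tool is invoked as "TOOL input output".
    if "$tool" "$input" "$out" >/dev/null 2>&1; then
        cat "$out"
        rc=0
    else
        echo "conversion failed: $input" >&2
        rc=1
    fi

    # Clean up whether or not the conversion succeeded.
    rm -f "$out"
    return "$rc"
}
```

A converter script built this way would call the function with the tool's path and the file name it receives in place of %f, for example: to_stdout /path/to/rtf2html "$1" (the path is a placeholder).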
Install rtf2html, edit the rtf2html_converter script appropriately, and then add the following to your FullText config file:

    [transform_rtf]
    type=converter
    mimetype=text/rtf
    command=/path/to/extras/rtf2html_converter %f

DOC/DOCX/ODT/ABW

AbiWord can load various Word/OpenOffice formats and includes a tool called "AbiCommand" that can be used to read a file and convert it into HTML format. The actual implementation is left as an exercise for the reader; however, the following page includes almost everything a user might need to make this happen. Hint: start with the RTF filter above.

http://www.abisource.com/wiki/AbiCommand

OTHERS

Apache Tika seems like it could be a useful tool: it includes support for a number of popular file formats and has an out-of-the-box command line utility. Drawbacks include:

- written in Java, so not exactly nimble

If you download the Tika jar from the Apache Tika website, you can use the following config section to handle PDF, DOC and various other formats:

    [transform_tika]
    type=converter
    command=java -jar /home/hww3/Fins/FullText/extras/tika-app-1.0.jar -h %f
    mimetype=application/pdf
    mimetype=text/rtf
    mimetype=application/rtf
    mimetype=application/msword
    mimetype=application/vnd.openxmlformats-officedocument.wordprocessingml.document
    mimetype=application/vnd.oasis.opendocument.text
    mimetype=application/x-vnd.oasis.opendocument.text
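
Each transform section above depends on an external program being installed. As a rough pre-flight check before editing the config file, something like the following could confirm that the programs are on the PATH. The function name check_converters is hypothetical, and the check only looks up command names; it does not verify versions or that a given jar file exists.

```shell
# Hypothetical pre-flight check: report which converter programs are missing.
# Usage: check_converters pdftohtml java ...
check_converters() {
    missing=0
    for prog in "$@"; do
        # command -v succeeds when the name resolves to something runnable.
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "not found: $prog" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every program was found.
    [ "$missing" -eq 0 ]
}
```

For example, "check_converters pdftohtml java" returns a non-zero status and names each missing program on standard error if either one cannot be found.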