Software update: Xapian / Omega 1.0.7

Xapian is an open source information retrieval library written in C++ that can serve as the engine behind a search engine. The package comprises its own database format, APIs for modifying and searching these databases, tools for checking the databases, and bindings for other languages such as Java, Ruby, PHP and Python.

Omega is an application that runs on top of Xapian and acts as a search engine for searching Xapian databases. Omega also ships with a few tools that can be used to populate databases with data. Because the development of Omega is closely tied to that of Xapian itself, the developers release new versions of both programs at the same time and under the same version number. This site uses Xapian and Omega as the search engine for its forum. More about the implementation and workings of the search engine can be found in this document.

The development team of The Xapian Project has released version 1.0.7 of Xapian and Omega. The changelogs for the various components are as follows:

Xapian-core 1.0.7:

    • If there were gaps in the document id numbering, these operators could return document ids which weren't present in the database. This has been fixed.
    • These operators are now more efficient when there are a lot of "missing" document ids (bug#270).
    • Optimise Query(OP_VALUE_GE, , "") to Query::MatchAll.
  • Xapian::QueryParser:
    • QueryParser now stops parsing immediately when it hits a syntax error. This doesn't change behaviour, but does mean failing to parse queries is now more efficient.
    • Cases of O(N*N) behaviour have been fixed.
  • Xapian::Stem now recognises "nl" as an alias for "dutch" (debian bug 484458).
  • Setting sort by value was being ignored by a Xapian::Enquire object which had previously had a Xapian::Sorter set (bug#256).
  • Improved test coverage in a few places.
  • When using a MatchDecider, we weren't reducing matches_lower_bound unless all the potential results were retrieved, which led to the lower bound being too high in some such cases.
  • We now track how many documents were tested by a MatchDecider and how many of those it rejected, and set matches_estimated based on this rate. Also, matches_upper_bound is reduced by the number of rejected documents.
  • Fixed matches_upper_bound in some cases when collapsing and using a MatchDecider.
  • Fixed matches_lower_bound when collapsing and using a percentage cutoff.
  • When using two or more of a MatchDecider, collapsing, or a percentage cutoff, we now only round the scaled estimate once, and we also round it to the nearest rather than always rounding down. Hopefully this should improve the estimate a little in such cases.
  • Fix problem on x86 with the top match getting 99% rather than 100% (caused by excess precision in intermediate value).
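
The MatchDecider estimate changes listed above (scaling by the observed rejection rate, reducing the upper bound, and rounding to the nearest value only once) can be illustrated with a small sketch. This is not Xapian's actual implementation: the function name `scale_estimate` and all of its parameters are hypothetical, chosen only to show the arithmetic.

```python
def scale_estimate(estimate, tested, rejected, lower, upper):
    """Scale a match estimate by a decider's observed acceptance rate.

    Hypothetical helper, not Xapian's code: `tested` is how many documents
    the decider examined, `rejected` how many of those it turned down.
    Returns (new_estimate, new_upper_bound).
    """
    if tested > 0:
        accept_rate = (tested - rejected) / tested
        scaled = estimate * accept_rate
    else:
        # Nothing was tested, so there is no rate to scale by.
        scaled = float(estimate)

    # Rejected documents can never match, so the upper bound shrinks.
    upper = max(upper - rejected, 0)

    # Round to the nearest integer once, at the end, instead of
    # truncating after every intermediate scaling step.
    rounded = int(scaled + 0.5)

    # Keep the estimate inside [lower, upper].
    lower = min(lower, upper)
    return max(lower, min(rounded, upper)), upper
```

With an initial estimate of 5 and one rejection out of four tested documents, the scaled value is 3.75; rounding to nearest gives 4, where always rounding down would have given 3.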
flint backend:
  • If Database::reopen() is called and the database revision on disk hasn't changed, then do as little work as possible. Even if it has changed, don't bother to recheck the version file (bug#261).
  • xapian-compact:
    • Fix check for user metadata key to not match other key types we may add in the future. When compacting, we can't assume how we should handle them.
    • If the same user metadata key is present in more than one source database with different tag values, issue a warning and copy an arbitrary tag value.
    • Fix potential SEGV when compacting database(s) with user metadata but no postings.
    • In error message, refer to "iamflint" as the "version file", not the "meta file".
  • xapian-inspect:
    • Print top-bit-set characters as escaped hex forms as they often won't be valid UTF-8 sequences.
    • If we're passed a database directory rather than a single table, issue a special error message since this is an obvious mistake for users to make.
  • Fix cursor handling for a modified table which has previously only had sequential updates which usually manifested as zlib errors (bug#259).
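
The xapian-inspect change above, printing top-bit-set characters as escaped hex, can be sketched as follows. The exact escape format that xapian-inspect uses is not stated in the changelog, so the `\xNN` form here is an assumption, and `escape_high_bytes` is a hypothetical name.

```python
def escape_high_bytes(data: bytes) -> str:
    """Render bytes as text, showing any byte >= 0x80 as \\xNN.

    Illustrative only: bytes with the top bit set often do not form valid
    UTF-8 in raw table data, so they are shown as hex rather than decoded.
    ASCII bytes, including control characters, pass through unchanged.
    """
    parts = []
    for byte in data:
        if byte >= 0x80:
            parts.append("\\x%02x" % byte)
        else:
            parts.append(chr(byte))
    return "".join(parts)
```

For example, the UTF-8 bytes of "café" come out as `caf\xc3\xa9`, which stays readable on any terminal regardless of its encoding.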
quartz backend:
  • Fix cursor handling for a modified table which has previously only had sequential updates which usually manifested as incorrect data being returned (bug#259).
  • Calling skip_to() as the first operation on an all-documents PostingIterator now works correctly.
remote backend:
  • Improve performance of matches with multiple databases at least one of which is remote, and when the top hit is from a remote database (bug#279).
  • When remote protocol version doesn't match, the error message displayed now shows the minor version number supplied by the server correctly.
  • We now wait for the connection to close after sending MSG_SHUTDOWN for a WritableDatabase, which ensures that changes have been written to disk and the lock released.
  • We no longer ever send MSG_SHUTDOWN for a read-only Database - just closing the connection is enough (and is protocol compatible).
inmemory backend:
  • Fix bug which resulted in the values not being stored correctly when replacing an existing document, or if there are gaps in the document id numbering.
build system:
  • This release now uses newer versions of the autotools (autoconf 2.61 -> 2.62; automake 1.10 -> 1.10.1; libtool 1.5.24 -> 1.5.26). The newer autoconf reportedly results in a faster configure script, and warns about use of unrecognised configure options.
  • Fix configure to recognise --enable-log=profile and fix build problems when this is enabled.
  • "make up" in the "tests" subdirectory now does "make" in the top-level.
  • Fix "make distcheck" by using dist-hook to install generated files from either srcdir or builddir, with the appropriate dependency to generate them automatically in maintainer mode builds.
  • intro_ir.html: Improve wording a bit.
  • The documentation now links to trac instead of bugzilla. For links to the main website, we now prefer to
  • Doxygen-generated API documentation:
    • Improved documentation in several places.
    • The helper macro XAPIAN_VISIBILITY_DEFAULT no longer appears in the output.
    • Header and directory relationship graphs are no longer generated as they aren't actually informative here.
  • HACKING: Numerous updates and improvements.
  • quest: Output get_description() of the parsed query.
  • Fix build with GCC 2.95.3.
  • Fix build with GCC 4.3.
  • Newer libtool features improved support for Mac OS X Leopard and added support for AIX 6.1.
debug code:
  • Database::get_spelling_suggestion() now debug logs with category APICALL rather than SPELLING, for consistency with all other API methods.
  • Added APICALL logging to a few Database methods which didn't have it.
  • Remove debug log tracing from get_description() methods since logging for other methods calls get_description() methods on parameters, so logging these calls just makes for more confusing debug logs. A get_description() method should have no side-effects so it's not very interesting even when explicitly called by the user.

Omega 1.0.7:

  • omegascript.html,scriptindex.html: Fix empty titles.
indexers - omindex:
  • When indexing text files, handle UCS-2 and UTF-16 text files with a byte-order mark (BOM), and ignore any UTF-8 "byte-order" mark.
  • The built-in conversion code (used when iconv isn't available) now handles UCS-2/UTF-16 with and without a BOM, and also the explicit BE and LE forms.
  • Overhaul the $highlight colour combinations since some were rather unreadable (Debian bug 484456).
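
The byte-order mark handling described for omindex above can be sketched in a few lines. This is a minimal illustration, not omindex's code: the function name `decode_text` is hypothetical, the fallback to UTF-8 when no BOM is present is an assumption, and UTF-32 (whose little-endian BOM begins with the same two bytes as the UTF-16 one) is deliberately ignored.

```python
import codecs


def decode_text(raw: bytes) -> str:
    """Decode bytes using a leading BOM if present; otherwise assume UTF-8."""
    if raw.startswith(codecs.BOM_UTF8):
        # A UTF-8 "byte-order" mark carries no byte-order information,
        # so it is simply skipped.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith(codecs.BOM_UTF16_LE):
        # FF FE: little-endian UTF-16 (or UCS-2).
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):
        # FE FF: big-endian UTF-16 (or UCS-2).
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM: assumed to be UTF-8 for the purposes of this sketch.
    return raw.decode("utf-8")
```

Stripping the BOM before decoding keeps U+FEFF out of the indexed text, where it would otherwise end up attached to the first term.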
build system:
  • configure: Synchronise code for working out warning flags used for builds with that used for xapian-core, which in particular handles different output formats from "gcc --version".
  • configure: Fix header checks to pre-include which Mac OS X needs for some other headers to work.
  • configure: Fix probing for iconv to work better when iconv isn't found (previously this only worked on Mac OS X with fink).
  • Fix compilation error on FreeBSD, introduced in 1.0.5.
  • In omega, cast size to unsigned before division to avoid a warning about signed overflow.
  • xapian-omega.spec: Remove "www." from and URLs.

Xapian-bindings 1.0.7:

  • Document how all the database factory functions and library version functions are wrapped for all languages.
  • Fix to build against a xapian-core which has quartz and/or flint disabled.
  • The "program" version of Remote::open() has been wrapped for some time, so update the documentation which said it wasn't.
  • xapian-bindings.spec: Remove "www." from and URLs.
  • For Java, Python, and Ruby, use the libtool -shrext option to specify a different module extension rather than our own ugly bodge.
  • Make passing string from Java to C++ zero-byte safe. It doesn't appear to be simple to make C++ to Java work though.
  • Add test that OP_VALUE_GE works for PHP.
  • Several corrections to the Python documentation.
  • configure: Fix problem with building under mingw.
  • Include simplematchdecider.rb example.
  • smoketest.rb: Test the version reporting functions.
  • Include simpleexpand.tcl example.
  • Fix where the Tcl module gets installed.
  • README: Note that Tcl 8.3 and earlier are no longer supported by upstream.
Version number 1.0.7
Release status Final
Operating systems Windows 9x, Windows NT, Windows 2000, Linux, BSD, Windows XP, macOS, OS/2, Solaris, UNIX, Windows Server 2003, Windows Vista
Website The Xapian Project
License type GPL

By Japke Rosink


19-07-2008 • 14:23


Source: The Xapian Project

Comments (2)

It sounds so easy. I looked at Omega once and found it fairly difficult, not something for a beginning programmer. Good framework nonetheless.
If you are looking for something easier, you could also take a look at Lucene. This is an IR library written in Java. There are at least ports to .NET and Python.
