Source: The Xapian Project

Xapian is an open source information retrieval library written in C++ that can serve as the engine behind a search engine. The package includes its own database format, APIs for modifying and searching these databases, tools for checking the databases, and bindings for other languages such as Java, Ruby, PHP and Python.

Omega is an application built on top of Xapian that can be used as a search engine for querying Xapian databases. Omega also ships with a few tools for populating databases with data. Because Omega's development is closely tied to that of Xapian itself, the developers release new versions of both programs at the same time and with the same version number. Tweakers.net uses Xapian and Omega as the search engine for its forum. More about the implementation and inner workings of the search engine can be found in this document.

The development team of The Xapian Project has released version 1.0.7 of Xapian and Omega. The changelogs for the individual components are as follows:

Xapian-core 1.0.7:

API:
  • OP_VALUE_RANGE, OP_VALUE_GE, and OP_VALUE_LE:
    • If there were gaps in the document id numbering, these operators could return document ids which weren't present in the database. This has been fixed.
    • These operators are now more efficient when there are a lot of "missing" document ids (bug#270).
    • Optimise Query(OP_VALUE_GE, , "") to Query::MatchAll.
  • Xapian::QueryParser:
    • QueryParser now stops parsing immediately when it hits a syntax error. This doesn't change behaviour, but does mean failing to parse queries is now more efficient.
    • Cases of O(N*N) behaviour have been fixed.
  • Xapian::Stem now recognises "nl" as an alias for "dutch" (Debian bug 484458).
  • Setting sort by value was being ignored by a Xapian::Enquire object which had previously had a Xapian::Sorter set (bug#256).
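
The first value-range fix above can be pictured with a toy model in plain Python. This is not the Xapian API or its implementation: the function name, the dictionary of values and the `last_docid` parameter are all hypothetical. The sketch only shows the property the fix restores, namely that an id in a gap of the numbering is never returned.

```python
def value_range_docids(values_by_docid, last_docid, low, high):
    """Return ids of existing documents whose value lies in [low, high].

    Toy model only: values_by_docid maps document id -> value (bytes),
    and ids missing from the mapping are gaps in the numbering.
    """
    matches = []
    for docid in range(1, last_docid + 1):
        value = values_by_docid.get(docid)
        if value is None:
            # Gap in the document id numbering: there is no such document,
            # so it must never be returned, whatever the range is.
            continue
        if low <= value <= high:
            matches.append(docid)
    return matches


# Documents 3 and 4 have been deleted, leaving a gap.
values = {1: b"apple", 2: b"cherry", 5: b"banana"}
print(value_range_docids(values, 5, b"a", b"c"))
```

Values are compared as byte strings here, which is an assumption made for the sketch.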
testsuite:
  • Improved test coverage in a few places.
matcher:
  • When using a MatchDecider, we weren't reducing matches_lower_bound unless all the potential results were retrieved, which led to the lower bound being too high in some such cases.
  • We now track how many documents were tested by a MatchDecider and how many of those it rejected, and set matches_estimated based on this rate. Also, matches_upper_bound is reduced by the number of rejected documents.
  • Fixed matches_upper_bound in some cases when collapsing and using a MatchDecider.
  • Fixed matches_lower_bound when collapsing and using a percentage cutoff.
  • When using two or more of a MatchDecider, collapsing, or a percentage cutoff, we now only round the scaled estimate once, and we also round it to the nearest rather than always rounding down. Hopefully this should improve the estimate a little in such cases.
  • Fix problem on x86 with the top match getting 99% rather than 100% (caused by excess precision in intermediate value).
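
The changelog does not give the exact formula the matcher uses, so the following is only a sketch of the idea under stated assumptions: the estimate is scaled by the observed acceptance rate of the MatchDecider, rounded to the nearest integer in a single step, and the upper bound is reduced by the number of rejected documents. The function `scale_estimate` and its parameters are hypothetical names, not part of Xapian.

```python
def scale_estimate(estimate, upper_bound, tested, rejected):
    """Scale a match estimate by a decider's observed acceptance rate.

    Hypothetical helper, not Xapian's code. All arguments are
    non-negative integers with rejected <= tested.
    Returns (new_estimate, new_upper_bound).
    """
    new_upper = upper_bound - rejected
    if tested == 0:
        # Nothing was tested, so there is no rate to scale by.
        return min(estimate, new_upper), new_upper
    accepted = tested - rejected
    # Integer round-to-nearest, applied once to the scaled value.
    scaled = (estimate * accepted + tested // 2) // tested
    return min(scaled, new_upper), new_upper


# 2 of 3 tested documents were accepted: 7 * 2/3 = 4.67, which rounds
# to 5 rather than being truncated to 4.
print(scale_estimate(7, 10, 3, 1))
```

Using integer arithmetic avoids the kind of floating point surprise mentioned in the last item of the list above.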
flint backend:
  • If Database::reopen() is called and the database revision on disk hasn't changed, then do as little work as possible. Even if it has changed, don't bother to recheck the version file (bug#261).
  • xapian-compact:
    • Fix check for user metadata key to not match other key types we may add in the future. When compacting, we can't assume how we should handle them.
    • If the same user metadata key is present in more than one source database with different tag values, issue a warning and copy an arbitrary tag value.
    • Fix potential SEGV when compacting database(s) with user metadata but no postings.
    • In error message, refer to "iamflint" as the "version file", not the "meta file".
  • xapian-inspect:
    • Print top-bit-set characters as escaped hex forms as they often won't be valid UTF-8 sequences.
    • If we're passed a database directory rather than a single table, issue a special error message since this is an obvious mistake for users to make.
  • Fix cursor handling for a modified table which has previously only had sequential updates. The bug usually manifested as zlib errors (bug#259).
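
The xapian-inspect change above, printing top-bit-set characters as escaped hex, can be sketched roughly as follows. The `\xNN` output format and the function name are assumptions made for illustration; the changelog does not specify the exact form xapian-inspect prints.

```python
def escape_top_bit_set(data: bytes) -> str:
    """Render bytes as text, escaping every byte >= 0x80 as \\xNN.

    Illustrative only: the escape format is an assumption. Bytes below
    0x80 are passed through as their ASCII characters.
    """
    parts = []
    for byte in data:
        if byte >= 0x80:
            parts.append("\\x%02x" % byte)
        else:
            parts.append(chr(byte))
    return "".join(parts)


print(escape_top_bit_set(b"caf\xc3\xa9"))
```

Escaping byte by byte sidesteps the problem that raw keys and tags are often not valid UTF-8 and so cannot simply be decoded for display.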
quartz backend:
  • Fix cursor handling for a modified table which has previously only had sequential updates. The bug usually manifested as incorrect data being returned (bug#259).
  • Calling skip_to() as the first operation on an all-documents PostingIterator now works correctly.
remote backend:
  • Improve performance of matches with multiple databases at least one of which is remote, and when the top hit is from a remote database (bug#279).
  • When remote protocol version doesn't match, the error message displayed now shows the minor version number supplied by the server correctly.
  • We now wait for the connection to close after sending MSG_SHUTDOWN for a WritableDatabase, which ensures that changes have been written to disk and the lock released.
  • We no longer ever send MSG_SHUTDOWN for a read-only Database - just closing the connection is enough (and is protocol compatible).
inmemory backend:
  • Fix bug which resulted in the values not being stored correctly when replacing an existing document, or if there are gaps in the document id numbering.
build system:
  • This release now uses newer versions of the autotools (autoconf 2.61 -> 2.62; automake 1.10 -> 1.10.1; libtool 1.5.24 -> 1.5.26). The newer autoconf reportedly results in a faster configure script, and warns about use of unrecognised configure options.
  • Fix configure to recognise --enable-log=profile and fix build problems when this is enabled.
  • "make up" in the "tests" subdirectory now does "make" in the top-level.
  • Fix "make distcheck" by using dist-hook to install generated files from either srcdir or builddir, with the appropriate dependency to generate them automatically in maintainer mode builds.
documentation:
  • intro_ir.html: Improve wording a bit.
  • The documentation now links to trac instead of bugzilla. For links to the main website, we now prefer xapian.org to www.xapian.org.
  • Doxygen-generated API documentation:
    • Improved documentation in several places.
    • The helper macro XAPIAN_VISIBILITY_DEFAULT no longer appears in the output.
    • Header and directory relationship graphs are no longer generated as they aren't actually informative here.
  • HACKING: Numerous updates and improvements.
examples:
  • quest: Output get_description() of the parsed query.
portability:
  • Fix build with GCC 2.95.3.
  • Fix build with GCC 4.3.
  • Newer libtool features improved support for Mac OS X Leopard and added support for AIX 6.1.
debug code:
  • Database::get_spelling_suggestion() now debug logs with category APICALL rather than SPELLING, for consistency with all other API methods.
  • Added APICALL logging to a few Database methods which didn't have it.
  • Remove debug log tracing from get_description() methods since logging for other methods calls get_description() methods on parameters, so logging these calls just makes for more confusing debug logs. A get_description() method should have no side-effects so it's not very interesting even when explicitly called by the user.

Omega 1.0.7:

documentation:
  • omegascript.html,scriptindex.html: Fix empty titles.
indexers - omindex:
  • When indexing text files, handle UCS-2 and UTF-16 text files with a byte-order mark (BOM), and ignore any UTF-8 "byte-order" mark.
  • The built-in conversion code (used when iconv isn't available) now handles UCS-2/UTF-16 with and without a BOM, and also the explicit BE and LE forms.
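
As a rough illustration of the byte-order mark handling described above, here is a small Python sketch. It is not omindex's code: the function name is hypothetical, and falling back to UTF-8 when no BOM is present is an assumption made for the example.

```python
import codecs


def decode_text(data: bytes) -> str:
    """Decode file contents to text, using a leading BOM if there is one.

    Hypothetical helper. A UTF-8 "byte-order" mark is simply dropped;
    a UTF-16 BOM selects the byte order. UTF-32 is not handled.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM: assume UTF-8 and replace anything that does not decode.
    return data.decode("utf-8", errors="replace")


print(decode_text(b"\xff\xfeh\x00i\x00"))
```

The explicit `utf-16-le` and `utf-16-be` codecs are used so that the BOM, already stripped by hand, is never interpreted a second time.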
omega:
  • Overhaul the $highlight colour combinations since some were rather unreadable (Debian bug 484456).
build system:
  • configure: Synchronise code for working out warning flags used for builds with that used for xapian-core, which in particular handles different output formats from "gcc --version".
portability:
  • configure: Fix header checks to pre-include which Mac OS X needs for some other headers to work.
  • configure: Fix probing for iconv to work better when iconv isn't found (previously this only worked on Mac OS X with fink).
  • Fix compilation error on FreeBSD, introduced in 1.0.5.
  • In omega, cast size to unsigned before division to avoid a warning about signed overflow.
packaging:
  • xapian-omega.spec: Remove "www." from xapian.org and oligarchy.co.uk URLs.

Xapian-bindings 1.0.7:

Documentation:
  • Document how all the database factory functions and library version functions are wrapped for all languages.
General:
  • Fix to build against a xapian-core which has quartz and/or flint disabled.
  • The "program" version of Remote::open() has been wrapped for some time, so update the documentation which said it wasn't.
Packaging:
  • xapian-bindings.spec: Remove "www." from xapian.org and oligarchy.co.uk URLs.
Portability:
  • For Java, Python, and Ruby, use the libtool -shrext option to specify a different module extension rather than our own ugly bodge.
Java:
  • Make passing string from Java to C++ zero-byte safe. It doesn't appear to be simple to make C++ to Java work though.
PHP:
  • Add test that OP_VALUE_GE works for PHP.
Python:
  • Several corrections to the Python documentation.
  • configure: Fix problem with building under mingw.
Ruby:
  • Include simplematchdecider.rb example.
  • smoketest.rb: Test the version reporting functions.
Tcl:
  • Include simpleexpand.tcl example.
  • Fix where the Tcl module gets installed.
  • README: Note that Tcl 8.3 and earlier are no longer supported by upstream.
Version number: 1.0.7
Release status: Final
Operating systems: Windows 9x, Windows NT, Windows 2000, Linux, BSD, Windows XP, macOS, OS/2, Solaris, UNIX, Windows Server 2003, Windows Vista
Website: The Xapian Project
Download: http://www.xapian.org/download.php
License type: GPL

Comments (2)

It sounds so easy. I looked at Omega once and found it fairly difficult, not something for a beginning programmer. Good framework nonetheless.
If you're looking for something easier, you could also take a look at Lucene. It is an IR library written in Java. There are at least ports to .NET and Python.
