Commit Graph

44 Commits

Author SHA1 Message Date
6595dd04d1 Add backwards-compatible declarations of some core GIN support functions.
These are needed to support reloading dumps of 9.0 installations containing
contrib/intarray or contrib/tsearch2.  Since not only regular dump/reload
but binary upgrade would fail, it seems worth the trouble to carry these
stubs for awhile.  Note that the contrib opclasses referencing these
functions will still work fine, since GIN doesn't actually pay any
attention to the declared signature of a support function.
2011-02-16 17:24:46 -05:00
52fd2d65a3 Fix up core tsquery GIN support for new extractQuery API.
No need for the empty-prefix-match kluge to force a full scan anymore.
2011-01-09 14:34:50 -05:00
5d950e3b0c Stamp copyrights for year 2011. 2011-01-01 13:18:15 -05:00
3e5f9412d0 Reduce the memory requirement for large ispell dictionaries.
This patch eliminates per-chunk palloc overhead for most small allocations
needed in the representation of an ispell dictionary.  This saves close to
a factor of 2 on the current Czech ispell data.  While it doesn't cover
every last small allocation in the ispell code, we are at the point of
diminishing returns, because about 95% of the allocations are covered
already.

Pavel Stehule, rather heavily revised by Tom
2010-10-06 19:31:05 -04:00
9b910def24 Clean up temporary-memory management during ispell dictionary loading.
Add explicit initialization and cleanup functions to spell.c, and keep
all working state in the already-existing ISpellDict struct.  This lets us
get rid of a static variable along with some extremely shaky assumptions
about usage of child memory contexts.

This commit is just code beautification and has no impact on functionality
or performance, but it opens the way to a less-grotty implementation of
Pavel's memory-saving hack, which will follow shortly.
2010-10-06 15:15:15 -04:00
9f2e211386 Remove cvs keywords from all files. 2010-09-20 22:08:53 +02:00
4c10623306 Update a number of broken links in comments.
Josh Kupershmidt
2010-04-02 15:21:20 +00:00
0239800893 Update copyright for the year 2010. 2010-01-02 16:58:17 +00:00
a88a48011c Introduce filtering dictionary support to tsearch. Propagate --nolocale option
to CREATE DATABASE command in pg_regress to allow correct checking of
locale-sensitive contrib modules.
2009-08-18 10:30:41 +00:00
de160e2c00 Make backend header files C++ safe
This alters various incidental uses of C++ key words to use other similar
identifiers, so that a C++ compiler won't choke outright.  You still
(probably) need extern "C" { }; around the inclusion of backend headers.

based on a patch by Kurt Harriman <harriman@acm.org>

Also add a script cpluspluscheck to check for C++ compatibility in the
future.  As of right now, this passes without error for me.
2009-07-16 06:33:46 +00:00
d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
41d17e042b Fix URL generation in headline. Only tag lexeme will be replaced by space.
Per http://archives.postgresql.org/pgsql-bugs/2008-12/msg00013.php
2009-01-15 16:33:59 +00:00
511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
2a0083ede8 Improve headeline generation. Now headline can contain
several fragments a-la Google.

Sushant Sinha <sushant354@gmail.com>
2008-10-17 18:05:19 +00:00
4e57668da4 Create a selectivity estimation function for the text search @@ operator.
Jan Urbanski
2008-09-19 19:03:41 +00:00
6f6d863258 Create a type-specific typanalyze routine for tsvector, which collects stats
on the most common individual lexemes in place of the mostly-useless default
behavior of counting duplicate tsvectors.  Future work: create selectivity
estimation functions that actually do something with these stats.

(Some other things we ought to look at doing: using the Lossy Counting
algorithm in compute_minimal_stats, and using the element-counting idea for
stats on regular arrays.)

Jan Urbanski
2008-07-14 00:51:46 +00:00
fbeb9da22b Improve error reporting for problems in text search configuration files
by installing an error context subroutine that will provide the file name
and line number for all errors detected while reading a config file.
Some of the reader routines were already doing that in an ad-hoc way for
errors detected directly in the reader, but it didn't help for problems
detected in subroutines, such as encoding violations.

Back-patch to 8.3 because 8.3 is where people will be trying to debug
configuration files.
2008-06-18 20:55:42 +00:00
9de09c087d Move wchar2char() and char2wchar() from tsearch into /mb to be easier to
use for other modules;  also move pnstrdup().

Clean up code slightly.
2008-06-18 18:42:54 +00:00
dc69c0362f Move USE_WIDE_UPPER_LOWER define to c.h, and remove TS_USE_WIDE and use
USE_WIDE_UPPER_LOWER instead.
2008-06-17 16:09:06 +00:00
0f5c606f41 Comment fix, should say TSQuery instead of TSVector.
Per Jan Urbanski.
2008-06-10 08:55:50 +00:00
e6dbcb72fa Extend GIN to support partial-match searches, and extend tsquery to support
prefix matching using this facility.

Teodor Sigaev and Oleg Bartunov
2008-05-16 16:31:02 +00:00
8472bf7a73 Allow float8, int8, and related datatypes to be passed by value on machines
where Datum is 8 bytes wide.  Since this will break old-style C functions
(those still using version 0 calling convention) that have arguments or
results of these types, provide a configure option to disable it and retain
the old pass-by-reference behavior.  Likewise, provide a configure option
to disable the recently-committed float4 pass-by-value change.

Zoltan Boszormenyi, plus configurability stuff by me.
2008-04-21 00:26:47 +00:00
220db7ccd8 Simplify and standardize conversions between TEXT datums and ordinary C
strings.  This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString.  A number of
existing macros that provided variants on these themes were removed.

Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed.  There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).

This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring.  We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.

Brendan Jurd and Tom Lane
2008-03-25 22:42:46 +00:00
716e8b8374 Fix RS_isRegis() to agree exactly with RS_compile()'s idea of what's a valid
regis.  Correct the latter's oversight that a bracket-expression needs to be
terminated.  Reduce the ereports to elogs, since they are now not expected to
ever be hit (thus addressing Alvaro's original complaint).
In passing, const-ify the string argument to RS_compile.
2008-01-21 02:46:11 +00:00
9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
11fccbeaeb Adjust the names of a couple of tsearch index support functions that had
inappropriately generic-sounding names.  This is more or less free since
we already forced initdb for the next beta, and it may prevent confusion or
name conflicts (particularly at the C-global-symbol level) down the road.
Per my proposal yesterday.
2007-11-28 19:33:05 +00:00
f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
4394c1b09c Resurrect the code for the rewrite(ARRAY[...]) aggregate function,
and put it into contrib/tsearch2 compatibility module.
2007-11-13 22:14:50 +00:00
654dcfb9e4 Clean up ts_locale.h/.c. Fix broken and not-consistent-across-platforms
behavior of wchar2char/char2wchar; this should resolve bug #3730.  Avoid
excess computations of pg_mblen in t_isalpha and friends.  Const-ify
APIs where possible.
2007-11-09 22:37:35 +00:00
592c88a0d2 Remove the aggregate form of ts_rewrite(), since it doesn't work as desired
if there are zero rows to aggregate over, and the API seems both conceptually
and notationally ugly anyway.  We should look for something that improves
on the tsquery-and-text-SELECT version (which is also pretty ugly but at
least it works...), but it seems that will take query infrastructure that
doesn't exist today.  (Hm, I wonder if there's anything in or near SQL2003
window functions that would help?)  Per discussion.
2007-10-24 02:24:49 +00:00
12f25e70a6 Fix two-argument form of ts_rewrite() so it actually works for cases where
a later rewrite rule should change a subtree modified by an earlier one.
Per my gripe of a few days ago.
2007-10-23 01:44:40 +00:00
1ea47dd8cb Fix shared tsvector/tsquery input code so that we don't say "syntax error in
tsvector" when we are really parsing a tsquery.  Report the bogus input,
too.  Make styles of some related error messages more consistent.
2007-10-21 22:29:56 +00:00
638bd34f89 Found another small glitch in tsearch API: the two versions of ts_lexize()
are really redundant, since we invented a regdictionary alias type.
We can have just one function, declared as taking regdictionary, and
it will handle both behaviors.  Noted while working on documentation.
2007-10-19 22:01:45 +00:00
476045a21b Remove QueryOperand->istrue flag, it was used only in cover ranking
(ts_rank_cd). Use palloc'ed array in ranking instead of flag.
2007-09-11 16:01:40 +00:00
13553cbbff Fix header's size of structs defines in ispell.
Backpatch is needed for contrib version.
2007-09-11 12:57:05 +00:00
57cafe7982 Refactor from Heikki Linnakangas <heikki@enterprisedb.com>:
* Defined new struct WordEntryPosVector that holds a uint16 length and a
variable size array of WordEntries. This replaces the previous
convention of a variable size uint16 array, with the first element
implying the length. WordEntryPosVector has the same layout in memory,
but is more readable in source code. The POSDATAPTR and POSDATALEN
macros are still used, though it would now be more readable to access
the fields in WordEntryPosVector directly.

* Removed needfree field from DocRepresentation. It was always set to false.

* Miscellaneous other commenting and refactoring
2007-09-11 08:46:29 +00:00
d982daae0b Change void* opaque argument to Datum type, add argument's
name to PushFunction type definition.

Per suggestion by Tome Lane <tgl@sss.pgh.pa.us>
2007-09-10 12:36:41 +00:00
978de9d06d Improvements from Heikki Linnakangas <heikki@enterprisedb.com>
- change the alignment requirement of lexemes in TSVector slightly.
Lexeme strings were always padded to 2-byte aligned length to make sure
that if there's position array (uint16[]) it has the right alignment.
The patch changes that so that the padding is not done when there's no
positions. That makes the storage of tsvectors without positions
slightly more compact.

- added some #include "miscadmin.h" lines I missed in the earlier when I
added calls to check_stack_depth().

- Reimplement the send/recv functions, and added a comment
above them describing the on-wire format. The CRC is now recalculated in
tsquery as well per previous discussion.
2007-09-07 16:03:40 +00:00
8983852e34 Improving various checks by Heikki Linnakangas <heikki@enterprisedb.com>
- add code to check that the query tree is well-formed. It was indeed
  possible to send malformed queries in binary mode, which produced all
  kinds of strange results.

- make the left-field a uint32. There's no reason to
  arbitrarily limit it to 16-bits, and it won't increase the disk/memory
  footprint either now that QueryOperator and QueryOperand are separate
  structs.

- add check_stack_depth() call to all recursive functions I found.
  Some of them might have a natural limit so that you can't force
  arbitrarily deep recursions, but check_stack_depth() is cheap enough
  that seems best to just stick it into anything that might be a problem.
2007-09-07 15:35:11 +00:00
e5be89981f Refactoring by Heikki Linnakangas <heikki@enterprisedb.com> with
small editorization by me

- Brake the QueryItem struct into QueryOperator and QueryOperand.
  Type was really the only common field between them. QueryItem still
  exists, and is used in the TSQuery struct as before, but it's now a
  union of the two. Many other changes fell from that, like separation
  of pushval_asis function into pushValue, pushOperator and pushStop.

- Moved some structs that were for internal use only from header files
  to the right .c-files.

- Moved tsvector parser to a new tsvector_parser.c file. Parser code was
  about half of the size of tsvector.c, it's also used from tsquery.c, and
  it has some data structures of its own, so it seems better to separate
  it. Cleaned up the API so that TSVectorParserState is not accessed from
  outside tsvector_parser.c.

- Separated enumerations (#defines, really) used for QueryItem.type
  field and as return codes from gettoken_query. It was just accidental
  code sharing.

- Removed ParseQueryNode struct used internally by makepol and friends.
  push*-functions now construct QueryItems directly.

- Changed int4 variables to just ints for variables like "i" or "array
  size", where the storage-size was not significant.
2007-09-07 15:09:56 +00:00
7351b5fa17 Cleanup for some problems in tsearch patch:
- ispell initialization crashed on empty dictionary file
- ispell initialization crashed on affix file with prefixes but no suffixes
- stop words file was run through pg_verify_mbstr, with database
  encoding, but it's supposed to be UTF-8; similar bug for synonym files
- bunch of comments added, typos fixed, and other cleanup

Introduced consistent encoding checking/conversion of data read from tsearch
configuration files, by doing this in a single t_readline() subroutine
(replacing direct usages of fgets).  Cleaned up API for readstopwords too.

Heikki Linnakangas
2007-08-25 00:03:59 +00:00
d321421d0a Simplify the syntax of CREATE/ALTER TEXT SEARCH DICTIONARY by treating the
init options of the template as top-level options in the syntax.  This also
makes ALTER a bit easier to use, since options can be replaced individually.
I also made these statements verify that the tmplinit method will accept
the new settings before they get stored; in the original coding you didn't
find out about mistakes until the dictionary got invoked.

Under the hood, init methods now get options as a List of DefElem instead
of a raw text string --- that lets tsearch use existing options-pushing code
instead of duplicating functionality.
2007-08-22 01:39:46 +00:00
140d4ebcb4 Tsearch2 functionality migrates to core. The bulk of this work is by
Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing,
so anything that's broken is probably my fault.

Documentation is nonexistent as yet, but let's land the patch so we can
get some portability testing done.
2007-08-21 01:11:32 +00:00