This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry. The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others. This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.
This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.
Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).
The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does. There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST(). After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.
Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
This creates a new gin-btree callback function for creating a downlink for
a page. Previously, ginxlog.c duplicated the logic used during normal
operation.
This patch improves performance of most built-in aggregates that formerly
used a NUMERIC or NUMERIC array as their transition type; this includes
not only aggregates on numeric inputs, but some aggregates on integer
inputs where overflow of an int8 value is a possibility. The code now
uses a special-purpose data structure to avoid array construction and
deconstruction overhead, as well as packing and unpacking overhead for
numeric values.
These aggregates' transition type is now declared as INTERNAL, since
it doesn't correspond to any SQL data type. To keep the planner from
thinking that that means a lot of storage will be used, we make use
of the just-added pg_aggregate.aggtransspace feature. The space estimate
is set to 128 bytes, which is at least in the right ballpark.
Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
Formerly the planner had a hard-wired rule of thumb for guessing the amount
of space consumed by an aggregate function's transition state data. This
estimate is critical to deciding whether it's OK to use hash aggregation,
and in many situations the built-in estimate isn't very good. This patch
adds a column to pg_aggregate wherein a per-aggregate estimate can be
provided, overriding the planner's default, and infrastructure for setting
the column via CREATE AGGREGATE.
It may be that additional smarts will be required in future, perhaps even
a per-aggregate estimation function. But this is already a step forward.
This is extracted from a larger patch to improve the performance of numeric
and int8 aggregates. I (tgl) thought it was worth reviewing and committing
this infrastructure separately. In this commit, all built-in aggregates
are given aggtransspace = 0, so no behavior should change.
Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra
Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr
must be able to compute valid em_nullable_relids for any new equivalence
class members it creates. I'd worried about this in the commit message
for db9f0e1d9a4a0842c814a464cdc9758c3f20b96c, but claimed that it wasn't a
problem because multi-member ECs should already exist when it runs. That
is transparently wrong, though, because this function is also called by
initialize_mergeclause_eclasses, which runs during deconstruct_jointree.
The example given in the bug report (which the new regression test item
is based upon) fails because the COALESCE() expression is first seen by
initialize_mergeclause_eclasses rather than process_equivalence.
Fixing this requires passing the appropriate nullable_relids set to
get_eclass_for_sort_expr, and it requires new code to compute that set
for top-level expressions such as ORDER BY, GROUP BY, etc. We store
the top-level nullable_relids in a new field in PlannerInfo to avoid
computing it many times. In the back branches, I've added the new
field at the end of the struct to minimize ABI breakage for planner
plugins. There doesn't seem to be a good alternative to changing
get_eclass_for_sort_expr's API signature, though. There probably aren't
any third-party extensions calling that function directly; moreover,
if there are, they probably need to think about what to pass for
nullable_relids anyway.
Back-patch to 9.2, like the previous patch in this area.
If a page is deleted, and reused for something else, just as a search is
following a rightlink to it from its left sibling, the search would continue
scanning whatever the new contents of the page are. That could lead to
incorrect query results, or even something more curious if the page is
reused for a different kind of a page.
To fix, modify the search algorithm to lock the next page before releasing
the previous one, and refrain from deleting pages from the leftmost branch
of the tree.
Add a new Concurrency section to the README, explaining why this works.
There is a lot more one could say about concurrency in GIN, but that's for
another patch.
Backpatch to all supported versions.
Pending patches for logical replication will use this to determine
which columns of a tuple ought to be considered as its candidate key.
Andres Freund, with minor, mostly cosmetic adjustments by me
Merge the isEnoughSpace and placeToPage functions in the b-tree interface
into one function that tries to put a tuple on page, and returns false if
it doesn't fit.
Move createPostingTree function to gindatapage.c, and change its contract
so that it can be passed more items than fit on the root page. It's in a
better position than the callers to know how many items fit.
Move ginMergeItemPointers out of gindatapage.c, into a separate file.
These changes make no difference now, but reduce the footprint of Alexander
Korotkov's upcoming patch to pack item pointers more tightly.
These variables no longer have any useful purpose, since there's no reason
to special-case brute force timezones now that we have a valid
session_timezone setting for them. Remove the variables, and remove the
SET/SHOW TIME ZONE code that deals with them.
The user-visible impact of this is that SHOW TIME ZONE will now show a
POSIX-style zone specification, in the form "<+-offset>-+offset", rather
than an interval value when a brute-force zone has been set. While perhaps
less intuitive, this is a better definition than before because it's
actually possible to give that string back to SET TIME ZONE and get the
same behavior, unlike what used to happen.
We did not previously mention the angle-bracket syntax when describing
POSIX timezone specifications; add some documentation so that people
can figure out what these strings do. (There's still quite a lot of
undocumented functionality there, but anybody who really cares can
go read the POSIX spec to find out about it. In practice most people
seem to prefer Olsen-style city names anyway.)
Formerly, when using a SQL-spec timezone setting with a fixed GMT offset
(called a "brute force" timezone in the code), the session_timezone
variable was not updated to match the nominal timezone; rather, all code
was expected to ignore session_timezone if HasCTZSet was true. This is
of course obviously fragile, though a search of the code finds only
timeofday() failing to honor the rule. A bigger problem was that
DetermineTimeZoneOffset() supposed that if its pg_tz parameter was
pointer-equal to session_timezone, then HasCTZSet should override the
parameter. This would cause datetime input containing an explicit zone
name to be treated as referencing the brute-force zone instead, if the
zone name happened to match the session timezone that had prevailed
before installing the brute-force zone setting (as reported in bug #8572).
The same malady could affect AT TIME ZONE operators.
To fix, set up session_timezone so that it matches the brute-force zone
specification, which we can do using the POSIX timezone definition syntax
"<abbrev>offset", and get rid of the bogus lookaside check in
DetermineTimeZoneOffset(). Aside from fixing the erroneous behavior in
datetime parsing and AT TIME ZONE, this will cause the timeofday() function
to print its result in the user-requested time zone rather than some
previously-set zone. It might also affect results in third-party
extensions, if there are any that make use of session_timezone without
considering HasCTZSet, but in all cases the new behavior should be saner
than before.
Back-patch to all supported branches.
With these, one need no longer manipulate large object descriptors and
extract numeric constants from header files in order to read and write
large object contents from SQL.
Pavel Stehule, reviewed by Rushabh Lathia.
When we are using a C99-compliant vsnprintf implementation (which should be
most places, these days) it is worth the trouble to make use of its report
of how large the buffer needs to be to succeed. This patch adjusts
stringinfo.c and some miscellaneous usages in pg_dump to do that, relying
on the logic recently added in libpgcommon's psprintf.c. Since these
places want to know the number of bytes written once we succeed, modify the
API of pvsnprintf() to report that.
There remains near-duplicate logic in pqexpbuffer.c, but since that code
is in libpq, psprintf.c's approach of exit()-on-error isn't appropriate
for use there. Also note that I didn't bother touching the multitude
of places that call (v)snprintf without any attempt to provide a resizable
buffer.
Release-note-worthy incompatibility: the API of appendStringInfoVA()
changed. If there's any third-party code that's calling that directly,
it will need tweaking along the same lines as in this patch.
David Rowley and Tom Lane
asprintf(), aside from not being particularly portable, has a fundamentally
badly-designed API; the psprintf() function that was added in passing in
the previous patch has a much better API choice. Moreover, the NetBSD
implementation that was borrowed for the previous patch doesn't work with
non-C99-compliant vsnprintf, which is something we still have to cope with
on some platforms; and it depends on va_copy which isn't all that portable
either. Get rid of that code in favor of an implementation similar to what
we've used for many years in stringinfo.c. Also, move it into libpgcommon
since it's not really libpgport material.
I think this patch will be enough to turn the buildfarm green again, but
there's still cosmetic work left to do, namely get rid of pg_asprintf()
in favor of using psprintf(). That will come in a followon patch.
This avoids an assumption about the signed number representation. It is
anticipated to have no functional changes on supported configurations;
many two's complement assumptions remain elsewhere.
Per a suggestion from Andres Freund.
Previously, unless all columns were auto-updateable, we wouldn't
inserts, updates, or deletes, or at least not without a rule or trigger;
now, we'll allow inserts and updates that target only the auto-updateable
columns, and deletes even if there are no auto-updateable columns at
all provided the view definition is otherwise suitable.
Dean Rasheed, reviewed by Marko Tiikkaja
Although previously-introduced APIs allow the process that registers a
background worker to obtain the worker's PID, there's no way to prevent
a worker that is not currently running from being restarted. This
patch introduces a new API TerminateBackgroundWorker() that prevents
the background worker from being restarted, terminates it if it is
currently running, and causes it to be unregistered if or when it is
not running.
Patch by me. Review by Michael Paquier and KaiGai Kohei.
Development of IRIX has been discontinued, and support is scheduled
to end in December of 2013. Therefore, there will be no supported
versions of this operating system by the time PostgreSQL 9.4 is
released. Furthermore, we have no maintainer for this platform.
All of these platforms are very much obsolete.
As far as I can determine, the last version of SINIX, later renamed
Reliant, occurred some time between 2002 and 2005.
The last release of SunOS that would run on a sun3 was released in
November of 1991; the last release of OpenBSD which supported that
platform was in 2001. The highest clock speed of any processor in
the family was 25MHz.
The NS32K (national semiconductor 320xx) architecture was retired
in 1990.
Support can be re-added if a maintainer emerges for any of these
platforms, but it seems unlikely.
Reviewed by Andres Freund.
Add asprintf(), pg_asprintf(), and psprintf() to simplify string
allocation and composition. Replacement implementations taken from
NetBSD.
Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Asif Naeem <anaeem.it@gmail.com>
The existing renegotiation code was home for several bugs: it might
erroneously report that renegotiation had failed; it might try to
execute another renegotiation while the previous one was pending; it
failed to terminate the connection if the renegotiation never actually
took place; if a renegotiation was started, the byte count was reset,
even if the renegotiation wasn't completed (this isn't good from a
security perspective because it means continuing to use a session that
should be considered compromised due to volume of data transferred.)
The new code is structured to avoid these pitfalls: renegotiation is
started a little earlier than the limit has expired; the handshake
sequence is retried until it has actually returned successfully, and no
more than that, but if it fails too many times, the connection is
closed. The byte count is reset only when the renegotiation has
succeeded, and if the renegotiation byte count limit expires, the
connection is terminated.
This commit only touches the master branch, because some of the changes
are controversial. If everything goes well, a back-patch might be
considered.
Per discussion started by message
20130710212017.GB4941@eldon.alvh.no-ip.org
make maintainer-check was obscure and rarely called in practice, and
many breakages were missed. Fold everything that make maintainer-check
used to do into the normal build. Specifically:
- Call duplicate_oids when genbki.pl is called.
- Check for tabs in SGML files when the documentation is built.
- Run msgfmt with the -c option during the regular build. Add an
additional configure check to see whether we are using the GNU
version. (make maintainer-check probably used to fail with non-GNU
msgfmt.)
Keep maintainer-check as around as phony target for the time being in
case anyone is calling it. But it won't do anything anymore.
Change the input/output format to {A,B,C}, to match the internal
representation.
Complete the implementations of line_in, line_out, line_recv, line_send.
Remove comments and error messages about the line type not being
implemented. Add regression tests for existing line operators and
functions.
Reviewed-by: rui hua <365507506hua@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
REFRESH MATERIALIZED VIEW CONCURRENTLY was broken for any matview
containing a column of a type without a default btree operator
class. It also did not produce results consistent with a non-
concurrent REFRESH or a normal view if any column was of a type
which allowed user-visible differences between values which
compared as equal according to the type's default btree opclass.
Concurrent matview refresh was modified to use the new operators
to solve these problems.
Documentation was added for record comparison, both for the
default btree operator class for record, and the newly added
operators. Regression tests now check for proper behavior both
for a matview with a box column and a matview containing a citext
column.
Reviewed by Steve Singer, who suggested some of the doc language.
The TYPEALIGN macro, and the related ones like MAXALIGN, don't work with
values larger than intptr_t, because TYPEALIGN casts the argument to
intptr_t to do the arithmetic. That's not a problem when dealing with
pointers or lengths or offsets related to pointers, but the XLogInsert
scaling patch added a call to MAXALIGN with an XLogRecPtr argument.
To fix, add wider variants of the macros, called TYPEALIGN64 and MAXALIGN64,
which are just like the existing variants but work with uint64 instead of
intptr_t.
Report and patch by David Rowley, analysis by Andres Freund.
If a tuple was frozen while its predicate locks mattered,
read-write dependencies could be missed, resulting in failure to
detect conflicts which could lead to anomalies in committed
serializable transactions.
This field was added to the tag when we still thought that it was
necessary to carry locks forward to a new version of an updated
row. That was later proven to be unnecessary, which allowed
simplification of the code, but elimination of xmin from the tag
was missed at the time.
Per report and analysis by Heikki Linnakangas.
Backpatch to 9.1.
DISCARD ALL will now discard cached sequence information, as well.
Fabrízio de Royes Mello, reviewed by Zoltán Böszörményi, with some
further tweaks by me.
It makes for cleaner code to have separate Get/Add functions for PostingItems
and ItemPointers. A few callsites that have to deal with both types need to
be duplicated because of this, but all the callers have to know which one
they're dealing with anyway. Overall, this reduces the amount of casting
required.
Extracted from Alexander Korotkov's larger patch to change the data page
format.
It seems to make more sense to use "cutoff multixact" terminology
throughout the backend code; "freeze" is associated with replacing of an
Xid with FrozenTransactionId, which is not what we do for MultiXactIds.
Andres Freund
Some adjustments by Álvaro Herrera
The prototype for inval_twophase_postcommit wasn't removed when it's definition
was removed in efc16ea520679d713d98a2c7bf1453c4ff7b91ec / the initial HS commit.
Andres Freund
Commit 95ef6a344821655ce4d0a74999ac49dd6af6d342 removed the
ability to create rules on an individual column as of 7.3, but
left some residual code which has since been useless. This cleans
up that dead code without any change in behavior other than
dropping the useless column from the catalog.
If the hash table backing a catalog cache becomes too full (fillfactor > 2),
enlarge it. A new buckets array, double the size of the old, is allocated,
and all entries in the old hash are moved to the right bucket in the new
hash.
This has two benefits. First, cache lookups don't get so expensive when
there are lots of entries in a cache, like if you access hundreds of
thousands of tables. Second, we can make the (initial) sizes of the caches
much smaller, which saves memory.
This patch dials down the initial sizes of the catcaches. The new sizes are
chosen so that a backend that only runs a few basic queries still won't need
to enlarge any of them.
This reverts commit 269e780822abb2e44189afaccd6b0ee7aefa7ddd
and commit 5b571bb8c8d2bea610e01ae1ee7bc05adcfff528.
Unfortunately, the initial patch had insufficient performance testing,
and resulted in a regression.
Per report by Thom Brown.
Performance testing shows that if the insertpos_lck spinlock and the fields
that it protects are on the same cache line with other variables that are
frequently accessed, the false sharing can hurt performance a lot. Keep
them apart by adding some padding.