postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-14 02:17:02 +08:00

Author	SHA1	Message	Date
Andres Freund	c8a9246e09	Add output directory argument to generate-unicode_norm_table.pl This is in preparation for building postgres with meson / ninja. When building with meson, commands are run at the root of the build tree. Add an option to put build output into the appropriate place. Author: Andres Freund <andres@anarazel.de> Author: Peter Eisentraut <peter@eisentraut.org> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/5e216522-ba3c-f0e6-7f97-5276d0270029@enterprisedb.com	2022-07-18 12:24:39 -07:00
Peter Eisentraut	c64fb698d0	Make update-unicode target work in vpath builds Author: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/616c6873-83b5-85c0-93cb-548977c39c60@enterprisedb.com	2022-03-25 09:47:50 +01:00
Bruce Momjian	27b77ecf9f	Update copyright for 2022 Backpatch-through: 10	2022-01-07 19:04:57 -05:00
John Naylor	5bc429aacb	Extend collection of Unicode combining characters to beyond the BMP The former limit was perhaps a carryover from an older hand-coded table. Since commit bab982161 we have enough space in mbinterval to store larger codepoints, so collect all combining characters. Discussion: https://www.postgresql.org/message-id/49ad1fa0-174e-c901-b14c-c484b60907f1%40enterprisedb.com	2021-08-26 13:07:34 -04:00
John Naylor	bab982161e	Update display widths as part of updating Unicode The hardcoded "wide character" set in ucs_wcwidth() was last updated around the Unicode 5.0 era. This led to misalignment when printing emojis and other codepoints that have since been designated wide or full-width. To fix and keep up to date, extend update-unicode to download the list of wide and full-width codepoints from the offical sources. In passing, remove some comments about non-spacing characters that haven't been accurate since we removed the former hardcoded logic. Jacob Champion Reported and reviewed by Pavel Stehule Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRCeX21O69YHxmykYySYyprZAqrKWWg0KoGKdjgqcGyygg@mail.gmail.com	2021-08-26 10:53:56 -04:00
John Naylor	1563ecbc1b	Revert "Rename unicode_combining_table to unicode_width_table" This reverts commit eb0d0d2c7300c9c5c22b35975c11265aa4becc84. After I had committed eb0d0d2c7 and 78ab944cd, I decided to add a sanity check for a "can't happen" scenario just to be cautious. It turned out that it already happened in the official Unicode source data, namely that a character can be both wide and a combining character. This fact renders the aforementioned commits unnecessary, so revert both of them. Discussion: https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com	2021-08-26 10:06:12 -04:00
John Naylor	f8c8a8bccc	Revert "Change mbbisearch to return the character range" This reverts commit 78ab944cd4b9977732becd9d0bc83223b88af9a2. After I had committed eb0d0d2c7 and 78ab944cd, I decided to add a sanity check for a "can't happen" scenario just to be cautious. It turned out that it already happened in the official Unicode source data, namely that a character can be both wide and a combining character. This fact renders the aforementioned commits unnecessary, so revert both of them. Discussion: https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com	2021-08-26 09:58:28 -04:00
John Naylor	78ab944cd4	Change mbbisearch to return the character range Add a width field to mbinterval and have mbbisearch return a pointer to the found range rather than just bool for success. A future commit will add another width besides zero, and this will allow that to use the same search. Reviewed by Jacob Champion Discussion: https://www.postgresql.org/message-id/CAFBsxsGOCpzV7c-f3a8ADsA1n4uZ%3D8puCctQp%2Bx7W0vgkv%3Dw%2Bg%40mail.gmail.com	2021-08-25 13:08:11 -04:00
John Naylor	eb0d0d2c73	Rename unicode_combining_table to unicode_width_table No functional changes. A future commit will use this table for other purposes besides combining characters.	2021-08-25 13:01:35 -04:00
Bruce Momjian	ca3b37487b	Update copyright for 2021 Backpatch-through: 9.5	2021-01-02 13:06:25 -05:00
Michael Paquier	ceaeac54f7	Fix minor issues with new unicode {de,re}composition code The table generation script would incorrectly complain in the recomposition sorting when matching code points. This would not have caused the generation of an incorrect table. Note that this condition is not reachable yet, but could have been reached with future updates. pg_bswap.h does not need to be included in the frontend.x Author: John Naylor Discussion: https://postgr.es/m/CAFBsxsGWmExpvv=61vtDKCs7+kBbhkwBDL2Ph9CacziFKnV_yw@mail.gmail.com	2020-11-07 10:15:58 +09:00
Michael Paquier	783f0cc64d	Improve performance of Unicode {de,re}composition in the backend This replaces the existing binary search with two perfect hash functions for the composition and the decomposition in the backend code, at the cost of slightly-larger binaries there (35kB in libpgcommon_srv.a). Per the measurements done, this improves the speed of the recomposition and decomposition by up to 30~40 times for the NFC and NFKC conversions, while all other operations get at least 40% faster. This is not as "good" as what libicu has, but it closes the gap a lot as per the feedback from Daniel Verite. The decomposition table remains the same, getting used for the binary search in the frontend code, where we care more about the size of the libraries like libpq over performance as this gets involved only in code paths related to the SCRAM authentication. In consequence, note that the perfect hash function for the recomposition needs to use a new inverse lookup array back to to the existing decomposition table. The size of all frontend deliverables remains unchanged, even with --enable-debug, including libpq. Author: John Naylor Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/CAFBsxsHUuMFCt6-pU+oG-F1==CmEp8wR+O+bRouXWu6i8kXuqA@mail.gmail.com	2020-10-23 11:05:46 +09:00
Michael Paquier	80f8eb79e2	Use perfect hash for NFC and NFKC Unicode Normalization quick check This makes the normalization quick check about 30% faster for NFC and 50% faster for NFKC than the binary search used previously. The hash lookup reuses the existing array of bit fields used for the binary search to get the quick check property and is generated as part of "make update-unicode" in src/common/unicode/. Author: John Naylor Reviewed-by: Mark Dilger, Michael Paquier Discussion: https://postgr.es/m/CACPNZCt4fbJ0_bGrN5QPt34N4whv=mszM0LMVQdoa2rC9UMRXA@mail.gmail.com	2020-10-11 19:09:01 +09:00
Tom Lane	5cbfce562f	Initial pgindent and pgperltidy run for v13. Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.	2020-05-14 13:06:50 -04:00
Peter Eisentraut	73afabcdc2	Fix update-unicode target The normalization-check target needs to be run last, after moving the newly generated files into place. Also, we need an additional dependency so that unicode_norm.o is rebuilt first. Otherwise, norm_test will still test the old files but against the new expected results, which will probably fail.	2020-04-19 14:59:29 +02:00
Peter Eisentraut	2991ac5fc9	Add SQL functions for Unicode normalization This adds SQL expressions NORMALIZE() and IS NORMALIZED to convert and check Unicode normal forms, per SQL standard. To support fast IS NORMALIZED tests, we pull in a new data file DerivedNormalizationProps.txt from Unicode and build a lookup table from that, using techniques similar to ones already used for other Unicode data. make update-unicode will keep it up to date. We only build and use these tables for the NFC and NFKC forms, because they are too big for NFD and NFKD and the improvement is not significant enough there. Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com	2020-04-02 08:56:27 +02:00
Peter Eisentraut	d40d564c5a	Add support for other normal forms to Unicode normalization API It previously only supported NFKC, for use by SASLprep. This expands the API to offer the choice of all four normalization forms. Right now, there are no internal users of the forms other than NFKC. Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/c1909f27-c269-2ed9-12f8-3ab72c8caf7a@2ndquadrant.com	2020-03-24 10:02:46 +01:00
Peter Eisentraut	f85a485f89	Add support for automatically updating Unicode derived files We currently have several sets of files generated from data provided by Unicode. These all have ad hoc rules and instructions for updating when new Unicode versions appear, and it's not done consistently. This patch centralizes and automates the process and makes it part of the release checklist. The Unicode and CLDR versions are specified in Makefile.global.in. There is a new make target "update-unicode" that downloads all the relevant files and runs the generation script. There is also a new script for generating the table of combining characters for ucs_wcwidth(). That table is now in a separate include file rather than hardcoded into the middle of other code. This is based on the script that was used for generating d8594d123c155aeecd47fc2450f62f5100b2fbf0, but the script itself wasn't committed at that time. Reviewed-by: John Naylor <john.naylor@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c8d05f42-443e-6c23-819b-05b31759a37c@2ndquadrant.com	2020-01-09 10:08:14 +01:00
Bruce Momjian	7559d8ebfa	Update copyrights for 2020 Backpatch-through: update all files in master, backpatch legal files through 9.4	2020-01-01 12:21:45 -05:00
Peter Eisentraut	b802412106	Fix output of Unicode normalization test Several off-by-more-than-one errors caused the output in case of a test failure to be truncated and unintelligible. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/6a7a8516-7d11-8fbd-0e8b-eadb4f0679eb%402ndquadrant.com	2019-12-11 08:42:17 +01:00
Peter Eisentraut	bdb839cbde	Update unicode.org URLs Use https, consistent host name, remove references to ftp. Also update the URLs for CLDR, which has moved from Trac to GitHub.	2019-10-13 22:10:38 +02:00
Michael Paquier	8548ddc61b	Fix inconsistencies and typos in the tree, take 9 This addresses more issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com	2019-08-05 12:14:58 +09:00
Peter Eisentraut	2cadefbb97	Make script output more pgindent compatible	2019-06-24 22:50:56 +02:00
Peter Eisentraut	eb8d05bfec	Correct script name in README file	2019-06-24 22:50:56 +02:00
Michael Paquier	1fb6f62a84	Fix typos in various places Author: Andrea Gelmini Reviewed-by: Michael Paquier, Justin Pryzby Discussion: https://postgr.es/m/20190528181718.GA39034@glet	2019-06-03 13:44:03 +09:00
Bruce Momjian	97c39498e5	Update copyright for 2019 Backpatch-through: certain files through 9.4	2019-01-02 12:44:25 -05:00
Alexander Korotkov	edf59c40dd	Fix more wrong paths in header comments It appears that there are more files, whose header comment paths are wrong. So, fix those paths. No backpatching per proposal of Tom Lane. Discussion: https://postgr.es/m/CAPpHfdsJyYbOj59MOQL%2B4XxdcomLSLfLqBtAvwR%2BpsCqj3ELdQ%40mail.gmail.com	2018-07-11 17:57:04 +03:00
Bruce Momjian	9d4649ca49	Update copyright for 2018 Backpatch-through: certain files through 9.3	2018-01-02 23:30:12 -05:00
Tom Lane	e3860ffa4d	Initial pgindent run with pg_bsd_indent version 2.0. The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname varname", as well as some similar cases for const- or volatile-qualified pointers. Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us	2017-06-21 14:39:04 -04:00
Bruce Momjian	ce55481032	Post-PG 10 beta1 pgperltidy run	2017-05-17 19:01:23 -04:00
Peter Eisentraut	0e8286d354	Fix perlcritic warnings	2017-04-17 13:49:34 -04:00
Peter Eisentraut	0cb2e51992	Avoid using a C++ keyword in header file per cpluspluscheck	2017-04-07 16:32:02 -04:00
Heikki Linnakangas	60f11b87a2	Use SASLprep to normalize passwords for SCRAM authentication. An important step of SASLprep normalization, is to convert the string to Unicode normalization form NFKC. Unicode normalization requires a fairly large table of character decompositions, which is generated from data published by the Unicode consortium. The script to generate the table is put in src/common/unicode, as well test code for the normalization. A pre-generated version of the tables is included in src/include/common, so you don't need the code in src/common/unicode to build PostgreSQL, only if you wish to modify the normalization tables. The SASLprep implementation depends on the UTF-8 functions from src/backend/utils/mb/wchar.c. So to use it, you must also compile and link that. That doesn't change anything for the current users of these functions, the backend and libpq, as they both already link with wchar.o. It would be good to move those functions into a separate file in src/commmon, but I'll leave that for another day. No documentation changes included, because there is no details on the SCRAM mechanism in the docs anyway. An overview on that in the protocol specification would probably be good, even though SCRAM is documented in detail in RFC5802. I'll write that as a separate patch. An important thing to mention there is that we apply SASLprep even on invalid UTF-8 strings, to support other encodings. Patch by Michael Paquier and me. Discussion: https://www.postgresql.org/message-id/CAB7nPqSByyEmAVLtEf1KxTRh=PWNKiWKEKQR=e1yGehz=wbymQ@mail.gmail.com	2017-04-07 14:56:05 +03:00

33 Commits