Path to this page:
Subject: CVS commit: pkgsrc/databases/R-dbplyr
From: Makoto Fujiwara
Date: 2024-12-14 14:38:41
Message id: 20241214133841.72D1DFC1C@cvs.NetBSD.org
Log Message:
(databases/R-dbplyr) Updated 2.3.2 to 2.5.0
# dbplyr 2.5.0
## Improved tools for qualified table names
* Specification of table names with schema/catalogs has been overhauled to
make it simpler. This includes the following features and fixes:
* The simplest way to refer to a qualified table is now to wrap it in
`I()`, e.g. `I("schema_name.table_name")`.
* Use of `sql()` and `ident_q()` inside `in_catalog()` and `in_schema()`
is once again supported (#1388).
* It's ok to use `ident_q()` once again (#1413) and you should no longer
see unsuppressable warnings about using `in_schema()` (#1408).
* The names of the arguments to `Id()` no longer matter, only their order
(#1416). Additionally, thanks to changes to the DBI package, you no
longer need to name each argument.
* If you accidentally pass a named vector to any of the database identifer
functions, those names will be automatically stripped (#1404).
* `tbl_sql(check_from)` is now deprecated.
* dbplyr now exports some tools to work with the internal `table_path` class
which is useful for certain backends that need to work with this
data structure (#1300).
## Improved SQL
* New translations for clock functions `add_years()`, `add_days()`,
`date_build()`, `get_year()`, `get_month()`, `get_day()`,
and `base::difftime()` on SQL server, Redshift, Snowflake, and Postgres.
* `select()` will keep computed columns used to `arrange()` subqueries that are
eliminated by a subsequent select (@ejneer, #1437).
* `semi_join()` will no longer inline away an aggregate filter (i.e. `HAVING`
clause) that was followed by a `select()` (@ejneer, #1474)
* Improved function translations:
* Functions qualified with the base namespace are now also translated, e.g.
`base::paste0(x, "_1")` is now translated (@mgirlich, #1022).
* `-1 + x` now generates a translation instead erroring (#1420).
* `x$name` never attempts to evaluate `name` (#1368).
* You can once again use `NULL` on the LHS of an infix operator in order
to generate SQL with unusual syntax (#1345).
* Namespaced calls now error if the function doesn't exist, or a translation
is not available (#1426).
* `lead()` translation coerces `n` to an integer.
* Databricks: now supports creating non-temporary tables too (#1418).
* Oracle:
* `db_explain()` now works (@thomashulst, #1353).
* `as.Date()` works when applied to a string (#1389).
* `head()` is once again translated to `FETCH FIRST`. This does require Oracle
12c or newer, but it actually works, compared to the approach using
`ROWNUM` from #1292 (#1436).
* Added support for `str_replace()` and `str_replace_all()` via
`REGEXP_REPLACE()` (@thomashulst, #1402).
* Snowflake (@nathanhaigh, #1406)
* Added support for `str_starts()` and `str_ends()` via `REGEXP_INSTR()`
* Refactored `str_detect()` to use `REGEXP_INSTR()` so now supports
regular expressions.
* Refactored `grepl()` to use `REGEXP_INSTR()` so now supports
case-insensitive matching through `grepl(..., ignore.case = TRUE)`
* SQL server:
* Now products a clear error if you attempt to use `n_distinct()` in
`mutate()` or `filter()` (#1366).
* `filter()` does a better job of converting logical vectors
from bit to boolean (@ejneer, #1288).
* MySQL: `as.integer()` gets correct translation (@krlmlr, #1375).
## Minor improvements and bug fixes
* Deprecation status of functions deprecated in previous versions (at least
2 years old) have been advanced. In particular, `src_sql()` is now defunct,
as is the use of `partial_eval()` with character `data`.
* Database errors now show the generated SQL, which hopefully will make it
faster to track down problems (#1401).
* When dbplyr creates an index on a table in a schema (e.g. `schema.table`),
it now only includes the table name in the index name, not the schema name.
* The class of remote sources now includes all S4 class names, not just
the first (#918).
* `compute()` passes additional arguments all the way to
`sql_query_save()`-methods (@rsund).
* `db_sql_render()` correctly passes on `...` when re-calling with
`sql_options` set (#1394).
* `reframe()` now gives an informative error that it isn't supported (#1148).
* `rows_patch(in_place = FALSE)` now works when more than one column should be
patched (@gorcha, #1443).
* New `simulate_mariadb()` (@krlmlr, #1375).
* `sql_translator()` now checks for duplicated definitions (@krlmlr, #1374).
# dbplyr 2.4.0
## Breaking changes
* Using `compute(temporary = FALSE)` without providing a name is now
deprecated (@mgirlich, #1154).
* `ntile()`'s first argument has been renamed from `order_by` to `x` to
match the interface of `dplyr::ntile()` (@mgirlich, #1242).
* `simulate_vars()` and `simulate_vars_is_typed()` were removed as they weren't
used and tidyselect now offers `tidyselect_data_proxy()` and
`tidyselect_data_has_predicates()` (@mgirllich, #1199).
* `sql_not_supported()` now expects a function name without parentheses.
* `sql_query_append()`, `sql_query_insert()`, `sql_query_update()`,
`sql_query_upsert()`, and `sql_query_delete()` changed their arguments to
make them more consistent to the other `sql_query_*()` functions:
* `x_name` was renamed to `table`.
* `y` was renamed to `from` and must now be a table identifier or SQL instead
of a lazy table.
* `sql_query_append()` and `sql_query_insert()` have gained the argument `cols`.
* `remote_name()` now returns a string with the name of the table. To get the
qualified identifier use the newly added `remote_table()` (@mgirlich, #1280).
* `tbl_lazy()` loses `src` argument after it has been deprecated for years
(@mgirlich, #1208).
* `translate_sql()` now requires the `con` argument (@mgirlich, #1311).
The `vars` argument has been removed after it threw an error for the last 7
years (@mgirlich).
## Improved SQL
* Preliminary databricks Spark SQL backend (#1377).
* Joins
* `*_join()` after `full_join()` works again (@mgirlich, #1178).
* `*_join()` now allows specifying the relationship argument. It must be
`NULL` or `"many-to-many"` (@bairdj, #1305).
* Queries now qualify `*` with the table alias for better compatibility
(@mgirlich, #1003).
* `full_join()` can now handle column names that only differ in case
(@ejneer, #1255).
* The `na_matches` argument of `semi_join()` and `anti_join()` works again
(@mgirlich, #1211).
* A `semi/anti_join()` on fitlered `y` is inlined when possible (@mgirlich, #884).
* Joins now work again for Pool and Oracle connections (@mgirlich, #1177, #1181).
* A sequence of `union()` resp. `union_all()` now produces a flat query
instead of subqueries (@mgirlich, #1269).
* Added translations for:
* `nzchar()` (@MichaelChirico, @mgirlich, #1094).
* `str_detect()`, `str_starts()` and `str_ends()` with fixed patterns
(@mgirlich, #1009).
* `runif()` (@mgirlich, #1200).
* `if_any()` and `if_all()` translations are now wrapped in parentheses.
This makes sure it can be combined via `&` with other conditions
(@mgirlich, #1153).
* `nth()`, `first()`, and `last()` now support the `na_rm` argument
(@mgirlich, #1193).
## Minor improvements and bug fixes
* `across()` now supports namespaced functions, e.g.
`across(x, dplyr::dense_rank)` (@mgirlich, #1231).
* `db_copy_to(overwrite = TRUE)` now actually works.
* `db_copy_to()`'s `...` are now passed to `db_write_table()` (@mgirlich, #1237).
* Added `db_supports_table_alias_with_as()` to customise whether a backend
supports specifying a table alias with `AS` or not (@mgirlich).
* `db_write_table()` and `db_save_query()` gain the `overwrite` argument.
* `dbplyr_pivot_wider_spec()` is now exported. Unlike `pivot_wider()` this can
be lazy. Note that this will be removed soon after `pivot_wider_spec()`
becomes a generic (@mgirlich).
* `filter()`ing with window functions now generates columns called `col01`
rather than `q01` (@mgirlich, #1258).
* `pivot_wider()` now matches tidyr `NA` column handling (@ejneer #1238).
* `select()` can once again be used after `arrange(desc(x))` (@ejneer, #1240).
* `show_query()` and `remote_query()` gain the argument `sql_options` that allows
to control how the SQL is generated. It can be created via `sql_options()`
which has the following arguments:
* `cte`: use common table expressions?
* `use_star`: use `SELECT *` or explicitly select every column?
* `qualify_all_columns`: qualify all columns in a join or only the ambiguous \
ones?
(@mgirlich, #1146).
Consequently the `cte` argument of `show_query()` and `remote_query()` has
been deprecated (@mgirlich, #1146).
* `slice_min/max()` can now order by multiple variables like dplyr, e.g. use
`slice_min(lf, tibble(x, y))` (@mgirlich, #1167).
* `slice_*()` now supports the data masking pronouns `.env` and `.data` \
(@mgirlich, #1294).
* `sql_join_suffix()` gains the argument `suffix` so that methods can check
whether the suffix is valid for the backend (@mgirlich).
* `sql_random()` is now deprecated. It was used to power `slice_sample()` which
is now done via the translation for `runif()` (@mgirlich, #1200).
* `tbl()` now informs when the user probably forgot to wrap the table identifier
with `in_schema()` or `sql()` (@mgirlich, #1287).
## Backend specific improvements
* Access
* Added translation for `!=` to `<>` (@erikvona, #1219).
* DuckDB
* now supports the `returning` argument of `rows_*()`.
* MySQL/MariaDB:
* `rows_update()` and `rows_patch()` now give an informative error when the
unsupported `returning` argument is used (@mgirlich, #1279).
* `rows_upsert()` now gives an informative error that it isn't supported
(@mgirlich, #1279).
* `rows_*()` use the column types of `x` when auto copying `y` (@mgirlich, #1327).
* `copy_inline()` now works (@mgirlich, #1188).
* Fix translation of `as.numeric()`, `as.POSIXct()`, `as_datetime()`, and
`as.integer64()` (@avsdev-cw, #1189).
* MS SQL:
* `row_number()` now works when no order is specified (@ejneer, @fh-mthomson, \
#1332)
* Oracle:
* Fix translation of `rows_upsert()` (@mgirlich, @TBlackmore, #1286)
* `head(n)` is now translated to `WHERE ROWNUM <= n` to also support old
versions <= 11.2 (@JeremyPasco, #1292).
* Postgres
* The `rows_*()` functions now also work inside a transaction (@mgirlich, #1183).
* SQLite
* Subqueries now also get an alias. This makes it consistent with other
backends and simplifies the implementation.
* SQL Server
* `distinct(.keep_all = TRUE)` now works (@mgirlich, #1053).
* The translation of `between()` now also works when used in `mutate()`
(@mgirlich, #1241).
* `any()` and `all()` now work (@ejneer, #1273).
* Fixed negation of bit (boolean) fields (@ejneer, #1239)
* Snowflake:
* `na.rm = TRUE` is now respected in `pmin()` and `pmax()` instead of being \
silently ignored (@fh-mthomson, #1329)
* `row_number()` now works when no order is specified (@fh-mthomson, #1332)
* Teradata
* `distinct()` + `head()` now work (@mgirlich, #685).
* `as.Date(x)` is now translate to `CAST(x AS DATE)` again unless `x` is a
string (@mgirlich, #1285).
* `row_number()` no longer defaults to partitioning by groups (now aligned \
with other databases when no order is specified: `ROW_NUMBER()` defaults to \
`ORDER BY (SELECT NULL)`) (@fh-mthomson, #1331)
# dbplyr 2.3.4
* Hot patch release to resolve R CMD check failures.
# dbplyr 2.3.3
* Hot patch to fix R CMD check issues
Files: