feat: support new metrics firehose api with get_usage() #404

Merged
merged 25 commits into from
Jul 21, 2025
Changes from 11 commits
Commits
99e5a10
initial implementation of hits endpoint
toph-allen Apr 29, 2025
1e6a816
use client method
toph-allen Apr 30, 2025
a06bb56
Merge branch 'main' into toph/metrics-firehose
toph-allen May 1, 2025
437803b
complete get_usage functionality
toph-allen May 2, 2025
19e45cd
changes to allow time zone propagation
toph-allen May 2, 2025
66e61ef
add tests, fix bugs
toph-allen May 2, 2025
24f9dc3
fix doc problem
toph-allen May 2, 2025
6442b4d
make lintr happier
toph-allen May 2, 2025
919bb0a
remove cyclocomp linter
toph-allen May 2, 2025
7b05c73
respond to comments
toph-allen Jul 15, 2025
01a6ccd
edit comment
toph-allen Jul 15, 2025
567fea3
Apply suggestion from @nealrichardson
toph-allen Jul 17, 2025
f68f753
Apply suggestion from @nealrichardson
toph-allen Jul 17, 2025
26980e3
Apply suggestion from @nealrichardson
toph-allen Jul 17, 2025
491f7df
Apply suggestion from @nealrichardson
toph-allen Jul 17, 2025
b474ed9
update documentation
toph-allen Jul 17, 2025
d2ae1a0
Merge branch 'main' into toph/metrics-firehose
toph-allen Jul 17, 2025
30ea333
Do not parse usage immediately; provide as.data.frame method
toph-allen Jul 18, 2025
cf48a08
update _pkgdown.yml
toph-allen Jul 18, 2025
442ac62
fix typo
toph-allen Jul 21, 2025
b19f147
fix tests
toph-allen Jul 21, 2025
9d6cef4
fix for CI warning
toph-allen Jul 21, 2025
68e4586
fix as.data.frame method for S3 generic conformance
toph-allen Jul 21, 2025
fa24199
update documentation to match
toph-allen Jul 21, 2025
b709457
fix lint
toph-allen Jul 21, 2025
2 changes: 1 addition & 1 deletion .lintr
@@ -1,7 +1,7 @@
linters: linters_with_defaults(
line_length_linter = line_length_linter(120L),
object_name_linter = object_name_linter(styles = c("snake_case", "symbols", "CamelCase")),
-cyclocomp_linter = cyclocomp_linter(30L),
+cyclocomp_linter = NULL, # Issues with R6 classes.
object_length_linter(32L),
indentation_linter = indentation_linter(hanging_indent_style = "tidy"),
return_linter = NULL
1 change: 1 addition & 0 deletions NAMESPACE
@@ -98,6 +98,7 @@ export(get_tag_data)
export(get_tags)
export(get_thumbnail)
export(get_timezones)
export(get_usage)
export(get_usage_shiny)
export(get_usage_static)
export(get_user_permission)
5 changes: 5 additions & 0 deletions NEWS.md
@@ -1,5 +1,10 @@
# connectapi (development version)

## New features

- New `get_usage()` function returns content usage data from Connect's `GET
v1/instrumentation/content/hits` endpoint on Connect v2025.04.0 and higher.
(#390)

## Enhancements and fixes

4 changes: 3 additions & 1 deletion R/connect.R
@@ -905,7 +905,9 @@ Connect <- R6::R6Class(
docs = function(docs = "api", browse = TRUE) {
stopifnot(docs %in% c("admin", "user", "api"))
url <- paste0(self$server, "/__docs__/", docs)
-if (browse) utils::browseURL(url)
+if (browse) {
+utils::browseURL(url)
+}
url
},

70 changes: 70 additions & 0 deletions R/get.R
@@ -526,6 +526,76 @@ get_usage_static <- function(
return(out)
}

#' Get usage information for deployed content
#'
#' @description
#'
#' Retrieve content hits for all available content on the server. Available
#' content depends on the user whose API key is in use. Administrator accounts
#' will receive data for all content on the server. Publishers will receive data
#' for all content they own or collaborate on.
#'
#' If no date-times are provided, all usage data will be returned.
#'
#' @param client A `Connect` R6 client object.
#' @param from Optional date-time (`POSIXct` or `POSIXlt`). Only
#' records after this time are returned. If not provided, records
#' are returned back to the first record available.
#' @param to Optional date-time (`POSIXct` or `POSIXlt`). Only records
#' before this time are returned. If not provided, all records up to
#' the most recent are returned.
#'
#' @return A tibble with columns:
#' * `id`: An identifier for the record.
#' * `user_guid`: The GUID of logged-in visitors, NA for anonymous.
#' * `content_guid`: The GUID of the content.
#' * `timestamp`: The time of the hit as `POSIXct`.
#' * `path`: The path of the hit. Not recorded for all content types.
#' * `user_agent`: If available, the user agent string for the hit. Not
#' available for all records.
#'
#' @details
#'
#' The data returned by `get_usage()` includes all content types. For Shiny
#' content, the `timestamp` indicates the *start* of the Shiny session.
#' Additional fields for Shiny and non-Shiny are available respectively from
#' `get_usage_shiny()` and `get_usage_static()`.
#'
#' When possible, however, we recommend using `get_usage()` over
#' `get_usage_static()` or `get_usage_shiny()`, as it is faster than those
#' functions, whose endpoints are paginated.
#'
#' @examples
#' \dontrun{
#' client <- connect()
#'
#' # Fetch the last 2 days of hits
#' usage <- get_usage(client, from = Sys.Date() - 2, to = Sys.Date())
#'
#' # Fetch usage after a specified date
#' usage <- get_usage(
#' client,
#' from = as.POSIXct("2025-05-02 12:40:00", tz = "UTC")
#' )
#'
#' # Fetch all usage
#' usage <- get_usage(client)
#' }
#'
#' @export
get_usage <- function(client, from = NULL, to = NULL) {
error_if_less_than(client$version, "2025.04.0")

usage_raw <- client$GET(
v1_url("instrumentation", "content", "hits"),
query = list(
from = make_timestamp(from),
to = make_timestamp(to)
)
)
usage <- parse_connectapi_typed(usage_raw, connectapi_ptypes$usage)
fast_unnest_character(usage, "data")
}

#' Get Audit Logs from Posit Connect Server
#'
43 changes: 43 additions & 0 deletions R/parse.R
@@ -58,15 +58,19 @@ ensure_column <- function(data, default, name) {
# manual fix because vctrs::vec_cast cannot cast double -> datetime or char -> datetime
col <- coerce_datetime(col, default, name = name)
}

if (inherits(default, "fs_bytes") && !inherits(col, "fs_bytes")) {
col <- coerce_fsbytes(col, default)
}

if (inherits(default, "integer64") && !inherits(col, "integer64")) {
col <- bit64::as.integer64(col)
}

if (inherits(default, "list") && !inherits(col, "list")) {
col <- list(col)
}

col <- vctrs::vec_cast(col, default, x_arg = name)
}
data[[name]] <- col
@@ -101,6 +105,45 @@ parse_connectapi <- function(data) {
))
}

# nolint start
Contributor

What linting are we escaping here?

Collaborator

Commented code maybe?

Collaborator Author

Yeah, commented code — it's not roxygen2 docs, just a regular comment, because it's not an exported function.

# Unnests a list column similarly to `tidyr::unnest_wider()`, bringing the
# entries of each list-item up to the top level. Makes some simplifying
# assumptions.
# 1. All inner variables are treated as character vectors;
# 2. The names of the first entry of the list-column are used as the
# names of variables to extract.
fast_unnest_character <- function(df, col_name) {
if (!is.character(col_name)) {
stop("col_name must be a character vector")
}
if (!col_name %in% names(df)) {
stop("col_name is not present in df")
}

list_col <- df[[col_name]]

new_cols <- names(list_col[[1]])

df2 <- df
for (col in new_cols) {
df2[[col]] <- vapply(
list_col,
function(row) {
if (is.null(row[[col]])) {
NA_character_
} else {
row[[col]]
}
},
"1",
USE.NAMES = FALSE
)
}

df2[[col_name]] <- NULL
df2
}
Contributor

The data returned from the endpoint includes path and user_agent fields nested under a data field. Without special treatment these are returned as a list-column, which is awkward. I initially experimented with tidyr::unnest(), but that was slow on the larger datasets returned by this endpoint, so I wrote a custom fast_unnest_character() function which runs in about 5% (!) of the time that tidyr::unnest() takes.

Thinking about this a bit more: this isn't a huge chunk of code, of course, but it is another chunk that we will have to maintain if we go this route. This is another example where having our data interchange within connectapi be all data frames means we have to worry about the performance of converting JSON-parsed list responses into data frames, and make sure those data frames are in a natural structure for folks to use. If we instead relied only on the parsed list data as our interchange and gave folks as.data.frame() methods, we could defer the (sometimes expensive) reshaping until late in the process. Eking out performance gains like this would then be much less important, so we could rely on more off-the-shelf tools.

Collaborator Author

I can see what you mean, and this does align with what you've been saying about other server objects.

I hear what you're saying about parsing to data frames for data interchange and I think that approach would be great to use for, say, the integrations endpoints that I just added stories for.

For the data from the hits endpoint, presenting it as anything other than a data frame goes back to feeling kinda not-R-idiomatic, as it isn't data that can… hmm…

So definitely one of the tasks, and maybe the main task that I can imagine for this data is to, like, treat it as a data frame and filter, plot, etc., it. But another thing you might want to do is, like, get the content item associated with this hit. And yeah, in that case, you might just want to be able to pass the hit, or hit$content_guid to content_item().

I still think we might want to keep code like this around in an as.data.frame() method — for making an unnested data frame out of nested data from the Connect API, tidyr::unnest() was 20x slower (which I was surprised by! it seems like a wild differential).

Open to a variety of options — let's discuss what the best approach would be to finalize and merge this PR.

Collaborator Author

Discussed this in the Libraries Guild meeting. Some takeaways:

  • Whether we parse the endpoint's response to a data frame in the get_usage() function or in an as.data.frame() method for that function's response, we want that code to be performant.
  • I could profile tidyr::unnest() to understand why it's slow, and explore options (e.g. its ptype argument) to see if they speed it up, which would remove the need to have this chunk of code to maintain.

Collaborator Author

@toph-allen Jul 15, 2025

Calling tidyr::unnest_wider(usage, data, ptype = list(path = character(0), user_agent = character(0))) is much more performant — that would be reasonable. But I don't really want to add a tidyr dependency to connectapi, I realize — we've been trying to remove dependencies. I have conflicting feelings on how to proceed here and I'd appreciate anyone's input.

I think these are the approaches open to us, in order of… least to most code in the package:

  • Don't convert to a tibble. I know we want to move in this direction with other connectapi classes, but I do feel like usage data will almost always be addressed in tabular form, so I think it's nicer to convert to a data frame in this function.
  • Don't unnest the data column. This is reasonable, but similar to above, I feel like… it's probably nicer to unnest it, given that structurally it just… contains additional data fields for each hit.
  • Unnest using tidyr::unnest(). I feel like we should discount this one because we don't want to add that as a dependency.
  • Unnest using custom function. The problem with this is that it relies on us adding a custom unnest function to the package, which is more surface area to maintain / test.

Contributor

Maybe an option that splits the difference somewhat would be to have the function return a list, provide an as.data.frame method that only works if you have tidyr installed, and make tidyr a suggested dependency but not a required one?

Collaborator Author

> Maybe an option that splits the difference somewhat would be to have the function return a list, provide an as.data.frame method that only works if you have tidyr installed, and make tidyr a suggested dependency but not a required one?

@karawoo I took this approach — take a look and see how it reads to you!
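
A minimal sketch of the approach agreed on above: keep the parsed response as a list and convert it only when `as.data.frame()` is called, with tidyr as an optional dependency. The class name `usage_list_sketch` and the constructor `new_usage_list()` are hypothetical and are not taken from the merged code. The `unnest_wider()` call mirrors the one quoted earlier in this thread.

```r
# Hypothetical constructor: tags the parsed JSON list with a class so that
# as.data.frame() can dispatch on it. Not the class used in connectapi.
new_usage_list <- function(records) {
  structure(records, class = c("usage_list_sketch", "list"))
}

# S3 method matching the as.data.frame() generic's signature.
as.data.frame.usage_list_sketch <- function(x, row.names = NULL, optional = FALSE, ...) {
  # tidyr is treated as a suggested package: fail clearly if it is missing.
  if (!requireNamespace("tidyr", quietly = TRUE)) {
    stop("Converting usage data to a data frame requires the 'tidyr' package.")
  }
  top <- data.frame(
    id = vapply(x, function(r) as.integer(r$id), integer(1)),
    content_guid = vapply(x, function(r) r$content_guid, character(1)),
    stringsAsFactors = FALSE
  )
  # Keep the nested `data` field as a list-column, then widen it.
  top$data <- lapply(x, function(r) r$data)
  out <- tidyr::unnest_wider(
    top,
    "data",
    ptype = list(path = character(0), user_agent = character(0))
  )
  as.data.frame(out)
}

hits <- new_usage_list(list(
  list(id = 1, content_guid = "475618c9",
       data = list(path = "/hello", user_agent = "Datadog/Synthetics")),
  list(id = 2, content_guid = "475618c9",
       data = list(path = "/world", user_agent = "curl"))
))

# The list form is always available; the data frame only when tidyr is.
if (requireNamespace("tidyr", quietly = TRUE)) {
  print(as.data.frame(hits))
}
```

With this shape, the reshaping cost is paid only by callers who ask for a data frame, and callers without tidyr can still work with the list.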


coerce_fsbytes <- function(x, to, ...) {
if (is.numeric(x)) {
fs::as_fs_bytes(x)
9 changes: 8 additions & 1 deletion R/ptype.R
@@ -1,5 +1,5 @@
NA_datetime_ <- # nolint: object_name_linter
-vctrs::new_datetime(NA_real_, tzone = "UTC")
+vctrs::new_datetime(NA_real_, tzone = Sys.timezone())
NA_list_ <- # nolint: object_name_linter
list(list())

@@ -38,6 +38,13 @@ connectapi_ptypes <- list(
"bundle_id" = NA_character_,
"data_version" = NA_integer_
),
usage = tibble::tibble(
"id" = NA_integer_,
"user_guid" = NA_character_,
"content_guid" = NA_character_,
"timestamp" = NA_datetime_,
"data" = NA_list_
),
content = tibble::tibble(
"guid" = NA_character_,
"name" = NA_character_,
22 changes: 22 additions & 0 deletions man/PositConnect.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

67 changes: 67 additions & 0 deletions man/get_usage.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

@@ -0,0 +1,52 @@
[
{
"id": 8966707,
"user_guid": null,
"content_guid": "475618c9",
"timestamp": "2025-04-30T12:49:16.269904Z",
"data": {
"path": "/hello",
"user_agent": "Datadog/Synthetics"
}
},
{
"id": 8966708,
"user_guid": null,
"content_guid": "475618c9",
"timestamp": "2025-04-30T12:49:17.002848Z",
"data": {
"path": "/world",
"user_agent": null
}
},
{
"id": 8967206,
"user_guid": null,
"content_guid": "475618c9",
"timestamp": "2025-04-30T13:01:47.40738Z",
"data": {
"path": "/chinchilla",
"user_agent": "Datadog/Synthetics"
}
},
{
"id": 8967210,
"user_guid": null,
"content_guid": "475618c9",
"timestamp": "2025-04-30T13:04:13.176791Z",
"data": {
"path": "/lava-lamp",
"user_agent": "Datadog/Synthetics"
}
},
{
"id": 8966214,
"user_guid": "fecbd383",
"content_guid": "b0eaf295",
"timestamp": "2025-04-30T12:36:13.818466Z",
"data": {
"path": null,
"user_agent": null
}
}
]
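
To show how records like the ones in this fixture map onto the columns documented for `get_usage()`, here is a base R sketch that flattens two of them by hand. It is an illustration only: the helper names `chr_or_na()` and `field()` are hypothetical, the records are typed in as R lists rather than parsed from the file, and this is not the package's parsing code.

```r
# Two records copied from the fixture, written as the nested lists a JSON
# parser would typically produce (JSON null becomes NULL).
records <- list(
  list(
    id = 8966707L, user_guid = NULL, content_guid = "475618c9",
    timestamp = "2025-04-30T12:49:16.269904Z",
    data = list(path = "/hello", user_agent = "Datadog/Synthetics")
  ),
  list(
    id = 8966214L, user_guid = "fecbd383", content_guid = "b0eaf295",
    timestamp = "2025-04-30T12:36:13.818466Z",
    data = list(path = NULL, user_agent = NULL)
  )
)

# Hypothetical helpers: replace NULL with NA and extract one field per record.
chr_or_na <- function(x) if (is.null(x)) NA_character_ else as.character(x)
field <- function(get) {
  vapply(records, function(r) chr_or_na(get(r)), character(1))
}

hits <- data.frame(
  id = vapply(records, function(r) r$id, integer(1)),
  user_guid = field(function(r) r$user_guid),
  content_guid = field(function(r) r$content_guid),
  # %OS reads seconds including the fractional part; the trailing Z is literal.
  timestamp = as.POSIXct(
    field(function(r) r$timestamp),
    format = "%Y-%m-%dT%H:%M:%OSZ",
    tz = "UTC"
  ),
  # Fields nested under `data` are lifted to top-level columns.
  path = field(function(r) r$data$path),
  user_agent = field(function(r) r$data$user_agent),
  stringsAsFactors = FALSE
)

print(hits)
```

Anonymous hits end up with `NA` in `user_guid`, and hits without a recorded path or user agent end up with `NA` in those columns.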
2 changes: 1 addition & 1 deletion tests/testthat/test-content.R
@@ -397,7 +397,7 @@ test_that("get_log() gets job logs", {
source = c("stderr", "stderr", "stderr"),
timestamp = structure(
c(1733512169.9480169, 1733512169.9480703, 1733512169.9480758),
-tzone = "UTC",
+tzone = Sys.timezone(),
class = c("POSIXct", "POSIXt")
),
data = c(
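
The `tzone` changes in `R/ptype.R` and `tests/testthat/test-content.R` swap `"UTC"` for `Sys.timezone()`. As far as the `POSIXct` values themselves go, that attribute controls how a timestamp is displayed, not which instant it represents. The base R sketch below illustrates this with the first timestamp from the test. `America/New_York` is an arbitrary stand-in for whatever `Sys.timezone()` returns on a given machine.

```r
# Seconds since the epoch, taken from the get_log() test expectation.
instant <- 1733512169.9480169

hit_utc <- structure(instant, tzone = "UTC", class = c("POSIXct", "POSIXt"))
# Assumption for illustration: a fixed zone instead of Sys.timezone().
hit_local <- structure(
  instant,
  tzone = "America/New_York",
  class = c("POSIXct", "POSIXt")
)

# Same underlying instant, so the two compare as equal.
print(hit_utc == hit_local)

# Only the printed wall-clock time differs between the two.
print(format(hit_utc, "%Y-%m-%d %H:%M:%S", usetz = TRUE))
print(format(hit_local, "%Y-%m-%d %H:%M:%S", usetz = TRUE))
```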