Fix #4249, part of #4064: Domain components for learner analytics (#4267)

## Explanation
<!--
  - Explain what your PR does. If this PR fixes an existing bug, please include
  - "Fixes #bugnum:" in the explanation so that GitHub can auto-close the issue
  - when this PR is merged.
  -->
Fix #4249
Fix part of #4064

Introduces the domain utilities necessary for logging learner analytics, but doesn't make them available for actual usage yet (that is being done in #4269 to keep this PR smaller & more focused).

Some notes on the history of this PR:
- This PR is a rebase of #4253 to remove dependencies on #2173 (which has since been merged into develop) for a much cleaner history.
- This PR is pulling out elements from #4118, #4247, and #4248 which contained completed work by @Sarthak2601.
- This PR extracts just the 'domain' pieces from the above, changes a fair amount of their architecture, and adds tests. This PR should have little to no impact on the behavior of the app since the new logging functionality isn't being used yet, nor is it even accessible to broad components in the app.
- This is starting as 'pt4' since it's continuing the work introduced in: #4114, #4115, and #4116.

This PR makes a number of changes over the original design document and implementation, the most noteworthy being:
- This PR organizes learner analytics logging into its own logger (and makes changes to event bundle creation & the generic ``OppiaLogger``). I think that we should move toward this pattern generally in the future rather than continuing with a generic ``OppiaLogger`` as it seems to help keep things much more focused. Existing logging should not be affected.
- The notion of a device ID has been dropped as there's no reliable way to retrieve such an ID (see https://developer.android.com/training/articles/user-data-ids). Instead, we're using a per-installation ID (by leveraging ``PersistentCacheStore``), and have confirmed with study partners that this is workable.
- The logging logic for the new logs was rearranged such that all new analytics logs will be logged for everyone, but the user and installation-tied IDs won't be logged in such cases (since they are more sensitive data). These events are generally useful for the platform, so we shouldn't restrict them as such.
- Learner ID generation for profiles only occurs if the experiment is enabled, and otherwise stays empty. We may add future cleanup code to ensure it's erased across studies, but this at least lays the initial groundwork to keep such IDs separate when they aren't needed.
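
As a rough illustration of the ID gating described above (this is only a sketch; ``AnalyticsEvent``, ``createBundle``, and the key names are hypothetical and are not the actual ``EventBundleCreator`` API), events are always bundled, but the sensitive IDs are only attached when the study is enabled:

```kotlin
// Hypothetical types and names for illustration only; not the real event bundle API.
data class AnalyticsEvent(
  val name: String,
  val learnerId: String?,
  val installationId: String?
)

/** Flattens [event] into key-value pairs, attaching IDs only when [studyEnabled] is true. */
fun createBundle(event: AnalyticsEvent, studyEnabled: Boolean): Map<String, String> {
  val bundle = mutableMapOf("event_name" to event.name)
  if (studyEnabled) {
    // Sensitive identifiers are only included for study participants.
    event.learnerId?.let { bundle["learner_id"] = it }
    event.installationId?.let { bundle["installation_id"] = it }
  }
  return bundle
}

fun main() {
  val event = AnalyticsEvent("play_audio", learnerId = "abc", installationId = "xyz")
  println(createBundle(event, studyEnabled = false)) // Only the event name is present.
  println(createBundle(event, studyEnabled = true)) // Event name plus both IDs.
}
```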

For a high-level overview of the design, please refer to [this design document](https://docs.google.com/document/d/1c8lpH-IUvoU1t4LUoYNqNilP2e9yCnzGnSSG0yBxBrY/edit).

Other noteworthy design choices:
- ``DebugEventLogger`` was updated to call through to the real logger (as it makes event verification simpler in developer builds; normally analytics is off so this won't have any effects for the broader team)
- Both ``DebugEventLogger`` and ``FakeEventLogger`` were updated to be thread-safe
- Some extended functionality was added to ``FakeEventLogger``
- ``LearnerAnalyticsLogger`` is designed a bit differently compared to other domain classes in that it actually provides session-specific objects to the application-wide singleton graph (which is needed for logging certain situations, such as the user playing/stopping audio during a play session)
- ``LoggingIdentifierController`` makes use of a lazy retrieval for session ID now (which is fine because it's guaranteed to compute exactly one initial ID)
- ``StateFlow`` is used for easier cross-thread communication, including to expose internal asynchronous state across domain components (the only way we had to do this before was ``Deferred``, and that can be clunky; the new approach is much cleaner)
- An ``EventLogSubject`` was introduced to make testing event logs easier. It's used extensively in tests for this feature, but most existing use cases weren't migrated. #4272 is tracking adding tests for this subject (hence the test file exemption).
- There were TODOs introduced on #4064 to provide explicit clarity to reviewers on what needs to be changed in later PRs (as there are some things being introduced before the final PR that aren't actually used yet to help break up the project)
- Multiple test suites verify behaviors with and without the feature enabled to be very explicit about what behavior occurs when
- ``EventBundleCreatorTest`` in particular has very strict tests to ensure that sensitive IDs are logged exactly when expected (initially, never since they aren't turned on in this PR; this is fixed in a later PR)
- ``ExplorationDataController`` was updated to introduce new play entrypoints, but these aren't "interesting" yet as the underlying ``ExplorationProgressController`` changes are coming in a later PR. Further, testing coverage technically removes checking ``playExploration``, but it'll be removed (and it's technically tested through the other functions since they call through).
- A new ``ClipboardManager`` was introduced with the specific design of not allowing the broad app access to clipboard information from other apps. Instead, it provides an interface to confirm whether the app's known clipboard has been kept. A regex content check was added to ensure developers never use the clipboard service directly and instead use this manager.
- ``PersistentCacheStore`` was updated to include a new ``primeInMemoryAndDiskCacheAsync`` function which works more predictably for initialization than ``primeInMemoryCacheAsync`` (formerly ``primeCacheAsync``). In particular, ``primeInMemoryCacheAsync`` is better for ensuring that the cache will quickly be read once it needs to be (and, if it isn't, will default in the same way the cache store normally defaults). However, there are cases when the app wants to change the default values such that: (1) the normal default is never used, (2) the default has to be computed and isn't cheap, and (3) it should never compute that default again once saved on disk. ``primeInMemoryAndDiskCacheAsync`` makes these assurances which, in turn, makes the installation ID cache store even possible without potential race conditions or breaking Dagger's cheap-initialization best practice. 
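
To make the ``primeInMemoryAndDiskCacheAsync`` guarantees above a bit more concrete, here is a heavily simplified, synchronous sketch. ``SimpleDiskStore`` and ``primeInMemoryAndDisk`` are hypothetical stand-ins (the real store is asynchronous and proto-based); the point is only that the initializer runs when nothing is on disk yet, and never again afterwards:

```kotlin
import java.io.File
import java.util.UUID

// Hypothetical, simplified stand-in for a disk-backed store holding a single string value.
class SimpleDiskStore(private val file: File, private val defaultValue: String) {
  var inMemoryValue: String? = null
    private set

  /**
   * Loads the value from disk if present; otherwise computes it once via [initialize] and
   * persists it so that later runs never recompute it.
   */
  @Synchronized
  fun primeInMemoryAndDisk(initialize: (String) -> String): String {
    val value = if (file.exists()) {
      file.readText()
    } else {
      initialize(defaultValue).also { file.writeText(it) }
    }
    inMemoryValue = value
    return value
  }
}

fun main() {
  // Create a unique temp path, then delete the file so the store starts uninitialized.
  val file = File.createTempFile("installation_id", ".cache").also { it.delete() }
  var initializerCalls = 0
  val initializer: (String) -> String = {
    initializerCalls++
    UUID.randomUUID().toString()
  }

  val first = SimpleDiskStore(file, defaultValue = "").primeInMemoryAndDisk(initializer)
  // A fresh store instance simulates a later app run reading the same file.
  val second = SimpleDiskStore(file, defaultValue = "").primeInMemoryAndDisk(initializer)

  println(first == second) // true
  println(initializerCalls) // 1
  file.delete()
}
```

This is what makes a stable installation ID workable: the expensive default is computed at most once per installation, and every later read observes the persisted value.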

Test exemptions: all exemptions are annotations or interfaces except ``EventBundleCreatorTest`` (which is explained above).
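
For the thread-safety change to ``DebugEventLogger`` and ``FakeEventLogger`` mentioned in the design choices above, the general pattern looks roughly like the following. This is a minimal sketch assuming a lock-guarded list; ``RecordingEventLogger`` and its methods are hypothetical names and not the actual test utility:

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.thread
import kotlin.concurrent.withLock

// Hypothetical recording logger; all access to the event list is guarded by a single lock.
class RecordingEventLogger {
  private val lock = ReentrantLock()
  private val events = mutableListOf<String>()

  fun logEvent(event: String) {
    lock.withLock { events.add(event) }
  }

  /** Returns a snapshot copy so callers never observe the list while it is being mutated. */
  fun getLoggedEvents(): List<String> = lock.withLock { events.toList() }
}

fun main() {
  val logger = RecordingEventLogger()
  // Log from several threads at once; without the lock, entries could be lost.
  val workers = (1..4).map { id ->
    thread { repeat(1000) { logger.logEvent("event-$id-$it") } }
  }
  workers.forEach { it.join() }
  println(logger.getLoggedEvents().size) // 4000
}
```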

## Essential Checklist
<!-- Please tick the relevant boxes by putting an "x" in them. -->
- [x] The PR title and explanation each start with "Fix #bugnum: " (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".)
- [x] Any changes to [scripts/assets](https://github.com/oppia/oppia-android/tree/develop/scripts/assets) files have their rationale included in the PR explanation.
- [x] The PR follows the [style guide](https://github.com/oppia/oppia-android/wiki/Coding-style-guide).
- [x] The PR does not contain any unnecessary code changes from Android Studio ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#undo-unnecessary-changes)).
- [x] The PR is made from a branch that's **not** called "develop" and is up-to-date with "develop".
- [x] The PR is **assigned** to the appropriate reviewers ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#clarification-regarding-assignees-and-reviewers-section)).

## For UI-specific PRs only
N/A -- This PR doesn't make UI changes, and existing flows shouldn't be affected.

Commits:

* strings for learner analytics

* platform parameter impl for learner analytics

* nit

* nit

* event action enum update

* addition of contexts

* nit

* controller level logging and contexts

* nit

* nit fixes.

* nit fixes.

* event bundle modifications

* sync status, logging identifiers, profile update, lifecycle owner

* ui impl: part 1 -- basic

* admin control strings

* strings correction

* strings correction

* device id correction

* exhaustive when fix.

* exhaustive when fix.

* todo formatting

* nits.

* nits.

* collapsed contexts, added spacing, added comments

* event action removal + nits

* tests + dev options event logs fixes post event action removal

* nits

* removal of method for event action formatted string

* nits, null context changes.

* nits

* reserved fixes and help index fix

* bazel imports

* bazel build fixes

* test fixes

* nit

* logging identifier controller, module + uuid wrapper, real impl

* logging identifier controller tests, fake uuid, tests

* sync status manager + fake

* logging methods, test setup

* profile management, tests

* sync status update.

* lifecycle observer

* Post-merge fixes + Bazel support.

* Lots of reorganizing & changes.

New tests and documentation have also been added. More broadly, this
changes the device ID computation, but actually breaks it so more work
will be needed in subsequent commits.

* Lint fixes.

* Post-merge fix (proper merge of maven_install).

* Lint fixes (includes post-merge cleanups).

* Lots of stuff.

Restructured a lot of the UI, addressed most failing static checks
(except KDocs and lint which will be in the follow-up commit), added
tests, fixed copying, and generally finished the UI.

Sync status seems broken, and it's not yet clear whether events are
actually being logged (I need to investigate this). Analytics are
disabled in local testing, so that might also be the reason for logs
being stuck in an uploading state.

* Documentation + lint fixes.

This also changes the contract of ClipboardController.

* Finish remaining planned tests.

* Move over changes from learner-analytics-proto.

* Manually pull in changes from 3d6c716.

(Note that this is operating on a different base.)

* Post-merge fixes.

These at least ensure that the app can build, but many tests will still
fail (which is fine seeing as much of this code is going to be split up
soon, anyway).

Rebase version: app build is no longer guaranteed.

* Lint fixes.

* Undo all learner analytics changes.

I'll be pulling in specific components in specific PRs to organize the
changes across 4 PRs.

Note that I took this approach to preserve the history from the earlier
commits. Those changes will still be included in this PR chain, just a
bit awkwardly (i.e. it'll look like I introduced them originally, but
that distinction is lost during the squash-and-merge, anyway).

* Manually pull in non-app module changes.

A bunch of work is still needed to finish these, and I'm still trying to
figure out whether I can de-couple the module changes to make reviewing
a bit nicer.

* Post-merge fixes.

All tests verified as building & passing.

* Add sync status for no connectivity case.

* Remove unnecessary sync manager.

* Copy over changes from #4263.

These are the domain changes needed for finishing learner analytics
support. Cleanup, documentation, and testing all still need to be
completed.

* Add domain changes for AudioPlayerController.

These originate from #4263.

* Add missing Javadoc from #4263.

* Finish tests & documentation.

This also renames 'device ID' to be 'installation ID' for more
correctness.

* Lint fixes.

* Fix OS-specific issue in ClipboardController.

Co-authored-by: Sarthak Agarwal <[email protected]>
BenHenning and Sarthak2601 authored May 5, 2022
1 parent 4b04663 commit a7c8e6c
Showing 84 changed files with 9,491 additions and 1,997 deletions.
@@ -23,7 +23,7 @@ import kotlin.concurrent.withLock
  * An on-disk persistent cache for proto messages that ensures reads and writes happen in a
  * well-defined order. Note that if this cache is used like a [DataProvider], there is a race
  * condition between the initial store's data being retrieved and any early writes to the store
- * (writes generally win). If this is not ideal, callers should use [primeCacheAsync] to
+ * (writes generally win). If this is not ideal, callers should use [primeInMemoryCacheAsync] to
  * synchronously kick-off a read update to the store that is guaranteed to complete before any
  * writes. This will be reflected in the first time the store's state is delivered to a subscriber
  * to a LiveData version of this data provider.
@@ -97,7 +97,7 @@ class PersistentCacheStore<T : MessageLite> private constructor(
    * LiveData-converted version of this data provider, so it must be handled at the callsite for
    * this method.
    */
-  fun primeCacheAsync(forceUpdate: Boolean = false): Deferred<Any> {
+  fun primeInMemoryCacheAsync(forceUpdate: Boolean = false): Deferred<Any> {
     return cache.updateIfPresentAsync { cachePayload ->
       if (forceUpdate || cachePayload.state == CacheState.UNLOADED) {
         // Store the retrieved on-disk cache, if it's present (otherwise set up state such that
@@ -111,6 +111,47 @@ class PersistentCacheStore<T : MessageLite> private constructor(
     }
   }

+  /**
+   * Primes the current cache such that both the in-memory and on-disk versions of this cache are
+   * guaranteed to be in sync, returning a [Deferred] that completes only after the operation is
+   * finished.
+   *
+   * The provided [initialize] initializer will only ever be called if the on-disk cache is not yet
+   * initialized, and it will be passed the initial value used to create this cache store. The value
+   * it returns will be used to initialize both the in-memory and on-disk copies of the cache.
+   *
+   * The value of the returned [Deferred] is not useful. The state of the cache should be monitored by
+   * treating this provider as a [DataProvider]. This method may result in multiple update
+   * notifications to observers of this [DataProvider], but the latest value will be the source of
+   * truth.
+   *
+   * Where [primeInMemoryCacheAsync] is useful to ensure any on-disk cache is properly loaded into
+   * memory prior to using a cache store, this method is useful when a disk cache has a
+   * contextually-sensitive initialization routine (such as an ID that cannot change after
+   * initialization) as it ensures a reliable, initial clean state for the cache store that will be
+   * consistent with future runs of the app.
+   */
+  fun primeInMemoryAndDiskCacheAsync(initialize: (T) -> T): Deferred<Any> {
+    return cache.updateIfPresentAsync { cachePayload ->
+      when (cachePayload.state) {
+        CacheState.UNLOADED -> {
+          val loadedPayload = loadFileCache(cachePayload)
+          when (loadedPayload.state) {
+            // The state should never stay as UNLOADED.
+            CacheState.UNLOADED ->
+              error("Something went wrong loading the cache during priming: $cacheFile")
+            CacheState.IN_MEMORY_ONLY -> storeFileCache(loadedPayload, initialize) // Needs saving.
+            CacheState.IN_MEMORY_AND_ON_DISK -> loadedPayload // Loaded from disk successfully.
+          }
+        }
+        // This generally indicates that something went wrong reading the on-disk cache, so make
+        // sure it's properly initialized.
+        CacheState.IN_MEMORY_ONLY -> storeFileCache(cachePayload, initialize)
+        CacheState.IN_MEMORY_AND_ON_DISK -> cachePayload
+      }
+    }
+  }
+
   /**
    * Callers should use this read function if they don't care or specifically do not want to
    * observe changes to the underlying store. If the file is not in memory, it will be loaded from disk
@@ -157,7 +198,7 @@ class PersistentCacheStore<T : MessageLite> private constructor(
   /** See [storeDataAsync]. Stores data and allows for a custom deferred result. */
   fun <V> storeDataWithCustomChannelAsync(
     updateInMemoryCache: Boolean = true,
-    update: (T) -> Pair<T, V>
+    update: suspend (T) -> Pair<T, V>
   ): Deferred<V> {
     return cache.updateWithCustomChannelIfPresentAsync { cachedPayload ->
       val (updatedPayload, customResult) = storeFileCacheWithCustomChannel(cachedPayload, update)
@@ -173,7 +214,7 @@ class PersistentCacheStore<T : MessageLite> private constructor(
    * does notify subscribers.
    */
   fun clearCacheAsync(): Deferred<Any> {
-    return cache.updateIfPresentAsync {
+    return cache.updateIfPresentAsync { currentPayload ->
       if (cacheFile.exists()) {
         cacheFile.delete()
       }
@@ -183,7 +224,7 @@ class PersistentCacheStore<T : MessageLite> private constructor(
       // Always clear the in-memory cache and reset it to the initial value (the cache itself should
       // never be fully deleted since the rest of the store assumes a value is always present in
       // it).
-      CachePayload(state = CacheState.UNLOADED, value = initialValue)
+      currentPayload.copy(state = CacheState.UNLOADED, value = initialValue)
     }
   }

@@ -206,12 +247,12 @@ class PersistentCacheStore<T : MessageLite> private constructor(
   private fun loadFileCache(currentPayload: CachePayload<T>): CachePayload<T> {
     if (!cacheFile.exists()) {
       // The store is not yet persisted on disk.
-      return currentPayload.moveToState(CacheState.IN_MEMORY_ONLY)
+      return currentPayload.copy(state = CacheState.IN_MEMORY_ONLY)
     }

     val cacheBuilder = currentPayload.value.toBuilder()
     return try {
-      CachePayload(
+      currentPayload.copy(
         state = CacheState.IN_MEMORY_AND_ON_DISK,
         value = FileInputStream(cacheFile).use { cacheBuilder.mergeFrom(it) }.build() as T
       )
@@ -221,10 +262,7 @@ class PersistentCacheStore<T : MessageLite> private constructor(
       }
       // Update the cache to have an in-memory copy of the current payload since on-disk retrieval
       // failed.
-      CachePayload(
-        state = CacheState.IN_MEMORY_ONLY,
-        value = currentPayload.value
-      )
+      currentPayload.copy(state = CacheState.IN_MEMORY_ONLY, value = currentPayload.value)
     }
   }

@@ -235,18 +273,19 @@ class PersistentCacheStore<T : MessageLite> private constructor(
   private fun storeFileCache(currentPayload: CachePayload<T>, update: (T) -> T): CachePayload<T> {
     val updatedCacheValue = update(currentPayload.value)
     FileOutputStream(cacheFile).use { updatedCacheValue.writeTo(it) }
-    return CachePayload(state = CacheState.IN_MEMORY_AND_ON_DISK, value = updatedCacheValue)
+    return currentPayload.copy(state = CacheState.IN_MEMORY_AND_ON_DISK, value = updatedCacheValue)
   }

   /** See [storeFileCache]. Returns payload and custom result. */
-  private fun <V> storeFileCacheWithCustomChannel(
+  private suspend fun <V> storeFileCacheWithCustomChannel(
     currentPayload: CachePayload<T>,
-    update: (T) -> Pair<T, V>
+    update: suspend (T) -> Pair<T, V>
   ): Pair<CachePayload<T>, V> {
     val (updatedCacheValue, customResult) = update(currentPayload.value)
+    // TODO(#4264): Move this over to using an I/O-specific dispatcher.
     FileOutputStream(cacheFile).use { updatedCacheValue.writeTo(it) }
     return Pair(
-      CachePayload(state = CacheState.IN_MEMORY_AND_ON_DISK, value = updatedCacheValue),
+      currentPayload.copy(state = CacheState.IN_MEMORY_AND_ON_DISK, value = updatedCacheValue),
       customResult
     )
   }
@@ -265,12 +304,7 @@ class PersistentCacheStore<T : MessageLite> private constructor(
     IN_MEMORY_AND_ON_DISK
   }

-  private data class CachePayload<T>(val state: CacheState, val value: T) {
-    /** Returns a copy of this payload with the new, specified [CacheState]. */
-    fun moveToState(newState: CacheState): CachePayload<T> {
-      return CachePayload(state = newState, value = value)
-    }
-  }
+  private data class CachePayload<T>(val state: CacheState, val value: T)

   // TODO(#59): Use @ApplicationContext instead of Context once package dependencies allow for
   // cross-module circular dependencies. Currently, the data module cannot depend on the app module.
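
For reference, the move from the hand-written ``moveToState`` helper to the data class's generated ``copy`` in the diff above can be seen in isolation with a small standalone sketch. The types here mirror the names in the diff but are redeclared purely for illustration:

```kotlin
enum class CacheState { UNLOADED, IN_MEMORY_ONLY, IN_MEMORY_AND_ON_DISK }

// Standalone re-declaration for illustration; the real class is private to the cache store.
data class CachePayload<T>(val state: CacheState, val value: T)

fun main() {
  val unloaded = CachePayload(CacheState.UNLOADED, value = "initial")

  // copy() only replaces the named properties, so the value carries over unchanged.
  val loaded = unloaded.copy(state = CacheState.IN_MEMORY_ONLY)
  println(loaded) // CachePayload(state=IN_MEMORY_ONLY, value=initial)

  // Both properties can be replaced in a single call when persisting a new value.
  val persisted = loaded.copy(state = CacheState.IN_MEMORY_AND_ON_DISK, value = "updated")
  println(persisted) // CachePayload(state=IN_MEMORY_AND_ON_DISK, value=updated)
}
```

One practical benefit of ``copy`` over constructing a fresh payload is that any property added to the data class later is preserved automatically at every call site.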