ABSTRACT While semantic and episodic memory have been shown to influence each other, uncertainty remains as to how this interplay occurs. We introduce a behavioral representational
similarity analysis approach to assess whether semantic space can be subtly re-sculpted by episodic learning. Eighty participants learned word pairs that varied in semantic relatedness, and
learning was bolstered via either testing or restudying. Next-day recall is superior for semantically related pairs, but there is a larger benefit of testing for unrelated pairs. Analyses of
representational change reveal that successful recall is accompanied by a pulling together of paired associates, with cue words in semantically related (but not unrelated) pairs changing
more across learning than target words. Our findings show that episodic learning is associated with systematic and asymmetrical distortions of semantic space which improve later recall by
making cues more predictive of targets, reducing interference from potential lures, and establishing novel connections within pairs. INTRODUCTION Despite early theories
that proposed a psychological and neurobiological separation between semantic and episodic memory systems1,2, there is an increasing body of work that suggests the two systems are more
intertwined than previously believed3,4. Neuroimaging experiments have demonstrated shared neural activation5 and functional connectivity6,7 during episodic and semantic memory processes,
and pre-existing semantic knowledge can act as a scaffold to facilitate the acquisition of new episodic memories8,9,10. Moreover, semantic relatedness has been shown to either
facilitate11,12,13,14,15,16 or impair17,18 episodic memory performance, depending on factors such as recall delay, degree of relatedness within the to-be-learned pairs, and the semantic
relatedness of the broader stimulus set15. Episodic experiences can also influence semantic knowledge by integrating new information as learning occurs, or by emphasizing task or
context-relevant semantic features in pre-existing semantic space19,20,21. However, further specification of the mechanisms of these putative bidirectional episodic/semantic interactions is
needed. One common assessment of episodic memory involves presenting pairs of items and later probing retention of the associations. Although one-shot learning of paired associates is
possible, many paradigms have participants re-engage with the material through retrieval practice or restudying, and there is a well-established benefit of the former, known as the
testing effect22,23,24,25,26,27,28. There is debate as to whether the “desirable difficulty”29,30 or effortfulness31 of searching for and retrieving a target association is what strengthens
memory or whether testing is advantageous because the episodic experience of retrieval practice is more contextually similar to the final test32. While researchers have increasingly
acknowledged the interdependence of episodic and semantic memory, there are relatively few studies of the testing effect that directly manipulate the semantic information within
to-be-learned pairs of items28 or integrate its role into mechanistic accounts. Carpenter33 proposed that retrieving information from memory necessitates elaborative processes that induce
spreading activation to semantically related information34,35, which can provide additional retrieval cues31. Consistent with this framework, one recent study showed that when to-be-learned
images do not contain meaningful semantic information, there is no benefit for retrieval practice compared to restudying the images36. A separate account suggests that testing supports
memory by facilitating semanticisation37 (i.e. a shift towards more generic semantic representations as opposed to detail-rich episodic representations) and relational processing, which
promotes attention to semantic information38. The degree to which to-be-learned items have a pre-existing semantic relationship may influence how they are associated in memory. The episodic
binding of two items need not be symmetrical, in the sense that the ability of item A to predict item B does not necessarily equate with the ability of item B to predict item A. For
instance, when pairs of words are learned in one direction (cue word A→target word B), the act of testing an unrelated pair in the forward direction (A→?) also improves associative memory in
the reverse direction (B→?), yet when related pairs are tested in the forward direction (A→?), it does not improve recall of the reverse direction (B→?)39,40. Recent neuroimaging work has
also shown asymmetrical integration of associative pairs41. For example, when novel faces are paired with famous faces, the neural representation of the novel face becomes more similar to
the representation of the paired famous face, which itself shows minimal representational change. In contrast, when a novel face is paired with another novel face, the neural representations
of the two faces become more similar, but change equally. One neurobiologically-inspired computational modeling account of associative learning known as the non-monotonic plasticity
hypothesis (NMPH) attempts to explain the testing effect and account for the role of semantic information through the relative co-activation of to-be-learned items and the associated
representational change. This framework proposes that changes in memory strength are driven by the relative activation of items, such that memory for items that are strongly co-activated is
strengthened, while items that are moderately co-activated are weakened or differentiated42,43. When paired items are restudied and brought to mind together, they are strongly co-activated,
and thus strengthened44. When paired items undergo retrieval practice, there is also strong co-activation, but because retrieval is often imprecise, it will also tend to moderately
co-activate semantically related concepts34,35. According to the NMPH, this moderate activation suppresses memory for the related items and differentiates the target to reduce interference
and strengthen memory more than restudying42,43,45,46,47,48. In the present preregistered study, we sought to investigate the influence of semantic relatedness on the testing effect and
understand how episodic paired associate learning might sculpt pre-existing semantic space. We had participants learn semantically related and unrelated pairs of words via testing or
restudying and assessed their memory the next day. Although we were interested in how cued recall accuracy would vary depending on semantic relatedness and learning condition, our primary
focus was on whether and how the semantic representations of the words changed over learning. For this, we developed a behavioral representational similarity analysis approach, which we
applied to data from a similarity-based word arrangement task that participants performed before and after learning. This allowed us to investigate the bidirectional interaction of episodic
learning and semantic knowledge by indexing changes in the associative structure and semantic representation of individual words. Given the existing computational modeling work and
literature on the role of semantic information, we expected to see an overall memory benefit for semantically related pairs. We thus predicted that these already-advantaged pairs would have
less to gain from testing than unrelated pairs. We anticipated that tested pairs would undergo more representational change, and that the amount of representational change would be
correlated with behavioral performance. Finally, we expected to see asymmetric change in the semantic structure of related pairs, where representations of targets would get drawn towards
those of the cues, and symmetric changes for unrelated pairs of words. In this work, we show that the testing effect is reduced for semantically related pairs of words due to the relative
improvement in recall of restudied pairs. We also show that when pairs lack a prior semantic relationship, testing is necessary to induce representational change and that this change draws
cues and targets together symmetrically. Testing also weakens the relationship of the cue words with other moderately related non-target associates. In contrast, prior semantic knowledge can
rescue restudied pairs by inducing asymmetric representational semantic change, which makes cues more predictive of their associated targets. Finally, we show that the relationship between
representational change and recall accuracy of word pairs depends on the interaction of word position, semantic relatedness, and learning condition, where greater representational change in
the cue is associated with better recall, regardless of learning condition or semantic relatedness, but representational change in the target is only associated with better recall when pairs
are semantically unrelated and tested. RESULTS RECALL ACCURACY We began by probing whether recall accuracy for the targets of each cue-target pair systematically varied based on the
semantic relatedness (related vs. unrelated) and the learning condition (testing vs restudying, following two initial exposures); Fig. 1. On Day 1, we could only assess recall accuracy for
tested pairs, since performance for restudied pairs merely reflected participants’ ability to type the visible target word. As expected, semantically related word pairs were recalled better
than semantically unrelated pairs (_t_(79) = 9.979, _p_ < 0.001, _d_ = 1.12, 95% CI = [0.19 0.28]); Fig. 2A. After a 24-hour delay (Day 2), a RM-ANOVA revealed significant main effects of
semantic relatedness (F(1,79) = 362.227, _p_ < 0.001, ηG2 = 0.462) and learning condition (F(1,79) = 25.240, _p_ < 0.001, ηG2 = 0.045) but no statistically significant interaction
(F(1,79) = 0.076, _p_ = 0.78, ηG2 = 9.94 × 10−5); Fig. 2B. Comparison of marginal means at the final test showed that related pairs (M = 0.629, SD = 0.183) had a higher probability of recall
than unrelated pairs (M = 0.281, SD = 0.202), and tested pairs (M = 0.496, SD = 0.258) were more likely to be recalled than restudied pairs (M = 0.415, SD = 0.256). Given that participants
were provided with no feedback, it is possible that tested pairs that were not successfully retrieved on Day 1 would not benefit from testing, potentially obscuring an interaction between
semantic relatedness and learning condition on Day 2. To investigate this, tested pairs were split into those that were correctly recalled at initial learning and those that were not,
revealing a significant relatedness by learning condition interaction for Day 2 recall performance (F(2,154) = 23.531, _p_ < 0.001, ηG2 = 0.054); Fig. 2C, D. Follow up paired t-tests
revealed significant testing effects (i.e. the contrast of pairs that were tested and correctly recalled at Day 1 versus pairs that were restudied) for both related pairs (_t_(79) = 9.575,
_p_ < 0.001, _d_ = 1.070, 95% CI = [0.187 0.285]) and unrelated pairs (_t_(79) = 12.361, _p_ < 0.001, _d_ = 1.382, 95% CI = [0.283 0.391]), with a larger effect of learning condition
for unrelated pairs. Pairs that were tested but recalled incorrectly at Day 1 showed significantly lower accuracy on Day 2 than both restudied pairs (related: _t_(77) = 13.602, _p_ <
0.001, _d_ = 1.540, 95% CI = [0.354 0.476]; unrelated: _t_(79) = 9.692, _p_ < 0.001, _d_ = 1.084, 95% CI = [0.149 0.226]) and tested pairs that were recalled correctly at Day 1 (related:
t(77) = 22.293, _p_ < 0.001, _d_ = 2.524, 95% CI = [0.543 0.710]; unrelated: _t_(79) = 18.870, _p_ < 0.001, _d_ = 2.110, 95% CI = [0.469 0.580]). These results indicate that semantic
relatedness reduces the magnitude of the testing effect by improving recall of related restudied pairs. The benefit of semantic relatedness overcomes the relatively ineffective learning
method of restudying, leaving less to gain by testing. CHANGE IN PAIRWISE REPRESENTATIONAL SIMILARITY While differences in recall accuracy based on semantic relatedness and learning
condition show that these factors are consequential for memory, they cannot show how this happens. To gain mechanistic insight, we turned to the changes in pairwise representational
similarity (measured by the difference between within pair similarity at the final and initial Similarity-Based Word Arrangement Task (SWAT) assessments; Fig. 3), which provides a more
direct measurement of how the semantic representations of our word set change across learning; Fig. 4. First, we ran a linear mixed-effects model (LMM) testing whether there were differences
in the change in similarity for pairs that were correctly recalled at Day 2 relative to those incorrectly recalled at Day 2 and to a control condition of random word pairings that were
never experienced during learning. Pairs that were correctly recalled at Day 2 changed more than those that were incorrectly recalled (_t_(158) = 2.566, _p_ = 0.023, _d_ = 0.20, 95% CI =
[1.84 × 10−5 6.29 × 10−4]) and random pairs (_t_(158) = 3.112, _p_ = 0.007, _d_ = 0.25, 95% CI = [8.73 × 10−5 6.98 × 10−4]), but the change in pairs that were incorrectly recalled at Day 2
was not statistically significantly different from that of random pairs (t(158) = 0.547, _p_ = 0.585, d = 0.04, 95% CI = [−3.74 × 10−4 2.36 × 10−4]); Fig. 5A. We next ran a series of
one-sample t-tests (with Holm-Bonferroni corrections for multiple comparisons) to determine whether change in similarity in our conditions of interest was significantly different from zero;
Fig. 5B. For this analysis (and all hereafter), we opted to exclude tested pairs that were incorrectly recalled at Day 1 because they did not incur the benefit of testing. Significant
changes in similarity were observed for related pairs that were correctly recalled at Day 2, regardless of learning condition (tested: _t_(79) = 3.788, _p_ = 0.002, _d_ = 0.423, 95% CI =
[3.045 × 10−4 9.804 × 10−4]; restudied: _t_(79) = 4.258, _p_ < 0.001, _d_ = 0.476, 95% CI = [3.763 × 10−4 1.037 × 10−3]), and unrelated pairs that were tested and correctly recalled at
Day 2 (_t_(74) = 3.085, _p_ = 0.017, _d_ = 0.356, 95% CI = [2.736 × 10−4 1.272 × 10−3]). All other comparisons were not statistically significantly different from zero (_p_ values > 0.1;
Supplementary Table 2). When change in similarity across conditions was analyzed in an LMM with fixed effects of relatedness, learning condition, and final recall success, there were no main
effects or interactions between relatedness and learning condition; there was, however, a significant main effect of final recall success (_t_(528) = 1.965, _p_ = 0.050, ηp2 = 0.0073, 95%
CI = [1.348 × 10−6 6.372 × 10−4]), where pairs that were correctly recalled at Day 2 showed significantly more change in similarity than those that were not. Although correctly recalled
pairs showed the most overall representational change, it is also possible that there might be changes within the local semantic neighborhoods of the learned cue words that reflect the
repulsion of potential competitor words (i.e. potential lures) to reduce interference; Fig. 4A. To test for this, we first characterized the strength of potential lures for each cue word
using the LSA cosine similarity between the cue word and all other words in our 120-word set. For example, for the pair GENDER – FEMALE, the word MOTHER might interfere with recall, while
CAVERN likely would not. We then calculated the change in similarity across learning for cues in successfully recalled to-be-learned pairs and their potential lures (e.g., GENDER – MOTHER).
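As a rough illustration of this computation (not the original analysis code, and with hypothetical variable names), the change in cue–lure similarity can be tabulated as follows, assuming `sim_pre` and `sim_post` are a participant's imputed 120 × 120 SWAT similarity matrices and `lsa_sim` holds the LSA cosine similarities used to grade lure strength:

```python
import numpy as np

def cue_lure_changes(sim_pre, sim_post, lsa_sim, cue_idx, target_idx):
    """For every learned cue, tabulate how its SWAT similarity to each potential
    lure changed across learning, alongside the lure's normative strength."""
    sim_pre, sim_post, lsa_sim = map(np.asarray, (sim_pre, sim_post, lsa_sim))
    rows = []
    for cue, target in zip(cue_idx, target_idx):
        for lure in range(sim_pre.shape[0]):
            if lure in (cue, target):
                continue  # skip the cue itself and its to-be-learned target
            rows.append({
                "cue": cue,
                "lure": lure,
                "lure_strength": lsa_sim[cue, lure],                     # normative relatedness to the cue
                "sim_change": sim_post[cue, lure] - sim_pre[cue, lure],  # positive = drawn together
            })
    return rows
```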
We used this change in similarity as the outcome variable of an LMM with fixed effects of relatedness, learning condition, and lure strength and random effects of relatedness and learning
condition. This model showed a significant learning condition by lure strength interaction (_t_(248400) = 2.840, _p_ = 0.005, ηp2 = 3.25 × 10−6, 95% CI = [−3.130 × 10−4 −5.740 × 10−5]); Fig.
5C. Follow up t-tests revealed that very strong lures are drawn together more than weak/non-lures when they were associated with both tested (z = 6.029, _p_ < 0.001, _d_ = 0.12, 95% CI =
[4.90 × 10−4 1.22 × 10−3]) and restudied pairs (z = 6.689, _p_ < 0.001, _d_ = 0.15, 95% CI = [6.59 × 10−4 1.48 × 10−3]). In contrast, moderate lures associated with tested pairs were
pulled together less than tested weak/non-lures (z = 4.182, _p_ < 0.001, _d_ = 0.03, 95% CI = [7.48 × 10−5 3.13 × 10−4]). Because there was generalized semantic change even for
weak/nonlures (see Supplementary Note 3), we further probed this interaction by contrasting each lure bin with the weak/nonlures across learning condition. This analysis revealed that
moderate lures associated with tested pairs are drawn together less than those associated with restudied pairs (z = 2.840, _p_ = 0.014, d = 0.03, 95% CI = [5.740 × 10−5 3.129 × 10−4]). All
other baseline-corrected comparisons were not statistically significant (_p_ values > 0.05); these comparisons, in addition to pairwise comparisons between other lure bins and other
significant effects, are reported in Supplementary Note 3. These results show that successful recall not only pulls to-be-learned word pairs closer together in representational space, but
also sculpts the overall representational space by drawing highly similar words closer to the cue word to potentially serve as additional retrieval cues for the to-be-learned target. Testing
additionally repels moderate lures that are unlikely to serve as retrieval cues and could potentially interfere with successful recall. CHANGE IN OVERALL REPRESENTATIONAL SIMILARITY
STRUCTURE A complementary approach to our analyses of the relationship of words within a to-be-learned pair is to investigate how the semantic relationship of each word changes with respect
to all other words in the set. To explore this, we extracted from the full representational similarity matrix the row vector reflecting a word’s similarity to its 20 nearest semantic
neighbors and compared this across learning; Fig. 4B. For example, the representation of GENDER can be defined by its similarity to its nearest semantic neighbors, including CHILDREN,
MOTHER, TEACHER, and PARENT. By comparing the similarity of GENDER to each of these words across learning, we can quantify how much the representation of GENDER changes. When Fisher
z-transformed correlation values were entered into an LMM with fixed effects of relatedness, learning condition, final recall success, and word position (cue vs target), and random effects
of word position and learning condition, there was a significant relatedness by word position by final recall success interaction _(t_(967) = 2.607, _p_ = 0.009, ηp2 = 0.007, 95% CI =
[−0.206 −0.030]), Fig. 6A, in addition to significant relatedness by final recall success interaction (_t_(967) = 1.986, _p_ = 0.047, ηp2 = 0.0041, 95% CI = [0.0014 0.153]), and position by
final recall success interaction (t(964) = 2.581, _p_ = 0.009, ηp2 = 0.0068, 95% CI = [0.024 0.175]). Additionally, there was a main effect of final recall success (_t_(963) = 2.660, _p_ =
0.007, ηp2 = 0.0073, 95% CI = [−0.135 −0.021]). Follow-up t-tests revealed that for related pairs that were successfully recalled at Day 2, cue words underwent more learning-induced representational change than target words (_t_(372) = 3.546, _p_ = 0.002, _d_ = 0.18, 95% CI = [0.039 0.137]). Comparing the correlation of representations across learning can identify
asymmetry of change for paired words but does not provide information about _how_ the structure of the pair changes. For instance, our previous analyses showed that GENDER changes relatively more than its target FEMALE, but they cannot tell us whether GENDER becomes more similar to FEMALE, or whether the changes are unrelated to its to-be-learned target; Fig. 4B. To investigate
this, we calculated a single asymmetry measure by subtracting the correlation of the cue after learning and target before learning from the correlation of the cue before learning and target
after learning, to determine how the representations change relative to each other. Here, a positive value would suggest the representation of the target is drawn towards that of the
cue, a negative value would suggest the cue is drawn towards the target, and a value of zero would suggest that the relative representational change of the cue and target is symmetric. An
LMM on our asymmetry measure with fixed effects of relatedness, learning condition, and final recall success and a random effect of learning condition showed a significant main effect of
relatedness (_t_(452) = 2.414, _p_ = 0.016, ηp2 = 0.01, 95% CI = [0.006 0.064]), where the asymmetry value for related pairs was significantly different (more negative) from the asymmetry
value for unrelated pairs; Fig. 6B. We additionally found that unrelated pairs did not show any significant asymmetry relative to zero (_t_(78) = 0.814, _p_ = 0.418, _d_ = 0.18, 95% CI =
[−0.023 0.055]). In contrast, related pairs showed a numerically negative asymmetry value; however, despite a moderate effect size, this effect was only a nonsignificant trend after
corrections for multiple comparisons (_t_(79) = 2.157, _p_ = 0.068, _d_ = 0.49, 95% CI = [−0.049 −0.001]). RELATING RECALL ACCURACY TO REPRESENTATIONAL CHANGE To further probe the behavioral
relevance of representational change for learning outcomes, we conducted an item analysis (with each word pair considered an ‘item’). The average accuracy at final test across participants
for each word pair (regardless of learning condition or semantic relatedness) was significantly correlated with its similarity after learning (Fig. 7A; r(58) = 0.46, _p_ < 0.001, 95% CI =
[0.232 0.638]) and average change in similarity (Fig. 7B; r(58) = 0.39, _p_ = 0.002, 95% CI = [0.154 0.583]), suggesting that word pairs that are considered more similar after learning and
that show greater learning-induced representational change are more likely to be remembered. Despite the relationship between pairwise change in similarity and behavioral accuracy, there was
no statistically significant correlation between the magnitude of an individual’s behavioral testing effect and their average change in similarity for tested and restudied pairs when
averaging across words (r(78) = −0.163, _p_ = 0.149, 95% CI = [−0.369 0.059]). Although comparing the average pairwise change in similarity to average accuracy across participants provides a
valuable link between the re-sculpting of semantic space and behavioral performance, it overlooks the fact that semantic relatedness and learning condition may have differential effects on
the recall success of a word pair. To investigate the relative contribution of these processes to behavioral performance, we conducted a mixed effects logistic regression predicting the Day
2 recall outcome of each individual word pair. Echoing our previous analyses, this model showed a significant relatedness by learning condition interaction (_z_ = 2.424, _p_ = 0.014, ηp2 =
0.0016, 95% CI = [0.124 1.109]), in addition to significant main effects of relatedness (_z_ = 10.572, _p_ < 0.001, ηp2 = 0.028, 95% CI = [−2.046 −1.406]) and learning condition (_z_ =
6.324, _p_ < 0.001, ηp2 = 0.010, 95% CI = [0.815 1.547]). Follow-up tests revealed that there was a larger benefit of testing over restudying pairs on the probability of successful recall
at Day 2 for unrelated pairs (z = 12.431, _p_ < 0.001, _d_ = 0.20, 95% CI = [−1.96 −1.43]) than related pairs _(z_ = 9.706, _p_ < 0.001, _d_ = 0.16, 95% CI = [−1.55 −1.03]).
Additionally, this model revealed a significant main effect of the change in similarity of the cue representation (z = 2.453, _p_ = 0.014, ηp2 = 0.0015, 95% CI = [−0.772 −0.086]), suggesting
that more change in the representation of the cue across learning is associated with a higher probability of recall at Day 2; Fig. 7C. Finally, this model showed a significant relatedness
by condition by change in target representation across learning interaction (_z_ = 2.075, p = 0.038, ηp2 = 0.0011, 95% CI = [−1.678 −0.048]). Investigation of the slopes revealed that for
tested unrelated pairs, there was a significant negative relationship between the probability of final recall success and the change of the representation of the target across learning (_z_
= 2.691, _p_ = 0.007, _d_ = 0.16, 95% CI = [−0.266 −0.042]), suggesting that more change in the target across learning (i.e. lower correlation values) is associated with higher probability
of subsequent recall. This slope was significantly more negative than the slope from unrelated restudied pairs (z = 2.321, _p_ = 0.020, _d_ = 0.11, 95% CI = [0.023 0.275]); Fig. 7D. There
was no statistically significant relationship between change in target representation and probability of subsequent recall across learning for related pairs (_p_ values > 0.1;
Supplementary Table 5). Together, these results show that while the magnitude of representational change that a pair undergoes is associated with its probability of subsequent recall, there may
be multiple processes underlying the change that depend on both the characteristics of the word pair itself and the learning conditions, and that these processes do not all impact the
probability of successful recall. DISCUSSION Three primary questions were addressed in the current work. First, we sought to determine how semantic relatedness between paired words
influences the testing effect. Second, we created an extension of a multi-arrangement similarity paradigm49 to investigate how paired associate learning, supported by either testing or
restudying, can shape the semantic representations of individual words. Finally, we assessed whether learning-induced changes in semantic representation were associated with behavioral
performance. To evaluate our first question, we systematically manipulated semantic relatedness between the cue and target within a to-be-learned pair of words and compared accuracy between
tested and restudied pairs after approximately 24 h. We found that although relatedness increases overall performance, it decreases the magnitude of the testing effect by substantially
improving performance for restudied pairs, such that the relative additional benefit conferred by testing is less than for unrelated pairs. Crucially, we only observed this interaction
between semantic relatedness and learning condition when tested items were split between those successfully and unsuccessfully recalled at the initial testing. This is consistent with
previous work29,50,51 showing that, in the absence of feedback, the mnemonic benefits of testing only occur if the target item is successfully recalled during the initial test. Accuracy
alone, however, can only provide limited insight into exactly how semantic relatedness differentially improves memory for tested and restudied pairs of words. To address this gap, we
developed an extension of a multi-arrangement paradigm to simultaneously measure the semantic similarity of sixty words at a time and impute the semantic similarity of words in to-be-learned
pairs without them ever being directly measured against one another. In this analysis, we showed that successful learning, especially of related pairs, draws paired words closer together in
semantic space more than unsuccessful learning attempts and pairs that did not undergo learning. Showing that pairs are drawn together, however, does not show how they become more similar.
It is possible that both items within a pair change symmetrically to become more similar to each other; alternatively, one item may remain relatively stable while the other changes. Extant
literature investigating these potential hypotheses39,40,52,53,54 tends to compare outcome measures like accuracy and reaction times when probing pairs in the forward (i.e. A→B) vs backward
(i.e. B→A) directions. These measures, while useful for answering some questions, are less effective for exploring associative asymmetry of changes in semantic space, as they cannot compare
the overall representations of concepts. To this end, we compared the semantic representations of individual words across learning. We found that for related pairs, learning induced greater
representational change in the semantic structure of cue words than target words, while there was no statistically significant difference in the change in unrelated cue and target words. We
then adapted an approach from the neuroimaging literature for investigating asymmetrical representational change41,55. If the correlation between pre-learning cues and post-learning targets is less than the correlation between post-learning cues and pre-learning targets, this implies that learning draws cues towards targets in semantic space. This was indeed the pattern we observed
for related word pairs (although as an isolated effect, the negative asymmetry value narrowly failed to survive corrections for multiple comparisons; however, the change in asymmetry
relative to unrelated pairs was significant). The idea that testing creates a directionally-specific (i.e. asymmetric) associative relationship, where the cue-to-target relationship is
strengthened without influencing the backward associative target-to-cue link, is consistent with prior theoretical accounts. According to the dual memory theory56 this process occurs by
creating an episodic “cue memory” where the cue and target are encoded in the context of a retrieval task, whereas restudying creates a bidirectional association. The transfer-appropriate
processing account32 posits that the benefit of testing stems from greater episodic contextual similarity between retrieval practice and the final test, relative to restudying. Consistent
with this framework, our results show asymmetric change in cue and target representations across learning; however, this asymmetric change depends on the pre-existing semantic relatedness,
rather than learning condition, suggesting that the asymmetrical change within a pair may be driven by the semantic information within the to-be-learned pairs, rather than the creation of an
episodic “cue memory” during testing. Other work has suggested that prior knowledge plays a crucial role in the symmetry of concept representations after learning39,41,52. For instance,
when pairs of famous and novel faces are learned, multivariate neural representations of novel target faces are drawn towards those of their paired cue faces only when there is pre-existing
knowledge about the cue face41. While this asymmetric representation is in the opposite direction to the one we observed in our data, it is important to note that in that study there was no
pre-existing relationship between the paired faces and no prior knowledge surrounding the novel faces. In contrast, the word stimuli used in our study had a rich network of semantic
associations prior to learning, with pre-existing semantic relationships between half of the pairs. It is possible that the assimilation of a target item representation into that of its
paired cue item only occurs when existing semantic information about the cue can scaffold the integration of the novel information into the existing knowledge. When there is pre-existing
knowledge about both items in a pair, as was the case in our study, the cue representation instead changes asymmetrically to become more predictive of the upcoming target55,57. Although
words in our corpus that are strongly associated with a given cue word are drawn towards that cue word regardless of the learning condition or relatedness of the associated to-be-learned
pair, we only show learning-induced asymmetric sculpting of the overall semantic space for semantically related pairs. Accounts of the testing effect such as the elaborative encoding account58 or the semantic mediator hypothesis31,33,59 propose that mental elaboration during the search for the correct answer during testing facilitates later recall by dynamically creating
additional retrieval routes via the activation of concepts connecting cues and targets33,59 or by increasing relational processing relative to restudying38,60. Other work has shown that
pre-existing semantic relationships between words facilitate integration of the pair, potentially amplifying these effects61. It is possible that even though semantic associates of the cues
in unrelated pairs create new links within the to-be-learned pair, the paired words are less likely to co-activate shared concepts enough to change the overall semantic space42. We
additionally conducted a set of analyses comparing participants’ idiosyncratic semantic representations (derived from the SWAT) to normative semantic representations (derived from word2vec)
to test whether these elaborative connections made during learning are truly novel or reflect the sculpting of existing features; see Supplementary Note 4 and Supplementary Fig. 6. We show
that after learning, words in tested pairs are drawn closer to their normative representations, suggesting that even though learning drives novel connections, testing shapes features that
already exist, rather than adding entirely new features to a representation. The non-monotonic plasticity hypothesis (NMPH) may also help explain the effects we observed of testing on
representational change. This account posits that while testing and restudying both strongly co-activate representations of the paired items, testing additionally requires a search process
for the to-be-learned target that induces moderate co-activation of other similar items (as has been previously shown to occur during the retrieval of highly similar episodic memories62,63),
weakening these connections and reducing interference42. This is precisely what we found in our analyses of lure representations – testing exerted the biggest impact on moderate-strength
lures, which were significantly repelled away from cue words, relative to weak/non-lure pairings. This effect is not only consistent with the predictions of NMPH, but also with an emerging
body of work showing that competition adaptively distorts and repels overlapping episodic representations so they become less similar46,64,65. Our last goal was to evaluate the linkage
between learning-induced changes in semantic representations and final recall success. To do so, we examined how the mean retrieval success of each word pair (averaged across participants)
relates to its mean learning-induced change in representational similarity, and how individual differences in multiple factors affecting representational change relate to the subsequent
recall of a given pair. Using the first approach, we showed that word pairs that undergo a greater amount of pairwise representational change (regardless of learning condition) are more
likely to be remembered at the final test. Our individual differences approach showed that pairs are more likely to be recalled after a delay when the representation of the cue changes more
across learning, while learning-induced change in the representational space of the target is only associated with final recall success in unrelated pairs that underwent testing. These
findings highlight how changes in the representation of the cue (to make it more predictive of the target) are crucial regardless of learning condition, but it may only be necessary to
sculpt the representation of the target to create elaborative links between words in a pair if they do not already exist. One potential limitation of our work comes from our use of pairwise
similarity metrics derived indirectly via imputation. If our imputation method was unreliable, it might cast doubt on our behavioral representational change results. We believe that this is
not the case and have performed extensive validation analyses of our imputations (see Supplementary Note 2). We believe that our ability to impute the subjective semantic relatedness of
pairs without ever having participants directly judge them is a key innovation of our work over existing approaches such as semantic priming and free association that can only show the
relative magnitude of the effects of semantic relatedness through measures like accuracy and reaction time. Moreover, we expect our imputation approach will allow researchers to infer
pairwise relationships without running the risk of biasing participants by presenting to-be-learned pairs before learning, nor evoking demand characteristics by having participants
explicitly judge the similarity of already-learned pairs, which may occur in traditional multi-arrangement paradigms. Another potential limitation could come from our admittedly restricted
assay of semantic space. Due to experimental time constraints, we were unable to include additional words beyond those in the to-be-learned pairs in our SWAT protocol that would enrich our
measurement of semantic space and serve as a null hypothesis test, as they should undergo little or no representational change. We also ensured that the distributions of semantic association
across conditions did not overlap so that we could treat relatedness as a dichotomous variable and actively avoided very strongly related pairs of words so that participants could not
easily guess the target word in the absence of successful learning. These design constraints may have resulted in a truncated range of semantic relatedness across all pairs. Recent work has
shown that the effect of semantic relatedness may depend on the range of strength of association across the entire stimulus set15, so future work may opt to choose a broader range to
determine if this impacts the results. Despite the general stability of semantic knowledge over the course of one’s lifetime, our results demonstrate that even a brief session of episodic
learning can subtly yet systematically re-sculpt semantic space. Our behavioral representational similarity approach identifies multiple processes supporting episodic memory, where new
connections are established between a cue and target, shared semantic information asymmetrically changes cues to become more predictive of their paired target and testing minimizes
associations with potentially interfering semantic lures. Together, these changes impart a lingering residue on semantic memory that facilitates later episodic recall. These results are
consistent with recent neuropsychological, behavioral, and neuroimaging evidence that the episodic and semantic memory systems may interact through gradients of activation of shared
cognitive processes5,6,7. In this framework, episodes are comprised of both general conceptual reinstatement and episode-specific sensory processing, while recall of semantic memory often
includes episodic information about when and where the information was acquired3. Future studies will be needed to better characterize whether these subtle learning-induced semantic
distortions are short-lived or whether they can endure for weeks or months. METHODS The experimental design and data analysis plan were preregistered prior to data collection on November
19th, 2020 on the Open Science Framework at https://osf.io/5q6th/. PARTICIPANTS Participants were recruited via Prolific (https://www.prolific.co/) and through the UCLA SONA Undergraduate
Participant Pool. A power analysis (see Supplementary Methods for details) suggested we would need a sample size of at least 73, so we aimed to collect useable data from 80 participants. A
total of 262 participants (145 from SONA, 117 from Prolific) completed the first session of the experiment. Of those, 183 returned for the second session within 28 h of completing the first
(88 from SONA, 95 from Prolific). After excluding participants who did not complete both sessions or who otherwise did not meet our strict inclusion criteria (described in the Supplementary
Methods), we were left with 29 from SONA and 51 from Prolific. Participants from Prolific received monetary compensation and participants from SONA received course credit. The two samples
were not significantly different on any key measures, so the samples were combined for a final _N_ = 80 (29 male; age range = 18–39, mean age = 24.33, SD = 5.49). Participants from SONA had
all completed at least high school level education; years of education was not collected from participants from Prolific. All participants provided informed consent prior to participating.
This research was approved by the IRB of the University of California, Los Angeles. Participants from the UCLA SONA Undergraduate Participant Pool were compensated with course credit and
participants recruited on Prolific were compensated at a rate of $7.00/hour. Additionally, we noted in our pre-registration that we would exclude participants who reported rehearsing word
pairs between sessions. Ultimately, we included the 8 participants who reported rehearsing word pairs between sessions, as we did not explicitly instruct participants not to rehearse and our
survey question was not specific enough to determine the extent to which they rehearsed (i.e. it did not distinguish whether they spent hours rehearsing all word pairs, or just happened to
spontaneously recall one or two of them). MATERIAL Stimulus materials included 60 cue-target word pairs. Thirty of these pairs were semantically related and were drawn from the FSU Free
Association Norms66. We restricted words to nouns with no homographs and a concreteness norm greater than 3.5 that were deemed by Nelson et al. appropriate for use in an experiment because they had an acceptable number of normed associates. In order to reduce the possibility that a participant might simply guess the target word given the cue word, pairs were restricted to have a
forward strength of association less than 0.5, meaning that fewer than half of people who saw a given cue word would generate the target word in a free association task. Finally, any pairs
of words that together made a compound word or were similar to any English idiom were excluded. For each of the related pairs, we compiled three measures of pair similarity: (1) forward
association strength, (2) cosine similarity from latent semantic analysis (LSA) derived from a corpus of 100k English words (http://www.lingexp.uni-tuebingen.de/z2/LSAspaces/) and the
_LSAfun_ R package66,67, and (3) word2vec similarity, based off of a model trained on a subset of the Google News dataset, which contains 300-dimension vectors for 3 million words and
phrases (https://code.google.com/archive/p/word2vec/). An additional 30 low-relatedness pairs were selected to form the 30 unrelated pairs. Target words of these pairs were shuffled until all 30 pairs had word2vec similarities, LSA cosine similarities, and (if the pair was normed) cue-to-target association strengths that were lower than those of the entire list of related word pairs, ensuring no overlap between conditions on these measures. PROCEDURE OVERVIEW Participation in this experiment took place over two days, with the sessions occurring no more than 28 h apart (see Fig. 1 for a
schematic of the procedure). On Day 1, participants first performed a multidimensional similarity rating task using a drag-and-drop interface (Fig. 2A; similar to an approach from work in
neuroimaging49, which used picture stimuli instead of words). Following this similarity-based word arrangement task (hereafter referred to as the SWAT), participants completed a learning
task, where they were given two opportunities to initially learn a set of 60 word pairs (30 related; 30 unrelated). We note that in our pre-registration of this experiment we had stated
that participants would only have one initial learning opportunity before the test/restudy manipulation; however, pilot data suggested that one learning opportunity was not enough to yield
sufficient accuracy on Day 2. Then, participants were given a third opportunity to engage with each pair via either testing or restudying. Last, participants completed a short questionnaire
about how distracted they were during the task. Participants received a link to the second part of the experiment the following day; if they did not complete the Day 2 session within 28 h
(i.e. before they had a second night of sleep), they were excluded from all analyses. The Day 2 session (Fig. 1) began with testing of all word pairs (“final test”), and then participants
performed another set of similarity judgements using the SWAT protocol. Testing was performed prior to the SWAT protocol on Day 2 to prevent the possibility that words encountered during the
SWAT trials would trigger additional retrieval practice or other rehearsal, which could have influenced final test performance in unpredictable ways. WORD PAIR LEARNING Participants
performed three rounds of word pair learning. During each of the first two rounds, all 60 pairs were presented on the screen in randomized order, with the text written in capital letters.
Each pair appeared for 4 s with a 2 s ISI. For each pair, the cue word was presented on the left and the target word on the right. During the first round, participants were asked to make a
judgement about how related the cue and target words were on a scale of 1–4, with 1 meaning “not related” and 4 meaning “very related”. The second round was structured the same as
the first, but participants were asked to judge how likely it would be for those two words to appear on the same page of a book or magazine on a scale of 1–4, with 1 meaning “not at all
likely” and 4 meaning “very likely”. These judgements allowed for incidental encoding and encouraged the relational processing of the words in each pair. Relatedness judgements are described
in Supplementary Fig. 1 and Supplementary Note 1. In the final learning round, 30 of the pairs underwent retrieval practice (testing) and the other 30 were restudied. Participants were
instructed that if they saw the cue and target words together (just as they had in the prior two rounds) their task was simply to type the target word into the answer box; if they saw the
cue word accompanied by four question marks (“????”) their task was to attempt to recall the target word and type it into the answer box. If they could not remember the target word,
participants were encouraged to take a guess, or they could leave the box blank. Asking participants to type the paired words in the restudy condition, rather than having them make an
additional relatedness judgement as in the first two learning rounds, allowed us to match the behavioral response with that of the testing condition (i.e. typing a word). This also served to
reduce the differences between behavioral responses in the restudy condition and the final test, where all pairs would be probed by asking the participant to type a word. The learning
condition manipulation was randomly interleaved; although this interleaved design necessitates task switching within the learning opportunity block, there was no statistically significant
difference in final recall accuracy between trials where the participant switched between testing and restudying and those where learning condition was consistent across consecutive trials
(see Supplementary Note 1 for more detail). Participants were not given a time limit on recalling the second word in the pair. No feedback was provided, as feedback can provide an additional
restudying opportunity that can enhance final test performance for tested items68 and inflate testing effects28. The assignment of the word pairs to either the test or restudy condition was
counterbalanced by creating two matched sets of pairs with 15 related and 15 unrelated pairs. Words were always presented in the forward order (i.e. cue was always presented before the
target). The sets were matched on concreteness, frequency, length of cue and target, word2vec and LSA cosine similarity measures. Each set of words was randomly assigned to either the test
or restudy condition independently for each participant. Memorability of the pairs of words was measured post-hoc by computing the average recall accuracy of the pair across participants;
recall accuracy varied widely across pairs, ranging from 91% (GENDER – FEMALE) to 5% (CHILDREN – BIRD) of participants recalling any given pair (Supplementary Fig. 2). Despite the range of
memorability across all pairs, there was no statistically significant difference in mean memorability across the two sets of words pairs (see Supplementary Note 1 for more details). FINAL
TEST In the final test, performed on Day 2, participants were presented with cue words from pairs they had learned on the previous day (with the cue word on the left and “????” on the right,
just as in the testing condition on Day 1) and were asked to type in the corresponding target word. There was no time limit on recall, and participants were encouraged to guess if they
couldn’t remember the pairs or otherwise leave the box blank. Responses were scored as correct if they were spelled correctly or if a spell-checking algorithm (https://textblob.readthedocs.io/en/dev/) identified the correct target word as the most likely word.
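A minimal sketch of this scoring rule (our own illustration, not the project’s scoring script; the helper name is hypothetical) using TextBlob’s spell-checker might look like:

```python
from textblob import Word

def score_response(response, target):
    """Count a typed response as correct if it matches the target exactly or if
    spell-correction identifies the target as the most likely intended word."""
    response, target = response.strip().lower(), target.strip().lower()
    if not response:
        return False  # blank responses are scored as incorrect
    if response == target:
        return True
    best_guess, _confidence = Word(response).spellcheck()[0]  # top-ranked correction
    return best_guess == target
```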
SIMILARITY-BASED WORD ARRANGEMENT TASK (SWAT) The SWAT was performed at the beginning of Day 1, prior to learning word pairs, and again at the end of Day 2, after the final test. Each session of the task comprised 4 trials. On each trial, participants received 60 words
in a “word bank” on the left side of the screen. Participants clicked on a word to bring it over to a main arrangement area (“the canvas”) and then dragged each word to the location of their
choosing. Participants were instructed to take as long as they needed to arrange the words such that more similar words were closer together and more dissimilar words were further apart.
Trials lasted a median duration of 7.14 min. Individual words were pseudo-randomly assigned to trials based on the to-be-learned pairs. The lists of cues and targets were each split in half, creating 4 lists of 30 words. Each list was paired with each of the other lists, except for the list containing its to-be-learned partners. This procedure created 4 trials of 60 words each,
ensuring that each word would be arranged twice and that the two words in each to-be-learned pair were never both encountered on the same trial. This was an important constraint, as the mere
act of thinking about the semantic relationship of the words in the to-be-learned pairs (or learned pairs in the case of the post-learning assessment) during a SWAT trial could bias
participants’ word placement decisions and corrupt our ability to sensitively measure the behavioral consequences of our experimental manipulations. The order of the 4 trials was randomized
for each participant, as was the order of the words in the word bank on each trial. DERIVATION OF SEMANTIC SIMILARITY METRICS After participants completed the SWAT arrangements, semantic
dissimilarity was calculated for each pair of words by taking the Euclidean distance between the locations of each pair of words on the canvas (measured as the distance in pixels from the
center of each word). Trials were combined using an evidence-weighted average of scaled-to-match distance matrices49. However, because words within to-be-learned pairs were never included on the same trials, we could not directly measure the distance between these words. Thus, by design, our procedure produced an incomplete representational dissimilarity matrix. In order to reconstruct one of our primary measures of interest (i.e. the semantic distance between words in to-be-learned pairs, both before and after learning), the semantic dissimilarity of unmeasured pairs was imputed with K-nearest neighbors imputation using the KNNImputer function69 from Python’s _scikit-learn_ package70, with 40 neighbors (determined in simulations to be an optimal number of nearest neighbors for imputation) and the “distance” weighting function (see Fig. 3B for a visualization of this process). This imputation procedure was performed separately on each participant’s pre-learning SWAT data and post-learning SWAT data.
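As a schematic of this pipeline (illustrative only; variable and function names are our own, and only the KNNImputer parameters are taken from the description above), one trial’s distance matrix and the subsequent imputation could be sketched as:

```python
import numpy as np
from sklearn.impute import KNNImputer

def trial_distance_matrix(words, positions):
    """Pairwise Euclidean distance (in pixels) between word centers on one SWAT trial.
    `positions` maps each displayed word to its (x, y) canvas coordinates; words not
    shown on that trial are left as NaN."""
    n = len(words)
    d = np.full((n, n), np.nan)
    for i, wi in enumerate(words):
        for j, wj in enumerate(words):
            if wi in positions and wj in positions:
                d[i, j] = np.linalg.norm(np.asarray(positions[wi]) - np.asarray(positions[wj]))
    return d

# After scaling and evidence-weighted averaging across the four trials (not shown),
# cells for to-be-learned pairs remain missing by design and are filled in with
# K-nearest-neighbors imputation, mirroring the parameters reported above.
imputer = KNNImputer(n_neighbors=40, weights="distance")
# full_rdm = imputer.fit_transform(averaged_rdm)  # averaged_rdm: 120 x 120 matrix with NaNs
```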
Since the imputation of not-directly-measured semantic distance ratings is a key innovation of our experimental paradigm, we conducted a number of analyses to confirm the validity of the imputation; these are described in Supplementary Note 2 and Supplementary Figs. 3, 4. Finally, semantic dissimilarity measures were converted to similarity measures for ease of interpretation by taking 1 –
dissimilarity and the upper triangle of the fully imputed similarity matrix was used for further analysis. This process resulted in a range of similarity values from 0.9765 to 0.9963 for the
pre-learning SWAT (_M_ = 0.9890, SD = 0.0028) and 0.9733 to 0.9972 (_M_ = 0.9894, SD = 0.0032) on the post-learning SWAT. STATISTICAL ANALYSES All statistical analyses were conducted in R
(version 4.1.2; R Core Team, 2021) and visualized using the _ggplot2_ R package (Wickham, 2016). A list of packages used (including version information) is included in the Supplementary
Methods. Data and code are available on OSF at https://osf.io/5q6th/. Tests of normality are not reported given that t-tests, ANOVAs and linear mixed models are generally robust to
violations of normality, especially with larger sample sizes71,72. PREREGISTERED ANALYSES To investigate how semantic relatedness influences the testing effect, accuracy for tested and
restudied pairs was calculated separately for semantically related and semantically unrelated pairs for each participant in the final session. A 2 × 2 (relatedness x learning condition)
repeated measures ANOVA (RM-ANOVA) using the _rstatix_ package73 was performed to detect differences between conditions on the final test. Furthermore, a single measure of the testing effect
on behavioral performance was calculated for each semantic relatedness condition (related pairs, unrelated pairs) and all pairs (regardless of condition) by taking the difference between
the probability of a tested item being correctly recalled and the probability of a restudied item being correctly recalled. Next, tested pairs were split based on whether they were correctly
recalled on Day 1. The accuracy on Day 2 was assessed in another 3 × 2 RM-ANOVA (Day 1 condition (correctly recalled, incorrectly recalled, restudied) × relatedness (related, unrelated)).
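For concreteness, the per-participant accuracy and testing-effect measures described above could be computed along these lines (a sketch with hypothetical column names; the reported analyses were run in R):

```python
import pandas as pd

def testing_effect(trials: pd.DataFrame) -> pd.DataFrame:
    """`trials` holds one row per word pair with columns: subject, relatedness
    ('related'/'unrelated'), condition ('tested'/'restudied'), day1_correct
    (tested pairs only), and day2_correct."""
    acc = (trials
           .groupby(["subject", "relatedness", "condition"])["day2_correct"]
           .mean()
           .unstack("condition"))
    # Testing effect = P(recall | tested) - P(recall | restudied)
    acc["testing_effect"] = acc["tested"] - acc["restudied"]
    return acc.reset_index()

# Splitting tested pairs by initial success, e.g.:
# trials[(trials.condition == "restudied") | (trials.day1_correct == 1)]
```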
Although we initially preregistered that we would include all trials in the remainder of our analyses, we ultimately opted to exclude pairs that were tested and incorrectly recalled at Day 1
(mean number of pairs excluded=11.23, SD = 4.57) because we were primarily interested in the effects of successful testing compared to restudying. Effect sizes for RM-ANOVAs are reported
using generalized eta-squared (ηG2), which measures the effect size with variation from other effects and includes variance due to individual differences74. Change in semantic similarity was
calculated for each word pair by subtracting the similarity at the initial (Day 1) SWAT from the similarity at the final (Day 2) SWAT. With this change measure, a negative value indicates that words within a pair became less
similar over time (initial similarity > final similarity), while a positive value indicates that words within a pair became more similar over time (final similarity > initial
similarity). As a manipulation check, the Day 2 semantic similarities of learned pairs (i.e. pairs of words that were either tested or restudied in the main part of our experiment), split by
those that were correctly recalled at Day 2 and those that were not, were compared to random, unlearned pairs (i.e. all possible pairings of words from our stimulus set that were not
restudied or tested during the learning portion of the experiment). This test provides a noise ceiling (as any changes in unlearned pairs can be thought of as noise) and ensures that learned
pairs indeed show more representational change than unlearned pairs. Next, values for the learned pairs were entered into a linear mixed-effects model (LMM) with fixed-effect predictors of semantic relatedness, learning condition, and final recall success, and a random intercept of subject identity.
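A rough Python analogue of this model (the reported models were fit in R; the data-frame and column names here are hypothetical) is:

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_similarity_lmm(df: pd.DataFrame):
    """`df` holds one row per learned pair with columns sim_change, relatedness,
    condition, recalled_day2, and subject."""
    model = smf.mixedlm(
        "sim_change ~ relatedness * condition * recalled_day2",  # fixed effects
        data=df,
        groups=df["subject"],  # random intercept of subject identity
    )
    return model.fit(reml=True)
```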
We note that although these analyses were initially preregistered to use RM-ANOVAs and paired t-tests, we report our results in an LMM framework to be consistent with our exploratory analyses (see below) and to account for variance from potential random effects; the results from the RM-ANOVA are reported in Supplementary Note 3. A testing effect measure for similarity was calculated in a way comparable to the memory recall performance data, by taking the difference between tested and restudied items for both the raw similarity on the final day and the change in similarity across days. These measures of the testing effect from
the similarity data were correlated with the testing effect measure for performance in the learning task across all pairs. Additionally, we evaluated the asymmetrical representational
change of each individual word by extracting the vector of similarities between each word and its top 20 nearest neighbors before and after learning. Analyses were restricted to the 20
nearest neighbors to reduce the influence of distant words in semantic space, which would be relatively uninformative for the definition of a given word. For example, it is much more useful
to consider the definition of BLANKET in relation to words like PILLOW or CLOTH (the top two closest neighbors in our set, as defined by word2vec), where one can consider the specific
connection or compare features, than in relation to MATH or CHEF (the two least similar words in our set), which share few features or associates with it. Nearest neighbors were identified
by calculating the cosine similarity between the full semantic feature vectors extracted from word2vec and selecting the top 20 largest similarity values, excluding pairs of words where the
distance was imputed. The similarity values for these pairs, as measured by the SWAT, were used as the vectorized representation for each word. The pre-learning representation of the cue was then correlated with the post-learning representation of the target, \(r({\mathrm{Cue}}_{\mathrm{Day1}},{\mathrm{Target}}_{\mathrm{Day2}})\), and the post-learning representation of the cue with the pre-learning representation of the target, \(r({\mathrm{Cue}}_{\mathrm{Day2}},{\mathrm{Target}}_{\mathrm{Day1}})\). Taking the difference between the Fisher z-transformed correlation values, \(r({\mathrm{Cue}}_{\mathrm{Day1}},{\mathrm{Target}}_{\mathrm{Day2}})-r({\mathrm{Cue}}_{\mathrm{Day2}},{\mathrm{Target}}_{\mathrm{Day1}})\), provided a single measure indexing the amount of asymmetric change of the individual words within a pair, where a positive value would indicate that the target word was drawn more towards the cue, a negative value would indicate that the cue was drawn more towards the target, and a zero
value would indicate that there was equal change for each word in the pair. Asymmetry values were Fisher z-transformed and entered into a linear mixed-effects model (LMM) with learning
condition (tested vs restudied), position of word in pair, and relatedness of pair as fixed effect predictors and subject identity as a random intercept. Semantic relatedness and learning
condition were iteratively tested as potential random slopes using likelihood ratio tests (using _varCompTest_ from the _varTestnlme_ R package75) and the variance of random effects in the
final model was estimated using restricted maximum likelihood (REML). Follow-up pairwise comparisons were used to investigate significant effects, with Holm-Bonferroni corrections for
multiple comparisons. Additionally, we tested whether the Fisher z-transformed asymmetry values were significantly different from zero using a series of two-tailed one-sample t-tests with
Holm-Bonferroni corrections for multiple comparisons. We note that we initially preregistered that we would complete this analysis using all values from the row vector (rather than just the top 20 nearest neighbors). This analysis was attempted and yielded no statistically significant results. However, this approach assumes that the measured representation of
each word in our set is independent from that of the other words in our set; in a neuroimaging-based representational similarity analysis (which our analysis was inspired by), this is indeed
the case. However, in our paradigm, the semantic representation of each individual word is derived from its relationship to every other word in the set, and all of these words also
underwent learning. As such, when comparing the representation of a given word across learning to all other words in the set, we are unable to isolate the change of that specific word from
the changes in all the other words in the set, thereby introducing additional noise and making it more difficult to detect any meaningful change for any individual word. Finally, we note that we deviated from our preregistration for both our analyses of the change in similarity and of the asymmetry values of learned pairs by separating pairs into those that were subsequently recalled at
Day 2 and those that were forgotten at Day 2 (and include this distinction as a fixed effect predictor in our models) to allow us to test how our findings related to behavioral performance.
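For concreteness, a minimal R sketch of the asymmetry index and a simplified version of the corresponding LMM is given below; the vector and data frame names (cue_d1, tgt_d2, asym_df, and so on) are hypothetical, the two words' vectors are assumed to be defined over a common set of comparison words, and the full model reported here also includes word position as a predictor.

```r
# Hypothetical example: cue_d1/cue_d2 and tgt_d1/tgt_d2 are a pair's 20-value
# vectors of SWAT similarities to nearest neighbors on Day 1 and Day 2.
fisher_z <- function(r) atanh(r)   # Fisher z-transform of a correlation

asymmetry <- fisher_z(cor(cue_d1, tgt_d2)) - fisher_z(cor(cue_d2, tgt_d1))
# asymmetry > 0: target drawn toward the cue; < 0: cue drawn toward the target

# Simplified LMM over all pairs (asym_df holds one asymmetry value per pair)
library(lmerTest)
m_asym <- lmer(asymmetry ~ condition + relatedness + (1 | subject),
               data = asym_df, REML = TRUE)
summary(m_asym)
```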
Additionally, although we include observations about trials that were tested but incorrectly recalled at Day 1 in our basic behavioral analyses, we opted to exclude those trials from our
analyses of representational space as we were primarily interested in the differential effects of our learning conditions (which theoretically only occur when testing is successful50), and
we ultimately did not have a sufficient number of trials that were tested and incorrectly retrieved at Day 1 to adequately power any analysis of representational change for that trial
type. EXPLORATORY ANALYSES As a complement to the preregistered analyses described above, several exploratory analyses were also performed. First, we performed an additional two-tailed
paired t-test on the recall accuracy of tested pairs at the initial Day 1 test to determine whether semantically related pairs were recalled better than semantically unrelated pairs.
Restudied pairs were excluded from this analysis as the accuracy of these pairs reflected the ability to correctly type the fully visible target word, rather than memory recall performance.
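A minimal sketch of this exploratory comparison, assuming a hypothetical data frame day1_acc with one Day 1 accuracy value per participant and relatedness level (tested pairs only):

```r
library(tidyr)

# Reshape to one row per participant with 'related' and 'unrelated' accuracy columns
acc_wide <- pivot_wider(day1_acc, names_from = relatedness, values_from = accuracy)

# Two-tailed paired t-test: related vs unrelated recall accuracy at Day 1
t.test(acc_wide$related, acc_wide$unrelated, paired = TRUE)
```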
In addition to conducting a 2 × 2 RM-ANOVA on the similarity measures, we conducted a series of two-tailed one-sample t-tests with Holm-Bonferroni adjustments for multiple comparisons to
test whether the change in similarity in each condition was different from zero. To further probe the effects of learning on semantic representations and representational change, we
performed a series of LMMs, using the _lmer_ function from the _lmerTest_ R package76 to estimate fixed and random effects. For each model, we included predictors of relatedness (related vs
unrelated), learning condition (tested vs restudied), and recall success at Day 2 (recalled vs forgotten). Additional predictors were included for some models as necessary. Subject identity
was entered as a random intercept for each model (allowing the intercept to vary across participants), and semantic relatedness, learning condition, and position in pair (when relevant to the model) were tested sequentially as potential random slopes using likelihood ratio tests (_varCompTest_ from the _varTestnlme_ R package75) for each model separately.
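A minimal sketch of this general LMM workflow, including the follow-up steps described in the remainder of this paragraph, is shown below; the data frame sim_df and its columns are hypothetical, and the random-slope selection via _varCompTest_ is omitted for brevity.

```r
library(lmerTest)    # lmer with mixed-model significance tests
library(emmeans)     # follow-up pairwise comparisons
library(effectsize)  # partial eta-squared

# Random-intercept model for change in pairwise similarity (hypothetical columns)
m <- lmer(similarity_change ~ relatedness * condition * recalled_day2 + (1 | subject),
          data = sim_df, REML = TRUE,
          control = lmerControl(optCtrl = list(maxfun = 2e5)))  # up to 200,000 iterations

anova(m, ddf = "Kenward-Roger")                                   # tests of fixed effects
eta_squared(m, partial = TRUE)                                    # effect sizes
emmeans(m, pairwise ~ condition | relatedness, adjust = "holm")   # corrected contrasts
```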
Although the potential variance in the slopes was not the primary target of these analyses, the inclusion of random slopes allowed us to better explain variance in the model overall. All
models were run with a maximum of 200,000 iterations for convergence. Once the final model was determined, significant main effects and interactions were probed using pairwise comparisons
(using the _emmeans_ R package77) with Holm-Bonferroni corrections for multiple comparisons. All models were estimated using REML, and two-tailed t-tests for fixed effects were estimated
using Kenward-Roger’s method. Effect sizes were estimated using partial eta-squared (ηp²), as computed with the _effectsize_ R package78. Unless otherwise noted, this procedure was used for
all LMMs. Tables listing all coefficients, standard errors, degrees of freedom and t-values, in addition to variance-covariance structure for each model are reported in Supplementary Tables
3–15. In addition to our preregistered analyses of representational asymmetry described above, which operate on pairwise similarity values of cues and targets before and after learning, we
also sought to analyze how each word within a given pair underwent representational change. To test this, we first computed each word’s similarity with its top 20 nearest neighbors, and thus
derived a 20-value representational vector for each word before and after learning. We used the Fisher z-transformed Pearson correlation between these vectors as a measure of change for
each individual word. In addition to the fixed effect predictors of relatedness, learning condition, and recall at Day 2, this model included a fixed effect predictor of the word’s position
in the to-be-learned pair (cue vs target). This effect of word position was also tested as a potential random effect using likelihood ratio tests, as was done in previous models. We
additionally explored changes in the semantic distance of potentially interfering lure pairs (i.e. words in our set that were semantically related to the cue words of our to-be-learned pairs) to further probe the sculpting of semantic space due to learning. To do so, we calculated the semantic similarity (indexed by LSA cosine similarity) for all 118 potential pair combinations for a given cue word in our to-be-learned set of words (excluding the associated to-be-learned target word and a word’s similarity to itself). Given that these pairs were
identified post-hoc after creation of the to-be-learned pairs, there was a wide range of similarity values. We then divided these lures into four classes: weak/non-lures (LSA cosine similarity less than 0.2), moderate lures (LSA cosine similarity between 0.2 and 0.4), strong lures (LSA cosine similarity between 0.4 and 0.6), and very strong lures (LSA cosine similarity above 0.6; see Supplementary Fig. 5). For example, for the to-be-learned pair BLANKET – BED, SEMESTER would act as a weak/non-lure, TEMPERATURE would act as a moderate lure, SEAM might act as a
strong lure and PILLOW would act as a very strong lure. Pairs that were not correctly recalled at Day 2 were excluded from this analysis, as incorrect responses were often other words from
our corpus (which would be considered lures in this analysis), and this retrieval may have influenced the similarity judgements in the SWAT protocol (which was performed after the final test).
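To illustrate the lure binning, a short sketch follows, assuming a hypothetical data frame lure_pairs with one row per cue–lure pair and an lsa_cos column holding the LSA cosine similarity:

```r
# Bin cue-lure pairs into the four lure classes by LSA cosine similarity
lure_pairs$lure_class <- cut(lure_pairs$lsa_cos,
                             breaks = c(-Inf, 0.2, 0.4, 0.6, Inf),
                             labels = c("weak/non-lure", "moderate", "strong", "very strong"))
table(lure_pairs$lure_class)  # counts per class, e.g., to check the distribution
```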
Additionally, as in our other analyses, we excluded tested pairs that were incorrectly recalled at Day 1. This selection was repeated for the cues of all to-be-learned pairs separately for
each individual, resulting in a range of 1652–5900 (mean=3106, SD = 981) semantic lure pairs per participant. We used pairwise change in similarity across learning for the semantic lures as
the dependent variable for an LMM regression with fixed effects of condition of the associated to-be-learned pair (tested vs restudied), relatedness of the associated to-be-learned pair
(related vs unrelated), and strength of the lure pair (weak/non-lure, moderate lure, strong lure and very strong lure). This model used a BOBYQA optimizer to ensure model convergence. As in
our previous analyses, subject identity was included as a random effect in all models and relatedness and learning condition were independently and sequentially tested as potential random
effects, as were potential two-way and three-way interactions. Significant main effects and interactions were probed by computing the contrast of the difference of each lure class and the
non-lure pairs and comparing across learning condition (for example, the contrast [moderate lure – weak/non-lure for tested pairs] – [moderate lure – weak/non-lure for restudied pairs]) with
Holm-Bonferroni corrections for multiple comparisons. Additionally, pairwise comparisons of all lure classes across learning condition were computed with Holm-Bonferroni corrections for
multiple comparisons and are reported in the Supplementary Note 3. To supplement our analyses relating representational change and semantic structure to final recall success, we ran a
generalized LMM with a logit link function (i.e. a mixed-effects logistic regression) using the _glmer_ function from the _lme4_ package79. This model was fit using maximum likelihood
estimation and a BOBYQA optimizer with a maximum of 200,000 iterations. We included fixed effects of learning condition (tested vs restudied), relatedness (related vs unrelated), Fisher
z-transformed correlation of the cue and target across learning, difference of Fisher z-transformed correlation to normative semantic space across learning for both cues and targets, and
Fisher z-transformed asymmetry value to predict the probability of final recall success (recalled vs forgotten). As in our previous LMMs, the effect of subject identity was included as a
random effect, and random effects of relatedness and learning condition were independently tested as potential random effects using likelihood ratio tests. Significant main effects and
interactions were probed using pairwise comparisons with Holm-Bonferroni corrections for multiple comparisons. REPORTING SUMMARY Further information on research design is available in the
Nature Portfolio Reporting Summary linked to this article. DATA AVAILABILITY The raw behavioral data generated in this study have been deposited in the Open Science Framework at
https://osf.io/5q6th/ (https://doi.org/10.17605/OSF.IO/5Q6TH). LSA cosine similarity data are available at http://www.lingexp.uni-tuebingen.de/z2/LSAspaces/. The pre-trained word2vec model is available at https://code.google.com/archive/p/word2vec/. CODE AVAILABILITY All code necessary to reproduce the analyses in this manuscript is provided at https://osf.io/5q6th/
(https://doi.org/10.17605/OSF.IO/5Q6TH). REFERENCES * Sherry, D. F. & Schacter, D. L. The Evolution of Multiple Memory Systems. _Psychol. Rev._ 94, 439–454 (1987). Google Scholar *
Squire, L. R. Memory systems of the brain: A brief history and current perspective. _Neurobiol. Learn. Mem._ 82, 171–177 (2004). PubMed Google Scholar * Renoult, L., Irish, M., Moscovitch,
M. & Rugg, M. D. From Knowing to Remembering: The Semantic–Episodic Distinction. _Trends Cogn. Sci._ 23, 1041–1057 (2019). PubMed Google Scholar * Irish, M. & Vatansever, D.
Rethinking the episodic-semantic distinction from a gradient perspective. _Curr. Opin. Behav. Sci._ 32, 43–49 (2020). Google Scholar * Burianova, H. & Grady, C. L. Common and unique
neural activations in autobiographical, episodic, and semantic retrieval. _J. Cogn. Neurosci._ 19, 1520–1534 (2007). PubMed Google Scholar * Burianova, H., McIntosh, A. R. & Grady, C.
L. A common functional brain network for autobiographical, episodic, and semantic memory retrieval. _NeuroImage_ 49, 865–874 (2010). PubMed Google Scholar * Rajah, M. N. & McIntosh, A.
R. Overlap in the functional neural systems involved in semantic and episodic memory retrieval. _J. Cogn. Neurosci._ 17, 470–482 (2005). CAS PubMed Google Scholar * Audrain, S. &
McAndrews, M. P. Schemas provide a scaffold for neocortical integration of new memories over time. _Nat. Commun._ 13, 5795 (2022). ADS CAS PubMed PubMed Central Google Scholar *
Baldassano, C., Hasson, U. & Norman, K. A. Representation of real-world event schemas during narrative perception. _J. Neurosci._ 38, 9689–9699 (2018). CAS PubMed PubMed Central
Google Scholar * Liu, Y., Dolan, R. J., Kurth-Nelson, Z. & Behrens, T. E. J. Human Replay Spontaneously Reorganizes Experience. _Cell_ 1–13 https://doi.org/10.1016/j.cell.2019.06.012
(2019). * Liu, X. L. & Ranganath, C. Resurrected memories: Sleep-dependent memory consolidation saves memories from competition induced by retrieval practice. _Psychon. Bull. Rev._
https://doi.org/10.3758/s13423-021-01953-6 (2021). * Payne, J. D. et al. Memory for semantically related and unrelated declarative information: The benefit of sleep, the cost of wake. _PLoS
ONE_ 7, 1–7 (2012). Google Scholar * Wing, E. A., Burles, F., Ryan, J. D. & Gilboa, A. The structure of prior knowledge enhances memory in experts by reducing interference. _Proc. Natl
Acad. Sci._ 119, e2204172119 (2022). CAS PubMed PubMed Central Google Scholar * Bulevich, J. B., Thomas, A. K. & Parsow, C. Filling in the gaps: using testing and restudy to promote
associative learning. _Memory_ 24, 1267–1277 (2016). PubMed Google Scholar * Antony, J. W. et al. Semantic relatedness retroactively boosts memory and promotes memory interdependence
across episodes. _eLife_ 11, 1–32 (2022). Google Scholar * van Kesteren, M. T. R., Rignanese, P., Gianferrara, P. G. & Krabbendam, L. & Meeter, M. Congruency and reactivation aid
memory integration through reinstatement of prior knowledge. _Sci. Rep._ 10, 1–13 (2020). Google Scholar * Craig, K. S., Berman, M. G., Jonides, J. & Lustig, C. Escaping the recent
past: Which stimulus dimensions influence proactive interference? _Mem. Cogn._ 41, 650–670 (2013). Google Scholar * Antony, J. W. & Bennion, K. A. Semantic associates create retroactive
interference on an independent spatial memory task. _J. Exp. Psychol. Learn. Mem. Cogn._ https://doi.org/10.1037/xlm0001216 (2022). * Yee, E. & Thompson-Schill, S. L. Putting concepts
into context. _Psychon. Bull. Rev._ 23, 1015–1027 (2016). PubMed PubMed Central Google Scholar * Solomon, S. H. & Thompson-Schill, S. L. Finding features, figuratively. _Brain Lang._
174, 61–71 (2017). PubMed PubMed Central Google Scholar * Connell, L. & Lynott, D. Principles of Representation: Why You Can’t Represent the Same Concept Twice. _Top. Cogn. Sci._ 6,
390–406 (2014). PubMed Google Scholar * Carpenter, S. K., Pashler, H. & Cepeda, N. J. Using Tests to Enhance 8th Grade Students’ Retention of U.S. History Facts. _Appl. Cogn. Psychol._
23, 760–771 (2009). Google Scholar * Delaney, P. F., Verkoeijen, P. P. J. L. & Spirgel, A. Spacing and Testing Effects: A Deeply Critical, Lengthy, and At Times Discursive Review of
the Literature. _Psychology of Learning and Motivation - Advances in Research and Theory_ 53,63–147 (2010). * Karpicke, J. D. & Roediger, H. L. The critical importance of retrieval for
learning. _Science_ 319, 966–968 (2008). ADS CAS PubMed Google Scholar * Kornell, N. & Vaughn, K. E. How Retrieval Attempts Affect Learning. 183–215
https://doi.org/10.1016/bs.plm.2016.03.003 (2016). * Nungester, R. J. & Duchastel, P. C. Testing versus review: Effects on retention. _J. Educ. Psychol._ 74, 18–22 (1982). Google Scholar
* Carpenter, S. K. & Kelly, J. W. Tests enhance retention and transfer of spatial learning. _Psychon. Bull. Rev._ 19, 443–448 (2012). PubMed Google Scholar * Rowland, C. A. The
Effect of Testing Versus Restudy on Retention: A Meta-Analytic Review of the Testing Effect. _Psychol. Bull._ https://doi.org/10.1037/a0037559 (2014). * Kornell, N., Bjork, R. A. &
Garcia, M. A. Why tests appear to prevent forgetting: A distribution-based bifurcation model. _J. Mem. Lang._ 65, 85–97 (2011). Google Scholar * Bjork, E. L. & Bjork, R. A. Making
things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. _Psychol. Real World Essays Illus. Fundam. Contrib. Soc_. 56, 55–4 (2011). * Pyc, M. A. &
Rawson, K. A. Why testing improves memory: Mediator effectiveness hypothesis. _Science_ 330, 335 (2010). ADS CAS PubMed Google Scholar * Morris, C. D., Bransford, J. D. & Franks, J.
J. Levels of processing versus transfer appropriate processing. _J. Verbal Learn. Verbal Behav._ 16, 519–533 (1977). Google Scholar * Carpenter, S. K. Cue Strength as a Moderator of the
Testing Effect: The Benefits of Elaborative Retrieval. _J. Exp. Psychol. Learn. Mem. Cogn._ 35, 1563–1569 (2009). PubMed Google Scholar * Collins, A. M. & Loftus, E. F. A Spreading
Activation Theory of Semantic Processing. _Psychol. Rev._ 82, 407–428 (1975). Google Scholar * Anderson, J. R. A Spreading Activation Theory of Memory. _J. Verbal Learn. Verbal Behav_. 22,
261–295 (1983). * Ferreira, C. S. & Wimber, M. The testing effect for visual materials depends on pre-existing knowledge. (2021). * Lifanov, J., Linde-Domingo, J. & Wimber, M.
Feature-specific reaction times reveal a semanticisation of memories over time and with repeated remembering. _Nat. Commun._ 12, 1–10 (2021). Google Scholar * Rawson, K. A. & Zamary, A.
Why is free recall practice more effective than recognition practice for enhancing memory? Evaluating the relational processing hypothesis. _J. Mem. Lang._ 105, 141–152 (2019). Google
Scholar * Popov, V., Zhang, Q., Koch, G. E., Calloway, R. C. & Coutanche, M. N. Semantic knowledge influences whether novel episodic associations are represented symmetrically or
asymmetrically. _Mem. Cogn._ 47, 1567–1581 (2019). Google Scholar * Vaughn, K. E. & Rawson, K. A. Effects of criterion level on associative memory: Evidence for associative asymmetry.
_J. Mem. Lang._ 75, 14–26 (2014). Google Scholar * Bein, O., Reggev, N. & Maril, A. Prior knowledge promotes hippocampal separation but cortical assimilation in the left inferior
frontal gyrus. _Nat. Commun._ 11, 1–13 (2020). Google Scholar * Ritvo, V. J. H., Turk-Browne, N. B. & Norman, K. A. Nonmonotonic Plasticity: How Memory Retrieval Drives Learning.
_Trends Cogn. Sci_. 1–17 https://doi.org/10.1016/j.tics.2019.06.007 (2019). * Sinclair, A. H. & Barense, M. D. Prediction Error and Memory Reactivation: How Incomplete Reminders Drive
Reconsolidation. _Trends Neurosci_. 42, 8–13 (2019). * Detre, G. J., Natarajan, A., Gershman, S. J. & Norman, K. A. Moderate levels of activation lead to forgetting in the think/no-think
paradigm. _Neuropsychologia_ 51, 2371–2388 (2013). * Hulbert, J. C. & Norman, K. A. Neural differentiation tracks improved recall of competing memories following interleaved study and
retrieval practice. _Cereb. Cortex_ 25, 3994–4008 (2015). CAS PubMed Google Scholar * Rafidi, N. S., Hulbert, J. C., Brooks, P. P. & Norman, K. A. Reductions in Retrieval Competition
Predict the Benefit of Repeated Testing. _Sci. Rep._ 8, 1–12 (2018). CAS Google Scholar * Antony, J. W., Ferreira, C. S., Norman, K. A. & Wimber, M. Retrieval as a fast route for
consolidation. _Trends Cogn. Sci._ 21, 573–576 (2017). PubMed PubMed Central Google Scholar * Ye, Z., Shi, L., Li, A., Chen, C. & Xue, G. Retrieval practice facilitates memory
updating by enhancing and differentiating medial prefrontal cortex representations. _eLife_ 9, 1–51 (2020). Google Scholar * Kriegeskorte, N. & Mur, M. Inverse MDS: Inferring
dissimilarity structure from multiple item arrangements. _Front. Psychol._ 3, 1–13 (2012). Google Scholar * Storm, B. C., Friedman, M. C., Murayama, K. & Bjork, R. A. On the transfer of
prior tests or study events to subsequent study. _J. Exp. Psychol. Learn. Mem. Cogn._ 40, 115–124 (2014). PubMed Google Scholar * Halamish, V. & Bjork, R. A. When does testing enhance
retention? A distribution-based interpretation of retrieval as a memory modifier. _J. Exp. Psychol. Learn. Mem. Cogn._ 37, 801–812 (2011). PubMed Google Scholar * Caplan, J. B., Boulton,
K. L. & Gagné, C. L. Associative asymmetry of compound words. _J. Exp. Psychol. Learn. Mem. Cogn._ 40, 1163–1171 (2014). PubMed Google Scholar * Kahana, M. J. Associative symmetry and
memory theory. _Mem. Cogn._ 30, 823–840 (2002). Google Scholar * Madan, C. R., Glaholt, M. G. & Caplan, J. B. The influence of item properties on association-memory. _J. Mem. Lang._ 63,
46–63 (2010). Google Scholar * Schapiro, A. C., Kustner, L. V. & Turk-Browne, N. B. Shaping of object representations in the human medial temporal lobe based on temporal regularities.
_Curr. Biol._ 22, 1622–1627 (2012). CAS PubMed PubMed Central Google Scholar * Rickard, T. C. & Pan, S. C. A dual memory theory of the testing effect. _Psychon. Bull. Rev._ 25,
847–869 (2018). PubMed Google Scholar * Estes, Z. & Jones, L. L. Integrative priming occurs rapidly and uncontrollably during lexical processing. _J. Exp. Psychol. Gen._ 138, 112–130
(2009). PubMed Google Scholar * Carpenter, S. K. & DeLosh, E. L. Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing
effect. _Memory_ 34, 268–276 (2006). Google Scholar * Carpenter, S. K. Semantic Information Activated During Retrieval Contributes to Later Retention: Support for the Mediator
Effectiveness Hypothesis of the Testing Effect. _J. Exp. Psychol. Learn. Mem. Cogn._ 37, 1547–1552 (2011). PubMed Google Scholar * Rawson, K. A. et al. Does Testing Impair Relational
Processing? Failed Attempts to Replicate the Negative Testing Effect. _J. Exp. Psychol. Learn. Mem. Cogn._, (2015). * Bein, O. et al. Delineating the effect of semantic congruency on
episodic memory: The role of integration and relatedness. _PLoS ONE_ 10, 1–24 (2015). Google Scholar * Kuhl, B. A., Rissman, J., Chun, M. M. & Wagner, A. D. Fidelity of neural
reactivation reveals competition between memories. _Proc. Natl Acad. Sci._ 108, 5903–5908 (2011). ADS CAS PubMed PubMed Central Google Scholar * Wimber, M., Alink, A., Charest, I.,
Kriegeskorte, N. & Anderson, M. C. Retrieval induces adaptive forgetting of competing memories via cortical pattern suppression. _Nat. Neurosci._ 18, 582–589 (2015). CAS PubMed PubMed
Central Google Scholar * Chanales, A. J. H., Tremblay-McGaw, A. G., Drascher, M. L. & Kuhl, B. A. Adaptive Repulsion of Long-Term Memory Representations Is Triggered by Event
Similarity. _Psychol. Sci._ 32, 705–720 (2021). PubMed PubMed Central Google Scholar * Drascher, M. L. & Kuhl, B. A. Long-term memory interference is resolved via repulsion and
precision along diagnostic memory dimensions. _Psychon. Bull. Rev._ 29, 1898–1912 (2022). PubMed PubMed Central Google Scholar * Nelson, D. L., McEvoy, C. L. & Schreiber, T. A.
The University of South Florida free association, rhyme and word fragment norms. _Behav. Res. Methods Instrum. Comput._ 36, 402–407 (2004). PubMed Google Scholar * Günther, F., Dudschig,
C. & Kaup, B. LSAfun - An R package for computations based on Latent Semantic Analysis. _Behav. Res. Methods_ 47, 930–944 (2015). PubMed Google Scholar * Kang, S. H. K., McDermott, K.
B. & Roediger, H. L. Test format and corrective feedback modify the effect of testing on long-term retention. _Eur. J. Cogn. Psychol._ 19, 528–558 (2007). Google Scholar * Troyanskaya,
O. et al. Missing value estimation methods for DNA microarrays. _Bioinformatics_ 17, 520–525 (2001). CAS PubMed Google Scholar * Pedregosa, F. et al. Scikit-learn: Machine Learning in
Python. _J. Mach. Learn. Res._ 12, 2825–2830 (2011). MathSciNet Google Scholar * Glass, G. V., Peckham, P. D. & Sanders, J. R. Consequences of Failure to Meet Assumptions Underlying
the Fixed Effects Analyses of Variance and Covariance. _Rev. Educ. Res._ 42, 237–288 (1972). Google Scholar * Schielzeth, H. et al. Robustness of linear mixed-effects models to violations
of distributional assumptions. _Methods Ecol. Evol._ 11, 1141–1152 (2020). Google Scholar * Kassambara, A. rstatix: Pipe-Friendly Framework for Basic Statistical Tests. (2021). *
Lakens, D. Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. _Front. Psychol._ 4, 1–12 (2013). Google Scholar * Baey, C.
& Kuhn, E. varTestnlme: variance components testing in mixed-effect models. (2019). * Kuznetsova, A., Brockhoff, P. B. & Christensen, R. H. B. LMERTEST Package: Tests in Linear Mixed
Effects Models. _J. Stat. Softw_. 82, 1–26 (2017). * Lenth, R. V. emmeans: Estimated Marginal Means, aka Least-Squares Means. (2022). * Ben-Shachar, M., Lüdecke, D. & Makowski, D.
effectsize: Estimation of Effect Size Indices and Standardized Parameters. _J. Open Source Softw._ 5, 2815 (2020). ADS Google Scholar * Bates, D., Mächler, M., Bolker, B. & Walker, S.
Fitting Linear Mixed-Effects Models Using lme4. _J. Stat. Softw._ 67, 1–48 (2015). ACKNOWLEDGEMENTS The authors would like to thank Craig Enders for his insight into how
to handle missing data. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-2034835 (CRW). Any opinions,
findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. AUTHOR
INFORMATION AUTHORS AND AFFILIATIONS * Department of Psychology, University of California, Los Angeles, CA, USA Catherine R. Walsh & Jesse Rissman * Department of Psychiatry &
Biobehavioral Sciences, University of California, Los Angeles, CA, USA Jesse Rissman * Brain Research Institute, University of California, Los Angeles, CA, USA Jesse Rissman * Integrative
Center for Learning and Memory, University of California, Los Angeles, CA, USA Jesse Rissman Authors * Catherine R. Walsh * Jesse Rissman CONTRIBUTIONS C.R.W. and J.R. designed the experiment. C.R.W.
collected and analyzed the data; J.R. provided guidance. C.R.W. wrote the initial draft of the paper; C.R.W. and J.R. revised and edited the paper. J.R. acquired funding. CORRESPONDING AUTHOR
Correspondence to Catherine R. Walsh. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests. PEER REVIEW PEER REVIEW INFORMATION _Nature Communications_ thanks
Christopher Baldassano, Ariana Giuliano and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available. ADDITIONAL INFORMATION
PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. SUPPLEMENTARY INFORMATION Supplementary Information, a Peer Review File, and a Reporting Summary are available for this article. RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons
licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a
credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. ABOUT
THIS ARTICLE CITE THIS ARTICLE Walsh, C.R., Rissman, J. Behavioral representational similarity analysis reveals how episodic learning is influenced by and reshapes semantic memory. _Nat
Commun_ 14, 7548 (2023). https://doi.org/10.1038/s41467-023-42770-w * Received: 12 January 2023 * Accepted: 20 October 2023 * Published: 20 November 2023 * DOI: https://doi.org/10.1038/s41467-023-42770-w