Benchmarking seeding strategies for spreading processes in social networks: an interplay between influencers, topologies and sizes

ABSTRACT The explosion of network science has permitted an understanding of how the structure of social networks affects the dynamics of social contagion. In community-based interventions

with spill-over effects, identifying influential spreaders can be harnessed to increase the spreading efficiency of social contagion, measured as the time needed to reach the entire largest

connected component of the network. Several strategies have proven efficient in studies relying only on data and simulation-based models of specific network topologies, but without consensus on an

overall result. Hence, the purpose of this paper is to benchmark the spreading efficiency of seeding strategies in relation to network structural properties and sizes. We simulate spreading

processes on empirical and simulated social networks within a wide range of densities, clustering coefficients, and sizes. We also propose three new decentralized seeding strategies that are

structurally different from well-known strategies: community hubs, ambassadors, and random hubs. We observe that the efficiency ranking of strategies varies with the network structure. In

general, for sparse networks with community structure, decentralized influencers are suitable for increasing the spreading efficiency. By contrast, when the networks are denser, centralized

influencers outperform. These results provide a framework for selecting efficient strategies according to different contexts in which social networks emerge.

INTRODUCTION Information, behaviors, diseases, emotions, and even the adoption of technological innovations spread through social networks1,2,3,4,5. Recently, the

explosion of network science across disciplines has produced many important advances in understanding how the structure of social networks affects the dynamics of social contagion.

Specifically, the study of social networks has provided an opportunity to potentiate interventions with spill-over effects aimed to increase population well-being. For example, several

studies have examined how the efficiency of spreading processes relates to the topological properties of networks4,6,7,8. Other studies have analyzed the role of homophily in spreading

processes9,10,11, while others have focused on identifying influential spreaders in networks and how they may be harnessed to increase the efficiency of public health and poverty reduction

interventions12,13,14,15. A key point for designing interventions with spill-over effects is to allocate the resources for intervention targeting wisely. Thus, it is crucial to have

an appropriate methodological framework for selecting seednodes with the best spreading ability. Several complex networks studies have proposed selecting seednodes by ranking network nodes

based on centrality measures12,15,16,17,18,19,20,21,22,23,24,25,26,27,28. Particularly, nodes with high degree, closeness, and betweenness coefficients have been identified as influential or

high-risk individuals during a spreading process16,23,29. Furthermore, random-walk-based seeding strategies, such as Page-Rank, have been shown to be more efficient than

centrality-based strategies for infecting some networks but less efficient in others19,24,25. Also, Kitsak _et al_. have proposed that targeting the core of the network by using a

K-shell decomposition method is more efficient than targeting central nodes26. This approach was later improved by the proposed True core and K-truss decomposition methods27,28. Recently,

Zhang _et al_. proposed the Vote-Rank decentralized strategy, which seems to experimentally outperform centrality and K-shell methods on both spreading rate and computational efficiency30.
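For readers less familiar with the K-shell idea mentioned above, the following minimal sketch peels a network into shells by repeatedly removing the lowest-degree nodes. It is an illustrative pure-Python version, not the implementation used in the cited studies; the adjacency-dictionary input format and the function name `k_shell_index` are assumptions made for this example.

```python
def k_shell_index(adj):
    """Return {node: shell index} for an undirected graph.

    `adj` maps each node to the set of its neighbors (a hypothetical
    input format chosen for this sketch).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k belong to shell k.
        stack = [v for v in remaining if degree[v] <= k]
        if not stack:
            k += 1
            continue
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already peeled through a duplicate stack entry
            remaining.remove(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        stack.append(u)
    return shell


# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
shells = k_shell_index(toy)
# The innermost shell is the candidate pool for K-shell based seeding.
core = {v for v, k in shells.items() if k == max(shells.values())}
```

In the toy graph the pendant node falls in shell 1 and the triangle forms the innermost shell, so a K-shell based strategy would draw its seednodes from the triangle.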

Centralized and decentralized seeding strategies have proven efficient in studies relying solely on data and simulation-based models of specific network topologies, but without consensus on an

overall result. There is limited evidence on which network structural properties are related to the performance of each seeding strategy. A few studies show that centralized and K-shell based

strategies are not efficient in networks with a community structure because chosen spreaders may cluster in the same community or their neighborhoods overlap18,30. We address the gap

mentioned above by benchmarking the spreading efficiency of seeding strategies for networks with different structural properties. We simulate spreading processes on a wide range of complex

networks, using empirical social networks data, and simulated networks within a range of densities, clustering coefficients, and sizes. We also propose community hubs, ambassadors, and

random hubs as three new decentralized seeding strategies that are structurally different from those reported in the literature. Our main findings are that the efficiency ranking of the

strategies and the degeneracy among strategies differ according to the structural properties of the network, characterized especially by its density, clustering, and size. These results provide a

framework for selecting efficient strategies according to different contexts in which social networks emerge. RESULTS We ranked 10 different seeding strategies according to their spreading

efficiency. For simplicity, we implemented a susceptible-infected (SI) spreading process31 in the largest connected component (LCC) of five empirical networks and 540 simulated undirected

networks with different topologies, seednodes, and sizes. For each scenario, we varied the probability of contagion and the number of seednodes. For ranking the strategies, we calculated the

spreading efficiency as the time necessary to infect all nodes of the LCC when starting each contagion from the seednodes. For each network, we initialized the spreading process from 10

different sets of seednodes selected using centralized and decentralized strategies (Fig. 1). Both centralized and decentralized strategies are based on global structural measures and

require having data of the full network. Centralized strategies consisted of selecting seednodes with the (a) highest degree centrality: Hubs3, (b) highest betweenness centrality3, (c)

highest closeness centrality3, (d) highest Page-Rank32; and (e) nodes in the k-core26,33,34. Decentralized strategies consisted of selecting (f) nodes with the highest Vote-Rank calculated

as the voting score resulting from the sum of the voting ability of the neighbors of each node30, (g) nodes of detected communities with the highest external degree: Ambassadors, (h) nodes of

detected communities with the highest internal degree: Community Hubs, and (i) the most connected neighbor of randomly chosen nodes: Random Hubs. Finally, we measured the spreading

efficiency of each strategy for each topology, and we evaluated the degeneracy among strategies (See methods). For analyzing the results, we categorized each of the empirical and simulated

networks according to their topology within three different ranges of density and clustering coefficient. For both measures, our selected ranges were: _Low_ from 0 to 0.1, _Medium_ from 0.1

to 0.2, and _High_ from 0.2 to 1. We categorized the networks within six types (Table 1). Also, we categorized networks according to their size as Small with 200 nodes, Medium with 1000 nodes,

and Large with 2000 nodes. We did not simulate larger networks because our focus is to recreate contexts where community-based interventions can be implemented. SPREADING EFFICIENCY FOR

SEEDING STRATEGIES IN EMPIRICAL NETWORKS For measuring the spreading efficiency of the seeding strategies on empirical networks, we ran multiple spreading processes on the largest connected

component (LCC) of five networks representing social systems from different contexts. Ordered from lowest to highest density, the networks are: (1) Spanish physicists co-authorships network35;

(2) Karnataka network: a social network of a rural village in the south of India12; (3) Global supply chain project network: an e-mail network between project team roles of a global supply

chain project36; (4) Recreovía Facebook friendship network: an online friendship network of stakeholders in a physical activity program in Colombia37; and (5) School children friendship

network: a friendship network of a primary school in Colombia38. The networks displayed different topological features: in their LCC, sizes varied from 25 to 1118 nodes and from 87 to 5185

edges. The first two networks were considered of Medium size, and the other three were considered of Small size. The mean degree varied from 3.48 to 22, the densities ranged from 0.004 to

0.15 (the first three networks were in the Low range and the other two were in the Medium range), the clustering coefficient ranged from 0.47 to 0.69 (all of them were in the High range),

the average shortest path length ranged from 1.93 to 8.57, and the diameter ranged from 4 to 22 (Table 2). The simulation results show that, usually, using a seeding strategy is more efficient

for initializing a spreading process than randomly selecting the seednodes. However, the efficiency of the strategy depends mainly on the density, clustering coefficient, and size of the

network. For Medium networks in the LD-HC category (_Spanish physicists co-authorships network_ and _Karnataka network_), the decentralized seeding strategies, Ambassadors and Community

Hubs, were the most efficient independently of the probability of contagion \(g\) and the number \(s\) of seednodes. In terms of spreading efficiency, these strategies were followed in the

ranking by the centralized strategies Page-Rank and Betweenness, and by the decentralized strategy Vote-Rank. Furthermore, in these networks K-core yielded the least efficient set of seednodes, even

less efficient than choosing seednodes at random. The ranking obtained for these networks is consistent for the different probabilities of contagion and the percentage of seednodes selected

(Fig. 2a,b). Second, we studied our empirical _Small networks_: the _Global supply chain project network_, in the _LD-HC_ category, and the _Recreovía Facebook friendship network_ and the _School

children friendship network_, both in the _MD-HC_ category. In these networks, the spreading efficiency varied significantly across the different probabilities of contagion and the initial

percentage of seednodes. However, we found that the centralized strategy Page-Rank was the most efficient, being in the top two of the ranking for the three networks. Also, contrary to the

_Medium networks_ in the _LD-HC_ category, in _Small networks_ the Ambassadors and Community Hubs strategies were the least efficient independently of the probability of

contagion. Even so, these two strategies remained better than randomly selecting the seednodes. SPREADING EFFICIENCY FOR SEEDING STRATEGIES IN SIMULATED NETWORKS WITH DIFFERENT STRUCTURES

For assessing differences in the spreading efficiency of each seeding strategy according to the network structure, we initialized spreading processes using the 10 seeding strategies in 540

random networks distributed across six categories and three sizes (30 networks per category and size), as explained in Table 1. To preserve the skewed degree distributions and small-world

properties that were found in the empirical networks, we used an algorithm for growing scale-free networks with tunable clustering39,41. We measured structural properties of each type of

network (Table 3). We observe that the modularity coefficient does not present a significant variability across realizations within the different network types. Moreover, we observe that on

average the modularity coefficient increases when clustering coefficient increases, especially for networks with low density as expected in sparse networks with community structures. We

evaluated the seeding strategies in 30 generated networks for each size and type of combinations of density and clustering. For each network we conducted 30 simulations of a particular

seeding strategy. The simulation results suggest that the efficiency of each strategy varied depending on the density, clustering coefficient, and size of the networks. Also, we observe that

the ranking of strategies changed for each network structure and size, where some results remain consistent depending on the type of seeding strategies, namely, decentralized or

centralized. In the case of decentralized seeding strategies, the spreading efficiency was higher when networks were in the _LD-LC_ category, independently of the network size. For this

particular network structure, within the decentralized strategies, the ranking varied according to network size: (a) _Small networks_: Vote-Rank, Community-Hubs, and Ambassadors (Fig. 3,

panel (a)); (b) _Medium networks_: Random-Hubs, Vote-Rank, and Ambassadors (Fig. 3, panel (b)); (c) _Large networks_: Vote-Rank (Fig. 3, panel (c)). Nevertheless, independently of the size in

_LD_ networks, as clustering coefficient increases to 0.2 (_MC_), the only decentralized strategy that remains efficient is Community-Hubs. For _MD-HC_ in _Small_ and _Large networks_,

Ambassadors remains efficient, while Community-Hubs is the most efficient strategy in _Medium networks_. In addition, for _HD-HC_, Ambassadors strategy is efficient in _Small networks_,

Random-Hubs in _Medium networks_, and Community-Hubs in _Large networks_. In the case of centralized seeding strategies, the spreading efficiency was higher when networks had medium or high

density and clustering coefficient (0.1–1]. In those cases, independently of network size, K-Core was consistently efficient among other centralized strategies in _MD-MC_ and _MD-HC_

networks. Furthermore, Page Rank strategy was efficient for _Small networks_ in _MD-MC_ and _LD-HC_. In the case of _Large networks_, Page Rank was efficient in _LD-HC_ and _HD-HC_

categories. In addition, Closeness strategy was consistently efficient for _Small_ and _Large networks_ in _HD-HC_. In general, the performance of decentralized vs. centralized strategies,

as groups of strategies, does not depend on network size. Moreover, we observe that three particular strategies are consistently in the top three most efficient regardless of the network

size: (1) The decentralized strategies Vote-Rank and Community-Hubs are top ranked for networks with low density and low or medium clustering (_LD-LC_ and _LD-MC_), respectively, and (2) the

centralized strategy K-Core is top ranked for networks with medium density and medium or high clustering (_MD-MC_ and _MD-HC_) (Table 4). In addition, we found that for networks with extreme

connectivity or extremely segregated clusters (_HD-HC_ and _LD-HC_ networks, respectively) rankings are not consistent for different sizes. Nevertheless, when analyzing more in-depth the

efficiency of each particular strategy, we observe that the ranking varies according to network size. We calculated the standard deviation for the density and clustering coefficient for the

30 realizations of every network type and size (Table 3). We observe that the four types of networks (_LD-LC_, _LD-MC_, _MD-MC_, and _MD-HC_) that have more consistent results in the ranking

are those with the lower clustering coefficient variability. We also observe that the standard deviation of density is always lower than \(1.56\times 10^{-4}\), so we ruled out

variation in density as a cause of the differences in the ranking. However, we observe that the results were not consistent for the different sizes of the two types of networks that exhibit the highest

variability in clustering coefficient and density (_LD-HC_ and _HD-HC_, respectively). Our hypotheses were that decentralized strategies would be efficient for _LD-HC_ networks due to their

community structure, reflected in a high modularity value, and that centralized strategies would be suitable for the _HD-HC_ networks due to their high connectivity. Nevertheless, those

hypotheses were rejected for these types of networks with extreme values of density and clustering in their structures. Our previous results of the most efficient strategies for each network

type remain consistent when considering the modularity and number of communities as metrics for determining the community structure of the network types. The decentralized strategies,

Vote-Rank and Community-hubs are efficient regardless of the network size for _LD-LC_ and _LD-MC_ types, which have higher modularity values and number of communities. Also, the centralized

K-core strategy is in the top three regardless of size for networks with lower modularity values and fewer communities, such as the _MD-MC_ and _MD-HC_ networks. Also, we did not find

consistent results for different sizes of networks with extreme values of modularity and number of communities: (1) _LD-HC_ networks have the highest modularity, and (2) _HD-HC_ networks

have among the lowest numbers of communities and modularity values. We hypothesize that in _LD-HC_ networks the decentralized strategies are not consistently efficient because the small

number of edges between different communities could confine the spreading processes to the seednodes’ communities, hindering inter-community spreading. On the other hand, in

the _HD-HC_ networks, the centralized strategies are not consistently efficient due to the high connectivity of the network that could lead to a low differentiation among seednodes sets. As

in empirical networks, in most of the topologies and sizes of simulated networks, using a strategy for selecting seednodes was more efficient than choosing the seednodes at random. However,

in _Small networks_, when the clustering coefficient was high (0.2–1], choosing the seednodes at random remained efficient (Fig. 3 panel (a), third sub-panel). DEGENERACY AMONG SEEDNODES The

same node may belong to different sets of seednodes. Thus, to better understand the results observed in the spreading efficiency rankings, we evaluated the degeneracy among each pair of

seeding strategies. We define the degeneracy coefficient of two sets of seednodes (not to be confused with the k-degeneracy used in graph theory) as the fraction of nodes in their union that belong to both sets.

Let \(A\) and \(B\) be two sets; then \(\mathrm{Degeneracy}(A,B)=|A\cap B|/|A\cup B|\). For each network size and topology, we calculated the average degeneracy coefficient for each pair of seeding

strategies over the 30 simulated networks. We observe that the degeneracy coefficient shows a pattern that remains similar for the different networks and topologies (Fig. 4). We observe that

all centralized and Vote-Rank strategies shared, on average, more than 50% of nodes independently of the network size. In the case of the decentralized strategies, the proportion of common

nodes with other strategies ranged from 20% to 40% for different network sizes, showing a higher diversification of the seednode selection compared to the centralized strategies.
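The degeneracy coefficient defined above is the ratio of the intersection to the union of two seednode sets, so it can be computed directly with set operations. The sketch below is a minimal illustration; the strategy names and node labels in the example are made up, and returning 0 for two empty sets is a convention assumed here rather than one stated in the text.

```python
from itertools import combinations


def degeneracy(a, b):
    """Fraction of nodes in the union of two seed sets that belong to both."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention assumed in this sketch for two empty sets
    return len(a & b) / len(union)


def pairwise_degeneracy(seed_sets):
    """Degeneracy for every pair of strategies in {strategy name: seed set}."""
    return {
        (p, q): degeneracy(seed_sets[p], seed_sets[q])
        for p, q in combinations(sorted(seed_sets), 2)
    }


# Hypothetical seed sets on a small network (node labels are made up).
example = {
    "hubs": {1, 2, 3, 4},
    "page_rank": {2, 3, 4, 5},
    "ambassadors": {4, 9, 10, 11},
}
table = pairwise_degeneracy(example)
```

In this made-up example the two centralized sets share three of five distinct nodes (0.6), while each shares only one of seven distinct nodes with the decentralized set. Averaging such pairwise values over the generated networks of one size and topology would give one cell of the pattern summarized in Fig. 4.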

Furthermore, independently of density, clustering coefficient, and network size, the degeneracy between centralized and decentralized strategies was low. The result for the _LD-LC_ category in

the three network sizes is shown in Fig. 4 as an example of the general pattern observed in the different network sizes and topologies. DISCUSSION This study provides a benchmark for

selecting efficient strategies for initializing interventions with spill-over effects in social networks with different structures. Our main finding is that the efficiency of each seeding

strategy depends on the network structure, particularly on the density and clustering coefficient. In general, for sparse networks with community structure, Community-hubs, which are

decentralized influencers, are suitable for increasing the spreading efficiency. By contrast, when the networks are denser, nodes in the K-core, which are centralized influencers,

outperform. We also observe that, usually and independently of the network structure, having a strategy for selecting seednodes for a spreading process is better than using random sampling.
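As a rough illustration of how seed choice changes the time to full infection, the sketch below implements the cascade SI process described in the Methods (each susceptible node contacts one random neighbor per step and becomes infected with probability \(g\) if that neighbor is infected). The function name, the adjacency-list format, the `max_steps` safety cap, and the star-graph example are assumptions of this sketch, not the code used in the study.

```python
import random


def si_spreading_time(adj, seeds, g, rng, max_steps=100_000):
    """Steps needed to infect every node of a connected graph, or None.

    adj: {node: list of neighbors}; seeds: initially infected nodes;
    g: probability of contagion per interaction; rng: random.Random instance.
    """
    infected = set(seeds)
    steps = 0
    while len(infected) < len(adj):
        if steps >= max_steps:
            return None  # safety cap, an assumption of this sketch
        newly = set()
        for node in adj:
            if node in infected:
                continue
            # Each susceptible node interacts with one random neighbor.
            partner = rng.choice(adj[node])
            if partner in infected and rng.random() < g:
                newly.add(node)
        infected |= newly  # synchronous update at the end of the step
        steps += 1
    return steps


# Toy star graph: hub 0 connected to leaves 1..5 (hypothetical example).
star = {0: [1, 2, 3, 4, 5], **{leaf: [0] for leaf in range(1, 6)}}
rng = random.Random(42)
t_hub = si_spreading_time(star, seeds=[0], g=1.0, rng=rng)
t_leaf = si_spreading_time(star, seeds=[1], g=1.0, rng=rng)
```

With \(g=1\), seeding the hub infects every leaf in a single step, whereas seeding a leaf needs at least two steps because the hub must first happen to contact the infected leaf. Averaging this time over repeated runs for each seed set is the kind of comparison behind the rankings reported here.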

This result is critical for providing evidence to improve commonly used random sampling methods for delivering interventions. Also, our results are coherent with studies that have shown the

importance of homophily and community structure of networks for understanding the spread and adoption of behaviors9,42. As a first result for both empirical and simulated networks, we found

that the decentralized strategy Community Hubs remained efficient for _Medium networks_ in the _LD-HC_ category. _LD-HC_ networks are likely to have a community structure, therefore

decentralized strategies make it possible to identify seednodes in different communities and to avoid potential overlap among seednode dyads. This leads to an increase in the coverage range

of the spreading process by taking advantage of the weak ties as spreading channels between communities43,44. The importance of avoiding overlap in networks with community structure while

selecting a seeding strategy might explain why, for the _LD-HC_ category, in _Medium_ and _Large_ network sizes, and in both empirical and simulated networks, K-core is not an efficient

strategy. The reason is that K-core seednodes are likely to have a high number of overlapping neighbors causing a reduced coverage of susceptible nodes, at least at the initial steps of the

spreading process. Similar reasons might explain why centralized seeding strategies, such as Closeness, do not perform as well as decentralized strategies when density is low.
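The overlap argument can be made concrete with a small sketch that counts how many distinct susceptible nodes a seed set can reach in one step. The toy graph, the node labels, and the helper name `one_hop_coverage` are illustrative assumptions, not data or code from this study.

```python
def one_hop_coverage(adj, seeds):
    """Number of distinct non-seed nodes adjacent to at least one seed."""
    seeds = set(seeds)
    reached = set()
    for s in seeds:
        reached |= set(adj[s])
    return len(reached - seeds)


# Two 4-node cliques joined by a single edge (3-4); labels are hypothetical.
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6},
}
same_community = one_hop_coverage(adj, [0, 1])      # seeds share their neighbors
across_communities = one_hop_coverage(adj, [0, 5])  # one seed per community
```

Two seeds inside the same clique reach only 2 new nodes in the first step, while one seed per clique reaches 6, which is the kind of early coverage gain that seednodes with overlapping neighborhoods forgo.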

Central nodes have also been shown to be connected by strong ties to other network actors, increasing their overlapping relationships44. Employing decentralized strategies might be desirable

in real contexts with sparse or segregated populations. In those settings, conducting searches for identifying local leaders, Community Hubs, might be more convenient for performing direct

and indirect influence than conducting searches for identifying influential individuals at the population level45. Hence, using the Community Hubs strategy can potentiate the effect of

community-based interventions by reinforcing individual perceptions and behavioral changes, as the Community Hubs strategy facilitates customized processes within each detected

community46. Also, Community Hubs could be used as an alternative to the recently proposed Vote-Rank strategy, especially when access to the entire network data is limited or unavailable and

Vote-Rank cannot be calculated. As a second general result for both empirical and simulated networks, we found that the centralized seeding strategy Page Rank remained efficient for _Small

networks_ in the _MD-MC_ and _LD-HC_ categories. For different network sizes, K-Core seeding strategy performs efficiently when density is in the medium range, and clustering coefficient

increases, i.e. the _MD-MC_ and _MD-HC_ categories. In fact, due to the network medium density, nodes with high Page-Rank and nodes in the K-Core are likely to be directly connected to nodes

in different areas of the network. Denser networks are not likely to display community structures, and having a medium or high clustering coefficient implies that overlap among nodes is

high. Thus, decentralized strategies are not likely to add more coverage than centralized strategies. This can be evidenced by the low spreading efficiency obtained by Ambassadors and

Community hubs in the categories _MD-MC_ and _MD-HC_ for _Small_ and _Medium_ network sizes. Employing centralized strategies might be desirable in contexts with dense and cohesive

populations. In those settings, identifying global leaders for delivering interventions might be more efficient than conducting local searches in communities that are not well defined. For

simulated networks, we found that Vote-Rank seeding strategy remained efficient for the _LD-LC_ category of networks in the three network sizes. _LD-LC_ networks are likely to have a larger

shortest path length than the other topologies studied due to their low connectivity. Thus, this result is coherent with previous studies where the Vote-Rank strategy was more efficient when

the shortest path length among seednodes was larger30. Of course, this study has some limitations. First, we used a simulation-based approach to calculate the spreading efficiency of

different strategies. The above might bias the results to specific network topologies and spreading conditions. However, we aimed to build different scenarios by considering a wide range of

probabilities of contagion, number of seednodes, and networks with different topologies and sizes. Second, for simplicity, we used the susceptible-infected model for the simulations assuming

a cascade process for the contagion; different results may emerge with other spreading processes. Nevertheless, for this work it was important to compare the different strategies with

the same, most straightforward model to avoid confounding the effect of the spreading dynamics with that of the seednode selection. Even so, we consider that future work

should explore different spreading methods. Third, we generated networks to simulate social networks with skewed degree distributions and small-world properties41; hence, our results might

not apply to other situations where networks have other degree distributions. Identifying influential individuals for the design of interventions has been of interest to practitioners and

researchers due to its effect on delivering successful and cost-efficient interventions at the community level. Our results provide a first outlook on selecting efficient strategies for

allocating resources during behavioral interventions with spill-over effects in different contexts, and in terms of centralized and decentralized strategies. Future work should address more

detailed explanations on common features and possible causes of the different rankings at the seednodes sets level within and between centralized and decentralized strategies. METHODS We

propose a simulation-based approach for ranking ten centralized and decentralized seeding strategies for initializing a spreading process according to their spreading efficiency. First, we

conducted the ranking for five empirical networks with different topologies and sizes. Then, we simulated specific network structures to observe possible associations between structural

properties and the spreading efficiency of the seednodes. We categorized each of the empirical and simulated networks within three different ranges of both density and clustering coefficient, as

explained in Table 1. NETWORK CHARACTERISTICS We gathered data from five empirical networks to evaluate the spreading efficiency of the seeding strategies. We calculated structural measures

of the largest connected component of these empirical networks, and we listed those measures in Table 2. We consider social networks from different contexts. (1) Spanish

physicists co-authorships network: a collaboration network built from the American Physical Society, which covers scientific collaborations between Spanish physicists between 2010 and

201235. In this network, nodes represent researchers and edges represent co-authorship. We categorized it as _LD-HC Medium network_. (2) Karnataka network: a social network built from

village 19 in Karnataka, India for the diffusion of a microfinance program conducted by the Abdul Latif Jameel Poverty Action Lab in 200612. In this network, nodes represent individuals, and

an undirected tie connects two nodes if one of the individuals reported at least one of 12 types of relationships related to trust. We categorized it as _LD-HC Medium network_. (3) Global

supply chain project network: an email network between project team roles of a global supply chain project36. This network is an approach to project management where team members belong to

different organizations of the supply chain, located in more than one geographic location and time zone, and contribute to different phases of a project. In this network, the nodes represent

team members, and directed edges represent the different emails sent and received by the project team members to coordinate and implement the different activities. We categorized it as

_LD-HC Small network_. (4) Recreovía Facebook friendship network: an online friendship network of stakeholders in a physical activity program in Colombia. This program aims to promote

physical activity, healthy habits, and social equity through directed group classes set to music in Bogota, Colombia37. In this network, nodes represent Facebook friends of the Recreovía

account, and edges represent a mutual friendship between the nodes. Our research group built the Recreovía friendship network in 2016 for analyzing social cohesion emerging from the program.

We categorized it as _MD-HC Small network_. (5) School children friendship network: a friendship network of one school classroom where nodes represent children, and directed edges represent

friendship nominations38. We collected data from the Colombian site of the International Study of Childhood Obesity, Lifestyle, and Environment (ISCOLE), a collaborative study conducted in

schools of 12 countries47. We categorized the network as _MD-HC Small network_. SPREADING EFFICIENCY FOR SEEDING STRATEGIES IN EMPIRICAL NETWORKS THE SUSCEPTIBLE-INFECTED SPREADING MODEL For

each network, we simulated the spreading process using the cascade susceptible-infected (SI) model, where the spreading driver is interaction3. In this model, each susceptible node may become

infected depending on its infected neighbors31,48, and infected nodes cannot recover. At time \(t=0\), all network nodes are susceptible except for a set of seednodes that are

infected. We consider the probability of infection \(g\) to be constant and equal for every infected node. At every time step, for each susceptible node, we randomly choose one of its neighbors

to interact with. If the selected neighbor is infected, the susceptible node becomes infected with probability \(g\) and remains susceptible with probability \(1-g\). If the

neighbor is susceptible, nothing happens. We fixed the number of seednodes at four proportion values: 0.01, 0.04, 0.07, and 0.10. The process is repeated at each time step until the

entire LCC of the network is infected. We determined the spreading efficiency of each seeding strategy as the time needed to infect the entire LCC of the network, starting the spreading from those

seednodes. SEEDING STRATEGIES We compared ten seeding strategies: five centralized, four decentralized, and one random for identifying seednodes based on structural properties of each

network (Fig. 2). Centralized strategies select:

(a) nodes with the highest degree centrality, i.e. the largest number of edges adjacent to a node3;
(b) nodes with the highest betweenness centrality, i.e. the highest frequency of appearance of a node in the shortest paths between all pairs of nodes of the network3;
(c) nodes with the highest closeness centrality, i.e. the lowest average shortest path length from a node to all the other nodes of the network3;
(d) nodes with the highest PageRank, i.e. the highest probability that a random walker visits the node32;
(e) nodes selected from the k-core of the network using a k-shell decomposition algorithm26,33,34.

For decentralized strategies, we first applied the Louvain algorithm to detect communities by maximizing modularity40,49. Then, we selected:

(f) Ambassadors: the nodes of the detected communities with the highest external degree;
(g) Community Hubs: the nodes of the detected communities with the highest internal degree;
(h) Vote-Rank: the nodes with the highest voting score, calculated as the sum of the voting abilities of a node's neighbors, where the voting ability of a node represents the number of votes that it can give to its neighbors30;
(i) Random Hubs: the highest-degree neighbor of each randomly chosen node.

Finally, we also selected random seednodes (Random).

To build seednode sets of equal size, for each centralized and decentralized strategy we fixed the number of seednodes \(s\) to the number of communities detected in each network. For the centrality-based strategies, we selected the \(s\) nodes with the highest value of the respective centrality measure. When several nodes had the same centrality value, we selected at random among them to complete the \(s\) seednodes. For the k-core seednodes, we randomly selected \(s\) nodes in the k-core of the network. If \(s\) was larger than the k-core size, we randomly selected the remaining nodes from the (k-1)-core. For the decentralized strategies Ambassadors and Community Hubs, we sorted the communities in descending order of size and selected one Community Hub or Ambassador per community, repeating the process until \(s\) nodes were selected. For random seednodes, we chose \(s\) nodes at random.

SPREADING EFFICIENCY FOR SIMULATED NETWORKS

To analyze the relationship between the seeding strategies and the structure of the network, we generated 30

simulated networks for each of the six topologies and three sizes (Table 1). We used an algorithm for growing scale-free networks with tunable clustering39, which preserves the skewed degree distributions and small-world properties of the social networks used in this manuscript41. The algorithm builds networks with a fixed number of nodes and connects them by preferential attachment, as in the traditional Barabási-Albert model50, until the desired density is reached. Then, for every new edge, it adds a triad-formation step involving one of the connected nodes, until the desired clustering coefficient is achieved. We show structural properties of each type of network in 3, where each value is the average of that measure over the 30 generated networks. After generating each network, we ran the SI spreading process 30 times for each strategy, starting from its seednodes and continuing until all network nodes were infected. We ranked the seeding strategies according to their spreading efficiency, i.e. the time needed to infect the entire LCC of the network, measured over 30 networks with 30 runs for each network.
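As an illustration of how the spreading efficiency described above could be measured, the following is a minimal Python sketch of the SI process, not the authors' implementation. It assumes that the network is given as an adjacency dictionary already restricted to its LCC, that all susceptible nodes are updated synchronously within a time step (the text does not state the update order), and that \(g>0\). The function names `si_time_to_full_infection` and `repeated_si_times` are hypothetical.

```python
import random


def si_time_to_full_infection(adj, seeds, g, rng):
    """Return the number of time steps needed to infect every node in `adj`.

    adj   : dict mapping each node to a list of its neighbors; assumed to be
            connected (i.e. restricted to the LCC), otherwise the loop
            would never terminate
    seeds : iterable of initially infected nodes (must be non-empty)
    g     : infection probability per interaction, assumed 0 < g <= 1
    rng   : random.Random instance, passed in for reproducibility
    """
    infected = set(seeds)
    if not infected:
        raise ValueError("at least one seednode is required")
    t = 0
    while len(infected) < len(adj):
        newly_infected = []
        for node in adj:
            if node in infected:
                continue
            # Each susceptible node interacts with one random neighbor.
            partner = rng.choice(adj[node])
            # Infection happens with probability g if that neighbor is infected.
            if partner in infected and rng.random() < g:
                newly_infected.append(node)
        # Synchronous update (an assumption): new infections count from t + 1.
        infected.update(newly_infected)
        t += 1
    return t


def repeated_si_times(adj, seeds, g, runs=30, seed=0):
    """Repeat the SI process `runs` times and return the list of times."""
    rng = random.Random(seed)
    return [si_time_to_full_infection(adj, seeds, g, rng) for _ in range(runs)]


if __name__ == "__main__":
    # Toy star network: node 0 is the hub, nodes 1-3 are leaves.
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    # With g = 1 and the hub as seed, every leaf is infected in one step.
    print(si_time_to_full_infection(star, [0], 1.0, random.Random(1)))  # 1
    # With g < 1 the time varies between runs; a smaller time means
    # a higher spreading efficiency for that seed set.
    print(repeated_si_times(star, [0], 0.5, runs=5, seed=42))
```

Returning the individual times of each run, rather than an average, keeps the sketch compatible with a per-run comparison between strategies.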

For each run, we calculated the number of seeding strategies that each strategy outperformed in terms of spreading efficiency. Then, for each combination of clustering coefficient and density, we summed the efficiency scores of each strategy over the 30 runs and the 30 networks. Finally, we ranked the strategies based on the total scores obtained. A strategy at the top of the ranking has a value of 9, meaning that it outperforms the other nine strategies over the 900 instances. By contrast, a strategy at the bottom of the ranking has a value of 0, meaning that it does not outperform any other seeding strategy.

DEGENERACY COEFFICIENT AMONG SEEDNODES

To better understand the results observed in the spreading efficiency rankings, we evaluated the degeneracy between each pair of sets of seednodes. We define the degeneracy coefficient of two sets of seednodes (not to be confused with the k-degeneracy used in graph theory) as the number of seednodes shared by both strategies divided by the total number of distinct seednodes in the two strategies. Let \(A\) and \(B\) be two sets of seednodes; then \(\mathrm{Degeneracy}(A,B)=\frac{|A\cap B|}{|A\cup B|}\). When the degeneracy coefficient between a pair of sets of seednodes equals 1, both sets contain the same nodes, whereas a degeneracy coefficient of 0 means

both sets of seednodes are composed entirely of different nodes.

REFERENCES

1. Valente, T. W. _Social Networks and Health: Models, Methods, and Applications_ (Oxford University Press, 2010).
2. Valente, T. W. Social network thresholds in the diffusion of innovations. _Soc. Networks_ 18, 69–89 (1996).
3. Barrat, A., Barthélemy, M. & Vespignani, A. _Dynamical Processes on Complex Networks_ (Cambridge University Press, 2008).
4. Centola, D. The spread of behavior in an online social network experiment. _Science_ 329, 1194–1197 (2010).
5. Christakis, N. A. & Fowler, J. H. Social contagion theory: Examining dynamic social networks and human behavior. _Stat. Medicine_ 32, 556–577 (2013).
6. Chen, D. B., Xiao, R. & Zeng, A. Predicting the evolution of spreading on complex networks. _Scientific Reports_ 4, 6108 (2014).
7. Cimini, G. _et al_. Enhancing topology adaptation in information-sharing social networks. _Physical Review E - Statistical, Nonlinear, and Soft Matter Physics_ 85 (2012).
8. Guille, A., Hacid, H., Favre, C. & Zighed, D. A. Information diffusion in online social networks: A survey. _SIGMOD Record_ 42, 17–28 (2013).
9. Centola, D. An experimental study of homophily in the adoption of health behavior. _Science_ 334, 1269–1272 (2011).
10. Aral, S., Muchnik, L. & Sundararajan, A. Engineering social contagions: Optimal network seeding in the presence of homophily. _Network Science_ 1, 125–153 (2013).
11. McPherson, M., Smith-Lovin, L. & Cook, J. M. Birds of a Feather: Homophily in Social Networks. _Annual Review of Sociology_ 27, 415–444 (2001).
12. Banerjee, A., Chandrasekhar, A. G., Duflo, E. & Jackson, M. O. The diffusion of microfinance. _Science_ 341, 1236498 (2013).
13. Christakis, N. A. & Fowler, J. H. Social network sensors for early detection of contagious outbreaks. _PLoS ONE_ 5, 1–8 (2010).
14. Hunter, R. F. _et al_. “Hidden” Social Networks in Behavior Change Interventions. _American Journal of Public Health_ 105, 513–516 (2015).
15. Kim, D. A. _et al_. Social network targeting to maximise population behaviour change: A cluster randomised controlled trial. _The Lancet_ 386, 145–153 (2015).
16. Christley, R. M. _et al_. Infection in social networks: Using network analysis to identify high-risk individuals. _American Journal of Epidemiology_ 162, 1024–1031 (2005).
17. He, J. L., Fu, Y. & Chen, D. B. A Novel Top-k Strategy for Influence Maximization in Complex Networks with Community Structure. _PLoS ONE_ 10 (2015).
18. Zhang, X., Zhu, J., Wang, Q. & Zhao, H. Identifying influential nodes in complex networks with community structure. _Knowledge-Based Systems_ 42, 74–84 (2013).
19. Chen, D. B., Gao, H., Lü, L. & Zhou, T. Identifying influential nodes in large-scale directed networks: The role of clustering. _PLoS ONE_ 8 (2013).
20. Gao, C., Lan, X., Zhang, X. & Deng, Y. A Bio-Inspired Methodology of Identifying Influential Nodes in Complex Networks. _PLoS ONE_ 8 (2013).
21. Madotto, A. & Liu, J. Super-Spreader Identification Using Meta-Centrality. _Scientific Reports_ 6 (2016).
22. de Arruda, G. F. _et al_. Role of centrality for the identification of influential spreaders in complex networks. _Phys. Rev. E_ 90, 032812 (2014).
23. Comin, C. H. & Da Fontoura Costa, L. Identifying the starting point of a spreading process in complex networks. _Physical Review E - Statistical, Nonlinear, and Soft Matter Physics_ 84 (2011).
24. Miller, J. C. & Hyman, J. M. Effective vaccination strategies for realistic social networks. _Physica A: Statistical Mechanics and its Applications_ 386, 780–785 (2007).
25. Nowzari, C., Preciado, V. M. & Pappas, G. J. Analysis and control of epidemics: A survey of spreading processes on complex networks. _IEEE Control Systems Magazine_ 36, 26–46 (2016).
26. Kitsak, M. _et al_. Identification of influential spreaders in complex networks. _Nature Physics_ 6, 888–893 (2010).
27. Liu, Y., Tang, M., Zhou, T. & Younghae, D. Core-like groups result in invalidation of identifying super-spreader by k-shell decomposition. _Scientific Reports_ 5, 9602 (2015).
28. Malliaros, F. D., Rossi, M. E. G. & Vazirgiannis, M. Locating influential nodes in complex networks. _Scientific Reports_ 6, 19307 (2016).
29. Erkol, Ş., Castellano, C. & Radicchi, F. Systematic comparison between methods for the detection of influential spreaders in complex networks. _Scientific Reports_ 9, 15095 (2019).
30. Zhang, J. X., Chen, D. B., Dong, Q. & Zhao, Z. D. Identifying a set of influential spreaders in complex networks. _Scientific Reports_ 6, 27823 (2016).
31. Anderson, R., Anderson, B. & May, R. _Infectious Diseases of Humans: Dynamics and Control_ (OUP Oxford, 1992).
32. Page, L. & Brin, S. The anatomy of a large-scale hypertextual Web search engine. _Computer Networks_ 30, 107–117 (1998).
33. Seidman, S. B. Network structure and minimum degree. _Social Networks_ 5, 269–287 (1983).
34. Carmi, S., Havlin, S., Kirkpatrick, S., Shavitt, Y. & Shir, E. A model of Internet topology using k-shell decomposition. _Proceedings of the National Academy of Sciences of the United States of America_ 104, 11150–11154 (2007).
35. FajardoFontiveros, O., QuinquillaCapdevila, A. & Diaz-Guilera, A. Física y redes complejas. _Revista Espanola de Física_ 32 (2018).
36. Meisel, C. _Collaborative Relationships in Supply Chain Management: A Case of Project Management Social Network Analysis_. Ph.D. thesis, Montanuniversitaet Leoben, Leoben, Austria (2016).
37. Rios, A., Paez, D., Pinzón, E., Fermino, R. & Sarmiento, O. Logic model of the Recreovía: a community program to promote physical activity in Bogota. _Revista Brasileira de Atividade Física & Saúde_ 22, 206–2011 (2017).
38. Gutiérrez-Martínez, L. _et al_. Effects of a strategy for the promotion of physical activity in students from Bogotá. _Revista de Saude Publica_ 52 (2018).
39. Holme, P. & Kim, B. J. Growing scale-free networks with tunable clustering. _Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics_ 65 (2002).
40. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. _Journal of Statistical Mechanics: Theory and Experiment_ 2008, P10008, https://doi.org/10.1088/1742-5468/2008/10/p10008 (2008).
41. Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. _Nature_ 393, 440–442, https://doi.org/10.1038/30918 (1998).
42. Borge-Holthoefer, J., Baños, R. A., González-Bailón, S. & Moreno, Y. Cascading behaviour in complex socio-technical networks. _Journal of Complex Networks_ 1, 3–24 (2013).
43. Granovetter, M. S. The strength of weak ties. _American Journal of Sociology_ 78, 1360–1380 (1973).
44. Onnela, J. P. _et al_. Analysis of a large-scale weighted network of one-to-one human communication. _New Journal of Physics_ 9, 179 (2007).
45. Valente, T. W. Network interventions. _Science_ 337, 49–53 (2012).
46. Polk, D. E., King, C. M. & Heller, K. Community-based interventions. In _Cambridge Handbook of Psychology, Health and Medicine, Second Edition_, 344–348 (Cambridge Medicine, 2014).
47. Katzmarzyk, P. T. _et al_. International study of childhood obesity, lifestyle and the environment (ISCOLE): Contributions to understanding the global obesity epidemic. _Nutrients_ 11 (2019).
48. Saramäki, J. & Kaski, K. Modelling development of epidemics with dynamic small-world networks. _Journal of Theoretical Biology_ 234, 413–421 (2005).
49. Newman, M. E. & Girvan, M. Finding and evaluating community structure in networks. _Physical Review E - Statistical, Nonlinear, and Soft Matter Physics_ 69, 1–16 (2004).
50. Albert, R., Jeong, H. & Barabási, A. L. Diameter of the world-wide web. _Nature_ 401, 130–131 (1999).

ACKNOWLEDGEMENTS

We are grateful to Philip Bonacich, Jukka-Pekka Onnela, J. Gomez-Gardeñes, and Emma Rye for their help at various

stages. FM and AMJ were funded by the FAPA grant of Universidad de los Andes. FM was also funded by The Global Health Equity Scholars Program NIH FIC D43TW010540. JDM received funding from the Research Office of the Universidad de Ibagué (project 17-466-INT). We also acknowledge the support of Fondecyt Grant No. 1190703. ADG acknowledges financial support from MINECO via Project PGC2018-094754-B-C22 (MINECO/FEDER,UE). OLS received funding from the U.S. National Institutes of Health, grant number 1P20CA217199-001.

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS

* Department of Industrial Engineering, Universidad de los Andes, Social and Health Complexity Center, Bogotá, Colombia: Felipe Montes, Ana María Jaramillo & Roberto Zarama
* Facultad de Ingeniería, Universidad de Ibagué, Ibagué, Colombia: Jose D. Meisel
* Departament de Física de la Matèria Condensada and Universitat de Barcelona Institute of Complex Systems (UBICS), Universitat de Barcelona, Barcelona, Spain: Albert Diaz-Guilera
* Departamento de Física, Facultad de Ciencias, Universidad de Chile, Santiago de Chile, Chile: Juan A. Valdivia
* School of Medicine, Universidad de los Andes, Social and Health Complexity Center, Bogotá, Colombia: Olga L. Sarmiento

CONTRIBUTIONS

F.M., A.M.J. and J.D.M. conceived the study; analyses were performed by F.M., A.M.J. and J.D.M.; A.D.G., J.A.V., O.L.S. and R.Z. provided methodological frameworks; all authors wrote the manuscript.

CORRESPONDING AUTHOR

Correspondence to Felipe Montes.

ETHICS DECLARATIONS

COMPETING INTERESTS

The authors declare no competing interests.

ADDITIONAL INFORMATION

PUBLISHER’S NOTE

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

RIGHTS AND PERMISSIONS

OPEN ACCESS

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Montes, F., Jaramillo, A.M., Meisel, J.D. _et al._ Benchmarking seeding strategies for spreading processes in social networks: an interplay between influencers, topologies and sizes. _Sci Rep_ 10, 3666 (2020). https://doi.org/10.1038/s41598-020-60239-4

* Received: 02 December 2019
* Accepted: 02 February 2020
* Published: 28 February 2020
* DOI: https://doi.org/10.1038/s41598-020-60239-4