Benchmarking seeding strategies for spreading processes in social networks: an interplay between influencers, topologies and sizes

Benchmarking seeding strategies for spreading processes in social networks: an interplay between influencers, topologies and sizes

Play all audios:

Loading...

ABSTRACT The explosion of network science has permitted an understanding of how the structure of social networks affects the dynamics of social contagion. In community-based interventions


with spill-over effects, identifying influential spreaders may be harnessed to increase the spreading efficiency of social contagion, in terms of time needed to spread all the largest


connected component of the network. Several strategies have been proved to be efficient using only data and simulation-based models in specific network topologies without a consensus of an


overall result. Hence, the purpose of this paper is to benchmark the spreading efficiency of seeding strategies related to network structural properties and sizes. We simulate spreading


processes on empirical and simulated social networks within a wide range of densities, clustering coefficients, and sizes. We also propose three new decentralized seeding strategies that are


structurally different from well-known strategies: community hubs, ambassadors, and random hubs. We observe that the efficiency ranking of strategies varies with the network structure. In


general, for sparse networks with community structure, decentralized influencers are suitable for increasing the spreading efficiency. By contrast, when the networks are denser, centralized


influencers outperform. These results provide a framework for selecting efficient strategies according to different contexts in which social networks emerge. SIMILAR CONTENT BEING VIEWED BY


OTHERS DETAILED-LEVEL MODELLING OF INFLUENCE SPREADING ON COMPLEX NETWORKS Article Open access 14 November 2024 HABITUATION EFFECT IN SOCIAL NETWORKS AS A POTENTIAL FACTOR SILENTLY CRUSHING


INFLUENCE MAXIMISATION EFFORTS Article Open access 24 September 2021 THE FITNESS-CORRECTED BLOCK MODEL, OR HOW TO CREATE MAXIMUM-ENTROPY DATA-DRIVEN SPATIAL SOCIAL NETWORKS Article Open


access 28 October 2022 INTRODUCTION Information, behaviors, diseases, emotions, and even the adoption of technological innovations spread through social networks1,2,3,4,5. Recently, the


explosion of network science across disciplines has produced many important advances in understanding how the structure of social networks affects the dynamics of social contagion.


Specifically, the study of social networks has provided an opportunity to potentiate interventions with spill-over effects aimed to increase population well-being. For example, several


studies have examined the spreading processes efficiency related to the topological properties of networks4,6,7,8. Other studies have analyzed the role of homophily in spreading


processes9,10,11, while others have focused on identifying influential spreaders in networks and how they may be harnessed to increase the efficiency of public health and poverty reduction


interventions12,13,14,15. A key point for designing interventions with spill-over effects is to allocate resources for the intervention targeting in a wisely way. Thus, it is crucial to have


an appropriate methodological framework for selecting seednodes with the best spreading ability. Several complex networks studies have proposed selecting seednodes by ranking network nodes


based on centrality measures12,15,16,17,18,19,20,21,22,23,24,25,26,27,28. Particularly, nodes with high degree, closeness, and betweenness coefficients have been identified as influential or


high-risk individuals during a spreading process16,23,29. Furthermore, there are random-walk based seeding strategies, such as Page-Rank, that have been shown more efficient than


centrality-based strategies for infecting some networks but less efficient in other ones19,24,25. Also, Kitsak _et al_. have proposed that targeting the core of the network by using a


K-shell decomposition method is more efficient than targeting central nodes26. This approach was later improved by the proposed True core and K-truss decomposition methods27,28. Recently,


Zhang _et al_. proposed the Vote-Rank decentralized strategy, which seems to experimentally outperform centrality and K-shell methods on both spreading rate and computational efficiency30.


Centralized and decentralized seeding strategies have been proved to be efficient using solely data and simulation-based models in specific network topologies without a consensus of an


overall result. There is limited evidence on which network structural properties are related to the performance of each seeding strategy. Few studies show that centralized and K-shell based


strategies are not efficient in networks with a community structure because chosen spreaders may cluster in the same community or their neighborhoods overlap18,30. We address the gap


mentioned above by benchmarking the spreading efficiency of seeding strategies for networks with different structural properties. We simulate spreading processes on a wide range of complex


networks, using empirical social networks data, and simulated networks within a range of densities, clustering coefficients, and sizes. We also propose community hubs, ambassadors, and


simulated hubs as three new decentralized seeding strategies that are structurally different from those reported by the literature. Our main findings are that the efficiency ranking of the


strategies and the degeneracy among strategies differs according to the network structural properties, especially characterized by their density, clustering and size. These results provide a


framework for selecting efficient strategies according to different contexts in which social networks emerge. RESULTS We ranked 10 different seeding strategies according to their spreading


efficiency. For simplicity, we implemented a susceptible-infected (SI) spreading process31 in the largest connected component LCC of five empirical networks and 540 simulated undirected


networks with different topologies, seednodes, and sizes. For each scenario, we varied the probability of contagion and the number of seednodes. For ranking the strategies, we calculated the


spreading efficiency as the time necessary to infect all nodes of the LCC when starting each contagion from the seednodes. For each network, we initialized the spreading process from 10


different sets of seednodes selected using centralized and decentralized strategies (Fig. 1). Both centralized and decentralized strategies are based on global structural measures and


require having data of the full network. Centralized strategies consisted of selecting seednodes with the (a) highest degree centrality: Hubs3, (b) highest betweenness centrality3, (c)


highest closeness centrality3, (d) highest Page-Rank32; and (e) nodes in the k-core26,33,34. Decentralized strategies consisted of selecting (f) nodes with the highest Vote-Rank calculated


as the voting score resulting from the sum of the voting ability of the neighbors of each30, (g) nodes of detected communities with the highest external degree: Ambassadors, (h) nodes of


detected communities with the highest internal degree: Community Hubs, and (i) the most connected neighbor of randomly chosen nodes: Random Hubs. Finally, we measured the spreading


efficiency of each strategy for each topology, and we evaluated the degeneracy among strategies (See methods). For analyzing the results, we categorized each of the empirical and simulated


networks according to their topology within three different ranges of density and clustering coefficient. For both measures, our selected ranges were: _Low_ from 0 to 0.1, _Medium_ from 0.1


to 0.2, and _High_ from 0.2 to 1. We categorized the networks within six types (Table 1). Also, we categorized networks according their size as Small with 200 nodes, Medium with 1000 nodes,


and Largewith 2000 nodes. We did not simulate larger networks because our focus is to recreate contexts where community-based interventions can be implemented. SPREADING EFFICIENCY FOR


SEEDING STRATEGIES IN EMPIRICAL NETWORKS For measuring the spreading efficiency of the seeding strategies on empirical networks, we ran multiple spreading processes on the largest connected


component LCC of five networks representing social systems from different contexts. Ordering from lowest to highest density the networks are: (1) Spanish physicists co-authorships network35.


(2) Karnataka network: a social network of a rural village in the south of India12. (3) Global supply chain project network: an e-mail network between project team roles of a global supply


chain project36. (4) Recreovia Facebook friendship network: an online friendship network of stakeholders in a physical activity program in Colombia37. And (5) School children friendship


network: a friendship network of a primary school in Colombia38. Networks displayed different topological features, where in their LCC the sizes varied from 25 to 1118 nodes and 87 to 5185


edges. The first two networks were considered of Medium size, and the other three were considered of Small size. The mean degree varied from 3.48 to 22, the densities ranged from 0.004 to


0.15 (the first three networks were in the Low range and the other two were in the Medium range), the clustering coefficient ranged from 0.47 to 0.69 (all of them were in the High range),


the average shortest path length ranged from 1.93 to 8.57, and the diameter ranged from 4 to 22 Table 2. The simulation results show that, usually, using a seeding strategy is more efficient


for initializing a spreading process than randomly selecting the seednodes. However, the efficiency of the strategy depends mainly on the density, clustering coefficient, and size of the


network. For Medium networks in the LD-HC category (_Spanish physicists co-authorships network_ and _Karnataka network_), the decentralized seeding strategies, Ambassadors and Community


Hubs, were the most efficient independently of the probability of contagion \(g\) and the number \(s\) of seednodes. In terms of spreading efficiency, these strategies were followed in the


ranking by the centralized strategies Page-Rank, Betweenness, and the decentralized strategy Vote-Rank. Furthermore, in these networks K-core was the less efficient set of seednodes, even


less efficient than choosing seednodes at random. The ranking obtained for these networks is consistent for the different probabilities of contagion and the percentage of seednodes selected


(Fig. 2a,b). Second, we studied our empirical _Small networks_: _Global supply chain project network_, being in the _LD-HC_ category, and _Recreovía facebook friendship network, and School


children friendship network_, being in the _MD-HC_ category. In these networks, the spreading efficiency varied significantly through the different probabilities of contagion and the initial


percentage of seednodes. However, we found that the centralized strategy Page-Rank was the most efficient, being in the top two of the ranking for the three networks. Also, contrary to the


_Medium networks_ of the same structure, the _LD-HC_ category, for _Small networks_, Ambassadors and Community Hubs strategies were the least efficient independently of the probability of


contagion. Although, these two strategies remained better than randomly selecting the seednodes. SPREADING EFFICIENCY FOR SEEDING STRATEGIES IN SIMULATED NETWORKS WITH DIFFERENT STRUCTURES


For assessing differences in the spreading efficiency of each seeding strategy according to the network structure, we initialized spreading processes using the 10 seeding strategies in 540


random networks that were distributed in six categories (30 networks per category), and three sizes as explained in Table 1. For preserving skewed degree distributions and small-world


properties that were found in the empirical networks, we used an algorithm for growing scale-free networks with tunable clustering39,41. We measured structural properties of each type of


network (Table 3). We observe that the modularity coefficient does not present a significant variability across realizations within the different network types. Moreover, we observe that on


average the modularity coefficient increases when clustering coefficient increases, especially for networks with low density as expected in sparse networks with community structures. We


evaluated the seeding strategies in 30 generated networks for each size and type of combinations of density and clustering. For each network we conducted 30 simulations of a particular


seeding strategy. The simulation results suggest that the efficiency of each strategy varied depending on the density, clustering coefficient, and size of the networks. Also, we observe that


the ranking of strategies changed for each network structure and size, where some results remain consistent depending on the type of seeding strategies, namely, decentralized or


centralized. In the case of decentralized seeding strategies, the spreading efficiency was higher when networks were in the _LD-LC_ category, independently of the network size. For this


particular network structure, within the decentralized strategies, the ranking varied according to network size: (a) _Small networks_: Vote-Rank, Community-Hubs, and Ambassadors (Fig. 3


panel (a); (b)) _Medium networks_: Random-Hubs, Vote-Rank, and Ambassadors (Fig. 3 panel (b); (c))_Large networks_: Vote-Rank (Fig. 3 panel (c)). Nevertheless, independently of the size in


_LD_ networks, as clustering coefficient increases to 0.2 (_MC_), the only decentralized strategy that remains efficient is Community-Hubs. For _MD-HC_ in _Small_ and _Large networks_,


Ambassadors remains efficient, while Community-Hubs is the most efficient strategy in _Medium networks_. In addition, for _HD-HC_, Ambassadors strategy is efficient in _Small networks_,


Random-Hubs in _Medium networks_, and Community-Hubs in _Large networks_. In the case of centralized seeding strategies, the spreading efficiency was higher when networks had medium or high


density and clustering coefficient (0.1–1]. In those cases, independently on network size, K-Core was consistently efficient among other centralized strategies in _MD-MC_ and _MD-HC_


networks. Furthermore, Page Rank strategy was efficient for _Small networks_ in _MD-MC_ and _LD-HC_. In the case of _Large networks_, Page Rank was efficient in _LD-HC_ and _HD-HC_


categories. In addition, Closeness strategy was consistently efficient for _Small_ and _Large networks_ in _HD-HC_. In general, the performance of decentralized vs. centralized strategies,


as groups of strategies, does not depend on network size. Moreover, we observe that three particular strategies are consistently in the top three most efficient regardless of the network


size: (1) The decentralized strategies Vote-Rank and Community-Hubs are top ranked for networks with low density and low or medium clustering (_LD-LC_ and _LD-MC_), respectively, and (2) the


centralized strategy K-Core is top ranked for networks with medium density and medium or high clustering (_MD-MC_ and _MD-HD_) (Table 4). Besides, we found that for networks with extreme


connectivity or extremely segregated clusters (_HD-HC_ and _LD-HC_ networks, respectively) rankings are not consistent for different sizes. Nevertheless, when analyzing more in-depth the


efficiency of each particular strategy, we observe that the ranking varies according to network size. We calculated the standard deviation for the density and clustering coefficient for the


30 realizations of every network type and size (Table 3). We observe that the four types of networks (_LD-LC_, _LD-MC_, _MD-MC_, and _MD-HC_) that have more consistent results in the ranking


are those with the lower clustering coefficient variability. We also observe that the standard deviation of density is always lower than \(1.56\times 1{0}^{-4}\), so we discarded that the


variation in density causes differences in the ranking. However, we observe that the results were not consistent for the different sizes of the two types of networks that exhibit the highest


variability in clustering coefficient and density (_LD-HC_ and _HD-HC_, respectively). Our hypothesis is that decentralized strategies could be efficient for _LD-HC_ networks due to their


community structure explained by a high value of modularity, and that centralized strategies were suitable for the _HD-HC_ networks due to their high connectivity. Nevertheless, those


hypotheses were rejected for these types of networks with extreme values of density and clustering in their structures. Our previous results of the most efficient strategies for each network


type remain consistent when considering the modularity and number of communities as metrics for determining the community structure of the network types. The decentralized strategies,


Vote-Rank and Community-hubs are efficient regardless of the network size for _LD-LC_ and _LD-MC_ types, which have higher modularity values and number of communities. Also, the centralized


K-core strategy is in the top three regardless of size for networks with lower modularity values and fewer communities, such as the _MD-MC_ and _MD-HD_ networks. Also, we did not find


consistent results for different sizes of networks with extreme values of modularity and number of communities: (1) _LD-HC_ networks have the highest modularity, and (2) _HD-HC_ networks


have one of the lowest number of communities and modularity values. We could hypothesize that in _LD-HC_ networks, the decentralized strategies are not consistently efficient as the small


number of edges between different communities could be encapsulating the spreading processes inside the seednodes’ communities avoiding an inter-community spreading. On the other hand, in


the _HD-HC_ networks, the centralized strategies are not consistently efficient due to the high connectivity of the network that could lead to a low differentiation among seednodes sets. As


in empirical networks, in most of the topologies and sizes of simulated networks, using a strategy for selecting seednodes was more efficient than choosing the seednodes at random. However,


in _Small networks_ when the clustering coefficient was high (0.2–1] choosing the seednodes at random remained efficient (Fig. 3 panel (a), third sub-panel). DEGENERACY AMONG SEEDNODES The


same node may belong to different sets of seednodes. Thus, to better understand the results observed in the spreading efficiency rankings, we evaluated the degeneracy among each pair of


seeding strategies. We define the degeneracy coefficient of two sets of seednodes (not to confound with k-degeneracy used in graph theory) as the fraction of nodes that belong to both sets.


Let A and B two sets, \(Degeneracy\,(A,B)=| A\cap B| \)/\(| A\cup B| \). For each network size and topology, we calculated the average degeneracy coefficient among each pair of seeding


strategies over the 30 simulated networks. We observe that the degeneracy coefficient shows a pattern that remains similar for the different networks and topologies (Fig. 4). We observe that


all centralized and Vote-Rank strategies shared, on average, more than 50% of nodes independently of the network size. In the case of the decentralized strategies, the proportion of common


nodes with other strategies ranged from 20% to 40% for different network sizes, showing a higher diversification of the seednode selection compared to the centralized strategies.


Furthermore, independently of density, clustering coefficient, and network size; the degeneracy among centralized and decentralized strategies was low. The result for the _LD-LC_ category in


the three network sizes is shown in Fig. 4 as an example of the general pattern observed in the different network sizes and topologies. DISCUSSION This study provides a benchmark for


selecting efficient strategies for initializing interventions with spill-over effects in social networks with different structures. Our main finding is that the efficiency of each seeding


strategy depends on the network structure, particularly on the density and clustering coefficient. In general, for sparse networks with community structure, Community-hubs, which are


decentralized influencers, are suitable for increasing the spreading efficiency. By contrast, when the networks are denser, nodes in the K-core, which are centralized influencers,


outperform. We observe also that, usually, independently on the network structure, having a strategy for selecting seednodes for a spreading process is better than using random sampling.


This result is critical for providing evidence to improve commonly used random sampling methods for delivering interventions. Also, our results are coherent with studies that have shown the


importance of homophily and community structure of networks for understanding the spread and adoption of behaviors9,42. As a first result for both empirical and simulated networks, we found


that the decentralized strategy Community Hubs remained efficient for _Medium networks_ in the _LD-HC_ category. _LD-HC_ networks are likely to have a community structure, therefore


decentralized strategies allow to identify seednodes in the different communities and avoiding a potential overlap among the seednodes dyads. This leads to an increase in the coverage range


of the spreading process by taking advantage of the weak ties as spreading channels between communities43,44. The importance of avoiding overlap in networks with community structure while


selecting a seeding strategy might explain that, for the _LD-HC_ category, in _Medium_ and _Large_ networks sizes, and in both empirical and simulated networks, K-core is not an efficient


strategy. The reason is that K-core seednodes are likely to have a high number of overlapping neighbors causing a reduced coverage of susceptible nodes, at least at the initial steps of the


spreading process. Similar reasons might be suggesting that central seeding strategies, such as Closeness strategy, does not perform as well as decentralized strategies when density is low.


Central nodes have been also shown to be connected by strong ties to other network actors, increasing their overlapping relationships44. Employing decentralized strategies might be desirable


in real contexts with sparse or segregated populations. In those settings, conducting searches for identifying local leaders, Community Hubs, might be more convenient for performing direct


and indirect influence than conducting searches for identifying influential individuals at the population level45. Hence, using the Community Hubs strategy can potentiate the effect of


community-based interventions, by reinforcing individual perceptions and behavioral changes, as Community-Hubs strategy facilitates to conduct customized processes within each community


detected46. Also, Community Hubs could be used as an alternative to the recently proposed Vote-rank strategy, especially when access to the entire network data is limited or unavailable,


Vote-Rank cannot be calculated. As a second general result for both empirical and simulated networks, we found that the centralized seeding strategy Page Rank remained efficient for _Small


networks_ in the _MD-MC_ and _LD-HC_ categories. For different network sizes, K-Core seeding strategy performs efficiently when density is in the medium range, and clustering coefficient


increases, i.e. the _MD-MC_ and _MD-HC_ categories. In fact, due to the network medium density, nodes with high Page-Rank and nodes in the K-Core are likely to be directly connected to nodes


in different areas of the network. Denser networks are not likely to display community structures, and having a medium or high clustering coefficient implies that overlap among nodes is


high. Thus, decentralized strategies are not likely to add more coverage than centralized strategies. This can be evidenced by the low spreading efficiency obtained by Ambassadors and


Community hubs in the categories _MD-MC_ and _MD-HC_ for _Small_ and _Medium_ network sizes. Employing centralized strategies might be desirable in contexts with dense and cohesive


populations. In those settings, identifying global leaders for delivering interventions might be more efficient than conducting local searches in communities that are not well defined. For


simulated networks, we found that Vote-Rank seeding strategy remained efficient for the _LD-LC_ category of networks in the three network sizes. _LD-LC_ networks are likely to have a larger


shortest path length than the other topologies studied due to their low connectivity. Thus, this result is coherent with previous studies where the Vote-Rank strategy was more efficient when


the shortest path length among seednodes was larger30. Of course, this study has some limitations. First, we used a simulation-based approach to calculate the spreading efficiency of


different strategies. The above might bias the results to specific network topologies and spreading conditions. However, we aimed to build different scenarios by considering a wide range of


probabilities of contagion, number of seednodes, and networks with different topologies and sizes. Second, for simplicity, we used the susceptible-infected model for the simulations assuming


a cascade process for the contagion and different results may emerge using other spreading processes. Nevertheless, for this work it was important to compare the different strategies with


the same and most straightforward model to avoid confounding on the efficiency between the spreading process dynamics and the seednodes selection. Although, we consider that future work


should explore different spreading methods. Third, we generated networks to simulate social networks with skewed degree distributions and small-world properties41; hence, our results might


not apply to other situations where networks have other degree distributions. Identifying influential individuals for the design of interventions has been of interest to practitioners and


researchers due to its effect on delivering successful and cost-efficient interventions at the community level. Our results provide a first outlook to selecting efficient strategies for


allocating resources during behavioral interventions with spill-over effects in different contexts, and in terms of centralized and decentralized strategies. Future work should address more


detailed explanations on common features and possible causes of the different rankings at the seednodes sets level within and between centralized and decentralized strategies. METHODS We


propose a simulation-based approach for ranking ten centralized and decentralized seeding strategies for initializing a spreading process according to their spreading efficiency. First, we


conduct the ranking for five empirical networks with different topologies and sizes. Then, we simulated specific network structures to observe possible associations among structural


properties and the seednodes spreading efficiency. We categorized each one of the empirical and simulated networks within three different ranges of both density and clustering coefficient as


explained in Table 2. NETWORK CHARACTERISTICS We gather data from five empirical networks to evaluate the spreading efficiency of the seeding strategies. We calculated structural measures


of the largest connected of these empirical networks, and we listed the information regarding those measures in Table 1). We consider social networks of different contexts. (1) Spanish


physicists co-authorships network: a collaboration network built from the American Physical Society, which covers scientific collaborations between Spanish physicists between 2010 and


201235. In this network, nodes represent researchers and edges represent co-authorship. We categorized it as _LD-HC Medium network_. (2) Karnataka network: a social network built from


village 19 in Karnataka, India for the diffusion of a microfinance program conducted by the Abdul Latif Jameel Poverty Action Lab in 200612. In this network, nodes represent individuals, and


an undirected tie connects two nodes if one of the individuals reported at least one of 12 types of relationships related to trust. We categorized it as _LD-HC Medium network_. (3) Global


supply chain project network: an email network between project team roles of a global supply chain project36. This network is an approach to project management where team members belong to


different organizations of the supply chain, located in more than one geographic location and time zone, and contribute to different phases of a project. In this network, the nodes represent


team members, and directed edges represent the different emails sent and received by the project team members to coordinate and implement the different activities. We categorized it as


_LD-HC Small network_. (4) Recreovía facebook friendship network: an online friendship network of stakeholders in a physical activity program in Colombia. This program aims to promote


physical activity, health habits, and social equity through musicalized and directed group classes in Bogota, Colombia37. In this network, nodes represent Facebook friends of the Recreovia


account, and edges represent a mutual friendship between the nodes. Our research group built the Recreovia friendship network in 2016 for analyzing social cohesion emerging from the program.


We categorized it as _MD-MC Small network_. (5) School children friendship network: a friendship network of one school classroom where nodes represent children, and directed edges represent


friendship nominations38. We collected data from the Colombian site of the International Study of Childhood Obesity, Lifestyle, and Environment (ISCOLE); a collaborative study conducted in


schools of 12 countries.47 We categorized the network as _MD-MC Small network_. SPREADING EFFICIENCY FOR SEEDING STRATEGIES IN EMPIRICAL NETWORKS THE SUSCEPTIBLE-INFECTED SPREADING MODEL For


each network, we simulate the spreading process using the cascade susceptible-infected: SI model, where the spreading driver is interaction3. In this model, each susceptible node may become


infected depending on their infected neighbors31,48 and infected nodes cannot recover. At the time \(t=0\), all network nodes are susceptible except for a set of seednodes that are


infected. We consider the probability of infection \(g\) constant and equal for every infected node. At every time step, for each susceptible node, we randomly choose one of its neighbors


for interacting. If the selected neighbor is infected, then the susceptible node will become infected with a probability \(g\) and will remain susceptible with a probability \(1-g\). If the


neighbor is susceptible, nothing happens. We set the number of seednodes fixed for four proportion values: 0.01, 0.04, 0.07, and 0.10. The process is repeated for each time step until all


the network LCC is infected. We determined the spreading efficiency of each seeding strategy as the time needed to infect all the LCC of the network, starting the spreading from those


seednodes. SEEDING STRATEGIES We compared ten seeding strategies: five centralized, four decentralized, and one random for identifying seednodes based on structural properties of each


network (Fig. 2). Centralized strategies consist of selecting nodes with (a) Highest degree centrality defined as the highest number of edges adjacent to a node3. (b) Highest Betweenness


centrality defined as the highest frequency of appearance of a node in the shortest paths between all the pairs of nodes of the network3. (c) Highest Closeness centrality defined as the


lowest average shortest path length from a node to all the other nodes of the network3. (d) Highest Page-Rank defined as the highest probability that a random walker visits the node32. And


(e) nodes selected from the k-core of the network using a k-shell decomposition algorithm26,33,34. For decentralized strategies, first, we applied the Louvain algorithm to detect communities


maximizing modularity40,49. Then, we selected: (f) Nodes of detected communities with the highest external degree: Ambassador. (g) Nodes of detected communities with the highest internal


degree: Community Hub. (h) Nodes with the highest voting score calculated as the sum of the voting ability of its neighbors: Vote-Rank. The voting ability for each node in the network


represents the number of votes that the node can provide to its neighbors30. (i) the neighbor with the highest degree of randomly chosen nodes (Random Hubs). Finally, we also selected random


seednodes (Random). To build seednodes sets with equal size, for each centralized and decentralized strategy, we assigned a set of a fixed number \(s\) of seednodes equal to the number of


communities detected in each network. For each of the centralized-based seednodes, we selected the \(s\) nodes with the highest respective centrality measure. In case that several nodes had


the same centrality measure, we randomly selected the necessary number of \(s\) seednodes. For the k-core seednodes, we randomly selected \(s\) nodes in the k-core of the network. If \(s\)


was higher than the k-core size, we randomly selected the remaining nodes in the (k-1)-core. For the decentralized strategies Ambassadors and Community Hubs, we sort in descending order the


communities according to their size. Then, we selected one Community Hub or Ambassador per community. We repeated the process until \(s\) nodes were selected. For random seednodes, we chose


\(s\) nodes at random. SPREADING EFFICIENCY FOR SIMULATED NETWORKS For analyzing the relationship among the strategies of seednodes and the structure of the network, we generated 30


simulated networks for the six topologies and the three different sizes (Table 1). We used an algorithm of growing scale-free networks with tunable clustering39, so that it preserves skewed


degree distributions and small-world properties of social networks used in this manuscript41. The algorithm builds networks of a fixed number of nodes, and connects them following a


preferential attachment behavior until a desirable density is reached, as in the traditional Barabasi-Albert model50. Then, it incorporates triad formation among one of the connected nodes


of every new edge until achieving a desirable clustering coefficient. We show structural properties of each type of network in 3, where each value is the average of that measure in the 30


generated networks. After generating each network, we ran 30 times the SI spreading process, initializing from each strategy, and infecting all the network nodes. We ranked the seeding


strategies by taking into account the spreading efficiency, i.e. the time needed to infect the entire LCC of the network, obtained while infecting 30 networks, with 30 runs for each network.


For each run, we calculated the number of seeding strategies that each strategy outperformed, in terms of spreading efficiency. Then for each combination of clustering coefficient and


density, we summed the efficiency score for each strategy over the 30 runs and the 30 networks. Finally, we ranked the strategies based on the total scores obtained. Strategies in the top of


the ranking have a value of 9, meaning that they outperform the other nine strategies over the 900 instances. By contrast, the strategy at the bottom of the ranking has a value of 0,


meaning that it does not outperform any other seeding strategy. DEGENERACY COEFFICIENT AMONG SEEDNODES In order to better understand the results observed in the spreading efficiency


rankings, we evaluated the degeneracy among each pair of sets of seednodes. We define the degeneracy coefficient of two sets of seednodes (not to confound with k-degeneracy used in graph


theory) as the proportion of seednodes shared by both strategies over the total number of nodes of both strategies. Let A and B two sets of seednodes, \(Degeneracy(A,B)=| A\cap B| \)/\(|


A\cup B| \). When degeneracy coefficient equals 1 between a pair of sets of seednodes, it means that both sets contain the same nodes, while degeneracy coefficient equals 0, it means that


that both sets of seednodes are entirely composed by different nodes. REFERENCES * Valente, T. W. _Social Networks and Health: Models, Methods, and Applications_ (Oxford University Press,


2010). * Valente, T. W. Social network thresholds in the diffusion of innovations. _Soc. Networks_ 18, 69–89 (1996). Article  Google Scholar  * Barrat, A., Barthélemy, M. & Vespignani,


A. _Dynamical Processes on Complex Networks_ (Cambridge University Press, 2008). * Centola, D. The spread of behavior in an online social network experiment. _Science_ 329, 1194–1197 (2010).


Article  ADS  CAS  Google Scholar  * Christakis, N. A. & Fowler, J. H. Social contagion theory: Examining dynamic social networks and humanbehavior. _Stat. Medicine_ 32, 556–577 (2013).


Article  MathSciNet  Google Scholar  * Chen, D. B., Xiao, R. & Zeng, A. Predicting the evolution of spreading on complex networks. _Scientific Reports_ 4, 6108 (2014). Article  CAS 


Google Scholar  * Cimini, G. _et al_. Enhancing topology adaptation in information-sharing social networks. _Physical Review E - Statistical, Nonlinear, and Soft Matter Physics_ 85 (2012). *


Guille, A., Hacid, H., Favre, C. & Zighed, D. A. Information diffusion in online social networks: A survey. _SIGMOD Record_ 42, 17–28 (2013). Article  Google Scholar  * Centola, D. An


experimental study of homophily in the adoption of health behavior. _Science_ 334, 1269–1272 (2011). Article  ADS  CAS  Google Scholar  * Aral, S., Muchnik, L. & Sundararajan, A.


Engineering social contagions: Optimal network seeding in the presence of homophily. _Network Science_ 1, 125–153 (2013). Article  Google Scholar  * McPherson, M., Smith-Lovin, L. &


Cook, J. M. Birds of a Feather: Homophily in Social Networks. _Annual Review of Sociology_ 27, 415–444 (2001). Article  Google Scholar  * Banerjee, A., Chandrasekhar, A. G., Duflo, E. &


Jackson, M. O. The diffusion of microfinance. _Science_ 341, 1236498 (2013). Article  Google Scholar  * Christakis, N. A. & Fowler, J. H. Social network sensors for early detection of


contagious outbreaks. _PLoS ONE_ 5, 1–8 (2010). Article  Google Scholar  * Hunter, R. F. _et al_. “Hidden” Social Networks in Behavior Change Interventions. _American Journal of Public


Health_ 105, 513–516 (2015). Article  Google Scholar  * Kim, D. A. _et al_. Social network targeting to maximise population behaviour change: A cluster randomised controlled trial. _The


Lancet_ 386, 145–153 (2015). Article  Google Scholar  * Christley, R. M. _et al_. Infection in social networks: Using network analysis to identify high-risk individuals. _American Journal of


Epidemiology_ 162, 1024–1031 (2005). Article  CAS  Google Scholar  * He, J. L., Fu, Y. & Chen, D.B. A Novel Top-k Strategy for Influence Maximization in Complex Networks with Community


Structure. _PLoS ONE_ 10 (2015). * Zhang, X., Zhu, J., Wang, Q. & Zhao, H. Identifying influential nodes in complex networks with community structure. _Knowledge-Based Systems_ 42, 74–84


(2013). Article  CAS  Google Scholar  * Chen, D. B., Gao, H., Lü, L. & Zhou, T. Identifying influential nodes in large-scale directed networks: The role of clustering. _PLoS ONE_ 8


(2013). * Gao, C., Lan, X., Zhang, X. & Deng, Y. A Bio-Inspired Methodology of Identifying Influential Nodes in Complex Networks. _PLoS ONE_ 8 (2013). * Madotto, A. & Liu, J.


Super-Spreader Identification Using Meta-Centrality. _Scientific Reports_ 6 (2016). * de Arruda, G. F. _et al_. Role of centrality for the identification of influential spreaders in complex


networks. _Phys. Rev. E_ 90, 032812 (2014). Article  ADS  Google Scholar  * Comin, C. H. & Da Fontoura Costa, L. Identifying the starting point of a spreading process in complex


networks. _Physical Review E - Statistical, Nonlinear, and Soft Matter Physics_ 84 (2011). * Miller, J. C. & Hyman, J. M. Effective vaccination strategies for realistic social networks.


_Physica A: Statistical Mechanics and its Applications_ 386, 780–785 (2007). Article  ADS  Google Scholar  * Nowzari, C., Preciado, V. M. & Pappas, G. J. Analysis and control of


epidemics: A survey of spreading processes on complex networks. _IEEE Control Systems Magazine_ 36, 26–46 (2016). MathSciNet  Google Scholar  * Kitsak, M. _et al_. Identification of


influential spreaders in complex networks. _Nature Physics_ 6, 888–893 (2010). Article  ADS  CAS  Google Scholar  * Liu, Y., Tang, M., Zhou, T. & Younghae, D. Core-like groups result in


invalidation of identifying super-spreader by k-shell decomposition. _Scientific Reports_ 5, 9602 (2015). Article  ADS  CAS  Google Scholar  * Malliaros, F. D., Rossi, M. E. G. &


Vazirgiannis, M. Locating influential nodes in complex networks. _Scientific Reports_ 6, 19307 (2016). Article  ADS  CAS  Google Scholar  * Erkol, Ş. Castellano, C. & Radicchi, F.


Systematic comparison between methods for the detection of influential spreaders in complex networks. _Scientific Reports_ 9, 15095 (2019). * Zhang, J. X., Chen, D. B., Dong, Q. & Zhao,


Z. D. Identifying a set of influential spreaders in complex networks. _Scientific Reports_ 6, 27823 (2016). Article  ADS  CAS  Google Scholar  * Anderson, R., Anderson, B. & May, R.


_Infectious Diseases of Humans: Dynamics and Control_. Dynamics and Control (OUP Oxford, 1992). * Page, L. & Brin, S. The anatomy of a large-scale hypertextual Web search engine.


_Computer Networks_ 30, 107–117 (1998). Google Scholar  * Seidman, S. B. Network structure and minimum degree. _Social Networks_ 5, 269–287 (1983). Article  MathSciNet  Google Scholar  *


Carmi, S., Havlin, S., Kirkpatrick, S., Shavitt, Y. & Shir, E. A model of Internet topology using k-shell decomposition. _Proceedings of the National Academy of Sciences of the United


States of America_ 104, 11150–11154 (2007). Article  ADS  CAS  Google Scholar  * FajardoFontiveros, O., QuinquillaCapdevila, A. & Diaz-Guilera, A. Física y redes complejas. _Revista


Espanola de Física_ 32 (2018). * Meisel, C. _Collaborative Relationships in Supply Chain Management: A Case of Project Management Social Network Analysis_. Ph.D. thesis, ontanuniversitaet


Leoben, Leoben, Austria (2016). * Rios, A., Paez, D., Pinzón, E., Fermino, R. & Sarmiento, O. Logic model of the Recreovía: a community program to promote physical activity in Bogota.


_Revista Brasileira de Atividade Física & Saúde_ 22, 206–2011 (2017). Article  Google Scholar  * Gutiérrez-Martínez, L. _et al_. Effects of a strategy for the promotion of physical


activity in students from Bogotá. _Revista de Saude Publica_ 52 (2018). Article  Google Scholar  * Holme, P. & Kim, B. J. Growing scale-free networks with tunable clustering. _Physical


Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics_ 65 (2002). * Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of


communities in large networks. _Journal of Statistical Mechanics: Theory and Experiment_ 2008, P10008, https://doi.org/10.1088/1742-5468/2008/10/p10008 (2008). Article  MATH  Google Scholar


  * Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’ networks. _Nature_ 393, 440–442, https://doi.org/10.1038/30918 (1998). Article  ADS  CAS  PubMed  MATH  Google


Scholar  * Borge-Holthoefer, J., Baños, R. A., González-Bailón, S. & Moreno, Y. Cascading behaviour in complex socio-technical networks. _Journal of Complex Networks_ 1, 3–24 (2013).


Article  Google Scholar  * Granovetter, M. S. The strength of weak ties. _American Journal of Sociology_ 78, 1360–1380 (1973). Article  Google Scholar  * Onnela, J. P. _et al_. Analysis of a


large-scale weighted network of one-to-one human communication. _New Journal of Physics_ 9, 179 (2007). Article  ADS  Google Scholar  * Valente, T. W. Network interventions. _Science_ 337,


49–53 (2012). Article  ADS  CAS  Google Scholar  * Polk, D. E., King, C. M. & Heller, K. Community-based interventions. In _Cambridge Handbook of Psychology, Health and Medicine, Second


Edition_, 344–348 (Cambridge Medicine, 2014). * Katzmarzyk, P. T. _et al_. International study of childhood obesity, lifestyle and the environment (ISCOLE): Contributions to understanding


the global obesity epidemic. _Nutrients_ 11 (2019). Article  Google Scholar  * Saramäki, J. & Kaski, K. Modelling development of epidemics with dynamic small-world networks. _Journal of


Theoretical Biology_ 234, 413–421 (2005). Article  MathSciNet  Google Scholar  * Newman, M. E. & Girvan, M. Finding and evaluating community structure in networks. _Physical Review E -


Statistical, Nonlinear, and Soft Matter Physics_ 69, 1–16 (2004). Google Scholar  * Albert, R., Jeong, H. & Barabási, A. L. Diameter of the world-wide web. _Nature_ 401, 130–131 (1999).


Article  ADS  CAS  Google Scholar  Download references ACKNOWLEDGEMENTS We are grateful to Philip Bonacich, Jukka-Pekka Onnela, J. Gomez-Garde˜nes, and Emma Rye for their help at various


stages. FM and AMJ were funded by the FAPA grant of Universidad de los Andes, FM was also funded by The Global Health Equity Scholars Program NIH FIC D43TW010540. JDM received funding from


the Research office from the Universidad de Ibagué (project 17-466-INT). We also thank the support of Fondecyt Grant No. 1190703. ADG acknowledges financial support from MINECO via Project


PGC2018-094754-B-C22 (MINECO/FEDER,UE), OLS received funding from the National Institutes of Health from the U.S. grant number 1P20CA217199-001. AUTHOR INFORMATION AUTHORS AND AFFILIATIONS *


Department of Industrial Engineering, Universidad de los Andes, Social and Health Complexity Center, Bogotá, Colombia Felipe Montes, Ana María Jaramillo & Roberto Zarama * Facultad de


Ingeniería, Universidad de Ibagué, Ibagué, Colombia Jose D. Meisel * Departament de Física de la Matèria Condensada and Universitat de Barcelona Institute of Complex Systems (UBICS),


Universitat de Barcelona, Barcelona, Spain Albert Diaz-Guilera * Departamento de Física, Facultad de Ciencias, Universidad de Chile, Santiago de Chile, Chile Juan A. Valdivia * School of


Medicine, Universidad de los Andes, Social and Health Complexity Center, Bogotá, Colombia Olga L. Sarmiento Authors * Felipe Montes View author publications You can also search for this


author inPubMed Google Scholar * Ana María Jaramillo View author publications You can also search for this author inPubMed Google Scholar * Jose D. Meisel View author publications You can


also search for this author inPubMed Google Scholar * Albert Diaz-Guilera View author publications You can also search for this author inPubMed Google Scholar * Juan A. Valdivia View author


publications You can also search for this author inPubMed Google Scholar * Olga L. Sarmiento View author publications You can also search for this author inPubMed Google Scholar * Roberto


Zarama View author publications You can also search for this author inPubMed Google Scholar CONTRIBUTIONS F.M., A.M.J., J.D.M. conceived the study; analyses were performed by F.M., A.M.J.,


J.D.M.; A.D.G., J.A.V., O.L.S. and R.Z. provided methodological frameworks; all authors wrote the manuscript. CORRESPONDING AUTHOR Correspondence to Felipe Montes. ETHICS DECLARATIONS


COMPETING INTERESTS The authors declare no competing interests. ADDITIONAL INFORMATION PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps


and institutional affiliations. RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing,


adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons


license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a


credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted


use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Reprints and permissions ABOUT


THIS ARTICLE CITE THIS ARTICLE Montes, F., Jaramillo, A.M., Meisel, J.D. _et al._ Benchmarking seeding strategies for spreading processes in social networks: an interplay between


influencers, topologies and sizes. _Sci Rep_ 10, 3666 (2020). https://doi.org/10.1038/s41598-020-60239-4 Download citation * Received: 02 December 2019 * Accepted: 02 February 2020 *


Published: 28 February 2020 * DOI: https://doi.org/10.1038/s41598-020-60239-4 SHARE THIS ARTICLE Anyone you share the following link with will be able to read this content: Get shareable


link Sorry, a shareable link is not currently available for this article. Copy to clipboard Provided by the Springer Nature SharedIt content-sharing initiative