Complexity Crumblenaut — 2

Following is another overview of the most interesting papers, links, talks, events, etc. in network science, SNA, and complex systems research that I have come across in the last two weeks.

Dynamic Communities Identification

At present, we analyze dynamic communities using the so-called offline approach: we identify communities in each time slice of the analyzed network and then track them by matching communities across slices using some similarity measure, e.g. the Jaccard coefficient. One drawback of this approach is that the notion of community identity is not very clear. Consider a community that is continuously influenced by other communities or by newcomers until, at some point, nobody from the first time slice remains among its members. Is this community still the same one or not? Should we refer to it by the same name, or should we rather introduce a new one? In the case of a family, the name should persist, whereas in the case of a music band it should change.
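The matching step of the offline approach can be sketched in a few lines of Python. The greedy best-match strategy and the 0.3 similarity threshold below are my own illustrative choices, not a prescription from the literature:

```python
def jaccard(a, b):
    """Jaccard coefficient of two member sets."""
    return len(a & b) / len(a | b)

def match_communities(slice_t, slice_t1, threshold=0.3):
    """Greedily match each community in time slice t to its most
    similar community in slice t+1; a community with no match above
    the threshold is considered dissolved (or newly born)."""
    matches = {}
    for name, members in slice_t.items():
        best, best_sim = None, threshold
        for succ_name, succ_members in slice_t1.items():
            sim = jaccard(members, succ_members)
            if sim > best_sim:
                best, best_sim = succ_name, sim
        matches[name] = best  # None means no successor was found
    return matches
```

For example, a community {1, 2, 3} matched against a successor slice containing {1, 2, 3, 6} survives under a new label, while a community whose members all disappear gets mapped to None.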

This question actually first appeared in ancient Greece as the Ship of Theseus paradox, as Tanya Berger-Wolf et al. remark in their recent publication, in which they review several existing strategies for dynamic community identification. On top of that, they define the problem more rigorously, which enabled them to come up with a solution that considers all time slices at once. That is to say, communities identified by their algorithm are stable across time slices. What I like about the algorithm is its grounding in observations of real-world communities: the best community structure is taken to be the one that minimizes certain costs incurred by the agents involved. They assume that our motivation to be part of a community results from the interplay of the costs of switching community, visiting a community, and being absent from a community. For example, if switching communities is very costly while visiting a foreign community is cheap, an agent that is occasionally observed within another community will still remain part of its home community, because a switch is simply too expensive; in the opposite case, it will just switch its affiliation.

However, this is not the only algorithm that takes all time slices into account at once: Mucha et al. generalized modularity to a form that allows the detection of communities in time-dependent networks as well.

Efficient Computation of Group Betweenness in JUNG

Certain groups in a social network may be more important than others. In science, the usual way to determine the success or impact of a researcher is citation analysis, namely the impact factor. The success of a research group or domain can then be assessed by the impact factor of the publications supported by a grant, the number of patents, etc. The problem with the impact factor is that it doesn't take time seriously, i.e. it's static, and that sometimes impact cannot be identified by counting citations, but rather by a position in the network. A silly but still useful example: how many people cite Newton or Leibniz today? Not many, I would guess, yet their work on calculus is simply essential to a vast majority of the sciences. Such connecting elements of a network are usually revealed by inspecting their betweenness centrality, but for a group of nodes we need a generalization of this measure, which was proposed by Everett and Borgatti. Single-node betweenness in a weighted graph with n nodes and m edges can be computed efficiently in O(nm + n² log n) time by an algorithm due to Brandes. He also published a modification which computes the betweenness of a group in the same time. The problem is that this modification needs that time for every group, so if one needs group betweenness for several groups, it is better to use another algorithm, which after O(n³) preprocessing computes the betweenness of a group with k members in O(k³). Unfortunately, there is no public Java implementation of this algorithm, so last week I have been working on one for the JUNG framework. For now the code seems reasonably stable, but it still needs a bit of testing. Some optimizations are also possible, but sticking to the rule that premature optimization is the source of a large portion of bugs, I have postponed them until they become necessary.
I will offer it to the JUNG maintainers, so hopefully it will become part of some future version. For the time being, if you're interested in it, drop me an e-mail (I cannot publish it here, as WordPress doesn't allow uploading ZIP files; I'll try to find a solution for that).
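For anyone who wants a reference to test a fast implementation against, group betweenness can also be computed directly from its definition. This brute-force sketch (in Python with networkx rather than JUNG) enumerates shortest paths between all pairs outside the group and is only practical for small graphs:

```python
import networkx as nx
from itertools import combinations

def group_betweenness(G, group):
    """Group betweenness computed naively from the definition: for
    every pair of nodes outside the group, the fraction of shortest
    paths between them that pass through at least one group member,
    summed over all pairs.  Exponentially slower than Brandes-style
    algorithms, but useful as a test oracle on small graphs."""
    group = set(group)
    outside = [v for v in G if v not in group]
    total = 0.0
    for s, t in combinations(outside, 2):
        paths = list(nx.all_shortest_paths(G, s, t))
        through = sum(1 for p in paths if group & set(p[1:-1]))
        total += through / len(paths)
    return total
```

On a path graph 0-1-2-3-4, the group {2} lies on the unique shortest path of four of the six pairs among {0, 1, 3, 4}, so its group betweenness is 4.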

Complexity Crumblenaut — 1

Even though I tweet the most interesting information during the week, I have decided to also summarize the most important papers, ideas, talks, etc. I've come across. Hence I am starting this crumblenaut series, which will more-or-less regularly (hopefully weekly) explore recent fragments of complexity. It may be particularly interesting for people who do not use Twitter but want to keep an eye on what happens in the domain.

Link Communities: a Paradigm Shift in Community Detection?

Community detection is one of the hot topics in network science. In many networks, and particularly in social ones, real-world communities overlap. That is to say, a person is usually a member of more than one community: family, company, school, sports club, etc. Node-partitioning methods that force each person into exactly one community thus provide quite a constrained picture of the community structure, and several algorithms for the detection of overlapping communities have therefore been proposed. One of the first was a modification of the classic Girvan-Newman algorithm, which itself is based on the idea that suitable splitting points are edges with very high betweenness. The modification of the GN algorithm for detecting overlapping communities is based on the notion of split betweenness, the betweenness of a hypothetical edge connecting the two parts of a split node. If this value is high, the original node is probably part of two communities, and it is better to include it in both of them, i.e. to split it. In spite of this clever trick, there is IMO a more elegant way to identify overlapping communities: clustering edges instead of nodes. Whereas a node may be a member of many different communities, this is much less likely for an edge. For instance, I may be part of my family, my working group, etc., but it is unlikely that the relations connecting me to my family are the same as the links relating me to my colleagues. Lehmann et al. published an algorithm relying on this idea, in which they defined a similarity measure between edges, which can then be clustered. Evans and Lambiotte recently described another way: they first transform the original graph into a link graph, in which the nodes represent the edges of the original graph, while the edges of the link graph represent the adjacency of the edges of the original graph.
What is brilliant about this approach is that you can use any non-overlapping community detection algorithm and, by a reverse transformation, obtain the final overlapping communities of people!
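The transformation round-trip is easy to try with networkx; using greedy modularity maximization for the intermediate non-overlapping step is just one convenient choice among many:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Two triangles sharing node 2, a natural overlap candidate.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)])

# Link graph: its nodes are the edges of G, its edges record which
# edges of G share an endpoint.
L = nx.line_graph(G)

# Any non-overlapping method can partition the link graph ...
edge_communities = greedy_modularity_communities(L)

# ... and the reverse transformation yields (possibly overlapping)
# communities of the original nodes.
node_communities = [set(u for edge in c for u in edge)
                    for c in edge_communities]
print(node_communities)
```

A node whose incident edges land in different edge communities ends up in more than one node community, which is exactly the overlap we were after.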

Hierarchical Version of Infomap Community Detection Method

One of the non-overlapping community detection algorithms you may consider is Infomap by Rosvall and Bergstrom, which was evaluated as the best in a recent comparison of currently popular methods. This week they came up with an extension of Infomap which also reveals the hierarchical structure of communities. An interesting feature of their approach to community detection is that they model information flow in the network, so the resulting communities are elements which most likely interact more with each other than with elements outside their community. In my experience, Infomap is able to detect communities even when they are quite small and the data set is very noisy.

Cheating of Impact Factor

One of the motivations for our analysis of cross-community phenomena in science is the observation that the assessment of research relies too much on static citation measures like the impact factor. The impact of a research community may, however, look very different when inspected from a dynamic point of view. Another good argument against the impact factor is gracefully described by Arnold and Fowler in their recent paper Nefarious Numbers, which reveals the cheating of a couple of scientists in applied math: their citations were mostly self-referential, and as they also sat on the committees of some journals, they gained a lot of citations from papers submitted to those journals. As for me, I was thinking about how a dynamic analysis of the bibliographic network could help in the automatic detection of unfair behaviour like that. What is the pattern? I would expect the cheating authors to form a relatively disconnected community, because they cite each other intensively while hardly anybody else cites them.
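As a toy version of such a pattern detector, one could measure how insular a group of authors is in the citation graph. This metric is my own simplistic sketch, not anything from the Nefarious Numbers paper:

```python
import networkx as nx

def insularity(citations, group):
    """Fraction of all citations received by the group that come
    from inside the group itself.  A value near 1 sustained over
    time is the kind of signature the self-citation scheme above
    would produce (citations is a DiGraph; an edge u -> v means
    'paper u cites paper v')."""
    group = set(group)
    received = [(u, v) for u, v in citations.edges if v in group]
    if not received:
        return 0.0
    internal = [(u, v) for u, v in received if u in group]
    return len(internal) / len(received)
```

For a pair of authors who cite each other and receive only one outside citation, the value is already 2/3; honest, well-cited groups should sit much lower.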

Visualization is Beautiful

Visualizations of complex systems can be gorgeous. And best of all, there is a gallery of such visualizations!

Is Preferential Attachment Overcome?

The currently widely accepted mechanism behind the emergence of complex networks is preferential attachment. This paper, however, argues that it is not the degree of a node but the similarity between nodes that is the key mechanism driving the linkage of new nodes to existing ones. Consider the social network of your friends: how do you create new links? I would say it depends to a great extent on the network structure itself (a friend of my friend may become my friend as well), but also on similarity (a friend of my friend who is interested in completely different things, or holds a contradictory world-view, will probably not become my friend). Hence I think it is necessary to consider both the network structure and the nodes' attributes (i.e., similarity). Things become even more complicated when you consider common social phenomena like suggestibility, because then you are dealing with feedback loops and the whole system becomes intrinsically dynamic. On the other hand, such a model would be closer to the real world …
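A toy growth model combining both ingredients is easy to write down. The one-dimensional "interest" attribute and the degree-times-similarity attachment rule below are my own assumptions for illustration, not the model from the paper:

```python
import random

def grow_network(n, seed=0):
    """Toy network growth: each new node links to an existing node
    with probability proportional to degree * similarity, mixing
    preferential attachment (degree) with homophily (similarity of
    a random 'interest' value in [0, 1))."""
    rng = random.Random(seed)
    interest = [rng.random(), rng.random()]  # start from a single edge
    degree = [1, 1]
    edges = [(0, 1)]
    for new in range(2, n):
        x = rng.random()
        weights = [degree[v] * (1.0 - abs(x - interest[v]))
                   for v in range(new)]
        target = rng.choices(range(new), weights=weights)[0]
        edges.append((new, target))
        degree[target] += 1
        interest.append(x)
        degree.append(1)
    return edges, interest
```

Setting the similarity factor to a constant recovers plain preferential attachment, so the two mechanisms can be compared within the same skeleton.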

Do We Want Smart Cities or Smart People?

In Portugal, a new city is being constructed. Well, nothing special about that, but the architects have tried to design a sustainable and green city in which diverse resources like water, sewerage, and electricity are managed by a computer via many different sensors, so that the city will in a certain sense have a central brain. I like the idea, but what I really miss in projects like this one is an answer to where the factories producing those "clean technologies" will be located and how they fit into the sustainability paradigm. I think it is quite hypocritical to build and inhabit a green city while moving all the dirty production to China or Vietnam. More than brains for our cities, we need to start using our own brains in a way they were not designed to work: we need to grasp the planetary scale and accept that what happens beyond our natural horizon (e.g. in China) affects us, and vice versa.

Who Wants to be a Billionaire?

And who feels how on Twitter? And who has analyzed how these two questions are related? Bollen et al. analyzed Twitter streams from 2008 and showed that a sentiment analysis of those feeds can significantly improve the accuracy of predicting the Dow Jones index three days in advance. So they didn't relate the two questions directly, but who knows, maybe they are actually working on making money out of this great piece of work :-). What I found particularly interesting was that general sentiment (positive/negative) was not very helpful, whereas when they decomposed sentiment into six different dimensions, such as calmness or happiness, they found that the former is really helpful.