WebScience: The Next Big Thing or a Buzzword?

Let me start this post with a question every graduate in computer or information science is supposed to be able to answer: what is the Web? Apart from the technically correct but, in our context, useless Solomonic answer that it is a spider's tool for catching insects, we really need to answer this question if we want to study the Web. Let's accept, just for the purpose of this post, that the Web is a global communication space that uses the Internet as its medium. Note that this definition does not exclude non-HTTP communication; I will get back to this at the end of the post. Now that we have it defined, we may study it! Why? Because it has penetrated our lives to such an extent that it is advisable to know more precisely:

  • What new types of interaction and behaviour has it brought?
  • What is the relation between large-scale communication structures, such as the free software movement, social networks, or Wikipedia, and the individual motivations and actions from which these structures emerge?
  • What is the economic impact of the Web? Does it affect the ways we perceive and create wealth? Has it brought new types of utility that have never existed before?
  • Are there differences in social norms and stereotypes between the off-line and on-line worlds? Are there two different notions of privacy, friendship, … in those two worlds?
  • What are the proper scientific methods for studying the Web?

Now connecting 28.7% of the Earth's population and still growing rapidly, the Web is becoming a ubiquitous part of our culture. The study of such questions is therefore inevitable if we want to catch up in understanding it. The Science of the Web, or WebScience, has thus been pushed forward as an independent research stream by figures such as Tim Berners-Lee, Nigel Shadbolt, and Dame Wendy Hall. At the end of September, the Royal Society organized two public events related to WebScience.

The first one was a two-day event at the Royal Society in London, attended by about two hundred participants, I would guess. The speakers were mostly core figures of the disciplines regarded as influential, inspiring, and fundamental elements of what WebScience is supposed to be: network science (Albert-László Barabási, Jon Kleinberg), mathematics (Jennifer Chayes, Robert May), sociology (Manuel Castells), computer science and engineering (Tim Berners-Lee, Luis von Ahn), communication science (Noshir Contractor), and law (Jonathan Zittrain). There were more speakers, as you can see in the programme, but those listed here particularly caught my attention and stuck in my mind. Being a computer scientist working on graph mining techniques, I was especially amazed by Jon Kleinberg's presentation on the state of the art in link prediction, structural balance in graphs, and other things I surely do not remember completely, so I am looking forward to the video recordings made on site. Another great talk was given by Luis von Ahn, an excellent presenter with smart ideas that help the world digitize books or translate Wikipedia by employing millions of people (very likely including you!) without them necessarily being aware of it. Jennifer Chayes presented some advances in the mathematical apparatus for handling network data, in particular proper sampling of networks and game-theoretic approaches to modelling dynamics on social networks. Having some elementary background in political economics and an interest in it, I particularly enjoyed Bob May's talk on how models of the spread of diseases are similar to models of financial crises. I also liked his side comments on the current neo-libertarian political doctrine and its influence on the present mess. They only seemed to be of marginal importance; in fact, I would say they were essential to the whole talk.
I had been waiting the whole two days for a presentation about the Semantic Web, and finally, with Tim Berners-Lee's talk, I lived to see it. He mainly spoke about the current Linked Data project and the bottlenecks of the present Semantic Web, chiefly the lack of higher-level libraries and components for working with RDF data. It was nice to hear that, because it means there are already RDF data out there and now it's time to consume them! In fact, Tim's talk was not the first one about the Semantic Web: David Karger showed us an interesting way to produce and visualize RDF data in the browser using Exhibit. I really loved that talk, because it was a nice introduction to the rich possibilities that structured on-line data give us, without mentioning words like triple, logic, ontology, or RDF. The whole platform seems really useful for creating rich on-line presentations of mid-size data sets. All the aforementioned speakers presented in person, except Jonathan Zittrain, whose talk was transmitted on-line. His presentation had a provocative title: Will the web break? He spoke about various legal problems related to services like web archives, which operate on the edge of the law (or even illegally) because of obsolete copyright law. Another quite interesting remark concerned URL shorteners like bit.ly, which can simply cause parts of the Web to break: if they stop operating, part of the hyperlink structure will become dead. Regarding the .ly domain, Tim Berners-Lee recently tweeted about potential infringements of online human rights by the Libyan government, under whose jurisdiction this domain falls, so it is really worth thinking about which shortener to use.

The second satellite meeting was, in a certain sense, a continuation of the big discussion event in London. It was held in a lovely country house near Milton Keynes and was organized as a series of short presentations and follow-up workshops focused on several defining aspects of WebScience, such as network science, linked (government) data, and crowdsourcing. There was much more room for discussion, and people made use of it. On Wednesday evening there was also a poster session, where I presented a poster about our work on cross-community analysis. As there were only 9 posters altogether, it was a great opportunity to get feedback. I think I may say that our work was quite well received there :-). All posters are accessible either as a list or as a map. What I missed there was a dedicated block on the methodology of Web Science. At the end of this two-day event, there was a short workshop in which one group worked on methodology-related topics, but IMO this was insufficient. I think that if Web Science is supposed to be a real scientific discipline and not just a label for a bunch of loosely related topics from different disciplines with the Web as their common denominator, we really do need a common language and a common methodological toolkit: a common paradigm. I am aware that the whole discipline is still in its infancy and that this may be overcome in the future, but I think it is important to keep it in mind as the number one priority, because otherwise Web Science itself may become just a buzzword and a missed opportunity.

Now I am getting back to the beginning of this post, where I postponed the question of which Internet services we may consider part of the Web and which we may not. I think it is quite unfortunate to call this endeavour Web Science without properly distinguishing between the World Wide Web as a service relying on HTTP and the global communication space in a more general sense. If we restrict WebScience to communication realized via HTTP, we are shooting ourselves in the foot, because we are setting aside many other interesting parts of cyberspace: IRC, World of Warcraft, Usenet, e-mail, FTP, BitTorrent, … Without any doubt, the World Wide Web is the most important Internet service when it comes to communication, but it is not the only one. Things become even more complicated with some people pushing forward the term Internet Science. What is the relation between the two, Web Science and Internet Science? I have always assumed that the Internet is a set of low-level protocols, wires, routers, and other hardware whose only purpose is to transmit packets from point A to point B. In that interpretation, there is no room to investigate the actual communication process between humans and the actual impact of these processes on people's behaviour. And that is what I find most interesting about Web Science.