July 23, 2010
Does LinkedIn help or disrupt headhunters?
(I am looking for a M.Sc. student(s) to research this question for his/her/their thesis.)
The first users of LinkedIn were, as far as I can tell, headhunters (at least the first users with 500+ contacts and premium subscriptions.) It makes sense - after all, having a large network of professionals in many companies is a requirement for a headhunter, and LinkedIn certainly makes it easy not only to manage the contacts and keep in touch with them, but also allows access to each individual contact's network. However, LinkedIn (and, of course, other services such as Facebook, Plaxo, etc.) offers its services to all, making connections visible and to a certain extent enabling anyone with a contact network and some patience to find people that might be candidates for a position.
I suspect that the evolution of the relationship between headhunters and LinkedIn is a bit like that of fixed-line telephone companies to cell phones: In the early days, cell phones were welcomed because they extended the network and were an important source of additional traffic. Eventually, like a cuckoo's egg, the new technology replaces the old one: cell phones have now begun to replace fixed lines. Will LinkedIn and similar professional networks replace headhunters?
If you ask the headhunters, you will hear that finding contacts is only a small part of their value proposition - what you really pay for is the ability to find the right candidate and to make sure that this person is competent, motivated and available, and that this kind of activity cannot be outsourced or automated via some computer network. They will grudgingly acknowledge that LinkedIn can help find candidates for lower-level and middle-management positions, but that for the really important positions, you will need the network, judgment and evaluative processes of a headhunting company.
On the other hand, if you ask HR departments charged with finding people, they will tell you that LinkedIn and to a certain extent Facebook are the greatest thing since sliced bread when it comes to finding people quickly, vetting candidates (sometimes discovering youthful indiscretions) and establishing relationships. I have heard people enthuse over not having to use headhunters anymore.
So, the incumbents see it as a low-quality irrelevance, the users see it as a useful and cheap replacement. To me, this sounds suspiciously like a disruption in the making, especially since, in the wake of the financial crisis, companies are looking to save money and the HR departments dearly would like to provide more value for less money, since they are often marginalized in the corporation.
I would like to find out if this is the case - and am therefore looking for a student or two who would like to do their Master's thesis on this topic, under my supervision. The research will be funded through the iAD Center for Research-based innovation. Ideally, I would want students who want to research this with a high degree of rigor (perhaps getting into network analysis tools) but I am also willing to talk to people who want to do it with more traditional research approaches - say, a combination of a questionnaire and interviews/case descriptions of how LinkedIn is used by headhunters, HR departments and candidates looking for new challenges.
So - if you are interested - please contact me via email at self@espen.com. Hope to hear from you!
Posted by Espen at 12:21 PM | TrackBack
April 28, 2010
Stephen Wolfram's computable universe
I love Wolfram Alpha and think it has deep implications for our relationship with information, indeed our use of language both in a human-computer interaction sense and as a vehicle for passing information to each other.
In this video from TED2010, Stephen Wolfram lays out (and his language and presentation have developed considerably since Alpha was launched a year ago) where Alpha fits as an exploration of a computable universe, enabling the experimental marriage of the precision of mathematics with the messiness of the real world.
This video is both radical and incremental: Radical in its bold statement that a thought experiment such as computable universes (see Neal Stephenson's In the beginning was...the command line, specifically the last chapter, for an entertaining explanation) actually could be generated and investigated - as radical as anything Wolfram has ever proposed. The idea of democratization of programming, on the other hand, is as old as COBOL - and I don't think Alpha or Mathematica is going to provide it - though it might go some way, particularly if Alpha gains some market share and the idea of computing things in real time rather than accessing stored computations takes hold.
Anyway - see the video, enjoy the spark of ideas you get from it - and try out Wolfram Alpha. It is my best candidate for the "insert brief insightful summary research" button I have always been looking for on my keyboard.
Posted by Espen at 12:00 PM | TrackBack
March 17, 2010
Towards a theory of technology evolution
The Nature of Technology: What It Is and How It Evolves by W. Brian Arthur
My rating: 4 of 5 stars
Arthur sets out to articulate a theory of technology, and to a certain extent succeeds, at least in articulating the importance of technology and the layered, self-referencing and self-creating nature of its evolution.
The two main concepts I took away were the layered nature of technology, consisting of these three points:
- Technology is a combination of components.
- Each component is itself a technology.
- Each technology exploits an effect or phenomenon (and usually several)
Secondly, Arthur lays out, in four separate chapters, the four different ways technology evolves, as summarized on page 163 (my italics added):
There is no single mechanism, instead there are four more or less separate ones. Innovation consists in novel solutions being arrived at in standard engineering - the thousands of small advancements and fixes that cumulate to move practice forward. It consists in radically novel technologies being brought into being by the process of invention. It consists in these novel technologies developing by changing their internal parts or adding to them in the process of structural deepening. And it consists in whole bodies of technology emerging, building out over time, and creatively transforming the industries that encounter them. Each of these types of innovation is important. And each is perfectly tangible. Innovation is not something mysterious. Certainly it is not a matter of vaguely invoking something called "creativity." Innovation is simply the accomplishing of the tasks of the economy by other means.
I liked the book for its ambition, view of technology as something that evolves, and clear-headed way of thinking about and expressing a beginning grand theory. The concepts are intuitive and beguiling, but I did miss references to - and attempts to build on, or differentiate itself from - other valuable concepts of technology, such as sustaining vs. disruptive, competence-enhancing vs. competence-destroying, architectural vs. procedural, and so on. There is a lot of research going on in this area - we are about to break up the formerly black and mysterious box called innovation and show that it really comes down to subcategories and the interplay of quite understandable drivers. Arthur's contribution here is significant - but it is, at least the way I read it, the way of the independent thinker who would have a lot more influence if some of the language and some of the categories were a bit closer to, or at least distinctively positioned in relation to, what others think and say.
Posted by Espen at 10:06 PM | TrackBack
November 4, 2009
GRA6821 Eleventh lecture: Search technology and innovation
(Friday 13th November - 0830-about 1200, room A2-075)
FAST is a Norwegian software company that was acquired by Microsoft about a year and a half ago. In this class (held with an EMBA class), we will hear presentations from people in FAST, from Accenture, and from BI. The idea is to showcase a research initiative, to learn something about search technology, and to see how a software company accesses the market in cooperation with partners.
To prepare for this meeting, it is a good idea to read up on search technology, both from a technical and business perspective. Do this by looking for literature on your own - but here are a few pointers, both to individual articles, blogs, and other resources:
Articles:
- How search engines work: Start with Wikipedia on web search engines, go from there.
- Brin, S. and L. Page (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. Seventh International WWW Conference, Brisbane, Australia. (PDF). The paper that started Google.
- Rangaswamy, A., C. L. Giles, et al. (2009). "A Strategic Perspective on Search Engines: Thought Candies for Practitioners and Researchers." Journal of Interactive Marketing 23: 49-60. (in Blackboard). Excellent overview of some strategic issues around search technology.
- Ghemawat, S., H. Gobioff, et al. (2003). The Google File System. ACM Symposium on Operating Systems Principles, ACM. (This is medium-to-heavy-duty computer science - I don't expect you to understand this in detail, but note how this system differs from a normal database system: The search system is optimized towards an enormous number of queries (reads) but relatively few insertions of data (writes), as opposed to a database, which is optimized towards handling data insertion fast and well.)
- These articles on Google and others.
Blogs
- O'Reilly Radar (main page|search specific)
- John Battelle's Searchblog (main page|search specific)
- Yours truly on search (considerable duplication here).
Others
- This changes almost on a daily basis, but here is a blogpost with some pointers.
Longer stuff, such as books:
- Barroso, L. A. and U. Hölzle (2009). The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Synthesis Lectures on Computer Architecture. M. D. Hill, Morgan & Claypool. (Excellent piece on how to design a warehouse-scale data center - i.e., how do these Google-monsters really work?)
- Weinberger, D. (2007). Everything is Miscellaneous: The Power of the New Digital Disorder. New York, Henry Holt and Company. Brilliant on how the availability of search changes our relationship to information.
- Morville, P. (2005). Ambient Findability, O'Reilly. See this blog post.
- Battelle, J. (2005). The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture. London, UK, Penguin Portfolio. See this blog post.
Posted by Espen at 3:07 PM | Comments (3) | TrackBack
September 17, 2009
Our search-detected personalities
Personas is an interesting project at the Media Lab which takes your (or anyone else's) name as input and then determines our personalities based on what it finds about us on the web, generating a graphical representation. This is my result:
...which I found rather disturbing: Fame, sports and religion seem to take way too much space here. The reason, of course, is that my name is rather common in Norway, and, for example, a formerly well-known skier skews the results, even though I seem to be the most web-known person with that name.
Anyway, if you have a rare name, it might be accurate - and if your name is John Smith, you might be left with an average, possibly tilted a bit towards Pocahontas:
Anyway - try it out. You might be surprised. And please remember - this is an art project, not an accurate representation of anything...
Update September 20: I somehow forgot to point to Naomi Haque's blog post about Personas, with discussion of how social networking changes our perception of self.
Posted by Espen at 6:13 PM | Comments (1) | TrackBack
August 24, 2009
Are social networks a help or a threat to headhunters?
In a currently hot YouTube video which breathlessly evangelizes the revolutionary nature of social networks, I found this statement: "80% of companies are using LinkedIn as their primary tool to find employees". In the comments this is corrected to "80 percent of companies use or are planning to use social networking to find and attract candidates this year", which sounds rather more believable. Social media is where the young people (and, eventually, us in the middle ages as well) are, so that is where you should look.
At the same time, many of the most prolific users of LinkedIn (and, at least according to this guy, Twitter), both in terms of number of contacts and other activities, are headhunters. It is these people's business to know many people and be able to find someone who matches a company's demands.
Headhunters are the proverbial networkers - they derive their value from knowing not just many people, but the right people. In particular, headhunters that know people in many places are valuable, because they would then be the only conduit between one group and another. Your network is more valuable the fewer of your contacts are also in contact with each other.
The American sociologist Ronald S. Burt, in his book Structural Holes: The Social Networks of Competition (1992), showed that social capital accrues to those who not only know many people, but have connections across groups. Or, in other words, if everyone had been directly linked, you would have a dense network structure. The fact that we aren't means that there are structural holes - hence the term. In the picture to the right, we see a social network of 9 individuals. Person A here derives social capital from being the link between two groups that otherwise are only internally connected. A would be an excellent headhunter here. (Much as profits only can be generated if you can locate market imperfections.)
LinkedIn is a social network, indistinguishable from a regular one (i.e., one that is not digitally facilitated) except that you can search across the network, directly up to three levels away, indirectly a bit further. Headhunters like it for this reason, and use it extensively in the early phases of locating a candidate. The trouble is, LinkedIn (not to mention the tendency of more and more people having their CV online on regular websites) makes searching for candidates easy for everyone else as well. In other words - while initially helpful, is the long-term result of this searchability that headhunters will no longer be necessary?
Search technology - in social networks as well as in general - lowers the transaction cost of finding something. Lower transaction costs favor coordination by markets rather than hierarchy (or, in this case, network). Hence, the value of having a central position in that network should diminish. On the other hand, search technology (in networks in particular) allows you to extend your network, hence increase your social capital. Which effect is stronger remains to be seen.
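As a rough illustration of the brokerage idea, here is a minimal Python sketch. The network is hypothetical (the picture is not reproduced here): two fully connected groups of four, with A as the only link between them. The measure used, the share of a person's contact pairs who do not know each other, is a simplification chosen for this sketch and is not Burt's own constraint measure; all function names are assumptions.

```python
from itertools import combinations

# Hypothetical 9-person network: B-E form one clique, F-I another,
# and A is the only person with contacts in both groups.
EDGES = [
    ("B", "C"), ("B", "D"), ("B", "E"), ("C", "D"), ("C", "E"), ("D", "E"),
    ("F", "G"), ("F", "H"), ("F", "I"), ("G", "H"), ("G", "I"), ("H", "I"),
    ("A", "B"), ("A", "C"), ("A", "F"), ("A", "G"),
]


def build_neighbors(edges):
    """Turn an edge list into a dict mapping each person to their contacts."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def unconnected_share(neighbors, person):
    """Share of a person's contact pairs that are NOT directly connected."""
    pairs = list(combinations(sorted(neighbors[person]), 2))
    if not pairs:
        return 0.0
    missing = sum(1 for x, y in pairs if y not in neighbors[x])
    return missing / len(pairs)


def connected_without(neighbors, removed, start, goal):
    """Can start still reach goal if the person 'removed' leaves the network?"""
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        if current == goal:
            return True
        for nxt in neighbors[current]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


NEIGHBORS = build_neighbors(EDGES)
for person in ("A", "B", "D"):
    print(person, round(unconnected_share(NEIGHBORS, person), 2))
print("B reaches F without A:", connected_without(NEIGHBORS, "A", "B", "F"))
```

In this made-up network, most of A's contact pairs do not know each other, while D's contacts all do, and removing A cuts the two groups off from each other.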
Anyway, this should make for interesting research. Anyone out there in headhunterland interested in talking to me about their use of these tools?
Posted by Espen at 9:20 PM | Comments (4) | TrackBack
June 2, 2009
Plagiarism showcased - and a call for action
I hate plagiarism, partially because it has happened to me, partially because I publish way too little because I overly self-criticize for lack of original thinking, partly because I have had it happen with quite a few students and am getting more and more tired of having to explain even to executive students with serious job experience that clipping somebody else's text and showing it as your own is not permissible - this year, I even had a student copy things out of Wikipedia and argue that it wasn't plagiarism because Wikipedia is not copyrighted.
I suspect plagiarism is a bigger problem than we think. The most recent spat is noted in Boing Boing - read the comments if you want a good laugh and some serious discussion. (My observation, not particularly original: Even if this thing wasn't plagiarized, isn't this rather thin for a doctoral dissertation?)
The thing is, plagiarism will come back to bite you, and with the search tools out there, I can see a point in a not too distant future where all academic articles ever published will be fed into a plagiarism checker, with very interesting results. Quite a few careers will end, no doubt after much huffing and puffing. Johannes Gehrke and friends at Cornell have already done work on this for computer science articles - I just can't wait to see what will come out of tools like these when they really get cranking. I seem to remember Johannes as saying that most people don't plagiarize, but that a few seem to do it quite a lot.
It is high time we turn the student control protocols loose on published academic work as well. Nothing like many eyeballs to dig out that shallowness....
Posted by Espen at 11:48 PM | Comments (1) | TrackBack
A wave of Google
This presentation from the Google I/O conference is an 80-minute demonstration of a really interesting collaborative tool that very successfully blends the look and feel of regular tools (email, Twitter) with the embeddedness and immediacy of Wikis and shared documents. I am quite excited about this and hope it makes it out in the consumer space and does not just rest inside single organizations - collaborative spaces can create a world of many walled gardens, which is a problem for a person like me who works as much between organizations as in them.
Google Wave really shows the power of centralized processing and storage. Here are some things I noted and liked:
- immediate updating (broadcast) to all clients, keystroke by keystroke
- embedded, fully editable information objects
- history awareness (playback interactions)
- central storage and broadcast means you can edit information objects and have the changes reflect back to previous views, which gives a pretty good indication that the architecture of this system is a tape of interactions played forward
- concurrent collaborative editing (I want this! No more refreshes!)
- cool extensions, such as a context-aware spell checker, an immediate link creator, concurrent searcher
- programs are seen as participants much like humans
- easy developer model, all you need to do is edit objects and store them back
- client-side and server-side API
- interactions with outside systems
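The "tape of interactions played forward" idea from the list above can be sketched in a few lines of Python. This is only an illustration of the general technique of rebuilding state by replaying a log of edits; the event format and the function name are assumptions made for this sketch and say nothing about how Wave is actually implemented.

```python
def replay(events, upto=None):
    """Rebuild a document by playing edit events forward from an empty text.

    Each event is a tuple (kind, position, payload):
      ("insert", position, text_to_insert)
      ("delete", position, number_of_characters)
    Passing upto=n replays only the first n events (history playback).
    """
    text = ""
    for kind, position, payload in events[:upto]:
        if kind == "insert":
            text = text[:position] + payload + text[position:]
        elif kind == "delete":
            text = text[:position] + text[position + payload:]
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return text


# A hypothetical editing session, stored as a log rather than as a final text.
log = [
    ("insert", 0, "Hello wrld"),
    ("insert", 7, "o"),   # fix the typo
    ("delete", 0, 6),     # drop the greeting
    ("insert", 5, "!"),
]

print(replay(log))      # current state of the document
print(replay(log, 2))   # the document as it looked after two edits
```

Because the log is the source of truth, any earlier view can be regenerated by replaying a prefix of it, which is one way playback and history awareness could fall out of such a design.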
I can see some strategic drivers behind this: Google is very much threatened by walled gardens such as Facebook, and this could be a great way of breaking that open (remember, programs go from applications to platforms to protocols, and this is a platform built over OpenSocial, which jams open walled gardens). This could just perhaps be what I need to be able to more effectively work over several organizations. Just can't wait to try this out when it finally arrives.
From surfing the net to surfing the waves....
Update: Here is the Google Blog entry describing Wave from Lars Rasmussen.
Posted by Espen at 11:13 PM | Comments (1) | TrackBack
May 21, 2009
From links to seeds: Edging towards the semantic web
Wolfram Alpha just may take us one step closer to the elusive Semantic Web, by evolving a communication protocol out of its query terms.
(this is very much in ruminating form - comments welcome)
Wolfram Alpha, which officially launched on May 18, is an exciting new kind of "computational" search engine which, rather than looking up documents where your questions have been answered before, actually computes the answer. The difference, as Stephen Wolfram himself has said, is that if you ask what the distance is to the moon, Google and other search engines will find you documents that tell you the average distance, whereas Wolfram Alpha will calculate what the distance is right now, and tell you that, in addition to many other facts (such as the average). Wolfram Alpha does not store answers, but creates them every time. And it does primarily answer numerical, computable questions.
The difference between Google (and other search engines) and Wolfram Alpha is not so clear-cut, of course. If you ask Google "17 mpg in liters per 100km" it will calculate the result for you. And you can send Wolfram Alpha non-computational queries such as "Norway" and it will give an informational answer. The difference lies more in what kind of data the two services work against, and how they determine what to show you: Google crawls the web, tracking links and monitoring user responses, in a sense asking every page and every user of their services what they think about all web pages (mostly, of course, we don't think anything about most of them, but in principle we do.) Wolfram Alpha works against a database of facts with a set of defined computational algorithms - it stores less and derives more. (That being said, they will both answer the question "what is the answer to life, the universe and everything" the same way....)
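The fuel-economy query mentioned above comes down to a short calculation, sketched here in Python. It assumes US gallons (an imperial gallon would give a different result); the function and constant names are made up for this sketch.

```python
LITERS_PER_US_GALLON = 3.785411784  # exact by definition
KM_PER_MILE = 1.609344              # exact by definition


def mpg_to_liters_per_100km(mpg):
    """Convert miles per US gallon to liters per 100 km."""
    if mpg <= 0:
        raise ValueError("mpg must be positive")
    km_per_liter = mpg * KM_PER_MILE / LITERS_PER_US_GALLON
    return 100.0 / km_per_liter


print(round(mpg_to_liters_per_100km(17), 2))  # about 13.84
```

Note that the relationship is inverse: doubling the mpg figure halves the liters per 100 km.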
While the technical differences are important and interesting, the real difference between WA and Google lies in what kind of questions they can answer - to use Clayton Christensen's concept, the different jobs you would hire them to do. You would hire Google to figure out information, introduction, background and concepts - or to find that email you didn't bother filing away in the correct folder. You would hire Alpha to answer precise questions and get the facts, rather than what the web collectively has decided is the facts.
The meaning of it all
Now - what will the long-term impact of Alpha be? Google has made us replace categorization with search - we no longer bother filing things away and remembering them, for we can find them with a few half-remembered keywords, relying on sophisticated query front-end processing and the fact that most of our not that great minds think depressingly alike. Wolfram Alpha, on the other hand, is quite a different animal. Back in the 80s, I once saw someone exhort their not very digital readers to think of the personal computer as a "friendly assistant who is quite stupid in everything but mathematics." Wolfram Alpha is quite a bit smarter than that, of course, but the fact is that we now have access to this service which, quite simply, will do the math and look up the facts for us. Our own personal Hermione Granger, as it were.
I think the long-term impact of Wolfram Alpha will be to further something that may not have started with Google, but certainly became apparent with them: The use of search terms (or, if you will, seeds) as references. It is already common, rather than writing out a URL, to help people find something by saying "Google this and you will find it". I have a couple of blogs and a web page, but googling my name will get you there faster (and you can misspell my last name and still not miss.) The risk in doing that, of course, is that something can intervene. As I read (in this paper), General Motors, a few years ago, had an ad for a new Pontiac model, at the end of which they exhorted the audience to "Google Pontiac" to find out more. Mazda quickly set up a web page with Pontiac in it, bought some keywords on Google, and quite literally shanghaied GM's ad.
Wolfram Alpha, on the other hand, will, given the same input, return the same answer every time. If the answer should change, it is because the underlying data has changed (or, extremely rarely, because somebody figured out a new way of calculating it.) It would not be because someone external to the company has figured out a way to game the system. This means that we can use references to Wolfram Alpha as shorthand - enter "budget surplus" in Wolfram Alpha, and the results will stare you in the face. In the sense that math is a language for expressing certain concepts in a very terse and precise language, Wolfram Alpha seeds will, I think, emerge as a notation for referring to factual information.
A short detour into graffiti
Back in the early-to-mid-90s, Apple launched one of the first pen-based PDAs, the Apple Newton. The Newton was, for its time, an amazing technology, but for once Apple screwed it up, largely because they tried to make the device do too much. One important issue was the handwriting recognition software - it would let you write in your own handwriting, and then try to interpret it. I am a physician's son, and I certainly took after my father in the handwriting department. Newton could not make sense of my scribbles, even if I tried to behave, and, given that handwriting recognition is hard, it took a long time doing it. I bought one, and then sent it back. Then the Palm Pilot came, and became the device to get.
The Palm Pilot did not recognize handwriting - it demanded that you, the user, write to it in a sign language called Graffiti, which recognized individual characters. Most of the characters resembled the regular characters enough that you could guess what they were; for the others you either had to consult a small plastic card or experiment. The feedback was rapid, so experimenting usually worked well, and pretty soon you had learned - or, rather, your hand had learned - to enter the Graffiti characters rapidly and accurately.
Wolfram Alpha works in the same way as Graffiti did: As Stephen Wolfram says in his talk at the Berkman Center, people start out writing natural language but pretty quickly trim it down to just the key concepts (a process known in search technology circles as "anti-phrasing".) In other words, by dint of patience and experimentation, we (or, at least, some of us) will learn to write queries in a notation that Wolfram Alpha understands, much like our hands learned Graffiti.
From links to seeds to semantics
Semantics is really about symbols and shorthand - a word is created as shorthand for a more complicated concept by a process of internalization. When learning a language, rapid feedback helps (which is why I think it is easier to learn a language with a strict and terse grammar rather than a permissive one), simplicity helps, and so does a structure and culture that allows for creating new words by relying on shared context and intuitive combinations (see this great video with Stephen Fry and Jonathan Ross on language creation for some great examples.)
And this is what we need to do - gather around Wolfram Alpha and figure out the best way of interacting with the system - and then conduct "what if" analysis of what happens if we change the input just a little. To a certain extent, it is happening already, starting with people finding Easter Eggs - little jokes developers leave in programs for users to find. Pretty soon we will start figuring out the notation, and you will see web pages use Wolfram Alpha queries first as references, then as modules, then as dynamic elements.
It is sort of quirky when humans start to exchange query seeds (or search terms, if you will). It gets downright interesting when computers start doing it. It would also be part of an ongoing evolution of gradually increasing meaningfulness of computer messaging.
When computers - or, if you will, programs - needed to exchange information in the early days, they did it in a machine-efficient manner - information was passed using shared memory addresses, hexadecimal codes, assembler instructions and other terse and efficient, but humanly unreadable encoding schemes. Sometime in the early 80s, computers were getting powerful enough that the exchanges gradually could be done in human-readable format - the SMTP protocol, for instance, a standard for exchanging email, could be read and even hand-built by humans (as I remember doing in 1985, to send email outside the company network I was on.) The world wide web, conceived in the early 90s and live to a wider audience in 1994, had at its core an addressing system - the URL - which could be used as a general way of conversing between computers, no matter what their operating system or languages. (To the technology purists out there - yes, WWW relies on a whole slew of other standards as well, but I am trying to make a point here) It was rather inefficient from a machine communication perspective, but very flexible and easy to understand for developers and users alike. Over time, it has been refined from pure exchange of information to the sophisticated exchanges needed to make sure it really is you when you log into your online bank - essentially by increasing the sophistication of the HTML markup language towards standards such as XML, where you can send over not just instructions and data but also definitions and metadata.
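To show what "human-readable and hand-buildable" means for SMTP, here is a small Python sketch that assembles the client side of a minimal mail exchange as plain text. The commands (HELO, MAIL FROM, RCPT TO, DATA, QUIT) are standard SMTP; the host names, addresses and the function name are made up for this sketch. It sends nothing over the network and leaves out the server's replies, which a real client would wait for after each command.

```python
def smtp_client_lines(client_host, sender, recipient, subject, body):
    """Return the text a client would type in a minimal SMTP session."""
    lines = [
        f"HELO {client_host}",
        f"MAIL FROM:<{sender}>",
        f"RCPT TO:<{recipient}>",
        "DATA",
        f"From: {sender}",
        f"To: {recipient}",
        f"Subject: {subject}",
        "",  # blank line separates headers from the message body
    ]
    for line in body.splitlines():
        # A body line starting with "." gets an extra dot, since a lone "."
        # on a line of its own marks the end of the message.
        lines.append("." + line if line.startswith(".") else line)
    lines += [".", "QUIT"]
    return "\r\n".join(lines) + "\r\n"


session = smtp_client_lines(
    "client.example.com",
    "alice@example.com",
    "bob@example.org",
    "Hello",
    "This message was built by hand.\n.And this line starts with a dot.",
)
print(session)
```

Every line of the exchange can be read, and typed, by a person, which is the contrast with the terse binary encodings that came before.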
The much-discussed semantic web is the natural continuation of this evolution - programming further and further away from the metal, if you will. Human requests for information from each other are imprecise but rely on shared understanding of what is going on, ability to interpret results in context, and a willingness to use many clues and requests for clarification to arrive at a desired result. Observe two humans interacting over the telephone - they can have deep and rich discussions, but as soon as the conversation involves computers, they default to slow and simple communication protocols: Spelling words out (sometimes using the NATO phonetic alphabet), going back and forth about where to apply mouse clicks and keystrokes, double-checking to avoid mistakes. We just aren't that good at communicating as computers - but can the computers eventually get good enough to communicate with us?
I think the solution lies in mutual adaptation, and the exchange of references to data and information in other terms than direct document addresses may just be the key to achieving that. Increases in performance and functionality of computers have always progressed in a punctuated equilibrium fashion, alternating between integrated and modular architectures. The first mainframes were integrated with simple terminal interfaces, which gave way to client-server architectures (exchanging SQL requests), which gave way to highly modular TCP/IP-based architectures (exchanging URLs), which may give way to mainframe-like semi-integrated data centers. I think those data centers will exchange information at a higher semantic level than any of the others - and Wolfram Alpha, with its terse but precise query structure may just be the way to get there.
Posted by Espen at 4:07 PM | Comments (2) | TrackBack
May 18, 2009
Interesting Wolfram Alpha statistics
Here is the answer you get from entering "budget surplus" into Wolfram Alpha:
Two things I did not know: The fifth largest government surplus in the world is held by Serbia, which surprises me, given that the country has 14% unemployment and a recovering economy, according to Wikipedia. And that Japan's deficit is very close to the US', indicating that things are not as bad in the US as you might think. Or perhaps that the numbers are a bit dated, but according to the source information, most of the numbers are from 2009.
Since May 17th is Norway's national day, I think it behooves me to point out that of the five surplus states listed above, Norway is the nicest place to live, by most measures (weather, culture, politics, human rights, health care, etc. etc.). On the other hand, many of the countries with large deficits are nice places to live, so I wouldn't read too much into the economics at all...
(Hat tip to Karthik, who retweeted one of my tweets, which I misunderstood and started researching....)
Posted by Espen at 1:18 AM | TrackBack
April 29, 2009
Stephen Wolfram talk on Wolfram Alpha
Enough said, watch it. As a colleague twittered: This will change computing.
(That being said, this is very poorly filmed - there are no pictures of the screen, aside from a glimmer now and then.)
Posted by Espen at 7:24 PM | TrackBack
April 28, 2009
Notes from Stephen Wolfram webcast
These are my raw notes from the session with Stephen Wolfram on the pre-launch of the Wolfram Alpha service at the Berkman Center. Unfortunately, I was on a really bad Internet connection and only got the sound, and missed the first 20 minutes or so running around trying to find something better.
Notes from Stephen Wolfram on Alpha debut
...discussion of queries:
- nutrition in a slice of cheddar
- height of Mount Everest divided by length of Golden Gate bridge
- what's the next item in this sequence
- type in a random number, see what it knows about it
- "next total solar eclipse"
What is the technology?
- computes things, it is harder to find answers on the web the more specifically you ask
- instead, we try to compute using all kinds of formulas and models created from science and package it so that we can walk up to a web site and have it provide the answer
- four pieces of technology:
-- data curation, trillions of pieces of curated data, free/licensed, feeds, verify and clean this (curate), built industrial data curation line, much of it requires human domain expertise, but you need curated data
-- algorithms: methods and models, expressed in Mathematica, there is a finite number of methods and models, but it is a large number.... now 5-6 million lines of math code
-- linguistic analysis to understand input, no manual or documentation, have to interpret natural language. This is a little bit different from trad NL processing. working with more limited set of symbols and words. Many new methods, has turned out that ambiguity is not such a big problem once we have mapped it onto a symbolic representation
-- ability to automate presentation of things. What do you show people so they can cognitively grasp what you are, requires computational esthetics, domain knowledge.
Will run on 10k CPUs, using Grid Mathematica.
90% of the shelves in a typical reference library we have a decent start on
provide something authoritative and then give references to something upstream that is
know about ranges of values for things, can deal with that
try to give footnotes as best we can
Q: how do you deal with keeping data current
- many people have data and want to make it available
- mechanism to contribute data and mechanism for us to audit it
first instance is for humans to interact with it
there will be a variety of APIs,
intention to have a personalizable version of Alpha
metadata standards: when we open up our data repository mechanism, wn we use that can make data available
Questions from audience:
Differences of opinion in science?
- we try to give a footnote
- Most people are not exposed to science and engineering, you can do this without being a scientist
How much will you charge for this?
- website will be free
- corporate sponsors will be there as well, in sidebars
- we will know what kind of questions people ask, how can we ingest vendor information and make it available, need a wall of auditing
- professional version, subscription service
Can you combine databases, for instance to compute total mass of people in England?
- probably not automatically...
- can derive it
- "mass of people in England"
- we are working on the splat page, what happens when it doesn't know, tries to break the query down into manageable parts
300th largest country in Europe? - answers "no known countries"
Data sources? Population of Internet users. how do you choose?
- identifying good sources is a key problem
- we try do it well, use experts, compare
- US government typically does a really good job
- we provide source information
- have personally been on the phone with many experts, is the data knowable?
- "based on available mortality data" or something
Technology focus in the future, aside from data curation?
- all of them need to be pushed forward
- more, better, faster of what we have, deeper into the data
- being able to deal with longer and more complicated linguistics
- being able to take pseudocode
- being able to take raw data or image input
- it takes me 5-10 years to understand what the next step is in a project...
How do you see this in contrast with semantic web?
- if the semantic web had been there, this would be much easier
- most of our data is not from the web, but from databases
- within Wolfram Alpha we have a symbolic ontology, didn't create this as top down, mostly bottom-up from domains, merged them together when we realized similarities
- would like to do some semantic web things, expose our ontological mechanisms
At what point can we look at the formal specs for these ontologies?
- good news: All in symbolic mathematical code
- injecting new knowledge is complicated - nl is surprisingly messy, such as new terms coming in, for instance putting in people and there is this guy called "50 cent"
- exposure of ontology will happen
- the more words you need to describe the question, the harder it is
- there are holes in the data, hope that people will be motivated to fill them in
Social network? Communities?
- interesting, don't know yet
How about more popular knowledge?
- who is the tallest of Britney Spears and 50 cent
- popular knowledge is more shallowly computable than scientific information
- linguistic horrors, book names and such, much of it clashes
- will need some popularity index, use Wikipedia a lot, can determine whether a person is important or not
The meaning of life? 42....
Integration with CYC?
- CYC is most advanced common sense reasoning system
- CYC takes what they reason about things and make it computing strengths
- human reasoning not that good when it comes to physics, more like Newton and using math
Will you provide the code?
- in Mathematica, code tends to be succinct enough that you can read it
- state of the art of synthesizing human-readable theorems is not that good yet
- humans are less efficient than automated and quantitative qa methods
- in many cases you can just ask it for the formula
- our pride lies in the integration, not in the models, for they come from the world
- "show formula"
Will this be integrated into Mathematica?
- future version will have a special mode, linguistic analysis, pop it to the server, can use the computation
How much more work on the natural language side?
- we don't know
- pretty good at removing linguistic fluff, have to be careful
- when you look at people interacting with the system, but pretty soon they get lazy, only type in the things they need to know
- word order irrelevant, queries get pared down, we see deep structure of language
- but we don't know how much further we need to go
How does this change the landscape of public access to knowledge?
- proprietary databases: challenge is make the right kind of deal
- we have been pretty successful
- we can convince them to make it casually available, but we would have to be careful that the whole thing can't be lifted out
- we have yet to learn all the issues here
- have been pleasantly surprised by the extent to which people have given access
- there is a lot of genuinely good public data out there
This is a proprietary system - how do you feel about a wiki solution outcompeting you?
- that would be great, but
- making this thing is not easy, many parts, not just shovel in a lot of data
- Wikipedia is fantastic, but it has gone in particular directions. If you are looking for systematic data, properties of chemicals, for instance, over the course of the next two years, they get modified and there is no consistency left
- the most useful thing about Wikipedia is the folk knowledge you get there, what are things called, what is popular
- have thought about how to franchise out, it is not that easy
- by the way, it is free anyway...
- will we be inundated by new data? Encouraged by good automated curation pipelines. I like to believe that an ecosystem will develop, we can scale up.
- if you want this to work well, you can't have 10K people feeding things in, you need central leadership
Interesting queries?
- "map of the cat" (this is what I call artificial stupidity)
- does not know anatomy yet
- how realtime is stock data? One minute delayed, some limitations
- there will be many novelty queries, but after that dies down, we are left with people who will want to use this every day
How will you feel if Google presents your results as part of their results?
- there are synergies
- we are generating things on the fly, this is not exposable to search engines
- one way to do it could be to prescan the search stream and see if wolfram alpha can have a chance to answer this
Role for academia?
- academia no longer accumulates data, useful for the world, but not for the university
- it is a shame that this has been seen as less academically respectable
- when chemistry was young, people went out and looked at every possible molecule
- this is much too computationally complicated for the typical libraries
- historical antecedents may be Leibniz' mechanical and computational calculators, he had the idea, but 300 years too early
When do we go live?
... a few weeks
- maybe a webcast if we dare...
Posted by Espen at 11:06 PM | TrackBack
April 21, 2009
What if you could remember everything?
I was delighted when I found this video, where James May (the cerebral third of Top Gear) talks to professor Alan Smeaton of Dublin City University about lifelogging - the recording of everything that happens to a person over a period of time, coupled with the construction of tools for making sense of the data.
In this example, James May wears a Sensecam for three days. The camera records everything he does (well, not everything, I assume - if you want privacy, you can always stick it inside your sweater) by taking a picture every 30 seconds, or when something (temperature, IR rays in front (indicating a person) or GPS location) changes. As it is said in the video, some people have been wearing these cameras for years - in fact, one of my pals from the iAD project, Cathal Gurrin, has worn one for at least three years. (He wore it the first time we met, where it snapped a picture of me with my hand outstretched.)
The software demonstrated in the video groups the pictures into events, by comparing the pictures to each other. Of course, many of the pictures can be discarded in the interest of brevity - for instance, for anyone working in an office and driving to work, many of the pictures will be of two hands on a keyboard or a steering wheel, and can be discarded. But the rest remains, and with powerful computers you can spin through your day and see what you did on a certain date.
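The grouping step can be sketched roughly as follows. This is only a minimal illustration, not the software shown in the video: it assumes each picture has already been reduced to a small feature vector (for instance a coarse colour histogram), and the function names, the threshold and the toy vectors are all made up for the example.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def segment_events(features, threshold=0.8):
    """Group consecutive pictures into events.

    A new event starts whenever a picture is not similar enough
    to the one taken just before it (hypothetical threshold).
    """
    events = []
    for i, feat in enumerate(features):
        if events and cosine(features[i - 1], feat) >= threshold:
            events[-1].append(i)   # same scene, extend the current event
        else:
            events.append([i])     # scene changed, start a new event
    return events

# Toy feature vectors, invented for illustration only.
frames = [
    [0.9, 0.1, 0.0],  # e.g. hands on a keyboard
    [0.8, 0.2, 0.0],  # still at the keyboard
    [0.1, 0.1, 0.9],  # something quite different
    [0.0, 0.2, 0.9],
]
events = segment_events(frames)
print(events)
```

With a grouping like this, discarding the repetitive pictures amounts to keeping one representative picture per event.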
And here is the thing: This means that you will increasingly have the option of never forgetting anything again. You know how it is - you may have forgotten everything about some event, and then something - a smell, a movement, a particular color - makes you remember by triggering whatever part (or, more precisely, which strands of your intracranial network) of your brain this particular memory is stored in. Memory is associative, meaning that if we have a few clues, we can access whatever is in there, even though it had been forgotten.
Now, a set of pictures taken at 30-second intervals, coupled together in an easy-to-use and powerful interface, is a rather powerful aide-mémoire.
Forgetting, however, is done for a purpose - to allow you to concentrate on what you are doing rather than using spare brain cycles in constant upkeep of enormous, but unimportant memories. For this system to be effective, I assume it would need to be helpful in forgetting as well as remembering - and since it would be stored, you would actually not have to expend so much effort remembering things - given a decent interface, you could always look it up again, much as we look things up in a notebook.
Think about that - remembering everything - or, at least being able to recall it at will. Useful - or an unnecessary distraction?
Posted by Espen at 9:45 PM | TrackBack
April 17, 2009
Search and effectiveness in creativity
Effective creativity is often accomplished by copying, by the creation of certain templates that work well, which are then changed according to need and context. Digital technology makes copying trivial, and search technology makes finding usable templates easy. So how do we judge creativity when combinations and associations can be done semi-automatically?
One of my favorite quotes is supposedly by Fyodor Dostoyevsky: "There are only two books written: Someone goes on a journey, or a stranger comes to town." Thinking about it, it is surprisingly easy to divide the books you have read into one or the other. The interesting part, however, lies not in the copying, but in the abstraction: The creation of new categories, archetypes, models and templates from recognizing new dimensions of similarity in previously seemingly unrelated instances of creative work.
Here is a demonstration, fresh from Youtube, demonstrating how Disney reuses character movements, especially in dance scenes:
Of course, anyone who has seen Fantasia recognizes that there are similarities between Disney movies, even schools (the "angular" ones represented by 101 Dalmatians, Sleeping Beauty and Mulan, and the more rounded, cutesy ones represented by Bambi, The Jungle Book and Robin Hood). (Tom Wolfe referred to this difference (he was talking about car design, but what the heck) as Apollonian versus Dionysian, and apparently borrowed that distinction from Nietzsche. But I digress.)
This video, I suspect, was created by someone recognizing movements, and putting the demonstration together manually. But in the future, search and other information access technologies will allow us to find such dimensions simply by automatically exploring similarities in the digital representations of creative works - computers finding patterns where we do not.
One example (albeit aided by human categorization) of this is the Pandora music service, where the user enters a song or an artist, and Pandora finds music that sounds similar to the song or artist entered. This can produce interesting effects: I found, for instance, that there is a lot of similarity (at least Pandora seems to think so, and I agree, though I didn't see it myself) between U2 and Pink Floyd. And imagine my surprise when, on my U2 channel (where the seed song was Still haven't found what I'm looking for), a song by Julio Iglesias popped up. Normally I wouldn't be caught dead listening to Julio Iglesias, but apparently this one song was sufficiently similar in its musical makeup to make it into the U2 channel. (I don't remember the name of the song now, but remember that I liked it.)
In other words, digital technology enables us to discover categorization schemes and visualize them. Categorization is power, because it shapes how we think about and find information. In business terms, new ways to categorize information can mean new business models or at least disruptions of the old. Pandora has interesting implications for artist brand equity, for instance: If I wanted to find music that sounded like U2 before, my best shot would be to buy a U2 record. Now I can listen to my U2 channel on Pandora and get music from many musicians, most of whom are totally unknown to me, found based on technical comparisons of specific attributes of their music (effectively, a form of factor analysis) rather than the source of the creativity.
I am not sure how this will work for artists in general. On one hand, there is the argument that in order to make it in the digital world, you must be more predictable, findable, and (like newspaper headlines) not too ironic. On the other hand, there is the argument that if you create something new - a nugget of creativity, rather than a stream - this single instance will achieve wider distribution than before, especially if it is complex and hard to categorize (or, at least, rich in elements that can be categorized but inconclusive in itself.)
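To make the idea of comparing "specific attributes" concrete, here is a minimal sketch of attribute-based matching. It is not Pandora's actual method (which I do not know in detail): it simply ranks songs by plain nearest-neighbour distance to a seed song. The attribute names, the scores and the track names are all invented for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def most_similar(seed, catalog, k=2):
    """Return the names of the k songs whose attribute vectors lie closest to the seed."""
    ranked = sorted(catalog, key=lambda name: dist(seed, catalog[name]))
    return ranked[:k]

# Hypothetical attribute scores per song, in the order
# (guitar-driven, anthemic vocals, electronic beat). All values are made up.
catalog = {
    "track_a": [0.9, 0.8, 0.1],
    "track_b": [0.1, 0.2, 0.9],
    "track_c": [0.8, 0.9, 0.2],
}
seed = [1.0, 0.9, 0.1]  # the song the channel was seeded with

matches = most_similar(seed, catalog)
print(matches)
```

Note that the artist's name appears nowhere in the comparison, which is exactly why a song by an unexpected artist can end up on the channel.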
Susan Boyle, the instant surprise on the Britain's Got Talent show, is now past 20 million views on Youtube and is just that - an instant, rich and interesting nugget of information (and considerable enjoyment) which more or less explodes across the world. She'll do just fine in this world, thank you very much. Search technology or not...
Posted by Espen at 3:13 PM | TrackBack
April 2, 2009
Google edging closer to being "the new Microsoft"
A few years ago, I wrote an essay about how Microsoft had become the new IBM - i.e., the dominant, love-to-hate company of the computer industry. In this interesting article, John Lanchester discusses how Google now is stepping into that role, with its aggressive moves into making the world searchable, and a lot more than you would like findable. Interesting point:
[...] as Google makes clear, nothing short of a court order is going to stop it digitising every book in print. Google doesn’t accept that that constitutes a violation of copyright. But the company won’t even discuss the physical process by which it scans the books: a classic example of how very free it is with other people’s intellectual property, while being highly protective of its own.
This issue, in all its various forms, isn’t going to go away. Book Search, Street View and many of Google’s other offerings simply bulldoze existing ideas of how things are and how they should be done. I was highly critical of Gmail when it first came in, on the grounds that the superbly effective mail system came at the unacceptable price of allowing Google to scan all emails and place text ads. But I soon began using it, because it was free, and because it’s such good software, and because I frankly never noticed the ads.
He goes on to show how a hard disk crash and a botched backup restore left him without his documents, until it dawned on him that, yes, Gmail had them all, ready for download. So big brothers can be nice, but they are still Big Brothers...
Posted by Espen at 8:36 PM | TrackBack
March 17, 2009
Shirky on newspapers
Clay Shirky, the foremost essayist on the Internet and its boisterous intrusion into everything, has done it again: Written an essay on something already thoroughly discussed with a new and fresh perspective. This time, it is on the demise of newspapers - the short message is that this is a revolution, and saving newspapers just isn't going to happen, because this is, well, a revolution:
[..]I remember Thompson [in 1993] saying something to the effect of “When a 14 year old kid can blow up your business in his spare time, not because he hates you but because he loves you, then you got a problem.” I think about that conversation a lot these days.
[..]
Revolutions create a curious inversion of perception. In ordinary times, people who do no more than describe the world around them are seen as pragmatists, while those who imagine fabulous alternative futures are viewed as radicals. The last couple of decades haven’t been ordinary, however. Inside the papers, the pragmatists were the ones simply looking out the window and noticing that the real world was increasingly resembling the unthinkable scenario. These people were treated as if they were barking mad. Meanwhile the people spinning visions of popular walled gardens and enthusiastic micropayment adoption, visions unsupported by reality, were regarded not as charlatans but saviors.
[..]
That is what real revolutions are like. The old stuff gets broken faster than the new stuff is put in its place. The importance of any given experiment isn’t apparent at the moment it appears; big changes stall, small changes spread. Even the revolutionaries can’t predict what will happen. Agreements on all sides that core institutions must be protected are rendered meaningless by the very people doing the agreeing. (Luther and the Church both insisted, for years, that whatever else happened, no one was talking about a schism.) Ancient social bargains, once disrupted, can neither be mended nor quickly replaced, since any such bargain takes decades to solidify.
And so it is today. When someone demands to know how we are going to replace newspapers, they are really demanding to be told that we are not living through a revolution. They are demanding to be told that old systems won’t break before new systems are in place. They are demanding to be told that ancient social bargains aren’t in peril, that core institutions will be spared, that new methods of spreading information will improve previous practice rather than upending it. They are demanding to be lied to.
That simple. He draws the line back to the Gutenberg printing press and the enormous transition that caused - much more chaotic than you would think with 500 years of hindsight.
Highly recommended. And another piece of reading for my suffering students....
Posted by Espen at 4:40 PM | TrackBack
March 3, 2009
Interesting search: Oodle.com
Oodle.com is a federated search engine for classified ads - it does not (at least as far as I know) have its own ads, but acts as a portal to other ad sites, presumably in return for a share of profits.
The value created is partly from the interface technology (enter "Mercedes 450 SEL 6.9" and it knows you are looking for a car and formats the page so that you can drill down on models and years) and partly in that it accesses all kinds of local and community-based listings.
When I was looking for my used Mercedes I searched large advertisers such as cars.com and autotrader.com - but they only show their own ads. Oodle.com would have found me more cars (though not any I would have bought rather than the car I did get.) Useful because many markets are local and therefore hidden if you come from outside.
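The federated part can be sketched in a few lines. This is an illustration under my own assumptions, not a description of how Oodle.com is built: the "sites" are in-memory lists with made-up listings, and the helper names are hypothetical.

```python
def make_source(listings):
    """Wrap a list of ads as a search function (stand-in for a real ad site)."""
    def search(query):
        q = query.lower()
        return [ad for ad in listings if q in ad["title"].lower()]
    return search

def federated_search(query, sources):
    """Send the query to every source, drop duplicate ads, merge, sort by price."""
    seen = set()
    results = []
    for name, search in sources.items():
        for ad in search(query):
            key = (ad["title"].lower(), ad["price"])
            if key in seen:
                continue  # the same ad already came in from another site
            seen.add(key)
            results.append({**ad, "source": name})
    return sorted(results, key=lambda ad: ad["price"])

# Invented listings for two hypothetical ad sites.
sources = {
    "site_a": make_source([
        {"title": "Mercedes 450 SEL 6.9", "price": 9500},
        {"title": "Volvo 240", "price": 2000},
    ]),
    "site_b": make_source([
        {"title": "Mercedes 450 SEL 6.9", "price": 9500},
        {"title": "Mercedes 450 SEL", "price": 7000},
    ]),
}

hits = federated_search("450 SEL", sources)
for ad in hits:
    print(ad["price"], ad["title"], ad["source"])
```

The portal holds no ads of its own here either; each result keeps a pointer to the site it came from.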
Posted by Espen at 5:20 AM | TrackBack
February 27, 2009
Interesting search
Since I am doing research on search, I thought I would create a list of interesting search-based web sites here, with individual blog entries describing each site and why they are interesting. Here is a starting list, which, of course, will be added to as I discover more interesting sites.
Interfaces:
- Searchme.com - visual search interface reminiscent of iPod Touch album covers (or, rather, the other way around)
- New York Times - search-based editorial pages (topic pages) (conversational interface)
- Times of London - search-based editorial pages (topic pages) defined by user (conversational interface)
- Yahoo Mindset - intent-driven (or rather, intent-revealing) interface for product search. This is no longer available, but this blogpost has an explanation and a graphic of the "intent slider".
Federated search
Rich media search
- SnapTell - instant product identification from mobile photo
- TinEye - image-matching search (great service, but unfortunately the index is rather small)
- Shazam - music-matching search for mobile phones (not quite query by humming, but close...) See article in CACM.
Regionals
- Indian search engines: asklaila.com (local search)
- Chinese search engines: Baidu (a serious competitor to Google)
- Sesam.no search engine: Specializing in Norwegian content not easily available on Google, such as relationships between people.
Specialists
- OpenCalais - metadata generator, useful for understanding how machines read your text
... more to come ...
By all means - feel free to make suggestions!
Posted by Espen at 12:24 PM | Comments (4) | TrackBack
February 11, 2009
FAST Forward 2009: Notes from the third day
Bjørn Olstad: Microsoft’s vision for enterprise search
Search as a transparent and ubiquitous layer providing information and context seamlessly – from a search box (tell me what you want in 1.4 words and I will answer) to a conversational interface (giving pointers to more information and suggestions for continued searches), to a natural interface.
Demo of Microsoft Surface: Camera interface, can recognize things. Multiuser (as opposed to Apple). Showed an application built on search with touch – whenever you touch an information object a query goes towards an ESP implementation and brings up all the information available on that object.
Very impressive demo of Excel Gemini: How do you fit enterprise data into Excel. (Picture of a VW bug with a jet engine.) Pulls 100 million rows into Excel, sorts them (instantly), slices and dices. Built on top of ESP, does extreme compression, takes advantage of high memory, allows publishing of live spreadsheets to Sharepoint. Extremely impressive, worth the whole conference.
Bjørn continues talking about search as a platform: Demoing Globrix.com, where you can ask questions about apartments and houses and get a rich search experience where you can change attributes and the data changes dynamically. Globrix does not hold content themselves, but crawls available content on the web and shows it (much like Kayak.com for airline tickets).
Another demo: Search for entertainment based on location, friends and content. Moving from there to a focused movie site. This is federated search that understands some of the semantics (understands that “David Bowie” refers to a person and therefore only searches certain databases.) Also incorporates community (letting users edit the results and feed them back).
FAST AdMomentum – advertising network – has had tremendous growth.
Content analytics: How can you lay a foundation for a good search experience by focusing on data quality? Demo: Content Integration Studio, sucking out semantics from unstructured text and writing it back both to the search engine and to databases (such as an HR database).
Panel session on enterprise search
Hitachi consulting (Ellen): Very big focus on the economy now, almost all conversations are about that topic. eDiscovery is important: Looking at many sources with a view towards risk discovery and risk mitigation.
EMC consulting (Mark Stone): Natural interfaces will be important, frees up the mind to focus on the information rather than the interface. Shows a video of a small girl using the Surface table and how she very quickly starts to focus on the pictures she is manipulating rather than the interface – she completely forgets that she is working with a computer.
Sue Feldman, IDC: We have to get beyond the document paradigm. I want to see interfaces that will immerse me in the sea of information and explore it, without having to think about what application it is in.
Sue Feldman: Core issue with search: Data quality and making it a rich experience for the user. Anthropological, linguistic and cultural issues, getting people to understand both what they are seeing and what they are looking for. We are just beginning on this journey. From keyword matching and relevance ranking to pulling the user in, having a dialogue with the information. What we are seeing is hybrid systems that combine collaboration, search, analysis etc.
AMR Research: There is a religious war going on, between collaborative systems, portals, content management systems, and search. They all claim to be the answer to the problem of connecting users with their data. There is also consolidation in the market, partially driven by the economy, but there is also a consolidation of functionality and an explosion in new ideas, many small companies coming up with new ideas. No one technology is going to solve all of these problems. Lots of opportunity because Microsoft is gobbling up all these technologies, trying to provide one product that covers most (Sharepoint).
Q: Examples of interaction management?
Hitachi consulting: Best examples currently found in collaboration and community software.
EMC: There is a tool out there that searches not only blogs, but specifically the comment sections of blogs, looking for mentions of products. Do sentiment analysis, find out what the customers are saying about you.
Sue Feldman: Searching through corporate communications in lawsuit situations. Ad targeting. And what is the relationship between search and innovation?
Hitachi: Innovation comes from finding what you did not expect to find.
Q: This question always comes up: Search is a commodity – or is it? What is the current market doing for search adoption?
AMR: I am not sure who says that, there is so much room for innovation, so I can’t understand why anyone would say it is commoditized. Go out there and find the opportunities.
Sue F: Well, search is a tool, like a screwdriver. But I really need a screwdriver. The toolbox has expanded so much. I see the search market continuing to explode even though the technology is tanking. Possible that we will see a disruption with a new platform based on information management, access and collaboration.
EMC: We are seeing growth, the business will mature because companies have to focus on what the business really needs.
Sue Feldman & others: Search use awards
Customer awards:
- Best productivity advancement: Verizon Business.
- Best digital market application (I): McGraw-Hill Platts (doing industry-specific searches, 50% increase in trial subscriptions, 40% increase in revenue.)
- Best digital market application (II): SPH Search (reader interaction and content integrated with newspaper sources, federated search.)
- Social computing: Accenture (internal search on people profiles and content)
- User engagement: Kakaku.com, Japan (700m pageviews, 18m unique users)
- User engagement: AutoTrader (peak query level of 1500 qps)
Partner awards:
- Digital market solution: Comperio (use of search for user interaction)
- Social computing solution: NewsGator (enterprise social computing on top of Sharepoint)
- User experience solutions: EMC Consulting
- Partner of the year: Hitachi consulting.
Posted by Espen at 7:08 PM | TrackBack
February 10, 2009
FASTForward 2009 – impressions from the second day
The second day has less of the "big picture" and more of product announcements and more technical detail. Here are some notes as the day progresses:
Kirk Koenigsbauer, Microsoft: Our enterprise search vision & roadmap
Kirk is responsible for the business side of FAST after the acquisition. He is speaking on Microsoft's commitment to search, the roadmap and future business directions, including pricing.
About 15% of the research done in MS Research is search-oriented. 10 years support on current FAST products, even non-MS platform.
Search server express now has more than 100,000 downloads. 1/3 of MS enterprise customers have deployed a MS search solution. Partner #s have doubled.
MS vision: Create experiences that combine the magic of software with the power of Internet services across a world of devices. Search is integral to vision.
Demo: Use of search in a business setting, showing documents in a viewer format, extracting keywords and concepts.
Announcing two new products:
- FAST for Sharepoint, which is FAST ESP integrated into Sharepoint, available at a substantially lower price than FAST ESP, typically 50% lower price. Simpler pricing model: Per-user charge for FAST ESP standalone, included in Sharepoint. Still need to buy a server at 25K a pop, but this is substantially lower price. Will be available from next rollout of Office (wave 14). Will also provide a licensing bridge for those who purchase Sharepoint now.
- FAST Search for Internet business. New functionality for interaction management (promotions, campaigns etc.), Content Integration Studio (graphical interface for managing content restructuring and content integration), and simplified licensing: Language pack and connectors will be part of the standard package.
Valentin Richter, Raytion: User engagement
Low satisfaction with many search solutions, and 70% of search managers do not study search logs with an eye to improving the experience. Went through a list of common myths about search (such as "people know what they are looking for".) People want simplicity - they cannot handle expressions and need more of a drill down approach navigating through related information. Installing search platforms immediately leads to a focus on information quality: You find duplicates, you find confidential documents everywhere, and so on - be ready for it both in a technical and organizational sense.
Walton Smith, Booz Allen Hamilton: Case study of use of FAST and Sharepoint
BAH based in Virginia, traditionally centralized, but expanding. 300 partners, all wanting to go in different directions. De facto collaboration tool was Outlook. Created a social computing platform called hello.bah.com. Among the results: Have given access to more esoteric material, which caused issues with indexing. Were able to pull new people from other parts of the organization on a project. Other application: staffing.bah.com, finding people with the right credentials and experience, pulling information from many sources. search.bah.com crawls hello and iShare. About 1/3 of the firm is now using the platform, lots of information on individuals.
Charlene Li: Transformation based on social technologies
It is all about engaging users in dialogue: H&R Block has a page on Facebook where they discuss tax issues - not trying to pull people in, at least not explicitly. Comcast is on Twitter with their customer service people. Starbucks testing ideas, such as automated purchasing based on a customer card. Beth Israel's CEO blogs about what it is like to run a hospital. Necessary to change search to include social software: Technorati searches blogs, del.icio.us allows social bookmarking. You can use Twitter mapping to see what people are discussing - showing that what is rated high somewhere may not be what is most discussed. Amazon now lets you filter reviews by friends.
Conclusion: Social networks will be like air, and will transform companies from the outside in. Social media is impacting search at multiple levels, refining results based on personalization details derived from their social circles.
Jørn Ellefsen, Comperio: In search of profits
Comperio has more than 100 customers and has created a front application, Comperio Front, that sits between the customer's web pages and their search engine. Introduced Drew Brunell, who works with SEO for, among others, News International. Paid search is the growing part of the advertising market, everything else is either flat (display ads) or sinking (traditional ads). Doing a lot of experimentation linking into customer behavior - for instance, matching content with areas that see a lot of comments, "invisible newspapers". Another notion is the "curated content model", setting up pages with a blend of original content and stuff from the outside web. Topic pages based on "zero-term search", offering editorial content put together automatically around. Stefan Sveen, CTO Comperio, demonstrated topic pages from Times Online: Users and journalists can create their own topic pages, based on search results, and mark entries coming in after the page is created.
Venkat Krishnamoorthy, Thomson Reuters: Delivering Contextual and Intelligent Information to Premium Customers
Reuters delivers context-sensitive information for pre-investment analysis to premium customers. They have done this for a long time, but want to change from being a data-delivery company to being integrated into the user's workflow. Challenges here included customers having too many applications that they needed to stitch together, and difficulty finding information, especially across different kinds of assets - more than 40 content databases. Solution: Put in a search and navigation layer between their desktop products (they have two, a web-based one and a premium, client-based one).
Posted by Espen at 6:15 PM | TrackBack
November 25, 2008
Self-diagnosis by search engine leads to cyberchondria
I love it. Here is the NY Times article, here is the Microsoft research paper.
Let's see: Search for "lack of ability to concentrate because of Bloglines"....
Posted by Espen at 1:06 PM | TrackBack
November 5, 2008
Liveblogging from Sophia Antipolis
These are my running notes from visiting Accenture's Technology Labs in Sophia Antipolis, as part of a Master of Management program called "Strategic Business Development and Innovation" for the Norwegian School of Management.
Accenture's Technology Labs is a relatively small organization: 200 researchers, out of 180,000 employees in Accenture. There are four tech labs: Silicon Valley, Chicago (the largest), Sophia Antipolis, and Bangalore. They should all be able to do everything, but in practice there is specialization. The four main activities of the tech labs are technology visioning, research, development of specific platforms, and innovation workshops (with clients, press, consultants etc.) The themes pursued are mobility and sensors; analytics and insight; human interaction & performance; Systems Integration (architecture, development methods); and infrastructure (virtualization, cloud computing).
Kelly Dempski: Power Shift: Accenture Technology vision
The visioning used to be far-thinking, visionary etc., now have a much more immediate focus, want to look at things that you can implement today, make it much more "grounded in reality"
Eight critical trends:
- 1: Cloud computing and SaaS: Hardware cloud (amazon.com, IBM, Google (now the third largest producer of servers in the world)), desktop cloud (Google, Zimbra, MS Office Live Workspace), SaaS cloud (Netsuite, CrownPeak, salesforce.com), and services cloud (Google Checkout, Amazon web services, eBay, Yahoo)
- examples: Flextronics has changed over their HR applications to an SaaS model. AMD emulates chips in software for testing purposes, now contracts with Sun to do that in the cloud. New York Times had 4 TB of articles that they wanted to convert to PDF: someone went on Amazon with their credit card, uploaded the 4 TB and processed it (24h); there was a bug, so they had to do it all again (48h in total), for a total cost of $250 on someone's credit card.
- issues:
- data location (where is the data)
- privacy and security
- performance
- 2: Systems - regular and lite
- SOA as the integration paradigm (regular), mashups (lite)
- traditional back-end apps vs. end-user apps
- small number of apps maintained by CIOs vs. large number of User and user-group created applications (long tail)
- examples:
- REST is a light architectural approach for interoperability & data extraction
- Mashups (JackMe (trading platform tools), Serena, Duet (SAP and Microsoft), IBM) becoming more important in the enterprise arena
- Widgets and gadgets are light-weight desktop UIs that continually update some data
- 3: Enterprise intelligence at scale
- combination of internet-scale computing, petabytes of data, and new algorithms
- almost all the large systems vendors have partnered with or acquired some analytics oriented software company (such as Microsoft acquiring FAST)
- rampant use of data: evolution through access, reporting, external & internal, unstructured etc.
- Trends 1-2-3 together: The new CIO
- hardware and software procured from the cloud
- business units, end-users create their own lightweight apps
- The new CIO:
- "Data Fort Commander" - ensure security, privacy, integrity of corporate data and manage back-end apps
- "Chief Intelligence Officer" - provide data analysis services & insights to business units
- 4: Continuous access
- mobile device "first class" IT object
- No concept of enterprise desktop/laptop
- location-based services
- 5: Social computing
- amplify and support the value of the community
- three major directions: Platformization, inter-operability, identity management
- 6: User-generated content
- community-based CRM (users making videos about how to run certain kinds of software or build something from IKEA)
- new forms of entertainment
- revenue erosion of traditional media companies
- this has marketing implications: You can measure the sentiment out there in the user community. You switch from advertising to engaging.
- 7: Industrialization of software development
- converging trends will increase integration: Predictive metrics, model-driven development, domain-specific languages, service-oriented architecture, agile-development & Forever Beta.
- 8: Green computing
- global warming, energy prices, consumer pressure, compliance and valuation
- switch out energy-intensive processes for information-intensive processes: Electronic collaboration; Warehousing, supply chain & logistics optimization; Smart factories, plants, buildings & homes; and new businesses such as carbon auditing and trading
Cyrille Bataller: Biometric Identity Management
Biometric identification is coming, driven by increasing demand and technological progress. Biometric identification is defined as "automated recognition of individuals based on their physiological and/or behavioral characteristics". Physiological can be face, iris, fingerprint; behavioral can be signature, voice, or walk. Involves a tradeoff, as with all security systems, between the level of security and the convenience of the system. Fingerprint is most used (38%), face is the most natural, iris the most accurate. Many others: Finger/hand vein, gait, ear shape, electricity, heat signature, hand geometry and so on...
There is a balance between FMR (false match rate, i.e., false positive identification) and FNMR (false non-match rate); the point where the two are equal is called the equal error rate (EER). Iris has an EER of .002%, 10 fingerprints .01%, fingerprint .4%, signature 3%, face recognition 6%, voice 8%. Many parameters in addition to this.
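To make the FMR/FNMR tradeoff concrete, here is a minimal sketch of how an equal error rate could be estimated from comparison scores. This is my own illustration, not something shown in the talk: the function names and the tiny score lists are hypothetical, and a real evaluation would use far more samples.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FMR, FNMR) at a threshold; a score >= threshold counts as a match."""
    # FMR: share of impostor comparisons wrongly accepted as matches.
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    # FNMR: share of genuine comparisons wrongly rejected.
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    return fmr, fnmr


def equal_error_rate(genuine, impostor):
    """Scan the observed scores as thresholds; return (threshold, eer) where FMR and FNMR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        fmr, fnmr = error_rates(genuine, impostor, t)
        gap = abs(fmr - fnmr)
        if best is None or gap < best[0]:
            best = (gap, t, (fmr + fnmr) / 2)
    return best[1], best[2]


# Hypothetical similarity scores (made up for illustration only).
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]    # same person compared with themselves
impostor_scores = [0.1, 0.2, 0.3, 0.5, 0.65]   # different people compared

threshold, eer = equal_error_rate(genuine_scores, impostor_scores)
print(threshold, eer)
```

Raising the threshold lowers the FMR (more security) and raises the FNMR (less convenience), which is the tradeoff described above; the EER is simply a single number for comparing systems at the point where the two error rates meet.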
Securimetrix has something called HIIDE, a mobile unit that does a number of biometrics, used in Iran. Voice is very interesting because it can be done over the phone, interesting for call centers, banks etc. Multimodal important, because it is hard to spoof.
Airports are a good example of what you can do with proper identification: You can move 99.9% of the check-in away from the airport. Bag drop can also be almost fully automated. Portugal is the leader in the EU, and has automated passport control with facial recognition (scan, use electronic passport etc.). Most people are not very concerned with privacy, given some assurance and convenience. Likely to see lots of automated border clearance for the masses, but also registered travelers that go through even quicker and are interoperable across many airports. One common misunderstanding is that automated identity checking is moving away from 100% accuracy, but human passport/security control is an error-ridden process, and automated processes are mostly more accurate.
Antoine Caner: Next Generation Branch
This is a showcase exhibit of best practice banking technology and processes. This showroom receives about 40 company visits (banks, mostly) per year.
Most banks have a multi-channel strategy. They have moved away from a strategy of getting rid of branches, and instead want to redefine the branch. Rather than doing low-value transactions, the branches are seen as a mesh network for business development.
Key principles behind the branch of the future:
- generating and taking advantage of the traffic
- flexibility throughout the day
- adaptation to client's value
- sell & service oriented
- modular space according
- entertaining and attractive
- focused on customer experience
Examples:
- turning the branch windows into an interactive display (realty, for instance)
- Bluetooth-enabled push information
- swipe card at entrance to let branch know you are there, let your account manager know, apply Amazon-like features
- digital displays for marketing
- avatar-based teller services
- biometric-based ATMs to allow for more advanced transactions, as well as more opportunistic sales applications
- do both identification and authentication
- digital pen user interface for capturing data from forms
- RFID-based or NFC (Near Field Communication) in brochures, swipe and get info on screen
- "interactive wall" for interaction with clients in information seeking mode
- visual tracking of movement in the branch
- modular office that can change shape during the day, reconfigurable furniture
What impressed me was not the individual applications per se - though they were impressive - but the way everything had been put together, with a back-office application that can be used by the branch manager to track how this whole customer interface (i.e., the whole bank branch) works.
Alexandre Naressi: Emerging Web Technologies
Alexandre leads the rich Internet applications community of interest within Accenture. He started off giving some background on Web 2.0 and used Flickr as an example of a Web 2.0 application, where a company uses user-generated content and tagging to get network effects on their side. Important here is not only the user interface but also having APIs that allow anyone to create applications and to have your content or services embedded into other platforms. Dimpls is another example. More than one billion people have Internet access, 50% of the world has broadband access, which allows for richer applications. Customers' behavior is changing - it is now a "read-write" web. It has also gotten so much cheaper to launch something: Excite cost $3m, JotSpot $200k, Digg cost $200.
Rich Internet Applications and Social Software represent low-hanging fruit in this scenario. RIA allows the functionality of a fat client in a browser interface, with very rich and capable components for programmers to play around with.
Two families of technologies: JavaScript/Ajax (doesn't require a plugin, advocated by Google), and three different plugin-based platforms: Silverlight (Microsoft), Flash/Flex from Adobe, and JavaFX from Sun. All of them have offline clients that can be downloaded as well. A good example is Searchme.com, which gives a better user interface - Accenture has developed something similar for their internal enterprise search.
Social Software: Accenture has its own internal version of Facebook. Youtube is also a possible corporate platform where people can contribute screencasts of all kinds of interesting demos and prototypes.
Kirsti Kierulf: Nordic Innovation Model for Accenture and Microsoft
Accenture and Microsoft collaborating (own a company, Avanade, together), and have set up an Innovation lab in Oslo called the Accenture Innovation Lab on Microsoft Enterprise Search. Three agendas: Network services, enterprise search (iAD), and service innovation. Running a number of innovation processes internally. This happens on a Nordic level, so collaboration is with academic institutions and companies all over.
Have made a number of tools to support innovation methodologies: InnovateIT, InnovoteIT, and InnomindIT (mind maps), as well as a method for making quick prototypes of systems and concepts for testing and experimentation: 6 weeks from idea to test.
Current innovation models are not working for long-term, risky projects. Closed models do not work - hence, looser, more informal and open innovation models with shorter innovation cycles. Pull people in, share costs throughout the network, try to avoid the funnel which closes down projects with no clear business case, and avoid NIH. Try to park ideas rather than kill them.
Important: Ask for advice, stay in the question, maintain relationships, don't spend time on legalities and financials.
Posted by Espen at 11:14 AM | Comments (2) | TrackBack
October 24, 2008
CACM becomes much more readable
CACM (Communications of the ACM) is one of my favorite journals - and it is currently in the throes of an editorial upheaval that I think is very positive. In addition to scholarly articles, it is moving in the direction of essays and more generally accessible articles, without loosening the quality criteria. Ever since BYTE disappeared (a victim of the need for targeted advertising) I have missed a general, quite technical yet accessible journal - CACM is now getting closer to what I am looking for.
Here are two articles I found very interesting:
- "Will the Future of Software be Open Source?", a well-reasoned reflection by Martin Campbell-Kelly, giving a very terse, yet comprehensive and useful description of the evolution of software markets. Answer: OS is a tempting conclusion if you extrapolate, but extrapolation has not been a very successful prediction technique so far...
- "Searching the Deep Web", by Alex Wright, which explores two different approaches to searching beyond static web pages - the trawling approach, which relies on local storage, and the angling approach, which produces targeted results in real time.
Posted by Espen at 3:12 PM | TrackBack
October 1, 2008
Search Google from 2001

Google has resurrected their oldest available index (from 2001) - fun to search for "blogging", "wikipedia", "social software", "web 2.0" and "facebook".
Posted by Espen at 4:54 AM | TrackBack
September 22, 2008
Small firm, large firm, we are all equal now
Hal Varian has a good post on the democratization of data over at the Google blog - in short, that small firms now can access information and analysis (including consultants) much like large firms can.
My interpretation: Information access is now close to free. What you now need is understanding. That takes people, and if you can access the smart ones in person as well as their explicated output, you will do well.
Posted by Espen at 10:26 AM | TrackBack
September 11, 2008
One danger of search-collected newspapers
United Airlines' share price dropped 76% when Google News erroneously picked up a six-year old story about UAL filing for bankruptcy and pushed it to the front page.
Not that this couldn't happen in any newspaper, but Google News is automatically generated. This opens for interesting possibilities in pump-and-dump....
Posted by Espen at 11:02 AM | TrackBack
September 4, 2008
Big data in Nature
Nature (the magazine) has an excellent special report on big data, with articles on analysis, history, data centers, and much more. Best of all, it is freely available - enjoy!
Posted by Espen at 7:28 PM | TrackBack
August 28, 2008
IAD center opening
Monday was exciting - not only was it the Fall workshop for the iAD Center for Research-based Innovation, but it was also the opening of the iAD Lab [Norwegian language story here] - a physical manifestation of the research project, as well as an important tool for drawing the researchers from the five Oslo-based participants (FAST, Accenture, Schibsted, UiO and BI) closer together.
Myself, I plan to spend at least one day per week in the lab - there is nothing like physical proximity to get to know an organization and a field, notwithstanding all the communications capabilities, electronic and otherwise, we surround ourselves with.
The lab itself, incidentally, is just six workspaces, a few computers and access cards for researchers. Gone are the days when the opening of a computing center was photogenic, with blinking lights and spinning tape decks. But it will enable us to store sensitive data in a secure environment, have enough horsepower to really analyze them, and provide a natural focal point for demonstrations, prototypes and experiments.
Posted by Espen at 10:32 AM | TrackBack
July 11, 2008
Serendipity, researchwise
Mary B. has this account of finding interesting material bound with another book from the library - and then discovering that all the stuff was available through Google Booksearch. Which raises the point - how do we recreate the serendipity often found in research (go into any library and look at the books next to the one you are looking for) in an electronic context?
Online newspapers (as well as domain squatters) face this challenge every day - not just serving what the customer wants, but also something they didn't know they wanted, often sufficiently similar that it may be, if not a substitute, at least a diversion.
Perhaps Google should have a new subcategory on their result screen - an appropriately random link under the heading of "and now, for something completely different..."
Posted by Espen at 2:56 PM | TrackBack
July 2, 2008
Google and network externalities
Here is a bunch of links about Google that I have had lying around for a while - trying to think about the first one and to what extent Hal Varian is right about Google not having a network externality competitive advantage. I think he is wrong, but why is hard to articulate.
So, here goes (note that Google, rather nicely, includes a list of links to each blog post, which is fodder for further discussion):
- Hal Varian: Our secret sauce, arguing that Google's competitive advantage is due to experience and innovation, not network externalities.
- Tom Evslin: Sitemaps and how the rich get richer: Essentially, Google has an advantage because they are the biggest and people adjust their web sites to the Google engine and its various algorithmic quirks.
- Hal Varian: Why data matters. Brief overview of search and PageRank.
- Hal Varian: How auctions set ad prices. Brief explanation of Google's auction system for ads. One interesting effect, not mentioned here, is that the more precisely the user can describe the targeted population, the lower the ad price - thus, Google has both an incentive to make targeting imprecise (to have enough actors competing for a particular keyword/target) and an incentive to make it precise (to increase click rates).
- Marissa Mayer: A peek into our search factory. Various presentations, with notes, about the infrastructure underlying Google's various offerings.
- Udi Manber: Introduction to Google search quality. Overview of what Google does to fight spam, increase precision, and other things. (Reads like a transcript of a talk.)
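The targeting effect I mention under the auction post above can be illustrated with a toy version of a generalized second-price auction, where each winner pays the bid of the advertiser ranked just below. This is a heavily simplified sketch under my own assumptions: the function name, advertisers and bids are hypothetical, and real ad auctions also involve things like quality scores and reserve prices, which are left out here.

```python
def second_price_auction(bids, slots):
    """Toy generalized second-price auction.

    bids: dict mapping advertiser name -> bid per click.
    slots: number of ad positions available.
    Returns a list of (advertiser, price_per_click), best slot first.
    Each winner pays the next-highest bid, or 0.0 if nobody is ranked below.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    results = []
    for position in range(min(slots, len(ranked))):
        advertiser = ranked[position][0]
        if position + 1 < len(ranked):
            price = ranked[position + 1][1]
        else:
            price = 0.0  # no competitor below; a real system would apply a reserve price
        results.append((advertiser, price))
    return results


# Hypothetical bids: a broad keyword attracts several bidders...
broad = {"shoe_shop": 2.0, "sports_chain": 1.5, "outlet": 1.0}
# ...while a very precise target may attract only one.
narrow = {"shoe_shop": 2.0}

print(second_price_auction(broad, 2))   # [('shoe_shop', 1.5), ('sports_chain', 1.0)]
print(second_price_auction(narrow, 2))  # [('shoe_shop', 0.0)]
```

With the same top bid, the price paid falls when fewer advertisers compete for the target, which is the incentive to keep targeting imprecise that is noted above.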
Here are two articles that everyone trying to understand Google should read (come to think of it, this blog post is starting to resemble the layout for a class):
- Brin, S. and L. Page (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. Seventh International WWW Conference, Brisbane, Australia. (The classic on PageRank.)
- Ghemawat, S., H. Gobioff, et al. (2003). The Google File System. ACM Symposium on Operating Systems Principles, ACM. Description of the architecture of Google's index, a file system geared for few writes and very many reads, redundancy, and low response time. PDF here.
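For readers who want a feel for the PageRank idea before reading the Brin and Page paper, here is a minimal sketch of the iterative computation on a tiny link graph. It is an illustration only: it uses the common normalized variant (ranks sum to 1), the three-page graph and the function name are hypothetical, and every link target is assumed to also appear as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank.

    links: dict mapping each page to the list of pages it links to.
    Every linked-to page must also be a key in the dict.
    Returns a dict of page -> rank, with ranks summing to (approximately) 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random surfer" jump).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

Page c ends up on top because it collects links from both other pages, and a comes second because it receives all of c's rank: a link from an important page is worth more than a link from an unimportant one.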
Posted by Espen at 10:21 AM | TrackBack
June 3, 2008
retrogoogling
Here is goosh.org, a UNIX-like interface to Google. I like it - wonderful how a sparse interface can improve productivity. It almost makes me long for the days of Desqview and all those other text-based multitasking hacks of the 90s. (Mind you, this is just an interface, not a full Unix shell.)
(Via David Weinberger.)
Posted by Espen at 9:38 AM | TrackBack
March 28, 2008
Email vs Wiki
This picture really says it all:

(From Chris Rasmussen via Anthony Williams. Apologies for repeat to my BSG Alliance colleagues, but this one is definitely one for a wide audience.)
Posted by Espen at 4:43 PM | TrackBack
March 27, 2008
Google as Baedecker
The Honourable Mr Whimsley has the skinny on Google, likening the service to a path guide that conserves the paths through its very existence.
Whimsley Hall to the blogroll, instantly.
(Via Nick Carr.)
Posted by Espen at 10:45 AM | TrackBack
March 14, 2008
iAD Master students wanted!
(This message is meant for students at the Norwegian School of Management, but I am posting it here for distribution - and if someone from another institution should be interested, by all means, get in touch.)
The Center for Technology Strategy is seeking M.Sc. students who are looking for interesting topics for their thesis, offering the opportunity to write their thesis under the iAD research project. This project is a joint project of FAST, Accenture, Schibsted and six universities, among them NSM. The purpose of NSM's part of the project is to understand the business impact of search technologies and other new technologies for information access.
This opportunity is open to all Master students, at any specialty, and would involve finding a research topic connected to the iAD project. (See a list of proposed topics here, but feel free to come up with your own.) The topic definition will happen in collaboration with faculty from your M.Sc. specialty. Thesis advisor will be either your own faculty, one of the faculty associated with the iAD project (Espen Andersen, Ingunn Myrtveit, Erik Stensrud, Torger Reve), or possibly an advisor from FAST, Accenture or Schibsted, as appropriate.
We are planning an information meeting on
April 2, at 0900-1030 at room C2-040, BI Nydalen
If you are interested, please send me an email so I can know how many will be there.
Updated March 25: A list of some suggested topics can be found here.
Posted by Espen at 11:34 AM | TrackBack
March 13, 2008
Turtlenecks will come back, I predict
...if this technology ever moves from lab to street:
(The reason I am skeptical, is that this is kind of similar to voice recognition and other language processing technologies, which have not had a profound impact on our daily lives yet. Though they certainly have affected standard transactions as well as communication analysis.)
Via Marc Andreessen.
Posted by Espen at 1:17 PM | TrackBack
March 12, 2008
Search me
Searchme is an interesting new search engine with a very smart way of displaying results. Does require broadband, though. (Via John Battelle.)
Posted by Espen at 9:15 AM | TrackBack
February 19, 2008
Don Tapscott on Wikinomics
(Second installment in a series of Notes from FastForward 2008)
Don Tapscott: Wikinomics – setting the stage
Don started by saying that this is not new: Time’s Person of The Year was you, and that is soooo 2006. Mass collaboration changes everything. Buy my book! Now, seriously…
Companies are becoming more professional and peer-oriented, less hierarchical, more meritocratic. This is not new either – Paradigm Shift said this in 1991, and Peter Drucker has said it for a long time before that. Why is it taking so long? The drivers have been missing, but are here now:
Four drivers of change:
1: Technology, particularly 2.0 technologies: Things talking to each other – one friend has his sprinkler system coupled to his intrusion system, in case a burglar jumps over the fence. In the new world, you browse the physical world. GPS allows not just positioning, but movement. True multimedia changes what a film is. New web based on XML, the web is becoming a global computational platform. In some ways, search becomes the new operating system. But legacy systems exist and the integration problem will not go away quickly.
2: The net generation: We have this generation that is not afraid of technology because for them, it has always been ubiquitous. We have had boom, bust and echo in demographics, but the echo is larger than the boom – Asia and South America have a tsunami coming along. These kids multitask, don’t use the TV, they are very active with collaborative technology, games and search. Their synaptic connections are actually different, since they have had this during their formative years. They use email technology to send a formal letter of thanks to a friend’s parents.
3: A social revolution: The rise of collaborative communities. XML has overtaken HTML: Flickr beats Kodak, YouTube beats MTV. MySpace has 15,000 bands….. His son created a Facebook group on Wikinomics that exploded and is now placing demands on him….
4: An economic revolution: You are getting new companies: Digital conglomerates. Google is the fourth largest broker of hardware in the United States. Microsoft, Yahoo, Google, amazon.com, ebay – these are not some blips. Coase: Transaction costs are really the costs of coordination and contracting. From industrial companies to extended enterprises to business webs, and now we will have mass collaboration. Example: Goldcorp, a mining company ready to be shut down, because the geologists cannot find gold. So they put their geological data on the Internet, hold a competition on the internet, $500,000 prize money, 75 submissions find $3.6b worth of gold. Many of the best submissions came from people who were not geologists.
How do you harness mass collaboration? 7 things:
- Peer pioneers: We are smarter than me, a book written by 1500 people. Spikesource: Tests open source software, certifies it, support it. Marketocracy.com investment fund, zopa.com peer lending.
- Ideagoras: Creating an eBay for innovation. P&G looking for a molecule that will take red wine off a shirt, innocentive.com. Crowdsourcing.
- Prosumers. Turning your consumers into producers. 99% of Linden Labs product (Second Life) is done by its users. The record industry is the poster child for not understanding this. The final chapter of Wikinomics is a wiki…want to be the context provider for the definitive guide to the next century business.
- The new Alexandrians. The sharing of science. The Human Genome has transformed bioscience. Tracking Avian flu through mashups. The alliance for climate protection. Killer app of wikinomics may be saving the planet.
- Open platforms. Amazon.com – open platform from innovation. 1/3 of revenues from API.
- The global plant floor. 787 is a peer produced airplane, with their suppliers. Suppliers co-design airplanes from scratch and deliver complete subassemblies. The Chinese motorcycle industry is run by small companies that meet in tea houses, collaborate, now has 1/3 of all motorcycle production. Next year: 1500 dollar car from China.
- The Wiki workplace: Geek squad (20,000) design products for geek squad.
“New paradigms cause dislocation, conflict, confusion, uncertainty. New paradigms are nearly always received with coolness, even mockery or hostility. Those with vested interests fight the change. The shift demands such a different view of things that established leaders are often last to be won over, if at all.” (Marilyn Ferguson).
Saint-Exupery: We should welcome the future because it will soon be the past.
We should respect the past because it was once all that was humanly possible.
Posted by Espen at 4:05 AM | TrackBack
Andy McAfee on Enterprise 2.0 success factors
(First installment in a series of Notes from FastForward 2008)
Andy McAfee: Enterprise 2.0: What will it take to bring about a world of change
McAfee talked about what it takes to bring about change – Enterprise 2.0 (corporate use of Web 2.0, as I see it) has moved from the what through the why to the how. He looked into some of the factors that seem to be connected with success, grouped into technology, initiatives and culture.
Technology must have intuitive and easy tools (meaning that it needs to work with email, for one thing), the tools must be egalitarian and freeform, the borders must seem appropriate to users (meaning that you need some borders and confined spaces), at least some of the tools must be explicitly social, and the toolset must be quickly standardized.
The most difficult part lies in the intuitiveness – avoid feature creep! The egalitarianism and the freeform part has more to do with bosses than with technology. Bosses are not comfortable with letting loose the process definition part – they need to work hard to get out of the way, at least initially.
Initiatives usually involve incentives – they exist, and they should be soft. Not just T-shirts and nerf toys, but not much more, and not monetary. Goals need to be clear and explained – being interested in Enterprise 2.0 is not good in itself. Many companies don’t have a goal – the US Intelligence community is an example of an organization that has one. Most important: You need incentives, evangelists, and official and unofficial support from the top. You also need excellent gardeners, bottom-up energy and activity, and clear and explained goals. The CEO Blog is a good thing – Marriott has one, the CEO dictates it himself and it is not created by the PR team.
Most difficult: Getting the incentives right, and getting the excellent gardeners – people that accelerate the emergence of structure in wiki environments. In any population there are not enough of them.
Culture: Some important issues are that people should be trusted, there should be slack in the workweek, helpfulness is a norm, top management accepts lateralization (turns out it is very hard for companies to accept even light user commentary, for fear that it might be negative, even though all statistics show that it is very powerful – most of it is going to be positive, and the negative comments make the positive ones more valid), there are lots of young people, and there is pent-up demand for better sharing. Most important: trust, lateralization, and pent-up demand for sharing.
Most difficult: Trust, slack in workweek, and top management accepting lateralization. You need spare cycles!
Conclusion: enterprise 2.0 is going to increase differences among companies – technology accentuates differences, and this one will. The data is accumulating. The reason lies in willingness to embark, sincerity of effort, and ability to execute. These differences will matter – it will not be the end of the hierarchy, but it will help companies become more responsive, help capturing and sharing knowledge (particularly as the demographic bulge is leaving the workforce) and then there is this vague notion of collective intelligence. Groups and committees, geographically dispersed, can do spectacularly valuable things with this technology.
Posted by Espen at 4:01 AM | TrackBack
February 16, 2008
Clunky does it
Seth Godin gives examples of how winning web sites often are not those that win design awards, unless you define ”bad design” as ”does not work”.
I am not sure this is a real trend, but here is another example: vg.no. This Norwegian newspaper has a website that breaks all possible criteria for good design: It is seemingly disorganized (there is no thematic order to the articles), has colorful and distracting images all over, is very long, and is manually put together. And it is wildly successful: VG is Norway’s largest newspaper*, and vg.no has more readers than the paper paper.
Vg.no is also different in that only 5% of the material at the web site comes from the paper version. The managing editor of vg.no, Torry Pedersen, has so far resisted any integration with the paper version tooth and nail – something the very successful media house Schibsted allows him, not least because his profitability levels have consistently been over 40% and he has taken more than a quarter of all news and entertainment traffic in Norway.
I used to think that search would take over newspapers, but Torry begs to differ: Only 10% of his readers come through search engines – the rest arrive through the front door, looking at the lively, entertaining and rather chaotic front page as a gateway to something interesting, something newsworthy, a break in a hectic or slow day.
In other words, there is more than one way to skin a cat, or, in this case, to bring newspapers to the web. Torry’s way should be something to ponder for traditional papers such as the New York Times, with their rather austere and self-important designs. The Atlantic has an interesting front page, but the content does not change often enough to make it a frequent stopping place. As for the rest – look out for Google News...
*On a personal note: I never read it myself, since it is decidedly tabloid in nature. The Internet version, though, is subtly different.
Posted by Espen at 11:53 AM | Comments (2) | TrackBack
December 3, 2007
Search course links
This Berkeley course has lots of interesting material, including podcasts.
Posted by Espen at 2:37 PM | TrackBack
September 8, 2006
Causality and Zipf's Law
Chris Anderson has an interesting post about Zipf's law, which posits that the frequency distribution of words in the English language follows a power law. He shows that if you set up a process that generates random sets of characters, you end up with the same distribution.
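The random-character process described above can be sketched roughly as follows. This is only an illustration, not the code from Chris Anderson's post: the five-letter alphabet, the 20% chance of typing a space, the fixed seed and the function names are all assumptions made for the example. It types random characters, splits the result into "words" at the spaces, and counts how often each word occurs.

```python
import random
from collections import Counter


def random_typing_words(n_chars, alphabet="abcde", space_prob=0.2, seed=0):
    """Type n_chars random characters; a space ends the current word.

    alphabet, space_prob and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    chars = []
    for _ in range(n_chars):
        if rng.random() < space_prob:
            chars.append(" ")
        else:
            chars.append(rng.choice(alphabet))
    # split() drops the empty strings produced by consecutive spaces
    return "".join(chars).split()


def rank_frequency(words):
    """Return the word counts, ordered from most to least frequent."""
    return [count for _, count in Counter(words).most_common()]


words = random_typing_words(200_000)
freqs = rank_frequency(words)

# Print the frequency at a few ranks; plotted on log-log axes, the full
# list of frequencies falls off roughly along a straight line.
for rank in (1, 10, 100, 1000):
    if rank <= len(freqs):
        print(rank, freqs[rank - 1])
```

Note that in this toy process the most frequent "words" are also the shortest ones, simply because short strings are more likely to be typed by chance.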
I am wondering if we aren't putting the cart before the horse here - might it not be the case that the words we use more often have become shorter, precisely because we use them more often? If language evolves over time with an aim to increase understanding and reduce bandwidth consumption, this is what we would expect.
The words "mama" and "papa" are common throughout many languages because when a baby starts babbling, that is what he or she will say first. So, we made words out of babble, representing what proud parents would want them to represent. Similarly, we reserve the shortest words (single vowels, diphthongs, or combinations of one vowel and one consonant) for the concepts we need most frequently.
Saves bandwidth. Just ask any kid with an SMS thumb.
Posted by Espen at 2:50 PM | Comments (1) | TrackBack
September 6, 2006
Forking Wikipedia?
Nick Carr sees no reconciliation between "deletionists" and "inclusionists" over how Wikipedia should continue to evolve.
Wikipedia was originally started to generate content for a more traditional encyclopædia, called Nupedia. It seems like it worked according to plan. Perhaps it is time to generate Nupedia.
For my part, I remain a "delusionist" a little longer, betting on people's ability to weed out incomprehensive or incorrect information. It seems to me that people deal with information differently when they are in search mode - and that what Wikipedia needs is some sort of disclaimer to alert people that, though it may have a very high Googlerank, anyone can write in it, and the only vetting taking place is that done by those who read it before you. Given simple and powerful search, however, the process of validation should be quick and simple. I can live with that.
Posted by Espen at 8:48 AM | TrackBack
June 16, 2006
Googlecontext
There has been much speculation following the NYTimes report about Google's amazing new data center, located near a large river primarily because it needs a lot of power. Why do they want all that storage and processing power?
One interesting idea from Ian Betteridge: To learn to model context. Sounds plausible to me, though I think a reasonable model of how we think has much wider applicability than merely getting the right ads in front of you at the right time.
Posted by Espen at 9:54 AM | Comments (1) | TrackBack
