Tuesday, August 23, 2016

Robin: a car concierge

Almost four years ago, when we were working on the Smart Living Room project, I noticed a start-up called Robin Labs and asked myself: how different is a car assistant from a living room assistant?

(From TechCrunch in 2012
http://techcrunch.com/2012/09/19/magnifis-debuts-an-upgraded-robin-the-kitt-like-android-virtual-assistant-app-for-drivers/)


I recently saw a blog post from Robin Labs that says something very sensible, something the husband has been saying for a while.

Their blog post (excerpt below) describes four types of 'bots':

  1. App-bots - that sounds like an apt name for those micro-apps dressed up as messenger contacts, typically addressing long-tail use cases such as ordering pizza or checking flight schedules - needs that could as well be met with a native app (assuming you managed to get people to actually download one). More importantly, these use cases are not necessarily conversational by nature. [..] they are often better off with standard visual UI element such as menus or buttons. Unless, of course, they rely on voice for input - then, see (4). Bottom line, app-bots are more apps than bots, in the traditional sense of the word. 
  2. Content bots - such as Forbes or CNN bot, for instance. These guys are really content distribution channels, they are all about push and are hardly ever conversational, but can sometimes support basic keyword search. In theory, a dialogue-driven newsbot could make an interesting product, but nobody has really nailed it yet. 
  3. Chatbots - i.e., genuine "chat bots", where the chat medium is in fact key to the experience, namely, where verbal communication actually helps get the job done. One popular use case is of course, customer service, which may very well be the killer app for chatbots. But, beyond run-of-the-mill customer support, we are seeing a surge in conversational concierge bots: from transaction-oriented services such as travel agents, to more casual assistance such as movie recommendations, to virtual friends, etc. Notice that, in principle, chatbots can be powered by either human agents or machines (or both).  Naturally, the trend is to eliminate or at least minimize the reliance on humans - to make the service both more responsive and more scalable. But, even when striving for a fully automated chatbot, one should not completely rule out a hybrid human-in-the-loop approach.
  4. Voice assistants - such as Amazon Echo, our Robin app, etc. - are essentially chatbots that use voice as the main/only communication channel, becoming very handy e.g., in the living room, in the car and other hands-free scenarios. Due to their reliance on voice, these bots have the highest conversational fluency bar of all other categories. As a result, they are the hardest to build, but can be genuinely useful when typing is not a good option - as evidenced by Amazon Echo's popularity. When the experience works, it does feel like the holy grail! 
Well, I wouldn't put it exactly like that, but I totally agree that open-ended conversation is very different from a bot that is supposed to help you solve a particular problem...

Anyways, they also have an awesome picture of Daleks, reproduced here for your delight.



Thursday, August 18, 2016

Lewis and the mysteries of A*

Some three weeks ago we had the pleasure of a visit from Mike Lewis, from the University of Washington, originally a student of Mark Steedman in Edinburgh.

He came to Nuance and talked about his super-efficient A* parsing system, the one he presented at ACL in San Diego. I really wanted him to talk about his older work with Mark, Combined Distributional and Logical Semantics (Transactions of the Association for Computational Linguistics, 2013), but if someone is nice enough to come and talk to you, they may choose whatever they want to talk about. At least in my book.

And besides, people in the Lab were super interested in Mike's new work. Mike is a great speaker, one of those who give you the impression that you really understand everything he says. Very impressive indeed! Especially if you consider how little I know about parsing or LSTM (long short-term memory) methods. But the parser is publicly released; everyone can find it on GitHub.

There's even a recorded talk of the presentation I wanted to hear, Combined Distributional and Logical Semantics, so altogether it was a splendid visit. When discussing other work in their paper, Mike and Mark say this about our Bridge system:

'Others attempted to build computational models of linguistic theories based on formal compositional semantics, such as the CCG-based Boxer (Bos, 2008) and the LFG- based XLE (Bobrow et al., 2007). Such approaches convert parser output into formal semantic representations and have demonstrated some ability to model complex phenomena such as negation. For lexical semantics, they typically compile lexical resources such as VerbNet and WordNet into inference rules—but still achieve only low recall on open-domain tasks, such as RTE, mostly due to the low coverage of such resources.' 

I guess I agree that the resources we managed to gather didn't have the coverage we needed. Moreover, other resources like those are still needed. We need bigger, more complete, more encompassing "Unified Lexica" for different phenomena, and for more, many more languages. But I'll stop now with a very impressive slide from Mike's presentation.



Wednesday, August 17, 2016

Feferman's Farewell

I was super sad to hear that we lost Professor Sol Feferman on July 26th, 2016. This week WOLLIC is happening in Puebla, and Ruy asked me if I wanted to say a few words about Sol in a special session in his honour, due to happen today.

I knew I would be busy at the time of the session, as seminars at Nuance Sunnyvale are on Wednesdays at 11 am, so I said I couldn't do it. Ruy then suggested recording a tribute, so I decided to try it.

I looked through many emails to, from and about Sol, and I looked at papers and reports, and I managed to write a short text. Not as short as I wanted it to be: when I recorded it, it came to 12 minutes, instead of the 5 to 10 minutes I had aimed for. I even managed to get to grips with quickmovie (ok, the only thing you need to discover is where the button to record something is...) and I recorded my message. Only to send it and discover that the programme had been changed at the last minute and the session in Sol's honour had already happened. Oh well.

Here's my tribute to Sol and Anita Feferman. Grisha Mints and Bill Craig also show up a little. We're definitely getting poorer!

Semantics: Distributional and Compositional. Dudes and PROPS

(I haven't posted anything in a long while; the stuff is accumulating in a hazardous way. Today we had Gabi Stanovsky visiting, his talk was great, and it reminded me to post this.)

There is by now a great deal of literature on the deep problem of unifying distributional semantics (in terms of vectors and cosine distances) and logical or compositional semantics (in terms of negation, conjunction, disjunction, implication, etc.). Because it is an interesting and very topical problem (several of the people involved have sold multi-million-dollar companies, for example), several groups have tried to crack it, with different theories.

The vision paper, explaining why we need "distributional semantics" as well as "logical semantics", is Combining Symbolic and Distributional Models of Meaning, by Clark and Pulman. Only 4 pages and well worth reading!

Then I made a list of a few other papers that caught my attention and that might indicate a way forward for what I want to do. My list:
1. Combined Distributional and Logical Semantics, Lewis and Steedman, 2013.
2. Transforming Dependency Structures to Logical Forms for Semantic Parsing, Reddy et al, 2016.
3. Flexible Semantic Composition with DUDES, Cimiano, 2009.
4. Getting More Out Of Syntax with PROPS, Stanovsky et al, in arXiv on 4 March 2016.

The last two papers are a side trip from the main concern of merging distributional and logical semantics, but they are still about meanings. The DUDES paper is fairly short and old (2009), and the author seems to be more concerned with lexical resources nowadays. The PROPS paper is longer and seems much more useful for my goals. (Also, isn't PropS a great name?)

The basic ideas of the paper seem to be:

1. NLP applications often rely on dependency trees to recognize the major elements of the proposition structure of sentences.
2. Many phenomena are not easily read out of dependency trees, often leading to ad-hoc heuristic post-processing or information loss (a tiny illustration of this follows below).
3. They suggest PROPS, an output representation designed to explicitly and uniformly express much of the proposition structure which is implied by the syntax.
4. They also provide an associated tool for extracting it from dependency trees (yay!!).
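
To make items 2 and 3 a bit more concrete for myself, here is a tiny sketch, entirely mine and not the PropS tool or the paper's algorithm: an active sentence and its passive version get different dependency edges (nsubj/dobj versus nsubjpass/agent, in Stanford-style labels), but they express the same proposition, so something has to normalize them.

# Toy normalization (illustrative only, not the PropS tool): map active and
# passive dependency edges onto the same (predicate, arg0, arg1) proposition.

def proposition(edges):
    """edges: (head, relation, dependent) triples from a dependency parser."""
    pred = arg0 = arg1 = None
    for head, rel, dep in edges:
        if rel == "nsubj":          # active subject
            pred, arg0 = head, dep
        elif rel == "dobj":         # active object
            pred, arg1 = head, dep
        elif rel == "nsubjpass":    # passive subject = underlying object
            pred, arg1 = head, dep
        elif rel == "agent":        # "by Hitchcock" = underlying subject
            pred, arg0 = head, dep
    return (pred, arg0, arg1)

active = [("released", "nsubj", "Hitchcock"), ("released", "dobj", "Psycho")]
passive = [("released", "nsubjpass", "Psycho"), ("released", "agent", "Hitchcock")]

assert proposition(active) == proposition(passive) == ("released", "Hitchcock", "Psycho")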

(Project page at PropS -- Syntax Based Proposition Extraction, with an online demo. Code on GitHub at gabrielStanovsky/props; requires Python and Java 7.)

Their desiderata:
a. uniformly represent propositions headed by different types of predicates, verbal or not.
b. canonicalize different syntactic constructions that correspond to the same proposition structure
c. decouple independent propositions while clearly marking proposition boundaries
d. "mask" non-core syntactic detail, yielding cleaner compact structures.
e. enable simple access to the represented propositions by a uniform graph traversal.

Their design principles:
a. Mask non-core syntactic detail:
    - remove auxiliary words and instead encode their syntactic function as features;
    - group atomic units (such as noun compounds) within a single node.
b. Represent propositions in a uniform manner (verbal and adjectival).
c. Canonicalize and differentiate syntactic constructions:
    - unify the representation of propositions which are semantically equivalent;
    - differentiate syntactically-similar, yet semantically-different, constructions.
d. Mark proposition boundaries.
e. Propagate relations: every relation which is inferable through parse-tree traversal (for instance, through conjunctions) should be explicitly marked in the representation (a toy sketch of this follows below).
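
Principle e is the one easiest to picture. A toy sketch of my own (nothing to do with the actual PropS code): copy every relation that holds of one conjunct onto its sister conjuncts, so that a parse of "John and Mary sang", where only John is attached as subject, ends up with both John and Mary as subjects of sang.

# Toy relation propagation over conjunctions (illustrative only).
# edges: set of (head, relation, dependent) triples.

def propagate_conj(edges):
    conj_pairs = {(h, d) for h, rel, d in edges if rel == "conj"}
    propagated = set(edges)
    for head, rel, dep in edges:
        if rel == "conj":
            continue
        for first, second in conj_pairs:
            if dep == first:                         # dep heads a conjunction,
                propagated.add((head, rel, second))  # so its sister inherits rel
    return propagated

edges = {("sang", "nsubj", "John"),   # John is the parsed subject
         ("John", "conj", "Mary")}    # John and Mary are conjoined
print(propagate_conj(edges))
# {('sang', 'nsubj', 'John'), ('John', 'conj', 'Mary'), ('sang', 'nsubj', 'Mary')}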

Their output format:
1. Similar to dependencies, BUT:
2. Typed nodes: (1) predicates, which evoke a proposition, and (2) non-predicates, which can be either arguments or modifiers.
3. Simplified graph structure, allowing multi-word nodes (e.g., Barack Obama), versus having each node correspond to a single word as in dependency trees.
4. The resulting structures are no longer limited to trees, but are DAGs.
5. A label set of 14 relations (compared with approximately 50 in Stanford dependencies).
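
Reading the output-format list, I picture the data structure as something like the sketch below. This is my own minimal guess at an encoding, not the format the tool actually emits: typed nodes that may span several words, and labelled edges drawn from a small relation set, with nothing forcing the result to be a tree.

from dataclasses import dataclass, field

# A minimal guess at a PROPS-like graph, for my own understanding only.

@dataclass(frozen=True)
class Node:
    words: tuple          # multi-word node, e.g. ("Barack", "Obama")
    is_predicate: bool    # predicates evoke propositions; the rest are args/modifiers

@dataclass
class PropsGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)   # (source, label, target); may form a DAG

    def add(self, source, label, target):
        self.nodes.update({source, target})
        self.edges.add((source, label, target))

obama = Node(("Barack", "Obama"), is_predicate=False)
born = Node(("born",), is_predicate=True)
g = PropsGraph()
g.add(born, "subj", obama)    # label drawn from the small (~14) relation set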

I need to check how Bridge/XLE deals with the pair "The director who edited 'Rear Window' released 'Psycho'" and "Hitchcock, who edited 'Rear Window', released 'Psycho'". I also need to check and mark what they call raising verbs.
They say [...] "we heuristically use a set of approximately 30 verbs which were found by (Chrupała and van Genabith, 2007) to frequently occur in raising constructions. For these verbs we do not produce a proposition." Seems sensible to me, and I don't think we did this in Bridge.
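
That heuristic seems easy to replicate; something in the spirit of the sketch below, where the verb list is just a placeholder of mine, not the roughly 30 verbs the paper takes from Chrupała and van Genabith (2007):

# Illustrative filter: drop propositions headed by raising verbs, since
# "John seems to like Mary" asserts the liking, not the seeming.
# Placeholder list, not the paper's actual verb set.
RAISING_VERBS = {"seem", "appear", "tend", "happen"}

def drop_raising(propositions):
    """propositions: iterable of (predicate_lemma, args) pairs."""
    return [(pred, args) for pred, args in propositions
            if pred not in RAISING_VERBS]

props = [("seem", ("John",)), ("like", ("John", "Mary"))]
print(drop_raising(props))   # [('like', ('John', 'Mary'))]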


Evaluation:
They use the MCTest corpus for machine comprehension (Richardson et al., 2013), composed of 500 short stories, each followed by 4 multiple-choice questions. The MCTest comprehension task does not require extensive world knowledge. They focus on questions which are marked in the corpus as answerable from a single sentence in the story (905 questions, followed by 3620 candidate answers). Richardson et al. (2013) introduce a lexical matching algorithm, which they adapt to use either dependency or PROPS structures, both obtained with the Berkeley parser. (The numbers show the expected progression, but are still low.)
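
To remind myself what "lexical matching" amounts to here, a bag-of-words version in the spirit of the Richardson et al. baseline (details are mine; their actual algorithm uses a sliding window with weighting, and the PropS paper replaces word matching with matching over dependency or PropS structures):

import re

# Rough bag-of-words matching baseline for multiple-choice comprehension
# (schematic sketch only).

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(story_sentences, question, answer):
    hypothesis = tokens(question) | tokens(answer)
    return max(len(hypothesis & tokens(sent)) for sent in story_sentences)

def best_answer(story_sentences, question, candidates):
    return max(candidates, key=lambda a: score(story_sentences, question, a))

story = ["Hitchcock released Psycho in 1960.", "He also edited Rear Window."]
print(best_answer(story, "Who released Psycho?",
                  ["Hitchcock", "Spielberg", "Kubrick", "Welles"]))  # Hitchcock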