Wednesday, October 12, 2016

Papers about our Portuguese WordNet

We need to add a proper bibliography to the webpage of our project OpenWordNet-PT.

Here are some things that need to be in there. I guess we should have one list for the work on the resource itself and one for applications? I don't know quite yet.

  1. de Paiva, Valeria, Fabricio Chalub, Livy Real, and Alexandre Rademaker. 2016. “Making Virtue of Necessity: a Verb Lexicon.” In PROPOR – International Conference on the Computational Processing of Portuguese. Tomar, Portugal.
  2. Chalub, Fabricio, Livy Real, Alexandre Rademaker, and Valeria de Paiva. 2016. “Semantic Links for Portuguese.” In 10th Edition of the Language Resources and Evaluation Conference (LREC 2016). Portorož, Slovenia.
  3. Real, Livy, and Valeria de Paiva. 2016. “Plurality in Wordnets.” In Proceedings of the LexSem+Logics Workshop 2016 (arXiv preprint arXiv:1608.04767). Tomar, Portugal.
  4. de Paiva, Valeria, and Livy Real. 2016. “Universal POS Tagging for Portuguese: Issues and Opportunities.” In Proceedings of the LexSem+Logics Workshop 2016 (arXiv preprint arXiv:1608.04767). Tomar, Portugal.
  5. Real, Livy, Valeria de Paiva, Fabricio Chalub, and Alexandre Rademaker. 2016. “Gentle with Gentilics.” In Joint Second Workshop on Language and Ontologies (LangOnto2) and Terminology and Knowledge Structures (TermiKS) (co-located with LREC 2016). Slovenia.
  6. de Paiva, Valeria, Livy Real, Hugo Gonçalo Oliveira, Alexandre Rademaker, Cláudia Freitas, and Alberto Simões. 2016. “An Overview of Portuguese WordNets.” In Global Wordnet Conference 2016. Bucharest, Romania.
  7. Real, Livy, Fabricio Chalub, Valeria de Paiva, Claudia Freitas, and Alexandre Rademaker. 2015. “Seeing Is Correcting: Curating Lexical Resources Using Social Interfaces.” In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing – Fourth Workshop on Linked Data in Linguistics: Resources and Applications (LDL 2015). Beijing, China.
  8. Rademaker, Alexandre, Dário Augusto Borges Oliveira, Valeria de Paiva, Suemi Higuchi, Asla Medeiros e Sá, and Moacyr Alvim. 2015. “A Linked Open Data Architecture for the Historical Archives of the Getulio Vargas Foundation.” International Journal on Digital Libraries 15 (2-4). Springer Berlin Heidelberg: 153–67. doi:10.1007/s00799-015-0147-1.
  9. Oliveira, Hugo Gonçalo, Valeria de Paiva, Cláudia Freitas, Alexandre Rademaker, Livy Real, and Alberto Simões. 2015. “As Wordnets do Português.” Oslo Studies in Language 7 (1): 397–424.
  10. de Paiva, Valeria, Dário Oliveira, Suemi Higuchi, Alexandre Rademaker, and Gerard De Melo. 2014. “Exploratory Information Extraction From a Historical Dictionary.” In IEEE 10th International Conference On e-Science (e-Science), 2:11–18. IEEE. doi:
  11. Real, Livy, Valeria de Paiva, and Alexandre Rademaker. 2014. “Extending NomLex-PT Using AnCora-Nom.” In Proceedings Of Workshop on Tools and Resources for Automatically Processing Portuguese and Spanish (ToRPorEsp), edited by Laura Alonso Alemany, Muntsa Padró, Alexandre Rademaker, and Aline Villavicencio. São Carlos, Brazil: Biblioteca Digital Brasileira de Computação, UFMG, Brazil.
  12. de Paiva, Valeria, Cláudia Freitas, Livy Real, and Alexandre Rademaker. 2014. “Improving the Verb Lexicon of OpenWordnet-PT.” In Proceedings of the Workshop on Tools and Resources for Automatically Processing Portuguese and Spanish (ToRPorEsp), edited by Laura Alonso Alemany, Muntsa Padró, Alexandre Rademaker, and Aline Villavicencio. São Carlos, Brazil: Biblioteca Digital Brasileira de Computação, UFMG, Brazil.
  13. Freitas, Cláudia, Valeria de Paiva, Alexandre Rademaker, Gerard de Melo, Livy Real, and Anne de Araujo Correia da Silva. 2014. “Extending a Lexicon of Portuguese Nominalizations with Data from Corpora.” In Computational Processing of the Portuguese Language, 11th International Conference, PROPOR 2014, edited by Jorge Baptista, Nuno Mamede, Sara Candeias, Ivandré Paraboni, Thiago A. S. Pardo, and Maria das Graças Volpe Nunes. São Carlos, Brazil: Springer.
  14. de Paiva, Valeria, Livy Real, Alexandre Rademaker, and Gerard de Melo. 2014. “NomLex-PT: A Lexicon of Portuguese Nominalizations.” In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), edited by Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis. Reykjavik, Iceland: European Language Resources Association (ELRA).
  15. Real, Livy, Alexandre Rademaker, Valeria de Paiva, and Gerard de Melo. 2014. “Embedding NomLex-BR Nominalizations into OpenWordnet-PT.” In Proceedings Of the 7th Global WordNet Conference, edited by Heili Orav, Christiane Fellbaum, and Piek Vossen, 378–82. Tartu, Estonia.
  16. Rademaker, Alexandre, Valeria de Paiva, Gerard de Melo, Livy Real, and Maira Gatti. 2014. “OpenWordNet-PT: A Project Report.” In Proceedings Of the 7th Global WordNet Conference, edited by Heili Orav, Christiane Fellbaum, and Piek Vossen. Tartu, Estonia.
  17. de Paiva, Valeria, Alexandre Rademaker, and Gerard de Melo. 2012. “OpenWordNet-PT: An Open Brazilian Wordnet For Reasoning.” In Proceedings Of COLING 2012: Demonstration Papers, 353–60. Mumbai, India: The COLING 2012 Organizing Committee.
  18. de Paiva, Valeria, and Alexandre Rademaker. 2012. “Revisiting a Brazilian WordNet.” In Proceedings of the Global Wordnet Conference. Matsue: Global Wordnet Association.

Some Workshops we organized:

1.  Logics and Ontologies for Portuguese

November 21-25, 2011, FGV, Rio de Janeiro.

2. Workshop on Logics and Ontologies for Natural Language (LogOnto)

September 22, 2014, associated with FOIS 2014, Rio de Janeiro.

3. Third Workshop on Logics and Ontologies & First Workshop on Lexical Semantics for Lesser-Resourced Languages (LexSem+Logics 2016)

July 13, 2016, Tomar, Portugal.


Tuesday, October 11, 2016

Ada Lovelace Day 2016

This year I am honouring Manuela Sobral in my Ada Lovelace Day post.
(phew I almost missed it again!..)

Just in case you're new to the idea of Ada Lovelace Day: everyone who blogs (even if only occasionally) should post something on 11 October about the achievements of women in Science or Maths.
Now there are plenty of women doing Category Theory in Portugal, and Manuela is partially to `blame' for it. Coimbra University (a beautiful place) has always been very welcoming to women mathematicians, especially category theorists.

The picture of Manuela by  Juergen Koslovski  (thanks!!) is from the meeting

Categorical Methods in Algebra and Topology

for her 70th birthday in 2014!

I believe I first met Manuela in Montreal at Category Theory 1991, where the picture below was taken. Maria Manuel Clementino, who organized the meeting for Manuela, is also in the picture, with a young-looking me.

Wednesday, September 7, 2016

Finding the wood from the Trees

I want to discuss the work of three young guys (two are still PhD students) that I'm very excited about.

First there's Mike Lewis (who talked at Nuance recently) and his A* parsing algorithm. A very clear talk, and a very nice guy.

I am more interested in his older work on SRL (Semantic Role Labelling), but people at work were excited about his A* parsing, which seems extremely good, so he talked mostly about that.
There's an online demo of his EasySRL at the EasySRL Parser Demo.

Then I am also excited about Gabriel Stanovsky's work on PropS.
Gabi, who's interning this summer at IBM Almaden, also has a demo online (I asked if he would like to talk to us about his PhD work with Ido Dagan, and he did. Talking to him was really fun!). His webpage is Gabriel Stanovsky and the project page is PropS -- Syntax Based Proposition Extraction.

Lastly, just because he was the last to come to Sunnyvale, I am also very excited about Siva Reddy's work on DepLambda (there is a demo online at the DepLambda Demo).
His paper appeared at NAACL 2016, and his GitHub has rules, but no real code yet.

The paper is Siva Reddy, Oscar Täckström, Michael Collins, Tom Kwiatkowski, Dipanjan Das, Mark Steedman, and Mirella Lapata (2016). Transforming Dependency Structures to Logical Forms for Semantic Parsing. Transactions of the Association for Computational Linguistics, vol. 4.

Siva also talked at Nuance and talking to him was great, lots of interesting ideas and plenty of energy.
Three very interesting and  useful frameworks, very much in the directions I'm interested in pursuing. Three very bright and very nice guys. 

How to grab the best of these frameworks and make them work for me, producing logical forms in a logic of contexts and concepts like the one in (Valeria de Paiva. Contexts for Quantification. Proceedings of the 11th International Symposium on Logical Formalizations of Commonsense Reasoning, Ayia Napa, Cyprus, 27-29 May 2013. [PDF]), is the question for me right now.

Tuesday, August 23, 2016

Robin: a car concierge

Almost four years ago, when we were working on the Smart Living Room project, I noticed a start-up called Robin Labs and asked: how different is a car assistant from a living room assistant?

(From TechCrunch in 2012.)

Now, I recently saw a blog post from Robin Labs that says something very sensible, something the husband has been saying for a while.

Their blog post (excerpt below) describes four types of `bots':

  1. App-bots - that sounds like an apt name for those micro-apps dressed up as messenger contacts, typically addressing long-tail use cases such as ordering pizza or checking flight schedules - needs that could as well be met with a native app (assuming you managed to get people to actually download one). More importantly, these use cases are not necessarily conversational by nature. [..] they are often better off with standard visual UI element such as menus or buttons. Unless, of course, they rely on voice for input - then, see (4). Bottom line, app-bots are more apps than bots, in the traditional sense of the word. 
  2. Content bots - such as Forbes or CNN bot, for instance. These guys are really content distribution channels, they are all about push and are hardly ever conversational, but can sometimes support basic keyword search. In theory, a dialogue-driven newsbot could make an interesting product, but nobody has really nailed it yet. 
  3. Chatbots - i.e., genuine "chat bots", where the chat medium is in fact key to the experience, namely, where verbal communication actually helps get the job done. One popular use case is of course, customer service, which may very well be the killer app for chatbots. But, beyond run-of-the-mill customer support, we are seeing a surge in conversational concierge bots: from transaction-oriented services such as travel agents, to more casual assistance such as movie recommendations, to virtual friends, etc. Notice that, in principle, chatbots can be powered by either human agents or machines (or both).  Naturally, the trend is to eliminate or at least minimize the reliance on humans - to make the service both more responsive and more scalable. But, even when striving for a fully automated chatbot, one should not completely rule out a hybrid human-in-the-loop approach.
  4. Voice assistants - such as Amazon Echo, our Robin app, etc. - are essentially chatbots that use voice as the main/only communication channel, becoming very handy e.g., in the living room, in the car and other hands-free scenarios. Due to their reliance on voice, these bots have the highest conversational fluency bar of all other categories. As a result, they are the hardest to build, but can be genuinely useful when typing is not a good option - as evidenced by Amazon Echo's popularity. When the experience works, it does feel like the holy grail! 
Well, I wouldn't write it exactly like that, but I totally agree that open-ended conversation is very different from a bot that is supposed to help you solve a particular problem...

Anyways, they also have an awesome picture of Daleks, reproduced here for your delight.

Thursday, August 18, 2016

Lewis and the mysteries of A*

Some three weeks ago we had the pleasure of a visit by Mike Lewis, from the University of Washington, originally a student of Mark Steedman in Edinburgh.

He came to Nuance and talked about his super-efficient A* parsing system, which he presented at ACL in San Diego. I really wanted him to talk about his older work with Mark, on Combined Distributional and Logical Semantics (Transactions of the Association for Computational Linguistics, 2013), but if someone is nice enough to come and talk to you, they may choose whatever they want to talk about, at least in my book.

And besides, people in the Lab were super interested in Mike's new work. Mike is a great speaker, one of those who give you the impression that you are really understanding everything he says. Very impressive indeed! Especially considering how little I know about parsing or LSTM (long short-term memory) methods. But the parser is publicly released; everyone can find it on GitHub.

There's even a recorded talk of the presentation I wanted to hear, Combined Distributional and Logical Semantics, so altogether it was a splendid visit. When discussing other work in their paper, Mike and Mark say this about our Bridge system:

'Others attempted to build computational models of linguistic theories based on formal compositional semantics, such as the CCG-based Boxer (Bos, 2008) and the LFG- based XLE (Bobrow et al., 2007). Such approaches convert parser output into formal semantic representations, and have demonstrated some ability to model complex phenomena such as negation. For lexical semantics, they typically compile lexical resources such as VerbNet and WordNet into inference rules—but still achieve only low recall on open-domain tasks, such as RTE, mostly due to the low coverage of such resources.' 
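
Just to fix ideas about that last point, here is a minimal sketch of what "compiling WordNet into inference rules" can look like. This is my own toy illustration using NLTK's WordNet interface, not what Boxer or our Bridge system actually does.

```python
# Toy illustration: turning WordNet hypernym links into naive "X(x) => Y(x)" rules.
# Assumes NLTK with the WordNet data installed (nltk.download('wordnet')).
# A sketch of the general idea only, not the Bridge/Boxer machinery.
from nltk.corpus import wordnet as wn

def hypernym_rules(word, pos=wn.NOUN):
    """Yield (premise, conclusion) pairs such as ('dog', 'canine')."""
    for synset in wn.synsets(word, pos=pos):
        for hyper in synset.hypernyms():
            for lemma in hyper.lemma_names():
                yield (word, lemma.replace('_', ' '))

if __name__ == "__main__":
    for premise, conclusion in hypernym_rules("dog"):
        print(f"{premise}(x) => {conclusion}(x)")
```

Run on "dog" this prints rules like dog(x) => canine(x): exactly the kind of shallow lexical inference the quote is pointing at, and also a hint of why coverage becomes the bottleneck.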

I guess I agree that the resources we managed to gather didn't have the coverage we needed. Moreover, other resources like those are still needed. We need bigger, more complete, more encompassing "Unified Lexica" for different phenomena, and for more, many more languages. But I stop now with a very impressive slide from Mike's presentation.

Wednesday, August 17, 2016

Feferman's Farewell

I was super sad to hear that we lost Professor Sol Feferman on July 26th, 2016. This week WOLLIC is happening in Puebla, and Ruy asked me if I wanted to say a few words about Sol in a special session due to happen today in his honour.

I knew I would be busy at the time of the session, as seminars in Nuance Sunnyvale are on Wednesdays at  11 am, so I said I couldn't do it. Ruy then suggested  recording a tribute, so I decided to try it.

I looked through many emails to, from, and about Sol, and I looked at papers and reports, and I managed to write a short text. Not as short as I wanted it to be: when I recorded it, it came to 12 minutes, instead of the 5 to 10 minutes I had aimed for. I even managed to get to grips with quickmovie (ok, the only thing you need to discover is where the button to record something is...) and I recorded my message. Only to send it and discover that the programme had been changed at the last minute and the session in Sol's honour had already happened. Oh well.

Here's my tribute to Sol and  Anita Feferman.  Grisha Mints and Bill Craig also show up a little. We're definitely getting poorer!

Semantics: Distributional and Compositional. Dudes and PROPS

(I haven't posted anything in a long while, and the stuff is accumulating in a haphazard way. Today we had Gabi Stanovsky visiting and his talk was great, and it reminded me to post this.)

There is by now a great deal of literature on the deep problem of unifying distributional semantics (in terms of vectors and cosine distances) and logical or compositional semantics (in terms of negation, conjunction, disjunction, implication, etc.). Because it is an interesting and very topical problem (several of the people involved have sold multi-million dollar companies, for example), several groups have tried to crack it, with different theories.
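
Just to be concrete about the first half: the distributional side scores word similarity by cosine over vectors. A tiny sketch with made-up toy vectors (not any particular embedding model):

```python
# Minimal sketch: cosine similarity between toy word vectors.
# The vectors are invented for illustration; in practice they would come from
# a trained embedding model (word2vec, GloVe, etc.).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

vectors = {
    "cat": [0.9, 0.1, 0.3],
    "dog": [0.8, 0.2, 0.4],
    "carburettor": [0.1, 0.9, 0.0],
}

print(cosine(vectors["cat"], vectors["dog"]))          # high: distributionally similar
print(cosine(vectors["cat"], vectors["carburettor"]))  # low: unrelated
```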

The vision paper, explaining why we need "distributional semantics" as well as "logical semantics", is Combining Symbolic and Distributional Models of Meaning, by Pulman and Clark. Only 4 pages and well worth reading!

Then I  made a list of a few other papers that caught my attention and that might indicate a way forward for what I want to do.  My list:
1. Combined Distributional and Logical Semantics, Lewis and Steedman, 2013.
2. Transforming Dependency Structures to Logical Forms for Semantic Parsing, Reddy et al, 2016.
3. Flexible Semantic Composition with DUDES, Cimiano, 2009.
4. Getting More Out Of Syntax with PROPS, Stanovsky et al, in arXiv on 4 March 2016.

These last two papers form a side trip from the main concern of merging distributional semantics and logical semantics, but are still about meanings. The DUDES paper is fairly short and old (2009), and the author seems to be more concerned with lexical resources nowadays. The PROPS paper is longer and seems much more useful for my goals. (Also, isn't PropS a great name?)

The basic  ideas of the paper  seem to be:

1. NLP applications often rely on dependency trees to recognize major elements of the proposition structure of sentences.
2. Many phenomena are not easily read out of dependency trees, often leading to ad-hoc heuristic post-processing or information loss.
3. They suggest PROPS – an output representation designed to explicitly and uniformly express much of the proposition structure which is implied from syntax.
4. They also provide an associated tool for extracting it from dependency trees (yay!!).

(Project page at PropS -- Syntax Based Proposition Extraction, with an online demo; code on GitHub at gabrielStanovsky/props, which requires Python and Java 7.)

Their desiderata:
a. uniformly represent propositions headed by different types of predicates, verbal or not;
b. canonicalize different syntactic constructions that correspond to the same proposition structure;
c. decouple independent propositions while clearly marking proposition boundaries;
d. "mask" non-core syntactic detail, yielding cleaner, compact structures;
e. enable simple access to the represented propositions by a uniform graph traversal.

Their design principles (a toy sketch follows this list):
a. Mask non-core syntactic detail:
   - remove auxiliary words and instead encode their syntactic function as features;
   - group atomic units (such as noun compounds) within a single node.
b. Represent propositions in a uniform manner (verbal and adjectival).
c. Canonicalize and differentiate syntactic constructions:
   - unify the representation of propositions which are semantically equivalent;
   - differentiate syntactically-similar, yet semantically-different, constructions.
d. Mark proposition boundaries.
e. Propagate relations: every relation which is inferable through parse tree traversal (for instance, through conjunctions) should be explicitly marked in the representation.
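
The toy sketch promised above illustrates principles (a) and (c): an active and a passive clause map to the same core proposition, with the auxiliary and voice kept as features rather than as nodes. This is my own simplification in Python, and the clause dictionaries are hand-built stand-ins for parser output, not what the PropS tool actually consumes.

```python
# Sketch: canonicalizing active vs. passive clauses into one proposition,
# encoding the auxiliary and voice as features instead of separate nodes.
# Illustrative only; the real PropS tool works over dependency trees.

def to_proposition(clause):
    """Map a simplified clause analysis to a proposition with features."""
    if clause.get("voice") == "passive":
        # "Rear Window was edited by Hitchcock" -> edit(Hitchcock, Rear Window)
        return {
            "predicate": clause["verb"],
            "arg0": clause["by_agent"],
            "arg1": clause["subject"],
            "features": {"voice": "passive", "aux": clause.get("aux")},
        }
    return {
        "predicate": clause["verb"],
        "arg0": clause["subject"],
        "arg1": clause["object"],
        "features": {"voice": "active"},
    }

active = {"verb": "edit", "subject": "Hitchcock", "object": "Rear Window"}
passive = {"verb": "edit", "subject": "Rear Window",
           "by_agent": "Hitchcock", "aux": "was", "voice": "passive"}

p1, p2 = to_proposition(active), to_proposition(passive)
# Same core proposition either way: edit(Hitchcock, Rear Window).
assert (p1["predicate"], p1["arg0"], p1["arg1"]) == \
       (p2["predicate"], p2["arg0"], p2["arg1"])
```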

Their output format (a minimal sketch of such a graph follows this list):
1. similar to dependencies, BUT
2. typed nodes: (1) predicates, which evoke a proposition, and (2) non-predicates, which can be either arguments or modifiers;
3. a simpler graph structure, allowing multi-word nodes (e.g., Barack Obama), versus having each node correspond to a single word as in dependency trees;
4. the resulting structures are no longer limited to trees, but are DAGs;
5. a label set of 14 relations (compared with approximately 50 in Stanford dependencies).
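
And the promised sketch of such an output graph as a data structure, using the Hitchcock example discussed just below: typed nodes (predicate vs. non-predicate), multi-word nodes, a handful of relation labels, and argument sharing that makes the structure a DAG rather than a tree. Again, this is my own simplification, not the actual PropS format or its 14-relation label set.

```python
# Sketch: a PropS-style graph with typed, possibly multi-word nodes and a small
# relation label set. A simplified stand-in for the real 14-relation format.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    text: str            # may span several words, e.g. "Rear Window"
    is_predicate: bool   # predicates evoke propositions; others are args/modifiers

@dataclass
class PropositionGraph:
    edges: list = field(default_factory=list)  # (head, label, dependent) triples

    def add(self, head, label, dependent):
        self.edges.append((head, label, dependent))

# "Hitchcock, who edited 'Rear Window', released Psycho."
hitchcock = Node("Hitchcock", is_predicate=False)
edited = Node("edited", is_predicate=True)
released = Node("released", is_predicate=True)
rear_window = Node("Rear Window", is_predicate=False)   # multi-word node
psycho = Node("Psycho", is_predicate=False)

g = PropositionGraph()
g.add(released, "subj", hitchcock)
g.add(released, "obj", psycho)
g.add(edited, "subj", hitchcock)   # Hitchcock is shared, so the graph is a DAG
g.add(edited, "obj", rear_window)

for head, label, dep in g.edges:
    print(f"{head.text} -[{label}]-> {dep.text}")
```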

I need to check how Bridge/XLE deals with the pair "The director who edited 'Rear Window' released Psycho" and "Hitchcock, who edited 'Rear Window', released Psycho". I also need to check and mark what they call raising verbs.
They say [...] "we heuristically use a set of approximately 30 verbs which were found by (Chrupała and van Genabith, 2007) to frequently occur in raising constructions. For these verbs we do not produce a proposition." Seems sensible to me, and I don't think we did this in Bridge.
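
A quick sketch of that heuristic, just to record the idea for a possible Bridge experiment. The verb list below is a small hypothetical sample, not the roughly 30 verbs from Chrupała and van Genabith (2007) that the paper actually uses.

```python
# Sketch of the raising-verb heuristic: do not emit a proposition headed by a
# raising verb; only the embedded predicate contributes a proposition.
# RAISING_VERBS is a hypothetical sample, not the paper's full 30-verb list.
RAISING_VERBS = {"seem", "appear", "tend", "happen", "turn out"}

def emit_proposition(verb_lemma, args):
    """Return a (predicate, args) proposition, or None for raising verbs."""
    if verb_lemma.lower() in RAISING_VERBS:
        return None
    return (verb_lemma, tuple(args))

# "John seems to sleep": only sleep(John) survives.
print(emit_proposition("seem", ["John"]))    # None
print(emit_proposition("sleep", ["John"]))   # ('sleep', ('John',))
```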

MCTest corpus for machine comprehension (Richardson et al., 2013), composed of 500 short stories, each followed by 4 multiple choice questions. The MCTest comprehension task does not require extensive world knowledge. Focus on questions which are marked in the corpus as answerable from a single sentence in the story (905 questions followed by 3620 candidate answers). Richardson et al (2013) introduce a lexical matching algorithm, which they adapt to use either dependency or PROPS structures, both obtained using the Berkeley parser. (numbers show the progression expected, but still low).