ICML 2009

ICML 2009 took place in June. Michael Littman and I were the program co-chairs. Since we were expecting a lot of work, we tried to make it interesting by experimenting with a number of changes in the review process. Read more for a little explanation and a few conclusions…

Motivations

It is usually accepted that the increasing volume of submissions tests the community's ability to perform high quality reviews. Without high quality reviews, the job of a program chair is hopeless. But what exactly makes a high quality review?

Spotting the important new ideas is difficult because new ideas appear when one takes a fresh point of view. Strong forces tend to restrict machine learning to the intersection of computer science and statistics. Yet machine learning grew from a great variety of fields. Who knows where the next breakthrough is going to come from?

In addition, machine learning permeates a number of application fields. Challenging applications can trigger conceptual advances because they force us to push the boundaries of our conceptual models.

Conflicting remedies

Two rounds of reviews

In general, when a submission does not receive sufficiently informative reviews, the program committee seeks additional reviews from trusted reviewers. In recent years, a large fraction of accepted NIPS and ICML submissions have received one to three additional reviews!

We decided to formalize this practice by setting up a two round review process. Each paper received at least two first round reviews. The area chairs were instructed to immediately reject papers whose negative first round reviews raised criticisms that additional reviews could not plausibly reverse. About one quarter of the papers were rejected at this stage. The remaining papers received one to three second round reviews.
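For concreteness, here is a minimal Python sketch of such a two round workflow. The Paper class, the score scale, and the threshold rule are assumptions made for illustration; in practice the early rejection decision was a judgment call by the area chairs, not a formula.

```python
from dataclasses import dataclass, field

@dataclass
class Paper:
    title: str
    first_round: list = field(default_factory=list)   # scores, say -2 (reject) .. +2 (accept)
    second_round: list = field(default_factory=list)
    rejected_early: bool = False

def triage(paper):
    """Early rejection only when every first round review is clearly negative."""
    paper.rejected_early = bool(paper.first_round) and max(paper.first_round) <= -1
    return paper.rejected_early

papers = [Paper("A", [-2, -1]), Paper("B", [1, -1]), Paper("C", [2, 1])]
survivors = [p for p in papers if not triage(p)]
print([p.title for p in survivors])   # ['B', 'C']: A is rejected early
```

Papers surviving this triage would then be sent out for their one to three second round reviews.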

Authors' responses were sought after the first round. We would have liked to do the same after the second round, but the reviewing period was unfortunately too short. (The 2009 conference took place in June instead of July.)

Reverse bidding

The best way to obtain high quality reviews is to assign the right reviewers. Unfortunately, this assignment is very labor intensive, and automatic assignment algorithms are not without problems. We therefore chose to implement an original idea.

Each area chair was first asked to recruit a dozen reviewers. In the paper submission form, authors were asked to name their preferred area chairs (three first choices, three second choices). The objective was to increase diversity by letting authors name the area chairs whose reviewers would be most likely to appreciate their work.
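To make the mechanism concrete, here is a hypothetical sketch of how ranked preferences might be resolved into an assignment: honor a first choice area chair when one has spare capacity, otherwise fall back to the second choices. The names, the capacity, and the load balancing rule are all invented for illustration.

```python
def assign_area_chair(first_choices, second_choices, load, capacity=40):
    """Return the least loaded preferred area chair with spare capacity."""
    for pool in (first_choices, second_choices):
        candidates = [ac for ac in pool if load.get(ac, 0) < capacity]
        if candidates:
            chosen = min(candidates, key=lambda ac: load.get(ac, 0))
            load[chosen] = load.get(chosen, 0) + 1
            return chosen
    return None   # left for manual assignment by the program chairs

load = {"ac1": 39, "ac2": 10, "ac3": 40}
print(assign_area_chair(["ac3", "ac1"], ["ac2"], load))   # 'ac1', since ac3 is full
```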

The area chairs were asked to name the first round reviewers from their own pool of reviewers. On the other hand, area chairs were encouraged to pick second round reviewers without restrictions. To ensure that no second round reviewer was overloaded, we simply set up a web page showing in real time which reviewers were still available. This simple approach worked because the first round reviews and the early rejections had considerably reduced the second round reviewing load.
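The availability page can be pictured as a simple load counter over the reviewer pool. The toy Python sketch below assumes a fixed per-reviewer capacity, which is an invention of this example; we only displayed availability and left the actual choices to the area chairs.

```python
from collections import Counter

class ReviewerLoad:
    """Toy tracker behind an availability page for second round assignments."""

    def __init__(self, reviewers, capacity=3):
        self.capacity = capacity                       # assumed per-reviewer limit
        self.load = Counter({r: 0 for r in reviewers})

    def available(self):
        """Reviewers who can still take another second round paper."""
        return sorted(r for r, n in self.load.items() if n < self.capacity)

    def assign(self, reviewer):
        if self.load[reviewer] >= self.capacity:
            raise ValueError(f"{reviewer} is fully loaded")
        self.load[reviewer] += 1

tracker = ReviewerLoad(["alice", "bob"], capacity=2)
tracker.assign("alice")
tracker.assign("alice")
print(tracker.available())   # ['bob']: alice has reached capacity
```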

"Food for Thought" session

After collecting all the reviews, we observed that sorting the borderline papers always punishes novel ideas: two reviewers pointing out obvious minor flaws usually prevail over a single reviewer excited by a new idea. Therefore, in addition to the papers selected by the normal process, we asked the area chairs to point out additional papers, weighting novelty and potential impact more heavily than correctable flaws. These papers were grouped in an additional session called “Food for Thought”.
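One way to picture this reweighting is as a scoring rule in which novelty and potential impact carry large weights while correctable flaws are discounted. The sketch below is purely illustrative: the weights and the discount are invented, and the area chairs actually made qualitative judgments rather than computing a score.

```python
def food_for_thought_score(novelty, impact, flaws, correctable,
                           w_novelty=2.0, w_impact=2.0, w_flaw=1.0):
    """Higher is better; flaws judged correctable are discounted by half."""
    effective_flaws = flaws - 0.5 * correctable
    return w_novelty * novelty + w_impact * impact - w_flaw * effective_flaws

# A borderline paper: very novel, likely impactful, one minor fixable flaw.
print(food_for_thought_score(novelty=3, impact=2, flaws=1, correctable=1))   # 9.5
```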

The impact of the papers published in this session remains to be seen. However, we are proud to report that the session was packed. Since attendees and reviewers are the same people, one wonders why these papers received such criticism.

Software

Running such experiments was not easy because they did not always match the design of existing conference management software packages. Area chairs and reviewers sometimes had to deal with contrived user interfaces; reviewer discussions were hampered by an email bug.

Yet we must warmly thank the SoftConf development team. We could not have implemented our experimental process without their continued efforts. Our advice to future program chairs is to make sure that their software choice is backed by a responsive development team.