Leader Board

ID Team Name Description Submission Time nDCG@10
88 OKSAT run20 2017-04-21 23:40:37 UTC 0.44471
87 cdlab Last 2017-04-21 16:10:07 UTC 0.41131
86 YJRS Baseline + multiple BM25F features + nDCG@10. 2017-04-21 01:06:33 UTC 0.41894
85 SLOLQ test #05 2017-04-20 21:45:33 UTC 0.31329
84 OKSAT run19 2017-04-20 16:19:10 UTC 0.43767
83 cdlab #21 2017-04-20 15:41:28 UTC 0.41800
82 YJRS 8foldCV_LambdaMART 2017-04-20 00:36:57 UTC 0.38087
81 SLOLQ test #04 2017-04-19 21:11:38 UTC 0.31516
80 OKSAT run18 2017-04-19 16:16:59 UTC 0.43516
79 cdlab #20 2017-04-19 15:37:19 UTC 0.31272
78 TUA1 RF 2000 bags 2017-04-19 07:56:00 UTC 0.35447
77 YJRS Baseline + multiple BM25F features. 2017-04-19 00:32:51 UTC 0.39637
76 cdlab #19 2017-04-18 15:22:33 UTC 0.30804
75 OKSAT run17 2017-04-18 09:37:52 UTC 0.43241
74 TUA1 RF 1000 bags 2017-04-18 05:40:58 UTC 0.36140
73 Erler test 2017-04-17 16:33:46 UTC 0.35451
72 cdlab #18 2017-04-17 15:14:37 UTC 0.28756
71 YJRS 8foldCV_RandomForest 2017-04-17 13:17:54 UTC 0.37091
70 OKSAT run16 2017-04-17 09:34:59 UTC 0.40094
69 SLOLQ test #04 2017-04-17 06:48:00 UTC 0.30167
68 Erler test translation 0.9 0.0 0.1 2017-04-16 16:26:58 UTC 0.35596
67 cdlab #17 2017-04-16 14:54:10 UTC 0.29235
66 YJRS Five-fold cross validation (2). 2017-04-16 08:58:59 UTC 0.40167
65 Erler test T2LM 0.4 0.4 0.1 0.1 2017-04-15 16:12:01 UTC 0.38670
64 cdlab #16 2017-04-15 14:36:06 UTC 0.29043
63 OKSAT run15 2017-04-15 11:28:36 UTC 0.42514
62 Erler test T2LM 0.4 0.1 0.4 0.1 2017-04-14 16:00:42 UTC 0.35415
61 SLOLQ test #03 2017-04-14 09:28:52 UTC 0.31346
60 Erler test translation 0.4 0.7 0.0 2017-04-13 15:58:21 UTC 0.34709
59 cdlab #15 2017-04-13 13:03:24 UTC 0.32396
58 OKSAT run14 2017-04-13 12:25:20 UTC 0.24125
57 Erler test translation 0.4 0.5 0.1 2017-04-12 15:54:23 UTC 0.37304
56 cdlab #14 2017-04-12 12:23:30 UTC 0.41352
55 OKSAT run13 2017-04-12 06:24:10 UTC 0.41960
54 SLOLQ test #02 2017-04-11 21:43:51 UTC 0.31908
53 Erler test translation 0.4 0.5 0.1 2017-04-11 15:49:31 UTC 0.39139
52 cdlab #13 2017-04-11 09:55:57 UTC 0.41623
51 OKSAT run12 2017-04-11 05:30:34 UTC 0.37958
50 YJRS Five-fold cross validation (fix). 2017-04-11 03:13:05 UTC 0.41157
49 Erler lda 70 2000 improve 2017-04-10 15:24:16 UTC 0.37968
48 YJRS Five-fold cross validation. 2017-04-10 02:36:27 UTC 0.37965
47 Erler lda 70 2000 2017-04-09 15:10:57 UTC 0.37985
46 OKSAT run11 2017-04-09 13:31:17 UTC 0.33449
45 Erler Origin 2017-04-08 14:28:27 UTC 0.38193
44 Erler LDA 2017-04-07 14:17:36 UTC 0.37985
43 OKSAT run10 2017-04-06 11:02:25 UTC 0.36669
42 cdlab #12 2017-04-05 13:05:29 UTC 0.41586
41 cdlab #11 2017-04-02 09:09:14 UTC 0.40222
40 OKSAT run9 2017-03-31 23:59:21 UTC 0.37837
39 cdlab #10 2017-03-31 14:53:38 UTC 0.40251
38 YJRS Baseline + naive BM25F. 2017-03-31 06:16:38 UTC 0.37965
37 OKSAT run8 2017-03-30 17:08:45 UTC 0.33365
36 OKSAT run7 2017-03-29 10:00:44 UTC 0.30427
35 SLOLQ test 2017-03-28 19:21:37 UTC 0.31384
34 cdlab #9 2017-03-28 15:11:37 UTC 0.40323
33 OKSAT run6 2017-03-28 09:55:53 UTC 0.32638
32 cdlab #8 2017-03-27 15:09:34 UTC 0.35070
31 TUA1 rank with RandomForests model 300bags 2017-03-27 08:38:30 UTC 0.34849
30 OKSAT run5 2017-03-27 07:03:10 UTC 0.30756
29 cdlab #7 2017-03-26 11:47:00 UTC 0.37515
28 YJRS BM25F, roughly optimized with CA where n = 3 and sf = 0.8 . 2017-03-26 03:21:10 UTC 0.34316
27 OKSAT run0 2017-03-25 12:24:56 UTC 0.35451
26 cdlab #6 2017-03-25 11:44:34 UTC 0.29530
25 YJRS BM25F, roughly optimized with CA where n = 3 . 2017-03-25 03:00:59 UTC 0.33341
24 cdlab #5 2017-03-24 11:02:16 UTC 0.37518
23 OKSAT run4 2017-03-24 08:42:37 UTC 0.36388
22 Erler Test 2017-03-24 08:03:40 UTC 0.40566
21 cdlab #4 2017-03-23 02:48:00 UTC 0.37207
20 OKSAT run3 2017-03-23 00:23:08 UTC 0.29426
19 TUA1 ubuntu14.04 amd64 test1 2017-03-22 12:11:49 UTC 0.37670
18 KUIDL LambdaMART (without normalization) 2017-03-22 09:13:58 UTC 0.35788
17 ORG Example result with Coordinate Ascent (with improved rel labels, no norm) 2017-03-22 09:02:12 UTC 0.35957
16 YJRS Roughly optimized BM25F. 2017-03-22 01:55:44 UTC 0.33337
15 OKSAT run2 2017-03-21 17:19:39 UTC 0.29214
14 KUIDL LambdaMART (with smaller amount of training data) 2017-03-21 09:09:30 UTC 0.32683
13 ORG Example result with Coordinate Ascent (with improved rel labels) 2017-03-21 08:53:33 UTC 0.36642
12 cdlab #3 2017-03-20 16:22:53 UTC 0.26786
11 OKSAT run1 2017-03-20 15:35:30 UTC 0.37083
10 YJRS Naive BM25F. 2017-03-20 14:44:52 UTC 0.36452
9 cdlab #2 2017-03-19 10:02:54 UTC 0.36321
8 KUIDL LambdaMART 2017-03-19 01:31:07 UTC 0.34231
7 ORG Example result with Coordinate Ascent 2017-03-19 01:10:23 UTC 0.41328
6 cdlab #1 2017-03-18 05:40:24 UTC 0.33105
5 YJRS Test run. 2017-03-17 06:46:18 UTC 0.34371
4 ORG This is a sample (almost identical to the distributed file). 2017-02-20 09:50:32 UTC 0.35451

Overview

OpenLiveQ (Open Live Test for Question Retrieval) provides an open live test environment in the community Q&A service of Yahoo Japan Corporation for evaluating question retrieval systems. We offer opportunities for more realistic system evaluation and help research groups address problems specific to real search systems in a production environment (e.g. ambiguous/underspecified queries and diverse relevance criteria). The task is simply defined as follows: given a query and a set of questions with their answers, return a ranked list of questions.

NOTE: OpenLiveQ provides only Japanese data and a Japanese open test environment. However, we support participants by providing a feature extraction tool, so Japanese NLP expertise is not required for participation.

Schedule

  • Feb 28, 2017 (extended from Dec 15, 2016): Registration due (registration at the NTCIR-13 Web site) *
  • Jan 1 - Apr 21, 2017 (extended from Mar 31, 2017): Offline test (evaluation with relevance judgment data) *
  • Apr - Jun, 2017: Online test (evaluation with real users) #
  • Jul 1, 2017: Online test result release #
  • Sep 1, 2017: Task overview paper (draft) release #
  • Oct 1, 2017: Task participant paper (draft) submission due *
  • Nov 1, 2017: Task participant paper (camera-ready) submission due *
  • Dec 5 - 8, 2017: NTCIR-13 Conference at NII, Tokyo, Japan *

* and # indicate items to be handled by participants and organizers, respectively.

Participation

To participate in the NTCIR-13 OpenLiveQ task, please read through "What participants must do".

Please then take the following steps:
  1. Register through online registration
  2. Make two signed original copies of the user agreement forms
  3. Send the signed copies by postal mail or courier to the NTCIR Project Office

After the agreement is concluded, we will provide information on how to download the data.

Data

Participants can obtain the following data:

  • 1,000 training and 1,000 test queries input into Yahoo! Chiebukuro search
    • The clickthrough rate of each question in the SERP for each query
    • Demographics of users who clicked on each question
      • Fractions of male and female users
      • Fraction of users in each age group
  • At most 1,000 questions with answers for each query, including information presented in the SERP (e.g. snippets)

Task

A set of questions \(D_q \subset D\) (where \(D\) is the set of all questions) is given for each query \(q \in Q\). The only task in OpenLiveQ is to rank the questions in \(D_q\) for each query \(q\).

Input

The input consists of queries and questions for each query.

Queries are included in file "OpenLiveQ-queries-test.tsv", in which each line contains a query. The file format is shown below:
[QueryID_1]\t[Content_1]
[QueryID_2]\t[Content_2]
...
[QueryID_n]\t[Content_n]

where [QueryID_i] is a query ID and [Content_i] is a query string.
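
As an illustration only, the following Python sketch loads such a query file into a dictionary mapping query IDs to query strings; the function name and representation are our own assumptions, not part of the task specification.

# Sketch: load a query file (e.g. OpenLiveQ-queries-test.tsv) into a dict
# that maps query IDs to query strings. Assumed helper, not part of the spec.
def load_queries(path):
    queries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            query_id, content = line.rstrip("\n").split("\t", 1)
            queries[query_id] = content
    return queries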

The set of all questions is included in file "OpenLiveQ-questions-test.tsv", in which each line contains a pair of a query ID and a question ID. The file format is shown below:
[QueryID_1]\t[QuestionID_1_1]
[QueryID_1]\t[QuestionID_1_2]
...
[QueryID_n]\t[QuestionID_n_m]

where each pair of a query ID and a question ID indicates which questions correspond to a query. That is, the line [QueryID_i]\t[QuestionID_i_j] indicates that question [QuestionID_i_j] belongs to \(D_q\) for query [QueryID_i].
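
The next sketch reads such a question file and builds \(D_q\) as a Python dictionary from query IDs to lists of question IDs; again, the helper name is an assumption for illustration.

# Sketch: build D_q (the question IDs per query ID) from a question file,
# e.g. OpenLiveQ-questions-test.tsv. Assumed helper, not part of the spec.
from collections import defaultdict

def load_question_sets(path):
    d_q = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            query_id, question_id = line.rstrip("\n").split("\t")
            d_q[query_id].append(question_id)
    return d_q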

Sample of Input

OpenLiveQ-queries.tsv
OLQ-0001 野球
OLQ-0002 広島
OLQ-0003 神社


OpenLiveQ-questions.tsv
OLQ-0001 q0000000000
OLQ-0001 q0000000001
OLQ-0002 q0000000000
OLQ-0002 q0000000002
OLQ-0003 q0000000003
OLQ-0003 q0000000004

Output

The output is a ranked list of questions for each query. Ranked lists should be saved in a single file, in which each line includes a pair of a query ID and a question ID. The file format is shown below:
[Description]
[QueryID_1]\t[QuestionID_1_1]
[QueryID_1]\t[QuestionID_1_2]
...
[QueryID_n]\t[QuestionID_n_m]

where [Description] is a short description of your system and must not include newline characters. Except for this first line, the content of the output file must be exactly the same as that of the question file "OpenLiveQ-questions.tsv", apart from the order of lines. In the output file, if line [QueryID_i]\t[QuestionID_i_j] appears before line [QueryID_i]\t[QuestionID_i_j'], question [QuestionID_i_j] is ranked higher than question [QuestionID_i_j'] for query [QueryID_i].
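
For illustration, a minimal Python sketch that writes a run file in this format is shown below; "rankings" is assumed to map each query ID to its question IDs in ranked order, and the helper name is ours.

# Sketch: write a run file in the format above. `rankings` maps each query ID
# to its question IDs in ranked order; the description goes on the first line.
# The emitted query-question pairs must match those of the question file.
def write_run(rankings, description, path):
    with open(path, "w", encoding="utf-8") as f:
        f.write(description + "\n")
        for query_id, question_ids in rankings.items():
            for question_id in question_ids:
                f.write(query_id + "\t" + question_id + "\n")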

Sample of Output

OLQ-0001 q0000000001
OLQ-0001 q0000000000
OLQ-0002 q0000000002
OLQ-0002 q0000000000
OLQ-0003 q0000000004
OLQ-0003 q0000000003


The output above represents the following ranked lists:

  • OLQ-0001: q0000000001, q0000000000
  • OLQ-0002: q0000000002, q0000000000
  • OLQ-0003: q0000000004, q0000000003

Resources

To rank the questions, participants can leverage several resources: training queries, training questions, question data (including titles and bodies), and clickthrough data.

Training Queries

Training queries are included in file "OpenLiveQ-queries-train.tsv", and the file format is the same as that of "OpenLiveQ-queries-test.tsv".

Training Questions

Training questions are included in file "OpenLiveQ-questions-train.tsv", and the file format is the same as that of "OpenLiveQ-questions-test.tsv".

Question Data

Information about all the questions as of December 1-9, 2016 is included in "OpenLiveQ-question-data.tsv". Each line of the file contains the following tab-separated values for a question (a reading sketch follows the list):

  1. Query ID (a query for the question)
  2. Rank of the question in a Yahoo! Chiebukuro search result for the query of Query ID
  3. Question ID
  4. Title of the question
  5. Snippet of the question in a search result
  6. Status of the question (accepting answers, accepting votes, solved)
  7. Last update time of the question
  8. Number of answers for the question
  9. Number of page views of the question
  10. Category of the question
  11. Body of the question
  12. Body of the best answer for the question
The number of questions is 1,967,274.
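
A minimal Python sketch for reading this file might look as follows; the field names are our own informal labels for the twelve values above, not identifiers defined by the task.

# Sketch: parse OpenLiveQ-question-data.tsv into dicts keyed by informal
# names for the twelve tab-separated values listed above (assumed labels).
QUESTION_FIELDS = [
    "query_id", "rank", "question_id", "title", "snippet", "status",
    "updated_at", "answer_num", "page_view", "category", "body",
    "best_answer_body",
]

def read_question_data(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            values = line.rstrip("\n").split("\t")
            yield dict(zip(QUESTION_FIELDS, values))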

Clickthrough Data

Clickthrough data are available for some of the questions. Based on the clickthrough data, one can estimate the click probability of each question and understand what kinds of users click on it. The clickthrough data were collected from August 24, 2016 to November 23, 2016, and are included in file "OpenLiveQ-clickthrough-data.tsv". Each line consists of the following tab-separated values (a reading sketch follows this description):

  1. Query ID (a query for the question)
  2. Question ID
  3. Most frequent rank of the question in a Yahoo! Chiebukuro search result for the query of Query ID
  4. Clickthrough rate
  5. Fraction of male users among those who clicked on the question
  6. Fraction of female users among those clicked on the question
  7. Fraction of users under 10 years old among those who clicked on the question
  8. Fraction of users in their 10s among those who clicked on the question
  9. Fraction of users in their 20s among those who clicked on the question
  10. Fraction of users in their 30s among those who clicked on the question
  11. Fraction of users in their 40s among those who clicked on the question
  12. Fraction of users in their 50s among those who clicked on the question
  13. Fraction of users over 60 years old among those who clicked on the question
The clickthrough data contain click statistics of a question identified by Question ID when a query identified by Query ID was submitted. The rank of the question can change even for the same query. This is why the third value indicates the most frequent rank of the question.
The number of query-question pairs in the clickthrough data is 440,163. The question information can be found in "OpenLiveQ-question-data.tsv" for 390,502 query-question pairs, while it is not included for the other pairs.
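
A similar sketch can load the clickthrough data, keyed by query-question pair; again, the field names are informal labels of ours, not part of the specification.

# Sketch: load OpenLiveQ-clickthrough-data.tsv into a dict keyed by
# (query ID, question ID). Field names are assumed labels for the values
# listed above; all values are parsed as floats for simplicity.
CLICK_FIELDS = [
    "rank", "ctr", "male", "female", "under_10", "age_10s", "age_20s",
    "age_30s", "age_40s", "age_50s", "over_60",
]

def read_clickthrough_data(path):
    clicks = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            query_id, question_id, *rest = line.rstrip("\n").split("\t")
            clicks[(query_id, question_id)] = dict(
                zip(CLICK_FIELDS, (float(v) for v in rest)))
    return clicks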

Evaluation

Offline Test

Evaluation with relevance judgment data

The offline test is carried out before the online test (explained later), and its results determine which participants' systems are evaluated in the online test. Evaluation is conducted in a similar way to traditional ad-hoc retrieval tasks: results are evaluated against relevance judgment data with evaluation metrics such as nDCG (normalized discounted cumulative gain), ERR (expected reciprocal rank), and Q-measure. During the offline test period, participants can submit their results once per day through this Web site, and obtain evaluation results right after the submission.

Relevance Judgment

To simulate the online test in the offline test, we conduct relevance judgment with the following instruction: "Suppose you input query \(q\) and received a set of questions \(D_q\). Please select all the questions on which you want to click." Assessors are not presented with the full content of each question; instead, they are asked to evaluate questions on a page similar to the real SERP in Yahoo! Chiebukuro. This type of relevance judgment differs from traditional ones and is expected to yield results similar to those of the online test. Multiple assessors are assigned to each query, and the relevance grade of each question is the number of assessors who selected it. For example, the relevance grade is 2 if two out of three assessors selected a question.

Evaluation Metrics

We plan to use the following evaluation metrics (a reference sketch of nDCG@10 follows the list):
  • nDCG (normalized discounted cumulative gain)
  • ERR (expected reciprocal rank)
  • Q-measure
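
As a reference, the sketch below computes nDCG@10 under one common formulation (exponential gain, logarithmic discount); the exact gain and discount functions used by the organizers may differ.

# Sketch of nDCG@k with (2^rel - 1) gain and log2(rank + 1) discount.
# The organizers' exact formulation of nDCG may differ.
import math

def ndcg_at_k(relevances, k=10):
    # relevances: graded relevance of the returned questions, in ranked order
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0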

Submission

You can submit your run with the following command in a Linux or Mac environment:

curl http://www.openliveq.net/runs -X POST -H "Authorization:[AUTH_TOKEN]" -F run_file=@[PATH_TO_YOUR_RUN_FILE]

where [AUTH_TOKEN] is distributed only to participants.

For example:

curl http://www.openliveq.net/runs -X POST -H "Authorization:ORG:AABBCCDDEEFF" -F run_file=@data/your_run.tsv
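
If you prefer to submit from a script, the sketch below is a rough Python equivalent of the curl command, using the requests library; the endpoint and Authorization header are those given above, and the helper name is ours.

# Rough Python equivalent of the curl command above (requests library).
import requests

def submit_run(auth_token, run_path):
    with open(run_path, "rb") as f:
        return requests.post(
            "http://www.openliveq.net/runs",
            headers={"Authorization": auth_token},
            files={"run_file": f},
        )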

Please note that

  1. It takes a few minutes to upload a run file,
  2. No team is allowed to submit more than one run within any 24-hour period, and
  3. The submission deadline is April 21 (extended from March 31).

The evaluation result (nDCG@10) will be displayed at the top of this website. The top 10 teams in terms of nDCG@10 will be invited to the online evaluation. Details of the evaluation results will be sent after the submission deadline.

Online Test

Evaluation with real users

Submitted results are evaluated by multileaving [1]. At most 10 systems, selected based on the results of the offline test, are evaluated in the online test. The submitted results are combined into a single SERP by multileaving, presented to real users during the online test period, and evaluated on the basis of the observed clicks. Results submitted in the offline test period are used as-is in the online test. Note that some questions may be excluded from the online test if they are deleted for some reason before or during the online test.

Note that the best result from each team in the offline evaluation will be used in the online evaluation.

[1] Schuth et al. "Multileaved Comparisons for Fast Online Evaluation." CIKM 2014.
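
For intuition only, the sketch below shows team draft multileaving, one of the variants discussed in [1]; the organizers' actual multileaving method and click-credit scheme are not specified here and may differ.

# Sketch of team draft multileaving: in each round, rankers ("teams") take
# turns in random order, each contributing its highest-ranked question not yet
# in the combined list. Clicks are credited to the contributing team.
import random

def team_draft_multileave(rankings, length):
    # rankings: list of ranked lists of question IDs, one per system
    combined, team_of = [], {}
    while len(combined) < length:
        added = False
        for team in random.sample(range(len(rankings)), len(rankings)):
            if len(combined) >= length:
                break
            for question_id in rankings[team]:
                if question_id not in team_of:
                    combined.append(question_id)
                    team_of[question_id] = team
                    added = True
                    break
        if not added:
            break  # all rankings exhausted
    return combined, team_of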

Organizers

  • Makoto P. Kato (Kyoto University)
  • Takehiro Yamamoto (Kyoto University)
  • Sumio Fujita (Yahoo Japan Corporation)
  • Akiomi Nishida (Yahoo Japan Corporation)
  • Tomohiro Manabe (Yahoo Japan Corporation)