Collins offers the highest-quality dictionary data to meet your language needs. Our data is held in our proprietary XML format and is a rich lexical resource. We cover more than thirty world languages.

Below is a selection of the data available; please note that this is not a full list of our range. Please contact us to find out more and to request samples.

 

Each entry lists: language, language data (headwords, references, translations), format, and information (dataset type, audio).

Arabic/English

Headwords:
15,000
References:
23,000
Translations:
24,000
XML
Dataset type:
Bilingual
Audio:
No

Arabic/Spanish

Headwords:
16,000
References:
25,000
Translations:
24,000
XML
Dataset type:
Bilingual
Audio:
Yes

Bengali/English

Headwords:
20,000
References:
30,000
Translations:
25,000
XML
Dataset type:
Bilingual
Audio:
No

English-Bengali semi-bilingual

Headwords:
10,000
References:
15,000
Translations:
12,000
XML
Dataset type:
Bilingual
Audio:
No

Catalan/English

Headwords:
18,000
References:
42,000
Translations:
47,000
Tagged
Dataset type:
Bilingual
Audio:
No

English-Chinese semi-bilingual

Headwords:
40,000
References:
190,000
Translations:
190,000
Tagged
Dataset type:
Bilingual
Audio:
No

Chinese/English

Headwords:
36,000
References:
68,000
Translations:
95,000
XML
Dataset type:
Bilingual
Audio:
No

Chinese/Spanish

Headwords:
16,000
References:
26,000
Translations:
25,000
XML
Dataset type:
Bilingual
Audio:
Yes

English to Chinese

Headwords:
240,000
References:
376,000
Translations:
500,000
XML
Dataset type:
Bilingual
Audio:
No

Chinese to English

XML
Dataset type:
Bilingual
Audio:
No

English (UK) monolingual

Headwords:
229,000
References:
770,000
ColleXML
Dataset type:
Monolingual
Audio:
No

English (UK) learner’s monolingual

Headwords:
40,000
References:
170,000
XML
Dataset type:
Monolingual
Audio:
Yes

English (UK) thesaurus

Headwords:
16,500
References:
377,000
XML
Dataset type:
Monolingual
Audio:
No

English (UK) lemmatised lists

Headwords:
113,000
References:
200,000
XML
Dataset type:
Monolingual
Audio:
No

Dictionary of Business

Headwords:
3,000
XML
Dataset type:
Monolingual
Audio:
No

Dictionary of Medicine

Headwords:
13,500
XML
Dataset type:
Monolingual
Audio:
No

Dictionary of Astronomy

Headwords:
3,500
XML
Dataset type:
Monolingual
Audio:
No

English (UK) verb tables

Headwords:
10,500
XML
Dataset type:
Monolingual
Audio:
No

Dictionary of Economics

Headwords:
2,000
XML
Dataset type:
Monolingual
Audio:
No

Dictionary of Sociology

Headwords:
2,500
XML
Dataset type:
Monolingual
Audio:
No

Dictionary of Biology

Headwords:
7,500
XML
Dataset type:
Monolingual
Audio:
No

Dictionary of Law

Headwords:
3,500
XML
Dataset type:
Monolingual
Audio:
No

Dictionary of Idioms

Headwords:
7,000
References:
19,000
Tagged
Dataset type:
Monolingual
Audio:
No

Dictionary of Phrasal Verbs

Headwords:
5,500
References:
26,000
XML
Dataset type:
Monolingual
Audio:
No

Scrabble word list (official)

Headwords:
276,663
XML
Dataset type:
Monolingual
Audio:
No

English (USA) learner’s monolingual

Headwords:
38,000
References:
136,000
XML
Dataset type:
Monolingual
Audio:
Yes

English (USA) monolingual

Headwords:
85,000
References:
217,000
XML
Dataset type:
Monolingual
Audio:
No

English (USA) thesaurus

Headwords:
11,000
References:
130,000
XML
Dataset type:
Monolingual
Audio:
No

French monolingual

Headwords:
60,000
References:
300,000
XML
Dataset type:
Monolingual
Audio:
No

French thesaurus

Headwords:
60,000
References:
300,000
XML
Dataset type:
Monolingual
Audio:
No

French/English

Headwords:
86,000
References:
174,000
Translations:
233,000
XML
Dataset type:
Bilingual
Audio:
Yes

French/German

Headwords:
75,000
References:
107,000
Translations:
121,000
Tagged
Dataset type:
Bilingual
Audio:
No

French/Spanish

Headwords:
54,000
References:
117,000
Translations:
126,000
XML
Dataset type:
Bilingual
Audio:
No

French/Italian

Headwords:
48,000
References:
108,000
Translations:
120,000
XML
Dataset type:
Bilingual
Audio:
No

French lemmatised list

Headwords:
48,000
References:
409,000
XML
Dataset type:
Monolingual
Audio:
No

French verb tables

Headwords:
6,500
XML
Dataset type:
Monolingual
Audio:
No

German monolingual

Headwords:
66,000
XML
Dataset type:
Monolingual
Audio:
No

German/English

Headwords:
110,000
References:
192,000
Translations:
243,000
XML
Dataset type:
Bilingual
Audio:
Yes

German/Italian

Headwords:
65,500
References:
100,000
Translations:
130,000
Tagged
Dataset type:
Bilingual
Audio:
No

German/Spanish

Headwords:
62,500
References:
88,000
Translations:
113,000
Tagged
Dataset type:
Bilingual
Audio:
No

German lemmatised list

Headwords:
108,000
References:
3,800,000
XML
Dataset type:
Monolingual
Audio:
No

German verb tables

Headwords:
12,000
XML
Dataset type:
Monolingual
Audio:
No

Greek/English

Headwords:
69,000
References:
127,000
Translations:
164,000
XML
Dataset type:
Bilingual
Audio:
Yes

Greek/Italian

Headwords:
19,500
References:
24,000
Translations:
27,000
Tagged
Dataset type:
Bilingual
Audio:
No

English-Gujarati semi-bilingual

Headwords:
10,000
References:
15,000
Translations:
12,000
XML
Dataset type:
Bilingual
Audio:
Yes

Hindi/English

Headwords:
20,000
References:
30,000
Translations:
25,000
Excel
Dataset type:
Bilingual
Audio:
No

English-Hindi semi-bilingual

Headwords:
14,000
References:
60,000
Translations:
25,000
Tagged
Dataset type:
Bilingual
Audio:
No

Irish/English

Headwords:
27,000
References:
46,000
Translations:
62,000
XML
Dataset type:
Bilingual
Audio:
No

Italian/English

Headwords:
92,000
References:
172,000
Translations:
230,000
XML
Dataset type:
Bilingual
Audio:
No

Italian/Spanish

Headwords:
62,000
References:
89,000
Translations:
103,000
XML
Dataset type:
Bilingual
Audio:
No

Italian/Portuguese

Headwords:
17,500
References:
23,000
Translations:
21,500
Tagged
Dataset type:
Bilingual
Audio:
No

Italian lemmatised lists

Headwords:
32,000
References:
398,000
XML
Dataset type:
Monolingual
Audio:
No

English-Japanese semi-bilingual

Headwords:
19,000
References:
50,000
Translations:
105,000
XML
Dataset type:
Bilingual
Audio:
No

Japanese/English

Headwords:
16,000
References:
26,000
Translations:
27,000
XML
Dataset type:
Bilingual
Audio:
Yes

Japanese/Spanish

Headwords:
16,000
References:
27,000
Translations:
26,000
XML
Dataset type:
Bilingual
Audio:
Yes

English-Kannada semi-bilingual

Headwords:
10,000
References:
15,000
Translations:
12,000
XML
Dataset type:
Bilingual
Audio:
Yes

English-Korean semi-bilingual

Headwords:
19,000
References:
142,000
Translations:
149,000
XML
Dataset type:
Bilingual
Audio:
No

Korean/English

Headwords:
14,000
References:
25,000
Translations:
26,000
XML
Dataset type:
Bilingual
Audio:
Yes

Latin/English

Headwords:
36,000
References:
60,000
Translations:
80,000
Tagged
Dataset type:
Bilingual
Audio:
No

Malay/English

Headwords:
27,000
References:
51,000
Translations:
56,000
Tagged
Dataset type:
Bilingual
Audio:
No

English-Malayalam semi-bilingual

Headwords:
10,000
References:
15,000
Translations:
12,000
XML
Dataset type:
Bilingual
Audio:
Yes

Malayalam/English

Headwords:
20,000
References:
30,000
Translations:
25,000
Excel
Dataset type:
Bilingual
Audio:
No

English-Marathi semi-bilingual

Headwords:
10,000
References:
15,000
Translations:
12,000
XML
Dataset type:
Bilingual
Audio:
Yes

Norwegian/English

Headwords:
25,000
References:
54,000
Translations:
75,000
Tagged
Dataset type:
Bilingual
Audio:
No

English-Odia semi-bilingual

Headwords:
10,000
References:
15,000
Translations:
12,000
XML
Dataset type:
Bilingual
Audio:
Yes

Polish/English

Headwords:
67,000
References:
96,000
Translations:
99,000
Tagged
Dataset type:
Bilingual
Audio:
No

Portuguese/English

Headwords:
59,000
References:
83,000
Translations:
107,000
XML
Dataset type:
Bilingual
Audio:
No

Portuguese/Spanish

Headwords:
37,000
References:
48,000
Translations:
55,000
Tagged
Dataset type:
Bilingual
Audio:
No

Russian/English

Headwords:
74,000
References:
132,000
Translations:
117,000
XML
Dataset type:
Bilingual
Audio:
No

Russian/Spanish

Headwords:
17,000
References:
26,000
Translations:
26,000
XML
Dataset type:
Bilingual
Audio:
Yes

Spanish monolingual

Headwords:
53,000
References:
112,000
XML
Dataset type:
Monolingual
Audio:
No

Spanish thesaurus

Headwords:
16,000
References:
71,000
XML
Dataset type:
Monolingual
Audio:
No

English-Spanish semi-bilingual

Headwords:
14,000
References:
59,000
Translations:
64,000
Tagged
Dataset type:
Bilingual
Audio:
No

Spanish/English

Headwords:
142,000
References:
310,000
Translations:
442,000
XML
Dataset type:
Bilingual
Audio:
Yes

Spanish lemmatised list

Headwords:
55,000
References:
2,000,000
XML
Dataset type:
Monolingual
Audio:
No

Spanish verb tables

Headwords:
16,200
XML
Dataset type:
Monolingual
Audio:
No

English-Tamil semi-bilingual

Headwords:
10,000
References:
15,000
Translations:
12,000
XML
Dataset type:
Bilingual
Audio:
No

Tamil/English

Headwords:
20,000
References:
30,000
Translations:
25,000
XML
Dataset type:
Bilingual
Audio:
No

English-Telugu semi-bilingual

Headwords:
10,000
References:
15,000
Translations:
12,000
XML
Dataset type:
Bilingual
Audio:
Yes

Thai/English

Headwords:
13,000
References:
23,000
Translations:
23,000
XML
Dataset type:
Bilingual
Audio:
Yes

Turkish/English

Headwords:
14,000
References:
25,000
Translations:
27,000
XML
Dataset type:
Bilingual
Audio:
Yes

Ukrainian/English

Headwords:
37,000
References:
35,000
Translations:
41,000
XML
Dataset type:
Bilingual
Audio:
No

English-Urdu semi-bilingual

Headwords:
10,000
References:
15,000
Translations:
12,000
XML
Dataset type:
Bilingual
Audio:
Yes

Vietnamese/English

Headwords:
13,000
References:
28,000
Translations:
30,000
XML
Dataset type:
Bilingual
Audio:
Yes

Welsh/English

Headwords:
34,000
References:
36,000
Translations:
53,000
XML
Dataset type:
Bilingual
Audio:
No

Multilingual database (including Arabic, Bengali, Chinese, Croatian, Czech, Danish, Dutch, English (GB), English (US), Farsi, Finnish, French, German, Greek, Hindi, Italian, Japanese, Korean, Malayalam, Norwegian, Polish, Portuguese (BR), Portuguese (PT), Romanian, Russian, Spanish (ES), Spanish (LatAm), Swedish, Tamil, Thai, Turkish, Ukrainian, Vietnamese)

Headwords:
10,000
Translations:
10,000
XML
Dataset type:
Bilingual
Audio:
Yes