WEX
The Freebase Wikipedia Extraction (WEX) is a processed dump of the English-language Wikipedia. The wiki markup for each article is transformed into machine-readable XML, and common relational features such as templates, infoboxes, categories, article sections, and redirects are extracted in tabular form.
Freebase WEX is provided as a set of database tables in TSV format for PostgreSQL, along with tables providing mappings between Wikipedia articles and Freebase topics, and corresponding Freebase Types.
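For readers who want to inspect a table without loading it into PostgreSQL, the following is a minimal sketch of reading such a TSV file row by row in Python. It assumes the files follow the conventions of PostgreSQL's `COPY` text format (tab-separated fields, `\N` for NULL, backslash escapes for embedded tabs and newlines); the page above does not state this explicitly. The function names `decode_field` and `iter_rows` are illustrative and not part of WEX.

```python
# Backslash escapes used by PostgreSQL's COPY text format.
# Assumption: the WEX TSV files follow this convention.
_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}


def decode_field(raw):
    """Decode one raw field; return None for the NULL marker (backslash-N)."""
    if raw == r"\N":
        return None
    if "\\" not in raw:
        return raw  # fast path: nothing to unescape
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Unknown escapes fall back to the escaped character itself.
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def iter_rows(lines):
    """Yield each line as a list of decoded fields.

    `lines` is any iterable of text lines, such as an open file, so the
    whole table never has to be held in memory at once.
    """
    for line in lines:
        yield [decode_field(f) for f in line.rstrip("\n").split("\t")]
```

Because `iter_rows` accepts any iterable of lines, it can be fed a file opened with `open(path, encoding="utf-8")` and consumed lazily, which matters for tables as large as articles or sections. Column names and order are not listed on this page and should be taken from the setup files.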
Download
Metaweb Technologies provides Freebase WEX free of charge for any purpose and updates it regularly. It is distributed, like Wikipedia itself, under the terms of version 1.2 of the GNU Free Documentation License or any later version published by the Free Software Foundation.
Latest update: June 23, 2008
| File | Download size | Uncompressed size |
|---|---|---|
| All files † | 7.8 GB | 55 GB |
| Setup files | 12 KB | 44 KB |
| articles | 3.9 GB | 27 GB |
| sections | 3.1 GB | 23 GB |
| template_calls | 111 MB | 875 MB |
| template_values | 496 MB | 4 GB |
| category_members | 50 MB | 312 MB |
| redirects | 45 MB | 135 MB |
| freebase_names | 46 MB | 247 MB |
| freebase_wpid | 28 MB | 195 MB |
| freebase_types | 8.3 MB | 128 MB |
† Because Freebase WEX is large, each data file within the All files tar archive is compressed individually, so that a single table can be extracted more easily on systems with limited disk space.
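As an illustration of the note above, here is a minimal sketch of pulling one individually compressed file out of the All files archive with Python's standard `tarfile` module. The archive path and member names passed to these functions are hypothetical; the actual file names are not given on this page, so list the archive first to find them.

```python
import shutil
import tarfile


def list_members(archive_path):
    """Return (name, size in bytes) for each regular file in the archive."""
    with tarfile.open(archive_path, "r") as tar:
        return [(m.name, m.size) for m in tar.getmembers() if m.isfile()]


def extract_one(archive_path, member_name, dest_path):
    """Copy a single archive member to dest_path.

    The member is written exactly as stored, so it stays compressed on
    disk and can be decompressed later as a stream.
    """
    with tarfile.open(archive_path, "r") as tar:
        # extractfile raises KeyError if the member does not exist and
        # returns None if it is not a regular file.
        src = tar.extractfile(member_name)
        if src is None:
            raise ValueError(f"{member_name!r} is not a regular file")
        with src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
```

Calling `list_members` on the downloaded archive shows which members exist and how large each one is; `extract_one` then writes only the chosen member, so the disk space needed is that of one compressed table rather than the full uncompressed set.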
Documentation
See here for complete documentation.
Contact
Questions and comments about Freebase WEX should be directed to the Freebase Developer Email List.
Citing
If you'd like to cite WEX in a publication, you may use:
- Metaweb Technologies, Freebase Wikipedia Extraction (WEX), http://download.freebase.com/wex/, June 23, 2008
Or as BibTeX:
@misc{metaweb:wex,
title = "Freebase Wikipedia Extraction (WEX)",
author = "Metaweb Technologies",
howpublished = "\url{http://download.freebase.com/wex/}",
edition = "June 23, 2008",
year = "2008"
}
Related Work
- DBpedia, "a community effort to extract structured information from Wikipedia and to make this information available on the Web", http://dbpedia.org
- Hugo Zaragoza, Jordi Atserias, Massimiliano Ciaramita and Giuseppe Attardi (Yahoo! Research Barcelona), Semantically Annotated Snapshot of the English Wikipedia, http://www.yr-bcn.es/semanticWikipedia, 2007.

