Yunqa • The Delphi Inspiration

Delphi Components and Applications

WikiTaxi: Wiki

Frequently Asked Questions

WikiTaxi_Importer.exe stops and shows "Error: XML Parser Error -1"

There are at least two reasons why this can happen:

1. Wrong input file. Wikipedia offers many different xml.bz2 dump files with different contents. WikiTaxi needs exactly one of them, and it must be the correct one. The file name must follow this pattern: wikiname-yyyymmdd-pages-articles.xml.bz2 where

  • wikiname is the name of the Wiki, e.g. enwiki, dewiki, simplewiki.
  • yyyymmdd is the export date, like 20200720.

Examples:

  • simplewiki-20200701-pages-articles.xml.bz2
  • enwiki-20200701-pages-articles.xml.bz2

This is a single file, not a set of multiple files. Depending on the Wiki, it can be many gigabytes in size.
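
The file name pattern above can be checked before starting a long import. The following is a minimal sketch in Python and not part of WikiTaxi. The function name is hypothetical, and the allowed characters for the Wiki name (lowercase letters, digits, underscores) are an assumption.

```python
import re

# Pattern from the FAQ: wikiname-yyyymmdd-pages-articles.xml.bz2
# Assumption: Wiki names consist of lowercase letters, digits and underscores.
DUMP_NAME = re.compile(r"[a-z0-9_]+-\d{8}-pages-articles\.xml\.bz2")


def is_importable_dump_name(file_name: str) -> bool:
    """Return True if file_name matches the single-file pages-articles pattern."""
    return DUMP_NAME.fullmatch(file_name) is not None


print(is_importable_dump_name("simplewiki-20200701-pages-articles.xml.bz2"))          # True
print(is_importable_dump_name("enwiki-20200701-pages-articles-multistream.xml.bz2"))  # False
```

The check only looks at the name. It cannot tell whether the download is complete or whether the dump format is still readable by WikiTaxi.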

2. WikiTaxi has not been updated for a long time. It is possible that Wikipedia has since changed the file format of the dump file so WikiTaxi cannot read it anymore.

The same is true for the Wikipedia syntax. WikiTaxi cannot handle some newer elements and shows them as plain text instead.

Thanks. My previous file was the multistream version. The correct file imported easily.

What are the WikiTaxi command line parameters?

WikiTaxi.exe currently takes two command line arguments:

WikiTaxi.exe <Database File> <Page Name>

Please note that the page name must match exactly (including case!). If it contains white space, it must be wrapped in quotation marks. This example will start WikiTaxi, open the English.taxi database and display the page on “Albert Einstein”:

WikiTaxi.exe English.taxi "Albert Einstein"
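
If WikiTaxi is started from a script, passing the two arguments as a list avoids quoting mistakes with page names that contain white space. This is a minimal sketch in Python. The helper name is hypothetical, and it assumes that WikiTaxi.exe and English.taxi are in the current directory.

```python
import subprocess


def build_wikitaxi_command(exe_path, database_file, page_name):
    """Return the argument list: executable, database file, page name."""
    return [exe_path, database_file, page_name]


command = build_wikitaxi_command("WikiTaxi.exe", "English.taxi", "Albert Einstein")

# Show the equivalent Windows command line, including the quotation marks.
print(subprocess.list2cmdline(command))
# WikiTaxi.exe English.taxi "Albert Einstein"

# To actually start WikiTaxi on Windows, uncomment the next line:
# subprocess.Popen(command)
```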

Tutorial: How to create a Windows shortcut to open a database automatically?

Batch Importing

Currently, WikiTaxi Importer does not support command line arguments, and XML files must be compressed with bzip2 before importing. A batch import script written in AutoIt (see WikiTaxi: Batch Import) can save time when compressing and importing.

Feature Suggestions

Please check this list before suggesting new features!

WikiTaxi for other operating systems: Linux & Mac OS X, Symbian

WikiTaxi is Win32 only. There are reports on the Internet that WikiTaxi runs well on Linux using the Wine compatibility layer. Mac OS X and Symbian are unsupported.
Yes, just tested. WikiTaxi works great under Wine (import and all!). On Ubuntu, install the Wine package with Synaptic Package Manager. Wine then appears at the bottom of the Applications menu. Run “Browse C:\”, double-click WikiTaxi Importer, select the input file and enter the output database name. Click Execute and close the importer when it has finished. Go back to the Wine C:\ browser and double-click WikiTaxi. Scroll down the opening page, click the “Open File” hyperlink and browse to the database just created by the importer, and you are in. Enter a search term in the top line. Thank you so much for making WikiTaxi! M

Full-text search

Full-text article search requires considerable drive space to store the inverted index. Since the database is already quite large for some Wikis, there is no article search right now. However, this might change in a future version.
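
To illustrate why an inverted index takes space: it stores, for every word, the list of articles containing that word. The following is a generic sketch in Python with made-up sample texts. It does not describe how WikiTaxi would implement search.

```python
from collections import defaultdict


def build_inverted_index(articles):
    """Map each lowercase word to the set of article titles containing it."""
    index = defaultdict(set)
    for title, text in articles.items():
        for word in text.lower().split():
            index[word].add(title)
    return index


# Made-up sample data for illustration only.
articles = {
    "Albert Einstein": "physicist who developed relativity",
    "Isaac Newton": "physicist who described gravity",
}
index = build_inverted_index(articles)
print(sorted(index["physicist"]))  # ['Albert Einstein', 'Isaac Newton']
print(sorted(index["gravity"]))    # ['Isaac Newton']
```

Every distinct word gets its own entry, so the index grows with both the vocabulary and the number of articles.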

Some form of image support

Taking Wikipedia images offline is a question of size: All images sum up to more than 400 GB in total! I am not yet sure if this is the right thing to do for WikiTaxi.

If it's mostly a matter of size rather than of coding to display images, perhaps add a slider to choose either a maximum size per image in bytes or a maximum total size of all images. In both cases, images would be sorted from smallest to largest and added until the limit is reached. Alternatively, include all images up to a maximum per-image byte size, and downscale larger images until they meet that size.
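
The smallest-first selection suggested above could look roughly like this. It is a sketch in Python only. The function and parameter names are hypothetical, the sample sizes are made up, and downscaling of larger images is not covered.

```python
def select_images(image_sizes, max_total_bytes, max_image_bytes=None):
    """Pick images smallest-first until the total byte budget is used up."""
    selected = []
    total = 0
    # Sort by size in bytes, smallest image first.
    for name, size in sorted(image_sizes.items(), key=lambda item: item[1]):
        if max_image_bytes is not None and size > max_image_bytes:
            break  # sorted ascending, so all remaining images are too large
        if total + size > max_total_bytes:
            break  # the next image would exceed the total budget
        selected.append(name)
        total += size
    return selected, total


# Made-up sizes in bytes.
sizes = {"a.png": 500, "b.jpg": 2000, "c.svg": 100}
print(select_images(sizes, max_total_bytes=1000))  # (['c.svg', 'a.png'], 600)
```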

Category pages working as a list of articles in that category (the same way as on the Wiki)

Category listings add to the database size. Import time will also take longer because an extra database scan is required to extract categories. This is planned for a future version.

Create wiki backups for a specified page

While designing WikiTaxi, I kept in mind downloading or updating selected Wiki pages from MediaWiki servers using the Special:Export page, but this has not been implemented yet.
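
For reference, MediaWiki servers export a single page as XML at a URL of the form Special:Export/Page_title. The sketch below, in Python, only builds such a URL and does not download anything. The function name is hypothetical, and the base URL is an assumption that differs between Wikis.

```python
from urllib.parse import quote


def special_export_url(base_url, page_title):
    """Build a Special:Export URL for one page title."""
    # MediaWiki URLs use underscores in place of spaces in titles.
    title = page_title.replace(" ", "_")
    return f"{base_url}/Special:Export/{quote(title)}"


print(special_export_url("https://en.wikipedia.org/wiki", "Albert Einstein"))
# https://en.wikipedia.org/wiki/Special:Export/Albert_Einstein
```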

I'd like to see WikiTaxi_Importer.exe work on the command line (and possibly a Linux-statically compiled version) so that I can automate the translation process

Noted, but this is low priority only.

Support for the ParserFunctions extension

WikiTaxi supports most parser variables, magic words, and parser functions.

Support for the importer to work with the uncompressed XML file

Wikimedia provides these as BZ2, but some wikis (like those on Wikia and possibly others) provide them as GZ instead. It'd be great if you could just uncompress the XML and use it, rather than having to uncompress, then recompress as BZ2.
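
Until then, the GZ to BZ2 step can be done in one pass without storing the uncompressed XML on disk. This is a minimal sketch in Python using only the standard library. The function name and the file names in the usage comment are hypothetical, and whether WikiTaxi Importer accepts a particular third-party dump still depends on its contents.

```python
import bz2
import gzip
import shutil


def recompress_gz_to_bz2(gz_path, bz2_path, chunk_size=1024 * 1024):
    """Stream a .gz file into a .bz2 file without writing the raw XML to disk."""
    with gzip.open(gz_path, "rb") as source, bz2.open(bz2_path, "wb") as target:
        # Copy in chunks so that multi-gigabyte dumps do not need to fit in memory.
        shutil.copyfileobj(source, target, chunk_size)


# Usage with hypothetical file names:
# recompress_gz_to_bz2("mywiki-pages-articles.xml.gz", "mywiki-pages-articles.xml.bz2")
```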

Save pages to HTML files

Bonus if we can dump the whole contents of the .taxi out as a directory of HTML files for use on obscure platforms.

Wikipedia already provides this service: http://static.wikipedia.org/ But the expanded HTML is much larger than the compressed WikiTaxi database.
However, Wikia and many other Wikis do not, although they do provide the XML. This would also allow cut & paste without changes to the built-in HTML viewer.

Copy to the clipboard from the HTML view

I'm sure there's a good reason we can't do this at the moment, but I can't imagine what it is.

Clipboard copying is currently not available for the built-in HTML viewer.

Then how about copying from the Wiki Source view?

Support for Right-to-Left layout for Hebrew and Arabic

Bi-directional text layout is unfortunately unsupported by the built-in HTML viewer.

Ability to bookmark pages for easy access next time

Noted, but focus right now is on improving the MediaWiki parser.

A triple-click to select the whole search line

Possibility to work with tabs (maybe even a fixed number of tabs) in order to have more than one article open at once (this is maybe simpler than the suggestion below)

Possibility to open links in tabs with the right mouse button

This will be low-priority. Focus is on parser improvements.

Print from HTML view and/or Wiki Source view

Split into and use under-4GiB FAT32-friendly set of .taxi files

Being able to have a set of .taxi files, each under the 4GiB FAT32 filesize limit, would be helpful – specifically to put it on a non-NTFS flash drive.

Namespace-aliases

Especially WP: as a short form for Wikipedia: doesn't work.

The common namespace aliases are implemented as of WikiTaxi 1.2.0. Ralf

Add links to the Headlines (like the edit links in Wikipedia) to jump directly to that section in the source code

WikiTaxi Internals

I'd like to see the WikiTaxi being free software (GPL, BSD, etc.)

WikiTaxi is already free to use. If you are interested in the sources, please let me know how you would like to contribute to the WikiTaxi development: mail@wikitaxi.org

Source code of the Wiki-to-HTML converter?

Is it possible to have the source code of the Wiki-to-HTML converter that you have used in WikiTaxi?

The WikiTaxi source code is not freely available. If you are interested, please e-mail mail@wikitaxi.org and describe how you would like to contribute to the WikiTaxi development.

Format of WikiTaxi's *.taxi database?

Is it possible to have the format of your database so that your data may be used for processing in other platforms?

Third party applications or platforms would likely be better off importing the *.xml dump into their specific data structure. If you have a particular idea, you are welcome to share your thoughts at mail@wikitaxi.org.

Bug Reports

You may choose this Wiki section to report bugs, or you may report them via e-mail to mail@wikitaxi.org. In any case, it is not necessary to report them twice.

The Wiki is good if it helps other people to know about a particular problem and especially if you can suggest a workaround. E-Mail is better for intricate problems which might ask for more details to reproduce. Thank you!

'Red'-hyperlinks (hyperlinks to lemmas that have not been written, i.e. hyperlinks to nowhere) in Wikipedia are converted to 'blue'-hyperlinks by WikiTaxi. I would suggest converting the red links to plain text.

Not implemented right now for speed reasons. Ralf

Does not Import

When I try to Import a database in the importer the text box just stays blank. So I can't get it to work.

This bug report does not provide any information to reproduce your problem. Please elaborate.

Note: I suspect that your problems result from corrupted *.xml.bz2 file downloads. Please verify their MD5 checksums against those listed on the Wikipedia download page.
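
A dump of several gigabytes should be hashed in chunks rather than read into memory at once. This is a minimal sketch in Python. The function name is hypothetical, and the file name in the usage comment is just one of the examples from this page.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 checksum of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: compare the result with the checksum listed on the download page.
# print(md5_of_file("simplewiki-20200701-pages-articles.xml.bz2"))
```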

Spanish Wikipedia Problem

First of all, I would like to thank you for doing this program. It's awesome. I was always looking for a program that could do the same very functions WikiTaxi does. I'm glad I found you :)

I've downloaded bz2 files from different dates and converted them to the WikiTaxi format, and there's a problem: about 40% of the articles appear mixed with other articles. There's no missing info. They just appear with content from other random articles. I hope you can help me. At home I have the bz2 file of a backup from 2008, and it works great. The issue is in the versions from the last few months.

This is not a WikiTaxi problem. It is caused by wrong template contents in the dump, as in eswiki-20090825-pages-articles.xml.bz2.

Take the article on “Ellen Muth”, for example. It includes the template “Plantilla:Edad”. To see its bad contents, extract the above *.bz2 file to eswiki-20090825-pages-articles.xml and open it with a hex editor. Then search for “Plantilla:Edad” or go directly to hex offset 66B7A479. The <text> there is clearly not the “Plantilla:Edad” template. Therefore I suspect an export problem or a bug in the export generator, which was “MediaWiki 1.16alpha-wmf”.
2010-02-22: I received an e-mail today which confirms that this problem is indeed caused by a Spanish Wikipedia export bug. The full report is available here: https://bugzilla.wikimedia.org/show_bug.cgi?id=18694.
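
Without a hex editor, the bytes at the offset mentioned above can also be read with a few lines of Python. This is a sketch only. The function name is hypothetical, and the usage comment assumes the dump has already been extracted to the file name given above.

```python
def read_at_offset(path, offset, length=512):
    """Return `length` bytes of the file starting at byte `offset`."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


# Usage with the file and hex offset from the text above:
# snippet = read_at_offset("eswiki-20090825-pages-articles.xml", 0x66B7A479)
# print(snippet.decode("utf-8", errors="replace"))
```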

Unfortunately, no date has yet been scheduled for a fix. For the time being, the recommended workaround is to download the pages-meta-current.xml.bz2 file instead of the usual pages-articles.xml.bz2 dump. Ralf

Wikibooks problem

On the Subject pages (i.e when you type in a subject not a book name), it displays “<dynamicpagelist> category=Linux category=Completed books namespace=main suppresserrors=true shownamespace=false ordermethod=categorysortkey order=ascending </dynamicpagelist>” instead of a link to a specific book. Is there a way to fix this?

The <dynamicpagelist> MediaWiki extension has recently become increasingly popular with Wikibooks. Unfortunately, fixing this would require updating WikiTaxi. I am not sure when I will have the time to do so, but I would appreciate it if you could e-mail mail@wikitaxi.org and tell me which Wikibook you are using (the imported file) and exactly which page shows the problem.
wiki/wikitaxi/index.txt · Last modified: 2020/08/28 12:51 by 127.0.0.1