Tuesday, 21 February 2012

Extracting teletext from VHS tapes

I'm currently experimenting with Alistair Buxton's vhs-teletext software, which applies a deconvolution to the VBI (vertical blanking interval) signal as recorded on domestic VCRs and can actually recover most of the data.  Almost perfect versions of pages can be rebuilt by stacking repeated captures of the same page, so that bad data in any one copy gets weeded out.
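To give a feel for why stacking helps, here's a minimal sketch of the idea in Python.  This is not vhs-teletext's actual code: the function name stack_rows, the per-column majority vote and the sample strings are all my own assumptions for illustration, and the real software may well weigh the data differently.

```python
from collections import Counter


def stack_rows(copies, width=40):
    """Combine several noisy captures of one teletext row by per-column majority vote.

    Hypothetical helper for illustration only; not part of vhs-teletext.
    """
    stacked = []
    for col in range(width):
        # Collect the character seen at this column in every capture long enough to have one.
        votes = Counter(c[col] for c in copies if col < len(c))
        # Keep the most common character; use a space if no capture covers this column.
        stacked.append(votes.most_common(1)[0][0] if votes else " ")
    return "".join(stacked)


# Made-up captures of the same row, two of them with a single misread character.
copies = [
    "CEEFAX 1 100 Tue 21 Feb",
    "CEEFAX 1 1O0 Tue 21 Feb",
    "CEEFAX 1 100 Tue 2l Feb",
    "CEEFAX 1 100 Tue 21 Feb",
]
print(stack_rows(copies, width=len(copies[0])))
```

The more copies of a page there are on the tape, the more likely it is that each character position has a clear winner, which is why pages that were only broadcast once or twice in a recording come out worse.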

It runs on Linux, which is foreign territory for most of us, but it's worth persevering with.  You need a Hauppauge WinTV card to connect the VCR to, so that the VBI lines can be dumped to disk.  You then run scripts on the data, which process it as described above.

I've got a grab of ITV1 Teletext from 2005 being processed at the moment.  15 minutes of tape comes out as 25,000 VBI files, which will take the best part of a week to run on a virtual machine on my dual-core Windows 7 PC.  It's looking good so far, although the quality of each page depends on how many versions of it have been retrieved, and I'm about halfway through.  I'll post the results here.

In the meantime, here's a screenshot of the output generated:

2003 BBC1 Ceefax page - certainly readable.  The first number is the magazine, the second the row/packet number.  Packet 0 is the header which will tell you what page it is, and packet 27 contains Fastext data.  The next set of four digits is the timecode, nowadays used to store the sub-page numbers.  Not sure what the last digit is!
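Since packet 0 is the header that starts each page, the packets can be sorted into pages by watching for it within each magazine.  Here's a rough sketch in Python of how that grouping could work.  The function name group_pages and the (magazine, row, text) tuples are my own invention for illustration, and this is not how vhs-teletext itself stores or reads its output.

```python
def group_pages(packets):
    """Split a stream of (magazine, row, text) packets into pages.

    Hypothetical helper for illustration only; the tuple layout is assumed.
    """
    open_pages = {}  # magazine -> rows collected for the page currently being received
    finished = []
    for magazine, row, text in packets:
        if row == 0:
            # A header closes the previous page in this magazine and opens a new one.
            if magazine in open_pages:
                finished.append((magazine, open_pages[magazine]))
            open_pages[magazine] = [(row, text)]
        elif magazine in open_pages:
            open_pages[magazine].append((row, text))
        # Rows seen before any header for their magazine are dropped.
    # Whatever is still open at the end of the stream counts as a final page.
    for magazine, rows in open_pages.items():
        finished.append((magazine, rows))
    return finished


# Made-up packets from two magazines, interleaved as they might arrive.
packets = [
    (1, 0, "header A"),
    (1, 1, "line 1"),
    (2, 0, "header B"),
    (1, 2, "line 2"),
    (1, 0, "header C"),
    (2, 1, "line B1"),
]
pages = group_pages(packets)
for magazine, rows in pages:
    print(magazine, rows)
```

Keeping a separate open page per magazine matters because packets from different magazines can be interleaved in the broadcast.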

