Article written by tigtog

tigtog (aka Viv) is the founder of this blog. She lives in Sydney, Australia: husband, 2 kids, cat, house, garden, just enough wine-racks and (sigh) far too few bookshelves.

This author has written 3448 posts for Hoyden About Town.

10 responses to “Asylum Seeker Fact Sheet and Myth Buster”

  1. Cristy

    I’ve always loved that cartoon!

  2. tigtog

    It’s a corker, isn’t it?

  3. tigtog
  4. Chris

    One of the guys I work with has for a while refused to believe #1 isn’t true because the ALP promised they would end it.

    The fact sheet is in PDF format, making it un-indexable by search engines and inaccessible to some readers with disabilities, so I have transcribed it below

    btw there’s no problems with search engines indexing PDFs as long as they don’t just contain images (and this one doesn’t, it has the text in it).

  5. tigtog

    btw there’s no problems with search engines indexing PDFs as long as they don’t just contain images

    Is that a relatively new thing, or have I been misinformed all along regarding PDFs?

    ETA: I certainly noticed relatively recently that it is now possible to select text and cut and paste it from some PDFs where I was previously unable to. I can see how if I can now do that then search engines ought to also be able to index the text.

  6. Mary

    Extracting text from (non-image) PDFs is a bit of an art: the format is designed to place shapes (including letters) nicely on pages and tends to specify the text in terms of where it is placed on a page (ie, presentation markup) rather than in terms of how it relates to other text (semantic markup).

    But precisely because there’s so much good information locked up in them (in particular, in academic publications), there’s been a lot of work put into PDF-to-text conversion. The fact sheet here is probably pretty trivial: the properties said it was produced by Microsoft Word, which probably does lay it out in more or less the correct letter order. Open source state-of-the-art still has trouble with two-column text and similar: it tends to be laid out line-by-line, regardless of the fact that in two-column documents, lines will contain text from two entirely different columns!

  7. lilacsigil

    Thank you for posting this useful list – it’s good to have somewhere to direct the poorly informed and aggressive!

  8. Mary

    The short version is: if you can copy-and-paste out of it and get fairly sensible results, it’s fairly indexable too. (I don’t know about being accessible to screen-readers so much.)

  9. Helen

    Rob Corr’s infogram is just full of WIN.

  10. Chris

    Is that a relatively new thing, or have I been misinformed all along regarding PDFs?

    It’s been around on Google for at least 3-4 years. I don’t know when it started. But searching on academic research topics I often end up with links into PDFs.

    Re: cut and paste – it depends on your pdf viewer. Also, some pdf viewers are modal, and you can switch between a mode where you can select text and one where you can’t, so it can get a bit confusing.

    Some convert-to-pdf programs used to just generate images rather than a “real” pdf because it was easy, which is probably what started the misleading belief that pdfs aren’t indexable.
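
Mary's point in comment 6 about two-column text can be illustrated with a small sketch. This is not how any particular PDF tool works: the fragment format, the made-up page contents, and the function names `line_by_line` and `column_aware` are all assumptions chosen for illustration. It only shows why reading positioned text strictly top-to-bottom, left-to-right mixes the columns together, and how splitting on a column boundary first avoids that.

```python
from typing import List, Tuple

# A fragment is (x, y, text): a piece of text and where it sits on the page.
# Assumption for this sketch: y grows downward, so smaller y is higher up.
Fragment = Tuple[float, float, str]

# A made-up two-column page, stored in no particular reading order.
PAGE: List[Fragment] = [
    (300, 100, "right line 1"),
    (50, 100, "left line 1"),
    (50, 120, "left line 2"),
    (300, 120, "right line 2"),
]


def line_by_line(fragments: List[Fragment]) -> List[str]:
    """Naive order: top to bottom, then left to right within each line."""
    ordered = sorted(fragments, key=lambda f: (f[1], f[0]))
    return [text for _, _, text in ordered]


def column_aware(fragments: List[Fragment], split_x: float) -> List[str]:
    """Read everything left of split_x first, then everything right of it."""
    left = sorted((f for f in fragments if f[0] < split_x), key=lambda f: f[1])
    right = sorted((f for f in fragments if f[0] >= split_x), key=lambda f: f[1])
    return [text for _, _, text in left + right]
```

Calling `line_by_line(PAGE)` interleaves the two columns, while `column_aware(PAGE, split_x=200)` returns the whole left column before the right one. In a real document the hard part is working out where the column boundary is, which this sketch simply takes as a given.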
