A mic stands on a stage, ready for #EdFringe

#EdFringe for Language Learners, 2026 Edition

#EdFringe is here again, and true to form, there’s something for everyone. Language learners and European culture vultures are no exception, of course, with some proper treats in this year’s rich programme.

As per last year (and the year before, and the year before that… I’ve been keeping tabs for a while now), there’s a good balance between shows in the target language and shows about target language countries. Here’s my watch list – but be sure to have a browse too, and let me know if I’ve missed any must-sees!

French

Music

Le Vent du Nord – Québécois progressive folk from a well-regarded five-piece.

Mary, Queen of Scots – Queen of 3 Kingdoms (Marianne Beate Kielland & Ben-San Lau) – French arias mingle with Scots and English in a programme celebrating the life of Scotland’s famous queen.

Afternoon Arias (Brian Bannatyne-Scott, Beth Taylor and friends) – French classical highlights from Berlioz to Debussy.

La Chatte Chanteuse (Kat Brooks) – It honestly wouldn’t be #EdFringe without a bit of Piaf. There’s that and more in this hour of chanson from Kat Brooks.

Theatre

Madame La Mort (Full Moon Theatre / Labyrinth Productions) – A radical reimagining of Rachilde’s French Symbolist play (in translation).

Comedy

Tori Morancay – Le French C’est Freak – An anglophone set that nonetheless plunges straight into the francophone world!

German

Music

Handel – Nine German Arias (Angela Hicks) – Soprano Angela Hicks and ensemble present some of Handel’s few German-language works.

JS Bach (Aidan Jones) – Pianist Aidan Jones plays and presents, taking the audience on a winsome tour through the life of the great composer.

Scottish Lieder (Brian Bannatyne-Scott) – A lovely crossover presenting music of Schubert, Schumann, Loewe and Strauss, inspired by Scottish poets.

Comedy

Michael Brunström: William Tell vs the Algorithm – Swiss surrealist comedy (in English) that takes aim squarely at the Swiss background of this award-nominated return performer.

Jürgen Strack: Achtung! The Only Sauerkraut in Town – Riffing on his Germanness (with shades of Henning Wehn here), Strack has won fans for his sheer originality.

Spanish

Music & Dance

Sobremesa – Where Words and Music Meet at the Table (Nus Duo) – Billed as an interactive musical experience where audience and artists shape something wonderful from Spanish and Latin American texts.

Sounds of St Cecilia’s III: Spanish Flavours: Dance, Fire and Elegance (Cokus Duo) – A harpsichord-led exploration of the music of 18th-century Europe, with a nod to both France and Spain.

Flamenkids (TuFlamenco) – Flamenco is as permanent a fixture on Spanish #EdFringe as Piaf for the French. This family-friendly show introduces the rhythms of Spain in an hour-long show that sounds wonderfully interactive.

Alegria Flamenca (Alba Flamenca) – Appetite freshly whetted? If you’re now hooked on flamenco, check out this vibrant show, with a nearby bar on hand for tapas and drinks!

Theatre

Bull / Fight – Fresh Edinburgh ensemble Mythography presents this odyssey through Lorca’s Spain (in English).

Comedy

Escocia con Ñ (Jotace Loaiza) – Scottish life retold through Spanish eyes, in Spanish! An excellent (and rare) opportunity to attend a full Spanish-language set during the festival.

Mi Casa Es Su Casa (El Purnell) – Billed as a true ‘duo-lingo’ act (hope that’s been run past the owl!), laughs are promised for hispanists and non-hispanists alike.

And the rest…

With hundreds and hundreds of shows, I can’t possibly do the whole programme justice in one short blog post. While I’ve focused on French, German and Spanish, there’s plenty else there too, from the very tempting Four Courses of Italian Song (Anna Vanosi) and the intriguing I Can Make You Italian in 55 Minutes (Stefania Licari), to the Scottish and Norwegian sacred music presentation The Maid of Norway (Nordic Voices Norway) and online viral comedy hitster Thor Stenhaug. I’ll certainly be trying to tick some of those off my list this August, too.

Is there something you’ve bookmarked to see without fail? What other languages are represented in the listings? Let us know in the comments!

Bulgarian : The Slavic Outlier That Feels Strangely Familiar

Maybe you’re a polyglot looking for a new, interesting, off-the-beaten-track language to learn. Or maybe you’re a language aficionado who allows the outcome of a popular music competition to dictate their holiday and learning plans for the next year. Whichever category you fall into, Bulgarian is well worth a look.

Bulgaria has a lot going for it in climate and culture – it’s not for nothing that the country has featured on numerous ‘place in the sun’-type programmes. Stunning landscapes, seaside escapes, vibrant cultural life across four very different urban centres – and a particularly interesting language for serious linguists and dabblers alike.

The Strange Case(lessness) of Bulgarian

Bulgarian is a South Slavic language, putting it in the same branch as languages like Croatian and Slovenian. But Bulgarian – and closely related Macedonian – are grammatical outliers. Unlike their South Slavic siblings – and wider Slavic cousins – they’ve pretty much lost all of their noun cases and endings.

Now that feels really unusual if you’ve ever tried learning other languages in that tree. Across all branches, from Polish to Russian to Croatian, noun case morphology is characteristically complex. Learners run a gauntlet of declensions and endings, one of the chief reasons they’re considered ‘hard’ languages.

Not so with Bulgarian. It’s moved from what we call a heavily synthetic language, relying on complex morphology to express relationships between words, to being much more analytic – using standalone units like prepositions to do that work. A case in point is the phrase in London. In Polish, with its rich case system, Londyn changes to its locative with -ie in the phrase w Londynie. In Bulgarian, London is Лондон wherever you place it, giving us simply в Лондон.

Your Features Sound Familiar…

Another lovely curiosity that marks Bulgarian out is the definite article. Slavic languages famously do without any articles on the whole, so no a, an or the. It’s so typically Slavic that you almost do a double-take when you realise the language has one. Not only that, but it’s attached to the end of the noun – something you might know from Scandilangs, but never expected to crop up south of the Vistula. Cinema, for example, is кино (kino) – the cinema is киното (kinoto).

Albanian also happens to have this postnominal definite article, which brings us on to the next point. Bulgarian forms part of the Balkan Sprachbund, a grouping of Indo-European languages that, while not especially closely related, have come to resemble each other, particularly in syntax, through centuries of contact. For example, if you know any Albanian, Modern Greek, or Romanian, you’ll feel strangely at home with Bulgarian and its lack of infinitives. In Bulgarian, you don’t say I want to sing, because there is no ‘to sing’. Instead, you say I want that I sing (Искам да пея), just as in Modern Greek (Θέλω να τραγουδήσω).

Bulgarian Specialities

That’s not to say that Bulgarian doesn’t have its own unique secrets either, though. What it lost in cases, it makes up for in novel verb paradigms. There’s a very special past tense – the renarrated or evidential form – that is used when you are talking about something you didn’t witness.

The Bulgarian renarrated past is based on its perfect tense paradigm. This looks very much like the past tense in Croatian and similar languages, which use the verb ‘to be’ as an auxiliary. For example, we have той е видял куче (toi e vidyal kuche) – he saw a dog: a simple fact. Drop the е and you have the renarrated той видял куче (toi vidyal kuche) – he allegedly saw a dog, but I only know this from hearsay!

Although it’s a novel Slavic innovation in Bulgarian, if you know some German, the idea might be familiar. German does a similar thing by using the subjunctive for reported speech. He said he’s ill becomes er hat gesagt, dass er krank sei – the sei rather than indicative ist indicates that these aren’t your experiences or words, but someone else’s.

Bulgarian Learning Resources

If these tidbits whet your appetite for some български, there are a couple of good places to start. It is off the beaten track as a classroom language, so certainly not as well-served by resources as more mainstream choices.

But the gold standard, as ever, is Routledge. Colloquial Bulgarian is a great starter course, balancing a skills-based communicative approach with a good, solid grammar grounding. You can download all of the audio materials for free at the Routledge website too.

Teach Yourself don’t disappoint, either, with the first Bulgarian title appearing in the 1990s. You can pick the original title up second-hand for under a tenner. Since then, it’s also seen a reissue and rebranding as Complete Bulgarian, with the audio available for free via the Teach Yourself web app.

Finally – and I haven’t tried these, personally, yet – are a pair of titles, Intensive Bulgarian 1 and Intensive Bulgarian 2. These come across as good, academic style ab initio coursebooks. I’ll definitely be dipping into these at some point soon.

If you’ve tried these, or any other Bulgarian learning resources, let us know how you found them in the comments! Or maybe catch me in Sofia next year for a chat about them over an облак (oblak) – Bulgaria’s famous drink of mint liqueur and mastika.

Наздраве!

You Don’t Need to Be a Developer to Start Playing with AI Models in Python

I’ve been singing the praises of local models of late, for so many reasons. From intelligent OCR to data crunching with enhanced privacy, there are gains to be had and they’re easy to access with free inferencing software like LM Studio and Ollama.

That said, there’s a moment that happens to a lot of people who work adjacent to tech – linguists, teachers, researchers – where they think: I’d love to tinker with these AI models properly – and maybe even build them directly into my own tech projects.

This post addresses that tinkering itch. The good news: it’s genuinely easier than you think, and you can get something running in an afternoon.

Why Python?

I ask this a lot, myself, coming from a totally different development background (full-stack and native web app coding). Going back into academia, Python seems to be everywhere.

Python has become the de facto language of AI and data science for a reason. Its syntax is readable almost like pseudocode, its libraries are extraordinarily well-developed and vast, and – linked to that last point – calling an API takes a handful of lines, not pages of custom routines. If you’re coming from a research or humanities background, Python also has the advantage of being widely taught in academic contexts, which means the community, tutorials, and Stack Overflow threads are abundant.

Compare calling an LLM in Python to doing the same in JavaScript or Swift, and you’ll understand immediately why the ‘AI for academia’ world standardised on Python.

And a big plus – it’s probably already installed on your machine. Open your terminal / command prompt interface, and type python --version or python3 --version. If you see a version number come back, you’re good to go. If not, head to python.org/downloads and grab the latest stable release – it’s a straightforward installer on every platform.

Two Ways In: Cloud or Local

Option 1: Hugging Face’s Free Inference API (great for experimenting, zero cost)

Hugging Face is essentially the GitHub of AI models – tens of thousands of open-source models, all in one place. The Serverless Inference API lets you call many of them without setting up any infrastructure, and the free tier is perfectly generous for tinkering and learning. You’ll hit rate limits if you go overboard, but for exploration it’s hard to beat.

Here’s what you need to get started:

  1. Create a free account at huggingface.co
  2. Go to Settings → Access Tokens and generate a token with Read permissions
  3. Install the library: pip install huggingface_hub

Then you can call a model like this:

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    token="hf_your_token_here"
)

response = client.text_generation("Explain enregisterment in simple terms.")
print(response)

That’s genuinely it for a first experiment. A few lines. No GPU. No cloud bill.

One gotcha: some popular models require you to accept their licence terms on the Hugging Face website before you can access them via the API. If you get a 403 error, that’s almost certainly why — head to the model page, accept the terms, and try again.

Option 2: LM Studio (run models locally, completely private)

If you’d rather not send your data to any external service – which matters for research involving sensitive text – LM Studio is still a brilliant solution. It gives you a clean interface to download and run open-source models on your own machine, with no internet connection required once the model is downloaded.

The local model landscape has improved dramatically. Models like Qwen3 (the 4B and 14B variants especially) are genuinely impressive on a modern laptop or desktop with a decent amount of RAM. You wouldn’t have believed this was possible two years ago.

LM Studio exposes a local API that mimics the OpenAI format, so you can call it from Python the same way:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"  # LM Studio doesn't require auth locally
)

response = client.chat.completions.create(
    model="qwen3-14b",  # whatever model you've loaded in LM Studio
    messages=[{"role": "user", "content": "Hello, what can you do?"}]
)

print(response.choices[0].message.content)

The openai library here is just a convenient HTTP client — you’re not actually talking to OpenAI. You’re talking to a model running on your own machine.

Common stumbling block: LM Studio’s server needs to be running, with a model loaded, before your script will work. The error message when it isn’t is a bit cryptic (ConnectionRefusedError or similar) – if you see that, it just means you haven’t started the server yet.
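A quick pre-flight check can make that failure friendlier. The sketch below uses only the standard library. It assumes LM Studio’s default address (http://localhost:1234/v1, as above) and that its OpenAI-style server answers a GET request on /models; the function name is my own invention, not part of any library.

```python
import urllib.request


def lm_studio_is_running(base_url="http://localhost:1234/v1", timeout=2.0):
    """Return True if a server answers at base_url + '/models', else False."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as reply:
            return reply.status == 200
    except OSError:
        # A refused connection (URLError) and a timeout are both OSError subclasses
        return False


if lm_studio_is_running():
    print("Server is up - safe to send requests.")
else:
    print("No server found - start it in LM Studio and load a model first.")
```

Calling this at the top of a script means you get a plain-English nudge rather than a stack trace.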

Making the Output Actually Readable

Once you’re getting responses back, the next temptation is to do something with them in your terminal – loop through results, display analysis, format comparisons. The default print() approach quickly gets messy.

My namesake, the rich library, is a revelation here (how nice to have a Python library named after me). It adds colour, formatting, tables, and syntax highlighting to terminal output with almost no effort:

pip install rich

from rich.console import Console
from rich.markdown import Markdown

console = Console()

# Reuses the Hugging Face `client` created in Option 1
response_text = client.text_generation("Write a haiku about Python.")
console.print(Markdown(response_text))

If the model returns markdown (which most do), rich will render it beautifully right in your terminal. Headers, code blocks, bold text — all of it. This is genuinely transformative for readability when you’re doing exploratory work.

Don’t Stop at Chat: Sentence Transformers Are Worth Knowing About

Here’s where it gets interesting for researchers and linguists in particular. Large language models are great for generation — producing text, summarising, answering questions. But there’s a whole other class of model designed for understanding text semantically: sentence transformers.

The Sentence Transformers library (also called sbert) lets you turn text into numerical vectors that capture meaning. Two sentences that mean the same thing will have vectors that are close together; two unrelated sentences won’t. This is called a semantic embedding.

Why does this matter? A few examples:

  • Corpus linguistics for semantics: Automatically cluster dialect examples by semantic similarity rather than just keyword matching
  • Research assistants: Find the most relevant papers or passages from a large collection based on meaning, not just exact words
  • Teaching tools: Build a quiz that detects when a learner’s answer is semantically equivalent to the model answer, even if the wording is different

pip install sentence-transformers

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The dialect features of the Black Country are highly distinctive.",
    "Black Country speech has unique phonological characteristics.",
    "The weather in Edinburgh is famously miserable."
]

embeddings = model.encode(sentences)
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.3f}")  # will be high

This runs entirely locally (the model downloads once and caches), is fast even on a modest laptop, and opens up a whole world of computational approaches to language that go well beyond chatting with an LLM.
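If you’re curious what that similarity score actually measures, here is the same cosine calculation written out in plain Python. The three-number vectors are made up purely for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions), and cosine_similarity is my own helper rather than anything from the library.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings": invented numbers, not real model output
dialect_a = [0.9, 0.1, 0.2]
dialect_b = [0.8, 0.2, 0.1]
weather = [0.1, 0.9, 0.3]

print(f"dialect vs dialect: {cosine_similarity(dialect_a, dialect_b):.3f}")
print(f"dialect vs weather: {cosine_similarity(dialect_a, weather):.3f}")
```

Vectors pointing in nearly the same direction score close to 1, and unrelated ones score much lower, which is all the ‘semantic closeness’ idea boils down to numerically.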

Getting Set Up: The Boring-but-Important Bit

Beyond that, there are just a few things I’ve learnt from my initial tinkerings that will save you headaches.

Use a virtual environment. Every time. Before you install anything for a new project, do:

python -m venv venv
source venv/bin/activate  # on Mac/Linux
venv\Scripts\activate     # on Windows

This keeps your project’s dependencies isolated and prevents the infuriating “but it worked yesterday” problem where one project’s libraries silently break another’s.
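Not sure whether the environment is actually active? A tiny check, sketched here on the assumption that you created it with python -m venv as above, is to compare two values in the sys module. The function name is just my own label.

```python
import sys


def in_virtualenv():
    """True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the underlying Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter in use:", sys.executable)
```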

Keep API secrets out of your code. Don’t paste your Hugging Face token directly into a script you might share or commit to GitHub. Use a .env file and the python-dotenv library:

pip install python-dotenv

# .env file (this file stays off GitHub – add it to .gitignore)
HF_TOKEN=hf_your_token_here

# your script
from dotenv import load_dotenv
import os

load_dotenv()  # reads .env and adds its values to the environment
token = os.getenv("HF_TOKEN")
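One small refinement worth considering: os.getenv quietly returns None when the variable is missing, which tends to surface later as a confusing authentication error. A minimal sketch of a stricter lookup follows; require_token is a hypothetical helper of my own, and it uses only the standard library.

```python
import os


def require_token(name="HF_TOKEN"):
    """Return the named environment variable, or fail with a readable message."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


try:
    token = require_token()
    print("Token loaded.")
except RuntimeError as problem:
    print(problem)
```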

Read error messages. This sounds obvious, but: most Python errors from LLM libraries tell you exactly what went wrong. A 401 means authentication failed (wrong or missing token). A 503 means the model is loading on the server side – wait a moment and retry. A ConnectionRefusedError from a local API almost always means LM Studio’s server isn’t running.
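The ‘wait a moment and retry’ advice for a 503 is easy to automate. Here is a rough sketch of the pattern, not tied to any particular library: ModelLoading is a made-up exception standing in for ‘the server said 503’, and in real code you would catch your library’s own HTTP error and check its status code instead.

```python
import time


class ModelLoading(Exception):
    """Hypothetical stand-in for 'the server replied 503, model still loading'."""


def call_with_retry(call, attempts=3, wait_seconds=5.0, sleep=time.sleep):
    """Run call(); if it raises ModelLoading, pause and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ModelLoading:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(wait_seconds)


# Demo with a fake request that fails twice, then succeeds
outcomes = iter([ModelLoading(), ModelLoading(), "Here is your answer."])


def fake_request():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


print(call_with_retry(fake_request, wait_seconds=0))
```

Errors like a 401 are deliberately not retried here, since a wrong token won’t fix itself by waiting.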

What Next?

Once you’ve got a basic script running, the natural next steps are:

  • Build a simple chat loop that keeps track of conversation history and lets you have a back-and-forth with a model
  • Experiment with system prompts to give the model a persona or set of instructions
  • Try different models on the same prompts and compare the results – it’s illuminating
  • Start combining LLMs with sentence transformers for retrieval-augmented approaches where you search a corpus semantically before feeding results to a generative model
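To give a flavour of the first two ideas, here is a bare-bones sketch of conversation history with a system prompt. To keep it runnable without a server, the ‘model’ is a placeholder function that simply echoes; chat_turn and echo_model are my own illustrative names.

```python
def chat_turn(history, user_text, send):
    """Add the user's message, get a reply via send(history), and store it too."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def echo_model(messages):
    """Placeholder 'model' so the sketch runs without a server."""
    return f"You said: {messages[-1]['content']}"


# The system prompt sits at the start of the history and sets the persona
history = [{"role": "system", "content": "You are a patient language tutor."}]

for line in ["Hello!", "How do I say 'cheers' in Bulgarian?"]:
    print("You:  ", line)
    print("Model:", chat_turn(history, line, echo_model))

print(f"{len(history)} messages now in the history")
```

To talk to a real model, you would swap echo_model for a function that passes the whole history as the messages argument of client.chat.completions.create, as in the LM Studio example, and returns the reply text. Sending the full list each time is what gives the model its ‘memory’.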

The Python AI ecosystem is genuinely exciting right now, and the barrier to entry has never been lower. You don’t need a GPU, you don’t need a cloud account, and you don’t need to be a professional developer. You just need an afternoon and a bit of curiosity.

Have questions or want to share what you built? Drop a comment below.

OCR for Historical Newsprint: Four Models Worth Running Locally in LM Studio

If you work with scanned, typeset documents from archives like the British Newspaper Archive, you will know the frustration of running standard OCR tools on material they were never really designed for: degraded print, Victorian column layouts, eccentric typography, and occasionally deliberate non-standard spelling.

You can leverage the power of local AI models, however, to automate this process, and with free inferencing software like LM Studio, the learning curve isn’t at all steep. Below, I take a look at four specialist OCR models you can run entirely locally using the package – and why you might prefer doing so over handing your documents to a web service.

Why Run OCR Locally?

There are some truly excellent web-based OCR services. There’s Transkribus, for instance, which is widely used in the academic community. Tools like this are powerful and convenient, but they come with some real trade-offs:

  • Privacy: Your document images leave your machine and are processed on someone else’s server. For sensitive archival material or unpublished research corpora, that matters.
  • Cost at scale: Processing hundreds or thousands of newspaper pages through a paid API adds up quickly.
  • No customisation: Cloud OCR engines don’t always offer many pipeline options. You cannot instruct them to preserve dialect spellings, flag ambiguous characters, or respect the orthographic conventions of a specific historical variety of English.
  • Reproducibility: Web services update their models silently. A corpus processed in 2024 may produce different output if you re-run it in 2026. A local model stays consistent – important for methodological reproducibility.

Running OCR-trained models in inferencing software like LM Studio removes most of this friction. The program handles downloading and managing multiple models through a clean interface, and also lets you customise model settings, right up to system prompts that persist across sessions. For historical document work, that means you can instruct the model once about the linguistic conventions of your material and have it apply those rules to every page you send it.

The Four Models

1. OLMOCR 2 (7B) — Best Overall for Documents

Developed by the Allen Institute for AI (Ai2), olmOCR 2 is built on Qwen2.5-VL-7B-Instruct and fine-tuned using reinforcement learning with unit-test rewards specifically targeting document OCR tasks. It is one of the few models designed from the ground up for this use case rather than adapted from a general vision assistant.

Size: 7 billion parameters. Available as a ~4.7 GB GGUF (Q4 quantisation) or ~8.85 GB at Q8. Needs around 5–10 GB RAM depending on quantisation.

Why it works for newspaper archives: Handles multi-column layouts, mixed content (tables, headings, body text), and degraded print reliably. Scores 82.4 on olmOCR-Bench. It responds well to system prompt instructions, making it a strong candidate for dialect-preservation workflows.

LM Studio: There’s a GGUF in the native catalogue – search and download directly in the app.
🔗 lmstudio.ai/models/allenai/olmocr-2-7b-1025

✅ Pros: Best-in-class document OCR accuracy; strong layout understanding; instruction-following is reliable; native LM Studio support.
❌ Cons: 7B means slower inference on modest hardware; not ideal for rapid bulk processing.

2. NANONETS-OCR-S — Clean Catalogue Option

Developed by Nanonets, a document AI company, this model is also based on the Qwen2.5-VL architecture but fine-tuned specifically on structured document extraction tasks including forms, invoices, and archival print.

Size: Approximately 7B parameters, similar footprint to olmOCR 2. Available directly via the LM Studio model catalogue as a GGUF.

Why it works for newspaper archives: Strong on structured layout extraction and clean Markdown output. Useful when you want transcription that preserves document structure (headings, columns, captions) as well as raw text.

LM Studio: Native catalogue – findable by searching “Nanonets” in the model browser.
🔗 lmstudio.ai/models (search: Nanonets-OCR-s)

✅ Pros: Easy one-click setup; good structural output; reliable on clean and moderately degraded scans.
❌ Cons: Less tested on heavily damaged historical material than olmOCR 2; similar hardware demands.

3. DOTS.OCR (1.7B) — Best for Complex Column Layouts

Released by Rednote (小红书) in late 2025, dots.ocr is a compact 1.7B vision-language model that combines layout detection and text recognition in a single pass. Unusually for its size, it explicitly predicts reading order — the sequence in which text blocks should be read — which is critical for Victorian newspaper pages where columns can be irregular and text wraps around illustrations.

Size: 1.7 billion parameters; approximately 2 GB as a GGUF. Runs comfortably on 3 GB VRAM.

Why it works for newspaper archives: Reading order prediction alone makes it worth considering for multi-column broadsheet layouts. Supports over 100 languages, outputs JSON, Markdown, or HTML, and benchmarks show Table TEDS accuracy of 88.6% — ahead of Gemini 2.5 Pro on that metric.

LM Studio: Load via HuggingFace GGUF import (paste the HuggingFace URL into LM Studio’s search bar).
🔗 huggingface.co/dotsdocx/dots.ocr-1.7B-GGUF

✅ Pros: Tiny footprint; reading order detection; fast; strong on multi-column layouts; multilingual.
❌ Cons: Smaller context window means system prompts may drift on very long sessions; can hallucinate on heavily degraded scans; not in the native LM Studio catalogue.

4. GLM-OCR (0.9B) — Best for Bulk Processing on Modest Hardware

Released by Z.ai (Zhipu AI) in early 2026, GLM-OCR is built on the GLM-V encoder–decoder architecture and fine-tuned exclusively for OCR. At under 1 billion parameters it is the smallest model here, yet it scores 94.0 on OCRBench and 93.96% Table TEDS accuracy – results that comfortably outperform much larger general-purpose models.

Size: 0.9 billion parameters; approximately 1 GB quantised (Q8). Needs under 1.5 GB VRAM – it will run on almost any laptop made in the last five years.

Why it works for newspaper archives: Speed and low resource use make it ideal for processing large batches of pages. It is not a chat model — it takes an image and outputs text, triggered by the phrase Text Recognition: — so it is best suited to pure transcription pipelines rather than interactive use.

LM Studio: Load via HuggingFace GGUF import using the ggml-org GGUF repository.
🔗 huggingface.co/ggml-org/GLM-OCR-GGUF

✅ Pros: Tiny; fast; runs on minimal hardware; excellent accuracy for its size; good for bulk workflows.
❌ Cons: Not a chat/instruction model — no system prompt support for dialect customisation; requires a separate layout detection step for complex multi-column pages; not in the native LM Studio catalogue.

Quick Comparison

Model | Size (GGUF) | VRAM | LM Studio Route | Best For
olmOCR 2 (7B) | ~4.7 GB | 5 GB+ | Native catalogue | Best accuracy, complex layouts, dialect workflows
Nanonets-OCR-s | ~4.7 GB | 5 GB+ | Native catalogue | Structured document extraction, clean output
dots.ocr (1.7B) | ~2 GB | 3 GB | HuggingFace GGUF import | Multi-column layouts, reading order, low VRAM
GLM-OCR (0.9B) | ~1 GB | <1.5 GB | HuggingFace GGUF import | Bulk processing, minimal hardware

A Practical Workflow for Newspaper Archives

For a large corpus like material from the British Newspaper Archive, a two-tier approach works well. Use GLM-OCR for the bulk of clean, well-preserved pages – it is fast and accurate enough for standard 20th-century newsprint. Then escalate difficult pages (damaged, illegible columns, unusual typefaces, pre-1880 material) to olmOCR 2 for a more careful second pass. If column order is scrambling your output, switch to dots.ocr for those pages specifically.
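If you script this, the routing rule is only a few lines. The sketch below is one possible reading of the two-tier idea: the Page fields and the model name strings are placeholders I have made up, so swap in whatever metadata you hold and whatever identifiers your LM Studio install shows.

```python
from dataclasses import dataclass


@dataclass
class Page:
    year: int
    damaged: bool = False
    scrambled_columns: bool = False


def choose_model(page):
    """Pick a model name for one scanned page, following the two-tier idea."""
    if page.scrambled_columns:
        return "dots.ocr"   # reading-order prediction helps here
    if page.damaged or page.year < 1880:
        return "olmocr-2"   # slower, more careful second pass
    return "glm-ocr"        # fast default for clean pages


pages = [
    Page(1925),
    Page(1871),
    Page(1930, damaged=True),
    Page(1902, scrambled_columns=True),
]
for page in pages:
    print(page.year, "->", choose_model(page))
```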

For dialect writing research – where you need the transcription to preserve non-standard spellings rather than silently normalise them – load olmOCR 2 or Nanonets-OCR-s and write a system prompt that explicitly instructs the model to treat all orthographic choices as intentional. That single step does something no traditional OCR engine is capable of: it makes the tool linguistically aware of your material.
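As a rough illustration of what such a request might look like from Python, the sketch below builds an OpenAI-style message list containing a system prompt and one page image. It assumes your local server accepts images as base64 data URLs in that chat format, which is worth checking against the LM Studio documentation for your version. The prompt wording and the function name are my own examples.

```python
import base64

DIALECT_PROMPT = (
    "Transcribe the page exactly as printed. Treat every spelling as intentional: "
    "do not correct, modernise or normalise dialect forms. "
    "Mark characters you cannot read with [?]."
)


def build_ocr_messages(image_bytes, system_prompt=DIALECT_PROMPT, mime="image/png"):
    """Build an OpenAI-style message list: system prompt plus one page image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this page."},
                {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
            ],
        },
    ]


# Placeholder bytes stand in for a real scan read with open(path, "rb").read()
messages = build_ocr_messages(b"not a real image")
print(messages[0]["content"])
print(messages[1]["content"][1]["image_url"]["url"][:40])
```

The resulting list would be passed as the messages argument to the same kind of chat completion call shown for ordinary text chat.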

All four models run fully offline once downloaded. No subscription, no API key, no usage limits — just your hardware and your documents.

The GLM-OCR model running in LM Studio, transcribing a 19th-century newspaper article


Eurovision 2026: Languages, Lyrics and Understanding

It’s something of a cheat week on Polyglossic this week. I’d normally write a blog on language diversity in the upcoming Eurovision Song Contest around this time. Yes, it’s that time of year again! But the boss asked first, so I’ll point you in the direction of the ESC article I penned for Linguascope this week. There’s a round-up of the songs – from a classroom teacher point of view, of course (not that the glam has no part to play), along with some fun game ideas.

As I’ve said there, it’s no secret that the contest has had a tough ride lately, falling into a highly contested, politically charged space. And that’s to be expected, as a living, breathing event out there in a very complex world. There are simply no easy answers that reconcile the many hues of opinion in the fan community. The most we can do is respect others’ points of view and their right to express them.

For my part, I’m on the side that believes there’s something worth fighting for there. Eurovision is a unique event that has celebrated different, not mainstream, quirky – whatever you want to call it. For kids (and adults) who feel some or all of those things, that’s always been a very special place to escape to.

Superfan Matti Bunzl summed it up well to Austria’s Profil magazine this week. He’s the director of the Wien Museum, host site for this year’s Eurofan House by wiwibloggs. Matti explains:

Warum soll der Song Contest von solchen Dynamiken ausgeschlossen sein? Natürlich kann man wünschen, dass sich die Welt für ein paar Stunden auf völkerverständigenden Eskapismus einigt, aber man kann die Menschen ja nicht zwingen.

(Why should the Song Contest be exempt from such dynamics? Of course you can wish for the world to agree on a few hours of escapism that brings peoples together, but you can’t force people.)

Profil, Saturday May 2nd 2026

If you are following the shows next week, then have a wonderful Eurovision. May the best song win, and long live healthy argument about whether it really was the best song!

Forever the Optimist : Hugo’s In Three Months Language Courses

I loved Hugo’s In Three Months books as a language-obsessed teenager. Slim volumes, tiny, dense chapters… And of course, that promise that you’d reach some level of fluency in a quarter of a year.

I’ve collected a fair few of them over the years. My first was Italian in Three Months, of Hugo’s late-80s dark blue incarnation. I bought it ahead of a school wind band trip to Venice, and probably made my way through about a third of it before the trip was over, I packed my tuba away, and faddishly drifted onto the next project.

They’ve had a much longer history than that. Hugo’s Language Learning Books pops up in 1950s UK, and quickly starts churning out titles for speedy learners. Like other language series in the mid-20th century, they expanded their Simplified System rapidly across languages, and became bookshop staples.

Expansion, incorporation and multiple reincarnations

By the end of the 80s, they’d already moved well beyond French, German and Spanish. The second-hand trail on eBay shows that spread, with the publication of courses from Arabic to Scottish Gaelic (still a solid reference for Gaelic grammar if you can get a copy). Curiously, the first foray into Japanese dropped the ‘in three months’, instead going for Japanese Simplified – the confidence wobble didn’t last long, as Japanese in Three Months is the title that made it into the 90s!

The In Three Months books have never really gone away. The series was picked up by Dorling Kindersley (DK), gaining a splash of the prototypical DK colour and gloss. While the range of languages is a little shrunken now, they’re still going strong, now as part of Penguin’s catalogue. Gone are the boxes of cassettes, replaced by online audio. But the familiar format remains: tight, reference-style chapters giving that “all the essentials for very busy people” vibe.

It’s in that spirit I picked up one of the new editions recently – appropriately, Italian in Three Months again, this time in its fancy new green sleeve. It’s a refresher ahead of a trip to Milan for a conference, and the perfect choice for that – not too heavy (for the suitcase or the reading).

But coming full circle like this takes me right back, and I can sense the excitement I felt in that titular promise all those years ago. Somehow, that promise still works. Long live the In Three Months series – may they continue to lure language nerds to their next obsession!

Michel Thomas on Tap : Language Courses for Spotify Premium Users

I’ve always been a bit baffled by how un-trumpeted the Michel Thomas courses are. For sure, they do pretty well – they’ve been around in some form or other since the 80s, and have made the transition from cassettes, to CDs, to digital. But I rarely see them as a first-choice recommendation in polyglot circles.

Which is a shame, because there’s something quite uniquely effective about the approach of Thomas and educators like him (Paul Noble has carried that torch admirably well, too). They use what you might call a ‘modified Socratic’ approach to language tuition. Each course follows a teacher-student conversational format that builds language knowledge with gradual exposure and prompting. As the listener, you are the third person in the room, answering and learning along.

If you were seeking a fancier term, then perhaps structured elicitation is the one. And it works. Especially for getting into a new language – I’ve used them as intros to several languages, and the stuff really sticks. Perhaps its usefulness tapers off at more advanced levels, but the format is such a great stepping stone.

Michel Thomas on Tap

In any case, I was browsing Spotify for podcasts and audiobooks the other day. As a premium subscriber you get free access to a certain amount of premium content a month, and as I’ve recently ditched Audible (part of my ecosystem economy drive), I was interested in what was available. And there I spotted them – the entire catalogue of Michel Thomas courses (and Paul Noble!).

That’s quite a cache of premium language learning content. Years back, I paid a small fortune for those CD packs. And they’re all there, from the foundation, to the intermediate, and even the vocabulary builder courses, with plenty more titles added in the meantime. Needless to say, I’ve added the Hindi one to my current playlist as I fancy a dabble. If you’re up for the same, you’ll be surprised how much is available!

While I’m at it, let me give LanguageTransfer a big shout out, too. It uses the same techniques as the courses above, but is a personal labour of love for creator Michalis Eleftheriou, and completely free! His Greek course in particular is a resource I’d recommend to anyone to start learning that language.

Gramophone Language Courses: The Original Multimedia Learning

If you’ve ever wondered about the origins of the multimedia language course, then some newly published archive material might surprise and intrigue you. The British Newspaper Archive recently added the early 20th-century title Sound Wave magazine to its growing catalogue. This record review title served phonographic fans from 1907 to 1933, and it’s surprisingly full of language learning history.

In those days, of course, it was the gramophone that reigned supreme. Recordings on the new flat disc format had been around since the late 1880s, but by the first decade of the 20th century, gramophones had become affordable enough for middle-class households. Sound Wave dates from that early tech spread, the publishers no doubt spotting a gap in the market for listening recommendations.

Only it wasn’t just music. What we’d now recognise as audiobooks was already in circulation – outfits like The Talking-Book Corporation were pumping out gramophone literature for adults and kids. There were elocution resources for improving one’s spoken English, too. One particularly enticing release was this special set of discs with the voice of Bernard Shaw himself (life imitating art – his own art!).

And language learning was there from the start too, in the form of regular ads from the Linguaphone Institute.

A 1907 advert from the Linguaphone Institute in the magazine Sound Wave

Linguaphone – a brand built on gramophone

Linguaphone is a real heritage brand for language learners, and pops up all over the newspaper archives. It started up in 1901, and is still going today – you may have come across their language training centres. Competition may have widened since then, but for over half a century they were the first word in audio course materials.

Testimonials in this 1927 edition vouch for their success. One C.B. of London reports that the Spanish course made travels “much easier and cheaper than they would otherwise have been”. A reviewer in 1929 praised the “French as it is really spoken” in a dialogue set in a hairdresser’s, on record no. 21 of that set. Yes, record 21 – these sets could run into dozens of discs, and usually shipped in a hefty, solid case.

No wonder they came with an equally solid price tag. In 1907, a box would set you back £3 and three shillings, easily several hundred pounds in today’s money. You can still pick them up second-hand today, and for much less – a lovely bit of language learning history.

Proto Language Lab

Beyond the well-heeled turntable owner, the gramophonic method wasn’t just for individuals; it was used in classrooms too. In 1914, a Leicester teacher, Mr. Cunliffe, introduced records into his lessons at the Working Men’s College, to great success. One particularly modern-seeming innovation of Mr. Cunliffe’s was the provision of “24 pairs of hearing tubes” for the students! In this way, one element of language teaching that seems so late 20th-century, so proto language lab, had its roots decades before tape reels and CDs.

The BNA’s inclusion of Sound Wave offers some lovely insights into the history of language learning and teaching. There’s doubtless much more to find in there. Let me know in the comments if you come across any other gems!

Screenshot of Cell to Singularity, an immersive casual clicker game available on Steam.

Cell to Singularity : Casual Play for TL Immersion

Osmosis isn’t just for cells – it’s for language learners too! Soaking up target language simply by placing it in your everyday line of sight is one of the most effective strategies for fluency. From your Instagram feed to cosy telly-watching, consolidation can be about throwing more of the things you love in your way.

Gaming is another entz stream that is really easy to target language-ify, since many titles have multiple language options. The Steam platform is a particular goldmine here – a huge multi-platform marketplace, with loads of free-to-play offerings. The trick is to find quite text-heavy games with dialogue and interactions, exposing you to as much content as possible in-play. There’s honestly something for everyone here, from word games to fully-fledged RPGs.

This week, I chanced across a casual clicker on Steam that has been working its quiet way into the hearts of users since its inception in 2018. It’s Cell to Singularity, a game that simulates the blossoming of life on Earth, from eukaryotes, to jellyfish, to humans (and beyond). It’s the kind of game you can have running inconspicuously in the background while you work, slowly developing and growing like a bonsai that needs occasional tending. Very Zen.

Screenshot of Cell to Singularity, an immersive casual clicker game available on Steam.

As you can see from the screenshot, it’s also a great way to revise the building blocks of life. That’s the root educational application the game has been feted for, covering evolutionary biology in a fun, laddered way. Switching my interface to German gives me a ton of fun natural world vocab.

Beyond word level

But the game is also full of conversational exchanges you have with the ‘supercomputer’ running your life simulation, as well as Wikipedia-style descriptions of all your finds. In short, it supports word, sentence and text-level language skills in a rich, engaging environment. What more could you ask for?

Screenshot from Cell to Singularity showing dinosaurs

The range of languages available right now is already impressive. Not only the ‘mainstream’ school ones, but also Korean, Japanese, Polish and Portuguese, amongst others.

Screenshot of the language options in Cell to Singularity, an immersive casual clicker game available on Steam.

Cell to Singularity currently has an 89% positive rating from thousands of Steam users. I wonder how many of them are playing to improve their target language? Hopefully I’ve enticed a few more of you to do just that!

Escaping the Ecosystem : AI Edition

We live in such unexpected, shifting, fracturing geopolitical times just now. A stability taken for granted for decades no longer seems a given. So much so, that many have begun to question the global tech ecosystem we are embedded in, considering the safety of our data and workflows, and seeking less exposed, closer-to-home alternatives.

It’s something we can explore without straying into conspiracy territory, and it goes beyond data security. Tech writer Cory Doctorow has written at length on the downsides to walled garden platforms that make leaving costs high while degrading (or enshittifying – Macquarie Dictionary’s 2024 word of the year) their services. Linguaphiles should know – our own beloved Duo is one of them. It’s a compelling argument, and one that national consumer protection agencies are starting to incorporate into policy. The notion that we can take meaningful steps to decouple from tech monopolies is beginning to take hold.

Ecosystem creep : AI

This leads us to AI firms – arguably the fastest growing of tech behemoths, whose services nonetheless are working their way into many of our workflows. It’s not all doom and gloom here, though; Anthropic in particular has emerged as one US company willing to stand up for an ethical stance in the field.

That said, most European LLM traffic still goes down that American route, collecting on servers that users’ own states have no jurisdiction over. Users come to rely more and more on these services for key elements of their day-to-day, yet have little control over their place in that ecosystem.

So what to do? LLMs are incredibly useful tools for a number of creative applications. For language teachers, they are particularly good at creating authentic-sounding materials for worksheets. In fact, I’ve often argued that LLMs are a tech almost tailor-made for language learning and teaching – in few other fields is the language structure more important than the actual content! They’re genuinely brilliant at creating copy, often highly nuanced, for learning.

AI Swaps

Well, one quick and easy swap is Le Chat by French AI company Mistral. It’s a ‘full fat’ LLM on a par with the big US names, running your prompts remotely on a multi-billion parameter model. Not so remote, though – their server activity remains within EU jurisdiction.

Then, of course, there is the ‘peak privacy’ option – running your own LLM. That’s a lot easier than it sounds, thanks to easy-setup software like LM Studio or Ollama (both US-based projects, but run locally on your own machine). Install, download a model, and prompt away. While few (to no) people will have the hardware to run full-sized LLMs, small models are getting better and better, rivalling the biggies for everyday use.

Google’s Gemma 4 is a case in point, a new small model (you can get a sub-20GB version) achieving some really impressive benchmark scores. Multi-language support is one of its strengths, and believe me, it does more than a good enough job of worksheet authoring and lesson planning. And it comes with an extra ‘externalities’ bonus, too – the only energy it’s using is your laptop battery, rather than spinning up some red-hot servers on a remote farm somewhere.
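
If you fancy scripting that worksheet authoring rather than typing prompts by hand, here is a minimal sketch in Python. It assumes Ollama is running locally on its default port (11434) and that you have already pulled a model. The gemma3 tag is only a placeholder for whichever model you have downloaded, and the helper names build_payload and make_worksheet are my own, invented for illustration rather than taken from any library.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(topic: str, language: str, model: str = "gemma3") -> dict:
    """Build the JSON body for one non-streamed worksheet request.

    The model tag is a placeholder; swap in whatever you have pulled.
    """
    prompt = (
        f"Write a short beginner-level reading text in {language} "
        f"about {topic}, followed by five comprehension questions."
    )
    # stream=False asks Ollama for a single JSON reply, not chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def make_worksheet(topic: str, language: str, model: str = "gemma3",
                   url: str = OLLAMA_URL) -> str:
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_payload(topic, language, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        # The generated text sits in the "response" field of the reply.
        return json.loads(reply.read())["response"]
```

With the server running, print(make_worksheet("a trip to Milan", "Italian")) should hand back a ready-to-edit text with questions. Since the request only goes to localhost, the prompt and the reply stay on your own machine.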

That has to be a win-win – using open source releases from the industry leaders, without getting trapped inside the matrix.

We may have little control over geopolitics. But there are always choices when it comes to our exposure to it in the tech we use. I’m working on a list of these swaps as part of my own digital hygiene plan, and hope to share much more of this in coming weeks!