


Use Python and LibreTranslate for multilingual content


Introduction

This post gives a small insight into how this blog works and the work that goes on behind it.

Alongside my main job, I hardly have time to carry out projects, have a social life, and also maintain a blog as regularly as possible. That's why I'm trying to automate some things. For a long time, I have been toying with the idea of making my content available in other languages, not only to increase its reach a little, but also to get more feedback.

For this reason, this article shows how my texts are automatically translated into other languages in the background.

In a future article, I will show how I have my texts automatically checked for errors, since proofreading is time-consuming and, as the author of a text, you quickly overlook small mistakes.

Main part

First steps

As so often when I need code, I'm betting on Python. I can integrate Python quickly, and it is a wonderful language for handling varied content from different sources. At the beginning I had the idea of using ChatGPT, but after the first versions it turned out that it didn't work very cleanly. At the end of the day you notice that ChatGPT does not really understand what you want, which made it a poor fit for translation in my case. There were often problems with consistent output, and occasionally the meaning of sentences was simply changed. After some trial and error and tweaking of parameters, I decided against ChatGPT. In the end this turned out to have two advantages: first, with my final solution I don't have to pay for translation (apart from my own operating costs), and second, my new solution is very flexible and can also translate HTML content natively.

Final version

After ChatGPT didn't work the way I needed, I initially paused the project for lack of suitable alternatives. In November, through my job, I came across a project called LibreTranslate. This is a machine translation service that can (also) be hosted locally and used to translate texts. You can submit both files and text snippets and get the translation back (plus some metadata).

LibreTranslate convinced me from the first moment, even though it has gaps in some languages. As part of my job, several hundred documents (work instructions) had to be translated into 8 languages to make onboarding new people easier. I was already able to get a few things done with LibreTranslate there, and I took a lot away from that for my own project. Thanks to its API, LibreTranslate can be used very dynamically, and there are even libraries for some programming languages, including Python. In the end I did not use that library as-is but modified it for my needs, because some settings I needed were not available. For example, you could not specify the input format (text or HTML), and this parameter saved me an enormous amount of work.
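
To make the role of the input format more concrete, here is a minimal sketch of a direct call to LibreTranslate's `/translate` endpoint using only the Python standard library. The request fields `q`, `source`, `target` and `format` and the response field `translatedText` follow the public LibreTranslate API. The helper names `build_translate_request` and `translate` are assumptions for illustration and are not the code used for this blog.

```python
import json
import urllib.request


def build_translate_request(base_url, text, source, target, fmt="text"):
    """Build a POST request for LibreTranslate's /translate endpoint.

    fmt is "text" or "html"; with "html" the service is asked to keep
    the markup and translate only the text inside it.
    """
    if fmt not in ("text", "html"):
        raise ValueError("fmt must be 'text' or 'html'")
    payload = {"q": text, "source": source, "target": target, "format": fmt}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(base_url, text, source, target, fmt="text", timeout=30):
    """Send the request and return the translated string."""
    request = build_translate_request(base_url, text, source, target, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["translatedText"]
```

A call such as `translate("http://translate.jr.local", "<p>Hallo Welt</p>", "de", "en", fmt="html")` would then send the paragraph including its tags, so the HTML does not have to be taken apart and reassembled by hand. The source language `de` is an assumption here.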

How does it work?

I will not post the source code itself here, as the program is not yet battle-tested. I would first like to fix some bugs and set up a daemon for Unix systems so that the translation can run fully automatically. To explain how it works, however, I have built a simplified flow chart that summarizes the translation.

Because of its size, essential components are missing from the program flow shown, for example reading the configuration. Using the configuration, the program can handle different blog structures, so that not only this blog but theoretically several blogs using the same CMS and API can be translated. In my example, the config looks like this:

{
    "translator_api": {
        "target_url": "http://translate.jr.local"
    },
    "cms_api": {
        "target": "https://pure-smart.de",
        "endpoint": "api/pages",
        "page": "blog",
        "limit": 1,
        "api_user_needed": true,
        "api_username": "<<my_user>>",
        "api_password": "<<my_pass>>"   
    },    
    "log_level": "Warning",
    "target_lang": [
        {
            "name": "english",
            "lang_code": "en",
            "fields": {
                "check_marker": "transl-enabled",
                "success_marker": "english-translated"
            }
        }
    ],
    "limit": 3,
    "rules": [
        {
            "wildcard_fields": false,
            "allowed_fields": ["cover", "content", "summary", "main_title", "gt", "descr", "m_tit", "og_tit", "og_descr"],
            "exceptions": [],
            "page_name": "blog",
            "fields": [
                {
                    "field_name": "cover",
                    "translate": false
                },
                {
                    "field_name": "main_title",
                    "translate": "raw"
                },
                {
                    "field_name": "caption",
                    "translate": "raw"
                },
                {
                    "field_name": "p_caption",
                    "translate": "raw"
                },
                {
                    "field_name": "no_posts",
                    "translate": "raw"
                },
                {
                    "field_name": "pinned_warn",
                    "translate": "raw"
                },
                {
                    "field_name": "summary",
                    "translate": "raw"
                },
                {
                    "field_name": "descr",
                    "translate": "raw"
                },
                {
                    "field_name": "m_tit",
                    "translate": "raw"
                },
                {
                    "field_name": "og_tit",
                    "translate": "raw"
                },
                {
                    "field_name": "og_descr",
                    "translate": "raw"
                },
                {
                    "field_name": "gt",
                    "translate": "raw"
                },
                {
                    "rule_name": "content-Gallery_Block",
                    "field_name": "content",
                    "translate": "json-array", 
                    "content_key": "content.caption",
                    "filter": [
                        {
                            "key": "type",
                            "value": "gallery"
                        }
                    ]
                },
                {
                    "rule_name": "content-Heading_Block",
                    "field_name": "content",
                    "translate": "json-array", 
                    "content_key": "content.text",
                    "filter": [
                        {
                            "key": "type",
                            "value": "heading"
                        }
                    ]
                },
                {
                    "rule_name": "content-Image_Block",
                    "field_name": "content",
                    "translate": "json-array", 
                    "content_key": "content.alt",
                    "filter": [
                        {
                            "key": "type",
                            "value": "image"
                        }
                    ]
                },
                {
                    "rule_name": "content-Image_Block_2",
                    "field_name": "content",
                    "translate": "json-array", 
                    "content_key": "content.caption",
                    "filter": [
                        {
                            "key": "type",
                            "value": "image"
                        }
                    ]
                },
                {
                    "rule_name": "content-list_block",
                    "field_name": "content",
                    "translate": "json-array", 
                    "content_key": "content.text",
                    "filter": [
                        {
                            "key": "type",
                            "value": "list"
                        }
                    ]
                },
                {
                    "rule_name": "content-Quote_Block",
                    "field_name": "content",
                    "translate": "json-array", 
                    "content_key": "content.text",
                    "filter": [
                        {
                            "key": "type",
                            "value": "quote"
                        }
                    ]
                },
                {
                    "rule_name": "content-Quote_Block_2",
                    "field_name": "content",
                    "translate": "json-array", 
                    "content_key": "content.citation",
                    "filter": [
                        {
                            "key": "type",
                            "value": "quote"
                        }
                    ]
                },
                {
                    "rule_name": "content-Text_Block",
                    "field_name": "content",
                    "translate": "json-array", 
                    "content_key": "content.text",
                    "translation_type": "html",
                    "filter": [
                        {
                            "key": "type",
                            "value": "text"
                        }
                    ]
                },
                {
                    "rule_name": "content-Video_Block",
                    "field_name": "content",
                    "translate": "json-array", 
                    "content_key": "content.caption",
                    "filter": [
                        {
                            "key": "type",
                            "value": "video"
                        }
                    ]
                },
                {
                    "rule_name": "content-Code_Block",
                    "field_name": "content",
                    "translate": false, 
                    "content_key": "content.code",
                    "filter": [
                        {
                            "key": "type",
                            "value": "code"
                        }
                    ]
                }
            ]

        }
    ]
}

With the help of the configuration file, I can adapt the program dynamically to the blog and make changes at any time.
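
To illustrate how one of the `json-array` rules from the config could be applied, here is a small sketch. It assumes that the `content` field holds a list of blocks shaped like `{"type": "text", "content": {"text": "..."}}`, that `content_key` is a dotted path into each block, and that a missing `translation_type` means plain text. All function names are hypothetical and this is not the program's actual code.

```python
def get_path(obj, dotted_key):
    """Follow a dotted key such as "content.text"; return None if missing."""
    for part in dotted_key.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def set_path(obj, dotted_key, value):
    """Set the value at a dotted key; the parent dicts must already exist."""
    *parents, last = dotted_key.split(".")
    for part in parents:
        obj = obj[part]
    obj[last] = value


def matches(block, filters):
    """A block matches when every filter's key has the expected value."""
    return all(block.get(f["key"]) == f["value"] for f in filters)


def apply_json_array_rule(blocks, rule, translate):
    """Translate the rule's content_key in every block that passes the filter.

    translate is any callable (text, fmt) -> str, for example a wrapper
    around a translation API. The blocks are modified in place.
    """
    fmt = rule.get("translation_type", "text")  # assumed default
    for block in blocks:
        if not matches(block, rule.get("filter", [])):
            continue
        text = get_path(block, rule["content_key"])
        if isinstance(text, str) and text:
            set_path(block, rule["content_key"], translate(text, fmt))
    return blocks
```

With the `content-Text_Block` rule, only blocks of type `text` would be touched and their `content.text` would be passed on as HTML, while a `code` block stays unchanged.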

Finally, a simplified flow chart of the program is shown below. It summarizes the process in a very simplified way. Things like authentication against the systems and the processing of the blog posts are missing. But at least you get an idea of how this blog is translated :)
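
As a rough companion in code, the per-page decision could look something like the sketch below. It is built on assumptions: that `check_marker` names a page field which enables translation, that `success_marker` names a field set once the page has been translated, and that `"translate": "raw"` means the field's string is translated directly. The function names are hypothetical, and the `json-array` rules are left out for brevity.

```python
def needs_translation(page_fields, lang):
    """True if the page is enabled for this language and not yet translated."""
    markers = lang["fields"]
    enabled = bool(page_fields.get(markers["check_marker"]))
    done = bool(page_fields.get(markers["success_marker"]))
    return enabled and not done


def translate_page(page_fields, lang, field_rules, translate):
    """Return a translated copy of the page fields, or None if skipped.

    translate is any callable (text, lang_code) -> str.
    """
    if not needs_translation(page_fields, lang):
        return None
    result = dict(page_fields)  # leave the original page untouched
    for rule in field_rules:
        name = rule["field_name"]
        # Only the assumed "raw" mode is handled in this sketch.
        if rule["translate"] == "raw" and isinstance(result.get(name), str):
            result[name] = translate(result[name], lang["lang_code"])
    result[lang["fields"]["success_marker"]] = True
    return result
```

Running the same page through `translate_page` a second time returns `None`, because the success marker is already set. That is one way a daemon could be run repeatedly without translating a post twice.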

