Intuitive Way to Add Custom Segmentation for Text-Based Formats

For someone without a technical background, creating an SRX file with segmentation rules can be time-consuming: you need to read through the specification and write the rules in XML. Merging and splitting strings manually can take even longer. Luckily, there’s now an intuitive way to add custom segmentation rules, test them, and apply them to similar strings automatically.

In this post, we’ll take a closer look at content segmentation, discuss why it is important in localization, and give you step-by-step instructions on how you can quickly add custom segmentation with the Segmentation Rules Generator app.

How and Why Crowdin Divides Your Content Into Segments

When you upload a file in a non-key-value format such as DOCX, HTML, XML, or MD to Crowdin, the system divides its content into strings (segments) based on the SRX 2.0 standard. SRX stands for Segmentation Rules Exchange, an established XML-based standard that describes how translation and other language-processing tools should divide text into fragments.

Text segmentation makes Translation Memory (TM) more usable. When longer pieces of text are divided into smaller segments, you can reuse TM suggestions at different similarity levels, which is much harder to do with long blocks of copy.

Two Approaches to Custom Segmentation, Combined

Now imagine you are localizing an app called Bingo! in Crowdin. It’s a great app that helps users arrange ideas during brainstorming sessions. And yes, its name has an exclamation mark at the end. This means that, with segmentation based on the SRX 2.0 standard, every string where this name appears is split in two at the exclamation mark.
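To make the problem concrete, here is a minimal Python sketch of SRX-style segmentation. It is not Crowdin's implementation: the `Rule` type, the `segment` function, and the single break rule in `DEFAULT_RULES` are illustrative assumptions, and Crowdin's actual default rules are more extensive. The sketch only follows the general SRX idea that, at each position in the text, the first rule whose "before" and "after" patterns both match decides whether to break.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    is_break: bool  # True = break here, False = exception (do not break)
    before: str     # regex that must match the text right before the position
    after: str      # regex that must match the text right after the position


def segment(text: str, rules: List[Rule]) -> List[str]:
    """Split text wherever the first matching rule is a break rule."""
    compiled = [
        (r.is_break, re.compile(f"(?:{r.before})\\Z"), re.compile(r.after))
        for r in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides; later rules are ignored
    segments.append(text[start:].strip())
    return segments


# Assumed, simplified default: break after sentence-ending punctuation
# that is followed by whitespace.
DEFAULT_RULES = [Rule(True, r"[.?!]+", r"\s")]

string = "Open Bingo! to start a brainstorming session."
print(segment(string, DEFAULT_RULES))
# ['Open Bingo!', 'to start a brainstorming session.']
```

The exclamation mark in the app name looks like the end of a sentence to the break rule, so one string becomes two segments.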

Previously, to fix this, you had to either manually go through the content and merge the strings or create an SRX file with the segmentation rules as in this sample. The first approach works when there are only a few strings to correct; the second works if you know the SRX specification and XML formatting well.
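For readers who have not seen an SRX file, the sketch below builds a small one in Python with the standard library. The element and attribute names follow the SRX 2.0 schema, but the two rules are illustrative assumptions rather than the rules Crowdin ships or the linked sample: a hypothetical exception for "Bingo!" placed before a simplified break rule. The `build_srx` helper is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

SRX_NS = "http://www.lisa.org/srx20"  # SRX 2.0 namespace


def build_srx(rules):
    """Serialize (is_break, beforebreak, afterbreak) tuples, in priority order."""
    srx = ET.Element("srx", {"xmlns": SRX_NS, "version": "2.0"})
    ET.SubElement(srx, "header", {"segmentsubflows": "yes", "cascade": "no"})
    body = ET.SubElement(srx, "body")
    language_rules = ET.SubElement(body, "languagerules")
    language_rule = ET.SubElement(
        language_rules, "languagerule", {"languagerulename": "default"}
    )
    for is_break, before, after in rules:
        rule = ET.SubElement(
            language_rule, "rule", {"break": "yes" if is_break else "no"}
        )
        ET.SubElement(rule, "beforebreak").text = before
        ET.SubElement(rule, "afterbreak").text = after
    map_rules = ET.SubElement(body, "maprules")
    ET.SubElement(
        map_rules,
        "languagemap",
        {"languagepattern": ".*", "languagerulename": "default"},
    )
    ET.indent(srx)  # pretty-print; requires Python 3.9+
    return ET.tostring(srx, encoding="unicode")


srx_xml = build_srx([
    (False, r"\bBingo!", r"\s"),  # exception: never break right after "Bingo!"
    (True, r"[.?!]+", r"\s"),     # assumed, simplified break rule
])
print(srx_xml)
```

Rule order matters: the exception is listed first so that it wins over the break rule. Note the trade-off in this simple pattern: it also suppresses the break when "Bingo!" really does end a sentence, so a production rule would likely need a stricter `afterbreak` pattern.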

To simplify creating custom segmentation rules, we launched the Segmentation Rules Generator app, which combines the two approaches. In the app’s visual interface, you can go through the strings and merge or split them where necessary. As you do, the SRX file with the rules is updated automatically so that you can apply similar rules to multiple strings.

With the app, you can also edit the existing SRX file and preview right away how the content is segmented based on the rules you’ve added or edited.
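The preview idea can be approximated outside the app with a few lines of Python: parse the rules from an SRX document and run them over sample strings. This is a rough sketch under stated assumptions, not how the app works internally. The embedded SRX content, the `load_rules` and `preview` helpers, and the sample strings are all hypothetical, and Python's `re` module stands in for the ICU regular expressions that SRX specifies, so complex patterns may behave differently.

```python
import re
import xml.etree.ElementTree as ET

NS = {"s": "http://www.lisa.org/srx20"}

# Hypothetical SRX content: one exception for "Bingo!" and one simplified break rule.
SRX = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <header segmentsubflows="yes" cascade="no"/>
  <body>
    <languagerules>
      <languagerule languagerulename="default">
        <rule break="no"><beforebreak>\bBingo!</beforebreak><afterbreak>\s</afterbreak></rule>
        <rule break="yes"><beforebreak>[.?!]+</beforebreak><afterbreak>\s</afterbreak></rule>
      </languagerule>
    </languagerules>
    <maprules><languagemap languagepattern=".*" languagerulename="default"/></maprules>
  </body>
</srx>"""


def load_rules(srx_xml):
    """Read rules in document order as (is_break, before_regex, after_regex)."""
    rules = []
    for rule in ET.fromstring(srx_xml).iterfind(".//s:rule", NS):
        before = rule.findtext("s:beforebreak", default="", namespaces=NS)
        after = rule.findtext("s:afterbreak", default="", namespaces=NS)
        is_break = rule.get("break", "yes") == "yes"
        rules.append((is_break, re.compile(f"(?:{before})\\Z"), re.compile(after)))
    return rules


def preview(strings, rules):
    """Return {original string: list of segments} for a quick visual check."""
    result = {}
    for text in strings:
        segments, start = [], 0
        for pos in range(1, len(text)):
            for is_break, before, after in rules:
                if before.search(text[:pos]) and after.match(text, pos):
                    if is_break:
                        segments.append(text[start:pos].strip())
                        start = pos
                    break  # the first matching rule decides
        segments.append(text[start:].strip())
        result[text] = segments
    return result


samples = [
    "Open Bingo! to start a session.",
    "Invite your team to Bingo! and vote. Then export the results.",
]
for original, segments in preview(samples, load_rules(SRX)).items():
    print(segments)
# ['Open Bingo! to start a session.']
# ['Invite your team to Bingo! and vote.', 'Then export the results.']
```

With the exception in place, the app name no longer splits a string, while ordinary sentence boundaries are still respected in both samples.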

Add Custom Segmentation to Your Localization Files With Ease

To try out the new approach and change segmentation within a specific file, follow these steps.

  1. Install the Segmentation Rules Generator app on Crowdin or Crowdin Enterprise.
    In Crowdin, go to Resources in the menu bar and select Marketplace in the drop-down menu.
    If you use Crowdin Enterprise, use the left-side menu of your workspace to open the Marketplace.
    Learn more about Crowdin Store.
  2. Open the app.
    In Crowdin, go to Project Settings > Integrations, scroll down to Applications, and select the app.
    In Crowdin Enterprise, go to Project Home > Applications > Custom > Segmentation Rules Generator.
  3. Select the necessary project file.
    You can currently create segmentation rules for the following format types: DOCX, MD, HTML, DITA, IDML, TXT, and XML.
    Once you select the file, you’ll see the existing rules for segmentation and all the strings you can split or join.
  4. To split a string, place the cursor where you want the break and click the scissors icon. To merge a string with the previous one, click the magnet icon next to it.
  5. After you change the segmentation in the UI, the corresponding rule is added to the SRX file and automatically applied to similar strings. You can also edit the existing rules and preview the resulting segmentation directly in the app.
  6. Save the SRX file and use it to change the segmentation for a specific file.
    In Crowdin, go to Project Settings > Files and upload the file you’ve generated. Learn more.
    In Crowdin Enterprise, go to Project Home > Content > Files and upload the SRX file. Learn more.

Translators Will Still See the Whole Text for Reference

Even though segmentation is crucial for creating Translation Memories, translators often need to see the whole text to get the main idea and the necessary context. Translating string by string can affect translation quality, and that’s where the Crowdin WYSIWYG view becomes useful.

In the Crowdin Editor, translators can switch between views and, if necessary, preview the whole document with any pictures, tables, columns, and lists it contains.

Learn more about the context for translators.

Explore More Crowdin Apps

Custom segmentation is usually a set-it-and-forget-it type of configuration. So once you’re done with it, discover more useful apps on Crowdin Store. They will help you customize your company’s localization experience and get more out of Crowdin and Crowdin Enterprise.

Go to Crowdin Store.

Iryna Namaka
