For a non-tech-savvy person, creating an SRX file with segmentation rules can be time-consuming: you need to work through the specification and write the rules in its XML vocabulary. On the other hand, merging and splitting strings manually can take even longer. Luckily, there’s now an intuitive way to add custom segmentation rules, test them, and apply them to similar strings automatically.
In this post, we’ll take a closer look at content segmentation, discuss why it is important in localization, and give you step-by-step instructions on how you can quickly add custom segmentation with the Segmentation Rules Generator app.
How and Why Crowdin Divides Your Content Into Segments
When you upload a file to Crowdin in a non-key-value format such as DOCX, HTML, XML, or MD, the system divides its content into strings (segments) using rules expressed in the SRX 2.0 standard. SRX stands for Segmentation Rules eXchange, an established XML-based standard that describes how translation and other language-processing tools should divide text into fragments.
Text segmentation makes the Translation Memory (TM) more usable. With longer pieces of text divided into smaller ones, you can get TM suggestions at different similarity levels for each segment, which is much harder to achieve for a long block of copy.
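To make the point about similarity matches concrete, here is a minimal Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a TM match score; how Crowdin actually computes matches is not described in this post, and the sample sentences are invented for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio (an assumed stand-in for a TM match score)."""
    return SequenceMatcher(None, a, b).ratio()


# A hypothetical TM entry saved from an earlier translation.
tm_segment = "Click Save to keep your changes."

# New content that reuses the same sentence inside a longer paragraph.
new_paragraph = (
    "Open the settings page. Click Save to keep your changes. "
    "Then close the dialog."
)

# The same content after segmentation into sentences.
new_segments = [
    "Open the settings page.",
    "Click Save to keep your changes.",
    "Then close the dialog.",
]

# Compare the TM entry against the unsegmented paragraph and against each segment.
whole_score = similarity(tm_segment, new_paragraph)
best_segment_score = max(similarity(tm_segment, s) for s in new_segments)

print(f"whole paragraph: {whole_score:.2f}")
print(f"best segment:    {best_segment_score:.2f}")
```

In this sketch the segmented sentence matches the stored entry exactly, while the unsegmented paragraph only matches it partially.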
Two Approaches to Custom Segmentation, Combined
Now imagine you’re localizing an app called Bingo! in Crowdin. It’s a great app that helps users arrange ideas during brainstorming sessions. And yes, its name has an exclamation mark at the end. This means that, with the default SRX 2.0 segmentation rules, every string where this name appears is split into two.
Previously, to fix this, you had to either go through the content and merge the strings manually or create an SRX file with segmentation rules, as in this sample. The first approach works if there are only a few strings to correct; the second works if you know the SRX specification and XML formatting well.
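The split can be reproduced with a small Python sketch. It assumes a simplified SRX-style model in which each rule has a "before" and an "after" regular expression and the first rule that matches at a position decides whether to break there. The `Rule` type, the `segment` function, and both rule sets are hypothetical and written for this example; they are not Crowdin's default rules or implementation.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    is_break: bool  # True: split here; False: exception that prevents a split
    before: str     # regex that must match the text ending at the position
    after: str      # regex that must match the text starting at the position


def segment(text: str, rules: List[Rule]) -> List[str]:
    """Split text at every position where the first matching rule is a break rule."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            before_ok = re.search(rf"(?:{rule.before})\Z", text[:pos])
            after_ok = re.match(rule.after, text[pos:])
            if before_ok and after_ok:
                if rule.is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule wins, as with ordered SRX rules
    segments.append(text[start:].strip())
    return [s for s in segments if s]


# Assumed default: break after ., ! or ? when whitespace follows.
default_rules = [Rule(True, r"[.!?]", r"\s")]

# Assumed exception: never break right after the app name "Bingo!".
custom_rules = [Rule(False, r"\bBingo!", r"\s")] + default_rules

text = "Open Bingo! to start brainstorming. Invite your team."

print(segment(text, default_rules))
# ['Open Bingo!', 'to start brainstorming.', 'Invite your team.']
print(segment(text, custom_rules))
# ['Open Bingo! to start brainstorming.', 'Invite your team.']
```

Note that this simple exception also suppresses the break when a sentence really ends with the app name; a stricter "after" pattern, such as whitespace followed by a lowercase letter, would narrow it.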
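For readers who have not seen one, the following Python sketch embeds a minimal SRX 2.0 document and lists its rules with the standard `xml.etree.ElementTree` module. The document is an illustrative assumption based on the SRX 2.0 element names (`languagerule`, `rule`, `beforebreak`, `afterbreak`), not the sample linked above and not Crowdin's default rule set; the rule name `default` and both patterns are invented.

```python
import xml.etree.ElementTree as ET

# A minimal, assumed SRX 2.0 document: one exception rule for "Bingo!"
# followed by a generic end-of-sentence break rule.
SRX_SAMPLE = r"""<?xml version="1.0" encoding="UTF-8"?>
<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <header segmentsubflows="yes" cascade="no">
    <formathandle type="start" include="no"/>
    <formathandle type="end" include="yes"/>
    <formathandle type="isolated" include="no"/>
  </header>
  <body>
    <languagerules>
      <languagerule languagerulename="default">
        <rule break="no">
          <beforebreak>\bBingo!</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
        <rule break="yes">
          <beforebreak>[.!?]</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
    <maprules>
      <languagemap languagepattern=".*" languagerulename="default"/>
    </maprules>
  </body>
</srx>
"""

NS = {"srx": "http://www.lisa.org/srx20"}

# Parse from bytes so the XML declaration's encoding is honored.
root = ET.fromstring(SRX_SAMPLE.encode("utf-8"))

# Collect (is_break, beforebreak, afterbreak) for every rule, in document order.
rules = [
    (
        rule.get("break", "yes") == "yes",
        rule.findtext("srx:beforebreak", default="", namespaces=NS),
        rule.findtext("srx:afterbreak", default="", namespaces=NS),
    )
    for rule in root.iterfind(".//srx:rule", NS)
]

for is_break, before, after in rules:
    kind = "break" if is_break else "no break"
    print(f"{kind:8} | before: {before!r} | after: {after!r}")
```

Rule order matters in this sketch: the exception is listed before the generic break rule so that it is checked first.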
To simplify creating custom segmentation rules, we launched the Segmentation Rules Generator app, which combines the two approaches. In the app’s visual interface, you can go through the strings and merge or split them where necessary. As you do, the SRX file with the rules is updated automatically, so you can apply similar rules to multiple strings.
With the app, you can also edit the existing SRX file and preview right away how the content is segmented based on the rules you’ve added or edited.
Add Custom Segmentation to Your Localization Files With Ease
To try out the new approach and change segmentation within a specific file, follow these steps.
- Install the Segmentation Rules Generator app on Crowdin or Crowdin Enterprise.
In Crowdin, go to Resources in the menu bar and select Marketplace in the drop-down menu.
If you use Crowdin Enterprise, use the left-side menu of your workspace to open the Marketplace.
Learn more about Crowdin Store.
- Open the app.
In Crowdin, go to Project Settings > Integrations, scroll down to Applications, and select the app.
In Crowdin Enterprise, go to Project Home > Applications > Custom > Segmentation Rules Generator.
- Select the necessary project file.
You can currently create segmentation rules for the following format types: DOCX, MD, HTML, DITA, IDML, TXT, and XML.
Once you select the file, you’ll see the existing rules for segmentation and all the strings you can split or join.
- To split a string, put the cursor in the necessary place and click the scissors icon. Clicking the magnet icon next to a string merges it with the previous one.
- After you change the segmentation in the UI, the corresponding rule is added to the SRX file and automatically applied to similar strings. You can also edit the existing rules and preview the resulting segmentation directly in the app.
- Save the SRX file and use it to change the segmentation for a specific file.
In Crowdin, go to Project Settings > Files and upload the file you’ve generated. Learn more.
In Crowdin Enterprise, go to Project Home > Content > Files and upload the SRX file. Learn more.
Translators Will Still See the Whole Text for Reference
Even though segmentation is crucial for creating Translation Memories, translators often need to see the whole text to get the main idea and the necessary context. Translating string by string can affect translation quality, and that’s where Crowdin WYSIWYG comes in handy.
In the Crowdin Editor, translators can switch between views and, if necessary, preview the whole document with any pictures, tables, columns, and lists it contains.
Explore More Crowdin Apps
Custom segmentation is usually a set-it-and-forget-it type of configuration. So once you’re done with it, discover more useful apps on Crowdin Store. They will help you customize your company’s localization experience and get more out of Crowdin and Crowdin Enterprise.