COLLATION GUIDE
Levels of collation
Our software package ‘Prabhed’ carries out collation at three levels: ‘gross’ or macro-collation at Section and Segment levels, and ‘fine’ or micro-collation at Word level.
- Section. A section is a chapter of a novel or other long prose work, a scene in a long play, or a canto in a long poem. At this level, Prabhed will compare the sectional division of different versions of a text, and note their matches and differences. A short poem, song or essay will consist of a single section.
NB: Owing to the division parameters used, parts of a single chapter, scene etc. separated by a blank line (e.g., stanzas of a poem embedded in a prose novel or drama) may sometimes appear as separate sections.
- Segment. A segment is a paragraph of a prose text, a single speech in a play, or a stanza of a poem. At this level, Prabhed will compare the corresponding segments of different versions of a text, and note their matches and differences. In most cases, matching segments occur within matching sections. However, matching segments in other sections are also recorded.
A short poem or song without stanza division will appear as a single segment: i.e., here the section and segment will be identical.
- Word. At the most detailed level, Prabhed will compare all matching segments word for word and note their matches and differences.
How to use Prabhed:
- Click ‘Collation’ on the home page menu bar. A bibliographical table will open.
- Click the collation logo in the title column on the left. The Section-level display will open.
Section-level display:
- The Section-level display consists of horizontal bands, each indicating the full text of one version (print or manuscript) of the work. The bands are colour-coded: each version has a different colour. The parts or blocks (‘slices’) into which each band is divided represent the sections (chapters, scenes, etc.) within the full text. These blocks are differently shaded in four light-to-dark gradations for easy viewing. The width of the block is proportionate to the bulk of that section within the full text.
A short poem, song or essay consists of a single section, hence the band will not be divided but show as a uniform shade.
The sections are numbered, starting with 0. E.g., 1316/0 means section 0 or the first section of the 1316 edition of the work. 1316/1 means section 1 or the second section of the 1316 edition.
- Choose any version as the base for comparison by moving your mouse over the appropriate band. That band will be highlighted.
- Now select and click on the part or block (‘slice’) representing the section you want to compare. The matching blocks in all the versions will be indicated by red underscoring. The number of that section in the base document will show in the top right corner.
- A panel will appear at the bottom of the page, showing all matching sections with match percentages vis-à-vis the base. A text-link box will appear alongside each section number. Click it to see the text of that section in a pop-up window.
- In the bottom panel, click the section and version of the base document (left-most entry) to open the Segment-level collation page using that version as base.
- In the bottom panel, click any other document to open a colour-coded vertical panel to the right, showing segment-by-segment matches in that section between the base document and the one you have selected. The matching segments are linked by grey bands.
Segment-level display:
- The Segment-level display operates in the same way as the Section-level display.
- Open the Segment-level display from the Section-level display as described above: in the bottom panel, click the section and version of the base document (left-most entry). You will see a colour-coded band standing for that section in your chosen base version. By clicking on it, you will see more such bands standing for matching sections in the other versions. The parts or blocks (‘slices’) within each band indicate the segments (paragraphs, speeches, stanzas etc.) into which it is divided.
The width of the block is proportionate to the bulk of that segment within the full text.
A short poem, song or essay consists of a single segment, hence the band will not be divided but show as a uniform shade.
The segments are numbered, starting with 0. E.g., 1316/0/0 means the first segment of the first section of the 1316 edition of the work. 1316/1/0 means the first segment of section 1 or the second section of the 1316 edition. 1316/1/1 means the second segment of the second section.
- Now click on the coloured part or block (‘slice’) in the topmost band representing the segment in the base document that you want to compare with the others. The blocks standing for matching segments in all the versions will be underscored in red. The number of that segment in the base document will show in the top right corner.
- A panel will appear at the bottom of the page, showing all matching segments and their respective match percentages vis-à-vis the base. A text-link box will appear alongside each segment number. Click it to see the text of that segment in a pop-up window.
- Use the left and right arrows at the top right to move from block to block of a colour band, especially to reach narrow blocks difficult to pinpoint with the mouse.
- Click the ‘Grid View’ box at the top right to see the collation results in the form of a grid or table. Here, the first column on the left gives the segment numbers in that section of the base document. The later columns show the match percentage in corresponding segments of the other versions, indicated as 1, 2, 3 etc. according to the position in the bottom panel of the previous segment-level colour display.
- In the bottom panel, click the segment and version of the base document (left-most entry) to open the word-level ‘fine collation’ page using that version as base.
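The zero-based numbering used in the two displays above (1316/0, 1316/1/1 and so on) can be illustrated with a short Python sketch. This is not part of Prabhed: the names Locator, parse_locator and describe are hypothetical, introduced only to show how such a reference breaks down into version, section and segment.

```python
from typing import NamedTuple, Optional


class Locator(NamedTuple):
    version: str            # e.g. "1316"
    section: int            # zero-based section number
    segment: Optional[int]  # zero-based segment number; None for a section-only reference


def parse_locator(text: str) -> Locator:
    """Split a reference such as '1316/1' or '1316/1/0' into its parts."""
    parts = text.strip().split("/")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected version/section[/segment], got {text!r}")
    segment = int(parts[2]) if len(parts) == 3 else None
    return Locator(parts[0], int(parts[1]), segment)


def describe(loc: Locator) -> str:
    """Restate a locator in one-based terms, as a reader would say it."""
    where = f"section {loc.section + 1}"
    if loc.segment is not None:
        where = f"segment {loc.segment + 1} of {where}"
    return f"{where} of version {loc.version}"


print(describe(parse_locator("1316/1/1")))
```

Here 1316/1/1 is read as the second segment of the second section, because counting starts at 0 at both levels.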
Word-level or fine collation:
- The fine collation results appear in a four-pane display. The base document (with section and segment number) is indicated in the header. The other versions (with section and segment number) are listed in the left column outside the four-pane frame.
- The top left pane gives the text of the segment being used as base.
- The bottom left pane will display the text of any one other version. Select this version by clicking on its number in the left-column list.
- The top right pane shows the text of the base segment, colour-coded to indicate its correspondence with the matching segments.
- Black means the word is the same in all versions.
- Red means the word is similar but not the same in one or more other versions. Click on the word to see the variant readings in the bottom right pane.
- Blue means the word is found in the base version but not in one or more others. Click on the word to see a dot in the bottom right pane indicating which versions lack that word.
- Green means the word is found in one or more other versions but not in the base version. Click on a green dot to see what the word is, and in which versions it is found, in the bottom right pane.
- A dot means a word missing in the current version but with a word in the matching position in one or more other versions.
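The colour scheme above can be summarised as a small decision rule. The Python sketch below is an illustration only, not Prabhed's own code. It assumes that the words have already been aligned position by position, that a missing word is represented by None, and that blue takes precedence over red when both would apply (the guide does not state an order of precedence). The function name classify is hypothetical.

```python
from typing import Optional, Sequence


def classify(base: Optional[str], others: Sequence[Optional[str]]) -> str:
    """Return the display colour for one aligned word position.

    base   -- the word in the base version, or None if the base has no word here
    others -- the word at the same position in each other version (None = absent)
    """
    if base is None:
        # The base shows a dot; the word exists only in other versions.
        if any(word is not None for word in others):
            return "green"
        raise ValueError("no version has a word at this position")
    if any(word is None for word in others):
        return "blue"   # present in the base, lacking in at least one other version
    if any(word != base for word in others):
        return "red"    # a variant reading exists (assumed precedence: after blue)
    return "black"      # identical in all versions


print(classify("sea", ["sea", "seas"]))
```

A position where the base has no word but another version does is shown as a green dot in the base text, which is why the first branch returns green.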
Version names
Versions/editions are indicated as follows:
- for printed works, four digits indicating the Bengali year of publication in case of Bengali works, and the English year of publication in case of English works. If there is more than one version published in the same year, they are distinguished by the letters a, b etc.
- for manuscript works, [R+ms.no.] for manuscripts in the main sequence of Rabindra-Bhavana; [B+ms.no.] for manuscripts in the ‘MSF-Bengali’ sequence of Rabindra-Bhavana; [E+ms.no.] for manuscripts in the ‘MSF-English’ sequence of Rabindra-Bhavana; [H+ms.no.] for manuscripts from Harvard University.
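The naming scheme can be read mechanically, as the following Python sketch suggests. It rests on assumptions not confirmed by this guide: that a print version is written as four digits with an optional lower-case letter (e.g. 1316 or 1316a), and that a manuscript is written as the sequence letter directly followed by a number (e.g. R110). The function parse_version and both patterns are hypothetical.

```python
import re
from typing import Dict

# Assumed spellings: "1316" / "1316a" for print, "R110" for manuscripts.
PRINT_RE = re.compile(r"^(\d{4})([a-z]?)$")
MS_RE = re.compile(r"^([RBEH])\s*(\d.*)$")

MS_SEQUENCES = {
    "R": "Rabindra-Bhavana, main sequence",
    "B": "Rabindra-Bhavana, MSF-Bengali sequence",
    "E": "Rabindra-Bhavana, MSF-English sequence",
    "H": "Harvard University",
}


def parse_version(name: str) -> Dict[str, str]:
    """Classify a version name as a print edition or a manuscript."""
    name = name.strip()
    match = PRINT_RE.match(name)
    if match:
        return {"kind": "print", "year": match.group(1), "suffix": match.group(2)}
    match = MS_RE.match(name)
    if match:
        return {
            "kind": "manuscript",
            "sequence": MS_SEQUENCES[match.group(1)],
            "number": match.group(2),
        }
    raise ValueError(f"unrecognised version name: {name!r}")


print(parse_version("1316b"))
```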
Match percentage settings
Word match:
Words of 1 to 4 characters are held to match if all the characters match. For words of more than four characters, one difference is allowed for every four additional characters or fraction thereof: i.e., one difference for words of 5 to 8 characters, two for words of 9 to 12 characters, and so on.
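The tolerance rule can be worked through in a few lines of Python. This is a sketch under stated assumptions, not Prabhed's implementation: the function names are hypothetical, characters are counted as Unicode code points, the hasanta (the sign joining the elements of a conjunct) and all punctuation are skipped, and no special handling is attempted for other signs. Counting follows the rules set out in the notes of this section.

```python
import unicodedata

HASANTA = "\u09cd"  # joins conjunct elements; assumed not to count as a character


def char_count(word: str) -> int:
    """Count characters, skipping punctuation and the hasanta."""
    return sum(
        1
        for ch in word
        if ch != HASANTA and not unicodedata.category(ch).startswith("P")
    )


def allowed_differences(length: int) -> int:
    """Differences tolerated for a word of the given character count."""
    if length <= 4:
        return 0
    # One difference per four additional characters or fraction thereof
    # (ceiling division of the characters beyond the first four).
    return -(-(length - 4) // 4)


for n in (4, 5, 8, 9, 12, 13):
    print(n, allowed_differences(n))
```

How two words are compared to arrive at a number of differences is not specified in this guide, so the sketch stops at the permitted tolerance.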
NB:
- All vowel markers other than অ-কার count as characters. Each element of a conjunct letter (yuktakshar) counts as a separate character. Thus কাল has 3 characters, বর্ষা has 4 characters, নম্রতা has 5 characters, রবীন্দ্রনাথ has 9 characters.
- Punctuation marks, including hyphen and apostrophe/quotation marks, do not count as characters. Two words separated by a hyphen are counted as a single word.
- The default match level between sections and segments is 60% of the word count between base text and reference text, counting from both directions.
- This will not cover cases where a long section/segment in one text is divided into two or more sections/segments in the other text. To cover such cases, where a match percentage of 60%+ is found in one direction only, a ‘tension’ count with default value of 15% is made in the opposite direction. Where the tension count exceeds 15%, the two sections/segments are taken to match.
- However, this means some invalid matches will also be shown, owing to accidental matches totalling 15%+ between random words. This is most likely to happen with very short segments consisting of one or a few words.
Users should be cautious about accepting low matches (below 30-35%) without checking the actual text. This is especially the case where a high match (60%+) is also shown for the same section/segment.
- Sometimes a word or phrase occurs repeatedly through a text (e.g., as a stage direction or choric refrain). This can result in very high matches (85%+, often 100%) being shown repeatedly between random sections of the text. To obviate such cases, only the first occurrence in each section of multiple very high matches (85%+) is included in the collation.
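The default settings for section and segment matching can be sketched in Python as follows. This illustrates the rules as stated here, not Prabhed's actual code. The function names are hypothetical; the two percentages are taken as already computed (the guide does not say exactly how); the thresholds are treated as inclusive (60%+ and 15%+), which is an assumption; and the filter for repeated very high matches follows one possible reading of the last rule above.

```python
from typing import Iterable, List, Tuple

MATCH_LEVEL = 60.0  # default match level, per cent
TENSION = 15.0      # default tension value, per cent
VERY_HIGH = 85.0    # threshold for 'very high' matches, per cent


def is_match(forward: float, backward: float,
             match_level: float = MATCH_LEVEL, tension: float = TENSION) -> bool:
    """Decide whether two sections/segments are taken to match.

    forward  -- match percentage counted from the base text
    backward -- match percentage counted from the reference text
    """
    if forward >= match_level and backward >= match_level:
        return True  # ordinary match, valid from both directions
    # One-directional match: accept it if the opposite direction reaches the tension value.
    one_way = max(forward, backward) >= match_level
    return one_way and min(forward, backward) >= tension


Match = Tuple[int, int, float]  # (section, segment, percentage)


def drop_repeated_high_matches(matches: Iterable[Match]) -> List[Match]:
    """Keep only the first very high match found in each section (assumed reading)."""
    seen_sections = set()
    kept = []
    for section, segment, pct in matches:
        if pct >= VERY_HIGH:
            if section in seen_sections:
                continue  # a later repeat, e.g. a recurring refrain
            seen_sections.add(section)
        kept.append((section, segment, pct))
    return kept


print(is_match(80.0, 20.0))
```

With these defaults, a segment that matches 80% in one direction and 20% in the other is accepted, as when one long paragraph has been split in two in a later version; one that matches 80% and 10% is not.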