Arabic characters from HTML content to PDF using iText


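I am having trouble displaying Arabic characters from HTML content in a generated PDF: they show up as "?".

I am able to display Arabic text that comes from a String variable. At the same time, I am not able to generate Arabic text from an HTML string.

I want to display the PDF in 2 columns: English text on the left side and Arabic text on the right side.

The problem occurs when I use the following program to convert the content to PDF. Please help me in this regard.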

    try {
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PdfWriter writer = PdfWriter.getInstance(document, out);
        BaseFont bf = BaseFont.createFont("c:\\arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font font = new Font(bf, 8);
        document.open();

        // Read the style sheet into a single String
        BufferedReader br = new BufferedReader(new FileReader("c:\\style.css"));
        StringBuffer fileContents = new StringBuffer();
        String line = br.readLine();
        while (line != null) {
            fileContents.append(line);
            line = br.readLine();
        }
        br.close();
        String styles = fileContents.toString(); //"p { font-family: arial;}";

        Paragraph cirNoEn = null;
        Paragraph cirNoAr = null;

        String htmlContentEn = null;
        String htmlContentAr = null;

        PdfPCell contentEnCell = new PdfPCell();
        PdfPCell contentArCell = new PdfPCell();

        cirNoEn = new Paragraph("circular no. (" + cirEnNo + ")", new Font(bf, 14, Font.BOLD | Font.UNDERLINE));
        cirNoAr = new Paragraph("رقم التعميم (" + cirArNo + ")", new Font(bf, 14, Font.BOLD | Font.UNDERLINE));

        htmlContentEn = "<p><span>Dear….</span></p>";
        htmlContentAr = "<p><span> رقم التعميم رقم التعميم</p><p>رقم التعميم ….</span></p>";

        // Parse each HTML snippet and force the font on every chunk
        for (Element e : XMLWorkerHelper.parseToElementList(htmlContentEn, styles)) {
            for (Chunk c : e.getChunks()) {
                c.setFont(new Font(bf));
            }
            contentEnCell.addElement(e);
        }
        for (Element e : XMLWorkerHelper.parseToElementList(htmlContentAr, styles)) {
            for (Chunk c : e.getChunks()) {
                c.setFont(new Font(bf));
            }
            contentArCell.addElement(e);
        }

        PdfPCell emptyCell = new PdfPCell();
        PdfPCell cirNoEnCell = new PdfPCell(cirNoEn);
        PdfPCell cirNoArCell = new PdfPCell(cirNoAr);

        cirNoEnCell.setHorizontalAlignment(Element.ALIGN_CENTER);
        cirNoArCell.setHorizontalAlignment(Element.ALIGN_CENTER);

        emptyCell.setBorder(Rectangle.NO_BORDER);
        emptyCell.setFixedHeight(15);

        cirNoEnCell.setBorder(Rectangle.NO_BORDER);
        cirNoArCell.setBorder(Rectangle.NO_BORDER);
        contentEnCell.setBorder(Rectangle.NO_BORDER);
        contentArCell.setBorder(Rectangle.NO_BORDER);

        // Arabic cells are laid out right to left
        cirNoArCell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
        contentArCell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);

        contentEnCell.setNoWrap(false);
        contentArCell.setNoWrap(false);

        PdfPTable circularInfoTable = null;

        emptyCell.setColspan(2);
        circularInfoTable = new PdfPTable(2);
        circularInfoTable.addCell(cirNoEnCell);
        circularInfoTable.addCell(cirNoArCell);
        circularInfoTable.addCell(emptyCell);
        circularInfoTable.addCell(emptyCell);
        circularInfoTable.addCell(emptyCell);
        circularInfoTable.addCell(contentEnCell);
        circularInfoTable.addCell(contentArCell);
        circularInfoTable.addCell(emptyCell);

        circularInfoTable.getDefaultCell().setBorder(PdfPCell.NO_BORDER);
        circularInfoTable.setWidthPercentage(100);
        document.add(circularInfoTable);

        document.close();
    } catch (Exception e) {
    }

Please take a look at the ParseHtml7 and ParseHtml8 examples. They take HTML input with Arabic characters and create a PDF with the same Arabic text:

[Screenshots: a PDF table with HTML content; an HTML table in PDF]

Before we look at the code, allow me to explain that it's not a good idea to use non-ASCII characters in source code. For instance, this is not done:

    htmlContentAr = "<p><span> رقم التعميم رقم التعميم</p><p>رقم التعميم ….</span></p>";

You never know how a Java file containing these glyphs will be stored. If it's not stored as UTF-8, the characters may end up looking like something completely different. Versioning systems are known to have problems with non-ASCII characters, and even compilers can get the encoding wrong. If you really want to store hard-coded String values in your code, use the Unicode notation. Part of your problem is an encoding problem, and you can read more about it here: Can't get Czech characters while generating a PDF
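As an illustration of the Unicode notation, here is a minimal sketch that is not part of the original examples. The class name UnicodeEscapes and the helper toEscapes are hypothetical; the constant spells the Arabic phrase used in arabic.html with escapes only, so the source file stays pure ASCII.

```java
public class UnicodeEscapes {
    // The Arabic phrase from arabic.html, written with Unicode escapes only.
    static final String CIRCULAR_NO_AR =
        "\u0631\u0642\u0645 \u0627\u0644\u062A\u0639\u0645\u064A\u0645";

    /** Hypothetical helper: replaces every non-ASCII char by its 4-digit hex escape. */
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                sb.append(c);                                   // plain ASCII is kept as is
            } else {
                sb.append(String.format("\\u%04X", (int) c));   // backslash, 'u', 4 hex digits
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the escaped form, ready to paste into a Java string literal.
        System.out.println(toEscapes(CIRCULAR_NO_AR));
    }
}
```

A helper like this is only meant to be run once to convert existing text; the resulting literal can then be stored safely regardless of the encoding of the source file.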

For the examples shown in the screen shots, I saved the following files using UTF-8 encoding:

This is what you'll find in the file arabic.html:

    <html>
    <body style="font-family: Noto Naskh Arabic">
    <p>رقم التعميم رقم التعميم</p>
    <p>رقم التعميم</p>
    </body>
    </html>

This is what you'll find in the file arabic2.html:

    <html>
    <body style="font-family: Noto Naskh Arabic">
    <table>
    <tr>
    <td dir="rtl">رقم التعميم رقم التعميم</td>
    <td dir="rtl">رقم التعميم</td>
    </tr>
    </table>
    </body>
    </html>
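If you generate such HTML files from Java instead of saving them in an editor, the encoding has to be stated explicitly, otherwise the platform default is used. The following is a minimal sketch under that assumption; the class Utf8HtmlFile and its methods write and read are hypothetical names, and only standard java.nio calls are used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8HtmlFile {
    /** Writes the HTML to the given path, always encoded as UTF-8. */
    static void write(Path target, String html) throws IOException {
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the file back, decoding its bytes as UTF-8. */
    static String read(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Arabic text is written with Unicode escapes to keep this source file ASCII.
        String html = "<html><body style=\"font-family: Noto Naskh Arabic\">"
            + "<p>\u0631\u0642\u0645 \u0627\u0644\u062A\u0639\u0645\u064A\u0645</p>"
            + "</body></html>";
        Path file = Files.createTempFile("arabic", ".html");
        write(file, html);
        // Round trip: the decoded content should equal what was written.
        System.out.println(read(file).equals(html));
        Files.delete(file);
    }
}
```

The same charset then has to be passed to the parser, which is what Charset.forName("UTF-8") does in the examples.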

The second part of your problem concerns the font. It is important that you use a font that knows how to draw Arabic glyphs. It is hard to believe that you have arial.ttf right at the root of your C: drive. That's not a good idea. I would expect you to use c:/windows/fonts/arialuni.ttf, which certainly knows Arabic glyphs.

Selecting the font isn't sufficient. Your HTML needs to know which font family to use. Because most of the examples in the documentation use Arial, I decided to use a NOTO font. I discovered these fonts by reading this question: iText pdf not displaying Chinese characters when using NOTO fonts or Source Hans. I really like these fonts because they are nice and (almost) every language is supported. For instance, I used NotoNaskhArabic-Regular.ttf, which means that I need to define the font family like this:

    style="font-family: Noto Naskh Arabic"

I defined the style in the body tag of my XML, but it's obvious that you can choose where to define it: in an external CSS file, in the styles section of the <head>, at the level of a <td> tag,... The choice is entirely yours, but you have to define somewhere which font to use.

Of course, when XML Worker encounters font-family: Noto Naskh Arabic, iText doesn't know where to find the corresponding NotoNaskhArabic-Regular.ttf unless we register that font. We can do this by creating an instance of the FontProvider interface. I chose to use the XMLWorkerFontProvider, but you're free to write your own FontProvider implementation:

    XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");

There is one more hurdle to take: Arabic is written from right to left. I see that you want to define the run direction at the level of the PdfPCell and that you add the HTML content to this cell using an ElementList. That's why I first wrote a similar example, named ParseHtml7:

    public void createPdf(String file) throws IOException, DocumentException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        // step 3
        document.open();
        // step 4
        // Styles
        CSSResolver cssResolver = new StyleAttrCSSResolver();
        XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
        fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");
        CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
        // HTML
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        // Pipelines: collect the parsed elements in a list instead of writing them directly
        ElementList elements = new ElementList();
        ElementHandlerPipeline pdf = new ElementHandlerPipeline(elements, null);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

        // XML Worker
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser p = new XMLParser(worker);
        // HTML: String constant with the path to the input file (here arabic.html)
        p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));

        // Wrap the parsed elements in a cell with right-to-left run direction
        PdfPTable table = new PdfPTable(1);
        PdfPCell cell = new PdfPCell();
        cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
        for (Element e : elements) {
            cell.addElement(e);
        }
        table.addCell(cell);
        document.add(table);
        // step 5
        document.close();
    }

There is no table in the HTML, but we create our own PdfPTable, we add the content from the HTML to a PdfPCell with run direction RTL, and we add this cell to the table, and the table to the document.

Maybe that's your actual requirement, but why would you do it in such a convoluted way? If you need a table, why don't you create that table in HTML and define the cells as RTL like this:

<td dir="rtl">...</td> 

That way, you don't have to create an ElementList; you can just parse the HTML to PDF, as is done in the ParseHtml8 example:

    public void createPdf(String file) throws IOException, DocumentException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        // step 3
        document.open();
        // step 4
        // Styles
        CSSResolver cssResolver = new StyleAttrCSSResolver();
        XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
        fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");
        CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

        // Pipelines: write the parsed content straight to the document
        PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

        // XML Worker
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser p = new XMLParser(worker);
        // HTML: String constant with the path to the input file (here arabic2.html)
        p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
        // step 5
        document.close();
    }

There is less code needed in this example, and when you want to change the layout, it's sufficient to change the HTML. You don't need to change your Java code.

One more example: in ParseHtml9, I create a table with an English name in one column ("Lawrence of Arabia") and the Arabic translation in the other column ("لورانس العرب"). Because I need different fonts for English and Arabic, I define the font at the <td> level:

    <table>
    <tr>
    <td>Lawrence of Arabia</td>
    <td dir="rtl" style="font-family: Noto Naskh Arabic">لورانس العرب</td>
    </tr>
    </table>

For the first column, the default font is used and no special settings are needed to write from left to right. For the second column, I define an Arabic font and I set the run direction to "rtl".
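If the table rows are generated from data rather than written by hand, the choice between a plain cell and an "rtl" cell could be automated. The following is only a sketch of that idea and not part of the ParseHtml9 example: the class RunDirectionCells and its methods isRtl and td are hypothetical, and the direction is derived with java.text.Bidi from the JDK. The text is assumed to be HTML-escaped already.

```java
import java.text.Bidi;

public class RunDirectionCells {
    /** Font family used in this answer; applied only to right-to-left cells. */
    static final String ARABIC_FONT_FAMILY = "Noto Naskh Arabic";

    /** True when the base direction of the text is right to left. */
    static boolean isRtl(String text) {
        // The base direction follows the first strongly directional character;
        // text without such a character defaults to left to right.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    /** Builds a <td> element; assumes the text needs no further HTML escaping. */
    static String td(String text) {
        if (isRtl(text)) {
            return "<td dir=\"rtl\" style=\"font-family: " + ARABIC_FONT_FAMILY + "\">"
                + text + "</td>";
        }
        return "<td>" + text + "</td>";
    }

    public static void main(String[] args) {
        String english = "Lawrence of Arabia";
        // The Arabic translation, written with Unicode escapes.
        String arabic = "\u0644\u0648\u0631\u0627\u0646\u0633 \u0627\u0644\u0639\u0631\u0628";
        System.out.println("<table><tr>" + td(english) + td(arabic) + "</tr></table>");
    }
}
```

The resulting markup has the same shape as the hand-written table, so it can be fed to XML Worker in the same way as in ParseHtml8.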

The result looks like this:

[Screenshot: English next to Arabic]

That's much easier than what you're trying to do in your code.

