Arabic characters from HTML content to PDF using iText
I am having trouble displaying Arabic characters from HTML content when generating a PDF: they are displayed as "?".
I am able to display Arabic text from a String variable. At the same time, I am not able to generate Arabic text from an HTML string.
I want to display the PDF in 2 columns, with English on the left side and Arabic text on the right side.
This happens when I use the following program to convert to PDF. Please help me in this regard.
    try {
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PdfWriter writer = PdfWriter.getInstance(document, out);
        BaseFont bf = BaseFont.createFont("c:\\arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        Font font = new Font(bf, 8);
        document.open();
        BufferedReader br = new BufferedReader(new FileReader("c:\\style.css"));
        StringBuffer fileContents = new StringBuffer();
        String line = br.readLine();
        while (line != null) {
            fileContents.append(line);
            line = br.readLine();
        }
        br.close();
        String styles = fileContents.toString(); //"p { font-family: Arial;}";
        Paragraph cirNoEn = null;
        Paragraph cirNoAr = null;
        String htmlContentEn = null;
        String htmlContentAr = null;
        PdfPCell contentEnCell = new PdfPCell();
        PdfPCell contentArCell = new PdfPCell();
        cirNoEn = new Paragraph("Circular No. (" + cirEnNo + ")", new Font(bf, 14, Font.BOLD | Font.UNDERLINE));
        cirNoAr = new Paragraph("رقم التعميم (" + cirArNo + ")", new Font(bf, 14, Font.BOLD | Font.UNDERLINE));
        htmlContentEn = "<p><span>Dear….</span></p>";
        htmlContentAr = "<p><span> رقم التعميم رقم التعميم </p><p> رقم التعميم ….</span></p>";
        for (Element e : XMLWorkerHelper.parseToElementList(htmlContentEn, styles)) {
            for (Chunk c : e.getChunks()) {
                c.setFont(new Font(bf));
            }
            contentEnCell.addElement(e);
        }
        for (Element e : XMLWorkerHelper.parseToElementList(htmlContentAr, styles)) {
            for (Chunk c : e.getChunks()) {
                c.setFont(new Font(bf));
            }
            contentArCell.addElement(e);
        }
        PdfPCell emptyCell = new PdfPCell();
        PdfPCell cirNoEnCell = new PdfPCell(cirNoEn);
        PdfPCell cirNoArCell = new PdfPCell(cirNoAr);
        cirNoEnCell.setHorizontalAlignment(Element.ALIGN_CENTER);
        cirNoArCell.setHorizontalAlignment(Element.ALIGN_CENTER);
        emptyCell.setBorder(Rectangle.NO_BORDER);
        emptyCell.setFixedHeight(15);
        cirNoEnCell.setBorder(Rectangle.NO_BORDER);
        cirNoArCell.setBorder(Rectangle.NO_BORDER);
        contentEnCell.setBorder(Rectangle.NO_BORDER);
        contentArCell.setBorder(Rectangle.NO_BORDER);
        cirNoArCell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
        contentArCell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
        contentEnCell.setNoWrap(false);
        contentArCell.setNoWrap(false);
        PdfPTable circularInfoTable = null;
        emptyCell.setColspan(2);
        circularInfoTable = new PdfPTable(2);
        circularInfoTable.addCell(cirNoEnCell);
        circularInfoTable.addCell(cirNoArCell);
        circularInfoTable.addCell(emptyCell);
        circularInfoTable.addCell(emptyCell);
        circularInfoTable.addCell(emptyCell);
        circularInfoTable.addCell(contentEnCell);
        circularInfoTable.addCell(contentArCell);
        circularInfoTable.addCell(emptyCell);
        circularInfoTable.getDefaultCell().setBorder(PdfPCell.NO_BORDER);
        circularInfoTable.setWidthPercentage(100);
        document.add(circularInfoTable);
        document.close();
    } catch (Exception e) {
    }
Please take a look at the ParseHtml7 and ParseHtml8 examples. They take HTML input with Arabic characters and create a PDF with the same Arabic text.
Before we look at the code, allow me to explain that it's not a good idea to use non-ASCII characters in source code. For instance, this is not done:
    htmlContentAr = "<p><span> رقم التعميم رقم التعميم</p><p>رقم التعميم ….</span></p>";
You never know how a Java file containing these glyphs will be stored. If it's not stored as UTF-8, the characters may end up looking different. Versioning systems are known to have problems with non-ASCII characters, and compilers can get the encoding wrong. If you really want to store hard-coded String values in your code, use the Unicode notation. Part of your problem is an encoding problem, and you can read more about it here: Can't get Czech characters while generating a PDF
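As an illustration of the Unicode notation, here is a minimal sketch that turns a string into its escaped, ASCII-only form so that it can be pasted into a Java source file. The class `UnicodeEscapes` and its `escape` method are hypothetical helpers written for this note; they are not part of iText.

```java
// Hypothetical helper, not part of iText: produces the ASCII-only form of a string
// so that it can be pasted into Java source regardless of the file encoding.
final class UnicodeEscapes {

    /** Returns the text with every non-ASCII UTF-16 code unit replaced by its four-digit hex escape. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                sb.append(c);                                 // plain ASCII stays as it is
            } else {
                sb.append(String.format("\\u%04x", (int) c)); // backslash, 'u', four hex digits
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Arabic word for "number" (reh, qaf, meem), written with escapes only.
        String word = "\u0631\u0642\u0645";
        System.out.println(escape("<p>" + word + "</p>"));
    }
}
```

The escaped literal compiles to the same String as the Arabic literal would, but the source file itself stays pure ASCII, so the editor, the versioning system and the compiler no longer have to agree on an encoding.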
For the examples shown in the screen shots, I saved the following files using UTF-8 encoding:
This is what you'll find in the file arabic.html:
    <html>
    <body style="font-family: Noto Naskh Arabic">
    <p>رقم التعميم رقم التعميم</p>
    <p>رقم التعميم</p>
    </body>
    </html>
This is what you'll find in the file arabic2.html:
    <html>
    <body style="font-family: Noto Naskh Arabic">
    <table>
    <tr>
    <td dir="rtl">رقم التعميم رقم التعميم</td>
    <td dir="rtl">رقم التعميم</td>
    </tr>
    </table>
    </body>
    </html>
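When files like these are loaded from Java, naming the charset explicitly avoids depending on the platform default. The following is a minimal sketch using only the JDK; the class `Utf8FileDemo` and its method are made up for illustration, and a temporary file stands in for arabic.html.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of iText.
final class Utf8FileDemo {

    /** Writes the text to a temporary file as UTF-8 and reads it back with the given charset. */
    static String writeUtf8ThenRead(String text, Charset readCharset) {
        try {
            Path tmp = Files.createTempFile("arabic", ".html");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(tmp), readCharset);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<p>\u0631\u0642\u0645 \u0627\u0644\u062a\u0639\u0645\u064a\u0645</p>";
        // Same charset on both sides: the text survives unchanged.
        System.out.println(html.equals(writeUtf8ThenRead(html, StandardCharsets.UTF_8)));
        // Mismatched charset: every Arabic letter turns into two unrelated characters.
        System.out.println(html.equals(writeUtf8ThenRead(html, StandardCharsets.ISO_8859_1)));
    }
}
```

The first comparison prints true and the second prints false, which is the "characters may end up looking different" effect in miniature.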
The second part of your problem concerns the font. It is important that you use a font that knows how to draw Arabic glyphs. I find it hard to believe that you have arial.ttf right at the root of your C: drive. That's not a good idea. I would expect you to use C:/windows/fonts/arialuni.ttf, which knows Arabic glyphs.
Selecting the font isn't sufficient. Your HTML needs to know which font family to use. Because most of the examples in the documentation use Arial, I decided to use a Noto font. I discovered these fonts by reading this question: iText pdf not displaying Chinese characters when using NOTO fonts or Source Hans. I like these fonts because they are nice and (almost) every language is supported. For instance, I used NotoNaskhArabic-Regular.ttf, which means I need to define the font family like this:
    style="font-family: Noto Naskh Arabic"
I defined the style in the body tag of my XML, but you can choose where to define it: in an external CSS file, in the styles section of the <head>, at the level of a <td> tag,... The choice is entirely yours, but you have to define somewhere which font to use.
Of course, when XML Worker encounters font-family: Noto Naskh Arabic, iText doesn't know where to find the corresponding NotoNaskhArabic-Regular.ttf unless we register that font. We can do this by creating an instance of the FontProvider interface. I chose to use the XMLWorkerFontProvider, but you're free to write your own FontProvider implementation:
    XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");
There is one more hurdle to take: Arabic is written from right to left. I see that you want to define the run direction at the level of the PdfPCell, and that you add the HTML content to this cell using an ElementList. That's why I first wrote a similar example, named ParseHtml7:
    public void createPdf(String file) throws IOException, DocumentException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        // step 3
        document.open();
        // step 4
        // Styles
        CSSResolver cssResolver = new StyleAttrCSSResolver();
        XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
        fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");
        CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
        // HTML
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        // Pipelines
        ElementList elements = new ElementList();
        ElementHandlerPipeline pdf = new ElementHandlerPipeline(elements, null);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
        // XML Worker
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser p = new XMLParser(worker);
        // HTML is a String constant of the example class: the path to the input file (arabic.html)
        p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
        PdfPTable table = new PdfPTable(1);
        PdfPCell cell = new PdfPCell();
        cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
        for (Element e : elements) {
            cell.addElement(e);
        }
        table.addCell(cell);
        document.add(table);
        // step 5
        document.close();
    }
There is no table in the HTML, so we create our own PdfPTable, add the content from the HTML to a PdfPCell with run direction RTL, add this cell to the table, and add the table to the document.
Maybe that's your actual requirement, but why would you do it in such a convoluted way? If you need a table, why don't you create that table in HTML and define some cells as RTL like this:
    <td dir="rtl">...</td>
That way, you don't have to create an ElementList, and you can just parse the HTML to PDF as is done in the ParseHtml8 example:
    public void createPdf(String file) throws IOException, DocumentException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
        // step 3
        document.open();
        // step 4
        // Styles
        CSSResolver cssResolver = new StyleAttrCSSResolver();
        XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
        fontProvider.register("resources/fonts/NotoNaskhArabic-Regular.ttf");
        CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        // Pipelines
        PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
        HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
        // XML Worker
        XMLWorker worker = new XMLWorker(css, true);
        XMLParser p = new XMLParser(worker);
        // HTML is a String constant of the example class: the path to the input file (arabic2.html)
        p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
        // step 5
        document.close();
    }
There is less code needed in this example, and when you want to change the layout, it's sufficient to change the HTML. You don't need to change your Java code.
One more example: in ParseHtml9, I create a table with an English name in one column ("Lawrence of Arabia") and the Arabic translation in the other column ("لورانس العرب"). Because I need different fonts for English and Arabic, I define the font at the <td> level:
    <table>
    <tr>
    <td>Lawrence of Arabia</td>
    <td dir="rtl" style="font-family: Noto Naskh Arabic">لورانس العرب</td>
    </tr>
    </table>
For the first column, the default font is used and no special settings are needed to write from left to right. For the second column, I define an Arabic font and I set the run direction to "rtl".
The result looks like this:
That's much easier than what you're trying in your code.
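If the table cells are generated rather than written by hand, the dir attribute can be derived from the text itself. The following is a rough sketch that uses the JDK's java.text.Bidi class; `CellMarkup`, `dirFor` and `td` are hypothetical helpers, not part of iText or XML Worker, and the text is assumed to need no HTML escaping.

```java
import java.text.Bidi;

// Hypothetical helper, not part of iText or XML Worker.
final class CellMarkup {

    /** Returns "rtl" when the first strongly directional character is right-to-left, otherwise "ltr". */
    static String dirFor(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.baseIsLeftToRight() ? "ltr" : "rtl";
    }

    /** Builds a td element; right-to-left text gets dir="rtl" and the given font family. */
    static String td(String text, String rtlFontFamily) {
        if ("rtl".equals(dirFor(text))) {
            return "<td dir=\"rtl\" style=\"font-family: " + rtlFontFamily + "\">" + text + "</td>";
        }
        return "<td>" + text + "</td>";
    }

    public static void main(String[] args) {
        // "Lawrence of Arabia" in Arabic, written with Unicode escapes.
        String arabic = "\u0644\u0648\u0631\u0627\u0646\u0633 \u0627\u0644\u0639\u0631\u0628";
        System.out.println("<table><tr>"
                + td("Lawrence of Arabia", "Noto Naskh Arabic")
                + td(arabic, "Noto Naskh Arabic")
                + "</tr></table>");
    }
}
```

The generated markup has the same shape as the ParseHtml9 table, so it could be passed to the same parsing code. Text that mixes both directions takes the direction of its first strong character, which may not always be what you want.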