json - bash: split pipe stream into records and combine all lines in a record into one


I have a file containing a million individual XML documents (simply concatenated) that I want to convert to JSON. The file looks like this:

<amf xmlns="...">
  <test>
    1 content
  </test>
</amf>
<amf xmlns="...">
  <test>
    2 content
  </test>
</amf>

Note that the file above is not a well-formed XML file (i.e. the individual entries are not nested under a single root), so I cannot convert it directly with `xml2json`.

To achieve what I want, I plan to separate the file into records, where each record corresponds to one individual XML document, concatenate each record onto a single line, and then use parallel on each line, applying xml2json to produce the JSON output.
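For that last step, here is a minimal sketch, assuming GNU parallel is installed and that xml2json reads one XML document on stdin and writes JSON on stdout (the input and output file names are hypothetical):

# --pipe splits stdin on newlines; -N1 sends exactly one record (line) to each xml2json job
parallel --pipe -N1 xml2json < one_line_records.txt > records.json

With a million records, one process per record will be slow; if xml2json can handle several documents per invocation, a larger -N would reduce the overhead, but that depends on the tool.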

When I try to use awk or gawk on OS X, I have trouble splitting the pipe into records. Here's the code I tried (the "useless" cat is there for readability):

cat bigfile.xml | awk '{print NR "<amf xml"$0}' RS="<amf xml"

which gives:

1<amf xml
2<amf xmlns="...">
  <test>
    1 content
  </test>
</amf>

3<amf xmlns="...">
  <test>
    2 content
  </test>
</amf>

It's easy to remove the first 'record', but I can't collapse the output of the other records onto one line per record. I tried experimenting with FS="\n" and OFS=" " without luck.

Can anyone help me output these records with one line per record?

With GNU awk for multi-char RS and RT:

$ awk -v RS='</amf>\n' '{$1=$1; ORS=RT}1' file
<amf xmlns="..."> <test> 1 content </test></amf>
<amf xmlns="..."> <test> 2 content </test></amf>
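A few words on how that works: RS='</amf>\n' makes each XML document its own record, $1=$1 forces gawk to rebuild $0 with the default OFS (a single space), which collapses the internal newlines, and ORS=RT re-appends the exact record terminator that was matched, so every record ends with </amf> and a newline. The trailing 1 is the usual awk idiom for "print the record". Hooked up to the planned conversion step (again assuming xml2json reads stdin, as sketched above):

awk -v RS='</amf>\n' '{$1=$1; ORS=RT}1' bigfile.xml | parallel --pipe -N1 xml2json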
