bash: split a pipe stream into records and combine all the lines in a record into one
I have a file containing a million individual XML files (simply concatenated) that I want to convert to JSON. The file looks like this:
<amf xmlns="...">
<test>
1 content
</test>
</amf>
<amf xmlns="...">
<test>
2 content
</test>
</amf>
Note that the above is not a well-formed XML file (i.e. the individual entries are not nested under a single root), so I cannot convert it using `xml2json` directly.
To achieve what I want, I plan to separate the file into records, where each record corresponds to one individual XML file, concatenate each record onto a single line, and then use parallel to apply xml2json to each line, producing the JSON output.
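For context, the last step I have in mind would look something like this sketch (assuming an xml2json command that reads one XML document on stdin and writes JSON to stdout, and a hypothetical intermediate file records.txt with one record per line):

# Feed one record (line) per job to xml2json in parallel;
# --pipe splits stdin on newlines, -N1 passes one record per job.
parallel --pipe -N1 xml2json < records.txt > output.json

(In practice a larger -N or --block size would be faster for a million records, but this shows the idea.)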
When I try to use awk or gawk on OSX, I have trouble splitting the pipe into records. Here's the code I tried (the "useless" cat is for readability):
cat bigfile.xml | awk '{print NR "<amf xml"$0}' RS="<amf xml"
which gives:
1<amf xml
2<amf xmlns="...">
<test>
1 content
</test>
</amf>

3<amf xmlns="...">
<test>
2 content
</test>
</amf>
It's easy to remove the first 'record', but I can't collapse the output of the other records onto one line per record. I tried experimenting with FS="\n" and OFS=" ", without luck.
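One of the variants I tried looked something like this (a reconstruction, since I didn't keep the exact command), and it still printed each record across multiple lines:

cat bigfile.xml | awk 'BEGIN{FS="\n"; OFS=" "} {print NR "<amf xml" $0}' RS="<amf xml"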
Can anyone help me get these records output on one line per record?
With GNU awk for multi-char RS and RT:
$ awk -v RS='</amf>\n' '{$1=$1; ORS=RT}1' file
<amf xmlns="..."> <test> 1 content </test></amf>
<amf xmlns="..."> <test> 2 content </test></amf>
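For what it's worth, here is the same one-liner expanded with comments (my annotation of the above, not part of the original answer):

awk -v RS='</amf>\n' '    # gawk extension: RS may be multiple chars/a regex
{
    $1 = $1               # assigning a field forces awk to rebuild $0,
                          # joining fields with OFS (a space), which
                          # collapses the newlines inside the record
    ORS = RT              # gawk extension: RT is the text RS just matched,
                          # so each record is re-terminated with "</amf>\n"
}
1' file                   # always-true pattern: print every record

Note that RT and multi-character RS are gawk extensions, so on OSX this needs gawk (e.g. from Homebrew) rather than the stock awk. From there, the one-line-per-record output can be piped straight into the parallel/xml2json step described in the question.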