
Read legacy profile XML into a data frame
xlm_profiles_to_df.Rd
Parses a legacy XML dump where records are stored under <ROW>
nodes. The
function reads raw bytes, drops NULs, interprets as Latin-1/Windows-1252,
converts to UTF-8, fixes/sets the XML declaration, and extracts each <ROW>
as a row. Columns are the union of child element names across rows; missing
fields are filled with NA
.
Value
A base data.frame
with one row per <ROW>
and character columns for each
child element name found in the file.
Details
This helper is designed for messy, non-UTF-8 dumps (e.g., Windows-1252) that may contain NUL bytes. It performs minimal normalization and does not coerce types; parse dates or numerics downstream as needed.
Examples
# \donttest{
xml_txt <- paste0(
'<?xml version="1.0"?>',
'<ROWSET>',
' <ROW><ID>1</ID><PROFILE_TYPE>buyer</PROFILE_TYPE></ROW>',
' <ROW><ID>2</ID><PROFILE_TYPE>seller</PROFILE_TYPE><CITY>Panamá</CITY></ROW>',
'</ROWSET>'
)
f <- tempfile(fileext = ".xml"); writeLines(xml_txt, f)
xlm_profiles_to_df(f)
#> ID PROFILE_TYPE CITY
#> 1 1 buyer <NA>
#> 2 2 seller Panamá
unlink(f)
# }