I have a number of very large XML documents, and I'd like to have a script made in Java that will read data within the XML document, and persist it into a database with configurable batch sizes.
The specific structure of the XML documents may vary, but they all have the same basic parent/child structure. For example, we might have:
As you can see above, the first file has two columns, Child_1 and Child_2, while the second file has Child3 and Child4. In each of the above cases, the application will work as follows:
1. Parse the XML document to get the list of ALL record element names. As in the example above, the entire document must be scanned to build a cumulative, distinct list of every column name;
2. Create a table with the desired structure (e.g. in the above case, each table would be created with the two columns indicated); and
3. Persist all data from the XML documents into the table created.
The application should accept three parameters: the first is the filename to be read, the second is the batch size, and the third is the target table name. For example, if my app name is PersistRMXML, I want a batch size of 10,000, and the table I want to create is SomeTableName, then the command would be:
java -jar PersistRMXML [login to view URL] 10000 SomeTableName
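To make the expected parameter handling concrete, here is a minimal sketch of how the three arguments might be validated. The class name `CliArgs`, its field names, and the rule that the table name must be a simple identifier are my assumptions for illustration, not requirements stated above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical holder for the three command-line parameters. */
final class CliArgs {
    final Path inputFile;
    final int batchSize;
    final String tableName;

    private CliArgs(Path inputFile, int batchSize, String tableName) {
        this.inputFile = inputFile;
        this.batchSize = batchSize;
        this.tableName = tableName;
    }

    /** Parses <xml-file> <batch-size> <table-name>, rejecting malformed input. */
    static CliArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException("Usage: <xml-file> <batch-size> <table-name>");
        }
        int batchSize;
        try {
            batchSize = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Batch size must be an integer: " + args[1], e);
        }
        if (batchSize <= 0) {
            throw new IllegalArgumentException("Batch size must be positive: " + batchSize);
        }
        // Assumption: only simple identifiers are accepted, because the name
        // ends up inside a CREATE TABLE statement.
        if (!args[2].matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Invalid table name: " + args[2]);
        }
        return new CliArgs(Paths.get(args[0]), batchSize, args[2]);
    }
}
```

Failing fast here keeps a bad batch size or table name from surfacing only after a long scan of a million-record file.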
Upon execution, the application will first read the input file to determine all possible column names. Each file can contain over one million records, so the file should not be loaded into memory all at once; StAX can be used to accomplish this.
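As a rough illustration of this first pass, the sketch below streams the document with StAX and collects the distinct column names without building a tree in memory. It assumes a layout I have not confirmed for every file: one root element, record elements directly beneath it, and column elements directly beneath each record (depth 3). The class and method names are hypothetical.

```java
import java.io.InputStream;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical first-pass scanner: gathers distinct column names via StAX. */
final class ColumnScanner {

    /** Returns column names in first-seen order; assumes root/record/column nesting. */
    static Set<String> distinctColumns(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTD processing is not needed for this data, so it is switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        Set<String> columns = new LinkedHashSet<>();
        int depth = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    // Depth 1 = root, 2 = record, 3 = column (assumed layout).
                    if (depth == 3) {
                        columns.add(reader.getLocalName());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
        } finally {
            reader.close();
        }
        return columns;
    }
}
```

Memory use stays proportional to the number of distinct column names rather than the number of records, which is the point of using a streaming parser here.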
Once this is done, the new table will be created on the target database with the desired structure. The application should contain an [login to view URL] file where the target MSSQL database connection string should be configured.
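For the table-creation step, a sketch of how the CREATE TABLE statement might be built from the discovered column names follows. The column type is not specified anywhere above, so `NVARCHAR(MAX) NULL` for every column is purely my assumption, as are the class and method names.

```java
import java.util.Collection;

/** Hypothetical helper that builds the CREATE TABLE statement for MSSQL. */
final class TableDdl {

    /** Wraps an identifier in brackets, doubling any closing bracket inside it. */
    static String quote(String identifier) {
        return "[" + identifier.replace("]", "]]") + "]";
    }

    /** Builds the DDL; every column is NVARCHAR(MAX) NULL (an assumed type). */
    static String createTable(String table, Collection<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        StringBuilder sql = new StringBuilder("CREATE TABLE ")
                .append(quote(table))
                .append(" (");
        String separator = "";
        for (String column : columns) {
            sql.append(separator).append(quote(column)).append(" NVARCHAR(MAX) NULL");
            separator = ", ";
        }
        return sql.append(")").toString();
    }
}
```

With table name SomeTableName and columns Child_1 and Child_2, this yields `CREATE TABLE [SomeTableName] ([Child_1] NVARCHAR(MAX) NULL, [Child_2] NVARCHAR(MAX) NULL)`.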
Once the target table is created, the application should read records in batches of the size specified above, holding the parsed records in memory. Once the desired number of records has been parsed from the XML document and stored in memory, the application should persist them into the new table using the BulkCopy operation of MSSQL. More on Bulk Copy can be found here:
[login to view URL]
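The batching loop described above could look roughly like the sketch below. It only shows the read-and-flush logic: each full batch is handed to a `sink` callback, which is where the bulk copy call (for example through the Microsoft JDBC driver's `SQLServerBulkCopy` class) would go. That part is left out because it needs the driver and a live database. The same root/record/column nesting is assumed as before, column elements are assumed to contain text only, and all names are hypothetical.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical second-pass reader: streams records and flushes them in batches. */
final class BatchReader {

    /** Passes each batch to the sink and returns the total number of records read. */
    static long readInBatches(InputStream xml, int batchSize,
            Consumer<List<Map<String, String>>> sink) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        List<Map<String, String>> batch = new ArrayList<>(batchSize);
        Map<String, String> record = null;
        long total = 0;
        int depth = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2) {
                        record = new LinkedHashMap<>(); // a new record begins
                    } else if (depth == 3) {
                        // getElementText() leaves the reader on the matching
                        // END_ELEMENT, so the depth is restored here.
                        record.put(reader.getLocalName(), reader.getElementText());
                        depth--;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == 2) {
                        batch.add(record);
                        total++;
                        if (batch.size() == batchSize) {
                            sink.accept(batch); // the bulk copy would happen here
                            batch = new ArrayList<>(batchSize);
                        }
                    }
                    depth--;
                }
            }
            if (!batch.isEmpty()) {
                sink.accept(batch); // final partial batch
            }
        } finally {
            reader.close();
        }
        return total;
    }
}
```

Only one batch is held in memory at a time, and the returned total can be used later as the XML record count for the final validation.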
As a final step, the application should validate that the number of records in the target table matches the number of records read from the XML document. If not, the application should print an error to the log file, for example: 'Record Persistence Incomplete - XML record count 11000 - Table record count 50'
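The final check might be sketched as follows. The message format is taken from the example above; the class name, method names, and the use of `SELECT COUNT_BIG(*)` to count rows are my assumptions. The returned message would be passed to whatever logger the application uses.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Optional;

/** Hypothetical post-load validation of record counts. */
final class PersistenceCheck {

    /** Counts rows in the target table; the name is bracket-quoted for MSSQL. */
    static long countRows(Connection connection, String table) throws SQLException {
        String sql = "SELECT COUNT_BIG(*) FROM [" + table.replace("]", "]]") + "]";
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(sql)) {
            rows.next();
            return rows.getLong(1);
        }
    }

    /** Returns the error message to log, or empty when the counts match. */
    static Optional<String> verify(long xmlCount, long tableCount) {
        if (xmlCount == tableCount) {
            return Optional.empty();
        }
        return Optional.of(String.format(
                "Record Persistence Incomplete - XML record count %d - Table record count %d",
                xmlCount, tableCount));
    }
}
```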
The application should use a logging library such as log4j, and should log all details to the log file.
The application should be compiled for the Java 8 platform, and preferably will be distributed as a fat jar.