Sunday, September 20, 2009

Import XML Data into SQL Server

How to automate comparing, updating, and inserting data from an XML feed against an existing SQL Server table.

I have an existing table in SQl Server called AcmeProductTable.

It contains the following columns:

AcmeProductID
AcmeProductName
AcmeProductPrice
AcmeProductStockQuantity

The Acme Product Company has a url that provides me with an XML file of the latest data for their products. The feed is located at:

http://www.acme.com/acmeproductfeed.


The task is to:

1. Get the daily XML product feed from their url
2. Save the XML file with today's date
3. Compare today's XML file with my existing data in SQL Server
4. Update any changes to price and stock quantity for existing products
5. Insert any new products
6. Automate the entire process so I don't need to do anything

Well, for someone who knows SQL Server and XML, it's a simple task. But, like me, if you are Googling SQL Server and XML well....

The place to start is here: http://support.microsoft.com/kb/316005

This gives an example of creating an XML map, which you will need to import your XML data file.


My XML map looks something like this:


<?xml version="1.0" ?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:xml:datatypes"
xmlns:sql="urn:schemas-microsoft-com:xml-sql" >

<ElementType name="product_id" dt:type="string" />
<ElementType name="product_name" dt:type="string" />
<ElementType name="product_price" dt:type="string" />
<ElementType name="product_stock_quanitity" dt:type="string" />


<ElementType name="acmeproductfeed" sql:is-constant="1">
<element type="acmefeed" />
</ElementType>
<ElementType name="acmefeed" sql:relation="acmefeed">
<element type="product_id" sql:field="product_id" />
<element type="product_name" sql:field="product_name" />
<element type="product_price" sql:field="product_price" />
<element type="product_stock_quanitity" sql:field="product_stock_quanitity" />
</ElementType>

</Schema>




In order to automate comparing the XML feed from the url above to my existing data in SQL Server,the AcmeProductTable, I will need the following four items:


1. An XML Map of the Acme xml feed file (see MSDN link and my example above).
2. A vbs script that will get the latest xml file from the url, date the file with today's date, save the file, and then locate the latest saved file (today's) and use the file to create a table in SQL Server.
3. A stored procedure in SQL Server that will compare the table we created in in step 2 above with my AcmeProductTable and perform any required updating and inserting.
4. A simple batch file to automate all of above.




Firstly, I am going to save my above XML map "AcmeMap.xml" into a folder on my desktop:

C:\Documents and Settings\Administrator\Desktop\Acme\AcmeMap.xml

I am going to use this same folder for all of the other files below as well.

For my VBS scipt, I will use the following, which I have saved as "acmescript.vbs":

C:\Documents and Settings\Administrator\Desktop\Acme\acmescript.vbs

Blogging, like SQL Server, is not my forte, so I have used italic where portions should be quoted out of the script:

' VBScript source code to get the file from the url then stamp the file with todays date and save it to my Acme file folder.
The important bit here is inserting the +sDate+ into the name of the saved file


sDay = Day(Now())
If Len(sDay) = 1 Then sDay = "0" & Day(Now())
sMonth = Month(Now())
If Len(sMonth) = 1 Then sMonth = "0" & Month(Now())
sYear = Year(Now())
sDate = sYear & sMonth & sDay
strFileURL = "http://www.acme.com/acmeproductfeed"
strHDLocation = "C:\Documents and Settings\Administrator\Desktop\Acme\acmefeed"+"-"+sDate+".xml"


Set objXMLHTTP = CreateObject("MSXML2.XMLHTTP")

objXMLHTTP.open "GET", strFileURL, false
objXMLHTTP.send()

If objXMLHTTP.Status = 200 Then
Set objADOStream = CreateObject("ADODB.Stream")
objADOStream.Open
objADOStream.Type = 1 'adTypeBinary

objADOStream.Write objXMLHTTP.ResponseBody
objADOStream.Position = 0 'Set the stream position to the start

Set objFSO = Createobject("Scripting.FileSystemObject")
If objFSO.Fileexists(strHDLocation) Then objFSO.DeleteFile strHDLocation
Set objFSO = Nothing

objADOStream.SaveToFile strHDLocation
objADOStream.Close
Set objADOStream = Nothing
End if

Set objXMLHTTP = Nothing



' Now create a staging stable - acmefeed - in my database I will use this acmefeed table to compare to my existing production table AcmeProductTable. Note - make the name of the staging table the same as the element below the ROOT element of the XML file. In my case, acmefeed. There is probably some way around this, but I could not find it.

Const DB_CONNECT_STRING = "provider=SQLOLEDB.1;data source=IP;database=YourDB;uid=user;pwd=password"

Set myConn = CreateObject("ADODB.Connection")

set myCommand = CreateObject("ADODB.Command" )

myConn.Open DB_CONNECT_STRING

Set myCommand.ActiveConnection = myConn

myCommand.CommandText = "CREATE TABLE acmefeed (acmefeed nvarchar(255),product_id nvarchar(255),product_name nvarchar(255),product_price nvarchar(255),product_stock_quantity nvarchar(255))"

myCommand.Execute

myConn.Close



' Now - find today's xml file that I saved above and use my xml map to insert it into the staging table, acmefeed, in my database. Note again that I using +sDate+ in the file name to get TODAY's file

sDay = Day(Now())
If Len(sDay) = 1 Then sDay = "0" & Day(Now())
sMonth = Month(Now())
If Len(sMonth) = 1 Then sMonth = "0" & Month(Now())
sYear = Year(Now())
sDate = sYear & sMonth & sDay
Set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBL.ConnectionString = "provider=SQLOLEDB.1;data source=IP;database=YourDB;uid=user;pwd=password"
objBL.ErrorLogFile = "C:\Documents and Settings\Administrator\Desktop\Acme\error.log"
objBL.Execute "C:\Documents and Settings\Administrator\Desktop\Acme\AcmeMap.xml", "C:\Documents and Settings\Administrator\Desktop\Acme\acmefeed"+"-"+sDate+".xml"
Set objBL = Nothing


' Now with the latest XML feed in my database in the form of a table, I can execute a stored procedure to update Price and Stock Quantity and then insert new new data into the appropriate table and then drop my acmefeed table

Const DB_CONNECT_STRINGSP = "provider=SQLOLEDB.1;data source=IP;database=YourDB;uid=user;pwd=password"

Set mySPConn = CreateObject("ADODB.Connection")

set mySPCommand = CreateObject("ADODB.Command" )

mySPConn.Open DB_CONNECT_STRINGSP

Set mySPCommand.ActiveConnection = mySPConn

mySPCommand.CommandText = "EXEC dbo.update_acmefeed"

mySPCommand.Execute

mySPConn.Close






Below is the stored procedure, "update_acmefeed", that I am executing in the vb script above. This will compare the contents of my staging table, "acmefeed", against my actual production table AcmeProductTable. It will update any existing products where the price and/or stock quantity has changed, as well as insert any new products. I then drop the staging table (acmefeed).


CREATE PROCEDURE update_acmefeed
AS
BEGIN
-- First Update any changes to price and stock quantity for existing items
-- Then insert any new items into the AcmeProductTable
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;

declare @id varchar(255),@price numeric(9,2),@quantity varchar(255)
declare curFeed cursor for select product_id, product_price, product_quantity, from acmefeed
where exists( select AcmeProductID from AcmeProductTable where product_id=AcmeProductID)

open curFeed

Fetch next from curFeed into @id,@price,@quantity

while (@@fetch_Status = 0)
BEGIN

update AcmeProductTable set product_price=@price,product_stock_quantity=@quantity
where AcmeProductID=@id


Fetch next from curFeed into @id,@price,@quantity
end

CLOSE curFeed DEALLOCATE curFeed



insert into AcmeProductTable
select product_id, product_name,product_price,product_stock_quantity
from acmefeed A where not exists(select AcmeProductID from AcmeProductTable where AcmeProductID=a.product_id)

drop table acmefeed

END




Finally, I am going to schedule the above VBS to run daily by creating a simple batch file with the following contents:


wscript.exe C:\Documents and Settings\Administrator\Desktop\Acme\acmescript.vbs


There is obviously redundancy in the date function as well as the ADO connections, but this is the original scripting I used to get this to work.

I am sure there are more elegant ways of doing this, but I hope this helps someone...


Sources:

Download File:
http://blog.netnerds.net/2007/01/vbscript-download-and-save-a-binary-file/

Add File Date:
http://www.visualbasicscript.com/m29679.aspx

No comments: