Tuesday, March 27, 2012

DTS question reguarding text import and tranformation

MS SQL 2000 sp4 on WinXp Pro SP2
I am very new to this so please let me know what I can do to make it
easier for you to understand the problem.

I have a non delimited text file. This text file has several columns
that for the most part are fixed length but..
The fixed format starts with a variable length number( char 10) as the
first column and a max (char 30) description field and another (char
50) long description field.

if the (char 30) description field has a measurement ( ie 4.5" ) then
the first field stats with a ". The subsequent description will end
with a "" or double quote. The fields can only work if the double
qoutes are replaced with a single space and then the leading single
quote is deleted. If this is done in this order then the fixed field
lengths work.

example of actual data***************

This example does not have any quotes in it*****

1015304 Sof Sand Block Holder RUDOLPH
INTE BH y N N nbrd EA 1.00000 .83 1.25

This example does have the quotes**********

"10154 Rud Zeb NonDisf Cush Cur ve 180Gr 7"" RUDOLPH
INTE DUZ180F5 y N N nbrd EA 1.00000 .43 .65"

This is the next several lines***************

1015401 Rud Blk Disf Cush Curved 100/180Gr RUDOLPH
INTE DUB100/1 y N N nbrd EA 1.00000 .46 .69

1015402 Rud Blk Disf Cush Curved 100Gr RUDOLPH
INTE DUB100F5 y N N nbrd EA 1.00000 .46 .69

1015405 Rud Myl Disif Violet 100 Gr RUDOLPH
INTE GLMM100F y N N nbrd EA 1.00000 .43 .65

1015406 Rud Myl Purple Disf 80Gr RUDOLPH
INTE GLMM080F y N N nbrd EA 1.00000 .43 .65

1015407 Rud Myl Light Green Disf 120Gr RUDOLPH
INTE GLMM120F y N N nbrd EA 1.00000 .43 .65

"1015408 Rud Myl Yellow Disf 180G r 7"" RUDOLPH
INTE GLMM180F y N N nbrd EA 1.00000 .43 .65"

end of example*******************

right now I am doing this by hand in notepad and then importing to
excel to generate a csv file. Can anyone suggest a better way to do
this or just point me to an example.

Thanks very much for anyIn the past I had to perform similar tasks very often to import text files
generated from mainframe systems (or other third party sources that could
not directly import) to SQL Server. Normally I have done it two ways based
on the tools available:

1) Use a pre-processing tool to clean up the data and then import to SQL
Server via BCP or DTS. Sometimes I had to write my own little utility
program or script to clean up the data, sometimes I was able to find tools
that are available out there. In your case a simple search and replace tool
should do the job (probably better to have command line interface so you can
automate the process). There are many available and if you do not feel
comfortable writing your own tool you just need to Google for "search and
replace" and you will find a few. I did a quick search and here are two
links, but please test and evaluate yourself:
http://www.microsoft.com/technet/sc...05/hey0208.mspx
http://www.thefreecountry.com/progr...ndreplace.shtml
2) The second approach is to use BCP or DTS to import the text file to a
staging table with a single varchar (or nvarchar if you have UNICODE
characters in the data) column. This works well if you can fit one row of
the text file into the size of varchar(8000) or nvarchar(4000). Otherwise
you can still split it into multiple columns but processing afterwards
becomes more complex. Then you can use the various string functions in SQL
Server to perform the data clean up, split into columns and insert the final
results into your production table. In your case the REPLACE and SUBSTRING
functions could do it.

Hope this helps.

Regards,

Plamen Ratchev
http://www.SQLStudio.com|||To add on to Plamen's response, you can include an ActiveX script to task in
your DTS package to remove the extraneous quotes and then process the
cleaned up file in your transformation. You can change your transformation
to process this file as fixed-length instead of delimited.

Below is the body of an ActiveX script that will correct the sample data you
posted:

Const ForReading = 1
Const ForWriting = 2
Dim FSO, inFile, outFile, record

Set FSO = CreateObject("Scripting.FileSystemObject")
Set inFile = FSO.OpenTextFile( _
"C:\MyFiles\MyInputFile.txt", ForReading)
Set outFile = FSO.OpenTextFile( _
"C:\MyFiles\MyOutputFile.txt", ForWriting, True)

Do While inFile.AtEndOfStream = False
record = InFile.ReadLine()
record = Replace(record, """""", " ") 'replace "" with single space
record = Replace(record, """", "") 'remove "
outFile.WriteLine record
Loop

inFile.Close
outFile.Close
Set inFile = Nothing
Set outFile = Nothing
Set FSO = Nothing

--
Hope this helps.

Dan Guzman
SQL Server MVP

"lwhite" <angryaardvark@.excite.comwrote in message
news:rbadr2lnhmtl988f4brdm75btckjjpvm8i@.4ax.com...

Quote:

Originally Posted by

>
MS SQL 2000 sp4 on WinXp Pro SP2
I am very new to this so please let me know what I can do to make it
easier for you to understand the problem.
>
>
I have a non delimited text file. This text file has several columns
that for the most part are fixed length but..
The fixed format starts with a variable length number( char 10) as the
first column and a max (char 30) description field and another (char
50) long description field.
>
if the (char 30) description field has a measurement ( ie 4.5" ) then
the first field stats with a ". The subsequent description will end
with a "" or double quote. The fields can only work if the double
qoutes are replaced with a single space and then the leading single
quote is deleted. If this is done in this order then the fixed field
lengths work.
>
example of actual data***************
>
This example does not have any quotes in it*****
>
1015304 Sof Sand Block Holder RUDOLPH
INTE BH y N N nbrd EA 1.00000 .83 1.25
>
This example does have the quotes**********
>
"10154 Rud Zeb NonDisf Cush Cur ve 180Gr 7"" RUDOLPH
INTE DUZ180F5 y N N nbrd EA 1.00000 .43 .65"
>
This is the next several lines***************
>
1015401 Rud Blk Disf Cush Curved 100/180Gr RUDOLPH
INTE DUB100/1 y N N nbrd EA 1.00000 .46 .69
>
1015402 Rud Blk Disf Cush Curved 100Gr RUDOLPH
INTE DUB100F5 y N N nbrd EA 1.00000 .46 .69
>
1015405 Rud Myl Disif Violet 100 Gr RUDOLPH
INTE GLMM100F y N N nbrd EA 1.00000 .43 .65
>
1015406 Rud Myl Purple Disf 80Gr RUDOLPH
INTE GLMM080F y N N nbrd EA 1.00000 .43 .65
>
1015407 Rud Myl Light Green Disf 120Gr RUDOLPH
INTE GLMM120F y N N nbrd EA 1.00000 .43 .65
>
"1015408 Rud Myl Yellow Disf 180G r 7"" RUDOLPH
INTE GLMM180F y N N nbrd EA 1.00000 .43 .65"
>
end of example*******************
>
right now I am doing this by hand in notepad and then importing to
excel to generate a csv file. Can anyone suggest a better way to do
this or just point me to an example.
>
Thanks very much for any

No comments:

Post a Comment