sql - Error: Input array is longer than number of columns in this table - powershell


I am trying to load a 160 GB CSV file into SQL Server using a PowerShell script I got from GitHub, and I get this error:

    Exception calling "Add" with "1" argument(s): "Input array is longer than the number of columns in this table."
    At c:\b.ps1:54 char:26
    + [void]$datatable.Rows.Add <<<< ($line.Split($delimiter))
        + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
        + FullyQualifiedErrorId : DotNetMethodException

So I checked the same code against a small 3-line CSV where all of the columns match, the header is in the first row, and there are no stray delimiters, and I am not sure why I am getting this error.

The code is below:

    <# 8-faster-runspaces.ps1 #>
    # Set CSV attributes
    $csv = "M:\d\s.txt"
    $delimiter = "`t"

    # Set connstring
    $connstring = "Data Source=.;Integrated Security=true;Initial Catalog=PresentationOptimized;PACKET SIZE=32767;"

    # Set batchsize to 2000
    $batchsize = 2000

    # Create the datatable
    $datatable = New-Object System.Data.DataTable

    # Add generic columns
    $columns = (Get-Content $csv -First 1).Split($delimiter)
    foreach ($column in $columns) {
        [void]$datatable.Columns.Add()
    }

    # Setup runspace pool and the scriptblock that runs inside each runspace
    $pool = [RunspaceFactory]::CreateRunspacePool(1, 5)
    $pool.ApartmentState = "MTA"
    $pool.Open()
    $runspaces = @()

    # Setup scriptblock. This is the workhorse. Think of it as a function.
    $scriptblock = {
        param (
            [string]$connstring,
            [object]$dtbatch,
            [int]$batchsize
        )

        $bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connstring, "TableLock")
        $bulkcopy.DestinationTableName = "abc"
        $bulkcopy.BatchSize = $batchsize
        $bulkcopy.WriteToServer($dtbatch)
        $bulkcopy.Close()
        $dtbatch.Clear()
        $bulkcopy.Dispose()
        $dtbatch.Dispose()
    }

    # Start timer
    $time = [System.Diagnostics.Stopwatch]::StartNew()

    # Open the text file from disk and process.
    $reader = New-Object System.IO.StreamReader($csv)

    Write-Output "starting insert.."
    while (($line = $reader.ReadLine()) -ne $null) {
        [void]$datatable.Rows.Add($line.Split($delimiter))

        if ($datatable.Rows.Count % $batchsize -eq 0) {
            $runspace = [PowerShell]::Create()
            [void]$runspace.AddScript($scriptblock)
            [void]$runspace.AddArgument($connstring)
            [void]$runspace.AddArgument($datatable) # <-- send the datatable
            [void]$runspace.AddArgument($batchsize)
            $runspace.RunspacePool = $pool
            $runspaces += [PSCustomObject]@{ Pipe = $runspace; Status = $runspace.BeginInvoke() }

            # Overwrite object with a shell of itself
            $datatable = $datatable.Clone() # <-- create a new datatable object
        }
    }

    # Close the file
    $reader.Close()

    # Wait for runspaces to complete
    while ($runspaces.Status.IsCompleted -notcontains $true) {}

    # End timer
    $secs = $time.Elapsed.TotalSeconds

    # Cleanup runspaces
    foreach ($runspace in $runspaces) {
        [void]$runspace.Pipe.EndInvoke($runspace.Status) # EndInvoke retrieves the results of the asynchronous call
        $runspace.Pipe.Dispose()
    }

    # Cleanup runspace pool
    $pool.Close()
    $pool.Dispose()

    # Cleanup SQL connections
    [System.Data.SqlClient.SqlConnection]::ClearAllPools()

    # Done! Format output then display
    $totalrows = 1000000
    $rs = "{0:N0}" -f [int]($totalrows / $secs)
    $rm = "{0:N0}" -f [int]($totalrows / $secs * 60)
    $mill = "{0:N0}" -f $totalrows

    Write-Output "$mill rows imported in $([math]::Round($secs, 2)) seconds ($rs rows/sec and $rm rows/min)"

Working with a 160 GB input file is going to be a pain. You can't load it into any kind of editor - or at least you can't analyze such a mass of data without serious automation.

As per the comments, it seems the data has quality issues. In order to find the offending data, try binary searching. This approach shrinks the data fast. So:

1) Split the file into two equal chunks (see the splitting sketch after this list).
2) Try and load the first chunk.
3) If successful, process the second chunk. If not, see 6).
4) Try and load the second chunk.
5) If successful, both files are valid, but you've got some other data quality issue. Start looking for other causes. If not, see 6).
6) If either load failed, start from the beginning and use the failed chunk as the new input file.
7) Repeat until you have narrowed down the offending row(s).
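A minimal sketch of the splitting step, assuming hypothetical output paths s_part1.txt and s_part2.txt and that the first line is a header both halves need. It streams line by line, so the 160 GB file is never held in memory, although the counting pass does mean the file is read twice:

    # Split a large delimited file into two halves by data-line count,
    # repeating the header row in both halves so each half loads on its own.
    # The output paths are assumptions for illustration.
    $source     = "M:\d\s.txt"
    $firstHalf  = "M:\d\s_part1.txt"
    $secondHalf = "M:\d\s_part2.txt"

    # Pass 1: count the data lines (streaming, nothing is kept in memory)
    $totalLines = 0
    $reader = New-Object System.IO.StreamReader($source)
    [void]$reader.ReadLine()                              # skip the header
    while ($reader.ReadLine() -ne $null) { $totalLines++ }
    $reader.Close()

    $half = [math]::Ceiling($totalLines / 2)

    # Pass 2: copy the header into both halves, then route each data line
    $reader  = New-Object System.IO.StreamReader($source)
    $writer1 = New-Object System.IO.StreamWriter($firstHalf)
    $writer2 = New-Object System.IO.StreamWriter($secondHalf)

    $header = $reader.ReadLine()
    $writer1.WriteLine($header)
    $writer2.WriteLine($header)

    $count = 0
    while (($line = $reader.ReadLine()) -ne $null) {
        if ($count -lt $half) { $writer1.WriteLine($line) } else { $writer2.WriteLine($line) }
        $count++
    }

    $reader.Close(); $writer1.Close(); $writer2.Close()

Point the import script at each half in turn and follow the steps above.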

Another method would be to use an ETL tool such as SSIS. Configure the package to redirect invalid rows into an error log so you can see which data is not working properly.
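If you would rather stay in PowerShell than set up an SSIS package, here is a minimal diagnostic sketch along the same lines, assuming the same $csv and $delimiter values as in the question. It reports every line whose field count does not match the header:

    # Report lines whose field count differs from the header's column count.
    # Lines with more fields than the header are what make DataTable.Rows.Add
    # throw the "input array is longer" error; lines with fewer fields point
    # at data quality problems too.
    $csv = "M:\d\s.txt"
    $delimiter = "`t"

    $reader = New-Object System.IO.StreamReader($csv)
    $expected = $reader.ReadLine().Split($delimiter).Count   # header defines the column count
    $lineNumber = 1

    while (($line = $reader.ReadLine()) -ne $null) {
        $lineNumber++
        $fields = $line.Split($delimiter).Count
        if ($fields -ne $expected) {
            Write-Output "Line $lineNumber has $fields fields, expected $expected"
        }
    }
    $reader.Close()

On a 160 GB file this is a single streaming pass, so it is slow but needs very little memory.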

