sql - Error: "Input array is longer than the number of columns in this table" (PowerShell)
I am trying to load a 160 GB CSV file into SQL Server using a PowerShell script I got from GitHub, and I am getting this error:
Exception calling "Add" with "1" argument(s): "Input array is longer than the number of columns in this table."
At c:\b.ps1:54 char:26
+ [void]$datatable.rows.add <<<< ($line.split($delimiter))
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : DotNetMethodException
I checked the same code with a small 3-line CSV: the number of columns matches, there is a header in the first row, and there are no stray delimiters, so I am not sure why I am getting this error.

The code is below:
<# 8-faster-runspaces.ps1 #>

# Set CSV attributes
$csv = "m:\d\s.txt"
$delimiter = "`t"

# Set connection string
$connstring = "data source=.;integrated security=true;initial catalog=presentationoptimized;packet size=32767;"

# Set batch size to 2000
$batchsize = 2000

# Create the datatable
$datatable = New-Object System.Data.DataTable

# Add generic columns, one per field in the header row
$columns = (Get-Content $csv -First 1).Split($delimiter)
foreach ($column in $columns) {
    [void]$datatable.Columns.Add()
}

# Setup runspace pool and the scriptblock that runs inside each runspace
$pool = [RunspaceFactory]::CreateRunspacePool(1,5)
$pool.ApartmentState = "MTA"
$pool.Open()
$runspaces = @()

# Setup scriptblock. This is the workhorse. Think of it as a function.
$scriptblock = {
    param (
        [string]$connstring,
        [object]$dtbatch,
        [int]$batchsize
    )

    $bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connstring, "TableLock")
    $bulkcopy.DestinationTableName = "abc"
    $bulkcopy.BatchSize = $batchsize
    $bulkcopy.WriteToServer($dtbatch)
    $bulkcopy.Close()
    $dtbatch.Clear()
    $bulkcopy.Dispose()
    $dtbatch.Dispose()
}

# Start timer
$time = [System.Diagnostics.Stopwatch]::StartNew()

# Open the text file from disk and process it
$reader = New-Object System.IO.StreamReader($csv)
Write-Output "starting insert.."

while (($line = $reader.ReadLine()) -ne $null) {
    [void]$datatable.Rows.Add($line.Split($delimiter))

    if ($datatable.Rows.Count % $batchsize -eq 0) {
        $runspace = [PowerShell]::Create()
        [void]$runspace.AddScript($scriptblock)
        [void]$runspace.AddArgument($connstring)
        [void]$runspace.AddArgument($datatable) # <-- send the datatable
        [void]$runspace.AddArgument($batchsize)
        $runspace.RunspacePool = $pool
        $runspaces += [PSCustomObject]@{ Pipe = $runspace; Status = $runspace.BeginInvoke() }

        # Overwrite the object with a shell of itself
        $datatable = $datatable.Clone() # <-- create a new datatable object
    }
}

# Close the file
$reader.Close()

# Wait for runspaces to complete
while ($runspaces.Status.IsCompleted -notcontains $true) {}

# End timer
$secs = $time.Elapsed.TotalSeconds

# Cleanup runspaces
foreach ($runspace in $runspaces) {
    [void]$runspace.Pipe.EndInvoke($runspace.Status) # EndInvoke retrieves the results of the asynchronous call
    $runspace.Pipe.Dispose()
}

# Cleanup runspace pool
$pool.Close()
$pool.Dispose()

# Cleanup SQL connections
[System.Data.SqlClient.SqlConnection]::ClearAllPools()

# Done! Format output, then display
$totalrows = 1000000
$rs = "{0:N0}" -f [int]($totalrows / $secs)
$rm = "{0:N0}" -f [int]($totalrows / $secs * 60)
$mill = "{0:N0}" -f $totalrows

Write-Output "$mill rows imported in $([math]::Round($secs,2)) seconds ($rs rows/sec and $rm rows/min)"
Working with a 160 GB input file is going to be a pain. You can't load it into any kind of editor, or at least you can't analyze such a mass of data without serious automation.
As per the comments, it seems the data has quality issues. In order to find the offending data, try binary searching. This approach shrinks the data set quickly. So:
1) Split the file into 2 equal chunks (a streaming split sketch follows this list).
2) Try and load the first chunk.
3) If successful, process the second chunk. If not, see step 6).
4) Try and load the second chunk.
5) If successful, both files are valid and you've got some other data quality issue; start looking for other causes. If not, see step 6).
6) If either load failed, start again from the beginning and use the failed chunk as the input file.
7) Repeat until you have narrowed down the offending row(s).
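To split a file this size without loading it into memory, stream it and switch output files near the byte midpoint. Below is a minimal sketch, assuming a tab-delimited file with a header row; the file paths are placeholders, and the split point is only approximate because StreamReader buffers ahead, which is good enough for a binary search.

# Minimal streaming two-way split; copies the header into both chunks
$source = "m:\d\s.txt"
$half   = (Get-Item $source).Length / 2          # approximate byte midpoint

$reader = New-Object System.IO.StreamReader($source)
$header = $reader.ReadLine()

$writer = New-Object System.IO.StreamWriter("m:\d\s_chunk1.txt")
$writer.WriteLine($header)
$switched = $false

while (($line = $reader.ReadLine()) -ne $null) {
    $writer.WriteLine($line)
    # BaseStream.Position reflects the buffered read position, so the switch is approximate
    if (-not $switched -and $reader.BaseStream.Position -ge $half) {
        $writer.Close()
        $writer = New-Object System.IO.StreamWriter("m:\d\s_chunk2.txt")
        $writer.WriteLine($header)
        $switched = $true
    }
}
$writer.Close()
$reader.Close()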
Another method is to use an ETL tool such as SSIS. Configure the package to redirect invalid rows to an error log so you can see exactly which data isn't working properly.
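If you would rather stay in PowerShell than build an SSIS package, a rough equivalent of "redirect invalid rows" is to compare each line's field count against the header and write mismatches to an error file instead of letting Rows.Add() throw. This is only a sketch under that assumption; the error-log path is made up, and the "good row" branch is where the DataTable/SqlBulkCopy logic from the script above would go.

$csv       = "m:\d\s.txt"
$delimiter = "`t"
$badfile   = "m:\d\bad_rows.txt"                 # hypothetical path for rejected rows

$reader   = New-Object System.IO.StreamReader($csv)
$expected = $reader.ReadLine().Split($delimiter).Count   # column count taken from the header row
$badlog   = New-Object System.IO.StreamWriter($badfile)
$lineno   = 1

while (($line = $reader.ReadLine()) -ne $null) {
    $lineno++
    if ($line.Split($delimiter).Count -ne $expected) {
        # Redirect the offending row (with its line number) to the error log
        $badlog.WriteLine("{0}`t{1}" -f $lineno, $line)
    }
    else {
        # Good row: hand it to the DataTable / SqlBulkCopy pipeline as in the script above
    }
}
$badlog.Close()
$reader.Close()

Reviewing the rejected rows will usually make the root cause obvious, for example embedded delimiters or line breaks inside a field.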