google bigquery - Data Quality on date field -
i'm having problems deal date fields on string format, i'm trying sanitize base. default format date string in base "%d/%m/%y", since comes csv file, there undesirable strings. i'm getting errors when try parse with:
parse_date('%d/%m/%y',trim(e.checkin))
like this, every time run new problematic char:
illegal non-space trailing data '�' in string "25/07/201€"
what best way deal this?
core in addressing issue in identifying rows "broken" checkin
field
then, can skip processing them below
#standardsql e ( select '25/07/201€' checkin union select '25/07/2016' ) select checkin checking_string, parse_date('%d/%m/%y',trim(e.checkin)) checking_date e regexp_contains(e.checkin, r'^(\d){2}/(\d){2}/(\d){4}$')
or can come logic of treating somehow specially broken values
i case, using where regexp_contains(e.checkin, r'^(\d){2}/(\d){2}/(\d){4}$')
approach allows control situation
hope gives yo enough directions
Comments
Post a Comment