Azure Data Lake: how to parse a big string with U-SQL Regex

I have big CSVs that contain big strings, and I want to parse them in U-SQL.
```
@t1 =
    SELECT Regex.Match("id=881cf2f5f474579a:t=1489536183:s=alni_mzsmmpa4voge4kqmyxoocew2aor0q",
                       "id=(?<id>\\w+):t=(?<t>\\w+):s=(?<s>[\\w\\d_]*)") AS p
    FROM (VALUES(1)) AS fe(n);

@t2 =
    SELECT p.Groups["id"].Value AS gads_id,
           p.Groups["t"].Value AS gads_t,
           p.Groups["s"].Value AS gads_s
    FROM @t1;

OUTPUT @t2
TO "/inhabit/test.csv"
USING Outputters.Csv();
```
This fails with:

Error E_CSC_USER_INVALIDCOLUMNTYPE: 'System.Text.RegularExpressions.Match' cannot be used as column type.

I know how to do this the SQL way, with EXPLODE/CROSS APPLY/GROUP BY. Is it possible without these dances?
One more update:
```
@t1 =
    SELECT Regex.Match("id=881cf2f5f474579a:t=1489536183:s=alni_mzsmmpa4voge4kqmyxoocew2aor0q",
                       "id=(?<id>\\w+):t=(?<t>\\w+):s=(?<s>[\\w\\d_]*)").Groups["id"].Value AS id,
           Regex.Match("id=881cf2f5f474579a:t=1489536183:s=alni_mzsmmpa4voge4kqmyxoocew2aor0q",
                       "id=(?<id>\\w+):t=(?<t>\\w+):s=(?<s>[\\w\\d_]*)").Groups["t"].Value AS t,
           Regex.Match("id=881cf2f5f474579a:t=1489536183:s=alni_mzsmmpa4voge4kqmyxoocew2aor0q",
                       "id=(?<id>\\w+):t=(?<t>\\w+):s=(?<s>[\\w\\d_]*)").Groups["s"].Value AS s
    FROM (VALUES(1)) AS fe(n);

OUTPUT @t1
TO "/inhabit/test.csv"
USING Outputters.Csv();
```
This variant works fine, but it raises a question: is the regex evaluated 3 times per row? Is there a way to hint to the U-SQL engine that the function Regex.Match is deterministic?
You should be using Regex.Match more efficiently. To answer your original question:
System.Text.RegularExpressions.Match is not part of the built-in U-SQL types. Thus you need to convert it into a built-in type, such as string or SqlArray<string>, or wrap it into a UDT that provides an IFormatter to make it a user-defined type.
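A minimal sketch of the SqlArray<string> route, reusing the literal input and pattern from the question. The Match is converted inside the expression into a `SQL.ARRAY<string>` of group values, so Regex.Match is written only once per row and the second SELECT just indexes the array. This assumes System.Linq (for `Cast`/`Select`) and System.Text.RegularExpressions are in scope in the script's C# expressions; the question's unqualified `Regex.Match` suggests the latter is. The column name `parts` is hypothetical. Element 0 is the whole match. Because all three groups are named, .NET numbers them 1 to 3 in left-to-right order.

```
// Sketch: call Regex.Match once per row and keep only a built-in type (SQL.ARRAY<string>).
// Assumption: System.Linq is available for Cast/Select; "parts" is a hypothetical column name.
@t1 =
    SELECT new SQL.ARRAY<string>(
               Regex.Match("id=881cf2f5f474579a:t=1489536183:s=alni_mzsmmpa4voge4kqmyxoocew2aor0q",
                           "id=(?<id>\\w+):t=(?<t>\\w+):s=(?<s>[\\w\\d_]*)")
                    .Groups
                    .Cast<System.Text.RegularExpressions.Group>()
                    .Select(g => g.Value)) AS parts
    FROM (VALUES(1)) AS fe(n);

// parts[0] is the whole match; parts[1], parts[2], parts[3] are the id, t and s groups.
@t2 =
    SELECT parts[1] AS gads_id,
           parts[2] AS gads_t,
           parts[3] AS gads_s
    FROM @t1;

OUTPUT @t2
TO "/inhabit/test.csv"
USING Outputters.Csv();
```

With real CSV input you would replace the string literal with the column produced by EXTRACT. If a row may not match the pattern, the array may hold fewer elements, so check `parts.Count` before indexing.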