sql - What is the difference between TRUNC and TO_DATE in Hive -
hi trying find out difference between using trunc , to_date in hive.
currently within oracle wrote following case statement against data shown below:
order_no | name | date_ | task_no abc123 | humpty | 07-oct-16 12:30:54 | 1 abc123 | humpty | 07-oct-16 12:30:54 | 2 abc123 | humpty | 07-oct-16 12:32:20 | 6 select order_no, name, date_, task_no (case when date_ - lag(date_) on (partition order_no, name, trunc(date_) order date_) <= 1/48 0 else 1 end) count1
and gives me result:
order_no | name | date_ | task_no | count1 abc123 | humpty | 07-oct-16 12:30:54 | 1 | 1 abc123 | humpty | 07-oct-16 12:30:54 | 2 | 0 abc123 | humpty | 07-oct-16 12:32:20 | 6 | 1
which correct. if use same query in hive against full data set error message:
error while compiling statement: failed: semanticexception failed breakup windowing invocations groups. @ least 1 group must depend on input columns.
so changed trunc to_date , works , gives me following results:
select order_no, name, date_, task_no (case when date_ - lag(date_) on (partition order_no, name, to_date(date_) order date_) <= 1/48 0 else 1 end) count1
and gives me result:
order_no | name | date_ | task_no | count1 abc123 | humpty | 07-oct-16 12:30:54 | 1 | 1 abc123 | humpty | 07-oct-16 12:32:20 | 6 | 1 abc123 | humpty | 07-oct-16 12:30:54 | 2 | 1
which different in oracle. can gather date value stored string results arent ordered in date/time , think problem lies not sure changes need make fix it.
would appreciate advice.
updated code:
select order_no ,name ,date_fixed ,task_no ,case when date_uts - lag(date_uts) on (partition order_no, name, to_date(date_fixed) order date_fixed) <= 60*30 0 else 1 end count1 ( select order_no ,name ,task_no ,from_unixtime(unix_timestamp(date_, 'dd-mmm-yy hh:mm:ss')) date_fixed ,unix_timestamp(date_, 'dd-mmm-yy hh:mm:ss') date_uts table1 ) t
1
hive operators , user-defined functions (udfs)
to_date
returns date part of timestamp string (pre-hive 2.1.0):
to_date("1970-01-01 00:00:00") = "1970-01-01".
of hive 2.1.0, returns date object.
prior hive 2.1.0 (hive-13248) return type string because no date type existed when method created.
trunc
returns date truncated unit specified format (as of hive 1.2.0).
supported formats: month/mon/mm, year/yyyy/yy.
example: trunc('2015-03-17', 'mm') = 2015-03-01.
2
you have errors in original query
- there no comma between
task_no
,(case when
trunc
in hive must take 1 parameter, , there no parameter day.- there no minus operator dates (and definitly not strings). results in null.
3
the recognize date format in hive yyyy-mm-dd not match data.
applying date functions on invalid string result in null.
this how convert data format dates:
hive> select from_unixtime(unix_timestamp('07-oct-16 12:30:54','dd-mmm-yy hh:mm:ss')); ok 2016-10-07 12:30:54
and whole query:
select order_no ,name ,date_fixed ,task_no ,case when date_uts - lag(date_uts) on ( partition order_no,name,to_date(date_fixed) order date_fixed ) <= 60*30 0 else 1 end count1 (select order_no ,name ,task_no ,from_unixtime(unix_timestamp(date_,'dd-mmm-yy hh:mm:ss')) date_fixed ,unix_timestamp(date_,'dd-mmm-yy hh:mm:ss') date_uts t ) t ;
abc123 humpty 2016-10-07 12:30:54 2 1 abc123 humpty 2016-10-07 12:30:54 1 0 abc123 humpty 2016-10-07 12:32:20 6 0
this results when tested on oracle
Comments
Post a Comment