By J’son Cannelos – Partner / Principal Architect, eSage Group
“In God we trust; all others pay cash.”
- Bob French, New Orleans Tuxedo Jazz Musician (1938 – 2012)
This fairly simple Hive issue was driving me nuts for a while, so I wanted to get it out to the blog while its still fresh on my mind.
Take the following innocent Hive query:
select distinct s.date_local, s.user_id from slice_played s Where LENGTH(s.user_id) > 0 and s.date_local >= ’2012-10-07′ and s.date_local <= ’2012-10-08′ order by s.date_local desc limit 150;
Time and time again this would return:
Error in semantic analysis. Invalid table alias or column reference s
After removing each piece of the query, it turns out that the culprit was the ORDER BY clause. This piece is seems to be illegal.
order by s.date_local
Why you ask? Because, apparently, Hive doesn’t allow table references in the ORDER BY clause! Ack!
The solution is pretty simple, but not intuitive. You need to either a) remove the table reference in the fields in your ORDER BY clause or b) alias the columns you would like to use in the order by clause. Here is the corrected Hive query that works:
select distinct s.date_local as date_pacific, s.user_id from slice_played s Where LENGTH(s.user_id) > 0 and s.date_local >= ’2012-10-07′ and s.date_local <= ’2012-10-08′ order by date_pacific desc limit 150;
I’ve fell into this trap several times now. In our Hive implementation, we pretty much force strict mode (hive.mapred.mode = strict), so we have to alias tables, use existing partitions in the WHERE clause, et.
According to this JIRA link (https://issues.apache.org/jira/browse/HIVE-1449), it’s a known issue. It just says that table references are a no-no, so you don’t need to really alias your columns, however; column aliases seem safer to me. I could just as easily be joining to several tables with a “date_local” column.
Hope this helps and happy coding!