hadoop - hive date & time stamp from unix_timestamp()

I need to populate two columns with the current date (sysdate) and timestamp. I have created the table and am inserting data using unix_timestamp, but I am not able to convert the value into Hive date and timestamp formats.

############ Hive create table #############
create table informatica_p2020.M23_MD_LOC_BKEY(
  group_nm string,
  loc string,
  natural_key string,
  loc_sk_id int,
  load_date date,
  load_time timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/spanda20/informatica_p2020/infor_external/m23_md_loc...

hadoop - sqoop inserting data into wrong hive column from rdbms table

I have a table called 'employee' in SQL Server:

ID  NAME  ADDRESS  DESIGNATION
1   Jack  XXX      Clerk
2   John  YYY      Engineer

I have created an external table (emp) in Hive and imported data from employee into it using the --query argument of sqoop import. If I set --query to 'select * from employee', the data is inserted into the Hive table correctly. But if I set --query to 'select ID,NAME,DESIGNATION from employee', the data from the DESIGNATION column of the 'employee' table (RDBMS) is inserted into the address co...

hadoop - Hive update all values in a column

I have an external partitioned Hive table. One of its columns is a string named OLDDATE that holds the date in a different format (DD-MM-YY). I want to update the column and store dates in YYYY-MM-DD format. All years are 20XX, so I thought of this:

select CONCAT('20',SPLIT(OLDDATE ,'-')[2],'-',SPLIT(OLDDATE ,'-')[1],'-',SPLIT(OLDDATE ,'-')[0]) from table

This gives me the dates in the format I want. Now how do I overwrite the old date with this new date?...
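The reordering that the CONCAT/SPLIT expression in this question performs can be checked outside Hive with a small Python sketch. The function name `to_iso` is hypothetical, and the sketch relies on the question's own assumption that every year is 20XX; it does not address how to write the result back to the table.

```python
def to_iso(old_date: str) -> str:
    """Reorder a DD-MM-YY string into YYYY-MM-DD.

    Mirrors CONCAT('20', parts[2], '-', parts[1], '-', parts[0]) from the
    question. Assumption (taken from the question): all years are 20XX.
    """
    day, month, year = old_date.split("-")
    return f"20{year}-{month}-{day}"


if __name__ == "__main__":
    print(to_iso("25-12-14"))
```

Because the pieces are only rearranged as strings, an input that is not exactly three hyphen-separated fields raises a `ValueError` rather than producing a malformed date.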

hadoop - hive ngram stopword list?

While listed as one of the example use cases ... I haven't found an example of filtering out junk words (and, or, etc.) from a Hive n-gram.

SELECT explode(context_ngrams(sentences(lower(description)), array("criminal", null), 10)) AS x FROM mapped_discussions;

{"ngram":["justice"],"estfrequency":274.0}
{"ngram":["behavior"],"estfrequency":121.0}
{"ngram":["law"],"estfrequency":92.0}
{"ngram":["activity"],"estfrequency":69.0}
{"ngram":["acts"],"estfrequency":41.0}
{"ngram":["procedure"],"estfrequency":35.0}
{"ngram":["and"],"estfrequency":29.0}
{"ngram":[...

hadoop - data insertion into hive table

I want to insert data into a Hive table using these steps:
1) Create a database.
2) Create a table in that database.
3) Create a dummy table in a particular position.
4) Use the dummy table to insert data into the main table.
The insert completes without an exception, but no data is inserted into the table.

hive> create database final;
OK
Time taken: 2.56 seconds
hive> create table final.abc (user_name string, password string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > LINES TERMINATED BY '\n'
    > STORED AS TEXTFILE;
OK
Time taken: 0.591 seconds
hive> creat...

hadoop - hive external partitioned table

First I created a Hive external table partitioned by code and date:

CREATE EXTERNAL TABLE IF NOT EXISTS XYZ(
  ID STRING,
  SAL BIGINT,
  NAME STRING)
PARTITIONED BY (CODE INT, DATE STRING)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
LOCATION '/old_work/XYZ';

Then I ran an insert overwrite on this table, taking data from another table:

INSERT OVERWRITE TABLE XYZ PARTITION (CODE, DATE)
SELECT * FROM TEMP_XYZ;

After that I count the...

hadoop - How Hive stores the data (loaded from HDFS)?

I am fairly new to Hadoop (HDFS and HBase) and the Hadoop ecosystem (Hive, Pig, Impala, etc.). I have a good understanding of Hadoop components such as the NameNode, DataNode, Job Tracker and Task Tracker, and of how they work in tandem to store data efficiently.

While trying to understand the fundamentals of a data access layer such as Hive, I need to understand where exactly the data of a table created in Hive gets stored. We can create external and internal tables in Hive. As external tables can be in HDFS or any other file system, Hive doesn't store ...

hadoop - Difference between Pig and Hive? Why have both?

My background: 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM. Have read Google's paper on Map-Reduce and GFS (PDF link).

I understand that:
- Pig's language, Pig Latin, is a shift from (suits the way programmers think) the SQL-like declarative style of programming, and Hive's query language closely resembles SQL.
- Pig sits on top of Hadoop and in principle can also sit on top of Dryad. I might be wrong, but Hive is closely coupled to Hadoop.
- Both Pig Latin and Hive commands compile to Map and Reduce jobs.

My quest...

hadoop - How can I partition a table with HIVE?

I've been playing with Hive for a few days now, but I still have a hard time with partitions. I've been recording Apache logs (Combined format) in Hadoop for a few months. They are stored in raw text format, partitioned by date (via Flume):

/logs/yyyy/mm/dd/hh/*

Example:
/logs/2012/02/10/00/Part01xx (02/10/2012 12:00 am)
/logs/2012/02/10/00/Part02xx
/logs/2012/02/10/13/Part0xxx (02/10/2012 01:00 pm)

The date in the combined log file follows this format: [10/Feb/2012:00:00:00 -0800]

How can I create an external table with partitions in Hive that uses my phys...
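One way to picture how a /logs/yyyy/mm/dd/hh directory maps onto Hive partitions is to generate one ALTER TABLE ... ADD PARTITION ... LOCATION statement per directory. The Python sketch below only builds the statement text. The table name `apache_logs` and the partition column names `yr`, `mo`, `dy`, `hr` are hypothetical and do not come from the question; it also assumes the external table was already created with those four partition columns.

```python
def add_partition_statement(path: str, table: str = "apache_logs") -> str:
    """Build a HiveQL ADD PARTITION statement for one /logs/yyyy/mm/dd/hh directory.

    The table name and the partition column names (yr, mo, dy, hr) are
    illustrative assumptions, not taken from the question.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 5 or parts[0] != "logs":
        raise ValueError(f"expected /logs/yyyy/mm/dd/hh[/], got {path!r}")
    _, yr, mo, dy, hr = parts
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (yr='{yr}', mo='{mo}', dy='{dy}', hr='{hr}') "
        f"LOCATION '/logs/{yr}/{mo}/{dy}/{hr}';"
    )


if __name__ == "__main__":
    for directory in ("/logs/2012/02/10/00", "/logs/2012/02/10/13"):
        print(add_partition_statement(directory))
```

Pointing each partition at its existing directory with LOCATION is what lets the files stay where Flume wrote them; the partition values come from the path, not from the timestamp inside each log line.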

hadoop - how to verify and match different date formats in hive

I have the following dates in my Hive table:

Jan 2014
Oct-13
8-Nov
8-Oct
30-Nov-11

I need to convert them to the 'yyyy-MM-dd' format. I have used from_unixtime(unix_timestamp(change_log_date ,'yyyyMMdd'), 'yyyy-MM-dd') to convert the date format, which works fine for 30-Nov-11. However, since the data contains several different date formats, how can I write generic code that checks the date format and converts it to 'yyyy-MM-dd'? I need to put 0 for the day, month or year if it is not present; for example, I need to convert 8-Oct into '0000-10-08'. Need help...
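The "check the format, then convert" logic asked about here can be sketched in Python as an ordered list of patterns, with zeros filled in for missing parts. This is an illustration of the matching logic only, not a Hive solution. Two assumptions are not stated in the question: two-digit years are read as 20XX, and "Oct-13" is read as month and year (October 2013) rather than day 13.

```python
import re

# Month abbreviations mapped to their numbers (jan -> 1 ... dec -> 12).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Patterns are tried in order; named groups d, m, y mark day, month, year.
PATTERNS = [
    re.compile(r"^(?P<d>\d{1,2})-(?P<m>[A-Za-z]{3})-(?P<y>\d{2})$"),  # 30-Nov-11
    re.compile(r"^(?P<d>\d{1,2})-(?P<m>[A-Za-z]{3})$"),               # 8-Nov
    re.compile(r"^(?P<m>[A-Za-z]{3})-(?P<y>\d{2})$"),                 # Oct-13 (assumed month-year)
    re.compile(r"^(?P<m>[A-Za-z]{3}) (?P<y>\d{4})$"),                 # Jan 2014
]


def normalize(raw):
    """Return 'yyyy-MM-dd' with zeros for missing parts, or None if no pattern matches."""
    text = raw.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match is None:
            continue
        groups = match.groupdict()
        month = MONTHS.get(groups["m"].lower())
        if month is None:
            return None  # three letters that are not a month abbreviation
        year_text = groups.get("y")
        if year_text is None:
            year = 0
        elif len(year_text) == 2:
            year = 2000 + int(year_text)  # assumption: two-digit years are 20XX
        else:
            year = int(year_text)
        day = int(groups.get("d") or 0)
        return f"{year:04d}-{month:02d}-{day:02d}"
    return None


if __name__ == "__main__":
    for value in ["Jan 2014", "Oct-13", "8-Nov", "8-Oct", "30-Nov-11"]:
        print(value, "->", normalize(value))
```

Regular expressions are used instead of a date parser because results such as '0000-10-08' are not valid calendar dates, so a strict parser would reject them.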

hadoop - Wrong result for count(*) in hive table

I have created a table in Hive:

CREATE TABLE IF NOT EXISTS daily_firstseen_analysis (
  firstSeen STRING,
  category STRING,
  circle STRING,
  specId STRING,
  language STRING,
  osType STRING,
  count INT)
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS orc;

count(*) is not giving me the correct result for this table:

hive> select count(*) from daily_firstseen_analysis;
OK
75
Time taken: 0.922 seconds, Fetched: 1 row(s)

While ...