Learning HADOOP Cluster

1. What is a Hadoop Cluster?

A. Hadoop is a framework that distributes large data sets across the machines in a cluster, scaling from a single server up to thousands of servers.

A quick and dirty picture of Hadoop:
Hadoop distributed architecture

2. How does it process files?

A. Hadoop breaks a file (e.g., 1 GB) into blocks (e.g., 64 MB each), splits them across the cluster, and replicates each block onto other nodes. This makes sure no data is lost or corrupted if one or more nodes go down due to hardware failure or anything else.
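The block-splitting half of this can be mimicked with standard tools (a rough analogy only, not Hadoop itself; the file names and the tiny 4 KB block size are made up so the sketch runs instantly — HDFS uses 64 MB blocks by default):

```shell
# Carve a file into fixed-size "blocks" the way HDFS does.
dd if=/dev/zero of=bigfile.bin bs=1024 count=10 2>/dev/null  # 10 KB stand-in for a big file
split -b 4096 -d bigfile.bin block_                          # 4 KB "blocks", numeric suffixes (GNU split)
ls block_*   # block_00 block_01 block_02  (4 KB + 4 KB + 2 KB)
```

Replication, of course, is the part shell tools cannot show: HDFS would then copy each of those blocks to several different nodes.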

3. What is MapReduce in Hadoop?

A. Map and Reduce are two different phases of a job run on a Hadoop cluster.

Map: after a big data file is loaded, the data is first mapped into key-value pairs and distributed across the system.

Reduce: these key-value pairs are then re-grouped by key and aggregated when the data is read back. (There is a lot more it can do, but this is just an overview.)
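The two phases can be sketched with plain shell pipes in the Hadoop Streaming style (a rough analogy, not Hadoop itself; the sample sentence is made up). The first awk is the map step, sort plays the role of the shuffle, and the second awk is the reduce step:

```shell
# Word count in the classic map -> shuffle -> reduce shape.
printf 'big data big cluster\n' \
  | tr ' ' '\n' \
  | awk '{print $0 "\t1"}' \
  | sort \
  | awk -F'\t' '{sum[$1] += $2} END {for (w in sum) print w, sum[w]}' \
  | sort
# Output:
#   big 2
#   cluster 1
#   data 1
```

In real Hadoop the map and reduce steps run in parallel on many nodes, and the shuffle moves the pairs over the network so that all pairs with the same key land on the same reducer.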

For more info on MapReduce, click here.

4. What sizes of data does Hadoop handle?

A. Terabytes and petabytes of data (very large data sets are processed and distributed across the nodes).
For example: one Hadoop cluster processed 3 PB of compressed data, i.e., about 7 PB uncompressed, which is very large.

5. What is HIVE?

A. Hive is software that sits on top of a Hadoop cluster to retrieve data using HiveQL (Hive Query Language), which is similar to SQL and reads from the data files. SELECT queries, joins, and subqueries can be used to retrieve information from these large files.

6. What else?

I am still learning Hadoop and will post more details as I find out more about it.

Click here for Hadoop Documentation.

Happy Reading..


Posted by on September 12, 2012 in Database Administration



Fast Installation of Percona RPM

Often we find it difficult to install MySQL RPM files one by one, which takes a lot of time and lookups for each command. Here is the easiest way to remove existing Percona packages and install new ones in two steps. I found it easy and time-saving.

Here is the command to find existing Percona packages on the server.

# rpm -qa | grep Percona-Server


To remove Percona packages without dependencies:

# rpm -qa | grep Percona | xargs rpm --nodeps -ev

Use this to remove MySQL packages:

# rpm -qa | grep -i MySql | xargs rpm --nodeps -ev


To install all RPM files with dependencies, use the command below. (Note: first cd into the directory where the Percona packages were downloaded, then run the command from the shell.)

rpm -ivh *.rpm


Hope This Helps..



Posted by on September 10, 2012 in Database Administration



Win Free MySQL Conference Tickets!

Percona is giving away three full conference passes (worth $995 each) to the Percona Live MySQL Conference and Expo, and you can win one simply by sharing the conference with your friends and colleagues! Second prize is one of ten copies of the new High Performance MySQL, 3rd Edition (worth $55 each), the recently released update to the popular second edition!

Please click the link below to see how to win these prizes:

Percona Free Prizes

Thanks & Have a Great Day.


Posted by on March 15, 2012 in Database Administration



MySql Interview DBA Questions & Answers

Coming soon,

Please subscribe to this thread under the Follow Blog Via section, so you get an email when I post these important MySQL DBA questions and answers, which are crucial for any DBA, whether learning or experienced.

In my coming posts I will discuss DBA job duties, daily tasks, how to resolve issues quickly, identifying bottlenecks, upgrading servers, and more. If you are truly interested in these topics, or want to get hands-on with them, please subscribe to this thread.

And for more info on MySQL, please click the links under the Database section; I will update them with more info soon.

Thanks and have a Great day.



Commonly used daily Linux Commands.

1) How to replace a big string that contains wildcard characters. With sed you can use a hash (#) as the delimiter instead of a slash: just put the strings between the hashes and it does the job. Add g after the closing hash to replace the string everywhere.
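For instance (a sketch with made-up strings), replacing a path that itself contains slashes is much cleaner with # as the delimiter, since none of the slashes need escaping:

```shell
# '#' as the sed delimiter; the trailing g still means
# "replace every match on each line".
printf 'load /old/path/a and /old/path/b\n' \
  | sed 's#/old/path#/new/dir#g'
# Output: load /new/dir/a and /new/dir/b
```

To edit a file in place rather than a pipe, GNU sed's -i flag works the same way: sed -i 's#old#new#g' file.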


2) Search for a word and delete to the end of the line. For example, in an XML file you sometimes need to edit thousands of lines matching a word or a <string> spread across the file, and you can't simply find-and-replace them in MS Word or other editors. From the example below, delete only from <pcode> to the end of each line while preserving the trailing "/>". Use a simple sed command to do the job.

<xml=? >
<scalar variable   <pcode  value=1000  test1 test3 test4> />
<vector variable   <pcode  value=1001  test5 test6 test7> />
<stellar value       <pcode  value=1002  test8 test9 test10> />
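A command along these lines does the job (a sketch run against one of the sample lines above; for a real file you would point sed at the file, e.g. with GNU sed's -i for in-place editing):

```shell
# Delete from '<pcode' to the end of the line and put back the '/>'
# that must be preserved; '#' delimiters keep the '/' unescaped.
printf '<scalar variable   <pcode  value=1000  test1 test3 test4> />\n' \
  | sed 's#<pcode.*#/>#'
# Output: <scalar variable   />
```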


3) How to find unwanted files when your clean folder has been messed up. I had a situation where junk files were added to my MySQL data directory, where I should see only the database table files that are useful, and I used the command below to list all the files excluding the MySQL database table files.

grep -l . --exclude=*.{ibd,MYD,MYI,frm} *

4) Chop a 1 GB piece out of a 9.5 GB file: I had a big 9.5 GB MySQL log file for analysis purposes, but my script took too long to read it, so my only choice was to chop it down to 1 GB and read the data from that smaller file, which is much faster to process. I used the command below.

 dd if=10gbfilename of=1gb_new_filename bs=100M count=10

5) How to extract 100 lines of data from a file that has 10,000 lines.

sed -n 1,100p test1.log > outputfile.log
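As a quick check (a sketch using a generated stand-in file, since test1.log is specific to my setup):

```shell
# Generate a 10,000-line stand-in file, then keep only lines 1-100.
seq 1 10000 > test_gen.log
sed -n 1,100p test_gen.log > outputfile.log
wc -l < outputfile.log   # 100 lines
tail -n 1 outputfile.log # last kept line is "100"
```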
6) How to find which RAID level your Linux software RAID is using.

for i in /dev/md*; do printf '%s: %s\n' $i "$( sudo /sbin/mdadm --detail $i 2>/dev/null | grep 'Raid Level' )"; done

7) Convert files to Unix &  UTF8 format.

Convert to UTF8 format

/usr/bin/iconv -c -f LATIN1 -t UTF8 insert_statements_postgres1.sql > utf8_postgres_inserts.sql

/usr/bin/iconv -c -f LATIN1 -t UTF8 delete_statements_post1.sql > utf8_postgres_deletes.sql

Convert files to UTF-8 in bulk (CSV files):

for file in *.csv; do
  /usr/bin/iconv -c -f LATIN1 -t UTF8 "$file" -o "${file%.csv}.utf8.csv"
done

(Write to a new name: converting a file onto itself truncates it before iconv finishes reading it.)



Replication enhancements in MySql 5.5

Recently we upgraded all our databases from MySQL 5.1 to 5.5, and we see a couple of good enhancements on the replication side. A couple I noticed:

1) DDL commands that do not apply cleanly from the master are ignored. For example, in a master-master setup environment:

Master1 (MySQL 5.1) – slave of Master2
Master2 (MySQL 5.1) – slave of Master1
Replica1 (MySQL 5.5)
Replica2 (MySQL 5.5)

i) We had two temporary tables created on the master and replicated to all the slaves; the replicas had been upgraded to MySQL 5.5 while both masters were still running 5.1 (a bad scenario).
ii) I dropped the two temporary tables with SET SQL_LOG_BIN=0 on Master1 (not Master2).
iii) I ran the same script separately on all the replicas and dropped the temp tables there too.
iv) I forgot to SET SQL_LOG_BIN=0 on Master2 and executed the script there. Master1's slave thread stopped with an error, "unable to locate temp tables 1 and 2", but the 5.5 replicas never stopped or raised any errors: MySQL 5.5 simply ignored the commands because the tables did not exist.

This is a great enhancement.




Improving Replication Performance

Have you ever seen your replica not catching up even though you have everything set up properly in your config file? I faced this situation when I set up a new replication slave and the replication lag kept increasing instead of catching up. I tried many ways to tweak the my.cnf configuration file: increased memory, raised innodb_buffer_pool_size to the limit of available memory, increased additional buffers, added more CPUs; none of it helped much. Finally, after reading some other blogs, I found that innodb_flush_log_at_trx_commit = 1 was the reason for the replication lag.

When the value is 1 (the default), the log buffer is written out to the log file at each transaction commit, and the flush-to-disk operation is performed on the log file.

Solution: set the value to 2. The log buffer is still written out to the log file at each commit, but the flush-to-disk operation is not performed on it; instead, the log file is flushed to disk about once per second. (The trade-off: up to roughly one second of committed transactions can be lost if the OS or hardware crashes.)

innodb_flush_log_at_trx_commit = 2
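In my.cnf this setting lives under the [mysqld] section (a minimal sketch; everything around this one line is assumed):

```ini
[mysqld]
# write the InnoDB log buffer at each commit, but flush to disk ~once/second
innodb_flush_log_at_trx_commit = 2
```

It is a dynamic variable, so it can also be changed at runtime with SET GLOBAL innodb_flush_log_at_trx_commit = 2 and only made permanent in my.cnf once you have confirmed the lag improves.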

Hope this helps..


