===============================
Tricks for parsing API's output
===============================

.. highlight:: bash

Not everyone wants to write an XML parser from scratch, so the following
one-line commands parse the XML output and print it as tab-separated text.
You can save the result and open it in Microsoft Excel, or use command-line
utilities such as ``grep`` and ``awk`` for further data mining.

Get the vocabulary
==================

You may fail to find the dataset you want because you searched with an alias.
For example, searching for a factor with ``JMJ`` returns nothing, because we
use ``Jarid2`` to name that factor.

In that situation, download the whole vocabulary to find the standard name of
the factor you're looking for, then search the database using that name.

The following commands parse the XML file into a vocabulary table with ``ID``
and ``Name`` columns:

Get the whole Factor vocabulary::

    curl http://cistrome.org/cisapi/get/f -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Get the whole CellLine vocabulary::

    curl http://cistrome.org/cisapi/get/cl -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Get the whole CellPop vocabulary::

    curl http://cistrome.org/cisapi/get/cp -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Get the whole CellType vocabulary::

    curl http://cistrome.org/cisapi/get/ct -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Get the whole TissueType vocabulary::

    curl http://cistrome.org/cisapi/get/t -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Get the whole DiseaseState vocabulary::

    curl http://cistrome.org/cisapi/get/ds -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Or run all six commands above together::

    # Build the list of vocabulary URLs; $meta_data is left unquoted on purpose
    # so that curl receives one argument per URL.
    meta_data=$(for i in f cl cp ct t ds; do printf 'http://cistrome.org/cisapi/get/%s\n' "$i"; done)
    curl $meta_data | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}' > all_vocabulary

Get the Big Table of Sample
===========================

The following commands parse all the Sample data with the prefix ``ESM``,
``SRX``, or ``CSM`` into one big table::

    # Quote the URLs so the shell does not treat "?" as a glob character.
    curl 'http://cistrome.org/cisapi/get/s2?uid=esm' 'http://cistrome.org/cisapi/get/s2?uid=srx' 'http://cistrome.org/cisapi/get/s2?uid=csm' > todo
    awk -F '[<>]' '{if (/factor/) printf ("\n%s\t",$3); else if (!/sample/) printf ("%s\t",$3)}END{printf ("\n")}' todo

Fetch all GSM IDs of a factor
=============================

Save these lines as a script and pass the factor name as the first argument::

    curl -s "http://cistrome.org/cisapi/get/s2?fname=$1" | \
    awk -F '[<>]' '{if (/factor/) printf ("\n%s\t",$3); else if (!/sample/) printf ("%s\t",$3)}END{printf ("\n")}' | \
    cut -f 11 | \
    grep GSM > "$1.tab"
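
One caveat about the vocabulary one-liners: ``/id/`` and ``/name/`` are matched against the whole line, so a name that itself contains ``id`` (``Jarid2``, for instance) also triggers the ``id`` rule and is printed twice. The sketch below shows a stricter variant that tests only the tag field (``$2``). It runs offline on a made-up XML sample: the element layout (one ``<id>`` or ``<name>`` element per line), the sample values, and the function names ``parse_vocab_loose`` and ``parse_vocab_strict`` are assumptions for illustration, not the service's documented format.

```bash
#!/usr/bin/env bash
# Hypothetical XML sample: assumes one element per line, which is the same
# layout the one-liners above rely on when they read the value from $3.
sample_xml='<factors>
  <factor>
    <id>1</id>
    <name>Jarid2</name>
  </factor>
  <factor>
    <id>2</id>
    <name>CTCF</name>
  </factor>
</factors>'

# Same patterns as the one-liners above: matched against the whole line.
parse_vocab_loose() {
    awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'
}

# Stricter variant: with -F "[<>]" the tag name lands in $2,
# so only the tag is tested, never the value.
parse_vocab_strict() {
    awk -F '[<>]' '$2 ~ /id/{printf "%s\t",$3} $2 ~ /name/{print $3}'
}

echo "loose:"
printf '%s\n' "$sample_xml" | parse_vocab_loose
echo "strict:"
printf '%s\n' "$sample_xml" | parse_vocab_strict
```

On this sample the loose version prints ``Jarid2`` twice on the first row, while the strict version prints one ``ID`` and one ``Name`` per row. If the real output uses different tag names, adjust the two ``$2`` patterns accordingly.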