Tricks for parsing the API’s output

Not everyone wants to write an XML parser from scratch, so the following one-line commands parse the XML output and print it as tab-separated text.

You can save the result and open it in Microsoft Excel, or use command-line utilities such as grep and awk for further data mining.

Get the vocabulary

You may fail to find the dataset you want because you searched with an alias. For example, searching for a factor with JMJ returns nothing, because the database names that factor Jarid2.

In that situation, download the whole vocabulary to find the standard name of the factor you are looking for, then search the database using that name.
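As a rough sketch of that lookup, a saved vocabulary table can be searched case-insensitively for a fragment of the name. The rows below are made up for illustration (real IDs and names will differ), and the temporary file stands in for whatever file the table was saved to.

```shell
# Hypothetical vocabulary table (ID<TAB>Name); real IDs and names will differ.
vocab_file=$(mktemp)
printf '1\tJarid2\n2\tCTCF\n3\tJarid1a\n' > "$vocab_file"

# Case-insensitive match of a name fragment against the Name column ($2).
matches=$(awk -F '\t' -v pat='jarid' 'tolower($2) ~ pat' "$vocab_file")

printf '%s\n' "$matches"
rm -f "$vocab_file"
```

Matching on a lowercase fragment rather than the full alias is only a heuristic; an alias that shares no letters with the standard name still has to be found by reading the table.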

The following commands parse the XML output into a vocabulary table with ID and Name columns:

Get the whole Factor vocabulary:

curl http://cistrome.org/cisapi/get/f -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Get the whole CellLine vocabulary:

curl http://cistrome.org/cisapi/get/cl -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Get the whole CellPop vocabulary:

curl http://cistrome.org/cisapi/get/cp -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Get the whole CellType vocabulary:

curl http://cistrome.org/cisapi/get/ct -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Get the whole TissueType vocabulary:

curl http://cistrome.org/cisapi/get/t -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Get the whole DiseaseState vocabulary:

curl http://cistrome.org/cisapi/get/ds -s | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}'

Or run all six commands above at once:

# Build the six vocabulary URLs; $meta_data is left unquoted so curl receives one URL per word
meta_data=$(for i in f cl cp ct t ds; do printf 'http://cistrome.org/cisapi/get/%s\n' "$i"; done)
curl -s $meta_data | awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}' > all_vocabulary
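To see what the awk program does without contacting the server, it can be run on a small hand-written fragment. The fragment below is an assumption (one element per line, tags named id and name); the real response may be laid out differently. Under this layout, /id/ matches anywhere on the line, so a name that itself contains "id", such as Jarid2, is printed twice. Comparing the tag name in $2 is one possible stricter variant, shown here only as a sketch.

```shell
# Hypothetical fragment: one element per line, tags named <id> and <name>.
sample_xml='<factors>
  <factor>
    <id>1</id>
    <name>Jarid2</name>
  </factor>
  <factor>
    <id>2</id>
    <name>CTCF</name>
  </factor>
</factors>'

# Original pattern: /id/ also matches the text "Jarid2", so that name repeats.
loose=$(printf '%s\n' "$sample_xml" |
  awk -F '[<>]' '/id/{printf "%s\t",$3}/name/{print $3}')

# Stricter sketch: compare the tag name ($2) instead of searching the whole line.
strict=$(printf '%s\n' "$sample_xml" |
  awk -F '[<>]' '$2=="id"{printf "%s\t",$3} $2=="name"{print $3}')

printf '%s\n' "$strict"
```

The stricter form assumes the tags carry no attributes; if they do, $2 would hold more than the bare tag name and the comparison would need loosening.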

Get the Big Table of Samples

The following commands parse all Sample records whose IDs start with the prefix ESM, SRX, or CSM into one big table:

curl 'http://cistrome.org/cisapi/get/s2?uid=esm' 'http://cistrome.org/cisapi/get/s2?uid=srx' 'http://cistrome.org/cisapi/get/s2?uid=csm' > todo
awk -F '[<>]' '{if (/factor/) printf ("\n%s\t",$3); else if (!/sample/) printf ("%s\t",$3)}END{printf ("\n")}' todo
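The row-building logic can be tried offline on a hand-written fragment. Everything in the fragment (tag names, field order, values) is hypothetical and only mimics the one-element-per-line shape the awk program expects. The second awk stage is an optional addition, not part of the original commands: it drops the blank first line and the trailing tab that the program leaves on each row.

```shell
# Hypothetical fragment; tag names, order, and values are made up.
sample_xml='<samples>
<sample>
<factor>Jarid2</factor>
<cellline>cell_A</cellline>
<uid>GSM000001</uid>
</sample>
<sample>
<factor>CTCF</factor>
<cellline>cell_B</cellline>
<uid>GSM000002</uid>
</sample>
</samples>'

# Same awk program as above: a line mentioning "factor" starts a new row, and
# every other line that does not mention "sample" appends one field.
# The second awk removes the blank first line and the trailing tab on each row.
table=$(printf '%s\n' "$sample_xml" |
  awk -F '[<>]' '{if (/factor/) printf ("\n%s\t",$3); else if (!/sample/) printf ("%s\t",$3)}END{printf ("\n")}' |
  awk 'NF {sub(/\t$/, ""); print}')

printf '%s\n' "$table"
```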

Fetch all GSM IDs of a factor

Write these lines into a script and pass the factor name as the first argument:

curl -s "http://cistrome.org/cisapi/get/s2?fname=$1" | \
    awk -F '[<>]' '{if (/factor/) printf ("\n%s\t",$3); else if (!/sample/) printf ("%s\t",$3)}END{printf ("\n")}' | \
        cut -f 11 | \
           grep GSM > "$1.tab"
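The script above relies on the GSM ID sitting in column 11 of the table. If the column position is uncertain, one alternative sketch (a different technique, not the original method) is to pull every GSM-prefixed token out of the table with grep -o, wherever it appears. The rows below are made up and have fewer columns than the real table.

```shell
# Hypothetical rows; the real table has more columns and real IDs.
table=$(printf 'Jarid2\tcell_A\tGSM000001\nJarid2\tcell_B\tSRX000002\nJarid2\tcell_C\tGSM000003\n')

# -o prints only the matching token, so the column position does not matter;
# sort -u removes any ID that appears more than once.
gsm_ids=$(printf '%s\n' "$table" | grep -oE 'GSM[0-9]+' | sort -u)

printf '%s\n' "$gsm_ids"
```

This assumes GSM IDs consist of the prefix followed by digits only, and it would also pick up such a token from any other column that happens to contain one.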
