- SOFTDownloader
- SOFTParser
- entity
class SOFTDownloader
| | |
Methods defined here:
- __init__(self, filename, url='ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/GDS/', output_directory='data', timeout=30)
- getFile(self, url, filename)
- Note: raises urllib2.URLError when the file cannot be found.
- getFilePath(self)
- writeFile(self, file_obj, dir, filename)
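The download-and-save flow suggested by the signatures above can be sketched roughly as follows. This is not the SOFTDownloader source: it is a minimal Python 3 sketch in which `urllib.error.URLError` stands in for `urllib2.URLError`, and the function name `fetch_soft_file` and its `opener` parameter are hypothetical, added only so the network call can be swapped out.

```python
import os
import shutil
import urllib.error
import urllib.request


def fetch_soft_file(filename,
                    url='ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/GDS/',
                    output_directory='data', timeout=30,
                    opener=urllib.request.urlopen):
    """Fetch url + filename into output_directory.

    Returns the local path, or None when the file cannot be retrieved.
    `opener` is a hypothetical hook; by default it is urllib's urlopen.
    """
    try:
        response = opener(url + filename, timeout=timeout)
    except urllib.error.URLError as err:
        # Raised when the remote file cannot be found or reached.
        print('Could not retrieve %s: %s' % (filename, err.reason))
        return None
    os.makedirs(output_directory, exist_ok=True)
    path = os.path.join(output_directory, filename)
    # Stream the response to disk instead of holding it all in memory.
    with response, open(path, 'wb') as out:
        shutil.copyfileobj(response, out)
    return path
```

Catching the error at the call site keeps a missing dataset from aborting a batch of downloads; whether the real class catches it or lets it propagate is up to the caller, as the note on getFile implies.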
|
class SOFTParser
| |
A parser for SOFT-formatted datasets. The GEO SOFT format is documented here:
http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html#SOFTformat
Note: the docs do not describe everything.
| |
Methods defined here:
- __init__(self, filename)
- filename is the name of the file you wish to parse.
The parser should handle both .soft and .soft.gz files.
- addEntitiesToDatabase(self, host='localhost', dbName='SOFTFile', user='AUREA', password='URDumb')
- A simple entity insertion function.
This is really a helper function that was used in testing AUREA.
It needs to be modified if used.
- getColumnHeadings(self, tableNum=0)
- Returns a list of all column headings
- getColumnHeadingsInfo(self, tableNum=0)
- Returns a list of tuples containing (column heading, column description)
- getDataColumnHeadings(self)
- Using the subset entities, this function returns the data column names.
The result is the union of the columns named by all subsets.
- getEntities(self)
- Returns a list of entities.
Entities are meta-data objects.
They have a type, a value and a dict of attributes.
- getIDENTIFIER(self, identifier_label='IDENTIFIER', tableNum=0)
- This column is not guaranteed to be present, but so far it always has been.
If it is not available we will have to handle that somehow.
Its values should map to genes and are not guaranteed to be unique.
- getID_REF(self, id_ref='ID_REF', tableNum=0)
- ID_REF appears to be a required column in SOFT data tables.
It is required to correspond to the probes, so in general it should be our key.
It will probably be the first column, but we can't really be sure.
This function returns the ordered values of this column for mapping
back to the rows.
Each value should be unique.
- getKeyColumnHeadings(self)
- Returns all non-data column headings,
i.e. COLUMN_HEADINGS - getDataColumnHeadings()
- getNumTables(self)
- Returns the number of tables
NOTE: I have yet to find a SOFT file with multiple tables. This needs to be tested.
- getRowHeadings(self, tableNum=0)
- ************DEPRECATED*********************
Use getID_REF and getIDENTIFIER as row labels instead.
This returns a list of the row headings in the order they
appear in the table.
- getSubsetSamples(self, subset)
- Takes an entity object of type SUBSET.
Returns a list of the subset sample IDs found in the subset entity.
- getSubsets(self)
- Returns a list of all entities that are marked as subsets
- getTable(self, tableNum=0, lock=False)
- Returns the data table at index tableNum, defaults to the first table
- printTable(self)
- Helper function.
- setRowHeadings(self, colNum, tableNum=0)
- This lets the user set the column that contains the row classifications.
In this case that means the gene names.
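To make the parsing side concrete, here is a minimal Python 3 sketch of reading a SOFT file into entities, column descriptions and a data table. It is not the SOFTParser implementation; the names `open_soft` and `read_soft` are hypothetical, and the line prefixes (`^` for an entity, `!` for an attribute, `#` for a column description, `!..._table_begin` / `!..._table_end` around the table) follow the public SOFT format description, which, as noted above, does not cover everything.

```python
import gzip


def open_soft(filename):
    """Open a .soft or .soft.gz file in text mode."""
    if filename.endswith('.gz'):
        return gzip.open(filename, 'rt')
    return open(filename, 'r')


def read_soft(filename):
    """Return (entities, column_info, table) for one SOFT file.

    entities    : list of (type, value, attributes) tuples, where
                  attributes maps each key to a list of values
    column_info : list of (column heading, column description) tuples
    table       : list of rows; the first row holds the column headings
    """
    entities, column_info, table = [], [], []
    in_table = False
    with open_soft(filename) as handle:
        for line in handle:
            line = line.rstrip('\r\n')
            if not line:
                continue
            marker = line.lower()
            if line.startswith('!') and marker.endswith('_table_begin'):
                in_table = True
            elif line.startswith('!') and marker.endswith('_table_end'):
                in_table = False
            elif in_table:
                # Table rows are tab separated.
                table.append(line.split('\t'))
            elif line.startswith('^'):
                etype, _, value = line[1:].partition('=')
                entities.append((etype.strip(), value.strip(), {}))
            elif line.startswith('!') and entities:
                # Attributes belong to the most recently seen entity.
                key, _, value = line[1:].partition('=')
                entities[-1][2].setdefault(key.strip(), []).append(value.strip())
            elif line.startswith('#'):
                heading, _, description = line[1:].partition('=')
                column_info.append((heading.strip(), description.strip()))
    return entities, column_info, table
```

This sketch assumes a single table per file, which matches the note on getNumTables; a file with several tables would need one row list per table.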
|
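The relationship between the column helpers of SOFTParser (data columns as the union of the subsets, key columns as everything else, ID_REF as the row key) can be illustrated with plain lists. This is a sketch, not the library code: the function names are hypothetical, and the sample IDs are assumed to sit in a comma-separated `subset_sample_id` attribute of each SUBSET entity.

```python
def subset_samples(attributes):
    """Split the comma-separated 'subset_sample_id' values into sample IDs."""
    samples = []
    for value in attributes.get('subset_sample_id', []):
        samples.extend(s.strip() for s in value.split(',') if s.strip())
    return samples


def data_column_headings(headings, subsets):
    """Headings named by at least one subset (the union), in table order."""
    in_subsets = set()
    for attributes in subsets:
        in_subsets.update(subset_samples(attributes))
    return [h for h in headings if h in in_subsets]


def key_column_headings(headings, subsets):
    """All headings that are not data columns."""
    data = set(data_column_headings(headings, subsets))
    return [h for h in headings if h not in data]


def column_values(table, heading):
    """Ordered values of one column; table[0] holds the headings."""
    index = table[0].index(heading)
    return [row[index] for row in table[1:]]


# Made-up table and subsets, for illustration only.
table = [
    ['ID_REF', 'IDENTIFIER', 'GSM1', 'GSM2', 'GSM3'],
    ['1007_s_at', 'DDR1', '1.0', '2.0', '3.0'],
    ['1053_at', 'RFC2', '4.0', '5.0', '6.0'],
]
subsets = [
    {'subset_sample_id': ['GSM1,GSM2']},
    {'subset_sample_id': ['GSM2, GSM3']},
]
data_columns = data_column_headings(table[0], subsets)
key_columns = key_column_headings(table[0], subsets)
row_keys = column_values(table, 'ID_REF')
```

Looking the column up by heading rather than by position reflects the caveat on getID_REF: the key column is probably first, but that is not guaranteed.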
class entity
| |
A container for meta-data provided by the SOFT file.
Type: the type of entity (Database, subset, etc)
Value: usually a unique id
attributes: a dict of related attributes, where each key points to a list of provided values
| |
Methods defined here:
- __init__(self, type, value)
- __repr__(self)
- Tag based, called with 'print entity'
- addToDatabase(self, cursor, DATABASE=None, DATASET=None, SUBSET=None, file_name=None)
- prettyPrint(self)
- Returns a nice human readable description of the entity
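The container described above can be pictured with a small stand-in class. This is an illustrative sketch only: the class name `Entity`, the `add_attribute` helper, and the exact strings produced by `__repr__` and `pretty_print` are assumptions, not the output of the real entity class.

```python
class Entity(object):
    """Stand-in for a SOFT meta-data entity: a type, a value and attributes."""

    def __init__(self, type, value):
        self.type = type        # e.g. 'DATABASE', 'DATASET', 'SUBSET'
        self.value = value      # usually a unique id
        self.attributes = {}    # key -> list of provided values

    def add_attribute(self, key, value):
        """Hypothetical helper: append a value to the list stored under key."""
        self.attributes.setdefault(key, []).append(value)

    def __repr__(self):
        # Hypothetical tag-like format, loosely mirroring a '^TYPE = value' line.
        return '<%s = %s>' % (self.type, self.value)

    def pretty_print(self):
        """Return a human-readable, multi-line description."""
        lines = ['%s: %s' % (self.type, self.value)]
        for key in sorted(self.attributes):
            lines.append('  %s: %s' % (key, ', '.join(self.attributes[key])))
        return '\n'.join(lines)


# Made-up identifiers, for illustration only.
subset = Entity('SUBSET', 'GDS0000_1')
subset.add_attribute('subset_sample_id', 'GSM1')
subset.add_attribute('subset_sample_id', 'GSM2')
subset.add_attribute('subset_type', 'agent')
```

Storing a list under every key, even for single values, means repeated attribute lines in a file need no special handling.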