Folder Structure - Where to find what Files (In- and Output)¶
Unless the paths are not adapted/overwritten in config.ini, Comprior builds up the following folder infrastructure during processing:
data
├── input
│ ├── dataset
│ └── example
├── intermediate
│ ├── dataset
│ ├── crossvalidation
│ └── externalKnowledge
└── results
└── XXX
├── timeLogs
├── preanalysis
├── geneRankings
└── evaluation
├── rankings
├── annotation
└── metrics
├── reducedData
└── classification
├── metrics
└── crossEvaluation
├── reducedData
└── classification
input/¶
- dataset/: put your input dataset here
- example/: folder with example files for trying out Comprior
intermediate/¶
- dataset/: preprocessed input data (currently metadata added to one file)
- crossvalidation/: contains preprocessed dataset for cross-validation (e.g. mapped to the right identifier or pathway features)
- externalKnowledge/: one sub-folder per knowledge base that is queried with query results
results/¶
XXX/: output folder for the current run, whose name is specified by the outputDir_name parameter in config.ini (if there already exists a folder with such a name, Comprior adds a number to the name)
timeLogs/: one file for every selected approach, containing logs with time durations of different selection activities, e.g. external knowledge retrieval or statistical feature selection
preanalysis/: contains - if selected via preanalysis_plots and evaluateKBcoverage parameters in config.ini - plots on data set characteristics and knowledge base coverage
geneRankings/: contains the actual feature rankings, one CSV file for every selected approach
evaluation/: contains all evaluation results
- rankings/: contains evaluation results from analyzing the feature rankings
- annotation/: contains annotation/enrichment files for every ranking
- metrics/: contains the actual metrics results to compare the rankings
- reducedData/: one sub-folder per selection approach containing input data for the top k features; these files are used for the actual classification/prediction
- classification/:
- metrics/: contains the actual classification metrics results, one CSV file for every selected metric, also contains pdfs for visualizations
- crossEvaluation/: contains evaluation data from the second data set for cross-validation
- reducedData/: one sub-folder per selection approach containing input data (second data set) for the top k features; these files are used for the actual classification/prediction
- classification/: contains the actual classification metrics results, one CSV file for every selected metric, also contains pdfs for visualizations