This is a prospective study. The investigators use routine pathological specimens for whole exome sequencing (WES) and immunohistochemical staining.
Pathological examinations include PD-L1, EGFR, ALK, and ROS-1 status.
WES: Total DNA is extracted from paraffin-embedded tumor specimens with the QIAamp DNA FFPE Tissue Kit (QIAGEN GmbH, Hilden, Germany). The targeted coding region spans 45 Mb. Briefly, tumor DNA is sheared with a Covaris M220 sonicator (Life Technologies Europe, Gent, Belgium), and the fragments are ligated to adaptors for amplification (Illumina® TruSeq Exome Library Prep, USA). After library preparation, all samples are sequenced on the NextSeq500 system according to the manufacturer's instructions (Illumina, San Diego, USA). Twelve samples are sequenced simultaneously in each run (100 Gb in total), with 150 bp paired-end reads (2 × 150 bp) and an average sequencing depth of 100×.

Read quality in the FASTQ files is assessed with FastQC, and the reads are then mapped to the human reference genome hg19. The resulting BAM files are used as input to the VarScan algorithm to identify germline and somatic mutations. Annotated and filtered variants are checked manually in IGV (Integrative Genomics Viewer) and then confirmed by Sanger sequencing. The investigators analyze clinically relevant gene alterations, including actionable mutations (EGFR, BRAF, KRAS, and MET). The mutation status of other clinically important genes, including TP53 and the SDH genes, is also analyzed, as is the mutation status of the glucose metabolic cluster genes.
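As a rough consistency check on the run design, the stated figures (100 Gb per run, 12 samples, 45 Mb target) imply about 8.3 Gb of raw sequence per sample and, only if every base landed on the target, an upper-bound fold coverage of about 185×. Off-target reads, PCR duplicates and quality filtering plausibly account for the gap to the stated 100× average depth, although the protocol does not quantify these losses. The function names below are hypothetical and the sketch only reproduces the arithmetic:

```python
def per_sample_yield_gb(total_gb: float, n_samples: int) -> float:
    """Raw sequence yield per sample when a run is split evenly across samples."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_gb / n_samples


def raw_fold_coverage(yield_gb: float, target_mb: float) -> float:
    """Upper-bound fold coverage, assuming every sequenced base falls on the target.

    Real on-target depth is lower (off-target reads, duplicates, filtering).
    """
    if target_mb <= 0:
        raise ValueError("target_mb must be positive")
    return (yield_gb * 1e9) / (target_mb * 1e6)


if __name__ == "__main__":
    yield_gb = per_sample_yield_gb(100.0, 12)     # about 8.33 Gb per sample
    print(round(raw_fold_coverage(yield_gb, 45.0), 1))
```

The 185× figure is a ceiling, not an estimate of achieved depth.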
TMB (tumor mutation burden) per megabase: the total number of mutations detected is divided by the size, in megabases, of the targeted coding region (45 Mb).
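The TMB definition reduces to a single division. A minimal sketch, assuming the mutation count has already been produced by the variant-calling and filtering steps; the function name and the example count of 450 mutations are hypothetical:

```python
def tmb_per_mb(n_mutations: int, target_size_mb: float = 45.0) -> float:
    """Tumor mutation burden: mutations per megabase of targeted coding region.

    target_size_mb defaults to the 45 Mb coding region stated for this assay.
    """
    if n_mutations < 0:
        raise ValueError("n_mutations cannot be negative")
    if target_size_mb <= 0:
        raise ValueError("target_size_mb must be positive")
    return n_mutations / target_size_mb


if __name__ == "__main__":
    # Hypothetical tumor with 450 retained mutations over the 45 Mb target.
    print(tmb_per_mb(450))
```

For example, a hypothetical tumor with 450 mutations over the 45 Mb target gives 10 mutations/Mb.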
MATH (mutant-allele tumor heterogeneity): The investigators first obtain the MAF (mutant-allele fraction: the fraction of DNA at a gene locus that carries the mutated allele) of each mutation in a tumor specimen. The MAF distribution is used to calculate the median (the center of the distribution) and the MD (median absolute deviation) of the MAFs in a tumor. To obtain the MD, the absolute differences between each MAF and the median MAF are computed, and the median of these absolute differences is multiplied by a factor of 1.4826, which makes the MD comparable to a standard deviation for normally distributed data. MATH is the percentage ratio of the MD to the median: MATH = 100 × MD / median \[45\].
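The MATH steps above can be sketched directly with the Python standard library. This assumes the input is a list of MAFs (values between 0 and 1) for the mutations retained in one tumor; the function name and example values are hypothetical:

```python
from statistics import median

# Scale factor applied to the raw median absolute deviation, as in the protocol.
MAD_SCALE = 1.4826


def math_score(mafs: list[float]) -> float:
    """Mutant-allele tumor heterogeneity: MATH = 100 * MD / median(MAF)."""
    if not mafs:
        raise ValueError("at least one MAF is required")
    center = median(mafs)
    if center == 0:
        raise ValueError("median MAF is zero; MATH is undefined")
    # Absolute differences of every MAF from the median MAF.
    abs_dev = [abs(m - center) for m in mafs]
    md = MAD_SCALE * median(abs_dev)
    return 100.0 * md / center


if __name__ == "__main__":
    print(round(math_score([0.1, 0.2, 0.3, 0.4, 0.5]), 2))
```

With the hypothetical MAFs 0.1 to 0.5, the median is 0.3, the median absolute difference is 0.1, the MD is 0.14826 and MATH is about 49.4.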
Shannon diversity index (Shannon entropy) \[50\]: The MAF distribution (histogram) of each patient's tumor specimen is obtained with different bin sizes, where S denotes the total number of bins. The Shannon diversity index is then calculated from the probability of each MAF bin: H = -Σ p_i ln p_i, summed over the S bins, where p_i is the proportion of mutations whose MAF falls in bin i.
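A minimal sketch of the binning and entropy steps follows. It assumes S equal-width bins spanning MAF values from 0 to 1, the natural logarithm, and the usual convention that empty bins contribute nothing (0 ln 0 = 0); the protocol does not specify these choices, and the function names are hypothetical:

```python
import math
from collections import Counter


def maf_histogram(mafs: list[float], n_bins: int) -> list[int]:
    """Count MAFs in n_bins equal-width bins on [0, 1] (assumed binning scheme)."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    counts: Counter[int] = Counter()
    for m in mafs:
        if not 0.0 <= m <= 1.0:
            raise ValueError(f"MAF out of range: {m}")
        # A MAF of exactly 1.0 goes into the last bin.
        counts[min(int(m * n_bins), n_bins - 1)] += 1
    return [counts[i] for i in range(n_bins)]


def shannon_index(mafs: list[float], n_bins: int) -> float:
    """Shannon diversity index H = -sum(p_i * ln p_i) over non-empty MAF bins."""
    counts = maf_histogram(mafs, n_bins)
    total = sum(counts)
    if total == 0:
        raise ValueError("no MAFs supplied")
    h = 0.0
    for c in counts:
        if c:  # empty bins contribute 0 by convention
            p = c / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    print(shannon_index([0.05, 0.15, 0.25, 0.35], n_bins=4))
```

Because H depends on how the MAF axis is binned, repeating the calculation over several values of S, as the protocol describes, shows how sensitive the index is to that choice.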
The investigators extract the following FDG PET image features.

\<Handcrafted radiomics\> The conventional image parameters are the SUVmax, metabolic tumor volume (MTV), and total lesion glycolysis (TLG) of the primary tumor, calculated with commercial software (PBAS, PMOD 4.0). Radiomic (texture) features are calculated only from pre-treatment FDG PET. The radiomic feature classes include histogram analysis, the gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), gray-level size zone matrix (GLSZM), neighborhood gray-tone difference matrix (NGTDM), and shape features.
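TLG is conventionally defined as SUVmean multiplied by MTV within the segmented lesion. The protocol does not state which segmentation rule PMOD is configured with, so the sketch below assumes a hypothetical fixed threshold of 40% of SUVmax applied to a flat list of voxel SUVs inside a lesion region. It illustrates the definitions only and is not the PMOD implementation; all names and values are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PetParameters:
    suv_max: float
    mtv_ml: float
    tlg: float


def conventional_pet_parameters(
    voxel_suvs: list[float],
    voxel_volume_ml: float,
    threshold_fraction: float = 0.40,  # ASSUMPTION: hypothetical 40% SUVmax cutoff
) -> PetParameters:
    """SUVmax, MTV and TLG (= SUVmean * MTV) for one lesion region."""
    if not voxel_suvs:
        raise ValueError("no voxels supplied")
    if voxel_volume_ml <= 0 or not 0 < threshold_fraction <= 1:
        raise ValueError("invalid voxel volume or threshold fraction")
    suv_max = max(voxel_suvs)
    cutoff = threshold_fraction * suv_max
    # Voxels at or above the cutoff form the metabolic tumor volume.
    in_tumor = [s for s in voxel_suvs if s >= cutoff]
    mtv_ml = len(in_tumor) * voxel_volume_ml
    suv_mean = sum(in_tumor) / len(in_tumor)
    return PetParameters(suv_max=suv_max, mtv_ml=mtv_ml, tlg=suv_mean * mtv_ml)


if __name__ == "__main__":
    print(conventional_pet_parameters([10.0, 6.0, 4.5, 3.0, 2.0], voxel_volume_ml=0.1))
```

In this hypothetical lesion, three voxels exceed 4.0 (40% of SUVmax 10), giving an MTV of 0.3 mL, an SUVmean of about 6.83 and a TLG of about 2.05.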
\<Deep radiomics\> The investigators feed the segmented tumor volume into a convolutional neural network (CNN). A supervised CNN will be used to analyze the relationship between the imaging data and other outcomes.