
CN115461473A - Spatially resolved single-cell RNA-sequencing method - Google Patents

Fri Dec 09 2022

This application claims priority to U.S. Provisional Application No. 62/979,235, filed February 20, 2020, which is incorporated herein by reference in its entirety.

Detailed Description

The present disclosure can be understood more readily by reference to the following detailed description of embodiments, the drawings, and the Examples included therein.

Before the present methods and compositions are disclosed and described, it is to be understood that they are not limited to specific synthetic methods (unless otherwise specified) or to particular reagents (unless otherwise specified), as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, example methods and materials are now described.

Furthermore, unless expressly stated otherwise, it is in no way intended that any method set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not actually recite an order to be followed by its steps, or it is not otherwise specifically stated in the claims or description that the steps are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect. This holds for any possible non-express basis for interpretation, including matters of logic with respect to arrangement of steps or operational flow, plain meaning derived from grammatical organization or punctuation, and the number or type of aspects described in the specification.

All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided herein may differ from the actual publication dates, which may need to be independently confirmed.

Definitions

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The term "comprising," as used in the specification and claims, may include the aspects "consisting of" and "consisting essentially of." "Comprising" can also mean "including but not limited to."

As used in the specification and the appended claims, the singular forms "a," "an," and "the" may include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to "a compound" includes mixtures of compounds, reference to "a pharmaceutical carrier" includes mixtures of two or more such carriers, and the like.

As used herein, the word "or" means any one member of a particular list and also includes any combination of members of that list.

As used herein, the term "about" means within a range of tolerance typical in the art. For example, "about" can be understood as within about 2 standard deviations of the mean. According to certain embodiments, when referring to a measurable value such as an amount, "about" is meant to encompass variations of ±20%, ±10%, ±5%, ±1%, ±0.9%, ±0.8%, ±0.7%, ±0.6%, ±0.5%, ±0.4%, ±0.3%, ±0.2%, or ±0.1% from the specified value, as such variations are appropriate for performing the disclosed methods. When "about" precedes a series of numbers or a range, it is understood that "about" can modify each number in the series or range.

As used herein, the term "activated substrate" refers to a material whose interactive or reactive chemical functional groups have been oxidized, reduced, or otherwise functionalized by exposure to reagents known to those skilled in the art, so that the surface can react at those functional groups. For example, substrates containing carboxyl groups must be activated before use. In addition, some available substrates contain functional groups that can react with specific moieties already present in a nucleic acid primer.

As used herein, the terms "a plurality of" and "multiple" refer to two or more, or at least two, e.g., 3, 5, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 400, 500, 1000, 2000, 5000, 10,000, or more. Thus, for example, the number of microwells on an array or the number of wells on a multi-well plate can be any integer in any range between any two of the foregoing numbers.

As used herein, a "cell indexing primer" refers to a primer or oligonucleotide used to amplify cDNA molecules obtained by reverse transcription and to label each amplified cDNA molecule with a second index barcode (defined herein as the cell barcode domain) that is unique to each well of a multi-well plate.

As used herein, a "spatially indexed primer" refers to a primer or oligonucleotide used to capture and label transcripts from all single cells located at different positions in a tissue sample, e.g., a thin section ("slice") of a tissue sample.

"Array," as the term is used herein, generally refers to an arrangement of entities in spatially discrete positions relative to one another, typically in a format that allows the arranged entities to be exposed simultaneously to potential interaction partners (e.g., cells) or to other reagents, substrates, and the like. In some embodiments, an array comprises a solid substrate, such as plastic, with microwells arranged adjacent to one another at spatially discrete locations on the solid support. In some embodiments, the spatially discrete locations on the array are referred to as "microwells" or "spots" (regardless of their shape). In some embodiments, the spatially discrete locations on the array are arranged in a regular pattern relative to one another (e.g., in a grid). In some embodiments, the array comprises from about 90 to about 400 microwells arranged in adjacent positions along a planar surface of the solid substrate. In some embodiments, the array is a microarray plate.

As used herein, the term "barcode" refers to any unique, non-naturally occurring nucleic acid sequence that allows the source of a nucleic acid fragment to be identified. In some embodiments, the barcode is a unique, non-naturally occurring nucleic acid sequence corresponding to at least one spatial location on the array, such that the position of the barcode on the array also corresponds to the position of the one or more cells contacting that location.
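The array and barcode definitions above imply a lookup from a sequenced barcode back to a microwell position. The Python sketch below shows one way to do that. It assumes a row-major grid with a hypothetical 20 columns (e.g., 400 microwells arranged 20 × 20) and a hypothetical whitelist of 8-nt barcodes. The one-mismatch tolerance is an illustrative policy, not something this disclosure specifies.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]  # (row, column) of a microwell on the array


def well_index_to_position(index: int, n_cols: int = 20) -> Position:
    """Map a 0-based, row-major microwell index to (row, column)."""
    return divmod(index, n_cols)


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def decode_spatial_barcode(observed: str,
                           whitelist: Dict[str, Position],
                           max_mismatches: int = 1) -> Optional[Position]:
    """Return the array position encoded by an observed barcode.

    An exact whitelist match wins. Otherwise the read is assigned only if
    exactly one whitelist barcode of the same length lies within
    `max_mismatches`; no candidate or an ambiguous match returns None.
    """
    if observed in whitelist:
        return whitelist[observed]
    hits = [pos for bc, pos in whitelist.items()
            if len(bc) == len(observed)
            and hamming(bc, observed) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 2 x 2 corner of an array; real barcode sets are much larger.
WHITELIST = {
    "ACGTACGT": (0, 0),
    "TTGCAAGC": (0, 1),
    "GGATCCTA": (1, 0),
    "CATGGTAC": (1, 1),
}

print(decode_spatial_barcode("ACGTACGA", WHITELIST))  # one mismatch -> (0, 0)
print(well_index_to_position(45))                     # -> (2, 5)
```

Rejecting ambiguous matches keeps a read from being credited to the wrong location. It only helps if the barcode set is designed so that whitelist members are well separated in sequence.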

The term "binding" is used broadly throughout this disclosure to refer to any form of attaching or coupling two or more components, entities, or objects, whether covalently or non-covalently. For example, two or more components can be bound to each other via chemical bonds, covalent bonds, ionic bonds, hydrogen bonds, electrostatic forces, Watson-Crick hybridization, and the like. In the case of complementary nucleic acid sequences, the two complementary strands bind to form a hydrogen-bonded nucleic acid duplex.

The terms "polynucleotide," "oligo," "oligonucleotide," and "nucleic acid" are used interchangeably throughout and include DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), analogs of DNA or RNA generated using nucleotide analogs (e.g., peptide nucleic acids and non-naturally occurring nucleotide analogs), and hybrids thereof. Nucleic acid molecules can be single-stranded or double-stranded.
In some embodiments, a nucleic acid molecule of the disclosure comprises a contiguous open reading frame encoding an antibody or a fragment thereof, as described herein. "Nucleic acid," "oligonucleotide," or "polynucleotide," as used herein, may mean at least two nucleotides covalently linked together. The depiction of a single strand also defines the sequence of the complementary strand. Thus, a nucleic acid also encompasses the complementary strand of a depicted single strand. Many variants of a nucleic acid may be used for the same purpose as a given nucleic acid. Thus, a nucleic acid also encompasses substantially identical nucleic acids and complements thereof. A single strand provides a probe that can hybridize to a target sequence under stringent hybridization conditions. Thus, a nucleic acid also encompasses a probe that hybridizes under stringent hybridization conditions. A nucleic acid may be single-stranded or double-stranded, or may contain portions of both double-stranded and single-stranded sequence. The nucleic acid may be DNA (both genomic and cDNA), RNA, or a hybrid, where the nucleic acid may contain combinations of deoxyribonucleotides and ribonucleotides and combinations of bases including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine, and isoguanine. Nucleic acids may be obtained by chemical synthesis methods or by recombinant methods. A nucleic acid will generally contain phosphodiester bonds, although nucleic acid analogs may be included that have at least one different linkage, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages, as well as peptide nucleic acid backbones and linkages. Other analog nucleic acids include those with positive backbones, non-ionic backbones, and non-ribose backbones, including those described in U.S. Patent Nos. 5,235,033 and 5,034,506, which are incorporated by reference in their entirety.
Nucleic acids containing one or more non-naturally occurring or modified nucleotides are also included within one definition of nucleic acids. The modified nucleotide analog may be located, for example, at the 5' end and/or the 3' end of the nucleic acid molecule. Representative examples of nucleotide analogs may be selected from sugar- or backbone-modified ribonucleotides. It should be noted, however, that nucleobase-modified ribonucleotides are also suitable. These are ribonucleotides that contain a non-naturally occurring nucleobase instead of a naturally occurring one, for example:
uridines or cytidines modified at the 5-position, e.g., 5-(2-amino)propyluridine and 5-bromouridine;
adenosines and guanosines modified at the 8-position, e.g., 8-bromoguanosine;
deaza nucleotides, e.g., 7-deaza-adenosine; and
O- and N-alkylated nucleotides, e.g., N6-methyladenosine.
The 2'-OH group may be replaced by a group selected from H, OR, R, halo, SH, SR, NH2, NHR, NR2, or CN, wherein R is C1-C6 alkyl, alkenyl, or alkynyl and halo is F, Cl, Br, or I. Modified nucleotides also include nucleotides conjugated to cholesterol through, e.g., a hydroxyprolinol linkage. These are described in Krutzfeldt et al., Nature (Oct. 30, 2005), Soutschek et al., Nature 432:173-178 (2004), and U.S. Patent Publication No. 20050107325, which are incorporated herein by reference. Modified nucleotides and nucleic acids may also include locked nucleic acids (LNAs), as described in U.S. Patent Publication No. 20020115080, which is incorporated herein by reference. Additional modified nucleotides and nucleic acids are described in U.S. Patent Publication No. 20050182005, which is incorporated herein by reference. Modifications of the ribose-phosphate backbone may be made for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments, to enhance diffusion across cell membranes, or for use as probes on a biochip.
Mixtures of naturally occurring nucleic acids and analogs may be made; alternatively, mixtures of different nucleic acid analogs, and mixtures of naturally occurring nucleic acids and analogs, may be made. In some embodiments, the expressible nucleic acid sequence is in the form of DNA. In some embodiments, the expressible nucleic acid is in the form of RNA having a sequence encoding a polypeptide sequence disclosed herein. In some embodiments, the expressible nucleic acid sequence is an RNA/DNA hybrid molecule encoding any one or more of the polypeptide sequences disclosed herein.

The "percent identity" or "percent homology" of two polynucleotide or two polypeptide sequences is determined by comparing the sequences with the GAP computer program, using its default parameters. GAP is part of the GCG Wisconsin Package, Version 10.3 (Accelrys, San Diego, Calif.). "Identical" or "identity," as used herein in the context of two or more nucleic acid or amino acid sequences, may mean that the sequences have a specified percentage of residues that are the same over a specified region.
The percentage may be calculated as follows:
(1) optimally align the two sequences;
(2) compare the two sequences over the specified region;
(3) determine the number of positions at which the identical residue occurs in both sequences, which yields the number of matched positions;
(4) divide the number of matched positions by the total number of positions in the specified region; and
(5) multiply the result by 100 to yield the percentage of sequence identity.
The two sequences may differ in length, or the alignment may produce one or more staggered ends. Where the specified region of comparison then includes residues from only a single sequence, those residues are included in the denominator but not the numerator of the calculation. When comparing DNA and RNA, thymine (T) and uracil (U) may be considered equivalent. Identity may be determined manually or by using a computer sequence algorithm such as BLAST or BLAST 2.0. In brief, the BLAST algorithm (Basic Local Alignment Search Tool) is suitable for determining sequence similarity. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (ncbi.nlm.nih.gov). The algorithm first identifies high-scoring sequence pairs (HSPs). To do so, it identifies short words of length W in the query sequence that either match, or satisfy some positive-valued threshold score T, when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al.). These initial neighborhood word hits act as seeds for initiating searches to find HSPs containing them. The word hits are extended in both directions along each sequence for as long as the cumulative alignment score can be increased.
Extension of the word hits in each direction is halted when:
1) the cumulative alignment score falls off by the quantity X from its maximum achieved value;
2) the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or
3) the end of either sequence is reached.
The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLAST program uses the following defaults:
a word length (W) of 11;
the BLOSUM62 scoring matrix (see Henikoff et al., Proc. Natl. Acad. Sci. USA, 1992, 89, 10915-10919, which is incorporated herein by reference in its entirety);
alignments (B) of 50;
an expectation (E) of 10;
M=5 and N=-4; and
a comparison of both strands.
The BLAST algorithm (Karlin et al., Proc. Natl. Acad. Sci. USA, 1993, 90, 5873-5877, which is incorporated herein by reference in its entirety) and Gapped BLAST perform a statistical analysis of the similarity between two sequences. One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)). It indicates the probability that a match between two nucleotide sequences would occur by chance. For example, a test nucleic acid is considered similar to another nucleic acid if the smallest sum probability in a comparison of the two is less than about 1, less than about 0.1, less than about 0.01, or less than about 0.001. Two single-stranded polynucleotides are "complementary" to each other if their sequences can be aligned in an antiparallel orientation such that every nucleotide in one polynucleotide is opposite its complementary nucleotide in the other. This alignment must introduce no gaps and leave no unpaired nucleotides at the 5' or 3' end of either sequence. A polynucleotide is also "complementary" to another polynucleotide if the two can hybridize to each other under moderately stringent conditions. Thus, a polynucleotide can be complementary to another polynucleotide without being its exact complement.
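The identity calculation described above can be sketched in Python for two nucleic acid sequences that have already been aligned, for example by GAP or BLAST. The alignment step itself is not reproduced here. Gaps are assumed to be written as "-". A column where only one sequence has a residue counts in the denominator but not the numerator, and T and U are treated as equivalent, as the text allows.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a precomputed pairwise alignment.

    Columns where both sequences have a gap are skipped; a column where only
    one sequence has a residue (internal gap or staggered end) counts toward
    the denominator only. T and U compare as equal, so DNA can be compared
    with RNA.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    def norm(base: str) -> str:
        base = base.upper()
        return "T" if base == "U" else base

    matches = positions = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # empty column, not part of the compared region
        positions += 1
        if x != gap and y != gap and norm(x) == norm(y):
            matches += 1
    if positions == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / positions


print(percent_identity("ACGTACGT", "ACGUACGA"))  # 7 of 8 columns -> 87.5
print(percent_identity("ACGT--", "ACGTAC"))      # staggered end -> 66.66...
```

The caller is responsible for passing only the specified region of comparison. Trimming the alignment to that region before calling the function reproduces the region-restricted calculation described above.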

"Substantially identical" means that a nucleic acid molecule (or polypeptide) exhibits at least 50% identity to a reference amino acid sequence (e.g., any one of the amino acid sequences described herein) or nucleic acid sequence (e.g., any one of the nucleic acid sequences described herein). Preferably, such a sequence is identical to the sequence used for comparison, at the amino acid or nucleic acid level, by at least 60%, more preferably 80% or 85%, and even more preferably 90%, 95%, or even 99%.

As used herein, the terms "hybridization" and "hybridizes" refer to the formation of a duplex between nucleotide sequences that are sufficiently complementary to form duplexes via Watson-Crick base pairing. Two nucleotide sequences are "complementary" to one another when those molecules share base-pair organization homology. "Complementary" nucleotide sequences will bind specifically under appropriate hybridization conditions to form a stable duplex. For instance, two sequences are complementary when a section of a first sequence can bind to a section of a second sequence in an antiparallel sense. In that arrangement, the 3' end of each sequence binds to the 5' end of the other sequence. Each A, T(U), G, and C of one sequence is then aligned with a T(U), A, C, and G, respectively, of the other sequence. RNA sequences can also include complementary G=U or U=G base pairs. Thus, two sequences need not have perfect homology to be "complementary." Two sequences are usually sufficiently complementary when at least about 90% (preferably at least about 95%) of the nucleotides share base-pair organization over a defined length of the molecule. In the present disclosure, the capture domain of each spatially indexed primer comprises a region complementary to a nucleic acid of the tissue sample, e.g., RNA (preferably mRNA). In some embodiments, this region of complementarity in the capture domain of each spatially indexed primer comprises a poly(T) sequence that captures mRNA via its poly(A) tail.
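The antiparallel pairing rule above can be sketched in Python. The sketch includes the optional G-U pairs for RNA and the "at least about 90%" criterion. Both strands are assumed to be written 5'→3', of equal length, and aligned without gaps or overhangs. The function names and the default 0.90 threshold are illustrative choices, not part of the disclosure. The example pairs a poly(A) tail with a poly(T) capture region, as in the capture domain described above.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
                ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # G-U pairs, relevant when RNA is involved


def fraction_paired(top: str, bottom: str, allow_gu: bool = False) -> float:
    """Fraction of positions that base-pair when two strands, each written
    5'->3', are laid antiparallel (gapless, equal length)."""
    if len(top) != len(bottom) or not top:
        raise ValueError("strands must be non-empty and of equal length")
    allowed = (WATSON_CRICK | WOBBLE) if allow_gu else WATSON_CRICK
    # Reading the bottom strand backwards lines up its 3' end with the top
    # strand's 5' end, which is the antiparallel orientation.
    partner = bottom.upper()[::-1]
    paired = sum((a, b) in allowed for a, b in zip(top.upper(), partner))
    return paired / len(top)


def is_complementary(top: str, bottom: str, threshold: float = 0.90,
                     allow_gu: bool = False) -> bool:
    """True if at least `threshold` of positions pair (default 90%)."""
    return fraction_paired(top, bottom, allow_gu) >= threshold


# A poly(A) tail against a poly(T) capture region pairs at every position.
print(fraction_paired("AAAAAAAAAA", "TTTTTTTTTT"))   # 1.0
# One mismatch in ten positions still meets the 90% criterion.
print(is_complementary("AACCGGTTAC", "GTAACCGGTA"))  # True
```

This models only sequence-level pairing. Whether a duplex actually forms also depends on the hybridization conditions (stringency), which the sketch does not attempt to capture.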

As used herein, the term "sample" refers to a biological sample obtained or derived from a source of interest, as described herein. In some embodiments, a source of interest comprises an organism, such as an animal or a human. In some embodiments, a biological sample comprises biological tissue or fluid. In some embodiments, a biological sample can be or comprise any of the following:
bone marrow; blood; blood cells; ascites;
tissue or fine needle biopsy samples; cell-containing body fluids; free-floating nucleic acids;
sputum; saliva; urine; cerebrospinal fluid; peritoneal fluid; pleural fluid; feces; lymph; gynecological fluids;
skin swabs; vaginal swabs; oral swabs; nasal swabs;
washings or lavages, such as ductal lavages or bronchoalveolar lavages; aspirates; scrapings;
bone marrow specimens; tissue biopsy specimens; surgical specimens;
other body fluids, secretions, and/or excretions; and/or cells therefrom, etc.
In some embodiments, a biological sample is or comprises cells obtained from an individual. In some embodiments, a sample is a "primary sample" obtained directly from a source of interest by any appropriate means. For example, in some embodiments, a primary biological sample is obtained by a method selected from the group consisting of biopsy (e.g., fine needle aspiration or tissue biopsy), surgery, and collection of body fluid (e.g., blood, lymph, feces, etc.).
In some embodiments, as will be clear from context, the term "sample" refers to a preparation obtained by processing a primary sample, for example by filtration through a semi-permeable membrane. Processing may involve removing one or more components of the primary sample and/or adding one or more agents to it. Such a "processed sample" can comprise, for example, nucleic acids or proteins extracted from a sample. It can also comprise nucleic acids or proteins obtained by subjecting a primary sample to techniques such as amplification or reverse transcription of mRNA, or isolation and/or purification of certain components, such as organelles, nucleic acids, or membrane-bound proteins. In some embodiments, the sample is a tissue comprising multiple cell types. In some embodiments, the sample is connective tissue, muscle tissue, nervous tissue, or epithelial tissue.

As used herein, the term "amplification reaction" refers to a reaction that increases the copy number of a nucleic acid. Amplification can be carried out by methods such as:
the polymerase chain reaction (PCR), including, but not limited to, qPCR, RT-qPCR, RACE-PCR, and RT-LAMP;
the ligase chain reaction (LCR);
transcription-mediated amplification; and
the nicking enzyme amplification reaction (NEAR).
Any variation of the foregoing methods for amplifying nucleic acids is also included within this term.
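For PCR-type reactions, the increase in copy number is commonly modeled as geometric growth, N_n = N_0 (1 + E)^n. Here N_0 is the starting number of copies, n is the number of cycles, and E is the per-cycle efficiency, between 0 and 1. This idealized model is an assumption used for illustration only. It ignores plateau effects and does not describe isothermal or ligation-based methods. The Python sketch below applies it.

```python
def pcr_copies(initial_copies: float, cycles: int,
               efficiency: float = 1.0) -> float:
    """Expected copies after `cycles` rounds under N_n = N_0 * (1 + E) ** n."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(initial_copies: float, target_copies: float,
                    efficiency: float = 1.0) -> int:
    """Smallest number of cycles whose modeled yield reaches the target."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if initial_copies <= 0:
        raise ValueError("initial_copies must be positive")
    copies, cycles = float(initial_copies), 0
    while copies < target_copies:  # step cycle by cycle to avoid log rounding
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


print(pcr_copies(100, 10))               # perfect doubling: 102400.0
print(cycles_to_reach(100, 1_000_000))   # 14 cycles at 100% efficiency
```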

As used herein, the term "insertase" refers to an enzyme capable of inserting a nucleic acid sequence into a polynucleotide. In some cases, an insertase can insert a nucleic acid sequence into a polynucleotide in a substantially sequence-independent manner. Insertases can be prokaryotic or eukaryotic. Examples of insertases include, but are not limited to, transposases, HERMES, and HIV integrase. The transposase can be any of the following: a Tn transposase (e.g., Tn3, Tn5, Tn7, Tn10, Tn552, Tn903), a MuA transposase, a Vibhar transposase (e.g., from Vibrio harveyi), Ac-Ds, Ascot-1, Bs1, Cin4, Copia, En/Spm, F element, hobo, Hsmar1, Hsmar2, IN (HIV), IS1, IS2, IS3, IS4, IS5, IS6, IS10, IS21, IS30, IS50, IS51, IS150, IS256, IS407, IS427, IS630, IS903, IS911, IS982, IS1031, ISL2, L1, Mariner, P element, Tam3, Tc1, Tc3, Te1, THE-1, Tn/O, TnA, Tn3, Tn5, Tn7, Tn10, Tn552, Tn903, Tol1, Tol2, Tn10, Ty1. It can also be any prokaryotic transposase, or any transposase related to and/or derived from those listed above.
In some cases, a transposase related to and/or derived from a parent transposase can comprise a peptide fragment having at least about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% amino acid sequence homology to a corresponding peptide fragment of the parent transposase. The peptide fragment can be at least about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 60, about 70, about 80, about 90, about 100, about 150, about 200, about 250, about 300, about 400, or about 500 amino acids in length. For example, a transposase derived from Tn5 can comprise a peptide fragment that is 50 amino acids in length and has about 80% homology to the corresponding fragment of the parent Tn5 transposase. In some cases, insertion can be facilitated and/or triggered by the addition of one or more cations. The cations can be divalent cations, such as Ca2+, Mg2+, and Mn2+.

In some embodiments, the transposase is a DDE motif transposase. Examples include a prokaryotic transposase from ISs, Tn3, Tn5, Tn7, or Tn10; a phage transposase from bacteriophage Mu; and a eukaryotic "cut-and-paste" transposase. See U.S. Patent Nos. 6,593,113 and 9,644,199; Yuan and Wessler (2011) Proc Natl Acad Sci USA 108(19):7884-7889. In some embodiments, the transposase comprises a retroviral transposase, such as that of HIV. Rice and Baker (2001) Nat Struct Biol. 8:302-307.

In some embodiments, the transposase is a member of the IS50 family of transposases, such as a Tn5 transposase or a variant thereof. The Tn5 transposase is derived from the Tn5 transposon, a bacterial transposon that can encode antibiotic resistance genes. The point mutations E54K and/or L372P can increase the activity of Tn5 transposase. In particular embodiments, the transposase is an E54K/L372P mutant of Tn5 transposase having increased transposase activity. An exemplary E54K/L372P Tn5 transposase comprises the following sequence:

MITSALHRAADWAKSVFSSAALGDPRRTARLVNVAAQLAKYSGKSITISSEGSKAMQEGAYRFIRNPNVSAEAIRKAGAMQTVKLAQEFPELLAIEDTTSLSYRHQVAEELGKLGSIQDKSRGWWVHSVLLLEATTFRTVGLLHQEWWMRPDDPADADEKESGKWLAAAATSRLRMGSMMSNVIAVCDREADIHAYLQDKLAHNERFVVRSKHPRKDVESGLYLYDHLKNQPELGGYQISIPQKGVVDKRGKRKNRPARKASLSLRSGRITLKQGNITLNAVLAEEINPPKGETPLKWLLLTSEPVESLAQALRVIDIYTHRWRIEEFHKAWKTGAGAERQRMEEPDNLERMVSILSFVAVRLLQLRESFTPPQALRAQGLLKEAEHVESQSAETVLTPDECQLLGYLDKGKRKRKEKAGSLQWAYMAIARLGGFMDSKRTGIASWGALW (SEQ ID NO:42)
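The mutation names E54K and L372P use 1-based residue numbering. A quick consistency check on SEQ ID NO:42 is therefore to read off the residues at those positions. The Python sketch below does that on the sequence exactly as reproduced above. The helper names are hypothetical, and the check confirms only which residues are present. It says nothing about transposase activity.

```python
import re

SEQ_ID_NO_42 = "MITSALHRAADWAKSVFSSAALGDPRRTARLVNVAAQLAKYSGKSITISSEGSKAMQEGAYRFIRNPNVSAEAIRKAGAMQTVKLAQEFPELLAIEDTTSLSYRHQVAEELGKLGSIQDKSRGWWVHSVLLLEATTFRTVGLLHQEWWMRPDDPADADEKESGKWLAAAATSRLRMGSMMSNVIAVCDREADIHAYLQDKLAHNERFVVRSKHPRKDVESGLYLYDHLKNQPELGGYQISIPQKGVVDKRGKRKNRPARKASLSLRSGRITLKQGNITLNAVLAEEINPPKGETPLKWLLLTSEPVESLAQALRVIDIYTHRWRIEEFHKAWKTGAGAERQRMEEPDNLERMVSILSFVAVRLLQLRESFTPPQALRAQGLLKEAEHVESQSAETVLTPDECQLLGYLDKGKRKRKEKAGSLQWAYMAIARLGGFMDSKRTGIASWGALW"


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position, as used in names like E54K."""
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} is outside 1..{len(sequence)}")
    return sequence[position - 1]


def parse_substitution(name: str):
    """Split a single-residue substitution such as 'E54K' into ('E', 54, 'K')."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {name!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def carries(sequence: str, name: str) -> bool:
    """True if the sequence has the mutant residue named in `name`."""
    _, position, mutant = parse_substitution(name)
    return residue_at(sequence, position) == mutant


for mutation in ("E54K", "L372P"):
    print(mutation, carries(SEQ_ID_NO_42, mutation))
```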

Other mutations that increase Tn5 transposase activity are disclosed in U.S. Patent Nos. 5,965,443; 6,406,896; and 7,608,434; and in Reznikoff (2003) Molecular Microbiology 47(5):1199-1206, all of which are expressly incorporated herein by reference. In some embodiments, the Tn5 transposase is a mutant transposase with reduced GC insertion bias (Tn5-059). Kia et al. (2017) BMC Biotechnology 17:6.

Methods

As noted above, the methods of the present disclosure integrate split-pool indexing with spatial barcoding. Accordingly, the present disclosure uses a set of barcoded indexing primers to obtain single-cell gene expression profiles, or transcriptomes, from tissue samples while preserving their corresponding spatial information.

Accordingly, the present disclosure relates to a method for spatial identification of gene expression, the method comprising identifying the presence, absence or amount of a combination of a spatial barcode domain and a cell barcode domain in a nucleic acid sample by detecting one or more of the domains in the sample. In some embodiments, the method further comprises correlating the presence, absence or amount of the spatial barcode domain and the cell barcode domain with the spatial location of cells in the tissue sample on the array.

The present disclosure also relates to a method of identifying cell types in a sample based on spatial gene expression profiling, the method comprising detecting the presence, absence or amount of a combination of a spatial barcode domain and a cell barcode domain in the sample. In some embodiments, the method further comprises correlating the presence, absence or amount of the spatial barcode domain and the cell barcode domain with the spatial location of cells in the tissue sample on the array. In some embodiments, the step of detecting the presence, absence or amount of the combination of a spatial barcode domain and a cell barcode domain in the sample comprises annealing one or more complementary nucleic acids to the cell barcode domain and/or the spatial barcode domain and performing polymerase chain reaction on the sequence to identify the presence or amount of one or more of the domains.

The present disclosure also relates to a method of identifying chromatin accessibility in cells of a sample, the method comprising identifying the presence, absence or amount of a combination of a spatial barcode domain and a cell barcode domain in a nucleic acid sample. In some embodiments, the method further comprises correlating the presence, absence or amount of the spatial barcode domain and the cell barcode domain with the spatial location of cells in the tissue sample on the array.

The present disclosure additionally relates to a method of spatially barcoding single cells in a tissue, the method comprising identifying or detecting the presence, absence or amount of a combination of a spatial barcode domain and a cell barcode domain in a nucleic acid sample. In some embodiments, the method further comprises correlating the presence, absence or amount of the spatial barcode domain and the cell barcode domain with the spatial location of cells in the tissue sample on the array. In some embodiments, the detecting step comprises detecting a fluorescent signal or probe bound covalently or non-covalently to one or both domains; or detecting one or more copies of

The present disclosure also relates to a method of spatially identifying a population of cells within a tissue, the method comprising identifying the presence, absence or amount of a combination of a spatial barcode domain and a cell barcode domain in a nucleic acid sample. In some embodiments, the method further comprises correlating the presence, absence or amount of the spatial barcode domain and the cell barcode domain with the spatial location of cells in the tissue sample on the array.

The present disclosure also relates to a method of detecting gene expression in single cells in a tissue, the method comprising identifying the presence, absence or amount of a combination of a spatial barcode domain and a cell barcode domain in a nucleic acid sample. In some embodiments, the method further comprises correlating the presence, absence or amount of the spatial barcode domain and the cell barcode domain with the spatial location of cells in the tissue sample on the array.

The present disclosure also relates to a method of isolating cells corresponding to a spatial location within a tissue, the method comprising identifying the presence, absence or amount of a combination of a spatial barcode domain and a cell barcode domain in a nucleic acid sample. In some embodiments, the method further comprises correlating the presence, absence or amount of the spatial barcode domain and the cell barcode domain with the spatial location of cells in the tissue on the array.

The present disclosure additionally relates to a method of detecting mesenchymal stem cells in an organ, the method comprising identifying the presence, absence or amount of a combination of a spatial barcode domain and a cell barcode domain in a nucleic acid sample. In some embodiments, the method further comprises correlating the presence, absence or amount of the spatial barcode domain and the cell barcode domain with the spatial location of the mesenchymal stem cells in a tissue sample of the organ on the array.

The present disclosure also relates to a method of quantifying RNA expression in single cells, the method comprising identifying the presence, absence or amount of a combination of a spatial barcode domain and a cell barcode domain in a nucleic acid sample. In some embodiments, the method further comprises correlating the presence, absence or amount of the spatial barcode domain and the cell barcode domain with the spatial location of the single cells in the tissue sample on the array.

The present disclosure also relates to a method of quantifying RNA expression corresponding to a spatial location within a tissue sample, the method comprising identifying the presence, absence or amount of a combination of a spatial barcode domain and a cell barcode domain in a nucleic acid sample. In some embodiments, the method further comprises correlating the presence, absence or amount of the spatial barcode domain and the cell barcode domain with the spatial location of RNA expression in the tissue sample on the array.

The present disclosure also relates to a method of preparing nucleic acid from single cells within a tissue sample, the method comprising identifying the presence, absence or amount of a combination of a spatial barcode domain and a cell barcode domain in the nucleic acid sample. In some embodiments, the method further comprises correlating the presence, absence or amount of the spatial barcode domain and the cell barcode domain with the spatial location of the nucleic acid sample in the tissue sample on the array.

The present disclosure relates to a method of obtaining the transcriptome of a single cell, the method comprising:

(a) contacting a sample with an array comprising a plurality of wells, the wells comprising one or more spatial primers and/or barcodes;

(b) isolating RNA from the sample in each well;

(c) amplifying the isolated RNA by quantitative PCR, annealing one or more primers in each well to the isolated RNA; and

(d) associating the amplification products of the isolated RNA with cells at positions corresponding to locations within the sample.

In some embodiments, the cells are mesenchymal cells, cancer cells, hepatocytes or splenocytes. In some embodiments, each well comprises 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 cells. In some embodiments, the method further comprises repeating these steps for each well to create an expression profile, and calculating the mean expression of each well's expression profile, weighted by the number of cells in each well.
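The cell-count weighting in the preceding paragraph can be illustrated with a minimal Python sketch. It assumes that each well's expression profile is a mapping from gene name to expression value and that the number of cells in each well is known. The function name `weighted_mean_expression`, the data layout and the example values are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one (cell_count, {gene: expression}) tuple per well.
WellProfile = Tuple[int, Dict[str, float]]


def weighted_mean_expression(wells: List[WellProfile]) -> Dict[str, float]:
    """Mean expression per gene across wells, weighting each well by its cell count.

    A gene missing from a well's profile is treated as zero expression in that well.
    """
    total_cells = sum(count for count, _ in wells)
    if total_cells == 0:
        raise ValueError("at least one well must contain a cell")
    genes = set()
    for _, profile in wells:
        genes.update(profile)
    return {
        gene: sum(count * profile.get(gene, 0.0) for count, profile in wells) / total_cells
        for gene in sorted(genes)
    }


# Illustrative values only: a 1-cell well and a 3-cell well.
wells = [(1, {"Ptprc": 4.0}), (3, {"Ptprc": 0.0, "Alb": 8.0})]
print(weighted_mean_expression(wells))  # {'Alb': 6.0, 'Ptprc': 1.0}
```

Weighting by cell count keeps a well that holds a single cell from counting as much as a well that holds ten.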

In some embodiments, the method further comprises calculating a proximity score. In some embodiments, calculating the proximity score comprises performing the analysis on page 88 of this specification. In some embodiments, the method further comprises performing trajectory inference analysis.

The present disclosure relates to a method of obtaining the transcriptome of a single cell, the method comprising:

(a) contacting a sample with an array comprising a plurality of wells, the wells comprising

(b) isolating RNA from the sample in each well;

(c) amplifying the isolated RNA by quantitative PCR using one or more primers in each well;

(d) associating the amplification products of the RNA with cells at positions corresponding to locations within the sample;

wherein each well comprises a barcode and a primer corresponding to the location of the barcode and primer within the array.

As used herein, the term "barcode" refers to any unique, non-naturally occurring nucleic acid sequence that enables identification of the source of a nucleic acid fragment. Barcode sequences provide high-quality individual reads of the barcodes associated with, e.g., DNA, RNA, cDNA, cells or nuclei, so that multiple species can be sequenced together.

Barcoding can be performed based on any of the compositions or methods disclosed in patent publication WO 2014/047561 A1, which is incorporated herein by reference in its entirety. Without being bound by theory, amplified sequences from single cells or nuclei can be sequenced together and resolved based on the barcode associated with each cell or nucleus. Other barcoding designs and tools have also been described (see, e.g., Birrell et al. (2001) Proc. Natl. Acad. Sci. USA 98:12608-12613; Giaever et al. (2002) Nature 418:387-391; Winzeler et al. (1999) Science 285:901-906; and Xu et al. (2009) Proc. Natl. Acad. Sci. USA 106:2289-2294).
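The following minimal sketch shows how reads that were sequenced together can be resolved by barcode. It rests on two hypothetical assumptions: each read begins with an 8-nt barcode followed by the cDNA insert, and only exact barcode matches are accepted. Actual read structures, barcode lengths and error tolerance depend on the library design and are not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

BARCODE_LEN = 8  # hypothetical barcode length at the start of each read


def demultiplex(reads: Iterable[str], known_barcodes: Iterable[str]) -> Dict[str, List[str]]:
    """Group read inserts by the barcode found at the start of each read.

    A read whose leading barcode is not in `known_barcodes` is placed under "unassigned".
    """
    known = set(known_barcodes)
    groups: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        groups[barcode if barcode in known else "unassigned"].append(insert)
    return dict(groups)


reads = ["AACCGGTTGATTACA", "TTGGCCAACCCGGGT", "AACCGGTTTTTAAAC", "GGGGGGGGACGT"]
print(demultiplex(reads, ["AACCGGTT", "TTGGCCAA"]))
```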

The first barcoded indexing primer of the present disclosure is referred to as a "spatial index primer". As used herein, "spatial index primer" refers to a primer or oligonucleotide used to capture and label transcripts from all single cells located at different positions in a tissue sample, e.g., a thin tissue section or "slice". Tissue samples or sections for analysis are generated in a highly parallelized manner so that the spatial information in the section is preserved. The RNA molecules (preferably mRNA), or "transcriptome", captured from each cell are subsequently reverse transcribed into cDNA molecules, and the resulting cDNA molecules are analyzed, e.g., by high-throughput sequencing. Because barcode sequences (or ID tags, defined herein as spatial barcode domains) are incorporated into the arrayed nucleic acids via the spatial index primers, the resulting data can be correlated with images of the original tissue sample (e.g., section).

To perform these functions, each spatial index primer according to the present disclosure comprises at least two domains: a capture domain and a spatial barcode domain (or spatial tag). A spatial index primer may also comprise a universal domain, as further defined below.

In some embodiments, the capture domain is located at the 3' end of the spatial index primer and comprises a free 3' end that can be extended, e.g., by template-dependent polymerization. The capture domain comprises a nucleotide sequence capable of hybridizing to nucleic acid, such as RNA (preferably mRNA), present in the cells of a tissue sample in contact with the array. In some embodiments where transcriptional profiling is preferred, the capture domain may comprise a polythymidine sequence, such as a poly-T (or "poly-T-like") oligonucleotide, either alone or in combination with a random oligonucleotide sequence. If a random oligonucleotide sequence is used, it may be located, e.g., 5' or 3' of the poly-T sequence, for instance at the 3' end of the spatial index primer.

In some embodiments, the spatial barcode domain (or spatial tag) of the spatial index primer comprises a nucleotide sequence that is unique to each well of the array and serves as a positional or spatial marker (identification tag). In this way, each region or domain of a tissue sample, such as each cell in the tissue, can be identified at the spatial resolution of the array by linking nucleic acid (e.g., RNA or transcripts) from a given cell to the unique spatial barcode domain sequence of a spatial index primer. Through the spatial barcode domain, a spatial index primer in the array can be associated with a position in the tissue sample; for example, it can be associated with a cell in the tissue sample. In some embodiments, the spatial resolution at a particular location is from about 0.1 μm² to about 1 cm². In some embodiments, the spatial resolution at a particular location is about 0.1 μm², 0.2 μm², 0.5 μm², 0.75 μm², 1 μm², 2 μm², 5 μm², 10 μm², 20 μm², 30 μm², 50 μm², 80 μm², 100 μm², 150 μm², 200 μm², 500 μm², 750 μm² or 1 cm².

Any suitable sequence can be used as the spatial barcode domain in the spatial index primers according to the present disclosure. A suitable sequence is one whose spatial barcode domain does not interfere with (i.e., inhibit or distort) the interaction between the RNA of the tissue sample and the capture domain of the spatial index primer. For example, the spatial barcode domain is designed so that nucleic acid molecules in the tissue sample do not specifically or substantially hybridize to the spatial barcode domain or its complement. In some embodiments, the nucleotide sequence of the spatial barcode domain of the spatial index primer, or its complement, has less than about 80%, 70%, 60%, 50% or 40% sequence identity to the majority of nucleic acid molecules in the tissue sample. Sequence identity may be determined by any suitable method known in the art, e.g., using the BLAST alignment algorithm.
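The identity thresholds above are defined with reference to methods such as BLAST alignment. The minimal sketch below uses a much simpler stand-in, not BLAST. It computes the best ungapped percent identity of a barcode, and of its reverse complement, against every same-length window of each reference sequence, and it rejects the barcode if any value reaches the chosen cutoff (80% here, one of the values listed above). The sketch is conservative because it rejects a barcode that resembles any reference, not only the majority. The function names are hypothetical.

```python
from typing import Sequence

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def best_window_identity(query: str, reference: str) -> float:
    """Highest ungapped % identity of `query` against any same-length window of `reference`."""
    k = len(query)
    best = 0.0
    for start in range(len(reference) - k + 1):
        window = reference[start:start + k]
        matches = sum(a == b for a, b in zip(query, window))
        best = max(best, 100.0 * matches / k)
    return best


def passes_identity_screen(barcode: str, references: Sequence[str], cutoff: float = 80.0) -> bool:
    """True if neither the barcode nor its reverse complement reaches `cutoff` % identity
    to any window of any reference sequence."""
    for ref in references:
        for probe in (barcode, reverse_complement(barcode)):
            if best_window_identity(probe, ref) >= cutoff:
                return False
    return True


print(passes_identity_screen("ACGTACGTAC", ["TTTTACGTACGTACTTTT"]))  # False: exact hit
print(passes_identity_screen("GGGGGGGGGG", ["ACACACACACACAC"]))      # True
```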

Random sequence generation can be used to generate the nucleotide sequence of the spatial barcode domain of the spatial index primer. The randomly generated sequences can then be rigorously filtered by mapping them to the genomes of all common reference species and by applying a preset Tm interval, a GC content range and a defined minimum distance from other barcode sequences. This filtering ensures that the barcode sequences do not interfere with the capture of nucleic acids, e.g., RNA, from the tissue sample and that they can be readily distinguished from one another.
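The random generation and filtering step can be sketched as follows. The sketch uses assumed, illustrative parameters: 10-nt barcodes, 40-60% GC, a Tm window of 28-30 °C and a minimum pairwise Hamming distance of 3. Tm is estimated with the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough approximation for short oligonucleotides. The genome-mapping filter described above is omitted. None of the parameter values or names come from the disclosure.

```python
import random
from typing import List, Optional, Tuple


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough Tm in degrees C for short oligos: 2*(A+T) + 4*(G+C)."""
    gc = seq.count("G") + seq.count("C")
    return 2 * (len(seq) - gc) + 4 * gc


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def generate_barcodes(n: int, length: int = 10,
                      gc_range: Tuple[float, float] = (0.4, 0.6),
                      tm_range: Tuple[int, int] = (28, 30),
                      min_distance: int = 3, seed: Optional[int] = None,
                      max_attempts: int = 100_000) -> List[str]:
    """Draw random barcodes and keep those passing GC, Tm and distance filters."""
    rng = random.Random(seed)
    accepted: List[str] = []
    for _ in range(max_attempts):
        if len(accepted) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if not gc_range[0] <= gc_fraction(candidate) <= gc_range[1]:
            continue
        if not tm_range[0] <= wallace_tm(candidate) <= tm_range[1]:
            continue
        # Reject candidates too similar to (or identical with) an accepted barcode.
        if any(hamming(candidate, b) < min_distance for b in accepted):
            continue
        accepted.append(candidate)
    if len(accepted) < n:
        raise RuntimeError("could not find enough barcodes; relax the filters")
    return accepted


print(generate_barcodes(5, seed=7))
```

Because the distance filter compares each candidate against every accepted barcode, the run time grows with the size of the barcode set. For very large sets, a different search strategy would likely be needed.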

As noted above, in some embodiments the spatial index primer also comprises a universal domain. In some embodiments, the universal domain of the spatial index primer is located directly or indirectly upstream of the spatial barcode domain, i.e., closer to the 5' end of the spatial index primer. In some embodiments, the universal domain is directly adjacent to the spatial barcode domain, i.e., there is no intervening sequence between the spatial barcode domain and the universal domain. In embodiments in which the spatial index primer comprises a universal domain, this domain may form the 5' end of the spatial index primer, which may be immobilized directly or indirectly on the substrate of the array.
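To make the domain order concrete, the following minimal sketch assembles a spatial index primer written 5'→3' as universal domain + spatial barcode domain + capture domain. The capture domain is a poly-T run, optionally followed by a random sequence at the 3' end, which is one of the arrangements described above. The universal sequence, poly-T length and random-sequence length used here are placeholders and are not sequences from the disclosure.

```python
import random
from typing import Optional


def build_spatial_index_primer(universal: str, spatial_barcode: str,
                               poly_t_len: int = 20, random_len: int = 0,
                               seed: Optional[int] = None) -> str:
    """Return a primer sequence written 5'->3': universal + spatial barcode + capture domain."""
    rng = random.Random(seed)
    random_part = "".join(rng.choice("ACGT") for _ in range(random_len))
    # Capture domain: poly-T (hybridizes to poly-A tails), then an optional random 3' segment.
    capture_domain = "T" * poly_t_len + random_part
    return universal + spatial_barcode + capture_domain


# Placeholder universal domain and barcode, for illustration only.
print(build_spatial_index_primer("GATCGATC", "ACGTACGTAC", poly_t_len=5))  # GATCGATCACGTACGTACTTTTT
```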

As described elsewhere herein, the cDNA molecules obtained from the RNA molecules (preferably mRNA) captured by the capture domains of the spatial index primers are subsequently sequenced and analyzed. Thus, in some embodiments, the universal domain of the spatial index primer may comprise an annealing domain comprising a nucleotide sequence recognized by a first sequencing primer. To sequence and analyze the cDNA molecules in a high-throughput manner, in some embodiments the annealing domain of every spatial index primer preferably comprises the same nucleotide sequence.

Any suitable sequence can be used as the annealing domain in the spatial index primers of the present disclosure. A suitable sequence is one whose annealing domain does not interfere with (i.e., inhibit or distort) the interaction between nucleic acid of the tissue sample, e.g., RNA, and the capture domain of the spatial index primer. In addition, the annealing domain should comprise a nucleotide sequence that is not identical, or not substantially identical, to any sequence in the nucleic acid of the tissue sample, e.g., RNA. This ensures that, under the conditions used for sequencing, the sequencing primers hybridize only to the annealing domain.

For example, the annealing domain is designed so that nucleic acid molecules in the tissue sample do not specifically hybridize to the annealing domain or its complement. In some embodiments, the nucleotide sequence of the annealing domain of the spatial index primer, or its complement, has less than about 80%, 70%, 60%, 50% or 40% sequence identity to the majority of nucleic acid molecules in the tissue sample. Sequence identity may be determined by any suitable method known in the art, e.g., using the BLAST alignment algorithm.

The second barcoded indexing primer of the present disclosure is referred to as a "cell index primer". As used herein, "cell index primer" refers to a primer or oligonucleotide used to amplify the cDNA molecules obtained from reverse transcription and to label each amplified cDNA molecule with a second index barcode (defined herein as the cell barcode domain) that is unique to each well of a multiwell plate. As described elsewhere herein, this PCR amplification of the cDNA molecules obtained from reverse transcription is performed in a multiwell plate rather than on the array, where the first index barcode of the present disclosure is incorporated into the arrayed nucleic acids via the spatial index primers.

According to the present disclosure, each cell index primer comprises at least one domain, referred to as the "cell barcode domain" (or cell tag). A cell index primer may also comprise a universal domain, as further defined below.

The cell barcode domain (or cell tag) of the cell index primer comprises a nucleotide sequence that is unique to each well of the multiwell plate and serves as an identification tag for the cells located in any given well of the plate. In this way, all PCR products amplified in a given well are tagged with the same cell barcode domain. Transcripts from single cells at specific locations on the array can therefore be identified based on the combination of a specific spatial barcode domain and a specific cell barcode domain. Accordingly, the present disclosure relates to a method for spatial identification of gene expression comprising identifying a spatial barcode domain and a specific cell barcode domain.
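The combinatorial identification described above can be sketched as a counting step. The sketch assumes, hypothetically, that each sequenced cDNA has already been reduced to a (spatial barcode, cell barcode, gene) record. It also assumes two lookup tables: one mapping spatial barcodes to array coordinates and one mapping cell barcodes to plate wells. The table contents, record format and function name are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Coord = Tuple[int, int]


def count_by_location_and_cell(
    records: Iterable[Tuple[str, str, str]],
    spatial_to_xy: Dict[str, Coord],
    cell_to_well: Dict[str, str],
) -> Counter:
    """Count transcripts per (array position, plate well, gene).

    The (spatial barcode, cell barcode) pair identifies a single cell at a
    position on the array. Records with an unknown barcode are skipped.
    """
    counts: Counter = Counter()
    for spatial_bc, cell_bc, gene in records:
        if spatial_bc not in spatial_to_xy or cell_bc not in cell_to_well:
            continue
        counts[(spatial_to_xy[spatial_bc], cell_to_well[cell_bc], gene)] += 1
    return counts


spatial_to_xy = {"AAAA": (0, 0), "CCCC": (0, 1)}
cell_to_well = {"GGGG": "A1", "TTTT": "A2"}
records = [("AAAA", "GGGG", "Alb"), ("AAAA", "GGGG", "Alb"), ("CCCC", "GGGG", "Alb"),
           ("AAAA", "TTTT", "Cd3e"), ("ACGT", "GGGG", "Alb")]
print(count_by_location_and_cell(records, spatial_to_xy, cell_to_well))
```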

Any suitable sequence can be used as the cell barcode domain in the cell index primers according to the present disclosure. A suitable sequence means, for example, that the cell barcode domain is designed so that the cDNA molecules obtained from reverse transcription do not specifically or substantially hybridize to the cell barcode domain or its complement. In some embodiments, the nucleotide sequence of the cell barcode domain of the cell index primer, or its complement, has less than about 80%, 70%, 60%, 50% or 40% sequence identity to the majority of cDNA molecules obtained from reverse transcription. Sequence identity may be determined by any suitable method known in the art, e.g., using the BLAST alignment algorithm.

Random sequence generation can be used to generate the nucleotide sequence of the cell barcode domain of the cell index primer. The randomly generated sequences can then be rigorously filtered by mapping them to the genomes of all common reference species and by applying a preset Tm interval, a GC content range and a defined minimum distance from other barcode sequences. This filtering ensures that the barcode sequences do not hybridize with the cDNA molecules obtained from reverse transcription and that they can be readily distinguished from one another.
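Because the barcodes are filtered to keep a defined distance from one another, an observed barcode that carries a sequencing error can often still be assigned. The minimal sketch below shows this under assumed conditions: distances are Hamming distances, and the barcode whitelist has a minimum pairwise distance of at least 3, so a single substitution is unambiguous. The function names and the one-mismatch tolerance are assumptions, not part of the disclosure.

```python
from itertools import combinations
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: Sequence[str]) -> int:
    """Smallest Hamming distance between any two barcodes (needs at least two)."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def correct_barcode(observed: str, whitelist: Sequence[str],
                    max_mismatches: int = 1) -> Optional[str]:
    """Return the unique whitelist barcode within `max_mismatches`, else None."""
    hits = [bc for bc in whitelist
            if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


whitelist = ["AAAAAAAA", "CCCCCCCC", "GGGGAAAA"]
print(min_pairwise_distance(whitelist))          # 4
print(correct_barcode("AAAAAAAT", whitelist))    # AAAAAAAA
print(correct_barcode("AAGGAAAA", whitelist))    # None (two mismatches from two entries)
```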

As noted above, the cell index primer may also comprise a universal domain. The universal domain of the cell index primer is located directly or indirectly upstream of the cell barcode domain, i.e., closer to the 5' end of the cell index primer. In some embodiments, the universal domain is directly adjacent to the cell barcode domain, i.e., there is no intervening sequence between the cell barcode domain and the universal domain. In embodiments in which the cell index primer comprises a universal domain, this domain forms the 5' end of the cell index primer, which may be immobilized directly or indirectly on the substrate of the multiwell plate.

As described elsewhere herein, the cDNA molecules obtained from reverse transcription and subsequent PCR amplification are then sequenced and analyzed. Thus, in some embodiments, the universal domain of the cellular indexing primer can comprise an annealing domain whose nucleotide sequence is recognized by the second sequencing primer. To sequence and analyze the cDNA molecules in a high-throughput manner, in some embodiments the annealing domain of every cellular indexing primer preferably comprises the same nucleotide sequence.
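The domain layout described in the two preceding paragraphs can be modeled with a small Python data structure. This is a hypothetical sketch: the class and function names are illustrative, any example sequences are made up, and domains 3' of the cellular barcode domain are omitted because they are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellularIndexingPrimer:
    """5'->3' layout: universal domain, optional spacer, cellular barcode domain."""

    universal: str  # 5'-most domain (immobilized end); contains the shared annealing domain
    barcode: str  # cellular barcode domain, identical for all primers in one well
    spacer: str = ""  # intervening sequence; "" means the domains are directly adjacent

    @property
    def sequence(self) -> str:
        """Concatenated 5'->3' sequence of the domains modeled here."""
        return self.universal + self.spacer + self.barcode


def share_annealing_domain(primers: list[CellularIndexingPrimer], annealing: str) -> bool:
    """True if every primer's universal domain contains the common annealing sequence."""
    return all(annealing in p.universal for p in primers)
```

Keeping the annealing sequence identical across primers is what lets a single sequencing primer read every library in one run, which is the property `share_annealing_domain` checks.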

Any suitable sequence can be used as the annealing domain of the cellular indexing primers of the present disclosure. A suitable sequence means, for example, that the annealing domain of any given cellular indexing primer comprises a nucleotide sequence that is not identical, or substantially identical, to any sequence in the cDNA molecules obtained from reverse transcription, so that under the conditions used for sequencing the sequencing primer hybridizes only to the annealing domain.

For example, the annealing domain is designed so that nucleic acid molecules in the tissue sample do not specifically hybridize to the annealing domain or its complement. In some embodiments, the nucleotide sequence of the annealing domain of the cellular indexing primer, or its complement, has less than about 90%, 85%, 80%, 75%, 70%, 60%, 50%, or 40% sequence identity with the majority of the nucleic acid molecules in the tissue sample. Sequence identity may be determined by any suitable method known in the art, for example using the BLAST alignment algorithm.
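Before running a full alignment, a candidate annealing domain could be prescreened with a shared-k-mer check, sketched below in Python. This k-mer screen is a swapped-in heuristic, not the BLAST-based identity measure named in the text; the value of k and the function names are assumptions, and a low score suggests, but does not prove, low sequence identity.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(domain: str, sample_seqs: list[str], k: int = 8) -> float:
    """Fraction of the domain's k-mers (either strand) that occur in any sample sequence."""
    if len(domain) < k:
        raise ValueError("domain must be at least k bases long")
    domain = domain.upper()
    # Include both strands of the domain so matches to its complement are also caught.
    domain_kmers = kmers(domain, k) | kmers(reverse_complement(domain), k)
    sample_kmers: set[str] = set()
    for seq in sample_seqs:
        sample_kmers |= kmers(seq.upper(), k)
    return len(domain_kmers & sample_kmers) / len(domain_kmers)
```

Candidates scoring near zero could then be passed to an alignment-based identity check; those sharing many k-mers with the sample can be discarded early.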

An array or microwell array according to the present disclosure may contain a plurality of microwells. A microwell can be defined by a volume, an area, or a distinct location on the array. In some embodiments, a single species of spatial indexing primer is present in a given microwell, either immobilized or in solution. In some embodiments, the present disclosure relates to a system comprising an array, wherein the array comprises 6, 12, 24, 48, 96, 192, or more microwells. In some embodiments, each microwell contains a plurality of spatial indexing primer molecules of the same species. It will be understood in this context that, although each spatial indexing primer of the same species may have the same sequence, this is not necessarily the case. In some embodiments, each species of spatial indexing primer has the same spatial barcode domain (i.e., every member of a species, and thus every primer in a microwell, is identically "labeled"), but the sequences of the individual members of the species in a microwell may differ because their capture domains may differ. As mentioned above, random nucleic acid sequences can be included in the capture domain.

In some embodiments, the spatial indexing primers within a microwell may comprise different random sequences. The number and density of microwells on the array determine its resolution, that is, the level of detail at which the transcriptome of a tissue sample can be analyzed; a higher density of microwells generally increases resolution. As mentioned above, the methods of the present disclosure identify gene expression spatially on the basis of specific combinations of spatial barcode domains and cellular barcode domains, which provides resolution at the single-cell level. Tissue resolution, however, will depend on the size of the microwells. Thus, in some embodiments, the array comprises a plurality of microwells that are equidistant from one another (as measured from the center of each well) and each have a volume of about 100 to about 400 microliters. In some embodiments, the volume of each such microwell is about 10 to about 400, about 20 to about 400, about 50 to about 400, about 75 to about 350, about 100 to about 370, about 300 to about 375, about 340 to about 360, or about 5 to about 100 microliters. In some embodiments, the array comprises a plurality of microwells that are equidistant from one another (as measured from the center of each well), each with a barcoded indexing primer immobilized on its bottom.
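The combinatorial readout described above (a spatial barcode locating the microwell, a cellular barcode identifying the cell) can be illustrated with a minimal Python tally. It assumes that reads have already been parsed into (spatial barcode, cellular barcode, gene) tuples and that a lookup table from spatial barcode to microwell coordinates is available; both the input format and the names are hypothetical.

```python
from collections import Counter, defaultdict


def build_expression_table(reads, spatial_map):
    """Count genes per (microwell position, cellular barcode).

    reads: iterable of (spatial_barcode, cellular_barcode, gene) tuples.
    spatial_map: dict mapping spatial_barcode -> (row, col) of its microwell.
    Returns ({(row, col): {cellular_barcode: Counter({gene: count})}}, skipped_reads).
    """
    table = defaultdict(lambda: defaultdict(Counter))
    skipped = 0
    for spatial_bc, cellular_bc, gene in reads:
        position = spatial_map.get(spatial_bc)
        if position is None:
            skipped += 1  # spatial barcode not assigned to any microwell: discard the read
            continue
        table[position][cellular_bc][gene] += 1
    # Convert to plain dicts so later lookups cannot silently create empty entries.
    return {pos: dict(cells) for pos, cells in table.items()}, skipped
```

Each (position, cellular barcode) key then corresponds to one cell at one location on the array, which is the combination the text relies on for single-cell spatial identification.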

In some embodiments, the method can detect expression profiles at a specific location of the sample with a spatial resolution of about 0.1 μm² to about 1 cm². In some embodiments, the spatial resolution at a specific location of the sample is about 0.1, 0.2, 0.5, 0.75, 1, 2, 5, 10, 20, 30, 50, 80, 100, 150, 200, 500, or 750 μm², or about 1 cm².

As mentioned above, the size and number of microwells on an array of the present disclosure depend on the nature of the sample and the desired resolution. For example, if the sample contains large cells, the number and/or density of microwells on the array may be reduced (i.e., below the maximum possible number of microwells) and/or the size of the microwells may be increased (i.e., each microwell may have an area larger than the smallest possible microwell), as in an array containing a few large microwells. Alternatively, if higher resolution is required or the tissue sample contains small cells, it may be desirable to use the largest possible number of microwells, which requires the smallest possible microwell size, as in an array containing many small microwells.

Thus, in some embodiments, an array of the present disclosure may contain at least about 2, 5, 10, 50, 100, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, or 5000 microwells. In other embodiments, arrays having more than about 5000 microwells can be prepared; such arrays are contemplated and within the scope of the present disclosure. As mentioned above, the microwell size can be reduced, which allows a greater number of microwells to fit in the same or a similar area. For example, the microwells may be contained in an area of less than about 20 cm², 10 cm², 5 cm², 1 cm², 1 mm², or 100 μm².

Depending on the size of the microwells and the area they occupy, the center-to-center spacing of the microwells of the present disclosure can be from about 50 microns to about 500 microns. In some embodiments, the center-to-center spacing of the microwells is about 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 microns.
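The relationship between center-to-center spacing and the number of microwells that fit in a given area is simple arithmetic. The Python sketch below assumes a square region and a square grid with well centers at p/2, 3p/2, and so on; the function name is illustrative, and other layouts, such as hexagonal packing, fit somewhat more wells per area at the same spacing.

```python
def square_grid_count(side_um: float, pitch_um: float) -> int:
    """Microwells in a square region of side side_um on a square grid of pitch pitch_um.

    Centers sit at pitch/2, 3*pitch/2, ... so each axis holds floor(side/pitch) wells.
    """
    per_axis = int(side_um // pitch_um)
    return per_axis * per_axis


for pitch in (50, 100, 250, 500):
    # A 1 cm x 1 cm region is 10,000 um on a side.
    print(f"pitch {pitch} um -> {square_grid_count(10_000, pitch)} wells per cm^2")
```

This prints 40000, 10000, 1600, and 400 wells per cm² for the four pitches. For instance, at a 100 µm pitch, about 5000 microwells would occupy roughly 0.5 cm² in this layout.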

Microwells of the present disclosure may have any desired shape, including but not limited to stacked planar triangles, squares, pentagons, or hexagons (i.e., prisms with those cross-sections), or cylinders. In some embodiments, the microwells are triangular. In some embodiments, the microwells are square. In some embodiments, the microwells have a pentagonal horizontal cross-section. In some embodiments, the microwells are hexagonal. In some embodiments, the microwells are cylindrical with a circular bottom.

As shown in the Figures, in some embodiments the microwells of the present disclosure have a three-dimensional structure rather than a two-dimensional planar one. In some embodiments, the microwells have a depth of about 5, 10, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 μm. In other embodiments, depending on the application and the tissue sample, arrays with microwells deeper than about 500 μm can be prepared; such arrays are contemplated and within the scope of the present disclosure. In some embodiments, the depth is from about 1 μm to about 1000 μm.
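For a given shape, opening size, and depth, the volume of a single microwell follows from elementary geometry. The Python helpers below are an illustrative sketch; the function names and example dimensions are assumptions, the cylinder is assumed to have a flat circular bottom, and the unit conversion uses 1 nL = 10⁶ µm³ (1 µL = 10⁹ µm³).

```python
import math

UM3_PER_NL = 1e6  # 1 nanoliter = 1e6 cubic micrometers


def cylinder_volume_um3(diameter_um: float, depth_um: float) -> float:
    """Volume of a cylindrical well with a flat circular bottom."""
    return math.pi * (diameter_um / 2) ** 2 * depth_um


def square_prism_volume_um3(side_um: float, depth_um: float) -> float:
    """Volume of a well with a square horizontal cross-section."""
    return side_um**2 * depth_um


def hexagonal_prism_volume_um3(flat_to_flat_um: float, depth_um: float) -> float:
    """Volume of a regular hexagonal well; area = (sqrt(3)/2) * w**2 for flat-to-flat width w."""
    return (math.sqrt(3) / 2) * flat_to_flat_um**2 * depth_um


# Example: wells 100 um wide and 100 um deep in each shape, reported in nanoliters.
for name, vol in [
    ("cylinder", cylinder_volume_um3(100, 100)),
    ("square", square_prism_volume_um3(100, 100)),
    ("hexagon", hexagonal_prism_volume_um3(100, 100)),
]:
    print(f"{name}: {vol / UM3_PER_NL:.3f} nL")
```

For the example dimensions this prints 0.785 nL, 1.000 nL, and 0.866 nL, respectively; volume scales linearly with depth and with the square of the opening width.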

Arrays or microwell arrays according to the present disclosure may be fabricated from any suitable material known to those skilled in the art. Fabricating a microwell array typically requires a positive (male) mold and a negative (female) mold. In some embodiments, a negative mold that serves as an inverse template of the microwells can be fabricated using, for example, a silicon wafer bearing microwells. The resulting negative mold is then used to form microwells of the desired size, shape, and spacing on a solid support, such as a glass, plastic, or silicon chip or slide. A non-limiting example of microwell array fabrication is provided in the Examples below and illustrated in FIG. 3.

A multiwell plate according to the present disclosure contains, by definition, a plurality of wells. In some embodiments, a multiwell plate of the present disclosure contains about 4, 16, 32, 48, 96, 192, 384, 768, or 1536 wells. In other embodiments, multiwell plates having more than about 1536 wells may be used; such plates are contemplated and within the scope of the present disclosure. In some embodiments, the multiwell plate of the present disclosure is a microplate or microtiter plate.

As with the microwells described above, each well of a multiwell plate can be defined as a region or distinct location on the plate at which a single species of cellular indexing primer is immobilized. Each well therefore contains a plurality of cellular indexing primer molecules of the same species. It will be understood in this context that, although each cellular indexing primer of the same species may have the same sequence, this is not necessarily the case. Each species of cellular indexing primer has the same cellular barcode domain (i.e., every member of a species, and thus every primer in a well, is identically "labeled"), but the sequences of the individual members within a well may differ. As noted above, the cellular indexing primer may comprise a universal domain that is directly or indirectly adjacent to the cellular barcode domain. Thus, the cellular indexing primers within a particular well may contain different intervening sequences between the cellular barcode domain and the universal domain.
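Because each well of the multiwell plate carries one species of cellular barcode, a plate layout amounts to a one-to-one map from well names to barcodes. The Python sketch below builds that map for 96-, 384-, and 1536-well formats using the common row-letter/column-number naming (A1 to H12 for 96 wells, with rows beyond Z continuing as AA, AB, ...); the function names and the row-major assignment order are assumptions.

```python
import string

# (rows, columns) for common plate formats
PLATE_SHAPES = {96: (8, 12), 384: (16, 24), 1536: (32, 48)}


def row_label(index: int) -> str:
    """Row name: A..Z, then AA, AB, ... for the extra rows of a 1536-well plate."""
    letters = string.ascii_uppercase
    return letters[index] if index < 26 else "A" + letters[index - 26]


def well_ids(n_wells: int) -> list[str]:
    """Well names in row-major order, e.g. A1, A2, ..., H12 for a 96-well plate."""
    rows, cols = PLATE_SHAPES[n_wells]
    return [f"{row_label(r)}{c + 1}" for r in range(rows) for c in range(cols)]


def assign_barcodes(barcodes: list[str], n_wells: int = 96) -> dict[str, str]:
    """Map each well to a distinct cellular barcode (one barcode species per well)."""
    ids = well_ids(n_wells)
    if len(set(barcodes)) != len(barcodes):
        raise ValueError("cellular barcodes must be unique")
    if len(barcodes) < len(ids):
        raise ValueError(f"need at least {len(ids)} barcodes, got {len(barcodes)}")
    return dict(zip(ids, barcodes))
```

Rejecting duplicate barcodes up front matters because two wells sharing a barcode would make their cells indistinguishable after pooling.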

Spatial indexing primers and cellular indexing primers may be attached to the microwells of the array or the wells of the multiwell plate, respectively, by any suitable means. In some embodiments, they are attached by chemical immobilization, that is, an interaction based on a chemical reaction between the substrate (support material) of the array or plate and the spatial or cellular indexing primer. Such chemical reactions generally do not rely on an input of energy from heat or light, but can be enhanced by applying heat (e.g., an optimal temperature for the reaction) or light of certain wavelengths. For example, chemical immobilization can occur between a functional group on the substrate and a corresponding functional element on the spatial or cellular indexing primer. Such a functional element may be an intrinsic chemical group of the primer, such as a hydroxyl group, or may be introduced separately. One example of such a functional group is an amine group. Typically, the spatial or cellular indexing primer to be immobilized contains a functional amine group or is chemically modified to contain one. Means and methods for such chemical modification are well known.

The position of such a functional group within the spatial or cellular indexing primer to be immobilized can be used to control and shape the binding behavior and/or orientation of the primer; for example, the functional group can be placed at the 5' or 3' end of the primer or within its sequence. A typical substrate for immobilizing spatial or cellular indexing primers comprises moieties capable of binding such primers, for example amine-functionalized nucleic acids. Examples of such substrates are carboxyl, aldehyde, and epoxy substrates, which are known to those skilled in the art. Functional groups that mediate a coupling reaction between the array substrate and primers rendered chemically reactive by the introduction of amine groups are likewise known to those skilled in the art.

Alternative substrates on which spatial or cellular indexing primers can be immobilized may need to be chemically activated, for example by activating functional groups available on the array or plate substrate. The term "activated substrate" refers to a material in which interactive or reactive chemical functional groups have been established or enabled by chemical modification procedures known to those skilled in the art. For example, substrates containing carboxyl groups must be activated before use. In addition, some available substrates contain functional groups that can react with specific moieties already present in the nucleic acid primers.

Typically, the substrate is a solid support, which allows accurate and traceable positioning of the nucleic acid primers on the substrate. One example of a substrate is a solid material comprising functional chemical groups, such as amine groups or amine-functionalized groups. Substrates contemplated by the present invention are non-porous substrates. Preferred non-porous substrates are glass, silicon, poly-L-lysine-coated materials, nitrocellulose, polystyrene, cyclic olefin copolymer (COC), cyclic olefin polymer (COP), polypropylene, polyethylene, and polycarbonate.

Any suitable material known to those skilled in the art may be used; glass or polystyrene is commonly used. Polystyrene is a hydrophobic material suitable for binding negatively charged macromolecules because it usually contains few hydrophilic groups. For nucleic acids immobilized on glass slides, it is also known that increasing the hydrophobicity of the glass surface increases nucleic acid immobilization, which can allow relatively denser packing. In addition to coating or surface treatment with poly-L-lysine, substrates, particularly glass, can be treated by silanization (for example with epoxy-silane or amino-silane), by silylation, or by treatment with polyacrylamide.

Tissue samples from any organism can be used in the methods of the present disclosure. The arrays of the present disclosure allow the capture of any nucleic acid, such as an mRNA molecule, that is present in the cells of a sample and is capable of transcription and/or translation. The arrays and methods are particularly useful for isolating and analyzing the transcriptome of cells in a sample where spatial resolution of the transcriptome is desirable, for example where cells are interconnected or in direct contact with multiple other cells. However, it will be clear to those skilled in the art that the methods can also be used to analyze the transcriptomes of different cells or cell types within a sample even if the cells do not interact directly, as in a blood sample. In other words, the cells need not be present in a tissue environment and can be applied to the array as single cells (e.g., cells isolated from non-fixed tissue). Although such single cells are not necessarily anchored at a particular location in a tissue, they are still applied to a particular location on the array and can be individually identified. Thus, when cells that do not interact directly or are not present in a tissue environment are analyzed, the spatial nature of the method can be used to obtain or retrieve unique or independent spatial transcriptome information from individual cells. The present disclosure relates to a method of identifying the spatial expression of nucleic acids or proteins in a sample, the method comprising identifying interactions or binding events between the primers and/or endogenous nucleic acids in the sample.

The sample may be a harvested or biopsied tissue sample, or a cultured sample. Representative samples include clinical samples, such as whole blood or blood-derived products, blood cells, tissues, biopsies, or cultured tissues or cells, including cell suspensions. Artificial tissues can be prepared, for example, from cell suspensions (including, e.g., blood cells). The cells can be captured in a matrix (e.g., a gel matrix such as agar or agarose), which can then be sectioned in a conventional manner. Such procedures are known in the art in the context of immunohistochemistry (see, e.g., Andersson et al. 2006, J. Histochem. Cytochem. 54(12):1413-23, Epub 6 Sep 2006).

The mode of tissue preparation and the way the resulting sample is processed may affect the transcriptomic analysis performed with the disclosed methods. Moreover, different tissue samples have different physical properties, and it is well within the skill of the art to perform the manipulations needed to produce tissue samples for use in the methods of the present disclosure. It is apparent from the disclosure herein, however, that any sample preparation method may be used to obtain a tissue sample suitable for these methods. For example, any cell layer with a thickness of about one cell or less can be used. In some embodiments, the thickness of the tissue sample may be less than about 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, or 0.1 of the cell cross-section. However, because, as noted above, the present disclosure is not limited to single-cell resolution, the tissue sample need not have a thickness of one cell diameter or less; thicker tissue samples can be used if desired. For example, cryosections about 10 to about 50 μm thick can be used. In some embodiments, the sample is about 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 μm thick.

Tissue samples may be prepared in any convenient or desired manner, and the present disclosure is not limited to any particular type of tissue preparation. Fresh, frozen, fixed, or unfixed tissue can be used. Tissue samples may be fixed or embedded using any desired, convenient procedure described and known in the art; thus, any known fixative or embedding material can be used.

In one representative example of a tissue sample for use in the present disclosure, the tissue can be prepared by deep freezing at a temperature suitable for maintaining or preserving the integrity (i.e., the physical properties) of the tissue structure, for example below about -20°C, -25°C, -30°C, -40°C, -50°C, -60°C, -70°C, or -80°C. The frozen tissue sample can be sectioned, i.e., cut into thin slices, onto the array surface by any suitable means. For example, sections can be prepared using a cryostat (cryomicrotome) set at a temperature suitable for maintaining the structural integrity of the tissue sample and the chemical properties of the nucleic acids in it, e.g., below about -15°C, -20°C, or -25°C. The sample should therefore be handled so as to minimize degradation of the nucleic acids (e.g., mRNA) in the tissue. Such conditions are well established in the art, and the extent of any degradation can be monitored by nucleic acid extraction, for example total RNA extraction followed by quality analysis at various stages of tissue sample preparation.

In another representative example, the tissue can be prepared using standard, art-recognized formalin fixation and paraffin embedding (FFPE) methods. After the tissue sample has been fixed and embedded in a paraffin or resin block, it can be sectioned, i.e., cut into thin slices, and placed on the array. As noted above, other fixatives and/or embedding materials may be used.

Before the methods of the present disclosure are performed, such tissue sections must be treated to remove the embedding material, e.g., deparaffinized to remove the paraffin or resin. This can be done by any suitable method; removal of paraffin, resin, or other materials from tissue samples is well established in the art, for example by incubating the sample (on the surface of the array) in a suitable solvent such as xylene, followed by ethanol washes, e.g., about 99.5% ethanol for about 2 minutes, about 96% ethanol for about 2 minutes, and about 70% ethanol for about 2 minutes.

The thickness of the tissue sample sections used in the methods of the present disclosure may depend on the method used to prepare the sample and on the physical characteristics of the tissue. Any suitable section thickness can therefore be used. In some embodiments, the tissue sample sections are at least about 0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0, 1.5, 2, 3, 4, 5, 6, 7, 8, 9 or 10 μm thick. In other embodiments, the sections are at least about 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45 or 50 μm thick. These are only representative values, however; thicker samples, e.g., about 70 μm or 100 μm or more, can be used if desired or convenient. Typically, tissue sample sections are about 1 to about 100 μm, about 1 to about 50 μm, about 1 to about 30 μm, about 1 to about 25 μm, about 1 to about 20 μm, about 1 to about 15 μm, about 1 to about 10 μm, about 2 to about 8 μm, about 3 to about 7 μm or about 4 to about 6 μm thick, although, as noted above, thicker samples can be used.

To correlate the sequence analysis or transcriptome information obtained from each microwell of the array with a region (i.e., an area or cell) of the tissue sample, the tissue sample is oriented relative to the microwells on the array. In other words, the tissue sample is placed on the array such that the positions of the spatially indexed primers on the array can be correlated with positions in the tissue sample, making it possible to identify the position in the tissue sample to which each species of spatially indexed primer (or each microwell of the array) corresponds. This can be done using position markers present on the array, as described below. Conveniently, but not necessarily, the tissue sample can be imaged after it has been contacted with the array. Imaging can be performed before or after the nucleic acids of the tissue sample are processed, e.g., before or after the cDNA generation step of the method, in particular before or after the step of generating first-strand cDNA by reverse transcription. In some embodiments, the tissue sample is imaged before the reverse transcription step. In other embodiments, the tissue sample is imaged after its nucleic acids have been processed, e.g., after the reverse transcription step. In general, imaging can be performed at any time after the tissue sample is contacted with the array but before any step that degrades or removes the tissue sample. As mentioned above, this may depend on the tissue sample.

Advantageously, the array may include markers to facilitate orientation of the tissue sample, or an image of it, relative to the microwells of the array. Any suitable means of marking the array can be used, provided the markers are detectable when the tissue sample is imaged. For example, a molecule that produces a signal, preferably a visible signal, such as a fluorescent molecule, can be immobilized directly or indirectly on the array surface. Thus, in some embodiments, the array comprises at least two markers at different positions on the array surface. In other embodiments, more than two markers can be used, e.g., at least about 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 30, 40, 50, 60, 70, 80, 90 or 100 markers; hundreds or even thousands of markers can conveniently be used. The markers can be provided in a pattern, e.g., forming the outer edge of the array, such as an entire outer row of microwells. Other informative patterns, such as lines dividing the array, can also be used. Such patterns can help align an image of the tissue sample with the array, or more generally help correlate the microwells of the array with the tissue sample. The marker can also be an immobilized molecule with which a signal-generating molecule interacts to produce a signal. In some embodiments, the markers can be detected under the same imaging conditions used to visualize the tissue sample.
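The paragraph above notes that the array may carry as few as two markers. A minimal sketch of why two can suffice, assuming the tissue image and the microwell grid differ only by a uniform scale, a rotation and a translation (no lens distortion or tissue warping): treating points as complex numbers, two marker correspondences fix the transform exactly. The function name `fit_similarity` and all coordinates are hypothetical illustrations, not part of the disclosed method.

```python
def fit_similarity(pixel_pts, well_pts):
    """Return a function mapping image pixels (x, y) to microwell grid (col, row).

    Assumption: a similarity transform w = a * p + b, where the complex
    factor a encodes scale and rotation and b encodes translation.
    pixel_pts / well_pts: the (x, y) positions of the same two markers
    in the image and on the well grid, respectively.
    """
    p1, p2 = (complex(*p) for p in pixel_pts)
    w1, w2 = (complex(*w) for w in well_pts)
    if p1 == p2:
        raise ValueError("markers must be at distinct pixel positions")
    a = (w2 - w1) / (p2 - p1)  # scale + rotation
    b = w1 - a * p1            # translation

    def to_well(x, y):
        w = a * complex(x, y) + b
        # Snap to the nearest microwell on the integer grid.
        return round(w.real), round(w.imag)

    return to_well


# Hypothetical markers at wells (0, 0) and (99, 0), imaged at two pixel positions.
to_well = fit_similarity([(50, 40), (2030, 40)], [(0, 0), (99, 0)])
print(to_well(1050, 240))
```

With more than two markers, a least-squares fit over all of them would be the natural extension and would average out localization error.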

Tissue samples can be imaged using any convenient histological means known in the art, e.g., light, bright-field, dark-field, phase-contrast, fluorescence, reflection, interference or confocal microscopy, or combinations thereof. Typically, the tissue sample is stained before visualization to provide contrast between different regions (e.g., cells) of the sample. The type of stain used depends on the type of tissue and the region of the cells to be stained; such staining protocols are known in the art. In some embodiments, more than one stain can be used to visualize (image) different aspects of the tissue sample, e.g., different regions of the sample, specific cellular structures (e.g., organelles) or different cell types. In other embodiments, the tissue sample can be visualized or imaged without staining, e.g., if it already contains pigments that provide sufficient contrast or if a particular form of microscopy is used. In some embodiments, the tissue sample is visualized or imaged using fluorescence microscopy.

In some embodiments, after the step of contacting the array with the tissue sample, a gasket is used to seal the tissue sample onto the array. The gasket also provides a force sufficient to cause cells of the tissue sample to enter the microwells of the array. Depending on the size of the microwells, different numbers of cells will be forced into each individual microwell. In some embodiments, each individual microwell of the array contains from about 1 to about 100, about 1 to about 90, about 1 to about 80, about 1 to about 70, about 1 to about 60, about 1 to about 50, about 1 to about 40, about 1 to about 30, about 1 to about 20, about 1 to about 10, about 1 to about 5, or about 5 to about 10 cells.

In some embodiments, each individual microwell of the array contains an average of about 50, 40, 30, 20, 15, 10, 9, 8, 7, 6 or 5 cells. In some embodiments, each individual microwell contains an average of fewer than about 5 cells.

After the step of contacting the array with the tissue sample and allowing the cells to enter the microwells, a step of immobilizing (capturing) the hybridized nucleic acids is performed under conditions sufficient to allow hybridization between nucleic acids (e.g., mRNA) of the tissue sample and the spatially indexed primers. Immobilizing or capturing the nucleic acid involves covalently attaching the complementary strand of the hybridized nucleic acid to the spatially indexed primer (i.e., via a nucleotide bond: a phosphodiester bond between the juxtaposed 3'-hydroxyl and 5'-phosphate termini of two immediately adjacent nucleotides), thereby tagging the captured nucleic acid with the spatial barcode domain specific to the microwell in which it was captured.

In some embodiments, immobilizing a hybridized nucleic acid, e.g., a single-stranded nucleic acid, can involve extending the spatially indexed primer to produce a copy of the captured nucleic acid, e.g., producing cDNA from captured (hybridized) RNA. This refers to synthesis of the complementary strand of the hybridized nucleic acid, e.g., generating cDNA from the captured RNA template (the RNA hybridized to the capture domain of the spatially indexed primer). Thus, in the initial step of extending the spatially indexed primer, i.e., cDNA generation, the captured (hybridized) nucleic acid, e.g., RNA, serves as the template for extension in the reverse transcription step.

Reverse transcription involves synthesizing cDNA from RNA, preferably mRNA (messenger RNA), using the enzyme reverse transcriptase. The cDNA can thus be regarded as a copy of the RNA present in the cell at the time the tissue sample was taken, i.e., it represents all or some of the genes expressed in that cell at the time of isolation.

The spatially indexed primer, in particular its capture domain, acts as the primer for generating the complementary strand of the nucleic acid hybridized to it, e.g., as the primer for reverse transcription. The nucleic acid (e.g., cDNA) molecules produced by the extension (reverse transcription) reaction therefore incorporate the sequence of the spatially indexed primer; that is, the extension reaction can be regarded as a way of indirectly labeling the nucleic acids, e.g., transcripts, of the tissue sample in contact with each microwell of the array. As mentioned above, each species of spatially indexed primer comprises a spatial barcode domain (a microwell identification tag) with a sequence unique to each microwell in the array. Thus, all nucleic acid (e.g., cDNA) molecules synthesized in a particular microwell carry the same nucleic acid "tag".
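The primer layout and labeling step described above can be modeled at the sequence level. A minimal sketch, assuming a placeholder annealing domain, 8-nt barcodes generated by simple k-mer enumeration and a 20-nt poly(T) capture domain; all three values, and the helper names, are hypothetical. It shows that every cDNA primed in a microwell begins with that microwell's primer and therefore carries its spatial barcode.

```python
from itertools import islice, product

# Hypothetical values, for illustration only.
ANNEALING_DOMAIN = "GACTGCATGCTA"  # placeholder, not a real sequencing-primer site
POLY_T_LENGTH = 20                 # length of the poly(T) capture domain
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def spatial_barcodes(n_wells, length=8):
    """Return n_wells distinct barcodes by enumerating k-mers.

    Illustrative only: practical designs often also enforce a minimum
    distance between barcodes to tolerate sequencing errors.
    """
    if n_wells > 4 ** length:
        raise ValueError("barcode length too short for this many wells")
    return ["".join(k) for k in islice(product("ACGT", repeat=length), n_wells)]


def build_primers(n_wells):
    """Map microwell index -> 5'->3' primer: annealing + barcode + poly(T)."""
    return {well: ANNEALING_DOMAIN + bc + "T" * POLY_T_LENGTH
            for well, bc in enumerate(spatial_barcodes(n_wells))}


def first_strand(primer, mrna):
    """Model oligo(dT)-primed reverse transcription of a polyadenylated mRNA.

    mRNA is written 5'->3' with DNA letters. The cDNA is the primer extended
    by the reverse complement of the mRNA body upstream of the poly(A) tail.
    Simplification: rstrip also removes any terminal A's of the body itself.
    """
    body = mrna.rstrip("A")
    return primer + revcomp(body)


primers = build_primers(16)
print(first_strand(primers[0], "GGCATC" + "A" * 30))
```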

The cDNA molecules synthesized in each microwell of the array may represent genes expressed in the region or area of the tissue sample in contact with that microwell, e.g., a tissue or cell type or a group or subgroup thereof, and may further represent genes expressed under particular conditions, e.g., at a particular time, in a particular environment, at a particular developmental stage or in response to a stimulus. Thus, the cDNA in any single microwell may represent genes expressed in a single cell or, if the microwell contacts the sample at a junction between cells, genes expressed in more than one cell. Similarly, if a single cell contacts several microwells, each microwell may represent a fraction of the genes expressed in that cell.

The step of extending the spatially indexed primer, i.e., reverse transcription, can be carried out using any of the many suitable enzymes and protocols available in the art, as described in detail below. Notably, no separate primer needs to be provided for first-strand cDNA synthesis, because the capture domain of the spatially indexed primer acts as the reverse transcription primer.

After first-strand cDNA synthesis, the cells in the array are pooled using any method known in the art, e.g., centrifugation; however, the centrifugal force, or any other method used to collect the cells, should preserve the integrity of each cell. The cells so collected are then sorted into one or more multiwell plates, as described elsewhere herein, for secondary labeling. Typically, more than one cell is sorted into a single well of a multiwell plate. In some embodiments, at least about two cells are sorted into the same well. In other embodiments, more than two cells, e.g., at least about 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, 30, 40, 50, 60, 70, 80, 90 or 100 cells, are sorted into the same well. In some embodiments, each well of the multiwell plate contains about 2 to about 100, about 5 to about 80, about 10 to about 60 or about 25 to about 50 cells. In some embodiments, each well of the multiwell plate individually contains about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 cells. However, the wells of a multiwell plate need not all contain the same number of cells. As described above, each well of the multiwell plate contains a specific cell-indexing primer with a cell barcode domain that tags the cells in that well with a sequence unique to the well.
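Because all cells from one microwell share a spatial barcode, two such cells can only be told apart if they are sorted into different plate wells. A minimal sketch of that bookkeeping, assuming random sorting that fills wells sequentially with a fixed number of cells per well; the sorting model, function names and parameter values are hypothetical, not a description of FACS or MACS behavior.

```python
import random
from collections import Counter


def sort_into_plate(spatial_barcodes, n_plate_wells=96, cells_per_well=10, seed=0):
    """Randomly assign pooled cells to plate wells (hypothetical sorting model).

    spatial_barcodes: one entry per cell, giving the spatial barcode its cDNA
    acquired in the array. Returns (spatial_barcode, plate_well) pairs, i.e.
    the barcode combination each cell's cDNA will carry after cell-index PCR.
    """
    if len(spatial_barcodes) > n_plate_wells * cells_per_well:
        raise ValueError("more cells than plate capacity")
    order = list(spatial_barcodes)
    random.Random(seed).shuffle(order)
    # Fill wells in order: the first cells_per_well cells go to well 0, and so on.
    return [(bc, i // cells_per_well) for i, bc in enumerate(order)]


def ambiguous_cells(assignments):
    """Count cells whose (spatial, plate-well) pair is shared with another cell."""
    counts = Counter(assignments)
    return sum(n for n in counts.values() if n > 1)


# Three cells from the same microwell, one cell per plate well: all resolvable.
pairs = sort_into_plate(["A", "A", "A"], cells_per_well=1)
print(ambiguous_cells(pairs))
```

Under this model, fewer cells per plate well (or more plate wells) lowers the chance that two cells from the same microwell collide in one well.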

Cells can be sorted into one or more multiwell plates by any method known in the art, e.g., FACS (fluorescence-activated cell sorting) or MACS (magnetic-activated cell sorting); methods other than FACS and MACS can also be used. In some embodiments, the cells are sorted using FACS. In other embodiments, the cells are sorted using MACS.

Once the cells have been sorted into the multiwell plates, the methods of the present disclosure include a step of second-strand cDNA synthesis. In some embodiments, cDNA synthesis is performed in situ on the plate. In some embodiments, second-strand cDNA synthesis can use a template-switching method, e.g., SMART™ technology from Clontech®. SMART (Switching Mechanism At the 5' end of RNA Template) technology is well established in the art and is based on the finding that reverse transcriptases such as SuperScript® II (Invitrogen) can add one, two, three or more nucleotides to the 3' end of an extended cDNA molecule, producing a DNA/RNA hybrid with a single-stranded DNA overhang at its 3' end. In some embodiments, the overhang is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more nucleotides long. The DNA overhang can provide a target sequence to which an oligonucleotide probe can hybridize, supplying an additional template for further extension and/or amplification of the cDNA molecule. Advantageously, the oligonucleotide probe that hybridizes to the cDNA overhang contains an amplification domain sequence whose complement can be found in the cell-indexing primers. In this way, the resulting cDNA molecules can be further amplified and enriched using the cell-indexing primers while being tagged with a second, unique, well-specific barcode (i.e., the cell barcode). This approach avoids the need to ligate adapters to the 3' end of the first cDNA strand. Although template switching was originally developed for full-length mRNAs bearing a 5' cap, it was later shown to work equally well with truncated mRNAs lacking a cap. Template switching can therefore be used in the methods of the present disclosure to generate full-length and/or partial or truncated cDNA molecules. Thus, in some embodiments, second-strand synthesis uses, or is achieved by, template switching.
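The template-switching step described above can be illustrated with a sequence-level model. A minimal sketch, assuming the reverse transcriptase appends a non-templated "CCC" overhang (a common simplification; the text only says one or more nucleotides are added), that the template-switching oligo (TSO) ends in bases complementary to that overhang, and that the enzyme then copies the remainder of the TSO. The TSO handle shown is a hypothetical placeholder.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def template_switch(first_strand, tso, overhang="CCC"):
    """Model template switching on a first-strand cDNA (all sequences 5'->3').

    Assumptions: the RT appends the non-templated `overhang` to the cDNA
    3' end; the TSO's 3' end pairs with that overhang; the RT then switches
    templates and copies the TSO's 5' handle. The extended first strand
    therefore ends with the reverse complement of the handle, so a PCR
    primer containing the handle sequence can anneal there.
    """
    if not tso.endswith(revcomp(overhang)):
        raise ValueError("TSO 3' end must pair with the cDNA overhang")
    handle = tso[: len(tso) - len(overhang)]
    return first_strand + overhang + revcomp(handle)


# Hypothetical handle "AAGCAGTGG" followed by GGG, which pairs with the CCC overhang.
extended = template_switch("TTTTGATGCC", "AAGCAGTGG" + "GGG")
print(extended)
```

The complementary (second) strand then begins with the handle sequence, which is why no adapter ligation to the first strand's 3' end is needed.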

After reverse transcription, the cDNA molecules are enriched and/or amplified using the cell-indexing primers. As discussed above, each cell-indexing primer contains a cell barcode domain comprising a nucleotide sequence unique to each well of the multiwell plate. Thus, all cDNA in a given well of the plate is tagged with the same nucleotide sequence, corresponding to that well's unique cell barcode domain. Conditions for performing such PCR amplification are well known in the art.

As will be apparent from the description above, cDNA molecules from a single array synthesized by the methods of the present disclosure can contain the same annealing domain recognized by the first sequencing primer and the same annealing domain recognized by the second sequencing primer. The cDNA molecules can therefore be quantified and analyzed at scale using any sequencing platform known in the art, e.g., any next-generation sequencing technology. Thus, in some embodiments, the cDNA molecules are quantified and analyzed by Illumina sequencing, by first generating an Illumina-compatible sequencing library through tagmentation and then performing PCR amplification. The amplifiable fragments will preferably contain the barcode domains added during cDNA preparation (i.e., the spatial barcode domain and the cell barcode domain).

The sequence analysis step identifies part of the captured RNA sequence together with the sequences of the two barcode domains (i.e., the spatial barcode domain and the cell barcode domain). The sequence of the spatial barcode domain identifies the microwell in which the mRNA molecule was captured. The sequence of the captured RNA molecule can be compared with a sequence database for the organism from which the sample was derived to determine the gene to which it corresponds. By determining which region of the tissue sample was in contact with that microwell, one can determine which region of the tissue sample expresses the gene. Because a given region of the tissue sample in contact with a given microwell may contain more than one cell, the sequence of the cell barcode domain allows captured RNA molecules carrying the same spatial barcode domain to be distinguished at the cellular level. This analysis can be performed for all cDNA molecules produced by the methods of the present disclosure, yielding a spatial transcriptome of the tissue sample at single-cell resolution.
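The analysis described above amounts to grouping reads by their (spatial barcode, cell barcode) pair and tallying genes. A minimal sketch, assuming a hypothetical read layout in which read 1 starts with an 8-nt spatial barcode and read 2 starts with a 6-nt cell barcode followed by cDNA sequence; the lengths, layout and the `assign_gene` callback (a stand-in for alignment against a reference database) are all assumptions.

```python
from collections import defaultdict

SPATIAL_BC_LEN = 8  # hypothetical spatial barcode length
CELL_BC_LEN = 6     # hypothetical cell barcode length


def parse_pair(read1, read2):
    """Split a read pair under the assumed layout into (spatial, cell, insert)."""
    spatial = read1[:SPATIAL_BC_LEN]
    cell = read2[:CELL_BC_LEN]
    insert = read2[CELL_BC_LEN:]
    return spatial, cell, insert


def count_matrix(read_pairs, assign_gene):
    """Tally genes per (spatial barcode, cell barcode) pair.

    assign_gene: callable returning a gene name for the cDNA insert, or None
    if it cannot be assigned; unassigned reads are skipped.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for read1, read2 in read_pairs:
        spatial, cell, insert = parse_pair(read1, read2)
        gene = assign_gene(insert)
        if gene is not None:
            counts[(spatial, cell)][gene] += 1
    return counts


# Toy lookup standing in for alignment: match the first 6 nt of the insert.
ref = {"GATGCC": "GeneX", "TTAGGC": "GeneY"}
counts = count_matrix([("AAAAAAAATTTT", "ACGTACGATGCCAA")], lambda s: ref.get(s[:6]))
print(dict(counts[("AAAAAAAA", "ACGTAC")]))
```

Each key of the result corresponds to one cell at one array position, so mapping spatial barcodes back to microwell coordinates places each cell's expression profile on the tissue image.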

As a representative example, the sequencing data can be analyzed to sort the sequences by species of spatially indexed primer, i.e., according to the sequence of the spatial barcode domain. This can be done by sorting the sequences into separate files for the spatial barcode domains of the corresponding spatially indexed primers, e.g., using the FASTQ barcode splitter tool of the FASTX-Toolkit. The sequences of each species, i.e., from each microwell, can then be analyzed to determine the identity of the transcripts. For example, the sequences can be identified using the Blastn software to compare them with one or more genome databases, e.g., a database for the organism from which the tissue sample was obtained. The identity of the database sequence with the greatest similarity to a sequence generated by the methods of the present disclosure is assigned to that sequence. Typically, only hits with an E-value of about 1e-6, 1e-7, 1e-8 or 1e-9 or lower are considered successfully identified.
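The two steps above, splitting reads by spatial barcode and keeping only confident database hits, can be sketched in plain Python. This is an illustrative stand-in, not the FASTX-Toolkit or Blastn code: it assumes barcodes sit at the start of each read, tolerates a configurable number of mismatches (Hamming distance), sends ambiguous or unmatched reads to an "unmatched" bin, and takes hits as (read_id, subject, E-value) tuples such as one might parse from tabular search output. All names and thresholds are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read, barcodes, max_mismatches=1):
    """Return the single barcode matching the read prefix, else None."""
    hits = [bc for bc in barcodes
            if len(read) >= len(bc)
            and hamming(read[:len(bc)], bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes, max_mismatches=1):
    """Bin reads by spatial barcode; ambiguous or unmatched reads go to 'unmatched'."""
    bins = {bc: [] for bc in barcodes}
    bins["unmatched"] = []
    for read in reads:
        bc = assign_barcode(read, barcodes, max_mismatches)
        bins[bc if bc else "unmatched"].append(read)
    return bins


def best_hits(hits, max_evalue=1e-6):
    """Keep, per read, the lowest-E-value hit at or below max_evalue."""
    best = {}
    for read_id, subject, evalue in hits:
        if evalue <= max_evalue and (read_id not in best or evalue < best[read_id][1]):
            best[read_id] = (subject, evalue)
    return {read_id: subject for read_id, (subject, _) in best.items()}


barcodes = ["ACGTACGT", "TTGGCCAA"]
print(demultiplex(["ACGTACGAGGG", "GGGGGGGG"], barcodes))
print(best_hits([("r1", "GeneX", 1e-10), ("r1", "GeneY", 1e-12), ("r2", "GeneZ", 1e-3)]))
```

A stricter `max_evalue` (e.g. 1e-9) trades sensitivity for fewer spurious gene assignments, mirroring the range of thresholds given above.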

Any nucleic acid sequencing method can be used in the methods of the present disclosure; however, so-called "next-generation sequencing" technologies are particularly suitable. High-throughput sequencing is especially useful because it allows partial sequencing of very large numbers of nucleic acids in a very short time. Given the recent surge in the number of fully or partially sequenced genomes, it is not necessary to sequence the full length of the resulting cDNA molecules to determine the gene to which each molecule corresponds. For example, the first ~100 nucleotides from each end of a cDNA molecule should be sufficient to identify, at the cellular level, the microwell that captured the mRNA (i.e., its position on the array) and the expressed gene.

As a representative example, the sequencing reaction can be based on reversible dye terminators, such as those used in Illumina™ technology. For example, DNA molecules are first attached to primers on, e.g., a glass or silicon slide and amplified to form local clonal colonies (bridge amplification). Four types of fluorescently labeled, reversibly terminated nucleotides are added, and unincorporated nucleotides are washed away. Unlike in pyrosequencing, the DNA is extended by only one nucleotide per cycle. A camera captures images of the fluorescently labeled nucleotides, and the dye, together with the terminal 3' blocker, is then chemically removed from the DNA so that the next cycle can proceed. This is repeated until the required sequence data are obtained. Using this technology, thousands of nucleic acids can be sequenced simultaneously on a single slide.
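Because each cycle above adds exactly one labeled nucleotide per cluster, base calling can be pictured as picking the brightest of four channels in every cycle. A toy sketch under that assumption (one channel per nucleotide, no cross-talk, phasing or noise correction, all of which real base callers handle); the intensity values and the crude confidence score are hypothetical.

```python
BASES = "ACGT"  # assumed channel order: A, C, G, T


def call_bases(cycles):
    """Call one base per cycle from four-channel intensities.

    cycles: list of 4-tuples (A, C, G, T intensities), one per cycle.
    Returns the called sequence and a crude per-base confidence equal to
    the winning channel's share of the total signal.
    """
    seq, conf = [], []
    for intensities in cycles:
        total = sum(intensities)
        best = max(range(4), key=lambda i: intensities[i])
        seq.append(BASES[best])
        conf.append(intensities[best] / total if total else 0.0)
    return "".join(seq), conf


seq, conf = call_bases([(9, 1, 0, 0), (0, 0, 8, 2), (1, 1, 1, 7)])
print(seq)
```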

Other high-throughput sequencing techniques, such as pyrosequencing, are equally suitable for the methods of the present disclosure. In this approach, DNA is amplified inside water droplets in an oil solution (emulsion PCR); each droplet contains a single DNA template attached to a single primer-coated bead, which then forms a clonal colony. The sequencer contains many picoliter-volume wells, each holding a single bead and the sequencing enzymes. Pyrosequencing uses luciferase to generate light for detecting the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence reads.
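In pyrosequencing as described above, one nucleotide type is supplied at a time, and the light emitted scales roughly with the number of identical bases incorporated in that flow. A minimal sketch of turning such flow signals into a sequence, assuming a fixed cyclic flow order ("TACG" is chosen here as an example) and simple rounding of each signal to a homopolymer length; real base callers use noise models rather than plain rounding.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert per-flow light signals into a base sequence.

    Flow i supplies nucleotide flow_order[i % len(flow_order)]; its signal
    is rounded to the number of identical bases incorporated (0 means no
    incorporation in that flow).
    """
    bases = []
    for i, signal in enumerate(signals):
        n = max(int(round(signal)), 0)
        bases.append(flow_order[i % len(flow_order)] * n)
    return "".join(bases)


print(decode_flowgram([1.05, 0.1, 2.1, 0.9]))
```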

New sequencing formats are gradually becoming available, with shorter run times as one of their main features, and such other sequencing technologies can also be used in the methods of the present disclosure.

As noted above, an essential feature of the present disclosure is that any method disclosed herein includes a step of immobilizing the complementary strand of a captured RNA molecule to the spatially indexed primer, e.g., by reverse transcription of the captured RNA molecule. Reverse transcription reactions are well known in the art; in a representative reverse transcription reaction, the reaction mixture includes a reverse transcriptase, dNTPs and a suitable buffer, and may contain other components such as an RNase inhibitor. The primer is the capture domain of the spatially indexed primer and the template is the captured RNA molecule, as described above. In the subject methods, each dNTP is typically present at about 10 to about 5000 μM, usually about 20 to about 1000 μM.
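Setting up the dNTP concentrations above is a C1·V1 = C2·V2 dilution. A minimal sketch, assuming a hypothetical 10 mM stock of each dNTP and a 20 µL reaction; the function names are illustrative, and the range check simply encodes the approximately 10 to 5000 µM span stated above.

```python
def stock_volume_ul(final_uM, final_volume_ul, stock_mM):
    """Volume of stock (uL) giving final_uM in final_volume_ul (C1*V1 = C2*V2)."""
    stock_uM = stock_mM * 1000
    if final_uM > stock_uM:
        raise ValueError("final concentration exceeds stock concentration")
    return final_uM * final_volume_ul / stock_uM


def in_typical_range(final_uM, low=10, high=5000):
    """True if a per-dNTP concentration falls in the ~10-5000 uM range."""
    return low <= final_uM <= high


# 500 uM of each dNTP in a 20 uL reaction from a 10 mM stock.
print(stock_volume_ul(500, 20, 10))
```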

The required reverse transcriptase activity can be provided by one or more different enzymes, suitable examples of which include M-MLV, MuLV, AMV, HIV, ArrayScript™, MultiScribe™, ThermoScript™ and SuperScript® I, II and III enzymes.

The reverse transcription reaction can be carried out at any suitable temperature, depending on the properties of the enzyme. Typically, it is performed at about 37 to about 55°C, although temperatures outside this range may also be suitable. The reaction time can be as short as about 1, 2, 3, 4 or 5 minutes or as long as about 48 hours. Typically, depending on the choice, the reaction runs for about 5 to about 120 minutes, e.g., about 5 to about 60, about 5 to about 45, about 5 to about 30, about 1 to about 10 or about 1 to about 5 minutes. The reaction time is not critical, and any desired reaction time can be used.

As indicated above, certain embodiments of the methods include an amplification step in which the copy number of the generated cDNA molecules is increased, e.g., to enrich the sample and thereby obtain a better representation of the transcripts captured from the tissue sample. Amplification can be linear or exponential, as desired; representative amplification protocols of interest include, but are not limited to, the polymerase chain reaction (PCR) and isothermal amplification.

In preparing the reverse transcription, DNA extension or amplification reaction mixtures for the steps of the subject methods, the various components can be combined in any convenient order. For example, in an amplification reaction, the buffer can be combined with the primers, then the polymerase and then the template DNA, or all of the components can be combined at the same time to produce the reaction mixture.

As a representative example, any method of the present disclosure may include the following steps:

(a) contacting an array with a tissue sample, wherein the array comprises a substrate on which a plurality of species of spatially indexed primers are disposed directly, such that each species occupies a different position on the array and is oriented to have a free 3' end, and wherein each species of spatially indexed primer comprises a nucleic acid molecule comprising, from 5' to 3':

i) an annealing domain comprising a nucleotide sequence recognized by a first sequencing primer;

ii) a spatial barcode domain comprising a nucleotide sequence unique to each microwell; and

iii) a capture domain comprising a polythymidine sequence;

such that one or more nucleic acid sequences of the tissue sample hybridize to the spatially indexed primers;

(b) imaging the tissue sample on the array;

(c) reverse transcribing the captured mRNA molecules to generate cDNA molecules;

(d) pooling the cells from the array and sorting them into one or more 96-well plates;

(e) lysing the cells and performing second-strand cDNA synthesis to incorporate a 5' PCR handle by template switching;

(f) amplifying the cDNA molecules to incorporate a cell index primer into each cDNA molecule, each cell index primer comprising a nucleic acid molecule comprising, from 5' to 3':

i) an annealing domain comprising a nucleotide sequence recognized by a second sequencing primer; and

ii) a cellular barcode domain comprising a nucleotide sequence unique to each well of the 96-well plate;

and

(g) analyzing the sequence and/or position of the cDNA molecules (e.g., by sequencing).
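The primer architecture in step (a) can be illustrated with a short sketch that assembles a spatially indexed primer from its three domains in 5' to 3' order. This is not the authors' implementation: the domain sequences below are hypothetical placeholders (the text does not give them), and the 18-nt poly(dT) default is taken from the oligo(dT)18 primers described in Example 1. The UMI and PCR handle mentioned in Example 3 are left out to mirror the three-domain structure recited here.

```python
from dataclasses import dataclass

VALID_BASES = set("ACGT")


@dataclass(frozen=True)
class SpatialIndexPrimer:
    """Spatially indexed primer with the three domains listed in step (a)."""

    annealing_domain: str   # recognized by the first sequencing primer (placeholder)
    spatial_barcode: str    # unique to one microwell (placeholder)
    polyt_length: int = 18  # capture domain length, as in oligo(dT)18

    def sequence(self) -> str:
        """Return the full primer sequence written 5' -> 3'."""
        for name, seq in (("annealing_domain", self.annealing_domain),
                          ("spatial_barcode", self.spatial_barcode)):
            if not seq or not set(seq) <= VALID_BASES:
                raise ValueError(f"{name} must be a non-empty A/C/G/T string")
        if self.polyt_length < 1:
            raise ValueError("polyt_length must be positive")
        # 5' annealing domain, then the spatial barcode, then the 3' poly(dT)
        # capture domain, whose free 3' end can prime reverse transcription.
        return self.annealing_domain + self.spatial_barcode + "T" * self.polyt_length


if __name__ == "__main__":
    primer = SpatialIndexPrimer(annealing_domain="GTCAGTCAGTCAGTCAGTCA",  # hypothetical
                                spatial_barcode="AACCGGTTACGTACGT")        # hypothetical 16-mer
    print(primer.sequence(), len(primer.sequence()))
```

The cell index primer of step (f) could be modeled the same way with only two domains (annealing domain and cellular barcode).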

The present disclosure encompasses any suitable combination of the steps of the methods described above. It should be understood that the present disclosure also encompasses variations of these methods, for example, in which amplification is performed in situ on a plate. Methods that omit the imaging step are also contemplated.

The present disclosure also relates to a method of capturing mRNA from a tissue sample contacted with the array, or a method of determining and/or analyzing the (e.g., partial or complete) transcriptome of a tissue sample, the method comprising immobilizing a plurality of species of spatially indexed primers on an array substrate, wherein each species of spatially indexed primer comprises a nucleic acid molecule comprising, from 5' to 3':

i) an annealing domain comprising a nucleotide sequence recognized by a first sequencing primer;

ii) a spatial barcode domain comprising a nucleotide sequence unique to each microwell; and

iii) a capture domain comprising a polythymidine sequence.

In some embodiments, the present disclosure relates to a method of producing an array of the present disclosure such that each species of spatially indexed primer is immobilized on the array in a microwell. In some embodiments, the present disclosure relates to a method of producing an array, the method comprising immobilizing a plurality of species of spatially indexed primers on an array substrate, wherein each species of spatially indexed primer comprises a nucleic acid molecule comprising, from 5' to 3':

i) an annealing domain comprising a nucleotide sequence recognized by a first sequencing primer;

ii) a spatial barcode domain comprising a nucleotide sequence unique to each microwell; and

iii) a capture domain comprising a polythymidine sequence.
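The requirement that each spatial barcode be unique to its microwell is commonly met by choosing barcodes that also differ from one another at several positions; with a minimum Hamming distance of 3, a single substitution error in a read can be detected and corrected back to the intended barcode. The source does not describe how its 768 barcodes were designed, so the greedy random search below is a hypothetical sketch. The 16-nt length (one of the lengths recited for the spatial barcode domain), the minimum distance of 3, and the homopolymer limit are all assumptions.

```python
import itertools
import random


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max(len(list(run)) for _, run in itertools.groupby(seq))


def design_barcodes(n: int, length: int = 16, min_distance: int = 3,
                    max_run: int = 3, seed: int = 0,
                    max_attempts: int = 1_000_000) -> list[str]:
    """Greedily draw random barcodes, keeping each one only if it is at least
    `min_distance` substitutions away from every barcode already accepted."""
    rng = random.Random(seed)
    accepted: list[str] = []
    for _ in range(max_attempts):
        if len(accepted) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if longest_homopolymer(candidate) > max_run:
            continue  # assumed design rule: avoid long single-base runs
        if all(hamming(candidate, bc) >= min_distance for bc in accepted):
            accepted.append(candidate)
    if len(accepted) < n:
        raise RuntimeError(f"only found {len(accepted)} of {n} barcodes")
    return accepted


if __name__ == "__main__":
    microwell_barcodes = design_barcodes(768)  # one per microwell, as in Example 2
    print(len(microwell_barcodes), microwell_barcodes[:3])
```

A greedy search is simple but not optimal; published barcode sets are often built with more careful constructions, so this sketch only illustrates the uniqueness and distance constraints.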

The present disclosure may also relate to methods of manufacturing or producing a multiwell plate for determining and/or analyzing the (e.g., partial or complete) transcriptome of a tissue sample, the methods comprising immobilizing a plurality of species of cell index primers directly or indirectly on a multiwell plate substrate, wherein each species of cell index primer comprises a nucleic acid molecule comprising, from 5' to 3':

i) an annealing domain comprising a nucleotide sequence recognized by a second sequencing primer; and

ii) a cellular barcode domain comprising a nucleotide sequence unique to each well of the multiwell plate.

The method of producing a multiwell plate of the present disclosure can be further defined such that each species of cell index primer is immobilized on the plate in a well.
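Because each cell barcode is tied to one well of the multiwell plate, downstream analysis needs a consistent mapping between well labels and barcode indices. The row-major numbering (A01, A02, ..., H12) and the helper names below are assumptions for illustration; the source does not specify how wells are numbered.

```python
import string


def well_name(index: int, rows: int = 8, cols: int = 12) -> str:
    """Convert a 0-based, row-major well index to a label such as 'B07'."""
    if not 0 <= index < rows * cols:
        raise ValueError(f"index {index} outside a {rows}x{cols} plate")
    row, col = divmod(index, cols)
    return f"{string.ascii_uppercase[row]}{col + 1:02d}"


def well_index(name: str, rows: int = 8, cols: int = 12) -> int:
    """Inverse of well_name: 'B07' -> 18 on a 96-well plate."""
    row = string.ascii_uppercase.index(name[0].upper())
    col = int(name[1:]) - 1
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError(f"well {name!r} outside a {rows}x{cols} plate")
    return row * cols + col


def plate_barcode_map(barcodes: list[str], rows: int = 8, cols: int = 12) -> dict[str, str]:
    """Assign one cell barcode to each well, in row-major order."""
    if len(barcodes) != rows * cols:
        raise ValueError("need exactly one barcode per well")
    return {well_name(i, rows, cols): bc for i, bc in enumerate(barcodes)}
```

The same helpers cover a 384-well plate by passing `rows=16, cols=24`.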

Immobilizing the spatially indexed primers on the array, or the cell index primers on the plate, can be accomplished by any suitable means described herein. Where the spatially indexed primers or cell index primers are immobilized indirectly on the array or plate, respectively, they can be synthesized on the array or plate. For example, the spatially indexed primers or cell index primers can be synthesized directly on the array or plate, respectively, using an automated dispensing system (e.g., a Scienion sciFLEXARRAYER S3 printer).

The sequence analysis (e.g., sequencing) information obtained in step (g) can be used to obtain spatial information about the nucleic acids in the sample at the cellular level. In other words, the sequence analysis information can indicate, at single-cell resolution, where the nucleic acids are located in the tissue sample. This spatial information may derive from the nature of the sequence analysis information obtained, e.g., from a determined or identified sequence; for example, it may reveal the presence of a particular nucleic acid molecule that itself provides spatial information in the context of the tissue sample used. Alternatively or additionally, spatial information (e.g., spatial location) can be derived from the position of the tissue sample on the array together with the sequence analysis information. However, as mentioned above, spatial information can conveniently be obtained by correlating the sequence analysis data with images of the tissue sample.

Accordingly, in some embodiments, the methods of the present disclosure further comprise the step of:

(h) correlating the sequence analysis information with an image of the tissue sample, wherein the tissue sample is imaged before or after step (b).
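Step (h) relies on every read carrying two barcodes: the spatial barcode identifies the microwell (and hence a position that can be registered to the tissue image), and the cell barcode identifies the plate well. A minimal sketch of that assignment follows. It assumes the two barcodes have already been extracted from each read and that a table of microwell centers in image coordinates exists; both are hypothetical inputs, not part of the source. Each barcode may carry one mismatch, and a sequence that is one mismatch from two different whitelist entries is discarded as ambiguous.

```python
from typing import Dict, Optional, Tuple


def build_lookup(whitelist: Dict[str, int]) -> Dict[str, int]:
    """Map each whitelisted barcode, and every sequence one substitution away
    from it, to that barcode's ID. Neighbours shared by two IDs are dropped."""
    lookup = dict(whitelist)
    ambiguous = set()
    for barcode, ident in whitelist.items():
        for i, base in enumerate(barcode):
            for alt in "ACGTN":
                if alt == base:
                    continue
                neighbour = barcode[:i] + alt + barcode[i + 1:]
                if neighbour in whitelist:
                    continue  # exact matches always win
                if neighbour in lookup and lookup[neighbour] != ident:
                    ambiguous.add(neighbour)
                else:
                    lookup[neighbour] = ident
    for neighbour in ambiguous:
        del lookup[neighbour]
    return lookup


def assign_read(spatial_bc: str, cell_bc: str,
                spatial_lookup: Dict[str, int], cell_lookup: Dict[str, int],
                well_xy: Dict[int, Tuple[float, float]]
                ) -> Optional[Tuple[int, int, Tuple[float, float]]]:
    """Return (microwell ID, plate-well ID, microwell centre) or None."""
    microwell = spatial_lookup.get(spatial_bc)
    plate_well = cell_lookup.get(cell_bc)
    if microwell is None or plate_well is None:
        return None
    # A "cell" is the (microwell, plate well) pair; its position is the
    # centre of its microwell in the registered image.
    return microwell, plate_well, well_xy[microwell]
```

Reads assigned to the same (microwell, plate well) pair can then be aggregated into one expression profile and placed at that microwell's position in the image.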

In some embodiments, the methods of the present disclosure can be used for chromatin sequencing at single-cell resolution, i.e., ATAC-seq (assay for transposase-accessible chromatin using sequencing). For this, the same microwell array is used, but instead of oligo(dT) being printed in the wells, a barcoded transposase (Tn5) is used, which tags open chromatin and allows ATAC-seq libraries to be generated.

In some embodiments, the methods of the present disclosure can be used to perform TCR-seq. Because the libraries provided by the methods of the present disclosure are generated by template switching, full-length cDNA is produced, which makes spatial single-cell TCR-seq possible. For this, the single-cell cDNA must first be spatially barcoded. TCR enrichment PCR is then performed using primers that bind the variable regions of the TCR α and β chains. These primers carry a Nextera R2 handle, allowing nested PCR with Illumina P5 primers to complete the sequencing library.

In some embodiments, the methods of the present disclosure can be used to perform cell-specific spatial transcriptomic profiling. This is possible because the methods of the present disclosure include a cell sorting step between the first and second barcoding steps. During the first barcoding step, cells can be labeled with cell-specific antibodies, and only the cells of interest are then sorted for the second barcoding step.

System

The present disclosure also relates to a system comprising one or more of the arrays disclosed herein. In some embodiments, each such array comprises one or more microwells, each microwell occupying a different position on the array and comprising any of the spatially indexed primers disclosed elsewhere herein. In some embodiments, each such spatially indexed primer comprises a nucleic acid molecule comprising, in the 5' to 3' direction:

i) an annealing domain comprising a nucleotide sequence recognized by a first sequencing primer;

ii) a spatial barcode domain comprising a nucleotide sequence unique to each microwell; and

iii) a capture domain comprising a polythymidine sequence.

In some embodiments, each array of the disclosed systems individually comprises at least about 10 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 50 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 100 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 200 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 500 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 1000 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 2000 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 4000 microwells.

In some embodiments, each array of the disclosed systems individually comprises at least about 16 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 32 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 64 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 128 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 256 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 512 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 768 microwells. In some embodiments, each array of the disclosed systems individually comprises at least about 1024 microwells.

In some embodiments, each microwell in the array of the disclosed systems is triangular. In some embodiments, each microwell in the array of the disclosed systems is square. In some embodiments, each microwell in the array of the disclosed systems is pentagonal. In some embodiments, each microwell in the array of the disclosed systems is hexagonal. In some embodiments, each microwell in the array of the disclosed systems is circular.

In some embodiments, the depth of each microwell in the array of the disclosed systems is from about 25 μm to about 800 μm. In some embodiments, the depth of each microwell in the array of the disclosed systems is from about 1 μm to about 1000 μm. In some embodiments, the depth of each microwell in the array of the disclosed systems is from about 50 μm to about 500 μm. In some embodiments, the depth of each microwell in the array of the disclosed systems is from about 75 μm to about 250 μm. In some embodiments, the depth of each microwell in the array of the disclosed systems is about 5 μm, about 10 μm, about 50 μm, about 100 μm, about 150 μm, about 200 μm, about 250 μm, about 300 μm, about 350 μm, about 400 μm, about 450 μm, about 500 μm, or about 1000 μm. In some embodiments, the depth of each microwell in the array of the disclosed systems is about 400 μm.

In some embodiments, the microwells in the array of the disclosed systems have a center-to-center spacing of about 50 μm to about 500 μm. In some embodiments, the microwells have a center-to-center spacing of about 50 μm. In some embodiments, the microwells have a center-to-center spacing of about 100 μm. In some embodiments, the microwells have a center-to-center spacing of about 150 μm. In some embodiments, the microwells have a center-to-center spacing of about 200 μm. In some embodiments, the microwells have a center-to-center spacing of about 250 μm. In some embodiments, the microwells have a center-to-center spacing of about 300 μm. In some embodiments, the microwells have a center-to-center spacing of about 350 μm. In some embodiments, the microwells have a center-to-center spacing of about 400 μm. In some embodiments, the microwells have a center-to-center spacing of about 450 μm. In some embodiments, the microwells have a center-to-center spacing of about 500 μm.
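The center-to-center spacings above set the array geometry. For the hexagonal wells used in the Examples, a hexagonally packed layout places every well at the same distance from its six neighbors. The sketch below generates such centers using offset rows; the 32 x 24 layout in the usage line is a hypothetical way to reach 768 wells at a 500 μm pitch and is not taken from the source.

```python
import math


def hex_centers(n_rows: int, n_cols: int, pitch_um: float = 500.0) -> list[tuple[float, float]]:
    """Well centres (x, y) in micrometres for a hexagonally packed array.

    Odd rows are shifted right by half a pitch and rows are pitch * sqrt(3) / 2
    apart, so every nearest neighbour is exactly `pitch_um` away.
    """
    row_spacing = pitch_um * math.sqrt(3) / 2
    centres = []
    for r in range(n_rows):
        x_offset = pitch_um / 2 if r % 2 else 0.0
        for c in range(n_cols):
            centres.append((x_offset + c * pitch_um, r * row_spacing))
    return centres


if __name__ == "__main__":
    centres = hex_centers(32, 24)  # hypothetical 768-well layout at 500 um pitch
    xs, ys = zip(*centres)
    print(len(centres), f"span {max(xs) / 1000:.2f} mm x {max(ys) / 1000:.2f} mm")
```

With these assumed parameters the well centers span about 11.75 mm by 13.42 mm.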

In some embodiments, the disclosed systems further comprise one or more of the multiwell plates disclosed herein. In some embodiments, each multiwell plate comprises one or more wells, each well occupying a different position on the multiwell plate and comprising any one or more of the cell index primers disclosed herein. In some embodiments, each such cell index primer comprises a nucleic acid molecule comprising, from 5' to 3':

i) an annealing domain comprising a nucleotide sequence recognized by a second sequencing primer; and

ii) a cellular barcode domain comprising a nucleotide sequence unique to each well of the multiwell plate.

In some embodiments, each multiwell plate of the disclosed systems individually comprises about 24 wells. In some embodiments, each multiwell plate of the disclosed systems individually comprises about 48 wells. In some embodiments, each multiwell plate of the disclosed systems individually comprises about 96 wells. In some embodiments, each multiwell plate of the disclosed systems individually comprises about 192 wells. In some embodiments, each multiwell plate of the disclosed systems individually comprises about 384 wells. In some embodiments, each multiwell plate of the disclosed systems individually comprises about 768 wells.

In some embodiments, the spatial barcode domains of the disclosed systems each comprise from about 8 to about 50 nucleotides. In some embodiments, the spatial barcode domains of the disclosed systems each comprise from about 9 to about 40 nucleotides. In some embodiments, the spatial barcode domains of the disclosed systems each comprise from about 10 to about 30 nucleotides. In some embodiments, the spatial barcode domains of the disclosed systems each comprise from about 12 to about 25 nucleotides. In some embodiments, the spatial barcode domains of the disclosed systems each comprise about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 35, about 40, about 45, or about 50 nucleotides. In some embodiments, the spatial barcode domains of the disclosed systems each comprise about 16 nucleotides.

In some embodiments, the polythymidine sequence in each capture domain of the disclosed systems comprises from about 8 to about 50 deoxythymidine residues. In some embodiments, the polythymidine sequence in each capture domain of the disclosed systems comprises from about 9 to about 40 deoxythymidine residues. In some embodiments, the polythymidine sequence in each capture domain of the disclosed systems comprises from about 10 to about 30 deoxythymidine residues. In some embodiments, the polythymidine sequence in each capture domain of the disclosed systems comprises from about 12 to about 25 deoxythymidine residues. In some embodiments, the polythymidine sequence in each capture domain of the disclosed systems comprises about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 35, about 40, about 45, or about 50 deoxythymidine residues. In some embodiments, the polythymidine sequence in each capture domain of the disclosed systems comprises about 18 deoxythymidine residues.

In some embodiments, the cellular barcode domains of the disclosed systems each comprise from about 8 to about 50 nucleotides. In some embodiments, the cellular barcode domains of the disclosed systems each comprise from about 9 to about 40 nucleotides. In some embodiments, the cellular barcode domains of the disclosed systems each comprise from about 10 to about 30 nucleotides. In some embodiments, the cellular barcode domains of the disclosed systems each comprise from about 12 to about 25 nucleotides. In some embodiments, the cellular barcode domains of the disclosed systems each comprise about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 35, about 40, about 45, or about 50 nucleotides. In some embodiments, the cellular barcode domains of the disclosed systems each comprise about 16 nucleotides.

In some embodiments, the disclosed systems further comprise one or more gaskets. When placed on top of the sectioned tissue, such gaskets can be used to force cells in the sectioned tissue into the microwells of the disclosed arrays. The gaskets can be made of any suitable material. In some embodiments, the gaskets of the disclosed systems are made of silicone. In some embodiments, the disclosed systems further comprise materials and reagents suitable for tissue digestion. In some embodiments, the disclosed systems further comprise materials and reagents suitable for permeabilization. In some embodiments, the disclosed systems further comprise materials and reagents suitable for reverse transcription (RT). In some embodiments, the disclosed systems are provided as a kit with instructions for suitable operating parameters in the form of a label or product insert.

Aspects and embodiments of the present disclosure will now be illustrated by way of example with reference to the accompanying tables and drawings. Additional aspects and embodiments will be apparent to those skilled in the art. All documents mentioned herein are incorporated by reference in their entirety.

Examples

Example 1: Overview of the method

XYZeq uses a modified combinatorial indexing approach similar to the methods published in 2017 as sci-RNA-seq (single-cell combinatorial indexing RNA sequencing; 23) and SPLiT-seq (split-pool ligation-based transcriptome sequencing; 24). Briefly, arrays of 500 μm hexagonal wells were fabricated from Norland Optical Adhesive 81 (NOA81) on standard histology slides, using polydimethylsiloxane (PDMS) molds as templates. Each well was then spotted with a spatially defined, barcoded oligo(dT)18 primer and dried.

On the day of the experiment, the well array slides were spotted with a mixture of tissue digestion, permeabilization, and reverse transcription (RT) reagents and then overlaid with fixed frozen tissue sections. The arrays were clamped with silicone gaskets and placed in a slide microarray hybridization chamber (Agilent G2534A) to keep the microwells sealed during the short in situ RT reaction. After the reaction, the array slide was removed and placed in a 50 ml conical tube filled with 1X SSC buffer containing 10% FCS. The tube containing the slide was vortexed for 15 seconds to dislodge the cells from the wells and then centrifuged at 700 rcf for 10 minutes to pellet the cells. After approximately 1-2 ml was removed from the 50 ml conical tube, the cells were filtered through a 70 μm cell strainer and stained with antibodies, and 25-50 cells were sorted into each well of a 96-well plate containing 5 μl of a second RT mix. At this point, the DTT included in the second RT mix lysed the cells, and a standard 1.5-hour reverse transcription and template-switching reaction was performed at 42 °C, followed by PCR with barcoded Illumina P5 primers for the second round of indexing. The barcoded cDNA from all wells was pooled into a single 2 ml tube, then cleaned up and concentrated using solid-phase reversible immobilization (SPRI) beads. The cDNA was eluted in 15 μl, quantified, and checked for an appropriate size distribution. An Illumina-compatible sequencing library was then generated from the cDNA by tagmentation followed by PCR, such that both combinatorial barcodes are retained on the sequenced fragments.

Example 2: Fabrication of the microwell array chip for XYZeq

Array fabrication for XYZeq involves designing and fabricating a positive (master) mold and producing a PDMS negative mold from it. For the positive mold, the microwell array was designed as a hexagonal packing of 500 μm wells (measured center to center) spaced 10 μm apart. The array design includes corner fiducial markers for precise alignment and for reagent dispensing with the Scienion sciFLEXARRAYER S3. UV photomasks for the microwell design were obtained from CAD/Art Services (Bandon, Oregon). A 100 mm silicon wafer was spin-coated with SU-8 2150 photoresist at 2000 rpm for 30 seconds, soft-baked at 95 °C for 2 hours, exposed to UV through the mask for 30 minutes, post-exposure baked at 95 °C for 20 minutes, and then developed for 1 hour.

The PDMS negative mold was produced as follows. PDMS (Sylgard 184) has two liquid components: component A is the base, and component B is the curing agent. Using a weighing scale, add 30 g of component A to a 100 mm Petri dish, then add component B at a 1:10 ratio to component A (i.e., 3 g). Mix the two components with a plastic swab. Place the silicon wafer positive mold in the Petri dish and degas in a vacuum desiccator for 30 minutes to 1 hour until no air bubbles remain. Centrifuge the Petri dish containing the wafer at 1000 rcf for 10 minutes so that the wafer settles to the bottom and any remaining air bubbles are removed. Cure the PDMS overnight in a 70 °C oven. Peel the PDMS off the wafer and cut out the mold with a razor blade.
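The mixing step is simple arithmetic, and a small helper makes the 10:1 base-to-curing-agent ratio and the resulting layer thickness explicit. The PDMS density (about 1.03 g/cm³) and the 90 mm usable inner diameter of a nominal 100 mm Petri dish are assumptions, and the estimate ignores the volume displaced by the wafer.

```python
import math


def curing_agent_mass(base_g: float, base_to_agent_ratio: float = 10.0) -> float:
    """Mass of component B (curing agent) for a given mass of component A (base)."""
    if base_g <= 0 or base_to_agent_ratio <= 0:
        raise ValueError("masses and ratios must be positive")
    return base_g / base_to_agent_ratio


def layer_thickness_mm(total_g: float, dish_inner_diameter_mm: float = 90.0,
                       density_g_per_cm3: float = 1.03) -> float:
    """Approximate PDMS layer thickness, treating the dish floor as a flat disc."""
    volume_cm3 = total_g / density_g_per_cm3
    area_cm2 = math.pi * (dish_inner_diameter_mm / 20.0) ** 2  # radius in cm
    return 10.0 * volume_cm3 / area_cm2  # cm -> mm


if __name__ == "__main__":
    base = 30.0
    agent = curing_agent_mass(base)  # 3.0 g
    print(agent, round(layer_thickness_mm(base + agent), 1))
```

Under these assumptions, 33 g of mixed PDMS gives a layer roughly 5 mm thick.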

The microwell array chip was fabricated as follows. Heat a hot plate to 100 °C. Add 150 μl of NOA81 to the PDMS mold and spread it to cover the entire array. Place a histology slide on top of the PDMS mold and place a clear 20 g weight on top of the slide. UV-cure the NOA81 for 2 minutes from one side, then for 1 minute from the back side without the weight. Allow to cool briefly, then peel the PDMS mold off the NOA81 array to complete the fabrication process.

The microwell array chips were printed with spatially barcoded oligo(dT)18 primers using a Scienion sciFLEXARRAYER S3 printer. In the experiments performed here, the arrays were printed with 768 uniquely barcoded oligo(dT)18 primers. The S3 printer is housed in a chilled, humidity-controlled chamber so that the source plate does not evaporate during printing. The oligonucleotides were dried in the chip and stored until the day of the experiment.

Example 3: Validation of the XYZeq platform using cell lines

The feasibility of the XYZeq platform was validated using cell lines from two different species, mixed at ratios determined by the relative spatial position of each well. The ability of the XYZeq platform to identify distinct cell populations with distinct spatial organization within intact tissue was also validated using a murine ectopic liver tumor model.

XYZeq extends recent split-pool indexing methods for single-cell sequencing (17, 18) by simultaneously recording spatial information. Cellular transcripts are spatially encoded by barcoded oligonucleotides located 250 μm from the center of the hexagonal microwell array. Cells were deposited into the wells, permeabilized, and indexed with well-specific barcoded oligo(dT) primers (RT indexes) containing unique molecular identifiers (UMIs) and PCR handles. This was followed by reverse transcription, a second round of barcoding by PCR, and tagmentation to generate single-cell RNA sequencing libraries (Figure 5A). Combining spatially informative RT indexing with split-pool PCR indexing allows single-cell transcriptome data to be acquired while each cell is assigned to a specific well of the array. With two rounds of combinatorial barcoding, the first using 768 positional RT indexes and the second using 384 PCR indexes, up to 294,912 (768 × 384) barcode combinations can be generated.
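The barcode capacity quoted above is the product of the number of indexes used in each round. A short sketch, assuming 0-based RT (spatial) and PCR indexes, shows the count and one reversible way to pack the two indexes into a single combined cell index; the packing scheme is illustrative and not the authors' pipeline.

```python
import math

N_RT = 768   # positional RT indexes, one per microwell
N_PCR = 384  # PCR indexes used in the second round


def n_combinations(*indexes_per_round: int) -> int:
    """Total number of distinct barcode combinations across all rounds."""
    return math.prod(indexes_per_round)


def combined_index(rt_index: int, pcr_index: int,
                   n_rt: int = N_RT, n_pcr: int = N_PCR) -> int:
    """Pack an (RT, PCR) index pair into one integer in [0, n_rt * n_pcr)."""
    if not (0 <= rt_index < n_rt and 0 <= pcr_index < n_pcr):
        raise ValueError("index out of range")
    return rt_index * n_pcr + pcr_index


def split_index(idx: int, n_pcr: int = N_PCR) -> tuple[int, int]:
    """Recover (RT index, PCR index); the RT index identifies the microwell."""
    return divmod(idx, n_pcr)


if __name__ == "__main__":
    print(n_combinations(N_RT, N_PCR))          # 294912
    print(split_index(combined_index(5, 17)))   # (5, 17)
```

Keeping the RT index recoverable from the combined index is what lets each cell be mapped back to its microwell.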

To validate that XYZeq generates interpretable single-cell transcriptomes, we performed a mixed-species experiment in which 80 human (HEK293T) and mouse (NIH/3T3) cells were deposited at varying ratios into 768 barcoded microwells. The cell lines from the two species were mixed at concentrations determined by the relative spatial position of each well: each column of the microwell array contained human and mouse cells mixed in a gradient of decreasing or increasing concentration (Fig. 5B). Cells from the microwell chip were pooled and sorted by FACS into each well of four 96-well plates at 25 cells/well. A total of 4,871 uniquely barcoded cells were obtained, and their reads were aligned to the mouse and human genomes. The data revealed a clear separation of reads between species: each cell was unambiguously assigned to a single species (>90% of reads aligned to one genome), with a collision rate (cells mapping to both human and mouse) of only 8.4%, consistent with the expected barcode collision rate for these parameters (Fig. 5C). We obtained a median of 939 UMIs and 439 genes per human cell and a median of 816 UMIs and 336 genes per mouse cell (Fig. 5D). Furthermore, the ratio of human to mouse cells in each column was consistent with the expected ratio of cells printed in the gradient pattern (Fig. 5E). These results demonstrate that there is very little barcode transfer between wells when cells are pooled prior to reverse transcription, and that XYZeq generates high-quality scRNA-seq libraries.
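The species call and collision rate described above can be sketched as follows, assuming per-barcode counts of reads aligned to each genome are already available. The 0.9 cutoff mirrors the ">90% of reads" criterion; the toy counts are invented for illustration and are not data from the experiment.

```python
# Sketch: call species per cell barcode in a human/mouse mixing experiment.
# Assumes per-barcode counts of reads aligned to each genome are available;
# the 0.9 cutoff mirrors the ">90% of reads" criterion in the text.

def assign_species(human_reads, mouse_reads, threshold=0.9):
    total = human_reads + mouse_reads
    if total == 0:
        return "unassigned"
    if human_reads / total > threshold:
        return "human"
    if mouse_reads / total > threshold:
        return "mouse"
    return "collision"  # neither genome reaches the cutoff


def collision_rate(per_barcode_counts, threshold=0.9):
    """Fraction of assigned barcodes called as human/mouse collisions."""
    labels = [assign_species(h, m, threshold) for h, m in per_barcode_counts]
    assigned = [label for label in labels if label != "unassigned"]
    if not assigned:
        return 0.0
    return assigned.count("collision") / len(assigned)


# Toy counts (human_reads, mouse_reads), not data from the experiment.
toy = [(950, 20), (10, 800), (400, 380), (990, 5)]
print([assign_species(h, m) for h, m in toy])  # ['human', 'mouse', 'collision', 'human']
print(collision_rate(toy))                     # 0.25
```

Only barcodes shared by cells of different species are detectable this way; a barcode shared by two cells of the same species still looks like a single-species cell.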

Example 4: Validation of the XYZeq platform using fixed tissue sections

We next determined whether XYZeq could generate single-cell RNA-seq libraries from fixed tissue sections, which requires tissue digestion, cell permeabilization, and spatial indexing within the microwells. To test this, we used an ectopic murine tumor model established by intrahepatic injection of the syngeneic colon adenocarcinoma cell line MC38 into immunocompetent mice. MC38 cells were labeled with luciferase (MC38-Luc) so that tumor growth in the liver could be visualized to determine the appropriate time for resection. When tumors had grown to approximately 5 mm in diameter by bioluminescence imaging (days 10-12 after injection), mice were sacrificed, and livers bearing tumor nodules were harvested, fixed, and frozen in embedding-matrix cassettes. This liver tumor model was chosen because a sharp margin defines the tumor/liver boundary and because MC38 tumors are immunogenic (30). MC38 tumors also have immunomodulatory properties, with immune cells accumulating at the tumor/tissue interface. Previous data have shown that approximately 12 days after tumor inoculation, infiltrating immune cells make up about 15-20% of all cells in the tumor (23,24). We therefore predicted that XYZeq data might capture tissue-resident and infiltrating cell populations with distinct spatial organization during disease progression.

We modified the XYZeq platform to study intact tissue sections. To again ensure that transcriptomes can be assigned to discrete single cells, fixed human HEK293T cells were spotted into barcoded microwell arrays at an average of 58 cells per well and then frozen at -80°C, serving as a control for detecting mixing within spatial wells or PCR wells. Next, 25 μm fixed frozen liver/tumor tissue sections from C57BL/6 mice were placed on top of the pre-frozen (-80°C) microwell arrays, while serial 10 μm sections were taken and fixed for immunohistochemical staining. Images of the tissue on the array were captured to determine its overall orientation. After imaging, the array was sealed with a silicone gasket and clamped in an Agilent microarray hybridization slide chamber. The chamber serves two purposes: (1) it applies mechanical pressure that forces the tissue into the wells, and (2) it prevents evaporation during the 42°C incubation in which tissue digestion, cell permeabilization, in situ oligo(dT) annealing, and reverse transcription (RT) take place (Fig. 5A).

The tissue-based protocol generated data with high single-cell integrity: 56% of cells mapped to mouse and 34% to human, with a collision rate of 9.6% (Fig. 6A). At 46% sequencing saturation, we detected a median of 1,596 UMIs and 629 unique genes per HEK293T cell and a median of 1,009 UMIs and 456 unique genes per cell from the ectopic murine tumor model (Fig. 6B). Images of the tissue taken on the array and hematoxylin and eosin (H&E) staining of the tissue revealed distinct borders between tumor and liver tissue (Fig. 6C). Reconstructing the spatial arrangement of cells from the single-cell data revealed human cells interspersed throughout the array and mouse cells confined to tissue-covered wells (Fig. 6D). Importantly, these results demonstrate that XYZeq can generate spatially resolved single-cell RNA-seq data from frozen tissue.
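Sequencing saturation and per-cell UMI medians are both derived from deduplicated molecules. The sketch below assumes each confidently mapped read is available as a (cell barcode, UMI, gene) tuple and uses the Cell Ranger-style definition, saturation = 1 - unique molecules / reads. The text does not state which tool computed the 46% figure, so treat this as an illustration of the metric only; the toy reads are invented.

```python
from collections import Counter, defaultdict
import statistics

# Sketch: sequencing saturation and per-cell UMI medians from mapped reads.
# Assumption: each read is a (cell_barcode, umi, gene) tuple; saturation uses
# the Cell Ranger-style definition 1 - unique_molecules / reads.

def sequencing_saturation(reads):
    counts = Counter(reads)
    n_reads = sum(counts.values())
    if n_reads == 0:
        return 0.0
    return 1.0 - len(counts) / n_reads


def median_umis_per_cell(reads):
    molecules = defaultdict(set)
    for cell, umi, gene in reads:
        molecules[cell].add((umi, gene))  # a molecule = unique (UMI, gene)
    return statistics.median(len(m) for m in molecules.values())


toy_reads = [
    ("cellA", "AACG", "Alb"),
    ("cellA", "AACG", "Alb"),  # duplicate read of the same molecule
    ("cellA", "TTGC", "Plec"),
    ("cellB", "GGAT", "Alb"),
]
print(sequencing_saturation(toy_reads))  # 0.25
print(median_umis_per_cell(toy_reads))   # 1.5
```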

It should be noted that, to obtain high-quality RNA from fixed frozen tissue, the microarray hybridization chamber containing the slides must undergo a slow, stepwise temperature ramp from -80°C through -20°C, 4°C, and 25°C to 42°C. Without this stepwise temperature change, RNA extracted from the arrays was severely degraded (data not shown).

Example 3: Identification of distinct cell populations in the liver tumor model

In tissue sections processed with XYZeq, a total of 26,436 unique barcode combinations were generated. We filtered these to the 4,788 barcodes with at least 500 UMIs, which we treated as cell-containing compartments and in which an average of 456 unique genes was detected. Unsupervised Leiden clustering revealed seven distinct cell populations in our scRNA-seq dataset: HEK293T cells, MC38 tumor cells, macrophages, Kupffer cells, liver sinusoidal endothelial cells (LSECs), lymphocytes, and hepatocytes (Fig. 7A). Each cluster could be defined by a distinct gene expression profile, including Plec for MC38 tumor cells, Stab2 for LSECs, Dpyd for hepatocytes, Cd5l for Kupffer cells, Cd74 for macrophages, and Skap1 for lymphocytes (Fig. 7B). Using Harmony, an algorithm that corrects for dataset-specific effects so that cells from multiple experiments with different experimental and biological factors can be integrated, we merged the XYZeq dataset with a 10X Chromium (v3) dataset to compare metrics between the two platforms. Cells for 10X Chromium were processed from previously fixed, frozen, and sectioned ectopic liver tumors, pooled into a single-cell suspension, and sorted before library generation following the 10X Chromium manufacturer's protocol.
To combine the datasets, the raw count matrices from XYZeq and 10X were filtered to the final set of cell barcodes while retaining all possible mouse genes, and combined into a single set of 5,453 cells spanning 22,374 genes. Data were normalized to 1 million counts per cell, log-transformed, and scaled so that each gene had a mean of zero and a variance of 1. The data were preprocessed with principal component analysis (PCA) and then integrated with Harmony. UMAP was used for visualization, and Leiden clustering was performed at a resolution of 0.2 (Fig. 8A).
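The normalize, log, and scale steps can be written out in a few lines. This is a pure-Python sketch of those three steps only (PCA, Harmony, UMAP, and Leiden are not reproduced). It assumes "log" means the natural log1p and uses the population variance when scaling; libraries such as scanpy provide equivalents (`normalize_total`, `log1p`, `scale`), but the text does not say which tool the authors used.

```python
import math

# Sketch: per-cell normalization to 1e6 counts, log1p, then per-gene z-scoring.
# Assumptions: natural-log log1p; population (ddof=0) variance for scaling.

def normalize_log_scale(matrix, target_sum=1e6):
    """matrix: list of cells, each a list of raw counts over the same genes."""
    if not matrix:
        return []
    # 1) scale each cell so its counts sum to target_sum
    normed = []
    for row in matrix:
        total = sum(row)
        normed.append([c * target_sum / total if total else 0.0 for c in row])
    # 2) natural log of (1 + x)
    logged = [[math.log1p(v) for v in row] for row in normed]
    # 3) z-score each gene (column) to mean 0, variance 1
    n_cells, n_genes = len(logged), len(logged[0])
    for j in range(n_genes):
        col = [logged[i][j] for i in range(n_cells)]
        mean = sum(col) / n_cells
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n_cells)
        for i in range(n_cells):
            logged[i][j] = (logged[i][j] - mean) / sd if sd > 0 else 0.0
    return logged


scaled = normalize_log_scale([[10, 0, 5], [2, 8, 0], [4, 4, 4]])
```

Genes with zero variance are set to 0 rather than divided by zero, a common convention in such sketches.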

To determine how closely the two platforms agree, we retained the 2,500 cell barcodes with the most UMIs. Using the annotations from the merged dataset, we calculated the proportion of cells from each method belonging to each cell type. The proportions of each cell type were plotted, and the coefficient of determination was calculated by fitting a model that assumes equal proportions between the two methods. By this measure, agreement between the 10X and XYZeq clusters was high (r² = 0.961), with similar cluster composition between the two platforms (Fig. 7B). The median number detected with 10X Chromium (v3) was 1,805 UMIs and 857 genes per cell. In comparison, single-cell data aggregated from six tissue sections processed with the XYZeq platform on fixed frozen tissue yielded 1,124 UMIs and 468 genes per cell (Fig. 7C). The comparative analysis allowed us to uncover distinct heterogeneity within each population in terms of gene expression profiles, function, and organization. By tiling the expression profiles of known representative marker genes across the seven cell types, we visualized the overlap of gene expression across cell populations (Fig. 8B); the bubble size for each gene corresponds to its degree of expression in each cell type.
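A coefficient of determination for a model that "assumes equal proportions" has no fitted parameters: residuals are simply each platform's deviation from the line y = x. The sketch below assumes 10X proportions on x and XYZeq proportions on y (the text does not specify the axes); the toy labels are invented. Because nothing is fitted, this R² can be negative when the platforms disagree badly.

```python
from collections import Counter

# Sketch: compare cell-type composition between two platforms.
# Assumption: R^2 is computed for the fixed model y = x (equal proportions),
# with 10X on x and XYZeq on y; with no fitted parameters it can be negative.

def proportions(labels):
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


def r2_equal_proportions(props_x, props_y):
    types = sorted(set(props_x) | set(props_y))
    xs = [props_x.get(t, 0.0) for t in types]
    ys = [props_y.get(t, 0.0) for t in types]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - x) ** 2 for x, y in zip(xs, ys))  # residuals from y = x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all y proportions are equal; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


tenx = proportions(["Hepatocyte"] * 6 + ["MC38"] * 3 + ["LSEC"])
xyz = proportions(["Hepatocyte"] * 5 + ["MC38"] * 4 + ["LSEC"])
print(round(r2_equal_proportions(tenx, xyz), 3))  # 0.769
```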

To determine the degree of agreement between XYZeq and the 10X Genomics platform, we visualized, as a heatmap, the correlation of scaled gene expression between clusters generated by XYZeq and clusters generated by the 10X Genomics platform (Fig. 7D). Every cluster found with XYZeq correlated with a corresponding cell type identified on the 10X platform; the only exception was a small population of B cells. These cells did not form a separate cluster in the XYZeq data but were likely captured, at least in part, within the lymphocyte population. Additional correlations were observed between immune cell types, notably between the two macrophage clusters: macrophages marked by Cd74 and Tgfbr1 suggest infiltration from the periphery, whereas macrophages marked by Clec4f and Timd4 suggest tissue-resident Kupffer cells of non-hematopoietic origin. These data show a high degree of agreement between the XYZeq method and the 10X Genomics platform.
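One way to produce the values behind such a cluster-to-cluster heatmap is to average scaled expression per cluster and correlate the cluster means across platforms. The sketch below assumes both datasets are already scaled over the same ordered gene list and uses Pearson correlation; the text does not name the correlation measure, and the toy profiles are invented.

```python
import math

# Sketch: correlate cluster-mean expression profiles between two platforms.
# Assumption: both matrices are already scaled over the same ordered gene list;
# Pearson correlation of cluster centroids stands in for the heatmap values.

def centroids(matrix, labels):
    """Mean expression vector per cluster label."""
    sums, counts = {}, {}
    for row, label in zip(matrix, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def correlation_table(cent_a, cent_b):
    """{(cluster_a, cluster_b): r}, i.e. the cells of a correlation heatmap."""
    return {(ka, kb): pearson(va, vb)
            for ka, va in cent_a.items() for kb, vb in cent_b.items()}


xyz = centroids([[2.0, -1.0, 0.5], [1.8, -0.9, 0.4], [-1.0, 1.5, 0.2]],
                ["Hep", "Hep", "Mac"])
tenx = {"Hepatocyte": [2.1, -1.2, 0.3], "Macrophage": [-0.8, 1.4, 0.1]}
table = correlation_table(xyz, tenx)
best = {k: max(tenx, key=lambda t: table[(k, t)]) for k in xyz}
print(best)  # {'Hep': 'Hepatocyte', 'Mac': 'Macrophage'}
```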

Example 4: Lymphocyte gene expression profiles reveal tissue-specific adaptations

Although 10X Chromium can generate comprehensive datasets of gene expression profiles and cell types, it cannot spatially localize cells within the tissue context. To determine whether single-cell data from XYZeq can faithfully reconstruct the spatial histological features of liver tumor tissue, we examined where the single-cell clusters were located on the spatial array. Broadly, density heatmaps of hepatocytes and tumor cells across the spatial wells overlapped with H&E staining of serial sections (outlined with dashed gray lines) (Fig. 7D and Fig. 7E). Projections of the other cell types revealed distinct spatial organization of lymphocytes, macrophages, Kupffer cells, hepatocytes, MC38 cells, and LSECs, with different density patterns distributed across the array (Fig. 7E). In particular, the distribution of lymphocytes overlapped with both hepatocyte and tumor regions, whereas macrophages appeared to be confined to tumor regions. LSEC-containing wells also overlapped tumor and hepatocyte regions, whereas Kupffer cells were expected to overlap only hepatocyte-defined wells.
Consistent with the enrichment of cell type-specific markers in the UMAP projection, expression of Plec spatially colocalized with tumor cells, Stab2 with LSECs, Dpyd with hepatocytes, Cd5l with Kupffer cells, Cd74 with macrophages, and Skap1 with lymphocytes (Fig. 8). However, the spatial density maps also revealed spatial overlap of several different cell types, indicating potential hotspots of cell-cell interaction. To quantify the composition of cells occupying each spatial well, the single-cell data were used to generate a well-specific pie chart depicting the proportions of cell subpopulations present in each well (Fig. 7F). This pie chart-based analysis revealed colocalization of immune cells enriched at the liver/tumor interface, information that is not obtained with tissue-dissociation scRNA-seq platforms. Quantification of one column of the spatial array is shown as a bar graph. As in the visual analysis of the spatial density maps, macrophages were confined to the tumor region, whereas lymphocytes were detected in both hepatocyte and tumor regions, suggesting distinct spatial organization within the intact tissue. These experiments demonstrate that XYZeq can dissect single-cell transcriptomes in tissues and produce metrics comparable to other high-throughput scRNA-seq platforms, while mapping cell types to specific regions of the tissue microenvironment.
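The per-well pie charts and the per-column bar graph both reduce to counting annotated cells by location. The sketch below assumes each cell barcode has already been decoded to a (row, column) well and annotated with a cell type; the coordinates and cell lists are hypothetical toy values.

```python
from collections import Counter, defaultdict

# Sketch: per-well cell-type composition (the data behind a per-well pie chart)
# and per-column counts (the data behind a bar graph for one array column).
# Assumption: each cell barcode has already been decoded to a (row, col) well
# and annotated with a cell type; the well coordinates here are hypothetical.

def well_composition(cells):
    """cells: iterable of ((row, col), cell_type). Returns proportions per well."""
    per_well = defaultdict(Counter)
    for well, cell_type in cells:
        per_well[well][cell_type] += 1
    return {
        well: {ct: n / sum(counts.values()) for ct, n in counts.items()}
        for well, counts in per_well.items()
    }


def column_counts(cells, column):
    """Cell-type counts pooled over all wells in one array column."""
    return Counter(ct for (row, col), ct in cells if col == column)


toy_cells = [
    ((0, 3), "MC38"), ((0, 3), "Macrophage"), ((0, 3), "MC38"),
    ((1, 3), "Hepatocyte"), ((1, 3), "Lymphocyte"),
    ((2, 5), "Hepatocyte"),
]
print(well_composition(toy_cells)[(1, 3)])  # {'Hepatocyte': 0.5, 'Lymphocyte': 0.5}
print(column_counts(toy_cells, 3))
```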

Spatially resolved sequencing enables expression analysis in the context of tissue architecture, which is not possible with current single-cell sequencing methods. Those methods lack spatial information, which prevents analysis of how changes in cell state affect neighboring cells in the tissue microenvironment. XYZeq is, first and foremost, a new scRNA-seq workflow that preserves spatial information. It allows us to recapitulate the overall layout of a tissue section to understand cell proportions and heterogeneity, while also discerning the location and gene expression of each single cell residing in the tissue microenvironment. With XYZeq, we were able to begin deciphering the intercellular dynamics that underlie normal and abnormal tissue function. Although FISH-based imaging methods also provide true single-cell spatial resolution, they are limited in throughput and require the creation of custom probes. As a sequencing-based approach, XYZeq takes advantage of the enormous technological developments in next-generation sequencing (NGS), benefiting from increased throughput and reduced cost per data point. While it is too early to predict whether spatially resolved transcriptomics will be integrated into routine clinical pathology, it can at least begin to map large-scale transcriptomic data in the context of tissues and organisms.

Example 5: Cell-specific spatial transcriptomic profiling using XYZeq

XYZeq can be used for cell-specific spatial transcriptomic profiling. To do this, an antibody of interest can be added to the first RT mix at the step in which RT buffer is spotted into the microwell array. This allows cells of interest to be sorted on the basis of antibody labeling. Non-limiting examples of antibodies that can be used are provided in Table 1.

Table 1. Examples of antibodies for cell-specific spatial transcriptomic profiling using XYZeq.

[Table 1: image not reproduced]

Example 6: Spatial TCR-seq using XYZeq

The first part of library preparation is the same as above, up to cDNA generation. The TCRα and TCRβ genes are then amplified by semi-nested PCR using a mixture of TCRα and TCRβ variable-region primers that bind the ends of the V segments. A non-limiting exemplary list of multiplexed primer sequences for spatial TCR-seq using XYZeq is provided in Table 2.

Table 2. Examples of multiplexed primer sequences for spatial TCR-seq using XYZeq.

[Table 2: image not reproduced]

The first PCR was performed for 50 cycles in a tube with HotStart PCR mix to enrich for TCR sequences. A second PCR was then performed with Illumina P5 primers, and library indexes were added with P7 primers. Briefly, 1 ng of cDNA was combined with Qiagen 1× HotStar Taq buffer, 10 nM of the mixed TCRα and TCRβ V-segment primers, 1 μl of each dNTP, 1 μl of HotStar Taq, and H2O to a final volume of 100 μl. PCR cycling was as follows: 94°C for 10 min; 50 cycles of 94°C for 40 s and 62°C for 45 s; 30 cycles of 94°C for 40 s, 62°C for 45 s, and 72°C for 1 min; and a final incubation at 72°C for 1 min. PCR products were cleaned up with AMPure beads and eluted in 25 μl. The second PCR was performed with 5x KAPA Mg2+ buffer, 1 μl dNTPs, 1 μl KAPA HiFi enzyme, 0.2 μl IFC-F primer, 0.2 μl N7XX primer, and H2O to a final volume of 50 μl, cycled as follows:

Kapa AMP
Step 1: 72°C, 3 minutes
Step 2: 95°C, 10 seconds
Step 3: 95°C, 30 seconds
Step 4: 66°C, 30 seconds
Step 5: 72°C, 1 minute
Step 6: go to step 3, 14 times
Step 7: 72°C, 5 minutes
Step 8: 4°C, hold indefinitely

PCR products were again purified with AMPure beads and eluted in 15 μl for Qubit quantification and Bioanalyzer size analysis, then sequenced on an Illumina MiSeq (2 × 300 bp reads). The end result is a spatial single-cell TCR-seq library that can, in theory, map TCR clones back to regions of the tissue.

Example 7: Spatial ATAC-seq using XYZeq

The basic protocol is the same as the XYZeq RNA-seq protocol: the reaction mixture is spatially barcoded in the wells, the whole chip is frozen to -80°C so that the tissue can be placed on top, and after the reaction is incubated, the cells are removed and sorted into 96-well plates for a second round of barcoding by PCR. The libraries are then indexed and sequenced. An exemplary procedure is as follows:

1. The reaction mixture consists of 5x DMF-TAPS buffer; 30 custom, uniquely indexed one-sided Tn5 transposomes (10 loaded with barcoded P5 adapters and 20 loaded with barcoded P7 adapters); digitonin (tissue digestion reagent); and H2O. By spotting Tn5-P5 along rows and Tn5-P7 along columns, 200 wells with unique barcoded Tn5 combinations can be obtained.
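The row/column spotting scheme in step 1 gives each well a unique (P5, P7) pair while using only 10 + 20 = 30 transposomes for 10 × 20 = 200 wells. The sketch below enumerates that layout; the transposome names are hypothetical placeholders.

```python
# Sketch: combinatorial Tn5 layout for spatial ATAC-seq.
# Assumption: rows receive one of 10 Tn5-P5 transposomes and columns one of
# 20 Tn5-P7 transposomes; the index names are hypothetical placeholders.

N_P5 = 10  # barcoded P5 adapters, spotted along rows
N_P7 = 20  # barcoded P7 adapters, spotted along columns


def tn5_layout(n_rows=N_P5, n_cols=N_P7):
    """Map each (row, col) well to the (P5, P7) transposome pair it receives."""
    return {
        (row, col): (f"Tn5-P5-{row + 1:02d}", f"Tn5-P7-{col + 1:02d}")
        for row in range(n_rows)
        for col in range(n_cols)
    }


layout = tn5_layout()
print(len(layout), len(set(layout.values())))  # 200 200
print(layout[(0, 19)])                         # ('Tn5-P5-01', 'Tn5-P7-20')
```

An amplifiable fragment needs a P5 adapter at one end and a P7 adapter at the other, so the pair of barcodes it carries identifies the row and column of its well.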

2. The microwell array was sealed and incubated at 55°C for 30 minutes, then at 37°C for 15 minutes.

3. After tagmentation, the microwell array was placed in a 50 ml conical tube, and 40 mM EDTA (supplemented with 1 mM spermidine, 20% FCS, and PBS) was added to stop the reaction, followed by vortexing. The cells in the conical tube were centrifuged, resuspended in 1 ml, filtered, and stained with DAPI. Twenty-five DAPI+ cells were sorted into each well of a 96-well plate containing 12.5 μl of lysis buffer (11 μl EB buffer, 0.5 μl 100X BSA, and 1 μl DTT).

4. After sorting, indexed PCR primers (0.5 μM final concentration) and polymerase master mix were added to each well, and the tagmented DNA was PCR amplified.

5. After PCR amplification, DNA was purified with 1X AMPure beads (Agencourt) and eluted in 15 μl EB buffer before quantification.

6. Library concentration and quality were determined using a Bioanalyzer.

Example 8: XYZeq reveals expression heterogeneity in the tumor microenvironment

Single-cell RNA sequencing (scRNA-seq) of tissues has revealed remarkable heterogeneity in cell types and states but does not directly provide information on the spatial organization of cells within complex tissue structures. To better understand the function of individual cells in anatomical space, we developed XYZeq, a novel workflow that encodes spatial metadata into scRNA-seq libraries. We used XYZeq to profile ectopic mouse liver and spleen tumor models, capturing transcriptomes from tens of thousands of cells across eight tissue sections. Analysis of these data revealed the spatial distribution of different cell types and a cell migration-associated transcriptomic program in tumor-associated mesenchymal stem cells (MSCs). Furthermore, we identified localized expression of tumor suppressor genes by MSCs that varied with proximity to the tumor core. We demonstrate that XYZeq can simultaneously map the transcriptome and spatial location of individual cells in situ, revealing how location within complex pathological tissues affects cellular composition and cell state.

1. Materials and Methods

i. Mice, tumor cell lines, and tumor inoculation

C57BL/6 female mice aged 6-12 weeks were purchased from Jackson Laboratories and housed under specific pathogen-free conditions. The MC38 colon adenocarcinoma cell line was cultured in complete cell culture medium (RPMI 1640 with GlutaMAX, penicillin, streptomycin, sodium pyruvate, HEPES, NEAA, and 10% fetal bovine serum (FBS)). Cell lines were routinely tested for mycoplasma contamination. For the experiments, mice were given an anesthetic mixture of buprenorphine (300 μl) and meloxicam (300 μl) 30 minutes before surgery. At the time of surgery, 1 drop of bupivacaine was applied and the mice were anesthetized with isoflurane; MC38 colon adenocarcinoma cells (50 μl, 10 × 10^6 cells/ml) were then injected intrahepatically (or intrasplenically) using a 30½-gauge needle. The incision was closed with sutures, and the mice received postoperative care. All experiments were performed in accordance with animal protocols approved by the Institutional Animal Care and Use Committee (IACUC) of the University of California, San Francisco.
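The number of tumor cells delivered per mouse follows directly from the stated volume and concentration: 50 μl at 10 × 10^6 cells/ml is 5 × 10^5 cells. A one-line helper makes the unit conversion explicit.

```python
# Worked arithmetic for the MC38 injection described above.

def cells_per_injection(volume_ul, cells_per_ml):
    """Convert an injection volume (μl) and concentration (cells/ml) to a cell count."""
    return volume_ul * cells_per_ml / 1000.0  # 1 ml = 1000 μl


print(cells_per_injection(50, 10e6))  # 500000.0
```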

ii. Cancer model system

The intrahepatic and intrasplenic cancer models used here are described in detail in a recent report, Lee et al. 2020 (21). Briefly, intrahepatic and intrasplenic tumors are generated by subcapsular injection of tumor cells directly into the organ. To determine the ideal time point for sacrificing mice, tumor-inoculated mice were imaged in vivo. The MC38 cells injected into the organs were modified to express firefly luciferase. Mice were injected intraperitoneally with D-luciferin (150 mg/kg; Gold Biotechnology) and imaged 7 minutes later with the Xenogen IVIS imaging system. Mice with detectable tumor nodules of at least 5 mm by bioluminescence were sacrificed for tissue harvest. Organs used for XYZeq were fixed with dithiobis(succinimidyl propionate) (DSP; Thermo Scientific) and cryopreserved, whereas organs used for 10X Genomics Chromium single-cell sequencing were digested in complete RPMI medium supplemented with collagenase D (125 U/ml; Roche) and deoxyribonuclease I (20 mg/ml; Roche) and then processed with a gentleMACS tissue dissociator according to the manufacturer's protocol (Miltenyi) to form a single-cell suspension.
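The D-luciferin dose scales with body weight. The sketch below converts the stated 150 mg/kg into a per-mouse amount and injection volume; the 20 g body weight and the 15 mg/ml stock concentration are hypothetical values chosen for illustration, not taken from the protocol.

```python
# Sketch: per-mouse D-luciferin amount from a mg/kg dose.
# Hypothetical inputs: a 20 g mouse and a 15 mg/ml luciferin stock.

def luciferin_dose_mg(body_weight_g, dose_mg_per_kg=150.0):
    """Dose in mg for a mouse of the given weight (grams)."""
    return dose_mg_per_kg * body_weight_g / 1000.0  # 1 kg = 1000 g


def injection_volume_ml(dose_mg, stock_mg_per_ml):
    """Volume of stock solution that delivers dose_mg."""
    return dose_mg / stock_mg_per_ml


dose = luciferin_dose_mg(20)                 # 3.0 mg for a 20 g mouse
print(dose, injection_volume_ml(dose, 15.0))
```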

iii. 10X Genomics Chromium platform

Cells isolated from tissue were washed, resuspended at 1,000 cells/μl in PBS with 0.04% BSA, loaded onto the 10X Genomics Chromium platform according to the manufacturer's instructions, and sequenced on a NovaSeq or HiSeq 4000 (Illumina).

iv. Tissue harvest and cryopreservation

On day 10 after tumor inoculation, mice were sacrificed, and tumor-injected livers (or spleens) were harvested and incubated in ice-cold DMSO-free freezing medium (Bulldog Bio) for 30 minutes. The organs were then incubated for 30 minutes in ice-cold DSP (Thermo Scientific) supplemented with 10% FCS, followed by neutralization in ice-cold 20 mM Tris-HCl, pH 7.5. Organs were placed in freezing molds, sealed, and slowly frozen overnight at -80°C.

v. Dispensing cells and reagents into arrays

A sciFLEXARRAYER S3 (Scienion AG) was used to dispense cells and reagents into the microwell arrays. Droplet stability and array quality were assessed for each experiment. Before dispensing onto microwell array slides, Autodrop detection was used to assess droplet stability and to quantify the velocity, deflection, and droplet volume of each reagent. The measured droplet volume was used to determine the number of drops required to reach the specified total well volume. An oligo(dT) primer (5'-CTACACGACGCTCTTCCGATCTNNNNNNNNNN[16 bp unique spatial barcode]TTTTTTTTTTTTTTTTTT-3', where "N" is any base; SEQ ID NO: 43; IDT) was spotted into each well. During barcoding, dew-point control software monitored ambient temperature and humidity and dynamically controlled the temperature of the source plate to maintain nominal oligonucleotide concentrations throughout the run. The barcoded oligonucleotides were dried in the wells before the slides were stored. Reaction mix (Thermo Fisher Scientific) was added to the wells, with an automatic 10% bleach wash between probes to eliminate carryover contamination. On the day of the experiment, dissociation/permeabilization buffer was printed into each well and tissue sections were loaded onto the microwell array slides.
For all tissue experiments, DSP-fixed HEK293T cells (5 μl at 10 × 10^6 cells/ml) were added to the RT/digestion mix, which was then dispensed into all wells of the microwell array. HEK293T cells averaged 58 cells per well; however, the absolute number of cells per well may vary across the array because the cells are in suspension within the dispensing nozzle. Cells harvested from the array after incubation were analyzed on an Aria cell sorter (BD Biosciences), and the data were analyzed with FlowJo software (Tree Star Inc.).
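The drop-count step can be expressed as a ceiling division of the target well volume by the measured droplet volume, checked against the 40 nl capacity of each microwell given under Array fabrication. The 20 nl target and 0.35 nl droplet volume in the example are hypothetical values; the text does not state the droplet volumes used.

```python
import math

# Sketch: number of drops needed to reach a target well volume.
# Hypothetical inputs: target and droplet volumes in the example call.

WELL_CAPACITY_NL = 40.0  # stated capacity of one hexagonal microwell


def drops_needed(target_volume_nl, droplet_volume_nl):
    if droplet_volume_nl <= 0:
        raise ValueError("droplet volume must be positive")
    if target_volume_nl > WELL_CAPACITY_NL:
        raise ValueError("target exceeds well capacity")
    # small tolerance so float noise (e.g. 10.000000000000002) adds no extra drop
    return math.ceil(target_volume_nl / droplet_volume_nl - 1e-9)


print(drops_needed(20.0, 0.35))  # 58
```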

vi. Array fabrication

The photoresist master was created by spin-coating a layer of SU-8 2150 photoresist (Fisher Scientific) onto a 3-inch silicon wafer (University Wafer) at 1500 rpm, followed by a soft bake at 95°C for 2 hours. The wafer with the photoresist layer was then exposed to ultraviolet (UV) light for 30 minutes through a photolithographic mask printed at 12,000 DPI (CAD/Art Sciences, USA). After UV exposure, the wafer was hard-baked at 95°C for 20 minutes, developed in fresh propylene glycol monomethyl ether acetate (PGMEA; Sigma Aldrich) for 2 hours, rinsed manually with fresh PGMEA, and baked at 95°C for 2 minutes to remove residual solvent. A polydimethylsiloxane (PDMS) mixture (Sylgard 184, Dow Corning, Midland) at a 10:1 prepolymer:curing agent ratio was poured onto the SU-8 silicon master, placed in a 100 mm Petri dish, and cured overnight in a 70°C oven. The next day, the negative PDMS mold was peeled off the SU-8 silicon master. The PDMS block was placed on a flat surface, and Norland Optical Adhesive 81 (NOA81; Thorlabs) was poured into the mold to cover the entire surface. A glass slide was placed on top of the NOA-filled PDMS mold, with a clear weight on top. The NOA was cured under UV light for 2 minutes, with the assembly flipped once halfway through the curing time.
Finally, the PDMS mold was detached from the cured NOA microwell array slide (referred to as the microwell array chip). Each hexagonal well is approximately 400 μm high and 500 μm in diameter, with a volume of 0.04 mm³, holding 40 nl of liquid.

vii. XYZeq Method

Liver/tumor organs were mounted on a cryostat (Leica) and sectioned at 25 μm for use as XYZeq samples, or sectioned at 10 μm and mounted on histology slides for immunohistochemical staining. On the day of the experiment, the XYZeq microwell array chip was spotted with reverse transcription (RT) mix spiked with fixed HEK293T cells. The microwell array chip was cooled to -80 °C, and the tissue section was placed on top of the array. A digital image was taken to record the orientation of the tissue, and a silicone spacer was then sandwiched between the XYZeq microwell array chip and a blank histology slide. The chip was placed in a microarray hybridization chamber (Agilent) to ensure an airtight seal during tissue digestion and reverse transcription. To recover high-quality RNA from fixed frozen tissue, the hybridization chamber housing the chip must be ramped gradually to 42 °C and then incubated for an additional 20 min for reverse transcription. The chip was removed from the chamber and placed into a 50 ml conical tube containing 50 ml of 1× SSC buffer with 25% FCS. The tube was vortexed and centrifuged at 1000 rcf for 10 min.
Excess volume was removed, and the cells were filtered, stained with DAPI (Life Technologies), and sorted (BD Aria) into 96-well plates preloaded with 5 μl of the second RT mix. Plates were reverse transcribed for 1.5 h at 42 °C, followed by PCR using 2× KAPA HotStart ReadyMix (Kapa Biosystems). PCR amplification was performed using indexing primers (5'-AATGATACGGCGACCACCGAGATCTACAC[i5]ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'; SEQ ID NO: 44; IDT). The contents of the PCR plates were pooled into 2 ml Eppendorf tubes, and the cDNA was purified using AMPure XP SPRI beads (Beckman). The cDNA was tagmented and amplified using Illumina Nextera p7 library indices. Final libraries were analyzed on a Bioanalyzer (Agilent), quantified with a Qubit (Invitrogen), and sequenced on a NovaSeq or HiSeq 4000 (Illumina) (read 1: 26 cycles; read 2: 98 cycles; index 1: 8 cycles; index 2: 8 cycles).

viii. XYZeq Decontamination Analysis

In our analysis, we noticed that some reads aligning to mouse genes were present in cells that aligned predominantly to the human genome. These reads were suspected to be ambient RNA contamination, and we attempted to remove them. First, mouse-aligned transcripts with extremely high expression in the human cell population were removed (n = 59; log(count + 1) > 6). The human cell population served as a control for contamination detection, because any ambient RNA from lysed cells is expected to contaminate both mouse and human cells. DecontX (2) was then run on the human-mouse mixed dataset to estimate the contamination rates of the different cell populations and to derive a decontaminated count matrix from the raw data. Briefly, the algorithm uses variational inference to model the observed counts of each cell as a mixture of the true gene expression of its cell population and a contamination profile (from the other cell populations), and then subtracts the contamination (Fig. 17C). Because the experiment mixes human and mouse cells, counts that could give rise to apparent collisions can be removed, effectively accounting for all transcripts from lysed cells that could contribute ambient RNA. Figure 17C plots the initial estimated contamination fraction for each mouse cell type; median estimates ranged from 0.06% to 0.31%, and the highest value was observed in the hepatocyte cluster, which had an initial contamination fraction of 2.18%. All downstream analyses were performed on the decontaminated data.
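The first, rule-based filtering step can be sketched in NumPy as below. This is an illustrative sketch rather than the authors' code: the helper names are hypothetical, and it assumes that "count" means the gene's total UMI count summed over the human cells and that "log" is the natural logarithm, neither of which the text states. The subsequent DecontX step is not reproduced here.

```python
import numpy as np


def flag_ambient_mouse_genes(counts, is_human_cell, is_mouse_gene, log_cutoff=6.0):
    """Return column indices of mouse-aligned genes that are highly expressed
    in the human control cells (hypothetical helper).

    counts        : (n_cells, n_genes) raw UMI count matrix
    is_human_cell : (n_cells,) boolean mask of cells called as human
    is_mouse_gene : (n_genes,) boolean mask of genes from the mouse reference
    """
    counts = np.asarray(counts)
    # ASSUMPTION: "count" = total UMIs of the gene summed over all human cells.
    human_totals = counts[np.asarray(is_human_cell, dtype=bool)].sum(axis=0)
    # ASSUMPTION: natural log; np.log1p(x) computes log(x + 1).
    too_high = np.log1p(human_totals) > log_cutoff
    return np.flatnonzero(too_high & np.asarray(is_mouse_gene, dtype=bool))


def drop_genes(counts, gene_idx):
    """Remove the flagged gene columns before running decontamination."""
    return np.delete(np.asarray(counts), gene_idx, axis=1)


counts = np.array([[500, 3, 0],
                   [600, 2, 0],
                   [1, 0, 900]])
flagged = flag_ambient_mouse_genes(counts,
                                   is_human_cell=[True, True, False],
                                   is_mouse_gene=[True, True, False])
print(flagged)                           # mouse gene 0: log1p(1100) is about 7.0 > 6
print(drop_genes(counts, flagged).shape)
```

With a cutoff of 6, a mouse gene is dropped once it accumulates more than about e⁶ - 1 ≈ 402 UMIs across the human cells.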

ix. Distinguishing the Collision Rate from the Contamination Rate

The collision rate is computed directly from the gene expression of the human-mouse mixed dataset, based on the ratio between mouse-aligned and human-aligned transcripts. By contrast, the contamination rate of each cell is estimated as a cell-specific parameter in a Bayesian hierarchical model using the variational inference of DecontX. To model contamination, each cell has beta-distribution parameters describing the proportion of its transcript counts that derive from its native expression distribution. The estimated contamination rate of a cell is the fraction of its transcript counts attributed to contamination in the Bayesian model. A Bernoulli hidden state indicates whether each transcript derives from the native expression distribution or from the contamination distribution; conditional on that state, each transcript in a cell follows a multinomial distribution parameterized either by the native expression distribution of its cell population or by the contamination from all other cell populations.
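To make the distinction concrete, the sketch below simulates the generative model just described; it does not perform DecontX's variational inference. It assumes a per-cell native fraction theta, a Bernoulli hidden state per transcript, and multinomial native and contamination profiles. The function name, the Beta prior parameters, and the toy profiles are hypothetical. The simulated contamination fraction is the per-cell quantity DecontX estimates, whereas the collision rate is read directly off observed human/mouse read ratios.

```python
import numpy as np


def simulate_cell(n_transcripts, theta, native_profile, contamination_profile, rng):
    """Simulate one cell's counts under the mixture model (hypothetical helper).

    theta                 : probability that a transcript is native; in the
                            model this per-cell value has a Beta prior
    native_profile        : multinomial gene probabilities of the cell's population
    contamination_profile : multinomial gene probabilities of all other populations
    Returns (gene_counts, contamination_fraction).
    """
    # Bernoulli hidden state per transcript: True = native, False = contamination.
    is_native = rng.random(n_transcripts) < theta
    n_native = int(is_native.sum())
    counts = (rng.multinomial(n_native, native_profile)
              + rng.multinomial(n_transcripts - n_native, contamination_profile))
    # Per-cell contamination rate = share of transcripts drawn from contamination.
    return counts, 1.0 - n_native / n_transcripts


rng = np.random.default_rng(0)
theta = rng.beta(20.0, 1.0)  # hypothetical Beta prior favoring native transcripts
counts, contamination = simulate_cell(
    10_000, theta,
    native_profile=[0.7, 0.3, 0.0],         # toy profile of the cell's own population
    contamination_profile=[0.0, 0.2, 0.8],  # toy ambient (other-population) profile
    rng=rng)
print(theta, contamination, counts)
```

Any counts for the third gene in this toy cell can only come from contamination, which is the signal an inference method like DecontX exploits when it reassigns transcripts.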

x. Cell Type Mixing Experiment

HEK293T and NIH/3T3 cells were deposited into the wells in a gradient pattern across the columns of the array, with a total of 11 different cell ratios. Specifically, the columns of the array were spotted with human:mouse cell ratios of 100/0, 90/10, 80/20, 70/30, 60/40, 50/50, 40/60, 30/70, 20/80, 10/90, 0/100, 10/90, 20/80, 30/70, 40/60, 50/50, 60/40, 70/30, 80/20, 90/10, and 100/0; that is, only human cells in the columns at both ends, and only mouse cells in the center column. For each cell, the fraction of UMI-deduplicated reads aligning to the human or mouse reference genome was calculated, and cells with less than 66% of reads aligning to a single species were classified as barcode collisions.
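The spotting pattern and the 66% single-species rule can be sketched in Python as follows. The helper names are hypothetical, the per-column human percentage simply re-encodes the 21-column list above, and treating exactly 66% as a single-species call is an assumption, since the text only states that cells below 66% were counted as collisions.

```python
# Human percentage spotted in each of the 21 columns: 100, 90, ..., 0, ..., 90, 100.
HUMAN_PERCENT = [10 * abs(10 - col) for col in range(21)]


def call_species(human_umis, mouse_umis, threshold=0.66):
    """Classify a cell from its UMI-deduplicated human and mouse read counts."""
    total = human_umis + mouse_umis
    if total == 0:
        return "empty"
    if human_umis / total >= threshold:
        return "human"
    if mouse_umis / total >= threshold:
        return "mouse"
    return "collision"  # neither species reaches 66% of the reads


def collision_rate(cells, threshold=0.66):
    """Fraction of non-empty cells called as barcode collisions."""
    calls = [call_species(h, m, threshold) for h, m in cells]
    called = [c for c in calls if c != "empty"]
    return called.count("collision") / len(called) if called else 0.0


print(len(set(HUMAN_PERCENT)))  # 11 distinct ratios
print(call_species(950, 50), call_species(400, 600), call_species(30, 970))
print(collision_rate([(950, 50), (400, 600), (30, 970)]))
```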

xi. XYZeq Single-Cell Analysis

Single-cell RNA-seq data were processed as previously described (17). Briefly, raw base calls were converted to FASTQ files and demultiplexed on the second combinatorial index using bcl2fastq v2.20. Reads were trimmed using Trim Galore v0.6.5, aligned to a combined human (GRCh38) and mouse (mm10) reference genome, and UMI-deduplicated. Reads were then assigned to single cells by demultiplexing on the first combinatorial index, and a gene-by-cell count matrix was constructed. Count matrices were processed using the Scanpy toolkit. Cells with fewer than 500 or more than 10,000 UMIs, and cells expressing fewer than 100 or more than 15,000 unique genes, were discarded. Cells with more than 1% mitochondrial reads were also discarded. Gene counts were normalized to 10,000 per cell, log-transformed, and further filtered for high mean expression and high dispersion using the filter-genes-dispersion function, with a minimum mean of 0.35, a maximum mean of 7, and a minimum dispersion of 1. Gene counts were then corrected by regression, with total counts per cell and percent mitochondrial UMIs per cell as covariates. For subsequent dimensionality reduction, gene counts were scaled to zero mean and unit variance, followed by principal component analysis, computation of a neighborhood graph, and t-distributed stochastic neighbor embedding (tSNE).
Leiden clustering was performed at a resolution of 0.8, and cells were grouped to reveal distinct murine cell types and human HEK293T cells.
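A minimal NumPy sketch of the cell-level filters and the normalization, using the XYZeq thresholds above, is shown below. It is a stand-in for the Scanpy calls rather than the authors' code; in Scanpy the same steps would typically use sc.pp.filter_cells (which accepts one threshold per call), sc.pp.normalize_total with target_sum=1e4, sc.pp.log1p, and sc.pp.highly_variable_genes with min_mean, max_mean, and min_disp. Treating the bounds as inclusive and supplying a mitochondrial-gene mask are assumptions, and the function names are hypothetical.

```python
import numpy as np

# Thresholds from the XYZeq pipeline above (bounds treated as inclusive: an assumption).
XYZEQ_QC = dict(min_umis=500, max_umis=10_000,
                min_genes=100, max_genes=15_000, max_pct_mt=1.0)


def qc_mask(counts, is_mt_gene, min_umis, max_umis, min_genes, max_genes, max_pct_mt):
    """Boolean mask of cells (rows) that pass the UMI, gene and mitochondrial filters."""
    counts = np.asarray(counts, dtype=float)
    umis = counts.sum(axis=1)
    genes = (counts > 0).sum(axis=1)
    pct_mt = 100.0 * counts[:, is_mt_gene].sum(axis=1) / np.maximum(umis, 1.0)
    return ((umis >= min_umis) & (umis <= max_umis)
            & (genes >= min_genes) & (genes <= max_genes)
            & (pct_mt <= max_pct_mt))


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log(x + 1)."""
    counts = np.asarray(counts, dtype=float)
    size = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / np.maximum(size, 1.0) * target_sum)


# Toy matrix: 3 cells x 4 genes, last gene mitochondrial; small thresholds for illustration.
counts = np.array([[5, 5, 0, 0], [1, 0, 0, 9], [50, 0, 0, 0]])
is_mt = np.array([False, False, False, True])
keep = qc_mask(counts, is_mt, min_umis=5, max_umis=20,
               min_genes=2, max_genes=4, max_pct_mt=50)
print(keep)
print(normalize_log1p(counts[keep]))
```

Passing **XYZEQ_QC instead of the toy thresholds applies the cutoffs listed in the text.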

xii. 10X Data Processing

Count matrices were generated using the 'count' tool in Cell Ranger version 3.1.0, with the combined human and mouse reference dataset (version 3.1.0) and the 'chemistry' flag set to 'fiveprime'. Count matrices were processed using the Scanpy toolkit. Cells with fewer than 500 or more than 75,000 UMIs, and cells expressing fewer than 100 or more than 10,000 unique genes, were discarded. Cells with more than 7.5% mitochondrial reads were also discarded. Gene counts were normalized to 10,000 per cell, log-transformed, and further filtered for high mean expression and high dispersion using the filter-genes-dispersion function, with a minimum mean of 0.2, a maximum mean of 7, and a minimum dispersion of 1. Gene counts were then corrected by regression, with total counts per cell and percent mitochondrial UMIs per cell as covariates. For subsequent dimensionality reduction, gene counts were scaled to zero mean and unit variance, followed by principal component analysis, computation of a neighborhood graph, and tSNE. Leiden clustering was performed at a resolution of 1, and cells were grouped to reveal the major murine cell types and human HEK293T cells.
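Both pipelines then regress out total counts and mitochondrial percentage and scale each gene to zero mean and unit variance. The NumPy sketch below illustrates that idea with an ordinary least-squares fit per gene (intercept plus covariates) followed by z-scoring. It is an assumed approximation of what sc.pp.regress_out and sc.pp.scale do, not a reimplementation of Scanpy, and the function names are hypothetical.

```python
import numpy as np


def regress_out(expr, covariates):
    """Per-gene OLS residuals after fitting an intercept plus the covariates.

    expr       : (n_cells, n_genes) log-normalized expression
    covariates : (n_cells, k), e.g. total counts and percent mitochondrial UMIs
    """
    expr = np.asarray(expr, dtype=float)
    design = np.column_stack([np.ones(expr.shape[0]), np.asarray(covariates, dtype=float)])
    # lstsq fits all genes at once: one coefficient column per gene.
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return expr - design @ coef


def scale(expr):
    """Center each gene to mean 0 and scale to unit (sample) variance."""
    expr = np.asarray(expr, dtype=float)
    sd = expr.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant genes at zero instead of dividing by zero
    return (expr - expr.mean(axis=0)) / sd


rng = np.random.default_rng(1)
covs = rng.normal(size=(200, 2))  # toy covariates
expr = covs @ np.array([[3.0, 1.0], [-2.0, 0.5]]) + rng.normal(scale=0.1, size=(200, 2))
resid = regress_out(expr, covs)
z = scale(resid)
print(np.abs(covs.T @ resid).max())  # near zero: residuals are orthogonal to the covariates
```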

xiii. XYZeq Heatmap

Mouse cells were subset from the processed XYZeq data matrix. Processed gene expression values were plotted in a heatmap with a minimum fold change of 1.5 and hierarchically clustered using Scanpy's heatmap function with its default settings (Pearson correlation and complete linkage).
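The clustering behind the heatmap can be sketched with SciPy: correlate the mean expression profiles of the clusters, convert Pearson r to the distance 1 - r, and apply complete linkage. To our understanding, Scanpy's sc.tl.dendrogram (which sc.pl.heatmap can draw) defaults to cor_method='pearson' and linkage_method='complete'; the function below is a hypothetical stand-in, and the 1.5 fold-change gene filter is not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def cluster_order(cluster_means):
    """Order clusters by complete-linkage clustering on Pearson distance.

    cluster_means : (n_clusters, n_genes) mean expression of each cluster
    Returns the leaf order to use for the heatmap rows.
    """
    r = np.corrcoef(np.asarray(cluster_means, dtype=float))  # cluster-by-cluster Pearson r
    dist = np.clip(1.0 - r, 0.0, None)                        # correlation distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="complete")
    return leaves_list(Z)


means = np.array([[1.0, 2.0, 3.0, 4.0],
                  [1.1, 2.1, 2.9, 4.2],
                  [4.0, 3.0, 2.0, 1.0],
                  [4.1, 2.9, 2.2, 0.9]])
print(cluster_order(means))  # clusters (0, 1) and (2, 3) end up adjacent
```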

xiv. XYZeq Gene Pair Plot

Four liver/tumor tissue sections were processed using the XYZeq assay (with spiked-in HEK293T cells) and aligned to a combined human and mouse reference. All genes with at least one count in each section were retained. Counts for the genes shared by each pair of sections are plotted in the lower triangle, and the Spearman correlations of the data are shown in the upper triangle. Histograms along the diagonal show the distribution of counts per gene for all nonzero genes in each section.
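The upper-triangle statistics can be reproduced in outline as follows: for each pair of sections, keep the genes with nonzero mean counts in both and compute Spearman's rank correlation with scipy.stats.spearmanr. The input layout (one mean-expression vector per section, in a shared gene order) and the function name are assumptions for illustration.

```python
import itertools

import numpy as np
from scipy.stats import spearmanr


def pairwise_spearman(section_means):
    """Spearman r for every pair of sections on their shared nonzero genes.

    section_means : dict section name -> (n_genes,) mean counts per gene,
                    all vectors in the same gene order
    """
    results = {}
    for a, b in itertools.combinations(section_means, 2):
        x = np.asarray(section_means[a], dtype=float)
        y = np.asarray(section_means[b], dtype=float)
        shared = (x > 0) & (y > 0)  # genes detected in both sections
        rho, _ = spearmanr(x[shared], y[shared])
        results[(a, b)] = float(rho)
    return results


sections = {"s1": [1, 2, 3, 4, 0],
            "s2": [2, 4, 6, 9, 5],
            "s3": [8, 6, 4, 2, 1]}
r = pairwise_spearman(sections)
print(r, np.mean(list(r.values())))  # per-pair r and the mean pairwise r
```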

xv. XYZeq Cell/Well Pair Plot

Pair plot showing the number of microwells containing pairwise combinations of cell types. In each scatter plot, each point represents a well, and its coordinates indicate the number of cells of each cell type present in that well. Each dot on the scatter plot is a gene and represents the mean value of each gene for the genes shared across all cells in the section. Along the diagonal of the plot are histograms showing the univariate distribution of the number of cells per well for a given cell type.
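The per-well counts that the scatter panels plot can be tabulated with the standard library, as in the sketch below. Each (well, cell type) record is assumed to come from the demultiplexed spatial barcode and the cluster annotation, and the helper names are hypothetical. A plotting function such as seaborn's pairplot could then draw the panels from these counts.

```python
from collections import Counter, defaultdict


def well_celltype_counts(cells):
    """Tabulate cells per (well, cell type); `cells` yields (well_id, cell_type) pairs."""
    table = defaultdict(Counter)
    for well, cell_type in cells:
        table[well][cell_type] += 1
    return dict(table)


def pair_points(table, type_a, type_b):
    """One (n_type_a, n_type_b) point per well for the type_a-vs-type_b panel."""
    return [(counts[type_a], counts[type_b]) for counts in table.values()]


def wells_with_both(table, type_a, type_b):
    """Number of wells containing at least one cell of each of the two types."""
    return sum(1 for counts in table.values() if counts[type_a] and counts[type_b])


cells = [("A1", "Hepatocyte"), ("A1", "Hepatocyte"), ("A1", "MC38"),
         ("A2", "MC38"), ("A3", "Kupffer")]
table = well_celltype_counts(cells)
print(pair_points(table, "Hepatocyte", "MC38"))    # [(2, 1), (0, 1), (0, 0)]
print(wells_with_both(table, "Hepatocyte", "MC38"))
```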

xvi. Heatmap Comparing 10X and XYZeq

Mouse cells were subset from each processed data matrix. For pairs of mouse Leiden clusters matched between XYZeq and 10X, the scaled and log-transformed expression values of the shared genes were compared. For each comparison, the Pearson correlation was calculated and plotted in a heatmap. Row and column labels are ordered by their corresponding cell types.
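A NumPy sketch of the comparison: align the two datasets on their shared genes, then compute Pearson r between every XYZeq cluster profile and every 10X cluster profile as the mean product of row-wise z-scores. The function names, the gene lists, and the toy profiles are hypothetical.

```python
import numpy as np


def align_shared_genes(genes_a, genes_b):
    """Shared gene names plus their column indices in each dataset."""
    pos_a = {g: i for i, g in enumerate(genes_a)}
    pos_b = {g: i for i, g in enumerate(genes_b)}
    shared = sorted(pos_a.keys() & pos_b.keys())
    return shared, [pos_a[g] for g in shared], [pos_b[g] for g in shared]


def cross_pearson(profiles_a, profiles_b):
    """(n_a, n_b) Pearson r between every cluster profile of A and of B.

    Rows are clusters; columns are the shared genes in the same order.
    """
    a = np.asarray(profiles_a, dtype=float)
    b = np.asarray(profiles_b, dtype=float)
    za = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    zb = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    return za @ zb.T / a.shape[1]  # mean product of z-scores = Pearson r


shared, ia, ib = align_shared_genes(["Alb", "Cd74", "Plec", "Glul"],
                                    ["Plec", "Alb", "Ptprc", "Glul", "Cd74"])
xyzeq = np.array([[5.0, 1.0, 0.5, 3.0], [0.2, 4.0, 1.0, 0.1]])[:, ia]
tenx = np.array([[0.4, 4.5, 0.0, 2.8, 1.2], [1.0, 0.3, 2.0, 0.2, 3.5]])[:, ib]
print(shared)
print(np.round(cross_pearson(xyzeq, tenx), 2))
```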

xvii. Correlation Plot

Mouse cells were subset from each processed data matrix. The proportion of each cell type (determined by Leiden clustering and visualized using tSNE) was plotted for the two assays, and the coefficient of determination was calculated by fitting a model that assumes equal proportions in the two assays.
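Under the stated model the line is fixed at y = x (slope 1, intercept 0), so the coefficient of determination is 1 - SS_res/SS_tot with residuals taken from that line rather than from a least-squares fit. The sketch below assumes the second assay's proportions are the response, which the text does not specify, and the function name is hypothetical.

```python
def r2_equal_proportions(props_a, props_b):
    """R^2 of props_b under the fixed model props_b = props_a (slope 1, intercept 0).

    props_a, props_b : cell-type proportions from the two assays, same cell-type order
    """
    if len(props_a) != len(props_b) or not props_b:
        raise ValueError("need two non-empty lists of equal length")
    mean_b = sum(props_b) / len(props_b)
    ss_res = sum((b - a) ** 2 for a, b in zip(props_a, props_b))  # residuals from y = x
    ss_tot = sum((b - mean_b) ** 2 for b in props_b)
    return 1.0 - ss_res / ss_tot


print(r2_equal_proportions([0.5, 0.3, 0.2], [0.5, 0.3, 0.2]))    # identical proportions
print(r2_equal_proportions([0.5, 0.3, 0.2], [0.4, 0.35, 0.25]))  # about -0.286
```

Because the line is not fitted to the data, R² can be negative when the two assays disagree more than a flat line at the mean would.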

xviii. Gene Module Analysis of the Top Contributing Genes

To identify gene modules by non-negative matrix factorization, genes expressed in fewer than 5 cells and cells expressing fewer than 100 genes were filtered out. The count data were subjected to a variance-stabilizing transformation, and confounding covariates, including the number of counts per cell, batch, and the percentage of mitochondrial reads, were regressed out with a regularized negative binomial regression model using the SCTransform (48) function in the Seurat R package. The Pearson residuals of the regression model were centered, and all negative values were set to zero. Non-smooth non-negative matrix factorization (nsNMF) with a rank of 20 was performed on the resulting expression data using the nmf (49) function in the NMF R package. Within each module, genes were ranked in descending order of their magnitude in the corresponding coefficient matrix. Gene Ontology enrichment analysis was performed on the ranked genes of each module using GOrilla (50). For each module, the top consecutive genes whose coefficients in that module were higher than in all other modules were further selected as the genes contributing most to the module in the tissue-specific analysis (51). Binary spatial maps were generated by first calculating, from the log-normalized gene expression data, the median expression across all cells in each well in each batch.
Then, the mean expression of all genes within a module was extracted for each well, and the average of these per-well values for the selected module genes was calculated, weighted by the number of cells in each well. Wells with mean module-gene expression above this weighted average were labeled as high for that gene module, and all other wells with nonzero expression of the selected module genes were labeled as low. tSNE plots of gene modules are colored by the mean expression of the genes within the annotated module.
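The binary spatial map can be sketched in NumPy as below, for a single batch: per-well median expression of each gene, the mean of those medians over the module genes, a cell-number-weighted average across wells as the threshold, and a high/low label per well. Labeling a well "low" whenever its module score is positive but not above the threshold, and leaving wells with zero module expression unlabeled, is our reading of the text; the function name is hypothetical.

```python
import numpy as np


def binary_module_map(expr, well_ids, module_genes):
    """Label each well high/low for one gene module (one batch).

    expr         : (n_cells, n_genes) log-normalized expression
    well_ids     : (n_cells,) well label of each cell
    module_genes : column indices of the selected module genes
    Returns {well: "high" | "low" | None}.
    """
    expr = np.asarray(expr, dtype=float)
    well_ids = np.asarray(well_ids)
    wells = np.unique(well_ids)
    scores, sizes = [], []
    for w in wells:
        cells = expr[well_ids == w]
        well_median = np.median(cells, axis=0)            # per-gene median within the well
        scores.append(well_median[module_genes].mean())   # mean over the module genes
        sizes.append(cells.shape[0])
    scores, sizes = np.array(scores), np.array(sizes)
    threshold = np.average(scores, weights=sizes)         # weighted by cells per well
    return {w: ("high" if s > threshold else "low" if s > 0 else None)
            for w, s in zip(wells, scores)}


expr = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
labels = binary_module_map(expr, ["A", "A", "B", "C", "C"], module_genes=[0])
print(labels)  # A above the weighted mean, C nonzero but below, B empty
```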

xix. Overlap Analysis of Gene Modules Identified in Liver/Tumor and Spleen/Tumor

Gene modules were first identified separately for the two tissues, liver/tumor and spleen/tumor, using nsNMF with a rank of 20. The top 200 genes in the ranked gene list of each module were taken as highly associated with that module. For each liver/tumor module, the spleen/tumor module with the largest gene overlap was initially matched as functionally similar. Matched pairs sharing fewer than 25% of the top 200 genes of the liver/tumor module were then removed. To calculate the fraction of each cell type contributing to each module, the mean expression of each gene across all cells was calculated. The median expression of all overlapping genes was then calculated for each cell type and converted to a fraction by dividing by the sum of the median expressions of all cell types.
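The matching and cell-type fraction steps reduce to set and median operations, sketched below in plain Python. The inputs are assumed to be ranked gene lists per module and, for the fractions, per-gene mean expression values of the overlapping genes grouped by cell type; the function names are hypothetical. A pair sharing exactly 25% of genes is kept, because the text only removes pairs below 25%.

```python
from statistics import median


def match_modules(liver_modules, spleen_modules, top_n=200, min_overlap=0.25):
    """Match each liver/tumor module to the spleen/tumor module it overlaps most.

    *_modules : dict module name -> gene list ranked by coefficient (descending)
    Returns {liver_module: (spleen_module, overlap_fraction)} for retained pairs.
    """
    matches = {}
    for lname, lgenes in liver_modules.items():
        ltop = set(lgenes[:top_n])
        best, best_n = None, -1
        for sname, sgenes in spleen_modules.items():
            n = len(ltop & set(sgenes[:top_n]))
            if n > best_n:
                best, best_n = sname, n
        frac = best_n / len(ltop) if ltop else 0.0
        if best is not None and frac >= min_overlap:  # drop pairs below 25% overlap
            matches[lname] = (best, frac)
    return matches


def celltype_fractions(gene_means_by_type):
    """Median over overlapping genes per cell type, normalized to sum to 1."""
    med = {t: median(v) for t, v in gene_means_by_type.items()}
    total = sum(med.values())
    return {t: m / total for t, m in med.items()}


liver = {"L1": ["a", "b", "c", "d"], "L2": ["x", "y", "z", "w"]}
spleen = {"S1": ["a", "b", "q", "r"], "S2": ["x", "p", "o", "n"]}
print(match_modules(liver, spleen, top_n=4))
print(celltype_fractions({"Hepatocyte": [1, 3, 2], "MC38": [1, 1, 1]}))
```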

xx. Defining a Proximity Score for Each Well

We sought to define, for each well of the hexagonal well array, a score capturing how centrally the well lies within a tumor or non-tumor tissue domain. The core of the method is identifying successive concentric "layers" of wells around the well in question: its immediate neighbors (layer 1), the wells exactly 2 wells away (layer 2), and so on, for a total of n layers. In the spleen/tumor, several wells at the distal end of the tumor region were selected and assigned a score of 1. Ten successive layers of wells were then taken, with the score decreasing linearly with each layer, and wells in layer 10 and beyond were set to 0. In the liver, MC38 cells are found in multiple locations, so, unlike in the spleen, there is no single unidirectional spatial axis that places all MC38 cells at one end and all non-tumor tissue cells at the other. Therefore, a different method was used to calculate the scores in liver/tumor tissue. For each well $w_{x,y}$, indexed by its x,y position on the hexagonal well array, the proportion of hepatocytes $p_{x,y}$ was calculated, because hepatocytes are the most abundant parenchymal cells in non-tumor liver tissue and are closely associated with it:

$t_{x,y}$ = total number of hepatocytes and MC38 cells in $w_{x,y}$

$h_{x,y}$ = number of hepatocytes in $w_{x,y}$

$$p_{x,y} = \frac{h_{x,y}}{t_{x,y}}$$

Then, for each well $w_{x,y}$ in question, the surrounding wells in each of 10 successive concentric layers were tabulated; these wells are denoted $w_{x',y'}$ to distinguish them from the well in question. For each layer $l$, the values $p_{x',y'}$ of its constituent wells were combined into a cell-number-weighted mean $p_{x,y,l}$:

$w_{x,y,l} = \{\, w_{x',y'} : w_{x',y'} \text{ lies in layer } l \text{ of } w_{x,y} \,\}$

$t_{x,y,l}$ = total number of hepatocytes and MC38 cells in $w_{x,y,l}$

$$p_{x,y,l} = \frac{1}{t_{x,y,l}} \sum_{w_{x',y'} \in w_{x,y,l}} t_{x',y'}\, p_{x',y'}$$

Then, for the well $w_{x,y}$ in question, the distance-weighted average of all $p_{x,y,l}$ was calculated; this is the proximity score $s_{x,y}$ of the well. The distance weight $u_l$ of each layer follows an exponential decay truncated at 10 terms and is normalized to sum to 1 by dividing by the sum of all weights, $u_s$. $p_{x,y}$ and the layer-1 neighborhood value $p_{x,y,1}$ were given equal weight. A decay factor $d$ of 1.05 was chosen empirically because it appeared to produce the most nearly uniform distribution of scores across all wells.

$$d = 1.05$$

$$u_0 = u_1, \qquad u_s = u_0 + \sum_{l=1}^{10} u_l$$

$$s_{x,y} = \frac{1}{u_s}\left(u_0\, p_{x,y} + \sum_{l=1}^{10} u_l\, p_{x,y,l}\right)$$

where each layer weight $u_l$ decays exponentially with $l$ at a rate set by the decay factor $d$.

These calculations were repeated for all wells containing at least one murine cell.
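Both scoring schemes can be sketched in Python as below. Several details are assumptions rather than statements from the source: wells are addressed by axial hexagonal coordinates (q, r), so that a well's layer around another well is their hex distance; the liver weights use u_l = d^-l for l ≥ 1 with u_0 = u_1, because the exact decay formula is not recoverable from the text; and layers containing no hepatocytes or MC38 cells are skipped, with the remaining weights renormalized. All function names are hypothetical.

```python
def hex_distance(a, b):
    """Layer index of well b around well a, with wells in axial (q, r) coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def spleen_score(well, seed_wells, n_layers=10):
    """Seed wells score 1; the score drops linearly per layer and is 0 from layer 10 on."""
    layer = min(hex_distance(well, s) for s in seed_wells)
    return max(0.0, 1.0 - layer / n_layers)


def liver_score(well, hep, total, d=1.05, n_layers=10):
    """Distance-weighted hepatocyte proportion around `well`.

    hep, total : dict well -> hepatocyte count / hepatocyte + MC38 count
    ASSUMED weights: u_l = d**-l for l >= 1 and u_0 = u_1 (equal weight for the
    well itself and layer 1); layers without hepatocytes or MC38 cells are skipped.
    """
    terms = []  # (weight u_l, proportion)
    if total.get(well, 0) > 0:
        terms.append((d ** -1, hep.get(well, 0) / total[well]))  # p_{x,y}
    for layer in range(1, n_layers + 1):
        ring = [w for w in total if hex_distance(well, w) == layer]
        t_layer = sum(total[w] for w in ring)
        if t_layer > 0:
            # Cell-number-weighted mean of p over the ring = sum(h) / sum(t).
            terms.append((d ** -layer, sum(hep.get(w, 0) for w in ring) / t_layer))
    u_s = sum(u for u, _ in terms)
    return sum(u * p for u, p in terms) / u_s if u_s > 0 else float("nan")


# Toy example: an all-MC38 well surrounded by six all-hepatocyte neighbors.
neighbors = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
total = {(0, 0): 4, **{w: 1 for w in neighbors}}
hep = {(0, 0): 0, **{w: 1 for w in neighbors}}
print(liver_score((0, 0), hep, total))  # equal weights on the well and layer 1
print(spleen_score((5, 0), [(0, 0)]))   # layer 5 of 10
```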

xxi. Trajectory Inference Analysis

Genes expressed in fewer than 5 cells and cells expressing fewer than 100 genes were excluded. A variance-stabilizing transformation was performed using the SCTransform (48) function in the Seurat R package. The resulting corrected count data for MSCs from one tissue were used as the count-matrix input for trajectory inference analysis with the tradeSeq (41) R package. Genes whose expression was associated with the proximity score were identified with the associationTest function in tradeSeq, based on a Wald test under a negative binomial generalized additive model. P values were adjusted using the Benjamini-Hochberg multiple-testing procedure, and genes with adjusted p values below 0.05 were considered significantly associated with the proximity score.
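The Benjamini-Hochberg adjustment applied to the tradeSeq p values can be sketched in plain Python as follows: a step-up procedure over the sorted p values with a running minimum, capped at 1. This is a generic illustration of the procedure; in R the same adjustment is available as p.adjust(method = "BH"), and in Python as multipletests(method="fdr_bh") from statsmodels.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adj)
print([q < 0.05 for q in adj])  # genes called significant at the 0.05 cutoff
```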

2. Results

We developed XYZeq, a method that uses two rounds of split-pool indexing to encode the spatial location of each cell in a tissue sample into a combinatorially indexed scRNA-seq library (17, 18). Critical to the performance of XYZeq, we fixed tissue sections with dithiobis(succinimidyl propionate) (DSP), a reversible cross-linking fixative that has been shown to preserve histological tissue morphology while maintaining RNA integrity for single-cell transcriptomics (19). In the first round of indexing, fixed and cryopreserved tissue sections were placed on, and sealed within, a microwell array with a center-to-center spacing of 500 μm. The microwells contain reverse transcription (RT) primers carrying distinct barcodes (spatial barcodes). This step physically partitions intact cells from the tissue into separate in situ barcoding reactions. After reverse transcription, intact cells were removed from the array, pooled, and distributed into wells for a second round of PCR indexing, giving each single cell a combinatorial barcode (Fig. 5A and Fig. 5B). After sequencing and demultiplexing, the spatial barcode maps each cell back to its physical location on the array (Fig. 5B). This combinatorial barcoding strategy theoretically enables spatial transcriptomic analysis of large single-cell collections: with two rounds of split-pool indexing, 768 spatial RT barcodes, and 384 PCR barcodes, up to 294,912 unique single-cell barcodes can be generated.
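The barcode arithmetic is simply the product of the two index sets, 768 × 384 = 294,912. The sketch below packs a (spatial, PCR) index pair into one integer and back; the integer encoding is purely illustrative (real barcodes are DNA sequences), and the names are hypothetical. Only the spatial index is needed to map a cell back to its microwell.

```python
N_SPATIAL = 768  # RT (spatial) barcodes, one per microwell
N_PCR = 384      # PCR index barcodes from the second round
N_COMBINATIONS = N_SPATIAL * N_PCR


def combined_barcode(spatial_idx, pcr_idx):
    """Hypothetical integer encoding of a (spatial, PCR) index pair."""
    if not (0 <= spatial_idx < N_SPATIAL and 0 <= pcr_idx < N_PCR):
        raise ValueError("index out of range")
    return pcr_idx * N_SPATIAL + spatial_idx


def split_barcode(cell_id):
    """Inverse of combined_barcode: recover (spatial_idx, pcr_idx)."""
    pcr_idx, spatial_idx = divmod(cell_id, N_SPATIAL)
    return spatial_idx, pcr_idx


print(N_COMBINATIONS)                           # 294912
print(split_barcode(combined_barcode(17, 5)))   # (17, 5): well 17, PCR index 5
```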

To determine whether XYZeq can assign transcriptomes to single cells, we performed a mixed-species experiment in which mixtures of DSP-fixed human (HEK293T) and mouse (NIH/3T3) cells at a total of 11 different ratios were deposited into each of the 768 barcoded microwells, creating a gradient of cell ratios along the columns of the array (Fig. 5C and Methods). XYZeq generated scRNA-seq data for 6,447 cells. Based on the percentage of reads per cell barcode mapping to the human and mouse transcriptomes, 94.8% of cell barcodes were assigned to a single species, with an estimated barcode collision rate of 5.1% (Fig. 15A). We hypothesized that some of the collisions were caused by contamination with ambient RNA released from damaged cells. Using DecontX (20), a hierarchical Bayesian method that assumes the observed transcript counts of a cell are a mixture of counts from two multinomial distributions, we removed contaminating transcripts, reducing the collision rate to 0.7% (Fig. 5D and Methods). After computational decontamination and removal of collision events, we obtained medians of 939 UMIs and 439 genes per human cell and 816 UMIs and 336 genes per mouse cell.
Mapping each single cell back to its original microwell, we observed high agreement between the observed and expected cell-type proportions along the columns of wells (Lin's concordance correlation coefficient = 0.91; Fig. 5E and Fig. 15B). Taken together, these results indicate minimal barcode contamination after pooling, both among single cells within each well and between adjacent wells on the array, demonstrating that the XYZeq workflow successfully generates spatially resolved scRNA-seq libraries.
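Lin's concordance correlation coefficient, used above to compare observed and expected proportions, penalizes both scatter and any systematic offset from the line y = x. A plain-Python sketch using population (1/n) moments, as in Lin's original estimator, follows; the function name is hypothetical.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between paired observations."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("need two non-empty sequences of equal length")
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (vx + vy + (mx - my) ** 2)


expected = [1.0, 0.9, 0.8, 0.7]  # e.g. expected human fraction per column
print(lins_ccc(expected, expected))         # perfect agreement
print(lins_ccc([1, 2, 3], [2, 3, 4]))       # shifted by 1: Pearson r = 1, CCC = 4/7
```

The second call shows the design choice behind using CCC rather than Pearson r here: a constant bias between observed and expected proportions lowers the score even when the two are perfectly correlated.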

We next applied XYZeq to an ectopic murine tumor model established by intrahepatic injection of the syngeneic colon adenocarcinoma cell line MC38 into immunocompetent mice. This model mimics the tissue invasion characteristic of metastatic cancer and, importantly, has relatively well-defined tumor boundaries (21, 22). MC38 tumor cells also have immunomodulatory properties, and previous data show immune cell infiltration at the tumor/tissue interface approximately 10 days after tumor inoculation (23, 24). We therefore predicted that XYZeq could simultaneously capture the gene expression states and spatial organization of parenchymal hepatocytes, cancer cells, and tumor-associated immune cell populations. Fixed frozen 25 μm liver/tumor tissue sections from C57BL/6 mice were placed on top of pre-frozen microwell arrays, while serial 10 μm sections were fixed for immunohistochemical staining (Fig. 16A and Methods). We also deposited fixed human HEK293T cells into the same array at an average of 58 cells per well as a mixed-species internal control for experimentally quantifying the collision rate.
XYZeq was performed, and an initial collision rate of 7.3% was observed based on the ratio of human to mouse transcripts (Fig. 16B). After computational decontamination and further quality control, including filtering cells based on counts and mitochondrial expression, the collision rate was reduced to 4.4% (Fig. 11A and Methods). After removing collisions, a total of 8,746 cells were obtained; at 46% sequencing saturation, we detected medians of 1,596 UMIs and 629 unique genes per HEK293T cell and of 1,009 UMIs and 456 unique genes per cell from the ectopic murine tumor model (Fig. 11B). Serial tissue sections stained with hematoxylin and eosin (H&E) showed the histological boundary between the tumor and the adjacent liver tissue (Fig. 11C). As expected, HEK293T human cells were distributed throughout the array, whereas mouse cells were confined within the boundaries of the murine tissue (Fig. 11D). Empty spatial wells in which no cells were detected may be due to the limited number of cells targeted for sequencing (approximately 10,000). We obtained medians of 3 human cells/well and 9 mouse cells/well, for an expected total of 13 cells/well (Fig. 16C).

XYZeq revealed distinct cell types within the mouse liver and tumor. Semi-supervised Leiden clustering identified 13 cell populations in the murine tumor model (Fig. 17A), which were annotated as seven cell types based on the markers defining each population: hepatocytes, cancer cells (MC38), Kupffer cells, liver sinusoidal endothelial cells (LSECs), mesenchymal stem cells (MSCs), lymphocytes, and myeloid cells (Fig. 12A). The annotation of MC38 tumor cells is supported by a high correlation between chromosomal copy numbers estimated from the XYZeq scRNA-seq data and publicly available MC38 cytogenetic data (Pearson r = 0.78) (25). Notably, the partial amplification of chromosome 15 and partial deletion of chromosome 14 observed in the XYZeq data are consistent with common chromosomal abnormalities seen in MC38 cells (Fig. 17B). As a negative control, copy-number correlations were low when MC38 cells were compared with hepatocytes (26) and immune cells (21) (Pearson r = 0.05 and r = 0.17, respectively) (Fig. 17B).
A heatmap of genes differentially expressed across the seven cell types revealed distinct cell clusters defined by the expression of canonical genes relatively exclusive to each cell type (Fig. 12B). We estimated a low contamination rate for each cell cluster (median below 1%), except for hepatocytes, which had a slightly higher contamination rate of 2.2% (Fig. 17C and Methods). The median UMIs and genes detected were comparable across all cell clusters, including immune cell populations that have been difficult to profile with other combinatorial indexing methods (27) (Fig. 17D and Fig. 17E). The expected cell types of non-tumor-bearing liver, including hepatocytes, Kupffer cells, and LSECs, were identified using previously described markers (26). Consistent with known hepatocyte heterogeneity, we identified a subset of hepatocytes marked by expression of the pericentral markers Glul, Oat, and Gulo (26) (Fig. 17F). MC38 adenocarcinoma cells formed a large homogeneous cluster distinguished by expression of the known marker Plec (22). Myeloid cells were defined by the canonical markers Cd11b and Cd74 (28), but other non-canonical markers were also observed, including Myo1f (29) and Tgfb (30). Lymphocytes showed a similar mix of broad and specific cell-type marker expression, expressing the pan-lymphocyte marker Il18r1, the T-lymphocyte marker Prkcq, and the cytotoxic T-cell marker Cd8b (31-33). Finally, we detected a cluster of mesenchymal stem/stromal cells expressing the broad mesenchymal markers Rbms3 and Tshz2 and the stem/stromal cell markers Prkg1 and Gpc6 (34-38) (Fig. 17F).

Next, the reproducibility of XYZeq was assessed while comparing transcriptional profiles across z-layers of the organ. Four non-contiguous 25 μm tissue sections from the same frozen liver/tumor sample block were processed and analyzed. For genes detected in all sections, mean expression across all cells was highly correlated between every pair of sections (mean pairwise Spearman's r = 0.93) (Fig. 18A). Of the four sections, sections 1 and 2, the closest pair in z (80 μm apart), had the highest expression correlation (Spearman's r = 0.96), whereas sections 1 and 4, the most distant pair in z (830 μm apart), had the lowest (Spearman's r = 0.91). Furthermore, each cluster annotated jointly across all four sections contained cells from every section, suggesting that the observed heterogeneity was not due to batch effects (Fig. 18B).
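The section-to-section reproducibility check can be sketched as follows: average each gene over all cells in a section, keep the genes detected in every section, and compute Spearman's rho for every pair of sections. This is a minimal standard-library sketch, assuming a hypothetical input format in which each cell is a `{gene: count}` dict holding only detected genes; it is not the authors' pipeline, and `scipy.stats.spearmanr` would give the same rho on real data.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_section_spearman(sections):
    """sections: {name: [cell, ...]}, each cell a {gene: count} dict of detected genes.
    Returns {(a, b): rho} for every pair of sections."""
    detected = [set().union(*cells) for cells in sections.values()]
    shared = sorted(set.intersection(*detected))  # genes detected in every section
    means = {
        name: [sum(cell.get(g, 0) for cell in cells) / len(cells) for g in shared]
        for name, cells in sections.items()
    }
    return {(a, b): spearman(means[a], means[b])
            for a, b in combinations(sorted(means), 2)}


# Toy sections (made-up counts); gene "D" is missing from s1 and s3, so it is excluded.
sections = {
    "s1": [{"A": 5, "B": 1, "C": 3}, {"A": 3, "B": 1, "C": 1}],
    "s2": [{"A": 6, "B": 2, "C": 2, "D": 1}, {"A": 4, "C": 2}],
    "s3": [{"A": 1, "B": 4, "C": 2}],
}
rho = pairwise_section_spearman(sections)
for pair, value in rho.items():
    print(pair, round(value, 3))
```

A rank-based statistic is a reasonable choice here because mean expression spans orders of magnitude across genes, and Spearman's rho is insensitive to that scale.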

The quality of the scRNA-seq data generated by XYZeq was further compared with that of a commercially available single-cell technology. To do this, we compared the cell type clusters identified by XYZeq with those identified in an independent scRNA-seq dataset of the same liver/tumor generated with the 10X Genomics droplet-based Chromium system. XYZeq also recovered most of the cell populations detected by 10X, except neutrophils, erythroid progenitors and plasma cells (Fig. 12C and Fig. 19A), immune cell populations known to be sensitive to the cryopreservation that XYZeq requires (39). Interestingly, 10X failed to capture MSCs even though the cells were isolated from fresh liver/tumor samples. In addition, B cells identified with the 10X platform correlated with the myeloid population detected by XYZeq, likely because of captured transcripts of Ly86, Cd74 and several class II histocompatibility antigen genes (e.g., H2ab1 or H2dmb1). For the six cell types identified in both the 10X and XYZeq data, the two platforms showed high agreement in cell type proportions (Lin's CCC = 0.99; Fig. 19B) and in the pseudo-bulk expression profile of each cell type (Pearson's r = 0.64-0.86, p < 0.01; Fig. 12C).
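Lin's concordance correlation coefficient (CCC), used above to compare cell type proportions between 10X and XYZeq, penalizes both scatter and systematic offset, unlike Pearson's r. The sketch below assumes that proportions are computed over the cell types shared by both platforms and renormalized to sum to 1; that renormalization step and the cell counts are assumptions for illustration, not the documented procedure.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient with population (1/n) moments:
    rho_c = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def aligned_proportions(counts_a, counts_b):
    """Turn two {cell_type: cell_count} dicts into proportion vectors over the
    cell types present in both, each renormalized to sum to 1 (an assumption)."""
    shared = sorted(set(counts_a) & set(counts_b))
    tot_a = sum(counts_a[t] for t in shared)
    tot_b = sum(counts_b[t] for t in shared)
    return ([counts_a[t] / tot_a for t in shared],
            [counts_b[t] / tot_b for t in shared])


# Hypothetical cell counts per platform (illustrative only).
xyzeq = {"Hepatocyte": 400, "MC38": 900, "Myeloid": 300, "Lymphocyte": 150,
         "Kupffer": 120, "LSEC": 130, "MSC": 80}
tenx = {"Hepatocyte": 380, "MC38": 950, "Myeloid": 280, "Lymphocyte": 170,
        "Kupffer": 110, "LSEC": 110, "Neutrophil": 60}
px, pt = aligned_proportions(xyzeq, tenx)
print(f"Lin's CCC over {len(px)} shared cell types: {lins_ccc(px, pt):.3f}")
```

Because CCC includes the squared difference of means, two perfectly linear but offset profiles score below 1, whereas Pearson's r would still be 1.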

Next, we turned to the critical question of whether XYZeq can determine the spatial location of each cell. To this end, the spatial localization of each cell cluster was compared with images of H&E-stained serial sections. First, to confirm that liver could be accurately distinguished from tumor tissue, we verified that the densities of hepatocytes and cancer cells in the spatial wells overlapped with the histological annotations of adjacent sections (Fig. 12D). Projections of the other cell types revealed distinct spatial organization patterns of myeloid cells, lymphocytes, Kupffer cells, MSCs and LSECs (Fig. 12D and Fig. 20A). Quantifying the composition of cells occupying each spatial well showed that MSCs, lymphocytes and myeloid cells co-localized with cancer cells, whereas Kupffer cells and LSECs co-localized with hepatocytes, suggesting potential regions of cellular interaction within tumor-infiltrated tissue (Fig. 12E and Methods). These qualitative observations were confirmed by pairwise correlation analysis of cell type proportions across all wells (0.37 ≤ Pearson r ≤ 0.77, p < 0.05; Fig. 12F and Fig. 20B).
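The co-localization analysis (Fig. 12E and Fig. 12F) reduces each spatial well to its cell type composition and then correlates those proportions across wells. A minimal sketch, assuming each non-empty well is represented by a hypothetical list of per-cell type labels ("Hep", "Kup" and "MC38" are illustrative abbreviations); the significance testing behind the reported p < 0.05 is omitted.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def well_proportions(wells, cell_types):
    """wells: {well_id: [cell-type label per cell]}, assumed non-empty.
    Returns {cell_type: [fraction of that type in each well]} in sorted well order."""
    props = {t: [] for t in cell_types}
    for well in sorted(wells):
        labels = wells[well]
        for t in cell_types:
            props[t].append(labels.count(t) / len(labels))
    return props


def celltype_correlations(wells, cell_types):
    """Pearson r between the per-well proportions of every pair of cell types."""
    props = well_proportions(wells, cell_types)
    return {(a, b): pearson(props[a], props[b]) for a, b in combinations(cell_types, 2)}


# Toy wells (made-up labels): hepatocyte-rich wells also hold Kupffer cells.
wells = {
    "w1": ["Hep", "Hep", "Kup", "MC38"],
    "w2": ["MC38", "MC38", "MC38", "Hep"],
    "w3": ["Hep", "Hep", "Hep", "Kup"],
    "w4": ["MC38", "MC38", "MC38", "MC38"],
}
r = celltype_correlations(wells, ["Hep", "Kup", "MC38"])
for pair, value in r.items():
    print(pair, round(value, 2))
```

Proportions within a well sum to 1, so some negative correlation between abundant types is expected by construction; positive correlations are the more informative signal of co-localization.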

To assess the general applicability of XYZeq to other tissues, we processed samples from the same heterotopic murine tumor model in the spleen. A total of 7,505 cells were recovered, with a median of 1,312 UMIs and 661 unique genes per HEK293T cell and 1,169 UMIs and 577 unique genes per mouse cell, and an estimated collision rate of 1.36% (Fig. 21A and Fig. 21B). As in the liver/tumor model, XYZeq was able to reconstruct tissue boundaries in the mouse spleen, with MC38 tumor regions annotated on serial H&E-stained sections (Fig. 21C to Fig. 21E). A median of 4 human cells and 7 mouse cells per well was detected (Fig. 21F). Semi-supervised Leiden clustering revealed six distinct cell populations in the spleen/tumor model: B cells, T cells, myeloid cells, MSCs, endothelial cells and MC38 tumor cells (Fig. 22A). All four spleen/tumor sections contributed to each cell type cluster, suggesting that the annotated clusters were not due to batch effects (Fig. 22B). A heatmap of genes differentially expressed across the six cell types revealed distinct cell clusters expressing canonical genes relatively exclusive to each type (Fig. 22C). Cells of each type could be spatially mapped across the tissue (Fig. 22D). Collectively, these results demonstrate that XYZeq can generate spatially resolved single-cell RNA-seq data from diverse fixed-frozen tissues.
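In a species-mixing design such as this one (human HEK293T cells alongside mouse cells), barcodes whose UMIs come substantially from both species are the observable collisions. The sketch below counts such mixed barcodes; the 90% purity threshold and the `(human_umis, mouse_umis)` input format are hypothetical, and the estimator behind the 1.36% figure is the one described in the Methods, not reproduced here.

```python
def classify_barcode(human_umis, mouse_umis, purity=0.9):
    """Label a barcode 'human', 'mouse', 'mixed' or 'empty' by its species UMI fraction.
    The 0.9 purity cutoff is a hypothetical choice for illustration."""
    total = human_umis + mouse_umis
    if total == 0:
        return "empty"
    human_frac = human_umis / total
    if human_frac >= purity:
        return "human"
    if human_frac <= 1 - purity:
        return "mouse"
    return "mixed"


def mixed_species_rate(barcodes, purity=0.9):
    """Fraction of non-empty barcodes called 'mixed' (observable cross-species collisions).
    Collisions between two cells of the same species are invisible to this count."""
    labels = [classify_barcode(h, m, purity) for h, m in barcodes]
    called = [label for label in labels if label != "empty"]
    return labels.count("mixed") / len(called)


# Toy (human UMIs, mouse UMIs) per barcode; numbers are made up.
barcodes = [(1000, 10), (5, 900), (500, 500), (800, 30), (0, 0)]
print(f"mixed-species rate: {mixed_species_rate(barcodes):.1%}")
```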

The ability to obtain spatial and single-cell transcriptome data simultaneously allowed us to assess how cellular composition shapes gene expression patterns across space. We applied non-negative matrix factorization (NMF) to the liver/tumor and spleen/tumor scRNA-seq data to define modules of co-expressed genes, and correlated the expression of each module in each cell type with its expression in the spatial wells. Using this approach, we identified 20 modules of co-expressed genes per tissue (Methods). As a proof of principle, liver module (LM) 14, which is predominantly expressed by the hepatocyte cluster in tSNE space, was first identified from the liver/tumor data (Fig. 13A). As expected, the wells with the highest LM14 expression were enriched in hepatocytes, suggesting that the spatial variability of this module is driven mainly by hepatocyte frequency (Fig. 13B).
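NMF factors a non-negative cells × genes matrix V into W (cells × modules) and H (modules × genes): each row of H ranks genes within a module, and each column of W gives per-cell module usage that can be averaged per cell type or per spatial well. The sketch below uses the classic Lee-Seung multiplicative updates for the squared Frobenius loss in pure Python. The reference list includes an R package for NMF (49), whose algorithm options may differ, so treat this as an illustration of the technique rather than the authors' method; all helper names are hypothetical.

```python
import random


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative v (n x m) as w (n x k) @ h (k x m) using
    Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """||v - w @ h||_F, the quantity the updates drive down."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb)) ** 0.5


def top_genes(h, genes, module, n_top):
    """Genes with the largest loadings in one module (row of h)."""
    order = sorted(range(len(genes)), key=lambda j: h[module][j], reverse=True)
    return [genes[j] for j in order[:n_top]]


def mean_usage(w, groups, module):
    """Average usage of `module` per group label (e.g., cell type or well)."""
    totals, counts = {}, {}
    for row, g in zip(w, groups):
        totals[g] = totals.get(g, 0.0) + row[module]
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


# Toy cells x genes counts with two gene blocks (illustrative only).
genes = ["g1", "g2", "g3", "g4"]
v = [[2, 2, 0, 0], [4, 4, 0, 0], [0, 0, 1, 3], [0, 0, 2, 6]]
cell_types = ["Hep", "Hep", "MC38", "MC38"]
w, h = nmf(v, k=2)
for mod in range(2):
    print(mod, top_genes(h, genes, mod, 2), mean_usage(w, cell_types, mod))
```

Grouping W by spatial well instead of cell type gives the well-level module expression that can then be correlated with cell type composition.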

Next, we reasoned that because the liver and spleen were both injected with the same tumor cell line, the invading tumor might induce a shared gene expression profile that varies spatially and is driven in part by the cellular composition of the tumor microenvironment. To test this hypothesis, matching gene module pairs between the two tissues were first identified by NMF analysis (Methods). Four distinct liver modules (LMs) were found that shared at least 25% of their genes with a spleen/tumor module (SM) (Fig. 13C and Fig. 23A). Gene Ontology (GO) analysis of these modules revealed enrichment of genes related to tumor response, immune regulation and cell migration (Fig. 23B, Fig. 23C and Fig. 24B). Consistent with the enrichment analysis, many genes in these modules have been associated with tumorigenesis (full gene list in Table 3). Unlike LM14, further analysis of these matched modules revealed that a heterogeneous mix of cell populations contributed to the expression of the module genes (Fig. 23D and Methods). For example, the tumor response module LM5 and its matching modules SM2 and SM12 (Fig. 13C and Fig. 23A) consist of genes expressed mainly in MC38 tumor cells, with some expression in myeloid cells and lymphocytes (Fig. 13D, Fig. 23D and Methods). The immune regulatory modules LM13 and LM19 (matched to SM7 and SM20) consist of genes expressed mainly in conventional immune cells (e.g., myeloid cells and lymphocytes) and unconventional immune cells (e.g., Kupffer cells in the liver samples) (Fig. 13C, Fig. 13D and Fig. 23D). Expression of these overlapping modules was highest in areas densely infiltrated by cancer cells (Fig. 13E and Fig. 13F). Collectively, these results demonstrate that joint analysis of the scRNA-seq data and spatial metadata from XYZeq can identify gene modules whose spatial variability arises from differences in cellular composition within tissue samples.
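Matching liver and spleen modules by gene overlap can be sketched with set intersections. The following is a minimal sketch, assuming each module is represented by its list of top contributing genes (Table 3 uses the top 200) and that the 25% threshold is measured relative to the liver module's list; both that denominator and the toy gene lists are assumptions.

```python
def overlap_fraction(liver_genes, spleen_genes):
    """Fraction of the liver module's genes that also appear in the spleen module."""
    liver, spleen = set(liver_genes), set(spleen_genes)
    return len(liver & spleen) / len(liver)


def match_modules(liver_modules, spleen_modules, min_overlap=0.25):
    """Return sorted (liver_module, spleen_module, overlap) triples at or above the threshold."""
    matches = []
    for lm, lgenes in liver_modules.items():
        for sm, sgenes in spleen_modules.items():
            frac = overlap_fraction(lgenes, sgenes)
            if frac >= min_overlap:
                matches.append((lm, sm, frac))
    return sorted(matches)


# Toy modules (hypothetical gene lists, far shorter than 200 genes).
liver = {"LM1": ["a", "b", "c", "d"], "LM2": ["e", "f", "g", "h"]}
spleen = {"SM1": ["a", "b", "x", "y"], "SM2": ["e", "z", "w", "q"], "SM3": ["m", "n"]}
for lm, sm, frac in match_modules(liver, spleen):
    print(f"{lm} ~ {sm}: {frac:.0%} shared")
```

One liver module can match several spleen modules (as LM5 matches SM2 and SM12), which is why the function returns all pairs above the threshold rather than a single best match.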

Table 3. Overlapping genes among the top 200 contributing genes of the liver and spleen modules.

[Table 3 gene list provided as images BDA0003908335950000971, BDA0003908335950000981 and BDA0003908335950000991]

Next, the analysis focused on the matching modules LM10 and SM15/SM17, which are predominantly expressed by MSCs and enriched in genes involved in cell migration (Fig. 13C, Fig. 14A, Fig. 23D, Fig. 24A and Fig. 24B). Because MSCs are known to home to injured or inflamed sites (40), we hypothesized that LM10 might be differentially expressed in MSCs depending on their proximity to the tumor. To test this hypothesis, a tumor proximity score was first calculated for each well from the composition of, and distance to, nearby wells (Fig. 14B; see Methods and Fig. 25 for the score definition). Projecting the proximity scores onto MSCs in tSNE space showed that the transcriptional heterogeneity of this population correlated with spatial proximity to the tumor (Fig. 14C). MSC expression profiles were then analyzed with tradeSeq (41) to identify differentially expressed genes that tracked with the proximity score. In total, 177 genes (p < 0.05) from liver/tumor tissue and 66 genes (p < 0.05) from spleen/tumor tissue were identified as associated with the continuous one-dimensional proximity score and were clustered (Fig. 14D). According to the proximity of the cells to the tumor, the genes fell roughly into three groups (intratumoral, tumor-tissue border and intratissue), with statistically significant genes in spleen/tumor tissue (Benjamini-Hochberg FDR < 0.05) highlighted (Fig. 14D). Interestingly, for MSCs found in intratumoral regions of the spleen/tumor, many of the differentially expressed genes have been reported to regulate the extracellular matrix (ECM) (Fig. 14D, right panel) (42-45), suggesting that MC38 cells may induce a local gene expression program in neighboring MSCs that could contribute to malignant remodeling of the ECM.
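The Benjamini-Hochberg procedure used above to flag genes at FDR < 0.05 can be written in a few lines: sort the p-values, scale each by m/rank, and enforce monotonicity from the largest p-value down. A minimal sketch; the gene names and p-values are made up, and a library routine such as `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` returns the same adjusted values.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


genes = ["GeneA", "GeneB", "GeneC", "GeneD"]  # hypothetical gene names
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
significant = [g for g, q in zip(genes, qvals) if q < 0.05]
print(significant)  # prints ['GeneA']
```

Note that GeneB and GeneC pass p < 0.05 individually but not FDR < 0.05, which illustrates why the p < 0.05 gene counts above exceed the highlighted FDR-significant set.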

Finally, we used the scRNA-seq data from XYZeq to visualize how individual MSCs express Tshz2 and Csmd1, two genes with distinct functions whose expression varies spatially relative to the tumor in the spleen. Both genes have been characterized as tumor suppressors and are frequently silenced in cancer cells, promoting malignant growth and metastasis (36, 46, 47). However, spleen/tumor MSCs closer to the tumor were found to express lower levels of Csmd1 but higher levels of Tshz2 (Fig. 14E). Importantly, the mean differential expression of these genes was specific to splenic MSCs; MC38 tumor cells did not express them. The spatial expression pattern of each gene was consistent with the spatial trajectory analysis described above, suggesting that their heterogeneous expression in MSCs may be determined by the location of the cells relative to the tumor (Fig. 14F). Taken together, these results demonstrate that combined analysis of spatial and single-cell transcriptomic data from XYZeq can detect transcriptionally variable genes within specific cell types, such as MSCs, whose variation is driven by the cells' location within complex tissue structures.

3. Discussion

We introduce XYZeq, a novel single-cell RNA-sequencing workflow that encodes spatial meta-information at 500 μm resolution. XYZeq enables unbiased single-cell transcriptomic analysis that captures all cell types and states while placing each cell in the spatial context of complex tissues. In murine tumor models, we demonstrated that XYZeq can identify spatially variable patterns of gene expression determined by cellular composition, as well as heterogeneity within cell types determined by spatial proximity. Going forward, XYZeq offers a scalable workflow that can accommodate multiple tissue z-layers and could facilitate analysis of whole organs. Large-scale integrated profiling of multiple modalities in single cells mapped to their tissue architecture will help clarify how the tissue microenvironment influences cellular infiltration and interactions in health and disease.

References:

1. A. P. Patel et al., Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396-1401 (2014).

2. S. V. Puram et al., Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer. Cell 171, 1611-1624.e1624 (2017).

3. C. Ziegenhain et al., Comparative Analysis of Single-Cell RNA Sequencing Methods. Mol Cell 65, 631-643.e634 (2017).

4. I. C. Macaulay, C. P. Ponting, T. Voet, Single-Cell Multiomics: Multiple Measurements from Single Cells. Trends Genet 33, 155-168 (2017).

5. M. L. Suva, I. Tirosh, Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges. Mol Cell 75, 7-12 (2019).

6. V. Svensson, R. Vento-Tormo, S. A. Teichmann, Exponential scaling of single-cell RNA-seq in the past decade. Nat Protoc 13, 599-604 (2018).

7. K. H. Chen, A. N. Boettiger, J. R. Moffitt, S. Wang, X. Zhuang, RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).

8. A. Raj, P. van den Bogaard, S. A. Rifkin, A. van Oudenaarden, S. Tyagi, Imaging individual mRNA molecules using multiple singly labeled probes. Nat Methods 5, 877-879 (2008).

9. C. L. Eng et al., Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+. Nature 568, 235-239 (2019).

10. S. Shah, E. Lubeck, W. Zhou, L. Cai, seqFISH Accurately Detects Transcripts in Single Cells and Reveals Robust Spatial Organization in the Hippocampus. Neuron 94, 752-758.e751 (2017).

11. P. L. Stahl et al., Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78-82 (2016).

12. S. G. Rodriques et al., Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463-1467 (2019).

13. S. Vickovic et al., High-definition spatial transcriptomics for in situ tissue profiling. Nat Methods 16, 987-990 (2019).

14. R. R. Stickels et al., Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat Biotechnol (2020).

15. K. Achim et al., High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat Biotechnol 33, 503-509 (2015).

16. R. Satija, J. A. Farrell, D. Gennert, A. F. Schier, A. Regev, Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 33, 495-502 (2015).

17. J. Cao et al., Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661-667 (2017).

18. A. B. Rosenberg et al., Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176-182 (2018).

19. M. Attar et al., A practical solution for preserving single cells for RNA sequencing. Sci Rep 8, 2151 (2018).

20. S. Yang et al., Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Genome Biol 21, 57 (2020).

21. J. C. Lee et al., Regulatory T cell control of systemic immunity and immunotherapy response in liver metastasis. Sci Immunol 5 (2020).

22. M. Yadav et al., Predicting immunogenic tumour mutations by combining mass spectrometry and exome sequencing. Nature 515, 572-576 (2014).

23. K. N. Kodumudi et al., Immune Checkpoint Blockade to Improve Tumor Infiltrating Lymphocytes for Adoptive Cell Therapy. PLoS One 11, e0153053 (2016).

24. H. Tang et al., PD-L1 on host cells is essential for PD-L1 blockade-mediated tumor regression. J Clin Invest 128, 580-588 (2018).

25. M. Efremova et al., Targeting immune checkpoints potentiates immunoediting and changes the dynamics of tumor evolution. Nat Commun 9, 32 (2018).

26. Tabula Muris Consortium et al., Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367-372 (2018).

27. J. Ding et al., Systematic comparative analysis of single cell RNA-sequencing methods. bioRxiv, 632216 (2019).

28. M. J. C. Jordao et al., Single-cell profiling identifies myeloid cell subsets with distinct fates during neuroinflammation. Science 363 (2019).

29. S. V. Kim et al., Modulation of cell adhesion and motility in the immune system by Myo1f. Science 314, 136-139 (2006).

30. X. Yu et al., The Cytokine TGF-beta Promotes the Development and Homeostasis of Alveolar Macrophages. Immunity 47, 903-912.e904 (2017).

31. H. Helgeland et al., Transcriptome profiling of human thymic CD4+ and CD8+ T cells compared to primary peripheral T cells. BMC Genomics 21, 350 (2020).

32. O. J. Harrison et al., Epithelial-derived IL-18 regulates Th17 cell differentiation and Foxp3(+) Treg cell function in the intestine. Mucosal Immunol 8, 1226-1236 (2015).

33. N. Isakov, A. Altman, PKC-theta-mediated signal delivery from the TCR/CD28 surface receptors. Front Immunol 3, 273 (2012).

34. L. E. Oikari et al., Cell surface heparan sulfate proteoglycans as novel markers of human neural stem cell fate determination. Stem Cell Res 16, 92-104 (2016).

35. D. Fritz, B. Stefanovic, RNA-binding protein RBMS3 is expressed in activated hepatic stellate cells and liver fibrosis and increases expression of transcription factor Prx1. J Mol Biol 371, 585-595 (2007).

36. M. Riku et al., Down-regulation of the zinc-finger homeobox protein TSHZ2 releases GLI1 from the nuclear repressor complex to restore its transcriptional activity during mammary tumorigenesis. Oncotarget 7, 5690-5701 (2016).

37. H. Kalyanaraman, N. Schall, R. B. Pilz, Nitric oxide and cyclic GMP functions in bone. Nitric Oxide 76, 62-70 (2018).

38. N. Schall et al., Protein kinase G1 regulates bone regeneration and rescues diabetic fracture healing. JCI Insight 5 (2020).

39. J. Baboo et al., The Impact of Varying Cooling and Thawing Rates on the Quality of Cryopreserved Human Peripheral Blood T Cells. Sci Rep 9, 3417 (2019).

40. Q. Wang, T. Li, W. Wu, G. Ding, Interplay between mesenchymal stem cell and tumor and potential application. Hum Cell 33, 444-458 (2020).

41. K. Van den Berge et al., Trajectory-based differential expression analysis for single-cell sequencing data. Nat Commun 11, 1201 (2020).

42. J. Soikkeli et al., Metastatic outgrowth encompasses COL-I, FN1, and POSTN up-regulation and assembly to fibrillar networks regulating cell adhesion, migration, and growth. Am J Pathol 177, 387-403 (2010).

43. Y. Wang, H. Xu, B. Zhu, Z. Qiu, Z. Lin, Systematic identification of the key candidate genes in breast cancer stroma. Cell Mol Biol Lett 23, 44 (2018).

44. J. Li et al., Stromal microenvironment promoted infiltration in esophageal adenocarcinoma and squamous cell carcinoma: a multi-cohort gene-based analysis. Sci Rep 10, 18589 (2020).

45. Y. Gao, S. P. Yin, X. S. Xie, D. D. Xu, W. D. Du, The relationship between stromal cell derived SPARC in human gastric cancer tissue and its clinicopathologic significance. Oncotarget 8, 86240-86252 (2017).

46. A. Escudero-Esparza et al., Complement inhibitor CSMD1 acts as tumor suppressor in human breast cancer. Oncotarget 7, 76920-76933 (2016).

47. S. Ropero et al., Epigenetic loss of the familial tumor-suppressor gene exostosin-1 (EXT1) disrupts heparan sulfate synthesis in cancer cells. Hum Mol Genet 13, 2753-2765 (2004).

48. C. Hafemeister, R. Satija, Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20, 296 (2019).

49. R. Gaujoux, C. Seoighe, A flexible R package for nonnegative matrix factorization. BMC Bioinformatics 11, 367 (2010).

50. E. Eden, R. Navon, I. Steinfeld, D. Lipson, Z. Yakhini, GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10, 48 (2009).

51. P. Carmona-Saez, R. D. Pascual-Marqui, F. Tirado, J. M. Carazo, A. Pascual-Montano, Biclustering of gene expression data by Non-smooth Non-negative Matrix Factorization. BMC Bioinformatics 7, 78 (2006).

52. C. Giesen et al., Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat Methods 11, 417-422 (2014).

53. Y. Goltsev et al., Deep Profiling of Mouse Splenic Architecture with CODEX Multiplexed Imaging. Cell 174, 968-981.e915 (2018).

Sequence Listing

<110> Regents of the University of California

<120> Spatially resolved single-cell RNA sequencing method

<130> 37944.0015P1

<150> US 62/979,235

<151> 2020-02-20

<160> 44

<170> PatentIn version 3.5

<210> 1

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 1

gtctcgtggg ctcggagatg tgtataagag acagcagggt gtggagcagc ctgccaa 57

<210> 2

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 2

gtctcgtggg ctcggagatg tgtataagag acagatctat tggtaccgac aggttcc 57

<210> 3

<211> 55

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 3

gtctcgtggg ctcggagatg tgtataagag acagggcgag caggtggagc agcgc 55

<210> 4

<211> 55

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 4

gtctcgtggg ctcggagatg tgtataagag acagtctgct ctgagatgca atttt 55

<210> 5<210> 5

<211> 58<211> 58

<212> DNA<212>DNA

<213> 人工序列(Artificial Sequence)<213> Artificial Sequence

<220><220>

<223> 引物<223> Primer

<400> 5<400> 5

gtctcgtggg ctcggagatg tgtataagag acagctactt cccttggtat aagcaaga 58gtctcgtggg ctcggagatg tgtataagag acagctactt cccttggtat aagcaaga 58

<210> 6<210> 6

<211> 56<211> 56

<212> DNA<212>DNA

<213> 人工序列(Artificial Sequence)<213> Artificial Sequence

<220><220>

<223> 引物<223> Primer

<400> 6<400> 6

gtctcgtggg ctcggagatg tgtataagag acagacccaa ctctkttctg gtatgt 56gtctcgtggg ctcggagatg tgtataagag acagacccaa ctctkttctg gtatgt 56

<210> 7<210> 7

<211> 57<211> 57

<212> DNA<212>DNA

<213> 人工序列(Artificial Sequence)<213> Artificial Sequence

<220><220>

<223> 引物<223> Primer

<400> 7<400> 7

gtctcgtggg ctcggagatg tgtataagag acagaaggta cagcagagcc cagaatc 57gtctcgtggg ctcggagatg tgtataagag acagaaggta cagcagagcc cagaatc 57

<210> 8<210> 8

<211> 56<211> 56

<212> DNA<212>DNA

<213> 人工序列(Artificial Sequence)<213> Artificial Sequence

<220><220>

<223> 引物<223> Primer

<400> 8<400> 8

gtctcgtggg ctcggagatg tgtataagag acagcctgag catccacgag ggtgaa 56gtctcgtggg ctcggagatg tgtataagag acagcctgag catccacgag ggtgaa 56

<210> 9<210> 9

<211> 55<211> 55

<212> DNA<212>DNA

<213> 人工序列(Artificial Sequence)<213> Artificial Sequence

<220><220>

<223> 引物<223> Primer

<400> 9<400> 9

gtctcgtggg ctcggagatg tgtataagag acagagctga gatgcaasta ttcct 55gtctcgtggg ctcggagatg tgtataagag acagagctga gatgcaasta ttcct 55
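SEQ ID NOs: 6 and 9 contain IUPAC ambiguity codes, "k" (G or T) in SEQ ID NO: 6 and "s" (C or G) in SEQ ID NO: 9, so each entry describes a small pool of oligonucleotides rather than a single one. The following Python sketch assumes the standard IUPAC nucleotide code table; `expand_degenerate` is a hypothetical helper written for illustration, not part of the disclosure. It enumerates the concrete sequences each degenerate primer represents.

```python
from itertools import product

# Standard IUPAC nucleotide codes (lowercase, as used in the listing).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}


def expand_degenerate(seq):
    """Return every concrete sequence encoded by an IUPAC-degenerate oligo."""
    seq = seq.replace(" ", "").lower()  # drop the 10-nt grouping spaces
    choices = [IUPAC[base] for base in seq]
    return ["".join(combo) for combo in product(*choices)]


# <400> sequences of SEQ ID NOs: 6 and 9, copied from the listing.
SEQ_6 = "gtctcgtggg ctcggagatg tgtataagag acagacccaa ctctkttctg gtatgt"
SEQ_9 = "gtctcgtggg ctcggagatg tgtataagag acagagctga gatgcaasta ttcct"

for name, seq in (("SEQ ID NO: 6", SEQ_6), ("SEQ ID NO: 9", SEQ_9)):
    variants = expand_degenerate(seq)
    print(f"{name}: {len(variants)} variants")
    for variant in variants:
        print("  " + variant)
```

Each of these two primers has a single two-fold degenerate position, so each expands to two sequences; in general, a primer with m two-fold positions expands to 2^m sequences.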

<210> 10

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 10

gtctcgtggg ctcggagatg tgtataagag acagcatgga gagaaggtcg agcaaca 57

<210> 11

<211> 56

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 11

gtctcgtggg ctcggagatg tgtataagag acagaagacc caagtggagc agagtc 56

<210> 12

<211> 56

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 12

gtctcgtggg ctcggagatg tgtataagag acaggtgacc cagacagaag gcctgg 56

<210> 13

<211> 54

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 13

gtctcgtggg ctcggagatg tgtataagag acaggtcctt ggttctgcag gagg 54

<210> 14

<211> 54

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 14

gtctcgtggg ctcggagatg tgtataagag acagcagcag caggtgagac aaag 54

<210> 15

<211> 58

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 15

gtctcgtggg ctcggagatg tgtataagag acagctggac tgttcatatg agacaagt 58

<210> 16

<211> 56

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 16

gtctcgtggg ctcggagatg tgtataagag acagagaagg taacacagac tcagac 56

<210> 17

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 17

gtctcgtggg ctcggagatg tgtataagag acagcagtcc gtggaccagc ctgatgc 57

<210> 18

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 18

gtctcgtggg ctcggagatg tgtataagag acaggagcag agtcctcggt ttctgag 57

<210> 19

<211> 58

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 19

gtctcgtggg ctcggagatg tgtataagag acagccagca agttaaacaa agctctcc 58

<210> 20

<211> 55

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 20

gtctcgtggg ctcggagatg tgtataagag acagcctccg tttctcggct cctgg 55

<210> 21

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 21

gtctcgtggg ctcggagatg tgtataagag acaggtgact ttgctggagc aaaaccc 57

<210> 22

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 22

gtctcgtggg ctcggagatg tgtataagag acaggacccg aaaattatcc agaaacc 57

<210> 23

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 23

gtctcgtggg ctcggagatg tgtataagag acagggaccc aaagtcttac agatccc 57

<210> 24

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 24

gtctcgtggg ctcggagatg tgtataagag acaggagacg gctgttttcc agactcc 57

<210> 25

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 25

gtctcgtggg ctcggagatg tgtataagag acagaacact aaaattactc agtcacc 57

<210> 26

<211> 34

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 26

gtctcgtggg ctcggagatg tgtataagag acag 34
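SEQ ID NO: 26 is the 34-nt sequence that also forms the 5' end of every other primer in SEQ ID NOs: 1 to 41; only the 3' portion differs from primer to primer. Below is a minimal Python consistency check for such entries. It assumes a few entries are transcribed by hand; the `PRIMERS` subset and the `variable_3prime_part` helper are illustrative choices, not part of the listing format. The check confirms that the declared <211> length matches the <400> sequence and that the sequence begins with SEQ ID NO: 26, then returns the 3' portion.

```python
# SEQ ID NO: 26, the shared 5' sequence of SEQ ID NOs: 1-25 and 27-41.
COMMON_5PRIME = "gtctcgtggg ctcggagatg tgtataagag acag".replace(" ", "")

# A few (declared <211> length, <400> sequence) pairs copied from the listing.
PRIMERS = {
    1: (57, "gtctcgtggg ctcggagatg tgtataagag acagcagggt gtggagcagc ctgccaa"),
    13: (54, "gtctcgtggg ctcggagatg tgtataagag acaggtcctt ggttctgcag gagg"),
    24: (57, "gtctcgtggg ctcggagatg tgtataagag acaggagacg gctgttttcc agactcc"),
    41: (57, "gtctcgtggg ctcggagatg tgtataagag acaggctcag actatccatc aatggcc"),
}


def variable_3prime_part(seq_id, declared_len, seq):
    """Validate one entry and return the part that follows SEQ ID NO: 26."""
    bases = seq.replace(" ", "")
    if len(bases) != declared_len:
        raise ValueError(
            f"SEQ ID NO: {seq_id}: <211> is {declared_len} "
            f"but the sequence has {len(bases)} nt"
        )
    if not bases.startswith(COMMON_5PRIME):
        raise ValueError(f"SEQ ID NO: {seq_id} does not start with SEQ ID NO: 26")
    return bases[len(COMMON_5PRIME):]


for seq_id, (declared_len, seq) in PRIMERS.items():
    part = variable_3prime_part(seq_id, declared_len, seq)
    print(f"SEQ ID NO: {seq_id}: {len(part)}-nt 3' portion {part}")
```

Because the check compares the declared length against the transcribed bases, it will flag an inserted or dropped nucleotide as well as a truncated shared 5' segment.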

<210> 27

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 27

gtctcgtggg ctcggagatg tgtataagag acaggaggct gcagtcaccc aaagccc 57

<210> 28

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 28

gtctcgtggg ctcggagatg tgtataagag acaggaggct gcagtcaccc aaagtcc 57

<210> 29

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 29

gtctcgtggg ctcggagatg tgtataagag acaggaagct ggagtcaccc agtctcc 57

<210> 30

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 30

gtctcgtggg ctcggagatg tgtataagag acaggatgct ggagttaccc agacacc 57

<210> 31

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 31

gtctcgtggg ctcggagatg tgtataagag acagaatgct ggtgtcatcc aaacacc 57

<210> 32

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 32

gtctcgtggg ctcggagatg tgtataagag acaggatact acggttaagc agaaccc 57

<210> 33

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 33

gtctcgtggg ctcggagatg tgtataagag acagggtggc atcattactc agacacc 57

<210> 34

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 34

gtctcgtggg ctcggagatg tgtataagag acagggagca ctcgtctatc aatatcc 57

<210> 35

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 35

gtctcgtggg ctcggagatg tgtataagag acaggactct ggggttgtcc agaatcc 57

<210> 36

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 36

gtctcgtggg ctcggagatg tgtataagag acaggatgct gcagttacac agaagcc 57

<210> 37

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 37

gtctcgtggg ctcggagatg tgtataagag acaggttgct ggagtaaccc agactcc 57

<210> 38

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 38

gtctcgtggg ctcggagatg tgtataagag acagaattca aaagtcattc agactcc 57

<210> 39

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 39

gtctcgtggg ctcggagatg tgtataagag acaggacatg aaagtaaccc agatgcc 57

<210> 40

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 40

gtctcgtggg ctcggagatg tgtataagag acagagtgtc ctcctctacc aaaagcc 57

<210> 41

<211> 57

<212> DNA

<213> Artificial Sequence

<220>

<223> Primer

<400> 41

gtctcgtggg ctcggagatg tgtataagag acaggctcag actatccatc aatggcc 57

<210> 42

<211> 450

<212> PRT

<213> Unknown

<220>

<223> E54K/L372P Tn5 transposase

<400> 42

Met Ile Thr Ser Ala Leu His Arg Ala Ala Asp Trp Ala Lys Ser Val

1 5 10 15

Phe Ser Ser Ala Ala Leu Gly Asp Pro Arg Arg Thr Ala Arg Leu Val

20 25 30

Asn Val Ala Ala Gln Leu Ala Lys Tyr Ser Gly Lys Ser Ile Thr Ile

35 40 45

Ser Ser Glu Gly Ser Lys Ala Met Gln Glu Gly Ala Tyr Arg Phe Ile

50 55 60

Arg Asn Pro Asn Val Ser Ala Glu Ala Ile Arg Lys Ala Gly Ala Met

65 70 75 80

Gln Thr Val Lys Leu Ala Gln Glu Phe Pro Glu Leu Leu Ala Ile Glu

85 90 95

Asp Thr Thr Ser Leu Ser Tyr Arg His Gln Val Ala Glu Glu Leu Gly

100 105 110

Lys Leu Gly Ser Ile Gln Asp Lys Ser Arg Gly Trp Trp Val His Ser

115 120 125

Val Leu Leu Leu Glu Ala Thr Thr Phe Arg Thr Val Gly Leu Leu His

130 135 140

Gln Glu Trp Trp Met Arg Pro Asp Asp Pro Ala Asp Ala Asp Glu Lys

145 150 155 160

Glu Ser Gly Lys Trp Leu Ala Ala Ala Ala Thr Ser Arg Leu Arg Met

165 170 175

Gly Ser Met Met Ser Asn Val Ile Ala Val Cys Asp Arg Glu Ala Asp

180 185 190

Ile His Ala Tyr Leu Gln Asp Lys Leu Ala His Asn Glu Arg Phe Val

195 200 205

Val Arg Ser Lys His Pro Arg Lys Asp Val Glu Ser Gly Leu Tyr Leu

210 215 220

Tyr Asp His Leu Lys Asn Gln Pro Glu Leu Gly Gly Tyr Gln Ile Ser

225 230 235 240

Ile Pro Gln Lys Gly Val Val Asp Lys Arg Gly Lys Arg Lys Asn Arg

245 250 255

Pro Ala Arg Lys Ala Ser Leu Ser Leu Arg Ser Gly Arg Ile Thr Leu

260 265 270

Lys Gln Gly Asn Ile Thr Leu Asn Ala Val Leu Ala Glu Glu Ile Asn

275 280 285

Pro Pro Lys Gly Glu Thr Pro Leu Lys Trp Leu Leu Leu Thr Ser Glu

290 295 300

Pro Val Glu Ser Leu Ala Gln Ala Leu Arg Val Ile Asp Ile Tyr Thr

305 310 315 320

His Arg Trp Arg Ile Glu Glu Phe His Lys Ala Trp Lys Thr Gly Ala

325 330 335

Gly Ala Glu Arg Gln Arg Met Glu Glu Pro Asp Asn Leu Glu Arg Met

340 345 350

Val Ser Ile Leu Ser Phe Val Ala Val Arg Leu Leu Gln Leu Arg Glu

355 360 365

Ser Phe Thr Pro Pro Gln Ala Leu Arg Ala Gln Gly Leu Leu Lys Glu

370 375 380

Ala Glu His Val Glu Ser Gln Ser Ala Glu Thr Val Leu Thr Pro Asp

385 390 395 400

Glu Cys Gln Leu Leu Gly Tyr Leu Asp Lys Gly Lys Arg Lys Arg Lys

405 410 415

Glu Lys Ala Gly Ser Leu Gln Trp Ala Tyr Met Ala Ile Ala Arg Leu

420 425 430

Gly Gly Phe Met Asp Ser Lys Arg Thr Gly Ile Ala Ser Trp Gly Ala

435 440 445

Leu Trp

450
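The <223> label "E54K/L372P" uses the usual substitution notation: wild-type residue, position, substituted residue. A short Python check follows. It assumes only the two rows of SEQ ID NO: 42 that contain those positions are copied in; `ROWS`, `residue_at`, and `parse_substitution` are illustrative names. It confirms that position 54 carries Lys (K) and position 372 carries Pro (P) in the listed sequence.

```python
# Three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Two 16-residue rows of SEQ ID NO: 42, keyed by the position of their first residue.
ROWS = {
    49: "Ser Ser Glu Gly Ser Lys Ala Met Gln Glu Gly Ala Tyr Arg Phe Ile",
    369: "Ser Phe Thr Pro Pro Gln Ala Leu Arg Ala Gln Gly Leu Leu Lys Glu",
}


def residue_at(position):
    """Return the three-letter residue at a 1-based position covered by ROWS."""
    for start, row in ROWS.items():
        residues = row.split()
        if start <= position < start + len(residues):
            return residues[position - start]
    raise KeyError(f"position {position} is not in the copied rows")


def parse_substitution(label):
    """Split a label such as 'E54K' into ('E', 54, 'K')."""
    return label[0], int(label[1:-1]), label[-1]


for label in "E54K/L372P".split("/"):
    wild_type, position, mutant = parse_substitution(label)
    observed = THREE_TO_ONE[residue_at(position)]
    print(f"{label}: position {position} is {residue_at(position)} ({observed}); "
          f"expected mutant {mutant}, wild type {wild_type}: {observed == mutant}")
```

The wild-type residues (Glu at 54, Leu at 372) cannot be checked from SEQ ID NO: 42 alone, because the listing gives only the substituted sequence.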

<210> 43

<211> 66

<212> DNA

<213> Artificial Sequence

<220>

<223> Oligo(dT) primer

<220>

<221> misc_feature

<222> (23)..(48)

<223> n is a, c, g, or t

<220>

<221> misc_feature

<222> (33)..(48)

<223> Unique spatial barcode

<400> 43

ctacacgacg ctcttccgat ctnnnnnnnn nnnnnnnnnn nnnnnnnntt tttttttttt 60

tttttt 66
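The misc_feature annotations of SEQ ID NO: 43 imply a simple positional layout. Positions 1 to 22 are fixed nucleotides and positions 23 to 48 are degenerate. Within that degenerate run, the <223> note on positions 33 to 48 marks the unique spatial barcode. An 18-nt oligo(dT) tail follows at positions 49 to 66. The Python sketch below slices the listed sequence using the listing's 1-based, inclusive <222> coordinates. The segment names are descriptive labels chosen here. Positions 23 to 32 are reported only as degenerate, because the listing assigns them no further role.

```python
# <400> sequence of SEQ ID NO: 43 with the 10-nt grouping spaces removed.
SEQ_43 = ("ctacacgacg ctcttccgat ctnnnnnnnn nnnnnnnnnn nnnnnnnntt tttttttttt"
          " tttttt").replace(" ", "")


def feature(seq, start, end):
    """Return positions start..end (1-based, inclusive), as in <222> (start)..(end)."""
    return seq[start - 1:end]


LAYOUT = [
    ("fixed 5' sequence", 1, 22),
    ("degenerate, no further annotation", 23, 32),
    ("spatial barcode", 33, 48),
    ("oligo(dT) tail", 49, 66),
]

for name, start, end in LAYOUT:
    part = feature(SEQ_43, start, end)
    print(f"{start:>2}..{end:<2} {len(part):>2} nt  {name}: {part}")
```

Converting the 1-based, inclusive coordinates to a Python slice only requires subtracting 1 from the start; the end index can be used unchanged.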

<210> 44

<211> 70

<212> DNA

<213> Artificial Sequence

<220>

<223> Index i5 primer

<220>

<221> misc_feature

<222> (30)..(37)

<223> n is a, c, g, or t

<400> 44

aatgatacgg cgaccaccga gatctacacn nnnnnnnaca ctctttccct acacgacgct 60

cttccgatct 70
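SEQ ID NO: 44 can be read as three segments:

- Positions 1 to 29 match the Illumina P5 adapter sequence.
- Positions 30 to 37 are an 8-nt index (the n positions).
- Positions 38 to 70 end in 22 nucleotides that are identical to positions 1 to 22 of SEQ ID NO: 43.

The Python sketch below fills in a concrete index. `build_i5_primer` and the index `acgtacgt` are hypothetical, used only for illustration; actual index sequences depend on the sequencing kit and multiplexing design.

```python
# <400> sequence of SEQ ID NO: 44 with the grouping spaces removed.
SEQ_44 = ("aatgatacgg cgaccaccga gatctacacn nnnnnnnaca ctctttccct acacgacgct"
          " cttccgatct").replace(" ", "")
# Positions 1..22 of SEQ ID NO: 43.
SEQ_43_5PRIME = "ctacacgacgctcttccgatct"

INDEX_START, INDEX_END = 30, 37  # <222> (30)..(37), 1-based inclusive


def build_i5_primer(index):
    """Substitute a concrete 8-nt index for the n positions 30..37 of SEQ ID NO: 44."""
    index = index.lower()
    if len(index) != INDEX_END - INDEX_START + 1 or set(index) - set("acgt"):
        raise ValueError("index must be 8 nt drawn from a, c, g, t")
    return SEQ_44[:INDEX_START - 1] + index + SEQ_44[INDEX_END:]


print(SEQ_44.endswith(SEQ_43_5PRIME))
print(build_i5_primer("acgtacgt"))  # hypothetical index
```

The shared 22-nt stretch is consistent with the i5 primer annealing to the complement of the fixed 5' sequence of SEQ ID NO: 43 in amplified products, though the listing itself does not state how the two primers are used together.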