Illinois Data Bank Dataset Search Results
published:
2026-03-20
Wu, Yulun; Kudeki, Erhan
(2026)
Arecibo ISR CLP/ULP/LULP ion-line spectra obtained from a USRP receiver with 500 kHz bandwidth and a 120-1400 km altitude range, for experiments on September 23-26, 2016. These spectra were used for joint inversions of coded and uncoded long-pulse F-region ISR returns measured at Arecibo.
keywords:
Remote sensing; Incoherent scatter radar; Arecibo Observatory
published:
2026-03-26
Leakey, Andrew; Fischer, James
(2026)
Includes three different types of stacks, in two folders:
"REC_RAW_STACKS.zip" contains:
(1) Raw gray-scale reconstructed microCT x-ray scans, in the form of individual stacks per sample.
"ML_STACKS.zip" contains:
(2) Stacks labeled using a machine-learning Mask R-CNN pipeline that identifies epidermis, mesophyll, airspace, vascular bundle, and background.
(3) Stacks that have stomata locations labeled using a small point.
The three types of stacks were used to calculate a variety of anatomical and physiological traits using the ImageJ macros provided on GitHub: https://github.com/leakey-lab/microCT-Macros-Fischer. Samples are young leaves from wild-type (WT) and transgenic sorghum plants.
CSV "microCT_REC2_META.csv" contains metadata, including sample/sub-sample labels.
keywords:
Stomata; microCT; Imaging; Sorghum; 3D; Gas exchange; Segmentation
published:
2026-03-24
Kamarei, Farhad; Sozio, Fabio; Lopez-Pamies, Oscar
(2026)
This dataset accompanies the research paper "The single edge notch fracture test for viscoelastic elastomers" by Kamarei, Sozio, and Lopez-Pamies, published in the Journal of Theoretical, Computational and Applied Mechanics (2026). Making use of the Griffith criticality condition introduced by Shrimali and Lopez-Pamies (Extreme Mechanics Letters 58: 101944, 2023), the paper presents a comprehensive analysis of the single edge notch fracture test for viscoelastic elastomers — combining a parametric study with direct comparisons against experiments — to reveal how non-Gaussian elasticity, nonlinear viscosity, and intrinsic fracture energy interact to govern fracture nucleation from a pre-existing crack. The dataset contains figure data, numerical results, and supporting materials for reproducing the findings of the paper.
keywords:
Rubber; Elastomers; Adhesives; Cavitation; Fracture
published:
2025-10-01
Crawford, Reed; Wolff, Patrick; Pierce, Ellen; Braun de Torrez, Elizabeth; Pourshoushtari, Roxanne; O'Keefe, Joy
(2025)
This dataset contains the raw Florida bonneted bat echolocation calls recorded in southern Florida, USA from the years 2021 and 2022. This dataset also includes our artificial roost microclimate data (2021 only) and observations of bats recorded in our artificial roosts (2021 and 2022). Lastly, we include the R script required to analyze the Florida bonneted bat echolocation calls and the R script to produce the supplemental table and supplemental figure for our microclimate data.
keywords:
bats; roosts; acoustics
published:
2026-02-11
Sponzilli, Ryan; Looney, Leslie
(2026)
Data for the publication "Protostellar Outflows Shed Light on the Dominant Close Companion Star Formation Pathways" (Sponzilli et al.). Contains the FITS files, data files, and Python scripts. The entire analysis is containerized with Docker; the `Dockerfile` in the root folder can be used to build the image.
<b>Note:</b> The __MACOSX folder and files whose names begin with a dot can be safely ignored or removed.
keywords:
Protobinaries; ALMA; FITS; 12CO imaging of outflows in Perseus and Orion
published:
2024-02-16
Mohasel Arjomandi, Hossein; Korobskiy, Dmitriy; Chacko, George
(2024)
This dataset contains five files. (i) open_citations_jan2024_pub_ids.csv.gz, open_citations_jan2024_iid_el.csv.gz, open_citations_jan2024_el.csv.gz, and open_citation_jan2024_pubs.csv.gz represent a conversion of OpenCitations to an edge list using integer IDs that we assigned. The integer IDs can be mapped to OMIDs, PMIDs, and DOIs using the open_citation_jan2024_pubs.csv.gz and open_citations_jan2024_pub_ids.csv.gz files. The network consists of 121,052,490 nodes and 1,962,840,983 edges. Code for generating these data can be found at https://github.com/chackoge/ERNIE_Plus/tree/master/OpenCitations.
(ii) The fifth file, baseline2024.csv.gz, provides information about the metadata of PubMed papers. A 2024 version of PubMed was downloaded using Entrez and parsed into a table restricted to records that contain a PMID and a DOI and have a title and an abstract. In each indicator column, a value of 1 indicates that the information exists in the metadata and 0 indicates that it does not. Code for generating these data: https://github.com/illinois-or-research-analytics/pubmed_etl. If you use these data or code in your work, please cite https://doi.org/10.13012/B2IDB-5216575_V1.
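A minimal sketch of translating the integer-ID edge list back to DOIs with Python's standard library. It assumes the publication table has a header with columns named `integer_id` and `doi`, and that each edge-list row is a (citing, cited) pair of integers; both the column names and the edge direction are hypothetical, so check the real headers and column order before use.

```python
import csv
import gzip
import io
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical column names: check the real header of open_citation_jan2024_pubs.csv.gz.
ID_COL = "integer_id"
DOI_COL = "doi"


def open_csv_gz(path: str):
    """Open a gzip-compressed CSV file in text mode for use with the csv module."""
    return gzip.open(path, mode="rt", newline="")


def load_id_to_doi(rows: Iterable[dict]) -> Dict[int, str]:
    """Build an integer-ID -> DOI lookup, skipping publications without a DOI."""
    lookup: Dict[int, str] = {}
    for row in rows:
        doi = (row.get(DOI_COL) or "").strip()
        if doi:
            lookup[int(row[ID_COL])] = doi
    return lookup


def edges_as_dois(
    edge_rows: Iterable[Tuple[str, str]], lookup: Dict[int, str]
) -> Iterator[Tuple[str, str]]:
    """Translate (citing, cited) integer-ID pairs (assumed order) into DOI pairs.

    Edges with an endpoint that has no DOI are dropped.
    """
    for citing, cited in edge_rows:
        a = lookup.get(int(citing))
        b = lookup.get(int(cited))
        if a is not None and b is not None:
            yield a, b


# Small in-memory demonstration; with the real files, pass
# csv.DictReader(open_csv_gz(...)) and csv.reader(open_csv_gz(...)) instead.
demo_pubs = csv.DictReader(io.StringIO("integer_id,doi\n0,10.1000/a\n1,10.1000/b\n2,\n"))
demo_lookup = load_id_to_doi(demo_pubs)
demo_edges = list(edges_as_dois([("0", "1"), ("1", "2")], demo_lookup))
```

If an edge-list file has a header row, skip it (for example with `next(reader)`) before iterating. Streaming the edges through a generator avoids holding all 1.96 billion edges in memory at once, although the ID lookup itself still has to fit.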
keywords:
PubMed
published:
2023-03-16
Park, Minhyuk; Tabatabaee, Yasamin; Warnow, Tandy; Chacko, George
(2023)
Curated networks and clustering output from the manuscript: Well-Connected Communities in Real-World Networks https://arxiv.org/abs/2303.02813
keywords:
Community detection; clustering; open citations; scientometrics; bibliometrics
published:
2024-06-04
Park, Minhyuk; Tabatabaee, Yasamin; Warnow, Tandy; Chacko, George
(2024)
This dataset contains files and relevant metadata for real-world and synthetic LFR networks used in the manuscript "Well-Connectedness and Community Detection" (Park et al., 2024), presently under review at PLOS Complex Systems. The manuscript is an extended version of Park, M. et al. (2024). Identifying Well-Connected Communities in Real-World and Synthetic Networks. In Complex Networks & Their Applications XII. COMPLEX NETWORKS 2023. Studies in Computational Intelligence, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-031-53499-7_1. The "Overview of Real-World Networks" image provides high-level information about the seven real-world networks.
TSVs of the seven real-world networks are provided as [network-name]_cleaned, indicating that duplicate edges and self-loops were removed; column 1 is the source and column 2 is the target.
LFR datasets are contained within the zipped file.
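The _cleaned TSVs already have duplicate edges and self-loops removed; the sketch below shows one way to load such a file and to check (or reproduce) that property. Whether (u, v) and (v, u) count as the same edge is not stated, so the `undirected` flag and the function names are assumptions for illustration.

```python
import csv
import io
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]


def read_tsv_edges(text: str) -> List[Edge]:
    """Read a two-column TSV edge list (column 1 = source, column 2 = target)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def clean_edges(edges: Iterable[Edge], undirected: bool = True) -> List[Edge]:
    """Drop self-loops and duplicate edges, keeping the first occurrence of each edge.

    With undirected=True (an assumption), (u, v) and (v, u) count as the same edge.
    """
    seen = set()
    cleaned: List[Edge] = []
    for u, v in edges:
        if u == v:
            continue  # self-loop
        key = (min(u, v), max(u, v)) if undirected else (u, v)
        if key in seen:
            continue  # duplicate
        seen.add(key)
        cleaned.append((u, v))
    return cleaned


raw = read_tsv_edges("1\t2\n2\t1\n3\t3\n1\t2\n2\t4\n")
```

On a file that is already cleaned, `clean_edges` should return the input unchanged, which makes it a cheap sanity check after loading.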
# LFR datasets for the Connectivity Modifier (CM) paper
### File organization
Each directory `[network-name]_[resolution-value]_lfr` includes the following files:
* `network.dat`: LFR network edge-list
* `community.dat`: LFR ground-truth communities
* `time_seed.dat`: time seed used in the LFR software
* `statistics.dat`: statistics generated by the LFR software
* `cmd.stat`: command used to run the LFR software as well as time and memory usage information
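The LFR outputs `network.dat` and `community.dat` are plain text. A minimal loader might look like the following, assuming the usual LFR layout of whitespace-separated `u v` lines in `network.dat` and `node community` lines in `community.dat`; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def read_lfr_communities(text: str) -> Dict[str, Set[str]]:
    """Group nodes by ground-truth community from community.dat content.

    Assumes whitespace-separated lines of the form 'node community [community ...]';
    any extra labels (overlapping communities) are assigned as well.
    """
    communities: Dict[str, Set[str]] = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        node, labels = fields[0], fields[1:]
        for label in labels:
            communities[label].add(node)
    return dict(communities)


def read_lfr_edges(text: str) -> List[Tuple[str, str]]:
    """Parse network.dat content into (u, v) pairs, assuming whitespace-separated columns."""
    pairs: List[Tuple[str, str]] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs
```

Node IDs are kept as strings so they can be matched directly against the edge list without assuming a numbering base.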
published:
2026-02-01
Xu, Xiaotian; Yao, Yu; Liu, Yicen; Curtis, Jeffrey; West, West; Riemer, Nicole
(2026)
This dataset contains simulation results from PartMC-MOSAIC and WRF-PartMC that were used in the journal article "Quantifying the Impact of Surfactants on Cloud Condensation Nuclei Activity Using a Particle-Resolved Model." Two compressed folders are uploaded here: one contains the data used in this article, and the other contains the Python scripts used to process the data. For more details on the uploaded files, please see the README file.
keywords:
Surfactants; CCN; Effective surface tension
published:
2026-03-23
Han, Myung-Ja (MJ); Heng, Greta; Lampron, Patricia; Kudeki, Deren
(2026)
The dataset includes data used for the MARC-to-BIBFRAME conversion, as well as code developed and used for reconciling BIBFRAME Work and Hub data with the Library of Congress BIBFRAME database.
The dataset is organized into three ZIP files: MARC, BIBFRAME, and Work-Reconciliation.
The MARC and BIBFRAME ZIP files each contain three sets of records: Concerto (86 records), Hamlet (8,678 records), and Local (237 records).
The Work-Reconciliation ZIP file includes the following components:
1. reconcileWorks.py: a script that adds links to BIBFRAME records generated using the marc2bibframe2 tool
2. README.md: documentation describing how to run the script, required inputs, and the methodology for selecting links from Library of Congress search results
3. requirements.txt: a list of Python dependencies required to execute the script
4. notes.txt: supplementary notes on converting MARCXML to BIBFRAME using marc2bibframe2, including input requirements and setup considerations
keywords:
MARC to BIBFRAME conversion; reconciliation at scale; BIBFRAME Work; BIBFRAME Hub
published:
2026-03-16
Dingilian, Armine; Kurella, Aarnah; Chamria, Div; Mitchell, Cheyenne; Dhruva, Dhananjay; Durden, David; Backlund, Mikael
(2026)
This folder contains the data and analysis code used to produce the results reported in "Quantifying classical and quantum bounds for resolving closely spaced, non-interacting, simultaneously emitting dipole sources in optical microscopy", (accepted, J. Chem. Phys. 2026).
published:
2026-03-13
Majeed, Fahd; Khanna, Madhu
(2026)
An economic model was developed that incorporates spatially varying joint yield and price distributions for the multiple crop choices a farmer faces when choosing between conventional and bioenergy crops. The model is implemented in MATLAB and includes options for results with no payments, annual payments, or upfront payments.
keywords:
Carbon; modeling; Sustainable Aviation Fuel
published:
2026-02-17
Nie, Ke; Bradford, J. Nofear; Mandal, Supriya; Bista, Aayam; Pfaff, Wolfgang; Kou, Angela
(2026)
This dataset contains all the raw and processed data used to generate the figures presented in the main text and the appendix of the paper "Fluxonium as a control qubit for bosonic quantum information". It also includes code for data analysis and figure generation.
keywords:
superconducting qubit; fluxonium; bosonic control; quantum information
published:
2026-02-09
Park, Minhyuk; Chacko, George
(2026)
This dataset consists of a directed network in edge-list format, where nodes correspond to articles in the scientific literature and edges represent citations. The network was constructed by seed-set expansion (two rounds of citing and cited papers) of the article (seed node) reporting the discovery of PI 3-Kinase activity: Malcolm Whitman, C Peter Downes, Marilyn Keeler, Tracy Keller, and Lewis Cantley. (1988) Type I phosphatidylinositol kinase makes a novel inositol phospholipid, phosphatidylinositol-3-phosphate. Nature, 332(6165):644–646. The edge list comprises 17,970,340 nodes and 127,255,020 edges.
The dataset was obtained from the Dimensions database via a two-level expansion of the seed node (article). The first expansion included four groups of nodes: the seed node; all publications cited by the seed node; all publications citing the seed node; and all publications cited by publications citing the seed node. The second expansion included all nodes that either cited or were cited by a node in the first expansion set.
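Over an in-memory citation graph, the two-level expansion described above can be sketched as follows. The (citing, cited) edge representation and the function names are assumptions for illustration, not the code used with Dimensions, and the sketch returns only the expanded node set; which edges among those nodes were retained is not specified here.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable


def build_indexes(edges: Iterable[Tuple[Node, Node]]):
    """Index (citing, cited) edges in both directions."""
    cites: Dict[Node, Set[Node]] = defaultdict(set)     # paper -> papers it cites
    cited_by: Dict[Node, Set[Node]] = defaultdict(set)  # paper -> papers citing it
    for citing, cited in edges:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return cites, cited_by


def two_level_expansion(seed: Node, edges: Iterable[Tuple[Node, Node]]) -> Set[Node]:
    """Return the node set produced by the two expansion rounds."""
    cites, cited_by = build_indexes(edges)
    references = cites[seed]   # publications cited by the seed
    citers = cited_by[seed]    # publications citing the seed
    # Publications cited by publications that cite the seed.
    cited_by_citers = set().union(*(cites[c] for c in citers))
    first = {seed} | references | citers | cited_by_citers
    # Second round: every node that cites or is cited by a first-round node.
    second = set(first)
    for node in first:
        second |= cites[node] | cited_by[node]
    return second


demo = [("S", "A"), ("B", "S"), ("B", "C"), ("D", "A"), ("E", "Z"), ("F", "C"), ("C", "G")]
expanded = two_level_expansion("S", demo)
```

In this toy graph, E and Z are never reached because neither touches a first-round node.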
Node IDs were converted from the proprietary Dimensions identifiers to a zero-based sequence of integer IDs, [0, n-1]. Access to the original identifiers requires a license from Digital Science.
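A zero-based relabeling of this kind can be sketched as below. The first-appearance ordering and the `pub.x`-style identifiers are hypothetical; the order actually used for this dataset is not described, and because the original identifiers require a license, the integers cannot be mapped back without them.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def assign_zero_based_ids(
    edges: Iterable[Tuple[Hashable, Hashable]]
) -> Tuple[Dict[Hashable, int], List[Tuple[int, int]]]:
    """Map arbitrary node identifiers to integers 0..n-1 in order of first appearance."""
    ids: Dict[Hashable, int] = {}
    remapped: List[Tuple[int, int]] = []
    for u, v in edges:
        for node in (u, v):
            if node not in ids:
                ids[node] = len(ids)  # next unused integer
        remapped.append((ids[u], ids[v]))
    return ids, remapped


id_map, int_edges = assign_zero_based_ids([("pub.x", "pub.y"), ("pub.y", "pub.z")])
```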
published:
2026-02-13
Frederick, Samuel; Mohebalhojeh, Matin; Curtis, Jeffrey; West, Matthew; Riemer, Nicole
(2026)
This dataset contains data files necessary to replicate figures from "Idealized Particle-Resolved Large-Eddy Simulations to Evaluate the Impact of Emissions Spatial Heterogeneity on CCN Activity" submitted to Atmospheric Chemistry and Physics.
Within the compressed folder data.zip are two subdirectories, "processed_data" and "spatial-het". The "processed_data" directory contains netCDF files which contain a subset of simulation output used in figure generation. The "spatial-het" subdirectory contains a .csv file with spatial heterogeneity values computed via an exact algorithm of the spatial heterogeneity metric described by Mohebalhojeh et al. 2025. The subdirectory "sh-patterns" contains .csv files for each emissions scenario. Each entry corresponds to a single grid cell over a domain of dimension 100x100 (lateral resolution of the computational domain employed in this paper).
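If the sh-patterns files store one value per grid cell, a file can be loaded into the 100x100 domain as sketched below. The file layout (a 100-column matrix or a single column), the absence of a header row, and the row-major cell ordering are all assumptions to check against the README.

```python
import csv
import io
from typing import List

GRID_N = 100  # lateral grid size stated in the description (100x100)


def read_grid(text: str, n: int = GRID_N) -> List[List[float]]:
    """Load n*n numeric CSV values into an n x n grid (row-major order assumed).

    Works whether the file stores one value per line or an n-column matrix,
    but assumes there is no header row.
    """
    values = [
        float(field)
        for row in csv.reader(io.StringIO(text))
        for field in row
        if field.strip()
    ]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} values, found {len(values)}")
    return [values[i * n:(i + 1) * n] for i in range(n)]
```

The length check fails loudly if a file does not hold exactly one value per cell, which is a quick way to detect a header row or a different layout.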
Within scripts.zip are Python notebooks for generating figures, along with Python modules containing helper functions for the notebooks. A Fortran version of the spatial heterogeneity metric is also included, together with shell scripts that create a Python environment in which the code can be compiled and converted into a Python module. Note that the create_env.sh and compile_nsh.sh scripts must be run before executing notebook cells that use the spatial heterogeneity subroutines.
<b>*Note*:</b> New in this V3: During review, a bug regarding vertical diffusion of particles was discovered in WRF-PartMC, which necessitated re-running the simulations. We present new simulations with diffusion fixed. Furthermore, we have run additional simulations in response to reviewer comments: simulations with emissions turned off at t = 4 h to investigate reversible partitioning, and simulations with the RH raised near saturation throughout the domain to model the effects of co-condensation. The README PDF has been updated to reflect changes to the dataset collection. We have also added a shell script in scripts_v3.zip, which was used to process simulation output and create the data subsets contained in data_v3.zip. Lastly, notebooks were re-run with the updated datasets to create manuscript figures, and additional plotting routines were added for new figures pertaining to the requested simulations.
keywords:
Atmospheric chemistry; aerosols; Particle-resolved modeling; spatial heterogeneity
published:
2026-03-12
Acharya, Rishi; Gerber, Eli; Bielinski, Nina; Aguirre, Hannah E.; Kim, Younsik; Bernal-Choban, Camille; Tenkila, Gaurav; Sheikh, Suhas; Mahaadev, Pranav; Hoveyda-Marashi, Faren; ROYCHOWDHURY, SUBHAJIT; Shekhar, Chandra; Felser, Claudia; Abbamonte, Peter; Wieder, Benjamin; Mahmood, Fahad
(2026)
This repository contains source data for key plots presented in the manuscript "Plasmon-driven exciton formation in a non-equilibrium Fermi liquid."
Experimental data analyzed in Igor Pro 8 are provided as the .pxp files used to generate individual subplots. Electronic spectral function calculations are provided as .txt files, in which consecutive rows refer to the meshgrid x coordinate, y coordinate, spectral function (and, where relevant, axis-projected local angular momentum). We additionally include the Wannier model and the DFT-obtained bulk band structure on which the Wannier model was based.
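Reading "consecutive rows" literally (one quantity per row, one meshgrid point per column), a loader for the spectral-function .txt files might look like the following. The delimiter handling and the one-row-per-quantity layout are assumptions to verify against the actual files; the names `x`, `y`, `A`, and `L` are hypothetical labels.

```python
from typing import Dict, List

# x, y, spectral function, and the optional axis-projected angular momentum.
ROW_NAMES = ["x", "y", "A", "L"]


def read_spectral_rows(text: str) -> Dict[str, List[float]]:
    """Parse a file whose consecutive rows hold x, y, A (and optionally L).

    Values may be separated by whitespace or commas (assumed delimiters).
    """
    rows = [
        [float(v) for v in line.replace(",", " ").split()]
        for line in text.splitlines()
        if line.strip()
    ]
    if len(rows) not in (3, 4):
        raise ValueError(f"expected 3 or 4 rows, found {len(rows)}")
    if len({len(r) for r in rows}) != 1:
        raise ValueError("rows have different lengths")
    return dict(zip(ROW_NAMES, rows))
```

If the files instead turn out to be column-oriented, the same function applies after transposing the parsed rows.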
Files are named as the number of the figure in the manuscript to which they correspond, with additional details included where necessary.
<b>Details of file names:</b>
2a_DOS_Lxz_Ek_KGM_40layer_xnum_800kpt_tot.txt: Density of states, xz-axis projected local orbital angular momentum, for 800 points along the K-Gamma-M path, for a 40-layer model.
2c_composite_y.pxp: ARPES (angle-resolved photoemission spectroscopy) spectra along the ky axis, including both a scan near the Fermi level and a scan at high kinetic energies.
2d_LCP_RCP_diff_Sect_20K.pxp: difference between ARPES constant energy cuts at T=20 K at E0 + 0.23 eV taken with left- and right-circularly polarized photons. The polarization-integrated intensity at the constant energy cut is also included.
2e_DOS_L45_E11pt79_m0pt25to0pt25_xnum_800kpt_tot.txt: Density of states, xz-projected local orbital angular momentum, and corresponding k-points in two dimensions from ab-initio electronic structure calculations for a constant-energy cut.
3a_[x]_[y]ps: ARPES cut under excitation at a fluence of x uJ/cm2, measured y ps after photoexcitation. Measurements were performed at 9 K.
3b_[x]: Energy distribution curves under excitation at a fluence x uJ/cm2 at selected delay times after photoexcitation.
4a_ImSigma_vs_temperature.pxp: Imaginary self energy (extracted from ARPES linewidths) at different energies above E0 for selected lattice temperatures.
4b_EELS_lowE.pxp: Electron energy loss spectrum over a low energy range
5b_diff_55m15.pxp: Difference between momentum-integrated Tr-ARPES traces at 55 uJ/cm2 and 15 uJ/cm2 photoexcitation. Time-dependent intensity at each energy level has been normalized to a maximum of 1 for each individual fluence prior to subtraction.
5d_invtau_at_EX_vs_fluence.pxp: decay rate at a specified energy EX for different excitation fluences, from single exponential fits.
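The normalization described for 5b_diff_55m15.pxp and the single-exponential decay rate in 5d can be sketched on plain Python lists. The log-linear least-squares fit is a simplification chosen for illustration (the authors' fitting procedure, for example a nonlinear fit with an offset, is not described here), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_to_max(trace: Sequence[float]) -> List[float]:
    """Scale a time trace so that its maximum equals 1."""
    peak = max(trace)
    if peak <= 0:
        raise ValueError("trace maximum must be positive")
    return [v / peak for v in trace]


def normalized_difference(high: Sequence[float], low: Sequence[float]) -> List[float]:
    """Normalize each fluence's trace (at one energy) to a maximum of 1, then subtract."""
    if len(high) != len(low):
        raise ValueError("traces must share the same delay axis")
    return [a - b for a, b in zip(normalize_to_max(high), normalize_to_max(low))]


def decay_rate(times: Sequence[float], intensities: Sequence[float]) -> float:
    """Estimate 1/tau for I(t) = I0 * exp(-t / tau) from a least-squares line through ln I(t).

    Requires strictly positive intensities.
    """
    ys = [math.log(v) for v in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    den = sum((t - mean_t) ** 2 for t in times)
    return -num / den
```

Applied per energy, `normalized_difference` mirrors the order stated for 5b: each fluence is normalized first, then the lower-fluence trace is subtracted from the higher-fluence one.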
<b>NOTE: Analyses based on the Wannier model presented here should cite both the associated Article and this dataset. For all other files in the repository, citing the dataset alone is sufficient.</b>
published:
2026-02-11
Kim, Hyunhwa; Purba, Denissa Sari Darmawi; Kontou, Eleftheria
(2026)
The dataset and code enable replication of the case study in Section 6 titled "California wildfire energy supply logistics" of the Transportation Research Part E: Logistics and Transportation Review published paper "Bidirectional Energy Supply Logistics Using Uncrewed Electric Aerial and Ground Vehicles: A Two-Echelon Location-Routing Problem with Resource-Constrained Demand Allocation and Time Windows."
keywords:
electric vehicle; energy supply logistics; location-routing problem; bidirectional energy; uncrewed aerial vehicle
published:
2026-03-09
Nambiar, Ananthan; Dubinkina, Veronika; Liu, Simon; Maslov, Sergei
(2026)
mRNA levels of all genes in a genome are a critical piece of information defining the overall state of the cell in a given environmental condition. Being able to reconstruct such condition-specific expression in fungal genomes is particularly important for metabolically engineering these organisms to produce desired chemicals under industrially scalable conditions. Most previous deep learning approaches focused on predicting the average expression level of a gene from its promoter sequence, ignoring its variation across conditions. Here we present FUN-PROSE, a deep learning model trained to predict differential expression of individual genes across various conditions using their promoter sequences and the expression levels of all transcription factors. We train and test our model on three fungal species and obtain correlations between predicted and observed condition-specific gene expression as high as 0.85. We then interpret our model to extract promoter sequence motifs responsible for variable expression of individual genes. We also carried out input feature importance analysis to connect individual transcription factors to their gene targets. A sizeable fraction of both the sequence motifs and the TF-gene interactions learned by our model agree with previously known biological information, while the rest correspond to either novel biological facts or indirect correlations.
keywords:
Genomics; Modeling
published:
2026-03-09
Lee, Jung Woo; Vittore, Kayla Marie; Namoi, Nictor; Hwang, Soonho; Lee, DoKyoung
(2026)
Understanding how establishment practices influence the mechanisms underlying Miscanthus × giganteus (miscanthus) productivity and canopy development is critical for optimizing management. Data was collected during the juvenile (2011–2013) and mature (2024) phases of a long-term field experiment established in Urbana, Illinois, to evaluate the effects of propagation method (plug propagation [PP] and rhizome propagation [RP]), planting density (1.0, 0.75, and 0.25 plants m⁻²), and nitrogen application (0 and 67 kg N ha⁻¹) on end-of-season biomass yield, tiller mass, tiller density, and tiller height. Linear regression models identified the dominant predictors of yield across stand ages and management regimes. Planting density, nitrogen (N) application, and propagation method significantly influenced early yield and canopy development. During the juvenile phase, biomass yield was driven by tiller density due to canopy expansion; in the mature phase, yield became driven by tiller mass. The PP plots produced higher tiller density than the RP plots, resulting in faster canopy closure and higher juvenile-phase yields. Rhizome-propagated (RP) plots produced lower tiller density, but individual tillers were 3.3–6.4 g tiller−1 heavier than PP tillers. After the canopy reached equilibrium, the PP and RP yields were similar because greater RP tiller mass compensated for its lower tiller density. Higher planting density resulted in greater yield and tiller density during the second year (2012), but this effect was absent from the third year (2013) onward. In the juvenile phase, N fertilization enhanced yield by 1.6–3.4 Mg ha−1. Initiating fertilization in 2013 on unfertilized plots produced biomass similar to that in fertilized plots, suggesting yield recovery in the mature phase. 
These findings reveal that establishment strategies, including propagation method and planting density, influence juvenile miscanthus canopy development and productivity as yield shifts from tiller-density-dominated to tiller-mass-dominated, but do not influence mature-phase productivity.
keywords:
Miscanthus
published:
2026-03-05
Xue, Xueyi; Beuchat, Gabriel; Wang, Jiang; Yu, Ya-Chi; Moose, Stephen; Chen, Jin; Chen, Li-Qing
(2026)
Sweet sorghum has emerged as a promising source of bioenergy mainly due to its high biomass and high soluble sugar yield in stems. Studies have shown that loss-of-function Dry locus alleles were selected during sweet sorghum domestication and that decapitation can further boost sugar accumulation in sweet sorghum, indicating that the potential for improving sugar yields is yet to be fully realized. To maximize sugar accumulation, it is essential to better understand the mechanisms, beyond the Dry locus, underlying the massive accumulation of soluble sugars in sweet sorghum stems. We performed a transcriptomic analysis upon decapitation of near-isogenic lines carrying mutant (d; juicy stems and green leaf midrib) and functional (D; dry stems and white leaf midrib) alleles at the Dry locus. Our analysis revealed that decapitation suppressed photosynthesis in leaves but accelerated starch metabolic processes in stems. SbbHLH093 was negatively correlated with sugar levels across genotypes (DD vs. dd), treatments (control vs. decapitation), and developmental stages post anthesis (3 d vs. 10 d). The D locus gene SbNAC074A and other programmed cell death-related genes were downregulated by decapitation, while the sugar transporter-encoding gene SbSWEET1A was induced. Both SbSWEET1A and Invertase 5 were detected in phloem companion cells by RNA in situ assay. Loss of the SbbHLH093 homolog, AtbHLH093, in Arabidopsis increased sugar accumulation. This study provides new insights into enhancing sugar accumulation in bioenergy crops, which could potentially be achieved by reducing reproductive sink strength and enhancing phloem unloading.
keywords:
Transcriptomics
published:
2026-03-05
Bista, Aayam; Thibodeau, Matthew; Nie, Ke; Kaicheung, Chow; Clark, Bryan; Kou, Angela
(2026)
This dataset contains all the raw and processed data used to generate the figures presented in the main text and the appendix of the paper "Readout-induced leakage of the fluxonium qubit", Physical Review Applied, 2026 (https://doi.org/10.1103/wjdb-4814). It also includes code for data analysis and for generating the figures.
keywords:
fluxonium; dispersive readout; superconducting qubits; quantum information
published:
2026-03-04
Tran, Vinh; Mishra, Somesh; Sarang, Bhagwat; Shafaei, Saman; Shen, Yihui; Allen, Jayne; Tan, Shih-I; Fatma, Zia; Rabinowitz, Joshua; Guest, Jeremy; Singh, Vijay; Zhao, Huimin
(2026)
Microbial production of succinic acid (SA) at an industrially relevant scale has been hindered by high downstream processing costs arising from neutral pH fermentation for over three decades. Here, we metabolically engineer the acid-tolerant yeast Issatchenkia orientalis for SA production, attaining the highest titers in sugar-based media at low pH (pH 3) in fed-batch fermentations, i.e. 109.5 g/L in minimal medium and 104.6 g/L in sugarcane juice medium. We further perform batch fermentation using sugarcane juice medium in a pilot-scale fermenter (300×) and achieve 63.1 g/L of SA, which can be directly crystallized with a yield of 64.0%. Finally, we simulate an end-to-end low-pH SA production pipeline, and techno-economic analysis and life cycle assessment indicate our process is financially viable and can reduce greenhouse gas emissions by 34–90% relative to fossil-based production processes. We expect I. orientalis can serve as a general industrial platform for production of organic acids.
keywords:
Metabolomics; Modeling
published:
2026-03-04
Arnav, Arushi; Zhang, Rui; Karakoc, Deniz Berfin; Konar, Megan
(2026)
This dataset provides estimates of annual agricultural and food commodity flows (in kg) between all county pairs within the United States from 2018 to 2022. The database provides 343.7 million data points, since pairwise information is provided between 3134 counties, for 7 commodity categories, and 5 time periods. The commodity categories correspond to the Standardized Classification of Transported Goods and are:
- SCTG 1: live animals and fish
- SCTG 2: cereal grains
- SCTG 3: agricultural products (except for animal feed, cereal grains, and forage products)
- SCTG 4: animal feed, eggs, honey, and other products of animal origin
- SCTG 5: meat, poultry, fish, seafood, and their preparations
- SCTG 6: milled grain products and preparations, and bakery products
- SCTG 7: other prepared foodstuffs, fats and oils
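The stated count follows from the dimensions listed above. Counting ordered (origin, destination) pairs of distinct counties gives 3134 × 3133 × 7 × 5 = 343,658,770, which rounds to the stated 343.7 million; including within-county pairs would give 343,768,460. The description does not say which convention applies, so treat this as a consistency check rather than a statement about the file layout.

```python
COUNTIES = 3134
COMMODITIES = 7
YEARS = len(range(2018, 2023))  # 2018 through 2022 inclusive

# Ordered county pairs including within-county flows (origin == destination).
with_self_pairs = COUNTIES * COUNTIES * COMMODITIES * YEARS
# Ordered pairs of distinct counties only.
distinct_pairs = COUNTIES * (COUNTIES - 1) * COMMODITIES * YEARS

print(f"{with_self_pairs:,}")  # 343,768,460
print(f"{distinct_pairs:,}")   # 343,658,770
```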
For additional information, please see the related paper by Arnav et al. (2026) in Environmental Research: Food Systems. http://iopscience.iop.org/article/10.1088/2976-601X/ae487c.
keywords:
food flows; high-resolution; county-scale; time-series; United States
published:
2026-01-22
Cao, Yanghui; Dietrich, Christopher H.; Dmitriev, Dmitry A.; Zou, Hongfen; Xue, Qingquan; Zhang, Yalin
(2026)
The following 5 files were used to reconstruct the phylogeny of the Membracoidea.
1. Taxon_sampling.csv: contains the sample IDs (1st column, used in the alignments) and the taxonomic information (2nd to 6th columns) for 269 samples.
2. concatenated_aa_.phy: a concatenated amino acid dataset with 52,987 amino acid positions. This dataset was used for the maximum likelihood analysis by IQ-TREE v1.6.12. Hyphens are used to represent gaps.
3. concatenated_nt.phy: a concatenated nucleotide dataset with all codon positions included (158,961 nucleotide positions). This dataset was used for the maximum likelihood analysis by IQ-TREE v1.6.12. Hyphens are used to represent gaps.
4. concatenated_12nt.phy: a concatenated nucleotide dataset with the third codon positions excluded (105,974 nucleotide positions). This dataset was used for the maximum likelihood analysis by IQ-TREE v1.6.12. Hyphens are used to represent gaps.
5. Individual_gene_alignment.zip: contains 427 FASTA files, each representing the nucleotide alignment for one gene. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12, followed by multispecies coalescent analysis using ASTRAL v4.10.5.
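The three alignment lengths are mutually consistent: each amino acid corresponds to one codon, so 3 × 52,987 = 158,961 nucleotide positions, and removing third codon positions leaves 2 × 52,987 = 105,974. The snippet below records that check and adds a minimal reader for the PHYLIP header, assuming the standard first line of `<number of taxa> <number of sites>`; the helper names are hypothetical.

```python
from typing import Tuple

AA_SITES = 52_987  # concatenated_aa_.phy
NT_ALL = 158_961   # concatenated_nt.phy, all codon positions
NT_12 = 105_974    # concatenated_12nt.phy, third positions removed

# One codon (3 nt) per amino acid; dropping the third position keeps 2 nt per codon.
assert NT_ALL == 3 * AA_SITES
assert NT_12 == 2 * AA_SITES


def phylip_dimensions(header_line: str) -> Tuple[int, int]:
    """Return (number of taxa, number of sites) from a PHYLIP header line."""
    ntaxa, nsites = header_line.split()[:2]
    return int(ntaxa), int(nsites)


def gap_fraction(sequence: str) -> float:
    """Fraction of alignment columns that are gaps ('-') in one aligned sequence."""
    return sequence.count("-") / len(sequence) if sequence else 0.0
```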
keywords:
Auchenorrhyncha; evolution; phylogeny; timetree
published:
2026-03-02
Liu, Xing; Wickland, Daniel; Borges dos Santos, Lucas; Hudson, Karen; Hudson, Matthew
(2026)
Height is a critical component of plant architecture, significantly affecting crop yield. The genetic basis of this trait in soybean remains unclear. In this study, we report the characterization of the Compact mutant of soybean, which has short internodes. The candidate gene was mapped to chromosome 17, and the interval containing the causative mutation was further delineated using biparental mapping. Whole-genome sequencing of the mutant revealed an 8.7 kb deletion in the promoter of the Glyma.17g145200 gene, which encodes a member of the class III gibberellin (GA) 2-oxidases. The mutation has a dominant effect, likely via increased expression of the GA 2-oxidase transcript observed in green tissue, as a result of the deletion in the promoter of Glyma.17g145200. We further demonstrate that levels of GA precursors are altered in the Compact mutant, supporting a role in GA metabolism, and that the mutant phenotype can be rescued with exogenous GA3. We also determined that overexpression of Glyma.17g145200 in Arabidopsis results in dwarfed plants. Thus, gain of promoter activity in the Compact mutant leads to a short internode phenotype in soybean through altered metabolism of gibberellin precursors. These results provide an example of how structural variation can control an important crop trait and a role for Glyma.17g145200 in soybean architecture, with potential implications for increasing crop yield.
keywords:
Biomass Analytics; Genomics