archiveteam.org

US Government - Archiveteam

Wed May 15 2024
US Government
Status: Online! but Endangered
Archiving status: In progress...
Archiving type: DPoS
Project source: usgovernment-grab, usgovernment-items
Project tracker: usgovernment
IRC channel: #UncleSamsArchive (on hackint)
Data: archiveteam_usgovernment

Discovery

Official lists of all registered .gov domains and of federal .gov domains are available. The raw CSV files and the .gov zone file are also available on GitHub.
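As a rough illustration of how the CSV list could be turned into seed URLs, here is a minimal sketch. The column names "Domain name" and "Domain type" are assumptions about the CSV header, and the sample rows (including example-city.gov) are made up for the demo; check the real file before relying on either.

```python
import csv
import io


def federal_domains(csv_text, type_column="Domain type", name_column="Domain name"):
    """Return sorted, lower-cased domain names whose type starts with 'Federal'.

    The column names are assumptions about the CSV header, not confirmed
    by this page; pass different names if the real file differs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    domains = set()
    for row in reader:
        domain_type = (row.get(type_column) or "").strip().lower()
        name = (row.get(name_column) or "").strip().lower()
        if name and domain_type.startswith("federal"):
            domains.add(name)
    return sorted(domains)


# Hypothetical sample rows, only to show the expected shape of the input.
SAMPLE = """Domain name,Domain type,Agency
CDC.GOV,Federal - Executive,Department of Health and Human Services
example-city.gov,City,Non-Federal Agency
data.gov,Federal - Executive,General Services Administration
"""

if __name__ == "__main__":
    for domain in federal_domains(SAMPLE):
        print("https://" + domain + "/")
```

The function takes the CSV as text rather than fetching it, so the download step (and where the file lives) is left to whoever runs it.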

Content at risk

Site Name Reason Archival Notes Status
https://data.gov data.gov There have been reports of datasets disappearing from the website,[1] though this behavior might be normal given the way the site collects datasets from other locations. https://catalog.data.gov
https://inventory.data.gov
https://resources.data.gov
https://strategy.data.gov
https://sdg.data.gov
GitHub
job:4hb15f3ijn846c1dw0w58k4fe
job:4qlh2ol2vq2i525747l0yq6a4
job:25o494lfnnlxtobegl9grx7tt
job:e1ioqt5kilh8l4irihid8sqoq
job:79u49omgtqkj83cnpyuhx0xr8
job:akwvpyvnzeuhrvgh51tokrmsv
https://cdc.gov Centers for Disease Control Directed to pause communication[2] along with other health agencies. https://data.cdc.gov/
https://ftp.cdc.gov/
GitHub
https://cdc.gov -> job:hd3tvx4w14ybj2al0peewcv
https://ftp.cdc.gov/ -> job:8zn8f6a2620t1tnje3f1cyr2o
https://data.cdc.gov/ -> job:1u2ougx4kn6ueaiqddwjfeib7
https://www.ncei.noaa.gov/ National Centers for Environmental Information (Some?) data is linked to from data.gov.
It appears to be possible to enumerate datasets with 7-digit integer IDs starting at 0000001, e.g. https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.nodc:0000001 . The legacy URL format, which redirects, appears to be http://accession.nodc.noaa.gov/0000001
https://www.nccs.nasa.gov/services/data-collections NASA Center for Climate Simulation Data
https://www.ipcc-data.org IPCC Data Distribution Centre Appears to have sequential IDs
https://www.bco-dmo.org/ Biological and Chemical Oceanography Data Management Office Appears to have sequential IDs
https://ncela.ed.gov/ National Clearinghouse for English Language Acquisition The Department of Education has been threatened many, many times. Its "Resource Library" section is mainly a list of links to both internal and external resources, e.g. PDFs or SoundCloud links.
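For the NCEI entry above, the sequential 7-digit IDs could be expanded into a seed list along these lines. This is only a sketch: the two URL templates are copied from the examples in the table, while the function names and the ID range used in the demo are hypothetical, and the real upper bound of the ID space is not known from this page.

```python
from typing import Iterator, List

# URL patterns taken from the examples in the table above.
LANDING_TEMPLATE = (
    "https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso"
    "?id=gov.noaa.nodc:{id}"
)
LEGACY_TEMPLATE = "http://accession.nodc.noaa.gov/{id}"


def accession_ids(start: int, stop: int) -> Iterator[str]:
    """Yield zero-padded 7-digit IDs from start to stop, inclusive."""
    for number in range(start, stop + 1):
        if not 0 < number <= 9_999_999:
            raise ValueError(f"ID out of 7-digit range: {number}")
        yield f"{number:07d}"


def landing_urls(start: int, stop: int) -> List[str]:
    """Build landing-page URLs for a range of IDs (no requests are made)."""
    return [LANDING_TEMPLATE.format(id=i) for i in accession_ids(start, stop)]


def legacy_urls(start: int, stop: int) -> List[str]:
    """Build the legacy-format URLs for the same range of IDs."""
    return [LEGACY_TEMPLATE.format(id=i) for i in accession_ids(start, stop)]


if __name__ == "__main__":
    # Hypothetical small range, just to show the output format.
    for url in landing_urls(1, 3):
        print(url)
```

Whether every ID in a range resolves to a dataset is untested here; the list is only meant as input for a crawler that tolerates missing pages.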

Domains and properties that require extra scripting

Site Name Reason Archival Notes Status
https://fdic.gov
https://banks.data.fdic.gov/bankfind-suite/bankfind
https://orders.fdic.gov/s/?source=govdelivery
https://catalog.fdic.gov/catalog/s/
https://playmoneysmart.fdic.gov/games
FDIC Federal Deposit Insurance Corporation "The ones I spotted were https://banks.data.fdic.gov/bankfind-suite/bankfind https://orders.fdic.gov/s/?source=govdelivery https://catalog.fdic.gov/catalog/s/ https://playmoneysmart.fdic.gov/games but I suspect even more will need special handling"
https://research-hub.nrel.gov/ National Renewable Energy Laboratory Behind Cloudflare, with TLS sniffing
https://wisqars.cdc.gov/ CDC Web-based Injury Statistics Query and Reporting System "I don't think they provide raw data for privacy/legal reasons, but I can probably save the data for the default charts at least"
https://liheappm.acf.hhs.gov/datawarehouse HHS Low Income Home Energy Assistance Program "I've done the grantee profiles from https://liheappm.acf.hhs.gov/datawarehouse. There's probably additional data that could be exported through the reports but the site is fairly complicated"
https://transfer.archivete.am/inline/qtDPx/apps.bea.gov_seed_urls.txt has some sections using JS
https://www.osti.gov https://www.osti.gov + https://www.osti.gov/opennet/ - lots of sitemaps in https://www.osti.gov/robots.txt. https://www.osti.gov/sitemap_ostigov/xml uses https://www.osti.gov/sitemap_ostigov_1.txt, which I don't think ArchiveBot will parse. This looks like something that would be worth doing via DPoS.
https://www.eia.gov/ Energy Information Administration Seems to have some scripty things
https://chemview.epa.gov/chemview/ EPA Chemview looks scripty and complicated, but they do have a tutorial
https://campd.epa.gov/data/bulk-data-files
https://watersgeo.epa.gov/cwa/CWA-JDs/
EPA Clean Air Markets Program Data looks scripty
https://ejscreen.epa.gov/ is arcgis
https://www.facadatabase.gov/FACA/s/FACADatasets is salesforce
https://liheappm.acf.hhs.gov/datawarehouse looks scripty but probably won't be too hard
https://ecos.fws.gov/ecdms4/ is arcgis and there's probably more on that domain
https://www.lcacommons.gov/lca-collaboration/ is scripty and big looking
https://usgovernmentmanual.gov/ looks like a really helpful resource for finding stuff
https://adams-search.nrc.gov/ is big: searching "nuclear" gives 3252079 results, e.g. https://www.nrc.gov/docs/ML0726/ML072630079.pdf. Coverage seems to be decent, but it comes from lots of random projects. Of the few I sampled, only https://www.nrc.gov/docs/ML2008/ML20083B799.pdf wasn't saved (added 2024-05-15 but created 1991-09-20)
https://liheappm.acf.hhs.gov/api/search/years etc. require a token obtained via a POST to https://liheappm.acf.hhs.gov/token.php. Will try to figure out if it's just those 3 or if there's more to it.
https://ffrms.climate.gov/
https://floodstandard.climate.gov/
is arcgis
https://data.fs.usda.gov/geodata/edw/datasets.php arcgis/geodb zip archives
https://remdb.nrel.gov/ is scripty; I'm doing a basic pass over it but it won't get everything (they do have a data download though)
https://maps.nrel.gov/ https://climate.nrel.gov/ uses arcgis (?)
https://www.fs.usda.gov/nrs/atlas/bird/ Climate Change Bird Atlas Looks script-y
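For sites like the OSTI entry above, where robots.txt lists sitemaps and some of them are plain-text files with one URL per line, the URL extraction could look roughly like this. It is a sketch working on already-downloaded text; the function names are hypothetical, and it assumes the .txt sitemaps follow the usual one-URL-per-line convention, which has not been verified for OSTI.

```python
from typing import List


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Collect the values of 'Sitemap:' lines from robots.txt content."""
    sitemaps = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop robots.txt comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


def urls_from_text_sitemap(text: str) -> List[str]:
    """Extract URLs from a plain-text sitemap, keeping first-seen order."""
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Made-up robots.txt content, only for demonstration.
    robots = "User-agent: *\nSitemap: https://www.osti.gov/sitemap_ostigov_1.txt\n"
    print(sitemaps_from_robots(robots))
    print(urls_from_text_sitemap("https://www.osti.gov/\nhttps://www.osti.gov/opennet/\n"))
```

The resulting URL list could then be handed to whatever crawler is used, since the note above suggests ArchiveBot may not follow text sitemaps on its own.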

Other content that may be at risk based on subject matter

This list is based on a fast manual scroll through the list of .gov domains. It contains some domains that are already dead (and likely have been for a long time) and might duplicate entries in other lists on this page.

https://www.headstart.gov/
https://www.section508.gov/
https://www.ada.gov/
https://agingstats.gov/
https://blackhistorymonth.gov/
https://www.hiv.gov/
https://www.benefeds.gov/
https://aviationweather.gov/
https://birthcontrol.gov/
https://www.childwelfare.gov/
https://childcare.gov/
https://www.childstats.gov/
https://www.coldcaserecords.gov/
https://www.conservation.gov/
https://coralreef.gov/
https://www.employeeexpress.gov/
https://www.evergladesrestoration.gov/
https://familyplanning.gov
https://findtreatment.gov/
https://www.fatherhood.gov/
https://www.samhsa.gov/
https://foreignassistance.gov/
https://girlshealth.gov/
https://forestsandrangelands.gov/
https://greengov.gov/
https://hispanicheritagemonth.gov/
https://www.jewishheritagemonth.gov/
https://www.macpac.gov/
https://migrantworker.gov (-> https://www.dol.gov/general/migrantworker )
https://mitigationcommission.gov/
https://www.ncd.gov/
https://www.nbrc.gov
https://nativeamericanheritagemonth.gov/
https://reproductivehealthservices.gov/
https://www.sustainability.gov/
https://www.usaid.gov/
https://www.vaccines.gov/en/
https://womenshealth.gov/
https://www.workwithusaid.gov/
https://womenshistorymonth.gov/
