US Government - Archiveteam
- ️Wed May 15 2024
US Government | |
Status | Online! but Endangered |
Archiving status | In progress... |
Archiving type | DPoS |
Project source | usgovernment-grab, usgovernment-items |
Project tracker | usgovernment |
IRC channel | #UncleSamsArchive (on hackint) |
Data[how to use] | archiveteam_usgovernment |
Discovery
An official list of all registered .gov domains and federal .gov domains is available. The raw CSV files and the .gov zone file are also available on GitHub.
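As a rough sketch of how the CSV list could feed discovery, the snippet below filters rows down to federal domains. The column names `Domain name` and `Domain type`, and the idea that federal rows have a type starting with `Federal`, are assumptions about the CSV layout and should be checked against the real file header. The sample rows are made up for illustration.

```python
import csv
import io


def federal_domains(csv_text, name_col="Domain name", type_col="Domain type"):
    """Return lowercased domain names whose type starts with 'Federal'.

    The column names are assumptions, not confirmed against the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    domains = []
    for row in reader:
        domain_type = (row.get(type_col) or "").strip().lower()
        if domain_type.startswith("federal"):
            name = (row.get(name_col) or "").strip().lower()
            if name:
                domains.append(name)
    return domains


# Made-up sample rows, for illustration only.
SAMPLE = """Domain name,Domain type,Agency
EXAMPLE-AGENCY.GOV,Federal - Executive,Example Agency
example-city.gov,City,Non-Federal Agency
"""

print(federal_domains(SAMPLE))
```

The resulting list could then be compared against the tables on this page to spot domains nobody has looked at yet.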
Content at risk
Site | Name | Reason | Archival Notes | Status | |
---|---|---|---|---|---|
https://data.gov | data.gov | There have been reports of datasets disappearing from the website[1] though this behavior might be normal due to the way that the site collects datasets from other locations. | https://catalog.data.gov https://inventory.data.gov https://resources.data.gov https://strategy.data.gov https://sdg.data.gov GitHub | job:4hb15f3ijn846c1dw0w58k4fe job:4qlh2ol2vq2i525747l0yq6a4 job:25o494lfnnlxtobegl9grx7tt job:e1ioqt5kilh8l4irihid8sqoq job:79u49omgtqkj83cnpyuhx0xr8 job:akwvpyvnzeuhrvgh51tokrmsv | |
https://cdc.gov | Centers for Disease Control | Directed to pause communication[2] along with other health agencies. | https://data.cdc.gov/ https://ftp.cdc.gov/ GitHub | https://cdc.gov -> job:hd3tvx4w14ybj2al0peewcv https://ftp.cdc.gov/ -> job:8zn8f6a2620t1tnje3f1cyr2o https://data.cdc.gov/ -> job:1u2ougx4kn6ueaiqddwjfeib7 | |
https://www.ncei.noaa.gov/ | National Centers for Environmental Information | (Some?) data is linked to from data.gov. It appears to be possible to enumerate datasets with 7-digit integer IDs starting at 0000001, e.g. https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.nodc:0000001 . Legacy URL format that redirects appears to be http://accession.nodc.noaa.gov/0000001 | |||
https://www.nccs.nasa.gov/services/data-collections | NASA Center for Climate Simulation Data | ||||
https://www.ipcc-data.org | IPCC Data Distribution Centre | Appears to have sequential IDs | |||
https://www.bco-dmo.org/ | Biological and Chemical Oceanography Data Management Office | Appears to have sequential IDs | |||
https://ncela.ed.gov/ | National Clearinghouse for English Language Acquisition | Department of Education has been threatened many, many times | Its "Resource Library" section is mainly a list of links to both internal and external resources, e.g. PDFs or SoundCloud links. |
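The NCEI row above notes that datasets appear to be enumerable by 7-digit integer ID. A minimal sketch of generating candidate landing-page URLs from that pattern follows. The function name and the `start`/`stop` bounds are hypothetical; the real upper bound of the ID range is not known from this page, and some IDs may simply not exist.

```python
def ncei_landing_urls(start=1, stop=10):
    """Yield candidate NCEI landing-page URLs for IDs start..stop inclusive.

    IDs are zero-padded to 7 digits, following the example URL in the table.
    The stop value is an arbitrary placeholder, not a known maximum.
    """
    base = (
        "https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso"
        "?id=gov.noaa.nodc:"
    )
    for n in range(start, stop + 1):
        yield f"{base}{n:07d}"


for url in ncei_landing_urls(1, 3):
    print(url)
```

A list like this could be written to a file and used as a seed list, leaving the crawler to record which IDs actually resolve.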
Domains and properties that require extra scripting
Site | Name | Reason | Archival Notes | Status |
---|---|---|---|---|
https://fdic.gov https://banks.data.fdic.gov/bankfind-suite/bankfind https://orders.fdic.gov/s/?source=govdelivery https://catalog.fdic.gov/catalog/s/ https://playmoneysmart.fdic.gov/games | FDIC Federal Deposit Insurance Corporation | "The ones I spotted were https://banks.data.fdic.gov/bankfind-suite/bankfind https://orders.fdic.gov/s/?source=govdelivery https://catalog.fdic.gov/catalog/s/ https://playmoneysmart.fdic.gov/games but I suspect even more will need special handling" | ||
https://research-hub.nrel.gov/ | National Renewable Energy Laboratory | Buttflare with TLS sniffing | ||
https://wisqars.cdc.gov/ | CDC Web-based Injury Statistics Query and Reporting System | "I don't think they provide raw data for privacy/legal reasons, but I can probably save the data for the default charts at least" | ||
https://liheappm.acf.hhs.gov/datawarehouse | HHS Low Income Home Energy Assistance Program | "I've done the grantee profiles from https://liheappm.acf.hhs.gov/datawarehouse. There's probably additional data that could be exported through the reports but the site is fairly complicated" | ||
https://transfer.archivete.am/inline/qtDPx/apps.bea.gov_seed_urls.txt | has some sections using JS | |||
https://www.osti.gov | https://www.osti.gov + https://www.osti.gov/opennet/ - lots of sitemaps in https://www.osti.gov/robots.txt. https://www.osti.gov/sitemap_ostigov/xml uses https://www.osti.gov/sitemap_ostigov_1.txt, which I don't think ArchiveBot will parse. This looks like something that would be worth doing via DPoS. | |||
https://www.eia.gov/ | Energy Information Administration | Seems to have some scripty things | ||
https://chemview.epa.gov/chemview/ | EPA Chemview | looks scripty and complicated, but they do have a tutorial | ||
https://campd.epa.gov/data/bulk-data-files https://watersgeo.epa.gov/cwa/CWA-JDs/ | EPA Clean Air Markets Program Data | looks scripty | ||
https://ejscreen.epa.gov/ | is arcgis | |||
https://www.facadatabase.gov/FACA/s/FACADatasets | is salesforce | |||
https://liheappm.acf.hhs.gov/datawarehouse | looks scripty but probably won't be too hard | |||
https://ecos.fws.gov/ecdms4/ | is arcgis and there's probably more on that domain | |||
https://www.lcacommons.gov/lca-collaboration/ | is scripty and big looking | |||
https://usgovernmentmanual.gov/ | looks like a really helpful resource for finding stuff | |||
https://adams-search.nrc.gov/ | is big: searching "nuclear" gives 3252079 results, e.g. https://www.nrc.gov/docs/ML0726/ML072630079.pdf. Coverage seems to be decent, but it comes from lots of random projects. Of the few I sampled, only https://www.nrc.gov/docs/ML2008/ML20083B799.pdf wasn't saved (added 2024-05-15 but created 1991-09-20). | |||
https://liheappm.acf.hhs.gov/api/search/years | etc requires a token from POST on https://liheappm.acf.hhs.gov/token.php. Will try to figure out if it's just those 3 or if there's more to it. | |||
https://ffrms.climate.gov/ https://floodstandard.climate.gov/ | is arcgis | |||
https://data.fs.usda.gov/geodata/edw/datasets.php | arcgis/geodb zip archives | |||
https://remdb.nrel.gov/ | is scripty; I'm doing a basic pass over it but it won't get everything (they do have a data download though) | |||
https://maps.nrel.gov/ https://climate.nrel.gov/ | uses arcgis (?) | |||
https://www.fs.usda.gov/nrs/atlas/bird/ | Climate Change Bird Atlas | Looks script-y |
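For sites like the OSTI entry above, where sitemaps are listed in robots.txt and some are plain-text files with one URL per line, a small helper can turn them into seed URLs. The sketch below works on already-downloaded text and does no fetching. Both function names are hypothetical, and the sample content uses the reserved example.org domain, not real OSTI data.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the values of all 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split at the first colon only, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def urls_from_text_sitemap(text):
    """Return URLs from a plain-text sitemap (one URL per line)."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip().startswith(("http://", "https://"))
    ]


# Made-up sample content, for illustration only.
ROBOTS = """User-agent: *
Disallow: /private/
# Sitemap: https://example.org/ignored.xml
Sitemap: https://example.org/sitemap_1.txt
sitemap: https://example.org/sitemap.xml
"""
TEXT_SITEMAP = "https://example.org/a\n\nhttps://example.org/b\nnot a url\n"

print(sitemap_urls_from_robots(ROBOTS))
print(urls_from_text_sitemap(TEXT_SITEMAP))
```

The output of the second function could be used directly as a seed URL list for a crawl job.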
Other content that may be at risk based on subject matter
This list was based on a fast manual scroll through the list of .gov domains. It contains some domains that are already dead (and likely have been for a long time) and might contain duplicates with other lists on this page.
https://www.headstart.gov/ https://www.section508.gov/ https://www.ada.gov/ https://agingstats.gov/ https://blackhistorymonth.gov/ https://www.hiv.gov/ https://www.benefeds.gov/ https://aviationweather.gov/ https://birthcontrol.gov/ https://www.childwelfare.gov/ https://childcare.gov/ https://www.childstats.gov/ https://www.coldcaserecords.gov/ https://www.conservation.gov/ https://coralreef.gov/ https://www.employeeexpress.gov/ https://www.evergladesrestoration.gov/ https://familyplanning.gov https://findtreatment.gov/ https://www.fatherhood.gov/ https://www.samhsa.gov/ https://foreignassistance.gov/ https://girlshealth.gov/ https://forestsandrangelands.gov/ https://greengov.gov/ https://hispanicheritagemonth.gov/ https://www.jewishheritagemonth.gov/ https://www.macpac.gov/ https://migrantworker.gov (-> https://www.dol.gov/general/migrantworker ) https://mitigationcommission.gov/ https://www.ncd.gov/ https://www.nbrc.gov https://nativeamericanheritagemonth.gov/ https://reproductivehealthservices.gov/ https://www.sustainability.gov/ https://www.usaid.gov/ https://www.vaccines.gov/en/ https://womenshealth.gov/ https://www.workwithusaid.gov/ https://womenshistorymonth.gov/
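Since this list may overlap with the other lists on this page, a quick host-level deduplication can help before queueing anything. The following is a minimal sketch; the function name is hypothetical, and treating `www.example` and `example` as the same site is an assumption that will not hold for every domain.

```python
from urllib.parse import urlsplit


def unique_hosts(urls):
    """Return hostnames in first-seen order, without duplicates.

    A leading 'www.' is dropped so that both spellings of a site collapse
    into one entry (an assumption, see the note above).
    """
    seen = set()
    hosts = []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


print(unique_hosts([
    "https://www.hiv.gov/",
    "https://hiv.gov",
    "https://www.usaid.gov/",
]))
```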