data-analysis vs synthea

| | data-analysis | synthea |
|---|---|---|
| Mentions | 6 | 8 |
| Stars | 44 | 2,011 |
| Growth | - | 1.9% |
| Activity | 7.3 | 8.2 |
| Latest commit | 10 months ago | 4 days ago |
| Language | Jupyter Notebook | Java |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
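The page defines growth and activity only in words. The Python sketch below makes them concrete. The growth formula follows directly from "month-over-month growth in stars", but the recency weighting (an assumed 30-day half-life) and the 0-10 percentile-style mapping are assumptions chosen to match the "9.0 means top 10%" description, not the site's published method. All function names are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def month_over_month_growth(stars_now: int, stars_month_ago: int) -> Optional[float]:
    """Percent change in GitHub stars versus one month earlier.

    Returns None when there is no earlier baseline (shown as "-" in the table).
    """
    if stars_month_ago <= 0:
        return None
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def recency_weighted_score(commit_ages_days: Iterable[float], half_life_days: float = 30.0) -> float:
    """Hypothetical activity score: a commit made today counts 1.0, and its weight
    halves every `half_life_days` days, so recent commits dominate older ones."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_rating(score: float, all_scores: Sequence[float]) -> float:
    """Hypothetical 0-10 rating: ten times the fraction of tracked projects that
    score lower. A rating of 9.0 means 90% score lower, i.e. the top 10%."""
    below = sum(1 for s in all_scores if s < score)
    return round(10.0 * below / len(all_scores), 1)


if __name__ == "__main__":
    # 2,011 stars against a hypothetical 1,973 a month earlier
    print(round(month_over_month_growth(2011, 1973), 1))  # 1.9
```

Ranking against all tracked projects, rather than using raw commit counts, is what lets a single number like 8.2 be read as a relative position.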
data-analysis
- Why a public database of hospital prices doesn't exist yet
- Open Database of Hospital Prices
https://github.com/dolthub/data-analysis/tree/main/transpare...
- Show HN: Up to 100x Faster FastAPI with simdjson and io_uring on Linux 5.19
Absolutely interested, on my end at least. I wrote this to manage the transparency in coverage files: https://github.com/dolthub/data-analysis/tree/main/transpare... but I'm always looking for better techniques.
Oh wow, I see you used it on those exact files. How about that.
- Healthcare datasets with multiple continuous variables
-
Beyond the trillion prices: pricing C-sections in America
Details: data repository, code repository, and notebook. The linked GitHub repo gives you the tools you need to reproduce this analysis or create your own.
- I wrote some tools to find the prices of C-sections in America. Context in README
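The Show HN comment above mentions a tool for managing the Transparency in Coverage files, which are very large JSON documents. The excerpt does not say how that tool works. As a hedged illustration of one common approach, the kind of JSON-to-JSON Lines split that jsplit performs, this sketch streams the elements of a top-level JSON array using only the standard library's `json.JSONDecoder.raw_decode`. It assumes the records sit in a top-level array; real in-network files nest their records inside an object, so a real tool would have to locate that inner array first. `iter_json_array` and `json_array_to_jsonl` are hypothetical names.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_json_array(fp: TextIO, chunk_size: int = 1 << 16) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array, reading fp in chunks."""
    decoder = json.JSONDecoder()
    buf = fp.read(chunk_size).lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]
    eof = False
    while True:
        # Skip whitespace and the commas between elements (separators are not validated).
        buf = buf.lstrip(" \t\r\n,")
        if buf.startswith("]"):
            return
        try:
            obj, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            obj, end = None, -1  # element incomplete so far (or invalid)
        # Only trust a decode followed by a separator (or ending at EOF): a number
        # cut at a chunk boundary ("12" + "3", "1." + "5") would otherwise be
        # accepted too early.
        if end == -1:
            complete = False
        elif end < len(buf):
            complete = buf[end] in " \t\r\n,]"
        else:
            complete = eof
        if complete:
            yield obj
            buf = buf[end:]
            continue
        if eof:
            raise ValueError("truncated or invalid JSON array")
        chunk = fp.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True


def json_array_to_jsonl(src: TextIO, dst: TextIO) -> int:
    """Write each array element to dst as one compact JSON line; return the count."""
    count = 0
    for record in iter_json_array(src):
        dst.write(json.dumps(record, separators=(",", ":")) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    out = io.StringIO()
    n = json_array_to_jsonl(io.StringIO('[{"id": 1}, {"id": 2}]'), out)
    print(n)
    print(out.getvalue(), end="")
```

Memory stays bounded by the largest single element rather than the whole file, but each failed decode re-parses the buffered element, so a dedicated streaming parser would be considerably faster on multi-gigabyte inputs.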
synthea
- Survey on Synthea Use to Shape the Future of Open Source Medical Records
- Synthea: Open-Source Synthetic Patient Generation
- Simulated Hospital
As someone working in this arena, I offer an alternative perspective for your consideration: healthcare was an early adopter of information technology, and as a result many of its core technologies come from a nearly unrecognizable era of computing. These systems are “outdated” as a result of their success.
The current prevalence of these venerable technologies may be in part due to regulation, but more often has to do with their success.
HL7v2 is just token-delimited ASCII, not unlike the similarly primitive but ubiquitous CSV. The fields within it are defined by standards documents, and once you use it a little, you can read enough to get the gist of most messages. As you might guess, modules in your language of choice are used to parse and compose HL7v2, so its details aren’t that important.
Something I’d like to point out about Google’s Simulated Hospital is that under the hood it uses MITRE’s Synthea to generate synthetic patient data.
https://www.healthcareittoday.com/2017/09/13/open-source-too...
https://synthetichealth.github.io/synthea/
- Looking for Mock Hospital Dataset. Financial, Human Resource, Departments, In/Out Patients Data.
- Will pay for realistic large dataset of HL7 messages
Have you tried Synthea? https://github.com/synthetichealth/synthea
- Healthcare datasets with multiple continuous variables
- I'm being threatened to be sued by my college for copyright infringement
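The Simulated Hospital comment above describes HL7v2 as token-delimited text. This Python sketch shows what that means in practice: segments are separated by carriage returns, fields by `|`, and components by `^`. It assumes the default encoding characters (`^~\&`) and deliberately ignores repetitions (`~`), subcomponents (`&`), and escape sequences, which is why real systems rely on a dedicated HL7 library. The function names and the sample message are made up for illustration.

```python
from typing import List, Optional

# One segment = its fields after splitting on "|"; element 0 is the segment ID.
Segment = List[str]

# A made-up ADT^A01 message using the default encoding characters ^~\&.
SAMPLE_MESSAGE = "\r".join([
    r"MSH|^~\&|SENDAPP|SENDFAC|RECVAPP|RECVFAC|20240101120000||ADT^A01|MSG00001|P|2.5",
    "PID|1||12345^^^HOSP^MR||Doe^Jane||19800101|F",
])


def parse_hl7v2(message: str) -> List[Segment]:
    """Split a message into segments (carriage-return separated) and fields ("|")."""
    lines = message.replace("\r\n", "\r").replace("\n", "\r").split("\r")
    return [line.split("|") for line in lines if line]


def get_field(segments: List[Segment], segment_id: str, n: int, occurrence: int = 0) -> Optional[str]:
    """Return field n (1-based, as in "PID-5") of the given segment, or None."""
    matches = [seg for seg in segments if seg[0] == segment_id]
    if occurrence >= len(matches):
        return None
    seg = matches[occurrence]
    if segment_id == "MSH":
        # MSH-1 *is* the "|" separator, so it vanishes in the split and MSH-n sits at index n-1.
        if n == 1:
            return "|"
        index = n - 1
    else:
        index = n
    return seg[index] if 0 < index < len(seg) else None


def components(field: str) -> List[str]:
    """Split a field into its "^"-separated components."""
    return field.split("^")


def compose_hl7v2(segments: List[Segment]) -> str:
    """Inverse of parse_hl7v2: rejoin fields with "|" and segments with carriage returns."""
    return "\r".join("|".join(seg) for seg in segments)


if __name__ == "__main__":
    segs = parse_hl7v2(SAMPLE_MESSAGE)
    print(components(get_field(segs, "MSH", 9)))  # message type and trigger event
    print(components(get_field(segs, "PID", 5)))  # family and given name
```

The MSH special case is the one real quirk: because MSH-1 is the field separator itself, naive index arithmetic is off by one for that segment only.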
What are some alternatives?
json_benchmark - Python JSON benchmarking and "correctness".
simhospital
simdjson-go - Golang port of simdjson: parsing gigabytes of JSON per second
fhir - Official source for the HL7 FHIR Specification
jsplit - A Go program to split large JSON files into many jsonl files
FHIR-Converter - Conversion utility to translate legacy data formats into FHIR
japronto - Screaming-fast Python 3.5+ HTTP toolkit integrated with pipelining HTTP server based on uvloop and picohttpparser.
clojure-hl7-messaging-2-parser - HL7 v2.x Messaging Parser
msgspec - A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
JSL - The JSL is an open-source discrete event simulation library written in Java
typedload - Python library to load dynamically typed data into statically typed data structures
log-synth - Generates more or less realistic log data for testing simple aggregation queries.