synthea vs log-synth

Project          synthea               log-synth
Mentions         8                     1
Stars            2,002                 253
Growth           1.4%                  -
Activity         8.2                   2.5
Latest Commit    6 days ago            5 months ago
Language         Java                  Java
License          Apache License 2.0    Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

synthea

Posts with mentions or reviews of synthea. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-19.

Survey on Synthea Use to Shape the Future of Open Source Medical Records
1 project | news.ycombinator.com | 21 Jun 2023
Synthea: Open-Source Synthetic Patient Generation
1 project | /r/hypeurls | 19 May 2023

Synthea: Open-Source Synthetic Patient Generation
3 projects | news.ycombinator.com | 19 May 2023
Simulated Hospital
4 projects | news.ycombinator.com | 17 May 2023

As someone working in this arena, I offer an alternative perspective for your consideration: healthcare was an early adopter of information technology and as a result many of its most core technologies come from a nearly unrecognizable time in computing. These systems are “outdated” as a result of success.
The current prevalence of these venerable technologies may be in part due to regulation, but more often has to do with their success.
HL7v2 is just token-delimited ASCII, not unlike the similarly primitive but ubiquitous CSV. The fields within it are defined by standards documents, and once you use it a little, you can read enough to get the gist of most messages. As you might guess, modules in your language of choice are used to parse and compose HL7v2, so its detail isn't that important.
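To make the "token-delimited" point concrete, here is a minimal Python sketch that splits an HL7v2-style message by hand. The sample message is made up for illustration, and the sketch assumes the default delimiters (`|` for fields, `^` for components) while ignoring repetitions, subcomponents, and escape sequences. In practice a dedicated HL7 library would handle those details.

```python
# A made-up HL7v2-style message for illustration only; segments are separated by carriage returns.
SAMPLE = "\r".join([
    r"MSH|^~\&|SENDER|FACILITY|RECEIVER|DEST|20230517120000||ADT^A01|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F",
    "PV1|1|I|WARD1^101^A",
])

def parse_segments(message):
    """Return a list of (segment_name, fields) pairs, with fields kept as raw strings."""
    segments = []
    for line in message.replace("\n", "\r").split("\r"):
        if line:
            name, *fields = line.split("|")
            segments.append((name, fields))
    return segments

def components(field):
    """Split one field into its ^-separated components."""
    return field.split("^")

# dict() keeps only the last segment of each name, which is enough for this tiny sample.
segments = dict(parse_segments(SAMPLE))
family_name, given_name = components(segments["PID"][4])[:2]  # PID-5, patient name
message_type = components(segments["MSH"][7])  # MSH-9; MSH-1 is the "|" separator itself
```

The index offsets reflect that the segment name is stripped before the fields are stored, and that MSH counts the field separator as its first field.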
Something I’d like to point out about Google Hospital is that under the hood it uses MITRE’s Synthea to generate synthetic patient data.
https://www.healthcareittoday.com/2017/09/13/open-source-too...
https://synthetichealth.github.io/synthea/
Looking for Mock Hospital Dataset. Financial, Human Resource, Departments, In/Out Patients Data.
1 project | /r/datasets | 14 Nov 2022
Will pay for realistic large dataset of HL7 messages
1 project | /r/HL7 | 24 Oct 2022

Have you tried Synthea? https://github.com/synthetichealth/synthea
Healthcare datasets with multiple continuous variables
2 projects | /r/datasets | 17 Oct 2022
I'm being threatened to be sued by my college for copyright infringement
1 project | /r/legaladvice | 30 Sep 2022

log-synth

Posts with mentions or reviews of log-synth. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-19.

Synthea: Open-Source Synthetic Patient Generation
3 projects | news.ycombinator.com | 19 May 2023

Building synthetic BOMs can be fairly straightforward if you can define the level of coherency you want to see. The only big trick I have found in building structured data like this is to first build dictionaries of randomized data with very little coherence and then build larger structures that include elements of the dictionaries.
As an example, you might want to have a model of users interacting with a web site, ordering products and shipping them to their homes. This can start with building a dictionary of user records and orderable item descriptions. The user records would have an address and some "interest" variables that define what the user is likely to order. The item descriptions can have a lot or a little information but would centrally contain a part number and some information that allows the part to be selected efficiently (a numerical vector may be enough). If you want to be crazy, you can use generative models to generate descriptions from random semantic starting points or use lower level tables to piece together these things.
At this point, you can pretty easily build a user model and run it for each user to generate coherent transactional histories.
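The dictionary-then-model idea can be sketched in a few lines of Python. This is not log-synth code; every name here (`build_items`, `build_users`, `order_history`, the field names) is hypothetical, and the scoring rule (weight proportional to the exponential of the interest/feature dot product) is just one plausible assumption for how "interest" variables might drive item selection.

```python
import math
import random

def build_items(rng, n_items, dim=3):
    """Dictionary of orderable items: a part number plus a random feature vector."""
    return [
        {"part_number": f"P{i:05d}", "features": [rng.gauss(0, 1) for _ in range(dim)]}
        for i in range(n_items)
    ]

def build_users(rng, n_users, dim=3):
    """Dictionary of users: a fake address plus random "interest" variables."""
    return [
        {
            "user_id": i,
            "address": f"{rng.randint(1, 999)} Example St",
            "interest": [rng.gauss(0, 1) for _ in range(dim)],
        }
        for i in range(n_users)
    ]

def order_history(rng, user, items, n_orders):
    """Run a simple user model: items aligned with the user's interests are picked more often."""
    weights = [
        math.exp(sum(u * f for u, f in zip(user["interest"], item["features"])))
        for item in items
    ]
    picks = rng.choices(items, weights=weights, k=n_orders)
    return [
        {"user_id": user["user_id"], "ship_to": user["address"], "part_number": p["part_number"]}
        for p in picks
    ]

rng = random.Random(42)
items = build_items(rng, 50)
users = build_users(rng, 5)
orders = [order for user in users for order in order_history(rng, user, items, n_orders=4)]
```

Each order references a user and an item from the dictionaries, so the transactional history stays coherent even though the dictionaries themselves are random.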
Several of these ideas are present in a project I worked on called log-synth [1]. For instance, the VIN generator has tables of factories and such for BMW and Ford so it generates kind of coherent VINs that can be traced back with factory location, engine and body type. If you look hard these are nonsense, but if you squint the right way they look fine.
The commuter generator and the DNS query generator are examples of higher-level transaction generators. For the commuter, there is a model of a user with a home location and a work location. These commuters go to work some days and run errands other days, and there is a simple model to pick an activity. Digging in, each activity breaks down into journeys along entirely incoherent road structures, but details like a physical model of the engine and car velocity are maintained so you can get realistic diagnostics from the vehicles from somewhat realistic life histories. The DNS query generator is similar but with less physics.
One nice statistical concept in all of this is a statistical distribution over a notionally infinite set. Some things in the set will be seen much more often than others, so we are likely to see those sooner. The generator can maintain an estimate of the frequency of all previously seen things and a probability of seeing something new (see the Chinese Restaurant process [2]). You only need to generate the specifics of a thing in this infinite set when you first see it, which gives a pretty realistic texture to the fictional transactional world.
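A minimal Python sketch of a Chinese Restaurant process sampler follows. It is not taken from log-synth; the class name, the `alpha` concentration parameter, and the `new_item` callback are assumptions made for illustration. With `n` draws so far, a new item appears with probability `alpha / (n + alpha)`; otherwise an existing item is reused with probability proportional to how often it has been seen.

```python
import random

class ChineseRestaurantProcess:
    """Lazily samples from a notionally infinite set, favoring items already seen."""

    def __init__(self, alpha, new_item, rng=None):
        self.alpha = alpha          # larger alpha means new items appear more often
        self.new_item = new_item    # callback that builds the details of a brand-new item
        self.rng = rng or random.Random()
        self.items = []
        self.counts = []
        self.total = 0

    def sample(self):
        # New item with probability alpha / (total + alpha); always true on the first draw.
        if self.rng.random() * (self.total + self.alpha) < self.alpha:
            self.items.append(self.new_item(len(self.items)))
            self.counts.append(1)
            self.total += 1
            return self.items[-1]
        # Otherwise reuse an existing item, weighted by how often it has been seen.
        idx = self.rng.choices(range(len(self.items)), weights=self.counts)[0]
        self.counts[idx] += 1
        self.total += 1
        return self.items[idx]

# Hypothetical usage: host names are only generated the first time they are drawn.
crp = ChineseRestaurantProcess(2.0, lambda i: f"host-{i}.example", random.Random(7))
draws = [crp.sample() for _ in range(1000)]
```

Because item details are created only inside `new_item`, the cost of the sampler scales with the number of distinct items actually seen, not with the size of the notional set.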
Relative to your problem of multi-level BOMs, you could say that a BOM is a list of items. Pick the desired length from a suitable distribution. Then pick each item from a Chinese Restaurant process. As you generate new items, decide if the item is composite and if so, generate a BOM for it recursively. Constraints like forcing a composite item to not recursively contain anything of the same type can be enforced using a rejection method (sometimes).
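The recursive BOM recipe might look roughly like the Python sketch below. All names and parameters (`BomGenerator`, `alpha`, `p_composite`, `max_depth`, the length distribution) are hypothetical choices, and the constraint is interpreted here as "a composite item may not contain itself, directly or indirectly", enforced by rejecting and redrawing. `max_depth` only limits recursion while new items are being generated.

```python
import random

class BomGenerator:
    def __init__(self, alpha=3.0, p_composite=0.3, max_depth=3, seed=0):
        self.rng = random.Random(seed)
        self.alpha = alpha              # Chinese Restaurant process concentration
        self.p_composite = p_composite  # chance that a newly created item is composite
        self.max_depth = max_depth
        self.counts = []                # counts[i] = number of times item i was drawn
        self.boms = {}                  # item id -> list of child ids (composite items only)

    def _contains(self, item, forbidden):
        """True if item, or anything in its sub-BOM, is in the forbidden set."""
        if item in forbidden:
            return True
        return any(self._contains(child, forbidden) for child in self.boms.get(item, ()))

    def _draw(self, depth, ancestors):
        while True:
            total = sum(self.counts)
            if self.rng.random() * (total + self.alpha) < self.alpha:
                # Brand-new item: decide whether it is composite and recurse if so.
                item = len(self.counts)
                self.counts.append(1)
                if depth < self.max_depth and self.rng.random() < self.p_composite:
                    self.boms[item] = self.bom(depth + 1, ancestors | {item})
                return item
            item = self.rng.choices(range(len(self.counts)), weights=self.counts)[0]
            if not self._contains(item, ancestors):
                self.counts[item] += 1
                return item
            # Rejected: this item would make an ancestor contain itself, so draw again.

    def bom(self, depth=0, ancestors=frozenset()):
        """Generate one BOM: pick a length, then draw each entry from the process."""
        length = 1 + min(int(self.rng.expovariate(0.5)), 7)
        return [self._draw(depth, ancestors) for _ in range(length)]

gen = BomGenerator(seed=1)
top_level = gen.bom()
```

Rejection always terminates here because a fresh item can be created on any draw, and a fresh item can never conflict with its ancestors.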
If this seems at all interesting, ping me by filing an issue on the log-synth github repository.
[1] https://github.com/tdunning/log-synth

What are some alternatives?

When comparing synthea and log-synth you can also consider the following projects:

simhospital

ETL-Synthea - A package supporting the conversion from Synthea CSV to OMOP CDM

fhir - Official source for the HL7 FHIR Specification

FHIR-Converter - Conversion utility to translate legacy data formats into FHIR

clojure-hl7-messaging-2-parser - HL7 v2.x Messaging Parser

data-analysis

JSL - The JSL is an open-source discrete event simulation library written in Java

mars-sim - Mars Simulation Project Official Codebase
