How do you guys handle pandas and its sh*tty data type inference

This page summarizes the projects mentioned and recommended in the original post on /r/Python

  • csv_log_cleaner

    Clean CSV files to conform to a type schema by streaming them through small memory buffers using multiple threads and logging data loss.

  • Sounds like it could be more of a data cleansing problem you're facing than a data inference one. Even a single non-numerical value in a million rows of numbers will necessarily mess up type inference for the whole column. I work with a lot of CSVs and that's one of the issues we have to spend a huge amount of time dealing with. I even ended up writing this open source tool to handle the cleansing: https://github.com/ambidextrous/csv_log_cleaner

  • dtype_diet

    Tries to shrink your Pandas column dtypes with no data loss so you have more spare RAM

  • CSVLint

    CSV Lint plug-in for Notepad++ for syntax highlighting, csv validation, automatic column and datatype detecting, fixed width datasets, change datetime format, decimal separator, sort data, count unique values, convert to xml, json, sql etc. A plugin for data cleaning and working with messy data files.

  • There's also the CSV Lint plug-in for Notepad++, which can detect datatypes; you can then run CSV Lint > Generate metadata > Python script. I'm not sure it works correctly for all datetime datatypes, though.
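
The point made in the csv_log_cleaner comment above, that a single non-numerical value breaks type inference for a whole column, can be reproduced in a few lines of pandas. This is only a minimal sketch: the sample data and the `id`/`amount` column names are made up for illustration, and it shows plain pandas calls, not how any of the listed tools work internally.

```python
import io

import pandas as pd

# Hypothetical sample data: one stray non-numeric value ("n/a?") in a numeric column.
raw = "id,amount\n1,10.5\n2,n/a?\n3,7\n"

# Default inference: the stray value makes pandas read the whole column as text.
df = pd.read_csv(io.StringIO(raw))
inferred_is_numeric = pd.api.types.is_numeric_dtype(df["amount"])

# Coerce to numeric; anything that cannot be parsed becomes missing instead of raising.
amount = pd.to_numeric(df["amount"], errors="coerce")

# Rows where the raw value was present but failed to parse: the ones to inspect or log.
bad_rows = df[amount.isna() & df["amount"].notna()]

# Cleaned frame with a numeric column; the bad value is now missing.
cleaned = df.assign(amount=amount)

# Alternative: declare the type up front so that bad data fails loudly
# instead of silently changing the column type.
try:
    pd.read_csv(io.StringIO(raw), dtype={"amount": "float64"})
    strict_failed = False
except ValueError:
    strict_failed = True
```

Coercing trades an exception for missing values, so keeping `bad_rows` around (or writing it to a log) is what makes the data loss visible. Passing an explicit `dtype` is the stricter option when the file is expected to be clean.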

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
