Detect file encoding and convert it to UTF-8 without BOM #detecting file encoding
by Ashok - 6 years ago (2019-01-23)
I am unable to detect the encoding of a file that needs to be converted
| I am working on a project where I need to upload a CSV file to a database, but the file is in an unknown encoding, so I am unable to upload it.
I tried several approaches, such as mb_detect_encoding, and tried to convert the file to UTF-8, but did not succeed.
Can anyone help me solve this problem? |
- 2 Clarification requests
2.
by Ray Paseur - 6 years ago (2019-01-31) Reply
This class (#11059) has not been approved yet, but it should help you.
https://www.phpclasses.org/package/11059-PHP-Validate-strings-in-UTF-8-encoding.html
1.
by Ray Paseur - 6 years ago (2019-01-28) Reply
Hi, Ashok. I can help with this. Please give me a test file, or better yet a link to a set of test files. UTF-8 is self-evident, and BOM is easy to remove. I wrote an article about this a few years ago.
My email is Ray.Paseur [at] Gmail if you want to send me a link to the test data. If you send me an email, I will send you links to my articles and presentation deck. Best regards, Ray
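As a rough illustration of the two points in this reply (valid UTF-8 can be recognized from the bytes alone, and a BOM is simple to strip), here is a minimal PHP sketch. It assumes the mbstring extension is loaded. The function names are hypothetical and are not taken from the articles or packages mentioned in this thread.

```php
<?php
// Hypothetical helpers for illustration; requires the mbstring extension.

/** Return true when $bytes is well-formed UTF-8. */
function isValidUtf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

/** Remove a leading UTF-8 byte-order mark (bytes EF BB BF), if present. */
function stripUtf8Bom(string $bytes): string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return substr($bytes, 3);
    }
    return $bytes;
}
```

A file that fails the validity check is in some other encoding, and the bytes alone usually cannot say which one, so the source encoding still has to be known or guessed.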
4 Recommendations
PHP Common Class Library: Set of classes that provides common functionality
The common classes package provides some of the classes used by CIDRAM, phpMussel, etc., as a separate package.
The common classes package currently contains the following classes:
- Cache: A simple, unified cache handler. Currently, it supports APCu, Memcached, Redis, PDO, and flat file caching.
- CommonAbstract: Common abstract for the package.
- ComplexStringHandler: Provides an easy way to iterate over the parts of a given string, identified by a given pattern, in order to execute a given closure to those parts of the given string, or to the glue that separates those parts.
- Context: Context attribute class (used internally by a few classes in the package).
- DelayedIO: Provides an easy, simple solution for when needing to read and update a number of files, but delay rewriting the files for a while.
- Demojibakefier: Intended to normalize the character encoding of a given string to a preferred character encoding when the given string's byte sequences don't match the expectations of the preferred character encoding. Useful in cases where a block of data might conceivably be composed of several different unspecified, unknown encodings.
- Events: Allows the orchestration of "events" throughout a codebase by providing some simple methods to assign handlers to a particular event and to subsequently invoke those handlers at a later point in the codebase where the "event" is to occur.
- IPHeader: Attempts to resolve an originating IP address from a preferred source, or REMOTE_ADDR if the preferred source isn't available.
- L10N: Used by projects to handle L10N data, the L10N handler reads in an array of L10N strings and provides some safe and simple methods for manipulating and returning those strings when needed, and for handling cardinal plurals, where integers and fractions are concerned alike, based upon the pluralization rules specified by the L10N from a range of various pluralization rules available, to be able to suit the needs of most known languages.
- Matrix: Facilitates the generation of multidimensional arrays to an arbitrarily specified depth and number of elements, and facilitates iteration through those multidimensional arrays in any direction (whether up and down a particular array, across different depths, etc.) via arbitrary callable functions and closures.
- NumberFormatter: Can format numbers into various different kinds of numeral systems, or remove the format from an already formatted number.
The class provides a more controllable, customizable mechanism for number formatting than PHP's internal number_format() function.
- Operation: Used by projects for various operations related to dependency management (an integral part of the internal updates system for some packages).
- Request: Used by projects to send outbound requests through cURL.
- YAML: Used by projects to handle YAML data (e.g., reading data from a YAML file, reconstructing PHP variables back into YAML data, etc.)
| by Caleb package author 30 - 6 years ago (2019-06-24) Comment
Check out my "Demojibakefier" class. Could be useful for what you need. :-) |
PHP Convert CSV to UTF-8: Convert a CSV file to have data in UTF-8 encoding
This class can convert a CSV file to have data in UTF-8 encoding.
It takes the name of a file with data in CSV format, detects the encoding of the text data that it contains and converts it to UTF-8 in case the data is not already in this encoding.
The resulting data can be stored in the same file or another file with a given name.
| by peyman package author 65 - 6 years ago (2019-02-03) Comment
Hi. I wrote this class especially for your problem. Please check it out and see if it helps. |
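The detect-then-convert flow described for this package can be sketched roughly as follows. This is not the package's code. The function names and the ISO-8859-1 fallback are assumptions chosen for illustration, and the mbstring extension is assumed to be loaded. The sketch trusts a UTF-16 BOM when present, keeps data that is already valid UTF-8, and otherwise converts from a caller-supplied legacy encoding.

```php
<?php
// Sketch only: names and the ISO-8859-1 default are assumptions, not the package's API.

/** Convert raw bytes to UTF-8 without a BOM. */
function toUtf8WithoutBom(string $bytes, string $fallback = 'ISO-8859-1'): string
{
    // A UTF-16 BOM identifies the encoding (UTF-32 is not handled in this sketch).
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16LE');
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16BE');
    }
    // Drop a UTF-8 BOM, then keep the data as-is if it is already valid UTF-8.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = substr($bytes, 3);
    }
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // Otherwise assume the legacy encoding named by the caller.
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

/** Read $source, convert it, and write the result to $target (may be the same path). */
function convertCsvFileToUtf8(string $source, string $target, string $fallback = 'ISO-8859-1'): void
{
    $bytes = file_get_contents($source);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $source");
    }
    if (file_put_contents($target, toUtf8WithoutBom($bytes, $fallback)) === false) {
        throw new RuntimeException("Cannot write $target");
    }
}
```

The fallback is passed explicitly because single-byte encodings such as ISO-8859-1 and Windows-1252 accept almost any byte sequence, so automatic detection between them tends to be unreliable. If the CSV files come from a known source, naming its encoding is usually safer than guessing.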
Class that outputs a table with the data from the result rows of a database query. It features:
- Database independence (works with any DBMS supported by Metabase).
- Splits the display of the result rows into multiple pages with a configurable number of rows, displaying automatic links to the Next, Previous, First, Last, etc. pages.
- Arbitrary column display.
- Automatic alignment of columns according to their data types.
- Configurable colors for the table headers and data alternating between even and odd rows.
| by Agro Biz 60 - 6 years ago (2019-02-02) Comment
010 01, Strážov ,Žilina |
PHP UTF-8 Validation: Validate and repair strings in UTF-8 encoding
This class can validate and repair strings in UTF-8 encoding.
It takes a text string and checks if the characters are valid in UTF-8.
The class can also repair an invalid string by removing some invalid UTF-8 character sequences and Byte-Order Marks.
The class can return an object instance of itself with the string, byte length, character count, and the position of any encoding errors.
| by Ray Paseur package author 120 - 6 years ago (2019-01-31) Comment
See if this helps. If you use this class and have any difficulty with it, please reach out to me. Best of luck, Ray |
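To make the idea of reporting the position of an encoding error concrete, here is a minimal sketch that finds the byte offset of the first malformed UTF-8 sequence. It is not the API of the class above. The function name is hypothetical, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper; not the API of any package in this thread.

/**
 * Return the byte offset of the first malformed UTF-8 sequence,
 * or null when the whole string is well-formed.
 */
function firstInvalidUtf8Offset(string $bytes): ?int
{
    $length = strlen($bytes);
    $i = 0;
    while ($i < $length) {
        $lead = ord($bytes[$i]);
        if ($lead < 0x80) {
            $size = 1;              // ASCII
        } elseif ($lead >= 0xC2 && $lead <= 0xDF) {
            $size = 2;
        } elseif ($lead >= 0xE0 && $lead <= 0xEF) {
            $size = 3;
        } elseif ($lead >= 0xF0 && $lead <= 0xF4) {
            $size = 4;
        } else {
            return $i;              // stray continuation byte or illegal lead byte
        }
        $sequence = substr($bytes, $i, $size);
        // A truncated sequence fails the length test; mbstring judges the rest.
        if (strlen($sequence) !== $size || !mb_check_encoding($sequence, 'UTF-8')) {
            return $i;
        }
        $i += $size;
    }
    return null;
}
```

Knowing the offset makes it easier to inspect the surrounding bytes in the CSV file and work out which legacy encoding produced them.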
- 1 Comment
1.
by Johnny Mast - 6 years ago (2019-04-01) Reply
Based on this request, I'm working on a package that can scan content (emails / files / websites) and detect non-BOM content, that is, content that could display unsupported characters. Developers will be able to add scanning mechanisms and let the engine detect non-BOM characters in email / CSV / files / DB, etc., and those adapters could fix the content. Its mechanism will be a bit like Flysystem for the filesystem.