
All Sony codes here #47

Open
dobbelina opened this issue Dec 5, 2020 · 13 comments

Comments

@dobbelina
Contributor

dobbelina commented Dec 5, 2020

I thought I should share an amazing document I found on GitHub that basically contains all the Sony codes there are.
Well, the document is four years old, so everything up until that time, anyway.

Here: All Sony Codes V1.0.xls 7,65 MB

Either use the search macro or click the "Cover Page" tab where all devices are listed according to device number.

All credit to RogueProeliator, as it comes from his/her public repo.

@probonopd
Owner

probonopd commented Dec 6, 2020

Thank you very much @dobbelina.

Had a quick look at the document. Should be possible to convert automatically...

  • Device category (directory name): Field D2
  • protocol: Sony12 if field C2 says "12 bits", Sony15 if field C2 says "15 bits"
  • device: Field H2 up to the dot, if any
  • subdevice: Field H2 after the dot, if any; otherwise -1
  • function: Column C
  • functionname: Column E (needs to be normalized)

If there is enough interest, one could probably write a Python script that would read the .xls and write .csv files for inclusion in this repository.
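For illustration, here is a minimal sketch (not the final script) of the device/subdevice split described above; the helper name and example values are made up:

def split_device(field):
    """Split a value like "26.42" into (device, subdevice); subdevice is "-1" if there is no dot."""
    parts = str(field).split(".")
    device = parts[0].lstrip("0") or "0"
    subdevice = (parts[1].lstrip("0") or "0") if len(parts) > 1 else "-1"
    return device, subdevice

print(split_device("26.42"))  # ('26', '42')
print(split_device("26"))     # ('26', '-1')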

@probonopd
Owner

This does the trick:

# -*- coding: utf-8 -*-
"""SONY.ipynb

Original file is located at
    https://colab.research.google.com/drive/1QugHmC4nJIwVCZVbdAQo5eBtliGMoBIh

Convert all SONY codes from Excel to csv suitable for irdb
"""

import os
import shutil

import pandas as pd

url = 'https://github.com/RogueProeliator/IndigoPlugins-Sony-Network-Remote/raw/bac07e464b559a53ffe11477047894533a24aa37/Documentation/All%20Sony%20Codes%20V1.0.xls'

xl = pd.ExcelFile(url)
# Skip the leading cover/index sheets; the remaining sheets are one per device
sheet_names = xl.sheet_names[7:]

# Start from a clean output directory
shutil.rmtree('/tmp/codes', ignore_errors=True)

for sheet_name in sheet_names:
  # Column C holds the function (OBC), column E the function name
  df = xl.parse(sheet_name, usecols=[2, 4])
  df.columns = ['function', 'functionname']
  protocol = "Sony" + str(df.iloc[0, 0])
  category = df.iloc[0, 1].strip().replace(" ", "_").replace("_/_", "_").replace("(", "").replace(")", "").replace("/", "_")
  df['protocol'] = protocol
  # The sheet name encodes the device and, optionally, the subdevice separated by a dot
  device = sheet_name.split(".")[0].lstrip("0")
  df['device'] = device
  subdevice = "-1"
  if len(sheet_name.split(".")) > 1:
    subdevice = sheet_name.split(".")[1].lstrip("0")
  df['subdevice'] = subdevice

  # Skip the sheet's header rows and drop rows without a function name
  df = df[4:]
  df.dropna(inplace=True, subset=['functionname'])

  df = df[['functionname', 'protocol', 'device', 'subdevice', 'function']]

  # Normalize the function names to upper case
  df['functionname'] = df['functionname'].map(lambda s: s.upper() if isinstance(s, str) else s)

  os.makedirs("/tmp/codes/" + category, exist_ok=True)
  with open("/tmp/codes/" + category + "/" + device + "," + subdevice + ".csv", "w") as text_file:
      text_file.write(df.to_csv(index=False))

shutil.make_archive("/tmp/codes", 'zip', "/tmp/codes")

# Running in Google Colab; download the resulting zip archive
from google.colab import files
files.download("/tmp/codes.zip")

Output is attached.

SONY.zip

@probonopd
Owner

@dobbelina, @bengtmartensson please review the generated CSVs, and compare with what we had so far. These codesets seem to be more complete than what we had but I think some software may trip over the non-generic functionnames.

@dobbelina
Contributor Author

dobbelina commented Dec 6, 2020

Wow, that was quick!

I thought about writing a VBA macro in Excel, but your already-finished solution is of course better 😄
Interestingly, 26,42 is not a TV, it's a projector: https://github.com/probonopd/irdb/blob/master/codes/Sony/TV/26%2C42.csv
But of course, newer devices released after this document was made might "reuse" the same codeset.
An example of that can already be seen here, 26,-1, where two protocols are used: https://github.com/probonopd/irdb/blob/master/codes/Sony/TV/26%2C-1.csv

There are some buttons missing in the document, for example 151,-1: https://github.com/probonopd/irdb/blob/master/codes/Sony/TV/151%2C-1.csv

But overall, if merged with IRDB, this would make it more complete, with fewer errors.

I had a look at the generated CSVs, they look super!

@bengtmartensson
Collaborator

@bengtmartensson please review

Will take a few days, for personal reasons.

@probonopd
Owner

No hurry, @bengtmartensson. Thank you very much.

@bengtmartensson
Collaborator

Wow, that is impressive work. (I list some other similar resources here.)

An example of that can already be seen here 26,1 where 2 protocols are used

That is a fundamental problem with the design of IRDB; it cannot model a device using more than one protocol. The VPL HW50 projector, for example, uses two.

I have found two issues:

  • While the present content in general uses device types with a space ("Laser Disk"), the Python script replaces the spaces by underscores ("Laser_Disk"). For uniformity, I suggest keeping the form with spaces.

  • The Excel list uses the en-dash ("–", Unicode 0x2013) to denote "minus" (as in "Speed –"). An en-dash denotes a range of numbers (like "pages 123–133"), so this is clearly a typographical mistake. Minus in Unicode is "−", numerically 0x2212. Suggestion: in the Python script, replace it with the Unicode minus (if Unicode is considered acceptable), or with a hyphen "-" if Unicode is not desired.

@probonopd
Owner

Thanks for your careful review @bengtmartensson.

@probonopd
Owner

@bengtmartensson re. Pioneer

Manufacturer supplied information, mostly not machine parseable

Actually

# Unzip the manufacturer-supplied archive.
# Assumes `url` points to the .zip file (defined earlier in the notebook) and
# that the archive has already been downloaded to the working directory.
import os
import zipfile
from urllib.parse import unquote

filename = unquote(os.path.basename(url))
with zipfile.ZipFile(filename, 'r') as zip_ref:
    zip_ref.extractall(".")

mhtname = filename.replace(".zip", ".mht")

# Extract the HTML part of the .mht (MIME HTML) file
# using https://tewarid.github.io/2019/06/04/extract-all-tabular-data-from-multipart-mime-documents.html
# and interpret its contents
!pip install html-table-extractor
import email
import pprint
pp = pprint.PrettyPrinter(indent=2)
from bs4 import BeautifulSoup
from html_table_extractor.extractor import Extractor

with open(mhtname) as fp:
    message = email.message_from_file(fp)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            soup = BeautifulSoup(part.get_payload(decode=True), "html.parser")
            for table in soup.body.find_all("table", recursive=False):
                extractor = Extractor(table)
                extractor.parse()
                results = extractor.return_list()
                # print(results)
                for result in results:
                    # Column 0 carries the device category, only on the first row of each group
                    if result[0].strip() != "":
                        category = result[0].strip().replace("\n", "")
                        # pp.pprint(result)
                    # Column 1 is the function name, column 7 the Pronto Hex code
                    if len(result) > 2:
                        if result[7].strip() != "":
                            categoryandfunction = result[1].replace("(", " ").replace(")", " ")
                            categoryandfunction = ' '.join((category + " " + categoryandfunction.strip().replace("\n", "")).split())
                            categoryandfunction = categoryandfunction.upper() if isinstance(categoryandfunction, str) else categoryandfunction
                            print(categoryandfunction + ";" + ' '.join(result[7].strip().replace("\n", "").split()))

results in

CATEGORY FUNCTION;PRONTO HEX
INPUT DVD;0000 0068 0000 0022 0168 00b4 0016 0043 0016 0016 0016 0043 0016 0016 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0016 0016 0016 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0043 0016 0043 0016 0043 0016 0016 0016 06d9
(...)

which we can further process e.g., with https://github.com/probonopd/decodeir to get

protocol=Pioneer device=165 subdevice=-1 obc=133 hex0=94 hex1=-1 hex2=-1 hex3=-1 misc= error=

which we could then bring into a format suitable for irdb...
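For instance, a rough sketch (assuming the decodeir-style output format shown above, and a hypothetical output file name) of how a decoded line plus the function name could be turned into an irdb-style CSV row:

import csv

def parse_decode(line):
    # "protocol=Pioneer device=165 subdevice=-1 obc=133 ..." -> dict
    return dict(pair.split("=", 1) for pair in line.split() if "=" in pair)

def to_irdb_row(functionname, decode_line):
    d = parse_decode(decode_line)
    return {
        "functionname": functionname.upper(),
        "protocol": d["protocol"],
        "device": d["device"],
        "subdevice": d["subdevice"],
        "function": d["obc"],  # irdb's "function" column holds the command number (OBC)
    }

with open("pioneer.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=["functionname", "protocol", "device", "subdevice", "function"])
    writer.writeheader()
    writer.writerow(to_irdb_row(
        "INPUT DVD",
        "protocol=Pioneer device=165 subdevice=-1 obc=133 hex0=94 hex1=-1 hex2=-1 hex3=-1 misc= error="))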

@probonopd
Owner

probonopd commented Dec 10, 2020

Regarding #47 (comment)

  (...)
  protocol = "Sony" + str(df.iloc[0,0])
  category = df.iloc[0,1].strip().replace(" / ", " ").replace("(", "").replace(")", "").replace("/", " ").replace("–", "-")
  # print(category)
  (...)

gives

codes.zip

Does this fix the issues correctly, @bengtmartensson?

@bengtmartensson
Collaborator

Does this fix the issues correctly, @bengtmartensson?

It fixes the first issue, but not the second. It is the command names, not the categories, that are ndashy.
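A minimal sketch of that fix, assuming the conversion script above (applied to the functionname column instead of, or in addition to, the category):

# Sketch only: normalize "minus" in the generated command names.
# Replaces en-dash (U+2013) with a plain hyphen; use "\u2212" instead
# if a true Unicode minus is preferred.
df['functionname'] = df['functionname'].map(
    lambda s: s.replace("\u2013", "-") if isinstance(s, str) else s)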

@bengtmartensson
Collaborator

@bengtmartensson re. Pioneer

Manufacturer supplied information, mostly not machine parseable

Well, (almost) everything is parseable if you write a separate parser for every file... ;-).

which we can further process e.g., with https://github.com/probonopd/decodeir to get

Please do not use decodeir any longer. It has truckloads of problems and has been a dead end for ten years. Use IrpTransmogrifier instead, either from the command line or via its API.

asedeno added a commit to asedeno/IrScrutinizer that referenced this issue May 13, 2021
irdb.tk is down, and I don't know if/when it's coming back.
This reworks the irdb importer to use the raw csv data from
https://github.com/probonopd/irdb via the jsdelivr.net CDN.

The CSV parsing is currently sufficient, but this does make a few
assumptions:
- The only field that might be quoted is the first, functionname

- In a given file, it is assumed that the protocol, device, and
  subdevice are constant.
  - protocol was already noted here:
    probonopd/irdb#47 (comment)
  - device and subdevice are also encoded in the filename, so it would
    be surprising if they were not constant
@bengtmartensson
Collaborator

I have created a Girr version of said file, found here.

In IrScrutinizer it looks like this:

(Screenshot of the Girr file loaded in IrScrutinizer, 2021-08-01)
