The Twitter datasets you’ll be creating with Twarc contain characters, such as emojis, that require your computer’s console to use Unicode UTF-8 character encoding. This is not the default on Windows computers.
To change these settings (Windows 10 required):
From your taskbar, search for the program Run. Click to open:
Type intl.cpl into the prompt and click OK:
This opens your Region settings (these can also be reached through the Control Panel; the Run command is just a shortcut). Select the Administrative tab:
Under ‘Language for non-Unicode Programs’, click the button for ‘Change system locale…’
Check the box for ‘Beta: Use Unicode UTF-8 for worldwide language support’
Click OK.
An alert box will appear telling you to restart your computer. Click ‘Restart now’ to restart your computer with these settings saved.
You can uncheck this setting later if you’d prefer. It is only required when using certain Twarc utilities, such as emojis, hashtags, users, and others.
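After restarting, one way to confirm that the change took effect is to ask Python which encodings it is using. The snippet below is a minimal sketch, not part of Twarc: the function names `can_encode` and `console_report` are made up for this illustration, and the values reported can vary with your Python version and with whether output is printed to the console or redirected to a file.

```python
import locale
import sys


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with the named encoding."""
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        # UnicodeEncodeError: the encoding has no mapping for a character.
        # LookupError: Python does not recognize the encoding name.
        return False


def console_report(sample: str = "\U0001F600") -> dict:
    """Summarize the encodings Python will use for console output.

    `sample` is a grinning-face emoji by default, chosen only as an
    example of a character outside the legacy Windows code pages.
    """
    stdout_encoding = getattr(sys.stdout, "encoding", None) or "unknown"
    return {
        "stdout_encoding": stdout_encoding,
        "preferred_encoding": locale.getpreferredencoding(False),
        "can_print_sample": can_encode(sample, stdout_encoding),
    }


if __name__ == "__main__":
    for name, value in console_report().items():
        print(f"{name}: {value}")
```

If `can_print_sample` is reported as False, the console is probably still using a legacy code page, and it is worth rechecking the steps above.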
If you do not change this region setting, you will see an error message like the one below when you try to run these Twarc utilities: